Hi. Welcome back to Statistics One.
We're up to lecture eight, almost halfway through the course.
And today, I'd like to introduce you to multiple regression.
In the last lecture, we did simple regression, which is just one predictor in the regression equation. Today, we'll see how multiple regression works when we have multiple predictors.
It gets a little more complicated in terms of interpreting the regression coefficients, and mathematically it gets a little more complicated as well.
So, I'll start out in the first segment just by introducing you to the multiple
regression equation, and I'll present an example so you can get a flavor for how to
interpret multiple regression coefficients in one example.
Then in the second segment, I'll review some matrix algebra, because I'm assuming
not everybody is familiar with matrix algebra, particularly matrix multiplication. And, it's necessary to understand how
matrix algebra works to understand how these regression coefficients are
estimated sort of simultaneously, in one equation.
So then in the third segment, we'll take a closer look again at the regression
equation and talk about how those regression coefficients are estimated.
So let's start the first segment. In segment one, I'm just going to present
one example, relatively easy example, so you can understand how to interpret the
regression coefficients when there are multiple predictors in the equation.
The important concepts to take away from this segment are, again, understanding the
equation and the components of the equation, and how to interpret both the unstandardized and the standardized regression coefficients.
And then, I'll talk about the difference between what I'll call a standard multiple
regression, where we enter all predictors in together, and what I call a sequential
multiple regression, where we enter predictors in a particular order.
So, in the last lecture we talked about simple regression.
We had one predictor in the equation. Today, we're just going to enter more
predictors. So, let's move to the equation.
Before, when we just did simple regression, that was our equation.
We just had one predictor. So, the predicted score on y was the
regression constant, or the intercept, and the slope times an individual's score on
x. So, that's just the formula for a line, intercept plus slope, and that's simple regression.
What we're doing now is we're going to add in as many predictors as we like.
And, as you can see, I could extend this out to K predictors.
So, I'll just summarize the multiple regression equation as follows.
We're still going to have an intercept, that's still the predicted score on Y when
the X's are zero. And then, we can have multiple predictors
and multiple regression coefficients. So again, the trick is understanding how to interpret multiple regression coefficients together, and how that happens mathematically. And that's where a little matrix algebra is required to fully appreciate it. So, this just breaks down everything
that's in the equation. This should be obvious by now, the
predicted value on Y, the predicted value on Y when all the predictors are zero and
so on. The other thing I alluded to last lecture, and we didn't really get into in simple regression, is this idea of model R and model R squared.
I talked about this when I mentioned alternatives to null hypothesis significance testing and estimates of effect size, right?
If we want to compare models, for example, we might want to look at model A's R squared versus model B's R squared.
Now, this didn't really come up much in simple regression because the model was
just one predictor. It was just that equation with one
regression coefficient in it. So, the correlation between X and Y was your model R, and that correlation squared was your model R squared. Now, we can get a better correlation
because we're adding in multiple predictors.
And the way to get the model R is to just look at the correlation between the observed scores and the predicted scores. That's the correlation between the observed scores, Y, and the predicted scores, Y with a little hat over it.
And then you just square that, and that gives you the percentage of variance in Y explained by the model, and we'll see that in the example.
So, I'm just going to walk through this example conceptually, not in much mathematical detail; we'll get into that in the next segment.
I want to introduce this example just so you get a flavor for how to interpret the
regression coefficients and what the model R and R squared look like.
So, this is sort of an old example now, so don't worry about the faculty salaries if you're in academia and you're going on the job market. Particularly, I'm looking at someone who's on the job market.
These are old faculty salaries. And we're going to predict faculty salary
not from one predictor but from multiple predictors.
So, one, how long has it been since the faculty member received his or her PhD?
So you would think that the more years out, the higher their salary, so there's
probably a positive correlation between those two.
The number of publications a faculty member has, that's often a predictor of
how much money the faculty member makes because if a professor is very prolific
and publishes a lot, then they tend to be more marketable, more sought-after, and
they probably make more money. And we'll also look and see if there's a
gender effect. So, we could look at male and female
faculty members and see if there's a difference in their salary while taking
into account, and this is the important part, while taking into account the time
since their PhD and their number of publications.
If we just wanted to look at whether there's a difference between men and women, we could just do a t-test, right?
But here, we're taking into account these other variables so it gets a little more
complicated in interpreting the gender difference.
So, here's some descriptive statistics for this example.
As I said, this is an old example. There are 150 professors and the average salary is $64,000.
But they are relatively young professors so the time since PhD is only eight years.
It's funny, this example's so old that I used to think that was not a young cohort. Now I think that is a young cohort, eight years since the PhD.
And the average number of publications for this group of faculty members, again suggesting it's sort of a younger group, is fifteen.
But the question is, how do these all predict faculty salary?
Before I get to that, we have to code our categorical independent variable as numerical.
Obviously, I can't just tell R to run an analysis with males and females coded as string variables; we have to convert them into numbers.
So, for this example, I just coded the male faculty members as zero, and the female faculty members as one.
And, if you run this in R, this is the regression equation that you get out, and
I'll walk you through all of this and show you where this comes from in R.
But basically, the predicted score equals 46,911, that's the regression constant.
So, that's the predicted score on Y when all the X's are zero.
Well, that would be someone who just graduated, who has no publications, and is male, since male is coded as zero. So, a male professor with no publications,
just out of graduate school, we're predicting that professor would get about
46,911. Pretty low.
But, again, it's sort of a meaningless point, because it's for someone with no publications who's just getting out of grad school.
Then, we can look at the slopes for each individual predictor.
So, for time since the PhD, it's $1,382. What that means is, for every one unit increase in time, so for every one more year out of grad school, we predict another $1,382 in faculty salary.
How about publications?
Well, for a one unit increase in publications, we predict about 500 more dollars, $502 to be exact. And then, for gender, that's the G out
there. The coefficient is negative 3,484.
Why is it negative? Because in this sample, the male faculty
members are making more than the female faculty members, while taking into account
these other variables. So, for a one unit increase in gender,
that's going from zero to one, from male to female, we predict a drop in salary of
$3,484.
So, let's look at the output that you would see from any statistical software package; R will give you output that looks like this. I've organized it a little bit, so R won't
look exactly like this. I've put the unstandardized and the
standardized coefficients together in this table so we could get a feel for what they
mean. What you see in the unstandardized column,
are all those numbers I just walked through in the previous slide.
Those are the coefficients that go into the regression equation.
That's how we get predicted scores on Y from a set of values on the predictors, X1, X2, and X3: time, publications, and gender.
So, we can just plug values in and get a predicted score.
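Plugging values in is just arithmetic. A small Python sketch using the coefficients from this example (in R, you'd call `predict()` on the fitted model; the "average professor" inputs of eight years and fifteen publications come from the descriptive statistics above):

```python
def predict_salary(years_since_phd, publications, gender):
    # Fitted equation from the lecture example:
    # intercept 46911, +1382 per year since PhD, +502 per publication,
    # -3484 for gender coded 0 = male, 1 = female
    return 46911 + 1382 * years_since_phd + 502 * publications - 3484 * gender

new_grad_male = predict_salary(0, 0, 0)   # the intercept: 46911
average_prof = predict_salary(8, 15, 0)   # lands near the sample mean salary
```

Notice that plugging in the sample averages (eight years, fifteen publications, male) predicts a salary close to the $64,000 average from the descriptive statistics.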
What we'll also see is the standard error associated with each regression coefficient, and that will give us a t-test and an associated p-value; that's the null hypothesis significance testing part.
What that tells us is, for each one of these predictors, whether it significantly predicts salary or not.
And what we see is that time is a significant predictor, publications is a significant predictor, but that gender difference of 3,000 is not a significant difference in this analysis.
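Mechanically, each t statistic is just the coefficient divided by its standard error. A sketch with a hypothetical standard error (the real standard errors are on the output slide, and R uses the exact t distribution with the model's degrees of freedom rather than the normal approximation used here):

```python
import math

def t_and_p(coefficient, standard_error):
    # t statistic: coefficient in units of its own standard error
    t = coefficient / standard_error
    # two-sided p-value via a normal approximation (reasonable for large samples)
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, p

# Time coefficient from the example (1382), with a MADE-UP standard error of 400
t, p = t_and_p(1382, 400)
significant = p < 0.05
```

The same division is applied to every row of the coefficient table, which is how the output flags time and publications as significant but not gender.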
Now, that doesn't mean that there is not a difference between men and women in their
salary. This is where it's important to think
about, how to interpret these coefficients.
What that difference means is, women in this sample are making $3,484 less than men, while we're taking into account time and publications.
What do I mean by taking into account? What I'm saying is, the difference between
men and women is 3,000 and change, assuming that the professors have all been
out an average amount of time, and assuming they're all publishing an average
amount. That's a big assumption, and we could test
whether or not that assumption is valid. So, for example, it might be that publications matter more at the beginning of your career than later in your career.
So maybe the slope relating publications to salary is steeper at the beginning than later. That would imply that time and
publications interact to predict salary. We haven't tested that here, alright?
All we're doing is testing the additive effects of each predictor.
When we get past the midterm and we go into mediation and moderation, we'll test other effects beyond these additive effects.
But for now, it's important to remember what we're looking at here are the effects
assuming the average level of every other predictor.
So, the slope for time is $1,382, but that's assuming an average number of
publications, and it's assuming that the effects are additive.
And we can test that later.
So now, on to the model R and model R squared. For this model, the correlation between the observed scores and the predicted scores is 0.513.
And if we square that, we get about 26 percent of the variance in faculty salary explained by just these three variables. Now, that doesn't mean that these
individual predictors are correlated that strongly, alright?
That's the beauty of multiple regression and building a model with multiple predictors: we can account for more variance with this set of predictors, with this particular linear combination of predictors, than we would if we used any one predictor by itself.
We can look back at the standardized
coefficients to get a sense for how much each one of these predictors by itself would explain in faculty salary.
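The relationship between the two columns is just a rescaling into standard-deviation units. A sketch with made-up standard deviations (the real ones come from the sample; in R, one common route is to refit `lm()` on `scale()`d variables):

```python
def standardize(b, sd_x, sd_y):
    # Convert an unstandardized slope b into a standardized coefficient:
    # SDs of change in Y per SD of change in X
    return b * sd_x / sd_y

# Coefficients from the example, with HYPOTHETICAL standard deviations
# (sd of time = 4 years, sd of publications = 8, sd of salary = 10000)
beta_time = standardize(1382, 4.0, 10000.0)
beta_pubs = standardize(502, 8.0, 10000.0)
```

Because standardized coefficients are all on the same scale, they can be compared in absolute value, which is how we can say time accounts for more than publications here.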
So, it looks like time accounts for the most variance; its standardized coefficient has the highest absolute value. But again, now that we're in multiple
regression, this is not the same as the correlation coefficient.
So remember in simple regression, the standardized regression coefficient was
the same as the correlation coefficient. Now it's not, because this is the
effect of time on salary assuming an average number of publications and
taking into account males and females. So this will not be exactly the same as
the correlation coefficient. But all of those do give us estimates of effect size, in a sense. So we can say that the effect of time on
salary is stronger than the effect of publications on salary, and we can say
something about the amount of variance explained in salary by this particular
model.
So, the last thing I want to talk about in this segment is just the difference between two types of approaches to multiple regression.
One I'm referring to as standard; the other I refer to here as sequential. You may see this referred to in other
places as hierarchical regression and I'm avoiding that term to avoid any confusion
with things like hierarchical linear models, which is a whole different type of
analysis, so I'm just going to say sequential.
And there are other types of approaches you could take as well.
Another is stepwise; we're not doing that.
I'm just going to talk about standard and sequential here.
So, it's important to know that if the predictors themselves are not correlated,
then it won't matter how you run the regression.
Whether you run it standard, or sequential, or stepwise, or whatever.
If your predictors aren't correlated, if they're orthogonal to one another, then,
these different approaches won't matter. So, if we take a Venn diagram approach to thinking about the variance in all of these measures, or all of our variables.
Again, I'm using Y as the outcome variable.
So, assume that X1, X2, and X3 are three predictors, and they all account for a little bit of variance in Y, but they're orthogonal to one another, they don't overlap among themselves. Then this is real easy and the math is real easy; it won't matter which approach we take.
But in these types of studies, where we're just doing observational studies, not randomized controlled experiments, it's often the case that the predictors are correlated, like in the faculty salary example. So, time since the PhD is probably
correlated with number of publications, right?
The more time you've been out of grad school, the more time you've had to
publish, so there's probably a positive
So, how do we untangle that positive correlation in trying to predict the
outcome measure? That's where it gets more complicated.
One, in terms of interpretation, and two, mathematically.
So, here's an example where things are correlated.
Meaning, the predictors are correlated: X1 and X3 have a little bit of overlap, and X2 and X3 have a little bit of overlap.
In the standard approach, we're just going to throw all the predictors into one analysis, into one regression equation, and that's what I did with the faculty salary example.
Each predictor will only get to claim the variance in Y that's unique to it.
So, I'll show you that in the Venn diagram in a moment.
But, a way to think about that is: only the variance that's uniquely predicted by each individual predictor is reflected in the regression coefficients.
The overlapping areas, where there's shared variance among the predictors and the outcome, get absorbed into the model R squared and the model R, but they're not assigned to each individual predictor. It's best to sort of look at that in a
Venn diagram, at least for me. Some students get confused by this.
So, if there's confusion on that, hopefully it'll come through as we do these analyses in R in the next lecture. But for a lot of students this is helpful.
So, if we think about what gets assigned to predictor X1 in the regression equation, what gets assigned to X1 is just the area unique to X1.
So, just A. That will be represented in the regression
coefficient, for X1. And, likewise, for X2, just area E, and
for X3, just area C, because those are the areas that are unique to each
predictor.
So that's what I'm showing you here: those are the areas that'll be represented, or reflected, in each individual predictor's regression coefficient.
The model R squared will take into account the entire proportion of variance that's shared with the entire set of predictors.
So the model R squared will take up A + B + C + D + E.
In a sequential regression, we as the researchers, or as the experimenters, will decide how to enter the predictors into the regression equation.
So, we may want to enter some variables first and other variables second.
A common example of this is you'll see demographic variables entered into a
regression equation first, and then sort of key experimental variables entered in
after that. So in this example, the Venn Diagrams,
assume that I put variable X1 into the equation first.
Then, X1 gets to sort of soak up all that area in Y.
So, X1 now gets A and B because it got to go in first; it has sort of privileged access. Then, in step two, if we enter X2 and X3
together, then they're all in the equation.
So, it looks just like the standard. So, X2 just gets area E, X3 just gets area
C. Again, the model R squared, will be the
entire area that's overlapping between Y and the set of predictors that are in the
model at that step. So at step one, it's just A + B.
At step two, it's all of those: A + B + C + D + E.
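One way to read a sequential analysis numerically is through the change in model R squared from step to step. A tiny sketch (the step-one value is hypothetical; the full-model R squared comes from the faculty example, where the model R was 0.513):

```python
# Step 1: only X1 in the model -- it soaks up areas A + B.
# This value is HYPOTHETICAL, just to illustrate the bookkeeping.
r2_step1 = 0.18

# Step 2: X2 and X3 enter, so the model covers A + B + C + D + E.
# The full-model R squared is the example's model R, squared.
r2_step2 = 0.513 ** 2

# The change in R squared is the variance uniquely added at step 2
delta_r2 = r2_step2 - r2_step1
```

The delta is what the step-two predictors contribute beyond what X1 already claimed with its privileged first entry.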
Okay. That wraps up this segment.
And again, the important things to take away at this point are just the idea of
multiple regression, knowing the components of the equation.
Knowing how to interpret those coefficients. And we'll go through examples of this again, not in the next segment but in the last segment, and in the next lecture, and actually in the lecture after that.
So, we'll do a lot of these examples, this is just the first one.
So, if this is still a little rough, don't worry, we're going to do lots of examples.
And then, there's this idea of doing things sort of sequentially, or just putting all of the predictors into an equation all at once.