Hi. Welcome back to Statistics One.

We're up to lecture eight, almost halfway through the course.

And today, I'd like to introduce you to multiple regression.

In the last lecture, we did simple regression, which is just one predictor in

the regression equation. Today, we'll see how multiple regression

works, where we have multiple predictors.

It gets a little more complicated in terms of interpreting the regression

coefficients, and mathematically it gets a little more complicated.

So, I'll start out in the first segment just by introducing you to the multiple

regression equation, and I'll present an example so you can get a flavor for how to

interpret multiple regression coefficients in one example.

Then in the second segment, I'll review some matrix algebra, because I'm assuming

not everybody is familiar with matrix algebra, particularly matrix

multiplication. And, it's necessary to understand how

matrix algebra works to understand how these regression coefficients are

estimated sort of simultaneously, in one equation.

So then in the third segment, we'll take a closer look again at the regression

equation and talk about how those regression coefficients are estimated.

So let's start the first segment. In segment one, I'm just going to present

one example, relatively easy example, so you can understand how to interpret the

regression coefficients when there are multiple predictors in the equation.

The important concepts to take away from this segment are, again, understanding the

equation and the components of the equation, and how to interpret both the

unstandardized and the standardized regression coefficients.

And then, I'll talk about the difference between what I'll call a standard multiple

regression, where we enter all predictors in together, and what I call a sequential

multiple regression, where we enter predictors in a particular order.

So, in the last lecture we talked about simple regression.

We said one predictor in the equation. Today, we're just going to enter in more

predictors. So, let's move to the equation.

Before, when we just did simple regression, that was our equation.

We just had one predictor. So, the predicted score on y was the

regression constant, or the intercept, and the slope times an individual's score on

x. So, that's just the formula for a

line, intercept plus slope times x, and that's simple regression.
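Written out, with notation that is an assumption here (the slide itself is not reproduced in the transcript), the simple regression equation looks something like this, where B_0 is the regression constant and B_1 is the slope:

```latex
\hat{Y} = B_0 + B_1 X
```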

What we're doing now is we're going to add in as many predictors as we like.

And, as you can see, I could extend this out to K predictors.

So, I'll just summarize the multiple regression equation as follows.
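A sketch of that equation, using assumed symbols (B_0 for the intercept, B_1 through B_k for the regression coefficients, X_1 through X_k for the predictors), since the slide is not part of the transcript:

```latex
\hat{Y} = B_0 + B_1 X_1 + B_2 X_2 + \cdots + B_k X_k
```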

We're still going to have an intercept, that's still the predicted score on Y when

the X's are zero. And then, we can have multiple predictors

and multiple regression coefficients. So again, the trick is, trying to

understand how to interpret multiple regression coefficients together and,

mathematically, how does that happen. And that's where it requires a

little matrix algebra to fully appreciate. So, this just breaks down everything

that's in the equation. This should be obvious by now, the

predicted value on Y, the predicted value on Y when all the predictors are zero and

so on. The other thing I alluded to last lecture,

and we didn't really get into with simple regression, is this idea of model R and

model R squared. I talked about this when I mentioned

alternatives to null hypothesis significance testing and estimates of

effect size, right? If we want to compare models, for example,

we want to look at model A's R squared versus model B's R squared.

Now, this didn't really come up much in simple regression because the model was

just one predictor. It was just that equation with one

regression coefficient in it. So, the model R was just the

correlation between X and Y, and that correlation squared was your

model R squared. Now, we can get a better correlation

because we're adding in multiple predictors.

And the way to get the model R, is to just look at the correlation between the

observed scores and the predicted scores, so that's the correlation between the

observed scores, Y, and the predicted scores, Y with a little hat over it.

And then you just square that, and that gives you the percentage of variance in Y

explained by the model and we'll see that in that example.
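As a rough illustration of that definition, here is a minimal Python sketch. The data are made up for the illustration (they are not the lecture's faculty data), and `ols` and `pearson` are hypothetical helper names written for this sketch; the lecture itself runs these analyses in R.

```python
from math import sqrt

def ols(X, y):
    """Return [b0, b1, ..., bk] minimizing squared error, via the normal equations."""
    rows = [[1.0] + [float(v) for v in r] for r in X]   # leading 1 = intercept column
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        d = A[c][c]
        A[c] = [v / d for v in A[c]]
        for r in range(p):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][p] for i in range(p)]

def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / sqrt(saa * sbb)

# Hypothetical toy data: each row is (time, publications, gender)
X = [(1, 2, 0), (3, 4, 1), (5, 9, 0), (7, 10, 1),
     (9, 18, 0), (11, 15, 1), (4, 12, 1), (8, 6, 0)]
y = [48, 50, 60, 58, 75, 66, 57, 61]   # outcome, e.g. salary in thousands

coefs = ols(X, y)
y_hat = [coefs[0] + sum(bj * xj for bj, xj in zip(coefs[1:], row)) for row in X]

model_r = pearson(y, y_hat)   # correlation between observed and predicted scores
model_r2 = model_r ** 2       # proportion of variance in Y explained by the model
print(round(model_r, 3), round(model_r2, 3))
```

For a least-squares model with an intercept, this squared correlation matches the usual 1 minus (residual sum of squares over total sum of squares).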

So, I'm just going to walk through this example sort of conceptually, not in

much mathematical detail; we'll get into that in the next segment.

I want to introduce this example just so you get a flavor for how to interpret the

regression coefficients and what the model R and R squared look like.

So, this is sort of an old example now, so don't worry about the

faculty salaries if you're in academia, particularly if you're on the job

market. I'm looking at someone who's on the job market.

These are old faculty salaries. And we're going to predict faculty salary

not from one predictor but from multiple predictors.

So, one, how long has it been since the faculty member received his or her PhD?

So you would think that the more years out, the higher their salary, so there's

probably a positive correlation between those two.

The number of publications a faculty member has, that's often a predictor of

how much money the faculty member makes because if a professor is very prolific

and publishes a lot, then they tend to be more marketable, more sought-after, and

they probably make more money. And we'll also look and see if there's a

gender effect. So, we could look at male and female

faculty members and see if there's a difference in their salary while taking

into account, and this is the important part, while taking into account the time

since their PhD and their number of publications.

If we just wanted to look at, is there a difference between men and women, we could

just do a T test. Right?

But here, we're taking into account these other variables so it gets a little more

complicated in interpreting the gender difference.

So, here's some descriptive statistics for this example.

As I said, this is an old example. So, there are 150 professors and the average

salary is 64,000. This is an old example.

But they are relatively young professors, so the average time since PhD is only eight years.

It's funny, this example's so old that I used to think that was not a young cohort.

[laugh] Now I think that is a young cohort, eight years since the PhD.

And the average number of publications for this group of faculty members, again,

suggesting it's sort of a younger group, is fifteen publications on average.

Now, the question is, how do these all predict faculty salary?

Before I get to that, we have to code our categorical independent variable as

numerical. So, obviously, I can't

just tell R to run an analysis with males and females coded as string variables

like that. We have to convert them into

numbers. So, for this example, I just coded the

male faculty members with zero, and the female faculty members as one.
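A minimal sketch of that coding step, written in Python rather than R. The labels below are hypothetical, since the lecture's data file is not shown; only the 0/1 coding itself comes from the lecture.

```python
# Hypothetical raw labels for four faculty members
gender_labels = ["male", "female", "female", "male"]

# Coding used in the lecture: male = 0, female = 1
codes = {"male": 0, "female": 1}
gender = [codes[label] for label in gender_labels]

print(gender)  # [0, 1, 1, 0]
```

With this coding, the regression coefficient for gender reads as the predicted difference for the group coded one relative to the group coded zero.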

And, if you run this in R, this is the regression equation that you get out, and

I'll walk you through all of this and show you where this comes from in R.

But basically, the predicted score equals 46,911, that's the regression constant.

So, that's the predicted score on Y when all the X's are zero.

Well, that would be someone who just graduated, who has no publications, and is

male, because I coded male as zero. So, a male professor with no publications,

just out of graduate school, we're predicting that professor would get about

46,911. Pretty low.

But, again, it's sort of a meaningless point because it's for someone

with no publications who's just getting out of grad school.

Then, we can look at the slopes for each individual predictor.

So, for time since the PhD, it's $1,382. What that means is, for every one unit

increase in time, so for every one more year out of grad school, I predict another

$1,382 in the faculty salary. How about publications?

Well, for a one unit increase in publications, we predict about 500 more

dollars, 502 to be exact. And then, for gender, that's the G out

there. The coefficient is negative 3,484.

Why is it negative? Because in this sample, the male faculty

members are making more than the female faculty members, while taking into account

these other variables. So, for a one unit increase in gender,

that's going from zero to one, from male to female, we predict a drop in salary of

$3,484. So, let's look at the output that you

would see from any statistical software package. R will give you output that

looks like this. I've organized it a little bit so R won't

look exactly like this. I've put the unstandardized and the

standardized coefficients together in this table so we could get a feel for what they

mean. What you see in the unstandardized column,

are all those numbers I just walked through in the previous slide.

Those are the coefficients that go into the regression equation.

That's how we get predicted scores on Y from a set of values on the

predictors X1, X2, and X3: time, publications, and gender.

So, we can just plug values in and get a predicted score.
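Plugging values in can be sketched as a small Python function. The coefficients are the rounded values quoted in the lecture; the function name `predicted_salary` and the example inputs are assumptions made for this illustration.

```python
def predicted_salary(years_since_phd, publications, gender):
    """Predicted salary in dollars; gender is coded 0 = male, 1 = female."""
    return 46911 + 1382 * years_since_phd + 502 * publications - 3484 * gender

# Intercept case: just graduated, no publications, male
print(predicted_salary(0, 0, 0))    # 46911

# Eight years out with fifteen publications, male versus female
print(predicted_salary(8, 15, 0))   # 65497
print(predicted_salary(8, 15, 1))   # 62013
```

The last two predictions differ by exactly the gender coefficient, which is what it means for the effects in this equation to be additive.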

What we'll also see is the standard error associated with each

regression coefficient, and that will give us a t test

and an associated p value. That's the null hypothesis significance testing part.
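In symbols, as a sketch with assumed notation: the t value for predictor j is its unstandardized coefficient divided by its standard error, evaluated with N - k - 1 degrees of freedom, which would be 150 - 3 - 1 = 146 for this example.

```latex
t_j = \frac{B_j}{SE(B_j)}, \qquad df = N - k - 1
```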

What that tells us is, for each one of these predictors, is it

significantly predicting salary or not? And, what we see is that time is a

significant predictor, publications is a significant predictor, but that gender

difference of about 3,000 is not a significant difference in this analysis.

Now, that doesn't mean that there is not a difference between men and women in their

salary. This is where it's important to think

about, how to interpret these coefficients.

What that difference means is, women in this sample are making $3,484 less than

men, while we're taking into account time and publications.

What do I mean by taking into account? What I'm saying is, the difference between

men and women is 3,000 and change, assuming that the professors have all been

out an average amount of time, and assuming they're all publishing an average

amount. That's a big assumption, and we could test

whether or not that assumption is valid. So, for example, it might be that

publications, maybe they matter more at the beginning of your career than later in

your career. So maybe the

slope relating publications to salary is steeper at the beginning than

later. That would imply that time and

publications interact to predict salary. We haven't tested that here, alright?

All we're doing is testing the additive effects of each predictor.

When we get past the midterm and we go into mediation and moderation, we'll test

other effects beyond these additive effects.

But for now, it's important to remember what we're looking at here are the effects

assuming the average level of every other predictor.

So, the slope for time is $1,382, but that's assuming an average number of

publications, and it's assuming that the effects are additive.

And we can test that later. So now, on to the model R and model R

squared. For this model, the correlation between the observed scores and the

predicted scores is 0.513. And if we square that, we get about 26

percent of the variance in faculty salary explained by just these three

variables. Now, that doesn't mean that these

individual predictors are correlated that strongly, alright?

That's the beauty of multiple regression and building a model with

multiple predictors: we can account for more variance with this set of predictors,

with this particular linear combination of predictors,

than we would if we used any one predictor by

itself. We can look back at the standardized

coefficients to get a sense for how much each one of

these predictors by itself would explain in faculty salary.

So, it looks like time accounts for the most variance, because its

standardized coefficient has the highest value in terms of

absolute value. But again, now that we're in multiple

regression, this is not the same as the correlation coefficient.
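For reference, the standardized coefficient is usually obtained by rescaling the unstandardized one by the standard deviations of the predictor and the outcome. The symbols here are assumptions, not taken from the slides:

```latex
\beta_j = B_j \cdot \frac{s_{X_j}}{s_Y}
\qquad\text{and, with a single predictor,}\qquad
B = r\,\frac{s_Y}{s_X} \;\Rightarrow\; \beta = r
```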

So remember in simple regression, the standardized regression coefficient was

the same as the correlation coefficient. Now it's not, because this is the

effect of time on salary assuming an average number of publications and

taking into account males and females. So this will not be exactly the same as

the correlation coefficient. But, all of those do give us estimates of

effect size, in a sense. So we can say that the effect of time on

salary is stronger than the effect of publications on salary, and we can say

something about the amount of variance explained in salary by this particular

model. So, the last thing I want to talk about

in this segment is, just the difference between two types of approaches with

multiple regression. One, I'm referring to just the standard,

another, I refer to here as sequential. You may see this referred to in other

places as hierarchical regression and I'm avoiding that term to avoid any confusion

with things like hierarchical linear models, which is a whole different type of

analysis, so I'm just going to say sequential.

And there are other types of approaches you could take as well.

Another is stepwise; we're not doing that.

I'm just going to talk about standard and sequential here.

So, it's important to know that if the predictors themselves are not correlated,

then it won't matter how you run the regression.

Whether you run it standard, or sequential, or stepwise, or whatever.

If your predictors aren't correlated, if they're orthogonal to one another, then,

these different approaches won't matter. So, if we take a Venn Diagram approach to

thinking about the variance in all of these measures, or all of our variables.

Again, I'm using Y as the outcome variable.

So, assume that X1, X2 and X3 are three predictors, they all account for a little

bit of variance in Y but, they're orthogonal to one another, they don't

overlap themselves, then this is real easy and the math is real easy.
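A small Python sketch of why the orthogonal case is easy. The numbers are made up, and `ols` and `simple_slope` are hypothetical helpers for this illustration: with uncorrelated predictors, the slope from each one-predictor regression matches the coefficient from the regression that includes both.

```python
def ols(X, y):
    """Return [b0, b1, ..., bk] minimizing squared error, via the normal equations."""
    rows = [[1.0] + [float(v) for v in r] for r in X]   # leading 1 = intercept column
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        d = A[c][c]
        A[c] = [v / d for v in A[c]]
        for r in range(p):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][p] for i in range(p)]

def simple_slope(x, y):
    """Slope from a one-predictor regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Hypothetical predictors chosen to be uncorrelated (their cross-product sums to zero)
x1 = [-1, 1, -1, 1]
x2 = [-1, -1, 1, 1]
y = [3, 5, 8, 12]

b0, b1, b2 = ols(list(zip(x1, x2)), y)      # both predictors entered together
s1, s2 = simple_slope(x1, y), simple_slope(x2, y)   # each predictor on its own

print(b1, s1)
print(b2, s2)
```

Once the predictors overlap, these two sets of slopes generally stop agreeing, which is the situation the lecture turns to next.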

It won't matter which approach we take. But, in these types of studies, where

we're just doing observational studies here, we're not doing randomized

controlled experiments and we're just doing observational studies, it's often

the case that the predictors are correlated, like the faculty salary

example. So, time since the PhD is probably

correlated with number of publications, right?

The more time you've been out of grad school, the more time you've had to

publish, so there's probably a positive

So, how do we untangle that positive correlation in trying to predict the

outcome measure? That's where it gets more complicated.

One, in terms of interpretation, and two, mathematically.

So, here's an example where things are correlated.

Meaning, the predictors are correlated, so x1 and x3 have a little bit of overlap,

and x2 and x3 have a little bit of overlap.

In the standard approach, we're just going to throw in all the predictors into one

analysis, into one regression equation and that's what I did with the faculty salary

example. Each predictor will only get to sort of

claim the variance in Y that's unique to it.

So, I'll show you that in the Venn Diagram in a moment.

But, a way to think about that is, only the variance that's uniquely

predicted by each individual predictor is reflected in the regression coefficients.

The overlapping areas where there's shared variance among the predictors and the

outcome, that's absorbed into the model R squared and the model R, but it's not

assigned to each individual predictor. It's best to sort of look at that in a

Venn Diagram, at least for me. [laugh] Some students get confused by

this. So, if there's confusion on that, hopefully

you'll see this come through as we do these analyses in R in

the next lecture. But for a lot of students this is helpful.

So, if we think about what gets assigned to predictor X1 in the regression

equation, what gets assigned to X1 is just the area unique to X1.

So, just A. That will be represented in the regression

coefficient, for X1. And, likewise, for X2, just area E, and

for X3, just area C, because those are the areas that are unique to each

predictor. So that's what I'm doing by showing you

out here, those are the areas that'll be represented or reflected in each

individual predictor's regression coefficient.

The model R squared will take into account the entire proportion of variance in Y

that's shared with the entire set of predictors.

So the model R squared will take up A + B + C + D + E.

In a sequential regression, we as the researchers or as the experimenters, we

will decide how to enter the predictors into the regression equation.

So, we may want to enter some variables first and other variables second.

A common example of this is you'll see demographic variables entered into a

regression equation first, and then sort of key experimental variables entered in

after that. So in this example, the Venn Diagrams,

assume that I put variable X1 into the equation first.

Then, X1 gets to sort of soak up all that area in Y.

So, X1 now gets A and B because it got to go in first; it sort of has privileged

access. Then, in step two, if we enter X2 and X3

together, then they're all in the equation.

So, it looks just like the standard. So, X2 just gets area E, X3 just gets area

C. Again, the model R squared, will be the

entire area that's overlapping between Y and the set of predictors that are in the

model at that step. So at step one, it's just A + B.

Step two, it's all of those, A + B + C + D + E.
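The two steps can be sketched in Python as follows. The data are hypothetical, and `ols` and `r_squared` are helper names invented for this illustration, not the lecture's R code. Step one fits X1 alone; step two adds X2 and X3; the model R squared is computed at each step.

```python
def ols(X, y):
    """Return [b0, b1, ..., bk] minimizing squared error, via the normal equations."""
    rows = [[1.0] + [float(v) for v in r] for r in X]   # leading 1 = intercept column
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        d = A[c][c]
        A[c] = [v / d for v in A[c]]
        for r in range(p):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][p] for i in range(p)]

def r_squared(X, y):
    """Model R squared: 1 - SS_residual / SS_total for a least-squares fit."""
    coefs = ols(X, y)
    y_hat = [coefs[0] + sum(bj * xj for bj, xj in zip(coefs[1:], row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((obs - pred) ** 2 for obs, pred in zip(y, y_hat))
    ss_tot = sum((obs - mean_y) ** 2 for obs in y)
    return 1 - ss_res / ss_tot

# Hypothetical toy data with correlated predictors
x1 = [1, 3, 5, 7, 9, 11, 4, 8]
x2 = [2, 4, 9, 10, 18, 15, 12, 6]
x3 = [0, 1, 0, 1, 0, 1, 1, 0]
y = [48, 50, 60, 58, 75, 66, 57, 61]

r2_step1 = r_squared([(a,) for a in x1], y)        # step one: X1 only
r2_step2 = r_squared(list(zip(x1, x2, x3)), y)     # step two: X1, X2, and X3
r2_change = r2_step2 - r2_step1                    # variance added at step two

print(round(r2_step1, 3), round(r2_step2, 3), round(r2_change, 3))
```

In the Venn Diagram terms used above, the step-one value plays the role of A + B, and the step-two value plays the role of the full overlap between Y and all three predictors.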

Okay. That wraps up this segment.

And again, the important things to take away at this point are just the idea of

multiple regression, knowing the components of the equation.

Knowing how to interpret those coefficients and we'll go through examples

of this again not in the next segment but in the last segment and in the next

lecture. And actually, in the lecture after that.

So, we'll do a lot of these examples, this is just the first one.

So, if this is still a little rough, don't worry, we're going to do lots

of examples. And then, this idea of doing things sort

of sequentially or just putting all of the predictors into an equation all at once