In this video we are going to calculate a MANOVA largely by hand.

We are going to do this based on the following dataset:

we have two dependent variables (dependent variable 1 and dependent variable 2)

and we have an independent variable called 'factor', with three levels: k1 , k2, and k3.

First we are going to calculate the sum of squares of the model.

The sum of squares of the model is the sum over the groups k

of the number of elements in group k

multiplied by the squared difference

between the mean of group k and the total mean.

Let's first display the mean of the groups and then the total mean,

both for the first and for the second dependent variable.

So if I want the mean for the first dependent variable for k1,

then I have to add 4 + 2 + 1 + 4 and divide the resulting number by 4, which results in 2.75.

For k2, I have to add 5 + 6 + 5 + 4 and divide the resulting number by 4, which results in 5.

For k3, I have to add 6 + 8 + 8 + 6 and divide the resulting number by 4, which results in 7.

For the second dependent variable, I have to add 1 + 4 + 3 + 1 and divide the resulting number by 4, which results in 2.25 for k1.

For k2, I have to add 4 + 5 + 4 + 6 and divide the resulting number by 4, which results in 4.75.

For k3, I have to add 8 + 7 + 8 + 6 and divide the resulting number by 4, which results in 7.25.

We also need the total means for both dependent variables.

For the first dependent variable this means we have to add all the numbers and divide the result by 12,

which results in 4.917.

For the second dependent variable that is all numbers of the second dependent variable added and divided by 12,

which results in 4.75.
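These means are easy to verify with a few lines of code. The video does its computer work in R, but as a quick check, here is a minimal Python sketch (the names `dv1`, `dv2`, `m1`, `m2`, `g1`, and `g2` are my own):

```python
# Dataset from the video: three groups (k1, k2, k3) of four people each,
# measured on two dependent variables.
dv1 = {"k1": [4, 2, 1, 4], "k2": [5, 6, 5, 4], "k3": [6, 8, 8, 6]}
dv2 = {"k1": [1, 4, 3, 1], "k2": [4, 5, 4, 6], "k3": [8, 7, 8, 6]}

def mean(xs):
    return sum(xs) / len(xs)

# Group means per dependent variable
m1 = {k: mean(v) for k, v in dv1.items()}  # k1: 2.75, k2: 5.0, k3: 7.0
m2 = {k: mean(v) for k, v in dv2.items()}  # k1: 2.25, k2: 4.75, k3: 7.25

# Grand means over all 12 observations
g1 = mean([x for v in dv1.values() for x in v])  # 59/12, about 4.917
g2 = mean([x for v in dv2.values() for x in v])  # 4.75
```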

Now that we have these values, it is fairly easy to calculate the sum of squares of the model,

because what it says here is that we have to do the following calculation: the mean of group k minus the total mean.

If we want to do that for the first dependent variable, then it means that we have to do the following calculation:

the mean of group k1, 2.75,

minus 4.917,

then square the resulting number

and then multiply it by 4,

because group k1 contains 4 elements.

Therefore, we have to multiply it by 4.

Then we have to add the next group: 5 minus 4.917, squared, and that also multiplied by 4,

because k2 also has 4 elements.

The last group, k3, has to be added as well: 7 minus 4.917, squared, and that also multiplied by 4.

These are the calculations for the first dependent variable.

For the second dependent variable you have to do exactly the same thing.

Therefore, for the second dependent variable we have to calculate

2.25 minus 4.75 squared and multiplied by 4

plus 4.75 minus 4.75 squared and multiplied by 4 (that equals 0)

plus 7.25 minus 4.75 squared and multiplied by 4.

If you add all numbers, you have the sum of squares of the model,

for the first dependent variable the sum of squares of the model equals 36.17,

and for the second dependent variable the sum of squares of the model equals 50.
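As a check on the arithmetic, the same sum-of-squares calculation in a short Python sketch (the variable names are my own; `n` is the 4 elements per group):

```python
# Group means and grand means from the video
m1 = {"k1": 2.75, "k2": 5.0, "k3": 7.0}    # dependent variable 1
m2 = {"k1": 2.25, "k2": 4.75, "k3": 7.25}  # dependent variable 2
g1, g2 = 59 / 12, 4.75                     # grand means (59/12 is about 4.917)
n = 4                                      # elements per group

# Sum over the groups of n * (group mean - grand mean)^2
ss_model_1 = sum(n * (m - g1) ** 2 for m in m1.values())  # about 36.17
ss_model_2 = sum(n * (m - g2) ** 2 for m in m2.values())  # 50.0
```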

The second thing we need is the sum of squares of the error.

The formula looks like this:

the sum over the groups of the variance of group k

multiplied by (the number of elements in group k minus 1).

In other words, what we really want is the sum of the squared errors within all the groups together:

so basically the observed score, 4, minus what we predict in the model, so 2.75.

So 4 minus 2.75 squared

plus 2 minus 2.75 squared

plus 1 minus 2.75 squared

plus 4 minus 2.75 squared.

Then you go to the next component:

5 minus 5 squared.

Continue in the same way until you reach the last group:

6 minus 7 squared.

If you add everything together, you have the sum of squares of the error for the first dependent variable.

The same goes for the second dependent variable.

1 minus what we predict for the second dependent variable, so 2.25.

We have to square the result,

plus 4 minus 2.25 squared,

and so forth until the last value: 6 minus 7.25 squared.

Now you have the sum of squares of the error for the second dependent variable.
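The two error sums of squares can be verified the same way; a small Python sketch, again with my own variable names:

```python
dv1 = {"k1": [4, 2, 1, 4], "k2": [5, 6, 5, 4], "k3": [6, 8, 8, 6]}
dv2 = {"k1": [1, 4, 3, 1], "k2": [4, 5, 4, 6], "k3": [8, 7, 8, 6]}

def ss_error(data):
    """Sum of squared deviations of each score from its own group mean."""
    total = 0.0
    for scores in data.values():
        m = sum(scores) / len(scores)  # the group mean is the model's prediction
        total += sum((x - m) ** 2 for x in scores)
    return total

ss_error_1 = ss_error(dv1)  # 12.75
ss_error_2 = ss_error(dv2)  # 12.25
```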

Now that we have these two elements, we actually have enough information to take the next step;

that's basically looking at the covariance,

i.e. how these two variables are related, both for the model and for the error.

What we need to calculate this is the cross-product,

and first we are going to calculate the cross-product for the model.

The formula for the cross-product for the model contains the sum over the groups, so k1 to k3.

What we need is the number of persons within the group

multiplied by the group mean of the first dependent variable

minus the total mean of the first dependent variable

multiplied by the group mean of the second dependent variable

minus the total mean of the second dependent variable.

That looks very complicated, but in fact it is not.

We just need the group mean of the first dependent variable,

that is 2.75 minus 4.917

multiplied by 2.25 minus 4.75.

For this k we have to do this 4 times.

As you can see, these numbers are exactly the same 4 times.

Then we go to the next k. So 5 minus 4.917

multiplied by 4.75 minus 4.75.

We have to do that 4 times for this k, too.

And for the last k: 7 minus 4.917

multiplied by 7.25 minus 4.75.

Here, too, we have to do this four times.

If we add these numbers for all 12 people, which we have just done,

then we have the cross-product for the model.
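In code, the model cross-product is one line once the means are in place; a Python sketch (names mine):

```python
m1 = {"k1": 2.75, "k2": 5.0, "k3": 7.0}    # group means, dependent variable 1
m2 = {"k1": 2.25, "k2": 4.75, "k3": 7.25}  # group means, dependent variable 2
g1, g2 = 59 / 12, 4.75                     # grand means
n = 4                                      # persons per group

# Sum over groups of n * (group mean 1 - grand mean 1) * (group mean 2 - grand mean 2)
cp_model = sum(n * (m1[k] - g1) * (m2[k] - g2) for k in m1)  # 42.5
```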

The next element we need is the cross-product for the error.

The cross-product for the error is the sum of each person's score on the first dependent variable

minus the mean of that person's group on the first dependent variable,

multiplied by the score on the second dependent variable

minus the mean of that person's group on the second dependent variable.

The score of person 1 on the first dependent variable was 4.

From this score we subtract the model, 2.75.

Then we multiply this by the score of person 1 on the second dependent variable

minus the 2.25 that we had predicted.

So we multiply the error of the first dependent variable with the error of the other dependent variable for this first person.

We do the same for the second person up to and including the last person.

For the last person the following applies: 6 minus 7 multiplied by 6 minus 7.25.

Well, if we add all these numbers, we have the cross-product for the error.
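The error cross-product goes person by person; here is a Python sketch of that loop (names mine; for these data the total works out to -5.75):

```python
dv1 = {"k1": [4, 2, 1, 4], "k2": [5, 6, 5, 4], "k3": [6, 8, 8, 6]}
dv2 = {"k1": [1, 4, 3, 1], "k2": [4, 5, 4, 6], "k3": [8, 7, 8, 6]}

cp_error = 0.0
for k in dv1:
    m1 = sum(dv1[k]) / len(dv1[k])  # group mean, dependent variable 1
    m2 = sum(dv2[k]) / len(dv2[k])  # group mean, dependent variable 2
    for x, y in zip(dv1[k], dv2[k]):
        # error on the first DV times error on the second DV, per person
        cp_error += (x - m1) * (y - m2)
# cp_error is -5.75
```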

With these elements - the sum of squares of the model and the error for both dependent variables

and the cross-products for the model and for the error -

we can make the cross-product matrices.

In the cross-product matrices for the error and for the model we enter the values that we have just calculated.

Let's first make the cross-product matrix for the model.

That one is called H in the book by Field.

In this matrix we enter the sums of squares of the model for the first and for the second dependent variable on the diagonal,

and we also enter the cross-product for the model off the diagonal.

We just calculated that: it was 42.5.

42.5 is therefore displayed twice in the matrix.

In the error matrix we display the sum of squares of the error for the first and for the second dependent variable

and we display the cross-product twice.
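Collected into the two matrices, this looks as follows in a plain-Python sketch. The model values (36.17, 50, and 42.5) are stated in the video; the error entries (12.75, 12.25, and -5.75) are the results of the error calculations worked through above:

```python
# H: model (hypothesis) sum-of-squares and cross-products matrix
H = [[36.1667, 42.5],
     [42.5,    50.0]]

# E: error sum-of-squares and cross-products matrix
E = [[12.75, -5.75],
     [-5.75, 12.25]]
```

Both matrices are symmetric: the cross-product sits in both off-diagonal cells.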

Well, what we now want to know is the ratio between model and error, just like with an F-value;

in this case the ratio between matrix H and matrix E.

For an F-value you would divide the model by the error,

but now that we work with matrices this is no longer possible, because matrices cannot be divided.

Too bad! What you can do is multiply by the inverse.

Well, the inverse is tricky to calculate manually, but the function 'solve' in R can do this for you.

That results in the inverse of the error.

First we will display the inverse of the error here.

You can try this yourself in R by creating this matrix and then applying the 'solve' function to it.

This results in the following values: 0.1036, 0.04670, 0.04670 (of course the same value twice), and 0.09949.
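For a 2x2 matrix the inverse also has a simple closed form, so you can check what R's solve does by hand; a Python sketch:

```python
E = [[12.75, -5.75],
     [-5.75, 12.25]]

# 2x2 inverse: swap the diagonal entries, negate the off-diagonal entries,
# and divide everything by the determinant.
det = E[0][0] * E[1][1] - E[0][1] * E[1][0]  # 123.125
E_inv = [[ E[1][1] / det, -E[0][1] / det],
         [-E[1][0] / det,  E[0][0] / det]]
# roughly [[0.09949, 0.04670],
#          [0.04670, 0.10355]]
```

These are the same four values reported above: 0.04670 twice off the diagonal, and 0.09949 and (rounded) 0.1036 on the diagonal.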

We are not done yet; the last step we have to take is to multiply the inverse that we have just calculated with the model,

and then we have to calculate its eigenvalues.

These are some tricky steps that are explained in the book by Field.

The resulting eigenvalues are 12.744188 (a lot of decimals) and 0.001328.

These eigenvalues are the values we use to calculate the test statistics,

preferably using the Hotelling-Lawley trace (the sum of the eigenvalues) or Roy's largest root (the largest eigenvalue),

and these values are then used to test whether the MANOVA results in significant differences.
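These last steps, the product of the inverse of E with H and its eigenvalues, can also be sketched in plain Python for the 2x2 case, using the trace-and-determinant formula for the eigenvalues (names mine):

```python
H = [[36.1667, 42.5], [42.5, 50.0]]   # model matrix
E = [[12.75, -5.75], [-5.75, 12.25]]  # error matrix

# Inverse of E (2x2 closed form)
det = E[0][0] * E[1][1] - E[0][1] * E[1][0]
E_inv = [[ E[1][1] / det, -E[0][1] / det],
         [-E[1][0] / det,  E[0][0] / det]]

# M = E^-1 * H, the matrix analogue of "model divided by error"
M = [[sum(E_inv[i][k] * H[k][j] for k in range(2)) for j in range(2)]
     for i in range(2)]

# Eigenvalues of a 2x2 matrix from its trace and determinant
tr = M[0][0] + M[1][1]
d = M[0][0] * M[1][1] - M[0][1] * M[1][0]
disc = (tr * tr - 4 * d) ** 0.5
eig1, eig2 = (tr + disc) / 2, (tr - disc) / 2  # about 12.744 and 0.0013

hotelling_lawley = eig1 + eig2  # sum of the eigenvalues
roys_largest_root = eig1        # largest eigenvalue
```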