A factor is a basis for categorizing data. For example, if you count the number of sit-ups individuals can do, one basis of categorization is age. For age, you might have the following levels:
Level 0 | 6 years old to 10 years old |
Level 1 | 11 years old to 15 years old |
Another possible factor is weight, with the following levels:
Level 0 | less than 50 kg |
Level 1 | between 50 and 75 kg |
Level 2 | more than 75 kg |
Now, suppose that you made a series of observations to see how many sit-ups people could do. If you took a random sample of n people, you might find the following results:
Person 1 | 8 years old (level 0) | 30 kg (level 0) | 10 sit-ups |
Person 2 | 12 years old (level 1) | 40 kg (level 0) | 15 sit-ups |
Person 3 | 15 years old (level 1) | 76 kg (level 2) | 20 sit-ups |
Person 4 | 14 years old (level 1) | 60 kg (level 1) | 25 sit-ups |
Person 5 | 9 years old (level 0) | 51 kg (level 1) | 17 sit-ups |
Person 6 | 10 years old (level 0) | 80 kg (level 2) | 4 sit-ups |
and so on.
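The level assignments in the table above can be reproduced with a short Python sketch. The function names `age_level` and `weight_level` are hypothetical, and treating exactly 50 kg and exactly 75 kg as level 1 is an assumption, because the text only says "between 50 and 75 kg".

```python
def age_level(age_years):
    """Return the age level for an age in whole years (hypothetical helper)."""
    if 6 <= age_years <= 10:
        return 0
    if 11 <= age_years <= 15:
        return 1
    raise ValueError(f"age {age_years} is outside the levels defined for this factor")


def weight_level(weight_kg):
    """Return the weight level for a weight in kilograms (hypothetical helper).

    Assumption: the boundary values 50 kg and 75 kg belong to level 1.
    """
    if weight_kg < 50:
        return 0
    if weight_kg <= 75:
        return 1
    return 2


# (age in years, weight in kg, sit-ups) for Person 1 through Person 6
people = [
    (8, 30, 10),
    (12, 40, 15),
    (15, 76, 20),
    (14, 60, 25),
    (9, 51, 17),
    (10, 80, 4),
]

# One (age level, weight level) pair per person, in table order
levels = [(age_level(age), weight_level(weight)) for age, weight, _ in people]
print(levels)
```

Each pair printed corresponds to the two level annotations shown for that person in the table.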
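If you take age as factor A and weight as factor B and plot the observations as a function of both factors, they fall into the cells of a matrix whose rows are the levels of factor A and whose columns are the levels of factor B. Each cell must contain at least one observation, and all cells must contain the same number of observations. The six observations listed above satisfy this requirement: each of the 2 × 3 = 6 cells contains exactly one observation.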
To perform the analysis of variance, you specify an array X of observations, with values 10, 15, 20, 25, 17, and 4. The array Index A specifies the level (or category) of factor A to which each observation applies. In this case, the array would have the values 0, 1, 1, 1, 0, and 0.
The array Index B specifies the level (or category) of factor B to which each observation applies. In this case, the array would have the values 0, 0, 2, 1, 1, and 2. Finally, there are two possible levels for factor A and three possible levels for factor B, so you pass in a value of 2 for the A levels parameter and a value of 3 for the B levels parameter.
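As a rough illustration, the inputs described above can be written out in Python and checked against the requirement that every cell holds the same nonzero number of observations. The variable names and the helper `observations_per_cell` are hypothetical. They are not part of any particular analysis-of-variance routine, and the sketch only validates the inputs rather than performing the analysis itself.

```python
from collections import Counter

# Inputs from the example: observations and their factor levels
x = [10, 15, 20, 25, 17, 4]      # sit-ups per person (array X)
index_a = [0, 1, 1, 1, 0, 0]     # age level of each observation (Index A)
index_b = [0, 0, 2, 1, 1, 2]     # weight level of each observation (Index B)
a_levels = 2                     # number of levels of factor A
b_levels = 3                     # number of levels of factor B


def observations_per_cell(index_a, index_b, a_levels, b_levels):
    """Return the common number of observations per cell.

    Raises ValueError if any cell is empty or if the cells hold
    different numbers of observations.
    """
    if len(index_a) != len(index_b):
        raise ValueError("Index A and Index B must have the same length")
    # Count how many observations land in each (A level, B level) cell
    counts = Counter(zip(index_a, index_b))
    per_cell = {counts[(a, b)] for a in range(a_levels) for b in range(b_levels)}
    if 0 in per_cell or len(per_cell) != 1:
        raise ValueError("every cell needs the same nonzero number of observations")
    return per_cell.pop()


cell_count = observations_per_cell(index_a, index_b, a_levels, b_levels)
print(cell_count)  # 1: each of the six cells holds one observation
```

A check of this kind is a cheap way to catch an unbalanced design before the analysis runs.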
You can apply any one of the following models, where L is the specified number of observations per cell: