April 28, 2007

Group-average correlations between traits do not imply individual correlations between traits

David B from Gene Expression has an interesting post on Correlation and Aggregation:
"If data are aggregated and averaged, in some non-random way, the correlation between the resulting average values will often be higher than for the original disaggregated data, and may well increase with the level of aggregation."
David B is talking about a problem that I have talked about before in my post on Inference of between-individual differences from between-group differences.

The problem is the following:

Suppose that you have two traits, say TALLNESS and BLUENESS. You also have N populations of individuals. When you calculate the averages of these two traits in the N populations, you discover that TALL groups tend to be BLUER. Can you infer from this fact that TALL people tend to be BLUER within groups? Can you in fact say anything about the relationship of TALLNESS and BLUENESS within groups?

The answer is no. Even a perfect positive correlation (+1.0) of TALLNESS with BLUENESS across groups may in fact mask a perfect negative correlation (-1.0) of TALLNESS and BLUENESS among individuals within groups.

The above claim can be proven by example. (b, t) represents the (BLUENESS, TALLNESS) pair of an individual:

Group A: (5,1), (4,2), (3,3), (2,4), (1,5)
Group B: (6,2), (5,3), (4,4), (3,5), (2,6)
Group C: (7,3), (6,4), (5,5), (4,6), (3,7)

Thus, within each of the three groups, there is a perfect negative correlation (-1.0) between BLUENESS and TALLNESS.

The averages of the groups are:

Group A: (3,3)
Group B: (4,4)
Group C: (5,5)

Thus, there is a perfect positive correlation (+1.0) between BLUENESS and TALLNESS across groups.
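The two claims so far can be checked with a few lines of Python. This is only a minimal sketch: the pearson helper and the variable names are my own, and Group C's middle individual is taken as (5,5), the value consistent with the group average of (5,5) stated above.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

# (BLUENESS, TALLNESS) pairs per group.
# Assumption: Group C's middle pair is (5, 5), matching its stated average.
groups = {
    "A": [(5, 1), (4, 2), (3, 3), (2, 4), (1, 5)],
    "B": [(6, 2), (5, 3), (4, 4), (3, 5), (2, 6)],
    "C": [(7, 3), (6, 4), (5, 5), (4, 6), (3, 7)],
}

within = {}  # correlation among individuals inside each group
means = {}   # (average BLUENESS, average TALLNESS) of each group
for name, pairs in groups.items():
    b, t = zip(*pairs)
    within[name] = pearson(b, t)
    means[name] = (sum(b) / len(b), sum(t) / len(t))

# Correlation across groups, computed on the three group averages only.
mean_b, mean_t = zip(*means.values())
between = pearson(mean_b, mean_t)

print("within-group correlations:", within)
print("group averages:", means)
print("between-group correlation:", between)
```

Each within-group correlation comes out as -1.0, while the correlation of the three group averages comes out as +1.0.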

Of course, we could "pool" all individuals from the three different groups and calculate a single correlation coefficient. In the above example, this correlation coefficient turns out to be -0.5, which is again opposite to what we would expect by looking at the between-group correlation.
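To see where the pooled figure comes from, here is a rough sketch that pools the fifteen individuals and splits the pooled sum of cross-products into a within-group part and a between-group part. The variable names are mine, and Group C's middle individual is again assumed to be (5,5), in line with its stated average.

```python
from math import sqrt

# (BLUENESS, TALLNESS) pairs; Group C's middle pair assumed to be (5, 5).
groups = [
    [(5, 1), (4, 2), (3, 3), (2, 4), (1, 5)],  # Group A
    [(6, 2), (5, 3), (4, 4), (3, 5), (2, 6)],  # Group B
    [(7, 3), (6, 4), (5, 5), (4, 6), (3, 7)],  # Group C
]

pooled = [pair for group in groups for pair in group]
n = len(pooled)
grand_b = sum(b for b, _ in pooled) / n
grand_t = sum(t for _, t in pooled) / n

# Pooled sums of squares and cross-products around the grand means.
ss_b = sum((b - grand_b) ** 2 for b, _ in pooled)
ss_t = sum((t - grand_t) ** 2 for _, t in pooled)
total_cross = sum((b - grand_b) * (t - grand_t) for b, t in pooled)

# Split the cross-products: deviations of individuals from their group mean,
# plus deviations of group means from the grand mean (weighted by group size).
within_cross = 0.0
between_cross = 0.0
for group in groups:
    gb = sum(b for b, _ in group) / len(group)
    gt = sum(t for _, t in group) / len(group)
    within_cross += sum((b - gb) * (t - gt) for b, t in group)
    between_cross += len(group) * (gb - grand_b) * (gt - grand_t)

pooled_r = total_cross / sqrt(ss_b * ss_t)

print("within part:", within_cross)
print("between part:", between_cross)
print("pooled correlation:", pooled_r)
```

The within-group part is -30 and the between-group part is +10, so the pooled cross-product sum is -20 against sums of squares of 40 for each trait, which gives -0.5. In this example the negative relationship among individuals outweighs the positive relationship among group averages.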

The conclusion to bear in mind is that whenever you hear that there is a correlation between two traits across groups (e.g., a party's tally in a state vs. average IQ in a state; a country's average skin color vs. a country's average IQ; a district's average cephalic index vs. a district's average income), you should always ask: yes, but what about individuals? It may turn out that the correlations point in the same direction, or that there are no significant correlations for individuals, or even that correlations for groups mask a completely different picture when it comes to individuals.
