Scott Alexander once wrote an article on Allan Crossman’s suggestion that parapsychology is the control group of science. In parapsychology we’ve got pretty good reasons to think nothing is going on, so the fact that people keep managing to generate significant results using the same methods they would for any experiment is an indictment of normal scientific processes. I want to think here about another kind of control group: a situation where we know that there really is a relationship, and apply normal social-scientific methods to see if we can recover that relationship. In the case I want to discuss, the relationship can be recovered, but the effect size recovered from self-report survey data is nothing like what we have good reason to think the real effect size is.
Consider the following variables:
“How often do you meet socially with friends, relatives or colleagues?”
“How often do you take part in social activities compared to others of the same age?”
They’re basically identical, right? There are only two major ways they can come apart: one takes age into account and the other doesn’t, and one includes the possibility of socialising with relative strangers and the other doesn’t.
Stop reading now, and ask yourself what proportion of the variance in one question you expect the other to explain. Try to give your honest assessment, without drawing on your knowledge of the title of this piece, or being contrarian because you know I must be setting up for something surprising.
They are only correlated at .36, which is to say that each explains about 13% of the variance in the other. When we add age to the regression, the R² rises only to 16%. To be fair, for many pairs of variables this wouldn’t be a terrible correlation. For two variables that are almost conceptually identical, though? It seems abysmal. Whatever the true relationship between meeting socially with friends, relatives or colleagues and taking part in social activities more or less than others(1), I suspect it’s closer to R²=.9 than to R²=.16.
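The 13% figure is just the square of the correlation; a one-line sanity check (the .36 is from the survey, the arithmetic is general):

```python
r = 0.36
variance_explained = r ** 2  # 0.1296, i.e. about 13%
```

The same move in reverse is why an R² of .9 corresponds to a correlation of about .95 between the two questions.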
To put this in perspective: to be 80% confident of detecting a correlation of magnitude .36 at a significance cut-off of .05, you would need a sample size of only about 58. This for two variables which are, on paper, essentially the same thing!
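That sample-size figure can be checked with the standard Fisher z approximation for correlation tests; a dependency-free sketch (the formula is textbook, the function name is my own):

```python
import math
from statistics import NormalDist

def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate sample size needed to detect a true correlation r
    with a two-sided test, via the Fisher z transformation."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    fisher_z = 0.5 * math.log((1 + r) / (1 - r))   # Fisher z of r
    return math.ceil(((z_alpha + z_beta) / fisher_z) ** 2 + 3)
```

For r = .36 the unrounded value is about 58.3, so depending on rounding convention you get the 58 quoted above or 59.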
Edit: And look, a lot of people keep asking me whether I’m sure this isn’t something to do with age. It absolutely isn’t. Even if we drop everyone except people who are exactly thirty years old (and yes, given the massive size of the survey, this still leaves us with an adequate sample), the correlation actually falls to .3. And yes, there is still sufficient variance in both questions that the correlation could and should be higher than this.
There are three possibilities:
- There is a lot of socialising going on with people who are relative strangers. This socialising is largely uncorrelated with socialising with friends, relatives and colleagues, and in fact it explains almost all of the variation in taking part in social activities, and hence the divergence between the two variables. I find this intrinsically unlikely.
- People are mistaken in their interpretation of these questions, or in their recollection of their own activities, or in their understanding of how much others their own age socialise. These mistakes cause the answers to the two questions to diverge.
- Tricky statistical issues like restriction of range, non-linearity, etc. The short answer here is that, having looked at the data, I suspect some of the reduction in magnitude can be explained this way, but not all or even most of it. If someone better equipped for this kind of analysis wants to give it a try, the data is here: http://www.europeansocialsurvey.org/.
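For the restriction-of-range possibility specifically, it’s easy to simulate how truncating one variable attenuates an observed correlation. A sketch with synthetic data (the true correlation of .9 and the cut at |x| &lt; 1 are arbitrary illustrative choices of mine, not estimates from the survey):

```python
import math
import random

def pearson(xs, ys):
    """Sample Pearson correlation, no external dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(0)
true_r = 0.9
xs, ys = [], []
for _ in range(50_000):
    # correlated standard normals: y = r*x + sqrt(1 - r^2) * noise
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    xs.append(z1)
    ys.append(true_r * z1 + math.sqrt(1 - true_r ** 2) * z2)

full_r = pearson(xs, ys)  # close to the true 0.9

# restriction of range: keep only the middle of the x distribution,
# roughly what a coarse survey scale does by lumping the tails together
kept = [(x, y) for x, y in zip(xs, ys) if abs(x) < 1]
restricted_r = pearson([p[0] for p in kept], [p[1] for p in kept])
```

With these numbers the observed correlation drops from about .9 to roughly .75 with no measurement error at all, which shows the mechanism is real, but also that it is hard to get all the way down to .36 from range restriction alone.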
For my part this makes me think that, in the future, I should be more careful about dismissing statistically significant correlations in self-report surveys simply because the effect size is very small, since apparently even conceptually near-identical questions can be related only weakly. I’ve long been a partisan of the view that effect sizes should matter as much as significance tests in evaluating hypotheses: everything correlates at 0.1 in the social sciences, because everything is connected in a vast matrix of causal relationships. Thus if you can’t find a decently sized R value, your hypothesis is on shaky ground even if the p-value is very low. Perhaps, though, I should reflect more on how observed effect sizes can tell us very little about the real size of a relationship.
I’ve also long been pretty blasé about self-report, tending to side with those who say that its flaws have been overblown and that it’s a lot more reliable than people give it credit for. Maybe this was also too hasty; maybe self-report data is a lot worse than I had imagined, especially when it involves assessing yourself against the average.
Perhaps one strategy worth considering is informally ‘benchmarking’ correlations: comparing a correlation we are interested in against another correlation in the same population, found using the same method (e.g. self-report), that we have theoretical reasons to believe is strong in reality, whatever our observations suggest. I know that when looking at the European Social Survey from now on, I’ll be thinking of correlations as fractions of the relationship between socialising compared to others and frequency of social activities. A correlation of .2, or even .1, now seems far more impressive in this population.
— — — — — —
(1) Note that I’ve removed age from the description of the variable because we’ve controlled for age.