This is an interlude to “A Philosophy for a New Old Welfare Economics”, but it doesn’t require you to have read the previous post.
1. The broad outlines of the problem
Sometimes a change makes everyone in a group better off; sometimes a change makes everyone in a group worse off. In these cases, it is not hard to say that the group is better or worse off. Sometimes, however, a change makes some people in a group better off and others worse off. Under these conditions, we might want to know whether the group is, on the whole, better off.
Intuitively, some changes make certain people worse off but make the group as a whole better off. Suppose, for example, that Janet and Jill have one car between them. Janet wants to go to her dear brother’s wedding, while Jill wants to go out partying. Allocating the car to Janet will make Janet happier and Jill less happy, and vice versa if we allocate the car to Jill- but intuitively it is likely that the pair will be happier overall if the car goes to Janet, since her brother’s wedding will be of greater importance to her.
Many popular positions in ethics require us to know how our policies will affect aggregated welfare, including utilitarianism, prioritarianism and nearly any form of consequentialism in which welfare is an input. It may be of some embarrassment to these views therefore that there is no generally accepted method for arriving at an aggregation.
I am concerned about this question because I am concerned about the optimal distribution of consumption and income. Even if that is not an interest of yours though, there are many other reasons to be interested. Suppose for example you’re interested in AI safety. I can only imagine that not having a clear sense of what it means for a group to be better off in aggregate or “overall” is going to make the design of a benevolent AI harder.
2. Two concepts of welfare and the puzzle as it applies to each
It turns out that the difficulties of the problem vary with which kind of welfare you are looking at: each of the two concepts considered below faces one of the problems more acutely than the other.
2.1 Hedonic utility
The first concept of welfare we will consider is happiness, by which we here mean a preponderance of positive over negative emotion.
Psychologists often ask questions about subjective happiness, e.g. “On a scale from 0 to 10, how happy are you overall?”. These allow for very easy averaging of happiness over a whole group- but while this procedure is certainly mathematically possible, is it legitimate?
There are really two problems here. The first is the problem of knowing that your 7 is equivalent to my 7. Call this the interpersonal comparison problem. This problem is, I think, overblown. It is very unlikely that my 7 is exactly the same as your 7, but if there were no correspondence at all, psychologists couldn’t use regression at the level of a population to predict happiness or use happiness to predict other variables. That such an enterprise is possible supports the idea that my 7 is, in expectation, similar to your 7.
The second problem is more subtle. In order to work out average happiness we’ve got to know not just how your 7 compares to my 7, your 6 to my 6 and so on- we’ve also got to know something about the interval between your 6 and your 7, your 7 and your 8 and so on.
The simplest case would be one in which the gap between each pair of adjacent responses (2 & 3, 6 & 7, 9 & 10 etc.) were equal, but this need not be true- in fact it is quite unlikely to be true. For example, for a lot of people “10” represents an almost transcendent level of happiness, one that they will be blessed with for only a few moments throughout their life, if at all, whereas “0” represents a pit of suffering and despair hellish beyond imagining. This is because respondents intuitively assume the scale should be capable of representing the whole range of human emotion. This suggests that towards each end of the scale the gaps between responses lengthen.
This can really make a difference. For example, Bond & Lang find that the model we assume for the distribution of gaps between adjacent responses can greatly alter the rankings of which countries are happiest. https://voxeu.org/article/using-happiness-scales-inform-policy-strong-words-caution
Maybe you think we can use a similar statistical argument to establish linearity, as we used to establish the comparability of 7’s: happiness has a linear relationship with many things- how would this make sense if happiness scales weren’t roughly linear in actual happiness? The answer is that these linear correlations between happiness scales and other variables could really be non-linear correlations between underlying happiness and the variable of interest- or the variables of interest could themselves be non-linear in their own respective scales. For example, if a scale of happiness has a linear correlation with a scale of mental health, this may be because the gaps between answers lengthen at the ends of both scales.
What’s the solution then? With existing data I don’t know that there is one, though I would suggest continuing to use arithmetic averaging of scales in the absence of an alternative- in conjunction with a healthy dose of common sense and sensitivity modelling.
A friend of mine called Kieran and I (it was mostly his idea) have come up with a range of experimental studies that could get us the information we need. It’s a little bit complex, but the basic idea is to have experimental participants either rate or choose between different possible lives.
One of the simplest ways to do it would take us outside hedonistic utilitarianism as strictly understood into a kind of preference utilitarianism. Viz:
1. Ask people to state how happy they are now to get them used to the idea of rating happiness from 0 to 10.
2. Offer them a series of choices of the form: “Would you prefer a 50% chance of feeling like you would if your happiness were a 6 and a 50% chance of feeling like you would if it were an 8, or a 100% chance of feeling like you would if your happiness were a 7?”. Use these responses to assign Von Neumann-Morgenstern decision utilities to each level of happiness- more on decision utilities in the next section.
We can then use the utility weights this generates when averaging happiness.
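As a minimal sketch of what that averaging would look like- with entirely invented utility weights standing in for the elicited ones- compare a raw average of 0-10 reports with an average over the corresponding decision utilities:

```python
# Hypothetical decision utilities elicited from lottery questions: u[r] is
# the Von Neumann-Morgenstern utility assigned to "feeling like an r/10".
# These numbers are invented for illustration, not real experimental data.
u = {r: v for r, v in enumerate([0.0, 0.30, 0.42, 0.50, 0.56, 0.61,
                                 0.66, 0.72, 0.80, 0.90, 1.0])}

responses = [3, 7, 7, 8, 10]  # raw 0-10 happiness reports from a group

raw_average = sum(responses) / len(responses)
utility_average = sum(u[r] for r in responses) / len(responses)

print(raw_average)      # 7.0 - treats the gaps between responses as equal
print(utility_average)  # 0.748 - weights each response by its elicited utility
```

The raw average implicitly assumes equal gaps; the utility-weighted average instead uses whatever spacing the lottery choices reveal.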
Note: there are other options if we would prefer to remain strict hedonic utilitarians, but the experimental designs are rather more complicated and involve problems like trying to establish additively separable “good” and “bad” modules for lives, combining them into lives and having participants rate the resulting lives in terms of happiness, so I won’t go into them here.
2.2 Preference utility
You have preference utility to the degree you get what you want, regardless of whether it makes you feel good or not.
Contra Lionel Robbins, this approach to utility fares quite well with respect to the problem of setting intervals. Consider your welfare right now. Now consider what your welfare would be if you won a Bank of Sweden prize in economic sciences in memory of Alfred Nobel. Now consider your welfare if you instead won a Nobel prize in medicine & physiology.
We can very easily find out which you prefer, by asking you. But how can we find out the extent to which you value one over the other? Relatively simply. For any three outcomes (one of which can be “things staying as they are now”), we ask a series of questions of the form:
“Would you prefer an x percent chance of outcome A and a 1-x percent chance of outcome B OR would you prefer a 100% chance of outcome C.”
Let’s say that using this procedure we find that you are indifferent between, on the one hand, a 25% chance of getting a Nobel prize in medicine and physiology and a 75% chance of no prize at all AND, on the other hand, a 100% chance of winning a Bank of Sweden memorial prize in memory of Alfred Nobel. This tells us roughly that, relative to the way things are now, you would like a Nobel prize in medicine & physiology four times as much as you would like a Bank of Sweden memorial prize.
This, for those following at home, is a very rough description of the Von Neumann-Morgenstern derivation of cardinal utilities using risk.
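The arithmetic behind the prize example can be written out directly. Normalise the status quo to utility 0 and the most-preferred outcome to utility 1; then indifference between the lottery and the sure thing pins down the sure thing’s utility:

```python
# If you are indifferent between a lottery (p chance of the medicine Nobel,
# 1-p chance of the status quo) and the Bank of Sweden prize for certain, then
#   u(bank_of_sweden) = p * u(medicine) + (1 - p) * u(status_quo) = p.

def vnm_utility(p_best, u_best=1.0, u_worst=0.0):
    """Utility of the sure outcome, given the indifference probability."""
    return p_best * u_best + (1 - p_best) * u_worst

u_bank_of_sweden = vnm_utility(0.25)   # the indifference point from the text
print(u_bank_of_sweden)                # 0.25
print(1.0 / u_bank_of_sweden)          # 4.0 - medicine prize valued four times as much
```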
If the preference approach to utility works well at setting intervals, it has somewhat more difficulty with the problem of fixing a common level of comparison between people. In this regard, its strengths and weaknesses are the mirror image of the hedonic approach’s. With the hedonic approach there was reason to think my 7 is like your 7, but analogous moves seem unconvincing for the preference approach: the value you place on going on a date with Felix Biderman is almost certainly not the value I would place on going on a date with Felix Biderman.
I guess if there were self-report tests along the lines that exist for hedonic welfare- “rate the degree to which your overall preferences are satisfied in life from 0 to 10”- we could use a similar appeal-to-statistics manoeuvre to the one we used in the case of hedonic welfare, and establish statistically that your 7 is likely about equivalent to my 7. Such tests don’t seem to exist though.
Consider income as an example. Using the Von Neumann-Morgenstern derivation of cardinal utilities we can derive the shape of people’s preferences over money- for example, how much more an extra dollar matters to someone at a $10,000 income versus a $50,000 income. However, we can’t derive the magnitude of my curve relative to yours- how much my 50,000th dollar matters relative to your 50,000th dollar.
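The reason the magnitude can’t be recovered is that a Von Neumann-Morgenstern utility function is only identified up to a positive affine transformation. A small sketch (with a hypothetical log utility of income- the specific function is just for illustration) shows that rescaling a utility function leaves every lottery choice unchanged, so choices alone can never reveal how “big” my utility is next to yours:

```python
import math

def u(income):                 # a hypothetical log utility of income
    return math.log(income)

def u_rescaled(income):        # the same preferences, scaled and shifted
    return 1000 * math.log(income) + 7

def prefers_lottery(util, p, hi, lo, sure):
    """Does an agent with utility `util` take the lottery over the sure income?"""
    return p * util(hi) + (1 - p) * util(lo) > util(sure)

# Both functions make identical choices over any lottery, because
# a * (p*U(hi) + (1-p)*U(lo)) + b > a*U(sure) + b holds iff the
# untransformed inequality does, for any a > 0.
cases = [(0.5, 50_000, 10_000, 25_000), (0.9, 80_000, 5_000, 40_000)]
for p, hi, lo, sure in cases:
    assert prefers_lottery(u, p, hi, lo, sure) == prefers_lottery(u_rescaled, p, hi, lo, sure)
print("identical choices under every lottery tested")
```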
One option is to pick some income- say $25,000, and stipulate that we are assuming everyone’s utility is equal at that income.
The problem is that this is highly unlikely to hold- surely some people would be closer to fulfilment at $25,000 than others. We do, however, have one powerful tool in our arsenal: equal ignorance, a concept developed by Lerner and further elaborated by Sen. Very roughly, it shows that even if there are differences in whose preferences are more intense, so long as we have no information about those differences, the best strategy is to treat everyone’s preferences as roughly equally intense.
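Lerner’s equal-ignorance argument can be illustrated in miniature. Take two concave utility-of-income functions (both invented here) and suppose we don’t know which person has which- each assignment is equally likely. Expected total utility is then symmetric and concave in the split, so it peaks at the equal split:

```python
import math

f = math.log                      # one hypothetical utility of income
g = lambda x: 2 * math.sqrt(x)    # another- we don't know whose is whose

def expected_total(x, total=100.0):
    """Expected sum of utilities from giving x to one person and the
    rest to the other, averaging over the two equally likely assignments."""
    y = total - x
    return 0.5 * (f(x) + g(y)) + 0.5 * (g(x) + f(y))

best = max(range(1, 100), key=expected_total)
print(best)   # 50 - under equal ignorance, the equal split wins
```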
3. A word on risk
In the first post in this series, I argued that while there are epistemic risks inherent in engaging in interpersonal comparison, there are epistemic risks inherent in all sciences, and the choice of which risks we take can, and should, be guided by the social consequences.
There is an obvious counterargument to this line of reasoning which I foolishly omitted to discuss for the sake of a clean exposition- viz:
“If we’re going to be attendant to the epistemic risks of not founding our welfare economics on interpersonal comparisons, we must equally consider the social risks of founding our welfare economics on interpersonal comparisons and getting it wrong. This might be altogether worse than leaving interpersonal comparisons untouched.”
My defence against this counterargument is simple- regardless of whether economics attempts to tackle interpersonal comparison, the demos, politicians, bureaucrats and individuals will continue to do so. Only they won’t have access to nearly as many scientific tools or mathematical tools as welfare economists, so the chances of them getting it even approximately right are lower. Better then to make some effort at studying this, however difficult. No more searching for the keys where the light is.
4. Conclusions
The problem of interpersonal utility aggregation first rose to its dominant position as a “block” against a certain line of research in the 1930s, when behaviourism was a powerful tendency. However, when we re-examine the issue freshly, it is not clear that the problem holds such terror outside the context of behaviourism. While there is no riskless, entirely clean way to work out an aggregate or average utility, there are reasonable options.
5. Postscript- average vs aggregate
After I posted this it occurred to me that, given the title of the piece mentions both aggregate and average utility, I should really explain the difference.
Average utility is just that- the average. You don’t need to know a zero point for average utility, much as you can average temperature on a Fahrenheit scale without knowing where true absolute zero falls on that scale (the scale has a zero of course, but it’s not the real zero point).
Aggregate utility is equal to the sum of everyone’s utility. For this you do need to know the zero point.
The two can be used equivalently when the population is fixed- but when policy can alter the size of the population the difference becomes important.
To use aggregate utility it will be necessary to find a non-arbitrary “zero point” of utility- which intuitively is the level at which you would be indifferent between living and dying. It’s unclear that such a point exists objectively, however.
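A small sketch of why the difference matters once policy can change the population size (the utility numbers are invented, on a scale where 0 is the stipulated indifference point between living and dying):

```python
status_quo = [6.0, 6.0, 6.0]            # three people
expansion  = [5.0, 5.0, 5.0, 5.0, 5.0]  # five people, each a bit worse off

def average(us):
    return sum(us) / len(us)

def aggregate(us):
    return sum(us)

print(average(status_quo), average(expansion))      # 6.0 5.0  -> average falls
print(aggregate(status_quo), aggregate(expansion))  # 18.0 25.0 -> aggregate rises
```

The two measures rank the same policies identically while the population is fixed, but here they disagree: the expansion lowers average utility while raising aggregate utility.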
Links to the rest of the series
Part 1 of “A Philosophy for a New Old Welfare Economics” can be read here
Part 2 is still forthcoming