I. Introduction
The common method for ranking n wines evaluated by m judges is the following. Each judge rates the n wines. The marks are then aggregated, and the outcome is a unique ordering (though there may be ties). In the famous 1976 Paris wine tasting, 11 judges had to taste 10 wines and give each wine a mark between 0 and 20. The marks were then simply added, and a ranking was computed on the basis of the average marks.
There are two problems with ranking on the basis of marks. First, some judges are generous and give high marks, while others are less so and give low ones. In the Paris wine tasting, the most generous judge had an average mark of 13.8, while the strictest averaged only 9.2. Second, the range of marks used by judges can vary dramatically. In the Paris tasting, one judge graded between 2 and 17, while another chose the range 8 to 14. As noted by Ashenfelter and Quandt (1999), this “may give greater weight to judges who put a great deal of scatter in their numerical scores and thus express strong preferences by numerical differences.” And indeed, the aggregate ranking based on the judges’ ranks differs from the one based on their scores. This had already been observed by Borda (1781).
To show, however, that this method may not be satisfying either, consider the following example of 3 wines, A, B and C, and 60 judges. 23 judges rank A first, B second and C third; 2 rank B first, A second and C third; 17 rank B first, C second and A third; 10 rank C first, A second and B third; finally, 8 rank C first, B second and A third. Adding each wine's ranks over all judges and averaging ranks B first, A second and C third. Consider now pairs of wines: A beats B by 33 (= 23 + 10) to 27 (= 2 + 17 + 8), and B beats C by 42 (= 23 + 2 + 17) to 18 (= 10 + 8). If A beats B and B beats C, one would expect A to beat C as well. But C beats A by 35 (= 17 + 10 + 8) to 25 (= 23 + 2). This is an example of the Condorcet Paradox: aggregate preferences are not transitive, and there is no (so-called Condorcet) winner. It illustrates Arrow's Impossibility Theorem (Arrow, 1953), which can be (loosely) stated as follows:
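The arithmetic of the 60-judge example can be checked with a short Python sketch. We assume that "counting the ranks" means giving 1 point for a first place, 2 for a second and 3 for a third, so that a lower total is better; averaging over the 60 judges does not change the order.

```python
from itertools import combinations

# The 60-judge profile from the example: (number of judges, ranking from best to worst).
PROFILE = [(23, "ABC"), (2, "BAC"), (17, "BCA"), (10, "CAB"), (8, "CBA")]


def rank_sums(profile):
    """Add up the positions (1 = first place) each wine receives; lower totals are better."""
    sums = {}
    for count, order in profile:
        for position, wine in enumerate(order, start=1):
            sums[wine] = sums.get(wine, 0) + count * position
    return sums


def pairwise_wins(profile):
    """Map each ordered pair (x, y) to the number of judges who rank x above y."""
    wines = sorted(profile[0][1])
    total = sum(count for count, _ in profile)
    wins = {}
    for x, y in combinations(wines, 2):
        x_over_y = sum(count for count, order in profile if order.index(x) < order.index(y))
        wins[(x, y)] = x_over_y
        wins[(y, x)] = total - x_over_y
    return wins


def condorcet_winner(wines, wins):
    """Return the wine that beats every rival by a strict majority, or None if none exists."""
    for x in wines:
        if all(wins[(x, y)] > wins[(y, x)] for y in wines if y != x):
            return x
    return None


sums = rank_sums(PROFILE)
wins = pairwise_wins(PROFILE)
print(sorted(sums, key=sums.get))  # ['B', 'A', 'C']
print(wins[("A", "B")], wins[("B", "C")], wins[("C", "A")])  # 33 42 35
print(condorcet_winner("ABC", wins))  # None
```

The rank totals are 111 for B, 122 for A and 127 for C, while the pairwise majorities form the cycle A > B > C > A, so no wine beats both of the others.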
When there are at least three choices, no rule for aggregating individual rankings into an aggregate ranking can simultaneously satisfy the following four properties, or axioms:
Property 1. Unrestricted domain. All individual preferences are allowed.
Property 2. Pareto efficiency. If every judge ranks A before B, then the aggregate order must rank A before B.
Property 3. Independence of irrelevant alternatives. If A is ranked before B, then introducing a new choice C (or discarding a choice C from the list of choices) must not cause B to be ranked before A: C is irrelevant to the choice between A and B.
Property 4. Non-dictatorship. No judge can impose his own ranking.
It is, therefore, impossible to construct an unassailable method that produces a ranking of wines based on the rankings of those who judge them.
Borda (1781) also suggested what came to be called approval voting, in which each judge can cast a vote for as many candidates (wines) as she wishes, without ranking them. The votes are then added for each candidate, so that candidates can be ranked. We suggest a variant of approval voting in which, if a judge votes for a subgroup of size k, 0 ≤ k ≤ n, of the n candidate wines, then each of them gets a fraction 1/k of one vote (a judge who chooses k = 0 does not vote). These fractions of votes are then added as above, and a ranking is computed. Though both procedures look simpler than ranking, since they only require judges to choose or not choose a wine as “meritorious,” they do not escape Arrow's Impossibility Theorem. However, the method that we suggest asks each judge for only a partial ordering over the set of wines, while the method used in the Paris tasting requires a complete ordering over that set from each judge. In our case, this leads to a unique ranking (possibly with ties) that satisfies the conditions (axioms) imposed by Shapley (1953) to obtain the so-called Shapley Value, which measures the “power,” “influence,” or “weight” of each candidate.
The method is based on Ginsburgh and Zang (2003, 2004), who suggested using the Shapley Value to share the income generated by a museum pass program. The theoretical model suggested in Ginsburgh and Zang (2003) applies to the problem presented here: the two problems are analytically identical. In Ginsburgh and Zang (2003, 2004), the players in the game are the participating museums; here they are the competing wines. In both cases, the special structure of the problem yields a very simple procedure for calculating the Shapley Values.
The paper is organized as follows. Section II develops the concept of Shapley Value that leads to a ranking of wines, and shows how these values should be computed. Section III is devoted to the examples of the 1976 Paris wine contest, and the 2012 Princeton Judgment. Section IV concludes.
II. Shapley Ranking
We assume that each judge is allowed to vote once, and that she can vote for any subgroup of the n wines. By voting for such a group, she indicates that she favors every wine in the group over the wines excluded from it, and that, as far as she is concerned, every chosen wine is a candidate for first place or a medal, while non-chosen wines are not. Note that a group can consist of a single wine or of all n wines, or it can be empty (no vote). Judges vote simultaneously, so that none of them is aware of another's choice, and no judge can vote twice for the same wine.
We now turn to how scores are calculated. In approval voting, one point is assigned to each wine belonging to the group chosen by a judge, and the sum over all judges of points collected by each wine is computed. The winner is the wine that collects the largest number of points, but all other wines are ranked as well.
The problem here is that a judge who votes for a large group of wines exercises more political or strategic influence than one who votes for a single wine. The solution is to give each judge one vote to cast (1 unit), with no option to overspend or leverage this amount. Hence, when she votes for a group of wines, each member of the group receives a fraction of her single unit of voting, and these fractions sum to 1. This is fractional voting as described, among others, by Nambiar (2012).
A special case of fractional voting divides each judge's single unit of voting equally among the members of the group that she chooses. As in the previous case, these shares are added wine by wine, and an overall ranking is computed. The argument for equal sharing is that the judge votes for a group of wines without expressing preferences among its members. It turns out that the total “amount of votes” (hereafter AV) associated with each wine is its Shapley Value in a related cooperative game. As a by-product, this verbal description also shows how easy these Shapley Values are to calculate.
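The difference between plain approval voting and equal-share fractional voting can be seen on a handful of ballots. The sketch below is illustrative only: the ballots are hypothetical, not data from either tasting, and each ballot is modeled as a Python set of wine labels (an empty set is an abstention).

```python
from fractions import Fraction


def approval_scores(ballots):
    """Plain approval voting: every approved wine gets one full point."""
    scores = {}
    for group in ballots:
        for wine in group:
            scores[wine] = scores.get(wine, 0) + 1
    return scores


def fractional_scores(ballots):
    """Equal-share fractional voting: a judge approving k wines gives each 1/k."""
    scores = {}
    for group in ballots:
        if not group:  # k = 0: the judge abstains
            continue
        share = Fraction(1, len(group))
        for wine in group:
            scores[wine] = scores.get(wine, Fraction(0)) + share
    return scores


# Hypothetical ballots (not data from the paper): one judge approves three wines,
# two judges approve a single wine each, and one judge abstains.
ballots = [{"B", "C", "D"}, {"A"}, {"B"}, set()]
print(approval_scores(ballots))    # the broad ballot injects 3 points
print(fractional_scores(ballots))  # every voting judge injects exactly 1 vote
```

Under approval voting the first judge contributes three points in total; under fractional voting every judge who votes contributes exactly one unit, so the scores sum to the number of judges who voted. Exact fractions are used to avoid rounding noise.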
A vote cast by a judge designates a group of wines she prefers. Some of the competing wines are likely to be recognized as being of better quality, hence to be chosen more often by judges and to accumulate larger AVs. Groups containing a wine of extremely high quality are likely to be chosen more often. Similarly, groups containing substitute wines are likely to be penalized, while those containing unique complements are likely to be valued by judges and rewarded in the overall ranking.
In view of the above, we regard the AVs of wines as reflecting their relative contribution to overall quality, or their attractiveness. This interpretation is supported by a powerful game-theoretic tool, the Shapley Value, a central allocation rule of cooperative game theory. Using the theory established in Ginsburgh and Zang (2003), it turns out that the AV of each wine is its Shapley Value, which measures its overall contribution (or quality, or weight). The Shapley Value is known to satisfy the following set of weak and natural properties:
Property 1. Full Distribution. The total AV, cast by the judges, is fully distributed among the participating wines.
Property 2. Symmetry. If a wine contributes the same additional value (measured by its AV) to each group of wines, then this will be the AV assigned to this wine.
Property 3. Anonymity. The AVs, allocated to the various wines, do not change if one changes the order in which the wines are processed within the contest.
Property 4. Additivity. If the judges are split into two classes (say, California and French wine experts) and the AVs assigned to the wines by each class of judges are computed separately, then, for each wine, the sum of the two AVs equals the AV obtained by applying the process to the whole, unsplit population of judges.
Imposing these four properties as requirements leads to a unique value system, the Shapley Value (Shapley, 1953), in which the AV of each wine is its Shapley Value. The Shapley Value is, in general, very difficult to compute once the number of candidates (here wines) becomes large. For this particular structured application, however, the computation is straightforward and boils down to the following very simple and intuitive procedure:
The members of a group of wines selected by a judge equally share her endowment of one unit of voting.
The AV of each wine is the sum of the shares it receives from all judges.
The wines are ranked by their AVs – the higher, the better.
The AV of each wine is its Shapley Value in the related cooperative game.
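The procedure above can be sketched in Python and compared with a brute-force Shapley Value computed from its textbook definition (average marginal contribution over all orderings of the wines). The characteristic function is our assumption, modeled on the museum-pass game of Ginsburgh and Zang (2003): the worth of a coalition of wines is taken to be the number of judges whose entire chosen group lies inside the coalition. The ballots are hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def av_scores(ballots):
    """Steps 1-2: split each judge's unit vote equally over her group, then sum per wine."""
    scores = {}
    for group in ballots:
        if group:  # an empty ballot is an abstention
            share = Fraction(1, len(group))
            for wine in group:
                scores[wine] = scores.get(wine, Fraction(0)) + share
    return scores


def av_ranking(scores):
    """Step 3: order wines by AV, highest first (ties broken alphabetically here)."""
    return sorted(scores, key=lambda wine: (-scores[wine], wine))


def game_value(coalition, ballots):
    """ASSUMED characteristic function: judges whose whole group lies inside `coalition`."""
    return sum(1 for group in ballots if group and group <= coalition)


def shapley_brute_force(wines, ballots):
    """Textbook Shapley Value: average marginal contribution over all orderings of the wines."""
    totals = {wine: Fraction(0) for wine in wines}
    for order in permutations(wines):
        coalition, before = set(), 0
        for wine in order:
            coalition.add(wine)
            after = game_value(coalition, ballots)
            totals[wine] += after - before
            before = after
    n_orders = factorial(len(wines))
    return {wine: total / n_orders for wine, total in totals.items()}


# Hypothetical ballots for four wines (not data from the paper).
ballots = [{"A", "B"}, {"A"}, {"B", "C", "D"}, {"C"}]
scores = av_scores(ballots)
print(av_ranking(scores))  # ['A', 'C', 'B', 'D']
print(scores == shapley_brute_force("ABCD", ballots))  # True
```

The brute-force routine enumerates all n! orderings and is only practical for a few wines; the equal-share rule gives the same numbers in a single pass over the ballots. Under the assumed game, each ballot is a unanimity game on the chosen group, which is why the two computations agree and why the AVs are additive across any split of the judges.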
It should be pointed out that the Shapley allocation also satisfies the following additional intuitive property:
Marginality. If an additional judge participates in the contest, then only the AVs of the wines in the group chosen by that judge change, and each of these wines receives an equal share of the single unit of voting she brings in.
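Marginality makes the AVs easy to maintain incrementally. A minimal sketch, assuming the running AVs are kept in a dictionary of exact fractions; the helper name `add_judge` and the numbers are hypothetical.

```python
from fractions import Fraction


def add_judge(scores, group):
    """Return new AVs after one more judge votes for `group`.

    Only wines in `group` change, each by 1/len(group); an empty group changes nothing.
    """
    updated = dict(scores)
    if group:
        share = Fraction(1, len(group))
        for wine in group:
            updated[wine] = updated.get(wine, Fraction(0)) + share
    return updated


# Hypothetical running totals before the extra judge votes.
before = {"A": Fraction(3, 2), "B": Fraction(5, 6), "C": Fraction(4, 3), "D": Fraction(1, 3)}
after = add_judge(before, {"B", "D"})
print({wine: after[wine] - before[wine] for wine in before})
# {'A': Fraction(0, 1), 'B': Fraction(1, 2), 'C': Fraction(0, 1), 'D': Fraction(1, 2)}
```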
The calculations needed to implement the Shapley allocation are now illustrated using the renowned Judgment of Paris as well as its 2012 remake in Princeton.
III. Two Examples: The Judgments of Paris and Princeton
In 1976, Steven Spurrier, a well-known English wine trader and owner of the Caves de la Madeleine in Paris, and American-born Patricia Gallagher from the French Académie du Vin turned things upside down by organizing in Paris a blind tasting of white Burgundies and red Bordeaux (four in each case) and Californian wines (6 whites and 6 reds), which were at best unknown, at worst ignored in Europe. The eleven judges were all extremely competent wine connoisseurs (sommeliers, producers of famous wines, wine journalists, and owners of Michelin-starred restaurants). The tasting elected a Californian wine as winner for both white wines (Chateau Montelena) and red wines (Stag's Leap Wine Cellars). This boosted the reputation of Californian wines, and the so-called Judgment of Paris changed the traditional view, shared by experts, that only French wines could be of high quality. It led to increased competition between French and Californian wines, and quickly extended to the discovery of quality wines in many other countries and continents, including Australia, South America and South Africa.
We now analyze the contest for red wines (Table 1), and compare the final rankings using several methods:
(a) Average marks; this was the method used to compute the official ranking of the contest.
(b) Ranking the wines on the basis of the marks given by each judge, adding the ranks, and computing ranks based on this sum, as suggested by Borda (1781) and later by Ashenfelter and Quandt (1999).
(c) Rankings obtained by simulating Shapley rankings.
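Methods (a) and (b) can be sketched as follows. The marks are hypothetical (Table 1 is available only as an image) and were chosen so that one judge's wide range of marks drives the average; within-judge ties are given the mean of their positions, which is our assumption since the text does not say how tied marks were ranked.

```python
def average_marks(table):
    """Method (a): each wine's mean mark across judges (higher is better)."""
    wines = table[0].keys()
    return {wine: sum(judge[wine] for judge in table) / len(table) for wine in wines}


def judge_ranks(marks):
    """Rank one judge's wines, 1 = best; tied wines share the mean of their positions (assumption)."""
    ordered = sorted(marks, key=marks.get, reverse=True)
    ranks, i = {}, 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and marks[ordered[j + 1]] == marks[ordered[i]]:
            j += 1
        for wine in ordered[i:j + 1]:
            ranks[wine] = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        i = j + 1
    return ranks


def rank_sums(table):
    """Method (b): add each wine's within-judge ranks (lower is better)."""
    sums = {}
    for judge in table:
        for wine, rank in judge_ranks(judge).items():
            sums[wine] = sums.get(wine, 0) + rank
    return sums


# Hypothetical marks out of 20 (not Table 1): the first judge uses a very wide range.
table = [
    {"X": 20, "Y": 2, "Z": 10},
    {"X": 12, "Y": 14, "Z": 13},
    {"X": 11, "Y": 13, "Z": 12},
]
avg, sums = average_marks(table), rank_sums(table)
print(sorted(avg, key=avg.get, reverse=True))  # ['X', 'Z', 'Y']
print(sorted(sums, key=sums.get))              # ['Y', 'Z', 'X']
```

In this constructed case the two methods produce opposite orderings: the first judge's scatter dominates the averages, while the rank sums give each judge equal weight.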
Before turning to the results, some remarks are needed to explain how we ran the simulations to compute Shapley rankings. Since the original contest was not organized as an approval vote, we do not observe which wines judges would have voted for had they not been required to mark all ten wines. We therefore had to simulate the number of wines chosen by each judge, taking into account the marks that each judge actually gave.
Table 1 The Paris 1976 Wine Tasting: Red Wines, Judges and Ratings
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20151128095854837-0422:S1931436112000351_tab1.gif?pub-status=live)
Wines: A: Stag's Leap Wine Cellars, 1973; B: Château Mouton-Rothschild, 1970; C: Château Montrose, 1970; D: Château Haut-Brion, 1970; E: Ridge Vineyards Monte Bello, 1971; F: Château Léoville Las Cases, 1971; G: Heitz Wine Cellars, 1970; H: Clos du Val Winery, 1972; I: Mayacamas Vineyards, 1971; J: Freemark Abbey Winery, 1969.
In all experiments, we first generate for each judge the size of the group (number of wines) she would have recommended, and then assign to this group the top wines from her list. In the first experiment, we ran three simulations, assuming that each judge would have chosen one, two or three wines, respectively.
In the second experiment, we picked the number of wines chosen by each judge at random. The numbers were generated from a Gaussian distribution with mean 3 and standard deviation of 1. Non-integer numbers were rounded to the closest integer. We ran five such simulations, each time with a newly generated set of random numbers.
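The random group sizes of the second experiment can be drawn as below. This is a sketch under stated assumptions: the seed is arbitrary, and clamping each size to the range 1 to n is our own choice, since the text does not say how the rare draws that round to zero (or to more than n) were handled.

```python
import random


def simulate_group_sizes(n_judges, n_wines, rng, mean=3.0, sd=1.0):
    """Draw one group size per judge from N(mean, sd), rounded to the nearest integer.

    Clamping to [1, n_wines] is an assumption: the text does not say how rare
    draws below 1 or above n_wines were handled.
    """
    sizes = []
    for _ in range(n_judges):
        k = round(rng.gauss(mean, sd))
        sizes.append(min(max(k, 1), n_wines))
    return sizes


rng = random.Random(1976)  # arbitrary fixed seed so the runs are reproducible
runs = [simulate_group_sizes(11, 10, rng) for _ in range(5)]  # five simulations, 11 judges
for sizes in runs:
    print(sizes)
```

Each size is then turned into a set of wines by taking the judge's top-marked wines, with the tie rule described next.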
In both cases, ties, which appear quite frequently as can be seen from Table 1, raise a problem. Judge Brejoux gives the same marks to wines A and H (14), and to wines C and G (12). Judge Kahn gives identical marks to wines B, C, D and F (12). When tied wines straddle the cutoff of one, two or three wines, we included all the tied wines. This usually forces us to choose more than one, two or three wines. Take, for example, judge De Villaine when we simulate approval voting with two wines. He gives a mark of 16 to wine C, and 15 to wines A and D. This leads us to accept all three wines as “approved,” while there should only be two.
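The tie rule can be written as "keep every wine whose mark is at least the k-th highest mark." A minimal sketch, using only the three marks of judge De Villaine quoted in the text (his marks for the other wines are omitted); the function name is our own.

```python
def top_k_with_ties(marks, k):
    """Pick a judge's k best-marked wines, plus every wine tied with the k-th mark."""
    if k <= 0 or not marks:
        return set()
    ordered = sorted(marks.values(), reverse=True)
    cutoff = ordered[min(k, len(ordered)) - 1]
    return {wine for wine, mark in marks.items() if mark >= cutoff}


# Only the three marks of judge De Villaine quoted in the text.
de_villaine = {"C": 16, "A": 15, "D": 15}
chosen = top_k_with_ties(de_villaine, 2)
print(sorted(chosen))  # ['A', 'C', 'D']: three wines approved instead of two
print(1 / len(chosen))  # each approved wine then receives one third of his vote
```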
In the third experiment, we start, for each judge, with the highest mark and go down until we reach a gap of two points. The wines rated above the gap are selected. Consider Judge Brejoux in the Paris tasting. He gave 17 to wine D and 16 to wine B; then there is a gap of two points, since the next wine is A, with a mark of 14. So we assume he would have chosen only wines B and D. We run this procedure for each judge, and add up the shared votes for each wine.
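The gap rule can be sketched as a walk down a judge's sorted marks that stops at the first drop of at least two points. Reading "a gap of two points" as a drop of two or more points is our assumption; it agrees with the Brejoux example. Only his marks quoted in the text are used.

```python
def select_until_gap(marks, gap=2):
    """Walk down a judge's marks from the top; stop at the first drop of at least `gap` points.

    Reading "a gap of two points" as a drop of two or more points is an assumption.
    """
    if not marks:
        return set()
    ordered = sorted(marks.items(), key=lambda item: item[1], reverse=True)
    selected = {ordered[0][0]}
    for (_, previous), (wine, mark) in zip(ordered, ordered[1:]):
        if previous - mark >= gap:
            break
        selected.add(wine)
    return selected


# Judge Brejoux's marks as quoted in the text (his marks for the other wines are omitted).
brejoux = {"D": 17, "B": 16, "A": 14, "H": 14, "C": 12, "G": 12}
print(sorted(select_until_gap(brejoux)))  # ['B', 'D']
```

Tied marks produce a drop of zero and are therefore always kept together, so this rule needs no separate tie handling.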
The aggregate rankings obtained using the various methods appear in Table 2. It is difficult to determine from our simulations which method is preferable. It is, however, remarkable that wines A, B, C, D and E belong to the group of better wines whatever the ranking method used (with the exception of random simulation 4). Nevertheless, the approach that we suggest seems better founded than the others, as it is based on widely used game-theoretic principles, employing Shapley's (1953) axioms and the Shapley Value directly in the voting. Additional results using Condorcet's approach are considered in Appendix 1.
Table 2 The Paris 1976 Wine Tasting: Ranking Wines using Different Methods
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20151128095854837-0422:S1931436112000351_tab2.gif?pub-status=live)
Wines: A: Stag's Leap Wine Cellars, 1973; B: Château Mouton-Rothschild, 1970; C: Château Montrose, 1970; D: Château Haut-Brion, 1970; E: Ridge Vineyards Monte Bello, 1971; F: Château Léoville Las Cases, 1971; G: Heitz Wine Cellars, 1970; H: Clos du Val Winery, 1972; I: Mayacamas Vineyards, 1971; J: Freemark Abbey Winery, 1969.
Similar computations are made for the Princeton Judgment, organized by George Taber, Orley Ashenfelter and Karl Storchmann during the 6th international conference of the American Association of Wine Economists in June 2012. The same French wines (4 reds and 4 whites) as those tasted in 1976, though of more recent vintages, are compared with six New Jersey wines, instead of Californian wines, in each flight.
This time, a French white wine (Clos des Mouches, 2010) and a French red wine (Château Mouton-Rothschild, 2004) are ranked first in their respective categories. But the important conclusion of the ranking, analyzed by Richard Quandt, is that Clos des Mouches is statistically significantly better than the nine other whites, which are all judged to be of equal quality. Among the reds, one New Jersey wine is statistically worse than the nine others, but none of the remaining nine, whether French or from New Jersey, is statistically different from any other. This implies that Château Mouton-Rothschild and Château Haut-Brion, two French superstars, cannot be distinguished from New Jersey reds, which are twenty-five times cheaper than the two top French clarets.
Results of the tasting for red wines are given in Table 3, and computations identical to those performed for the Judgment of Paris can be found in Table 4. They lead to observations similar to those made for the Paris tasting, except that the simulations are less stable. The reason seems to be that the judges' marks contain numerous ties, so that we are often led to include more wines than our first two selection methods prescribe. Since each judge's unit of vote has to be shared equally among the chosen wines, this leads to more fractioning than in the Paris contest.
Table 3 The Princeton 2012 Wine Tasting: Red Wines, Judges and Ratings
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20151128095854837-0422:S1931436112000351_tab3.gif?pub-status=live)
Note: The letters given to the wines are those used at the 2012 Princeton wine tasting. To be consistent with Table 1, we ranked the wines according to their average marks. Wines: B: Château Mouton-Rothschild, 2004; J: Château Haut-Brion, 2004; D: Heritage Estate BDX, 2010; E: Bellview Lumière, 2010; A: Château Montrose, 2004; G: Château Léoville Las Cases, 2004; F: Tomasello Oak Reserve, 2007; H: Amalthea Europe VI, 2008; C: Silver Decoy Cab. Franc, 2008; I: Four Jg's Cab. Franc, 2008.
Table 4 The Princeton Wine Tasting: Ranking Wines using Different Methods
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20151128095854837-0422:S1931436112000351_tab4.gif?pub-status=live)
Note: The letters given to the wines are those used at the 2012 Princeton wine tasting. For consistency, we ranked the wines according to their average marks. Wines: B: Château Mouton-Rothschild, 2004; J: Château Haut-Brion, 2004; D: Heritage Estate BDX, 2010; E: Bellview Lumière, 2010; A: Château Montrose, 2004; G: Château Léoville Las Cases, 2004; F: Tomasello Oak Reserve, 2007; H: Amalthea Europe VI, 2008; C: Silver Decoy Cab. Franc, 2008; I: Four Jg's Cab. Franc, 2008.
IV. Conclusions
The Shapley method is based on a set of reasonable axioms. It is also simpler to use, as judges do not have to rate or rank all the wines they evaluate. It can hardly lead to strategic voting, such as choosing only a wine one recognizes and giving it one's full vote to improve its final rank, since the tasting is blind. Finally, it gives results that are quite close to those obtained by the usual ranking methods. Hence, we believe that it should be used more often.
Appendix 1
The example of the Paris Judgment shows that a Condorcet winner, a consistent Condorcet ranking of all wines, and Condorcet cycles may all coexist. Consider first Table A1, which gives the number of wins between all pairs of wines that appeared in the contest.
Table A1 The Paris 1976 Tasting: Number of Wins Between Pairs of Wines
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20151128095854837-0422:S1931436112000351_tab5.gif?pub-status=live)
Each cell of the table gives the number of judges who prefer the wine in the row to the wine in the column. For instance, the number 6.5 in row A, column B shows that 6.5 judges prefer A to B. This number is computed from the judges' marks in Table 1 as follows: 5 judges prefer A to B, 3 judges prefer B to A, and 3 judges are indifferent.
Half of these “indifferent” votes are credited to A over B, and the other half to B over A. Therefore, A is preferred to B by 5 + 1.5 = 6.5 judges, and B is preferred to A by 3 + 1.5 = 4.5 judges.
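The half-split rule for indifferent judges can be sketched directly from a table of marks. The marks below are hypothetical and were chosen only to reproduce the A-versus-B split described above (5 judges prefer A, 3 prefer B, 3 give equal marks); they are not the actual Table 1 data.

```python
from itertools import permutations


def pairwise_with_ties(table):
    """Cell (x, y): judges marking x above y, plus half of those giving x and y equal marks."""
    wines = list(table[0])
    wins = {}
    for x, y in permutations(wines, 2):
        wins[(x, y)] = sum(
            1.0 if judge[x] > judge[y] else 0.5 if judge[x] == judge[y] else 0.0
            for judge in table
        )
    return wins


# Hypothetical marks reproducing the A-versus-B split described above:
# 5 judges prefer A, 3 prefer B, and 3 give both wines the same mark.
table = (
    [{"A": 15, "B": 12}] * 5
    + [{"A": 11, "B": 13}] * 3
    + [{"A": 12, "B": 12}] * 3
)
wins = pairwise_with_ties(table)
print(wins[("A", "B")], wins[("B", "A")])  # 6.5 4.5
```

Because ties are split in half, each pair of opposite cells always sums to the number of judges (11 here).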
Wine C (Château Montrose) is clearly the (unique) Condorcet winner, since it is preferred by at least half of the judges to every other wine in all pairwise comparisons. Wine I (Mayacamas Vineyards) is the (unique) Condorcet loser, since all other 9 wines are preferred to I by a majority of judges.
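The two criteria just stated can be applied mechanically to a pairwise matrix such as Table A1. The sketch below uses a small hypothetical 3-wine matrix for 11 judges, since Table A1 is available only as an image; the winner test uses "at least half of the judges," as in the text, so it may return more than one wine in general.

```python
def condorcet_winners(wines, wins, n_judges):
    """Wines preferred by at least half of the judges against every rival (criterion used above)."""
    half = n_judges / 2
    return [x for x in wines if all(wins[(x, y)] >= half for y in wines if y != x)]


def condorcet_losers(wines, wins, n_judges):
    """Wines to which every rival is preferred by a strict majority of judges."""
    half = n_judges / 2
    return [x for x in wines if all(wins[(y, x)] > half for y in wines if y != x)]


# Hypothetical 3-wine matrix for 11 judges, ties already split half-and-half;
# wins[(x, y)] + wins[(y, x)] == 11 for every pair.
wins = {
    ("A", "B"): 6.5, ("B", "A"): 4.5,
    ("C", "A"): 7.0, ("A", "C"): 4.0,
    ("C", "B"): 6.0, ("B", "C"): 5.0,
}
print(condorcet_winners("ABC", wins, 11))  # ['C']
print(condorcet_losers("ABC", wins, 11))   # ['B']
```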
One possible Condorcet ranking (based on the majority rule) is the following, where X > Y means that wine X obtains more than 5.5 votes (a majority) when compared with Y, and X = Y means that X and Y each obtain 5.5 votes:
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20151128095854837-0422:S1931436112000351_eqnU1.gif?pub-status=live)
Many other rankings may also satisfy the majority rule. One can, however, also find Condorcet cycles, such as A > B = D > E = C > A.