I. Introduction
Ties in the ratings that commercial wine competitions employ to grant awards are common, controversial, and difficult to break.
Ties are common because, in all of the dozens of competitions reviewed by the authors, awards are based on averaging ratings assigned by panels of judges. Whatever the merits and demerits of this approach, averages are easy to calculate and easy to communicate. The authors accept that ease of calculation and clarity of communication are important for judges and wine competition administrators, so rather than debating the merits and demerits of averaging ratings, this article focuses on ties in those averages. Ties are common and examples include the 2019 California State Fair Commercial Wine Competition, where 72% of Gold Medal awards were part of a tie, the 2016 Wines of Portugal Challenge, where more than 50% of average scores were involved in a tie, and the 2019 Setúbal Challenge, where 20% of average scores were part of a tie.
Ties are controversial because wine competitions differ in what they expect from judges and the significance they attach to the awards that they grant. The Executive Director of the Cloverdale Citrus Fair, home of the San Francisco Chronicle Wine Competition, stated “we don't break ties” and the Fair awarded two best of show awards in four of the five categories in 2018 (Kallen, 2013; College Cellars, 2019). The California State Fair is similar; the Fair awards medals to all wines that qualify, without limitation. In sharp contrast, the Indy (Indianapolis) International Wine Competition Judging Guidelines state that ties shall be broken and that only one Gold, Silver, and Bronze medal shall be awarded in each category (Indy International Wine Competition, 2019). In Portugal, two large wine competitions limit medals to certain percentages of the wines entered.
It can be difficult to break ties in a way that is easy to calculate, easy to communicate, effective, unbiased, logical, and fair to judges and vintners. That difficulty led the organizers of several large commercial wine competitions to ask the authors to propose tractable methods of breaking ties. This article is a response to these requests. Data on the prevalence of ties are presented in Section II, different methods of breaking ties are described in Section III, these methods are evaluated in Section IV, and then conclusions follow in Section V. In addition, the supporting data and MATLAB code are available on request, and code files are indicated in the text by (BTfilename).
II. Prevalence of Ties
This section presents several examples of the prevalence of ties, explains why they are common, and then summarizes the difficulties that ties can cause in wine competitions.
The 19th Setúbal Challenge was held in Setúbal, Portugal on March 14 and 15, 2019. A jury of eight or nine judges evaluated and scored each of a total of 207 different blind samples. The tasting protocol was a sequential taste-and-score in accordance with the International Organization of Vine and Wine (OIV) guidelines. The samples were grouped into similar series, such as the white Protected Geographical Indication (IGP) and red Denominação de Origem Protegida (DO). Each judge assigned a score between 50 and 100 for each wine. For still and sparkling wines, Gold Medal wines had to have a minimum score of 88 and silver medal wines a minimum of 82. Gold Medal fortified wine had to have a minimum score of 90 and silver fortified a minimum of 86. In addition, a percentage constraint limited the potential effects of score inflation. The number of gold medals in each category could not exceed 33% of the wines that earned gold and silver, and the number of gold and silver medals could not exceed 30% of all the wines entered in each category of the Challenge. No bronze medals were awarded at the Setúbal Challenge.
The prevalence of ties in the averages of scores assigned by the judges in the 2019 Setúbal Challenge appears in Table 1 (TBset). Average scores are calculated and compared here at the MATLAB default precision of 16 significant digits. Rounding to lower precision would yield more ties. Table 1 shows that 12 / 45 = 27% of white IGP and 27 / 90 = 30% of red IGP had average scores that tied with at least one other wine. Considering all the categories, on average, 41 / 207 = 20% of the wines were members of a tie in the average score. Of the wines that earned an average score high enough to qualify for a Gold Medal, (56–6) / 56 = 90% were members of a tie for Gold.
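The tie counts reported above can be illustrated with a minimal Python sketch. It is not the authors' MATLAB code; the function name `count_tie_members` and the five score sets are hypothetical, and exact rational arithmetic is used here as a substitute for comparing averages at floating-point precision.

```python
from collections import Counter
from fractions import Fraction


def count_tie_members(score_sets):
    """Return how many wines share their average score with at least one other wine."""
    # Exact rational means avoid floating-point noise when comparing averages.
    means = [Fraction(sum(scores), len(scores)) for scores in score_sets]
    frequency = Counter(means)
    return sum(1 for m in means if frequency[m] > 1)


# Hypothetical scores for five wines (not data from any competition).
wines = [
    (75, 80, 85, 90),
    (80, 80, 80, 90),
    (88, 89, 90, 91),
    (70, 72, 74, 76),
    (89, 89, 90, 90),
]
tied = count_tie_members(wines)
print(f"{tied} / {len(wines)} = {tied / len(wines):.0%} of wines are members of a tie")
```

In this toy sample, four of the five wines belong to a tie: two share a mean of 82.5 and two share a mean of 89.5.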
Table 1 Summary of Ties in Three Wine Competitions
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20191218172435990-0932:S193143611900035X:S193143611900035X_tab1.png?pub-status=live)
Notes: a The number of wines qualifying, according to an average rating assigned, for a medal in each category. Peninsula de Setúbal: score > 90 for fortified and >= 88 for all other wines. Wines of Portugal: score >= 90 for all wines. California State Fair: gold medals as assigned.
The 2016 Wines of Portugal Challenge was held from May 9–12, 2016 at the premises of the National Agriculture Fair in Santarém, Portugal. See Bodington and Malfeito-Ferreira (2018) for a description of the protocol, scoring guidelines, and an analysis of the results. In sum, 151 judges sampled 1,328 wines and turned in a total of 8,445 scores. Four different medals were awarded: Bronze for wines with a mean score of at least 80 points (up to a maximum of 25% of all award-winning wines including Gold and Silver), Silver for wines with a mean score of more than 84 points (up to a maximum of 12% of all wines entered), and Gold for wines with mean scores of more than 90 points (up to a maximum of 6% of all wines entered). A fourth medal, Great Gold, was awarded by a Grand Jury to the best wine in each of several categories (up to a maximum of 25% of the number of gold medals). Results for eight categories appear in Table 1 (TBpc). On average, 675 / 1,328 = 51% of the wines were members of a tie in the average score. Of the wines that earned an average score high enough to qualify for a Gold Medal, (210–8) / 210 = 96% were members of a tie for Gold.
The 2019 California State Fair Commercial Wine Competition (CSF) was held for three days during May 2019. Eighteen panels of three judges each sampled 2,811 wines in 91 categories. Each judge assigned a Gold+, Gold, Gold-, Silver, Bronze, or No Award to each wine that he or she tasted. Each wine was rated by each judge on one of the panels. Among the three medals assigned to each wine, competition officials then chose the most frequent or middle award as the final medal for each wine. The authors obtained the judge-specific and final awards from the CSF by filing a California Public Records Act (PRA) request (Footnote 1). The results for the ties appear in Table 1 (TBcsf). Of the wines that competition officials granted a Gold Medal, (325–91) / 325 = 72% were members of a tie for Gold.
A trend in the preceding data is clear: the more wines that are grouped together, and thus the larger the sample size, the more common ties become. This is expected. When the scores that judges assign are within a bounded set of integers, the number of potential averages is finite, so the probable number of ties must increase with sample size. Rounding averages of scores to the nearest integer, or choosing one from a small set of medal categories, further increases the number of ties.
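The finite number of potential averages can be made concrete with a short Python sketch. It assumes every wine is scored by the same number of judges and that each judge assigns an integer between 50 and 100, as in the Setúbal Challenge; the function name `distinct_average_count` is hypothetical.

```python
def distinct_average_count(n_judges, low=50, high=100):
    """Number of distinct mean scores possible when each judge assigns an integer in [low, high]."""
    # The sum of n integer scores can take every integer value from n*low to n*high,
    # and each distinct sum maps to one distinct mean.
    return n_judges * (high - low) + 1


for n_judges in (8, 9):
    print(n_judges, "judges:", distinct_average_count(n_judges), "possible averages")
```

Under those assumptions, a panel of eight judges can produce at most 401 distinct averages and a panel of nine at most 451, so any larger group of wines scored by one panel size must contain at least one tie. Because judges' scores concentrate in a narrow band, ties become likely well before that limit is reached.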
More than sample size and rounding contribute to the prevalence of ties. Judges’ ratings are not uniformly distributed. Some ratings are more likely than others, so some means are more likely than others, and ties between these means are therefore more likely. In addition, Bodington and Malfeito-Ferreira (2018) and Bodington (2017a) showed that anchoring is common. For example, a score of 89, just one point above the 88-point threshold for a gold medal, was the most often assigned score in the Setúbal Challenge. The higher frequency of this score means that averages that include this score are also more common.
Ties create several difficulties for wine competition officials. First, ties make it tough to differentiate between wines. The vectors of judges’ scores may not be the same; only the averages of these scores are the same. The individual judges, thus, may not agree that the wines are the same, and the tie is a fiction of the averaging methodology alone. Reporting only averages and ties to consumers may erode the perceived value of judges’ efforts.
Reports of score inflation in the wine trade press are eroding the perceived value of judges’ efforts, and this inflation compounds the potential harm of ties. Laube (2013) wrote in Wine Spectator that “Wine scores seem to get bigger every year.” Goode (2017) states, “Score inflation is everywhere and it's killing wine criticism.” Fridjhon (2019) reports that from 2017 to 2018 in the Prescient Cabernet Report, the share of 90-point-plus wines increased from 25% to 45% of total entries. He says that quality cannot have increased so much so fast and, thus, concludes that the increase is “incontrovertible evidence of ratings devaluation.” That inflation concentrates scores within a narrower range, and that concentration then increases the number of ties.
Another difficulty occurs when, as in the Setúbal Challenge, awards are subject to a maximum percentage limitation. Suppose, for example, that gold is awarded to wines scoring at least 88, but not more than 33% of wines qualifying for gold and silver, and suppose that the means for four wines are (90, 90, 87, 84). If all four wines qualify for gold or silver, the limit allows only one gold medal, yet two wines are tied at 90. This tie at the top makes the rules impossible to follow. Although actual examples are not that simple, the principle holds, and officials must either award a gold medal to too many wines or break the tie. This difficulty is not unique to wine competitions. Immigration, Refugees, and Citizenship Canada assigns scores to people who apply for Canada's Express Entry program (Canada Abroad, 2019). The nation also has a quota, the number of applicants exceeds that quota, and many applicants share the same minimum acceptable score. Canada breaks ties in scores by prioritizing earlier applications. Ashlagi and Nihzad (2017) found a similar problem in the child-school matching algorithms employed by public school systems. There were more applicants with similar qualifications than seats. The school systems they examined break ties using a random draw.
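The four-wine example can be checked with a minimal Python sketch. Several details are assumptions rather than competition rules: the silver threshold of 82 is borrowed from the Setúbal still-wine rule, the percentage limit is rounded down to a whole number of medals, and the function name `gold_cap_blocked_by_tie` is hypothetical.

```python
import math


def gold_cap_blocked_by_tie(means, gold_min=88, silver_min=82, gold_share=0.33):
    """Return (max_gold, blocked): the gold medals allowed by the cap, and
    whether a tie at the cap boundary prevents awarding exactly that many."""
    medal_pool = [m for m in means if m >= silver_min]
    # Assumption: the percentage cap is rounded down to a whole number of medals.
    max_gold = math.floor(gold_share * len(medal_pool))
    ranked = sorted((m for m in means if m >= gold_min), reverse=True)
    # The cap is blocked when the last wine inside the cap ties the first wine outside it.
    blocked = 0 < max_gold < len(ranked) and ranked[max_gold - 1] == ranked[max_gold]
    return max_gold, blocked


print(gold_cap_blocked_by_tie([90, 90, 87, 84]))
print(gold_cap_blocked_by_tie([91, 90, 87, 84]))
```

With means of (90, 90, 87, 84) the cap allows one gold medal but two wines share the top mean, so the rule cannot be applied without a tiebreaker. Changing one mean to 91 removes the conflict.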
III. Practical Tie Breakers
There are many methods for breaking ties. Many of these methods employ information other than scores or ratings equivalent to scores. Canada's use of earlier application dates to break ties in Express Entry application scores is an example. Some sport competitions break ties between athletes’ average scores by considering performance in previous competitions or sub-scores for “overall impression.” At the Indy International Wine Competition, the Chief Judge breaks ties by tasting the subject wines and reviewing the judges’ tasting notes. None of these methods is considered here. This analysis focuses on competitions in which the only information available, or the only information that can be practically considered, is a vector of scores for each wine.
There is a second limitation to this analysis. As stated in the Introduction, this article accepts that average scores are widely employed because they are easy to calculate and easy to communicate. To be useful to wine competition administrators, this must also be true for the methods used to break ties. The methods below (1) have been employed in a commercial or publicly-sponsored competition of some type, (2) have been published in use for aggregating scores and/or breaking ties, (3) employ only addition, counting, and/or division, and (4) can be easily implemented in Microsoft Excel. To date, the authors are unaware of any literature showing that methods using transformations, such as from scores to ranks, or exponential functions, such as variance, have been or would be routinely used in major competitions.
Smith and Smith (2007), in an analysis of range voting, describe four methods of breaking ties in score averages. Their first tiebreaker is merely a random draw, a coin toss for example (“Random Draw”). The second is a random draw of one score from each set of scores whose averages are tied, and the higher of the two scores drawn wins the preference (“Random Draw of Scores”). For example, let the sets of scores assigned by four judges to two wines be (75, 80, 85, 90) and (80, 80, 80, 90). The average of both sets is 82.5. A random draw from each set could yield (75, 80), and the second set in this example would win the tie. A third method depends on the median of each set, and the set with the higher median wins the tie (“Median”). The medians of the sets are (82.5, 80.0), so using Median, the first set wins the tie.
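Random Draw of Scores and Median can be sketched in Python using the two example sets. The function names and the convention of returning 0, 1, or None (first set wins, second set wins, tie not broken) are illustrative assumptions, not part of Smith and Smith's description.

```python
import random
import statistics


def random_draw_of_scores(first, second, rng):
    """Draw one score from each set; the set with the higher drawn score wins.
    Returns 0 if the first set wins, 1 if the second wins, None if the draws match."""
    drawn_first = rng.choice(first)
    drawn_second = rng.choice(second)
    if drawn_first == drawn_second:
        return None
    return 0 if drawn_first > drawn_second else 1


def median_tiebreak(first, second):
    """Return 0 if the first set has the higher median, 1 if the second does,
    None if the medians are also tied."""
    median_first = statistics.median(first)
    median_second = statistics.median(second)
    if median_first == median_second:
        return None
    return 0 if median_first > median_second else 1


first = (75, 80, 85, 90)
second = (80, 80, 80, 90)
rng = random.Random(2019)  # fixed seed only so that the sketch is repeatable
print("Random Draw of Scores:", random_draw_of_scores(first, second, rng))
print("Median:", median_tiebreak(first, second))
```

Median always favors the first set in this example, while Random Draw of Scores can favor either set or leave the tie unbroken when both draws are equal.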
The fourth method described by Smith and Smith employs a count of the number of scores higher than average in each set, and the set with the highest count wins the preference (“Count > Average”). The Count > Average for the two sets above is (2, 1), so by this method, the first set will win. Smith and Smith recommend this method and describe it as maximizing the number of voters who are “happy” with the result. Their use of “happy” appears to be an application of the concept of economic utility. The data and sample sizes in this application to wine competitions severely limit what analysis of utility may be possible. Subject to this qualification, a variation proposed here is the number of judges who assigned above-average scores for the tiebreak winning wine, plus the number of judges who assigned below-average scores for the tiebreak losing wine. Both types of judges would feel affirmed by the direction of the break (“Judges Affirmed”). For example, again, let the sets be (75, 80, 85, 90) and (80, 80, 80, 90) and both have an average of 82.5. The judges affirmed for winning set one are 2 + 3 = 5. The judges affirmed for winning set two are 1 + 2 = 3. The maximum of Judges Affirmed, thus, breaks the tie in favor of set one.
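The arithmetic for Count > Average and Judges Affirmed can be reproduced with a short Python sketch. The function names are hypothetical, and exact fractions are used for the mean so that comparisons against the average are not affected by rounding.

```python
from fractions import Fraction


def mean(scores):
    """Exact average of a set of integer scores."""
    return Fraction(sum(scores), len(scores))


def count_above_average(scores):
    """Number of scores strictly above the set's own average."""
    average = mean(scores)
    return sum(1 for s in scores if s > average)


def count_below_average(scores):
    """Number of scores strictly below the set's own average."""
    average = mean(scores)
    return sum(1 for s in scores if s < average)


def judges_affirmed(winner, loser):
    """Judges who scored the winner above average plus judges who scored the loser below average."""
    return count_above_average(winner) + count_below_average(loser)


first = (75, 80, 85, 90)
second = (80, 80, 80, 90)
print("Count > Average:", count_above_average(first), count_above_average(second))
print("Judges Affirmed if first wins:", judges_affirmed(first, second))
print("Judges Affirmed if second wins:", judges_affirmed(second, first))
```

The counts match the worked example: Count > Average is (2, 1), and Judges Affirmed is 5 when the first set wins and 3 when the second set wins.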
Another method for breaking ties is the approach used in many Olympic events to calculate an aggregate score. Olympic events that rely on panels of judges who assign scores to competitors’ performances include diving, figure skating, gymnastics, and snowboard halfpipe. In diving, seven judges assign scores, the two highest and the two lowest scores are discarded and a competitor's final score is the sum of the remaining scores multiplied by a diving difficulty factor. For figure skating, nine judges assign scores to various aspects of each competitor's performance, the highest and lowest scores for each component are discarded, and the competitor's final score is the average of the remaining scores. For gymnastics, six judges assign scores, the highest and lowest scores are discarded, and then a competitor's score is the average of the remaining scores. Finally, like gymnastics, six judges score competitors in the halfpipe, the highest and lowest scores are discarded, and then a snowboarder's score is the average of the remaining set of scores. Although those sports and scoring systems differ in many respects, all four discard the highest and lowest one or two scores, and then a competitor's rating is the average of the remaining scores. Using the sample data above, discarding the highest and lowest scores (“Olympic Average”) yields an average score for the first set of 82.5 and 80.0 for the second set. In that example, the tie is broken in favor of the first set.
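A minimal Python sketch of the Olympic Average follows. The function name and the `trim` parameter (how many scores to discard at each end) are illustrative assumptions; the example discards one score from each end, as in the text.

```python
def olympic_average(scores, trim=1):
    """Average of the scores that remain after discarding the `trim` highest
    and `trim` lowest scores."""
    if len(scores) <= 2 * trim:
        raise ValueError("not enough scores to discard the extremes")
    kept = sorted(scores)[trim:len(scores) - trim]
    return sum(kept) / len(kept)


first = (75, 80, 85, 90)
second = (80, 80, 80, 90)
print(olympic_average(first), olympic_average(second))
```

For the example sets the result is 82.5 and 80.0, so the tie is broken in favor of the first set. With only four judges, discarding two scores leaves just two, which is one reason the method may break fewer ties on small panels.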
Before moving forward, converting scores into rankings and then comparing ranking aggregates is another tiebreaker to consider (Footnote 2). See examples of using rank-based methods to compare 5 to 15 wines, published by authors skilled in mathematics and the logic of transitivity, in Liquid Assets (2019), Quandt (2012), Ginsburgh and Zang (2012), and Hulkower (2012). Methods for ranking and comparing wines include ranking means, Borda counts, and Shapley values. Methods for dealing with ties in the ranks include prohibiting ties, assigning the same rank to members of a tie, assigning a mean ranking, and evaluating the expected values of tie permutations. Methods for ranking and comparing ties between different, but in some cases overlapping, sets of wines were not addressed. The three commercial competitions employed in this article involved more than 200 to more than 2,800 wines, and each judge assessed from 50 to more than 100 wines in a sequential taste-and-score protocol. The lack of consensus in the literature about ranking methods, resolving ties between ranks, and comparing different sets, together with the difficulties in communicating about ranking methods among commercial wine competition officials, judges, and consumers, puts rank-based tiebreakers, for now, outside the criteria established at the beginning of Section III.
The simple examples above demonstrate the problem at hand. Not only are averages in scores tied, but depending on the judge-specific ratings, different methods of breaking ties can yield different results. Several aspects of each method are, thus, compared in the next section.
IV. Comparison of Tie Breakers
As shown in Table 1, there were 430 ties in average scores among the 692 red wines judged at the 2016 Wines of Portugal Challenge. That large sample is employed in this Section to answer four questions about each of the tiebreakers described in Section III. Is it effective? Is it biased? Does it maximize the judges affirmed? Is it directionally consistent with a method of aggregating scores that is not prone to ties? The analytical approach to answering each question is described below and the results appear in Table 3.
Is it effective? Some of the tiebreakers are themselves prone to ties, so these methods are less effective and less useful for wine competition administrators. Table 3 shows the percentage of ties in average scores that each method did break. The maximum is 100% and the random effectiveness is 50%.
Is it biased? A tendency to break a tie either for or against the first wine of two wines in a tie could indicate that a tiebreaker method is prone to serial position bias. Wine competition administrators are unlikely to be interested in a method that does not seem fair. The potential for position bias is tested here by calculating the percentage of ties that are broken in favor of the first wine. For an unbiased method, this percentage should be approximately 50%. A higher percentage indicates bias toward the first wine, and a lower percentage indicates bias toward the second wine.
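The effectiveness and position-bias measures can be sketched in Python. The four tied pairs are hypothetical, Median stands in for any tiebreaker, and the sketch assumes that the first-wine share is taken over the ties that the method actually broke; the text does not state the denominator.

```python
import statistics


def median_tiebreak(first, second):
    """Return 0 if the first wine wins, 1 if the second wins, None if unbroken."""
    median_first = statistics.median(first)
    median_second = statistics.median(second)
    if median_first == median_second:
        return None
    return 0 if median_first > median_second else 1


def effectiveness_and_first_share(tied_pairs, tiebreak):
    """Share of ties broken, and share of broken ties won by the first wine."""
    outcomes = [tiebreak(first, second) for first, second in tied_pairs]
    broken = [o for o in outcomes if o is not None]
    effectiveness = len(broken) / len(outcomes)
    # Assumption: the denominator is the number of ties that were broken.
    first_share = broken.count(0) / len(broken) if broken else float("nan")
    return effectiveness, first_share


# Hypothetical pairs of score sets; the two sets in each pair have equal averages.
tied_pairs = [
    ((75, 80, 85, 90), (80, 80, 80, 90)),
    ((80, 80, 80, 90), (75, 80, 85, 90)),
    ((80, 85, 85, 90), (80, 84, 86, 90)),
    ((70, 90, 90, 90), (85, 85, 85, 85)),
]
effectiveness, first_share = effectiveness_and_first_share(tied_pairs, median_tiebreak)
print(f"effectiveness = {effectiveness:.0%}, first-wine share = {first_share:.0%}")
```

In this toy sample Median breaks three of the four ties, and two of those three go to the first wine. A sample this small says nothing about bias; it only shows how the two percentages are formed.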
Does it maximize the judges affirmed? Smith and Smith (2007) favored the tiebreaker that maximized the number of “happy” voters. A variation of “happy,” Judges Affirmed, employs the sum of the number of judges who assigned above-average scores to a tiebreak winning wine, plus the number of judges who assigned below-average scores to the tiebreak losing wine. Both types of judge would feel affirmed by the direction of the break.
For every tie, a tiebreaker method in Section III is assigned either unity, if it breaks a tie in the same order as that implied by maximizing Judges Affirmed, or zero if it does not. The average of those assignments over the 430 ties is then reported as a percentage in Table 3. 100% indicates that the direction of a tiebreaker matches the direction of maximum affirmation, and 50% indicates a random result.
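The unity-or-zero scoring described above can be sketched in Python, here comparing an Olympic Average tiebreaker against the direction that maximizes Judges Affirmed. The tied pairs are hypothetical, and scoring a zero when either method leaves the tie unbroken is an assumption about an edge case that the text does not spell out.

```python
from fractions import Fraction


def above(scores):
    """Number of scores strictly above the set's own average."""
    average = Fraction(sum(scores), len(scores))
    return sum(1 for s in scores if s > average)


def below(scores):
    """Number of scores strictly below the set's own average."""
    average = Fraction(sum(scores), len(scores))
    return sum(1 for s in scores if s < average)


def judges_affirmed_direction(first, second):
    """Return 0 or 1 for the winner that maximizes Judges Affirmed, None if equal."""
    if_first_wins = above(first) + below(second)
    if_second_wins = above(second) + below(first)
    if if_first_wins == if_second_wins:
        return None
    return 0 if if_first_wins > if_second_wins else 1


def olympic_tiebreak(first, second):
    """Compare averages after discarding each set's highest and lowest score."""
    trimmed_first = sorted(first)[1:-1]
    trimmed_second = sorted(second)[1:-1]
    average_first = Fraction(sum(trimmed_first), len(trimmed_first))
    average_second = Fraction(sum(trimmed_second), len(trimmed_second))
    if average_first == average_second:
        return None
    return 0 if average_first > average_second else 1


def agreement_share(tied_pairs, tiebreak):
    """Average of unity (same direction as Judges Affirmed) or zero over all ties."""
    hits = []
    for first, second in tied_pairs:
        result = tiebreak(first, second)
        target = judges_affirmed_direction(first, second)
        # Assumption: an unbroken tie in either method is scored as zero.
        hits.append(1 if result is not None and result == target else 0)
    return sum(hits) / len(hits)


# Hypothetical pairs of score sets; the two sets in each pair have equal averages.
tied_pairs = [
    ((75, 80, 85, 90), (80, 80, 80, 90)),
    ((80, 80, 80, 90), (75, 80, 85, 90)),
    ((70, 90, 90, 90), (85, 85, 85, 85)),
    ((60, 83, 83, 97, 97), (80, 85, 85, 85, 85)),
]
print(f"agreement = {agreement_share(tied_pairs, olympic_tiebreak):.0%}")
```

The last pair shows how the two criteria can disagree: the Olympic Average favors the first set once its low outlier is discarded, while Judges Affirmed favors the second set because four of its five judges scored it above average.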
Is it directionally consistent with a method of aggregating scores that is not prone to ties? A probability mass function (PMF) for the distribution of ratings that judges assign to a wine appears in Table 2, Equation (1). See Mallows (1957), Marden (1995), and Alvo and Yu (2014) for applications of this functional form and Bodington (2017b) for a previous application of this form to wine scores. Equation (1) is a discrete and bounded function that reflects the stochastic nature of the judges’ scores, and a maximum likelihood estimate (MLE) of the central tendency is calculated by maximizing the log likelihood in Equation (2). As shown below, the MLE of central tendency is not prone to ties.
Table 2 A PMF for Stochastic Ratings
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20191218172435990-0932:S193143611900035X:S193143611900035X_tab2.png?pub-status=live)
The central tendency MLEs yielded by Equations (1) and (2) are employed here as a directional check of the more practical tiebreakers described in Section III. For every tie, the method in Section III is assigned either unity, if it breaks a tie in the same order as that implied by the corresponding central tendency MLEs, or zero if it does not. A result of 100% indicates that the direction of a tiebreaker matches the direction implied by the MLEs every time, and 50% indicates a random result.
A comparison of tiebreaker methods is shown in Table 3. For the sample of 430 ties, none of the methods appear to be materially biased or directionally inconsistent with a method not prone to ties. Results for those measures of comparison cluster at about 50%. In contrast, results differ regarding effectiveness and maximizing the share of judges affirmed. Random Draw, Random Draw of Scores, and Olympic Average cluster as the most effective followed by Median. For maximizing the judges affirmed, the top three methods are Judges Affirmed, Count > Average, and Median followed by Olympic Average. However, the results below also show that maximizing the number of judges affirmed is no more than randomly consistent with the implications of Equations (1) and (2). This finding decreases the importance of maximizing the number of Judges Affirmed and leaves the most support for Olympic Average, followed by Median. In addition, the Olympic Average has the added imprimatur of widespread use in Olympic contests.
Table 3 Comparison of Tiebreaker Methods
(2016 Wines of Portugal Challenge, 692 Red Wines, 430 Ties)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20191218172435990-0932:S193143611900035X:S193143611900035X_tab3.png?pub-status=live)
V. Conclusion
Ties between the averages of scores assigned by judges to wines in commercial wine competitions are common. Those ties make it difficult for competition officials to differentiate between wines, they erode the perception of judges’ expertise, and they can make compliance with competition rules arithmetically impossible.
Responding to requests from competition officials, this article presented and evaluated six methods for breaking ties in averages of scores: a random draw, a random draw of scores, the median, a count of above-average scores, the maximum of judges affirmed, and the Olympic Average. Using an Olympic Average to break ties in averages of wine scores is easy to calculate, easy to communicate, effective, and unbiased. It also has the Olympic imprimatur, and it is not inconsistent with the implications of a method of aggregating scores that is not prone to ties.