Evaluating Wine-Tasting Results and Randomness with a Mixture of Rank Preference Models*

Jeffrey C. Bodington

doi:10.1017/jwe.2014.41

Published online by Cambridge University Press: 25 February 2015

Jeffrey C. Bodington*: Affiliation:
Bodington & Company, 50 California St. #630, San Francisco, CA 94111; e-mail: jcb@bodingtonandcompany.com.

Abstract

Evaluating observed wine-tasting results as a mixture distribution, using linear regression on a transformation of observed results, has been described in the wine-tasting literature. This article advances the use of mixture models by considering that existing work, examining five analyses of ranking and mixture model applications to non-wine food tastings and then deriving a mixture model with specific application to observed wine-tasting results. The mixture model is specified with Plackett-Luce probability mass functions, solved with the expectation maximization algorithm that is standard in the literature, tested on a hypothetical set of wine ranks, tested with a random-ranking Monte Carlo simulation, and then employed to evaluate the results of a blind tasting of Pinot Gris by experienced tasters. The test on a hypothetical set of wine ranks shows that a mixture model is an accurate predictor of observed rank densities. The Monte Carlo simulation yields confirmatory results and an estimate of potential Type I errors (the probability that tasters appear to agree although ranks are actually random). Application of the mixture model to the tasting of Pinot Gris, with over a 95% level of confidence based on the likelihood ratio and t statistics, shows that agreement among tasters exceeds the random expectation of illusory agreement. (JEL Classifications: A10, C10, C00, C12, D12)

Keywords

mixture model, preference, rank, statistics, wine tasting

Type: Articles
Information: Journal of Wine Economics, Volume 10, Issue 1, May 2015, pp. 31-46

DOI: https://doi.org/10.1017/jwe.2014.41
Copyright: Copyright © American Association of Wine Economists 2015

I. Introduction

A mixture distribution is the result of combining the distributions of two or more random variables. The distribution of the mixture is observable, and the underlying component distributions may be unobservable or latent. Bodington (2012) showed in JWE that the relative ranks assigned by wine tasters are heterogeneous and posited that those ranks have a mixture distribution with random, common preference, and idiosyncratic preference mixture components. Cao (2014), also in JWE, applied a mixture model with two components to results for the 2009 California State Fair Commercial Wine Competition. Cao used a transformation of rank assignments, score simulations, and linear regression to estimate the proportion of scores that appeared to be random and the remaining proportion that appeared to reflect consensus among the wine judges.

This article seeks to build on Bodington (2012) and Cao (2014) in several respects. Assigning relative preference ranks to wines is common, and ordinal scores are also sometimes transformed into preference rankings.¹ Ranked preference and mixture models have been applied to tastings as diverse as soft drinks and salad dressings, and the literature for five applications is summarized in Section II. Next, as a foundation for applying such models to wine-tasting data and so that the results herein are replicable, the results of a blind tasting of Pinot Gris appear in Section III. Then, in Section IV, a mixture of Plackett-Luce rank preference models is derived that yields a multinomial probability mass function (PMF) for each wine in the Pinot Gris tasting. The unknown parameters in the mixture model are estimated in Section V using the expectation maximization (EM) algorithm. The additional benefits of this approach include its direct use of observed tasting data, use of information embedded in the variance of observed data, reliance on the standard EM solution, and its feasibility for the small sample sizes that are typical of wine tastings. The mixture model also yields estimates of the share of ranks that appear to be random and the probability of Type I errors.

Section V ends with a test using hypothetical tasting data that have a known result to show that the model is an accurate predictor of observed densities. Further, a Monte Carlo simulation with random rankings yields confirmatory results and shows that the probability of illusory consensus among tasters, a Type I error, is 0.20. In Section VI, the mixture model is applied to the tasting of Pinot Gris. Results show that, with likelihood ratio tests and t statistics supporting a level of confidence of over 95%, random ranking behavior accounts for approximately 30% of ranks and that there is nonrandom agreement among tasters on a common preference order. Conclusions follow in Section VII.

As noted throughout, this article applies ranking and mixture models to tasting results in which each taster determines an order of preference for the wines in a tasting. Each taster's result is a preference order vector. There are related methods for evaluating tied rankings and ordinal scores that are transformed into ranks, and the application of those methods to wine-tasting results must begin with this starting point.

II. Previous Applications of Rank and Mixture Models to Tasting Data

Ranked preference models are employed widely to express and evaluate comparisons. The models are diverse; Mallows (1957) is an early article, and Marden (1995) is a widely referenced text. Among many applications, ranking and mixture models are employed to divide heterogeneous agent or “expert” or “judge” behavior into latent components, subpopulations, or classes of agents that express homogeneous behavior (see, e.g., Gormley and Murphy, 2008; Marden, 1995, p. 133; Mengersen et al., 2011; and Vigneau et al., 1999). In a few cases, ranking and mixture models have been employed to evaluate taste-related data. Those applications are described below.

First, a notional mixture model for ranked data appears in Equation (1). For a vector of observed preference ranks (x), the PMF for those ranks (f) is the sum of the probabilities (π_i) that an expert belongs to a particular latent class (i with a total of n classes) in which each π_i is multiplied by the PMF for that latent class (f _i with parameters θ _i). Under Equation (2), the probabilities are bounded and their sum equals unity. The π_i are also known as mixing proportions, mixture weights, or component weights.

(1)

$$f\left( {{\rm x}\vert{\rm \pi}, \; {\rm \theta}}\right) = \; \mathop \sum \nolimits_{i = 1}^n \pi_{i}\, f_{i} \left({{\rm x}\vert\; {\rm \theta}_i}\right)$$

(2)

$$1.0 = \mathop \sum \nolimits_{i = 1}^n {\rm \pi}_i \;{\rm and} \; 0 \le \; {\rm \pi}_i \; \le 1.0$$

Critchlow (1985) analyzed the preferences of 16 mothers and 22 boys for five types of crackers (Critchlow, 1985, p. 119; Linacre, 1992, p. 6; Marden, 1995, p. 284). In the framework of Equation (1), Critchlow estimated parameters for two classes, both f_i(x|θ_i) were Plackett-Luce models, and Critchlow solved for θ using a maximum likelihood estimator (MLE). In Plackett-Luce, each unknown parameter in θ is the probability that the respective object will be selected as first or most-preferred among the choices (Marden, 1995, pp. 119, 216; Plackett, 1975), and this model also is derived and discussed in Section IV below. To conclude, Critchlow found that the boys' top choice was animal crackers.

Bockenholt (1992) analyzed the preferences of 278 male psychology students for eight different soft drinks. Within Equation (1), Bockenholt estimated parameters for a cola class (Coke Classic, Pepsi, Diet Coke, Diet Pepsi) and a non-cola class (Seven Up, Sprite, Diet Seven Up, Diet Sprite). Both f_i(x|θ_i) were Pendergrass-Bradley models, and Bockenholt solved for θ using MLE. He asked tasters to compare drinks in sets of three, and Pendergrass-Bradley is a ranking model for triple comparisons (Pendergrass and Bradley, 1960). Bockenholt found that the students liked Coke Classic the most and Diet Seven Up the least.

Marden (1995), using data from Nombekla et al. (1993), analyzed the preferences of six cows for animal feed that was either a control or flavored with sucrose, hydrochloride, urea, or sodium chloride. Each cow's preference for each feed was measured by how much of each feed it ate. In the framework of Equation (1), there was one class of taster, f_1(x|θ_1) was the Plackett-Luce model noted above, and Marden solved for θ using MLE. Again, the Plackett-Luce model is derived and discussed in Section IV below. Marden concluded that cows preferred the sucrose-flavored feed the most.

Vigneau et al. (1999) analyzed the preferences of 56 consumers of seven snacks that differed in cheese flavor and water content. Again using Equation (1), the authors estimated parameters for two latent classes of taster. Both f_i(x|θ_i) were Bradley Terry Mallows (BTM) models, and Vigneau et al. solved for θ and π using an EM algorithm. In BTM, each unknown parameter is a numerical expression of relative preference, and, in some cases, BTM can be computationally burdensome (Mallows, 1957; Marden, 1995, p. 117). Vigneau et al. found that water content most differentiated the two classes of snack.

Thuesen (2007), using data from Critchlow and Fligner (1991), evaluated 24 tasters' preferences for four types of salad dressing. In the form of Equation (1), Thuesen estimated parameters for two latent classes, both f_i(x|θ_i) were BTM models, and Thuesen solved for θ and π using an EM algorithm. The BTM model was summarized above, and Thuesen explains that the results appear odd but does not diagnose and fix a flaw if there is one.

All the evaluations above employed ranking models, and those that considered more than one class of taster employed mixture models. All solved for the unknown parameters using MLE, and those that needed to estimate mixing proportions π used the EM algorithm in which MLE is a step. However, none of the evaluations above addressed an analog to the findings of Ashton (2012), Cao (2014), Maltman (2013), Soares et al. (2012), and others in JWE and elsewhere that some wine tasters, even judges, are just not reliable. None of the five evaluations described above take a priori information about random ranking behavior into account.² In each analysis above, the probability that an agent with random preferences chooses any of the choices is 1/m, where m is the number of choices. The PMF for the random class or mixture component is merely 1/m. The PMF for a latent class with random expressions of preference is thus already known; the only remaining question about that class is its mixing proportion (π_r). If tasting data do not support the existence of such a class, then π_r = 0.

With application to wine tasting, a ranking and mixture model solved using EM is proposed here in Sections IV and V below. This model does include a random class of taster. To structure the analysis and provide a replicable example, we next present data from a tasting of Pinot Gris.

III. Blind Tasting of Pinot Gris

This tasting of Pinot Gris involved six wines and 18 tasters.³ Each of the tasters had more than ten years' experience as a wine maker, writer, distributor, collector, or enthusiast.

The wines were in brown bags identified only by the letters A through F. Each taster had seven glasses, one for water and the others for the wines. The six wines were poured in a flight, and then tasting began. The protocol was open, meaning that tasters could taste and re-taste in any order before assigning any ranks. Each taster ranked each wine from 1 (most preferred) to 6 (least preferred) and recorded his or her ranks on a score sheet. There was no discussion of the wines until all tasters had completed their score sheets and the results had been tabulated. Neither the tasters nor the person recording the results knew the vintner or price of any wine until all scores had been recorded; the tasting was blind. The ranks assigned by the tasters appear in Table 1.

Table 1 Pinot Gris Tasting Results

Note: Rank #1 is most preferred, #6 is least.

The mean ranks in the table imply that the preference order is BFADCE. However, the results above are heterogeneous. Variances differ, some ranks skew left, some skew right, some are positively correlated with the mean, and others are the opposite. Many tasters ranked wine E last, and many ranked wine F first or second. Together, these findings imply the possibility of a mixture distribution. There may be a latent class or subpopulation of tasters who have preferences in common. There may also be classes of tasters who exhibit random and idiosyncratic ranking behavior.

IV. Mixture Model of Wine-Tasting Results

For the Pinot Gris results above, the ranks (x) assigned to the corresponding wines A, B, C, … by any one taster (t) for a wine (i) are the rank vector (x _t with elements x _{t, i}) in Equation (3) below. If the corresponding wines are arranged according to their ranks from most- to least-preferred, the result is an order vector (y _t with elements y _{t, j}). For example, the order vector for Taster #1 of Pinot Gris appears in Equation (4). Taster #1 likes wine B the most and wine E the least.

(3)

$${\bi x}_t = (x_{t,1}, \; x_{t,2}, \; x_{t,3}, \; x_{t,4}, \; x_{t,5}, \; x_{t,6} )$$

(4)

$${\bi y}_{t = 1} = (B,\; A,\; F,\; D,\; C,E)$$

For six wines, the order vector in Equation (4) has 6! or 720 permutations. Assuming just three wines with 3! permutations to make an example tractable, the order vector for one taster is one of ABC, ACB, BAC, BCA, CAB, and CBA. Assume further that the probability of ranking wine A first is 0.5, ranking B first is 0.3 and C first is 0.2. On that basis for example, the probability of the order vector CAB appears in Equation (5). As a check, the sum of the probabilities for the six potential order vectors is unity; ABC = 0.300, ACB = 0.200, BAC = 0.214, BCA = 0.086, CAB = 0.125, and CBA = 0.075. The general form appears in Equation (6), and this is the Plackett-Luce model. The probability of a particular preference order for n wines (f(y _t| ρ)) is the product of the probabilities that each wine would be ranked first (ρ _i for each wine i) where the ρ _i have been normalized with the sum of the ρ _i of the wines remaining in the taster's order vector. The first denominator and the last quotient will always equal unity. Under Equation (7), the ρ _i are bounded, and their sum equals unity.

(5)

$$f\left( {{\bi y}_t = \; CAB} \right) = 0.2/\left( {0.2 + 0.5 + 0.3} \right)\cdot \; 0.5/\left( {0.5 + 0.3} \right)\cdot 0.3/0.3 = 0.125$$

(6)

$$f\left( {{\bi y}_t \vert{\bi \rho}}\right) = \mathop \prod \nolimits_{i = 1}^n \left( {\displaystyle{{\rho _i}\over {\mathop \sum \nolimits_{\,j\; = i}^n \; \left( {\rho_j}\right)}}\bigg\vert{\bi y}_t, {\bi \rho}}\right)$$

(7)

$$0 \le \; \rho _i \; \le 1.0 \; {\rm and} \; 1.0 = \mathop \sum \nolimits_{i = 1}^n \rho_i $$
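The three-wine example behind Equations (5) through (7) can be reproduced with a short script. This is only an illustrative sketch in Python, not the author's code; the function name `plackett_luce_prob` and the dictionary layout are assumptions made for this example.

```python
from itertools import permutations

def plackett_luce_prob(order, rho):
    """Probability of an order vector (most- to least-preferred), as in Equation (6).

    Each factor is the first-choice probability of the wine at that position,
    normalized by the sum over the wines still remaining in the order vector.
    """
    prob = 1.0
    for k, wine in enumerate(order):
        prob *= rho[wine] / sum(rho[w] for w in order[k:])
    return prob

# First-choice probabilities from the three-wine example in the text.
rho = {"A": 0.5, "B": 0.3, "C": 0.2}

# Probability of each of the 3! possible order vectors.
probs = {"".join(p): plackett_luce_prob(p, rho) for p in permutations("ABC")}

for order, p in probs.items():
    print(order, round(p, 3))
print("sum:", round(sum(probs.values()), 3))
```

The printed values can be compared with the six probabilities listed above, and the final line checks that they sum to unity.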

One purpose of this application is to rank the wines according to the tasters who have preferences in common and thus to divide the population of heterogeneous rankings into a subpopulation of those who appear to express preferences that some tasters have in common. That leaves two other subpopulations: those who exhibit random ranking behavior and those whose preferences appear to be idiosyncratic. Idiosyncratic preferences are those that vary from taster to taster; some prefer oaky Chardonnay while others prefer an austere style, some like more citrus or acid than others, some prefer more tannin in Zinfandel than others, some like more evident fruit in Pinot Noir than others, and some like toast in white wines. There are many examples. Those individual preferences are neither random nor held in common. As described in Section II, the PMF for random ranking behavior is already known, and only its mixture weight is unknown. Regarding idiosyncratic preferences, to date, no method exists for statistically identifying rankings that are based on neither common preference nor random-appearing behavior. Consequently, idiosyncratic preferences must account for model error or lack of fit. Note that Bockenholt (1992), Critchlow (1985), Marden (1995), Thuesen (2007), and Vigneau et al. (1999) made the same assumption and did not model either random or idiosyncratic classes.

A mixture model with two latent classes of tasters, random tasters (r) and those who express similar preferences (p), appears in Equation (8). In Equation (8), the PMF for the observed order vector (f(y_t)) is the weighted sum of the PMFs for the random and preference-based order vectors (f_r(y_t) and f_p(y_t), with parameters θ_r and θ_p), in which the mixture weights (π_r and π_p) are the respective probabilities of the taster's membership in the respective latent class. In Equation (9), the mixture weights are bounded and their sum equals unity.

(8)

$$f\left( {{\bi y}_t \vert \hat{\bi \pi}, \; \hat{\bi \theta}}\right) = \widehat{{\pi_r}} \cdot f_r \left( {{\bi y}_t \vert\widehat{{\theta _r}}}\right)\; + \; \widehat{{\pi _p}} \cdot \; f_p \left( {{\bi y}_t \vert\widehat{{\theta _p}}}\right)$$

(9)

$$1.0 = \widehat{{\pi _r}}+ \widehat{{\pi _p}} \; {\rm and} \; 0 \le \; \widehat{{\pi_{r\; {\rm and}\; p}}} \; \le 1.0$$

Again as explained in Section II, the PMF for random ranks is known. Random tasters assign ranks as if drawing from an urn with six balls labeled #1, #2, and so on, through #6. For any wine, the probability of any rank is 1/6. For any random draw of six wines, the Plackett-Luce model yields a probability of 0.0014, and this is, of course, also 1/6!. Substituting this result and the Plackett-Luce PMF into Equation (8) yields the likelihood function ( $${\rm {\cal L}}\left( {{\bi y}\vert \hat{\rm \pi}, \; \hat{\rm \theta}} \right)$$ ) for the mixture model in Equation (10) with, using conventional notation for mixture model parameters, $$\hat{\rm \theta} = \hat{\rm \rho} $$ .

(10)

$${\rm {\cal L}}\left( {{\bi y}\vert \hat {\bi \pi}, \; \hat {\bi \theta}}\right) = \mathop \prod \nolimits_{t = 1}^T \left[ {\widehat{{\pi _r}} \cdot \left( {\displaystyle{1 \over {6!}}} \right)\; + \; (1 - \widehat{{\pi _r}} )\cdot \; \mathop \prod \nolimits_{i = 1}^n \left( {\displaystyle{{\widehat{{\rho _i}}}\over {\mathop \sum \nolimits_{\,j = i}^n \left( {\widehat{{\rho _j}}}\right)}}\bigg\vert{\bi y},\hat {\bi \theta}}\right)} \right]$$

Moving forward, the Plackett-Luce model yields the probability of ending on one branch of a probability tree with n! branches. Using the same tractable three-wine example as above, the multinomial PMF for wine B appears in Equation (11). It shows how the probabilities of the six branches of the probability tree add to yield a PMF, and, as a check, the sum of the probabilities is unity. The general form appears in Equation (12), in which the probability that wine i is ranked l ( $f_{i,l}^{\prime} \left( {i,l\vert{\bi \rho}} \right)$ ) equals the sum of the Plackett-Luce probability for every possible order vector permutation (P with one permutation P_k) multiplied by unity if a respective order vector has wine i in order l or, if not, multiplied by zero. This is merely the sum of the probabilities for the probability tree branches that have the right wine in the right place. Calculating Equation (12) for each possible rank for each wine yields the n x n matrix of probabilities in Equation (13). In Equation (13), each row is a vector expressing the multinomial PMF for a particular wine, and the sum of each row vector is unity. As an example, results for the three-wine tasting appear at the right in Equation (13).

(11)

$$\begin{aligned} & \text{For } \Pr(ABC) = 0.300,\ \Pr(ACB) = 0.200,\ \Pr(BAC) = 0.214,\ \Pr(BCA) = 0.086,\ \Pr(CAB) = 0.125 \text{ and } \Pr(CBA) = 0.075: \\ & \Pr(\text{rank on B} = 1) = 0 + 0 + 0.214 + 0.086 + 0 + 0 = 0.300 \\ & \Pr(\text{rank on B} = 2) = 0.300 + 0 + 0 + 0 + 0 + 0.075 = 0.375 \\ & \Pr(\text{rank on B} = 3) = 0 + 0.200 + 0 + 0 + 0.125 + 0 = 0.325 \end{aligned}$$

(12)

$$\eqalign{& f_{i,l}^{\prime} \left( {i,l\vert{\bf \rho}}\right) \cr & = \mathop \sum \nolimits_{k = 1}^{n!} \left( {\left( {\mathop \prod \nolimits_{i = 1}^n \left( {\displaystyle{{\rho _i}\over {\mathop \sum \nolimits_{\,j\; = i}^n \; \left( {\rho _j}\right)}}} \right)\cdot \left\{ {\matrix{ {1.0\; if\; wine\; i\; has\; rank\; l\;} \cr {0.0\; otherwise\; \;}}} \right.} \right)\bigg\vert{\bf \rho}, \; {\rm P}_k}\right)\;}$$

(13)

$$f^{\prime} \left( {n\vert{\bf \rho}}\right) = \; \left[ {\matrix{ {\,f_{1,1}^{\prime} \left( {1,1\vert{\bf \rho}}\right)} & \ldots & {\,f_{1,n\;}^{\prime} \left( {1,n\vert{\bf \rho}}\right)} \cr \ldots & \ldots & \ldots \cr {\,f_{n,1}^{\prime} \left( {n,1\vert{\bf \rho}}\right)} & \ldots & {\,f_{n,n}^{\prime} \left( {n,n\vert{\bf \rho}}\right)} }} \right] = \; \left[ {\matrix{ {0.500} & {0.339} & {0.161} \cr {0.300} & {0.375} & {0.325} \cr {0.200} & {0.286} & {0.514} }} \right]$$
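The matrix at the right of Equation (13) can be checked by brute-force enumeration of the probability tree, as Equation (12) describes. The sketch below is illustrative only; `rank_pmf_matrix` is a hypothetical helper name, and enumeration of all n! orders is practical only for small n such as the three- and six-wine cases discussed here.

```python
from itertools import permutations

def plackett_luce_prob(order, rho):
    """Plackett-Luce probability of one order vector, as in Equation (6)."""
    prob = 1.0
    for k, wine in enumerate(order):
        prob *= rho[wine] / sum(rho[w] for w in order[k:])
    return prob

def rank_pmf_matrix(rho):
    """Rank PMF for every wine, as in Equations (12) and (13).

    Returns a dict mapping each wine to a list whose entry l-1 is the
    probability that the wine receives rank l.
    """
    wines = sorted(rho)
    matrix = {w: [0.0] * len(wines) for w in wines}
    for order in permutations(wines):
        p = plackett_luce_prob(order, rho)
        # Add this branch's probability to the cell for each wine's position.
        for position, wine in enumerate(order):
            matrix[wine][position] += p
    return matrix

rho = {"A": 0.5, "B": 0.3, "C": 0.2}
matrix = rank_pmf_matrix(rho)
for wine in sorted(matrix):
    print(wine, [round(x, 3) for x in matrix[wine]])
```

Each printed row is the multinomial PMF for one wine and should sum to unity; because every rank is held by exactly one wine in each order vector, each column sums to unity as well.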

Building on Equation (13), a mixture model expressing the density of each rank l for each individual wine i appears in Equation (14). The PMF for random ranks is 1/6, and the PMF for common preference assignments is the ith multinomial PMF from Equations (12) and (13). Equation (14) is thus a mixture PMF for each wine that is consistent with the underlying Plackett-Luce probabilities $\hat {\rm \rho} $ . Also, for reference, Equation (14) is employed to calculate the mixture model results shown in Figures 2, 3, and 4.

Figure 1 Log Likelihood as EM Iterates

Figure 2 Distribution of Ranks on Wine C

Figure 3 Distribution of Ranks on Wine B

Figure 4 Distribution of Ranks on Wine E

(14)

$$f_{i,l}^{\prime} \left( {i,l\vert\widehat{{\pi _r}}, \; \hat{\bf \rho}}\right) = \widehat{{\pi _r}} \cdot \displaystyle{1 \over 6}\; + (1 - \; \widehat{{\pi _r}} )\cdot \; f_{i,l}^{\prime} \left( {i,l\vert \hat{\bf \rho}}\right)$$

V. Solving for the Unknown Parameters

The EM algorithm is a widely employed method for estimating the unknown parameters in mixture models. An often-cited initial journal reference is Dempster et al. (1977), McLachlan and Peel (2000) explain EM, and a more recent text is Mengersen et al. (2011). In sum, EM begins with exogenous estimates of the unknown parameters and then iteratively climbs to the maximum of a likelihood function. The MATLAB code written by the author for the EM algorithm and integrated MLE employed here is available on request.
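For readers who want to experiment, the following is a minimal Python sketch of an EM iteration for the two-class likelihood in Equation (10). It is not the author's MATLAB code, and several choices here are assumptions: the function names are hypothetical, the update for the Plackett-Luce parameters is a single minorize-maximize (MM) step per iteration (so this is a generalized EM rather than a full MLE inside each M-step), and the starting value for the random-class weight is 0.5 because, in this particular sketch, a start at exactly 1.0 would give the preference class zero weight in the E-step. The example data are 6 cyclic rank vectors plus 12 identical rank vectors (2 1 4 3 6 5), chosen for illustration.

```python
import math

def pl_prob(order, rho):
    """Plackett-Luce probability of an order (wine indices, best first).
    The final factor always equals unity, so it is skipped."""
    prob = 1.0
    for k in range(len(order) - 1):
        prob *= rho[order[k]] / sum(rho[w] for w in order[k:])
    return prob

def em_mixture(orders, n_iter=300):
    """Generalized EM for a uniform-random class plus one Plackett-Luce class."""
    n = len(orders[0])
    uniform = 1.0 / math.factorial(n)          # PMF of a random order, 1/n!
    pi_r, rho = 0.5, [1.0 / n] * n             # assumed starting values
    history = []                               # log likelihood per iteration
    for _ in range(n_iter):
        # E-step: probability that each taster belongs to the preference class.
        f_p = [pl_prob(o, rho) for o in orders]
        mix = [pi_r * uniform + (1.0 - pi_r) * f for f in f_p]
        history.append(sum(math.log(m) for m in mix))
        w = [(1.0 - pi_r) * f / m for f, m in zip(f_p, mix)]
        # M-step for the mixture weight.
        pi_r = 1.0 - sum(w) / len(orders)
        # Weighted MM step for rho (an assumption, not the article's method).
        num, den = [0.0] * n, [0.0] * n
        for o, wt in zip(orders, w):
            for k in range(n - 1):
                tail = o[k:]
                inv = wt / sum(rho[j] for j in tail)
                num[o[k]] += wt
                for j in tail:
                    den[j] += inv
        rho = [a / b for a, b in zip(num, den)]
        total = sum(rho)
        rho = [r / total for r in rho]
    return pi_r, rho, history

def ranks_to_order(ranks):
    """Convert a rank vector (rank of wine A, B, ...) to an order vector."""
    return sorted(range(len(ranks)), key=lambda i: ranks[i])

cyclic = [[(s + i) % 6 + 1 for i in range(6)] for s in range(6)]
common = [[2, 1, 4, 3, 6, 5] for _ in range(12)]
orders = [ranks_to_order(r) for r in cyclic + common]

pi_r, rho, history = em_mixture(orders)
print("pi_r:", round(pi_r, 3))
print("rho :", [round(r, 4) for r in rho])
```

The `history` list can be plotted to inspect whether the log likelihood rises monotonically; with identical common-preference rankings the Plackett-Luce estimates drift toward the boundary of the parameter space, so the iteration count is a practical cutoff rather than a convergence criterion.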

Before turning to results for the Pinot Gris tasting, the mixture model and EM solution were subjected to two tests. The first test employed hypothetical data that were selected to yield a known result. The second test employed a Monte Carlo simulation in which tasters assigned their ranks randomly. Both tests yield results that validate the mixture model and EM solution.

A model should solve to yield an obvious answer, when there is one, so the first test is designed to have an obvious answer. Six hypothetical tasters are assumed to have ranked the wines with (1 2 3 4 5 6), (2 3 4 5 6 1), (3 4 5 6 1 2), and so on. Although those assumed ranks are not random, together, they give each rank on each wine a random expectation. In addition to those six tasters, twelve tasters are assumed to have ranked the wines (2 1 4 3 6 5). Those tasters represent common preference rank assignments by tasters with identical preferences. By design in this simple test, the mixture weight for the random class is thus $\widehat{{{\rm \pi} _r}} = \displaystyle{6 \over {18}} = 0.33$ , and the mixture density from Equation (14) for each wine's common-preference rank is thus (0.33)(1/6) + (1–0.33)(1.00) = 0.72.
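The design of this test can be verified with exact arithmetic. The sketch below builds the 18 hypothetical rank vectors, counts how often each wine receives its common-preference rank, and compares that observed density with Equation (14). The helper names are hypothetical, and the common-preference PMF is assumed to put probability 1.00 on the designed rank, as in the calculation above.

```python
from fractions import Fraction

# Six cyclic rank vectors: (1 2 3 4 5 6), (2 3 4 5 6 1), (3 4 5 6 1 2), ...
cyclic = [[(s + i) % 6 + 1 for i in range(6)] for s in range(6)]
# Twelve tasters sharing the common preference rank vector.
common = [[2, 1, 4, 3, 6, 5] for _ in range(12)]
ranks = cyclic + common

def observed_density(ranks, wine, rank):
    """Share of tasters who gave `wine` (index 0..5) the given rank."""
    hits = sum(1 for r in ranks if r[wine] == rank)
    return Fraction(hits, len(ranks))

def mixture_density(pi_r, n_wines, f_common):
    """Equation (14): random share times 1/n plus preference share times f'."""
    return pi_r * Fraction(1, n_wines) + (1 - pi_r) * f_common

pi_r = Fraction(6, 18)
observed = [observed_density(ranks, w, common[0][w]) for w in range(6)]
predicted = mixture_density(pi_r, 6, Fraction(1))

print("observed :", observed)
print("predicted:", predicted, "=", round(float(predicted), 2))
```

Using `Fraction` avoids the rounding in 0.33 and shows that the designed mixture density and the observed density coincide exactly for every wine.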

EM begins with exogenous estimates of the unknown parameters. For the hypothetical tasting data above, the null hypothesis starting parameters were $\widehat{{{\rm \pi} _r}} = 1.00$ and $\widehat{{{\rm \rho} _i}} = 1/6$ . EM results appear in Figures 1 and 2 and Table 2. Figure 1 shows that the log likelihood does increase monotonically and the EM solution does converge. Using fourth-ranked wine C as an example, Figure 2 shows that the mixture model is an accurate predictor of the observed rank densities. Beginning from null hypothesis initial conditions, Table 2 shows that the EM solution is close to the solution used to design the test data. Except for the most-preferred wine B, as they should be, the estimates of $\widehat{{{\rm \rho} _i}} $ are small but in proportions that establish a preference order for the Plackett-Luce PMF. Further, the $\widehat{{{\rm \rho} _i}} \; $ in Table 2 imply the correct common preference order (2 1 4 3 6 5 or BADCFE). The likelihood ratio statistic (LRS) is 135, and that value is significant under a chi-square test at p < 0.01.⁴
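The order of magnitude of the LRS can be checked by hand under two assumptions that are not stated in the text here: that the LRS is the conventional twice the difference in log likelihood between the fitted model and the all-random null, and that the fitted model is the limiting design solution (random weight 6/18, and a preference class that assigns probability 1 to the order BADCFE and 0 to every other order). A sketch of that arithmetic:

```python
import math

N_WINES = 6
T_RANDOM, T_COMMON = 6, 12               # hypothetical tasters in the simple test
uniform = 1.0 / math.factorial(N_WINES)  # probability of any one order, 1/6!
pi_r = T_RANDOM / (T_RANDOM + T_COMMON)  # designed random-class weight, 6/18

# Null hypothesis: every taster ranks randomly (pi_r = 1).
loglik_null = (T_RANDOM + T_COMMON) * math.log(uniform)

# Limiting design solution (an assumption): the preference class puts
# probability 1 on the common order and 0 on each cyclic order.
loglik_design = (
    T_COMMON * math.log(pi_r * uniform + (1.0 - pi_r) * 1.0)
    + T_RANDOM * math.log(pi_r * uniform + (1.0 - pi_r) * 0.0)
)

# Conventional likelihood ratio statistic (assumed definition).
lrs = 2.0 * (loglik_design - loglik_null)
print(round(loglik_null, 2), round(loglik_design, 2), round(lrs, 1))
```

Under these assumptions the statistic comes out very close to the reported value of 135, which suggests the reported figure reflects an EM solution near the design solution.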

Table 2 Simple Test Results

The simple test above employs hypothetical ranks. Next, the mixture model and EM solution were tested with a Monte Carlo simulation. Each taster's ranks for each wine were assigned randomly in each of 1,000 iterations. Results appear in Table 3. As they should be for random rankings, none of the $\widehat{{{\rm \rho} _i}} $ are significantly different from 1/6 = 0.167. However, the estimate of the random class mixture weight ${\rm \pi} _r $ shows that the probability of illusory common preference agreement, although rankings are actually random, a false positive and Type I error, is approximately 1.000–0.807 = 0.193. Almost 20% of random ranks appear to be based on illusory common preference.

Table 3 Monte Carlo Simulation Results (1,000 iterations)

The Monte Carlo simulation results quantify a problem evident in the Judgment of Paris and many wine tastings. That problem is sample size. The Judgment involved nine French tasters (see Taber, 2004). See also a survey of wine tastings in Hanson (2013) and the sample sizes for Liquid Assets wine tastings. The Pinot Gris results above involved eighteen tasters and six wines. A simple example with just two wines demonstrates the effect of a small sample size. For two tasters, the observed result is that they agree either that one wine is better or that they differ. If the tasters agree, then the EM solution for ${\rm \pi} _r $ will be zero and π_p will be unity. Even if each taster's preference was determined by the flip of a coin, the EM solution for π_p is unity when the tasters' flips agree by chance. That illusion of common-preference agreement is a false-positive Type I error, and with only two tasters, the probability that they appear to agree is 0.500. With many coin-flipping tasters, a large sample size, the probability that they all appear to agree tends to zero. In that case, the EM solution for π_p will also tend to zero, and Type I error vanishes. Wine-tasting sample sizes are small enough that the illusion of common preference agreement is material, and the Monte Carlo simulation quantifies the extent of that illusion. Results above show that the probability of Type I error for 18 tasters and six wines is 0.193. Additional Monte Carlo simulations show that the probability of Type I error is 0.48 for 6 tasters and 0.08 for 72 tasters. Those additional results confirm the downward and asymptotic-to-zero trend in Type I error with sample size.

The findings in Table 3 set a threshold. The hypothesis that observed results are not random must be tested against a null hypothesis defined by the Monte Carlo simulation results. This approach quantifies the possibility of illusory agreement, a false-positive Type I error.
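The two-wine coin-flip illustration can be made concrete by enumerating every possible set of flips. The sketch below computes only the probability that all coin-flipping tasters happen to choose the same wine; it is not the six-wine Monte Carlo simulation reported in Table 3, and the function name is hypothetical.

```python
from itertools import product

def prob_unanimous(n_tasters):
    """Probability that n coin-flipping tasters all prefer the same wine."""
    total = 0
    unanimous = 0
    for outcome in product("AB", repeat=n_tasters):
        total += 1
        # An outcome is unanimous when only one distinct wine was chosen.
        unanimous += len(set(outcome)) == 1
    return unanimous / total

for n in (2, 4, 6, 12):
    print(n, prob_unanimous(n))
```

For two tasters the result is 0.5, matching the text, and in general the probability is 0.5 raised to the power n minus 1, which falls quickly toward zero as the number of tasters grows.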

VI. Pinot Gris Results

Results for the Pinot Gris tasting appear in Table 4 and Figures 3 and 4. Note that the t statistic for each parameter estimate, based on the corresponding Monte Carlo null hypothesis mean and standard deviation, also appears in Table 4.⁵ The parameter estimates for the most- and least-preferred wines, B and E respectively, are significant at a level of confidence of over 95%. The estimate of the random class mixture weight π_r is also significant at a level of confidence of over 95%. In addition, comparing the solution to the random Monte Carlo expectation, the LRS is 14 and is significant under a chi-square test at p < 0.05.

Table 4 Pinot Gris Results

First, the $\widehat{{\rho _i}} $ in Table 4 make sense. The sum of the $\widehat{{\rho _i}} $ is unity. Wine B is most preferred according to both $\widehat{{\rho _B}} $ and its mean rank in Table 1. Wine E is least preferred according to both $\widehat{{\rho _E}} $ and its mean rank in Table 1.

More important, Table 4 yields information about the Pinot Gris tasting results that is not evident in Table 1. The preference order implied by the $\widehat{{\rho _i}} $ is BADFCE, and the order implied by mean ranks in Table 1 is BFADCE. Wines A, F, and D are in different places. This implies that the preference order based on mean ranks in Table 1 is influenced by random assignments and, for example, that wine F is thus an unreliable first or second choice. Next, the aggregate fraction of wine rankings that appear to be random, $\widehat{{\pi _r}} $ , in the Pinot Gris tasting is 0.299 or approximately 0.30. Cao (2014) found 0.60 or more for State Fair wine data and, although for an undisclosed consumer product, Cleaver and Wedel (2001) found a random share of 0.48. Many factors may account for the differences in those results; all of them imply that random-looking expressions of preference are material. Although the fraction of random rank assignments in the Pinot Gris tasting is material, 1 – $\widehat{{\pi _r}} = \widehat{{\pi _p}} $ , and thus the results in Table 4 also show that approximately 70% of observed ranks appear to be based on nonrandom common preference assignments.

Further, Figures 3 and 4 depict the Equation (14) results for the most and least preferred wines B and E. Figure 3 shows that the mixture model fits between the high-probability first and second ranks that earned wine B the highest preference and then trends to the right and down for ranks that appear to be random or idiosyncratic. Figure 4 shows the reverse: the mixture model results track the random or idiosyncratic ranks at the left and then lift to the higher-probability but low ranks that earned wine E the lowest preference.

VII. Conclusion

Bodington (2012) posited that observed wine-tasting results have a mixture distribution, and Cao (2014) evaluated California State Fair results using a mixture model. This article extends that work by providing a literature review of previous applications of ranking and mixture models to non-wine-tasting data and then deriving a model for application to observed wine-tasting results. The benefits of the approach include its direct use of observed tasting data, its use of information embedded in the variance of observed data, absence of bias introduced by simulation-based parameters, a multinomial PMF for each wine, reliance on the standard EM solution, feasibility for the small sample sizes typical of wine tastings, quantification of Type I error, and quantification of the share of ranks that appear to be random. An application of the mixture model to results for a blind tasting of Pinot Gris shows, with likelihood ratio tests and t statistics supporting a level of confidence of over 95%, that random ranking behavior accounts for approximately 30% of ranks and that there is nonrandom agreement among approximately 70% of tasters on a common preference order.

This article presents a model with two latent classes of taster: those who appear to have preferences in common and those who appear to rank randomly. Tasters may also have idiosyncratic preferences that account for a lack of fit or unexplained variance, and further analysis of those idiosyncratic preferences is work for the future. In addition, this article applied a ranking and mixture model to wine-tasting results for one tasting in which each taster determined an order of preference for the wines in the tasting. Application of ranking and mixture models to other tastings, to results with tied rankings, and to ordinal scores must follow.

Footnotes

The author thanks an anonymous reviewer and Professor Thomas Brendan Murphy, School of Mathematical Sciences at University College Dublin for their helpful comments. All remaining errors and omissions are the responsibility of the author alone.

¹ Judges in the California State Fair competition assign each wine an ordered rank label of No Award, Bronze, Bronze+, Silver-, Silver, Silver+, Gold-, and Gold. Cao (2014) analyzed averages of numbers assigned to those ranks. Wines in the 2006 Re-Judgment of Paris were compared using preference ranks. Liquid Assets' tasting results are reported as preference ranks. An example of ordinal scores evaluated as preference ranks is Quandt's (2012) analysis of the Judgment of Princeton.

² Cleaver and Wedel (2001) add a random scoring class to a regression model of scores assigned by consumers to an unidentified product or set of products. Their model is primarily a mixture of normal distributions, and they do not employ a ranking model.

³ Tasting of Pinot Gris on June 2, 2014, by FOG, a San Francisco-based wine-tasting group started by wine writer and judge Steven R. Pitcher and collector David S. Rosen that has hosted blind wine tastings each month since the early 1980s. The author is a member. For results and tasting protocol, contact David Rosen at daverosen1114@gmail.com.

⁴ ${\rm LRS} = -2\left(\log {\cal L}_{\text{null hypothesis initial conditions}} - \log {\cal L}_{\text{solution}}\right)$

⁵ For Monte Carlo (MC) results and Pinot Gris (PG) results, each t statistic is calculated using the form: $$t\ \text{statistic} = \frac{\left| x_{PG} - \mu_{MC} \right|}{\sigma_{MC} / \sqrt{18}}$$
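The likelihood ratio statistic and the t statistic defined in the two preceding footnotes can be computed as in the following minimal sketch. The function names and the numeric inputs are hypothetical and serve only to show the arithmetic; they are not values from the Pinot Gris tasting or the Monte Carlo simulation.

```python
import math

def likelihood_ratio_statistic(loglik_null, loglik_solution):
    """LRS = -2 * (log L at null-hypothesis initial conditions - log L at solution)."""
    return -2.0 * (loglik_null - loglik_solution)

def t_statistic(x_pg, mu_mc, sigma_mc, n=18):
    """|x_PG - mu_MC| divided by sigma_MC / sqrt(n), with n = 18 as in the footnote."""
    return abs(x_pg - mu_mc) / (sigma_mc / math.sqrt(n))

# Hypothetical inputs, for illustration only.
print(likelihood_ratio_statistic(loglik_null=-120.0, loglik_solution=-110.0))
print(t_statistic(x_pg=0.30, mu_mc=0.50, sigma_mc=0.20))
```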

References

Ashton, R.H. (2012). Reliability and consensus of experienced wine judges. Journal of Wine Economics, 7(1), 70–87.

Bockenholt, U. (1992). Thurstonian representation for partial ranking data. British Journal of Mathematical and Statistical Psychology, 45, 31–49.

Bodington, J. (2012). 804 tastes: Evidence on randomness, preferences and value from blind tastings. Journal of Wine Economics, 7(2), 181–191.

Cao, J. (2014). Quantifying randomness versus consensus in wine quality ratings. Journal of Wine Economics, 9(2), 202–213.

Cleaver, G., and Wedel, M. (2001). Identifying random-scoring respondent in sensory research using finite mixture regression results. Food Quality and Preference, 12 (2001), 373–384.

Critchlow, D.E. (1985). Metric Methods for Analyzing Partially Ranked Data. New York: Springer.

Critchlow, D.E., and Fligner, M.A. (1991). Paired comparison, triple comparison, and ranking experiments as generalized linear models, and their implementation on glim. Psychometrika, 56(3), 517–533.

Dempster, A.P., Laird, N.M., and Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B, 39(1), 1–38.

Gormley, I.C., and Murphy, T.B. (2008). A mixture of experts model for rank data with applications in election studies. Annals of Applied Statistics, 2(4), 1452–1477.

Hanson, D.J. (2013). Historic Paris Wine Tasting of 1976 and Other Significant Competitions. Accessed May 10, 2014, at http://www2.potsdam.edu/alcohol/Controversies/20060517115643.html#.U3KmVy8x_5U.

Linacre, J.M. (1992). Rank order and paired comparisons as the basis for measurement. Paper presented at American Educational Research Association Annual Meeting, April.

Mallows, C.L. (1957). Non-null ranking models. Biometrika, 44 (1–2), 114–130.

Maltman, A. (2013). Minerality in wine: A geological perspective. Journal of Wine Research, 24(3), 169–181.

Marden, J.I. (1995). Analyzing and Modeling Rank Data. London: Chapman & Hall.

McLachlan, G., and Peel, D. (2000). Finite Mixture Models. New York: John Wiley & Sons.

Mengersen, K.L., Robert, C.P., and Titterington, D.M. (2011). Mixtures: Estimation and Application. New York: John Wiley & Sons.

Nombekla, S.W., Murphy, M.R., Gonyou, H.W., and Marden, J.I. (1993). Dietary preferences in early lactation cows affected by primary tastes and some common feed flavors. Journal of Dairy Science, 77, 2393–2399.

Pendergrass, P.N., and Bradley, R.A. (1960). Ranking in triple comparisons. In Olkin, I. et al. (Ed.), Contributions to Probability and Statistics: Essays in Honor of Harold Hotelling. Palo Alto: Stanford University Press, 331–351.

Plackett, R.L. (1975). The analysis of permutations. Applied Statistics, 24, 193–202.

Quandt, R.E. (2012). Comments on the Judgment of Princeton. Journal of Wine Economics, 7(2), 152–154.

Soares, S., Sousa, A., Mateus, N., and de Freitas, V. (2012). Effect of condensed tannins addition on the astringency of red wines. Chemical Senses, 32(2), 191–198.

Taber, G.M. (2004). Judgment of Paris: California vs. France and the Historic 1976 Paris Tasting That Revolutionized Wine. New York: Scribner.

Thuesen, K.F. (2007). Analysis of ranked preference data. Master's thesis, Technical University of Denmark, Kongens Lyngby, Denmark.

Vigneau, E., Courcoux, P., and Semenou, M. (1999). Analysis of ranked preference data using latent class models. Food Quality and Preference, 10(1999), 201–207.