1. INTRODUCTION
Groups frequently make judgements that are based on aggregating the opinions of their individual members. A panel of market analysts at Apple or Samsung may estimate the expected number of sales of a newly developed cell phone. A group of conservation biologists may assess the population size of a particular species in a specific habitat. A research group at the European Central Bank may evaluate the merits of a particular monetary policy. Generally, such problems occur in any context where groups have to combine various opinions into a single group judgement (for a review, see Clemen 1989).
Even in cases of fully shared information, the assessment of the evidence will generally vary among the agents and depend on factors such as professional training, familiarity with similar situations in the past, and personal attitude toward the results. Thus, it will not come as a surprise that the individual judgements may differ. But how shall they be aggregated?
Often, some group members are more competent than others. Recognizing these experts may then become a crucial issue for improving group performance. Research in social psychology and management science has investigated the ability of humans to properly assess the expertise of other group members in such contexts (Clemen 1989; Bonner et al. 2002; Larrick et al. 2007). Most of this research stresses that recognizing experts is no easy task: perceived and actual expertise need not agree, data are noisy, questions may be too hard, and expertise differences may be too small to be relevant (e.g. Littlepage et al. 1995). This motivates a comparison of two strategies for group judgements: (i) deferring to the agent who is perceived as most competent, and (ii) taking the straight average of the estimates (Henry 1995; Soll and Larrick 2009). The overall outcomes suggest that the straight average is often surprisingly reliable, apparently being one of those ‘fast and frugal heuristics’ (Gigerenzer and Goldstein 1996) that help boundedly rational agents to make cost-effective decisions.
On the other hand, even if not explicitly recognized as such, experts tend to exert greater influence on group judgements than non-experts (Bonner et al. 2002). This motivates a principled epistemic analysis of the potential benefits of expertise-informed group judgements. We characterize conditions under which differentially weighted averages, fed by incomplete and perhaps distorted information on individual expertise, improve group performance compared with a straight average of the individual judgements. Our paper approaches this question from an analytical perspective, that is, with the help of a statistical model. We follow the social permutation approach (e.g. Bonner 2000) and model the agents as unique entities with different abilities. This differs notably from more traditional social combination research, where individual agents are modelled as interchangeable (e.g. Davis 1973). Our main result – that individual expertise makes a robust contribution to group performance – is somewhat surprising, given the generality of our conditions, which also allow for perturbations such as individual bias or correlations among the group members. Our analytical results therefore provide theoretical support to research on the recognition of experts in groups (e.g. Baumann and Bonner 2004), and they directly relate to empirical comparisons of differentially weighted group judgements with ‘composite judgements’, such as the group mean or median (Einhorn et al. 1977; Hill 1982; Libby et al. 1987; Bonner 2004).
Our work is also related to two other research streams. First, there is a thriving epistemological literature on peer disagreement and rational consensus, where consensus is mostly reached by deference to (perceived) experts. However, this debate either focuses on social power and mutual respect relations (e.g. Lehrer and Wagner 1981), or on principled philosophical questions about resolving disagreement (e.g. Elga 2007). By means of a performance-focused mathematical model, we hope to bring this literature closer to its primary target: the truth-tracking abilities of various epistemic strategies. There is also a vast literature on group decisions, preference and judgement aggregation (e.g. List 2012), but two crucial features of our inquiry – the aggregation of numerical values and the particular role of experts – do not play a major role there.
Second, there is a fast-growing body of literature on expert judgement and forecasting, which emerged from applied mathematics and statistics and has become a flourishing interdisciplinary field. This strand of research deals with the theoretical modelling of expert judgement, most notably the (Bayesian) reconciliation of probability distributions (Lindley 1983), but it also includes more practical questions such as the comparison of calibration methods, the choice of seed variables, analyses of past uses of expert judgement (Cooke 1991), and the study of general forecasting principles, such as the benefits of opinion diversity (Armstrong 2001; Page 2007). We differ from that approach in pooling individual (frequentist) estimators instead of subjective probability distributions, but we study similar phenomena, such as the impact of in-group correlations.
Admittedly, our baseline model is very simple, but this simplicity allows us to prove a number of results regarding the behaviour of differentially weighted estimates under correlation, bias and benchmark uncertainty. Here, our paper builds on analytical work in the forecasting and social psychology literature (Bates and Granger 1969; Hogarth 1978), following the approach of Einhorn et al. (1977).
The rest of the paper is structured as follows: we begin by explaining the model and stating conditions under which differentially weighted estimates outperform the straight average (Section 2). We then show that this relation is often preserved even when bias or mutual correlations are introduced (Sections 3 and 4). Subsequently, we assess the impacts of over- and underconfidence (Section 5). Finally, we discuss our findings and wrap up our conclusions (Section 6).
2. THE MODEL AND BASELINE RESULTS
Our problem is to find a good estimate of an unknown quantity μ. For convenience, we assume without loss of generality that μ = 0.
We model the group members’ individual estimates $X_i$, $i \le n$, as independent random variables that scatter around the true value μ = 0 with variance $\sigma_i^2$. The $X_i$ are unbiased estimators of μ, that is, they have the property $\mathbb {E}[X_i]=\mu$. This baseline model is inspired by the idea that the agents try to approach the true value with a higher or lower degree of precision, but have no systematic bias in either direction. The competence of an agent is explicated as the degree of precision in estimating the true value. No further assumptions on the distributions of the $X_i$ are made – only the first and second moments are fixed.
In this model, the question of whether the recognition of individual expertise is epistemically advantageous translates into the question of which convex combination of the $X_i$, $\hat{\mu } :=\sum _{i=1}^n c_i X_i$, outperforms the straight average $\overline{\mu } := \frac{1}{n} \sum _{i=1}^n X_i$. Standardly, the quality of an estimate is assessed by its mean square error (MSE), which can be calculated as

$$\text{MSE}(\hat{\mu}) = \mathbb{E}\left[\Big(\sum_{i=1}^n c_i X_i - \mu\Big)^2\right] = \sum_{i=1}^n c_i^2 \sigma_i^2, \qquad (1)$$

which is minimized by the following assignment of the $c_i$ (cf. Lehrer and Wagner 1981: 139):

$$c_i^* = \frac{1/\sigma_i^2}{\sum_{j=1}^n 1/\sigma_j^2}. \qquad (2)$$
Thus, naming the $c_i^*$ the ‘optimal weights’ is motivated by two independent theoretical reasons:
1. As argued above, for independent and unbiased estimates $X_i$ with variance $\sigma_i^2$, the mean square error of the overall estimate is minimized by the convex combination $X = \sum_i c_i^* X_i$. Thus, for a standard loss function, the $c_i^*$ are indeed the optimal weights.
2. Even when the square loss function is replaced by a more realistic alternative (Hartmann and Sprenger 2010), the $c_i^*$ can still define the optimal convex combination of individual estimates. In that case, we require stronger distributional assumptions.
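The baseline model and the optimal weights of equation (2) can be illustrated with a small numerical sketch. The code below is purely illustrative: the variance values and the function names `mse` and `optimal_weights` are our own assumptions, not part of the paper.

```python
# Baseline model: independent, unbiased estimators X_i with variances sigma2[i].
# For a convex combination sum(c_i * X_i), MSE = sum(c_i^2 * sigma_i^2).

def mse(weights, sigma2):
    """Mean square error of sum(c_i X_i) for independent unbiased X_i."""
    return sum(c * c * s2 for c, s2 in zip(weights, sigma2))

def optimal_weights(sigma2):
    """Weights proportional to the precisions 1/sigma_i^2 (equation (2))."""
    prec = [1.0 / s2 for s2 in sigma2]
    total = sum(prec)
    return [p / total for p in prec]

sigma2 = [1.0, 4.0, 9.0]          # hypothetical individual variances
n = len(sigma2)
c_star = optimal_weights(sigma2)
equal = [1.0 / n] * n

# More competent agents (smaller variance) receive larger weights,
# and the optimal weights never do worse than the straight average.
assert c_star[0] > c_star[1] > c_star[2]
assert mse(c_star, sigma2) <= mse(equal, sigma2)
```

A useful closed form follows directly from the sketch: with the optimal weights, the resulting MSE is the reciprocal of the summed precisions.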
The problem with these optimal weights is that each agent’s individual expertise would have to be known in order to calculate them. Given all the biases that actual deliberation is loaded with, e.g. ascription of expertise due to professional reputation, age or gender, or bandwagon effects, it is unlikely that the agents succeed at unravelling the expertise of all other group members (cf. Nadeau et al. 1993; Armstrong 2001).
Therefore, we widen the scope of our inquiry:
Question: Under which conditions will differentially weighted group judgements outperform the straight average?
A first answer is given by the following result where the differential weights preserve the expertise ranking:
Theorem 1 (First Baseline Result). Let $c_1, \ldots, c_n > 0$ be the weights of the individual group members, that is, $\sum_{i=1}^n c_i = 1$. Without loss of generality, let $c_1 \le \ldots \le c_n$. Further assume that for all i > j:

$$1 \le \frac{c_i}{c_j} \le \frac{c_i^*}{c_j^*}. \qquad (3)$$
Then the differentially weighted estimator $\hat{\mu } := \sum _{i=1}^n c_i X_i$ outperforms the straight average. That is, $\text{MSE}(\hat{\mu }) \le \text{MSE}(\overline{\mu })$, with equality if and only if $c_i = 1/n$ for all $1 \le i \le n$.
This result demonstrates that relative accuracy, as measured by pairwise expertise ratios, is a good guiding principle for group judgements as long as the relative weights are not too extreme.
The following result extends this finding to a case where the benefits of differential weighting are harder to anticipate: we allow the $c_i$ to lie anywhere in the interval $[1/n, c_i^*]$ (or $[c_i^*, 1/n]$), allowing for cases where the ranking of the group members is not represented correctly. One might conjecture that this phenomenon adversely affects performance, but this is not the case:
Theorem 2 (Second Baseline Result). Let $c_1, \ldots, c_n \in [0, 1]$ such that $\sum_{i=1}^n c_i = 1$. In addition, let $c_i\in [\frac{1}{n}, c_i^*]$ (respectively $c_i \in [c_i^*, \frac{1}{n}]$) hold for all $1 \le i \le n$. Then the differentially weighted estimator $\hat{\mu } := \sum _{i=1}^n c_i X_i$ outperforms the straight average. That is, $\text{MSE}(\hat{\mu }) \le \text{MSE}(\overline{\mu })$, with equality if and only if $c_i = 1/n$ for all $1 \le i \le n$.
Note that neither baseline result implies the other. The conditions of the second result can be satisfied even when the ranking of the group members differs from their actual expertise, and a violation of the second condition (e.g. $c_i^* = 1/n$ and $c_i = 1/n + \varepsilon$) is compatible with satisfaction of the first condition. So the two results are genuinely complementary.
We have thus shown that differential weighting outperforms straight averaging under quite general constraints on the individual weights, motivating the efforts to recognize experts in practice. The next sections extend these results to the presence of correlation and bias, thereby transferring them to more realistic circumstances.
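The Second Baseline Result lends itself to a direct numerical check: any weight vector whose components lie between the equal weight 1/n and the optimal weight $c_i^*$ should weakly improve on straight averaging. A minimal sketch in pure Python, with hypothetical variances of our own choosing:

```python
def mse(weights, sigma2):
    # MSE of sum(c_i X_i) for independent, unbiased estimators
    return sum(c * c * s2 for c, s2 in zip(weights, sigma2))

sigma2 = [1.0, 2.0, 5.0, 10.0]     # assumed variances
n = len(sigma2)
prec = [1.0 / s2 for s2 in sigma2]
c_star = [p / sum(prec) for p in prec]
equal = [1.0 / n] * n

# Any convex combination of the equal weights and the optimal weights has
# each component in [1/n, c_i*] (resp. [c_i*, 1/n]), so Theorem 2 applies --
# even though for t < 1 the weights need not reflect exact expertise ratios.
for t in (0.25, 0.5, 0.75, 1.0):
    c = [(1 - t) / n + t * cs for cs in c_star]
    assert mse(c, sigma2) <= mse(equal, sigma2)
```

Along this interpolation line the MSE is in fact monotone: it is a quadratic in t whose minimum sits at the optimal weights, so moving any distance from equal weights towards them can only help.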
3. BIASED AGENTS
The first extension of our model concerns biased estimates Xi, that is, estimates that do not centre around the true value μ = 0, but around Bi ≠ 0. We still assume that agents are honestly interested in getting close to the truth, but that training, experience, risk attitude or personality structure bias their estimates into a certain direction. For example, in assessing the impact of industrial development on a natural habitat, an environmentalist will usually come up with an estimate that significantly differs from the estimate submitted by an employee of an involved corporation – even if both are intellectually honest and share the same information.
For a biased agent i, the competence/precision parameter $\sigma_i^2$ has to be re-interpreted: it should be understood as the coherence (or non-randomness) of the agent’s estimates rather than as their accuracy. This value is indicative of accuracy only if the bias $B_i$ is relatively small.
Under these circumstances, we can identify an intuitive sufficient condition for differential weighting to outperform straight averaging.
Theorem 3. Let $X_1, \ldots, X_n$ be random variables with bias $B_1, \ldots, B_n$.

(a) Suppose that the $c_i$ in the estimator $\hat{\mu }= \sum _{i=1}^n c_i X_i$ satisfy one of the conditions of the baseline results (i.e. either $1 \le c_i/c_j \le c_i^*/c_j^*$, or $c_i \in [1/n, c_i^*]$ respectively $c_i \in [c_i^*, 1/n]$). In addition, let the following inequality hold:

$$\Big(\sum_{i=1}^n c_i B_i\Big)^2 \le \Big(\frac{1}{n}\sum_{i=1}^n B_i\Big)^2. \qquad (4)$$

Then differential weighting outperforms straight averaging, that is, $\text{MSE}(\hat{\mu }) < \text{MSE}(\overline{\mu })$.

(b) Suppose the following inequality holds:

$$\Big(\sum_{i=1}^n c_i B_i\Big)^2 - \Big(\frac{1}{n}\sum_{i=1}^n B_i\Big)^2 > \frac{1}{n^2}\sum_{i=1}^n \sigma_i^2. \qquad (5)$$

Then straight averaging outperforms differential weighting, that is, $\text{MSE}(\hat{\mu }) > \text{MSE}(\overline{\mu })$.
Intuitively, condition (4) states that the differentially weighted bias is smaller than or equal to the average bias. As one would expect, this property favourably affects the performance of the differentially weighted estimator. Condition (5) states, on the other hand, that if the difference between the mean square biases of the weighted and the straight average exceeds the mean variance of the agents, then straight averaging performs better than weighted averaging.
When the group size grows very large, both parts of Theorem 3 collapse into a single condition, as long as the biases and variances are bounded. This is quite obvious since the last term of (5) is of the order $\mathcal {O}(1/n)$. Theorem 3 applies in particular in the case where agents are biased in the same direction and less biased agents make more coherent estimates (that is, estimates with smaller variance):
Corollary 1. Let $X_1, \ldots, X_n$ be random variables with bias $B_1, \ldots, B_n \ge 0$ such that $c_i \ge c_j$ implies $B_i \le B_j$ (or vice versa for $B_1, \ldots, B_n \le 0$). Then, with the same definitions as above:
• $\text{MSE}(\overline{\mu }) \ge \text{MSE}(\hat{\mu })$.
• If there is a uniform group bias, that is, $B := B_1 = \ldots = B_n$, then $\text{MSE}(\overline{\mu })-\text{MSE}(\hat{\mu })$ is independent of B.
So even if all agents have followed the same training, or have been raised in the same ideological framework, expertise recognition does not multiply that bias, but helps to increase the accuracy of the group’s judgement. In particular, if there is a uniform bias in the group, the relative advantage of differential weighting is independent of the size of the bias. All in all, these results demonstrate the importance of expertise recognition even in groups where the members share a joint bias – a finding that is especially relevant for practice.
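The second part of Corollary 1 is easy to verify numerically: with biased estimators, the MSE splits into a variance term plus the squared aggregated bias, and under a uniform bias B the aggregated bias of every weight vector equals B, so the MSE difference cannot depend on B. A sketch with hypothetical numbers (pure Python; the decomposition in `mse` follows the standard bias–variance identity):

```python
def mse(weights, sigma2, bias):
    # MSE = variance term + squared aggregated bias:
    #   sum(c_i^2 sigma_i^2) + (sum(c_i B_i))^2
    var = sum(c * c * s2 for c, s2 in zip(weights, sigma2))
    b = sum(c * bi for c, bi in zip(weights, bias))
    return var + b * b

sigma2 = [1.0, 4.0, 9.0]           # assumed variances
n = len(sigma2)
prec = [1.0 / s2 for s2 in sigma2]
c_star = [p / sum(prec) for p in prec]
equal = [1.0 / n] * n

# Uniform group bias: the advantage of differential weighting is the same
# for every value of B.
diffs = []
for B in (0.0, 1.0, 10.0):
    bias = [B] * n
    diffs.append(mse(equal, sigma2, bias) - mse(c_star, sigma2, bias))

assert max(diffs) - min(diffs) < 1e-9   # independent of B
assert diffs[0] > 0                     # weighting still helps
```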
4. INDEPENDENCE VIOLATIONS
We turn to violations of independence between the group members. Consider first the following fact that compares two groups with different degrees of correlation:
Fact 1. If $0\le \mathbb {E}\left[ X_i X_j \right]\le \mathbb {E}\left[ Y_i Y_j \right]$ for all $i\ne j \le n$, and $\mathbb {E}[X_i^2]=\mathbb {E}[Y_i^2]$ for all $i \le n$, then both straight averaging and weighted averaging applied to the $X_i$ yield a lower mean square error than the same procedures applied to the $Y_i$.
Fact 1 shows that less correlated groups perform better, ceteris paribus. For practical purposes, this suggests that heterogeneity of a group is an epistemic virtue since strong correlations between the agents are less likely to occur, making the overall result more accurate (cf. Page Reference Page2007).
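Fact 1 can be checked directly: collecting the second moments $\mathbb{E}[X_iX_j]$ in a matrix, the MSE of any weighted average is the quadratic form $\sum_{i,j} c_i c_j \mathbb{E}[X_iX_j]$, so raising the off-diagonal terms while keeping the diagonal fixed can only raise the MSE when all weights are non-negative. A sketch (pure Python; the moment values are hypothetical, and, as in the proof of Fact 1, the weights depend only on the variances):

```python
def mse_cov(weights, cov):
    # MSE of sum(c_i X_i) for unbiased X_i with E[X_i X_j] = cov[i][j]
    n = len(weights)
    return sum(weights[i] * weights[j] * cov[i][j]
               for i in range(n) for j in range(n))

var = [1.0, 2.0, 4.0]              # common diagonal E[X_i^2] = E[Y_i^2]

def second_moments(off):
    # same variances, off-diagonal moments all equal to `off` >= 0
    return [[var[i] if i == j else off for j in range(3)] for i in range(3)]

prec = [1.0 / v for v in var]
c_star = [p / sum(prec) for p in prec]
equal = [1.0 / 3] * 3

low, high = second_moments(0.1), second_moments(0.5)
# The less correlated group does better, for both aggregation procedures:
assert mse_cov(equal, low) <= mse_cov(equal, high)
assert mse_cov(c_star, low) <= mse_cov(c_star, high)
```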
Regarding the comparison of straight and weighted averaging, we can show the following result:
Theorem 4. Let $X_1, \ldots, X_n$ be unbiased estimators, that is, $\mathbb {E}[X_i]=\mu =0$, and let the $c_i$ satisfy the conditions of one of the baseline results, with $\hat{\mu }$ defined as before. Let $I\subseteq \lbrace 1, \ldots, n\rbrace$ be a subset of the group members with the property

$$\mathbb{E}[X_i X_k] \le \mathbb{E}[X_j X_k] \quad \text{for all } i, j, k \in I \text{ such that } c_i \ge c_j \text{ and } k \ne i, j. \qquad (6)$$
(i) Correlation vs. Expertise. If I = {1, . . ., n}, then weighted averaging outperforms straight averaging, that is, $\text{MSE}(\hat{\mu }) \le \text{MSE}(\overline{\mu })$.
(ii) Correlated Subgroup. Assume that $\mathbb {E}\left[ X_i X_j \right] = 0$ for all $i \ne j$ with $i\notin I$ or $j\notin I$, and that

$$\frac{1}{|I|}\sum_{i\in I}\frac{1}{\sigma_i^2} \le \frac{1}{n}\sum_{i=1}^n \frac{1}{\sigma_i^2}. \qquad (7)$$
Then weighted averaging still outperforms straight averaging, that is, $\text{MSE}(\hat{\mu }) \le \text{MSE}(\overline{\mu })$.
To fully understand this theorem, we have to clarify the meaning of condition (6). Basically, it says that within group I, experts are less correlated with the other (sub)group members than non-experts are.
Once we have understood this condition, the rest is straightforward. Part (i) states that if I equals the entire group, then differential weighting has an edge over straight averaging. That is, the benefits of expertise recognition are not offset by the perturbations that mutual dependencies may introduce. Arguably, the generality of the result is surprising since condition (6) is quite weak. Part (ii) states that differential weighting is also superior when the correlated subgroup is uncorrelated with the rest of the group, as long as the average competence in the subgroup is lower than the overall average competence (see equation (7)).
It is a popular opinion (e.g. Surowiecki Reference Surowiecki2004) that correlation of individual judgements is one of the greatest dangers for relying on experts in a group. To some extent, this opinion is vindicated by Fact 1 in our model. However, expertise-informed group judgements may still be superior to composite judgements, as demonstrated by Theorem 4. The interplay of correlation and expertise is subtle and not amenable to broad-brush generalizations.
5. OVER- AND UNDERCONFIDENCE
We now consider a specific family of weights $c_i$ in order to study how the group members’ self-assessment of their own quality affects group performance as a whole, modelled again as unbiased estimates $X_i$ with variance $\sigma_i^2$.
Suppose that the group members have some idea of their own competence. That is, they are able to position themselves in relation to a commonly known benchmark: they are able to assess how much better or worse they expect themselves to perform compared with a default agent, modelled as an unbiased random variable with variance $s^2$. Such a scenario may be plausible when agents have a track record of their performance, or obtain performance feedback. The agents then express how much weight they should ideally get in a group of n − 1 default agents:

$$c_i = \frac{s^2}{s^2 + (n-1)\,\sigma_i^2}. \qquad (8)$$
Assume further that every agent uses the same benchmark, that these weights also determine to what extent a group member compromises his or her own position, and that decision-making takes place on the basis of the normalized $c_i$. It can then be shown (proof omitted) that the differentially weighted estimator $\hat{\mu }$ defined by equation (8) outperforms straight averaging – in fact, this is entailed by the Second Baseline Result (Theorem 2).
Here, we want to study how over- and underestimating the competence of a ‘default agent’ will affect group performance. Is it always epistemically detrimental when the agents misguess the group competence?
The answer is, perhaps surprisingly, no. To explain this result, we first observe that the less confidence we have in the group (i.e. the larger $s^2$), the more the weighted average resembles the straight average. Recalling equation (8), we note that all $c_i$ will be very close to 1, so that after normalization each agent receives a weight close to 1/n. This implies that the expertise-informed average will roughly behave like the straight average.
Conversely, if the group is perceived as competent (i.e. small $s$), then the $c_i$ will typically not be close to 1, so that the differential weights will diverge significantly from the straight average. This intuitive insight leads to the following theorem:
Theorem 5. Let $\hat{\mu }_{s^2}$ and $\hat{\mu }_{{\tilde{s}}^2}$ be two weighted, expertise-informed estimates of μ, defined according to equation (8) with benchmarks $s^2$ and ${\tilde{s}}^2$, respectively. Then $\text{MSE}(\hat{\mu }_{s^2}) \le \text{MSE}(\hat{\mu }_{{\tilde{s}}^2})$ if and only if $s^2 \le {\tilde{s}}^2$.
It can also be shown (proof omitted) that this procedure approximates the optimal weights $c_i^*$ as the perceived group competence approaches perfection, that is, as s → 0. In other words, as long as the group members judge themselves accurately, optimism with regard to the abilities of the other group members is epistemically favourable. On the other hand, overconfidence in one’s own abilities relative to the group typically deteriorates performance.
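Theorem 5 and the limiting behaviour can be illustrated with a small sketch. As a candidate closed form for the benchmark weights of equation (8) we assume $c_i = s^2/(s^2 + (n-1)\sigma_i^2)$ – the optimal weight of agent i in a group of n − 1 default agents with variance $s^2$; the variance values below are hypothetical.

```python
def self_weights(sigma2, s2):
    # Assumed closed form: agent i's ideal weight among n-1 default agents
    # with variance s2, then normalized to sum to one.
    n = len(sigma2)
    raw = [s2 / (s2 + (n - 1) * sg) for sg in sigma2]
    total = sum(raw)
    return [r / total for r in raw]

def mse(weights, sigma2):
    # MSE of sum(c_i X_i) for independent, unbiased X_i
    return sum(c * c * sg for c, sg in zip(weights, sigma2))

sigma2 = [1.0, 4.0, 9.0]           # hypothetical variances

# Theorem 5: the smaller the benchmark s^2 (the more competent the group is
# perceived to be), the smaller the MSE of the weighted estimate.
mses = [mse(self_weights(sigma2, s2), sigma2) for s2 in (0.01, 1.0, 100.0)]
assert mses[0] <= mses[1] <= mses[2]

# As s -> 0 the normalized weights approach the optimal weights c_i*.
prec = [1.0 / sg for sg in sigma2]
c_star = [p / sum(prec) for p in prec]
w_small = self_weights(sigma2, 1e-9)
assert all(abs(a - b) < 1e-6 for a, b in zip(w_small, c_star))
```

For large $s^2$ the normalized weights approach 1/n, matching the observation above that low confidence in the group pulls the weighted average towards the straight average.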
6. DISCUSSION
We have set up an estimation model of group decision-making in order to study the effects of individual expertise on the quality of a group judgement. We have shown that, in general, taking into account relative accuracy positively affects the epistemic performance of groups. Translated into our statistical model, this means that differential weighting outperforms straight averaging, even if the ranking of the experts is not represented accurately.
The result remains stable over several representative extensions of the model, such as various forms of bias, violations of independence, and over- and underconfident agents (Theorems 3–5). In particular, we demonstrated that differential weighting is superior (i) if experts are, on average, less biased; (ii) for a group of uniformly biased agents; (iii) if experts are less correlated with the rest of the group than other members. We also showed that uniform overconfidence in one’s own abilities is detrimental for group performance whereas (over)confidence in the group may be beneficial. These properties may be surprising and demonstrate the stability and robustness of expertise-informed judgements, implying that the benefits of recognizing experts may offset the practical problems linked with that process.
Our model can in principle also be used for describing how groups actually form judgements. In that case, the involved tasks should be neither too intellective (that is, having a demonstrable solution) nor too judgemental (Laughlin and Ellis 1986): in highly intellective tasks, groups will typically not perform better than the best individual (i.e. the one who has solved the task correctly). This differs from our model, where every agent has only partial knowledge of the truth. On the other hand, if the task is too judgemental, any epistemic component is removed and the individual weights may actually be based on the centrality of a judgement, such as in Hinsz’s (1999) SDS-Q scheme.
Finally, we name some distinctive traits of our model. First, unlike other models of group judgements that are detached from the group members’ individual abilities (Davis 1973; DeGroot 1974; Lehrer and Wagner 1981; Hinsz 1999), it is a genuinely epistemic model, evaluating the performance of different ways of making a group judgement. Thus, our model can be used normatively, for supporting the use of differential weights in group decisions, but also descriptively, for fitting the results of group decision processes.
Second, we did not make any specific distributional assumptions on how the agents estimate the target value. Our assumptions merely concern the first and second moments (bias and variance). We consider this parsimony a prudent choice because those distributions will vary greatly in practice, and we do not have epistemic access to them. Classical work in the social combination literature makes much more specific distributional assumptions (e.g. the multinomial distributions in Thomas and Fink (1961) and Davis (1973)), restricting the scope of that analysis.
Third, we are not aware of other analytical models that take into account important confounders such as correlation, bias and over-/underconfident agents. Thus, we conclude that our model makes a substantial contribution to understanding the epistemic benefits of expertise in group judgements.
ACKNOWLEDGEMENTS
Dominik Klein thanks the Netherlands Organisation for Scientific Research (NWO) for supporting his research through Vidi grant no. 016.094.345, held by Eric Pacuit. Jan Sprenger thanks the Netherlands Organisation for Scientific Research (NWO) for supporting his research through Veni grant no. 016.104.079 and Vidi grant no. 016.144.342.
APPENDIX. PROOF OF THE THEOREMS
We will need the following inequalities repeatedly in the subsequent proofs. Let $c_1, \ldots, c_n > 0$. Then

$$\sum_{i=1}^n \frac{1}{c_i} \ge \frac{n^2}{\sum_{i=1}^n c_i}, \qquad (9)$$

with equality if and only if $c_1 = \ldots = c_n$. Moreover,

$$\sum_{i=1}^n c_i^2 \ge \frac{1}{n}\Big(\sum_{i=1}^n c_i\Big)^2, \qquad (10)$$

again with equality if and only if $c_1 = \ldots = c_n$. Both inequalities are special cases of the Power Mean Theorem (cf. Wilf 1985: 258).
For the First Baseline Result, we need the following
Lemma 1. Let k < n and let $(c_1, \ldots, c_n)$ be a sequence such that

(1) $\sum_{i=1}^n c_i = s$ for some s > 0, and all $c_i$ are positive;

(2) $c_1 = \ldots = c_k$ and $c_{k+1} = \ldots = c_n$;

(3) $c_k \le c_{k+1}$ and $1\le \frac{c_{k+1}}{c_k}\le \frac{c_{k+1}^*}{c_k^*}$.

Further assume that $\sigma_1 \ge \ldots \ge \sigma_n$. Then

$$\sum_{i=1}^n c_i^2 \sigma_i^2 \le \frac{s^2}{n^2}\sum_{i=1}^n \sigma_i^2. \qquad (11)$$
Furthermore, we show that under the above conditions (i.e. $\sum_{i=1}^n c_i = s$), the value of the sum $\sum_{i=1}^n c_i^2\sigma_i^2$ decreases as the quotient $\frac{c_{k+1}}{c_k}$ increases.
Proof of Lemma 1. Fix r such that
• $c_i=\frac{s}{n}-\frac{r}{k}$ for i ⩽ k
• $c_i=\frac{s}{n}+\frac{r}{n-k}$ for i > k
Then we have to show that:
The above equation reduces to:
Now the left hand side of the above equation is a quadratic function in r with zeros at 0 and
Since the σi are ordered decreasingly we get
Now this is a function of the form $\frac{kx-a}{x+b}$ with a, b > 0. Since these functions are increasing for x > −b, the inequality above can be strengthened to
Recall that $\frac{c_{k+1}}{c_k}\le \frac{c_{k+1}^*}{c_k^*}=\frac{\sigma _k}{\sigma _{k+1}}=:\sigma$. Inserting this transforms the above equation into:
Our assumptions about the ci translate into
This transforms to
In particular $r < r_0$, finishing the proof of (11). For the last statement of Lemma 1, observe that the left hand side of (11) is a quadratic function with minimum at $\frac{1}{2}r_0$, and that $r\le \frac{1}{2}r_0$. $\Box$
Proof of Theorem 1. By assumption the $c_i$ are ordered increasingly, thus the $\sigma_i$ are ordered decreasingly. For a vector of weights $\mathbf {w}\in \mathbb {R}^n$ (i.e. all $w_i$ positive and $\sum_i w_i = 1$), we denote the mean square error of the estimator $\sum_i w_i X_i$ by $\Psi(\mathbf{w})$, that is,

$$\Psi(\mathbf{w}) = \sum_{i=1}^n w_i^2 \sigma_i^2. \qquad (12)$$

Thus for $\mathbf{c} = (c_1, \ldots, c_n)$ as in the theorem we have to show $\Psi(\mathbf{c}) \le \Psi(\mathbf{e})$, where $\mathbf{e}$ is the equal weight vector $(\frac{1}{n},\ldots ,\frac{1}{n})$. To this end we will construct a sequence of weight vectors $\mathbf{e} = d^0, \ldots, d^{n-1} = \mathbf{c}$ such that:
(i) each $d^i$ satisfies the assumptions of Theorem 1;

(ii) for $d^i = (d_1, \ldots, d_n)$, there is some $k \in \mathbb {N}$ such that

$d_1 = \ldots = d_k$ and $d_1 > c_1, \ldots, d_k > c_k$;

$d_j = c_j$ for $k<j\le k+i$ (where i is the index of $d^i$);

$d_{k+i+1} = \ldots = d_n$ and $d_{k+i+1} \le c_{k+i+1}, \ldots, d_n \le c_n$;

(iii) $\Psi(d^{i-1}) \ge \Psi(d^i)$.
Thus $d^{n-1} = \mathbf{c}$ and $\Psi(\mathbf{c}) \le \Psi(\mathbf{e})$ as desired. The $d^i$ are constructed inductively as follows: assume $d^{i-1} = (d'_1, \ldots, d'_n)$ has already been constructed. If i = 1, let k be the unique index such that $c_k<\frac{1}{n}$ and $c_{k+1}\ge \frac{1}{n}$. If i > 1, let k be as in the above conditions for $d^{i-1}$. First note that if k = 0, then $d'_j \le c_j$ for all j and thus $d^{i-1} = \mathbf{c}$ since both are weight vectors, and we are done. Thus assume k ⩾ 1 for the rest of the proof. With a similar argument, we can show that $k + i + 1 \le n$. Now choose the maximal $r \in \mathbb {R}$ that satisfies
By the above conditions, r ⩾ 0. Then define $d^i = (d_1, \ldots, d_n)$ by:
• $d_j = d^{\prime }_j-\frac{r}{k}$ for j ⩽ k;
• $d_j = c_j$ for $k < j \le k + i$;
• $d_j = d^{\prime }_j+\frac{r}{n-k-i-1}$ for j ⩾ k + i + 1.
To see that $d^i$ satisfies conditions (i)–(iii), first note that since r was chosen to be maximal, one of the two inequalities in (13) has to be an equality. Thus we either have $d_k = c_k$ or $d_{k+i+1} = c_{k+i+1}$, and condition (ii) is satisfied. Further note that
Using that the $c_i$ are ordered increasingly, it is easy to see that $d^i$ satisfies the assumptions of Theorem 1. Furthermore, applying the monotonicity part of Lemma 1 to the set of indices $I := \lbrace 1, \ldots, k\rbrace \cup \lbrace i + k + 1, \ldots, n\rbrace$, we get $\sum_{j \in I} d_j^2\sigma_j^2 \le \sum_{j \in I} d_j'^2\sigma_j^2$. Thus $\Psi(d^i) \le \Psi(d^{i-1})$ since $d^{i-1}$ and $d^i$ coincide outside I. This finishes the proof. $\Box$
Proof of Theorem 2. We would like to show that the mean square error of the straight average $\overline{\mu } := (1/n) \sum _{i=1}^n X_i$ exceeds the mean square error of the weighted estimate $\hat{\mu }$. The MSE difference can be calculated as
where we have made use of $\mathbb {E}[X_i \, X_j] = 0, \; \forall i \ne j$, and of $c_i^* = ( \sum _{j=1}^n \frac{\sigma _i^2 }{\sigma _j^2} )^{-1}$ (cf. equation (2)). Thus, instead of considering Δ, it suffices to show that
To this end, let $I_i := [1/n, c_i^*]$ (respectively $[c_i^*, 1/n]$) and let $\mathcal {Q}:= I_1\times \ldots \times I_n$. Then,
defines the ‘domain’ of our theorem, and it is a polygon. Moreover, since $\sum _i \frac{n^2}{c_i^*}c_i^2$ is a positive definite quadratic form in the $c_i$, we get that $\Delta'^{-1}([0, \infty))$ is convex. Thus, it suffices to show that $\Delta'$ is positive on the vertices of $\mathcal {D}$. Note that since $\lbrace x \mid \sum_i x_i = 1\rbrace$ is of dimension n − 1, the vertices of $\mathcal {D}$ are of the form $v = (c_1^*, \ldots, c_{k-1}^*, c_k, 1/n, \ldots, 1/n)$ – the ordering is assumed for convenience, and $c_k$ is defined such that $\Vert v\Vert _1 = 1$. Thus we have to show that $\Delta'(c_1^*, \ldots, c_{k-1}^*, c_k, 1/n, \ldots, 1/n) \ge 0$.
In the case k = 1, the desired inequality holds trivially since $c_k = 1 - (n-1)\cdot \frac{1}{n} = \frac{1}{n}$. Thus we assume k > 1 for the remainder of this proof. Let l denote the real number satisfying
Observe that for $c_i=\frac{1}{n}$ the corresponding summands in Δ′ vanish. Thus we have to show that
Using the definition of l from above and inequality (9) gives $\sum _{i=1}^{k-1}\frac{1}{c_i^*} \ge (k-1)^2/(\sum _{i=1}^{k-1} c_i) \ge \frac{n(k-1)}{l}$. Thus, it suffices to show
Since the ci add up to one, we can express the dependency between l and ck by
Inserting this into (14) gives
Since the first factor is always positive, it suffices to show that the factor in the square brackets, denoted by P(l), is positive for every l that can occur in our setting. We do this by a case distinction on the value of $c_k^*$.
Case 1. $c_k^* \le 1/n$. Noting $c_k \in [c_k^*,\frac{1}{n}]$ and the dependency (15) between l and $c_k$, we have to show that $P(l) \ge 0$ for all $l \in [1,\frac{k-nc_k^*}{k-1}]$. We observe that P is a polynomial of third order, with zeros of P given by P(1) = 0 and
with $r_+$ denoting the larger of these two numbers. With some algebra it also follows that $P'(1) \ge 0$ if and only if $c_k^* \le 1/n$. From the functional form of P(l) – a polynomial of the third degree with negative leading coefficient – we can then infer that l = 1 must be the middle zero of P. To prove that $P(l) \ge 0$ in the critical interval, it remains to show that the rightmost zero satisfies $r_+ \ge \frac{k-nc_k^*}{k-1}$:
completing the proof for the case $c_k^* \le 1/n$.
Case 2. $c_k^* \ge 1/n$. In this case we are dealing with the interval $l \in [\frac{k-nc_k^*}{k-1},1]$. The same calculations as above yield
in particular $r_+ < 1$. Thus l always lies between the middle and the rightmost zero of P(l), and in particular, $P(l) \ge 0$ for all $l \in [\frac{k-nc_k^*}{k-1},1]$. $\Box$
Proof of Theorem 3. Let the $X_i$ centre around $B_i > 0$. Then $\mathbb {E}[X_i - B_i] = 0$, and we observe
Analogously, we obtain
As in the proof of Theorem 2, we define $\Delta (c_1, \ldots , c_n) := \text{MSE}(\overline{\mu }) - \text{MSE}(\hat{\mu })$ as the difference in mean square error between both estimates and show that $\Delta(c_1, \ldots, c_n) \ge 0$ if equation (4) is satisfied.
By Theorem 1 and/or Theorem 2, the first line is greater than or equal to zero, and by equation (4), the second line is also non-negative. Thus $\Delta(c_1, \ldots, c_n) \ge 0$, showing the superiority of differential weighting.
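As a concrete sanity check of the invoked superiority result: for the special case of independent, unbiased estimators with inverse-variance weights (an illustrative instantiation; the paper's weights $c_i$ need not be exactly these), the weighted average's MSE never exceeds the straight average's, which is equivalent to the Cauchy–Schwarz bound $(\sum_i \sigma_i^2)(\sum_i \sigma_i^{-2}) \ge n^2$:

```python
import random

def mse_straight(sigmas):
    """MSE of the straight average of n independent unbiased estimators."""
    n = len(sigmas)
    return sum(s * s for s in sigmas) / (n * n)

def mse_inverse_variance(sigmas):
    """MSE of the inverse-variance weighted average (optimal linear weights
    under independence and unbiasedness)."""
    return 1.0 / sum(1.0 / (s * s) for s in sigmas)

random.seed(1)
for _ in range(1000):
    sigmas = [random.uniform(0.1, 5.0) for _ in range(random.randint(2, 8))]
    assert mse_inverse_variance(sigmas) <= mse_straight(sigmas) + 1e-12
```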
For the second part of the theorem, we just observe that
$\Box$
Proof of Corollary 1. It is easy to see that the conditions of the corollary satisfy the requirements of part (a) of Theorem 3. This yields the desired result for the first part of the corollary. For the second part, let the $X_i$ all centre around $B \ne 0$. Then $X_i - B$ is unbiased, and we observe
Therefore, under the conditions of the theorem,
showing that Δ only depends on the centred estimates.$\Box$
Proof of Fact 1. First we deal with straight averaging:
The proof exploits that $X_i$ and $Y_i$ have the same variance, thus $\mathbb {E}\left[ X_i^2\right]=\mathbb {E}\left[ Y_i^2 \right]$. The proof for differential weights is similar, making use of the fact that the $c_i$ are the same for $X_i$ and $Y_i$ because they only depend on the variance of the random variable.$\Box$
Proof of Theorem 4, part (i). Assume without loss of generality that $c_i \ge c_{i+1}$ for all $i < n$. Thus, our assumption on the $\mathbb {E}[X_i X_j]$ reduces to $\mathbb {E}[X_iX_k]\le \mathbb {E}[X_jX_k]$ for $i \ge j \ne k$. We first show the theorem under the assumption that all $\mathbb {E}[X_i X_j]$ with $i \ne j$ are equal, say $\mathbb {E}[X_i X_j]=\gamma$. By Theorem 1 and/or 2, it suffices to show that
Inserting $\mathbb {E}[X_i X_j]=\gamma$ this reduces to
The point $(1/n, \ldots, 1/n)$ is a global minimum of the function $f(x) = \sum_i x_i^2$ under the constraints $x_1, \ldots, x_n \ge 0$ and $\sum_i x_i = 1$. Thus we have
Observing $\sum_{i=1}^n \sum_{j=1}^n c_i c_j = (\sum_{i=1}^n c_i)^2 = 1$ and combining this equality with (17) and (18), we obtain
thus proving the statement in the case that all $\mathbb {E}[X_i X_j]$ are the same.
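The global-minimum claim used above – on the simplex, $f(x)=\sum_i x_i^2$ attains its minimum $1/n$ exactly at the uniform point – can be checked numerically (a minimal sketch; the sampling scheme is ours):

```python
import random

def sum_sq(xs):
    """f(x) = sum of squares of the coordinates."""
    return sum(x * x for x in xs)

random.seed(2)
n = 6
uniform = [1.0 / n] * n
for _ in range(1000):
    raw = [random.random() for _ in range(n)]
    total = sum(raw)
    point = [r / total for r in raw]  # random point on the simplex
    # no simplex point beats the uniform point (value 1/n)
    assert sum_sq(point) >= sum_sq(uniform) - 1e-12
```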
For the general case, let us assume that not all $c_i$ are the same (otherwise the theorem is trivially true). Thus we either have $c_1 > c_{n-1}$ or $c_2 > c_n$ since the $c_i$ are ordered decreasingly. In the following, we assume $c_2 > c_n$; the other case works with a similar argument. First observe that
Thus, we can concentrate on $\lbrace \mathbb {E}[X_i X_j] \mid i>j\rbrace$. We fix a natural number $c$ and let $S_c$ be the set of all vectors $(\mathbb {E}[X_i X_j])_{(i>j)}$ fulfilling the conditions of our theorem and satisfying $\sum _{i> j}\mathbb {E}[X_iX_j]=c$. We then consider the functional
on $S_c$. Observe that every $S_c$ contains exactly one point $e_{eq}$ where all $\mathbb {E}[X_i X_j]$ are equal. By the first part of this proof, $\tilde{\varphi }(e_{eq})$ is non-negative. Thus, it suffices to show that $e_{eq}$ is an absolute minimum of $\tilde{\varphi }$ on $S_c$. First, observe that the value of $\frac{1}{n^2} \sum _{i=1}^n \sum _{j<i}\mathbb {E}\left[ X_i X_j \right]$ is constant, equal to $\frac{c}{n^2}$, on $S_c$; thus it suffices to show that
attains its maximum on $S_c$ at $e_{eq}$.
To do so, we show the following: for every $e \in S_c$ with $e \ne e_{eq}$ there is some $e' \in S_c$ with $\varphi(e') > \varphi(e)$; in particular, $\varphi$ does not attain its maximum on $S_c$ at $e$. Thus assume that $e=(\mathbb {E}[X_i X_j])_{(i> j)}\in S_c$ is given. Since $e \ne e_{eq}$, there are indices $s > t$ and $k > l$ such that $\mathbb {E}[X_s X_t]\ne \mathbb {E}[X_k X_l]$. Furthermore, we can assume that $t \ge l$. Without loss of generality (by potentially replacing one of the two entries with $\mathbb {E}[X_sX_l]$) we can assume that either $s = k$ or $t = l$. In the following we assume $s = k$; the other case works similarly. The idea of the following construction is this: we show that moving towards a more equal distribution of the entries $\mathbb {E}[X_iX_j]$ increases $\varphi(e)$. In particular, we construct $e^{\prime }=(\mathbb {E}^{\prime }[X_i X_j])_{(i> j)}\in S_c$ as follows: in every row $r_i:=\left\langle \mathbb {E}[X_i X_1]\ldots \mathbb {E}[X_i X_{i-1}]\right\rangle$ of $e$ we replace all the entries by their arithmetic mean. Formally, for all $i$ and $j$ (the value being independent of $j$):
Trivially this operation satisfies for all i:
and thus also for the double sum:
In particular, $e'$ is in $S_c$. Furthermore, we have assumed that the $c_i$ are ordered decreasingly. Recall that $c_k > c_j$ implies $\mathbb {E}[X_i X_k]\le \mathbb {E}[X_i X_j]$ by assumption; therefore the rows $r_i$ were ordered increasingly, and thus the rows of $e' - e$:
are ordered decreasingly (since the rows of $e'$ are constant). In particular, we have for any $i$:
where the ⩽ comes from the fact that both cj and $\mathbb {E}^{\prime }[X_iX_j]-\mathbb {E}[X_iX_j]$ are decreasing in j. Summing that up over all i we get that
Thus we have $\varphi(e') \ge \varphi(e)$ as desired. Now observe that (21) for $i = s$ is the following:
with both
and
By construction we have $\mathbb {E}[X_s X_t]\ne \mathbb {E}[X_s X_l]$, thus we would have a strict inequality in the last summand (and thus in the entire sum) if we knew that $c_t \ne c_l$. Unfortunately, this is not always the case. However, we have put ourselves in a situation where applying the same construction again, with $\mathbb {E}^{\prime }[X_2X_1]$ and $\mathbb {E}^{\prime }[X_nX_1]$ replacing $\mathbb {E}[X_sX_t]$ and $\mathbb {E}[X_sX_l]$, yields the desired result (since we have assumed that $c_2 > c_n$). To see this, observe that
• $\mathbb {E}[X_2X_1]=\mathbb {E}^{\prime }[X_2X_1]$ by construction
• $\mathbb {E}^{\prime }[X_sX_1]>\mathbb {E}[X_sX_1]$ since $\mathbb {E}[X_sX_t]\ne \mathbb {E}[X_s X_l]$ and $\mathbb {E}[X_sX_1]$ is the minimal element in the row $r_s$
• $\mathbb {E}[X_2X_1]\le \mathbb {E}[X_sX_1]$ by assumption
Thus we have
By assumption we have $c_2 > c_n$, and repeating the construction from above with columns replacing rows and $\mathbb {E}^{\prime }[X_2X_1], \mathbb {E}^{\prime }[X_nX_1]$ as the two reference points yields the desired result.$\Box$
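A key step in the proof above is a rearrangement-type fact: for a decreasing weight sequence $(c_j)$ and a decreasing sequence $(d_j)$ with $\sum_j d_j = 0$ (here $d_j = \mathbb{E}^{\prime}[X_iX_j]-\mathbb{E}[X_iX_j]$), one has $\sum_j c_j d_j \ge 0$, since by Abel summation $\sum_j c_j d_j = \sum_m (c_m - c_{m+1}) D_m$ with non-negative prefix sums $D_m$. A minimal numerical sketch (function name ours):

```python
import random

def weighted_sum(c, d):
    """Inner product sum_j c_j * d_j."""
    return sum(ci * di for ci, di in zip(c, d))

random.seed(3)
for _ in range(1000):
    n = random.randint(2, 8)
    c = sorted([random.random() for _ in range(n)], reverse=True)
    d = sorted([random.gauss(0, 1) for _ in range(n)], reverse=True)
    mean = sum(d) / n
    d = [x - mean for x in d]  # decreasing sequence with zero sum
    # decreasing weights against a decreasing zero-sum sequence: >= 0
    assert weighted_sum(c, d) >= -1e-9
```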
Proof of Theorem 4, part (ii). We have to show that the statement holds if all $\mathbb {E}[X_i X_j]$ with $i \ne j \in I$ are the same. The step from this case to the general statement works as in the proof above. As in the proof of (i), it suffices to show that
Let $\overline{c}=\frac{1}{|I|}\sum _{i\in I}c_i$. By equation (10) we have
thus
with the last inequality coming from our assumption that $\overline{c}<1$.$\Box$
Proof of Theorem 5. Let the benchmark agent have standard deviation $s > 0$, that is, variance $s^2$. We will show that $\Delta(s, \sigma_1, \ldots, \sigma_n)$ – the difference in MSE between the differentially weighted average and the straight average – is strictly monotonically decreasing in the first argument. To this end, we calculate
Now we show that $\frac{\partial }{\partial s}\Delta (s,\sigma _1,\ldots ,\sigma _n)\le 0$, where $c_i'$ denotes $\frac{\partial}{\partial s} c_i$:
Since we are only interested in the sign of the first derivative and $-\frac{2}{(\sum _kc_k)^3}<0$, it suffices to show that:
We show that the terms in both brackets have the same sign.
For the first bracket we have:
which is larger than or equal to 0 if and only if $\sigma_i^2 \ge \sigma_j^2$. Similarly, we observe for the second bracket that
which allows us to conclude
Thus, both factors in (22) have the same sign, implying $\frac{\partial }{\partial s}\Delta (s,\sigma _1,\ldots ,\sigma _n)\le 0$, which is what we wanted to prove.$\Box$