
MODELLING INDIVIDUAL EXPERTISE IN GROUP JUDGEMENTS

Published online by Cambridge University Press:  19 February 2015

Dominik Klein
Affiliation:
Tilburg Center for Logic, General Ethics and Philosophy of Science (TiLPS), Tilburg University, P.O. Box 90153, 5000 LE Tilburg, the Netherlands. Email: d.klein@uvt.nl.
Jan Sprenger
Affiliation:
Tilburg Center for Logic, General Ethics and Philosophy of Science (TiLPS), Tilburg University, P.O. Box 90153, 5000 LE Tilburg, the Netherlands. Email: j.sprenger@uvt.nl. URL: http://www.laeuferpaar.de.

Abstract:

Group judgements are often – implicitly or explicitly – influenced by their members’ individual expertise. However, given that expertise is seldom fully recognized and that distortions may occur (bias, correlation, etc.), it is not clear that differential weighting is an epistemically advantageous strategy relative to straight averaging. Our paper characterizes a wide set of conditions under which differential weighting outperforms straight averaging and embeds the results into the multidisciplinary group decision-making literature.

Type
Symposium on Individual and Social Deliberation
Copyright
Copyright © Cambridge University Press 

1. INTRODUCTION

Groups frequently make judgements that are based on aggregating the opinions of their individual members. A panel of market analysts at Apple or Samsung may estimate the expected sales of a newly developed cell phone. A group of conservation biologists may assess the population size of a particular species in a specific habitat. A research group at the European Central Bank may evaluate the merits of a particular monetary policy. Generally, such problems occur in any context where groups have to combine various opinions into a single group judgement (for a review, see Clemen 1989).

Even in cases of fully shared information, the assessment of the evidence will generally vary among the agents and depend on factors such as professional training, familiarity with similar situations in the past, and personal attitude toward the results. Thus, it will not come as a surprise that the individual judgements may differ. But how shall they be aggregated?

Often, some group members are more competent than others. Recognizing these experts may then become a crucial issue for improving group performance. Research in social psychology and management science has investigated the ability of humans to properly assess the expertise of other group members in such contexts (Clemen 1989; Bonner et al. 2002; Larrick et al. 2007). Most of this research stresses that recognizing experts is no easy task: perceived and actual expertise need not agree, data are noisy, questions may be too hard, and expertise differences may be too small to be relevant (e.g. Littlepage et al. 1995). This motivates a comparison of two strategies for group judgements: (i) deferring to the agent who is perceived as most competent, and (ii) taking the straight average of the estimates (Henry 1995; Soll and Larrick 2009). The overall outcomes suggest that the straight average is often surprisingly reliable, apparently being one of those ‘fast and frugal heuristics’ (Gigerenzer and Goldstein 1996) that help boundedly rational agents to make cost-effective decisions.

On the other hand, even if not explicitly recognized as such, experts tend to exert greater influence on group judgements than non-experts (Bonner et al. 2002). This motivates a principled epistemic analysis of the potential benefits of expertise-informed group judgements. We characterize conditions under which differentially weighted averages, fed by incomplete and perhaps distorted information on individual expertise, improve group performance compared with a straight average of the individual judgements. Our paper approaches this question from an analytical perspective, that is, with the help of a statistical model. We follow the social permutation approach (e.g. Bonner 2000) and model the agents as unique entities with different abilities. This differs notably from more traditional social combination research, where individual agents are modelled as interchangeable (e.g. Davis 1973). Our main result – that individual expertise makes a robust contribution to group performance – is somewhat surprising, given the generality of our conditions, which also allow for perturbations such as individual bias or correlations among the group members. Therefore, our analytical results provide theoretical support to research on the recognition of experts in groups (e.g. Baumann and Bonner 2004), and they relate directly to empirical comparisons of differentially weighted group judgements with ‘composite judgements’ such as the group mean or median (Einhorn et al. 1977; Hill 1982; Libby et al. 1987; Bonner 2004).

Our work is also related to two other research streams. First, there is a thriving epistemological literature on peer disagreement and rational consensus, where consensus is mostly reached by deference to (perceived) experts. However, this debate either focuses on social power and mutual respect relations (e.g. Lehrer and Wagner 1981), or on principled philosophical questions about resolving disagreement (e.g. Elga 2007). By means of a performance-focused mathematical model, we hope to bring this literature closer to its primary target: the truth-tracking abilities of various epistemic strategies. There is also a vast literature on preference and judgement aggregation in group decisions (e.g. List 2012), but two crucial features of our inquiry – the aggregation of numerical values and the particular role of experts – do not play a major role there.

Second, there is a fast-growing body of literature on expert judgement and forecasting, which emerged from applied mathematics and statistics and has become a flourishing interdisciplinary field. This strand of research deals with the theoretical modelling of expert judgement, most notably the (Bayesian) reconciliation of probability distributions (Lindley 1983), but it also includes more practical questions such as the comparison of calibration methods, the choice of seed variables, analyses of past uses of expert judgement (Cooke 1991), and the study of general forecasting principles, such as the benefits of opinion diversity (Armstrong 2001; Page 2007). We differ from that approach in pooling individual (frequentist) estimators instead of subjective probability distributions, but we study similar phenomena, such as the impact of in-group correlations.

Admittedly, our baseline model is very simple, but due to this simplicity, we are able to prove a number of results regarding the behaviour of differentially weighted estimates under correlation, bias and benchmark uncertainty. Here, our paper builds on analytical work in the forecasting and social psychology literature (Bates and Granger 1969; Hogarth 1978), following the approach of Einhorn et al. (1977).

The rest of the paper is structured as follows: we begin by explaining the model and stating conditions under which differentially weighted estimates outperform the straight average (Section 2). We then show that this relation is often preserved even if bias or mutual correlations are introduced (Sections 3 and 4). Subsequently, we assess the impact of over- and underconfidence (Section 5). Finally, we discuss our findings and wrap up our conclusions (Section 6).

2. THE MODEL AND BASELINE RESULTS

Our problem is to find a good estimate of an unknown quantity μ. For reasons of convenience, we assume without loss of generality that μ = 0.Footnote 1

We model the group members’ individual estimates $X_i$, $1 \le i \le n$, as independent random variables that scatter around the true value μ = 0 with variance $\sigma_i^2$. The $X_i$ are unbiased estimators of μ, that is, they have the property $\mathbb {E}[X_i]=\mu$. This baseline model is inspired by the idea that the agents try to approach the true value with a higher or lower degree of precision, but have no systematic bias in either direction. The competence of an agent is explicated as the degree of precision in estimating the true value. No further assumptions on the distributions of the $X_i$ are made – only the first and second moments are fixed.

In this model, the question of whether the recognition of individual expertise is epistemically advantageous translates into the question of which convex combination of the Xi, $\hat{\mu } :=\sum _{i=1}^n c_i X_i$, outperforms the straight average $\overline{\mu } := \frac{1}{n} \sum _{i=1}^n X_i$. Standardly, the quality of an estimate is assessed by its mean square error (MSE) which can be calculated as

(1) \begin{eqnarray} \text{MSE}(\hat{\mu }) \, := \, \mathbb {E}[(\hat{\mu } - \mu )^2] &=& \mathbb {E}\left[ \left( \sum _{i=1}^n c_i X_i \right)^2 \right] \nonumber \\ [6pt]&=& \sum _{i=1}^n c_i^2 \, \mathbb {E}\left[X_i^2 \right] + \sum _{i=1}^n \sum _{j \ne i} c_i c_j \, \mathbb {E}[X_i] \, \mathbb {E}[X_j] \nonumber \\ [6pt]&=& \sum _{i=1}^n c_i^2 \, \sigma _i^2 \end{eqnarray}

which is minimized by the following assignment of the $c_i$ (cf. Lehrer and Wagner 1981: 139):

(2) \begin{equation} c_i^* = \left( \sum _{j=1}^n \frac{\sigma _i^2 }{\sigma _j^2} \right)^{-1}. \end{equation}

Thus, calling the $c_i^*$ the ‘optimal weights’ is motivated by two independent theoretical reasons:

  1. As argued above, for independent and unbiased estimates $X_i$ with variance $\sigma_i^2$, the mean square error of the overall estimate is minimized by the convex combination $X = \sum_i c_i^* X_i$. Thus, for a standard loss function, the $c_i^*$ are indeed the optimal weights.

  2. Even when the square loss function is replaced by a more realistic alternative (Hartmann and Sprenger 2010), the $c_i^*$ can still define the optimal convex combination of individual estimates. In that case, we require stronger distributional assumptions.Footnote 2
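A minimal numerical sketch (ours, not from the paper) of equations (1) and (2): the optimal weights come out proportional to $1/\sigma_i^2$, and their analytic MSE beats that of the straight average.

```python
# Sketch (not from the paper): the optimal weights of equation (2) are
# proportional to 1/sigma_i^2, and by equation (1) the MSE of the weighted
# estimator is sum_i c_i^2 sigma_i^2 for independent, unbiased X_i.

def optimal_weights(variances):
    """Equation (2): c_i* = (sum_j sigma_i^2 / sigma_j^2)^(-1)."""
    inv = [1.0 / v for v in variances]
    total = sum(inv)
    return [x / total for x in inv]

def mse(weights, variances):
    """Equation (1): MSE of sum_i c_i X_i for independent unbiased X_i."""
    return sum(c * c * v for c, v in zip(weights, variances))

variances = [1.0, 4.0, 9.0]        # sigma_i^2 for three agents of varying skill
n = len(variances)
c_star = optimal_weights(variances)
straight = [1.0 / n] * n

assert abs(sum(c_star) - 1.0) < 1e-12          # a convex combination
assert mse(c_star, variances) < mse(straight, variances)
```

The minimal MSE works out to $(\sum_j 1/\sigma_j^2)^{-1}$, which the assertions can easily be extended to check.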

The problem with these optimal weights is that each agent’s individual expertise would have to be known in order to calculate them. Given all the biases that actual deliberation is loaded with, e.g. ascription of expertise due to professional reputation, age or gender, or bandwagon effects, it is unlikely that the agents succeed at unravelling the expertise of all other group members (cf. Nadeau et al. 1993; Armstrong 2001).

Therefore, we widen the scope of our inquiry:

Question: Under which conditions will differentially weighted group judgements outperform the straight average?

A first answer is given by the following result where the differential weights preserve the expertise ranking:

Theorem 1 (First Baseline Result). Let $c_1, \ldots, c_n > 0$ be the weights of the individual group members, that is, $\sum_{i=1}^n c_i = 1$. Without loss of generality, let $c_1 \le \ldots \le c_n$. Further assume that for all i > j:

(3) \begin{equation} 1 \le \frac{c_i}{c_j} \le \frac{c_i^*}{c_j^*} \end{equation}

Then the differentially weighted estimator $\hat{\mu} := \sum_{i=1}^n c_i X_i$ outperforms the straight average. That is, $\text{MSE}(\hat{\mu}) \le \text{MSE}(\overline{\mu})$, with equality if and only if $c_i = 1/n$ for all $1 \le i \le n$.

This result demonstrates that relative accuracy, as measured by pairwise expertise ratios, is a good guiding principle for group judgements as long as the relative weights are not too extreme.
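As an illustration of Theorem 1 (our own sketch, not the authors’ code), weights whose pairwise ratios are squeezed between 1 and the optimal ratios, here obtained as renormalized geometric means of equal and optimal weights, indeed beat equal weighting:

```python
# Sketch illustrating Theorem 1 (not the authors' code): weights whose
# pairwise ratios satisfy 1 <= c_i/c_j <= c_i*/c_j* for i > j beat the
# straight average.  We build such weights as renormalised geometric means
# of the equal and the optimal weights.
import math

variances = [9.0, 4.0, 1.0]                    # sigma_1^2 >= ... >= sigma_n^2
n = len(variances)

def mse(w):
    """Equation (1): analytic MSE for independent unbiased estimators."""
    return sum(wi * wi * v for wi, v in zip(w, variances))

inv = [1.0 / v for v in variances]
c_star = [x / sum(inv) for x in inv]           # optimal weights, increasing

raw = [math.sqrt((1.0 / n) * cs) for cs in c_star]
c = [x / sum(raw) for x in raw]                # intermediate weights

# Condition (3): for i > j, 1 <= c_i/c_j <= c_i*/c_j*.
for i in range(n):
    for j in range(i):
        assert 1.0 <= c[i] / c[j] <= c_star[i] / c_star[j] + 1e-12

assert mse(c) <= mse([1.0 / n] * n)
```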

The following result extends this finding to a case where the benefits of differential weighting are harder to anticipate: we allow the $c_i$ to lie anywhere in the interval $[1/n, c_i^*]$ (or $[c_i^*, 1/n]$), allowing for cases where the ranking of the group members is not represented correctly. One might conjecture that this phenomenon adversely affects performance, but this is not the case:

Theorem 2 (Second Baseline Result). Let $c_1, \ldots, c_n \in [0, 1]$ such that $\sum_{i=1}^n c_i = 1$. In addition, let $c_i \in [\frac{1}{n}, c_i^*]$ (respectively $c_i \in [c_i^*, \frac{1}{n}]$) hold for all $1 \le i \le n$. Then the differentially weighted estimator $\hat{\mu} := \sum_{i=1}^n c_i X_i$ outperforms the straight average. That is, $\text{MSE}(\hat{\mu}) \le \text{MSE}(\overline{\mu})$, with equality if and only if $c_i = 1/n$ for all $1 \le i \le n$.

Note that neither baseline result implies the other. The conditions of the second result can be satisfied even when the ranking of the group members differs from their actual expertise, and a violation of the second condition (e.g. $c_i^* = 1/n$ and $c_i = 1/n + \varepsilon$) is compatible with satisfaction of the first condition. So the two results are genuinely complementary.
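To see the Second Baseline Result at work on a ranking violation, here is a hand-picked configuration (our own illustration): the weights invert the expertise order of agents 1 and 2 yet still beat equal weighting.

```python
# Sketch illustrating Theorem 2 (hand-picked numbers, not from the paper):
# each c_i lies between 1/n and c_i*, the expertise ranking of agents 1
# and 2 is inverted, and differential weighting still beats equal weights.

variances = [9.0, 4.0, 1.0]
n = len(variances)
inv = [1.0 / v for v in variances]
c_star = [x / sum(inv) for x in inv]           # roughly [0.08, 0.18, 0.73]

c = [0.30, 0.20, 0.50]                         # sums to one
assert abs(sum(c) - 1.0) < 1e-12

# Interval condition of Theorem 2: c_i between 1/n and c_i*, in either order.
for ci, cs in zip(c, c_star):
    assert min(1.0 / n, cs) <= ci <= max(1.0 / n, cs)

assert c[0] > c[1]                             # ranking violated: agent 1 is worse

def mse(w):
    return sum(wi * wi * v for wi, v in zip(w, variances))

assert mse(c) < mse([1.0 / n] * n)
```

Note that these weights violate the ratio condition of Theorem 1 (since $c_2/c_1 < 1$), so the example is covered only by Theorem 2.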

We have thus shown that differential weighting outperforms straight averaging under quite general constraints on the individual weights, motivating the efforts to recognize experts in practice. The next sections extend these results to the presence of correlation and bias, thereby transferring them to more realistic circumstances.

3. BIASED AGENTS

The first extension of our model concerns biased estimates Xi, that is, estimates that do not centre around the true value μ = 0, but around Bi ≠ 0. We still assume that agents are honestly interested in getting close to the truth, but that training, experience, risk attitude or personality structure bias their estimates into a certain direction. For example, in assessing the impact of industrial development on a natural habitat, an environmentalist will usually come up with an estimate that significantly differs from the estimate submitted by an employee of an involved corporation – even if both are intellectually honest and share the same information.

For a biased agent i, the competence/precision parameter $\sigma_i^2$ has to be re-interpreted: it should be understood as the coherence (or non-randomness) of the agent’s estimates rather than their accuracy. This value is indicative of accuracy only if the bias $B_i$ is relatively small.

Under these circumstances, we can identify an intuitive sufficient condition for differential weighting to outperform straight averaging.

Theorem 3. Let $X_1, \ldots, X_n$ be random variables with bias $B_1, \ldots, B_n$.

(a) Suppose that the $c_i$ in the estimator $\hat{\mu}= \sum_{i=1}^n c_i X_i$ satisfy one of the conditions of the baseline results (i.e. either $1 \le c_i/c_j \le c_i^*/c_j^*$, or $c_i \in [1/n, c_i^*]$ respectively $c_i \in [c_i^*, 1/n]$). In addition, let the following inequality hold:

(4) \begin{equation} \left(\sum _{i=1}^{n} c_iB_i\right)^2 < \left(\sum _{i=1}^{n} \frac{1}{n}B_i\right)^2 \end{equation}

Then differential weighting outperforms straight averaging, that is, $\text{MSE}(\hat{\mu }) < \text{MSE}(\overline{\mu })$.

(b) Suppose the following inequality holds:

(5) \begin{equation} \left(\sum _{i=1}^{n} c_i B_i\right)^2>\left(\sum _{i=1}^{n}\frac{1}{n}B_i\right)^2 + \frac{1}{n^2}\sum _{i=1}^{n}\sigma ^2_i \end{equation}

Then differential weighting does worse than straight averaging, that is, $\text{MSE}(\hat{\mu }) > \text{MSE}(\overline{\mu })$.

Intuitively, condition (4) states that the differentially weighted bias is smaller in magnitude than the average bias. As one would expect, this property favourably affects the performance of the differentially weighted estimator. Condition (5) states, on the other hand, that if the difference between the squared biases of the weighted and the straight average exceeds the variance of the straight average, then straight averaging performs better than weighted averaging.
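The two conditions can be checked directly in the bias–variance decomposition of the MSE, $\text{MSE}(\hat{\mu}) = \sum_i c_i^2\sigma_i^2 + (\sum_i c_iB_i)^2$. A sketch with numbers of our own choosing:

```python
# Sketch for Theorem 3(a) (our own numbers): with bias, the MSE of a convex
# combination decomposes as sum(c_i^2 sigma_i^2) + (sum c_i B_i)^2.
# Condition (4) -- weighted bias smaller in magnitude than average bias --
# plus a baseline condition on the weights makes differential weighting win.

variances = [9.0, 4.0, 1.0]
biases = [1.5, 1.0, 0.5]                       # experts are also less biased
n = len(variances)
inv = [1.0 / v for v in variances]
c = [x / sum(inv) for x in inv]                # optimal weights, allowed by Thm 1
e = [1.0 / n] * n

def mse(w):
    var = sum(wi * wi * v for wi, v in zip(w, variances))
    bias = sum(wi * b for wi, b in zip(w, biases))
    return var + bias * bias

# Condition (4): squared weighted bias below squared average bias.
weighted_bias = sum(ci * b for ci, b in zip(c, biases))
average_bias = sum(biases) / n
assert weighted_bias ** 2 < average_bias ** 2

assert mse(c) < mse(e)
```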

When the group size grows very large, both parts of Theorem 3 collapse into a single condition, as long as the biases and variances are bounded. This is unsurprising, since the last term of (5) is of the order $\mathcal {O}(1/n)$. Theorem 3 applies in particular to the case where agents are biased in the same direction and less biased agents make more coherent estimates (that is, with smaller variance):

Corollary 1. Let $X_1, \ldots, X_n$ be random variables with bias $B_1, \ldots, B_n \ge 0$ such that $c_i \ge c_j$ implies $B_i \le B_j$ (or vice versa for $B_1, \ldots, B_n \le 0$). Then, with the same definitions as above:

  • $\text{MSE}(\overline{\mu }) \ge \text{MSE}(\hat{\mu })$.

  • If there is a uniform group bias, that is, $B := B_1 = \ldots = B_n$, then $\text{MSE}(\overline{\mu })-\text{MSE}(\hat{\mu })$ is independent of B.

So even if all agents have followed the same training, or have been raised in the same ideological framework, expertise recognition does not multiply that bias, but helps to increase the accuracy of the group’s judgement. In particular, if there is a uniform bias in the group, the relative advantage of differential weighting is independent of the size of the bias. All in all, these results demonstrate the importance of expertise recognition even in groups where the members share a joint bias – a finding that is especially relevant for practice.
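The second bullet of Corollary 1 can be checked in a few lines (our own sketch): with weights summing to one, a uniform bias B contributes $B^2$ to every estimator’s MSE, so it cancels in the comparison.

```python
# Sketch of Corollary 1's second bullet (our own numbers): under a uniform
# group bias B, the MSE advantage of differential weighting over straight
# averaging does not depend on B, since (sum c_i B)^2 = B^2 for any weights
# summing to one.

variances = [9.0, 4.0, 1.0]
n = len(variances)
inv = [1.0 / v for v in variances]
c = [x / sum(inv) for x in inv]
e = [1.0 / n] * n

def mse(w, B):
    return sum(wi * wi * v for wi, v in zip(w, variances)) + B * B

gap_small = mse(e, 0.1) - mse(c, 0.1)
gap_large = mse(e, 100.0) - mse(c, 100.0)
assert gap_small > 0                           # weighting still wins
assert abs(gap_small - gap_large) < 1e-9       # advantage independent of B
```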

4. INDEPENDENCE VIOLATIONS

We turn to violations of independence between the group members. Consider first the following fact that compares two groups with different degrees of correlation:

Fact 1. If $0\le \mathbb {E}\left[ X_i X_j \right]\le \mathbb {E}\left[ Y_i Y_j \right]$ for all $i\ne j \le n$ and $\mathbb {E}[X_i^2]=\mathbb {E}[Y_i^2]$ for all i, then both straight averaging and weighted averaging applied to the $X_i$ yield a lower mean square error than the same procedures applied to the $Y_i$.

Fact 1 shows that less correlated groups perform better, ceteris paribus. For practical purposes, this suggests that heterogeneity is an epistemic virtue of a group, since strong correlations between the agents are then less likely to occur, making the overall result more accurate (cf. Page 2007).

Regarding the comparison of straight and weighted averaging, we can show the following result:

Theorem 4. Let $X_1, \ldots, X_n$ be unbiased estimators, that is, $\mathbb {E}[X_i]=\mu =0$, and let the $c_i$ satisfy the conditions of one of the baseline results, with $\hat{\mu }$ defined as before. Let $I\subseteq \{1, \ldots, n\}$ be a subset of the group members with the property

(6) \begin{equation} \forall i, j \ne k \in I: c_i\ge c_j \; \Rightarrow \; \mathbb {E}[X_j X_k] \ge \mathbb {E}[X_i X_k] \ge 0. \end{equation}

(i) Correlation vs. Expertise. If I = {1, . . ., n}, then weighted averaging outperforms straight averaging, that is, $\text{MSE}(\hat{\mu }) \le \text{MSE}(\overline{\mu })$.

(ii) Correlated Subgroup. Assume that $\mathbb {E}\left[ X_i X_j \right] = 0$ if $i \notin I$ or $j \notin I$, and that

(7) \begin{equation} \frac{1}{|I|} \sum _{i\in I} c_i \le \frac{1}{n} \sum _{i=1}^n c_i. \end{equation}

Then weighted averaging still outperforms straight averaging, that is, $\text{MSE}(\hat{\mu }) \le \text{MSE}(\overline{\mu })$.

To fully understand this theorem, we have to clarify the meaning of condition (6). Basically, it says that in group I, experts are less correlated with other (sub)group members than non-experts.Footnote 3

Once this condition is understood, the rest is straightforward. Part (i) states that if I equals the entire group, then differential weighting has an edge over straight averaging. That is, the benefits of expertise recognition are not offset by the perturbations that mutual dependencies may introduce. Arguably, the generality of the result is surprising, since condition (6) is quite weak. Part (ii) states that differential weighting is also superior whenever the correlated subgroup is uncorrelated with the rest of the group, as long as the average weight in the subgroup does not exceed the group-wide average (see equation (7)).

It is a popular opinion (e.g. Surowiecki Reference Surowiecki2004) that correlation of individual judgements is one of the greatest dangers for relying on experts in a group. To some extent, this opinion is vindicated by Fact 1 in our model. However, expertise-informed group judgements may still be superior to composite judgements, as demonstrated by Theorem 4. The interplay of correlation and expertise is subtle and not amenable to broad-brush generalizations.
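A sketch of the correlated case (with invented covariances, not data from the paper): for zero-mean estimators, the MSE of any convex combination is the quadratic form $\mathbf{c}^{\mathsf{T}} S \mathbf{c}$ in the covariance matrix S, so Theorem 4(i) can be checked numerically.

```python
# Sketch for Theorem 4(i) with invented covariances: for zero-mean
# estimators with covariance matrix S, MSE(sum c_i X_i) = c^T S c.  Agent 3
# (the expert) is least correlated with the others, matching condition (6).

variances = [9.0, 4.0, 1.0]
S = [[9.0, 1.0, 0.2],                          # E[X_i X_j]; diagonal = variances
     [1.0, 4.0, 0.1],
     [0.2, 0.1, 1.0]]
n = len(variances)
inv = [1.0 / v for v in variances]
c = [x / sum(inv) for x in inv]                # weights increase with expertise
e = [1.0 / n] * n

def mse(w):
    return sum(w[i] * S[i][j] * w[j] for i in range(n) for j in range(n))

# Condition (6): c_i >= c_j implies E[X_j X_k] >= E[X_i X_k] >= 0 (k != i, j);
# the off-diagonal entries of S above are chosen to respect this.
assert mse(c) <= mse(e)
```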

5. OVER- AND UNDERCONFIDENCE

We now consider a specific family of weights $c_i$ in order to study how the group members’ assessment of their own quality affects the performance of the group as a whole; the individual estimates are again modelled as unbiased $X_i$ with variance $\sigma_i^2$.

Suppose that the group members have some idea of their own competence. That is, they are able to position themselves in relation to a commonly known benchmark: they can assess how much better or worse they expect to perform compared with a default agent, modelled as an unbiased random variable with variance $s^2$. Such a scenario is plausible when agents have a track record of their performance, or obtain performance feedback. The agents then express how much weight they should ideally get in a group of n − 1 default agents:

(8) \begin{equation} c_i \, = \, \left( 1 + \sum _{j \ne i} \frac{\sigma _i^2}{s^2} \right)^{-1} \, = \frac{s^2}{s^2+(n-1)\sigma _i^2} \end{equation}

Assume further that every agent uses the same benchmark, that these weights also determine to what extent a group member compromises his or her own position, and that decision-making takes place on the basis of the normalized $c_i$. It can then be shown (proof omitted) that the differentially weighted estimator $\hat{\mu }$ defined by equation (8) outperforms straight averaging – in fact, this is entailed by the Second Baseline Result (Theorem 2).

Here, we want to study how over- and underestimating the competence of the ‘default agent’ affects group performance. Is it always epistemically detrimental when the agents misjudge the group competence?

The answer is, perhaps surprisingly, no. To explain this result, we first observe that the less confidence we have in the group (i.e. the larger $s^2$ is), the more the weighted average resembles the straight average. Recalling equation (8), we note that all $c_i$ will then be very close to 1, so that the normalized weights are close to 1/n. This implies that the expertise-informed average will roughly behave like the straight average.

Conversely, if the group is perceived as competent (i.e. $s^2$ is small), then the $c_i$ will typically not be close to 1, so that the differential weights diverge significantly from the straight average. This intuitive insight leads to the following theorem:

Theorem 5. Let $\hat{\mu }_{s^2}$ and $\hat{\mu }_{{\tilde{s}}^2}$ be two weighted, expertise-informed estimates of μ, defined according to equation (8) with benchmarks $s^2$ and ${\tilde{s}}^2$, respectively. Then $\text{MSE}(\hat{\mu }_{s^2}) \le \text{MSE}(\hat{\mu }_{{\tilde{s}}^2})$ if and only if $s^2 \le {\tilde{s}}^2$.

It can also be shown (proof omitted) that this procedure approximates the optimal weights c*i if the perceived group competence approaches perfection, that is, s → 0. In other words, as long as the group members judge themselves accurately, optimism with regard to the abilities of the other group members is epistemically favourable. On the other hand, overconfidence in one’s own abilities relative to the group typically deteriorates performance.
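Equation (8) and the monotonicity claim of Theorem 5 can be illustrated with a few lines (our own sketch, with made-up variances):

```python
# Sketch of equation (8) and Theorem 5 (our own numbers): weights derived
# from a common benchmark s^2 and then normalised; a smaller benchmark
# (more confidence in the group) yields a lower MSE.

variances = [9.0, 4.0, 1.0]
n = len(variances)

def benchmark_weights(s2):
    """Equation (8), normalised to sum to one."""
    raw = [s2 / (s2 + (n - 1) * v) for v in variances]
    return [r / sum(raw) for r in raw]

def mse(w):
    return sum(wi * wi * v for wi, v in zip(w, variances))

confident = benchmark_weights(0.5)             # small s^2: near-optimal weights
sceptical = benchmark_weights(50.0)            # large s^2: weights near 1/n

assert mse(confident) <= mse(sceptical) <= mse([1.0 / n] * n)
```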

6. DISCUSSION

We have set up an estimation model of group decision-making in order to study the effects of individual expertise on the quality of a group judgement. We have shown that, in general, taking into account relative accuracy positively affects the epistemic performance of groups. Translated into our statistical model, this means that differential weighting outperforms straight averaging, even if the ranking of the experts is not represented accurately.

The result remains stable over several representative extensions of the model, such as various forms of bias, violations of independence, and over- and underconfident agents (Theorems 3–5). In particular, we demonstrated that differential weighting is superior (i) if experts are, on average, less biased; (ii) for a group of uniformly biased agents; (iii) if experts are less correlated with the rest of the group than other members. We also showed that uniform overconfidence in one’s own abilities is detrimental for group performance whereas (over)confidence in the group may be beneficial. These properties may be surprising and demonstrate the stability and robustness of expertise-informed judgements, implying that the benefits of recognizing experts may offset the practical problems linked with that process.

Our model can in principle also be used to describe how groups actually form judgements. In that case, the tasks involved should be neither too intellective (that is, having a demonstrable solution) nor too judgemental (Laughlin and Ellis 1986): in highly intellective tasks, the group will typically not perform better than its best individual (namely, the one who has solved the task correctly). This differs from our model, where any agent has only partial knowledge of the truth. On the other hand, if the task is too judgemental, any epistemic component is removed and the individual weights may actually be based on the centrality of a judgement, as in Hinsz’s (1999) SDS-Q scheme.

Finally, we name some distinctive traits of our model. First, unlike other models of group judgements that are detached from the group members’ individual abilities (Davis 1973; DeGroot 1974; Lehrer and Wagner 1981; Hinsz 1999), it is a genuinely epistemic model, evaluating the performance of different ways of making a group judgement.Footnote 4 Thus, our model can be used normatively, for supporting the use of differential weights in group decisions, but also descriptively, for fitting the results of group decision processes.

Second, we did not make any specific distributional assumptions about how the agents estimate the target value. Our assumptions merely concern the first and second moments (bias and variance). We consider this parsimony a prudent choice because those distributions will vary greatly in practice, and we do not have epistemic access to them. Classical work in the social combination literature makes much more specific distributional assumptions (e.g. the multinomial distributions in Thomas and Fink (1961) and Davis (1973)), restricting the scope of that analysis.

Third, we are not aware of other analytical models that take into account important confounders such as correlation, bias and over-/underconfident agents. Thus, we conclude that our model makes a substantial contribution to understanding the epistemic benefits of expertise in group judgements.

ACKNOWLEDGEMENTS

Dominik Klein thanks the Netherlands Organisation for Scientific Research (NWO) for supporting his research through Vidi grant no. 016.094.345, held by Eric Pacuit. Jan Sprenger thanks the Netherlands Organisation for Scientific Research (NWO) for supporting his research through Veni grant no. 016.104.079 and Vidi grant no. 016.144.342.

APPENDIX. PROOF OF THE THEOREMS

We will need the following inequalities repeatedly in the subsequent proofs. Let $c_1, \ldots, c_n > 0$. Then

(9) \begin{equation} \sum _{i=1}^n\frac{1}{c_i}\ge \frac{n^2}{\sum _{i=1}^n c_i} \end{equation}

with equality if and only if $c_1 = \ldots = c_n$. Moreover,

(10) \begin{equation} n\sum _{i=1}^n c_i^2 \ge \left(\sum _{i=1}^nc_i\right)^2 \end{equation}

again with equality if and only if $c_1 = \ldots = c_n$. Both inequalities are special cases of the Power Mean Theorem (cf. Wilf 1985: 258).
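Both inequalities are easy to verify numerically (a sketch of ours, not part of the proof):

```python
# Sketch: numeric check of inequalities (9) and (10), with equality exactly
# for constant sequences (instances of the Power Mean Theorem).

c = [0.5, 1.0, 2.0, 4.0]
n = len(c)

assert sum(1.0 / x for x in c) >= n * n / sum(c)     # inequality (9)
assert n * sum(x * x for x in c) >= sum(c) ** 2      # inequality (10)

const = [3.0] * n
assert abs(sum(1.0 / x for x in const) - n * n / sum(const)) < 1e-12
assert abs(n * sum(x * x for x in const) - sum(const) ** 2) < 1e-12
```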

For the First Baseline Result, we need the following

Lemma 1. Let k < n and let $(c_1, \ldots, c_n)$ be a sequence such that

  (1) $\sum_{i=1}^n c_i = s$ for some s > 0 and all $c_i$ are positive;

  (2) $c_1 = \ldots = c_k$ and $c_{k+1} = \ldots = c_n$;

  (3) $c_k \le c_{k+1}$ and $1\le \frac{c_{k+1}}{c_k}\le \frac{c_{k+1}^*}{c_k^*}$.

Further assume that $\sigma_1 \ge \ldots \ge \sigma_n$. Then

\begin{equation*} \sum _{i=1}^n \left(\frac{s}{n}\right)^2\sigma _i \ge \sum _{i=1}^n c_i^2\sigma _i \end{equation*}

Furthermore, we show that under the above conditions (i.e. $\sum_{i=1}^n c_i = s$), the value of the sum $\sum_{i=1}^n c_i^2\sigma_i$ decreases as the quotient $\frac{c_{k+1}}{c_k}$ increases.

Proof of Lemma 1. Fix r such that

  • $c_i=\frac{s}{n}-\frac{r}{k}$ for ik

  • $c_i=\frac{s}{n}+\frac{r}{n-k}$ for i > k

Then we have to show that:

\begin{equation*} \sum _{i\le k}\left(\frac{s}{n}-\frac{r}{k}\right)^2\sigma _i+\sum _{i>k}\left(\frac{s}{n}+\frac{r}{n-k}\right)^2\sigma _i-\sum _{i=1}^n \left(\frac{s}{n}\right)^2\sigma _i\le 0 \end{equation*}

The above inequality reduces to:

(11) \begin{equation} r^2\left(\sum _{i\le k}\frac{1}{k^2}\sigma _i+\sum _{i>k}\frac{1}{(n-k)^2}\sigma _i\right)-\frac{2s}{n}r\left(\sum _{i\le k}\frac{1}{k}\sigma _i-\sum _{i>k}\frac{1}{n-k}\sigma _i\right) \le 0 \end{equation}

Now the left-hand side of (11) is a quadratic function in r with zeros at 0 and

(12) \begin{equation} r_0=\frac{2s}{n}\frac{\sum _{i\le k}\frac{1}{k}\sigma _i-\sum _{i>k}\frac{1}{n-k}\sigma _i}{\sum _{i\le k}\frac{1}{k^2}\sigma _i+\sum _{i>k}\frac{1}{(n-k)^2}\sigma _i} \end{equation}

Since the $\sigma_i$ are ordered decreasingly, we get

\begin{equation*} r_0\ge \frac{2s}{n}\frac{\sum _{i\le k}\frac{1}{k}\sigma _i-\sigma _{k+1}}{\sum _{i\le k}\frac{1}{k^2}\sigma _i+\frac{1}{(n-k)}\sigma _{k+1}} \end{equation*}

Now this is a function of the form $\frac{kx-a}{x+b}$ with a, b > 0. Since these functions are increasing for x > −b, the inequality above can be strengthened to

\begin{equation*} r_0\ge \frac{2s}{n}\frac{\sigma _k-\sigma _{k+1}}{\frac{1}{k}\sigma _k+\frac{1}{(n-k)}\sigma _{k+1}} \end{equation*}

Recall that $\frac{c_{k+1}}{c_k}\le \frac{c_{k+1}^*}{c_k^*}=\frac{\sigma _k}{\sigma _{k+1}}=:\sigma$. Inserting this transforms the above equation into:

\begin{equation*} r_0\ge \frac{2s}{n}\frac{(\sigma -1)\sigma _{k+1}k(n-k)}{\sigma _{k+1}((n-k)\sigma +k)} \end{equation*}

Our assumptions about the $c_i$ translate into

\begin{equation*} \frac{\frac{s}{n}+\frac{r}{n-k}}{\frac{s}{n}-\frac{r}{k}}\le \frac{c_{k+1}^*}{c_k^*}=\frac{\sigma _k}{\sigma _{k+1}} \end{equation*}

This transforms to

\begin{equation*} r\le \frac{s}{n}\frac{(\sigma -1)k(n-k)}{(n-k)\sigma +k} \end{equation*}

In particular $r \le \frac{1}{2}r_0 \le r_0$, finishing the proof of (11). For the last statement of Lemma 1, observe that the left-hand side of (11) is a quadratic function in r with minimum at $\frac{1}{2}r_0$, and that $r\le \frac{1}{2}r_0$. $\Box$

Proof of Theorem 1. By assumption the $c_i$ are ordered increasingly, and thus the $\sigma_i$ are ordered decreasingly. For a vector of weights $\mathbf {w}\in \mathbb {R}^n$ (i.e. all $w_i$ positive and $\sum_i w_i = 1$), we denote the mean square error of the estimator $\sum_i w_i X_i$ by $\Psi(\mathbf{w})$, that is:

\begin{equation*} \Psi (\mathbf {w}):=\sum w_i^2\sigma _i \end{equation*}

Thus for $\mathbf{c} = (c_1, \ldots, c_n)$ as in the theorem we have to show $\Psi(\mathbf{c}) \le \Psi(\mathbf{e})$, where $\mathbf{e}$ is the equal-weight vector $(\frac{1}{n},\ldots ,\frac{1}{n})$. To this end we will construct a sequence of weight vectors $\mathbf{e} = d^0, \ldots, d^{n-1} = \mathbf{c}$ such that:

  (i) each $d^i$ satisfies the assumptions of Theorem 1;

  (ii) for $d^i = (d_1, \ldots, d_n)$, there is some $k \in \mathbb {N}$ such that

    • $d_1 = \ldots = d_k$ and $d_1 > c_1, \ldots, d_k > c_k$;

    • $d_j = c_j$ for $k < j \le k+i$ (where i is the index of $d^i$);

    • d k + i + 1 = . . . = dn and d k + i + 1c k + i + 1; . . .; dncn;

  3. (iii) Ψ(d i − 1) ⩾ Ψ(di).

Given such a sequence, $\Psi (\mathbf {c})=\Psi (d_{n-1})\le \ldots \le \Psi (d_0)=\Psi (\mathbf {e})$ as desired. The $d_i$ are constructed inductively as follows. Assume $d_{i-1} = (d^{\prime }_1, \ldots , d^{\prime }_n)$ has already been constructed. If $i = 1$, let $k$ be the unique index such that $c_k<\frac{1}{n}$ and $c_{k+1}\ge \frac{1}{n}$. If $i > 1$, let $k$ be as in the above conditions for $d_{i-1}$. First note that if $k = 0$, then $d^{\prime }_j \le c_j$ for all $j$, and thus $d_{i-1} = \mathbf {c}$ since both are weight vectors, and we are done. Thus assume $k \ge 1$ for the rest of the proof. With a similar argument, we can show that $k + i + 1 \le n$. Now choose the maximal $r \in \mathbb {R}$ that satisfies

(13) \begin{equation} d^{\prime }_k-c_k \ge \frac{r}{k} \quad c_{k+i+1}-d^{\prime }_{k+i+1} \ge \frac{r}{n-k-i-1} \end{equation}

By the above conditions, $r \ge 0$. Then define $d_i = (d_1, \ldots , d_n)$ by:

  • $d_j = d^{\prime }_j-\frac{r}{k}$ for jk;

  • $d_j = c_j$ for $k < j \le k+i$;

  • $d_j = d^{\prime }_j+\frac{r}{n-k-i-1}$ for jk + i + 1.

To see that $d_i$ satisfies conditions (i)–(iii), first note that since $r$ was chosen to be maximal, one of the two inequalities in (13) has to be an equality. Thus we either have $d_k = c_k$ or $d_{k+i+1} = c_{k+i+1}$, and condition (ii) is satisfied. Further note that

\begin{equation*} \sum _{j=1}^n d_j=\sum _{j=1}^n d^{\prime }_j -\sum _{j\le k}\frac{r}{k}+\sum _{j\ge k+i+1}\frac{r}{n-k-i-1}=1 \end{equation*}

Using that the $c_i$ are ordered increasingly, it is easy to see that $d_i$ satisfies the assumptions of Theorem 1. Furthermore, applying the monotonicity part of Lemma 1 to the set of indices $I := \lbrace 1, \ldots , k\rbrace \cup \lbrace k+i+1, \ldots , n\rbrace$, we get $\sum _{j\in I} d_j^2\sigma _j \le \sum _{j \in I} (d^{\prime }_j)^2\sigma _j$. Thus $\Psi (d_i) \le \Psi (d_{i-1})$ since $d_{i-1}$ and $d_i$ coincide outside $I$. This finishes the proof. $\Box$
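As a quick numerical sanity check of Theorem 1 (not part of the proof), the following sketch evaluates $\Psi$ on weight vectors lying on the segment between the equal-weight vector and the inverse-variance optimal weights; such vectors are one family satisfying the theorem's assumptions. All parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
var = np.sort(rng.uniform(0.5, 4.0, n))   # variances of the independent, unbiased X_i

def mse(w, var):
    # Psi(w): mean square error of the estimator sum_i w_i X_i
    return np.sum(w**2 * var)

e = np.full(n, 1.0 / n)                   # equal-weight vector
c_star = (1.0 / var) / np.sum(1.0 / var)  # optimal inverse-variance weights

for t in np.linspace(0.0, 1.0, 11):
    c = (1 - t) * e + t * c_star          # weights between equal and optimal
    assert mse(c, var) <= mse(e, var) + 1e-12
```

Since $\Psi$ is a convex quadratic and $\Psi (\mathbf {c}^*)\le \Psi (\mathbf {e})$, every point on this segment satisfies the inequality, matching the theorem's conclusion.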

Proof of Theorem 2. We would like to show that the mean square error of the straight average $\overline{\mu } := (1/n) \sum _{i=1}^n X_i$ exceeds the mean square error of the weighted estimate $\hat{\mu } := \sum _{i=1}^n c_i X_i$. The MSE difference can be calculated as

\begin{eqnarray*} \Delta (c_1, \ldots , c_n) &:=& \text{MSE}\left( \overline{\mu } \right) - \text{MSE}\left( \hat{\mu } \right) \, = \, \frac{1}{n^2} \sum _{i=1}^n \sigma _i^2 - \sum _{i=1}^n c_i^2 \sigma _i^2 \nonumber \\ &=& \frac{1}{n^2} \left( \sum _{j=1}^n \frac{1}{\sigma _j^2} \right)^{-1} \sum _{i=1}^{n} \frac{1}{c_i^*} \left( 1-n^2c_i^2 \right) \nonumber \end{eqnarray*}

where we have made use of $\mathbb {E}[X_i \, X_j] = 0, \; \forall i \ne j$, and of $c_i^* = ( \sum _{j=1}^n \frac{\sigma _i^2 }{\sigma _j^2} )^{-1}$ (cf. equation (2)). Thus, instead of considering Δ, it suffices to show that

\begin{equation*} \Delta ^{\prime }(c_1\ldots c_n) := \sum _{i=1}^n \frac{1}{c_i^*}\left(1-n^2c_i^2\right) \ge 0. \end{equation*}
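The algebraic reduction from $\Delta$ to $\Delta ^{\prime }$ can be checked numerically; a small sketch (variable names and values illustrative), assuming only the definition of $c_i^*$ from equation (2):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
var = rng.uniform(0.5, 4.0, n)                   # sigma_i^2
c_star = (1.0 / var) / np.sum(1.0 / var)         # optimal weights c_i*
c = rng.dirichlet(np.ones(n))                    # any weight vector summing to one

delta = np.sum(var) / n**2 - np.sum(c**2 * var)  # MSE(mean) - MSE(weighted estimate)
delta_prime = np.sum((1.0 / c_star) * (1.0 - n**2 * c**2))
scale = (1.0 / n**2) / np.sum(1.0 / var)

assert abs(delta - scale * delta_prime) < 1e-9   # the two expressions agree
```

The identity holds for any weight vector summing to one, which is what licenses working with $\Delta ^{\prime }$ in the remainder of the proof.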

To this end, let $I_i := [1/n, c_i^*]$ (respectively $[c_i^*, 1/n]$) and let $\mathcal {Q}:= I_1\times \ldots \times I_n$. Then,

\begin{equation*} \mathcal {D} := \mathcal {Q} \cap \left\lbrace (c_1, \ldots , c_n) | \sum _{i=1}^n c_i =1 \right\rbrace \end{equation*}

defines the ‘domain’ of our theorem, and it is a polytope. Moreover, since $\sum _i \frac{n^2}{c_i^*}c_i^2$ is a positive definite quadratic form in the $c_i$, we get that $\Delta ^{\prime -1}([0, \infty ))$ is convex. Thus, it suffices to show that $\Delta ^{\prime }$ is positive on the vertices of $\mathcal {D}$. Note that since $\lbrace x \mid \sum x_i = 1\rbrace$ is of dimension $n-1$, the vertices of $\mathcal {D}$ are of the form $v = (c_1^*, \ldots , c_{k-1}^*, c_k, 1/n, \ldots , 1/n)$ – the ordering is assumed for convenience, and $c_k$ is defined such that $\Vert v\Vert _1 = 1$. Thus we have to show that $\Delta ^{\prime }(c_1^*, \ldots , c_{k-1}^*, c_k, 1/n, \ldots , 1/n) \ge 0$.

In the case $k = 1$, the desired inequality holds trivially since $c_k = 1 - (n-1) \cdot \frac{1}{n} = \frac{1}{n}$. Thus we assume $k > 1$ for the remainder of this proof. Let $l$ denote the real number satisfying

\begin{equation*} \sum _{i=1}^{k-1} c_i^*=l\frac{k-1}{n} \end{equation*}

Observe that for $c_i=\frac{1}{n}$ the corresponding summands in Δ′ vanish. Thus we have to show that

\begin{equation*} \sum _{i=1}^{k-1} \frac{1}{c_i^*}\left(1-n^2{c_i^*}^2\right)+\frac{1}{c_k^*}\left(1-n^2c_k^2\right)\ge 0 \end{equation*}

Using the definition of $l$ from above and inequality (9) gives $\sum _{i=1}^{k-1}\frac{1}{c_i^*} \ge (k-1)^2/(\sum _{i=1}^{k-1} c_i^*) \ge \frac{n(k-1)}{l}$. Thus, it suffices to show

(14) \begin{equation} n (k-1) \left( \frac{1}{l} - l \right) + \frac{1}{c_k^*}\left(1-n^2c_k^2\right) \ge 0 \end{equation}

Since the $c_i$ add up to one, we can express the dependency between $l$ and $c_k$ by

(15) \begin{equation} c_k = \frac{(k-1)(1-l)+1}{n} \quad \text{or by } \quad l = \frac{k-nc_k}{k-1} \end{equation}

Inserting this into (14), the left-hand side of (14) becomes

\begin{eqnarray*} && \left( \frac{1}{l}-l \right) n(k-1)- \frac{1}{c_k^*} \left( (1-l)^2(k-1)^2+2(1-l)(k-1) \right) \nonumber \\ &=& \frac{k-1}{l}\left[\left(1-l^2\right)n-\frac{l}{c_k^*}\left((1-l)^2(k-1)+2(1-l)\right) \right] \nonumber \\ &=& \frac{k-1}{l}\left[(1-l)\left((1+l)n-\frac{l}{c_k^*}((1-l)(k-1)+2)\right) \right] \nonumber \end{eqnarray*}

Since the first factor is always positive, it suffices to show that the factor in the square brackets, denoted by $P(l)$, is positive for every $l$ that can occur in our setting. We do this by a case distinction on the value of $c_k^*$.

Case 1: $c_k^* \le 1/n$. Noting $c_k \in [c_k^*,\frac{1}{n}]$ and the dependency (15) between $l$ and $c_k$, we have to show that $P(l) \ge 0$ for all $l \in [1,\frac{k-nc_k^*}{k-1}]$. We observe that $P$ is a polynomial of third order, with roots given by $P(1) = 0$ and

\begin{equation*} r_\pm = \frac{k+1-nc_k^*\pm \sqrt{(k+1-nc_k^*)^2-4(k-1)c_k^*n}}{2(k-1)} \end{equation*}

with $r_+$ denoting the larger of these two numbers. With some algebra it also follows that $P^{\prime }(1) \ge 0$ if and only if $c_k^* \le 1/n$. From the functional form of $P(l)$ – a polynomial of the third degree with negative leading coefficient – we can then infer that $l = 1$ must be the middle root of $P$. To prove that $P(l) \ge 0$ in the critical interval, it remains to show that for the rightmost root, we have $r_+ \ge \frac{k-nc_k^*}{k-1}$:

\begin{eqnarray*} \begin{array}{lc@{\quad}cl} &\displaystyle\frac{k-nc_k^*}{k-1} &\le & r_+\\[4pt] \Leftrightarrow & \displaystyle\frac{2k-2nc_k^*}{2(k-1)} & \le & \displaystyle\frac{k+1-c_k^*n+\sqrt{(k+1-nc_k^*)^2-4(k-1)c_k^*n}}{2(k-1)}\\[12pt] \Leftrightarrow & k-1-nc_k^* &\le & \sqrt{(k+1-nc_k^*)^2-4(k-1)c_k^*n}\\[4pt] \Leftrightarrow & c_k^*n &\le & 1 \end{array} \end{eqnarray*}

completing the proof for the case $c_k^* \le 1/n$.

Case 2: $c_k^* \ge 1/n$. In this case we are dealing with the interval $l \in [\frac{k-nc_k^*}{k-1},1]$. The same calculations as above yield

\begin{equation*} \frac{k-nc_k^*}{k-1} \ge r_+ \qquad \text{if and only if} \qquad c_k^*n \ge 1. \end{equation*}

In particular, $r_+ < 1$. Thus $l$ always lies between the middle and the rightmost root of $P(l)$, and in particular, $P(l) \ge 0$ for all $l \in [\frac{k-nc_k^*}{k-1},1]$. $\Box$

Proof of Theorem 3. Let the $X_i$ centre around $B_i > 0$. Then $\mathbb {E}[X_i - B_i] =0$, and we observe

\begin{eqnarray*} \mathbb {E}\left[ \left( \frac{1}{n} \sum _{i=1}^n X_i \right)^2 \right] &=& \mathbb {E}\left[ \left( \frac{1}{n} \sum _{i=1}^n (X_i-B_i) \right)^2 \right] +\left(\frac{1}{n}\sum _{i=1}^n B_i\right)^2 \end{eqnarray*}

Analogously, we obtain

\begin{eqnarray*} \mathbb {E}\left[ \left( \sum _{i=1}^n c_i X_i \right)^2 \right] &=& \mathbb {E}\left[ \left( \sum _{i=1}^n c_i (X_i-B_i) \right)^2 \right] + \left( \sum _{i=1}^n c_i B_i\right)^2. \end{eqnarray*}
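Both decompositions can be verified by Monte Carlo simulation; the following sketch uses Normal estimators with hypothetical biases and variances (the decomposition itself only requires $\mathbb {E}[X_i - B_i] = 0$ and independence):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 4, 200_000
B = rng.uniform(0.1, 1.0, n)              # biases B_i (positive, as in the text)
sd = rng.uniform(0.5, 2.0, n)             # standard deviations of the X_i
c = rng.dirichlet(np.ones(n))             # weights summing to one

X = B + sd * rng.standard_normal((m, n))  # X_i centred around B_i, independent

lhs = np.mean((X @ c) ** 2)                        # E[(sum c_i X_i)^2]
rhs = np.mean(((X - B) @ c) ** 2) + (c @ B) ** 2   # centred term plus squared bias
assert abs(lhs - rhs) < 0.05                       # agree up to sampling error
```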

As in Theorem 2, we define $\Delta (c_1, \ldots , c_n) := \text{MSE}(\overline{\mu }) - \text{MSE}(\hat{\mu })$ as the difference in mean square error between both estimates and show that $\Delta (c_1, \ldots , c_n) \ge 0$ if equation (4) is satisfied.

(16) \begin{eqnarray} \Delta (c_1, \ldots , c_n) &:=& \mathbb {E}\left[ \left( \frac{1}{n} \sum _{i=1}^n (X_i-B_i) \right)^2 \right] - \mathbb {E}\left[ \left(\sum _{i=1}^n c_i (X_i-B_i) \right)^2 \right] \nonumber \\ [6pt]&+& \left( \frac{1}{n}\sum _{i=1}^n B_i \right)^2 - \left( \sum _{i=1}^n c_i B_i\right)^2 \end{eqnarray}

By Theorem 1 and/or Theorem 2, the first line is greater than or equal to zero, and by equation (4), the second line is also non-negative. Thus $\Delta (c_1, \ldots , c_n) \ge 0$, showing the superiority of differential weighting.

For the second part of the theorem, we just observe that

\begin{equation*} \mathbb {E}\left[ \left( \frac{1}{n} \sum _{i=1}^n (X_i-B_i) \right)^2 \right] - \mathbb {E}\left[ \left( \sum _{i=1}^n c_i (X_i-B_i) \right)^2 \right] \ge \frac{1}{n^2}\sum _{i=1}^{n}\sigma ^2_i. \end{equation*}

$\Box$

Proof of Corollary 1. It is easy to see that the conditions of the corollary satisfy the requirements of part (a) of Theorem 3. This yields the desired result for the first part of the corollary. For the second part, let the $X_i$ all centre around $B \ne 0$. Then $X_i - B$ is unbiased, and we observe

\begin{eqnarray*} \mathbb {E}\left[ \left( \frac{1}{n} \sum _{i=1}^n X_i \right)^2 \right] &=& \mathbb {E}\left[ \left( \frac{1}{n} \sum _{i=1}^n (X_i-B) \right)^2 \right] + B^2 \\ [6pt]\mathbb {E}\left[ \left( \sum _{i=1}^n c_i X_i \right)^2 \right] &=& \mathbb {E}\left[ \left( \sum _{i=1}^n c_i (X_i-B) \right)^2 \right] + B^2. \end{eqnarray*}

Therefore, under the conditions of the theorem,

\begin{eqnarray*} \Delta (c_1, \ldots , c_n) &=& \mathbb {E}\left[ \left( \frac{1}{n} \sum _{i=1}^n (X_i-B) \right)^2 \right] - \mathbb {E}\left[ \left( \sum _{i=1}^n c_i (X_i-B) \right)^2 \right] \end{eqnarray*}

showing that $\Delta$ only depends on the centred estimates. $\Box$

Proof of Fact 1. First we deal with straight averaging:

\begin{eqnarray*} &&\mathbb {E}\left[ \left( \frac{1}{n} \sum _{i=1}^n X_i \right)^2 \right] - \mathbb {E}\left[ \left( \frac{1}{n} \sum _{i=1}^n Y_i \right)^2 \right] \nonumber\\ &&\quad = \frac{1}{n^2} \sum _{i=1}^n \sum _{j\ne i} \mathbb {E}\left[ X_i X_j \right] - \frac{1}{n^2} \sum _{i=1}^n \sum _{j\ne i} \mathbb {E}\left[ Y_i Y_j \right] \, \ge \, 0 \end{eqnarray*}

The proof exploits that $X_i$ and $Y_i$ have the same variance, thus $\mathbb {E}\left[ X_i^2\right]=\mathbb {E}\left[ Y_i^2 \right]$. The proof for differential weights is similar, making use of the fact that the $c_i$ are the same for $X_i$ and $Y_i$ because they only depend on the variance of the random variable. $\Box$
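The straight-averaging half of Fact 1 can be illustrated in the special case of equicorrelated, zero-mean estimators; a sketch with hypothetical variances and correlation levels:

```python
import numpy as np

n = 4
var = np.array([1.0, 1.5, 2.0, 2.5])     # common variances of X_i and Y_i
rho_x, rho_y = 0.4, 0.1                  # so E[X_i X_j] >= E[Y_i Y_j] for i != j

def mse_mean(var, rho):
    # MSE of the straight average of zero-mean, equicorrelated estimators
    sd = np.sqrt(var)
    cov = rho * np.outer(sd, sd)         # off-diagonal covariances
    np.fill_diagonal(cov, var)           # variances on the diagonal
    w = np.full(n, 1.0 / n)
    return w @ cov @ w

# the more highly correlated group has the larger mean square error
assert mse_mean(var, rho_x) >= mse_mean(var, rho_y)
```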

Proof of Theorem 4, part (i). First, assume without loss of generality that $c_i \ge c_{i+1}$ for all $i < n$. Thus, our assumption on the $\mathbb {E}[X_i X_j]$ reduces to $\mathbb {E}[X_iX_k]\le \mathbb {E}[X_jX_k]$ for $i \le j \le k$. We first show the theorem under the assumption that all $\mathbb {E}[X_i X_j]$ with $i \ne j$ are equal, say $\mathbb {E}[X_i X_j]=\gamma$. By Theorem 1 and/or 2, it suffices to show that

\begin{equation*} \frac{1}{n^2} \sum _{i=1}^n \sum _{j\ne i} \mathbb {E}\left[ X_iX_j \right] - \sum _{i=1}^n \sum _{j\ne i} c_ic_j \, \mathbb {E}\left[ X_iX_j \right]\ge 0 \end{equation*}

Inserting $\mathbb {E}[X_i X_j]=\gamma$ this reduces to

(17) \begin{equation} \gamma \cdot \left(\frac{n-1}{n} - \sum _{i=1}^n \sum _{j\ne i} c_i \, c_j \, \right) \ge 0 \end{equation}

The point $(1/n, \ldots , 1/n)$ is a global minimum of the function $f(x) = \sum _i x_i^2$ under the constraints $x_1, \ldots , x_n \ge 0$ and $\sum _i x_i = 1$. Thus we have

(18) \begin{equation} \frac{1}{n}=f\left(\frac{1}{n},\ldots ,\frac{1}{n}\right) \le f\left(\mathbf {c}\right) = \sum _{i=1}^n {c_i}^2 \end{equation}

Observing $\sum _{i=1}^n \sum _{j=1}^n c_i c_j = (\sum _{i=1}^n c_i)^2 = 1$ and combining this equality with (17) and (18), we obtain

(19) \begin{eqnarray} \frac{n-1}{n} - \sum _{i=1}^n \sum _{j\ne i} c_i \, c_j &=& \frac{n-1}{n} - \sum _{i=1}^n \sum _{j=1}^n c_i \, c_j + \sum _{i=1}^n {c_i}^2 \ge 0 \end{eqnarray}

thus proving the statement in the case that all $\mathbb {E}[X_i X_j]$ are the same.
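Inequality (19), on which the equal-correlation case rests, holds for any weight vector; a one-line numerical check (illustrative values):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 7
c = rng.dirichlet(np.ones(n))                     # arbitrary weights summing to one
off_diag = np.sum(np.outer(c, c)) - np.sum(c**2)  # sum over i != j of c_i c_j
assert (n - 1) / n - off_diag >= -1e-12           # the bracket in (17) is non-negative
```

Since the off-diagonal sum equals $1 - \sum _i c_i^2$, the check amounts to $\sum _i c_i^2 \ge 1/n$, which is (18).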

For the general case, let us assume that not all $c_i$ are the same (otherwise the theorem is trivially true). Thus we either have $c_1 > c_{n-1}$ or $c_2 > c_n$, since the $c_i$ are ordered decreasingly. In the following, we assume $c_2 > c_n$; the other case works with a similar argument. First observe that

\begin{equation*} \sum _{i=1}^n \sum _{j\ne i} c_ic_j \, \mathbb {E}\left[ X_iX_j \right]=2\sum _{i=1}^n \sum _{j<i} c_ic_j \, \mathbb {E}\left[ X_iX_j \right]. \end{equation*}

Thus, we can concentrate on $\lbrace \mathbb {E}[X_i X_j] \mid i>j\rbrace$. We fix a real number $c$ and let $S_c$ be the set of all vectors $(\mathbb {E}[X_i X_j])_{i>j}$ fulfilling the conditions of our theorem and satisfying $\sum _{i> j}\mathbb {E}[X_iX_j]=c$. We then consider the functional

\begin{eqnarray*} \tilde{\varphi }(e) &:=& \frac{1}{n^2} \displaystyle\sum _{i=1}^n \sum _{j<i} \mathbb {E}\left[ X_iX_j \right] - \,\sum _{i=1}^n \sum _{j<i} c_ic_j \, \mathbb {E}\left[ X_iX_j \right] \\ &=& \displaystyle\frac{1}{2}\left[\frac{1}{n^2}\sum _{i=1}^n \sum _{j\ne i} \mathbb {E}\left[ X_iX_j \right] - \,\sum _{i=1}^n \sum _{j\ne i} c_ic_j \, \mathbb {E}\left[ X_iX_j \right]\right] \end{eqnarray*}

on $S_c$. Observe that every $S_c$ contains exactly one point $e_{eq}$ where all $\mathbb {E}[X_i X_j]$ are equal. By the first part of this proof, $\tilde{\varphi }(e_{eq})$ is non-negative. Thus, it suffices to show that $e_{eq}$ is an absolute minimum of $\tilde{\varphi }$ on $S_c$. First, observe that the value of $\frac{1}{n^2} \sum _{i=1}^n \sum _{j<i}\mathbb {E}\left[ X_iX_j \right]$ is constantly $\frac{c}{n^2}$ on $S_c$; thus it suffices to show that

(20) \begin{equation} \varphi (e):=\sum _{i=1}^n \sum _{j<i} c_ic_j \, \mathbb {E}\left[ X_i X_j \right] \end{equation}

attains its maximum on Sc in eeq.

To do so, we show the following: for every $e \in S_c$ with $e \ne e_{eq}$ there is some $e^{\prime } \in S_c$ with $\varphi (e^{\prime }) > \varphi (e)$; in particular, $\varphi$ does not attain its maximum on $S_c$ in $e$. Thus assume that $e=(\mathbb {E}[X_i X_j])_{i>j}\in S_c$ is given. Since $e \ne e_{eq}$ there are some indices $s > t$ and $k > l$ such that $\mathbb {E}[X_s X_t]\ne \mathbb {E}[X_k X_l]$. Furthermore, we can assume that $t \le l$. Without loss of generality (by potentially replacing one of the two entries with $\mathbb {E}[X_sX_l]$), we can assume that either $s = k$ or $t = l$. In the following we assume $s = k$; the other case works similarly. The idea of the following construction is to show that moving towards a more equal distribution of the entries $\mathbb {E}[X_iX_j]$ increases $\varphi (e)$. In particular, we construct $e^{\prime }=(\mathbb {E}^{\prime }[X_i X_j])_{i>j}\in S_c$ as follows: in every row $r_i:=\left\langle \mathbb {E}[X_i X_1], \ldots , \mathbb {E}[X_i X_{i-1}]\right\rangle$ of $e$, we replace all entries by their arithmetic mean. Formally, for all $i$ and $j$ (independently of $j$):

\begin{equation*} \mathbb {E}^{\prime }[X_i X_j]=\frac{1}{i-1}\sum _{l<i}\mathbb {E}[X_i X_l] \end{equation*}

Trivially this operation satisfies for all i:

\begin{equation*} \sum _{j<i}\mathbb {E}[X_i X_j]=\sum _{j<i}\frac{1}{i-1}\sum _{l<i}\mathbb {E}[X_i X_l]=\sum _{j=1}^{i-1}\mathbb {E}^{\prime }[X_i X_j] \end{equation*}

and thus also for the double sum:

\begin{equation*} \sum _{i=1}^n\sum _{j<i}\mathbb {E}[X_i X_j]=\sum _{i=1}^n\sum _{j<i}\mathbb {E}^{\prime }[X_i X_j]. \end{equation*}

In particular, $e^{\prime }$ is in $S_c$. Furthermore, we have assumed that the $c_i$ are ordered decreasingly. Recall that $c_k > c_j$ implies $\mathbb {E}[X_i X_k]\le \mathbb {E}[X_i X_j]$ by assumption; therefore the rows $r_i$ were ordered increasingly, and thus the rows of $e^{\prime } - e$:

\begin{equation*} \mathbb {E}^{\prime }[X_iX_1]-\mathbb {E}[X_iX_1];\ldots ;\mathbb {E}^{\prime }[X_iX_{i-1}]-\mathbb {E}[X_iX_{i-1}] \end{equation*}

are ordered decreasingly (since the rows of e′ are constant). In particular, we have for any i:

(21) \begin{equation} 0=\sum _{j<i}\left(\mathbb {E}^{\prime }[X_iX_j]-\mathbb {E}[X_iX_j]\right)\le \sum _{j<i}c_ic_j\left(\mathbb {E}^{\prime }[X_iX_j]-\mathbb {E}[X_iX_j]\right) \end{equation}

where the inequality comes from the fact that both $c_j$ and $\mathbb {E}^{\prime }[X_iX_j]-\mathbb {E}[X_iX_j]$ are decreasing in $j$. Summing this up over all $i$, we get that

\begin{equation*} 0=\sum _{i=1}^n\sum _{j<i}\left(\mathbb {E}^{\prime }[X_iX_j]-\mathbb {E}[X_iX_j]\right)\le \sum _{i=1}^n\sum _{j<i}c_ic_j\left(\mathbb {E}^{\prime }[X_iX_j]-\mathbb {E}[X_iX_j]\right)=\varphi (e^{\prime })-\varphi (e) \end{equation*}

Thus we have φ(e′) ⩾ φ(e) as desired. Now observe that (21) for i = s is the following:

\begin{equation*} \begin{array}{rcl} 0&=&\sum _{j<s}\mathbb {E}^{\prime }[X_sX_j]-\mathbb {E}[X_sX_j]\\ &=&\sum _{j<s,j\ne t,\;l}\left(\mathbb {E}^{\prime }[X_sX_j]-\mathbb {E}[X_sX_j]\right)+\mathbb {E}^{\prime }[X_sX_t]-\mathbb {E}[X_sX_t]+\mathbb {E}^{\prime }[X_sX_l]-\mathbb {E}[X_sX_l] \end{array} \end{equation*}

with both,

\begin{equation*} \sum _{j<s,j\ne t,\;l}\mathbb {E}^{\prime }[X_sX_j]-\mathbb {E}[X_sX_j]\le \sum _{j<s,j\ne t,\;l}c_sc_j\left(\mathbb {E}^{\prime }[X_sX_j]-\mathbb {E}[X_sX_j]\right) \end{equation*}

and

\begin{equation*} \begin{array}{ll} &\mathbb {E}^{\prime }[X_sX_t]-\mathbb {E}[X_sX_t]+\mathbb {E}^{\prime }[X_sX_l]-\mathbb {E}[X_sX_l]\\ &\le c_sc_t(\mathbb {E}^{\prime }[X_sX_t]-\mathbb {E}[X_sX_t])+c_sc_l(\mathbb {E}^{\prime }[X_sX_l]-\mathbb {E}[X_sX_l]). \end{array} \end{equation*}

By construction we have $\mathbb {E}[X_s X_t]\ne \mathbb {E}[X_s X_l]$; thus we would have a strict inequality in the last summand (and thus in the entire sum) if we knew that $c_t > c_l$. Unfortunately, this is not always the case. However, we have put ourselves in a situation where applying the same construction again, with $\mathbb {E}^{\prime }[X_2X_1]$ and $\mathbb {E}^{\prime }[X_nX_1]$ replacing $\mathbb {E}[X_sX_t]$ and $\mathbb {E}[X_sX_l]$, yields the desired result (since we have assumed that $c_2 > c_n$). To see this, observe that

  • $\mathbb {E}[X_2X_1]=\mathbb {E}^{\prime }[X_2X_1]$ by construction

  • $\mathbb {E}^{\prime }[X_sX_1]>\mathbb {E}[X_sX_1]$, since $\mathbb {E}[X_sX_t]\ne \mathbb {E}[X_sX_l]$ and $\mathbb {E}[X_sX_1]$ is the minimal element in the row $r_s$

  • $\mathbb {E}[X_2X_1]\le \mathbb {E}[X_sX_1]$ by assumption

Thus we have

\begin{equation*} \mathbb {E}^{\prime }[X_2X_1]=\mathbb {E}[X_2X_1]\le \mathbb {E}[X_sX_1]<\mathbb {E}^{\prime }[X_sX_1]\le \mathbb {E}^{\prime }[X_nX_1] \end{equation*}

By assumption we have $c_2 > c_n$, and repeating the construction from above, with columns replacing rows and $\mathbb {E}^{\prime }[X_2X_1], \mathbb {E}^{\prime }[X_nX_1]$ as the two reference points, yields the desired result. $\Box$
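The ordering argument used via (21) is an instance of Chebyshev's sum inequality: a decreasing sequence summing to zero, weighted by decreasing positive coefficients, yields a non-negative sum. A quick illustrative check:

```python
import numpy as np

rng = np.random.default_rng(4)
m = 6
a = np.sort(rng.standard_normal(m))[::-1]
a = a - a.mean()                             # decreasing sequence summing to zero
c = np.sort(rng.uniform(0.1, 1.0, m))[::-1]  # decreasing positive coefficients

assert np.sum(c * a) >= -1e-12               # similarly ordered => weighted sum >= 0
```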

Proof of Theorem 4, part (ii). We have to show that the statement holds if all $\mathbb {E}[X_i X_j]$ with ijI are the same. The step from this case to the general statement works as in the proof above. As in the proof of (i), it suffices to show that

\begin{equation*} \frac{1}{n^2} \sum _{i \in I} \sum _{j\ne i \in I} 1 \ge \,\sum _{i \in I} \sum _{j\ne i \in I} c_ic_j \end{equation*}

Let $\overline{c}=\frac{1}{|I|}\sum _{i\in I}c_i$. By equation (10) we have

\begin{equation*} \sum _{i\in I}c_i^2\ge \frac{1}{|I|}\left(\sum _{i\in I}c_i\right)^2=\frac{1}{|I|}|I|^2\overline{c}^2=|I|\overline{c}^2 \end{equation*}

thus

\begin{equation*} \sum _{i \in I} \sum _{j\ne i \in I} c_ic_j \le (|I|^2-|I|)\overline{c}^2 \le \frac{|I|^2-|I|}{n^2}=\frac{1}{n^2}\sum _{i \in I} \sum _{j\ne i \in I} 1 \end{equation*}

with the last inequality coming from our assumption that $\overline{c} \le \frac{1}{n}$. $\Box$

Proof of Theorem 5. Let the benchmark agent have standard deviation $s > 0$, that is, variance $s^2$. We will show that $\Delta (s,\sigma _1,\ldots ,\sigma _n)$ – the difference in MSE between the straight average and the differentially weighted estimate – is strictly monotonically decreasing in the first argument. To this effect, we calculate

\begin{equation*} \Delta (s,\sigma _1,\ldots ,\sigma _n)=\frac{1}{n^2} \sum _{i=1}^n \sigma _i^2 - \left( \frac{1}{\sum _k c_k} \right)^2 \,\sum _{i=1}^n c_i^2 \sigma _i^2. \end{equation*}

Now we show that $\frac{\partial }{\partial s}\Delta (s,\sigma _1,\ldots ,\sigma _n)\le 0$, where $c_i^{\prime }$ denotes $\partial c_i/\partial s$:

\begin{equation*} \begin{array}{rcl} \frac{\partial }{\partial s}\Delta (s,\sigma _1,\ldots ,\sigma _n)&=&-\frac{\partial }{\partial {}s} \left( \sum _{i=1}^n \frac{c_i^2}{(\sum _kc_k)^2}\sigma _i^2 \right)\\[4pt] &=& - \sum _{i=1}^n\sigma _i^2 \cdot 2\cdot \left(\frac{c_i}{\sum _kc_k}\right)\frac{c_i^{\prime }\sum _jc_j-c_i\sum _j c_j^{\prime }}{(\sum _kc_k)^2}\\[6pt] &=& - \frac{2}{(\sum _kc_k)^3}\sum _{i=1}^n \sigma _i^2 c_i \left( \sum _{j \ne i} c_i^{\prime }c_j-c_ic_j^{\prime } \right)\\[6pt] &=& - \frac{2}{(\sum _kc_k)^3}\sum _{i=1}^n \sum _{j<i}\left(\sigma _i^2 c_i-\sigma _j^2c_j\right)\left(c_i^{\prime }c_j-c_i c_j^{\prime }\right) \end{array} \end{equation*}

Since we are only interested in the sign of the first derivative and $-\frac{2}{(\sum _kc_k)^3}<0$, it suffices to show that:

(22) \begin{equation} \left(\sigma _i^2 c_i-\sigma _j^2c_j\right)\left(c_i^{\prime }c_j-c_j^{\prime }c_i\right)\ge 0 \end{equation}

We show that the terms in both brackets have the same sign.

For the first bracket we have:

\begin{eqnarray*} \sigma _i^2 c_i-\sigma _j^2c_j &=& s^2 \frac{\sigma _i^2}{s^2+(n-1)\sigma _i^2} - s^2 \frac{\sigma _j^2}{s^2+(n-1)\sigma _j^2}\\ &=& s^4 \frac{\sigma _i^2 - \sigma _j^2}{(s^2+(n-1)\sigma _i^2)(s^2+(n-1)\sigma _j^2)} \end{eqnarray*}

which is non-negative if and only if $\sigma _i^2 \ge \sigma _j^2$. Similarly, we observe for the second bracket that

\begin{equation*} c_i^{\prime } = \frac{2(n-1)s\sigma _i^2}{(s^2+(n-1)\sigma _i^2)^2} \end{equation*}

which allows us to conclude

\begin{eqnarray*} && c_i^{\prime }c_j-c_j^{\prime } c_i \\ &&\quad= \frac{2(n-1)s\sigma _i^2}{(s^2+(n-1)\sigma _i^2)^2} \cdot \frac{s^2}{s^2+(n-1)\sigma _j^2} - \frac{2(n-1)s\sigma _j^2}{(s^2+(n-1)\sigma _j^2)^2} \cdot \frac{s^2}{s^2+(n-1)\sigma _i^2}\\ &&\quad= 2 (n-1) s^5\ \frac{\sigma _i^2 - \sigma _j^2}{(s^2+(n-1)\sigma _i^2)^2 (s^2+(n-1)\sigma _j^2)^2} \end{eqnarray*}

Thus, both factors in (22) have the same sign, implying $\frac{\partial }{\partial s}\Delta (s,\sigma _1,\ldots ,\sigma _n)\le 0$, which is what we wanted to prove. $\Box$
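The monotonicity statement can be checked numerically by reading the weights off the proof: up to normalization, $c_i = s^2/(s^2+(n-1)\sigma _i^2)$. A sketch with illustrative variances:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
var = rng.uniform(0.5, 4.0, n)                     # sigma_i^2

def delta(s, var):
    # Delta(s, sigma_1, ..., sigma_n): MSE(straight average) - MSE(weighted estimate)
    c = s**2 / (s**2 + (n - 1) * var)              # weights from the proof, pre-normalization
    c = c / c.sum()                                # the 1/(sum_k c_k) normalization factor
    return var.sum() / n**2 - np.sum(c**2 * var)

vals = [delta(s, var) for s in np.linspace(0.2, 5.0, 50)]
assert all(x >= y - 1e-12 for x, y in zip(vals, vals[1:]))  # Delta non-increasing in s
```

As $s\to 0$ the weights approach the inverse-variance optimum, and as $s\to \infty$ they approach equal weighting, so $\Delta$ decreases from its maximum towards zero, in line with the theorem.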

Footnotes

1 Rewriting our results for the general case $\mu \ne 0$ is just a matter of affine transformation, but comes with some notational baggage. Therefore we focus without loss of generality on $\mu = 0$.

2 Hartmann and Sprenger (2010) prove the optimality of the $c_i^*$ for the case of Normally distributed, independent and unbiased estimates with variance $\sigma _i^2$ and the loss function family $L_\alpha (x) = 1 - \exp (-x^2/2\alpha ^2)$. That paper also contains an elaborate justification for choosing this family of loss functions.

3 Recall that $\mathbb {E}[X_i X_k] \le \mathbb {E}[X_j X_k]$ can be rewritten as $\rho _{ik}\sigma _i \le \rho _{jk}\sigma _j$, with $\rho _{ij}$ defined as the Pearson correlation coefficient $\rho _{ij} := \mathbb {E}[X_i X_j]/\sigma _i \sigma _j$. Also, if $c_i \ge c_j$ then automatically $\sigma _i \le \sigma _j$.

4 Lehrer and Wagner also defend their model from a normative point of view, but their arguments for this claim are not particularly persuasive; see e.g. Martini et al. (2013).

References


Armstrong, J. S. 2001. Combining forecasts. In Principles of Forecasting: A Handbook for Researchers and Practitioners, ed. J. S. Armstrong. Norwell, MA: Kluwer Academic.
Bates, J. M. and Granger, C. W. J. 1969. The combination of forecasts. Operational Research Quarterly 20: 451–468.
Baumann, M. R. and Bonner, B. L. 2004. The effects of variability and expectations on utilization of member expertise and group performance. Organizational Behavior and Human Decision Processes 93: 89–101.
Bonner, B. L. 2000. The effects of extroversion on influence in ambiguous group tasks. Small Group Research 31: 225–244.
Bonner, B. L. 2004. Expertise in group problem solving: recognition, social combination, and performance. Group Dynamics: Theory, Research, and Practice 8: 277–290.
Bonner, B. L., Baumann, M. R. and Dalal, R. S. 2002. The effects of member expertise on group decision-making and performance. Organizational Behavior and Human Decision Processes 88: 719–736.
Clemen, R. T. 1989. Combining forecasts: a review and annotated bibliography. International Journal of Forecasting 5: 559–583.
Cooke, R. M. 1991. Experts in Uncertainty. Oxford: Oxford University Press.
Davis, J. H. 1973. Group decision and social interaction: a theory of social decision schemes. Psychological Review 80: 97–125.
DeGroot, M. 1974. Reaching a consensus. Journal of the American Statistical Association 69: 118–121.
Einhorn, H. J., Hogarth, R. M. and Klempner, E. 1977. Quality of group judgment. Psychological Bulletin 84: 158–172.
Elga, A. 2007. Reflection and disagreement. Noûs 41: 478–502.
Gigerenzer, G. and Goldstein, D. G. 1996. Reasoning the fast and frugal way: models of bounded rationality. Psychological Review 103: 650–669.
Hartmann, S. and Sprenger, J. 2010. The weight of competence under a realistic loss function. The Logic Journal of the IGPL 18: 346–352.
Henry, R. A. 1995. Improving group judgment accuracy: information sharing and determining the best member. Organizational Behavior and Human Decision Processes 62: 190–197.
Hill, G. W. 1982. Group versus individual performance: are N + 1 heads better than one? Psychological Bulletin 91: 517–539.
Hinsz, V. B. 1999. Group decision making with responses of a quantitative nature: the theory of social decision schemes for quantities. Organizational Behavior and Human Decision Processes 80: 28–49.
Hogarth, R. M. 1978. A note on aggregating opinions. Organizational Behavior and Human Performance 21: 40–46.
Larrick, R. P., Burson, K. A. and Soll, J. B. 2007. Social comparison and overconfidence: when thinking you're better than average predicts overconfidence (and when it does not). Organizational Behavior and Human Decision Processes 102: 76–94.
Laughlin, P. R. and Ellis, A. L. 1986. Demonstrability and social combination processes on mathematical intellective tasks. Journal of Experimental Social Psychology 22: 177–189.
Lehrer, K. and Wagner, C. 1981. Rational Consensus in Science and Society: A Philosophical and Mathematical Study. Berlin: Springer.
Libby, R., Trotman, K. T. and Zimmer, I. 1987. Member variation, recognition of expertise and group performance. Journal of Applied Psychology 72: 81–87.
Lindley, D. V. 1983. Reconciliation of probability distributions. Operations Research 31: 866–880.
Littlepage, G. E., Schmidt, G. W., Whisler, E. W. and Frost, A. G. 1995. An input-process-output analysis of influence and performance in problem-solving groups. Journal of Personality and Social Psychology 69: 877–889.
List, C. 2012. The theory of judgment aggregation: an introductory review. Synthese 187: 179–207.
Martini, C., Sprenger, J. and Colyvan, M. 2013. Resolving disagreement through mutual respect. Erkenntnis 78: 881–898.
Nadeau, R., Cloutier, E. and Guay, J.-H. 1993. New evidence about the existence of a bandwagon effect in the opinion formation process. International Political Science Review 14: 203–213.
Page, S. E. 2007. The Difference: How the Power of Diversity Creates Better Groups, Firms, Schools, and Societies. Princeton, NJ: Princeton University Press.
Soll, J. B. and Larrick, R. P. 2009. Strategies for revising judgment: how (and how well) people use others' opinions. Journal of Experimental Psychology 35: 780–805.
Surowiecki, J. 2004. The Wisdom of Crowds. Harpswell, ME: Anchor.
Thomas, E. J. and Fink, C. F. 1961. Models of group problem solving. Journal of Abnormal and Social Psychology 63: 53–63.
Wilf, H. S. 1985. Some examples of combinatorial averaging. American Mathematical Monthly 92: 250–261.