INTRODUCTION
Neuropsychologists and cognitive researchers often need quick estimates of global cognitive functioning [i.e., intelligence quotient (IQ)]. IQ is often estimated from limited testing, demographic variables, or both. The use of demographic variables is particularly attractive when a patient has little or no tolerance for formal testing. The formality and complexity of demographic estimates vary a great deal. Informal estimates may simply be a crude judgment of level of functioning based solely on occupational status or years of formal education (Sattler, 2001). Formal estimating formulae vary in complexity and use an array of demographic variables (cf. Barona et al., 1984; Crawford & Allan, 1997). Commonly used demographic variables, combined in a weighted regression formula, include educational attainment, occupational status, and age (Crawford & Allan, 1997). The Barona estimate also incorporates race, region of the country in which the person is living, and whether they live in an urban or rural environment (Barona et al., 1984).
The most popular method of estimating IQ involves a shortened administration of the Wechsler scales. A variety of subtest combinations are used to estimate a Full Scale IQ (FSIQ) based on administering as few as one Wechsler Adult Intelligence Scale (WAIS) subtest and as many as seven (Axelrod et al., 2000; Engelhart et al., 1999; Jeyakumar et al., 2004; Mendella et al., 2000; Pilgrim et al., 1999; Schoenberg et al., 2002, 2004a, 2004b). Many of these subtest combinations are based on their correlations with FSIQ from the Wechsler Adult Intelligence Scale-Revised (WAIS-R) or Wechsler Adult Intelligence Scale (Third Edition) (WAIS-III) standardization sample. Psychologists wanting a brief measure will often use the subtest with the highest correlation with FSIQ. The advantages of using a shortened form of an established test include a relatively quick, familiar method of testing that correlates highly with the referent measure, with both measures based on the very large, representative normative sample established for the original instrument (Wechsler, 1997). A drawback to partial test administration is interpolating the subtest score(s) into a Full Scale estimate when the true Full Scale measure is computed using more subtests or questions.
A hybrid of these two types of proxy IQ measures combines demographic variables with limited testing using select subtests of the WAIS-III. The Oklahoma Premorbid Intelligence Estimate-3 (OPIE3; Schoenberg et al., 2002) provides five different formulae using between one and four WAIS-III subtests combined with demographic information to estimate FSIQ. The OPIE3-4 subtest (ST) uses the Vocabulary, Information, Matrix Reasoning, and Picture Completion subtests together with age, ethnicity, education level, and region of residence. The OPIE3-2ST FSIQ combines the Vocabulary (V) and Matrix Reasoning (MR) raw scores from the WAIS-III with age, education, ethnicity, and gender. The shorter OPIE3 formulae use the Vocabulary, Matrix Reasoning, and/or Picture Completion subtests with demographic variables.
The last forms of proxy IQ measure examined here are original tests that provide an IQ estimate [the North American Adult Reading Test (NAART) and the Shipley Institute of Living Scales (SILS)] and school achievement testing. The NAART (Blair & Spreen, 1989) is an estimate of premorbid IQ and taps a relatively well-preserved function, pronouncing irregularly spelled words. The SILS (Zachary, 1986) is a two-subtest measure designed to produce an estimated FSIQ and two subscales. The conceptual quotient (CQ) is a measure of impairment, and the abstraction quotient adjusts the CQ for age and education (Zachary, 1986). Finally, we examine a prominent standardized school achievement test, the Iowa Test of Basic Skills (ITBS; Hoover et al., 2003).
The current article examined 11 proxy measures to determine their level of agreement with WAIS-III FSIQ across the entire sample. Two measures, the Barona and Crawford demographic formulae, were originally formulated for use with the WAIS-R. Since they are still in clinical use, they were examined to see how they related to the WAIS-III. The sample was also divided into three ability levels to determine how well the proxy measures perform at the tails of the IQ distribution.
METHODS
All procedures were approved by the Internal Review Board on Human Research.
Participants
Data for 313 participants from the Iowa Adoption Studies were used for the current study. The Iowa Adoption Studies is a series of studies examining Gene × Environment risk for developing substance abuse or psychopathology. Study participants underwent a complete neuropsychological test battery as part of the most recent follow-up. The average age was 43.89 years (SD = 6.78) and ranged from 31 to 60 years. The sample was predominantly female (61.86%). Average education was 14.17 years (SD = 2.26).
Procedures
All participants in the current follow-up were given a neuropsychological test battery that included the WAIS-III, the NAART (Blair & Spreen, 1989), and the SILS (Zachary, 1986) as global measures of cognitive ability (IQ). All measures were administered by trained research assistants under standard conditions and double scored by two trained raters who had achieved very high interrater reliability (average reliability ≥0.90). Files were then reviewed by a neuropsychologist. The WAIS-III was always given first in the battery to minimize fatigue effects, and testing typically began in the morning. The order of all other tests in the test battery was varied according to a Latin square design. School achievement data were obtained from the participant’s elementary and/or secondary school or from Iowa Testing Services at the University of Iowa after obtaining signed consent from the research participant.
Measures Evaluated
Measures evaluated for this study included the Ward-7ST short form developed by Ward and modified for the WAIS-III by Pilgrim et al. (1999), the NAART, the SILS, the ITBS, the Barona and Crawford demographic regression formulae, and the five OPIE3 hybrids combining demographic and WAIS-III subtest information. The final estimate examined was the ITBS (Hoover et al., 2003), a nationally recognized standardized school achievement test. School achievement is strongly related to IQ (Sattler, 2001), and the ITBS correlated .64 with WAIS-III FSIQ (Spinks et al., 2007). All proxy measures were computed per previously published guidelines. ITBS Iowa state percentile rank scores were converted to IQ scores using table 1.1 in Strauss et al. (2006). Only FSIQ measures were examined for the various proxy measures.
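The percentile-to-IQ conversion can be illustrated with a minimal sketch. The study used a published lookup table; the code below instead assumes a normal-curve conversion onto a metric with mean 100 and SD 15, which should approximate such a table but is not taken from it. The function name `percentile_to_iq` is hypothetical.

```python
from statistics import NormalDist


def percentile_to_iq(percentile_rank: float, mean: float = 100.0, sd: float = 15.0) -> float:
    """Map a percentile rank (strictly between 0 and 100) onto an IQ metric.

    Assumption: scores are normally distributed, so the percentile rank is
    converted to a z score with the inverse normal CDF. This approximates a
    lookup table; it is not the table used in the study.
    """
    if not 0.0 < percentile_rank < 100.0:
        raise ValueError("percentile rank must lie strictly between 0 and 100")
    z = NormalDist().inv_cdf(percentile_rank / 100.0)
    return mean + sd * z
```

Under this assumption, a percentile rank of 50 maps to an IQ of 100, and a rank of 84 maps to roughly one standard deviation above the mean.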
ANALYSES
Entire Sample
WAIS-III FSIQ was considered our referent measure. Means, standard deviations, minimum and maximum scores for the WAIS-III FSIQ, and all the proxy measures for the entire sample are shown in Table 1, Entire Sample 1. Pearson correlations and confidence intervals were used as a measure of agreement between the WAIS-III FSIQ and the various proxy measures. Spearman correlations were compared to the Pearson correlations to check for any nonlinearity in the data. Finally, intraclass correlations were calculated to examine the case-by-case correspondence of the WAIS-III and proxy measures. The Spearman and intraclass correlations were slightly lower than Pearson correlations, but all three correlation matrices were quite similar. Therefore, the Spearman and intraclass correlations are not reported here. Percent agreement (defined as ±5 IQ points) between the WAIS-III FSIQ and each proxy measure was also calculated.
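The two agreement indices described above can be sketched as follows. This is an illustration only: the function names and the five example scores are hypothetical, and the ±5-point window is assumed to include a difference of exactly 5 points.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def percent_agreement(fsiq, proxy, tolerance=5.0):
    """Percentage of cases whose proxy score lies within `tolerance` points of FSIQ."""
    if len(fsiq) != len(proxy) or not fsiq:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(1 for f, p in zip(fsiq, proxy) if abs(f - p) <= tolerance)
    return 100.0 * hits / len(fsiq)


# Made-up scores for illustration; these are not study data.
example_fsiq = [78, 95, 102, 110, 128]
example_proxy = [88, 97, 100, 104, 117]
example_r = pearson_r(example_fsiq, example_proxy)
example_agreement = percent_agreement(example_fsiq, example_proxy)
```

In this made-up example the proxy is pulled toward the mean at both extremes, so the correlation is very high while only two of the five cases fall within 5 points. The two indices answer different questions, which is why both were examined.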
Table 1. Descriptive statistics for FSIQ estimates
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160629085009-03401-mediumThumb-S1355617709090766_tab1.jpg?pub-status=live)
Note
The sample size, mean, SD, minimum and maximum values, and the percentage of each sample where the participant’s proxy score was within 5 points of their WAIS-III FSIQ score are listed for the entire sample (N = 313) and for the three IQ groups determined from the WAIS-III FSIQ scores [i.e., below average (n = 18), average (n = 211), and above average (n = 84)]. Values in italics fall outside the defined range, and values in boldface indicate that less than 50% of the sample or group was estimated within 5 points. ST, subtest; V, Vocabulary; PC, Picture Completion; MR, Matrix Reasoning.
Repeated measures multivariate analysis of variance (MANOVA) and post hoc comparisons examined the statistical difference between the proxy measures and the WAIS-III FSIQ.
Different Ability Levels
To examine the relationship of the proxy measures and WAIS-III FSIQ at the tails of the IQ score distribution, the sample was divided into three groups according to WAIS-III FSIQ. Individuals with an FSIQ at or above 115 were classified “above average” (actual score range 115–155). The “average-ability” group had FSIQs ranging from 85 to 114. The “below-average” individuals were those with an FSIQ below 85 (actual score range 67–84). Analyses computed on the entire sample were also performed on the three ability groups to determine how each proxy measure performed at the tails of the IQ distribution.
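The grouping rule above can be written as a short sketch. The function name `ability_group` is hypothetical; the cutoffs are those stated in the text.

```python
def ability_group(fsiq: float) -> str:
    """Classify a WAIS-III FSIQ into the three ability groups used here.

    Cutoffs follow the text: 115 or above is "above average",
    85 to 114 is "average", and below 85 is "below average".
    """
    if fsiq >= 115:
        return "above average"
    if fsiq >= 85:
        return "average"
    return "below average"
```

Because group membership is defined by the referent measure itself, the range of FSIQ within each group is restricted by construction, which matters when interpreting within-group correlations.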
RESULTS
Participants
As a group, WAIS-III FSIQ was slightly above average overall (mean IQ = 106.68, SD = 13.43, range 67–155), and more individuals had above-average IQs (n = 84) than below-average IQs (n = 18) (Table 1). The average level of formal education was 14.17 years (SD = 2.26), with a range of 8–17 years.
Analyses on the Entire Sample
All the group means for the various IQ estimates produced were within 7 points of WAIS-III FSIQ. However, the range of IQs estimated by the proxy measures differed greatly from the referent measure (Table 1).
Table 1 shows the percent agreement of each proxy and FSIQ. The highest percent agreement was 65.18% by the Ward-7ST IQ estimate. The lowest percent agreement was 32.27% produced by the Crawford demographics equation.
The Pearson correlation and confidence interval between WAIS-III FSIQ and each proxy measure are shown in Table 2. Correlations ranged from r = .25 for the Barona estimate to r = .95 for the Ward-7ST short form. Three proxy measures, the ITBS, Barona, and Crawford, had Pearson correlations with WAIS-III FSIQ below r = .70, indicating they were not reliable enough for clinical use.
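The method used to compute the confidence intervals is not stated. One common choice, assumed here purely for illustration, is the Fisher z transformation. The function name `pearson_ci` is hypothetical, and the sample size must be the number of cases actually contributing to a given correlation, which may be smaller than the full sample for some proxies.

```python
import math
from statistics import NormalDist


def pearson_ci(r: float, n: int, level: float = 0.95):
    """Approximate confidence interval for a Pearson r via the Fisher z transform.

    Assumption: bivariate normality, so atanh(r) is approximately normal
    with standard error 1 / sqrt(n - 3).
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)
```

For example, `pearson_ci(0.95, 313)` gives an interval of roughly .94 to .96, whereas the same calculation for r = .25 gives a much wider interval, since precision on the z scale depends only on n while the back-transformation compresses intervals near ±1.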
Table 2. Correlations between WAIS-III and estimated FSIQ for each proxy measure
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160629085023-05921-mediumThumb-S1355617709090766_tab2.jpg?pub-status=live)
Note
Column groups of the table show the correlations of WAIS-III and proxy measures for (1) the entire sample, (2) individuals with WAIS-III FSIQs below 85 (below-average IQ), (3) individuals with IQs ranging from 85 to 114, and (4) IQs of 115 or above. Values in italics refer to correlations below .70 (minimum accepted clinical correlation). ST, subtest; V, Vocabulary; PC, Picture Completion; MR, Matrix Reasoning; CI, confidence interval.
Repeated measures MANOVA tested all the proxy measures against the WAIS-III FSIQ. The main effect for proxy measure was highly significant, F(df = 1,12) = 253.35, p < .0001. Post hoc comparisons between each proxy and the WAIS-III FSIQ are shown in Table 3. The OPIE3-V, OPIE3-MR, OPIE3-PC, SILS, and the Barona estimates did not differ significantly from the WAIS-III FSIQ.
Table 3. Post hoc comparisons of repeated measures MANOVA comparing all proxy measures to WAIS-III FSIQ
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160629085225-07223-mediumThumb-S1355617709090766_tab3.jpg?pub-status=live)
Note
A significance value of p < .0001 was adopted to account for multiple comparisons. Values in italics refer to no significant difference from WAIS-III FSIQ.
Different Ability Levels
The performance of the proxy measures across the different cognitive ability groups was examined next. The group sample sizes were 18 in the below-average group, 211 in the average-ability group, and 84 in the above-average group. Means, standard deviations, minimum and maximum scores, and the percentage of group scoring within 5 points of the WAIS-III FSIQ score are listed in Table 1, Below-average FSIQ, Average FSIQ, Above-average FSIQ. Pearson correlations and confidence intervals are shown in Table 2. Note that restriction of range attenuated the correlations somewhat in the different ability levels.
The Ability Level × Proxy Measure interaction of the repeated measures MANOVA was highly significant, F(df = 1,12) = 105.08, p < .0001 (Table 3). Post hoc contrasts between the WAIS-III FSIQ and each proxy measure are shown in Table 3.
Average-IQ Group
The average-IQ group was the largest of the three ability groups (n = 211). The mean WAIS-III FSIQ was 102.71 (SD = 7.73). The percentage of group members scoring within 5 points of the WAIS-III FSIQ score for each proxy measure ranged from a high of 68.57% for Ward-7ST to a low of 40.95% for the ITBS (Table 1, Average FSIQ, percentage within 5 points FSIQ).
Correlation coefficients and confidence intervals between each proxy measure and the WAIS-III FSIQ are shown in Table 2. The Pearson correlation for the Ward-7ST, the Barona, and the Crawford was higher for the average-ability group than for the entire sample (Table 2, Average FSIQ). The remaining correlation coefficients were reduced in the average-ability group. Some attenuation of the correlations was expected due to restriction of range, but the reduction of the correlation coefficient between FSIQ and NAART was unusually large. The correlation coefficient fell from r = .71 for the entire sample to r = .19 for the average-ability group. The post hoc contrasts from the mixed-model MANOVA indicated that 9 of the 11 proxy measures were significantly different from the WAIS-III FSIQ. Only the SILS and ITBS estimates were not statistically different at the p < .0001 level (Table 3).
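To give a sense of how much attenuation restriction of range alone would produce, the sketch below applies the standard formula for direct range restriction on one variable. This calculation is an illustration added here and was not part of the reported analyses; it assumes a linear, homoscedastic relationship between FSIQ and the proxy, and the function name is hypothetical.

```python
import math


def expected_restricted_r(r_full: float, sd_full: float, sd_restricted: float) -> float:
    """Correlation expected after direct range restriction on one variable.

    With u = sd_restricted / sd_full for the variable used to select cases,
    r_restricted = r * u / sqrt(1 - r**2 + (r * u)**2).
    Assumes linearity and homoscedasticity.
    """
    if sd_full <= 0 or sd_restricted <= 0:
        raise ValueError("standard deviations must be positive")
    u = sd_restricted / sd_full
    return r_full * u / math.sqrt(1.0 - r_full ** 2 + (r_full * u) ** 2)


# FSIQ SDs reported for the entire sample (13.43) and the average group (7.73),
# and the whole-sample NAART correlation (.71).
naart_expected = expected_restricted_r(0.71, 13.43, 7.73)
```

Under these assumptions the expected NAART correlation in the average-ability group is about .50, well above the observed r = .19, which is consistent with the reduction being larger than range restriction alone would explain.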
Below-Average Group
The mean WAIS-III FSIQ for the below-average group was 78.67 (SD = 4.75). All the proxy measures except the Ward-7ST produced mean IQs above the upper cutoff of the low-ability group (i.e., above 85; Table 1, Below-average FSIQ). The overall percent agreement (within 5 points of FSIQ) was poor in the low-ability group. Only one proxy (Ward-7ST) came within 5 points of FSIQ for more than 40% of the low-ability group. Seven of the 11 proxy measures did not produce clinically reliable correlations with WAIS-III FSIQ (i.e., r ≥ .70). Post hoc contrasts from the MANOVA indicated that only the Ward-7ST and ITBS estimates did not statistically differ from the WAIS-III FSIQ.
Above-Average IQ Group
Six of the 11 proxy measures judged the mean IQ of the above-average group to be in the average-ability range (Table 1). The five measures producing above-average mean IQs all used WAIS-III subtest scores. The four measures requiring the greatest amount of formal testing were the only proxies to have more than 50% of group members estimated within 5 points of actual FSIQ. The Ward-7ST estimate was the only proxy to correlate above r = .70 for the high-ability group. Post hoc comparisons from the MANOVA indicated that only the OPIE3-4ST and OPIE3-2ST proxy measures did not differ from WAIS-III FSIQ.
DISCUSSION
The most important finding of this article is how poorly the IQ proxy measures performed at the tails of the IQ distribution. The proxy measures consistently overestimated the IQ of low-functioning individuals and underestimated the IQs of high-functioning individuals. Inaccurate assessment of global ability could be very problematic for clinicians designing treatment programs and managing family expectations for recovery. Researchers using a poor measure of global functioning as a covariate may miss a significant theoretical finding because too much (or too little) variance is being attributed to global functioning and factored out of the model.
The poor performance at the tails of the IQ distribution is concealed in the current literature, which only reports correlations for the entire sample. This has given the neuropsychological community a false sense of security for many years. The proxy measures consistently produced higher percent agreement (Table 1) and correlations (Table 2) across the entire sample than for the low- or high-ability individuals. This pattern cannot be completely accounted for by restriction of range.
The proxy measures that consistently produced scores closest to the WAIS-III FSIQ were measures that required the greatest amount of formal testing and used WAIS-III subtests. Similar analyses of verbal IQ and performance IQ proxy measures, not published here, found the same pattern (i.e., the greater the amount of formal testing, the more accurate the IQ estimate).
The second major finding of this article is how the different proxy measures performed depending on the method of evaluation. The majority of published proxy articles examined only correlations. We presented Pearson correlations, percent agreement defined as ±5 IQ points, and F and t tests from repeated measures MANOVA with post hoc comparisons. The Pearson correlations were the most consistent measures from group to group and from each group to the whole sample. The Ward-7ST and OPIE3-4ST produced the most consistently reliable correlations (i.e., r > .70) (Table 2). The Ward-7ST had the highest percent agreement index with WAIS-III FSIQ, with the whole sample and all three subgroups producing scores within 5 points of FSIQ at least 50% of the time. Conclusions determined from the MANOVA post hoc comparisons had very little overlap with conclusions from the Pearson correlations or percent agreement. The OPIE3-Vocabulary, OPIE3-Matrix Reasoning, OPIE3-Picture Completion, SILS FSIQ, and Barona FSIQ did not differ from WAIS-III FSIQ statistically for the whole group. Overall, the proxy measures did not perform well in either the low- or the high-ability groups, regardless of the method of evaluation.
One interesting note on the various OPIE3 formulae is that they used different demographic variables. The only demographic variables found in all the OPIE3 equations were educational attainment and ethnicity. Gender and age were each used in only four of the five equations. Interestingly, when gender was used, it was a negative coefficient (males coded 0 and females coded 1), subtracting points from the overall equation. Females were also rated heavier than males (i.e., females had a coefficient of 2 where males = 1). An interesting point about using age in the regression formulae is that even raw subtest scores on the WAIS-III are age adjusted. It should be noted that the bulk of the OPIE equations needed additional age correction.
The NAART generally performed very poorly. In addition, the NAART had an obvious ceiling effect, producing poor estimates for extremely bright individuals. However, the NAART is believed to represent a true measure of premorbid functioning in that the ability to pronounce irregularly spelled words is well preserved even in moderate dementia (Strauss et al., 2006). This may make the NAART the measure of choice when cognition suddenly becomes impaired due to injury or illness. Again, the clinician and researcher must consider the target sample in selecting a measure.
CONCLUSIONS
The major emphasis of this article is that the IQ proxy measure of choice depends on the question being asked. Is the testing an estimate of current functioning, of premorbid functioning, or is the assessment for research? What are the basic patient characteristics that will influence testing (e.g., estimated ability level and the amount of testing time the patient will tolerate)? The administration of the full WAIS-III is recommended when possible. When the use of a proxy estimate is necessitated, consideration must be given to the examples shown here. The measures using the greatest amount of testing performed somewhat better than those using little or no formal testing. However, even when the situation dictates the use of a proxy, several proxy measures produced marginal reliability and low correlations with WAIS-III FSIQ in individuals with cognitive functioning at the ends of the ability distribution.
ACKNOWLEDGMENTS
This study was supported by the National Institute on Drug Abuse grant number R01 DA005821, Gene × Environment Interactions in the Development of Drug Abuse. This study was presented at the Annual Conference of the International Society for Intelligence Research, December 2007, Amsterdam, The Netherlands. The authors have no financial or other relationships that could be interpreted as a conflict of interest affecting this article.
APPENDIX
Formulae for Calculating Estimated IQ From Demographic Variables
WAIS-III seven-subtest short form (Pilgrim et al., 1999)
FSIQ estimate produced using the Information, Digit Span, Arithmetic, Similarities, Picture Completion, Block Design, and Digit Symbol subtests.
Demographics only: Barona (Barona et al., 1984)
Estimated FSIQ = 54.96 + 0.47 (age) + 1.76 (sex) + 4.71 (race) + 5.02 (education) + 1.89 (occupation) + 0.59 (region).
Standard error of the estimate = 12.14,
where
Age:
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20151007081806175-0479:S1355617709090766_tab4.gif?pub-status=live)
Education:
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20151007081806175-0479:S1355617709090766_tab5.gif?pub-status=live)
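As a minimal sketch, the Barona equation above can be expressed as a function of already-coded predictors. The function and parameter names are hypothetical. Every input must first be converted to its numeric code using the published coding tables; the codes themselves are not reproduced in this sketch.

```python
def barona_fsiq(age_code: float, sex_code: float, race_code: float,
                education_code: float, occupation_code: float,
                region_code: float) -> float:
    """Barona demographic FSIQ estimate from pre-coded demographic variables.

    Coefficients are copied from the equation above. Inputs are the numeric
    category codes from the published coding scheme, not raw values such as
    age in years.
    """
    return (54.96
            + 0.47 * age_code
            + 1.76 * sex_code
            + 4.71 * race_code
            + 5.02 * education_code
            + 1.89 * occupation_code
            + 0.59 * region_code)
```

With a standard error of the estimate of 12.14, any single estimate from this equation carries considerable uncertainty.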
Demographics only: Crawford (Crawford & Allan, 1997)
Predicted FSIQ = 87.14 − (5.21 × occupation) + (1.78 × education) + (0.18 × age),
where
Demographics and Testing Combined
OPIE3 for the WAIS-III (Schoenberg et al., 2004a).
OPIE3-4 subtest FSIQ = 35.348 + 0.368 (Vocabulary Raw) + 0.682 (Information Raw) + 0.987 (Matrix Reasoning Raw) + 0.737 (Picture Completion Raw) + 0.175 (age in years) + 0.656 (education) + 0.578 (ethnicity) + 0.341 (region of country).
Standard error of the estimate = 5.68.
OPIE3-2 subtest FSIQ = 45.997 + 0.652 (Vocabulary Raw) + 1.287 (Matrix Reasoning Raw) + 0.157 (age in years) + 1.034 (education) + 0.652 (ethnicity) − 1.015 (gender).
Standard error of the estimate = 6.63.
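The two equations above can be sketched as functions of raw subtest scores and already-coded demographic variables. Function and parameter names are hypothetical. Age is entered in years as stated in the equations; education, ethnicity, region, and gender must be supplied as the numeric codes from the published OPIE3 coding scheme, which is not reproduced in this sketch.

```python
def opie3_4st(vocab_raw: float, info_raw: float, matrix_raw: float,
              picture_raw: float, age_years: float, education_code: float,
              ethnicity_code: float, region_code: float) -> float:
    """OPIE3 four-subtest FSIQ estimate (coefficients copied from the text)."""
    return (35.348
            + 0.368 * vocab_raw
            + 0.682 * info_raw
            + 0.987 * matrix_raw
            + 0.737 * picture_raw
            + 0.175 * age_years
            + 0.656 * education_code
            + 0.578 * ethnicity_code
            + 0.341 * region_code)


def opie3_2st(vocab_raw: float, matrix_raw: float, age_years: float,
              education_code: float, ethnicity_code: float,
              gender_code: float) -> float:
    """OPIE3 two-subtest FSIQ estimate (coefficients copied from the text)."""
    return (45.997
            + 0.652 * vocab_raw
            + 1.287 * matrix_raw
            + 0.157 * age_years
            + 1.034 * education_code
            + 0.652 * ethnicity_code
            - 1.015 * gender_code)
```

Keeping each equation as its own function, rather than one generic routine, makes it easy to check every coefficient against the published formula.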
OPIE3-Vocabulary subtest FSIQ = 57.220 + 0.874 (Vocabulary Raw) + 1.766 (education) + 1.081 (ethnicity) + 0.674 (region of country) − 1.508 (gender).
Standard error of the estimate = 8.35.
OPIE3-Matrix Reasoning FSIQ = 43.678 + 1.943 (Matrix Reasoning Raw) + 0.297 (age) + 3.564 (education) + 1.541 (ethnicity) + 0.543 (region of country) − 1.137 (gender).
Standard error of the estimate = 9.06.
OPIE3-Picture Completion FSIQ = 29.280 + 1.469 (Matrix Reasoning Raw) + 1.242 (Picture Completion Raw) + 0.332 (age) + 3.04 (education) + 1.025 (ethnicity) + 0.557 (region of country) − 1.278 (gender).
Standard error of the estimate = 7.93,
where
Iowa Test of Basic Skills
Iowa percentile ranks were converted to IQ scores using table 1.1 in Strauss et al. (2006), and the converted scores were then correlated with WAIS-III FSIQ scores, after Spinks et al. (2007).
North American Adult Reading Test (Blair & Spreen, 1989)
Test of 61 irregularly spelled English words.
SILS-Revised Manual (Zachary, 1986): Los Angeles, CA: Western Psychological Services
Standard administration and scoring.