Introduction
Age data collected through sample surveys are rarely free from reporting bias. The four leading causes of age misreporting are ignorance of actual ages, misunderstanding between interviewers and respondents, distortion of ages and errors in recording or processing (Singh, 2017). In large-scale surveys, age data play a crucial role in producing population estimates such as population projections, migration, fertility and mortality. The distribution of age data is vital in planning social services and implementing social policies (Yadav et al., 2020). Age misreporting and age displacement (deviation of reported age from actual age) are widespread problems in many developing countries. The error can occur at any stage of a survey, on either the respondent’s or the interviewer’s side. Age misreporting on the respondent’s side is mainly due to recall bias, while interviewers contribute to the error pool by systematically transferring respondents from the border of an eligible age group to the neighbouring age group to avoid individual interviews. An interviewer may also change the birth year to avoid asking questions in various questionnaire sections, since intentionally skipping questions can save time and money (Pullum, 2005; Borkotoky & Unisa, 2014).
Survey respondents’ preferences for specific terminal digits, most often 0 and 5, have mainly been reported in Demographic and Health Surveys, but also in censuses. The reason could be that respondents do not know their actual date of birth or deliberately report an incorrect age, as has been observed particularly among older respondents, illiterate respondents and those living in rural areas (Fayehun et al., 2020). The trend and pattern of disparity in age heaping vary from country to country. In Afghanistan, inequality in age heaping decreased gradually from its peak around 1910 to the 1950s (Friesen et al., 2012; Tollnek & Baten, 2016).
Age heaping has also been recorded in Bangladesh. Four of the five surveys with the greatest incompleteness of women’s age (missing data) were conducted in Bangladesh in 2004, 2007, 2011 and 2014. Each of these four surveys had slightly less incompleteness than the preceding one, suggesting some improvement, but not a great deal. The underlying problem is that many women in Bangladesh simply do not know their date of birth (Pullum, 2005). A study of 34 sub-Saharan African countries used Whipple’s index to compare the magnitude of age heaping across all Demographic and Health Surveys from 1986 to 2015 (Lyons-Amos & Stones, 2017). A random slope multilevel model was used to evaluate the proportion of respondents in each survey that rounded their age to the nearest age with a terminal digit 0 or 5. The proportion of misreported ages remained flat at around 5%. James and Rajan (2004) explored the influence of respondents’ educational level on the quality of response on standard demographic variables, including age and sex, using Indian National Family Health Survey data. They found that the information gathered from uneducated respondents was more erroneous than that from educated groups.
Efforts have been made to avoid age misreporting through field check tables that generate lower and upper age limit reporting. The Demographic and Health Surveys (DHS) could have minimized age displacement issues, but these are still high in South Asian countries, especially where education level and socioeconomic status are low. Against this backdrop, this study explored the extent and pattern of age heaping in five countries in South Asia – Afghanistan, India, Nepal, Bangladesh and Pakistan.
Methods
Data
The most recent DHS surveys conducted in five selected South Asian countries were utilized for the study, specifically: Afghanistan (2015), India (2015–16), Nepal (2016), Bangladesh (2014) and Pakistan (2017–18). Household member information on age, sex, educational level and place of residence was included in the analysis for the five countries. Those who reported their age in single years between 10 and 91 years at the time of the survey were included in the study sample. The study sample sizes are given in Table 1.
Table 1. Distribution of study sample by country
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220602161003315-0074:S0021932021000249:S0021932021000249_tab1.png?pub-status=live)
Assessment of age heaping using the Whipple index
Whipple’s index, or index of concentration, was conceived by George Chandler Whipple to measure the tendency for individuals to misreport their age and to detect the extent to which age data show systematic heaping on certain ages as a result of digit preference or rounding off, particularly for ages ending in 0 and 5 (Shryock & Siegel, 1976). Initially, the age heaping problem was addressed using Whipple’s index and Myers’ blended index (Pullum, 2005; Nasir & Hinde, 2014). Moreover, these indices have been developed to measure the extent of age heaping and terminal digit preferences in Demographic and Health Surveys and censuses (Blum & Krauss, 2018).
During the past few decades, several modifications have been made to the original Whipple index (Roger et al., 1981; Noumbissi, 1992; Spoorenberg & Dutreuilh, 2007; Spoorenberg, 2009; Nasir & Hinde, 2014; Singh, 2017). The Whipple and Whipple-type measures, as well as Myers’ and Bachi’s measures, assume linearity in the mathematical structure used to construct the index. The index used in the present study produces practically the same results and direction of error as these indices (Kashif et al., 2012; Nasir & Hinde, 2014). This index, revised by Singh (2017), extends Nasir and Hinde’s (2014) ‘further modified Whipple index’ by widening the age range over which it is computed to 10–91 years. The original Whipple index only estimates age heaping on ages ending with the digits 0 and 5. The modified and further modified Whipple indices extend this to all terminal digits and reduce the assumption of linearity to cover 5-year and 3-year windows, respectively.
The method of computation of the further modified Whipple index can be illustrated using the example of the terminal digit 1. The index is a ratio between a numerator and a denominator. In this case, the numerator is the sum of the recorded population at ages 11, 21, 31, …, 81. The denominator is the sum of the recorded population at ages 11, 12, 13, 21, 22, 23, …, 81, 82, 83 divided by 3. The ratio between the numerator and the denominator indicates the extent to which there is over-reporting or under-reporting of ages with the terminal digit 1. It is convenient sometimes to multiply the ratio by 100, so that values over 100 indicate over-representation of the terminal digit and values under 100 indicate under-representation.
For the three-point ratio, let a, b and c be the numbers of events in three consecutive intervals (e.g. ages 49, 50, 51). The ‘correct’ value of b is estimated to be (a+c)/2; the ratio of ‘observed’ to ‘correct’ is then:
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220602161003315-0074:S0021932021000249:S0021932021000249_eqnu1.png?pub-status=live)
where for women: lower age (13, 14, 15), upper age (49, 50, 51); and for men: lower age (13, 14, 15), upper age (54, 55, 56).
For the five-point ratio, let a, b, c, d and e be the numbers of events in five consecutive intervals (e.g. ages 48, 49, 50, 51, 52). The ‘correct’ value of c is estimated to be (a+b+d+e)/4; the ratio of ‘observed’ to ‘correct’ is then:
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220602161003315-0074:S0021932021000249:S0021932021000249_eqnu2.png?pub-status=live)
where for women: lower age (12, 13, 14, 15, 16), upper age (48, 49, 50, 51, 52); and for men: lower age (12, 13, 14, 15, 16), upper age (53, 54, 55, 56, 57).
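The three-point and five-point ratios defined above share one form: the observed count at the central age divided by the mean of the counts at the neighbouring ages. The sketch below is a hedged illustration; the function name `displacement_ratio` and the dictionary input are assumptions, and the counts are made up for demonstration.

```python
from typing import Mapping


def displacement_ratio(counts: Mapping[int, int], centre: int, half_width: int) -> float:
    """Observed count at `centre` divided by the mean count of its neighbours.

    half_width=1 gives the three-point ratio b / ((a + c) / 2);
    half_width=2 gives the five-point ratio c / ((a + b + d + e) / 4).
    """
    if half_width < 1:
        raise ValueError("half_width must be at least 1")
    neighbours = [
        counts.get(centre + k, 0)
        for k in range(-half_width, half_width + 1)
        if k != 0
    ]
    expected = sum(neighbours) / len(neighbours)
    if expected == 0:
        raise ValueError("no events recorded at the neighbouring ages")
    return counts.get(centre, 0) / expected


# Made-up counts around the upper boundary age for men (53 to 57).
example = {53: 100, 54: 100, 55: 300, 56: 100, 57: 100}
r3 = displacement_ratio(example, centre=55, half_width=1)
r5 = displacement_ratio(example, centre=55, half_width=2)
```

In this toy example both ratios equal 3, meaning the count at age 55 is three times the average of the neighbouring ages.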
Patterns of age heaping in the five DHS surveys were estimated and explored across countries by three primary characteristics of household members: sex (male and female), place of residence (urban and rural) and education level (no education and some education). The data analysis had two stages. First, the frequency distributions of single-year ages by sex, place of residence and education level were estimated. Second, the extended modified Whipple’s index was estimated for each survey, conducted in different years, by these background characteristics.
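The first stage, tabulating single-year ages within each subgroup, can be sketched with the Python standard library. This is an illustrative sketch only: the record layout and the field names (`age`, `sex`) are hypothetical and are not DHS variable names. The 10–91 age filter follows the study sample definition.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Mapping


def age_counts_by_group(
    records: Iterable[Mapping[str, object]],
    group_key: str,
    min_age: int = 10,
    max_age: int = 91,
) -> Dict[object, Counter]:
    """Count household members at each single-year age, separately per group.

    Records with a missing or out-of-range age are skipped.
    """
    tables: Dict[object, Counter] = defaultdict(Counter)
    for record in records:
        age = record.get("age")
        if isinstance(age, int) and min_age <= age <= max_age:
            tables[record[group_key]][age] += 1
    return dict(tables)


# Made-up household member records for demonstration.
members = [
    {"age": 25, "sex": "male"},
    {"age": 25, "sex": "male"},
    {"age": 9, "sex": "male"},       # below the 10-year lower limit, skipped
    {"age": 30, "sex": "female"},
    {"age": None, "sex": "female"},  # missing age, skipped
]
by_sex = age_counts_by_group(members, "sex")
```

Each resulting per-group table of counts is the kind of input the second stage needs to compute a digit-specific index for that subgroup.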
Results
Figure 1 presents the estimates of the digit-specific extended modified Whipple index for each of the ten digits 0–9 for the five study countries by sex. A value of ‘1’ indicates no age displacement, whereas values above and below ‘1’ reflect preference for and avoidance of ages, respectively. Age reporting in the selected South Asian countries followed the classic pattern of a strong preference for ages ending with digits ‘0’ and ‘5’, as reflected by the high peak values above ‘1’. Strong avoidance of ages ending in ‘1’ and ‘9’ is also reflected by values below ‘1’. Among males, age misreporting at the terminal digits ‘0’ and ‘5’ was highest in Bangladesh, followed by Afghanistan and India, whereas Nepal showed the least displacement. Avoidance of ages ending with ‘4’ and ‘9’ was strongest in Bangladesh, followed by Afghanistan, India and Pakistan. Among females, Afghanistan showed a strong preference for ages ending in the digits ‘0’, ‘2’ and ‘5’, but avoidance of ages ending with ‘1’, ‘4’ and ‘9’.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220602161003315-0074:S0021932021000249:S0021932021000249_fig1.png?pub-status=live)
Figure 1. Estimates of the digit-specific extended modified Whipple’s index for each of the ten digits 0–9 (W0–W9) among a) males and b) females for five South Asian countries (AFG, Afghanistan; IND, India; NP, Nepal; BD, Bangladesh; PK, Pakistan).
Figure 2 shows the estimates of the digit-specific extended modified Whipple index for each of the terminal digits 0–9 by place of residence. Afghanistan and Bangladesh showed a strong preference for ages ending with the digits ‘0’, ‘2’ and ‘5’ in both urban and rural areas. In contrast, Afghanistan showed a strong avoidance of ages ending with the digits ‘1’, ‘3’ and ‘9’. Nepal was exceptional in having less digit preference than the other four countries. Little difference was observed between urban and rural areas in the five countries.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220602161003315-0074:S0021932021000249:S0021932021000249_fig2.png?pub-status=live)
Figure 2. Estimates of the digit-specific extended modified Whipple’s index for each of the ten digits 0–9 (W0–W9) among individuals living in a) urban and b) rural areas of five South Asian countries (AFG, Afghanistan; IND, India; NP, Nepal; BD, Bangladesh; PK, Pakistan).
Figure 3 shows the age heaping in all the terminal digits 0–9 by education level. Respondents with no education had higher age misreporting than those with some education. Uneducated respondents from Bangladesh, Afghanistan and India showed a stronger preference for ages ending with digits ‘0’ and ‘5’ compared with those from Pakistan and Nepal. Strong avoidance of ages ending with the digits ‘1’, ‘4’ and ‘9’ was observed in Bangladesh, Afghanistan and India.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220602161003315-0074:S0021932021000249:S0021932021000249_fig3.png?pub-status=live)
Figure 3. Estimates of the digit-specific extended modified Whipple’s index for each of the ten digits 0–9 (W0–W9) among a) uneducated and b) educated individuals from five South Asian countries (AFG, Afghanistan; IND, India; NP, Nepal; BD, Bangladesh; PK, Pakistan).
Table 2 shows the values of the total extended modified Whipple index by demographic characteristics for the selected South Asian countries. The index was further adjusted by a change of origin and scale to allow comparison with the United Nations standard for measuring age misreporting. None of the South Asian countries included in this study had accurate data. Only Nepal came close to the ‘approximate’ category, with India, Pakistan and Bangladesh having rough data and Afghanistan very rough data. The quality of data for males was poorer than that for females in all countries except Afghanistan. Urban areas had better quality data in that they had less digit preference. Education played a significant role in determining the quality of data, with the uneducated population yielding very poor quality data in all the studied South Asian countries.
Table 2. Total extended modified Whipple index by sex, education level of respondents for the five study countries
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220602161003315-0074:S0021932021000249:S0021932021000249_tab2.png?pub-status=live)
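The quality labels used above (‘approximate’, ‘rough’, ‘very rough’) correspond to the United Nations bands for Whipple’s index. The text does not list the cut-off values, so the thresholds in the sketch below are an assumption based on the commonly cited UN classification (below 105 highly accurate, 105–109.9 fairly accurate, 110–124.9 approximate, 125–174.9 rough, 175 and above very rough). The function name `un_quality` is hypothetical.

```python
def un_quality(index: float) -> str:
    """Map a Whipple-type index on the 100 scale to the commonly cited UN quality band.

    The cut-offs are assumed from the standard UN classification, not taken
    from this article.
    """
    if index < 105:
        return "highly accurate"
    if index < 110:
        return "fairly accurate"
    if index < 125:
        return "approximate"
    if index < 175:
        return "rough"
    return "very rough"


# Index values reported in the article for Nepal and Afghanistan.
nepal_band = un_quality(126)
afghanistan_band = un_quality(185)
```

Under these assumed cut-offs, a value of 126 falls in the ‘rough’ band, just above the boundary with ‘approximate’, and a value of 185 falls in the ‘very rough’ band.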
Table 3 shows three-point and five-point lower and upper age displacement by sex. A value of ‘1’ indicates no age displacement; ‘<1’ indicates downward and ‘>1’ indicates upward displacement. Usually, the lower and upper boundary ages are shifted to skip various sections of the questionnaire. In the most recent rounds of the DHS questionnaire, the eligible ages for men and women were 15–54 and 15–49 years, respectively. No major three-point or five-point lower age displacement was observed for males or females in the five countries. On the other hand, upper age displacement for both males and females was observed in Bangladesh. Three-point upper age displacement for males was 6.23, meaning that the number of Bangladeshi males reported at age 55 years was 6.23 times the average of the numbers reported at ages 54 and 56 years. In the same way, five-point upper age displacement was 3.97 for Bangladeshi males, meaning that the number reported at age 55 years was 3.97 times the average of the numbers reported at ages 53, 54, 56 and 57 years. For Bangladeshi females, the upper age displacement ratios were exceptionally low (r(3) = 0.37 and r(5) = 0.34): the number reported at age 50 years was 0.37 times the average of the numbers reported at ages 49 and 51 years, and 0.34 times the average of the numbers reported at ages 48, 49, 51 and 52 years. Similarly, Afghanistan, India and Pakistan also showed some degree of upper age displacement, and this was more evident among men than women.
Table 3. Three-point and five-point lower and upper age displacement by sex of respondents and by country
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220602161003315-0074:S0021932021000249:S0021932021000249_tab3.png?pub-status=live)
Discussion
The issues of age heaping and digit preference were reported in India as long ago as 1959 (Mahalanobis, 1959). The present study explored the extent and pattern of age heaping in the five South Asian countries of Afghanistan, India, Nepal, Bangladesh and Pakistan using the most recent round of DHS surveys conducted at different time points. An extended version of the further modified Whipple index was used to estimate the extent of age misreporting. The five selected countries had more or less the same socioeconomic and demographic characteristics.
The results indicate that both males and females in the study countries had a strong preference for ages ending with the digits ‘0’ and ‘5’ and a strong avoidance of ages ending with the digits ‘1’ and ‘9’. Age misreporting of the terminal digits ‘0’ and ‘5’ was highest among males in Bangladesh, followed by Afghanistan and India, whereas Nepal showed the least displacement. Among females, Afghanistan showed a strong preference for ages ending with digits ‘0’, ‘2’ and ‘5’, but avoidance of ages ending with the digits ‘1’, ‘4’ and ‘9’. The 2011 Nepal population census pointed out that heaping occurred at certain ages, especially those ending in the digits ‘0’ and ‘5’. This was substantiated by the study results, and digit preference seemed to be lower in the younger than the older generation (Siegel & Swanson, 2004; CBS, 2014).
Afghanistan and Bangladesh had strong preferences for ages ending with the digits ‘0’, ‘2’ and ‘5’. On the other hand, Afghanistan exhibited a strong avoidance of ages ending with the digits ‘1’, ‘3’ and ‘9’. Nepal was exceptional in that it had less digit preference than the other four countries. Respondents with no education exhibited higher age misreporting than those with some education. Uneducated respondents from Bangladesh, Afghanistan and India showed a stronger preference for ages ending with the digits ‘0’ and ‘5’ compared with those from Pakistan and Nepal. A study conducted in Pakistan using the Whipple index and Myers’ blended index reported that age heaping occurred more often among females than males, and in rural areas compared with urban areas (Kashif et al., 2012).
None of the South Asian countries had highly accurate age data. The only satisfactory aspect was Nepal’s data accuracy: its total extended modified Whipple index was 126, the value closest to the ideal of 100 among the study countries. For India, Pakistan and Bangladesh, the values varied between 160 and 169, reflecting the poor quality of data in these countries. Afghanistan had very poor-quality data, with an index value of 185, the highest of all the study countries. Evidence from surveys conducted in Pakistan before 1982 suggests that age discrepancies were plausibly explained by a pattern of age exaggeration that increased with age in all data sets except the Pakistan Fertility Survey (PFS), where the quality of age data appeared to be relatively good. More likely, the changes and discrepancies were either not real or much smaller than estimated (Retherford & Mirza, 1982).
The quality of data among males was poorer than that for females in all countries except Afghanistan. Urban areas had better quality data, in terms of less digit preference, than rural areas. Education seemed to play a significant role in determining data quality: in all the selected South Asian countries, data from the uneducated population were of poorer quality than data from the educated population. In this context, information on age and sex collected from uneducated respondents was more erroneous than that from educated groups (James & Rajan, 2004). The Indian census data show that age is less likely to be stated correctly in rural areas than in urban areas at the all-India level. Still, in the northern parts, the figure was in the expected direction. As expected, the reporting seemed to be better among literate than illiterate respondents. For the 2001 Indian census, the quality of data appeared to be better when evaluated using the Whipple index, but there is still much scope for improvement (Unisa et al., 2009).
The highest age displacement was observed among males from Bangladesh for both three- and five-point upper age displacement, at 6.23 and 3.97, respectively; for females the corresponding values were low, at 0.37 and 0.34, respectively. Similarly, Afghanistan, India and Pakistan also showed some degree of upper age displacement for both sexes. In contrast, all the countries showed only minor deviation for three-point and five-point lower age displacement for both sexes. Age reporting was less accurate for males than for females. According to the National Institute of Population Research and Training (NIPORT), age heaping is prominent at the specific ages of 10 and 18 for males and females, respectively (NIPORT et al., 2014). Noticeable heaping was observed at ages ending with ‘0’ and ‘5’, and heaping was more prominent among males than females. Ages ending with ‘1’ and ‘9’ were under-reported (NIPORT et al., 2013). Specific to the Indian context, data from the District Level Household Survey-3 (DLHS-3) also indicated errors in reporting age across all states. Studies of census data have associated individual characteristics such as illiteracy, rural residence and poor economic condition with age misreporting and poor quality of the data (Borkotoky & Unisa, 2014). One Indian study also confirmed the influence of literacy status on the population’s age misreporting tendency using the modified Whipple index. The misreporting tendency declined from 5.5 to 2.9 between 2001 and 2011 with increasing literacy level (Agrawal & Khanduja, 2015; Yadav et al., 2020). The quality of age data significantly improved with male literacy. The female Scheduled Tribe population and mean household size have shown some influence on the age reporting error (Mukhopadhyay & Majumdar, 2009).
In conclusion, the present study computed age misreporting and displacement by age, sex, place of residence and education level using the most recent round of the DHS in five South Asian countries. Strong digit preference and avoidance, and upper age displacement, were seen in the surveys from Bangladesh, Afghanistan and India by sex and education level. Overall, the findings emphasize the need for all South Asian countries, including those performing best, to make efforts to reduce discrepancies in age reporting to achieve accurate and reliable age-related data for all users, such as policymakers and those developing and implementing programmes, where data on age occupy centre stage.
Acknowledgments
The authors are grateful to the DHS Survey agencies for conducting large-scale surveys at regular intervals and providing the data. The assistance of ICF Macro is acknowledged for granting access to the dataset analyzed in this study.
Funding
No funding was received towards completion of this work.
Conflicts of Interest
The authors have no conflicts of interest to report.
Ethical Approval
The authors agreed with, and adhered to, all the terms of use of the dataset set by the DHS. The IRB-approved procedures for DHS public-use datasets do not in any way allow respondents, households or sample communities to be identified. There are no names of individuals or household addresses in the data files. The study adhered to the IRB Ethical Guidelines for Human Subjects.