
Methods for determining pubertal status in research studies: literature review and opinions of experts and adolescents

Published online by Cambridge University Press:  17 June 2019

I. V. Walker*
Affiliation:
Medical Research Council (MRC) Lifecourse Epidemiology Unit, University of Southampton, Southampton General Hospital, Southampton, UK; Primary Care and Population Sciences, University of Southampton, Southampton General Hospital, Southampton, UK
C. R. Smith
Affiliation:
Department of General Paediatrics, Southampton Children’s Hospital, Southampton, UK
J. H. Davies
Affiliation:
Department of Endocrinology, Southampton Children’s Hospital, Southampton, UK
H. M. Inskip
Affiliation:
Medical Research Council (MRC) Lifecourse Epidemiology Unit, University of Southampton, Southampton General Hospital, Southampton, UK; National Institute for Health Research (NIHR) Southampton Biomedical Research Centre, University of Southampton and University Hospital Southampton NHS Foundation Trust, Southampton, UK
J. Baird
Affiliation:
Medical Research Council (MRC) Lifecourse Epidemiology Unit, University of Southampton, Southampton General Hospital, Southampton, UK; National Institute for Health Research (NIHR) Southampton Biomedical Research Centre, University of Southampton and University Hospital Southampton NHS Foundation Trust, Southampton, UK
*Address for correspondence: I. V. Walker, c/o J. Baird, MRC Lifecourse Epidemiology Unit, University of Southampton, Southampton General Hospital, Southampton SO16 6YD, UK. Email: i.v.walker@soton.ac.uk

Abstract

In lifecourse studies that encompass the adolescent period, the assessment of pubertal status is important, but can be challenging. We aimed to identify current methods for pubertal assessment and assess their appropriateness for population-based research by combining a review of the literature with the views of experts in the field. We searched bibliographic databases, extracted data and assessed study quality to inform a workshop with 21 experts. Acceptability of different approaches was explored with a panel of ten adolescents. We screened 11,935 abstracts, assessed 157 articles and summarised results from 38 articles. Combining these with the opinions of experts, self-assessment was found to be a practical method for use in studies where agreement with the gold standard of clinical assessment by physical examination to within one Tanner stage was acceptable. Serial measures of height and foot size accurately indicated timing of the pubertal growth spurt and age at peak height velocity, and were seen as feasible within longitudinal studies. Hormonal and radiological methods did not offer a practical means of assessing pubertal status. Assessment of voice maturation was promising, but needed validation. Young people thought that self-assessment, foot size and voice assessments were acceptable, and preferred an assessor of the same sex for clinical assessment. This review thus informs researchers working in lifecourse and adolescent health, and identifies future directions in order to improve validity of the methods.

Type
Review
Copyright
© Cambridge University Press and the International Society for Developmental Origins of Health and Disease 2019

Introduction

Much epidemiological research on the developmental origins of health and disease requires following study participants from early life to assess their growth and development. However, during adolescence, interpretation of a variety of measurements, such as those of body composition and mental health, requires knowledge of the participants’ stage of puberty at the time the assessments are made.

Accurate evaluation of puberty is key in the assessment of growth in young people. In a clinical setting, a physical examination of secondary sexual characteristics by a trained health professional (hereafter ‘clinical assessment’) is undertaken to document the stage of puberty,Reference Marshall and Tanner1,Reference Marshall and Tanner2 as changes in the timing and tempo of puberty may indicate delayed or precocious puberty, or adverse effects of an underlying disease or its treatment. The different aspects of the clinical assessment identify the degree to which the body has been exposed to different hormones: testosterone through its effect on pubic and axillary hair, and penile size, oestrogen through breast development and menarche, and gonadotrophins through testicular and ovarian volume. Clinical assessment is widely recognised as the gold standard method for assessing pubertal development.Reference Coleman and Coleman3–Reference Schmitz, Hovell and Nichols5 Traditionally attributed to Tanner (1962), this method uses a series of reference photographs accompanied by brief descriptions, which depict five stages of development of pubic hair growth for both sexes, and breast and external genitalia development, for girls and boys, respectively.Reference Tanner6 This resource is sometimes known as the sexual maturation scale (SMS) or sexual maturity rating (SMR).

In a research setting, it may be useful to understand the effect of exposures on the timing of puberty or the confounding effects of sex steroids on various outcomes. Study participants are often healthy, and the acceptability and practicality of a standard clinical pubertal assessment is less certain. The use of clinical assessments in cohorts and large-scale longitudinal studies can be challenging due to difficulties obtaining consent to a physical examination. Studies often rely on self-assessment methods or parental reporting, either as the main or a backup method when clinical assessment is declined. Self-assessment methods are also not without their challenges – for example, schools and parents may be concerned about the use of images depicting the development of secondary sexual characteristics.Reference Petersen, Crockett, Richards and Boxer7

There are few reviews of the different approaches to pubertal assessment in a research setting.Reference Coleman and Coleman3,Reference Dorn and Biro4,Reference Dorn, Dahl, Woodward and Biro8 In order to identify current methods for pubertal assessment and assess their appropriateness for population-based research, we combined a review of literature with the views of experts in the field, as well as the views of young people on acceptability of various methods.

Methods

Review of evidence

We sought studies that described and compared methods of pubertal assessment, specifically searching for those that reported validation of the methods. In December 2015, assisted by an information specialist, we searched the following online databases of abstracts: Medline, PsycInfo, Scopus, Sociological Abstracts, CINAHL and ERIC. Search results were imported into EndNote X7, and duplicates were removed. In addition, we consulted via email, and in person, clinical and epidemiology experts in the field for recommendations of relevant articles, and hand-searched key journals to identify further publications. Titles and abstracts were screened to determine the relevance of studies. Where these contained insufficient information, full articles were accessed. While we did not impose limits on publication dates or geographical location of studies, we excluded those articles where the full text was not available in English, due to a lack of funding for translation.

While we employed systematic review methodology in using a standardised form for data extraction and study quality assessment, we did not aim for an exhaustive review; methodology is often described briefly within research articles and search techniques do not necessarily identify all such articles, because of the indexing procedures. Consistent with the purpose of the review, which was to inform a workshop of experts, we gathered information on the use of the methods to the point of saturation, but did not attempt to include every study that has used the methods. Data were extracted from the included studies, using a standard proforma adapted for this review, by authors CS and IW, following which IW and HI double-checked half of the extracted records. The risk of bias of each included study was assessed using a quality assessment checklist adapted from guidance of best practice in systematic reviews.9 Results were collated in a table format and summarised in a narrative synthesis.

Expert opinion

The preliminary findings of the literature review were presented at a workshop at the MRC Lifecourse Epidemiology Unit, University of Southampton in September 2016, which was attended by 21 clinicians and epidemiologists from the Cohort and Longitudinal Studies Enhancement Resources (CLOSER) and Society for Social Medicine (SSM) networks. The workshop incorporated discussions of the approaches to pubertal assessment most suitable for research studies. The workshop was facilitated and chaired by two of the authors (JB and HI). Following a presentation of the review findings, presentations were also given by experts in specific approaches to pubertal assessment – clinical examination, growth assessment to infer pubertal stage, and hormonal approaches. Three break-out groups were then formed to consider two specific questions:

  1. Which cross-sectional approach is most suitable to determine current pubertal stage?

  2. Which pubertal features should we assess longitudinally (e.g. peak height velocity), and how should we measure them?

A snowballing technique was used to reach consensus. The discussions of the break-out groups were recorded on flip charts and fed back to all workshop participants. Further discussion took place within the group as a whole. JB took notes throughout the meeting and produced a report of the workshop, including a synopsis of each presentation and a summary of the discussion points that arose during the snowballing exercise. This draft report was circulated to all participants, and they were invited to comment. We also held additional discussions with experts in the field of adolescent medicine outside the workshop.

The views of young people

We consulted six boys aged 12–16 years and four girls aged 10–13 years, who were part of the University Hospital Southampton Children’s Panel, which is available for consultations relating to research and clinical care. Three female researchers spoke to the panellists in two sex-specific groups, gathering their views on the acceptability, preferences and barriers in relation to the following types of pubertal assessment: physical examination, self-assessment questionnaires, shoe and foot size assessments, and voice maturation assessment for boys. In addition, the boys were asked about the use of age at first nocturnal emission and the use of an orchidometer to assess testicular size. The girls were asked about the use of age at menarche. We used examples of self-assessment questionnaires with line drawings representing Tanner stages, and let both the boys and the girls use Speechtest, an app that reports boys’ pubertal stage derived from hearing the participant counting backwards from 20 to 1.Reference Howard10

Results

We screened 11,935 abstracts and assessed 157 articles in detail. Of these, 38 were deemed relevant, with most articles reporting data comparing two or more methods of pubertal assessment. Not all studies were validation studies, since some did not include a comparison with the gold standard method of clinical assessment.

The majority of studies had methodological weaknesses. Most did not include justification of the sample size. Selection bias was an issue in a number of studies with highly selective samples consisting of, for example, children from particular socioeconomic backgrounds or those attending specialist clinics, where these substantially differed from the intended target populations. Many articles included insufficient details of the methods used. Eleven studies were conducted in the United States, nine in the United Kingdom, four in Sweden, two in Australia and two in Turkey, and one each from Chile, China, the Czech Republic, Denmark, France, India, Iran, Japan, the Netherlands and South Africa. Review findings were grouped into five categories according to the pubertal assessment method.

Self-assessment

This was the largest group, with 17 relevant studies (Table 1). Most compared self-assessment using either the SMS or the pubertal development scale (PDS) against clinical assessment, with the PDS being an interview-based measure, consisting of questions about body changes and growth. Some studies used the photo version of the SMS, while othersReference Norris and Richter11,Reference Bonat, Pathomvanich, Keil, Field and Yanovski12 used a line drawing version,Reference Morris and Udry13 in order to increase acceptability of the images to young people.

Table 1. Description of studies using self-assessment methods

Agreement in Tanner stage between self-assessment and clinical assessment ranged from 43% to 81% of the sample. There was generally at least 84% agreement to within one pubertal stage for the SMS,Reference Schmitz, Hovell and Nichols5 and 85–100% for the PDS.Reference Schmitz, Hovell and Nichols5,Reference Brooks-Gunn, Warren, Rosso and Gargiulo14,Reference Hergenroeder, Hill, Wong, Sangi-Haghpeykar and Taylor15 Children earlier on in their development were reported to overestimate their stage of development, and children in the later stages were often found to underestimate this,Reference Brooks-Gunn, Warren, Rosso and Gargiulo14–Reference Desmangles, Lappe, Lipaczewski and Haynatzki17 although the opposite pattern was also observed.Reference Rasmussen, Wohlfahrt-Veje and Tefre de Renzy-Martin18 Girls tended to be more accurate in self-reporting than boys.Reference Bonat, Pathomvanich, Keil, Field and Yanovski12,Reference Morris and Udry13,Reference Carskadon and Acebo16 Certain aspects of development were self-reported with more accuracy than others; for example, self-assessment of pubic hair development tended to correlate well with clinical assessment for both sexes, whereas self-reported breast and external genitalia stages were more weakly correlated with clinical staging.Reference Bonat, Pathomvanich, Keil, Field and Yanovski12,Reference Hergenroeder, Hill, Wong, Sangi-Haghpeykar and Taylor15,Reference Norris and Richter19
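The two agreement figures quoted above (exact agreement, and agreement to within one Tanner stage) can be illustrated with a minimal sketch. The function name and the paired stage values below are hypothetical and are not taken from any of the reviewed studies; the code only shows how such percentages might be derived from paired self-assessed and clinician-assessed stages.

```python
def tanner_agreement(self_stages, clinical_stages):
    """Return (exact, within_one) agreement proportions for paired Tanner stages (1-5)."""
    if len(self_stages) != len(clinical_stages) or len(self_stages) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(self_stages)
    # Absolute difference in stage for each participant
    diffs = [abs(s - c) for s, c in zip(self_stages, clinical_stages)]
    exact = sum(d == 0 for d in diffs) / n        # identical stage
    within_one = sum(d <= 1 for d in diffs) / n   # at most one stage apart
    return exact, within_one


# Hypothetical illustrative data for ten participants (not from the review)
self_report = [1, 2, 2, 3, 4, 5, 3, 2, 4, 1]
clinical = [1, 2, 3, 3, 3, 5, 5, 2, 4, 2]

exact, within_one = tanner_agreement(self_report, clinical)
print(f"exact agreement: {exact:.0%}, within one stage: {within_one:.0%}")
```

Simple percentage agreement of this kind does not correct for chance agreement, so it is best read alongside whatever agreement statistic an individual study reports.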

Self-assessment using the SMS perfectly agreed with that based on the PDS in 56% of females and 39% of males,Reference Bond, Clements and Bertalli20 but agreement to within one Tanner stage was observed in 97% and 89%, respectively. Agreement between the two scales was also higher for 12–13-year-olds compared with 10–11-year-olds or 14–15-year-olds. However, when compared with clinical assessment, boys tended to rate themselves as more mature using the SMS drawings than using the PDS score, and girls tended to rate themselves as less mature. This may suggest that viewing the images of pubertal stage might encourage a socially desirable response among young people.Reference Bond, Clements and Bertalli20 Colour drawings used in a clinic setting yielded assessments by children that were close to those of raters. Assessments were less reliable, however, when children were overweight or obese,Reference Bonat, Pathomvanich, Keil, Field and Yanovski12,Reference Rabbani, Noorian and Fallah21,Reference Sun, Tao and Su22 although not all studies observed this.Reference Schmitz, Hovell and Nichols5,Reference Rabbani, Noorian and Fallah21

When children were asked to compare their development with their classmates in a ‘global question’, taking into account all aspects of puberty, there was high concordance with answers to the same question provided by clinical examiners, with 95% for boys and 93.5% for girls, although this was in a self-selected subsample of those who volunteered to undergo clinical examination.Reference Berg-Kelly and Erdes23 A continuous Tanner visual analogue scale, used in one study, was found to be somewhat less accurate than the SMS or PDS.Reference Schmitz, Hovell and Nichols5

Four studies explored the use of parental assessments of own child’s pubertal status. One of the studies used a single question as to whether a particular child had entered puberty; however, parents were asked the question one year after the children, which limits interpretation of these findings.Reference Lum, Bountziouka and Harding24 In another study, mothers of girls rated their daughters’ SMS-based Tanner stage higher than clinicians, but correlation with clinical assessment was 0.85, compared with 0.82 for self-assessment, indicating a potential for the use of this method.Reference Brooks-Gunn, Warren, Rosso and Gargiulo14 In the same study, mothers were more likely to overestimate breast staging at the beginning of breast development, but this was not found in relation to pubic hair.Reference Brooks-Gunn, Warren, Rosso and Gargiulo14 In a large study, mothers and children tended to underestimate pubertal state for girls and to overestimate it for boys.Reference Rasmussen, Wohlfahrt-Veje and Tefre de Renzy-Martin18 In another study, comparison of girls’ and mothers’ assessment of breast bud development using black-and-white Tanner photos with a clinical assessment showed that self-assessments by the girls themselves had low concordance with breast bud assessment by a trained nutritionist.Reference Pereira, Garmendia and Gonzalez25 Maternal assessment of breast bud development among the leaner girls was more accurate than that by the girls themselves.

While the experts in our workshop saw clinical assessment as the gold standard, they recognised that acceptability among study participants was often low. They considered self-assessment to be the easiest approach within cohort studies and the only practical method, other than clinical assessment, in cross-sectional studies. They agreed, however, that it was a rather crude approach, given the likelihood that it would only accurately categorise pubertal status to within one Tanner stage.

Growth

We identified nine relevant studies of growth (Table 2). These generally focused on two parameters: age at height growth take-off, indicating the start of pubertal growth spurt, and age at peak height velocity, indicating the intensity of the pubertal growth spurt. Serial measurement of height was the most frequently reported approach, which can also help identify the pre-pubertal growth spurt.Reference Karlberg, Kwan, Gelander and Albertsson-Wikland26 Height velocity may correlate with secondary sexual characteristics, with one study demonstrating high correlation between height velocity and testicular volume.Reference Bundak, Darendeliler and Gunoz27
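As a rough illustration of how serial height measurements relate to age at peak height velocity, the sketch below computes velocity between consecutive visits and reports the interval with the fastest growth. It is a deliberately crude finite-difference approach with hypothetical function names and made-up measurements; the reviewed studies generally fit smooth growth curves rather than using raw differences, and the result depends heavily on how frequently height is measured.

```python
def peak_height_velocity(ages, heights):
    """Estimate (age_at_phv, phv) from serial measurements.

    ages are in years and heights in cm, ordered by visit. Velocity is
    computed per interval (cm/year) and assigned to the interval midpoint.
    """
    if len(ages) != len(heights) or len(ages) < 3:
        raise ValueError("need at least three paired measurements")
    visits = list(zip(ages, heights))
    best = None
    for (a0, h0), (a1, h1) in zip(visits, visits[1:]):
        if a1 <= a0:
            raise ValueError("ages must be strictly increasing")
        velocity = (h1 - h0) / (a1 - a0)   # cm per year over this interval
        midpoint = (a0 + a1) / 2           # age attributed to this velocity
        if best is None or velocity > best[1]:
            best = (midpoint, velocity)
    return best


# Hypothetical annual measurements for one child (not from the review)
ages = [10, 11, 12, 13, 14, 15]
heights = [138.0, 143.0, 149.0, 158.0, 164.0, 167.0]

age_at_phv, phv = peak_height_velocity(ages, heights)
print(f"peak height velocity {phv:.1f} cm/year at about age {age_at_phv}")
```

With annual visits the estimated age at peak velocity can only be resolved to the nearest interval midpoint, which is one reason measurement frequency matters for this approach.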

Table 2. Description of studies using growth methods

One approach to growth curve analysis is the SuperImposition by Translation And Rotation (SITAR) model.Reference Cole, Donaldson and Ben-Shlomo28 It takes into account characteristics that differ from one individual to the next – namely mean height, timing and rate of puberty. The SITAR method produces three measurements, representing differences in mean size and growth tempo, and a measure of growth velocity. The method also could be applied to growth in other parameters, such as foot length, and to the development of secondary sexual characteristics, such as breast, testicular and pubic hair development. A study of the Edinburgh Longitudinal Growth Study cohort that used the SITAR method showed high correlations in relation to pubertal timing between such markers as height, genital and pubic hair stages, and testicular volume, although correlations for the shared markers were significantly higher in the girls.Reference Cole, Pan and Butler29 This method recently has been applied to the height measurements from children in the Avon Longitudinal Study of Parents and Children (ALSPAC) to enrich the dataset with measures that include age at peak growth velocity,Reference Frysz, Howe, Tobias and Paternoster30 a marker of pubertal timing. A rapid increase in foot length may correspond to the onset of puberty and Tanner stage 2 on clinical assessment.Reference Mitra, Samanta, Sarkar and Chatterjee31 Peak increase in shoe size has also been found to precede peak increase in sitting height,Reference Busscher, Kingma and Wapstra32 although the comparison was with data from an unrelated sample. Others compared mean age at increase in foot velocity with mean age at take-off in height, and found no significant difference between the two parameters.Reference Ford, Khoury and Biro33
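For readers unfamiliar with SITAR, the model described at the start of the preceding paragraph is commonly written in the following form. The notation is an assumption introduced here for illustration and does not appear in this review: $y_{it}$ is the height of individual $i$ at age $t$, $h(\cdot)$ is a mean growth curve shared by all individuals (typically a spline), and $\alpha_i$, $\beta_i$ and $\gamma_i$ are subject-specific random effects for size, tempo (timing) and velocity, respectively.

```latex
y_{it} = \alpha_i + h\!\left(\frac{t - \beta_i}{\exp(-\gamma_i)}\right) + \varepsilon_{it}
```

Under this parameterisation, $\alpha_i$ shifts the individual curve up or down, $\beta_i$ shifts it earlier or later in age, and $\gamma_i$ stretches or compresses the age scale, which corresponds to the three measurements mentioned in the text.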

The experts attending the workshop agreed that assessment of growth in height or foot size was a promising method, albeit that more evidence was required in relation to foot size. Growth assessment was thought to be feasible within cohort studies, provided measurements could take place with sufficient frequency, but by definition not suitable for cross-sectional studies.

Radiological methods of assessing the degree of maturation of the cervical vertebra, olecranon and digits have been proposed as pubertal assessment approaches. The Greulich and Pyle AtlasReference Greulich and Pyle34 and Tanner-Whitehouse III scoreReference Tanner, Whitehouse and Cameron35 can be used to assess skeletal age. A cervical-vertebral index indicates level of skeletal maturation,Reference Cericato, Bittencourt and Paranhos36 and radiological images of the olecranon correlate with those of the digits, yet there was no comparison with a reference standard in these studies.Reference Canavese, Charles and Dimeglio37,Reference Ozer, Kama and Ozer38 The experts agreed that radiological approaches were feasible in large-scale population-based studies, although the need for participants to attend a clinic limited their use.

Hormonal assessment

Gonadotrophins and gonadal hormones

A number of articles discussed the potential for measurement of gonadotrophins as a means of assessing pubertal stage. At the onset of puberty, there is an increase in the overnight pulsatile release of luteinising hormone (LH), suggesting that early morning urinary LH might offer a way of determining Tanner stage in girls.Reference Apter, Bützow, Laughlin and Yen39,Reference Grumbach, Roth, Kaplan and Kelch40 Serum testosterone is aromatised to oestrogen in fat, and it is oestrogen that triggers growth hormone (GH) secretion in both sexes.Reference Søeborg, Frederiksen and Mouritsen41 Inhibin B is released from the ovary in pubertal girls and rises in early puberty, whereas inhibin A is slower to rise.Reference Sehested, Andersson, Müller and Skakkebaek42 For boys, testosterone rises throughout puberty, with the steepest rise seen between Tanner stages 3 and 4.

We identified five studies that incorporated both hormonal measurement and reference to pubertal staging (Table 3). In a study examining the correlation of three-monthly urinary oestradiol, testosterone and LH with self-reported SMS-based Tanner staging, the levels of all three hormones were correlated with the staging at baseline and 12 months later.Reference Singh, Balzer and Kelly43 In another study, staging based on physical examination reflected testosterone and DHEA levels in both sexes, but was only modestly related to oestradiol in girls.Reference Shirtcliff, Dahl and Pollak44 In boys, 24-hour testosterone levels in later puberty correlate reasonably well with testicular volume. Serum concentrations of testosterone increased progressively throughout puberty, with a marked increase occurring between early and mid-puberty. The onset of puberty was marked by accentuation of the diurnal rhythm of testosterone release due to increased release of testosterone at night.Reference Ankarberg-Lindgren and Norjavaara45 However, assays of urinary sex hormones both in boys and girls can be difficult to interpret due to within-person variability, and longitudinal sampling would be required to determine the hormonal changes associated with progression through pubertal stages.Reference Singh, Balzer and Kelly43,Reference Chada, Prusa and Bronsky46 It has been suggested that both inhibin A and B may have a potential as markers of pubertal development in boys and girls.Reference Chada, Prusa and Bronsky46,Reference Crofton, Evans and Groome47 However, progressive changes in their levels are not consistent enough to enable pubertal staging. On the whole, there are significant challenges to interpretation of such data due to the complexity of relationships between gonadotrophins, sex hormones and inhibins.Reference Chada, Prusa and Bronsky46,Reference Crofton, Evans and Groome47

Table 3. Description of studies using gonadotrophin or gonadal hormone methods

Leptin

Leptin interacts with the reproductive axis at multiple sites with stimulatory effects on the hypothalamus and pituitary, and inhibitory action on the gonads.Reference Rockett, Lynch and Buck48 Leptin may affect the regulation of gonadotrophin-releasing hormone (GnRH) and LH secretion during puberty.Reference Brann, Wade, Dhandapani, Mahesh and Buchanan49 There were five studies identified in this group (Table 4). One study reported an increase of 50% in serum leptin levels just before the onset of puberty, and a decrease to approximately baseline after the initiation of puberty.Reference Mantzoros, Flier and Rogol50 However, these findings were based on a small sample and should be treated with caution, given that another study showed that leptin levels varied widely among schoolchildren.Reference Wang, Morioka and Gowa51 Another study showed no correlation between serum leptin levels and age in boys, but a significant correlation in girls.Reference Carlsson, Ankarberg and Rosberg52

Table 4. Description of studies using leptin methods

Others have examined urinary leptin, demonstrating that monthly urinary leptin levels were higher in girls than boys over a period of six months.Reference Maqsood, Trueman and Whatmore53 Leptin was higher in children advanced in puberty, compared with children remaining pre-pubertal, but the measure of pubertal status was insufficiently described. Urine collection was a less invasive method for measuring leptin, compared with blood. In another study, urinary leptin showed day-to-day variability and correlated with serum leptin. The pattern of urinary leptin across Tanner stages differed between the sexes: in boys it increased significantly from Tanner stages 1 to 2, peaked in stage 3 and then declined for stages 4 and 5, while in girls there was a linear relationship between leptin levels and pubertal development.Reference Zaman, Hall and Gill54

The experts in the workshop regarded assay of sex hormones and leptin as an area of interest for assessment of pubertal stage, but the need for repeated measurements and the potential cost of assays were seen as barriers to their use in research.

Voice maturation assessment

The maturation of the human voice is characterised by changes in pitch, loudness and a variety of tone qualities as the larynx grows in both sexes.Reference Harries, Walker, Williams, Hawkins and Hughes55 Voice breaking in boys usually occurs as a distinct event during late puberty, with a rapid drop in voice pitch occurring during Tanner stages 3 and 4, usually around 12–15 years.Reference Hodges-Simeon, Gurven, Cardenas and Gaulin56,Reference Ong, Bann and Wills57 One study has shown that the timing of voice breaking could act as a non-invasive marker of pubertal maturation, with moderate correlation with other markers, such as genital development, and pubic and axillary hair growth (Table 5).Reference Ong, Bann and Wills57

Table 5. Description of studies using voice maturation methods

Cooksey classification of voice is based on a six-stage pattern of pubertal voice development derived from the singing range in boys, and this method has been validated previously.Reference Cooksey and Runfola58 A clear correlation was found between the Tanner stages and Cooksey classification of boys’ voices studied at three-month intervals. In addition, change in fundamental voice frequency was correlated with testicular volume, but not with serum testosterone levels.Reference Harries, Walker, Williams, Hawkins and Hughes55 The experts saw assessment of voice maturation as a promising method, and acknowledged the need for further research in this area.

The views of young people

The panel of young people expressed a preference for questionnaires, rather than clinical assessment. They were shown line drawings representing pubertal stages, and said that they preferred these to the photographs. They also preferred paper, rather than digital, versions of questionnaires, and thought that measurements of height and foot size were acceptable and likely to be so for their peers. All stated that the voice maturation assessment via the Speechtest app was acceptable, and that boys were unlikely to exaggerate the depth of their voice, provided the assessment was done in private. The boys felt that they would find a question about their first nocturnal emissions embarrassing and that most would not be able to recall the timing of this correctly. They also stated that they would need to self-examine in private in order to estimate their testicular size if asked to use an orchidometer. The girls did not object to being asked about age at menarche if the person asking was a professional, but felt that their mothers might be more accurate in their reporting of this. Both boys and girls suggested that young people would be much more likely to consent to a clinical assessment if this were carried out by a professional of the same sex. They noted the importance of clear communication that the assessment was brief and did not require complete removal of their clothes.

Integration of review findings with the views of experts and young people

As a result of the literature review it was possible to draw a number of conclusions for each category of assessment. These are summarised in Table 6, alongside conclusions drawn from the discussions with experts and adolescents.

Table 6. Conclusions from the literature review and opinions of experts and adolescents

Discussion

This review examined the main pubertal assessment methods that are currently in use in research. Clinical assessment can be used across all Tanner stages, from the pre-pubertal stage to complete maturation, in contrast with some of the other methods, which may be more useful in relation to specific changes through puberty, such as the pubertal growth spurt, specific hormonal changes or voice maturation. Physical examination is non-invasive compared with blood sampling required for many hormonal methods, and less harmful than radiological methods. Clinical assessment can also be used in cross-sectional studies, in contrast to some other methods. However, our work suggests low acceptability of the clinical examination in a research setting. The young people suggested that good communication, and sex-matching of the assessor and participant, could help improve acceptability of the method. However, it is possible that even when an assessor of the same sex is not available, this may not necessarily lead to refusal.Reference Dorn59

The other methods considered in this review may have their place in pubertal assessment under different circumstances. The self-assessment approach may be suitable in studies where accuracy to within one Tanner stage is acceptable, in large-scale studies lacking the resources required for clinical assessment, or in settings where there may be strong objections to physical examination. Self-reporting may be affected by social desirability or accepted norms [11,14], and younger children may consider themselves older (and therefore more developed) than they are, whereas older children might view themselves as younger, hoping for more development [17]. Higher validity against the reference test of clinical assessment was observed when adolescents were allowed to self-examine before rating their own development than when they rated it from memory [21].

Assessments of growth could be used in longitudinal studies, provided appropriately frequent measurements were feasible. The SITAR method in particular has an advantage over conventional approaches in that it takes account of individual characteristics, such as mean height, and the timing and rate of puberty. The SITAR method could be applied to growth in other parameters, such as foot length, and to the development of secondary sexual characteristics. Assessing foot size is also a promising method.

The absence of data on validity, along with the acceptability issues, means that radiological methods of pubertal assessment are unlikely to be appropriate for use in research studies. Measurements of gonadotrophins, gonadal hormones and leptin can provide a detailed picture of the biological changes taking place during puberty. However, high laboratory-associated costs, the need for repeat measurements and the invasiveness of blood tests are some of the current barriers to the wider use of the hormonal methods. Between- and within-person variability and the lack of evidence of validity against clinical assessment also limit the utility of the hormonal methods. Nonetheless, advances in this field might yield acceptable and reliable methods in the future. Assessment of voice maturation presents a convenient, albeit insufficiently validated, method.
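To make the "within one Tanner stage" criterion for self-assessment concrete, the following is a minimal Python sketch of how agreement between clinical and self-assessed stages might be summarised. It is an illustration under stated assumptions, not the analysis used in the studies reviewed: the function names `tanner_agreement` and `weighted_kappa` and the ratings in the usage example are hypothetical, and stages are assumed to be coded as integers 1 to 5.

```python
def tanner_agreement(clinical, self_rated):
    """Return (exact, within_one) agreement proportions for paired ratings.

    Assumes each rating is an integer Tanner stage and that the two
    sequences are paired by participant (hypothetical data layout).
    """
    if len(clinical) != len(self_rated) or not clinical:
        raise ValueError("ratings must be non-empty and paired")
    diffs = [abs(c - s) for c, s in zip(clinical, self_rated)]
    n = len(diffs)
    exact = sum(d == 0 for d in diffs) / n
    within_one = sum(d <= 1 for d in diffs) / n
    return exact, within_one


def weighted_kappa(clinical, self_rated, stages=(1, 2, 3, 4, 5)):
    """Linearly weighted Cohen's kappa for ordinal stage ratings."""
    n = len(clinical)
    k = len(stages)
    index = {stage: i for i, stage in enumerate(stages)}
    # Observed joint proportions of (clinical, self-rated) stage pairs.
    observed = [[0.0] * k for _ in range(k)]
    for c, s in zip(clinical, self_rated):
        observed[index[c]][index[s]] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    # Linear disagreement weights: 0 on the diagonal, 1 at maximum distance.
    num = sum(abs(i - j) / (k - 1) * observed[i][j]
              for i in range(k) for j in range(k))
    den = sum(abs(i - j) / (k - 1) * row[i] * col[j]
              for i in range(k) for j in range(k))
    return 1.0 - num / den if den > 0 else float("nan")


if __name__ == "__main__":
    # Made-up ratings for ten participants, for illustration only.
    clinical = [1, 2, 2, 3, 3, 4, 4, 5, 5, 3]
    self_rated = [1, 2, 3, 3, 4, 4, 3, 5, 4, 1]
    exact, within_one = tanner_agreement(clinical, self_rated)
    print(f"exact agreement: {exact:.2f}")        # 0.50 for these ratings
    print(f"within one stage: {within_one:.2f}")  # 0.90 for these ratings
    print(f"weighted kappa: {weighted_kappa(clinical, self_rated):.2f}")
```

Reporting both the exact and the within-one-stage proportions alongside a chance-corrected statistic keeps the summary honest: a high within-one-stage figure can coexist with modest exact agreement, which is the trade-off a study has to accept when choosing self-assessment.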

Clinical assessment remains a subjective method, prone to measurement error and bias, particularly if not diligently conducted. The most frequently used black-and-white reference photographs are old, and depict Caucasian adolescents only. This has been highlighted previously [4], and using more up-to-date resources would strengthen the method. Inter-rater variability could be reduced by thorough training and rigorous protocols within research studies. It would be valuable to investigate the effect of improved communication, and of sex-matching of the clinical assessor and participant, on rates of consent. More work is needed on the use of parental assessments. Studies of growth employing the SITAR method for parameters other than height would be of great value, in particular in relation to foot size. In addition, it would be helpful to determine whether there is scope for measuring foot growth using repeated self-administered questionnaires incorporating shoe size. There is a need for further research to assess the validity of hormonal assays as pubertal assessment methods, especially with regard to frequency of measurements and the use of less invasive methods, such as hair or urine sampling. Voice maturation assessment is another area where further evidence on validity of the method is needed, and development and evaluation of apps such as Speechtest might be useful, not least as they seemed acceptable to young people.
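For readers unfamiliar with SITAR (SuperImposition by Translation and Rotation), the growth model referred to above is commonly written in the form below. This is a summary drawn from the SITAR literature (Cole, Donaldson and Ben-Shlomo, listed in the references), not a formula stated in this review, and the symbols are notation chosen here for illustration.

```latex
% SITAR as a shape-invariant mixed-effects growth model (sketch).
% y_{it}: measurement (e.g. height) of individual i at age t
% h(.):   common mean growth curve, fitted as a natural cubic spline
% \alpha_i, \beta_i, \gamma_i: subject-specific random effects
\[
  y_{it} \;=\; \alpha_i \;+\; h\!\left(\frac{t - \beta_i}{\exp(-\gamma_i)}\right) \;+\; \varepsilon_{it}
\]
```

Here the three random effects correspond to the individual characteristics mentioned earlier: \(\alpha_i\) shifts the curve up or down (size, such as mean height), \(\beta_i\) shifts it along the age axis (timing of puberty), and \(\gamma_i\) stretches or compresses the age scale (rate of progression). Applying the model to another parameter, such as foot length, would amount to replacing \(y_{it}\) with that measurement, on the assumption that a single underlying curve shape is appropriate for it.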

Previous reviews of methods for pubertal assessment agreed that clinical assessment constituted the gold standard method. Coleman and Coleman concluded that self- and parental assessment methods had lower validity than clinical assessment [3]. An extensive review by Dorn et al. covered a range of objective and subjective methods and discussed important issues associated with each, arguing that any method could be used, provided it is appropriate for addressing the research questions in a study [8]. Dorn and Biro highlighted the difficulties in obtaining consent for clinical assessment and noted that self-assessment is prone to significant bias and insufficient agreement with clinical assessment, whereas hormonal methods do not lead to adequate Tanner staging [4].

One important consideration is that, in research, accuracy implies proximity of results to the true value. Stages of puberty are artificial constructs, created to help interpret the influence of the timing and tempo of puberty on health in adolescence and beyond. They are the best proxy we have for the ‘true value’ in pubertal development. Staging is an integral part of clinical assessment, which can thus be seen as the most accurate method, in contrast to, for example, growth, hormonal or voice methods, which sometimes attempt to arrive at Tanner stages indirectly. Depending on the method, within- and between-person variability in the parameters can affect the accuracy of estimating corresponding pubertal development stages. Furthermore, Tanner stages are well known and widely used by clinicians and researchers. It should therefore be borne in mind that the use of some other methods, such as the Pubertal Development Scale (PDS), a global question or a visual analogue method in self-assessment, might complicate interpretation of study results and their use in comparative analysis and meta-analysis with studies that use Tanner staging.

In this review, we employed methodology that is conventionally used in systematic reviews, including extensive searches of the published literature, strengthened by input from an information specialist, careful screening of potentially relevant abstracts and papers, and detailed data extraction and quality assessment of included studies. This work adds to the existing reviews, incorporating the more recent studies and capturing methodologies not previously included, yet avoiding some of the areas already extensively discussed. We complemented our review with the opinions of experts, with whom the review findings were discussed. We found that there was considerable consistency between the views of the experts and overall conclusions emerging from the literature. Another strength of this work was the incorporation of the views of young people. This allowed triangulation with the review findings, which strengthened our interpretation.

Our review does not cover all research on approaches to pubertal assessment, given that our intention was to focus on methods that might be used in population-based research studies. We searched to saturation to identify relevant methods, but did not attempt an exhaustive review of all studies relating to each of the assessment methods. We did not search the grey literature or contact experts in order to identify relevant unpublished studies, hence publication bias is likely. Given that abstract screening was conducted by a single reviewer, it is possible that not all relevant published studies were identified. There were few validation studies. Descriptions of the methods in some articles lacked detail, presenting challenges for study quality assessment.

Conclusions

This review, complemented by the views of experts and young people, highlighted strengths and limitations of self-assessment methods in research studies, compared with the gold standard of clinical assessment of pubertal development. It has confirmed the barriers to the use of hormonal and radiological methods in this setting, and identified the need for further research into the validity of the promising growth assessment methods, such as foot size measurement, as well as voice maturation assessment. Improved assessment methods would enhance studies examining growth and development through adolescence.

Author ORCIDs

Inna Walker 0000-0002-8460-8130; Justin Davies 0000-0001-7560-6320; Hazel Inskip 0000-0001-8897-1749; Janis Baird 0000-0002-4039-4361

Acknowledgements

We are grateful to Elizabeth Payne (Information Specialist, UK) for her assistance with literature searches, and to Tina Horsfall and Julia Hammond (MRC Lifecourse Epidemiology Unit, University of Southampton, UK) for their input in organising and running the University Hospital Southampton Children’s Panel discussions. Special thanks go to the young people for expressing their views on the methods. We are also grateful to Professor David Dunger (University of Cambridge, UK), Professor Tim Cole and Professor Russell Viner (both of University College London Great Ormond Street Institute of Child Health, UK) for their advice on the topic and presentations at the expert workshop, as well as to the workshop participants for their contributions.

Financial Support

The review of evidence and the work with young people were supported by Cohort and Longitudinal Studies Enhancement Resources (CLOSER), UK (grant reference ES/K000357/1), and the workshop with experts was supported by both CLOSER and the Society for Social Medicine (SSM), UK. Hazel Inskip is supported by the UK Medical Research Council and the NIHR Southampton Biomedical Research Centre, and her work receives funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement no. 733206 (LifeCycle).

Conflicts of Interest

None.

References

1. Marshall, WA, Tanner, JM. Variations in pattern of pubertal changes in girls. Arch Dis Child. 1969; 44(235), 291–303. doi:10.1136/adc.44.235.291
2. Marshall, WA, Tanner, JM. Variations in the pattern of pubertal changes in boys. Arch Dis Child. 1970; 45(239), 13–23. doi:10.1136/adc.45.239.13
3. Coleman, L, Coleman, J. The measurement of puberty: a review. J Adolesc. 2002; 25(5), 535–550.
4. Dorn, LD, Biro, FM. Puberty and its measurement: a decade in review. J Res Adolesc. 2011; 21(1), 180–195. doi:10.1111/j.1532-7795.2010.00722.x
5. Schmitz, KE, Hovell, MF, Nichols, JF, et al. A validation study of early adolescents’ pubertal self-assessments. J Early Adolesc. 2004; 24(4), 357–384. doi:10.1177/0272431604268531
6. Tanner, JM. Growth at adolescence, 2nd edn, 1962. Blackwell: Oxford.
7. Petersen, AC, Crockett, L, Richards, M, Boxer, A. A self-report measure of pubertal status: Reliability, validity, and initial norms. J Youth Adolesc. 1988; 17(2), 117–133.
8. Dorn, LD, Dahl, RE, Woodward, HR, Biro, F. Defining the boundaries of early adolescence: a user’s guide to assessing pubertal status and pubertal timing in research with adolescents. Appl Dev Sci. 2006; 10(1), 30–56.
9. Centre for Reviews and Dissemination. Systematic reviews: CRD’s guidance for undertaking reviews in health care. University of York; 2009 [cited 2019 20 February]; Available from: https://www.york.ac.uk/media/crd/Systematic_Reviews.pdf.
10. Howard, DM. Speechtest. 2014 [cited 2018 24 July]; Available from: https://itunes.apple.com/gb/app/speechtest/id904673964?mt=8.
11. Norris, SA, Richter, LM. Are there short cuts to pubertal assessments? Self-reported and assessed group differences in pubertal development in African adolescents. J Adolesc Health. 2008; 42(3), 259–265. doi:10.1016/j.jadohealth.2007.08.009
12. Bonat, S, Pathomvanich, A, Keil, MF, Field, AE, Yanovski, JA. Self-assessment of pubertal stage in overweight children. Pediatrics. 2002; 110(4), 743–747. doi:10.1542/peds.110.4.743
13. Morris, NM, Udry, JR. Validation of a self-administered instrument to assess stage of adolescent development. J Youth Adolesc. 1980; 9(3), 271–280. doi:10.1007/BF02088471
14. Brooks-Gunn, J, Warren, MP, Rosso, J, Gargiulo, J. Validity of self-report measures of girls’ pubertal status. Child Dev. 1987; 58(3), 829–841.
15. Hergenroeder, AC, Hill, RB, Wong, WW, Sangi-Haghpeykar, H, Taylor, W. Validity of self-assessment of pubertal maturation in African American and European American adolescents. J Adolesc Health. 1999; 24(3), 201–205.
16. Carskadon, MA, Acebo, C. A self-administered rating scale for pubertal development. J Adolesc Health. 1993; 14(3), 190–195.
17. Desmangles, J-C, Lappe, JM, Lipaczewski, G, Haynatzki, G. Accuracy of pubertal Tanner staging self-reporting. J Pediatr Endocrinol Metab. 2006; 19(3), 213–221.
18. Rasmussen, AR, Wohlfahrt-Veje, C, Tefre de Renzy-Martin, K, et al. Validity of self-assessment of pubertal maturation. Pediatrics. 2015; 135(1), 86–93. doi:10.1542/peds.2014-0793
19. Norris, SA, Richter, LM. Usefulness and reliability of Tanner pubertal self-rating to urban black adolescents in South Africa. J Res Adolesc. 2005; 15(4), 609–624. doi:10.1111/j.1532-7795.2005.00113.x
20. Bond, L, Clements, J, Bertalli, N, et al. A comparison of self-reported puberty using the Pubertal Development Scale and the Sexual Maturation Scale in a school-based epidemiologic survey. J Adolesc. 2006; 29(5), 709–720.
21. Rabbani, A, Noorian, S, Fallah, JS, et al. Reliability of pubertal self-assessment method: an Iranian study. Iran J Pediatr. 2013; 23(3), 327–332.
22. Sun, Y, Tao, FB, Su, PY, China Puberty Research Collaboration. Self-assessment of pubertal Tanner stage by realistic colour images in representative Chinese obese and non-obese children and adolescents. Acta Paediatr. 2012; 101(4), e163–166. doi:10.1111/j.1651-2227.2011.02568.x
23. Berg-Kelly, K, Erdes, L. Self-assessment of sexual maturity by mid-adolescents based on a global question. Acta Paediatr. 1997; 86(1), 10–17. doi:10.1111/j.1651-2227.1997.tb08822.x
24. Lum, S, Bountziouka, V, Harding, S, et al. Assessing pubertal status in multi-ethnic primary schoolchildren. Acta Paediatr. 2015; 104(1), e45–48.
25. Pereira, A, Garmendia, ML, Gonzalez, D, et al. Breast bud detection: a validation study in the Chilean growth obesity cohort study. BMC Womens Health. 2014; 14, 96.
26. Karlberg, J, Kwan, C-W, Gelander, L, Albertsson-Wikland, K. Pubertal growth assessment. Horm Res. 2003; 60(Suppl 1), 27–35.
27. Bundak, R, Darendeliler, F, Gunoz, H, et al. Analysis of puberty and pubertal growth in healthy boys. Eur J Pediatr. 2007; 166(6), 595–600.
28. Cole, TJ, Donaldson, MDC, Ben-Shlomo, Y. SITAR--a useful instrument for growth curve analysis. Int J Epidemiol. 2010; 39(6), 1558–1566.
29. Cole, TJ, Pan, H, Butler, GE. A mixed effects model to estimate timing and intensity of pubertal growth from height and secondary sexual characteristics. Ann Hum Biol. 2014; 41(1), 76–83.
30. Frysz, M, Howe, LD, Tobias, JH, Paternoster, L. Using SITAR (SuperImposition by Translation and Rotation) to estimate age at peak height velocity in Avon Longitudinal Study of Parents and Children. Wellcome Open Res. 2018; 3, 90. doi:10.12688/wellcomeopenres.14708.1
31. Mitra, S, Samanta, M, Sarkar, M, Chatterjee, S. Foot length as a marker of pubertal onset. Indian Pediatr. 2011; 48(7), 549–551. doi:10.1007/s13312-011-0092-z
32. Busscher, I, Kingma, I, Wapstra, FH, et al. The value of shoe size for prediction of the timing of the pubertal growth spurt. Scoliosis. 2011; 6, 1. doi:10.1186/1748-7161-6-1
33. Ford, KR, Khoury, JC, Biro, FM. Early markers of pubertal onset: height and foot size. J Adolesc Health. 2009; 44(5), 500–501.
34. Greulich, W, Pyle, SI. Radiographic atlas of skeletal development of the hand and wrist. 2nd edn, 1959. Stanford University Press: Stanford, CA.
35. Tanner, JM, Whitehouse, RH, Cameron, N. Assessment of skeletal maturity and prediction of adult height (TW3 Method). 3rd edn, 2001. W.B. Saunders: London.
36. Cericato, GO, Bittencourt, MAV, Paranhos, LR. Validity of the assessment method of skeletal maturation by cervical vertebrae: a systematic review and meta-analysis. Dentomaxillofac Radiol. 2015; 44(4), 20140270.
37. Canavese, F, Charles, YP, Dimeglio, A, et al. A comparison of the simplified olecranon and digital methods of assessment of skeletal maturity during the pubertal growth spurt. Bone Joint J. 2014; 96-B(11), 1556–1560. doi:10.1302/0301-620X.96B11.33995
38. Ozer, T, Kama, JD, Ozer, SY. A practical method for determining pubertal growth spurt. Am J Orthod Dentofacial Orthop. 2006; 130(2), 131.e131–136.
39. Apter, D, Bützow, T, Laughlin, GA, Yen, SS. Accelerated 24-hour luteinizing hormone pulsatile activity in adolescent girls with ovarian hyperandrogenism: relevance to the developmental phase of polycystic ovarian syndrome. J Clin Endocrinol Metab. 1994; 79(1), 119–125.
40. Grumbach, MM, Roth, JC, Kaplan, SL, Kelch, RP. Hypothalamic-pituitary regulation of puberty in man: Evidence and concepts derived from clinical research. In Control of the onset of puberty. 1974; pp. 115–166. John Wiley & Sons: New York.
41. Søeborg, T, Frederiksen, H, Mouritsen, A, et al. Sex, age, pubertal development and use of oral contraceptives in relation to serum concentrations of DHEA, DHEAS, 17α-hydroxyprogesterone, Δ4-androstenedione, testosterone and their ratios in children, adolescents and young adults. Clin Chim Acta. 2014; 437, 6–13.
42. Sehested, A, Andersson, AM, Müller, J, Skakkebaek, NE. Serum inhibin A and inhibin B in central precocious puberty before and during treatment with GnRH agonists. Horm Res. 2000; 54(2), 84–91.
43. Singh, GKS, Balzer, BWR, Kelly, PJ, et al. Urinary sex steroids and anthropometric markers of puberty - a novel approach to characterising within-person changes of puberty hormones. PLoS One. 2015; 10(11), e0143555.
44. Shirtcliff, EA, Dahl, RE, Pollak, SD. Pubertal development: correspondence between hormonal and physical development. Child Dev. 2009; 80(2), 327–337.
45. Ankarberg-Lindgren, C, Norjavaara, E. Changes of diurnal rhythm and levels of total and free testosterone secretion from pre to late puberty in boys: testis size of 3 ml is a transition stage to puberty. Eur J Endocrinol. 2004; 151(6), 747–757. doi:10.1530/eje.0.1510747
46. Chada, M, Prusa, R, Bronsky, J, et al. Inhibin B, follicle stimulating hormone, luteinizing hormone and testosterone during childhood and puberty in males: changes in serum concentrations in relation to age and stage of puberty. Physiol Res. 2003; 52(1), 45–51.
47. Crofton, PM, Evans, AEM, Groome, NP, et al. Dimeric inhibins in girls from birth to adulthood: relationship with age, pubertal stage, FSH and oestradiol. Clin Endocrinol (Oxf). 2002; 56(2), 223–230. doi:10.1046/j.0300-0664.2001.01449.x
48. Rockett, JC, Lynch, CD, Buck, GM. Biomarkers for assessing reproductive development and health: Part 1--Pubertal development. Environ Health Perspect. 2004; 112(1), 105–112.
49. Brann, DW, Wade, MF, Dhandapani, KM, Mahesh, VB, Buchanan, CD. Leptin and reproduction. Steroids. 2002; 67(2), 95–104. doi:10.1016/S0039-128X(01)00138-6
50. Mantzoros, CS, Flier, JS, Rogol, AD. A longitudinal assessment of hormonal and physical alterations during normal puberty in boys. V. Rising leptin levels may signal the onset of puberty. J Clin Endocrinol Metab. 1997; 82(4), 1066–1070.
51. Wang, T, Morioka, I, Gowa, Y, et al. Serum leptin levels in healthy adolescents: effects of gender and growth. Environ Health Prev Med. 2004; 9(2), 41–46.
52. Carlsson, B, Ankarberg, C, Rosberg, S, et al. Serum leptin concentrations in relation to pubertal development. Arch Dis Child. 1997; 77(5), 396–400.
53. Maqsood, AR, Trueman, JA, Whatmore, AJ, et al. The relationship between nocturnal urinary leptin and gonadotrophins as children progress towards puberty. Horm Res. 2007; 68(5), 225–230.
54. Zaman, N, Hall, CM, Gill, MS, et al. Leptin measurement in urine in children and its relationship to other growth peptides in serum and urine. Clin Endocrinol (Oxf). 2003; 58(1), 78–85. doi:10.1046/j.1365-2265.2003.01677.x
55. Harries, ML, Walker, JM, Williams, DM, Hawkins, S, Hughes, IA. Changes in the male voice at puberty. Arch Dis Child. 1997; 77(5), 445–447.
56. Hodges-Simeon, CR, Gurven, M, Cardenas, RA, Gaulin, SJC. Voice change as a new measure of male pubertal timing: a study among Bolivian adolescents. Ann Hum Biol. 2013; 40(3), 209–219.
57. Ong, KK, Bann, D, Wills, AK, et al. Timing of voice breaking in males associated with growth and weight gain across the life course. J Clin Endocrinol Metab. 2012; 97(8), 2844–2852.
58. Cooksey, JM. The male adolescent changing voice: Some new perspectives. In Research symposium on the male adolescent voice. (ed. Runfola, M), 1984; pp. 4–59. State University of New York Press: Buffalo.
59. Dorn, LD. Measuring puberty. J Adolesc Health. 2006; 39(5), 625–626. doi:10.1016/j.jadohealth.2006.05.014
60. Carel, JC, Leger, J. Clinical practice. Precocious puberty. N Engl J Med. 2008; 358(22), 2366–2377.
61. Taylor, SJ, Whincup, PH, Hindmarsh, PC, et al. Performance of a new pubertal self-assessment questionnaire: a preliminary study. Paediatr Perinat Epidemiol. 2001; 15(1), 88–94. doi:10.1046/j.1365-3016.2001.00317.x
62. Duke, PM, Litt, IF, Gross, RT. Adolescents’ self-assessment of sexual maturation. Pediatrics. 1980; 66(6), 918–920.
63. Gerver, WJ, de Bruin, R. Paediatric morphometrics: a reference manual, 2001. University Press Maastricht: Maastricht.

Table 1. Description of studies using self-assessment methods

Table 2. Description of studies using growth methods

Table 3. Description of studies using gonadotrophin or gonadal hormone methods

Table 4. Description of studies using leptin methods

Table 5. Description of studies using voice maturation methods

Table 6. Conclusions from the literature review and opinions of experts and adolescents