Introduction
Depressive symptoms in adolescents are a rising concern (Dick and Ferguson, 2015). Depression is more prevalent in females than in males, a finding that begins in adolescence and persists throughout adulthood (Angold et al. 1998). This gender imbalance in depression is described as one of the most robust findings in epidemiological research (Kuehner, 2003), yet it has also been called one of the major unresolved problems in psychiatric epidemiology (Bebbington et al. 1998). The reasons for this post-pubertal-onset gender difference are not fully understood (Kaltiala-Heino et al. 2003; Angold and Costello, 2006; Thapar et al. 2012) and a range of hypotheses have been advanced. Differences in help-seeking behaviour are unlikely to explain the higher rates of depression in girls, since the difference is seen in both nonclinical and clinical samples and is robust across different methods of assessment (Kessler et al. 1993; Thapar et al. 2012). Biological and psychosocial factors have also been proposed to explain the more common onset of depression in females. The effect of sex steroids on the maturing hypothalamic-pituitary-adrenal axis might increase female sensitivity to stress, whereas androgens appear to play a protective role in males (Chaplin et al. 2009; Thapar et al. 2012). Psychosocial factors include gender differences in stress coping and/or coping techniques, gender-specific expectations and differences in social cognitive function, with females presenting a greater sensitivity to rejection (Angold et al. 1998; Hankin, 2006; Chaplin et al. 2009; Naninck et al. 2011).
Another possible explanation for the gender difference found in epidemiological research lies in the diagnostic criteria or instruments used to assess depression (Santor et al. 1994; Bennett et al. 2005). It is assumed that symptoms or questions are equally effective in assessing depression in both genders (Santor et al. 1994; Salokangas et al. 2002), although it is well recognised that females and males experience and express their symptoms differently (Brooks, 2001; Brownhill et al. 2005; Weller et al. 2006; Martin et al. 2013; Rice et al. 2013). There is a growing suggestion that the current diagnostic criteria and instruments used to assess depression include symptoms that represent a feminine-gender pattern of the disorder and are therefore less sensitive to depression in men (Rochlen et al. 2010; Martin et al. 2013; Castro et al. 2015), and male-specific depression rating scales have also been proposed (Zierau et al. 2002; Martin et al. 2013; Rice et al. 2013).
According to the available studies, adolescent females may be more likely to disclose their feelings (Breslau et al. 2008). Data from adult populations show that depressed women often experience changes in appetite, weight gain, carbohydrate craving, sleep disturbances, increased sense of failure or guilt, somatisation, crying and increased anger (Bennett et al. 2005; Weller et al. 2006; Romans et al. 2007). Men may experience alternative depressive symptoms. Traditional depressive symptoms (e.g., sadness, crying) are at odds with societal ideals of masculinity and men may be reluctant to report experiencing those symptoms (Castro et al. 2015). Men's depression commonly manifests itself as anger, impulse control difficulties, anxiety and gender-role discord, irritability, aggression, substance abuse, risk-taking and escaping behaviours, emotional numbness, inability or unwillingness to express emotion, impoverished relationships and suicide (Rutz et al. 1997; Bennett et al. 2005; Brownhill et al. 2005; Weller et al. 2006; Martin et al. 2013; Rice et al. 2013). Although it is believed that depressed adolescents share gender similarities with depressed adults (Bennett et al. 2005; Weller et al. 2006), studies on this issue among adolescents are scarce, especially in Latin countries, which matters because gender patterns and masculinity are expected to be culturally defined. Furthermore, it is still unclear how early such gender-specific depressive symptom patterns emerge.
The Beck Depression Inventory (BDI) is widely used among adults and adolescents to measure depressive symptoms and has a strong track record in depression research (Roberts et al. 1991; Myers and Winters, 2002; Carnevale, 2011; Stockings et al. 2015). It assesses cognitive, behavioural, affective and somatic dimensions of depression (Beck et al. 1961; Ambrosini et al. 1991; Roberts et al. 1991; Bennett et al. 2005), and is described as a relevant instrument with robust psychometric properties (Ambrosini et al. 1991; Roberts et al. 1991; Osman et al. 2008; Wang and Gorenstein, 2013; Stockings et al. 2015).
Our aim was to assess sex differences in the intensity of depressive symptoms, measured using the BDI, among Portuguese adolescents, to understand whether the gender difference in the prevalence of depression could be partially explained by the characteristics of the measurement instrument.
Method
This study is part of the Epidemiological Health Investigation of Teenagers in Porto (EPITeen). This population-based cohort was assembled during 2003–2004, when adolescents born in 1990 and enrolled at public and private schools of Porto, Portugal, were recruited, as previously described (Ramos and Barros, 2007). A second evaluation took place when participants were on average 17 years of age (2007–2008). Participants were evaluated at schools, and information was collected through self-administered questionnaires and a physical examination. In both study waves the procedures were standardised and completed by a team of trained health professionals. Written informed consent was obtained from both the adolescents and their parents or legal guardians. The study was approved by the Ethics Committee of Hospital S. João.
At recruitment, 2786 eligible participants were identified and 2159 (77.5%) participated (77.9% in public and 77.0% in private schools; p = 0.71). We excluded 171 (7.9%) adolescents with missing information on the BDI second edition (BDI-II). In the second wave (at 17 years), 1716 participants (79.5%) were re-evaluated. A further 783 adolescents born in 1990 who had moved to schools in Porto during this time frame were evaluated for the first time at 17 years. We excluded 368 (14.7%) adolescents because of missing information on the BDI-II. The final sample comprised 1988 (52.2% girls) and 2131 (53.0% girls) adolescents at 13 and 17 years, respectively.
Depressive symptomatology measurement
Depressive symptoms at 13 and 17 years of age were assessed using the BDI-II (Beck et al. 1996). This scale consists of 21 items corresponding to 21 different symptoms, each with four statement responses (seven for two of the items) representing how the respondent has been feeling during the previous 2 weeks, including the current day (Beck et al. 1996). The four response statements are presented in order of increasing severity and are scored from 0 to 3 (two items are provided with alternative statements sharing the same score) (Beck et al. 1996). Responses are summed, yielding scores from 0 to 63, with higher scores indicating greater depressive symptoms (Beck et al. 1996).
The BDI-II was previously validated in Portuguese adolescents and scores over 13 are considered to represent significant depressive symptoms (Coelho et al. 2002).
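The scoring rule described above can be sketched in a few lines. This is only an illustration: the function names and the input format (a list of 21 integers, one per item) are assumptions made here, not part of the study's software.

```python
from typing import Sequence

BDI_II_ITEMS = 21  # number of items in the BDI-II
CUTOFF = 13        # totals over 13 are taken as significant depressive symptoms


def bdi_total(item_scores: Sequence[int]) -> int:
    """Sum the 21 item scores (each 0 to 3) into a total between 0 and 63."""
    if len(item_scores) != BDI_II_ITEMS:
        raise ValueError("expected 21 item scores")
    if any(score not in (0, 1, 2, 3) for score in item_scores):
        raise ValueError("each item score must be 0, 1, 2 or 3")
    return sum(item_scores)


def has_significant_symptoms(item_scores: Sequence[int]) -> bool:
    """Apply the cut-off used in this study: total score strictly over 13."""
    return bdi_total(item_scores) > CUTOFF
```

Note that the comparison is strict, so a total of exactly 13 falls below the threshold.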
Data analysis
Separate analyses were performed at 13 and 17 years of age. Sex differences in the frequency of endorsing the statements on the 21 items of the BDI-II were examined using the χ2 test. As higher scores were less frequent (the relative frequency of item scores of 3 fell below 5%), item scores 2 and 3 were combined and three levels of intensity were considered: score 0, score 1 and scores 2–3. We used Cohen's w effect sizes to indicate the magnitude of differences between females and males independently of sample size; values of 0.10, 0.30 and 0.50 or greater were considered small, moderate and large effect sizes, respectively (Cohen, 1992).
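As a rough sketch of the effect-size step, Cohen's w can be obtained from the χ2 statistic of a sex-by-score contingency table as w = sqrt(χ2 / N). The function names below are illustrative, the label for values under 0.10 is an assumption added here, and the code assumes every row and column of the table has a non-zero total.

```python
import math
from typing import Sequence


def collapse_score(score: int) -> int:
    """Map item scores 0, 1, 2, 3 to three levels: 0, 1 and 2 (scores 2-3 combined)."""
    return min(score, 2)


def cohens_w(table: Sequence[Sequence[float]]) -> float:
    """Cohen's w = sqrt(chi-square / N) for a contingency table of counts.

    Assumes no row or column total is zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / n)


def effect_size_label(w: float) -> str:
    """Thresholds from the text; 'below small' is a label assumed for w < 0.10."""
    if w >= 0.50:
        return "large"
    if w >= 0.30:
        return "moderate"
    if w >= 0.10:
        return "small"
    return "below small"
```

Each table would have two rows (females, males) and three columns (score 0, score 1, scores 2–3), one table per item and age.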
The BDI-II may contain items that function differently by gender, which would represent a serious threat to the validity of this measure of the intensity of depressive symptoms. To examine whether responses were systematically linked to sex, we used a differential item functioning (DIF) analysis based on the logistic regression approach (Swaminathan and Rogers, 1990). DIF is the unexpected difference in response to a test item between two populations once the attribute that the test is measuring is controlled for (Bares et al. 2012). Since the possible answers correspond to a Likert scale (score 0 to 3), we used proportional odds logistic regression to estimate proportional odds ratios (POR) with respective 95% confidence intervals (95% CI). For each of the 21 items we compared the probability of selecting a statement between females and males (females as reference group), adjusting for the total BDI-II score. An item was considered to have DIF when the regression showed a significant association between sex and the probability of selecting a statement (score) after adjusting for the BDI-II score. We considered POR >1.5 or POR ≤1/1.5 as clinically relevant.
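One way to write down the model behind this DIF test is sketched below. The notation is introduced here for illustration and is not taken from the original analysis: Y_j is the score on item j, T the total BDI-II score, S an indicator equal to 0 for females (reference) and 1 for males, and the cumulative "greater than or equal" parametrisation is assumed so that a POR below 1 means males are less likely to choose higher-scored statements.

```latex
% Proportional odds model for item j, adjusted for total score (uniform DIF)
\operatorname{logit} P(Y_j \ge k \mid T, S) = \alpha_{jk} + \beta_{jT}\, T + \beta_{jS}\, S,
\qquad k = 1, 2, 3

% Proportional odds ratio for sex (males versus females) on item j
\mathrm{POR}_j = \exp(\beta_{jS})

% Clinical relevance thresholds used in the text
\mathrm{POR}_j > 1.5 \quad \text{or} \quad \mathrm{POR}_j \le \tfrac{1}{1.5} \approx 0.67
```

Under this sketch, uniform DIF corresponds to a sex coefficient that differs significantly from zero. Non-uniform DIF would be examined by adding an interaction term between T and S and testing its coefficient.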
Option characteristic curves were computed for boys and girls separately for the items with DIF, and were plotted as a function of expected total score. This technique computes an individual's response pattern at each evaluation point with respect to the option characteristic curves estimated from the entire sample (Santor et al. 1994).
For each sex and age (13 and 17 years), a new BDI global score was computed excluding the items showing evidence of DIF. We used the mean value of the new score multiplied by the original number of items (21) to make both scores (complete and without items with DIF) equivalent and comparable. The prevalence of depressive symptoms was calculated with both BDI scores and compared using McNemar's χ2 test.
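The rescaling and the paired comparison can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, items are numbered 1 to 21, the same cut-off is assumed to apply to the rescaled score, and the McNemar statistic uses the standard continuity-corrected form based on the two discordant counts.

```python
from typing import Iterable, Sequence

N_ITEMS = 21  # items in the complete BDI-II


def rescaled_score(item_scores: Sequence[int], dif_items: Iterable[int]) -> float:
    """Mean of the items without DIF, multiplied by 21.

    dif_items holds the 1-based numbers of the items to exclude.
    """
    excluded = set(dif_items)
    kept = [score for number, score in enumerate(item_scores, start=1)
            if number not in excluded]
    return sum(kept) / len(kept) * N_ITEMS


def mcnemar_chi2(b: int, c: int) -> float:
    """McNemar's chi-square with continuity correction.

    b: participants over the cut-off on the complete score only.
    c: participants over the cut-off on the short-version score only.
    """
    if b + c == 0:
        raise ValueError("no discordant pairs")
    return (abs(b - c) - 1) ** 2 / (b + c)
```

For example, `rescaled_score(scores, [1, 6, 10, 21])` would give the 13-year short-version score on the 0 to 63 scale, and the resulting statistic from `mcnemar_chi2` is referred to a χ2 distribution with one degree of freedom.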
All analyses were first performed including only adolescents assessed at both 13 and 17 years. Secondly, to evaluate whether the similarities between 13 and 17 years could be a consequence of having the same sample evaluated at both moments, we analysed, at 17 years, only the adolescents who joined the study for the first time at this age (data not shown). As we obtained similar results, we used the larger sample.
Statistical analyses were performed using R 2012 (R Foundation for Statistical Computing, Vienna, Austria). A significance level of 0.05 was assumed.
Results
At 13 years of age, the overall prevalence of depressive symptoms (BDI-II score >13) was 13.4% (18.8% in girls and 7.6% in boys, p < 0.001). The median score (P25–P75) of the depressive symptoms was 6.01 (3.00–11.00) among females and 3.00 (1.01–6.99) among males (p < 0.001). At 17 years of age, the prevalence of depressive symptoms was similar, 12.7%, also with a higher prevalence among females (17.9% v. 6.8%, p < 0.001). The median score (P25–P75) of depressive symptoms was 6.00 (2.00–11.00) and 3.00 (1.00–6.00) among females and males, respectively (p < 0.001).
Differential item functioning
The BDI-II showed good reliability at both assessments (at 13 years the Cronbach alpha was 0.88 and 0.86 among females and males, respectively; at 17 years it was 0.87 and 0.84 among females and males, respectively). Table 1 shows the distribution of answers on each of the 21 items of the BDI-II, by sex. Globally, females chose statements with significantly higher scores at both 13 and 17 years, but effect sizes were considered small. Only items 1 and 10 presented moderate effect sizes, at both 13 and 17 years. Item 10 (Crying) revealed the highest differences between sexes at both ages, with boys selecting the highest category three times less frequently than girls. Concerning item 7 (Self-dislike), item 18 (Changes in appetite), item 19 (Concentration difficulty), item 20 (Tiredness or fatigue) and item 21 (Loss of interest in sex), the differences between sexes increased from 13 to 17 years of age, with boys choosing the highest category for the last four items approximately two times less frequently than girls at 17 years.
Table 1 footnote: a, χ2 test.
After adjustment for the global BDI-II score, the differences between sexes were attenuated (Table 2). Item 10 continued to have the highest sex differences at both ages, with boys presenting a lower probability of choosing statements with higher scores (POR = 0.21, 95% CI: 0.15–0.28 at 13 years; POR = 0.27, 95% CI: 0.20–0.35 at 17 years). Item 1 (Sadness) also presented similar results at both ages, with boys having a lower probability of choosing options with higher scores (POR = 0.50, 95% CI: 0.39–0.63 at 13 years; POR = 0.59, 95% CI: 0.46–0.76 at 17 years). At 13 years of age, we also found a significant difference for item 6 (Punishment feelings) and item 21 (Loss of interest in sex); for both items boys presented a higher probability of choosing a higher score. At 17 years of age, the differences were for item 7 (Self-dislike) and item 20 (Tiredness or fatigue); for these items, boys presented a lower probability of choosing a higher score. We tested for a significant interaction between sex and BDI-II score and found no items with non-uniform DIF (data not shown).
Option characteristic curves for items with DIF
The option characteristic curves for items with DIF are shown in Fig. 1. As few adolescents had higher depression scores, the analysis at higher ranges of depression must be interpreted with caution.
Concerning item 1, the statements with higher scores were rarely endorsed and the differences between sexes are related to the probability of choosing score 1 (I feel sad much of the time). This statement was less frequently endorsed by boys, at both 13 and 17 years, independently of the expected total score. Regarding item 10, at both ages, boys tended to endorse score 0 (I don't cry any more than I used to) more frequently, and girls tended to choose the statements with scores 2 and 3 more frequently than boys. For this item the difference between sexes was greater at higher levels of depression.
Items 6 and 21 performed differently by sex only at 13 years of age. For item 6, few adolescents chose statements with higher scores, so the sex differences are mostly due to the probability of endorsing score 0 (I don't feel I am being punished) and score 1 (I feel I may be punished). For this item the probability of choosing score 1 or higher was greater among boys. For item 21, girls were more likely to choose score 0 (I have not noticed any recent change in my interest in sex) and the sex difference was greater at higher levels of depression.
The performance of items 7 and 20 differed by sex only at 17 years of age. For item 7, boys were less likely to choose score 1 (I have lost confidence in myself) and score 2 (I am disappointed in myself). At higher levels of depression the probability of choosing score 1 was similar between boys and girls, whereas the probability of choosing score 2 increased among girls. For item 20 the effect was similar to that reported for item 7.
Prevalence of depressive symptoms
The prevalence of depression after excluding the items with DIF (items 1, 6, 10 and 21 at 13 years of age; items 1, 7, 10 and 20 at 17 years of age) was higher in all groups when using the short version (Table 3) but remained higher among girls. Nevertheless, at 17 years the short version attenuated the difference: using the usual 21-item scale the prevalence in girls was 2.6 times higher than in boys, but using the short version the ratio decreased to 2.2.
Table 3 footnote: a, McNemar's χ2 test with continuity correction.
Discussion
According to our results, females chose statements with higher scores at 13 and 17 years, with item 10 (Crying) revealing the highest differences between sexes at both ages. However, we found that girls and boys at the same level of depression expressed similar severity ratings for most of the depressive symptoms evaluated, at both 13 and 17 years. Previous studies of youth (Roberts et al. 1995; Kovacs, 2001; Masi et al. 2001; Bennett et al. 2005) and adult (Young et al. 1990) samples made similar observations. However, potential differences emerged and there was evidence for DIF in four of the 21 items of the BDI-II at 13 and 17 years of age. At 13 years of age, two items with DIF provided lower scores (sadness and crying items) and the other two higher scores (punishment feelings and loss of interest in sex items) among boys, compared with girls at similar overall levels of depressive symptoms. At 17 years, the four items with DIF provided lower scores among boys (sadness, crying, self-dislike and tiredness or fatigue items).
As at 13 years we have two items more related to the female sex and another two to the male sex, this DIF may not be relevant at this age. On the other hand, at 17 years we have four items particularly related to the female sex. It is possible that the prevalence of depression is being underestimated among boys and overestimated among girls, as described in previous research, as those items are more related to the female gender socially or biologically than to depression per se (Salokangas et al. 2002; van Beek et al. 2012). When sex differences were accounted for (excluding the items with DIF), the gap in the prevalence of depressive symptoms between girls and boys at 17 years was reduced. However, since depression was still more than twice as common in girls as in boys, at both ages, it is unlikely that this gap is entirely related to the screening instrument used at this assessment. Finding similar results in the group of adolescents who joined the study for the first time at 17 years, when compared with the adolescents at 17 years who were also evaluated at 13, minimises the limitation related to the dependence of samples between 13 and 17 years.
The lower scores presented by boys on the sadness and crying items, compared with equally depressed girls, are consistent with studies of depressed adolescents (Kovacs, 2001; Bares et al. 2012; van Beek et al. 2012) and adults (Young et al. 1990; Castro et al. 2015). Sadness seems to be a gender-bound emotional reaction not necessarily related to depression (Newmann, 1986; George et al. 1996), being less commonly identified and expressed by males due to different socialisation processes (Bennett et al. 2005). The same seems to be true for crying. Unlike males, females express their distress more often by crying and use this as a coping mechanism (Williams and Morris, 1996), even during adolescence (Bennett et al. 2005; Bares et al. 2012). Prior research also found that boys place less value on self-concept and give less importance to fatigue when compared with girls (Stehouwer et al. 1985; Siegel et al. 1999; Khan et al. 2002; Bennett et al. 2005; van Beek et al. 2012). Finding lower scores among boys on the self-dislike and tiredness or fatigue items reinforces those results. A possible explanation for having significant results only at 17 years of age is that more girls chose the statements with the highest scores at this age.
We found no reports in the literature of higher scores among boys on punishment feelings and interest in sex at 13 years of age. According to adult data, it is possible that, for cultural or biological reasons, both depressed and non-depressed girls are less interested in sex than boys (Salokangas et al. 2002). Thus, including questions concerning interest in sex can also give gender-bound biased results concerning depressive symptoms, as depressed boys, but not depressed girls, may more commonly report changes in this area. The small number of boys at 17 years who chose the statement with the highest score might explain the lack of significant results at this age.
The strengths of this study lie in its large population-based sample at both ages of evaluation, early and late adolescence, comprising adolescents of a major Portuguese city. Our findings are specific to the BDI-II. Although the BDI is one of the most widely used self-reported measures of depression in research and clinical practice (Wang and Gorenstein, 2013; Stockings et al. 2015), other depression questionnaires should be examined for similar sex differences. Given the overlap of the construct measured by the BDI-II with that of other widely used scales to assess depression (Wang and Gorenstein, 2013), it is expected that instruments including questions related to these items with DIF can also give sex-bound biased results concerning depressive symptoms. However, the results should be viewed in the context of the following limitations. In this method of examining response characteristic curves as a function of depression, the expected total scores are themselves derived from the items we wish to evaluate. In addition, we had no formal diagnosis of depression, and this self-report measure for screening depression does not necessarily refer to depression of clinical relevance.
In conclusion, sex differences were found in the functioning of the inventory, and were more relevant at 17 years of age. The use of the BDI-II as an instrument for evaluating depressive symptoms may lead to an overestimation of symptoms among girls as well as to lower reported rates of depression among boys. Moreover, the BDI-II closely parallels DSM-IV, both including symptoms that represent a feminine-gender pattern of depression. According to DSM-IV, depressed mood is one of two essential symptoms required to make a diagnosis of major depression, indicated by either subjective report (e.g., feels sad or empty) or observation made by others (e.g., appears tearful). As sadness and crying seem to be gender-bound emotional reactions, males with depression may not be identified in clinical practice and may consequently go untreated. For higher diagnostic accuracy, it is important that the criteria and instruments used to assess depression adequately reflect the common symptoms and experiences of depression in both females and males.
Acknowledgements
The authors would like to thank the families enrolled in EPITeen for their kindness and all the staff involved in the evaluations for their help and support.
Financial Support
This study was supported through FEDER from the Operational Programme Factors of Competitiveness – COMPETE and through national funding from the Portuguese Foundation for Science and Technology – FCT (Portuguese Ministry of Education and Science) within the project PTDC/DTP-EPI/6506/2014, and by the Epidemiology Research Unit – Institute of Public Health, University of Porto (UID/DTP/047507/2013). Individual grant attributed to CB (SFRH/SINTD/60138/2009) was supported by the Portuguese Foundation for Science and Technology – FCT.
Conflict of Interest
None.
Availability of Data and Materials
Data supporting the findings of this study are available to other investigators by contacting the authors.