Risk Screening Tool for Older Adult Mistreatment in the Domestic Setting
The mistreatment of older adults, which includes abuse and neglect, is a deeply troubling phenomenon that is predicted to escalate given the current health care and social climates (Baker & Heitkemper, 2005; Lachs & Pillemer, 2004). When those most vulnerable in society are mistreated or at risk of mistreatment, early detection is crucial. Unfortunately, many obstacles may be present when the victim is an older adult living at home and being mistreated by a caregiver. Such complexity, especially within the sacred context of the family, renders identification inherently challenging for health care professionals (Lachs & Pillemer, 2004). Screening tools used in practice must be valid and reliable, thereby permitting early identification of mistreatment risk and possibly preventing further escalation (Shugarman, Fries, Wolf, & Morris, 2003). This study contributes to the validation of a screening tool to detect risk of mistreatment of older adults; the tool is intended for use by English-speaking professionals visiting older adults in their homes.
Background
In Canada, although 78 per cent of older adults who require assistance live at home (Cranswick & Dosman, 2008), formal home care services have continued to suffer significant reductions, leaving family members with increased – sometimes unrealistic – responsibilities (National Advisory Council on Aging [NACA], 2006). Many studies have demonstrated that it is indeed family members, including adult children or spouses, who are most frequently the perpetrators of mistreatment of the older adult (Choi & Mayer, 2000; National Research Council [NRC], 2003). For example, an American incidence study found that in almost 90 per cent of cases, the perpetrator was a family member (National Center on Elder Abuse [NCEA], 1998). More recently, a study in Turkey confirmed that the prevalence of mistreatment increased significantly when older adults lived with their spouse or child, who are the most frequent perpetrators (Kissal & Beser, 2011).
As isolated older adults may eventually come into contact with the health care system, health professionals who visit the home are in an ideal position to detect mistreatment risk (Allan, 2002; NCEA, 1998). Although experts believe that mistreatment of older adults is frequent enough to be faced routinely in health professionals’ daily practice (Lachs & Pillemer, 2004), studies demonstrate that many fail to detect it (Fulmer, Guadagno, Dyer, & Connolly, 2004; Ortmann, Fechner, Bajanowski, & Brinkmann, 2001; World Health Organization/International Network for the Prevention of Elder Abuse [WHO/INPEA], 2002). To aid in detection, it is imperative to increase knowledge of the risk factors that predispose to mistreatment (Anetzberger, 2001; Baker, 2007). Although risk factors are not causal factors, they are associated with an increased probability of victimization, and the greater their presence in a family milieu, the higher the likelihood that mistreatment will occur (Wolf, 1997). The screening instrument therefore becomes a very important tool for the health professional as it provides for systematic and objective documentation of the phenomenon (Anetzberger, 2001).
Although many screening instruments have been developed, most have seen inadequate assessment of their psychometric properties and have significant limitations (Fulmer et al., 2004; Wolf, 2000). Important policy documents have reiterated the need for valid and reliable screening tools for mistreatment of older adults (NCEA, 1998; NRC, 2003; WHO/INPEA, 2002).
Several challenges have impeded progress in the development of valid and reliable instruments for screening of mistreatment. First, the divergence of definitions is believed to have rendered the development of psychometrically valid, reliable measures of mistreatment virtually impossible: the construct of older adult mistreatment must be very clear if it is to be measured (Kozma & Stones, 1995). A related problem has been the creation of instruments based on flawed or incomplete theories (Fulmer et al., 2004; Reis & Nahmiash, 1998; Shugarman et al., 2003). For example, if based on the theory of vulnerability, the instrument would only assess for risk factors of vulnerability in the older adult and therefore ignore the characteristics of both the caregiver and the sociocultural environment. This lack of comprehensiveness is problematic (NRC, 2003) insofar as theorists have now concluded that no one theory can explain mistreatment of older adults; consequently, assessment must include all risk factors of mistreatment (Ansello, 1996).
Some tools are ineffective because of their format. For example, self-report questionnaires that require accurate responses from older adults who suffer from cognitive or emotional difficulties are ineffective (Fulmer et al., 2004). This also applies to instruments that seek to obtain responses directly from a caregiver who may be unable or unwilling to provide accurate responses (NRC, 2003). In addition, some measures have been adapted from other fields and, therefore, could neglect important factors contributing to the mistreatment of older adults (Fulmer et al., 2004; NRC, 2003). The complexity of mistreatment of older adults demands the use of a valid and reliable tool developed specifically to assess this concept.
Other, more comprehensive tools have been developed but suffer from vaguely operationalized indicators (Nagpaul, 2001; NRC, 2003). Such ambiguity has led to poor specificity and sensitivity, which are unacceptable when dealing with such a serious phenomenon. Finally, some screening instruments have combined two constructs: risk of mistreatment and actual mistreatment of older adults. The identification of those who are mistreated versus those who are at risk of mistreatment requires different methods of assessment, either case identification or screening (Kozma & Stones, 1995). Because instruments that combine the two constructs lack conceptual clarity and may lead to confusion, a clear distinction between them is essential for effective screening (Anetzberger, 2001).
The Indicators of Abuse Screen [IOA], designed by Reis and Nahmiash (1998), is considered an important milestone in the assessment of older adult mistreatment (Wolf, 2000). Developed specifically for use by home visiting social service practitioners, the IOA includes caregiver and environment assessment and thereby reflects the complexity of older adult mistreatment (NRC, 2003; Wolf, 2000). A study of 341 cases supported the validity of the set of IOA indicators, which discriminated 84.4 per cent of “likely abuse” cases and 99.2 per cent of “likely nonabuse” cases (Reis & Nahmiash, 1998). Also, Cronbach’s alpha testing revealed an excellent internal consistency of 0.92 and 0.91 on two separate samples (Fulmer et al., 2004). However, the 27 items to be assessed require interpretation by the practitioner, and therefore the instrument’s ability to objectively measure mistreatment has been questioned (Fulmer et al., 2004). Although the specificity of the instrument was 100 per cent, it achieved a sensitivity of 78.4 per cent (Reis & Nahmiash, 1998).
Some experts, commenting on the status of risk assessment of older adult mistreatment, have stated that little progress has been made in this respect because tools remain qualitative assessments based on clinical judgment (Wolf, 2000). As quantitative measurement is preferred for consistency when screening for mistreatment risk (Kozma & Stones, 1995), researchers have declared that improvements in measurement methods are urgently required for progress to occur in the study of older adult mistreatment (Fulmer et al., 2004). These instruments must have acceptable reliability and validity, be appropriate for the varying clinical contexts where mistreatment occurs, and address either screening or case identification (NRC, 2003). Furthermore, accurate and efficient measurement methods are essential given the important consequences of screening for older adults and their caregivers, the potential devastating effects on older adults and their families in the event of false positive or negative findings, and limited resources within the health and social service sectors (Lachs & Pillemer, 2004; NRC, 2003).
The ‘Expanded Indicators of Abuse’ Tool
In 2006, Cohen, Halevi-Levin, Gagin, and Friedman developed the Expanded Indicators of Abuse [e-IOA] tool. The comprehensive e-IOA includes all dimensions of the concept of mistreatment of the older adult. The e-IOA is based on Kosberg and Nahmiash’s (1996) conceptual framework. Developed to address the complexity of older adult mistreatment, the framework includes four overlapping areas of risk identified in research on mistreatment of older adults. Like its predecessor, the Indicators of Abuse [IOA] (Reis & Nahmiash, 1998), the e-IOA is a semi-structured instrument that comprises 21 risk indicators. To rectify the subjectivity associated with the risk indicators in their original format in the IOA, these were operationalized in the e-IOA with sub-indicators based on the content of relevant academic literature in psychiatry and geriatric social work (Cohen, Halevi-Levin, Gagin, & Friedman, 2006). The operationalization standardizes the tool and reduces its subjectivity as differences in professional interpretation are avoided (Nagpaul, 2001; NRC, 2003). Furthermore, the developers of the e-IOA proposed an instrument permitting early identification of high-risk older adults prior to the appearance of actual indicators, thereby addressing the problem of under-identification (Cohen et al., 2006). In doing so, prevention, identification of risk, and early intervention by health professionals were made possible (Cohen et al., 2006). Note that despite reference to indicators of abuse in the title of the tool, the authors considered abuse to also encompass neglect and measured risk factors, not indicators (Cohen et al., 2006).
Cohen et al. (2006) evaluated and confirmed content, criterion, and construct validity as well as inter-rater reliability. Content validity of the e-IOA was first assessed by an interdisciplinary team of health professionals in the fields of psychiatry, social work, and geriatrics. After a pilot test, sub-indicators found to be problematic by the interviewers were restructured. Criterion validity was assessed with a small group of older adults known to social workers as having been mistreated or not. The e-IOA correctly identified 92.7 per cent of those at high risk for mistreatment and 97.9 per cent of the non-mistreated cases. Discriminant validity was assessed by verifying if the e-IOA could differentiate older adults probably mistreated from those probably not mistreated on the basis of a list of evident signs of abuse. High agreement of 93 per cent was obtained between interviewers (Cohen et al., 2006).
The e-IOA was again evaluated in 2007 along with two other measures: direct questioning of older adults and an assessment conducted using an instrument of evident signs of mistreatment. These three measures were then compared. Because the e-IOA did capture a wider circle of older adults at risk (32.6%) than those who were actually mistreated (21.4%), the authors concluded that it could safely be used in other settings (Cohen, Halevi-Levin, Gagin, & Friedman, 2007). Caution was expressed, however, about relying on a cutoff score since, of those identified for evident mistreatment, only 74.4 per cent were also at high risk, on the basis of a cutoff of 1.7.
Research Purpose
The aforementioned studies were conducted in Hebrew and in a hospital context. Consequently, a new study was considered necessary to evaluate the validity and reliability of this instrument when used in an English-speaking community context (Fortin, 2006). The authors confirmed that the English instrument had been translated by a process of back translation as recommended in the literature (Behling & Law, 2000). However, the English version had not been validated (M. Cohen, personal communication, August 1, 2008). Therefore, the purpose of this study was to adapt the expanded Indicators of Abuse screening tool and to contribute to the validation of this English adaptation in an Ontario community context. The specific research questions were: (1) What is the content validity of the e-IOA according to experts in mistreatment of the older adult? and (2) What is the inter-rater reliability of the adaptation of the e-IOA?
Research Method
This study assessed two critical psychometric properties: (a) content validity, which is the ability of the items of an instrument to adequately measure the concept chosen for study (Grant & Davis, 1997), and (b) inter-rater reliability, which is the degree of agreement or concordance between raters (Myers & Winters, 2002). As recommended by Waltz, Strickland, and Lenz (1991), we requested permission from Dr. Cohen, who developed the e-IOA, to validate the instrument. Also, permission was granted to revise the instrument as per experts’ recommendations in order to reflect the specific community context – that is, Ontario communities (Burns & Grove, 2005). Furthermore, prior to commencing the study, we obtained ethical approval from the Laurentian University REB and that of the participating agencies.
Phase 1 – Determination of Content Validity
A preferred means of evaluating content validity is a review by a panel of experts (Waltz et al., 1991). In this study, a modified Delphi method defined as a “reactive Delphi” was used as it guided participants to “react” to the e-IOA in its present format and make recommendations to adapt the tool for use within the proposed context of this study (McKenna, 1994). This method consists of a series of sequential rounds of seeking knowledge and expertise from a panel of experts with the aim of sharing and building consensus (Mead & Moseley, 2001). Three rounds of Delphi were conducted over a seven-month period.
Purposive sampling guided the selection of the content experts. Expertise could stem from empirical knowledge originating from scientific study or experiential knowledge resulting from the work of clinicians in the field (Anetzberger, 2005). Inclusion criteria for this study therefore consisted of knowledge regarding mistreatment of the older adult and, more specifically, the clinical experience of applying this knowledge on a practical level within the community setting (Baker, Lovell, & Harris, 2006; Petry, Maes, & Vlaskamp, 2007). Experts throughout Ontario in both rural and urban community settings were invited to serve as panellists, providing a heterogeneous sample for all geographical areas of the study (Hardy et al., 2004; Mead & Moseley, 2001). When utilizing the Delphi process, the quality of the judges is of greater importance than their quantity (Grant & Davis, 1997; Zink & Fisher, 2007). A total of seven experts who met the inclusion criteria agreed to participate.
Experts received a standardized detailed information package that included the conceptual framework to ensure that the instrument continued to rest upon its theoretical framework (Davis & Grant, 1993). As well, a new Content Review Questionnaire was created for each new round based on the data previously gathered and guided a structured assessment of each sub-indicator (Springer, Abell, & Hudson, 2002).
In the first round, panellists judged the elements of a content review: representativeness, clarity, and comprehensiveness (Grant & Davis, 1997). Representativeness of the content coverage was assessed to determine if items reflected, sampled, and measured the construct of risk of older adult mistreatment in the home setting (Berk, 1990). Responses ranged from (1) “item is not representative of risk of older adult mistreatment in the home setting and should be removed” to (4) “item is very representative and should be kept”. Item clarity indicated if items were clear, well-written, and distinct (DeVellis, 1991). Responses were coded as (0) “item is not clear and/or contains inappropriate wording and/or is not easily discriminated from other items and requires revisions to be clear” or (1) “item is clear, without bias and easily discriminated from other items”.
Panellists’ recommendations and descriptive feedback were summarized and used to improve either clarity or representativeness. Alternatively worded sub-indicators were then constructed for ranking in Round 2 (Grant, Kinney, & Guzzetta, 1990; Hasson, Keeney, & McKenna, 2000). The comprehensiveness of the instrument was assessed, and additional sub-indicators were recommended by the judges to ensure that the included items were sufficient to represent the total content domain (Lynn, 1986). These were returned to subsequent rounds for consideration by all (Parks, Cintas, Chaffin, & Gerber, 2007). Finally, experts were asked to rate the appropriateness of the response scales as the original e-IOA contained two different rating scales.
As transparency between rounds of a Delphi review is recommended (Bowles, 1999), feedback consisted of simple statistical summaries (Keeney, Hasson, & McKenna, 2006). Using SPSS, we calculated the median and semi-quartile range for representativeness (Bello & Singh, 2004; Statistics Canada, 2009). As clarity was evaluated with a dichotomized 0 or 1 scale, the mode was the most appropriate measure of central tendency.
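The summaries described above can be sketched in a few lines. The following is a minimal illustration in Python rather than the SPSS procedure used in the study; the ratings are invented for demonstration, and the quartile convention (the "inclusive" method) is an assumption, since statistical packages compute quartiles in slightly different ways.

```python
from statistics import median, mode, quantiles


def semi_quartile_range(ratings):
    """Half the distance between the first and third quartiles."""
    q1, _, q3 = quantiles(ratings, n=4, method="inclusive")
    return (q3 - q1) / 2


# Hypothetical Round 1 ratings from seven panellists for one sub-indicator.
representativeness = [4, 4, 3, 4, 2, 4, 3]  # scale from 1 to 4
clarity = [1, 1, 0, 1, 1, 1, 0]             # 0 = not clear, 1 = clear

summary = {
    "median": median(representativeness),
    "sqr": semi_quartile_range(representativeness),
    "clarity_mode": mode(clarity),
}
print(summary)
```

A small semi-quartile range indicates that the middle half of the ratings cluster tightly around the median, which is one way to read the degree of consensus on an item.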
Round 2 began with a ranking process (Evans, 2005). Judges were provided with the original sub-indicator from the e-IOA along with the alternatively worded sub-indicators constructed from Round 1 feedback and were asked to rank items from 1 to 3 with 1 being “the most comprehensive, representative and clear choice”. The average rank was calculated, indicating the preferred wording for that sub-indicator (Siegel & Castellan, 1988). To measure the degree of association among these rankings, we applied Kendall’s coefficient of concordance W, a measure particularly useful when calculating agreement among rankings (Siegel & Castellan, 1988). This value ranges from 0 (no agreement) to 1 (complete agreement). Second, for the recommended additions from Round 1, we also calculated the percentage of agreement. We retained only those items with acceptable agreement (60%) for reassessment.
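To make the ranking statistics concrete, the sketch below computes the average rank and Kendall's W for a single sub-indicator. It is an illustration only: the rankings are hypothetical, the function names are ours, and the formula used is the standard one for complete rankings without tied ranks, W = 12S / (m²(n³ − n)), where m is the number of judges, n the number of ranked alternatives, and S the sum of squared deviations of the rank sums from their mean.

```python
def kendalls_w(rankings):
    """Kendall's W for m judges who each rank the same n items (no tied ranks).

    rankings[j][i] is the rank that judge j assigned to item i.
    """
    m = len(rankings)
    n = len(rankings[0])
    rank_sums = [sum(judge[i] for judge in rankings) for i in range(n)]
    mean_rank_sum = m * (n + 1) / 2
    s = sum((r - mean_rank_sum) ** 2 for r in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


def average_ranks(rankings):
    """Mean rank per item; the lowest mean marks the preferred wording."""
    m = len(rankings)
    n = len(rankings[0])
    return [sum(judge[i] for judge in rankings) / m for i in range(n)]


# Hypothetical data: five judges rank three alternative wordings (1 = best).
rankings = [
    [1, 2, 3],
    [1, 2, 3],
    [1, 3, 2],
    [2, 1, 3],
    [1, 2, 3],
]

w = kendalls_w(rankings)
means = average_ranks(rankings)
preferred = means.index(min(means))  # index of the preferred wording
print(round(w, 3), means, preferred)
```

With these invented rankings the first wording has the lowest average rank, and W falls between the two extremes of 0 and 1; identical rankings from every judge would give exactly 1.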
In the third and final round, panellists were asked to judge if the preferred sub-indicators from Round 2 as well as the recommended deletions and additions from the previous rounds should be accepted. Percentage agreement was calculated as a measure of consensus reached after this final round. If the judges were not in agreement to modify, add, or delete the sub-indicator, we instructed them to reject the recommendations with supporting comments. There is a lack of guidance in the literature with regard to the determination of importance in Delphi studies (Hardy et al., 2004). Rounds should continue as required until consensus is reached or until returns begin to diminish (Hasson et al., 2000). For this study, three rounds were conducted with the occurrence of both events: acceptable consensus and diminished returns.
Phase 2 – Assessment of Inter-Rater Reliability
The assessment of inter-rater reliability was achieved through the application of the adapted tool to various paper case studies. We selected this method versus testing in a natural setting because of the sensitive nature of older adult mistreatment research (i.e., deeply personal, may involve deviance or social control, and may be threatening for older adults and their caregivers; Dresser, 2003). As well, to have proceeded with actual vulnerable older adults in the community would have carried numerous ethical challenges at this early stage since the findings could have had potentially devastating legal, financial, and social consequences for the older adult and caregiver (Lachs & Pillemer, 2004; Waltz et al., 1991). Others have successfully used a similar paper design when dealing with sensitive research with vulnerable participants (Endacott, Clifford, & Tripp, 1999; Selwood, Cooper, & Livingston, 2007).
For this study, we selected case studies on the basis of their realistic illustration of risk characteristics of the victim, the perpetrator, and the family context. These studies were obtained from a resource guide prepared by the Ontario Network for the Prevention of Elder Abuse (ONPEA), the agency mandated to provide expert consultation on older adult mistreatment at the community level for professionals, volunteers, and seniors (ONPEA, 2006). Therefore, we felt that these case studies reflected accurate and current knowledge of mistreatment of older adults in the community. The case studies were meant to be explicit and clear but not detailed enough to render rating totally predictable or to lead the panel (Endacott et al., 1999). Various forms of mistreatment of older adults and risk factors were described, and we obtained permission for the use of each study from the ONPEA.
Phase 2 drew on a convenience sample of home visiting nurses employed by a nursing agency in a northeastern Ontario city. The participation of nurses visiting older adults in their homes (versus hospital nurses) was essential as this adaptation and validation was geared to the domestic setting (Springer et al., 2002). Selection criteria consisted of registration with the College of Nurses of Ontario as a registered nurse (RN) or a registered practical nurse (RPN) as evidenced by active employment within the nursing agency. Although these two categories have different levels of foundational knowledge, we included both because they may function autonomously in the home setting (College of Nurses of Ontario, 2009) and therefore could encounter instances of mistreatment of older adults. The ability to understand English was also required.
Initial recruitment was carried out using a formal mailed invitation to each nurse followed by a general voice mail message. As the first message was only successful in attracting two nurses, we sent reminders again informing nurses of this opportunity. Washington and Moss (1988) have recommended a minimum of 10 subjects to adequately assess inter-rater reliability. Despite the numerous strategies we attempted, only six registered nurses attended and completed the packages. Although a larger sample of nurses would have been preferable, the size of the sample is considered to be of less importance than the number of items being measured (Lefrançois, 1992). Because the instrument consisted of 95 items to be assessed with 10 scenarios by six nurses, we tested each sub-indicator 60 times, resulting in 5,700 observations. This large number of sub-indicators, as well as the nonparametric statistical technique chosen and the ability to test significance of the latter with small samples, satisfied the consulted statistician as to the small sample size.
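The observation count quoted above follows directly from the design figures given in the text, as this short worked calculation shows:

```latex
\[
\underbrace{10 \ \text{scenarios} \times 6 \ \text{nurses}}_{60 \ \text{ratings per sub-indicator}}
\times 95 \ \text{sub-indicators} = 5{,}700 \ \text{observations}
\]
```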
Due to the limitations imposed by traveling, the training of home visiting nurses was offered in various geographical locations and at different times. Consistency was maintained for the duration of the two-hour sessions with the individual nurses as indicated in the literature (Springer et al., 2002; Washington & Moss, 1988). The session began with a Microsoft PowerPoint presentation which covered the dynamics of older adult mistreatment, the benefits of screening tools, the adapted instrument, and its conceptual underpinnings (Washington & Moss, 1988). Packages were then distributed containing instructions: 10 case studies with 10 copies of the adapted screening tool. A case demonstration was completed by the researcher, and nurses were then directed to complete their package independently without discussion with other participants, which minimized threats to internal validity (Burns & Grove, 2005). We modified the order of the paper scenarios for each participating nurse in order to prevent maturation with the last scenarios as nurses became more experienced but more fatigued in completing the instrument (Burns & Grove, 2005).
Descriptive statistics were used to describe the sample of participating nurses.
We measured agreement among these six nurses using Kendall’s coefficient of concordance W, a nonparametric measure strongly recommended in studies of inter-judge reliability and appropriate for ordinal measures (Siegel & Castellan, 1988). For each case, the nurses rated the 95 sub-indicators according to a five-point Likert-scale rating. This permitted the calculation of Kendall’s W for each of these sub-indicators. Afterwards, the values for that specific sub-indicator for all 10 cases were added and their mean was calculated, representing the performance of that specific sub-indicator in the 10 cases by the six judges. In testing the significance of W, critical values have been tabled for an N of 5 (rating options) and a k of 6 (number of nursing raters). For this concordance to be significant at the alpha = .01 level, the observed mean Kendall’s W was required to be .489 or larger (Siegel & Castellan, 1988, p. 365).
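The final averaging and comparison step described above can be sketched as follows. This is an illustration under stated assumptions: the per-case W values are invented, the function name is ours, and the sketch covers only the averaging across the 10 cases and the comparison with the reported critical value of .489, not the computation of each per-case W.

```python
from statistics import mean

CRITICAL_W = 0.489  # critical value reported in the text for alpha = .01


def mean_w_meets_threshold(case_ws, critical=CRITICAL_W):
    """Average a sub-indicator's per-case W values and compare with the threshold."""
    avg = mean(case_ws)
    return avg, avg >= critical


# Hypothetical per-case W values for one sub-indicator across the 10 case studies.
case_ws = [0.62, 0.55, 0.48, 0.71, 0.66, 0.59, 0.43, 0.68, 0.52, 0.60]

avg_w, satisfactory = mean_w_meets_threshold(case_ws)
print(round(avg_w, 3), satisfactory)
```

Note that a sub-indicator can fall below the threshold in individual cases and still reach a satisfactory mean, because the decision rests on the average across all 10 cases.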
Results
Phase 1 – Content Validity Assessment and Adaptation of the E-IOA
The panellists had from 2 to 30 years of professional experience. From the perspective of clinical knowledge and expertise, 43 years was the cumulative number of years of experience with older adult mistreatment. The minimum level of educational achievement was a university baccalaureate degree, with three experts being master’s-prepared. Both nursing and gerontology backgrounds were reported. Examples of activities demonstrating clinical expertise included (a) directly responding to older adult mistreatment calls in the community; (b) case management of mistreatment of older adults within families; (c) mistreatment education; (d) local, provincial, and national collaboration to develop strategies and response programs for mistreatment; (e) service on various advocacy groups for the older adult; and (f) expert consultation. Moreover, research, authorship, and policy work were listed among this expert panel’s many attributes.
Data collection consisted of three subsequent rounds of Delphi review. All seven experts completed the first round. However, only five experts completed the study. Reasons offered for withdrawal consisted of (a) workload issues not related to the study and (b) personal matters.
Overall, 20 per cent of the items (18/90) were accepted in Round 1 as they met the following three criteria: (a) a median of 4.0 for representativeness, (b) a mode of 1.0 for clarity, and (c) no need for improvement as evidenced by a lack of recommendations for modifications. These are indicated by an asterisk in Table 1, Round 1. Because these were accepted in this round, they did not reappear in Round 2 in order to prevent the fatigue which may result from a prolonged study (Hasson et al., 2000). With regard to comprehensiveness, experts recommended 13 additional sub-indicators which were forwarded to the next round for further consideration.
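The three Round 1 acceptance criteria can be expressed as a simple decision rule. The sketch below is illustrative only: the function name and the ratings are hypothetical, and the rule simply encodes criteria (a) to (c) as stated above.

```python
from statistics import median, mode


def accepted_in_round_1(representativeness, clarity, recommendations):
    """Apply the three stated criteria to one sub-indicator.

    representativeness: ratings on the 1-4 scale, one per panellist
    clarity: ratings coded 0 (not clear) or 1 (clear), one per panellist
    recommendations: list of suggested modifications (empty if none)
    """
    return (
        median(representativeness) == 4   # (a) median of 4.0
        and mode(clarity) == 1            # (b) mode of 1.0
        and len(recommendations) == 0     # (c) no modifications suggested
    )


# Hypothetical sub-indicator accepted outright in Round 1.
item_a = accepted_in_round_1([4, 4, 4, 3, 4, 4, 4], [1, 1, 1, 1, 1, 0, 1], [])

# Hypothetical sub-indicator carried forward because a rewording was suggested.
item_b = accepted_in_round_1(
    [4, 4, 4, 4, 4, 3, 4], [1, 1, 1, 1, 1, 1, 1], ["simplify the wording"]
)
print(item_a, item_b)
```

Items failing any one of the three checks are the ones that would reappear in the next round, which matches the way the remaining sub-indicators were carried forward for ranking.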
Table 1: Results: Adaptation of “expanded Indicators of Abuse” (e-IOA) Tool throughout Phase 1 – Modified Delphi and Phase 2 – Interrater Reliability of “Mistreatment of Older Adult Risk Factors” (MOARF) Tool
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160626064535-03293-mediumThumb-S0714980812000153_tab1.jpg?pub-status=live)
* = represents three criteria met for acceptance in round 1: (a) a median of 4.0 for representativeness, (b) a mode of 1.0 for clarity, and (c) without need for improvement as evidenced by a lack of recommendations for modifications
** = items that achieved elevated agreement (.760 to 1) and accepted in Round 2
*** = items which did not achieve sufficient interrater agreement in Phase 2
Mdn = Median
MOARF = Mistreatment of Older Adult Risk Factors
SQR = semi-quartile range
In Round 2, the experts ranked the 72 remaining items along with the recommended modifications and the average rank was determined. To measure agreement among the raters, we calculated a Kendall’s coefficient of concordance W. Only 10 items achieved elevated agreement (.760 to 1) and were therefore accepted in this round (see items marked by a double asterisk in Table 1, Round 2). The remaining 62 items did not achieve high consensus among the experts as evidenced by Kendall’s W being under .760, and so we retained them for reassessment. Of the recommended additions, we retained only those with acceptable percentage of agreement (60%) for reassessment (see Table 1).
In the third and final round, the percentages of agreement were as follows: 49 sub-indicators were agreed upon by 100 per cent of the judges, 20 sub-indicators were agreed upon by 80 per cent of the judges, and three sub-indicators received agreement by 60 per cent of the judges. Although these last three sub-indicators did not achieve high agreement, two factors motivated our retaining them in the adapted instrument. First, in all three instances, the parent indicator had few sub-indicators, so removing one would leave very little with which to measure that indicator. Second, if an item was problematic, it was felt that this would be flagged by the home visiting nurses trialing the instrument (see Table 1, Round 3).
Over the course of the three rounds of Delphi, 60 per cent of the original e-IOA items were revised. Five of the original items were deleted and 10 new items were added, resulting in a 95-item instrument. The experts agreed to keep one of the rating scales from the original tool which measured frequency (very often, often, not so often, very seldom, not at all, not possible to receive information). As well, a new title was deemed essential in order to correctly reflect the measurement of risk versus actual mistreatment and to be comprehensive in measuring both abuse and neglect as represented in the term mistreatment. The adaptation was entitled the “Risk Factors of Elder Mistreatment” screening tool. To be respectful of culture, we replaced “elder” by “older adult”. Therefore, the tool is now entitled “Mistreatment of Older Adult Risk Factors”.
Phase 2 – Inter-Rater Reliability Assessment
The adapted instrument was then tested for inter-rater reliability by home visiting nurses. The years of general professional experience of the participating nurses ranged from 1 month to 22 years in the home visiting community setting. Half were college graduates while the other half were university prepared. Four of the nurses had never received any training on the topic of mistreatment of the older adult while two had attended one in-service session.
Ninety of the 95 sub-indicators achieved satisfactory agreement among the nurses (see Table 1, Phase 2). The five sub-indicators which did not achieve the critical value of W = .489 at the alpha = .01 significance level are marked by a triple asterisk.
Discussion
This study has contributed to the validation of a screening instrument to detect the risk of mistreatment of older adults in the domestic setting. The Delphi method was successful in harnessing the experiential knowledge of experts, representing various geographical regions of Ontario, in the field of older adult mistreatment. Some changes to the instrument, including modifications, deletions, and additions, will be discussed based on the three structural elements of clarity, representativeness, and comprehensiveness.
The assessment of clarity of the original e-IOA sub-indicators revealed some important flaws such as inappropriate wording. For example, 71.4 per cent of the judges rated the sub-indicator “Is mentally retarded (mark degree of retardation)” as being unclear and considered it to be offensive. The literature indicates that scale items should never be offensive to the patient or professional (Berk, 1990; Springer et al., 2002). The content of the item was changed to “low cognitive functioning”. Improvements to clarity were also required for those sub-indicators that applied to both the caregiver and the older adult. For example, the sub-indicators “Marital/Family Conflicts – Family copes with special problems or trauma (disabled child, recent loss)” were identical for both the caregiver and the older adult. As this instrument is used within a familial context, the clarification of these sub-indicators better reflects current societal stressors facing the modern family (Levine, 2003). Such stressors are illustrated in the new wording chosen by the judges: “Caregiver faced with special problems or trauma (disabled child, divorce, recent loss)” and for the older adult, “Older adult faced with special problems or trauma (recent loss, death of spouse, sale of home)”. Such specificity contributes to accurate measurement of the indicator. Clarity of the sub-indicator also meant that it was easily discriminated from other sub-indicators (Grant et al., 1990). Lack of clarity resulted in the deletion of four sub-indicators.
The assessment of representativeness resulted in the deletion of the sub-indicator “Older Adult Behavior Problem – Committed to his/her obligations”. The following judges’ comments illustrate the lack of representativeness of this sub-indicator: “Does not fit. What obligations? Excessively loyal to family? Is this related to finances?” It is evident that such an item could create confusion among evaluators because it was not representative of “Older adult behavior problems” contributing to mistreatment risk. Another example is that of “Unspoken family secrets exist” which was changed to “Past history of family violence, conflict, or trauma”. Such wording correctly represents the risk of older adult mistreatment within the family context as studies have indeed identified a poor premorbid relationship as a significant contributor to mistreatment of the older adult (Cooney, Howard, & Lawlor, 2006).
Lack of comprehensiveness has been a serious flaw of past instruments. To improve this structural element, we sought recommendations for new sub-indicators in the first two rounds and fed them back into subsequent rounds for consideration by all judges. High levels of agreement among judges justified the addition of 10 new sub-indicators. First, “Communication barrier (language barrier, aphasia, stroke, developmental disability)” was added for the caregiver. Its inclusion is supported in the literature because language comprehension and memory deficits have been studied in caregivers who mistreat their older adult care receivers (Miller et al., 2006). The experts then recommended adding a similar sub-indicator for the older adult inasmuch as speech and language impediments may impair ability to communicate needs or concerns to caregivers, family, and service providers or to disclose actual mistreatment (Cooney et al., 2006; Peguero & Lauck, 2008).
“Gambling addiction” was then added. Although we do not yet fully understand the impact of gambling addiction within the caregiving relationship, studies are beginning to examine its emotional and financial consequences. The research literature does speak of dependence or “other special problems” (Hwalek, Goodrich, & Quinn, 1996, p. 129). The judges then added “Holds older adult’s power of attorney for property” and “Loss of employment” to address financial dependency risk. This is supported in the literature as researchers Hwalek et al. (1996) spoke of “parasitic or opportunistic behaviors” with regard to the older adult’s financial assets (p. 131). Also, research has confirmed financial mistreatment to be the most prevalent type of mistreatment of older adults in Canada (Podnieks, Pillemer, Nicholson, Shillington, & Frizzell, 1990), and it is frequently perpetrated by the victims’ adult children (NCEA, 1998). Next, the experts recommended adding the sub-indicator “Disrespectful of older adult’s cultural preferences” which reflects the recommendations that instruments should be sensitive to culture (Fulmer et al., 2004). Although current study findings are inconclusive, factors studied in the literature include values of loyalty, interdependence, and the intergenerational gap resulting from contradictory traditional and modern cultural values (Litwin & Zoabi, 2004).
Although the original e-IOA addressed lack of social support for the older adult, it did not do so for the caregiver. The new sub-indicator “Lack of/poor external supports (community, siblings)” reflects the “reciprocal relationship” between social support for the older adult, the caregiver and mistreatment which explains that health support networks for the caregiver mitigate the risk to the dyad (Peguero & Lauck, 2008, p. 65). In the present health care system in which family caregiving is a societal expectation and fiscal restraint a priority, caregiver needs and supports are seldom addressed (NACA, 2006). A perception of inadequate support has been demonstrated to be significantly associated with higher levels of burnout and actual mistreatment of older adults (Almberg & Grafström, 1997). Also added was “Knowledge deficit regarding normal aging and/or disease process” which the literature supports as a risk factor (Hwalek et al., 1996) and which reflects current reality as care continues to be delegated to families despite their lack of knowledge and perhaps complete unsuitability to provide care (Wiles, 2003).
Next, the experts recommended the addition of the sub-indicator “Caregiver restricts access (family, friends, home health services)”. Research has demonstrated that isolation by the caregiver is one of the most significant contributing factors of mistreatment risk of older adults (NCEA, 1998; WHO/INPEA, 2002). The last addition made – “Unrealistic expectations of caregiver (unrealistic multiple requests)” – reflects the fact that many aspects of caregiving can be catalysts for older adult mistreatment, such as the intimacy required for personal care provision; the required commitment in time, effort, and resources; and the lack of respite (Anetzberger, 2000).
All additions recommended by the judges are supported by theory and research in this field. Furthermore, these sub-indicators place this instrument within the caregiver and community contexts and are supported by the conceptual framework of Kosberg and Nahmiash (1996).
We provided the expert judges with a definition of mistreatment as including both abuse and neglect and examined risk factors versus indicators. As a result, they chose a new title for the adaptation which reflects both the concepts of risk and of mistreatment. The adaptation is now entitled the “Mistreatment of Older Adult Risk Factors” (MOARF) screening tool.
During the second phase of this study, the assessment of inter-rater reliability of the adapted instrument, Kendall’s coefficient of concordance W provided a measure of inter-rater reliability. Based on these statistical results, 90 of the 95 sub-indicators achieved acceptable agreement at a .01 significance level. The other five sub-indicators did not achieve acceptable agreement. Discussion of these variations will now be provided with recommendations for problematic sub-indicators.
Despite the positive modifications to the sub-indicators resulting from the Delphi process, some sub-indicators remained subjective. Differences in ratings can be attributed to three primary sources of difficulty: (a) ambiguity related to lack of clarity, (b) inaccurate measurement due to negative wording, and (c) inconsistency related to the rating scale.
In several instances, different understandings of expressions were noted. For example, the acronym IADLs (which is used to refer to instrumental activities of daily living) and the term “means for living” were perceived differently by the nurses. Some correctly understood “IADLs” to include grocery shopping, bill payment, and meal preparation whereas others did not. Even though all participants were nurses, the researchers should not have assumed that all expressions would be understood by all (Polit & Beck, 2004). Other expressions remained very subjective, contributing to poor measurement (Grant & Davis, 1997). For example, the sub-indicator “Has poor coping mechanisms” was rated inconsistently because only some participants viewed addictions such as gambling and alcohol abuse to be poor coping mechanisms. To remedy such ambiguity, rephrasing these items would be necessary as well as adding descriptors in parentheses to improve clarity. Uncertainty also resulted when more than one behavior was targeted by a sub-indicator. For example, “Expresses guilt or anger, and bitterness towards the family” described incongruent emotions. Whereas guilt was described with some of the older adults, they were not angry and bitter towards their family. Thus, addressing more than one concept in an item should be avoided (Burns & Grove, 2005).
The negative wording of sub-indicators such as “Does not attend activities outside the house” combined with negative ratings of “Not at all” also contributed to inaccurate measurement (Burns & Grove, 2005; Polit & Beck, 2004). To resolve this difficulty, item wording could be modified to affirmative statements such as “Attends few/no activities outside the home”.
Finally, in several situations, the nurses agreed on the presence of a sub-indicator but assessments varied with regard to its frequency on the Likert scale. For example, in two cases where the caregiver demonstrated verbal and physical outbursts towards the older adult, the nurses disagreed with regard to frequency. However, when reflecting upon this sub-indicator, the occurrence of such outbursts, regardless of frequency, was considered a risk factor. As the literature indicates that the response format should be matched to the measurement’s purpose (Myers & Winters, 2002), the Likert-type scale is not helpful in this sense. Therefore, we recommend a dichotomized Yes/No scale which would serve to flag the presence of a risk factor.
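The effect of the recommended change can be illustrated with a minimal sketch that collapses the six frequency categories retained from the original tool into a Yes/No flag. The mapping shown here is an assumption made for illustration only (any reported occurrence counts as "Yes", "not at all" counts as "No", and "not possible to receive information" is left unscored); the study does not specify how a revised scale would be coded, and the function name is hypothetical.

```python
# Frequency categories of the rating scale retained from the original tool.
PRESENT = {"very often", "often", "not so often", "very seldom"}
ABSENT = "not at all"
NO_INFORMATION = "not possible to receive information"


def dichotomize(rating):
    """Collapse a frequency rating into "Yes", "No", or None (unscored).

    Assumed mapping (not taken from the study): any occurrence, however
    infrequent, flags the risk factor as present.
    """
    label = rating.strip().lower()
    if label in PRESENT:
        return "Yes"
    if label == ABSENT:
        return "No"
    if label == NO_INFORMATION:
        return None
    raise ValueError(f"Unknown rating: {rating!r}")


# Two nurses who disagree on frequency still agree that the factor is present.
nurse_a = dichotomize("Very often")
nurse_b = dichotomize("Very seldom")
print(nurse_a == nurse_b)
```

Under this assumed coding, disagreement about how often outbursts occur no longer counts as disagreement about whether the risk factor is present.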
The limitations for both phases of the study merit discussion. First, only five of the seven judges completed the study. Due to fatigue and to the effects of attrition that may result from a prolonged Delphi study (Hasson et al., 2000), the study would have benefited from starting with a larger panel. As well, a larger sample would have permitted the testing of significance of the second round of Delphi as the k (number of judges) of 5 was insufficient (Siegel & Castellan, 1988). With regard to the Delphi method itself, the time required for each round of the study varied from 8 to 10 weeks and therefore, data collection extended over seven months. There is little guidance in the literature regarding an acceptable level of consensus or recommended number of required rounds, and consequently, the researcher must decide when to stop collecting data based on achieved consensus and the “law of diminishing returns” (Keeney et al., 2006, p. 207). This study was therefore ended after three rounds with a percentage of agreement of 60 to 100 per cent for the final sub-indicators. Continuation of the study may have resulted in stronger consensus or refinement of the sub-indicators.
With regard to the inter-rater reliability trials, despite various recruitment strategies, only six nurses from one agency participated. It would have been beneficial to enlist the participation of more than one nursing agency to increase the number of participants. The present small convenience sample restricts generalizability of the study (Fu, McDaniel, & Rhodes, 2007). The time allotted for the participant sessions was insufficient since the majority of the nurses had not received any prior in-service training on mistreatment of older adults, and considerable training with a new instrument is recommended to ensure inter-rater reliability (Myers & Winters, 2002). As the work required to complete the 10 screening instruments was lengthy, possible haste by the participants may have led to careless assessments (Springer et al., 2002).
With respect to the paper scenarios, the content did not sufficiently describe the caregivers. Therefore, when insufficient information prevented actual assessment of a sub-indicator, the nurses rated “Not at all”. Furthermore, some sub-indicators, such as “Lack of eye contact, reluctant to answer”, would have performed better with face-to-face contact. Despite this limitation, the chosen design was convenient and assured that the instrument was validated in a controlled environment (Selwood et al., 2007). Lastly, the content of both risk factors and actual mistreatment in the paper scenarios may have contributed to lower inter-rater agreement. Scenarios strictly describing risk factors may have better served to assess inter-rater reliability of this risk screening instrument. For the paper scenario design to be effective, specific scenarios could be built based on seminal works in the literature on mistreatment risk of older adults (Endacott et al., 1999).
To refine its psychometric properties, the MOARF requires further revision and revalidation before it can be used. Such work should initially consist of reconstructing the problematic sub-indicators based on the findings of this study and implementing a dichotomized Yes/No rating scale. Afterwards, a pilot study of the revised MOARF with a larger sample of home visiting nurses would be required. To correctly use a paper case design for the pilot, new scenarios should be developed focusing strictly on risk factors and containing sufficient information to properly assess each sub-indicator. Once satisfactory inter-rater reliability is established, actual testing of the MOARF with a large sample of older adults receiving home health services would be possible. This subsequent study would permit the length of the instrument to be revised as it may not be suitable for busy health care professionals in its present format of 95 sub-indicators.
Conclusion
The original e-IOA was developed to be used in an acute hospital setting by social workers to assess the risk of mistreatment with Hebrew-speaking older adults and their caregivers. The modifications recommended by the experts in this study have contributed to the tool’s content validity in a new context, namely a community context in Ontario, Canada. Sub-indicators are clearer and more representative of the risk of older adult mistreatment in the community home setting. The instrument’s comprehensiveness has been augmented because the tool’s conceptual underpinnings are now more closely reflected in the additional sub-indicators being measured. Despite the study limitations, elevated Kendall’s W values for most sub-indicators in the inter-rater trials by the home visiting nurses suggest that this is a positive step towards the development of a valid and reliable screening tool for risk of older adult mistreatment in the domestic setting.
Acknowledgements
The author wishes to acknowledge the assistance of her master’s advisory committee in the preparation of this manuscript.