Introduction
The Family Satisfaction with End-of-Life Care (FAMCARE) scale (Kristjanson, 1986, 1989), although used most widely with cancer patients in palliative care, has also been applied to a range of serious illnesses (Hwang et al., 2003), including with caregivers to patients with Alzheimer's disease (Teresi et al., 2019) and residents in long-term care (Rodriguez et al., 2010). The FAMCARE is used widely internationally as a quality measure of end-of-life care in clinical and research settings, and translations are available in many languages, including Italian (D'Angelo et al., 2017), Spanish (Teresi et al., 2019), and Swedish (Ljungberg et al., 2015). Although the psychometric properties of the scale have been examined in cancer patients in diverse settings internationally, little evidence exists regarding measurement equivalence across ethnically diverse groups. There is also little experience with the scale among individuals with different diseases such as Alzheimer's disease and related disorders (ADRD) or among ethnic subgroups, including Spanish speakers and caregivers. While several studies have examined the relationship of demographic characteristics to satisfaction with end-of-life care (Kristjanson, 1993; Lo et al., 2009; Aoun et al., 2010), no studies have examined these characteristics in terms of measurement equivalence in Hispanic samples.
A study of measurement equivalence comparing Black with White non-Hispanic caregivers of patients with cancer found that 13 items evidenced differential item functioning (DIF), a type of item bias; however, none was of high magnitude (Teresi et al., 2015). Moreover, the scale-level impact was negligible. One item related to pain relief evidenced DIF for race and education and was also hypothesized to show DIF. To our knowledge, no other studies have examined the FAMCARE for equivalence of item endorsement across different socio-demographic groups using item response theory (IRT) methods to detect DIF. Thus, the aim of this set of analyses was to examine the psychometric properties of the scale in a sample of Hispanics using latent variable models and to obtain information on DIF to place in an existing item bank on family satisfaction and care transitions.
Methods
Qualitative
Qualitative methods, including content analyses and cognitive interviews, were used to develop Spanish translations for use among Spanish speakers (Teresi et al., 2019). The first step in the evaluation of DIF is the generation of a priori hypotheses regarding potential group differences in item responses, conditional on the trait. Hypotheses regarding potential racial/ethnic group differences in item response were established qualitatively by a panel of content experts. The following instructions related to hypotheses generation were given.
Differential item functioning means that individuals in groups with the same underlying trait (state) level will have different probabilities of endorsing an item. Put another way, item endorsement should depend only on the level of the trait (state), e.g., satisfaction, and not on membership in a group, e.g., race/ethnicity. Very specifically, randomly selected persons from each of two groups (e.g., minority and non-minority) who are at the same (e.g., mild) level of satisfaction should have the same likelihood of reporting being very satisfied with the aspects of care provided. If it is hypothesized that this is not the case, it would be hypothesized that the item has DIF with respect to race/ethnicity.
The rationale for DIF hypotheses is that items may be posited to have a different meaning for some individuals and may measure a trait that is not expected. Thus, the item could perform differently for some groups, conditional on the trait.
Quantitative analyses and tests of DIF hypotheses
The graded (Samejima, 1969) form of the IRT model (Lord and Novick, 1968; Lord, 1980; Hambleton et al., 1991) was used for the analyses of DIF. An item shows DIF if people from different subgroups but at the same level of satisfaction have unequal probabilities of endorsement. The item characteristic curve (ICC) that relates the probability of an item response to the underlying state, e.g., satisfaction, measured by the item set can be characterized by two parameters: location (denoted b and also called threshold, difficulty, or severity) and a discrimination parameter (denoted a) that is proportional to the slope of the curve. DIF analysis approaches to the assessment of patient and caregiver-reported outcomes using IRT are described in Orlando-Edelen et al. (2006). The Wald test was used for examination of group differences in IRT item parameters (Lord, 1980; Teresi et al., 2000; Cai et al., 2011) accompanied by magnitude measures (Thissen et al., 1993; Raju et al., 1995; Kleinman and Teresi, 2016).
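For reference, the graded response model can be written in a common textbook form as follows. The notation (item index i, category index k, and m + 1 ordered categories scored 0 to m) is introduced here for illustration and is an assumption, not notation taken from the analyses themselves.

```latex
\begin{align}
P^{*}_{ik}(\theta) &= \frac{1}{1 + \exp\left[-a_i\,(\theta - b_{ik})\right]}, \qquad k = 1, \dots, m, \\
P_{ik}(\theta) &= P^{*}_{ik}(\theta) - P^{*}_{i,k+1}(\theta), \qquad P^{*}_{i0}(\theta) = 1, \quad P^{*}_{i,m+1}(\theta) = 0 .
\end{align}
```

Here P*_ik is the probability of responding in category k or higher, and P_ik is the probability of responding in exactly category k. With three ordinal response categories (m = 2), each item has one discrimination parameter a_i and two location parameters b_i1 and b_i2.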
Uniform DIF is detected when the b parameters differ because the direction of the DIF (more or less severe) for one group as contrasted with a comparison group is the same across the latent continuum. If the a parameters differ, this result is called non-uniform DIF because the ICCs cross and the direction of DIF can differ across the latent continuum. Non-uniform DIF occurs when the probability of response is in a different direction for the reference and focal groups, at different levels of the latent trait (θ). For example, Hispanic persons may have a lower probability than White, non-Hispanic persons of endorsing a satisfaction item at low levels of the satisfaction trait and higher probabilities of endorsement than White, non-Hispanic persons at higher levels. If non-uniform DIF is detected in the context of the IRT method, this finding assumes primacy over findings of uniform DIF because tests for group differences in the a parameters are followed by conditional tests of the b parameters (tests of b parameters are performed, constraining the a parameters to be equal).
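The distinction between uniform and non-uniform DIF can be illustrated with a minimal sketch. It assumes a three-category graded response item and uses made-up parameter values; the function names are hypothetical. When only the b parameters differ, the group difference in expected item scores keeps one sign across theta; when the a parameters differ, the sign changes because the curves cross.

```python
import math

def boundary(theta, a, b):
    """Probability of responding at or above a category boundary."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def expected_score(theta, a, bs):
    """Expected item score (0..len(bs)): the sum of the boundary probabilities."""
    return sum(boundary(theta, a, b) for b in bs)

def difference_signs(ref, foc, thetas):
    """Set of signs (+1 / -1) of the focal-minus-reference expected score."""
    signs = set()
    for t in thetas:
        d = expected_score(t, *foc) - expected_score(t, *ref)
        if abs(d) > 1e-9:
            signs.add(1 if d > 0 else -1)
    return signs

# Hypothetical parameters: (a, [b1, b2]); not estimates from this study.
reference = (2.0, [-1.0, 0.5])
uniform_focal = (2.0, [-0.7, 0.8])      # same a, shifted b: uniform DIF
nonuniform_focal = (1.0, [-1.0, 0.5])   # different a: non-uniform DIF
thetas = [x / 2.0 for x in range(-6, 7)]  # grid from -3 to 3

uniform_signs = difference_signs(reference, uniform_focal, thetas)
nonuniform_signs = difference_signs(reference, nonuniform_focal, thetas)
```

In this toy case the uniform comparison yields a single sign, whereas the non-uniform comparison yields both signs.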
An iterative process was used in the selection of the anchor items for theta estimation. There are several methods for selecting anchor items, assumed to be DIF-free (Orlando-Edelen et al., 2006; Woods, 2009; Wang et al., 2012). The approach that was used in these analyses was a modified “all-other” method in which initial DIF estimates were obtained by treating each item as a “studied” item while using the remainder as “anchor” items. The purification process was also iterative, and items identified as DIF-free were those included in the final anchor set. IRTPRO, version 3.1 option 3, which permits the all-other approach for the multiple group case, was used. This (Wald-type) procedure is more robust than just relying on the all-other anchor procedure and may take several iterations.
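The general idea of iterative anchor purification can be sketched as below. This is only an illustration of the loop structure, not the procedure implemented in IRTPRO: `flags_dif` is a hypothetical callable standing in for a DIF test of one studied item against a given anchor set, and `toy_test` is a made-up rule used purely to exercise the loop.

```python
def purify_anchors(items, flags_dif, max_iter=10):
    """Iteratively drop flagged items from the anchor set until it is stable.

    items      -- list of item identifiers
    flags_dif  -- hypothetical function (studied_item, anchor_set) -> bool
    """
    anchors = set(items)  # first pass: each item is tested against all others
    for _ in range(max_iter):
        flagged = {i for i in items if flags_dif(i, anchors - {i})}
        new_anchors = set(items) - flagged
        if new_anchors == anchors:  # no change: purification has converged
            break
        anchors = new_anchors
    return anchors

def toy_test(studied, anchors):
    """Made-up rule: item 5 has DIF; item 7 is falsely flagged
    only while item 5 still contaminates the anchor set."""
    if studied == 5:
        return True
    if studied == 7:
        return 5 in anchors
    return False

final_anchors = purify_anchors(list(range(1, 11)), toy_test)
```

In the toy example, item 7 returns to the anchor set once item 5 has been removed, which is the kind of correction that purification is meant to provide.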
The final P values testing for DIF were adjusted using the Bonferroni method (Bonferroni, 1936). Other methods such as Benjamini–Hochberg (B-H; Benjamini and Hochberg, 1995; Thissen et al., 2002) have been used in sensitivity analyses for many of our studies. Generally, the results are almost identical. Thus, the Bonferroni method was selected as the primary approach for adjustment for multiple comparisons.
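As an illustration of the two adjustments, the following minimal sketch computes Bonferroni and Benjamini–Hochberg adjusted P values in plain Python. The P values are hypothetical and the function names are introduced here for the example only.

```python
def bonferroni(pvals):
    """Bonferroni adjustment: multiply each P value by the number of tests."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg step-up adjustment (controls the false discovery rate)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical P values for five DIF tests (not results from this study).
raw_p = [0.001, 0.02, 0.03, 0.2, 0.5]
bonf_p = bonferroni(raw_p)
bh_p = benjamini_hochberg(raw_p)
```

Bonferroni is the more conservative of the two, so its adjusted values are never smaller than the Benjamini–Hochberg values for the same test.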
Model assumptions and fit: Exploratory and confirmatory factor analyses (Asparouhov and Muthén, 2009) to examine dimensionality were performed within each subgroup studied, and fit indices (Bentler, 1990) examined. Additionally, the explained common variance (ECV) was used as an indicator of unidimensionality. The ECV (Sijtsma, 2009), estimated as the percent of observed variance explained (Reise, 2012), can be calculated as the ratio of the first eigenvalue to the sum of all eigenvalues extracted (see Reise et al., 2010).
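The eigenvalue-based calculation described above can be sketched as follows. The eigenvalues are hypothetical (ten values for a ten-item scale) and the function names are introduced here for illustration.

```python
def explained_common_variance(eigenvalues):
    """Ratio of the first (largest) eigenvalue to the sum of all eigenvalues."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must sum to a positive number")
    return ordered[0] / total

def first_to_second_ratio(eigenvalues):
    """Ratio of the first to the second eigenvalue."""
    ordered = sorted(eigenvalues, reverse=True)
    return ordered[0] / ordered[1]

# Hypothetical eigenvalues, not values from Table 2.
eigs = [7.6, 0.6, 0.35, 0.3, 0.25, 0.25, 0.2, 0.2, 0.15, 0.1]
ecv = explained_common_variance(eigs)
ratio = first_to_second_ratio(eigs)
```

Multiplying `ecv` by 100 gives the value as a percentage.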
Local independence requires that all pairs of item responses be independent, conditional on the latent trait. Local dependency (LD) was examined using the methods of Chen and Thissen (1997). A suggested cutoff indicative of potential LD is 10 (Chen and Thissen, 1997; Cai et al., 2011). This approach is based on a comparison of observed and expected frequencies derived from item-by-item two-way cross-tabulations; the likelihood ratio statistic resulting from this comparison is chi-square distributed. LD statistics are affected by sample size and increase in value with the increased sample size. Thus, to ensure comparability in sample sizes between the Hispanic and non-Hispanic White samples, a random sample of the White non-Hispanic group comparable in size to that of the Hispanic sample was selected. The root mean square error of approximation (RMSEA) was examined for both confirmatory factor analyses and IRT model fit.
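The comparison of observed and expected frequencies can be sketched with a generic likelihood ratio statistic for a two-way table. This is an assumption-laden illustration: the expected frequencies would come from the fitted IRT model, the function name is hypothetical, and software-reported LD values may be standardized differently, so the cutoff of 10 is not applied to this raw statistic here.

```python
import math

def likelihood_ratio_g2(observed, expected):
    """G^2 = 2 * sum(O * ln(O / E)) over the cells of a two-way table.

    observed, expected -- lists of rows with matching shapes;
    expected frequencies are assumed to be model-implied and positive.
    """
    g2 = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for o, e in zip(obs_row, exp_row):
            if o > 0:  # empty cells contribute nothing
                g2 += 2.0 * o * math.log(o / e)
    return g2

# Hypothetical 3 x 3 cross-tabulation for one item pair.
obs = [[40, 10, 5], [12, 50, 15], [4, 14, 60]]
exp = [[36.0, 12.0, 7.0], [14.0, 46.0, 17.0], [6.0, 16.0, 56.0]]
g2_pair = likelihood_ratio_g2(obs, exp)
```

When the observed and expected tables coincide, the statistic is zero; larger values indicate a greater departure from the model-implied association between the two items.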
The best methods and criteria for cutoff values for goodness of fit statistics have been debated (e.g., Cook et al., 2009), with recommendations to not be overly reliant on specific values, given the many factors that may affect these statistics. The following model fit statistics and criteria for goodness of fit (Bentler, 1990) provided general guidelines, and included the comparative fit index (CFI; Bentler, 1990; CFI > 0.95), Tucker–Lewis index (TLI; Tucker and Lewis, 1973; TLI > 0.95), and the root mean square error of approximation (RMSEA < 0.06).
Evaluation of DIF magnitude and impact: Expected item scores were measures of magnitude. A method for quantification of the difference in the average expected item scores is the non-compensatory DIF (NCDIF) index used by Raju et al. (1995). NCDIF is expressed as the average squared difference in expected scores for individuals as members of the focal group and as members of the reference group. The cutoff recommended as indicative of high DIF magnitude is 0.024 for polytomous items with three response options. An additional effect size measure (T1) proposed by Wainer (1993) and extended for polytomous data by Kim et al. (2007) was also examined; however, primary reliance was on the NCDIF magnitude measure because little research has been conducted on the performance of T1. The use of these statistics is explicated in Kleinman and Teresi (2016) and Teresi et al. (2007).
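The NCDIF definition given above can be sketched as follows, assuming a three-category graded response item. The parameter values and theta estimates are hypothetical and the function names are introduced for illustration; the 0.024 cutoff is the one stated in the text.

```python
import math

NCDIF_CUTOFF = 0.024  # cutoff from the text for three-category items

def boundary(theta, a, b):
    """Probability of responding at or above a category boundary."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def expected_item_score(theta, a, bs):
    """Expected score on a graded item scored 0..len(bs)."""
    return sum(boundary(theta, a, b) for b in bs)

def ncdif(focal_thetas, ref_params, focal_params):
    """Average squared difference in expected item scores, computed over
    focal-group thetas, scoring each person once with focal-group and
    once with reference-group item parameters."""
    sq_diffs = [
        (expected_item_score(t, *focal_params)
         - expected_item_score(t, *ref_params)) ** 2
        for t in focal_thetas
    ]
    return sum(sq_diffs) / len(sq_diffs)

# Hypothetical values: (a, [b1, b2]) and a grid standing in for focal thetas.
ref_params = (2.0, [-1.0, 0.5])
focal_params = (1.8, [-0.9, 0.6])
focal_thetas = [x / 10.0 for x in range(-20, 21)]

value = ncdif(focal_thetas, ref_params, focal_params)
high_magnitude = value > NCDIF_CUTOFF
```

Because the squared differences are averaged over the focal group, regions of theta with few focal respondents contribute little to the index in practice.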
Expected scale scores that provide information about the effect of DIF on the total score were calculated by summing the expected item scores. Group differences in these scale response functions provide overall aggregated measures of impact.
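The aggregation to the scale level can be sketched in the same way. The example assumes ten three-category graded items, so that expected scale scores run from 0 to 20; all parameter values are hypothetical, and the maximum absolute difference is used here only as one simple summary of the gap between the two scale response functions.

```python
import math

def boundary(theta, a, b):
    """Probability of responding at or above a category boundary."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def expected_item_score(theta, a, bs):
    """Expected score on a graded item scored 0..len(bs)."""
    return sum(boundary(theta, a, b) for b in bs)

def expected_scale_score(theta, items):
    """Sum of expected item scores over all items: list of (a, [b1, b2])."""
    return sum(expected_item_score(theta, a, bs) for a, bs in items)

def max_scale_difference(ref_items, focal_items, thetas):
    """Largest absolute group difference in expected scale score on a grid."""
    return max(
        abs(expected_scale_score(t, focal_items)
            - expected_scale_score(t, ref_items))
        for t in thetas
    )

# Hypothetical item parameters: only the tenth item differs between groups.
ref_items = [(2.0, [-1.0, 0.5])] * 10
focal_items = [(2.0, [-1.0, 0.5])] * 9 + [(1.5, [-0.9, 0.6])]
thetas = [x / 10.0 for x in range(-30, 31)]

impact = max_scale_difference(ref_items, focal_items, thetas)
```

Plotting the two expected scale score curves against theta gives the kind of overlay used to judge impact visually.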
DIF sensitivity analyses: Sensitivity analyses were conducted with a different method, an ordinal logistic regression approach with a latent conditioning variable; lordif, version 0.3-3 (Choi et al., 2011), was used. This method was used to flag consistent DIF identified by both methods that might be salient based on magnitude and impact measures.
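For orientation, the ordinal logistic regression approach is usually described in terms of three nested cumulative logit models for each item. The notation below (response u_i, category k, trait estimate θ, and group indicator g) is introduced here as an assumption and follows the usual presentation of the method rather than anything specific to these analyses.

```latex
\begin{align}
\text{Model 1:}\quad \operatorname{logit} P(u_i \ge k) &= \alpha_k + \beta_1 \theta \\
\text{Model 2:}\quad \operatorname{logit} P(u_i \ge k) &= \alpha_k + \beta_1 \theta + \beta_2 g \\
\text{Model 3:}\quad \operatorname{logit} P(u_i \ge k) &= \alpha_k + \beta_1 \theta + \beta_2 g + \beta_3 (\theta \times g)
\end{align}
```

Comparing Model 1 with Model 2 tests the group main effect (uniform DIF), and comparing Model 2 with Model 3 tests the interaction of trait level with group (non-uniform DIF).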
Additionally, sensitivity analyses were conducted comparing only Spanish speakers to White, non-Hispanic English speakers.
Reliability and information: Reliability was evaluated with McDonald's omega total (ω_t; McDonald, 1999); this estimate is based on the proportion of total common variance explained. Reliability estimates were also calculated for various points along the latent continuum of family satisfaction using IRT. IRT also provides estimates of the information provided by items and scales. This item information can be used to select items for short-form measures. Additionally, information function parameters stored in item banks are used to generate computerized adaptive tests that tailor item selection to target the respondent's level of the trait based on responses to a starting item and to other items administered.
MPlus, version 6.11 (Muthén and Muthén, 2011) was used for factor analyses and IRTPRO, version 3.12 (Cai et al., 2011) for IRT item parameter estimation and DIF analyses. Item level magnitude using NCDIF (Fleer, 1993; Raju et al., 1995; Flowers et al., 1999; Morales et al., 2006) was estimated using MAGNITS (Kleinman and Teresi, 2016). Scale-level impact was evaluated using lordif, version 0.3-3 (Choi et al., 2011), in R. Reliability estimated with McDonald's omega was also calculated in R, version 3.4.4 (R Core Team, 2018), with the psych package.
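The link between item information and reliability at a given trait level can be sketched as below. The sketch assumes graded response items with b parameters in ascending order, and it uses the common convention reliability = 1 − 1/information, which presumes a latent trait with unit variance; IRT software may compute these quantities somewhat differently. Parameter values and function names are hypothetical.

```python
import math

def boundary(theta, a, b):
    """Probability of responding at or above a category boundary."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, bs):
    """Graded-model item information: sum over categories of P'^2 / P."""
    stars = [1.0] + [boundary(theta, a, b) for b in bs] + [0.0]
    info = 0.0
    for k in range(len(stars) - 1):
        p = stars[k] - stars[k + 1]  # category probability
        dp = a * (stars[k] * (1.0 - stars[k])
                  - stars[k + 1] * (1.0 - stars[k + 1]))  # its derivative
        if p > 0:
            info += dp * dp / p
    return info

def scale_information(theta, items):
    """Scale information is the sum of the item informations."""
    return sum(item_information(theta, a, bs) for a, bs in items)

def reliability_at(theta, items):
    """Assumed convention: 1 - 1/information (unit-variance latent trait)."""
    return 1.0 - 1.0 / scale_information(theta, items)

# Hypothetical ten-item scale with three-category items.
items = [(2.0, [-1.0, 0.5])] * 10
rel_mid = reliability_at(0.0, items)
rel_tail = reliability_at(3.0, items)
```

With these made-up parameters, reliability is higher near the item locations than far out in the tail, which mirrors the general pattern that precision drops where items provide little information.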
Measure
The short-form FAMCARE used in these analyses was based on earlier work (Teresi et al., 2014) with advanced psychometric methods. This work showed that lower categories were overlapping such that the probability of response was similar for the three categories: “very dissatisfied,” “dissatisfied,” and “undecided,” indicating little if any unique information provided by these categories. Thus, items were coded as ordinal and collapsed as follows: “very satisfied” responses were coded as 2, “satisfied” as 1 and “not satisfied” (indecision or “dissatisfaction”) as 0, with a resulting sum score from 0 to 20. The item analyses were thus performed with three ordinal response categories.
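The recoding and scoring rule described above can be expressed as a short sketch. The category labels follow the text; the function names and the assumption that responses arrive as text labels are introduced here for illustration.

```python
# Collapse the five original categories into three ordinal codes.
RECODE = {
    "very dissatisfied": 0,
    "dissatisfied": 0,
    "undecided": 0,
    "satisfied": 1,
    "very satisfied": 2,
}

N_ITEMS = 10  # ten items, so sum scores range from 0 to 20

def recode(response):
    """Map one response label to its collapsed ordinal code."""
    return RECODE[response.strip().lower()]

def sum_score(responses):
    """Sum of recoded item responses for one respondent."""
    if len(responses) != N_ITEMS:
        raise ValueError("expected %d item responses" % N_ITEMS)
    return sum(recode(r) for r in responses)

# Hypothetical respondent.
example = ["very satisfied"] * 4 + ["satisfied"] * 3 + ["undecided"] * 3
example_score = sum_score(example)
```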
Sample
There were 1,834 respondents, 317 Hispanics, and 1,517 non-Hispanic Whites; among the Hispanic sample, 209 were interviewed in Spanish. For these analyses, the Hispanic Spanish and English speakers were combined because not enough respondents were interviewed in English to perform a separate DIF analysis by the language of administration. The Hispanic sample comprised caregivers to patients with Alzheimer's disease (study period June 1, 2013 through March 31, 2019), while the White non-Hispanic sample comprised caregivers to cancer patients (study period September 30, 2006 through July 31, 2013). A larger proportion of the Hispanic caregiver sample (83%) than of the non-Hispanic caregiver sample (55%) was female, and the Hispanic caregivers were younger (74% were below age 65 as contrasted with 62% of the non-Hispanic Whites; see Table 1). Among the Hispanic caregiver sample, 45% had some post-high school education and 24% had 0–11 years, as contrasted with the White non-Hispanic sample for which only 11% had less than high school education. More of the Hispanic sample of caregivers (77%) than the White non-Hispanic caregiver sample (54%) lived with the patient. The average age of the non-Hispanic White care recipients was 60.7 (11.6) as contrasted with the Hispanic care recipients with an average age of 79.9 (8.9).
Table 1. Demographic characteristics of the caregivers and care recipients for the White and Hispanic samples
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20201209145832776-0276:S1478951520000152:S1478951520000152_tab1.png?pub-status=live)
Sample size: Hispanic responders (n = 317); non-Hispanic White responders (n = 1,517); total (n = 1,834). Data were not available for care recipient education for the Hispanic sample.
This study was approved by the Institutional Review Board (IRB) at Mount Sinai Medical Center (study reported at https://projectreporter.nih.gov/project_info_description.cfm?aid=7892314) and at Columbia University Medical Center (protocols IRB-AAAL7251, IRB-AAAM5150), reported at https://projectreporter.nih.gov/project_info_description.cfm?aid=9251192&icde=43514731&ddparam=&ddvalue=&ddsub=&cr=10&csb=default&cs=ASC&MMOpt=.
Results
Qualitative
The DIF hypotheses were posited with respect to race/ethnicity and language. Although the majority (two-thirds) were interviewed in Spanish, the sample size was too small to examine language within the Hispanic subgroup. Thus, the hypotheses regarding ethnicity were relevant to these analyses. With respect to race/ethnicity, 5 of the 10 items were hypothesized to evidence DIF; however, a direction was given for only 2: “The way the family is included in treatment and care decisions” and “Information given about the patient's tests.” These were hypothesized to be more likely endorsed in the dissatisfied direction, conditional on the trait, by minority than by White respondents.
Quantitative
Model assumptions: As shown by the eigenvalue ratios in Table 2, there was strong support for essential unidimensionality for the total sample and both subgroups, Hispanic and non-Hispanic White responders. All three ratios of the first to the second component were large (total sample: 19.5; non-Hispanic White responders: 16.1; Hispanic responders: 33.9). The first component accounted for between 74% and 89% of the variance for all groups, supporting the essential unidimensionality of the item set across comparison subgroups. The RMSEA index from the MPlus analysis was 0.10 for the total sample and for both demographic groups. The RMSEA indices from the IRTPRO estimation were slightly lower, ranging from 0.08 to 0.09. The CFIs ranged from 0.988 to 0.997. The ECVs ranged from 92.66% to 96.77% (see Table 3).
Table 2. Eigenvalues from the exploratory factor analysis using principal components estimation and fit indices from confirmatory factor analyses^a (MPlus)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20201209145832776-0276:S1478951520000152:S1478951520000152_tab2.png?pub-status=live)
Model fit statistics: comparative fit index (CFI); Tucker–Lewis index (TLI), and root mean square error of approximation (RMSEA) from MPlus and RMSEA from IRTPRO.
a Geomin (oblique) rotation and fit statistics for one factor solutions.
b Based on M2 statistics which are based on full marginal tables.
Table 3. Reliability and dimensionality estimates
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20201209145832776-0276:S1478951520000152:S1478951520000152_tab3.png?pub-status=live)
All analyses based on polychoric correlations.
In general, the LD statistics (Chen and Thissen, 1997) were in the acceptable range for Hispanics, and over the threshold for the non-Hispanic White sample. There were five instances of LDs above 10 for the White non-Hispanic sample (see Appendix Table A1): items 2 (availability of doctors) and 8 (doctor assesses symptoms; 15.9); items 3 (coordination of care) and 4 (time required to make diagnosis; 13.2); items 5 (families included in treatment) and 8 (doctor assesses symptoms; 14.6); items 6 (information given about management of pain) and 10 (availability of the doctor; 14.5); and items 9 (tests and treatments followed up by doctor) and 10 (availability of the doctor; 12.2). These values did not appear to inflate the magnitude of the discrimination parameters, and the values were relatively low; thus, it was concluded that they did not require action.
The reliability estimates were high. The omega total values ranged from 0.962 to 0.986, and the ordinal alphas ranged from 0.961 to 0.985 (see Table 3). The classical test theory estimated Cronbach's alpha for the total sample was 0.95 for both non-standardized and standardized calculations. The corrected item-total correlations ranged from 0.72 to 0.83 (see Appendix Table A2). The internal consistency estimates for those interviewed in English and Spanish were 0.96 and 0.97, respectively.
IRT-based reliability: The reliability estimates calculated along the satisfaction continuum were >0.90 in the range of theta from −2.0 to 0.8. Estimates were slightly lower at the dissatisfied tail (0.80, 0.83, 0.84 across the total, non-Hispanic White, and Hispanic subsamples) as well as the very satisfied range of the distribution. The overall reliability estimates were 0.90 for the total sample, 0.91 for the non-Hispanic White, and 0.93 for the Hispanic subgroup (see Table 4).
Table 4. IRT reliability estimates at varying levels of the attribute (theta) estimate based on results of the IRT analysis (IRTPRO)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20201209145832776-0276:S1478951520000152:S1478951520000152_tab4.png?pub-status=live)
Note: Reliability estimates were calculated for theta levels for which there are respondents.
The information functions for the items and overall scale for the total sample were bimodal with the highest peaks at theta levels −1.2 and 0.4. The most informative item was “The way tests and treatments are followed up by the doctor” (item 9), and the least informative item was “Coordination of care” (item 3; see Figure 1).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20201209145832776-0276:S1478951520000152:S1478951520000152_fig1.png?pub-status=live)
Fig. 1. FAMCARE: scale and item information functions.
The analyses of DIF showed that three items evidenced DIF consistently by two methods: IRTPRO and lordif (see Table 5 and Appendix Table A3). However, only one item was flagged as significant by both methods. After the Bonferroni adjustment, non-uniform DIF was flagged with IRTPRO for the item, “The way the family is included in treatment and care decisions” (item 5). The item was more discriminating (more highly related to the satisfaction state) for the non-Hispanic, White responders than for the Hispanic subsample, and was also a more severe indicator (higher difficulty parameter) for this group at specific levels of the trait; the non-Hispanic White responders were less satisfied at higher levels of the satisfaction trait.
Table 5. Summary of DIF hypotheses and analyses
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20201209145832776-0276:S1478951520000152:S1478951520000152_tab5.png?pub-status=live)
All NCDIF values were smaller than the threshold (0.0240); the range was from 0.0001 to 0.0057 and none of the T1 statistics were significant.
NU, non-uniform DIF involving the discrimination parameters; U, uniform DIF involving the location parameters.
a The numbers in bold are the number positing DIF. Not all provided a direction to the hypothesis; only those with a direction are presented.
* Significant after Bonferroni correction.
The items “Information given about how to manage the patient's pain” (item 6) and “Information given about the patient's tests” (item 7) were identified with uniform DIF by IRTPRO; however, the result was not significant after application of the Bonferroni adjustment for multiple comparisons. The item “Information given about patient's tests” was also flagged for uniform DIF by lordif. Lordif identified non-uniform DIF for both items, after the adjustment; the items were more discriminating for the Hispanic responders (see Appendix Figure A1). The magnitude of DIF was small; all NCDIF and T1 statistics were below threshold (see Table 5). The impact of DIF was negligible, as shown by the overlapping curves (see Appendix Figure A2).
Language sensitivity analysis: Sensitivity DIF analysis was performed comparing the White non-Hispanic group to Spanish-speaking Hispanics alone (see Appendix Table A4). The results were similar to those of the main analyses. Three items showed DIF, two the same as in prior analysis. No DIF comparisons were significant after the Bonferroni correction.
Discussion
The FAMCARE scale, although extensively used to assess satisfaction with care for cancer patients, has also been applied in palliative care for other serious illnesses, including among caregivers to patients with Alzheimer's disease. The psychometric properties of the FAMCARE have been examined in cancer patients in diverse settings internationally, including the relationship of demographic characteristics to satisfaction with end-of-life care. However, little evidence exists concerning measurement equivalence across ethnically diverse groups, particularly in Hispanic samples.
These analyses identified only one item with consistent DIF after Bonferroni correction: item 5, “The way the family is included in treatment and care decisions.” No items evidenced salient DIF.
Although the two groups examined in this study differ in disease type, we argue that the two groups have in common that they are caregivers to individuals with serious illness and poor prognosis. The diseases are different; however, it was not posited that the different diseases would result in DIF. It was posited that cultural and language differences can have an impact on item meaning and response. An advantage of IRT is that it produces arguably more invariant parameters that can be compared because they are sample independent. Philosophically, DIF can be examined with IRT across many groups differing in socio-demographic characteristics; however, it is important to present a rationale for such analyses.
Examination of the hypotheses for the qualitative analyses in conjunction with the quantitative analyses showed that two items were posited to evidence DIF for ethnic/race groups. In general, minority groups were hypothesized to express less satisfaction than White groups, conditional on overall satisfaction. Content experts posited directional race/ethnicity hypotheses for the item that evidenced consistent DIF: “The way the family is included in treatment and care decisions” (item 5). It was posited that minority group members would be less satisfied, conditional on the trait. Contrary to the hypotheses, item 5 showed non-uniform DIF, and the uniform DIF observed was in the opposite direction of that hypothesized. As noted, this item did not reach the criteria for salient DIF. Because the experts used their clinical experience when establishing hypotheses, it is possible that they took into account the potential language barrier when suggesting lower satisfaction for Hispanics, in contrast to their White counterparts. It may be that they felt, even at the same levels of satisfaction, Hispanics might respond in a more dissatisfied direction because of general health disparities and health care disparities, both real and perceived. Although there is no literature on the FAMCARE in a sample of Hispanic caregivers to persons with ADRD, earlier work on ethnically diverse caregivers may have informed the hypotheses. In contrast to the findings reported here, in an earlier paper on DIF in the FAMCARE (Teresi et al., 2015), Black responders reported less satisfaction with their care, conditional on the trait.
The non-uniform DIF observed showed that conditional on overall satisfaction, the reported satisfaction for Hispanics was not constant (see the crossing item response curves), thus supporting the dissatisfied direction posited by the experts for some satisfaction levels. This hypothesis is consistent with research evidence suggesting that Hispanics tend to endorse the extreme response categories in surveys (Clarke, 2000), potentially due to cultural values that relate such response style with demonstrating trustworthiness (McHorney and Fleishman, 2006).
A confirmatory directional hypothesis was not given for the item related to information about management of pain. However, in an earlier study a similar item, “Satisfaction with the patient's pain relief,” was found to show DIF for the comparison of Blacks and White non-Hispanics. In that study, it was found that conditional on the satisfaction level, caregivers of Black patients were less satisfied with pain relief (Teresi et al., 2015), a finding corresponding to findings of racial and ethnic disparities in pain treatment identified by Green et al. (2003). It is possible that the content experts posited the presence of an unmeasured secondary extraneous factor such as personal experiences that may have influenced responses to satisfaction items.
Strengths and limitations
Limitations of the study include the small number of Hispanics interviewed in English which did not permit systematic analyses of this group. The inability to perform other subgroup analyses due to sample size restrictions is also a limitation. As pointed out by a reviewer, the overlapping information curves and high corrected item-total correlations may be indicative of redundancy in the item set for this sample. IRT-based reliability estimates provided at varying points along the satisfaction trait continuum yielded somewhat lower reliability estimates, particularly at the tails of the distribution. Thus, while omnibus summary reliability estimates appear to show uniform item performance, the scale was not uniformly reliable across the trait; however, it is emphasized that estimates were above 0.80 for nearly all theta points for which reliability was estimated.
Strengths of the study include the provision of information for placement in an item bank on family satisfaction and care transitions. Such a bank was used to develop the short-form of the FAMCARE (Ornstein et al., 2015) used in these analyses. Additionally, the short-form version developed with IRT was used to develop a Japanese translation (Ito and Tadaka, 2018). This study is the first to examine the measurement equivalence of the FAMCARE scale in a sample of Hispanic caregivers to patients with ADRD using latent variable models. This paper provides information on DIF for inclusion in an existing item bank on family satisfaction with care and care transitions. Additionally, reliability estimates indicated that the scale was highly reliable (estimates ≥ 0.90). Most items provided adequate information, although the item related to care coordination was less informative.
In summary, the analyses showed modest DIF of low magnitude and impact for the Hispanic sample in comparison to a White non-Hispanic sample. The item flagged related to information sharing: the way the family is included in treatment and care decisions. No items rose to the level of salient DIF of high magnitude or impact. Evidence from this study supports the measurement equivalence of the FAMCARE among Hispanics interviewed in Spanish and English. Thus, the short-form FAMCARE can be recommended for use in cross-cultural assessments and research involving such groups.
Authorship
J.A.T. substantially contributed to the design of the work, oversaw analyses, and drafted the article. K.O.-W. performed analyses and participated in drafting the article. M.R. contributed to the design, qualitative analyses, and review of the article. M.K. performed analyses. K.O. contributed to the design of the work and reviewed the manuscript. A.S. and J.L. acquired the data and participated substantially in the work. All authors have approved the publication of the article.
Acknowledgments
The authors thank Stephanie Silver, MPH for her expert editing of the manuscript.
Funding
Support for these analyses was provided by a collaboration between the Claude Pepper Older Americans Independence Center: National Institute on Aging (grant number 1P30AG028741) and the National Institute on Aging Alzheimer's Disease Resource Center on Minority Aging Research (grant number 1P30AG059303). The studies from which data were supplied were funded by the Patient-Centered Outcomes Research Institute (PCORI) (contract number CE-1304-7160) and the National Institute of Nursing Research (NINR) (grant number 1R01NR0114430-01) and the National Cancer Institute (NCI) (grant number 5R01CA116227-059999).
Conflict of interest
The authors declare that there is no conflict of interest with respect to the research, authorship, and/or publication of this article.
Appendix
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20201209145832776-0276:S1478951520000152:S1478951520000152_figA1.png?pub-status=live)
Fig. A1. Item response functions and magnitude of DIF.
Note: Results are from lordif software. For each item, the upper left panel shows the expected item score plots (denoted item true score functions) for Hispanics and non-Hispanic Whites. The lower left panel shows the item characteristic curves (category response functions). The upper right panel displays the absolute group differences in expected item scores. The lower right panel shows the differences weighted by density and is indicative of the magnitude (impact) of DIF at the item level. This measure is related to the non-compensatory DIF statistic (NCDIF) described in the text.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20201209145832776-0276:S1478951520000152:S1478951520000152_figA2.png?pub-status=live)
Fig. A2. Impact of DIF at the scale level: expected scale scores.
Table A1. Local dependency statistics (bolded entries are slightly above the threshold for elevation).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20201209145832776-0276:S1478951520000152:S1478951520000152_tabA1.png?pub-status=live)
Table A2. Classical test reliability estimates (SPSS): total sample (n = 1,834)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20201209145832776-0276:S1478951520000152:S1478951520000152_tabA2.png?pub-status=live)
Table A3. IRT item parameters and DIF statistics for Hispanic compared to non-Hispanic White responders (reference group)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20201209145832776-0276:S1478951520000152:S1478951520000152_tabA3.png?pub-status=live)
Table A4. Sensitivity analyses: summary of DIF analyses comparing White non-Hispanic subsample with Spanish-speaking Hispanics only
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20201209145832776-0276:S1478951520000152:S1478951520000152_tabA4.png?pub-status=live)