
LOESS smoothed density estimates for multivariate survival data subject to censoring and masking

Published online by Cambridge University Press:  22 August 2016

Peter Adamic*
Affiliation:
Laurentian University, Sudbury, Ontario P3E 2C6, Canada
Jenna Guse
Affiliation:
University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
Correspondence to: Peter Adamic, Laurentian University, 935 Ramsey Lake Rd, Sudbury, Ontario P3E 2C6, Canada. Tel: (705) 675-1151 x2325; E-mail: padamic@laurentian.ca

Abstract

Actuaries often encounter censored and masked survival data when constructing multiple-decrement tables. In this paper, we propose estimators for the cause-specific failure time density using LOESS smoothing techniques that are employed in the presence of left-censored data, while still allowing for right-censored and exact observations, as well as masked causes of failure. The smoothing mechanism is incorporated as part of an expectation-maximisation algorithm. The proposed models are applied to a bivariate African sleeping sickness data set.

Type: Papers
Copyright: © Institute and Faculty of Actuaries 2016

1. Introduction

Actuaries often encounter censored and masked survival data when constructing single- or multiple-decrement tables. Censoring is present in lifetime data when the time of death (or failure, in the more general survival context) is not known precisely, and masking is present when the cause of death is not known precisely. The survival literature offers a wide range of parametric and semi-parametric models that can be employed when censoring and/or masking is present.

A non-parametric maximum likelihood estimator (NPMLE) of the cumulative incidence for competing risks data subject to right censoring was given by Aalen (1976) and Kalbfleisch & Prentice (1980). Dinse (1982) proposed an NPMLE for right-censored and masked competing risks data to be computed with the explicit use of an expectation-maximisation (EM) algorithm. Hudgens et al. (2001) first presented an NPMLE estimated using an EM algorithm for competing risks data subject to both interval censoring and truncation. Jewell et al. (2003) and Groeneboom et al. (2008) continued similar studies of an NPMLE for current status data, an extreme form of interval-censored data, without masking. Subsequently, Adamic (2010, 2012) developed generalisations of Turnbull's (1974, 1976) classical univariate algorithms for modelling competing risks, but in a more general setting, where each failure was allowed to be associated with any subset of possible failure modes (i.e. masking). Overall, distribution-free models that can be employed in a multiple-decrement context have received relatively little attention to date, in part due to the complexity that censoring and masking introduce.

In other recent research, models have been developed to analyse competing risks data subject to various censoring schemes. Craiu & Reiser (2006) developed a dependent competing risks model using an EM-based approach for masked data. Wang et al. (2012) proposed a new model for the NPMLE of interval-censored and masked competing risks data, the random partition masking model, which does not rely on the assumption that masked failure causes are independent of failure time. Yu & Li (2012) and Li & Yu (2014) examined the NPMLE proposed by Dinse (1982); they concluded that the NPMLE was inconsistent and not unique, and they introduced a consistent NPMLE of the joint distribution function with right-censored and masked competing risks data under another new model (the dependent masking and right-censoring model).

In this paper, we develop a smoothed non-parametric density estimator for modelling multiple-decrement¹ failure times that are subject to the possibility of both left censoring and right censoring, while still allowing for exact observations. Despite the novelty of modelling masked competing risks data using an EM algorithm, there is a significant drawback associated with the various approaches that have been developed to date. As expounded in Hudgens et al. (2001), estimators of this type have an unexpected and undesirable property, namely, that the resulting estimator of the survival distribution will be undefined over a potentially larger set of regions than the NPMLE of S(t), ignoring failure type. Indeed, the problem is, quite intuitively, even more prevalent in the multiple-decrement environment: the self-consistent competing risks algorithms, or SC-CR algorithms, of Adamic (2010, 2012) can be seen to converge only over a class of intervals called cause-specific innermost intervals.

To remedy this problem, we have chosen to generalise a univariate kernel density estimate found in Braun et al. (2005) that was used to fill in the gaps between the innermost intervals created by invoking the self-consistent EM algorithm of Turnbull (1976). The converged estimator of the failure rate distribution is often difficult to smooth, due to the large gaps between innermost intervals as well as the multimodal shape of the distribution that will naturally arise when there are many gaps in the probability masses. As mentioned in Duchesne & Stafford (2001), adopting a kernel smoothed estimator at each iteration avoids the bias created by arbitrarily assigning probability mass at the right end points of the innermost intervals, as recommended by Pan (2000), and is better at borrowing information from neighbouring data points than would otherwise be the case. Duchesne & Stafford (2001) go on to state that since the innermost intervals are no longer in the picture, the kernel modification moves the algorithm away from problem-causing areas – areas where Turnbull's algorithm can sometimes get stuck at local solutions (also see Li et al., 1997). Our generalisation of the approach proposed by Braun et al. (2005) will be actualised in two ways: first, in order to smooth the data we will use the LOcal regrESSion (LOESS) fitting procedure developed by Cleveland (1979) and Cleveland & Devlin (1988) as opposed to kernel density estimation – indeed, the latter is just a special case of the former; second, our models will be developed in the more general multiple-decrement setting as opposed to the simpler univariate setting.

2. Definitions, Notation, and Terminology

This article will focus exclusively on so-called doubly-censored data, that is, data subject to both left and right censoring (in addition to exact observations), as defined by Zhou (2004) below. This section is mostly from Adamic (2010).

Definition 2.1 Suppose lifetimes $X_1, \ldots, X_n$ are non-negative, independent, and identically distributed random variables subject to censoring. In the case of double censoring, we observe both left and right censoring (in addition to exact observations). That is, we observe

$$\{Z_i, \Delta_i\} = \begin{cases} (X_i, 1), & \text{if } L_i \leq X_i \leq R_i \\ (R_i, 0), & \text{if } X_i > R_i \\ (L_i, 2), & \text{if } X_i < L_i \end{cases}$$

where $L_i$ and $R_i$ are the left- and right-censoring times, respectively, with $R_i \geq L_i$, $\forall\, i = 1, \ldots, n$. Also, $\Delta_i$ is an indicator of the type of censoring, if any.
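To make the censoring scheme concrete, the following R sketch encodes the observed pairs $(Z_i, \Delta_i)$ of Definition 2.1 for simulated, purely illustrative lifetimes and censoring times; all variable names and distributional choices here are our own assumptions, not from the paper.

```r
# Minimal sketch of Definition 2.1 on simulated data (illustration only)
set.seed(1)
n <- 10
X <- rexp(n, rate = 0.1)        # latent lifetimes X_i (hypothetical)
L <- runif(n, 0, 5)             # left-censoring times L_i
R <- L + runif(n, 5, 15)        # right-censoring times, so R_i >= L_i
Z     <- ifelse(X > R, R, ifelse(X < L, L, X))
Delta <- ifelse(X > R, 0, ifelse(X < L, 2, 1))  # 1 exact, 0 right, 2 left
data.frame(Z = round(Z, 2), Delta = Delta)
```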

In a multiple risk forum, it may be the case that some of the failure modes that could have been responsible for the actual failure are ruled out as possible candidates. In this case, the remaining subset of possible failure modes is considered to be masked. A formal definition is given below.

Definition 2.2 Suppose a failure mode $j$, $j \in \{1, \ldots, k\}$, is associated with a failure time $t$. If the mode is not known exactly, but is known only to belong to a certain subset of all possible modes, then the failure is caused by a subset of modes that are said to be masked. Furthermore, following Park & Padgett (2006), if the failure mode is completely unknown, then the set of failure modes is said to be completely masked; if one or more of the failure modes can be ruled out, then the remaining subset of possible failure modes is considered to be partially masked.

In this paper, the possibility of determining some of the failure modes a posteriori, and then incorporating the new information into the model, will not be directly addressed. However, the algorithms will be sufficiently versatile that once a mode of failure can be determined exactly (or reduced to a smaller set of possible modes), the algorithms can simply be executed again utilising the updated information. See Flehinger et al. (1998) for more details on carrying out analyses using updated masking information.

The following preliminary notation, based on the International Actuarial Notation (IAN) guidelines as found in Bowers et al. (1997), is applicable to any survival or life table context. First, define $_tq_x = \Pr\{\text{failure prior or equal to time } (t+x) \mid \text{subject is currently aged } x\}$. For example, in human mortality studies, $_{10}q_{45}$ would represent the probability that a 45-year-old dies in the next 10 years. When working with a life table, it is often convenient to replace $q$ with $d$ to represent the number, instead of the probability, of subjects that fail in a given time interval. Thus, $_td_x$ would represent the number of failures in the interval $(x, x+t]$. Also, let $_tp_x = \Pr\{\text{survival to time } (t+x) \mid \text{subject is currently aged } x\}$. For example, $_{10}p_{45}$ would represent the probability that a 45-year-old survives to age 55. Clearly, $_tp_x + {}_tq_x = 1$. When $t = 1$, the subscript is often omitted. Thus, $_1p_{45} = p_{45}$ could represent the probability that a 45-year-old survives 1 year.

To motivate the competing risks theory, some additional notation is required. A superscript in brackets will be used to identify which specific risk is being considered. For example, $h_x^{(j)}(t)$ would represent the hazard rate for risk $j$ at time $x+t$, given a current age of $x$. Note the implicit assumption here that only one distinct cause (or failure mode) can be responsible for any particular failure. This assumption is germane to all of the theory presented in this paper. The superscript $\tau$ will be used to represent the set of all the risks taken together, in aggregate. For example, $_tp_x^{(\tau)}$ would represent the probability of surviving all risks up to time $t$. The following relationships are given without proof:

$$h_x^{(j)}(t) = \frac{f(t, j)}{{}_tp_x^{(\tau)}}, \qquad h_x^{(\tau)}(t) = \sum_j h_x^{(j)}(t), \qquad {}_tq_x^{(\tau)} = \sum_j {}_tq_x^{(j)}$$

where $f(t, j)\,dt \approx \Pr\{(t < T \leq t + dt) \cap (J = j)\}$, and $h$ denotes the hazard function. For example, $_tq_x^{(j)}$ would represent the probability of failure for some subject aged $x$ prior or equal to time $x+t$ by cause $j$ in the multiple risk forum.

A very important function to model in any competing risks analysis is the cumulative incidence function (CIF), and this is usually accomplished for all of the possible failure modes. The standard definition in the literature for the CIF for cause j is

(1) $$F_n^{(j)}(t) = \Pr(T \leq t, J = j) = \int_0^t h^{(j)}(u)\, {}_up_0^{(\tau)}\, du$$

where the joint pair of random variables $(T, J)$ captures the time of failure and the corresponding mode of failure, respectively. It should also be noted that $F_n^{(j)}(t) = {}_tq_0^{(j)}$ under IAN. The most common estimator for the CIF in counting process notation is

(2) $$\hat{F}_n^{(j)}(t) = \int_0^t \hat{S}(u)\, d\hat{\Lambda}_j(u)$$

which, as opined in Lawless (2003), is typically estimated non-parametrically using

(3) $$\hat{F}_n^{(j)}(t) = \sum_{i\colon t_i \leq t} \hat{S}(t_i) \cdot \frac{\delta_{ij}}{n_i}, \quad j = 1, \ldots, k$$

where $\delta_{ij}$ equals 0 or 1 depending upon whether or not subject $i$ failed due to cause $j$, and $n_i$ captures the number at risk at time $t_i$. Unfortunately, equation (3) cannot be used here for two main reasons. First, equation (3) can only be used for complete or right-censored data. Thus, this closed-form estimator for the CIF cannot be applied to doubly-censored data. Second, equation (3) assumes that all of the failure modes are known exactly. Consequently, if any of the failure modes are masked, this estimator cannot be applied. That is, all of the $\delta_{ij}$ terms cannot be extracted from the data.
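For readers who want to see estimator (3) in computational form, the following R sketch computes it for complete or right-censored competing risks data with known causes (and no ties); the function name and interface are our own illustration, not from the paper.

```r
# Hedged sketch of equation (3): nonparametric CIF for cause j,
# valid only for exact or right-censored data without masking.
# time: observed times; cause: 0 = right censored, 1..k = failure cause.
cif_hat <- function(time, cause, j, eval_times) {
  ord <- order(time)
  time <- time[ord]; cause <- cause[ord]
  n_risk <- length(time) - seq_along(time) + 1   # n_i, number at risk
  # Kaplan-Meier overall survival just before each t_i (failures of any cause)
  S_left <- c(1, head(cumprod(1 - (cause > 0) / n_risk), -1))
  jump <- S_left * (cause == j) / n_risk         # S(t_i-) * delta_ij / n_i
  sapply(eval_times, function(x) sum(jump[time <= x]))
}
```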

It is also noteworthy to mention that there is a non-identifiability phenomenon at work in the multiple-decrement model, as described by Tsiatis (1975). An observational study consisting of data that only furnishes the joint pair $(T_i, J_i)$ for each subject $i$ (and possibly some additional indicator variables for the censored range or ranges) will not allow for discrimination between independent and dependent modes of failure. This fact was also noted by Cox (1959). In many cases, however, this is the only available data. Thus, for the sake of model simplification, an independence assumption has traditionally been invoked between the various competing risks. Some recent research has focussed on developing dependent competing risks models (see Escarela & Carriere, 2003; Craiu & Reiser, 2006; Wang et al., 2012; Li & Yu, 2014). The complication of mutually dependent failure modes will not be considered in the ensuing theory.

The assumption of independence is also found in another significant way, namely, in the manner in which censored observations are reported. Censoring can sometimes be accomplished in such a way that the number of exposures to the risk (or risks) depends on the censoring mechanism itself, a dynamic which is often termed selection bias, as noted by Gichangi & Vach (2005). The absence of selection bias implies that independent censoring is taking place, and the assumption of independent censoring means that the cause-specific hazards do not depend on whether or not a particular subject was censored (Gichangi & Vach, 2005). Symbolically, this means that

$$\lim_{h \to 0} \frac{\Pr\{T_i < t + h, J = j \mid T_i \geq t\}}{h} = \lim_{h \to 0} \frac{\Pr\{T_i < t + h, J = j \mid T_i \geq t, T_i \in \xi_i\}}{h}$$

where subject $i$ is censored by the observed set $\xi_i$; this holds true for all of the $k$ competing risks (Gichangi & Vach, 2005).

One of the primary characteristics of the estimators of the cause-specific CIFs that will be developed in this paper is that they are members of the class of estimators called self-consistent estimators. A formal definition of the concept of self-consistency for the univariate case is given next, as found in Stafford (2005), and is followed by conceptualisations for use in the arena of competing risks.

Definition 2.3 An estimator $\hat{F}_n(t)$ for a cumulative distribution function (CDF) is said to be a self-consistent estimator of the CDF if $\hat{F}_n(t) = E_{\hat{F}_n}[F_n(t) \mid \Psi]$, where $F_n(t)$ is the empirical CDF under complete data and $\Psi$ denotes the set of all of the observed data (complete or censored).

Essentially, Definition 2.3 is stating that at each time $t$, the expected value of the CDF is the same with entirely complete data or with censored and/or masked data. Since any censored failure can have a failure time at any $t$, we condition on the entire set of failures, $\Psi$. Note that in the multiple-decrement setting, the CDF is simply replaced by the cause-specific CIF. Following Duchesne & Stafford (2001), a generalisation for self-consistency pertaining to CIFs is offered as follows.

Definition 2.4 An estimator $\hat{F}^{(j)}(t)$ for a CIF is said to be a self-consistent estimator if $\hat{F}^{(j)}(t) = E_{\hat{F}}[F^{(j)}(t) \mid \Psi]$, where $F^{(j)}(t)$ is the empirical CIF under complete data, $\Psi$ denotes the set of all of the observed data (complete or censored, with the possibility of masking), and $\hat{F}$ is understood to include the current information in all of the CIFs for modes $1, \ldots, k$.

Consider the following as motivation for developing the SC-CR algorithm for doubly-censored data. Huang (2000) noted that self-consistent equations are essentially score equations of appropriately defined indicator variables, depending on the type of censoring. When the data are potentially censored, Huang (2000) also states that $\hat{F}_n$ can be obtained by taking the conditional expectation of $F_n$, conditional on the observed data, under the probability measure induced by $\hat{F}_n$ itself. However, evaluation of the conditional expectations in the SC equations will require the use of an iterative scheme, as no closed-form expression for these expectations exists in these non-parametric cases. Also, as noted by Turnbull (1974), convergence of a self-consistent iterative method implies that the application of the SC equations at some iteration will essentially have no effect on the realised value of the estimator (up to some small tolerance of discrepancy, $\varepsilon$ say). Indeed, the SC procedure can also be shown to be a special case of the more general EM algorithm of Dempster et al. (1977).

3. The SC-CR Algorithm for Doubly-Censored Data

3.1. A self-consistent algorithm

Section 3, which summarises the SC-CR algorithm for doubly-censored data, is from Adamic (2010) and/or Adamic et al. (2010). The idea of a self-consistent algorithm, as well as the concept of self-consistency itself, were propounded by Efron (1967). Efron developed an iterative algorithm to estimate the CDF for right-censored data using

(4) $$\tilde{F}_K(x) = \frac{1}{n}\left[N(x) - \sum_{L_i \leq x,\, \Delta_i = 0} \frac{1 - \tilde{F}_{K-1}(x)}{1 - \tilde{F}_{K-1}(L_i)}\right]$$

where $N(x) = \sum I[X_i \leq x]$, $\Delta_i = 0$ if $X_i$ is right censored, and $K$ indexes the iteration number (see Braun et al., 2005). Turnbull (1974) extended Efron's algorithm to include doubly-censored data, again for a single risk. Also noteworthy is the fact that Turnbull (1974, 1976) proved that the resulting survival estimators, upon convergence of the algorithm, were NPMLEs. The algorithm is quite well known, and as revealed by many researchers who employ survival techniques in practice (see Finkelstein, 1986; Yu et al., 2000), it is one of the most common estimators used in medical applications of censored survival analysis over the last several decades. For the sake of brevity, Turnbull's univariate algorithm will not be summarised here. The reader is referred to Klein & Moeschberger (1997) for this purpose.
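As an illustration of the self-consistent iteration, here is a hedged R sketch of Efron's update (4) evaluated on the grid of observed times; the function name, arguments, and stopping rule are our own assumptions, and the observed times stand in for the unknown lifetimes of censored subjects.

```r
# Sketch of Efron's (1967) self-consistent iteration, equation (4).
# Z: observed times (failure or right-censoring); Delta: 0 if right censored.
efron_sc <- function(Z, Delta, tol = 1e-6, max_iter = 500) {
  tg <- sort(unique(Z))
  n  <- length(Z)
  N  <- sapply(tg, function(x) sum(Z <= x))     # N(x) = sum I[Z_i <= x]
  F  <- N / n                                   # start from the ECDF
  for (K in seq_len(max_iter)) {
    F_new <- sapply(seq_along(tg), function(m) {
      cens <- Delta == 0 & Z <= tg[m]           # censored obs with L_i <= x
      F_L  <- F[match(Z[cens], tg)]             # F_{K-1}(L_i)
      (N[m] - sum((1 - F[m]) / (1 - F_L))) / n
    })
    if (max(abs(F_new - F)) < tol) break
    F <- F_new
  }
  list(t = tg, F = F_new, iterations = K)
}
```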

3.2. Developing the model

The SC-CR algorithm is carried out in the presence of doubly-censored failures, under the standard independence assumptions previously mentioned. However, it is further assumed that the masking probabilities are also independent of the failure cause itself. Based on these assumptions, the likelihood function under a double censoring scenario with the possibility of exact observations (where the cause of failure is known) and completely masked modes of failure can be expressed as

(5) $$L \propto \prod_{i=1}^n \left[(1 - {}_{t_i}p_0)^{c_i}\, ({}_{t_i}p_0)^{r_i} \prod_{j=1}^k \left({}_{t_i - t_{i-1}}q_{t_{i-1}}^{(j)}\right)^{d_{t_i}^{(j)}}\right]$$

where $c_i$ represents the number of left-censored observations, $r_i$ the number of right-censored observations, and $d_{t_i}^{(j)}$ the number of failures for risk $j$, $\forall\, i = 1, \ldots, n$. Based on the notation in the likelihood, "exact" should be taken here to mean "since the last time point, $t_{i-1}$". Equivalently, it will be assumed that $[t_i, t_i] \equiv (t_{i-1}, t_i]$, which is consistent with the step-function nature of the CIF estimators.

The algorithm may be implemented as follows. For ease of exposition, the steps, statements, and logic of the algorithm are directly generalised from those of the single-variable algorithm of Klein & Moeschberger (1997).

3.2.1. The SC-CR algorithm

Step 0: Provide initial estimates of the overall survival probabilities at each $t_r$, $_{t_r}p_0^{(\tau)}$, by equally allotting all of the probability mass for each time point. Also, find initial estimates for the probability of failing during each time interval by cause $j$, $_{t_r - t_{r-1}}q_{t_{r-1}}^{(j)}$, for $r = 1, \ldots, m$, again by uniformly distributing the probability mass. Ignore all of the left-censored observations when calculating these estimates.

Step 1: Using the current estimates of $_{t_r}p_0^{(\tau)}$ and $_{t_r - t_{r-1}}q_{t_{r-1}}^{(j)}$, calculate estimates of the conditional probabilities $e_{ir}^{(j)} = P[t_{r-1} < X^{(j)} \leq t_r \mid X^{(\tau)} \leq t_i]$, where $X^{(j)}$ represents the event of failure due to cause $j$, using

(6) $$\hat{e}_{ir}^{(j),K} = \frac{{}_{t_{r-1}}\hat{p}_0^{(\tau),K} \cdot {}_{t_r - t_{r-1}}\hat{q}_{t_{r-1}}^{(j),K}}{1 - {}_{t_i}\hat{p}_0^{(\tau),K}}, \quad \forall\, r \leq i$$

Step 2: Using the results of the previous step, estimate the number of cause-specific failures in each interval $(t_{r-1}, t_r]$ by using

(7) $$\hat{d}_r^{(j),K} = d_r^{(j)} + \sum_{i=r}^m c_i\, \hat{e}_{ir}^{(j),K}$$

Step 3: Compute $_{t_r - t_{r-1}}\hat{q}_{t_{r-1}}^{(j)}$, $\forall\, r = 1, \ldots, m$, using a generalised cause-specific product-limit estimator based on the current estimates of $\hat{d}_r^{(j)}$. If

$$\sup_{t_r,\, j}\left| {}_{t_r - t_{r-1}}\hat{q}_{t_{r-1}}^{(j),K} - {}_{t_r - t_{r-1}}\hat{q}_{t_{r-1}}^{(j),K-1}\right| < \varepsilon$$

(jointly for all of the time points $t_r$ and causes $j$, given some small predetermined $\varepsilon > 0$), stop the algorithm; otherwise return to Step 1.
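To fix ideas, the following R sketch implements the E-step of Steps 1-2 (equations (6) and (7)) under complete masking; the matrix layout and function name are our own assumptions, and Step 3's generalised product-limit update is omitted.

```r
# Hedged sketch of the SC-CR E-step (equations (6)-(7)), complete masking.
# p_tau[r] holds _{t_r} p_0^(tau); q[r, j] holds the interval failure
# probability _{t_r - t_{r-1}} q^{(j)}; c[i] counts left-censored cases at
# t_i; d[r, j] counts observed cause-j failures in (t_{r-1}, t_r].
sc_cr_e_step <- function(p_tau, q, c, d) {
  m <- nrow(q); k <- ncol(q)
  p_prev <- c(1, p_tau[-m])        # _{t_{r-1}} p_0^(tau), with p at t_0 = 1
  d_hat <- d
  for (i in seq_len(m)) {
    if (c[i] == 0) next
    for (j in seq_len(k)) {
      # equation (6): allocate the failure of each left-censored case at t_i
      # over the earlier intervals (t_{r-1}, t_r], r <= i, and cause j
      e <- p_prev[1:i] * q[1:i, j] / (1 - p_tau[i])
      d_hat[1:i, j] <- d_hat[1:i, j] + c[i] * e       # equation (7)
    }
  }
  d_hat
}
```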

So far, the issue of partial masking has been ignored. The possibility of partial masking, however, can be easily introduced into the algorithm. In essence, there will need to be a proper allocation of the left-censored observations into each of the partially masked sets, so that the property of self-consistency will be maintained. To this end, Step 1 would need to be modified as follows. Define

(8) $$e_{irz}^{(j \mid S_{iz})} = P\left[t_{r-1} < X^{(j)} \leq t_r \,\middle|\, X^{(S_{iz})} \leq t_i\right]$$

$\forall\, r \leq i$, $j \in S_{iz}$, where $z = 1, \ldots, y_i \leq 2^k - 1$ indexes each of the distinct masked sets, $S_{iz}$, at time $t_i$. Then for each time point $t_i$, and for each competing risk $j$, compute

(9) $$\hat{e}_{irz}^{(j \mid S_{iz}),K} = \frac{{}_{t_{r-1}}\hat{p}_0^{(\tau),K} \cdot {}_{t_r - t_{r-1}}\hat{q}_{t_{r-1}}^{(j),K}}{{}_{t_i}\hat{q}_0^{(S_{iz}),K}}, \quad \forall\, r \leq i,\; j \in S_{iz}$$

where $_{t_i}\hat{q}_0^{(S_{iz})}$ represents the current probability of failing due to any of the causes that comprise $S_{iz}$ up until time $t_i$. Step 2 would also need to be modified slightly. The term $\hat{d}_r^{(j),K}$ would now be calculated using

(10) $$\hat{d}_r^{(j),K} = d_r^{(j)} + \sum_{i=r}^m \sum_z c_i^{(S_{iz})}\, \hat{e}_{irz}^{(j \mid S_{iz}),K}$$

where $c_i^{(S_{iz})}$ represents the number of left-censored observations at time $t_i$ masked by the failure set $S_{iz}$.
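A corresponding R sketch of the partially-masked allocation (equations (9)-(10)) follows, again with a hypothetical interface of our own: for one left-censored group at $t_i$ masked by set $S$, the mass is shared only among the candidate causes $j \in S$.

```r
# Hedged sketch of equations (9)-(10) for one masked set S at time t_i.
# S: vector of candidate causes; c_S: count of left-censored cases at t_i
# masked by S; other arguments as in sc_cr_e_step above.
allocate_masked <- function(p_tau, q, i, S, c_S, d_hat) {
  m <- nrow(q)
  p_prev <- c(1, p_tau[-m])
  # _{t_i} q_0^(S): current probability of failing by t_i from any cause in S
  q_S <- sum(p_prev[1:i] * rowSums(q[1:i, S, drop = FALSE]))
  for (j in S) {
    e <- p_prev[1:i] * q[1:i, j] / q_S              # equation (9)
    d_hat[1:i, j] <- d_hat[1:i, j] + c_S * e        # equation (10)
  }
  d_hat
}
```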

A proof that the SC-CR algorithms (for both the partially masked and completely masked cases) produce self-consistent estimators of the CIFs for each failure mode can be found in the Appendix. That the CIFs derived from the SC-CR algorithms are also NPMLEs is strongly suggested by the fact that the estimators are self-consistent (since the expected value of each CIF at all time points equals an NPMLE of that CIF in the absence of any left-censored observations or masking – as, for example, in equation (3) of section 2). For the purposes of the present paper, we will be content to rely on the statistical merits of self-consistency.

3.3. Illustrative numerical examples

In this section, the SC-CR algorithms will be illustrated with some numerical examples. First, a simple example of Turnbull’s traditional univariate algorithm is furnished, and the results are depicted in Table 1.

Table 1 Iterations 1, 2, 5 (one risk, doubly censored).

In Table 2, an SC-CR algorithm for doubly-censored competing risks is shown, where the total failures for each risk (at each time point) equal those as found in the univariate case. This is useful for comparative purposes, particularly because the resulting CIFs at each time point sum up to the CDF values as found in the univariate case. This fact was used in the proof of the self-consistency of the SC-CR algorithms and is shown empirically here. Finally, in Table 3, we have an example of the SC-CR algorithm under partial masking. In this case, the five left-censored observations at time 1 are assumed to be masked in the following way: three of the failures were known to be caused by Cause 1; the other two failure causes were not known. Thus, in Table 3, we see that the Cause 1 CIF is larger than previously, as would be expected. Note also that the sum of the CIFs at each time point again equals those as found in the univariate case, as it must. Finally, it should be noted that convergence was achieved in all three cases after five iterations, assuming an ε of 0.0001.

Table 2 Iterations 1, 2, 5 (two competing risks, doubly censored, complete masking).

Table 3 Iterations 1, 2, 5 (two competing risks, doubly censored, partial masking).

Note: *Partial masking: three failures are due to Cause 1, and two are due to either Cause 1 or Cause 2.

4. LOESS Modification to the SC-CR Algorithm

4.1. Developing the model

As motivation, consider the case where data are interval censored. In such a situation, Stafford (2005), based on the work of Goutis (1997), argues that a natural extension of the standard kernel smoothing weight is to define $w_i$ as a conditional expectation, conditional on the observed interval $I_i = (L_i, R_i]$. That is, if the weight is defined as

(11) $$w_i = E\left[\frac{1}{h}\,\tilde{K}\!\left(\frac{X_i - x}{h}\right) \,\middle|\, I_i\right]$$

the kernel density estimate of f is

(12) $$\check{f}(x) = \frac{1}{n}\sum_{i=1}^n E\left[\frac{1}{h}\,\tilde{K}\!\left(\frac{X_i - x}{h}\right) \,\middle|\, I_i\right]$$

for a fixed kernel function $\tilde{K}$, as related in Stafford (2005). Now, Braun et al. (2005) espouse an intriguing way to evaluate equation (12): use an iterative algorithm such as Turnbull's algorithm, and the kernel density estimate itself, to evaluate the conditional expectation at each iteration. For our development here, we will translate this insightful approach to the case of doubly-censored data, use the more general framework of LOESS smoothing instead of kernel smoothing (again, kernel smoothing is simply a special case of local regression), and further generalise to the multivariate level comprising multiple decrements.

We will first introduce the basic framework for the LOESS procedure proposed by Cleveland (1979) and expanded upon by Cleveland & Devlin (1988). Let $y_i$ ($i = 1, \ldots, n$) be measurements of the dependent variable, and let $x_i = (x_{i1}, \ldots, x_{ip})$, $i = 1, \ldots, n$, be $n$ measurements of $p$ independent variables. Suppose that the data are generated by $y_i = f(x_i) + \varepsilon_i$, where $f$ is a smooth function and the $\varepsilon_i$ are errors assumed to be independent and identically distributed with mean 0 and variance $\sigma^2$. Locally weighted regression provides an estimate, $\check{f}(x)$, of the smooth function. The estimate of $f$ at $x$, where $x$ is any value in the $p$-dimensional space of independent variables, is found by defining a neighbourhood in the space of independent variables in which each point is weighted according to its distance from $x$, with points closer to $x$ weighted more heavily than points further from $x$. Therefore, a weight function and neighbourhood size must be specified in order to complete the LOESS smoothing technique. Cleveland (1979) defined four properties that any weight function, $W$, should possess when performing locally weighted regression:

1. $W(z) > 0$ for $|z| < 1$;

2. $W(-z) = W(z)$;

3. $W(z)$ is a non-increasing function for $z \geq 0$;

4. $W(z) = 0$ for $|z| \geq 1$.

Following Cleveland & Devlin (1988), we introduce the tricube weight function

(13) $$W(z) = \begin{cases} (1 - |z|^3)^3, & \text{if } |z| \leq 1 \\ 0, & \text{otherwise} \end{cases}$$

We must now specify the neighbourhood size. Following Loader (1999a), we define the span or bandwidth (these terms will be used interchangeably) as $h(x)$, and a smoothing window $(x - h(x), x + h(x))$ for a fitting point $x$. To find the smooth function $\check{f}(x)$, only observations within the smoothing window are used. The weights for each observation $(x_i, y_i)$ are then

(14) $$w_i(x) = W\!\left(\frac{x_i - x}{h(x)}\right)$$
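In R, the tricube weight of equation (13) and the neighbourhood weights of equation (14) are one-liners (the function names here are our own):

```r
# Tricube weight function, equation (13)
tricube <- function(z) ifelse(abs(z) <= 1, (1 - abs(z)^3)^3, 0)

# Neighbourhood weights for a fitting point x with span h, equation (14)
loess_weights <- function(x_i, x, h) tricube((x_i - x) / h)
```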

Using a LOESS smoothing model at each iteration of the SC-CR algorithm, the smoothed estimate for the $K$th iteration is

(15) $$\check{f}_K(x) = \frac{1}{n}\sum_{i=1}^n E_{\check{f}_{K-1}}\left[W\!\left(\frac{x_i - x}{h(x)}\right) \,\middle|\, \Psi\right]$$

conditioning on all observations, Ψ, since the LOESS smoother can use all of the observed data, depending on the degree of smoothing that is desired.

For the competing risks setting, an expression analogous to equation (15) can be developed. For modes of failure indexed by $j = 1, \ldots, k$, assume independence between the modes of failure. Then, the cause-specific LOESS smoothed estimate for iteration $K$ would be expressed as

(16) $$\check{f}_K^{(j)}(x) = \frac{1}{n}\sum_{i=1}^n E_{\check{f}_{K-1}^{(j)}}\left[W\!\left(\frac{x_i^{(j)} - x}{h(x)_j}\right) \,\middle|\, \Psi\right]$$

where an optimal value for the span, $h(x)_j$, should be chosen for the failure density of each independent risk $j$. Although the independence assumption is not required for the SC-CR algorithm, it is useful here to avoid the complication of having to invoke multivariate LOESS techniques that reflect the joint distribution between the various competing risks.
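As a concrete, hedged sketch of how one such per-risk smoothing pass might look in R, the following replaces the current cause-specific probability masses on an age grid with a LOESS fit, truncated at zero and renormalised; the function, its defaults, and the renormalisation step are our own illustration rather than the paper's actual program.

```r
# Hedged sketch of one cause-specific LOESS smoothing pass over an age grid.
smooth_masses <- function(ages, mass, span = 1.6, degree = 2) {
  fit <- loess(mass ~ ages, span = span, degree = degree,
               surface = "direct")          # exact fit, no interpolation grid
  sm <- pmax(predict(fit, data.frame(ages = ages)), 0)  # truncate at zero
  sm / sum(sm)                              # renormalise to total mass one
}
```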

The following theorem shows that as the bandwidths for the local fit of each risk tend to 0, the smoothed estimator of the density approaches the self-consistent density. The theorem statement, steps, and logic of the proof are direct generalisations of an analogous univariate proof from Braun et al. (2005) regarding kernel density estimation. The tricube weight function, $W(z)$, is also a commonly used kernel function, so the theorem applies in the LOESS setting.

Theorem 4.1 Let $\check{F}_K^{(j)}(x) = \int_{-\infty}^x \check{f}_K^{(j)}(t)\, dt$ be the LOESS smoothed estimate of the corresponding CIF for risk $j$ at the $K$th iteration with a cause-specific span $h(x)_j$, under an interval-censoring scheme with the possibility of masked failure modes. Then, assuming independent tricube weight functions for each risk $j$,

$$\lim_{h(x)_j \to 0} \check{F}_K^{(j)}(x) = \hat{F}_K^{(j)}(x) \quad \forall\, x,\, j,\, K$$

Proof. The self-consistent CIF is

$$\begin{aligned} \hat{F}_K^{(j)}(x) &= E_{K-1}\left[F^{(j)}(x) \,\middle|\, \Psi\right] \\ &= E_{K-1}\left[\frac{1}{n}\sum_{i=1}^n I(X_i^{(j)} \leq x) \,\middle|\, \Psi\right] \end{aligned}$$

since $F^{(j)}(x)$ captures the relative proportion of observations with $X_i \leq x$ and $J = j$, and is also the NPMLE when using complete data. Bringing the expectation into the sum yields

$$= \frac{1}{n}\sum_{i=1}^n E_{K-1}\left[I\left(X_i^{(j)} \leq x\right) \,\middle|\, \Psi\right]$$

Now, since the tricube weight function is a kernel function, it is also a valid probability distribution function that approaches the empirical distribution as the bandwidth tends to 0. And since each competing risk is smoothed with its own unique span $h(x)_j$,

$$\begin{aligned} &= \frac{1}{n}\sum_{i=1}^n E_{K-1}\left[\lim_{h(x)_j \to 0} \int_{-\infty}^x W\!\left(\frac{x_i^{(j)} - u}{h(x)_j}\right) du \,\middle|\, \Psi\right] \\ &= \lim_{h(x)_j \to 0} \frac{1}{n}\sum_{i=1}^n E_{K-1}\left[\int_{-\infty}^x W\!\left(\frac{x_i^{(j)} - u}{h(x)_j}\right) du \,\middle|\, \Psi\right] \end{aligned}$$

since $\int W\!\left(\frac{x_i^{(j)} - u}{h(x)_j}\right) du \leq 1$, and therefore the limit can be taken outside of the expectation. By the Fubini–Tonelli theorem, the order of expectation and integration may be interchanged in this case, producing

$$\begin{aligned} &= \lim_{h(x)_j \to 0} \int_{-\infty}^x \frac{1}{n}\sum_{i=1}^n E_{K-1}\left[W\!\left(\frac{x_i^{(j)} - u}{h(x)_j}\right) \,\middle|\, \Psi\right] du \\ &= \lim_{h(x)_j \to 0} \int_{-\infty}^x \check{f}_K^{(j)}(u)\, du \\ &= \lim_{h(x)_j \to 0} \check{F}_K^{(j)}(x) \qquad \square \end{aligned}$$

4.2. Span selection

A key issue when employing local regression is the optimal choice of span. The magnitude of the span is always a tradeoff between the bias and variability of the local regression fit, and therefore a compromise is always necessary. Values for the span that are too small will produce an estimate for the smooth function, $\check{f}(x)$, that shows too many spurious features, reflecting large variance due to the fact that an insufficient amount of data falls into the smoothing window. In the case where the value for the span is too large, the estimate within the smoothing window may be over-smoothed and hide key structural features of the real smooth function, $f(x)$, reflecting a large amount of bias.

There are a number of methods for finding the optimal span for a data set. The first method, which is simplistic in nature, is fixed bandwidth selection. In this case, the span $h(x)$ is chosen to be a constant value, $h$. However, a constant choice for the bandwidth is rarely ideal because the estimate resulting from a fixed bandwidth will often behave particularly poorly in boundary or tail regions and have large variance due to differences in the density of the data, producing fits that are too noisy or, in some cases, undefined if the neighbourhoods are empty. Methods to determine a single fixed bandwidth in local regression include modern plug-in methods and classical methods such as cross-validation and Akaike's information criterion. Loader (1999b) examines these methods, challenging the superiority of plug-in methods over their older counterparts, the classical methods. Four bandwidth selectors are considered by Loader (1999b): generalised cross-validation, Mallows' (1973) $C_p$ method, the plug-in algorithm of Gasser et al. (1991) (GKK), and a hybrid of $C_p$ and plug-in methods proposed by Ruppert et al. (1995) (RSW). See Loader (1999b) for further details on implementing these methods, the results from simulations, and extensions to locally quadratic regression.

However, the problems engendered by data sparsity can be resolved by ensuring that all neighbourhoods contain sufficient data. Using nearest neighbour bandwidth selection, the bandwidth $h(x)$ is chosen so that the local neighbourhood will always contain a specific number of points. To compute the nearest neighbour bandwidth $h(x)$, a bandwidth parameter $\alpha > 0$ needs to be considered. The parameter $\alpha$ gives the proportion of observations that are used to find the smoothed estimate, and as $\alpha$ increases, $\check{f}(x)$ becomes smoother. Following Cleveland & Loader (1996), Loader (1999a), and Irizarry (2001), when $0 < \alpha \leq 1$, let $q$ be equal to $\alpha n$ truncated to an integer. Then $\Delta_i(x) = |x - x_i|$ is the distance between the fitting point $x$ and the data points $x_i$, and $\Delta_{(i)}(x)$ denotes these distances ordered from smallest to largest. Finally, the nearest neighbour bandwidth is $h(x) = \Delta_{(q)}(x)$, where $\Delta_{(q)}(x)$ is the $q$th smallest value of $\Delta_i(x)$, $i = 1, \ldots, n$. For $\alpha > 1$, the nearest neighbour bandwidth is $h(x) = \alpha^{1/p}\Delta_{(n)}(x)$, where $p$ is the number of numeric independent variables involved in the fit. Cleveland & Devlin (1988) propose using an $M$ plot, an adaptation of the $C_p$ procedure developed by Mallows (1966, 1973), to select the bandwidth parameter, $\alpha$, of nearest neighbour fitting. The $M$ plot is a graph of $\hat{M}_\alpha$, an estimate of the expected mean squared error summed over the $x_i$ in the sample and divided by $\sigma^2$, against the contribution of variance, $V_\alpha$, for a selection of values of $\alpha$. Cleveland and Devlin admit that the $M$ plot is not a definitive method of determining the span, but because it graphically represents the tradeoff between variance and bias as the bandwidth parameter changes, it can aid in choosing the amount of smoothing required for a data set.
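A minimal R sketch of nearest-neighbour bandwidth selection for $0 < \alpha \leq 1$ (the helper function is our own, not from the cited sources):

```r
# Nearest-neighbour bandwidth: h(x) is the distance to the q-th nearest
# data point, with q = floor(alpha * n), 0 < alpha <= 1.
nn_bandwidth <- function(x, x_data, alpha) {
  q <- max(1, floor(alpha * length(x_data)))
  sort(abs(x_data - x))[q]       # q-th smallest distance, Delta_(q)(x)
}
```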

Another method for finding an appropriate span is graphical diagnostics. Lack of fit due to under-smoothing or over-smoothing can often be detected by using graphical representations to compare the fit of models with varying magnitudes of spans. After the initial examination of the fitted curves and the original data, it is essential to refer to other graphical aids to view the whole picture of a data set. Cleveland & Devlin (1988) suggest that other diagnostic procedures for regression models (see Daniel & Wood, 1971; Belsley et al., 1980; Cook & Weisberg, 1982; Chambers et al., 1983) can be applied to local regression. A normal probability plot of the residuals $\hat{\varepsilon}_i$ can be made to check the normality assumption, a plot of $|\hat{\varepsilon}_i|$ against $\hat{y}_i$ can be made to check the assumption of constant variance, and a plot of $|\hat{\varepsilon}_i|$ against the independent variables can be used to check for bias in the model.

Selection of an appropriate bandwidth has a critical effect on the smoothed LOESS estimate $\check{f}(x)$. The advantage of more formal model selection criteria such as $C_p$ or cross-validation is that they are automated. The disadvantage is that these methods can easily give bandwidths that are far from optimal, depending on the data set to which they are applied. Graphical displays are powerful tools that show the bias-variability tradeoff readily and aid in making decisions on the importance of each factor. However, as noted by Cleveland & Loader (1996), graphical diagnostics are labour intensive and therefore not the most efficient method for finding the optimal smoothed estimate. In practice, model selection criteria can be used to decide on a fit, and that fit should then be analysed using graphical representations to study the goodness of fit. The importance of graphical representations when completing LOESS smoothing and choosing an appropriate value for the span cannot be emphasised enough – both methods are more powerful when used in conjunction.

4.3. Polynomial degree selection

Another important decision when fitting a LOESS model is the choice of polynomial degree, $\lambda$, which also affects the bias-variability tradeoff. Polynomials with higher degrees will produce estimates that are generally less biased but more variable than their lower-degree counterparts. In R,² the loess() function allows for the fitting of polynomials of degree 0, 1, and 2. Cleveland & Loader (1996) state that polynomials with degree 0, or locally constant fits, are rarely if ever the best choice because they cannot exactly fit any mathematical curve except a straight line, which can only be fit in special cases. Loader (1999a) affirms that the most common choices in practice are local linear, $\lambda = 1$, and local quadratic, $\lambda = 2$, fits. By using polynomials of degree greater than 0, the bandwidth can be increased without introducing intolerable bias. Higher-order polynomials have a larger number of coefficients to estimate, resulting in higher variance; however, the final fit will be smoother due to a larger neighbourhood size. In practice, fitting local polynomials of third degree and higher is rarely beneficial. Often a choice between a local linear fit and a local quadratic fit, the default in R, can be made by examining the scatterplot of the data (see Jacoby, 2000). A LOESS modification was integrated into the SC-CR algorithm, which was programmed in R. The program was then used to model the trypanosomiasis data.
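As a simple illustration (on simulated data, not the trypanosomiasis set), R's loess() makes it easy to compare local linear and local quadratic fits side by side:

```r
# Comparing polynomial degrees in loess() on simulated data (illustration)
set.seed(2)
x <- seq(0, 10, length.out = 200)
y <- sin(x) + rnorm(200, sd = 0.3)
fit_lin  <- loess(y ~ x, span = 0.5, degree = 1)  # local linear
fit_quad <- loess(y ~ x, span = 0.5, degree = 2)  # local quadratic (default)
plot(x, y, col = "grey", main = "LOESS: degree 1 vs degree 2")
lines(x, predict(fit_lin),  lty = 2)
lines(x, predict(fit_quad), lty = 1)
```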

5. Application to a Real Data Set

Trypanosoma brucei is a parasite that causes the rare disease African trypanosomiasis, colloquially referred to as African sleeping sickness. The parasite is transferred to humans by the bite of the tsetse fly. A person can be infected for long periods without exhibiting any symptoms. Indeed, when symptoms do appear, the disease is often already at an advanced stage of development. There are two forms that the disease can take: the neurological (N) form and the lymphatic-sanguine (LS) form. These will comprise the two competing modes of failure, where the failure time, $t_i$, is the age at which a person is first infected by the parasite. The diagnosis phase is crucial because it determines the appropriate type of treatment, which is highly toxic.

The data are from the Bulletin Épidémiologique Hebdomadaire published by the Institut de Veille Sanitaire, as found in Legros & Ancelle (2006), and are summarised in Table 4. There were 26 cases of the disease reported in France between 1980 and 2004. Due to the long incubation period during which the disease was not detected, the exact age at infection was not always known precisely, resulting in numerous left-censored observations. Also, in a few cases, the exact form of the disease (N or LS) could not be determined. This resulted in observations that can be viewed as masked competing risks. As such, the SC-CR algorithm of Adamic (2010) can be used to model the cumulative incidence of failure times under these two forms of the trypanosomiasis disease. The SC-CR algorithm was run on the data, assuming an ε of 0.0000001. The results are given in the second and third columns of Table 5.

Table 4 Summary of the trypanosomiasis data.

Note: N, neurological form; LS, lymphatic-sanguine form.

Table 5 Converged failure rates for the trypanosomiasis data set.

Note that fewer than 50% of the possible data points have cause-specific innermost intervals for the two competing risks. Smoothing these two vectors of probability mass directly would not be advisable since there are so many gaps between the innermost intervals, making the probability landscape sparse and multimodal. Use of the LOESS modification at each iteration of the SC-CR algorithm is therefore opportune. To implement the smoothing modification, a LOESS function with a span of 1.6 and degree of 2 was used. Independent local fits for each risk seemed justified, since the correlation coefficient of the two competing densities before smoothing was only −0.1473877. Thus, a multivariate LOESS modification was not necessary in this case. The converged results after 32 iterations are also shown in Table 5, where an ε of 0.00000001 was used.

The LOESS modification to the SC-CR algorithm produced revealing results. Note that the gaps between the innermost intervals were successfully filled in a principled, data-driven manner. In turn, this produces valuable information that was not previously available. First, in reality there are obviously non-zero probabilities of decrement at each of the possible ages; this is immediately rectified once the LOESS modification is employed. Second, the magnitudes of the failure probabilities are themselves revealing. For example, consider age 50. Before the LOESS modification, it was not known whether failure due to cause N or cause LS was more likely (since both estimates were 0). Now, we can estimate that infection at age 50 is over twice as likely to be due to form N as to form LS. Figure 1 depicts the final CIF curves for each of the two competing risks.

Figure 1 Converged cumulative incidence functions for the trypanosomiasis data. N, neurological form; LS, lymphatic-sanguine form.

6. Conclusion and Future Work

There are other advantages to be realised from using the LOESS modification to the SC-CR algorithm. First, local regression adapts well to bias problems at boundaries and in regions of high curvature. We note that the issue of larger bias introduced by arbitrarily placing probability mass at the right end of the innermost intervals, a dynamic outlined in Duchesne & Stafford (2001), is even more relevant in the competing risks framework due to the larger number of innermost intervals involved. The LOESS modification to the SC-CR algorithm is non-arbitrary and data driven, thereby making the cause-specific innermost intervals, and the problems they engender, irrelevant, as was also stated in Braun et al. (2005) for kernel smoothing. Second, the use of the LOESS modification tends to reduce the number of iterations required for convergence. Many methods have been developed to provide fast computation of a smoothed estimate for one or more independent variables, partially because of the conceptual simplicity and excellent theoretical properties of local regression, a dynamic that also allows it to work under many different distributional assumptions. Finally, local regression does not require the smoothness and regularity conditions required by other smoothing methods, such as boundary kernels. These and other strengths of local regression, as outlined in Hastie & Loader (1993), make local regression an attractive supplement to the SC-CR algorithm.

In terms of future work directly related to the LOESS modification, we note that the modification can be applied, not only to left and right-censored failures, but to other censoring mechanisms as well. It can be applied to a variety of self-consistent models that include left-censored, doubly-censored, truncated, and other more elaborate censoring schemes that are very much at the forefront of contemporary scholarship in survival analysis.

Acknowledgement

This work was supported by the Government of Canada: Natural Sciences and Engineering Research Council of Canada under Discovery Grant No. 356028-2010 RGPIN.

Appendix

Note that the definition and notation for univariate self-consistency, as found in the first two lines of the proof, come from Braun et al. (2005).

Proof for the self-consistency of the CIFs derived from the SC-CR algorithms

$$\begin{aligned} \hat{F}_K(x) &= E_{K-1}\left[F_n(x) \mid \Psi\right] \\ &= E_{K-1}\left[\frac{1}{n}\sum_{i=1}^n I(X_i \leq x) \,\middle|\, \Psi\right] \\ &= E_{K-1}\left[\frac{1}{n}\sum_{i=1}^n \sum_j I(X_i^{(j)} \leq x) \,\middle|\, \Psi\right] \\ &= \sum_j E_{K-1}\left[\frac{1}{n}\sum_{i=1}^n I(X_i^{(j)} \leq x) \,\middle|\, \Psi\right] \\ &= \sum_j \hat{F}_K^{(j)}(x), \quad \text{if and only if } \hat{F}_K^{(j)}(x) \text{ is a self-consistent estimator of } F_K^{(j)}(x) \\ &= \hat{F}_K(x), \quad \text{a property of the SC-CR algorithm} \end{aligned}$$

Thus, $\hat{F}_K^{(j)}(x)$ must be a self-consistent estimator of $F_K^{(j)}(x)$, otherwise a contradiction would ensue.

Footnotes

¹ The terms multiple decrement and competing risk will be used synonymously.

² R is a free command-line statistical software environment and implementation of S that can be downloaded from www.r-project.org

References

Aalen, O. (1976). Nonparametric inference in connection with multiple decrement models. Scandinavian Journal of Statistics, 3, 15–27.
Adamic, P. (2010). Modeling multiple risks in the presence of double censoring. Scandinavian Actuarial Journal, 1, 68–81.
Adamic, P. (2012). A model for time-dependent competing risks. Contributed paper presented at the 40th Annual Meeting of the Statistical Society of Canada, June 4, 2012, University of Guelph, Guelph, Ontario, Canada.
Adamic, P., Dixon, S. & Gillis, D. (2010). Multiple decrement modeling in the presence of interval censoring and masking. Scandinavian Actuarial Journal, 4, 312–327.
Belsley, D.A., Kuh, E. & Welsch, R.E. (1980). Regression Diagnostics. John Wiley, New York.
Bowers, N.L., Gerber, H.U., Hickman, J.C., Jones, D.A. & Nesbitt, C.J. (1997). Actuarial Mathematics, 2nd edition. The Society of Actuaries, Schaumburg, IL.
Braun, J., Duchesne, T. & Stafford, J.E. (2005). Local likelihood density estimation for interval censored data. The Canadian Journal of Statistics, 33(1), 39–60.
Chambers, J.M., Cleveland, W.S., Kleiner, B. & Tukey, P.A. (1983). Graphical Methods for Data Analysis. Wadsworth, Monterey, CA.
Cleveland, W.S. (1979). Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association, 74, 829–836.
Cleveland, W.S. & Devlin, S.J. (1988). Locally weighted regression: an approach to regression analysis by local fitting. Journal of the American Statistical Association, 83, 596–610.
Cleveland, W.S. & Loader, C.L. (1996). Smoothing by local regression: principles and methods. In W. Haerdle & M.G. Schimek (Eds.), Statistical Theory and Computational Aspects of Smoothing (pp. 10–49). Springer, New York.
Cook, R.D. & Weisberg, S. (1982). Influence and Residuals in Regression. Chapman & Hall, New York.
Cox, D.R. (1959). The analysis of exponentially distributed lifetimes with two types of failures. Journal of the Royal Statistical Society, Series B, 21, 411–421.
Craiu, R.V. & Reiser, B. (2006). Inference for the dependent competing risks model with masked causes of failure. Lifetime Data Analysis, 12, 21–33.
Daniel, C. & Wood, F. (1971). Fitting Equations to Data. John Wiley, New York.
Dempster, A.P., Laird, N.M. & Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39, 1–38.
Dinse, G.E. (1982). Nonparametric estimation for partially-complete time and type of failure data. Biometrics, 38, 417–431.
Duchesne, T. & Stafford, J.E. (2001). A kernel density estimate for interval censored data, technical report no. 0106, University of Toronto, Toronto, Ontario, Canada.
Efron, B. (1967). The two-sample problem with censored data. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, 4, 831–853.
Escarela, G. & Carriere, J.F. (2003). Fitting competing risks with an assumed copula. Statistical Methods in Medical Research, 12(4), 333–349.
Finkelstein, D.M. (1986). A proportional hazards model for interval-censored failure time data. Biometrics, 42, 845–854.
Flehinger, B.J., Reiser, B. & Yashchin, E. (1998). Survival with competing risks and masked causes of failures. Biometrika, 85, 151–164.
Gasser, T., Kneip, A. & Köhler, W. (1991). A flexible and fast method for automatic smoothing. Journal of the American Statistical Association, 86, 643–652.
Gichangi, A. & Vach, W. (2005). The analysis of competing risks data: a guided tour. Statistics in Medicine, 132(4), 1–41.
Goutis, C. (1997). Nonparametric estimation of a mixing density via the kernel method. Journal of the American Statistical Association, 92, 1445–1450.
Groeneboom, P., Maathuis, M.H. & Wellner, J.A. (2008). Current status data with competing risks: consistency and rates of convergence of the MLE. The Annals of Statistics, 36, 1031–1063.
Hastie, T. & Loader, C.R. (1993). Local regression: automatic kernel carpentry. Statistical Science, 8, 120–129.
Huang, J. (2000). Asymptotic properties of nonparametric estimation based on partly interval-censored data. Statistica Sinica, 9, 501–519.
Hudgens, M.G., Satten, G.A. & Longini, I.M. Jr (2001). Nonparametric maximum likelihood estimation for competing risks survival data subject to interval censoring and truncation. Biometrics, 57, 74–80.
Irizarry, R.A. (2001). Chapter 3: local regression. Available online at http://www.biostat.jhsph.edu/~ririzarr/Teaching/754/section-03.pdf [accessed Jun-2015].
Jacoby, W.G. (2000). LOESS: a nonparametric, graphical tool for depicting relationships between variables. Electoral Studies, 19, 577–613.
Jewell, N.P., Van Der Laan, M. & Henneman, T. (2003). Nonparametric estimation from current status data with competing risks. Biometrika, 90, 183–197.
Kalbfleisch, J.D. & Prentice, R.L. (1980). The Statistical Analysis of Failure Time Data. Wiley, New York.
Klein, J.P. & Moeschberger, M.L. (1997). Survival Analysis: Techniques for Censored and Truncated Data. Springer, New York.
Lawless, J.F. (2003). Statistical Models and Methods for Lifetime Data, 2nd edition. John Wiley and Sons, New York.
Legros, F. & Ancelle, T. (2006). Trypanosomiase humaine africaine: recensement des cas d'importation observés en France, 1980–2004. Bulletin épidémiologique hebdomadaire, 7, 57–59.
Li, J. & Yu, Q. (2014). A consistent NPMLE of the joint distribution function with competing risks data under the dependent masking and right-censoring model. Lifetime Data Analysis, 22, 63–99.
Li, L., Watkins, T. & Yu, Q. (1997). An EM algorithm for smoothing the self-consistent estimator of survival functions with interval-censored data. Scandinavian Journal of Statistics, 24, 531–542.
Loader, C.R. (1999a). Local Regression and Likelihood. Springer-Verlag, New York.
Loader, C.R. (1999b). Bandwidth selection: classical or plug-in? The Annals of Statistics, 27, 415–438.
Mallows, C.L. (1966). Choosing a subset regression. Presented at the Annual Meeting of the American Statistical Association, Los Angeles, California, August 15–19.
Mallows, C.L. (1973). Some comments on C_p. Technometrics, 15, 661–675.
Pan, W. (2000). Smooth estimation of the survival function for interval censored data. Statistics in Medicine, 24, 531–542.
Park, C. & Padgett, W.J. (2006). Analysis of strength distributions of multi-modal failures using the EM algorithm. Journal of Statistical Computation and Simulation, 76, 619–636.
Ruppert, D., Sheather, S.J. & Wand, M.P. (1995). An effective bandwidth selector for local least squares regression. Journal of the American Statistical Association, 90, 1257–1270.
Stafford, J. (2005). "Exact" local likelihood estimators. Available online at http://www.gla.ac.uk/external/RSS/RSScomp/Stafford.pdf [accessed Jun-2015].
Tsiatis, A. (1975). A nonidentifiability aspect of the problem of competing risks. Proceedings of the National Academy of Sciences USA, 72, 20–22.
Turnbull, B.W. (1974). Nonparametric estimation of a survivorship function with doubly censored data. Journal of the American Statistical Association, 69, 169–173.
Turnbull, B.W. (1976). The empirical distribution function with arbitrarily grouped, censored and truncated data. Journal of the Royal Statistical Society, 38, 290–295.
Wang, J., Yu, Q. & Wong, G.Y.C. (2012). The random partition masking model for interval-censored and masked competing risks data. Journal of Statistical Computation and Simulation, 82, 981–1002.
Yu, Q. & Li, J. (2012). The NPMLE of the joint distribution function with right-censored and masked competing risks data. Journal of Nonparametric Statistics, 24, 753–764.
Yu, Q., Li, L. & Wong, G.Y.C. (2000). On consistency of the self-consistent estimator of survival functions with interval-censored data. Scandinavian Journal of Statistics, 27, 35–44.
Zhou, M. (2004). Nonparametric Bayes estimator of survival functions for doubly/interval censored data. Statistica Sinica, 14, 533–546.