Treating ordinal variables as interval or continuous variables might produce biased results (Olsson, 1979). The (family of) polychoric models (PM) handles ordinal data by modeling them as categorized latent multivariate normal variables. PM mainly appears in two situations. First, PM is used to study the association between ordinal variables, often leading to a polychoric correlation model (PCM). PCM assumes that the latent variables underlying two ordinal variables follow a bivariate normal distribution with zero mean and a correlation matrix known as the polychoric correlation (Olsson, 1979). In particular, if both variables are binary, a polychoric correlation reduces to a tetrachoric correlation (Bonett & Price, 2005; Pearson, 1901).
Second, PM subsumes several commonly used psychometric models. In particular, PM is a general model for structural equation modeling with ordinal data (ordinal SEM; Muthén, 1984). Ordinal SEM models PM’s mean vector and covariance matrix as functions of structural parameters. Several models are special cases of ordinal SEM, such as the graded response model (Samejima, 1968, 1997) and the family of item factor analysis models (Wirth & Edwards, 2007).
Although PM is the basis of several models, its identifiability has not been addressed in the literature; only the identification of PCM (which, as mentioned above, is a special case of PM) has been proved (Almeida & Mouchart, 2003a). The importance of establishing the identifiability of a statistical model cannot be overstated. If a model is not identifiable, two distinct sets of parameters may yield the same probability distribution, even when the sample size approaches infinity. In that case, no consistent estimator exists.
To complicate matters further, the normality assumption underlying PM has recently been challenged; researchers have suggested that the latent variables underlying PM could be generalized to the family of elliptical distributions, such as the multivariate logistic distribution and the multivariate t distribution (Jin & Yang-Wallentin, 2017; Kolbe et al., 2021; Roscino & Pollice, 2006). In light of this, in this article we explore these more general elliptical distributions. Two unsolved questions can be posited: (a) Are PM and/or PCM with latent elliptical distributions identifiable? (b) If either of them is not identifiable, can we find minimal identifiability constraints? Practically, even though PCM is identifiable, its constraints are impractical in some situations. For instance, when modeling the developmental changes of children, it is unreasonable to assume that all of the mean vectors are zero (McArdle et al., 2015; Muthén, 1984). Therefore, finding other reasonable identifiability constraints is a task with practical significance.
This article aims to answer the above two questions and is organized as follows. Section 1 gives formal definitions of PM and PCM with elliptical distributions. We address question (a) in Section 2. By generalizing Almeida and Mouchart’s (2003a) argument, we show that PCM with elliptical distributions is identifiable. In particular, we prove the identification of the polychoric t correlation model based on the copula representation. On the other hand, PM with elliptical distributions is not identifiable. We address question (b) in Section 3. We show that one can find the identifiability constraints of PM through the equivalence-classes approach of Tsai (2000, 2003). This approach can also help determine the measurement scales of the latent variables. The minimal identifiability constraints of PM on Likert scales and on comparative judgment are demonstrated. Moreover, we prove a theorem stating the necessary and sufficient conditions for the identifiability of ordinal SEM and item factor analysis. Section 4 is devoted to the discussion of possible implications and applications induced by these identifiability constraints.
1 Definitions of PM and PCM with elliptical distributions
Suppose there are K ordinal-scale item scores, ${W}_1,\dots,{W}_K,$ with supports $\left\{1,\dots,{r}_k\right\},$ respectively, and let $\mathbf{W}=\left({W}_1,\dots,{W}_K\right)$ be the response vector. The family of polychoric models (PM) assumes that for each ordinal variable ${W}_k,$ there is a corresponding vector of cut-offs (or thresholds) ${a}^{(k)}=\left({a}_j^{(k)}:-\infty={a}_0^{(k)}<{a}_1^{(k)}<\cdots<{a}_{r_k}^{(k)}=\infty\right)$ and a latent random variable ${X}_k^{\ast}$ such that
$${W}_k=j\quad\text{if and only if}\quad {a}_{j-1}^{(k)}<{X}_k^{\ast}\le {a}_j^{(k)},\qquad j=1,\dots,{r}_k.$$
That is,
$$P\left(\mathbf{W}=\mathbf{w}\right)=P\left({a}_{w_k-1}^{(k)}<{X}_k^{\ast}\le {a}_{w_k}^{(k)},\ k=1,\dots,K\right),\tag{1}$$
where ${\mathbf{X}}^{\ast}={\left({X}_1^{\ast},\dots,{X}_K^{\ast}\right)}^{\top}$ follows a distribution function $F.$ Thus, PM can be parametrized by
$${\theta}_{PM}=\left(F,\mathcal{A}\right)\in{\Theta}_{PM},$$
where $\mathcal{A}={\left\{{a}^{(k)}\right\}}_{k=1,\dots,K}$ and ${\Theta}_{PM}$ is the parameter space for this parametrization.
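To fix ideas, here is a minimal simulation sketch of PM. All numerical values (cut-offs, correlation) are hypothetical, and a bivariate normal $F$ is chosen purely for concreteness; latent draws of $\mathbf{X}^{\ast}$ are binned by the cut-offs, and the empirical cell proportions approximate the probabilities in Equation (1).

```python
# Minimal PM simulation sketch; cut-offs, correlation, and the normal F are
# illustrative assumptions, not values from the article.
import numpy as np

rng = np.random.default_rng(0)
P = np.array([[1.0, 0.5], [0.5, 1.0]])                # latent correlation matrix
cuts = [np.array([-0.5, 0.8]), np.array([0.0, 1.2])]  # interior cut-offs a_j^(k)

x = rng.multivariate_normal(np.zeros(2), P, size=100_000)  # draws of X*
# W_k = j iff a_{j-1}^(k) < X_k* <= a_j^(k); searchsorted performs this binning
w = np.column_stack([np.searchsorted(cuts[k], x[:, k]) + 1 for k in range(2)])

counts = np.zeros((3, 3))
for i, j in w:
    counts[i - 1, j - 1] += 1
print(counts / counts.sum())   # empirical estimates of P(W = w)
```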
In the following, we review the concept of spherical and elliptical distributions with some examples.
Definition 1.1. (Fang et al., 1990)

- (i) Let $\mathbf{X}$ be a p-dimensional random vector. $\mathbf{X}$ is said to have a spherical distribution if for any orthogonal transformation $\boldsymbol{\Gamma},$ $\boldsymbol{\Gamma}\mathbf{X}\overset{d}{=}\mathbf{X},$ where $\overset{d}{=}$ means “equal in distribution.”
- (ii) Let $\mathbf{Y}$ be an n-dimensional random vector. $\mathbf{Y}$ is said to have an elliptical distribution with parameters $\boldsymbol{\mu}\in{\mathbb{R}}^n$ and $\boldsymbol{\Sigma}\in{\mathbb{R}}^{n\times n}$ if
$$\mathbf{Y}\overset{d}{=}\boldsymbol{\mu}+{\mathbf{A}}^{\top}\mathbf{X},$$
where $\mathbf{X}$ is a p-dimensional spherically distributed random vector, $\mathbf{A}\in{\mathbb{R}}^{p\times n},$ ${\mathbf{A}}^{\top}\mathbf{A}=\boldsymbol{\Sigma},$ and $\operatorname{rank}\left(\boldsymbol{\Sigma}\right)=p.$
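A small numerical sketch may make Definition 1.1(ii) concrete. Here the spherical $\mathbf{X}$ is taken to be standard normal (one member of the spherical family), and all numbers are illustrative assumptions:

```python
# Sketch of Definition 1.1(ii): build an elliptical vector from a spherical one.
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6], [0.6, 1.0]])
A = np.linalg.cholesky(Sigma).T        # A^T A = Sigma, rank(Sigma) = p = 2

X = rng.standard_normal((100_000, 2))  # spherical X (standard normal case)
Y = mu + X @ A                         # Y = mu + A^T X, applied row-wise

print(np.cov(Y.T))                     # approximately Sigma (normal case)
```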
The following proposition characterizes the spherical and elliptical distributions.
Proposition 1.2. (Fang et al., 1990)

- (i) $\mathbf{X}$ is spherically distributed if and only if there is a scalar function $\phi\left(\cdot\right)$ such that the characteristic function of $\mathbf{X}$ satisfies ${\psi}_{\mathbf{X}}\left(\mathbf{t}\right)=\phi\left({\mathbf{t}}^{\top}\mathbf{t}\right).$ The function $\phi\left(\cdot\right)$ is called the characteristic generator of the spherical distribution.
- (ii) Under Definition 1.1(ii), the characteristic function of $\mathbf{Y}$ satisfies ${\psi}_{\mathbf{Y}}\left(\mathbf{t}\right)={e}^{i{\mathbf{t}}^{\top}\boldsymbol{\mu}}\phi\left({\mathbf{t}}^{\top}\boldsymbol{\Sigma}\mathbf{t}\right).$ Therefore, an elliptical distribution can be parametrized by $\boldsymbol{\mu},\boldsymbol{\Sigma},$ and $\phi,$ denoted ${EC}_n\left(\boldsymbol{\mu},\boldsymbol{\Sigma},\phi\right).$
The parameters of (a family of) elliptical distributions are named below.
Definition 1.3. Let $\mathbf{Y}\sim{EC}_n\left(\boldsymbol{\mu},\boldsymbol{\Sigma},\phi\right).$ Then $\boldsymbol{\mu}$ is the location vector and $\boldsymbol{\Sigma}$ is the dispersion matrix or scatter matrix. If $\boldsymbol{\Sigma}$ is positive definite, then $\mathbf{P}=\operatorname{diag}{\left(\boldsymbol{\Sigma}\right)}^{-\frac{1}{2}}\boldsymbol{\Sigma}\operatorname{diag}{\left(\boldsymbol{\Sigma}\right)}^{-\frac{1}{2}}$ is called the pseudo-correlation matrix. If the first and second moments of $\mathbf{Y}$ exist, then $\boldsymbol{\mu}$ is called the mean vector, $\boldsymbol{\Sigma}$ is called the covariance matrix, and $\mathbf{P}$ is called the correlation matrix.
Some commonly used elliptical distributions include the normal, logistic, uniform, and t distributions.
Example 1.4. Let $\mathbf{Y}\sim{EC}_n\left(\boldsymbol{\mu},\boldsymbol{\Sigma},\phi\right).$ Then $\mathbf{Y}$ follows

- (i) a multivariate normal distribution if $\phi(u)=\exp\left(-\frac{u}{2}\right);$ we denote $\mathbf{Y}\sim{\mathcal{N}}_n\left(\boldsymbol{\mu},\boldsymbol{\Sigma}\right).$
- (ii) a multivariate logistic distribution if $\phi(u)=4\exp\left(-u\right)/{\left(1+\exp\left(-u\right)\right)}^2.$
- (iii) a multivariate uniform distribution if $\phi(u)=2{I}_{\left\{u<1\right\}}.$
- (iv) a multivariate t distribution if ${\phi}_{\nu}(u)={\left(1+u/\nu\right)}^{-\left(\nu+2\right)/2};$ we denote $\mathbf{Y}\sim{t}_{n,\nu}\left(\boldsymbol{\mu},\boldsymbol{\Sigma}\right).$ The characteristic generator of a multivariate t distribution depends on the degrees-of-freedom parameter $\nu.$
Definition 1.5.

- (i) PM with a latent elliptical distribution characterized by $\phi$ is ${P}_{\theta_{PM}},$ where ${\theta}_{PM}=\left({EC}_K\left(\boldsymbol{\mu},\boldsymbol{\Sigma},\phi\right),\mathcal{A}\right).$
- (ii) PCM with a latent elliptical distribution characterized by $\phi$ is ${P}_{\theta_{PCM}},$ where ${\theta}_{PCM}=\left({EC}_K\left(\mathbf{0},\mathbf{P},\phi\right),\mathcal{A}\right).$
Unless specifically stated, in this article we study PCM and PM within the realm of elliptical distributions.
2 Identifiability of PCM with elliptical distributions
We first define some terms concerning model identifiability:
Definition 2.1. Let ${\mathcal{P}}_{\theta}=\left\{{P}_{\theta}:\theta\in\Theta\right\}$ be a parametric model and let ${\theta}^{\ast}=g\left(\theta\right)\in{\Theta}^{\ast}\subset\Theta.$ Then

- (i) For any ${\theta}_1,{\theta}_2\in\Theta,$ ${\theta}_1$ and ${\theta}_2$ are empirically indistinguishable if ${P}_{\theta_1}={P}_{\theta_2},$ i.e., for any response $\mathbf{w},$ ${P}_{\theta_1}(\mathbf{w})={P}_{\theta_2}(\mathbf{w}).$
- (ii) ${\mathcal{P}}_{\theta}$ is identifiable (or $\theta$ is identifiable) if ${P}_{\theta_1}={P}_{\theta_2}\to{\theta}_1={\theta}_2$ for all ${\theta}_1,{\theta}_2\in\Theta.$
- (iii) ${\mathcal{P}}_{\theta}$ is just-identified (or $\theta$ is just-identified) if for every $P\in{\mathcal{P}}_{\theta},$ there is a unique $\theta$ such that $P={P}_{\theta}.$
- (iv) ${\mathcal{P}}_{\theta}$ is partially identifiable over ${\theta}^{\ast}$ (or ${\theta}^{\ast}$ is partially identifiable) if ${P}_{\theta_1}={P}_{\theta_2}\to{\theta}_1^{\ast}={\theta}_2^{\ast}$ for all ${\theta}_1,{\theta}_2\in{\Theta}^{\ast}.$
- (v) For any ${\theta}_0\in\Theta,$ the identified set of ${P}_{\theta_0}$ is $\left\{\theta:{P}_{\theta}={P}_{\theta_0}\right\}.$
In ordinary language, two sets of parameters are empirically indistinguishable if they imply the same probabilities for all possible outcomes. A model is identifiable if distinct sets of parameters correspond to distinct distribution functions; thereby, constructing consistent estimators is possible. Similarly, a model is partially identifiable over ${\theta}^{\ast}$ if distinct values of ${\theta}^{\ast}$ correspond to distinct distribution functions.
The following proposition states that if the model is parametrized by $\theta,$ then the model is identifiable if and only if it is just-identified. Therefore, if the model is parametrized by $\theta,$ there is no need to distinguish between just-identified and over-identified. The distinction between just-identified and over-identified is meaningful only when considering a restricted model of a parametrized model (see Section 3.1).
Proposition 2.2. Let ${\mathcal{P}}_{\theta}=\left\{{P}_{\theta}:\theta\in\Theta\right\}.$ Then ${\mathcal{P}}_{\theta}$ is identifiable iff ${\mathcal{P}}_{\theta}$ is just-identified.
Proof. $\left(\Leftarrow\right)$ The uniqueness of $\theta$ implies the identification of ${\mathcal{P}}_{\theta}.$ $\left(\Rightarrow\right)$ ${P}_{\theta}$ is parametrized by $\theta,$ so for any $P\in{\mathcal{P}}_{\theta},$ there is a $\theta$ such that $P={P}_{\theta}.$ The uniqueness of $\theta$ is guaranteed by the identification of ${\mathcal{P}}_{\theta}.$
Note that parameter identification is a necessary condition for the existence of a consistent estimator. This proposition has been implicitly stated in some studies (such as Gu & Xu, 2019; Ouyang & Xu, 2022). San Martín and Quintana (2002) provided a formal formulation and a proof of the proposition (see below, where we also provide an alternative proof).
Proposition 2.3. (San Martín & Quintana, 2002) Let ${\mathcal{P}}_{\theta}=\left\{{P}_{\theta}:\theta\in\Theta\right\}$ be a statistical model and let ${X}_1,\dots,{X}_n$ be independently, identically distributed random variables from ${P}_{\theta}.$ Further, let g be an invertible function of $\theta.$ If ${\mathcal{P}}_{\theta}$ is not identifiable, then there is no consistent estimator of $g\left(\theta\right).$ That is, the identifiability of the parameter is a necessary condition for the existence of a consistent estimator.
Proof. If the estimator ${\delta}_n=\delta\left({X}_1,\dots,{X}_n\right)$ is a consistent estimator of $g\left(\theta\right),$ then ${\delta}_n\to g\left(\theta\right)$ under ${P}_{\theta}$ for any $\theta\in\Theta.$ Given ${P}_{\theta_1}={P}_{\theta_2}$ with ${\theta}_1\ne{\theta}_2,$ we have ${\delta}_n\to g\left({\theta}_1\right)$ under ${P}_{\theta_1}$ and ${\delta}_n\to g\left({\theta}_2\right)$ under ${P}_{\theta_2}.$ Because ${P}_{\theta_1}={P}_{\theta_2},$ ${\delta}_n$ has the same distribution under ${P}_{\theta_1}$ and ${P}_{\theta_2}$ based on the inversion formula. Since a convergent sequence has a unique limit, $g\left({\theta}_1\right)=g\left({\theta}_2\right).$ Because g is invertible, we have ${\theta}_1={\theta}_2,$ a contradiction.
Identifiability is a property of a statistical model, though it is sometimes defined as a property of parameters (Casella & Berger, 2002, p. 523). However, the statement “$\theta$ is identifiable” is ambiguous because it does not specify the statistical model in consideration. For example, Almeida and Mouchart (2003a) showed that, under the normality assumption, the parameters of PCM are identifiable in PCM but not in PM. Indeed, the latter fact follows from a proposition in Almeida and Mouchart (2003a).
Proposition 2.4. (Almeida and Mouchart, 2003a) For any monotonic increasing functions ${g}_1,\dots,{g}_K,$ define $g=\left({g}_1,\dots,{g}_K\right)$ as a component-wise transformation. Then ${\theta}_{PM}=\left(F,\mathcal{A}\right)$ is empirically indistinguishable from ${\theta}_{PM,g}=\left({F}_g,{\mathcal{A}}_g\right),$ where ${F}_g=F\circ{g}^{-1}$ and ${\mathcal{A}}_g={\left\{{g}_k\left({a}_j^{(k)}\right)\right\}}_{j=1,\dots,{r}_k-1;\ k=1,\dots,K}.$
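Proposition 2.4 is easy to verify numerically. In the hypothetical sketch below, a strictly increasing g (here $x\mapsto x^3$, an arbitrary choice) is applied to both the latent draws and the cut-offs, and the induced ordinal responses are unchanged:

```python
# Numerical illustration of Proposition 2.4 with illustrative values: a common
# monotone transformation of the latent scale and the cut-offs changes nothing
# observable.
import numpy as np

rng = np.random.default_rng(2)
a = np.array([-0.5, 0.8])                    # cut-offs of one item
g = lambda t: t ** 3                         # any strictly increasing g works
x = rng.standard_normal(200_000)             # X* ~ F (standard normal here)

w_original = np.searchsorted(a, x)           # categorize X* with cut-offs a
w_transformed = np.searchsorted(g(a), g(x))  # categorize g(X*) with cut-offs g(a)
print(np.array_equal(w_original, w_transformed))   # True
```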
2.1 Olsson’s (1979) argument of identification of PCM
Olsson (1979) approached the identification problem under PCM (with the normality assumption) by counting the difference between the number of independent proportions and the number of parameters. For convenience, we define the degrees of freedom, df, as
$$df=\left(\prod_{k=1}^{K}{r}_k-1\right)-\left(\sum_{k=1}^{K}\left({r}_k-1\right)+\frac{K\left(K-1\right)}{2}\right),$$
the number of independent response proportions minus the number of free parameters (cut-offs and polychoric correlations). This definition is consistent with the common usage of degrees of freedom (Rodgers, 2019). Olsson (1979) argued that PCM is identified because $df\ge 0.$ In particular, the model is just-identified if $df=0$ and over-identified if $df>0.$
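For reference, the counting rule (as reconstructed above) is a one-liner; the helper below and its values are an illustrative sketch, not code from the article:

```python
# Olsson-style degrees-of-freedom count for PCM; r holds the numbers of
# categories r_1, ..., r_K.
import math

def pcm_df(r):
    K = len(r)
    n_proportions = math.prod(r) - 1                           # independent cell proportions
    n_parameters = sum(rk - 1 for rk in r) + K * (K - 1) // 2  # cut-offs + correlations
    return n_proportions - n_parameters

print(pcm_df([2, 2]))  # 3 - 3 = 0: the tetrachoric case is just-identified by this count
print(pcm_df([5, 5]))  # 24 - 9 = 15 > 0
```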
However, Proposition 2.4 implies that Olsson’s (1979) argument of identifiability is flawed: even for a PCM under the normality assumption, ${\theta}_{PCM}=\left({\mathcal{N}}_K\left(\mathbf{0},\mathbf{P}\right),\mathcal{A}\right),$ satisfying $df\ge 0,$ there can be a model ${\theta}_{PCM,g}=\left({\mathcal{N}}_K{\left(\mathbf{0},\mathbf{P}\right)}_g,{\mathcal{A}}_g\right)$ empirically indistinguishable from it. Thus, merely observing $df\ge 0$ for a model does not guarantee that the model is identifiable. A similar statement (that $df\ge 0$ is a necessary but not sufficient condition for model identification) can also be found in the SEM literature (the t rule in Bollen, 1989).
In the literature, Almeida and Mouchart (2003a) provided the first rigorous proof of the identifiability of PCM under the normality assumption. In the following, we generalize their proof to the elliptical distribution case. Specifically, using Proposition 2.2, we show that PCM is always just-identified under its model assumptions (see Theorem 2.5).
2.2 Generalization of Almeida and Mouchart’s (2003a) proof
Motivated by Theorem 3.1 in Almeida and Mouchart (2003a), we now establish conditions for identification under PCM with any latent continuous elliptical distribution.
Theorem 2.5. Consider PCM with a latent elliptical distribution characterized by a known $\phi,$ that is, ${P}_{\theta_{PCM}}$ with ${\theta}_{PCM}=\left({EC}_K\left(\mathbf{0},\mathbf{P},\phi\right),\mathcal{A}\right).$ If

- (i) $\phi$ defines a continuous distribution with strictly increasing univariate and bivariate CDFs on its support, and
- (ii) the polychoric (pseudo-)correlation matrix $\mathbf{P}$ is positive definite,

then PCM (${P}_{\theta_{PCM}}$) is just-identified.
Proof. It suffices to prove that if two pairs of parameters, say $\left(\mathbf{P},\mathcal{A}\right)$ and $\left(\tilde{\mathbf{P}},\tilde{\mathcal{A}}\right),$ correspond to the same response probability vector, then $\left(\mathbf{P},\mathcal{A}\right)=\left(\tilde{\mathbf{P}},\tilde{\mathcal{A}}\right).$ Our proof is composed of two parts:

- (i) The cut-offs (or thresholds) are identifiable: Without loss of generality, consider the jth cut-off of item k, say ${a}_j^{(k)}$ and ${\tilde{a}}_j^{(k)}.$ Let the response vector be $\mathbf{w}=\left({r}_1,\dots,{r}_{k-1},j,{r}_{k+1},\dots,{r}_K\right).$ Then $P\left(\mathbf{W}\le\mathbf{w}\right)={F}_{EC_1\left(0,1,\phi\right)}\left({a}_j^{(k)}\right)={F}_{EC_1\left(0,1,\phi\right)}\left({\tilde{a}}_j^{(k)}\right).$ Since the CDF is injective, ${a}_j^{(k)}={\tilde{a}}_j^{(k)}.$ Because ${a}_j^{(k)}$ can be any cut-off in $\mathcal{A},$ we have $\mathcal{A}=\tilde{\mathcal{A}}.$
- (ii) The pseudo-correlation matrix is identifiable: Consider a response vector $\mathbf{w}=\left({r}_1,\dots,{r}_{l-1},i,{r}_{l+1},\dots,{r}_{k-1},j,{r}_{k+1},\dots,{r}_K\right),$ for which the lth response is i and the kth response is j. We define $g\left({\rho}_{lk}\right)=P\left(\mathbf{W}\le\mathbf{w}\right)=P\left({\mathbf{X}}^{\ast}\le{\left({a}_i^{(l)},{a}_j^{(k)}\right)}^{\top}\right)={F}_{EC_2\left(\mathbf{0},{\mathbf{P}}_{lk},\phi\right)}\left({a}_i^{(l)},{a}_j^{(k)}\right),$ where ${\mathbf{P}}_{lk}=\left(\begin{array}{cc}1&{\rho}_{lk}\\{\rho}_{lk}&1\end{array}\right).$ Because $\mathbf{P}$ is positive definite, ${\mathbf{P}}_{lk}$ is also positive definite, so the Cholesky decomposition is unique: ${\mathbf{P}}_{lk}=\boldsymbol{\Lambda}{\boldsymbol{\Lambda}}^{\top}=\left(\begin{array}{cc}1&0\\{\rho}_{lk}&\sqrt{1-{\rho}_{lk}^2}\end{array}\right)\left(\begin{array}{cc}1&{\rho}_{lk}\\0&\sqrt{1-{\rho}_{lk}^2}\end{array}\right),$ and $\boldsymbol{\Lambda}$ is invertible. We have $g\left({\rho}_{lk}\right)=P\left({\boldsymbol{\Lambda}}^{-1}{\mathbf{X}}^{\ast}\le{\boldsymbol{\Lambda}}^{-1}{\left({a}_i^{(l)},{a}_j^{(k)}\right)}^{\top}\right)={F}_{EC_2\left(\mathbf{0},{\mathbf{I}}_2,\phi\right)}\left({a}_i^{(l)},\frac{-{\rho}_{lk}}{\sqrt{1-{\rho}_{lk}^2}}{a}_i^{(l)}+\frac{1}{\sqrt{1-{\rho}_{lk}^2}}{a}_j^{(k)}\right).$ Since the bivariate CDF is monotone, $g\left({\rho}_{lk}\right)$ is strictly decreasing iff $h\left({\rho}_{lk}\right)=\frac{-{\rho}_{lk}}{\sqrt{1-{\rho}_{lk}^2}}{a}_i^{(l)}+\frac{1}{\sqrt{1-{\rho}_{lk}^2}}{a}_j^{(k)}$ is strictly decreasing, and elementary calculus shows that $h\left({\rho}_{lk}\right)$ is strictly decreasing. Thus $g\left({\rho}_{lk}\right)$ is strictly decreasing, and so ${\rho}_{lk}$ is identified. Because l and k can be any pair of items, we have $\mathbf{P}=\tilde{\mathbf{P}}.$
Based on Proposition 2.2, the model is just-identified because it is identifiable and also a parametrized model.
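In the normal case, the key step can be examined numerically. The sketch below (illustrative cut-offs) tabulates the rectangle probability as $\rho$ varies; its strict monotonicity, hence injectivity, in $\rho$ is what makes ${\rho}_{lk}$ recoverable from a single response probability:

```python
# Sketch: the map rho -> P(X1* <= a1, X2* <= a2) for fixed (hypothetical)
# cut-offs under a bivariate normal latent distribution.
import numpy as np
from scipy.stats import multivariate_normal

a = np.array([-0.3, 0.6])   # fixed cut-offs (illustrative)
for rho in (-0.8, -0.4, 0.0, 0.4, 0.8):
    P2 = np.array([[1.0, rho], [rho, 1.0]])
    p = multivariate_normal(mean=[0.0, 0.0], cov=P2).cdf(a)
    print(f"rho = {rho:+.1f}  P(X* <= a) = {p:.4f}")
```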
Corollary 2.6. The multivariate normal distribution, the multivariate t distribution with known degrees of freedom, the multivariate logistic distribution, and the multivariate uniform distribution all satisfy the conditions in Theorem 2.5, so their corresponding ${\theta}_{PCM}=\left({EC}_K\left(\mathbf{0},\mathbf{P},\phi\right),\mathcal{A}\right)$ is identifiable. In particular, the multivariate normal case corresponds to Theorem 3.1 in Almeida and Mouchart (2003a).
Remark 2.7. The multivariate t distribution (${t}_{\nu}\left(\boldsymbol{\mu},\boldsymbol{\Sigma}\right)$) does not in general satisfy the conditions in Theorem 2.5, because the function $\phi(u)={\left(1+u/\nu\right)}^{-\left(\nu+2\right)/2}$ depends on the degrees-of-freedom parameter $\nu,$ which could be unknown.
As mentioned in Remark 2.7, Theorem 2.5 cannot cover the case of PCM with a latent multivariate t distribution with unknown degrees of freedom (abbreviated as the polychoric t correlation model). In the following, we provide a proof for this case based on the copula approach proposed by Almeida and Mouchart (2003b). For brevity, in the following paragraphs we denote the CDF of a distribution simply by the symbol of that distribution. For example, we denote the CDF of the t distribution with $\nu$ degrees of freedom as ${t}_{\nu},$ and the CDF of a multivariate t distribution ${t}_{\nu}\left(\boldsymbol{\mu},\boldsymbol{\Sigma}\right)$ is abbreviated as ${t}_{\nu,\boldsymbol{\mu},\boldsymbol{\Sigma}}.$
2.3 The copula approach to the identifiability of PCM
First, we introduce the concept of a copula and define the copulas of elliptical distributions, of which the bivariate t copula is a special case.
Definition 2.8. (Hofert, 2018) A copula is a multivariate CDF with standard uniform univariate margins, that is, Unif(0,1) margins.
Definition 2.9. The elliptical copula characterized by $\phi$ with pseudo-correlation $\mathbf{P}=\operatorname{diag}{\left(\boldsymbol{\Sigma}\right)}^{-\frac{1}{2}}\boldsymbol{\Sigma}\operatorname{diag}{\left(\boldsymbol{\Sigma}\right)}^{-\frac{1}{2}}$ is defined as
$${C}_{\mathbf{P}}^{\phi}\left({u}_1,\dots,{u}_K\right)={F}_{EC_K\left(\mathbf{0},\mathbf{P},\phi\right)}\left({F}_{EC_1\left(0,1,\phi\right)}^{-1}\left({u}_1\right),\dots,{F}_{EC_1\left(0,1,\phi\right)}^{-1}\left({u}_K\right)\right).$$
Definition 2.10. Let $\mid\rho\mid\le 1$ and $\nu>0.$ Then ${C}_{\rho,\nu}^t$ is a bivariate t copula if
$${C}_{\rho,\nu}^t\left({u}_1,{u}_2\right)={t}_{\nu,\mathbf{0},\mathbf{P}}\left({t}_{\nu}^{-1}\left({u}_1\right),{t}_{\nu}^{-1}\left({u}_2\right)\right),$$
where $\mathbf{P}=\left(\begin{array}{cc}1&\rho\\\rho&1\end{array}\right)$ and ${u}_1,{u}_2$ are within $\left[0,1\right].$
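The bivariate t copula can be evaluated by Monte Carlo through the standard representation of the bivariate t distribution. The function below is a hypothetical sketch (all parameter values illustrative); it is reused later to illustrate Theorem 2.14:

```python
# Monte Carlo sketch of the bivariate t copula, using T = Z / sqrt(V / nu).
import numpy as np
from scipy.stats import t as t_dist

def t_copula_mc(u1, u2, rho, nu, n=500_000, seed=3):
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))
    Z = rng.standard_normal((n, 2)) @ L.T                 # correlated normals
    T = Z / np.sqrt(rng.chisquare(nu, size=(n, 1)) / nu)  # bivariate t draws
    q1, q2 = t_dist.ppf(u1, nu), t_dist.ppf(u2, nu)       # t_nu^{-1}(u)
    return np.mean((T[:, 0] <= q1) & (T[:, 1] <= q2))

print(t_copula_mc(0.3, 0.4, rho=0.5, nu=5))
```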
Remark 2.11. The elliptical copula does not depend on the location or scale of the elliptical distribution. That is,
$${F}_{EC_K\left(\boldsymbol{\mu},\boldsymbol{\Sigma},\phi\right)}\left({y}_1,\dots,{y}_K\right)={C}_{\mathbf{P}}^{\phi}\left({F}_{EC_1\left({\mu}_1,{\sigma}_1^2,\phi\right)}\left({y}_1\right),\dots,{F}_{EC_1\left({\mu}_K,{\sigma}_K^2,\phi\right)}\left({y}_K\right)\right),$$
where $\mathbf{P}=\operatorname{diag}{\left(\boldsymbol{\Sigma}\right)}^{-\frac{1}{2}}\boldsymbol{\Sigma}\operatorname{diag}{\left(\boldsymbol{\Sigma}\right)}^{-\frac{1}{2}},$ ${\left(\boldsymbol{\Sigma}\right)}_{kk}={\sigma}_k^2,$ and $\boldsymbol{\mu}={\left({\mu}_1,\dots,{\mu}_K\right)}^{\top}.$ Therefore, to avoid over-parameterization, the parameters of the elliptical copula do not contain ${\mu}_k,{\sigma}_k,k=1,\dots,K.$
Almeida and Mouchart (2003b) found that PCM can be reparametrized by copulas. Under this parametrization, the cut-offs are no longer the parameters to be considered. We formulate this property as a lemma.
Lemma 2.12. (Almeida & Mouchart, 2003b) Let ${\theta}_{PCM}=\left({EC}_K\left(\mathbf{0},\mathbf{P},\phi\right),\mathcal{A}\right),$ and suppose $\phi$ defines a continuous distribution with strictly increasing univariate and bivariate CDFs on its support, so that the kth marginal CDF is ${F}_{EC_1\left(0,1,\phi\right)}.$ Let ${u}_j^{(k)}={F}_{EC_1\left(0,1,\phi\right)}\left({a}_j^{(k)}\right),j=1,\dots,{r}_k,$ and ${u}^{(k)}=\left({u}_j^{(k)}:0={u}_0^{(k)}<{u}_1^{(k)}<\cdots<{u}_{r_k}^{(k)}=1\right).$ Then PCM can be reparametrized through a one-to-one correspondence $\psi:{\Theta}_{PCM}\to{\Theta}_{PCM}^{CO},$ defined by
$$\psi\left({\theta}_{PCM}\right)={\theta}_{PCM}^{CO}=\left({C}_{\mathbf{P}}^{\phi},U\right),$$
where $U=\left\{{u}^{(k)}:k=1,\dots,K\right\}$ and ${\Theta}_{PCM}^{CO}$ is the parameter space for this parametrization.
Proof. The proof can be derived from Equation (1):
$$P\left(\mathbf{W}\le\mathbf{w}\right)={F}_{EC_K\left(\mathbf{0},\mathbf{P},\phi\right)}\left({a}_{w_1}^{(1)},\dots,{a}_{w_K}^{(K)}\right)={C}_{\mathbf{P}}^{\phi}\left({u}_{w_1}^{(1)},\dots,{u}_{w_K}^{(K)}\right).$$
That is, the two sets of parameters ${\theta}_{PCM}$ and $\psi\left({\theta}_{PCM}\right)$ define the same CDF. Moreover, the assumption on $\phi$ guarantees that ${F}_{EC_1\left(0,1,\phi\right)}$ is invertible, which implies that the cut-offs can be uniquely recovered from the ${u}_j^{(k)}$ by ${F}_{EC_1\left(0,1,\phi\right)}^{-1}\left({u}_j^{(k)}\right)={a}_j^{(k)},j=1,\dots,{r}_k.$ So $\psi$ defines a one-to-one correspondence between the two parameter spaces.
Using Lemma 2.12, in the following, we construct another proof of Theorem 2.5 based on the parametrization of copulas.
Proof of Theorem 2.5 based on copula parametrization. Based on condition (i) and Lemma 2.12, the identifiability of PCM under the copula parametrization (${\theta}_{PCM}^{CO}$) implies the identifiability of PCM (${\theta}_{PCM}$). We claim that if two pairs of copula-parametrized parameters, $\left({C}_{\mathbf{P}}^{\phi},U\right)$ and $\left({C}_{\tilde{\mathbf{P}}}^{\phi},\tilde{U}\right),$ correspond to the same response probability vector, then $\left({C}_{\mathbf{P}}^{\phi},U\right)=\left({C}_{\tilde{\mathbf{P}}}^{\phi},\tilde{U}\right).$ This can be done as follows:
- (i) The marginals are identifiable: For any k and j, by definition we have ${u}_j^{(k)}=P\left({W}_k\le j\right).$ Based on the law of large numbers, the sample proportion $\left|\left\{{W}_k\le j\right\}\right|/n$ is a consistent estimator of ${u}_j^{(k)}.$ So, based on Proposition 2.3, ${u}_j^{(k)}$ is identifiable.
- (ii) The pseudo-correlation matrix is identifiable: Let ${\mathbf{P}}_{lk}=\left(\begin{array}{cc}1&{\rho}_{lk}\\{\rho}_{lk}&1\end{array}\right)$ and define $g\left({\rho}_{lk}\right)={C}_{{\mathbf{P}}_{lk}}^{\phi}\left({u}_l,{u}_k\right).$ Similar to the proof of Theorem 2.5, we can show that $g\left({\rho}_{lk}\right)$ is strictly decreasing.
We note in passing that the identifiability of the pseudo-correlation matrix also yields a property of elliptical copulas:
Corollary 2.13. ${\partial}_{\rho_{lk}}{C}_{{\mathbf{P}}_{lk}}^{\phi}\left({u}_l,{u}_k\right)<0$ for all $l=1,\dots,K;k=1,\dots,K;l\ne k.$
2.4 Identification of PCM with a latent multivariate t distribution
The following theorem establishes the identification of the polychoric t correlation model, which Theorem 2.5 cannot cover. As mentioned above, the proof is based on the reparametrization of the model by the copula.
Theorem 2.14. Let ${\theta}_{PCM,t}=\left({t}_{\nu}\left(\mathbf{0},\mathbf{P}\right),\mathcal{A}\right).$ If the polychoric (pseudo-)correlation matrix $\mathbf{P}$ is positive definite, then the polychoric t correlation model (${P}_{\theta_{PCM,t}}$) is just-identified.
Proof. If two pairs of parameters, say $\left({C}_{\mathbf{P}}^{\phi\left(\nu\right)},U\right)$ and $\left({C}_{\tilde{\mathbf{P}}}^{\phi\left(\tilde{\nu}\right)},\tilde{U}\right),$ correspond to the same response probability vector, we claim that $\left({C}_{\mathbf{P}}^{\phi\left(\nu\right)},U\right)=\left({C}_{\tilde{\mathbf{P}}}^{\phi\left(\tilde{\nu}\right)},\tilde{U}\right).$ Based on Lemma 2.12 and Corollary A.3, establishing this equality suffices to complete the proof. Also, ${C}_{\mathbf{P}}^{\phi}$ can be uniquely determined by all of the bivariate copulas it implies, i.e., ${\left\{{C}_{\rho_{lk},\nu}^t\left({u}_l,{u}_k\right)\right\}}_{l=1,\dots,K;k=1,\dots,K;l\ne k},$ so it suffices to establish $\left({C}_{\rho_{12},\nu}^t,\dots,{C}_{\rho_{\left(K-1\right)K},\nu}^t,U\right)=\left({C}_{{\tilde{\rho}}_{12},\tilde{\nu}}^t,\dots,{C}_{{\tilde{\rho}}_{\left(K-1\right)K},\tilde{\nu}}^t,\tilde{U}\right).$
Our proof is composed of two parts:
- (i) The marginals are identifiable: The proof is the same as the corresponding part of the proof of Theorem 2.5 based on the copula parametrization in the previous section.
- (ii) The bivariate copulas are identifiable: Consider a response vector $\mathbf{w}=\left({r}_1,\dots,{r}_{l-1},i,{r}_{l+1},\dots,{r}_{k-1},j,{r}_{k+1},\dots,{r}_K\right),$ for which the lth response is i and the kth response is j. We claim that both ${\partial}_{\rho_{lk}}{C}_{\rho_{lk},\nu}^t\left({u}_l,{u}_k\right)<0$ and ${\partial}_{\nu}{C}_{\rho_{lk},\nu}^t\left({u}_l,{u}_k\right)<0.$ Then, for $P\left(\mathbf{W}\le\mathbf{w}\right)={C}_{\rho_{lk},\nu}^t\left({u}_i^{(l)},{u}_j^{(k)}\right),$ there is a unique copula ${C}_{\rho_{lk},\nu}^t$ satisfying the equation; thus ${C}_{\rho_{lk},\nu}^t$ is identified, which completes this part of the proof. The fact that ${\partial}_{\rho_{lk}}{C}_{\rho_{lk},\nu}^t\left({u}_l,{u}_k\right)<0$ has been established by Corollary 2.13. It remains to show that ${\partial}_{\nu}{C}_{\rho_{lk},\nu}^t\left({u}_l,{u}_k\right)<0.$
Let $\tilde{\nu}>\nu.$ Without loss of generality, we may assume that both ${u}_l=P\left({W}_l\le{w}_l\right)\le 0.5$ and ${u}_k=P\left({W}_k\le{w}_k\right)\le 0.5;$ if ${u}_l$ or ${u}_k>0.5,$ the responses can be reverse-coded so that both ${u}_l$ and ${u}_k\le 0.5.$ Because $\mathbf{P}$ is positive definite, ${\mathbf{P}}_{lk}$ is also positive definite. Thus, the Cholesky decomposition ${\mathbf{P}}_{lk}=\boldsymbol{\Lambda}{\boldsymbol{\Lambda}}^{\top}$ is unique and $\boldsymbol{\Lambda}$ is invertible.
is invertible. Based on Lemma A.4 and the property of the multivariate t distribution, we have
where
$\mathbf{Z}\sim \mathcal{N}_2\left(\mathbf{0},{\mathbf{I}}_2\right)$
and
$V\sim {\chi_{\nu}}^2.$
Therefore,
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20250211121701512-0126:S0033312324000255:S0033312324000255_eqnu10.png?pub-status=live)
That is, ${C}_{\rho_{lk},\nu}^t\left({u}_1,{u}_2\right)>{C}_{\rho_{lk},\tilde{\nu}}^t\left({u}_1,{u}_2\right)$ whenever $\tilde{\nu}>\nu,$ and so ${\partial}_{\nu}{C}_{\rho_{lk},\nu}^t\left({u}_1,{u}_2\right)<0.$ Because l and k are not specific, we have ${\partial}_{\nu}{C}_{\rho_{lk},\nu}^t\left({u}_l,{u}_k\right)<0$ for any l and k.
We have thus proved that $\left({C}_{\rho_{12},\nu}^t,\dots,{C}_{\rho_{\left(K-1\right)K},\nu}^t,U\right)=\left({C}_{{\tilde{\rho}}_{12},\tilde{\nu}}^t,\dots,{C}_{{\tilde{\rho}}_{\left(K-1\right)K},\tilde{\nu}}^t,\tilde{U}\right)$ whenever the two sets of parameters correspond to the same response probability vector, and this establishes the identification of the polychoric t correlation model.
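The $\nu$-monotonicity driving the proof can be illustrated numerically with the hypothetical t_copula_mc sketch from Section 2.3, holding ${u}_1={u}_2=0.3\le 0.5$ and $\rho=0.5$ fixed (illustrative values):

```python
# With u1, u2 <= 0.5 and rho fixed, the copula value should move strictly
# (decreasing, per the theorem) as nu grows; t_copula_mc is the sketch above.
for nu in (2, 5, 10, 50):
    print(nu, t_copula_mc(0.3, 0.3, rho=0.5, nu=nu, n=1_000_000))
```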
3 Identifiability and identifiability constraints of PM
In Section 2, we proved the identification of PCM. In contrast, the more general model PM is not identifiable, as we now show. We first present a proposition.
Proposition 3.1. For a polychoric model ${\theta}_{PM}=\left({EC}_K\left(\boldsymbol{\mu},\boldsymbol{\Sigma},\phi\right),\mathcal{A}\right),$ consider the transformation $g\left(\mathbf{x}\right)=\mathbf{Bx}+\mathbf{c}$ with $\mathbf{B}=\operatorname{diag}\left({b}_1,\dots,{b}_K\right),{b}_k>0,$ and $\mathbf{c}\in{\mathbb{R}}^K.$ Then ${\theta}_{PM}$ is empirically indistinguishable from ${\theta}_{PM,g}=\left({EC}_K\left(\tilde{\boldsymbol{\mu}},\tilde{\boldsymbol{\Sigma}},\phi\right),\tilde{\mathcal{A}}\right),$ where
$$\tilde{\boldsymbol{\mu}}=\mathbf{B}\boldsymbol{\mu}+\mathbf{c},\qquad\tilde{\boldsymbol{\Sigma}}=\mathbf{B}\boldsymbol{\Sigma}{\mathbf{B}}^{\top},\qquad\tilde{\mathcal{A}}={\left\{g\left({a}^{(k)}\right)\right\}}_{k=1,\dots,K}={\left\{{b}_k{a}_j^{(k)}+{c}_k\right\}}_{j=1,\dots,{r}_k;\ k=1,\dots,K}.$$
Proof. Let ${\tilde{\mathbf{X}}}^{\ast}=\mathbf{B}{\mathbf{X}}^{\ast}+\mathbf{c}.$ Then ${\tilde{\mathbf{X}}}^{\ast}\sim{EC}_K\left(\tilde{\boldsymbol{\mu}},\tilde{\boldsymbol{\Sigma}},\phi\right)$ and
$$P\left({a}_{j-1}^{(k)}<{X}_k^{\ast}\le{a}_j^{(k)}\right)=P\left({b}_k{a}_{j-1}^{(k)}+{c}_k<{\tilde{X}}_k^{\ast}\le{b}_k{a}_j^{(k)}+{c}_k\right),\qquad j=1,\dots,{r}_k;\ k=1,\dots,K,$$
so ${\theta}_{PM}$ and ${\theta}_{PM,g}$ assign the same probability to every response pattern.
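Proposition 3.1 can also be checked by simulation. In the hypothetical sketch below, transforming the latent scale and the cut-offs by the same positive-diagonal affine map leaves every simulated response unchanged:

```python
# Sketch of Proposition 3.1 (all values illustrative; normal latent variables).
import numpy as np

rng = np.random.default_rng(4)
mu, Sigma = np.array([0.5, -0.2]), np.array([[1.5, 0.3], [0.3, 0.8]])
cuts = [np.array([-0.5, 0.8]), np.array([0.0, 1.2])]
b, c = np.array([2.0, 0.5]), np.array([-1.0, 3.0])    # B = diag(b), c

x = rng.multivariate_normal(mu, Sigma, size=200_000)
w1 = np.column_stack([np.searchsorted(cuts[k], x[:, k]) for k in range(2)])
x_t = x * b + c                                       # transformed latent: Bx + c
w2 = np.column_stack([np.searchsorted(cuts[k] * b[k] + c[k], x_t[:, k])
                      for k in range(2)])
print(np.array_equal(w1, w2))                         # True
```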
Corollary 3.2. PM is not identifiable, because ${\theta}_{PM}$ and ${\theta}_{PM,g}$ are empirically indistinguishable and both belong to ${\Theta}_{PM}.$
Thus, some workable identifiability constraints are needed to make PM identifiable. Indeed, the polychoric correlation model is constructed by putting constraints on the location vector μ and the dispersion matrix Σ of PM. However, these identifiability constraints are not practical in some situations. For instance, when modeling the developmental changes of children, it is not well grounded to assume that the location vector μ is zero (McArdle et al., 2015; Muthén, 1984). Therefore, finding other reasonable identifiability constraints is a task with practical significance.
3.1 The equivalence-classes approach of identifiability (ECAI)
An effective way to find identifiability constraints was proposed by Tsai (2000, 2003) in the context of Thurstonian modeling of comparative judgment. Because this approach depends on the mathematical concept of equivalence classes (Dummit & Foote, 2004), we call it the Equivalence-Classes Approach of Identifiability (ECAI).
Traditionally, the identifiability of models needs to be proved case by case. ECAI provides a different route for derivation. Importantly, Tsai (2003) applied ECAI and proved that Case III and Case V of the Thurstonian models of comparative judgment are identifiable.
ECAI can help researchers find a model’s identifiability constraints. Furthermore, it can be utilized to determine whether a (restricted) model is just-identified or over-identified. It uses the following two principles to establish the identifiability of a restricted model:

- (i) For a non-identified general model, find the set of all parameters empirically indistinguishable from the true parameter, i.e., the identified set.
- (ii) After adding some constraints to the model, if there is at most one element in the identified set, then the restricted model is identified, and the constraints are identifiability constraints. Moreover, if there is one and only one element in the identified set, then the restricted model is just-identified; otherwise, the model is over-identified.
We can formulate ECAI as follows.
Proposition 3.3. Under Definition 2.1,

- (i) ${P}_{\theta_1}={P}_{\theta_2}$ defines an equivalence relation on $\Theta,$ denoted by ${\theta}_1\sim{\theta}_2.$
- (ii) The identified sets are the equivalence classes defined by this relation. Therefore, the identified set of ${P}_{\theta_0}$ can be denoted as $\left[\theta\right]$ for any $\theta\in\Theta$ satisfying ${P}_{\theta}={P}_{\theta_0};$ such a $\theta$ is called a representative of the equivalence class. That is, $\left\{\theta\mid{P}_{\theta}={P}_{\theta_0}\right\}=\left[{\theta}_0\right]=\left[\theta\right]\;for\;\theta\sim{\theta}_0.$
- (iii) The set of equivalence classes forms a partition of $\Theta;$ that is, any ${\theta}_0\in\Theta$ belongs to one and only one of the equivalence classes.
- (iv) $\mathcal{P}$ is (just-)identifiable if and only if each identified set $\left[{\theta}_0\right]$ contains a unique element, ${\theta}_0,$ i.e., $\left[{\theta}_0\right]=\left\{{\theta}_0\right\}.$ Namely, $\left[{\theta}_0\right]$ is a singleton.
- (v) $\mathcal{P}$ is partially identifiable over ${\theta}^{\ast}$ if and only if for each identified set $\left[{\theta}_0\right],$ there is a unique element in $g\left[{\theta}_0\right]=\left\{{\theta}^{\ast}\mid{\theta}^{\ast}=g\left(\theta\right),\theta\sim{\theta}_0\right\}.$
Proof. (i) is straightforward from the definition of equality of functions. (iii) follows from Proposition 2 in Section 0.1 of Dummit and Foote (2004). (iv) and (v) can be proved by (iii), Proposition 2.2, and the definition of identifiability.
Proposition 3.3(iv) gives the sufficient and necessary condition for model identification. A statistical model might not be identifiable unless one adds identifiability constraints to it. As mentioned above, Thurstonian models are identifiable under the Case III or Case V assumptions (Thurstone, 1927; Tsai, 2000). Here is another example: the confirmatory factor analysis model requires that either the variance of the latent factor be fixed to a positive constant or the loading of one indicator be fixed to a nonzero constant; otherwise, the model is not identifiable (Kline, 2023).
Definition 3.4. (Identifiability constraints) Under Definition 2.1, let ${\psi}_1,\dots,{\psi}_J$ be functions of $\theta.$ Then $\mathcal{P}$ is identifiable with constraints ${\psi}_1,\dots,{\psi}_J$ if the model with the constrained parameter space ${\Theta}_c=\left\{\theta\in\Theta\mid{\psi}_1\left(\theta\right)=0,\dots,{\psi}_J\left(\theta\right)=0\right\}$ is identifiable. We call ${\psi}_1,\dots,{\psi}_J$ identifiability constraints if $\mathcal{P}$ is identifiable with these constraints but not identifiable without them.
Proposition 3.5. Let ${\left[{\theta}_0\right]}_{\psi_1,\dots,\psi_J}=\left\{\theta\in\Theta\mid{P}_{\theta}={P}_{\theta_0},{\psi}_1\left(\theta\right)=0,\dots,{\psi}_J\left(\theta\right)=0\right\}$ and suppose ${\Theta}_c$ is not empty. The following three statements are equivalent:

- (i) $\mathcal{P}$ is identifiable with constraints ${\psi}_1,\dots,{\psi}_J.$
- (ii) For any ${\theta}_0^{\ast}\in{\Theta}_c,$ ${\left[{\theta}_0^{\ast}\right]}_{\psi_1,\dots,\psi_J}=\left\{{\theta}_0^{\ast}\right\}.$
- (iii) For any ${\theta}_0\in\Theta,$ ${\left[{\theta}_0\right]}_{\psi_1,\dots,\psi_J}$ has at most one element.
Proof. (i) $\iff$ (ii) can be easily derived from Proposition 3.3(iv). (iii) implies (ii) naturally. (ii) $\Rightarrow$ (iii) can be proved by contraposition: if there is a ${\theta}_0\in\Theta$ such that ${\left[{\theta}_0\right]}_{\psi_1,\dots,\psi_J}$ has more than one element, then, because ${\left[{\theta}_0\right]}_{\psi_1,\dots,\psi_J}\subseteq{\Theta}_c,$ (ii) is violated. Thus (ii) $\Rightarrow$ (iii).
The distinction between just-identified and over-identified (restricted) models can be recharacterized below using the concept of identifiability constraints.
Definition 3.6. Under Definitions 2.1 and 3.4,

- (i) $\mathcal{P}$ is just-identified with constraints ${\psi}_1,\dots,{\psi}_J$ if for any ${\theta}_0\in\Theta,$ ${\left[{\theta}_0\right]}_{\psi_1,\dots,\psi_J}$ has a unique element. Moreover, ${\psi}_1,\dots,{\psi}_J$ are called minimal constraints if removing any of them leads to non-identification.
- (ii) $\mathcal{P}$ is over-identified with constraints ${\psi}_1,\dots,{\psi}_J$ if there exists ${\theta}_0\in\Theta$ such that ${\left[{\theta}_0\right]}_{\psi_1,\dots,\psi_J}$ is empty.
We take the one-way ANOVA model (Rice, 2007) as an example to demonstrate the relations among non-identified, just-identified, and over-identified models. Let K be the number of groups. In this model, the parameters ${\left(\mu,{\alpha}_1,\dots,{\alpha}_K\right)}^{\top}\in{\mathbb{R}}^{K+1}$ are not identifiable because both ${\left(\mu,{\alpha}_1,\dots,{\alpha}_K\right)}^{\top}$ and ${\left(\mu+1,{\alpha}_1-1,\dots,{\alpha}_K-1\right)}^{\top}$ belong to $\left[{\left(\mu,{\alpha}_1,\dots,{\alpha}_K\right)}^{\top}\right].$ The constraint ${\sum}_k{\alpha}_k=0$ is an identifiability constraint because ${\left[{\left(\mu,{\alpha}_1,\dots,{\alpha}_K\right)}^{\top}\right]}_{{\sum}_k{\alpha}_k=0}=\left\{{\left(\mu,{\alpha}_1,\dots,{\alpha}_K\right)}^{\top}\right\}.$ Also, “${\sum}_k{\alpha}_k=0$ and ${\alpha}_1=0$” are identifiability constraints because a subset of them (${\sum}_k{\alpha}_k=0$) already identifies the model. Moreover, ${\sum}_k{\alpha}_k=0$ is a minimal constraint: for any ${\left(\mu,{\alpha}_1,\dots,{\alpha}_K\right)}^{\top}\in{\mathbb{R}}^{K+1},$ the unique element of $\left[{\left(\mu,{\alpha}_1,\dots,{\alpha}_K\right)}^{\top}\right]$ satisfying the constraint is ${\left(\mu+\bar{\alpha},{\alpha}_1-\bar{\alpha},\dots,{\alpha}_K-\bar{\alpha}\right)}^{\top}$ with $\bar{\alpha}={\sum}_k{\alpha}_k/K,$ and removing ${\sum}_k{\alpha}_k=0$ leads to non-identification. However, “${\sum}_k{\alpha}_k=0$ and ${\alpha}_1=0$” produce an over-identified model because there is a vector ${\left(\mu,0,1,\dots,1\right)}^{\top}$ such that ${\left[{\left(\mu,0,1,\dots,1\right)}^{\top}\right]}_{{\sum}_k{\alpha}_k=0,{\alpha}_1=0}=\emptyset.$ That is, no parameter is empirically indistinguishable from ${\left(\mu,0,1,\dots,1\right)}^{\top}$ and simultaneously satisfies both ${\sum}_k{\alpha}_k=0$ and ${\alpha}_1=0.$
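In code, the minimal constraint simply selects the centered representative of each equivalence class (effect values hypothetical):

```python
# One-way ANOVA sketch: centering picks the unique member of the equivalence
# class satisfying sum_k alpha_k = 0 while leaving the cell means untouched.
import numpy as np

mu, alpha = 1.0, np.array([2.0, -1.0, 0.5])
alpha_bar = alpha.mean()
mu_c, alpha_c = mu + alpha_bar, alpha - alpha_bar  # constrained representative
print(alpha_c.sum())                               # 0 (up to rounding)
print(mu + alpha, mu_c + alpha_c)                  # identical cell means
```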
Another characterization of just-identification is the existence of a one-to-one correspondence between the identified parameters and the parameters under identifiability constraints (see Chang et al., 2017). It turns out that this characterization is equivalent to the definition of just-identification:

Proposition 3.7. Under Definitions 2.1, 3.4, and 3.6, let the model with constraints ${\psi}_1,\dots,{\psi}_J$ be just-identified with constrained parameter space ${\Theta}_c.$ Then the model with constraints ${\phi}_1,\dots,{\phi}_L$ is just-identified with constrained parameter space ${\Theta}_{\tilde{c}}$ if and only if there is a one-to-one correspondence (of empirically indistinguishable models) between ${\Theta}_c$ and ${\Theta}_{\tilde{c}}.$
Proof. $\left(\Leftarrow\right)$ Because the model with constraints ${\psi}_1,\dots,{\psi}_J$ is just-identified, for any ${\theta}_0\in\Theta,$ ${\left[{\theta}_0\right]}_{\psi_1,\dots,\psi_J}$ has a unique element. Then ${\left[{\theta}_0\right]}_{\phi_1,\dots,\phi_L}$ is also a singleton by the one-to-one correspondence, which indicates that the model with constraints ${\phi}_1,\dots,{\phi}_L$ is just-identified. $\left(\Rightarrow\right)$ By the just-identification of the two models, both ${\left[{\theta}_0\right]}_{\psi_1,\dots,\psi_J}$ and ${\left[{\theta}_0\right]}_{\phi_1,\dots,\phi_L}$ are singletons determined by ${\theta}_0.$ Thus, we can construct the correspondence between ${\Theta}_c$ and ${\Theta}_{\tilde{c}}$ from these mappings. This completes the proof.
The following theorem links ECAI to identifiability constraints.
Definition 3.8. ${\mathcal{P}}_{A,\theta}=\left\{{P}_{\theta}:\theta\in{\Theta}_A\right\}$ and ${\mathcal{P}}_{B,\theta}=\left\{{P}_{\theta}:\theta\in{\Theta}_B\right\}$ are two models, where ${\Theta}_B\subseteq{\Theta}_A.$ Then ${\mathcal{P}}_{A,\theta}$ is a general model of ${\mathcal{P}}_{B,\theta}$ and ${\mathcal{P}}_{B,\theta}$ is a restricted model of ${\mathcal{P}}_{A,\theta},$ denoted by ${\mathcal{P}}_{B,\theta}\subseteq{\mathcal{P}}_{A,\theta}.$
Theorem 3.9. Let ${\psi}_1,\dots,{\psi}_J$ be functions of $\theta$ and ${\Theta}_B=\left\{\theta\in{\Theta}_A\mid{\psi}_1\left(\theta\right)=0,\dots,{\psi}_J\left(\theta\right)=0\right\},$ so that ${\mathcal{P}}_{B,\theta}\subseteq{\mathcal{P}}_{A,\theta}.$ Then

- (i) If ${\mathcal{P}}_{A,\theta}$ is identifiable, then ${\mathcal{P}}_{B,\theta}$ is identifiable.
- (ii) If ${\mathcal{P}}_{B,\theta}$ is identifiable, then ${\psi}_1,\dots,{\psi}_J$ are identifiability constraints of ${\mathcal{P}}_{A,\theta}.$ In particular, if ${\mathcal{P}}_{B,\theta}$ is just-identified and removing any of ${\psi}_1,\dots,{\psi}_J$ leads to non-identification, then ${\psi}_1,\dots,{\psi}_J$ are minimal constraints.
- (iii) (ECAI) Suppose ${\mathcal{P}}_{A,\theta}$ is not identifiable and ${\theta}_0^{\ast}\in{\Theta}_B.$ Under ${\mathcal{P}}_{A,\theta},$ if ${\left[{\theta}_0^{\ast}\right]}_{\psi_1,\dots,\psi_J}$ is a singleton or empty, then ${\mathcal{P}}_{B,\theta}$ is identifiable; moreover, ${\psi}_1,\dots,{\psi}_J$ are identifiability constraints of ${\mathcal{P}}_{A,\theta}.$ Further, if ${\left[{\theta}_0\right]}_{\psi_1,\dots,\psi_J}$ is a singleton for any ${\theta}_0\in{\Theta}_A,$ then ${\mathcal{P}}_{B,\theta}$ is just-identified; otherwise, ${\mathcal{P}}_{B,\theta}$ is over-identified.
Another related issue is that the equivalence class of a non-identifiable model consists of all permissible transformations of the model, which can determine the scale of measurement (Luce et al., 1990, Chapter 20; Stevens, 1946). The scale of a measurement can be identified as follows.
Definition 3.10. Let ${\mathcal{P}}_{\theta}=\left\{{P}_{\theta}:\theta\in\Theta\right\}$ be a model and let X be a random variable with $X\sim{P}_{\theta}.$ Define $G=\left\{g:g(X)\sim{P}_{\theta^{\ast}},\exists{\theta}^{\ast}\in\Theta\right\}.$

- (i) X is a nominal scale if $G$ consists of all injective functions.
- (ii) X is an ordinal scale if $G$ consists of all monotone increasing functions.
- (iii) X is an interval scale if $G=\left\{g(x):g(x)=bx+c,b>0\right\}.$
- (iv) X is a ratio scale if $G=\left\{g(x):g(x)=bx,b>0\right\}.$
- (v) X is an absolute scale if $G=\left\{\mathrm{id}\right\}.$
3.2 Identified sets of PM
Theorem 3.11 finds the equivalence class of PM. The theorem states that all empirically equivalent PMs can be linearly transformed into each other. So the equivalence class of a PM is the set of all of its linearly transformed versions, and the latent vectors corresponding to PM are interval scales.
Theorem 3.11. Given the conditions in Theorem 2.5, consider PM ${\theta}_{PM}=\left({EC}_K\left(\boldsymbol{\mu},\boldsymbol{\Sigma},\phi\right),\mathcal{A}\right)$ with
$${\mathbf{X}}^{\ast}\sim{EC}_K\left(\boldsymbol{\mu},\boldsymbol{\Sigma},\phi\right).$$
Then all entries of ${\mathbf{X}}^{\ast}={\left({X}_1^{\ast},\dots,{X}_K^{\ast}\right)}^{\top}$ are interval scales, and the equivalence class of PM consists of the models ${\theta}_{PM,EC,g}=\left({EC}_K\left(\tilde{\boldsymbol{\mu}},\tilde{\boldsymbol{\Sigma}},\phi\right),\tilde{\mathcal{A}}\right)$ satisfying
$$\tilde{\boldsymbol{\mu}}=\mathbf{B}\boldsymbol{\mu}+\mathbf{c},\quad\tilde{\boldsymbol{\Sigma}}=\mathbf{B}\boldsymbol{\Sigma}{\mathbf{B}}^{\top},\quad\tilde{\mathcal{A}}=g\left(\mathcal{A}\right),\quad g\left(\mathbf{x}\right)=\mathbf{Bx}+\mathbf{c},\tag{2}$$
where $\mathbf{B}=\operatorname{diag}\left({b}_1,\dots,{b}_K\right),{b}_k>0,$ and $\mathbf{c}\in{\mathbb{R}}^K.$
Proof. $\left(\Leftarrow\right)$ is straightforward by Proposition 3.1. $\left(\Rightarrow\right)$ Consider two PMs, ${\theta}_{PM}=\left({EC}_K\left(\boldsymbol{\mu},\boldsymbol{\Sigma},\phi\right),\mathcal{A}\right)$ and ${\theta}_{PM}^{\ast}=\left({EC}_K\left({\boldsymbol{\mu}}^{\ast},{\boldsymbol{\Sigma}}^{\ast},\phi\right),{\mathcal{A}}^{\ast}\right),$ where ${\theta}_{PM}\sim{\theta}_{PM}^{\ast}.$ We claim that
$${\boldsymbol{\mu}}^{\ast}=\overline{\mathbf{B}}\boldsymbol{\mu}+\overline{\mathbf{c}},\qquad{\boldsymbol{\Sigma}}^{\ast}=\overline{\mathbf{B}}\boldsymbol{\Sigma}{\overline{\mathbf{B}}}^{\top},\qquad{\mathcal{A}}^{\ast}=\overline{g}\left(\mathcal{A}\right),$$
that is, there is a function $\overline{g}\left(\mathbf{x}\right)=\overline{\mathbf{B}}\mathbf{x}+\overline{\mathbf{c}}$ of the form in Equation (2) such that ${\theta}_{PM,\overline{g}}={\theta}_{PM}^{\ast}.$ Define $g\left(\mathbf{x}\right)=\mathbf{Bx}+\mathbf{c}$ with $\mathbf{B}=\operatorname{diag}{\left(\boldsymbol{\Sigma}\right)}^{-\frac{1}{2}}$ and $\mathbf{c}=-\operatorname{diag}{\left(\boldsymbol{\Sigma}\right)}^{-\frac{1}{2}}\boldsymbol{\mu},$ and ${g}^{\ast}\left(\mathbf{x}\right)={\mathbf{B}}^{\ast}\mathbf{x}+{\mathbf{c}}^{\ast}$ with ${\mathbf{B}}^{\ast}=\operatorname{diag}{\left({\boldsymbol{\Sigma}}^{\ast}\right)}^{-\frac{1}{2}}$ and ${\mathbf{c}}^{\ast}=-\operatorname{diag}{\left({\boldsymbol{\Sigma}}^{\ast}\right)}^{-\frac{1}{2}}{\boldsymbol{\mu}}^{\ast}.$ Based on Proposition 3.1, ${\theta}_{PM,g}\sim{\theta}_{PM}\sim{\theta}_{PM}^{\ast}\sim{\theta}_{PM,{g}^{\ast}}^{\ast}.$ Both ${\theta}_{PM,g}$ and ${\theta}_{PM,{g}^{\ast}}^{\ast}$ are PCMs. Because of the identification of PCM (Theorems 2.5 and 2.14), there cannot be two different PCMs that are empirically indistinguishable. Therefore ${\theta}_{PM,g}={\theta}_{PM,{g}^{\ast}}^{\ast},$ and so ${\theta}_{PM,{\left({g}^{\ast}\right)}^{-1}\circ g}={\theta}_{PM}^{\ast},$ i.e., $\overline{g}={\left({g}^{\ast}\right)}^{-1}\circ g.$
By applying Proposition 3.7, Theorem 3.9, and Theorem 3.11, we show in the following that the constraints defining ${\theta}_{PCM}$ are minimal constraints. We then establish criteria for the minimal identifiability constraints of PM.
Corollary 3.12. For PM, the constraints $\boldsymbol{\mu}=\mathbf{c}$ and $\operatorname{diag}\left(\boldsymbol{\Sigma}\right)=\mathbf{b}$ (where $\mathbf{b}\in{\mathbb{R}}_{+}^K$ and $\mathbf{c}\in{\mathbb{R}}^K$ are fixed) are minimal constraints. In particular, for PCM, the constraints $\boldsymbol{\mu}=\mathbf{0}$ and $\operatorname{diag}\left(\boldsymbol{\Sigma}\right)={\mathbf{1}}_{K}$ are minimal constraints.
Proof. Equation (2) shows that $\left[{\theta}_{PM}\right]$ will not be a singleton when either of the two constraints $\boldsymbol{\mu}=\mathbf{c}$ and $\operatorname{diag}\left(\boldsymbol{\Sigma}\right)=\mathbf{b}$ is removed. Therefore $\boldsymbol{\mu}=\mathbf{c}$ and $\operatorname{diag}\left(\boldsymbol{\Sigma}\right)=\mathbf{b}$ constitute minimal constraints. The particular scenario concerning the minimal constraints of PCM also follows from Theorem 3.9(ii).
Corollary 3.13. Under PM, let ${\theta}_{PM}=\left({EC}_K\left(\boldsymbol{\mu},\boldsymbol{\Sigma},\phi\right),\mathcal{A}\right).$ The following three statements are equivalent:

- (i) ${\psi}_1,\dots,{\psi}_J$ are minimal constraints.
- (ii) There is a bijection from $\left\{{\theta}_{PCM}\right\}$ to $\left\{{\theta}_{PM}\mid{\psi}_1\left({\theta}_{PM}\right)=\cdots={\psi}_J\left({\theta}_{PM}\right)=0\right\},$ and removing any of ${\psi}_1,\dots,{\psi}_J$ leads to non-identification.
- (iii) ${\psi}_1,\dots,{\psi}_J$ are identifiability constraints, there is a surjection (of empirically indistinguishable models) from $\left\{{\theta}_{PCM}\right\}$ to $\left\{{\theta}_{PM}\mid{\psi}_1\left({\theta}_{PM}\right)=\cdots={\psi}_J\left({\theta}_{PM}\right)=0\right\},$ and removing any of ${\psi}_1,\dots,{\psi}_J$ leads to non-identification.
Proof. $(i)\iff(ii)$: By Proposition 3.7 and Corollary 3.12, ${\psi}_1,\dots,{\psi}_J$ are minimal constraints if and only if there is a bijection (of empirically indistinguishable models) from $\left\{{\theta}_{PCM}\right\}$ to $\left\{{\theta}_{PM}\mid{\psi}_1\left({\theta}_{PM}\right)=\cdots={\psi}_J\left({\theta}_{PM}\right)=0\right\}.$ $(ii)\iff(iii)$: Identifiability guarantees that the surjection is also an injection, hence a bijection; the converse also holds.
In the following, we demonstrate two applications of the ECAI approach along with our theorems concerning the identified sets of PM. First, we use ECAI to find the identifiability constraints of PM on Likert scale (LS) and comparative judgment (CJ) items. Second, using ordinal SEM and item factor analysis as examples, we illustrate the use of ECAI in establishing the identifiability of (restricted) models of PM.
3.3 Application 1: Identifiability constraints of PM on Likert scales and on comparative judgment
As mentioned earlier, PM subsumes several commonly used psychometric models for analyzing LS and CJ items. For LS, the graded response model (Samejima, 1968, 1997) and item factor analysis models (Wirth & Edwards, 2007) are restricted PMs. For CJ, Thurstonian models (Thurstone, 1927) and their extensions (Bockenholt & Tsai, 2001; Brown & Maydeu-Olivares, 2013) are also restricted models of PM. The following theorem shows some identifiability constraints of PM on LS and CJ items. For convenience, CJ with r-point ordinal preference responses is called r-point CJ (Agresti, 1992; Brown & Maydeu-Olivares, 2018), whereas LS with r-point ordinal responses is called r-point LS.
Theorem 3.14.

- (i) Consider K items on a Q-point LS (Q > 2). If the first cut-off of each item is set to zero and the final cut-off is set to one (${a}_1^{(k)}=0,{a}_{Q-1}^{(k)}=1$), then these constraints are minimal and the PM is just-identified. Moreover, all entries of ${\mathbf{X}}^{\ast}={\left({X}_1^{\ast},\dots,{X}_K^{\ast}\right)}^{\top}$ are absolute scales. We call these constraints the “global scale constraints (GSC).”
- (ii) Consider K items on a (2H)-point LS/CJ (H ≥ 1). If the middle cut-offs of the items are set to zero (${a}_{H+1}^{(k)}=0$), then $\left[{\theta}_{PM}\right]=\left\{{\theta}_{PM,g}:g\left(\mathbf{x}\right)=\mathbf{Bx},\mathbf{B}=\operatorname{diag}\left({b}_1,\dots,{b}_K\right),{b}_k>0\right\},$ so all entries of ${\mathbf{X}}^{\ast}$ are ratio scales. Moreover, if $\operatorname{diag}\left(\boldsymbol{\Sigma}\right)={\mathbf{1}}_{K},$ namely $\boldsymbol{\Sigma}=\mathbf{P},$ then these constraints are minimal and the PM is just-identified. We call $\operatorname{diag}\left(\boldsymbol{\Sigma}\right)={\mathbf{1}}_{K}$ the “unit variance constraints (UVC)” because they set the variances of ${X}_1^{\ast}$ to ${X}_K^{\ast}$ to one.
- (iii) Consider K items on a (2H + 1)-point LS/CJ (H ≥ 1). If the middle cut-offs of the items are symmetric around 0 (${a}_H^{(k)}+{a}_{H+1}^{(k)}=0$), then $\left[{\theta}_{PM}\right]=\left\{{\theta}_{PM,g}:g\left(\mathbf{x}\right)=\mathbf{Bx},\mathbf{B}=\operatorname{diag}\left({b}_1,\dots,{b}_K\right),{b}_k>0\right\},$ so all entries of ${\mathbf{X}}^{\ast}$ are ratio scales. Moreover, if $\operatorname{diag}\left(\boldsymbol{\Sigma}\right)={\mathbf{1}}_{K},$ namely $\boldsymbol{\Sigma}=\mathbf{P},$ then these constraints are minimal and the PM is just-identified.
- (iv) Consider K items on a Q-point LS/CJ (Q > 2). If the cut-offs of extreme preferences are set to −1 and 1 (i.e., ${a}_1^{(k)}=-1,{a}_{Q-1}^{(k)}=1$), then these constraints are minimal and the PM is just-identified. Moreover, all entries of ${\mathbf{X}}^{\ast}$ are absolute scales. We call these constraints the “extremity constraints (EC).”
Proof. Because $\left[{\theta}_{PM}\right]=\left\{{\theta}_{PM,g}:g\left(\mathbf{x}\right)=\mathbf{Bx}+\mathbf{c},\mathbf{B}=\operatorname{diag}\left({b}_1,\dots,{b}_K\right),{b}_k>0,\mathbf{c}\in{\mathbb{R}}^K\right\},$ we only need to check whether the constraints force ${b}_k=1$ and ${c}_k=0$ for all k. If both hold, then the PM with these constraints is identifiable. Moreover, by Corollary 3.13, if there is a surjection from PCM to PM with the stated constraints, then the PM is just-identified.
- (i) For any k, ${a}_1^{(k)}=0$ and ${a}_{Q-1}^{(k)}=1,$ so
$$\left\{\begin{array}{l}{b}_k{a}_1^{(k)}+{c}_k={c}_k=0\\{b}_k{a}_{Q-1}^{(k)}+{c}_k={b}_k+{c}_k=1\end{array}\right.\Rightarrow\left\{\begin{array}{l}{b}_k=1\\{c}_k=0.\end{array}\right.$$
Therefore $g\left(\mathbf{x}\right)=\mathbf{x}.$ For any PCM, we can construct a surjection by rescaling ${a}_1^{(k)}$ to 0 and ${a}_{Q-1}^{(k)}$ to 1 for all items. Therefore, the model is just-identified. These constraints are minimal because they are linearly independent.
-
(ii) For any k,
${a}_{H+1}^{(k)}=0,$ and
${b}_k{a}_{H+1}^{(k)}+{c}_k=0.$ So
${c}_k=0$ for all k. We now prove the identification under the UVC. Let
${\theta}_{PM, UVC,g}$ be empirically indistinguishable from
${\theta}_{PM, UVC}$ for some
$g\left(\mathbf{x}\right)=\mathbf{Bx}+\mathbf{c}.$ Because
${\sigma_k}^2 = {b_k}^2{{\tilde{\sigma}}_k}^2$ and
$\operatorname{diag}\left(\boldsymbol{\Sigma} \right)=\operatorname{diag}\left(\tilde{\boldsymbol{\Sigma}}\right)={\mathbf{1}}_{\mathbf{K}},$ we have
$\mathbf{B}={\mathbf{I}}_K.$ For any PCM, we can construct a surjection by shifting
${a}_{H+1}^{(k)}$ to
$0$ for all items. Therefore, the model is just-identified. These constraints are also minimal because they are linearly independent. (iii) can be proved in a similar way.
(iv) For any k,
${a}_1^{(k)}=-1,{a}_{Q-1}^{(k)}=1$
so
$\left\{\begin{array}{l}{b}_k{a}_1^{(k)}+{c}_k=-{b}_k+{c}_k=-1\\ {}{b}_k{a}_{Q-1}^{(k)}+{c}_k={b}_k+{c}_k=1\end{array}\right.\Rightarrow \left\{\begin{array}{l}{b}_k=1\\ {}{c}_k=0.\end{array}\right.$
Therefore
$g\left(\mathbf{x}\right)=\mathbf{x}.$
For any PCM, we can construct a surjection by rescaling
${a}_1^{(k)}$
to −1 and
${a}_{Q-1}^{(k)}$
to 1 for all items. Therefore, the model is just-identified.
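To make the surjections in (i) and (iv) concrete, the following numerical sketch (ours, for illustration only; the function name and the example values are hypothetical) maps a fitted PCM solution to its GSC representative by rescaling each item's cut-offs and applying the same affine map to the latent mean and covariance. The observed ordinal distribution is unchanged because the latent variables and the cut-offs are transformed together.

```python
import numpy as np

def pcm_to_gsc(P, thresholds):
    """Map a PCM solution (mu = 0, Sigma = P, free cut-offs) to its GSC
    representative: first cut-off 0 and last cut-off 1 for every item.

    thresholds[k] holds the cut-offs (a_1, ..., a_{Q-1}) of item k.
    Returns (mu, Sigma, new_thresholds) describing the same ordinal model.
    """
    K = len(thresholds)
    b = np.empty(K)  # scales b_k of the affine map g(x) = b_k * x + c_k
    c = np.empty(K)  # shifts c_k
    new_thresholds = []
    for k, a in enumerate(thresholds):
        b[k] = 1.0 / (a[-1] - a[0])  # sends the span from a_1 to a_{Q-1} to length 1
        c[k] = -a[0] * b[k]          # sends a_1 to 0 (hence a_{Q-1} to 1)
        new_thresholds.append(b[k] * a + c[k])
    B = np.diag(b)
    return c, B @ P @ B, new_thresholds  # new mean B*0 + c; new covariance B P B^T

# Hypothetical 5-point LS with two items:
P = np.array([[1.0, 0.4], [0.4, 1.0]])
thr = [np.array([-1.2, -0.3, 0.5, 1.4]), np.array([-0.8, 0.0, 0.7, 1.6])]
mu, Sigma, new_thr = pcm_to_gsc(P, thr)
print(new_thr[0])  # first entry 0, last entry 1
```

The EC representative in (iv) is obtained in the same way with ${b}_k=2/\left({a}_{Q-1}^{(k)}-{a}_1^{(k)}\right)$ and ${c}_k=-1-{b}_k{a}_1^{(k)},$ which sends the extreme cut-offs to −1 and 1 instead.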
Theorem 3.14(i) and (iv) show that PM is identifiable with GSC or EC for LS items. Indeed, GSC or EC would be appropriate for LS with different context labels. For instance, Casper et al. (Reference Casper, Edwards, Wallace, Landis and Fife2020) enumerated several types of labels, such as agreement, similarity, and frequency. Within this context, EC might be better than GSC for the agreement anchor because agreement and disagreement are two extreme attitudes. For example, for a 7-point LS from 1 (strongly disagree) to 7 (strongly agree), we could set the cut-off bounding “strongly disagree” to −1 and the cut-off bounding “strongly agree” to 1. In contrast, for the similarity and frequency anchors, GSC might be better than EC, because similarity and frequency have a natural zero point that GSC fixes directly. For example, for a 7-point LS from 1 (not at all like me) to 7 (extremely like me), we could set the cut-off of “not at all like me” to 0 and that of “extremely like me” to 1.
Under GSC, because the zero and unit are prespecified, the scale is an absolute scale, i.e., a scale with an absolute zero and absolute unit (Zwislocki & Goodman, Reference Zwislocki and Goodman1980). The absolute scale is a type of measurement scale that extends Stevens’ (Reference Stevens1946) classification (Luce et al., Reference Luce, Krantz, Suppes and Tversky1990). An absolute scale might be helpful in practice because all calculations are permissible for the absolute scale.
Moreover, Theorem 3.14(ii)–(iv) show that PM is identifiable with UVC or EC for CJ items. Under the Thurstonian modeling framework,
$\boldsymbol{\unicode{x3bc}}$
is the vector of differences of the means of the latent processes. In order to overcome the identifiability problem, researchers have made assumptions such as Case III or Case V to identify the model parameters (Thurstone, Reference Thurstone1927; Tsai, Reference Tsai2000). Here we demonstrate that other constraints, such as UVC and EC, can help researchers to identify
$\boldsymbol{\unicode{x3bc}} .$
(Note that the latent scales under PCM and UVC share a common unit but have different origins. The relation between the two is similar to that between a standardized scale $z=\left(x-\mu \right)/\sigma$ and a scale normalized by the standard deviation, $y=x/\sigma$.)
We mentioned earlier that Case V implies that the covariance matrix underlying CJ items satisfies $\boldsymbol{\Sigma} =c\mathbf{I}$ for some c, where $\boldsymbol{\Sigma}$ is the covariance matrix of the latent differences (Thurstone, Reference Thurstone1927; Tsai, Reference Tsai2000). If the variance of each discriminal process is set to 1/2, then $\boldsymbol{\Sigma} =\mathbf{I},$ indicating that UVC ($\mathit{diag}\left(\boldsymbol{\Sigma} \right)={\mathbf{1}}_K$) subsumes Case V as a special case. The identification under UVC implies that the independence assumption in Case V is unnecessary for identifying the covariance matrix underlying CJ items; assuming UVC rather than Case V avoids the risk of imposing an incorrect uncorrelatedness assumption. In practice, UVC can be applied to multidimensional scaling (MDS). Recall that MDS utilizes
$\boldsymbol{\unicode{x3bc}}$
to infer multidimensional relative positions among different alternatives, for which Case V is typically assumed (Torgerson, Reference Torgerson1952). UVC can identify
$\boldsymbol{\unicode{x3bc}}$
without assuming that the latent variables are uncorrelated.
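As an illustration, the sketch below (our construction based on Theorem 3.14(ii); the function name and the example values are hypothetical) starts from a PCM fit of an even-point CJ block and shifts each item's middle cut-off to zero. Because the map is a pure shift (B = I), diag(Σ) = 1 is preserved, and the shift vector itself is the identified mean.

```python
import numpy as np

def pcm_to_uvc(P, thresholds):
    """Map a PCM solution (mu = 0, Sigma = P) for even-point items to the
    UVC representative: middle cut-off 0 for every item, diag(Sigma) = 1.
    """
    # shift c_k that sends the middle cut-off of item k to zero
    c = np.array([-a[(len(a) - 1) // 2] for a in thresholds])
    new_thresholds = [a + ck for a, ck in zip(thresholds, c)]
    return c, P.copy(), new_thresholds  # mu = c; Sigma = P is unchanged

# Hypothetical 4-point CJ block with two items:
P = np.array([[1.0, 0.3], [0.3, 1.0]])
thr = [np.array([-0.9, 0.2, 1.1]), np.array([-1.0, -0.4, 0.6])]
mu, Sigma, new_thr = pcm_to_uvc(P, thr)
print(mu)  # means identified without assuming uncorrelated latent variables
```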
3.4 Application 2: Identifiability of restricted models of PM
PM subsumes several commonly used psychometric models. We can use this hierarchy to prove the identifiability of some commonly used psychometric models. We first prove a lemma.
Lemma 3.15. Let
$\theta \left(\omega \right)={EC}_K\left(\boldsymbol{\mu} \left(\omega \right),\boldsymbol{\Sigma} \left(\omega \right),\phi \right)$
be an elliptical distribution parametrized by a real vector
$\omega \in \Omega$
, then the empirically indistinguishable relation defines the family of equivalence classes
$\left[\theta \left(\omega \right)\right]=\left\{\theta \left({\omega}^{\ast}\right):\boldsymbol{\mu} \left(\omega \right)=\boldsymbol{\mu} \left({\omega}^{\ast}\right),\boldsymbol{\Sigma} \left(\omega \right)=\boldsymbol{\Sigma} \left({\omega}^{\ast}\right),{\omega}^{\ast}\in \Omega \right\}.$
Consequently, the equivalence class of the PM with parameter
${\theta}_{PM}\left({\omega}\right)=\left({EC}_K\left(\boldsymbol{\mu} \left({\omega}\right),\boldsymbol{\Sigma} \left({\omega}\right),\phi \right),\mathcal{A}\right)$
is
$\left[{\theta}_{PM}\left(\omega \right)\right]=\underset{{\omega}^{\ast}\in {\theta}^{-1}\left[\theta \left(\omega \right)\right]}{\cup}\left\{{\theta}_{PM,g}\left({\omega}^{\ast}\right):g\left(\mathbf{x}\right)=\mathbf{Bx}+\mathbf{c},\mathbf{B}=\mathit{diag}\left({b}_1,\dots, {b}_K\right),{b}_k>0,\mathbf{c}\in {\mathrm{\mathbb{R}}}^K\right\}.$ (3)
Proof. Based on Theorem 3.11,
$\left[{\theta}_{PM}\left({\omega}\right)\right]=\underset{{\omega}^{\ast} \in \theta^{-1} \left[\theta \left({\omega}\right)\right]}{\cup}\{{\theta}_{PM,g}\left({\omega}^{\ast} \right):g(\mathbf{x})=\mathbf{Bx}+\mathbf{c},\mathbf{B}=\mathit{diag}({b}_1,\dots, {b}_K), {b}_k>0, \mathbf{c}\in {\mathrm{\mathbb{R}}}^K\}.$
The union of equivalence relations inherits reflexivity and symmetry naturally, and the equivalence of probability structures ensures transitivity. Thus, the union forms another equivalence class.
Based on the above lemma, we can easily derive that if
$\theta \left(\omega \right)={EC}_K\left(\boldsymbol{\mu} \left(\omega \right),\boldsymbol{\Sigma} \left(\omega \right),\phi \right)$
is just-identified/identifiable, then PM with parameter
${\theta}_{PM}\left(\omega \right)=\left({EC}_K\left(\boldsymbol{\mu} \left(\omega \right),\boldsymbol{\Sigma} \left(\omega \right),\phi \right),\mathcal{A}\right)$
is an interval scale. An application is stated in the following theorem, which presents necessary and sufficient conditions for the identifiability of the ordinal SEM. Notably, the item factor analysis (FA) model, also known as confirmatory multidimensional item response theory (IRT), can be regarded as a special case of ordinal SEM (Asparouhov & Muthén, Reference Asparouhov and Muthén2020; Reckase, Reference Reckase2009; SAS Institute Inc., 2024; Takane & de Leeuw, Reference Takane and de Leeuw1987; Wirth & Edwards, Reference Wirth and Edwards2007).
Corollary 3.16. An SEM model,
$\theta \left(\omega \right)={EC}_K\left(\boldsymbol{\mu} \left(\omega \right),\boldsymbol{\Sigma} \left(\omega \right),\phi \right)$
, is identifiable if and only if the corresponding ordinal SEM model is an interval scale.
Proof.
$\left(\Rightarrow \right)$
Based on the identification of the SEM model, there is only one element,
${\omega}$
, in
$\theta^{-1} \left[\theta \left({\omega}\right)\right]$
. Thus Equation (3) reduces to
$\left[{\theta}_{PM}\left({\omega}\right)\right]=\{{\theta}_{PM,g}\left({\omega}^{\ast} \right):g(\mathbf{x})=\mathbf{Bx}+\mathbf{c},\mathbf{B}=\mathit{diag}({b}_1,\dots, {b}_K),{b}_k>0, \mathbf{c}\in {\mathrm{\mathbb{R}}}^K\}$
, which is the definition of the interval scale. $\left(\Leftarrow \right)$ Conversely, if the SEM model is not identifiable, then $\theta^{-1} \left[\theta \left({\omega}\right)\right]$ has more than one element, so by Equation (3) the equivalence class $\left[{\theta}_{PM}\left({\omega}\right)\right]$ is strictly coarser than that of an interval scale. Thus, the ordinal SEM model is not an interval scale.
Theorem 3.17. (Sufficient and necessary condition for (just-)identifiability of ordinal SEM) The ordinal SEM model,
${\theta}_{PM}\left(\omega \right)=\left({EC}_K\left(\boldsymbol{\mu} \left(\omega \right),\boldsymbol{\Sigma} \left(\omega \right),\phi \right),\mathcal{A}\right),$
with constraints
${\psi}_1,\dots, {\psi}_J,$
is identifiable if and only if the SEM model is identifiable and
${\left[{\theta}_{PM}\left(\omega \right)\right]}_{\psi_1,\dots, {\psi}_J}$
admits both
$\mathbf{B}={\mathbf{I}}_K$
and
$\mathbf{c}=\mathbf{0}$
in Equation (3). Moreover, it is just-identifiable if and only if the SEM model is just-identifiable and for any
${\omega}$
such that
$\theta \left({\omega}\right)={EC}_K\left(\mathbf{0},\mathbf{P},\phi \right)$
there is one and only one corresponding
${\left[{\theta}_{PM}\left({\omega}\right)\right]}_{\psi_1,\dots, {\psi}_J}.$
In addition,
${\psi}_1,\dots, {\psi}_J$
are minimal constraints if removing any of
${\psi}_1,\dots, {\psi}_J$
leads to non-identification.
Proof.
$\left(\Rightarrow \right)$
Using an argument similar to that of Corollary 3.16, the ordinal SEM model with constraints admits a partition finer than that of an interval scale, and so the SEM model is identifiable. Moreover, we have
${\left[{\theta}_{PM}\left(\omega \right)\right]}_{\psi_1,\dots, {\psi}_J}=\left\{{\theta}_{PM,g}\left(\omega \right):g\left(\mathbf{x}\right)=\mathbf{Bx}+\mathbf{c},\mathbf{B}=\mathit{diag}\left({b}_1,\dots, {b}_K\right),{b}_k>0,\mathbf{c}\in {\mathrm{\mathbb{R}}}^K,\ {\theta}_{PM,g}\left(\omega \right)\ \mathrm{satisfies}\ {\psi}_1,\dots, {\psi}_J\right\}.$
Because the ordinal SEM model is identifiable, this class contains a single element, which implies
$g\left(\mathbf{x}\right)=\mathbf{x},$
which is equivalent to
$\mathbf{B}={\mathbf{I}}_K$
and
$\mathbf{c}=\mathbf{0}.$
$\left(\Leftarrow \right)$
Because the SEM model is identifiable, and
${\psi}_1,\dots, {\psi}_J$
admit both
$\mathbf{B}={\mathbf{I}}_K$
and
$\mathbf{c}=\mathbf{0}$
, the equivalence class
${\left[{\theta}_{PM}\left({\omega}\right)\right]}_{\psi_1,\dots, {\psi}_J}$
will not have more than one element. Therefore, the ordinal SEM model is identified.
The remaining parts can be easily proved through Corollary 3.13.
Corollary 3.18. (Sufficient and necessary condition for (just-)identifiability of item factor analysis) An item factor analysis model (setting all of the error variances to be one) is identifiable/just-identifiable if and only if the corresponding factor analysis model (through correlation matrix) is identifiable/just-identifiable.
Proof. Consider an m-dimensional item factor analysis with discriminant parameter vectors
${\mathbf{a}}^{(k)}\in {\mathrm{\mathbb{R}}}^m$
and threshold vectors
${\mathbf{b}}^{(k)}\in {\mathrm{\mathbb{R}}}^m$
for item k, k = 1,…,K. Based on the formulas (6) and (13) in Takane and de Leeuw (Reference Takane and de Leeuw1987), the model can be expressed as
${\theta}_{PM}=\left({EC}_K\left(\mathbf{0},\boldsymbol{\Sigma} \left({\mathbf{a}}^{(1)},\dots, {\mathbf{a}}^{(K)}\right),\phi \right),\mathcal{A}\left({\mathbf{b}}^{(1)},\dots, {\mathbf{b}}^{(K)}\right)\right),$ (4)
where
$\mathcal{A}\left({\mathbf{b}}^{(1)},\dots, {\mathbf{b}}^{(K)}\right)=\left\{-{\mathbf{b}}^{(k)}\right\}_{k=1,\dots, K}$
and
$\boldsymbol{\Sigma} \left({\mathbf{a}}^{(1)},\dots, {\mathbf{a}}^{(K)}\right)=\left({\mathbf{a}}^{\left(k\right)\top }{\mathbf{a}}^{\left(j\right)}\right)+{\mathbf{I}}_K.$
We now check the conditions in Theorem 3.17. Because there is a one–one correspondence (through standardization and its inverse transformation) between
$\boldsymbol{\Sigma} \left({\mathbf{a}}^{(1)},\dots, {\mathbf{a}}^{(K)}\right)=\left({\mathbf{a}}^{\left(k\right)\top }{\mathbf{a}}^{\left(j\right)}\right)+{\mathbf{I}}_K$
and the FA model through correlation matrix,
$\mathbf{P}\left({\mathbf{a}}^{(1)},\dots, {\mathbf{a}}^{(K)}\right)=\left({\mathbf{a}}^{\left(k\right)\top }{\mathbf{a}}^{\left(j\right)}\right)+\mathit{diag}({q}_1,{q}_2,\dots, {q}_K)$
, we can reparametrize the model as a PCM. Thus, it is identifiable/just-identifiable if and only if the corresponding factor analysis model (through correlation matrix) is identifiable/just-identifiable.
Remark 3.19. Based on the formulas (6) and (13) in Takane and de Leeuw (Reference Takane and de Leeuw1987), the model can be parametrized as an IRT model or as an FA model. The IRT parametrization corresponds to Equation (4), and the FA parametrization corresponds to the following equation
${\theta}_{PM}=\left({EC}_K\left(\mathbf{0},\boldsymbol{\Sigma} \left({\mathbf{a}}^{(1)},\dots, {\mathbf{a}}^{(K)}\right),\phi \right),\mathcal{A}\left({\mathbf{b}}^{(1)},\dots, {\mathbf{b}}^{(K)}\right)\right),$
where
$\mathcal{A}\left({\mathbf{b}}^{(1)},\dots, {\mathbf{b}}^{(K)}\right)=\left\{-{\mathbf{b}}^{(k)}\right\}_{k=1,\dots, K}$
and
$\boldsymbol{\Sigma} \left({\mathbf{a}}^{(1)},\dots, {\mathbf{a}}^{(K)}\right)=\left({\mathbf{a}}^{\left(k\right)\top }{\mathbf{a}}^{\left(j\right)}\right)+\mathit{diag} ({\psi}_1,{\psi}_2,\dots, {\psi}_K),$
where
${\psi}_k$
is the error variance corresponding to item k. Note that the FA parametrization also corresponds to the theta parametrization in MPLUS (Asparouhov & Muthén, Reference Asparouhov and Muthén2020). Because
$\boldsymbol{\Sigma} \left({\mathbf{a}}^{(1)},\dots, {\mathbf{a}}^{(K)}\right)$
corresponds to more than one
$\mathbf{P}\left({\mathbf{a}}^{(1)},\dots, {\mathbf{a}}^{(K)}\right),$
the FA parametrization is not identifiable unless one sets
${\psi}_1={\psi}_2=\cdots ={\psi}_K=1$
(Takane & de Leeuw, Reference Takane and de Leeuw1987).
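The non-identifiability of the FA parametrization can also be checked numerically. The sketch below (our construction; all names and values are hypothetical) scales the loadings, error variances, and thresholds of an item factor analysis model jointly and verifies that the polychoric correlation matrix and the normalized thresholds, which determine the ordinal likelihood, are unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)
K, m = 4, 2
A = rng.normal(size=(K, m))  # hypothetical discrimination (loading) matrix
b = rng.normal(size=K)       # hypothetical thresholds

def standardized(A, psi, b):
    """Polychoric correlation matrix and normalized thresholds implied by
    loadings A, error variances psi, and thresholds b."""
    Sigma = A @ A.T + np.diag(psi)
    s = np.sqrt(np.diag(Sigma))
    return Sigma / np.outer(s, s), b / s

# IRT parametrization: all error variances fixed to one
P1, t1 = standardized(A, np.ones(K), b)

# FA parametrization rescaled item-wise by d_k > 0: different parameters ...
d = np.array([1.0, 2.0, 0.5, 1.5])
P2, t2 = standardized(d[:, None] * A, d**2, d * b)

# ... but the same standardized solution, hence the same ordinal likelihood
print(np.allclose(P1, P2), np.allclose(t1, t2))  # True True
```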
4 General Discussion
Two unsolved questions concerning the identifiability of PM or PCM have been raised: (a) Are PCM and/or PM with latent elliptical distributions identifiable? (b) If any of them is not identifiable, can we find the minimal identifiability constraints? For (a), we proved the just-identifiability of PCM and non-identifiability of PM by generalizing Almeida and Mouchart’s (Reference Almeida and Mouchart2003a) argument. In particular, we proved the just-identification of the polychoric t correlation model based on the copula representation. For (b), we found the sets of identifiability constraints of PM using ECAI of Tsai (Reference Tsai2000, Reference Tsai2003). Our results showed that PM of LS with GSC or EC is just-identified, and that PM of CJ with UVC or EC is just-identified. Moreover, all of GSC, EC, and UVC are minimal constraints for identification. We also showed that for CJ, the latent differences underlying the Thurstonian models are ratio scales.
Identifiability is the premise of constructing consistent estimators. While researchers have made advances in proving the identifiability of several commonly used psychometric models (Fariña et al., Reference Fariña, González and San Martín2019; Gu & Xu, Reference Gu and Xu2019; Ouyang & Xu, Reference Ouyang and Xu2022), the issue of non-identified models has not been fully addressed. ECAI provides an alternative by focusing on model hierarchy and can help researchers find identifiability constraints of non-identified general models. In particular, we showed in Theorem 3.9 (i) that if the general model is identifiable, then the restricted model is identifiable. This theorem justifies the common practice that if the most general model in a model hierarchy is identifiable, then there is no need to prove the identifiability of the restricted models in the hierarchy separately. For example, if a confirmatory factor analysis model is identifiable, then the model with more constraints is also identifiable.
The scales under PM with GSC or EC have merits that the scales under PCM lack. For example, linking the results from two LS attitude tests might be more manageable. If the labels of the extremities of the anchors in the two LS tests are the same, then the two constructed polychoric models will have the same unit. We may assume that people have a consensus on understanding what the anchors mean. For example, the corresponding PMs may have the same unit if two tests with the same LS anchor range from 1 (strongly disagree) to 7 (strongly agree). Alternatively, if two LSs have different anchors or different numbers of points, then these two scales might have different scale origins and units. Future studies should investigate how to equate two LS scales with different anchors or different numbers of points.
Traditionally, most researchers adopt PCM rather than PM to model LS. For researchers interested in modeling changes in longitudinal data, however, PCM does not meet the need to identify mean differences among different ages. Instead, researchers have usually adopted PM with some scalar invariance assumptions to identify the mean differences (McArdle et al., Reference McArdle, Petway, Hishinuma, Reise and Revicki2015; Muthén, Reference Muthén1984). If longitudinal studies adopt GSC, then the mean differences among different ages can be identified without further assumptions.
Identifiability has been an issue for Thurstonian models. Theorem 3.14(iv) provides a novel approach to the identifiability problem. If one adopts CJ items with more than two points, then PM is identifiable with EC. The intuition behind EC is that we define −1 and 1 as the points when a participant shows the strongest preference for one over another. Because EC can only apply to CJ with more than two points, we suggest researchers adopt CJ with more than two points for identifiability purposes. Similar to the LS case, if two CJ items have different numbers of points, they have two different units. Future studies should also investigate how to equate these two scales constructed by CJ items with different numbers of points.
Moreover, we have proved the necessary and sufficient conditions for the ordinal SEM model to be just-identifiable/identifiable. Specifically, we demonstrated that when the FA based on the correlation matrix is identifiable, the item factor analysis model under the IRT parametrization is also identifiable. However, the model under the FA parametrization remains unidentifiable unless all error variances are fixed at one. This result is consistent with the established literature on item factor analysis (Takane & de Leeuw, Reference Takane and de Leeuw1987; Wirth & Edwards, Reference Wirth and Edwards2007). These theorems might help investigators determine the identification of more advanced ordinal SEM models. Also, we note that these theorems relax the normality assumption of ordinal SEM, providing more flexibility for statistical modeling.
We have focused on the identifiability issue of PCM and PM with latent elliptical distributions. Distributions not belonging to the family of elliptical distributions, such as the skewed normal distribution (Jin & Yang-Wallentin, Reference Jin and Yang-Wallentin2017), need further investigation.
We mention that the approach we proposed in this article is not confined to the identifiability issue of PM and/or PCM models. In Figure 1, we provide a flowchart for establishing model identifiability and finding identifiability constraints, in which each process is annotated with the corresponding theorem(s) and example(s) in this article. The steps of this principled approach are as follows. (1) Select a parametric model. If it proves to be identifiable, then it is deemed just-identified. If, however, the model is not identifiable, we search for identifiability constraints through ECAI. (2) Upon applying some constraints, if each identified set contains at most one element, then the model is identified. If, on the other hand, at least one set is empty, then the model is over-identified, which implies a probable need to relax some constraints. (3) If each set contains a unique element and removing any constraint makes the model non-identifiable, then the model is just-identified and these constraints are minimal constraints.
Figure 1 A flowchart for establishing model identifiability and finding identifiability constraints. Note: In the flowchart, each process is annotated with the corresponding theorem(s) and example(s) in this article.
In this article, the concept of identifiability discussed is global identifiability. Some weaker notions might also be useful. First, for some commonly used latent variable models it is difficult to ascertain whether they are globally identifiable. In such cases, researchers can verify whether the model possesses local identifiability (see Bekker et al., Reference Bekker, Merckens and Wansbeek1994; Skrondal & Rabe-Hesketh, Reference Skrondal and Rabe-Hesketh2004). Second, we may conceive a weaker concept of identifiability from lower-order margins. In this study, the polychoric correlation models show “identifiability from bivariate distributions.” In such a scenario, it may be possible to find consistent estimators of the parameters by only assuming bivariate distributions (without assumptions about higher-order joint distributions). These estimators, which coincide with the limited-information estimators of the elliptical cases, may lead to robust estimation. These discussions potentially open a new direction for future research.
Data availability statement
Data sharing is not applicable to this article as no real datasets were collected in the study.
Funding statement
The present study was supported in part by the National Science and Technology Council of Taiwan (R.O.C.) Grants 111-2410-H-002-166 and 113-2410-H-002-217 to Yung-Fong Hsu, and 109-2410-H-002-092-MY2 and 111-2410-H-002-109 to Keng-Ling Lay.
Competing interests
The authors have no relevant financial or non-financial interests to disclose.
Appendix
The statements shown here are used in the proof of Theorem 2.14 for the identification of PCM with a latent multivariate t distribution.
Lemma A.1.
Suppose that X and Y are independent random variables with
$X\sim Gamma\left({a}_1,b\right),$
$Y\sim Gamma\left({a}_2,b\right),$
for
${a}_1,{a}_2,b>0.$
Then
$X+Y\sim Gamma\left({a}_1+{a}_2,b\right).$
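Lemma A.1 is the standard additivity property of the Gamma family with a common rate parameter; a minimal Monte Carlo check (ours, with arbitrary parameter values and reading b as a rate) is:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a1, a2, b = 2.0, 3.5, 1.7  # arbitrary shape parameters and common rate b
x = rng.gamma(shape=a1, scale=1 / b, size=100_000)
y = rng.gamma(shape=a2, scale=1 / b, size=100_000)
# Kolmogorov-Smirnov test of X + Y against Gamma(a1 + a2, b): a large
# p-value indicates agreement with the stated distribution
print(stats.kstest(x + y, stats.gamma(a=a1 + a2, scale=1 / b).cdf))
```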
Lemma A.2 is used to establish the monotonicity of the CDF of the product of two independent random variables.
Lemma A.2.
If X and Y are two independent, continuous random variables and
$Z= XY,$
then Z is a continuous random variable. Moreover, if X is supported on the positive values and Y is supported on the whole real line, then Z is supported on the whole real line, and the CDF of Z is strictly increasing and injective on the real line.
Proof. Suppose that there exists
${z}_0$
such that
${f}_Z\left({z}_0\right)={\int}_{-\infty}^{\infty }{f}_X(x){f}_Y\left(\frac{z_0}{x}\right)\frac{1}{\mid x\mid } dx=0.$
Because
${f}_X(x){f}_Y\left(\frac{z_0}{x}\right)\frac{1}{\mid x\mid}\ge 0,$
we have
${f}_X(x){f}_Y\left(\frac{z_0}{x}\right)\frac{1}{\mid x\mid}=0$
almost everywhere. However, ${f}_X(x)$ is nonzero almost everywhere on the positive half-line and ${f}_Y\left(\frac{z_0}{x}\right)$ is nonzero almost everywhere, so such a ${z}_0$ cannot exist. Thus Z is supported on the whole real line. Now, for any $b>a,$ ${F}_Z(b)-{F}_Z(a)={\int}_a^b {f}_Z(z)\, dz\ge \epsilon \left(b-a\right)>0,$ where $\epsilon >0$ is the infimum of ${f}_Z$ within $\left[a,b\right].$ Thus the CDF of Z is strictly increasing, and so it is injective on the real line.
Corollary A.3. A t distribution is supported on the whole real line, and the CDF is strictly increasing and injective on the real line.
Lemma A.4.
Let
${t}_{\nu }$
be the CDF of a t distribution with degrees of freedom
$\nu .$
Then for any
${\nu}_2>{\nu}_1>0,$
${t}_{\nu_1}(x)>{t}_{\nu_2}(x)$ for $x<0$, and ${t}_{\nu_1}(x)<{t}_{\nu_2}(x)$ for $x>0.$
Therefore,
${\partial}_{\nu }F\left(x|\nu \right)$
and
$x$
have the same sign. More generally, let
$F\left(\mathbf{x}|\nu \right)$
be the CDF of a multivariate t distribution
${t}_{\nu}\left(\boldsymbol{\mu}, \boldsymbol{\Sigma} \right)$
, and
$\mathbf{x}=\left({x}_1,{x}_2,\dots, {x}_p\right).$
If
${\nu}_2>{\nu}_1,$
then
$F\left(\mathbf{x}|{\nu}_1\right)>F\left(\mathbf{x}|{\nu}_2\right)\ \mathrm{if}\ {x}_i<{\mu}_i\ \mathrm{for\ all}\ i,\kern1em \mathrm{and}\kern1em F\left(\mathbf{x}|{\nu}_1\right)<F\left(\mathbf{x}|{\nu}_2\right)\ \mathrm{if}\ {x}_i>{\mu}_i\ \mathrm{for\ all}\ i.$
Proof. By the definition of t distribution, if
$X\sim {t}_{\nu_1},$
then
$X=\frac{Z}{\sqrt{S/{\nu}_1}}$
for some
$S\sim {\chi}_{\nu_1}^2$
and
$Z\sim \mathcal{N}\left(0,1\right).$
Without loss of generality, for any
$x<0,$
${t}_{\nu_1}(x)=P\left(\frac{Z}{\sqrt{S/{\nu}_1}}\le x\right)=E\left[\Phi \left(x\sqrt{S/{\nu}_1}\right)\right]>E\left[\Phi \left(x\sqrt{S^{\prime }/{\nu}_2}\right)\right]={t}_{\nu_2}(x),$ where $S^{\prime}\sim {\chi}_{\nu_2}^2$ and $\Phi$ is the standard normal CDF.
Likewise, for any
$x>0,{t}_{\nu_1}(x)<{t}_{\nu_2}(x).$
The multivariate case can be proved using a similar method.
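The sign pattern in Lemma A.4 can be spot-checked numerically; the following sketch (ours, with arbitrary degrees of freedom) verifies that ${t}_{\nu_1}(x)-{t}_{\nu_2}(x)$ is positive for negative x and negative for positive x:

```python
import numpy as np
from scipy import stats

nu1, nu2 = 3.0, 10.0  # arbitrary degrees of freedom with nu2 > nu1
x = np.linspace(-4, 4, 9)
diff = stats.t.cdf(x, df=nu1) - stats.t.cdf(x, df=nu2)
# t_{nu1}(x) > t_{nu2}(x) for x < 0 and t_{nu1}(x) < t_{nu2}(x) for x > 0
print(np.all(diff[x < 0] > 0), np.all(diff[x > 0] < 0))  # True True
```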
Figure A1 The CDF of the t distribution with different degrees of freedom. Note: Fixing the target probability
$p<0.5,$
then the quantile is a strictly increasing function of ν; if
$p>0.5,$
then the quantile is a strictly decreasing function of ν.
Proposition A.5 shows that the quantile of the t distribution is an implicit function of the degrees of freedom. Figure A1 demonstrates this property.
Proposition A.5.
Let q be the u-quantile of the t distribution, i.e.,
$u={t}_{\nu }(q).$
Given u, if
$0<u<0.5$
then
$q\left(\nu \right)$
is strictly increasing and invertible on
$q<0;$
if
$0.5<u<1$
then
$q\left(\nu \right)$
is strictly decreasing and invertible on
$q>0.$
In other words, for any
${\nu}_2>{\nu}_1>0,$
${t}_{\nu_1}^{-1}(u)<{t}_{\nu_2}^{-1}(u)\ \mathrm{if}\ 0<u<0.5,\kern1em \mathrm{and}\kern1em {t}_{\nu_1}^{-1}(u)>{t}_{\nu_2}^{-1}(u)\ \mathrm{if}\ 0.5<u<1.$
Proof. Fix
$0.5<u<1,$
and let
${q}_1$
and
${q}_2$
be the u quantile of t distributions with degrees of freedom
${\nu}_1$
and
${\nu}_2,$
respectively. The uniqueness of the quantiles is guaranteed by Corollary A.3. Also, based on the property of t distributions,
${q}_1,{q}_2>0.$
Then, based on Lemma A.4, for any
${\nu}_2>{\nu}_1,u={t}_{\nu_1}\left({q}_1\right)={t}_{\nu_2}\left({q}_2\right)>{t}_{\nu_1}\left({q}_2\right).$
By Corollary A.3,
${q}_2<{q}_1,$
therefore,
$q\left(\nu \right)$
is strictly decreasing. Similarly, for
$0<u<0.5,$
$q\left(\nu \right)$
is strictly increasing. Also, for any fixed values of u and
$\nu,$
there is a value
$q$
such that
$u={t}_{\nu }(q).$
Thus
$q\left(\nu \right)$
is invertible on both
$q<0$
and
$q>0.$
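Consistent with Proposition A.5 and Figure A1, the monotonicity of $q\left(\nu \right)$ on each side of the median can be verified with the t quantile function (a sketch of ours with arbitrary u values):

```python
import numpy as np
from scipy import stats

nus = np.linspace(1, 30, 50)
q_low = stats.t.ppf(0.1, df=nus)   # u < 0.5: q(nu) is strictly increasing
q_high = stats.t.ppf(0.9, df=nus)  # u > 0.5: q(nu) is strictly decreasing
print(np.all(np.diff(q_low) > 0), np.all(np.diff(q_high) < 0))  # True True
```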