
On equal-input and monotone Markov matrices

Published online by Cambridge University Press:  06 June 2022

Michael Baake*
Affiliation:
Bielefeld University
Jeremy Sumner*
Affiliation:
University of Tasmania
*Postal address: Fakultät für Mathematik, Universität Bielefeld, Postfach 100131, 33501 Bielefeld, Germany. Email: mbaake@math.uni-bielefeld.de
**Postal address: School of Natural Sciences, Discipline of Mathematics, University of Tasmania, Private Bag 37, Hobart, TAS 7001, Australia. Email: jeremy.sumner@utas.edu.au

Abstract

The practically important classes of equal-input and of monotone Markov matrices are revisited, with special focus on embeddability, infinite divisibility, and mutual relations. Several uniqueness results for the classic Markov embedding problem are obtained in the process. To achieve our results, we need to employ various algebraic and geometric tools, including commutativity, permutation invariance, and convexity. Of particular relevance in several demarcation results are Markov matrices that are idempotents.

Type
Original Article
Copyright
© The Author(s) 2022. Published by Cambridge University Press on behalf of Applied Probability Trust

1. Introduction

Let $\mathcal{M}_d$ denote the set of d-dimensional Markov (or stochastic) matrices, which are the elements of $\textrm{Mat} (d,\mathbb{R})$ with nonnegative entries and all row sums equal to 1. Clearly, $\mathcal{M}_d$ is a compact convex set, with the $d^{ d}$ Markov matrices with entries in $\{0,1\}$ being its extremal points (or elements); see [Reference Marcus and Minc24, Sec. II.1] for a summary. Another classic example is the subset of $\mathcal{M}_d$ of doubly stochastic matrices, where both the matrix and its transpose are Markov. Here, the extremal elements are the $d !$ permutation matrices, which is known as Birkhoff’s theorem. Concepts and methods from convexity will be needed and employed throughout the manuscript; see [Reference Webster30] for general background on convex structures.

A Markov matrix M is called embeddable [Reference Casanellas, Fernández-Sánchez and Roca-Lacostena2, Reference Davies5, Reference Elfving7, Reference Kingman20] if it can be written as $M=\textrm{e}^Q$ with a rate matrix Q, which is a matrix with nonnegative off-diagonal elements and vanishing row sums. A rate matrix Q is also called a Markov generator, or simply generator, because $\big\{ \textrm{e}^{t Q} \,:\, t \geqslant 0 \big\}$ is a semigroup of Markov matrices with unit, and is thus a monoid; see [Reference Norris25] for general background on Markov chains in continuous time. The set of embeddable matrices from $\mathcal{M}_d$ is denoted by $\mathcal{M}^{\textrm{E}}_{d}$ . A Markov matrix is called infinitely divisible [Reference Feller8, Reference Göndöcs, Michaletzky, Móri and Székely11] if it has a Markov nth root for every $n\in\mathbb{N}$ , where $\mathbb{N}$ is the set of positive integers. It is a well-known fact [Reference Kingman20, Prop. 7] that a Markov matrix is embeddable if and only if it is nonsingular and infinitely divisible; see [Reference Higham14, Sec. 2.3] as well as [Reference Eisner and Radl6, Reference Guerry12] for related material. In the other direction of taking powers, whenever Q is a generator, $M_{\infty}\,:\!=\,\lim^{}_{t\to\infty} \textrm{e}^{t Q}$ exists, and is an idempotent Markov matrix by [Reference Baake and Sumner1, Prop. 2.3(3)], so $M^{2}_{\infty} = M^{}_{\infty}$ .

The embedding problem goes back to Elfving [Reference Elfving7] and became prominent through the foundational paper by Kingman [Reference Kingman20]. In the two decades following it, many abstract characterisations of embeddable matrices were found and investigated; see [Reference Baake and Sumner1, Reference Davies5] and references cited there for some of the literature, as well as [Reference Higham14, Reference Horn and Johnson17] for various related questions in matrix analysis. However, beyond $d=3$ , concrete criteria suitable for real-world applications remained elusive, and the interest in the problem diminished somewhat. New impetus then came from theoretical economics and, more recently and quite vigorously, from bioinformatics, where precisely the embedding problem for $d=4$ is relevant in phylogenetics, with significant progress still being rather fresh; see [Reference Baake and Sumner1, Reference Casanellas, Fernández-Sánchez and Roca-Lacostena2, Reference Göndöcs, Michaletzky, Móri and Székely11] and references therein.

In this context, the inference problem of discretely-observed, continuous-time Markov chains is the starting point; compare [Reference Yang and Ranala31], where natural consistency considerations led to several new results (see [Reference Fernández-Sánchez, Sumner, Jarvis and Woodhams9] and references therein). Of particular relevance, both theoretically and practically, are equal-input matrices [Reference Steel27], which possess a powerful algebraic structure and paved the way to some progress, also beyond this class. In fact, it is precisely the systematic use of some standard (and some perhaps not quite so standard) tools from (linear) algebra that unlocks the somewhat stuck embedding problem for progress beyond $d=2$ as needed in the applications. It is one goal of this paper to explain some of these techniques in action, by applying them to two particularly important classes of Markov chains. This way, we also attempt to convince the reader that stepping a little into algebraic methods can be more than a little profitable.

To avoid trivial statements, we will generally assume $d\geqslant 2$ . Let $C \in \textrm{Mat} (d,\mathbb{R})$ have equal rows, each being $\big(c^{}_{1}, \ldots , c^{}_{d}\big)$ , and define $c=c^{}_{1} + \ldots + c^{}_{d}$ as its parameter sum. Such a matrix C is Markov precisely when we have $c=1$ together with $c_i\geqslant 0$ for all i. However, it then has rank 1, so $\det(C)=0$ for $d>1$ . For this reason, such matrices C are often of limited interest, for instance in the context of embeddability. Instead, consider $M^{}_{C} = (1-c) \mathbb{1} + C$ , which is a matrix with each row summing to 1. Here and below, $\mathbb{1}$ denotes the identity matrix. Clearly, $M^{}_{C}$ is Markov if and only if $c_i\geqslant 0$ and $c \leqslant 1 + c_i$ for all i. Following the much-cited monograph [Reference Steel27], we call such Markov matrices equal-input, since they describe Markov chains where the probability of a transition $i \to j$ , for $i\ne j$ , depends on j only. As detailed in [Reference Steel27] (see also [Reference Baake and Sumner1] and references therein), they constitute an important model class in bioinformatics and phylogenetics. All such matrices are of the above form, and the underlying c is called the summatory parameter. For a given d, they form another convex set, which we denote by $\mathcal{C}_d$ ; see Lemma 2.8 for more.
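For instance, with $d=3$ and equal rows $\big(\tfrac{1}{4}, \tfrac{1}{4}, \tfrac{3}{4}\big)$ , hence $c = \tfrac{5}{4} \leqslant 1 + \min^{}_{i} c^{}_{i}$ , one obtains the equal-input Markov matrix

\begin{equation*} M^{}_{C} \, = \, -\tfrac{1}{4}\, \mathbb{1} + C \, = \, \begin{pmatrix} 0 & \quad \tfrac{1}{4} & \quad \tfrac{3}{4} \\[3pt] \tfrac{1}{4} & \quad 0 & \quad \tfrac{3}{4} \\[3pt] \tfrac{1}{4} & \quad \tfrac{1}{4} & \quad \tfrac{1}{2} \end{pmatrix} ,\end{equation*}

where the off-diagonal entries in column j indeed depend on j only.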

A seemingly unrelated concept, at least at first sight, is the following. Consider the standard simplex of d-dimensional probability vectors,

\begin{equation*} \mathbb{P}_d \,:\!=\, \big\{ \big(x^{}_{1}, \ldots , x^{}_{ d}\big) \,:\, \text{all } x_i \geqslant 0 \text{ and } x^{}_{1} + \ldots + x^{}_{d} = 1 \big\} ,\end{equation*}

which is a compact convex set. Its extremal elements are the standard (row) basis vectors $e_i$ with $i \in [d ] \,:\!=\,\{ 1, \ldots, d \}$ . One can introduce the partial order of stochastic monotonicity on $\mathbb{P}_d$ by saying that x is dominated by y, written as $x\preccurlyeq y$ , when $\sum_{i=m}^{d} x_i \leqslant \sum_{i=m}^{d} y_i$ holds for all $m \in [d ]$ ; see Equation (3.2) below for an alternative formulation. The corresponding partial order is well defined also on the positive multiples of $\mathbb{P}_d$ , called level sets, hence extendable to convex combinations. Further, it is consistent on the entire positive cone, where two vectors can at most be compared when they lie in the same level set. This notion has its origin in an important class of stochastic processes [Reference Daley4] that show up in many places in probability theory and its applications [Reference Keilson and Kester18, Reference Kijima19, Reference Lindqvist22]. Though practically perhaps most relevant in various economic contexts, stochastic monotonicity induces another class of matrices with a lot of internal structure, which is relevant in the embedding context, as we shall explain below.
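For instance, with $d=3$ , one has $\big(\tfrac{1}{2}, \tfrac{1}{3}, \tfrac{1}{6}\big) \preccurlyeq \big(\tfrac{1}{6}, \tfrac{1}{3}, \tfrac{1}{2}\big)$ , because the tail sums satisfy $\tfrac{1}{6} \leqslant \tfrac{1}{2}$ , $\tfrac{1}{2} \leqslant \tfrac{5}{6}$ , and $1 \leqslant 1$ , while $\big(\tfrac{1}{2}, 0, \tfrac{1}{2}\big)$ and $(0, 1, 0)$ are not comparable, so $\preccurlyeq$ is indeed only a partial order.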

A Markov matrix $M = \big(m^{}_{ij}\big)^{}_{1 \leqslant i,\,j \leqslant d}$ is called stochastically monotone, or monotone for short, when the mapping $x\mapsto x M$ preserves stochastic monotonicity; compare [Reference Daley4, Reference Keilson and Kester18]. It is well-known that M is monotone if and only if its rows $m^{}_i = \big(m^{}_{i1}, \ldots, m^{}_{id}\big) = e^{}_{i} M$ are ordered accordingly with increasing row numbers, meaning $m^{}_i \preccurlyeq m^{}_j$ for all $i\leqslant j$ . More generally, the concept of being monotone is well defined for all nonnegative matrices with equal row sums, which are simply multiples of Markov matrices. Then, preserving the partial order means that an inequality in one level set is turned into the corresponding one in another. The monotone Markov matrices in $\mathcal{M}_d$ form a closed convex set, which we denote by $\mathcal{M}_{d,\preccurlyeq}$ . All elements of $\mathcal{M}_{d,\preccurlyeq}$ have trace ${\geqslant}1$ , and the extremal points of $\mathcal{M}_{d,\preccurlyeq}$ , as detailed in Lemma 3.4 below, are the $\binom{2d-1}{d}$ monotone Markov matrices with entries in $\{0,1\}$ . Monotone Markov matrices appear in many contexts; see [Reference Keilson and Kester18, Reference Lindqvist22] as well as [Reference Kijima19, Ch. 3] for examples.

A stationary vector of $M\in\mathcal{M}_d$ is any $x\in\mathbb{P}_d$ with $x M = x$ . Given M, the set of all stationary vectors is convex. In fact, it is a subsimplex of $\mathbb{P}_d$ that can be a singleton set (as for all irreducible M) or larger, up to $\mathbb{P}_d$ itself (for $M =\mathbb{1}$ ). A Markov matrix M is an idempotent when $M^2=M$ . This means that M maps any $x\in\mathbb{P}_d$ to a stationary vector in one step. Except for $M =\mathbb{1}$ , any idempotent in $\mathcal{M}_d$ has minimal polynomial $x (x-1)$ . In particular, all idempotents in $\mathcal{M}_d$ are diagonalisable, and all but $M=\mathbb{1}$ are singular. Markov idempotents constitute an interesting subset of the Markov matrices in their own right (compare [Reference Högnäs and Mukherjea16, Sec. 1.6]), but also provide a natural link between equal-input and monotone matrices. Indeed, the above two matrix classes are intimately connected, and several of their properties can be derived and understood interactively. In our presentation below, idempotents will play an important role, which is an interesting point that does not seem to have attracted enough attention in the past.

The structure of the paper is as follows. Since we build on terminology, methods, and results from [Reference Baake and Sumner1], we feel that a renewed section on preliminaries is unnecessary. While we try to make the paper as self-contained as possible, some reference to [Reference Baake and Sumner1] is inevitable to avoid duplication. Instead, we proceed by recalling and extending some important properties of equal-input matrices in Section 2 and then deriving new results on their graded semigroup structure (Proposition 2.6), which provides relevant insight into the properties of such Markov matrices. We then determine their embeddability (Proposition 2.12) and their multiplicative structure in exponential form (Theorem 2.15).

The latter can be viewed as a nontrivial, explicit version of the Baker–Campbell–Hausdorff (BCH) formula in this case; see the Wikipedia entry on the BCH formula for a good summary of this tool, which is particularly useful in the context of inhomogeneous Markov chains. Beyond the commuting case, where the BCH formula is trivial, we are not aware of many other matrix classes with such a favourable structure, and a better understanding of multiplicatively closed matrix classes is needed; compare [Reference Sumner28, Reference Sumner, Fernández-Sánchez and Jarvis29] as well as [Reference Steel27, Ch. 7].

In Section 3, we turn to the monotone matrices, first recalling some of their elementary properties and then continuing with results on embeddability (Theorem 3.8) and monotone generators (Proposition 3.11). Throughout the discussion, idempotent matrices will naturally appear, which can be explained by the intrinsic (pseudo-)Poissonian structure of infinitely divisible Markov matrices (Proposition 3.14 and Theorem 3.17). We consider the case $d=3$ in more detail in Section 4, where the embeddability can be decided completely (Proposition 4.3 and Theorem 4.4), and close with a general uniqueness result (Theorem 5.3 and Corollary 5.5) and some comments on Markov roots and limiting cases in Section 5.

2. Equal-input matrices and some of their properties

From now on, we always use C to denote a nonnegative matrix with equal rows and the parameter sum $c\geqslant 0$ , where $c=0$ then implies $C= \mathbb{0}$ . For any matrix C of this type, one has $MC = C$ for all $M\in\mathcal{M}_d$ .

2.1. Equal-input Markov matrices

Given such a matrix C, the corresponding matrix

(2.1) \begin{equation} M^{}_{C} \,:\!=\, (1-c) \mathbb{1} + C\end{equation}

is Markov precisely when $0 \leqslant c^{}_{i}$ for all i and $c\leqslant 1 + \min^{}_{ i} c^{}_i$ . It is then called an equal-input Markov matrix, and the set of all such matrices for a fixed d is denoted by $\mathcal{C}_d$ . As $C\ne \mathbb{0}$ has eigenvalues c and 0, the latter with multiplicity $d - \! 1$ , it is clear that

(2.2) \begin{equation} \det\!\big(M^{}_{C}\big) = (1 -c)^{d-1} .\end{equation}

A Markov generator Q is called equal-input if it is of the form $Q=Q^{}_{C}=C - c \mathbb{1}$ , with C as above and $c\geqslant 0$ without further restrictions. Clearly, $c=0$ means $Q=\mathbb{0}$ . Also here, we call c the summatory parameter, as it will always be clear from the context whether we are referring to a Markov matrix or to a generator. Since $C^2 = c C$ , one gets $Q^{2}_{C} = -c Q^{}_{C}$ , so any equal-input generator is diagonalisable. When $c>0$ , the matrix $\frac{1}{c} C$ is both Markov and an idempotent. As we shall see, the relation $Q^{}_{C} = M^{}_{C} - \mathbb{1}$ will become particularly important.

Fact 2.1. If $M\in \mathcal{C}_d$ , with $d\geqslant 2$ , its summatory parameter satisfies $c \in \bigl[ 0, \frac{d}{d-1}\bigr]$ . Moreover, the matrix $M\in\mathcal{C}_d$ is an idempotent if and only if $c=1$ or $c=0$ , where the latter case means $M =\mathbb{1}$ , while all other idempotents are singular.

Proof. Assume that $M = M^{}_{C}$ is Markov. Since $0 \leqslant c = c^{}_{1} + \ldots + c^{}_{d} \leqslant 1+ \min^{}_i c^{}_i$ , the maximal value of c is realised for $c^{}_{1} = \ldots = c^{}_{d} = \frac{c}{d}$ , which gives the first claim. The second follows from considering the equation

\begin{equation*} (1-c) \mathbb{1} + C = M^{}_{C} = M^{ 2}_{C} = (1-c)^2 \mathbb{1} + (2-c) C ,\end{equation*}

which is solved by $C=\mathbb{0}$ with $c=0$ or, for any $c>0$ , implies $(1-c) \bigl( C - c\, \mathbb{1} \bigr) = \mathbb{0}$ and hence $c=1$ , since $C - c\, \mathbb{1} \ne \mathbb{0}$ for an equal-row matrix with $c>0$ and $d\geqslant 2$ . This leads to the two cases stated, where $\mathbb{1}$ is the only nonsingular idempotent by (2.2).

Let us next look at some asymptotic properties. Here, one has

\begin{equation*} M^{ n}_{C} \,= \, (1-c)^{n} \mathbb{1} + \frac{1 - (1-c)^{n}}{c} \, C\end{equation*}

for $n\in \mathbb{N}_0$ and $c>0$ (so $C\ne \mathbb{0}$ ), as one verifies by induction on n, using $C^2 = c\, C$ together with $(1-c)^{n} + \frac{1-(1-c)^{n}}{c} = \frac{1-(1-c)^{n+1}}{c}$ . If $\lvert 1-c \rvert < 1$ , we see that $M^{ n}_{C}$ , as $n\to\infty$ , converges to the Markov matrix $\frac{1}{c} C$ . Adding the case with $c=0$ , but excluding from consideration $c=2$ (where convergence fails, occurring only for $d=2$ ), one can summarise as follows.

Fact 2.2. Let $d\geqslant 2$ and let C be a nonnegative matrix with equal rows and parameter sum $0 \leqslant c < 2$ . Then, if the matrix $M^{}_{C}$ from (2.1) is Markov, one has

\begin{equation*} \lim_{n\to\infty} M^{ n}_{C} = \begin{cases} \mathbb{1} , & \text{if } c=0, \\[5pt] \dfrac{1}{c} C , & \text{otherwise} , \end{cases}\end{equation*}

where all limits are idempotents. Here, the summatory parameter of $M^{ n}_{C}$ is $1-(1-c)^n$ , which is $0$ for $c=0$ , or otherwise converges to 1 as $n \to \infty$ .

Since idempotents will show up repeatedly below, we recall the following well-known property of Markov matrices, which we also prove for the reader’s convenience.

Lemma 2.3. For $M\in\mathcal{M}_d$ , the following properties are equivalent:

  1. M is a nonsingular idempotent.

  2. $1$ is the only eigenvalue of M.

  3. $M=\mathbb{1}$ .

  4. M has minimal polynomial $q(x) = x-1$ .

Proof. Clearly, M has 1 as an eigenvalue, because it is Markov. When $M^2=M$ , the only possible other eigenvalue is 0, and (1) $\Rightarrow$ (2) is clear. The implications (3) $\Rightarrow$ (4) $\Rightarrow$ (1) are immediate, and it remains to show (2) $\Rightarrow$ (3).

By [Reference Gantmacher10, Thm. 13.10], we know that the algebraic multiplicity of the eigenvalue 1 agrees with the geometric one. When no other eigenvalue exists, this means $M=\mathbb{1}$ .

The set $\mathcal{C}_d$ of equal-input matrices is important in many applications (see [Reference Baake and Sumner1, Reference Steel27] and references therein) and has interesting and revealing algebraic properties as follows.

Fact 2.4. Let C and $C^{\prime}$ be two nonnegative, equal-row matrices, with parameter sums c and $c^{\prime}$ , such that $M_C$ and $M_{C^{\prime}}$ are Markov matrices, so both lie in $\mathcal{C}_d$ . Then, one also obtains that $M = M_C M_{C^{\prime}} \in \mathcal{C}_d$ , where one has $M=M_{C^{\prime\prime}}$ with $C^{\prime\prime} = (1-c^{\prime} ) C + C^{\prime}$ and parameter sum $c^{\prime\prime} = c + c^{\prime} - c c^{\prime}$ .
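This follows from a short computation: with $C C^{\prime} = c \, C^{\prime}$ , the two terms involving $C^{\prime}$ combine to $(1-c)\, C^{\prime} + c\, C^{\prime} = C^{\prime}$ , and $(1-c)(1-c^{\prime}) = 1 - c^{\prime\prime}$ , so

\begin{equation*} M^{}_{C} M^{}_{C^{\prime}} \, = \, \bigl( (1-c) \mathbb{1} + C \bigr) \bigl( (1-c^{\prime}) \mathbb{1} + C^{\prime} \bigr) \, = \, (1-c)(1-c^{\prime})\, \mathbb{1} + (1-c^{\prime})\, C + C^{\prime} \, = \, (1 - c^{\prime\prime})\, \mathbb{1} + C^{\prime\prime} .\end{equation*}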

The Markov property of $M = M_C M_{C^{\prime}}$ implies $0 \leqslant c^{\prime\prime}_{i} \leqslant c^{\prime\prime}\leqslant 1 + c^{\prime\prime}_{i} $ for all i, which may not be obvious from the formula for $C^{\prime\prime}$ . In fact, the relation between the summatory parameters can be further analysed as follows, where we refer to [Reference Lang21] for the grading notion. We will need the group $C_2$ with two elements, here written as $\{ 1,-1 \}$ with ordinary multiplication.

Lemma 2.5. Consider $f (a,b) = a + b - a b$ for $a,b \in X \,:\!= [ 0,1) \cup (1,2 ]$ . Then, one of the following three cases applies:

  1. If $\max (a,b) < 1$ , one has $ 0 \leqslant \max (a,b) \leqslant f (a,b) < 1$ .

  2. If $\min (a,b) > 1$ , one has $ 0 \leqslant 2 - \min (a,b) \leqslant f (a,b) < 1$ .

  3. Otherwise, one has $ 1 < f (a,b) \leqslant \max (a,b) \leqslant 2 $ .

In particular, the mapping $(a,b)\mapsto f(a,b)$ turns X into a $C_2$ -graded, commutative monoid, with $0$ as the neutral element of $ X\!$ , and with the grading being induced by the two connected components of $X \!$ , for instance via the mapping $x \mapsto \textrm{sgn} (1 -x)$ for $x\in X\! $ .

Proof. Without loss of generality, we may assume $a\leqslant b$ . Also, observe that the function $x \mapsto x (2-x)$ , on [0, 2], has a unique maximum at $x=1$ , with value 1, so $x (x-1) < 1$ holds for all $x\in [ 0,1) \cup (1,2 ]$ . Now, we can look at the three cases as follows.

When $0\leqslant a \leqslant b < 1$ , where $1-b$ is positive, we obtain the estimate

\begin{equation*} 0 \, \leqslant \, b \, \leqslant \, b + (1-b) a = f (a,b) \, < \, b + (1-b) = 1 ,\end{equation*}

while $1 < a \leqslant b \leqslant 2$ , where $1-a$ is negative, leads to

\begin{equation*} 0 \, \leqslant \ 2-a = a + 2 (1-a) \, \leqslant \, a + b (1-a) = f (a,b) \, \leqslant \, a + a (1-a) = a (2-a) \, < \, 1 .\end{equation*}

For the remaining case, it suffices to consider $0 \leqslant a < 1 < b \leqslant 2$ , which gives

\begin{equation*} 1 = b + (1-b) \, < \, b + a (1-b) = f (a,b) \, \leqslant \, a + b - a = b \, \leqslant \, 2 ,\end{equation*}

from which the claims (1)–(3) follow.

Since $f (\, f(a,b),c ) = a + b + c - ab - ac - bc + abc = f ( a, f(b,c))$ , associativity of the mapping $(a,b)\mapsto f(a,b)$ is clear, and the $C_2$ -graded monoid structure is now obvious.
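For instance, $f \bigl( \tfrac{1}{2}, \tfrac{1}{3} \bigr) = \tfrac{2}{3}$ , $f \bigl( \tfrac{3}{2}, \tfrac{4}{3} \bigr) = \tfrac{5}{6}$ , and $f \bigl( \tfrac{1}{2}, \tfrac{3}{2} \bigr) = \tfrac{5}{4}$ illustrate cases (1), (2), and (3), respectively, as does the identity $1 - f(a,b) = (1-a)(1-b)$ , which gives the sign rule $\textrm{sgn}\bigl(1 - f(a,b)\bigr) = \textrm{sgn} (1-a) \, \textrm{sgn} (1-b)$ behind the grading.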

Proposition 2.6. The set $\mathcal{C}_d$ is a monoid under matrix multiplication, with the subset of nonsingular elements forming a submonoid. The latter is $C_2$ -graded by $ \textrm{sgn} (1 -c)$ , where c is the summatory parameter, which matches with the grading of X from Lemma 2.5. When d is even, the same grading emerges from the sign of the determinant.

Proof. The semigroup property follows from Fact 2.4, and $\mathbb{1} \in \mathcal{C}_d$ shows that $\mathcal{C}_d$ is a monoid. The nonsingular matrices, which include $\mathbb{1}$ , are closed under multiplication.

The formula for the summatory parameter of a product from Fact 2.4, in conjunction with Lemma 2.5, implies $c^{\prime\prime}\in [ 0,1)$ when c and $c^{\prime}$ are either both in [0, 1) or both in $\bigl(1, \frac{d}{d-1}\bigr]$ , where the ranges follow from Fact 2.1. Likewise, $c^{\prime\prime} > 1$ if and only if $c < 1 < c^{\prime}$ or $c^{\prime} < 1 < c$ . Together, these observations provide the claimed $C_2$ -grading.

For even d, by Equation (2.2), the sign of $1 -c$ matches the sign of the determinant.

Remark 2.7. Given $d\geqslant 2$ , one can consider $ X_d \,:\!=\,\big\{ x\in X \,:\, x \leqslant \frac{d}{d-1} \big\}$ , which defines a submonoid of ${X}$ which is again $C_2$ -graded. We then have two successive monoid homomorphisms, namely

\begin{equation*} \mathcal{C}_d \, \xrightarrow{\quad} \, X_d \, \xrightarrow{\quad} \, C_2 ,\end{equation*}

which summarises the grading structure. Let us mention in passing that the grading can be extended to include the singular matrices (that is, those with $c=1$ ), and thus cover all of $\mathcal{C}_d$ , by employing the semigroup $\{ -1, 0 , 1\}$ instead of $C_2$ .

Next, we consider the set $\mathcal{C}_d$ in a little more detail. We begin with the closed subset (and semigroup) of nontrivial idempotents,

\begin{equation*} \mathcal{C}_{d,1} \,:\!=\, \big\{ M \in \mathcal{C}_d \,:\, c = 1 \big\} = \big\{ M\in \mathcal{C}_d \,:\, M^2 = M \ne \mathbb{1} \big\} = \big\{ M \in \mathcal{C}_d \,:\, \det\!(M) = 0 \big\} ,\end{equation*}

where the alternative characterisations immediately follow from Fact 2.1 and Equation (2.2). Now, let $E_i \in \textrm{Mat} (d,\mathbb{R})$ denote the matrix with 1 in all positions of column i and 0 everywhere else, which is an idempotent Markov matrix. In fact, these matrices satisfy

(2.3) \begin{equation} E_i E_j = E_j \quad \text{for all } 1 \leqslant i,j \leqslant d ,\end{equation}

and it is easy to see that any convex combination $M = \sum_{i=1}^{d} \beta_{i} E_{i}$ , where all $\beta_{i} \geqslant 0$ subject to $\beta^{}_{1} + \ldots + \beta^{}_{d} = 1$ , is an idempotent as well. For $d\geqslant 2$ , we also define the rational matrix

(2.4) \begin{equation} G^{}_{ d} = \frac{1}{d-1} \left( \begin{pmatrix} 1 & \quad \cdots & \quad 1 \\[3pt] \vdots & \quad \ddots & \quad \vdots \\[3pt] 1 & \quad \cdots & \quad 1 \end{pmatrix} - \mathbb{1} \right) = \frac{1}{d-1} \bigl( E^{}_{1} + \ldots + E^{}_{d} - \mathbb{1} \bigr) ,\end{equation}

which is the unique element of $\mathcal{C}_d$ with maximal summatory parameter, $c=\frac{d}{d-1}$ .

Lemma 2.8. The sets $\mathcal{C}_d$ and $\mathcal{C}_{d,1}$ are convex. When $d \geqslant 2$ , the extremal elements of $\mathcal{C}_{d,1}$ are the idempotent matrices $E_1 , \ldots , E_d$ , which remain extremal in $\mathcal{C}_d$ . There are two further extremal elements in $\mathcal{C}_d$ , namely $\mathbb{1}$ and $G_d$ .

Proof. The convexity of $\mathcal{C}_d$ follows from

\begin{equation*} \alpha M^{}_{C} + (1-\alpha) M^{}_{C^{\prime}} = \bigl( 1 - \big(\alpha c + (1-\alpha) c^{\prime} \big)\big) \mathbb{1} + \alpha C + (1-\alpha) C^{\prime}\end{equation*}

for $\alpha \in [ 0,1]$ together with the linearity of the summatory parameter. The convexity of the subset $\mathcal{C}_{d,1}$ is then obvious because these are the elements with $c=1$ . Clearly, the convex combinations of the d matrices $E_1, \ldots , E_d$ span $\mathcal{C}_{d,1}$ . Since they are linearly independent, they must be extremal.

Let $M\in \mathcal{C}_d$ , so $M = M^{}_{C}$ , where C has equal rows $\big(c^{}_{1}, \ldots , c^{}_{d}\big)$ with $c^{}_i \geqslant 0$ and parameter sum $c \in \bigl[ 0, \frac{d}{d-1} \bigr]$ , subject to the condition $c \leqslant 1 + c^{}_{\min}$ with $c^{}_{\min} = \min^{}_{ i} c^{}_i$ . We will now show that M is of the form $r \mathbb{1} + s G^{}_{ d} + \sum_i t^{}_i E^{}_i$ for some $r,s,t_i \geqslant 0$ that sum to 1.

If $c\in [ 0,1]$ , simply choose $s=0$ , $t^{}_i = c^{}_i$ , and $r=1-c$ , which does the job. If $c>1$ , we have $c^{}_{\min}>0$ in our setting. Choose $s=(d-1) c^{}_{\min}$ and $t^{}_{i} = c^{}_{i} - c^{}_{\min}$ , where $s\geqslant 0$ and all $t^{}_{i} \geqslant 0$ by construction. Then $s+ \sum_i t^{}_{i} = c - c^{}_{\min} \leqslant 1$ , so we can complete this with the choice $r = 1 - s - \sum_i t^{}_{i} \geqslant 0$ . It is easy to check that this gives a convex combination with summatory parameter c, and also that the sum equals M.

Consequently, the compact set $\mathcal{C}_d$ is the convex hull of $\big\{ \mathbb{1}, G^{}_{ d}, E^{}_{1} , \ldots , E^{}_{d} \big\}$ , and hence is the smallest convex set that contains these $d+2$ matrices. In view of the classic Krein–Milman theorem [Reference Webster30, Thm. 2.6.16], it remains to show that they all are extremal. This is clear for $\mathbb{1}$ , where $c=0$ , and for $G^{}_{ d}$ , which is the only matrix in $\mathcal{C}_d$ with $c=\frac{d}{d-1}$ . Since the $E_i$ are linearly independent, but all have $c=1$ , none can be replaced by a convex combination of the other matrices (including $\mathbb{1}$ and $G^{}_{ d}$ ), which completes the argument.
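For a concrete instance of this decomposition, take $d=3$ and equal rows $\bigl( \tfrac{1}{2}, \tfrac{2}{5}, \tfrac{3}{10} \bigr)$ , so $c = \tfrac{6}{5} > 1$ and $c^{}_{\min} = \tfrac{3}{10}$ . The above choices give $s = \tfrac{3}{5}$ , $\bigl(t^{}_{1}, t^{}_{2}, t^{}_{3}\bigr) = \bigl( \tfrac{1}{5}, \tfrac{1}{10}, 0 \bigr)$ , and $r = \tfrac{1}{10}$ , hence

\begin{equation*} M^{}_{C} \, = \, \tfrac{1}{10}\, \mathbb{1} + \tfrac{3}{5}\, G^{}_{3} + \tfrac{1}{5}\, E^{}_{1} + \tfrac{1}{10}\, E^{}_{2} ,\end{equation*}

which is a convex combination with summatory parameter $\tfrac{3}{5} \cdot \tfrac{3}{2} + \tfrac{1}{5} + \tfrac{1}{10} = \tfrac{6}{5}$ , as it must be.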

Example 2.9. For $d=2$ , all Markov matrices are of equal-input type. The four extremal elements of $\mathcal{C}_2=\mathcal{M}_2$ are given by

\begin{equation*} \mathbb{1} = \begin{pmatrix} 1 & \quad 0 \\[3pt] 0 & \quad 1 \end{pmatrix} , \quad G^{}_{2} = \begin{pmatrix} 0 & \quad 1 \\[3pt] 1 & \quad 0 \end{pmatrix} , \quad E^{}_{1} = \begin{pmatrix} 1 & \quad 0 \\[3pt] 1 & \quad 0 \end{pmatrix} , \quad\text{and}\quad E^{}_{2} = \begin{pmatrix} 0 & \quad 1 \\[3pt] 0 & \quad 1 \end{pmatrix} ,\end{equation*}

where the two Markov matrices with $c=1$ span $\mathcal{C}_{2,1}$ ; see [Reference Baake and Sumner1, Fig. 1] for an illustration.

The situation is a little more interesting for $d=3$ , where $\mathcal{C}_3 \subsetneq \mathcal{M}_3$ , and one has the five extremal elements

\begin{equation*} \mathbb{1} , \quad G^{}_{3} = \frac{1}{2} \begin{pmatrix} 0 & \quad 1 & \quad 1 \\[3pt] 1 & \quad 0 & \quad 1 \\[3pt] 1 & \quad 1 & \quad 0 \end{pmatrix} , \quad E^{}_{1} , \quad E^{}_{2} , \quad\text{and}\quad E^{}_{3} ,\end{equation*}

part of which will reappear in Table 1 below.

2.2. Equal-input generators and embeddability

If Q is an equal-input generator with summatory parameter c, so $Q=C-c \mathbb{1}$ , its exponential is

(2.5) \begin{equation} \textrm{e}^Q = \mathbb{1} + \frac{1-\textrm{e}^{-c}}{c} Q = \frac{1-\textrm{e}^{-c}}{c} C + \textrm{e}^{-c} \mathbb{1} ,\end{equation}

with $C=\mathbb{0}$ for $c=0$ , so the summatory parameter of $\textrm{e}^Q$ is always given by $\tilde{c} = 1 - \textrm{e}^{-c}$ . For embeddability, one has the following well-known result; see [Reference Davies5, Reference Kingman20] for background.

Lemma 2.10. The Markov matrix $M = \left( \small\begin{array}{c@{\quad}c} 1-a & a \\ b & 1-b \end{array} \right)$ with $a,b\in [ 0,1]$ is embeddable if and only if $\det\!(M) > 0$ , which is equivalent to the condition $0 \leqslant a+b <1$ . In this case, there is precisely one generator Q such that $M=\textrm{e}^Q$ , namely

\begin{equation*}Q = - \frac{\log(1-a-b)}{a+b} \bigl( M - \mathbb{1} \bigr),\end{equation*}

which is an equal-input generator.

Proof. The first statement is Kendall’s theorem; see [Reference Baake and Sumner1, Thm. 3.1] for a complete formulation. The uniqueness claim is established in [Reference Baake and Sumner1, Eq. (5) and Cor. 3.3].

Put differently, since $\mathcal{C}_2=\mathcal{M}_2$ , a matrix $M\in\mathcal{M}_2$ is embeddable if and only if its summatory parameter satisfies $0\leqslant c < 1$ . The closure of the set of embeddable matrices in $\mathcal{M}_2$ consists of all infinitely divisible elements of $\mathcal{M}_2$ , as we shall discuss in more detail later, in Theorem 3.8 and Example 3.15.

Remark 2.11. The equation $\textrm{e}^{ x} = 1$ has precisely one solution $x\in\mathbb{R}$ , namely $x=0$ . In contrast, $\textrm{e}^A = \mathbb{1}$ with $A\in\textrm{Mat} (2,\mathbb{R})$ has already infinitely many solutions, including $A = n \left( \small\begin{array}{c@{\quad}c} 0 & - 2 \pi \\ 2 \pi & 0 \end{array} \right)$ with $n\in\mathbb{Z}$ . Restricting $A$ to real matrices with zero row sums restores uniqueness, because one eigenvalue of $A$ is then 0, hence also the second, by the spectral mapping theorem (SMT), as $A$ is real. Since $A$ must be diagonalisable by [Reference Baake and Sumner1, Fact 2.15], $A=\mathbb{0}$ is the only solution.

Whenever $A = (a_{ij})^{}_{1\leqslant i,j \leqslant d}$ is a Markov generator with $\textrm{e}^A = \mathbb{1}$ , for arbitrary d, one has $1 = \det\!\big(\textrm{e}^A\big) = \textrm{e}^{\textrm{tr} (A)}$ and thus $0 = \textrm{tr} (A) = - \sum_{i\ne j} a_{ij}$ . With $a_{ij}\geqslant 0$ for all $i\ne j$ by the generator property, this gives $a_{ij} = 0$ for all $i\ne j$ , hence also $a_{ii}=0$ for all i. Consequently, $A=\mathbb{0}$ is the only generator with $\textrm{e}^A = \mathbb{1}$ . However, already for $d=3$ , there are further solutions of $\textrm{e}^A = \mathbb{1}$ among the real matrices with zero row sums, which is one reason why the embedding problem becomes significantly more complicated for $d\geqslant 3$ .
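One such further solution, for $d=3$ , is provided by the skew-symmetric matrix

\begin{equation*} A \, = \, \frac{2 \pi}{\sqrt{3}} \begin{pmatrix} 0 & \quad 1 & \quad -1 \\[3pt] -1 & \quad 0 & \quad 1 \\[3pt] 1 & \quad -1 & \quad 0 \end{pmatrix} ,\end{equation*}

which has zero row sums and spectrum $\{ 0, \pm 2 \pi \textrm{i} \}$ , hence satisfies $\textrm{e}^A = \mathbb{1}$ , but fails to be a generator due to its negative off-diagonal entries.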

In general, if Q is an equal-input generator, then so is $\frac{1}{n} Q$ , for every $n\in\mathbb{N}$ . Now, we can reformulate results from [Reference Baake and Sumner1] and combine them with Kingman’s characterisation of embeddability via regularity in conjunction with infinite divisibility [Reference Kingman20, Prop. 7]. As this is compatible with the equal-input structure, we can summarise the general situation as follows.

Proposition 2.12. When d is even, $M \in \mathcal{C}_d$ is embeddable if and only if $0 \leqslant c < 1$ . When $d\geqslant 3$ is odd, there are further embeddable cases with $c>1$ .

For arbitrary d and $M\in \mathcal{C}_d$ , the following properties are equivalent:

  1. M has positive spectrum.

  2. M is embeddable via an equal-input generator.

  3. M is nonsingular and infinitely divisible within $\mathcal{C}_d$ .

  4. The summatory parameter of M satisfies $0\leqslant c < 1$ .

Note that, for $M \in\mathcal{C}_d$ with summatory parameter c, the even/odd dichotomy with the dimension emerges from Equation (2.2). A concrete example of an embeddable matrix $M\in\mathcal{C}_3$ with $c>1$ is discussed in [Reference Davies5, Ex. 16] and [Reference Baake and Sumner1, Ex. 4.3]. This M is infinitely divisible within $\mathcal{M}_3$ , but not within $\mathcal{C}_3$ . In fact, since $\sqrt[n \, ]{M}$ has spectrum $\Big\{ 1, \exp \Big( \frac{-\pi\sqrt{3}}{n} \pm \frac{\textrm{i} \pi}{n}\Big) \Big\}$ in this case, M does not possess an nth root of equal-input type for any $n\geqslant 2$ .

Example 2.13. Within $\mathcal{C}_d$ lies the submonoid of constant-input matrices [Reference Baake and Sumner1, Rem. 4.8], which all are of the form $M_c \,:\!=\,\mathbb{1} + c J_d$ with $0\leqslant c \leqslant \frac{d}{d-1}$ , where $J_d = \frac{d-1}{d} (G_d - \mathbb{1})$ with $G_d$ from (2.4) is a constant-input generator with summatory parameter 1, and hence $J^{2}_{d} = - J^{}_{d} $ . Clearly $J_d$ , as well as every constant-input matrix, is diagonalisable. If $c\in [ 0,1)$ , the spectral radius of $c J_d$ is $c<1$ , and a simple calculation with $\log (\mathbb{1} + c J_d)$ gives

\begin{equation*} M_c \,:\!=\, \mathbb{1} + c J_d = \exp\bigl( -\log (1-c) J_d \bigr) .\end{equation*}
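Indeed, since $J^{\, n}_{d} = (-1)^{n-1} J^{}_{d}$ for all $n\in\mathbb{N}$ , the logarithm series gives, for $c \in [ 0,1)$ ,

\begin{equation*} \log\!\bigl( \mathbb{1} + c J^{}_{d} \bigr) \, = \, \sum_{n\geqslant 1} \frac{(-1)^{n+1}}{n} \, c^{n} J^{\, n}_{d} \, = \, \Bigl( \, \sum_{n\geqslant 1} \frac{c^{n}}{n} \Bigr) J^{}_{d} \, = \, - \log (1-c) \, J^{}_{d} .\end{equation*}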

For d even, by Proposition 2.12, no constant-input Markov matrix with $c>1$ can be embeddable, while this changes for $d\geqslant 3$ odd.

Assume that d is odd and $M_c$ with $c>1$ is embeddable, so $M_c=\textrm{e}^Q$, where we also have $[J_d, Q ] = \mathbb{0}$ . So, by [Reference Baake and Sumner1, Lemma 4.10 and Fact 2.15], Q is doubly stochastic and diagonalisable. As the eigenvalues of $M_c$ are 1 and $1-c<0$ , the latter with multiplicity $d-1$ , the spectrum of Q cannot be real. In particular, Q is not symmetric. Still, for any $a\in [ 0,1)$ , we get

\begin{equation*} M_a M_c = \textrm{e}^{-\log (1-a) J_d} \textrm{e}^{Q} = \textrm{e}^{Q - \log (1-a) J_d},\end{equation*}

and $M_{f (a,c)}$ is embeddable as well, where f is the function from Lemma 2.5.

When $c>1$ is fixed and a varies in [0, 1), f(a, c) runs through (1, c]. Now, the infinitely divisible elements of $\mathcal{M}_d$ form a closed subset, as follows by a standard compactness argument via convergent subsequences. Since $M_c$ with $c>1$ is nonsingular, we see that there is a number $c^{}_{\max} \in (1,2 ]$ such that $M_c$ is embeddable precisely for all $c \in [ 0,1 ) \cup \big( 1, c^{}_{\max} \big]$ . These constant-input matrices form a monoid that inherits the $C_2$ -grading from Proposition 2.6. For the case $d=3$ , we know from [Reference Baake and Sumner1, Cor. 6.6] that $c^{}_{\max} = 1+\textrm{e}^{-\pi \sqrt{3}}$ . The determination of $c^{}_{\max}>1$ for $d=2m+1$ with $m\geqslant 2$ is an interesting open question.

When $d\geqslant 3$ , the embedding of $M\in\mathcal{C}_d$ need no longer be unique as for $d=2$ , but one still has the following property.

Lemma 2.14. Let $d\geqslant 2$ and let $M\in\mathcal{C}_{d}$ be embeddable. If M admits a representation of the form $M=\textrm{e}^Q$ with a generator Q of equal-input type, the latter is unique in the sense that no other embedding can have an equal-input generator.

Proof. The claim is obvious for $M=\mathbb{1}$ , where $Q=\mathbb{0}$ is the only generator that solves $\mathbb{1} = \textrm{e}^Q$ ; compare Remark 2.11. Next, let Q and $Q^{\prime}$ be equal-input generators, with summatory parameters c and $c^{\prime}$ , where we may now assume that $c c^{\prime} >0$ . If $\textrm{e}^Q = \textrm{e}^{Q^{\prime}}\!$ , Equation (2.5) implies

\begin{equation*} \textrm{e}^{-c} \mathbb{1} + \frac{1-\textrm{e}^{-c}}{c} \, C = \textrm{e}^{-c^{\prime}} \mathbb{1} + \frac{1-\textrm{e}^{-c^{\prime}}}{c^{\prime}} \, C^{\prime} .\end{equation*}

As both C and $C^{\prime}$ are equal-row matrices, this can only hold when $ \textrm{e}^{-c} = \textrm{e}^{-c^{\prime}} $ ; hence $c=c^{\prime}$ , which in turn forces $C=C^{\prime}$ and thus $Q=Q^{\prime} $ as claimed.

When $M\in\mathcal{C}_d$ is equal-input embeddable, so $M=M^{}_{C}$ for some nonnegative matrix C with parameter sum $0\leqslant c < 1$ by Proposition 2.12, the unique generator from Lemma 2.14 is given by

(2.6) \begin{equation} Q = - \frac{\log (1-c)}{c} \, \big(M^{}_{C} - \mathbb{1} \big) ,\end{equation}

meaning $Q=\mathbb{0}$ for $c=0$ , which is a nice extension of Lemma 2.10. The derivation rests on the observation that $c<1$ is the spectral radius of $M^{}_{C} - \mathbb{1}$ , which permits the use of the standard branch of the matrix logarithm and its power series.
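Explicitly, since $\bigl( M^{}_{C} - \mathbb{1} \bigr)^{n} = (-c)^{n-1} \bigl( M^{}_{C} - \mathbb{1} \bigr)$ for all $n \in \mathbb{N}$ , the series of the standard matrix logarithm collapses to

\begin{equation*} \log\!\bigl( M^{}_{C} \bigr) \, = \, \sum_{n\geqslant 1} \frac{(-1)^{n+1}}{n} \bigl( M^{}_{C} - \mathbb{1} \bigr)^{n} \, = \, \Bigl( \, \sum_{n\geqslant 1} \frac{c^{\, n-1}}{n} \Bigr) \bigl( M^{}_{C} - \mathbb{1} \bigr) \, = \, - \frac{\log (1-c)}{c} \, \bigl( M^{}_{C} - \mathbb{1} \bigr) ,\end{equation*}

which is the generator from (2.6) because $M^{}_{C} - \mathbb{1} = C - c\, \mathbb{1}$ is an equal-input generator and the prefactor is positive for $c \in (0,1)$ .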

Let us next expand on an observation made in [Reference Baake and Sumner1], in the context of an effective BCH formula for embeddable equal-input matrices. Here, one considers products of exponentials of equal-input generators, including the complicated case of noncommuting ones.

Theorem 2.15. Let Q and $Q^{\prime}$ be equal-input generators, with summatory parameters c and $c^{\prime}$ , respectively. Then, one has $\textrm{e}^{Q} \textrm{e}^{Q^{\prime}} = \textrm{e}^{Q^{\prime\prime}}\!$ with

\begin{equation*} Q^{\prime\prime} = \frac{c+c^{\prime}}{c\bigl(1-\textrm{e}^{-(c+c^{\prime})}\bigr)} \bigg( \textrm{e}^{-c^{\prime}} \big(1-\textrm{e}^{-c}\big) Q + \frac{c}{c^{\prime}} \Big(1-\textrm{e}^{-c^{\prime}}\Big) Q^{\prime} \bigg) ,\end{equation*}

interpreted appropriately for $c=0$ or $c^{\prime}=0$ , where Q′′ is again an equal-input generator. In particular, when $\big[Q, Q^{\prime} \big] =\mathbb{0}$ , the formula simplifies to $Q^{\prime\prime} = Q + Q^{\prime} $ .

Proof. Let C and $C^{\prime}$ be the constant-row matrices underneath Q and $Q^{\prime}$ . Using the second identity from Equation (2.5) in conjunction with the relation $C C^{\prime} = c \, C^{\prime} \!$ , one finds

(2.7) \begin{equation} \textrm{e}^Q \textrm{e}^{Q^{\prime}} = \, \textrm{e}^{-(c + c^{\prime} )} \mathbb{1} + \frac{\textrm{e}^{-c^{\prime}} (1-\textrm{e}^{-c})}{c} \, C + \frac{1-\textrm{e}^{-c^{\prime}}}{c^{\prime}} \, C^{\prime} .\end{equation}

Since the summatory parameters of $\textrm{e}^Q$ and $\textrm{e}^{Q^{\prime}}$ are $\tilde{c} = 1 - \textrm{e}^{-c}$ and $\tilde{c}^{\prime} = 1 - \textrm{e}^{-c^{\prime}}$ , which both lie in [0, 1), the product $\textrm{e}^Q \textrm{e}^{Q^{\prime}}$ is an equal-input matrix that has the summatory parameter $\tilde{c}^{\prime\prime} \in [ 0,1)$ by Lemma 2.5. As such, it is equal-input embeddable by Proposition 2.12; see also [Reference Baake and Sumner1, Thm. 4.6]. Consequently, there exists an equal-input generator $Q^{\prime\prime} \! = C^{\prime\prime} - c^{ \prime \prime} \mathbb{1}$ with

(2.8) \begin{equation} \textrm{e}^Q \textrm{e}^{Q^{\prime}} = \textrm{e}^{Q^{\prime\prime}} = \, \textrm{e}^{-c^{\prime\prime}} \mathbb{1} + \frac{1-\textrm{e}^{-c^{\prime\prime}}}{c^{\prime\prime}} \, C^{\prime\prime} ,\end{equation}

where the last step follows once more from (2.5).

A comparison of (2.7) and (2.8) reveals that equality can only hold when the summatory parameters satisfy $c^{\prime\prime} = c + c^{\prime}$ , which then gives

\begin{equation*} C^{\prime\prime} = \, \frac{c+c^{\prime}}{1-\textrm{e}^{-(c+c^{\prime})}} \left( \frac{\textrm{e}^{-c^{\prime}} (1-\textrm{e}^{-c})}{c} \, C + \frac{1-\textrm{e}^{-c^{\prime}}}{c^{\prime}} \, C^{\prime} \right) .\end{equation*}

Inserting $C = Q + c \mathbb{1}$ and the analogous terms for $C^{\prime}$ and $C^{\prime\prime}$ leads to the formula stated.

The condition $\big[Q, Q^{\prime} \big]=\mathbb{0}$ , which includes the case that one generator is $\mathbb{0}$ , is equivalent to $c^{\prime} C = c\, C^{\prime}$ , which also gives $c^{\prime} Q = c \, Q^{\prime}$ . Inserting this into the formula for Q ′′ produces the claimed simplification after a short calculation.

In Theorem 2.15, the summatory parameters of $\textrm{e}^Q$ , $\textrm{e}^{Q^{\prime}}$ , and $\textrm{e}^{Q^{\prime\prime}}$ are always related by

\begin{equation*} \tilde{c}^{\prime\prime} = \, \tilde{c} + \tilde{c}^{\prime} - \tilde{c} \tilde{c}^{\prime} = f \bigl( \tilde{c} , \tilde{c}^{\prime} \bigr) = 1 - \textrm{e}^{-(c + c^{\prime})} \, < \, 1 ,\end{equation*}

in accordance with Fact 2.4 and Lemma 2.5. Various other aspects of equal- and constant-input Markov matrices have been discussed in [Reference Baake and Sumner1], although without consideration of their connection with idempotents. We rectify this omission now by exploring some ideas in this direction that will prove useful later.

2.3. Markov idempotents and equal-input matrices

When $d=2$ , the only Markov idempotents are $\mathbb{1}$ and all the equal-input matrices $\alpha E^{}_{1} + (1-\alpha) E^{}_{2}$ with $\alpha \in [ 0,1]$ . This situation is deceptively simple, as one realises already for $d=3$ . Still, some general considerations are possible, some of which can be considered as a refinement of Fact 2.1.

Lemma 2.16. Let $M = (m^{}_{ij})^{}_{1\leqslant i,j\leqslant d}$ be a Markov idempotent that is also a positive matrix, so $m^{}_{ij} > 0$ for all $ i,j \in [d ]$ . Then, M is equal-input with $M =C$ for some positive equal-row matrix C with parameter sum $c=1$ .

Proof. Let $m^{}_{ij}$ be a maximal element in column j of M, so $m^{}_{kj} \leqslant m^{}_{ij}$ holds for all $k \in [d ]$ . Then, with $M^2 = M$ , we get the inequality

(2.9) \begin{equation} m^{}_{ij} = \bigl( M^2 \bigr)_{ij} \, = \sum_{k=1}^{d} m^{}_{ik} \, m^{}_{kj} \, \leqslant \, m^{}_{ij} \sum_{k=1}^{d} m^{}_{ik} = m^{}_{ij} ,\end{equation}

where we actually have equality. When all matrix elements are positive, this is only possible if $m^{}_{kj} = m^{}_{ij}$ holds for all $k \in [d ]$ , which means that column j of M is constant.

Since $j \in [d ]$ was arbitrary, the above argument applies to any column of M, and we obtain $M=c^{}_{1} E^{}_{1} + \ldots + c^{}_{d} E^{}_{d}$ , where all $c^{}_{i} > 0$ with $c = c^{}_{1} + \ldots + c^{}_{d} = 1$ because M is Markov.

The statement of Lemma 2.16 can also be understood via the Perron–Frobenius theorem, as M under the assumed conditions is primitive. Then, $M^{}_{\infty} = \lim_{n\to\infty} M^n = M$ is the projector to the unique equilibrium vector of M, which is $\big(c^{}_{1}, \ldots , c^{}_{d}\big)$ . Note that, for an idempotent M, primitivity is equivalent to positivity.

When M fails to be positive, there are further cases. These are driven by the added possibility of having equality in (2.9) due to the presence of vanishing matrix elements.

Example 2.17. Let us look at Markov idempotents for $d=3$ . When $M^2=M$ is positive, the complete answer is provided by Lemma 2.16, so we only need to analyse cases with zero entries. Since the set of idempotents within $\mathcal{M}_d$ is closed, it clearly contains the simplex

\begin{equation*} \big\{ c^{}_{1} E^{}_{1} + c^{}_{2} E^{}_{2} + c^{}_{3} E^{}_{3} \,:\, c^{}_{i} \geqslant 0 \text{ and } c^{}_{1} + c^{}_{2} + c^{}_{3} = 1 \big\} ,\end{equation*}

in line with Fact 2.1. This simplex includes the three $\{ 0,1 \}$ matrices $E^{}_{1}$ , $E^{}_{2}$ , and $E^{}_{3}$ , which are its extremal elements. Each matrix in this simplex has a unique equilibrium vector.

Further, one finds that the matrices

\begin{equation*} \begin{pmatrix} 1 & \quad 0 & \quad 0 \\[3pt] 0 & \quad a & \quad 1 - a \\[3pt] 0 & \quad a & \quad 1 - a \end{pmatrix} , \quad \begin{pmatrix} a & \quad 0 & \quad 1 - a \\[3pt] 0 & \quad 1 & \quad 0 \\[3pt] a & \quad 0 & \quad 1 - a \end{pmatrix} , \quad\text{and}\quad \begin{pmatrix} a & \quad 1 - a & \quad 0 \\[3pt] a & \quad 1 - a & \quad 0 \\[3pt] 0 & \quad 0 & \quad 1 \end{pmatrix}\end{equation*}

are idempotents for all $a\in [ 0,1]$ . For each matrix, its equilibrium vectors form a 1-simplex, hence with two extremal vectors. By choosing $a=0$ or $a=1$ , we obtain six further $\{ 0,1 \}$ matrices. So far, all nine of them are equal-input or blockwise equal-input, possibly after a state permutation, and the only missing $\{ 0,1 \}$ Markov idempotent is $M=\mathbb{1}$ .

Next, again for any $a\in [ 0,1]$ , the matrices

\begin{equation*} \begin{pmatrix} 1 & \quad 0 & \quad 0 \\[3pt] 0 & \quad 1 & \quad 0 \\[3pt] a & \quad 1 - a & \quad 0 \end{pmatrix} , \quad \begin{pmatrix} 1 & \quad 0 & \quad 0 \\[3pt] a & \quad 0 & \quad 1 - a \\[3pt] 0 & \quad 0 & \quad 1 \end{pmatrix} , \quad\text{and}\quad \begin{pmatrix} 0 & \quad a & \quad 1 - a \\[3pt] 0 & \quad 1 & \quad 0 \\[3pt] 0 & \quad 0 & \quad 1 \end{pmatrix}\end{equation*}

are Markov idempotents. They do not produce new $\{ 0,1 \}$ matrices. Also, for $0<a<1$ , they are not of equal-input form, not even blockwise, which means that more complicated cases do occur. We leave it as an exercise to the interested reader to verify that we have covered all cases for $d=3$ , and to analyse them further.

At this point, it seems worthwhile to recall the structure of general Markov idempotents, where we follow [Reference Högnäs and Mukherjea16, Sec. 1.6]. Given any idempotent $M\in\mathcal{M}_d$ of rank r, where we must have $1\leqslant r \leqslant d$ , there is a partition of $[d] = \{ 1, 2, \ldots, d \}$ of the form

(2.10) \begin{equation} [d] = Z \,{\dot\cup}\, K^{}_{1} \,{\dot\cup}\, \ldots \, {\dot\cup} \, K^{}_{r}\end{equation}

with $Z = \{ \ell \,:\, \text{the}\, {\ell}\text{th column of}\,M\,\text{is } 0 \}$ . Without any further specification of the $K_i$ , we know from the definition of Z that the matrix elements of M satisfy $m^{}_{ij} = 0$ for all $i\in [d]$ and every $j\in Z$ . Further, given some subset $K \subseteq [d]$ , we follow standard notation and use $M\big|_{K\times K}$ for the restriction of M to indices from K. Now, we can reformulate [Reference Högnäs and Mukherjea16, Thm. 1.16] as follows.

Theorem 2.18. Let $M = (m^{}_{ij})^{}_{1\leqslant i,j \leqslant d}\in \mathcal{M}_d$ be an idempotent of rank r. Then, there is a partition of $[d]$ as in (2.10) with the following properties:

  1. For all $s\in [r]$ , one has $m^{}_{ij} = 0$ for all $i\in K_s$ and $j \in [d]\setminus K_s$ .

  2. For all $s\in[r]$ , the restriction $M{\big |}_{K_s \times K_s}$ is a Markov matrix with equal, positive rows.

  3. For all $i \in Z$ and $s\in[r]$ , and then every $k,\ell \in K_s$ , one has $ m^{}_{ik} m^{}_{k \ell} = m^{}_{i \ell} m^{}_{kk}$ , where $m^{}_{kk} m^{}_{k\ell} \ne 0$ by (2).

Conversely, every Markov matrix $M\in\mathcal{M}_d$ with a partition of $[d]$ as in (2.10) with these three properties is an idempotent.
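To see property (3) at work, consider $d=3$ with $Z = \{ 3 \}$ , $r=1$ , $K^{}_{1} = \{ 1,2 \}$ , and

\begin{equation*} M \, = \, \begin{pmatrix} a & \quad 1 - a & \quad 0 \\[3pt] a & \quad 1 - a & \quad 0 \\[3pt] x & \quad 1 - x & \quad 0 \end{pmatrix} \quad \text{with } a \in (0,1) \text{ and } x \in [ 0,1] .\end{equation*}

Properties (1) and (2) hold for every such x, while property (3), with $i=3$ , $k=1$ , and $\ell = 2$ , reads $x (1-a) = (1-x)\, a$ and thus forces $x = a$ ; this matches the direct observation that $M^2 = M$ holds precisely for $x=a$ .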

Specialising this classification result to $\{ 0, 1 \}$ Markov matrices gives the following consequence, the explicit derivation of which we leave to the interested reader. It can also be derived directly by a careful analysis of $\{ 0, 1\}$ Markov matrices.

Corollary 2.19. Possibly after an appropriate state permutation, any idempotent $\{ 0,1 \}$ Markov matrix appears in a block form where each block is an equal-input matrix with a single column of $1$ s.

Simple examples, with an indication of the blocks according to Theorem 2.18(2), include

\begin{equation*} \begin{pmatrix} \boxed{1} & \quad 0 & \quad 0 \\[4pt] 1 & \quad 0 & \quad 0 \\[4pt] 0 & \quad 0 & \quad \boxed{1} \end{pmatrix} , \quad \begin{pmatrix} 0 & \quad 1 & \quad 0 \\[4pt] 0 & \quad \boxed{1} & \quad 0 \\[4pt] 0 & \quad 0 & \quad \boxed{1} \end{pmatrix} \quad\text{or}\quad \begin{pmatrix} \boxed{1} & \quad 0 & \quad 0 & \quad 0 \\[4pt] 0 & \quad 0 & \quad 1 & \quad 0 \\[4pt] 0 & \quad 0 & \quad \boxed{1} & \quad 0 \\[4pt] 1 & \quad 0 & \quad 0 & \quad 0 \end{pmatrix} \sim \begin{pmatrix} 1 & \quad 0 & \quad 0 & \quad 0 \\[4pt] 1 & \quad 0 & \quad 0 & \quad 0 \\[4pt] 0 & \quad 0 & \quad 1 & \quad 0 \\[4pt] 0 & \quad 0 & \quad 1 & \quad 0 \end{pmatrix},\end{equation*}

where the similarity in the third case is under the obvious state permutation. On the other hand, the matrices

\begin{equation*} \begin{pmatrix} 1 & \quad 0 & \quad 0 \\[4pt] 1 & \quad 0 & \quad 0 \\[4pt] 0 & \quad 1 & \quad 0 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix}1 & \quad 0 & \quad 0 & \quad 0 \\[4pt] 1 & \quad 0 & \quad 0 & \quad 0 \\[4pt] 0 & \quad 1 & \quad 0 & \quad 0 \\[4pt] 0 & \quad 0 & \quad 1 & \quad 0 \end{pmatrix}\end{equation*}

fail to be idempotent, for instance. Let us now analyse the second class of matrices mentioned in the introduction before we return to this type of structure.

3. Monotone Markov matrices and embeddability

Let us begin this section with a formalisation of some of our previous recollections. To this end, we follow [Reference Keilson and Kester18] and employ the lower-triangular matrix $T\in \textrm{Mat} (d,\mathbb{R})$ given by

(3.1) \begin{equation} T = \begin{pmatrix} 1 & \quad & \quad \boldsymbol{0} \\[4pt] \vdots & \quad \ddots \\[4pt] 1 & \quad \cdots & \quad 1 \end{pmatrix} ,\end{equation}

together with its inverse, $T^{-1} $ , which has entries 1 on the diagonal, $-1$ on the first subdiagonal, and 0 everywhere else. With T and vectors $x,y \in \mathbb{P}_d$ , one has the equivalence

(3.2) \begin{equation} x \, \preccurlyeq \, y \quad \Longleftrightarrow \quad x T \, \leqslant \, y T ,\end{equation}

where the inequality on the right means that it is satisfied element-wise.
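Explicitly, one has $(x T)^{}_{j} = \sum_{i = j}^{d} x^{}_{i}$ for $j \in [d ]$ , so the right-hand side of (3.2) restates the defining inequalities of $\preccurlyeq$ from the introduction.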

3.1. Monotone Markov matrices

If $E_{(i, j)} \in \textrm{Mat} (d,\mathbb{R})$ denotes the elementary matrix with a single 1 in position (i, j) and 0 everywhere else, which results in

(3.3) \begin{equation} E^{}_{(k, \ell)} E^{}_{(m, n)} = \delta^{}_{\ell, m} E^{}_{(k, n)} ,\end{equation}

one obtains the relations

\begin{equation*} E_{(i, j)} T = E_{(i, 1)} + \ldots + E_{(i, j)} \quad \text{and} \quad T^{-1} E_{(i, j)} = E_{(i, j)} - E_{(i+1,j)} ,\end{equation*}

where $E_{(d+1,j)} \,:\!=\,\mathbb{0}$ . Furthermore, we call a column vector $v = \big(v^{}_{1}, \ldots , v^{}_{d}\big)^T$ nondecreasing if $v_i \leqslant v_{i+1}$ holds for all $i \in [d - \! 1]$ . Now, we can characterise monotone matrices as follows.

Fact 3.1. For a Markov matrix $M\in \mathcal{M}_d$ , the following statements are equivalent:

  1. The matrix M is monotone.

  2. The mapping $x\mapsto xM$ preserves the partial order $\preccurlyeq$ on the positive cone.

  3. One has $T^{-1} M T \geqslant \mathbb{0}$ , understood element-wise, with T as in Equation (3.1).

  4. Whenever v is a nondecreasing vector, Mv is also nondecreasing.

Further, the same equivalences hold for any nonnegative $B \in \textrm{Mat} (d,\mathbb{R})$ with equal row sums.

Proof. By our definition, (1) is equivalent to preserving $\preccurlyeq$ on $\mathbb{P}_d$ , which clearly extends to all level sets $\alpha \mathbb{P}_d$ with $\alpha>0$ , so (1) $\Longleftrightarrow$ (2) is clear. The equivalences (2) $\Longleftrightarrow$ (3) $\Longleftrightarrow$ (4) are now an immediate consequence of [Reference Keilson and Kester18, Thm. 1.1].

The case $B=\mathbb{0}$ is trivial. When $B\ne \mathbb{0}$ , with elements $b_{ij}$ , is a nonnegative matrix with equal row sums, meaning that $\sum_{j=1}^{d} b_{ij} = b > 0$ for all $i \in [d ]$ , the matrix $M=\frac{1}{b} B$ is Markov, and the final claim follows from the compatibility of the partial order on the positive cone with scaling by b and the fact that the conditions in (3) and (4) are linear in M.

Example 3.2. Let $M= \big(m^{}_{ij}\big)^{}_{1\leqslant i,j \leqslant d}$ be Markov. When $d=2$ , the nonnegativity of $T^{-1} M T$ is equivalent to the single condition $\textrm{tr} (M) \geqslant 1$ , or alternatively to

\begin{equation*} m^{}_{22} \, \geqslant \, m^{}_{12} .\end{equation*}

Likewise, for $d=3$ , the original monotonicity condition for M, or equivalently the nonnegativity condition for $T^{-1} M T$ , boils down to

\begin{equation*} m^{}_{33} \, \geqslant \, m^{}_{23} \, \geqslant \, m^{}_{13} \quad \text{and} \quad m^{}_{11} \, \geqslant \, m^{}_{21} \, \geqslant \, m^{}_{31} ,\end{equation*}

where it was used that all rows of M sum to 1.

Let $E^{}_{\ell_1 , \ldots, \ell_d}$ denote the $\{ 0,1 \}$ Markov matrix with the row vector $e^{}_{\ell_i}$ as row i, for $i \in [d ]$ , so

\begin{equation*}E^{}_{\ell_1 , \ldots, \ell_d} = E^{}_{(1, \ell_1)} +\ldots + E^{}_{(d, \ell_d)}.\end{equation*}

There exist $d^{ d}$ such matrices, which are the extremal elements of $\mathcal{M}_d$ . Since $e_i \preccurlyeq e_j$ if and only if $i\leqslant j$ , it is clear that $E^{}_{\ell_1 , \ldots, \ell_d}$ is monotone precisely when $\ell_1 \leqslant \ell_2 \leqslant \dots \leqslant \ell_d$ . Using (3.3), or alternatively tracing the images of the basis vectors $e_i$ , one verifies the multiplication rule

(3.4) \begin{equation} E^{}_{k_1 , \ldots, k_d} E^{}_{\ell_1 , \ldots, \ell_d} = E^{}_{\ell_{k_1} , \ldots, \ell_{k_d}} .\end{equation}
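For instance, with $d=3$ , the rule gives $E^{}_{1, 1, 2}\, E^{}_{2, 3, 3} = E^{}_{2, 2, 3}$ , as one also checks directly from the rows $e^{}_{2}, e^{}_{2}, e^{}_{3}$ of the product; note that all three matrices involved are monotone.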

Owing to the existence of singular idempotents among these matrices (when $d\geqslant 2$ ), one thus obtains the following simple, but helpful, structure result.

Fact 3.3. For $d\geqslant 2$ , the set of $\{ 0, 1\}$ Markov matrices, under matrix multiplication, is a monoid, but not a group. The same property holds for the subset of monotone $\{ 0, 1\}$ matrices.

A $\{ 0,1 \}$ Markov matrix in $\mathcal{M}_d$ is nonsingular if and only if it is a permutation matrix. The subset of the $d !$ permutation matrices is isomorphic with the symmetric group $S_d$ .

Now, we turn to the convexity structure of $\mathcal{M}_{d,\preccurlyeq}$ . While this is certainly known, we are not aware of a source with a proof, whence we include one for convenience.

Lemma 3.4. The set $\mathcal{M}_{d,\preccurlyeq}$ is convex. It has $\binom{2d-1}{d}$ extremal points, which are the monotone Markov matrices with entries in $\{ 0,1 \}$ , that is, the $E^{}_{\ell_1 , \ldots, \ell_d}$ with $1\leqslant \ell_1 \leqslant \ell_2 \leqslant \dots \leqslant \ell_d \leqslant d $ .

Proof. Convexity is clear, for instance via Fact 3.1 (3). Consequently, any convex combination of monotone $\{ 0,1 \}$ Markov matrices must lie in $\mathcal{M}_{d,\preccurlyeq}$ . Thus, we first have to show that such convex combinations exhaust $\mathcal{M}_{d,\preccurlyeq}$ . This follows from a greedy algorithm that is based on the following reduction argument.

Consider a nonnegative matrix $B\ne \mathbb{0}$ with equal row sums, b say, where $b>0$ . Assume that B is monotone. Such a matrix, because of the monotonicity condition, appears in a (non-reduced) row-echelon form. This is captured in a set of integer pairs $\big( \big(i^{}_{1}, j^{}_{1}\big), \ldots , \big(i^{}_{r}, j^{}_{r}\big) \big)$ , where $j^{}_{1}$ is the position of the first (or leftmost) nonzero column of B, with $i^{}_{1}$ the lowest position of a positive element in it, $j^{}_{2}$ then is the leftmost position of a column that is nonzero below row $i^{}_{1}$ , with $i^{}_{2}$ the lowest position of a positive element in column $j^{}_{2}$ , and so on. Clearly, $r\geqslant 1$ and $i^{}_{r} = d$ since $b>0$ and B is monotone.

For instance, B may have the row-echelon form

\begin{equation*} B \, = \, \begin{pmatrix} 0 & \quad * & & & & \\[3pt] 0 & \quad * & & & & \\[3pt] 0 & \quad \bullet & & & & \\[3pt] 0 & \quad 0 & \quad \bullet & & & \\[3pt] 0 & \quad 0 & \quad 0 & \quad 0 & \quad * & \\[3pt] 0 & \quad 0 & \quad 0 & \quad 0 & \quad \bullet & \end{pmatrix} ,\end{equation*}

here with $r=3$ and integer pairs $( (3,2), (4,3), (6,5))$ . A symbol $\bullet$ marks the lowest positive element in a column, and $*$ any element in the same column (above $\bullet$ ) that cannot be smaller (as a consequence of monotonicity). In each row, there is thus either one $*$ or one $\bullet$ by this rule. The total number of symbols of type $\bullet$ or $*$ is d, so 6 in this particular case. To the left of them, all elements are 0, while the remaining elements of B are left unspecified, as they play no role at this stage.

Now, let $\alpha>0$ be the minimal element in the $\bullet$ positions, which are the $\big(i^{}_{k}, j^{}_{k}\big)$ , and let E denote the matrix that has a 1 in every $\bullet$ and in every $*$ position and a 0 anywhere else, which obviously is a monotone $\{ 0,1 \}$ Markov matrix, namely the $E^{}_{\ell_1 , \ldots, \ell_d}$ where $\ell_i$ is the unique position of $*$ or $\bullet$ in row i, for $i \in [d ]$ . Now, set $B^{\prime} = B - \alpha E$ , which is still monotone and has constant row sums $b^{\prime} = b -\alpha \geqslant 0$ , but one $\bullet$ is now replaced by a 0, which means that this $\bullet$ is gone or has moved up or right (or both) in the matrix. Unless $B^{\prime}=\mathbb{0}$ , we repeat the procedure with the new row-echelon form, which terminates after finitely many steps. The result is a decomposition of B as a sum of monotone $\{0,1 \}$ Markov matrices with positive weight factors. If we start with M, which has equal row sums $b=1$ , it is clear that we end up with a convex combination.

No monotone $\{0,1\}$ Markov matrix can be written as a convex combination of the other ones, so their extremality is clear. Now, given d, the monotone $\{ 0,1 \}$ Markov matrices are in obvious bijection with the possibilities to distribute d indistinguishable balls (the 1s) to d distinguishable boxes (the columns of the matrix), where an outcome $\big(n^{}_{1}, \ldots , n^{}_{d}\big)$ , with $n^{}_{1} + \ldots + n^{}_{d} = d$ , parameterises the matrix M with the row vector $e^{}_{1}$ in the first $n^{}_{1}$ rows, then $e^{}_{2}$ in the next $n^{}_{2}$ rows, and so on. The total number of possible outcomes is well known to be $\binom{ 2d -1}{d -1}= \binom{ 2 d -1}{d}$ , as this is the number of choices to place $d- 1$ separating walls between the d balls on the altogether $2d- 1$ positions; see [Reference Sloane26, A001700] for details.
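For a small worked instance, take $d=2$ and $M = \left( \small\begin{array}{c@{\quad}c} 7/10 & 3/10 \\ 1/5 & 4/5 \end{array} \right)$ . The algorithm successively subtracts $\frac{1}{5} E^{}_{1,1}$ , then $\frac{1}{2} E^{}_{1,2} = \frac{1}{2} \mathbb{1}$ , and finally $\frac{3}{10} E^{}_{2,2}$ , which yields the convex combination $M = \frac{1}{5} E^{}_{1,1} + \frac{1}{2} \mathbb{1} + \frac{3}{10} E^{}_{2,2}$ .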

Table 1. The 10 extremal elements $E^{}_{\ell_1 , \ell_2 , \ell_3}$ of $\mathcal{M}^{}_{3,\preccurlyeq}$ with some of their properties. Here, $\sigma(M)$ is the spectrum of M with multiplicities, while p and q are, up to an overall sign, the characteristic and the minimal polynomial of M. Note that $p\ne q$ precisely when M is an idempotent. The last column gives $M^2$ in terms of its index parameters.

The case $d=3$ is summarised in Table 1. By Lemma 3.4, every monotone Markov matrix $M\in\mathcal{M}_d$ can be expressed as a convex combination of the form

(3.5) \begin{equation} M \, = \sum_{1\leqslant \ell_{1} \leqslant \cdots \leqslant \ell_{ d} \leqslant d } \! \alpha^{}_{\ell_{1} , \ldots, \ell_{d}} \, E^{}_{\ell_{1} , \ldots, \ell_{d}}\end{equation}

with all coefficients $\alpha^{}_{\ell_{1} , \ldots, \ell_{d}}\geqslant 0$ , their sum being 1, and $E^{}_{\ell_{1} , \ldots, \ell_{d}}$ as above. Observing that $ \textrm{tr} \bigl( E^{}_{\ell_{1} , \ldots, \ell_{d}} \bigr)\geqslant 1$ whenever $\ell_1 \leqslant \ell_2 \leqslant \dots \leqslant \ell_d$ , one finds from (3.5) that $\textrm{tr} (M) \geqslant 1$ holds for all monotone Markov matrices, which easily generalises as follows.

Corollary 3.5. Let $B \in \textrm{Mat} (d,\mathbb{R})$ be a nonnegative matrix with equal row sums, $b\geqslant 0$ . If B is also monotone, one has $\textrm{tr} (B) \geqslant b$ .

3.2. Monotonicity and embeddability

As in [Reference Baake and Sumner1], we use $\mathcal{E}_d$ to denote the semigroup generated by the embeddable Markov matrices of dimension d. For $d=2$ , every element of $\mathcal{E}_2$ is itself embeddable (so $\mathcal{E}^{}_{2} = \mathcal{M}^{\textrm{E}}_{2}$ , which is no longer true for $d\geqslant 3$ ), and the set of monotone Markov matrices agrees with the closure of $\mathcal{E}_2$ . In fact, $\mathcal{M}_{2,\preccurlyeq}$ is the closed triangle in $\mathcal{M}_2$ with the vertices being $\mathbb{1}^{}_{2}$ , $E^{}_1=\left( \small\begin{array}{c@{\quad}c} 1 & 0 \\ 1 & 0 \end{array}\right)$ , and $E^{}_2=\left( \small\begin{array}{c@{\quad}c} 0 & 1 \\ 0 & 1 \end{array}\right)$ . Only the line $\{ \alpha E^{}_1 + (1-\alpha) E^{}_2 \,:\, 0 \leqslant \alpha \leqslant 1\}$ does not belong to $\mathcal{E}_2$ , because it consists of singular idempotents. The only other idempotent in $\mathcal{M}^{}_2$ is $\mathbb{1}$ , the trivial case. This leads to the following result.

Proposition 3.6. An element $M\in \mathcal{M}^{}_2$ is monotone if and only if $\textrm{tr} (M) \geqslant 1$ . Thus, being monotone is equivalent to either being embeddable or being a nontrivial idempotent.

Proof. Observe that $\textrm{tr} (M) \leqslant 2$ holds for all $M\in \mathcal{M}^{}_2$ . Since $M=\left( \small\begin{array}{c@{\quad}c} 1-a & a \\ b & 1-b \end{array} \right)$ with numbers $a,b\in [ 0,1]$ is monotone if and only if $1\geqslant a+b$ (compare Example 3.2), the first claim is immediate. By Lemma 2.10, $\textrm{tr} (M) = 1$ means M is monotone, but not embeddable.

For the second claim, recall that $M\in \mathcal{M}^{}_2$ is an idempotent if and only if $M^2=M$ , which implies $\sigma (M) \subseteq \{ 0,1\}$ . Since $\lambda = 1$ is always an eigenvalue, being an idempotent means either that the second eigenvalue is also 1, hence $M=\mathbb{1}$ by Lemma 2.3, or that $\det\!(M)=0$ , which gives the line from $E^{}_1$ to $E^{}_2$ discussed above.

Corollary 3.7. Any $M\in\mathcal{M}^{}_{2,\preccurlyeq}$ is infinitely divisible within $\mathcal{M}^{}_{2,\preccurlyeq}$ . In fact, $M\in\mathcal{M}^{}_{2}$ is infinitely divisible if and only if it is monotone.

Proof. Let $M \in \mathcal{M}^{}_2$ be monotone. By Proposition 3.6, the case $\det\!(M) = 0$ means $M^2=M$ , so also $M^n=M$ for all $n\in\mathbb{N}$ by induction, and M is a monotone nth root of itself. Clearly, the latter statement also applies to $M=\mathbb{1}$ .

When $M=\left(\small\begin{array}{c@{\quad}c} 1-a & a \\ b & 1-b\end{array}\right) \ne \mathbb{1}$ is embeddable, we have $a+b>0$ and $M=\textrm{e}^Q$ with the unique generator Q from Lemma 2.10. Then, for any $n\in\mathbb{N}$ , a Markov nth root of M is given by

\begin{equation*} \exp \Big(\tfrac{1}{n} Q \Big) = \begin{pmatrix} 1 -\epsilon a & \quad \epsilon a \\[4pt] \epsilon b & \quad 1 - \epsilon b \end{pmatrix} \quad\text{with } \, \epsilon = \frac{1 - \! \sqrt[n\,]{1-a-b}}{a+b} ,\end{equation*}

as follows from the same standard calculation with the matrix exponential that was used to derive (2.6). Now, $\exp \Big(\frac{1}{n} Q \Big)$ is monotone if and only if $1 \geqslant \epsilon (a+b)$ , by an application of the criterion from Example 3.2. But this estimate follows from $0<a+b < 1$ because $\epsilon \in [ 0, 1]$ .

It remains to show that infinite divisibility of $M\in\mathcal{M}^{}_{2}$ implies its monotonicity, which can be derived from the spectrum as follows. If 1 is the only eigenvalue of M, we have $M=\mathbb{1}$ by Lemma 2.3, which is embeddable. Otherwise, one has $\sigma (M) = \{1,\lambda\}$ where $\lambda \ne 1$ must be real, with $\lvert \lambda \rvert \leqslant 1$ . Since M has a Markov square root by assumption, we get $\lambda \in [ 0,1)$ . Now, $\lambda = 0$ means that M is an idempotent, while $\lambda>0$ implies $\det\! (M) > 0$ , so M is embeddable by Lemma 2.10. Monotonicity of M now follows from Proposition 3.6.
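
The explicit nth-root formula from the proof is easy to verify numerically; the following minimal sketch (Python with numpy; the values of a, b, n are arbitrary illustrative choices) checks that the stated matrix is a Markov, monotone nth root of M.

```python
import numpy as np

a, b, n = 0.3, 0.5, 7                      # any 0 < a + b < 1 and n in N
M = np.array([[1 - a, a], [b, 1 - b]])
eps = (1 - (1 - a - b) ** (1 / n)) / (a + b)
R = np.array([[1 - eps * a, eps * a], [eps * b, 1 - eps * b]])

print(np.allclose(np.linalg.matrix_power(R, n), M))          # R^n = M
print(R.min() >= 0 and np.isclose(R.sum(axis=1), 1).all())   # R is Markov
print(R.trace() >= 1)                                        # R is monotone (d = 2)
```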

Our next goal is a better understanding of the connection between $\mathcal{M}_{d,\preccurlyeq}$ and $\mathcal{C}_d$ , aiming at generalisations of Corollary 3.7 to general d. To this end, we once more consider a nonnegative matrix C with equal rows and parameter sum c, with $C=\mathbb{0}$ only when $c=0$ . For $c>0$ , the matrix $\frac{1}{c} C$ is both Markov and monotone. Consequently, the Markov matrix $M^{}_{C}$ from Equation (2.1), for any $c\in [ 0,1]$ , is a convex combination of $\mathbb{1}$ and $\frac{1}{c} C$ , hence monotone as well.

When $c=0$ , which means $M^{}_{C}=\mathbb{1}$ , or when $c=1$ , where $M^{}_{C} = C$ , the matrix $M^{}_{C}$ is a monotone idempotent. When $c\in (0,1)$ , Equation (2.6) implies

(3.6) \begin{equation} M^{}_{C} = \textrm{e}^Q \quad\text{with }\, Q = - \frac{\log (1-c)}{c} Q^{}_{C} ,\end{equation}

where $Q^{}_{C} = M^{}_{C} - \mathbb{1}$ as before, which is an equal-input generator. Now, for arbitrary $n\in\mathbb{N}$ , a standard calculation with the exponential series gives the formula

(3.7) \begin{equation} \exp \bigl( \tfrac{1}{n} Q \bigr) = \mathbb{1} + \frac{1 - \! \sqrt[n\,]{1-c }}{c} Q^{}_{C} = \sqrt[n\,]{1-c } \, \mathbb{1} + \Big( 1 - \! \sqrt[n\,]{1-c } \, \Big) \tfrac{1}{c} C .\end{equation}

Since $\sqrt[n\,]{1-c } \in (0,1)$ under our assumptions, this is a convex combination of two monotone Markov matrices, hence monotone and Markov itself. We have thus proved the following generalisation of our previous statements, assuming $d\geqslant 2$ as usual. In particular, the idempotent elements play a similar role as in the two-dimensional case.
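
Equation (3.7) can likewise be checked directly; a small numerical sketch (Python with numpy, with an arbitrary illustrative choice of equal-input parameters) follows.

```python
import numpy as np

cvec = np.array([0.2, 0.1, 0.15, 0.25])        # equal-input parameters, c = 0.7
d, n = len(cvec), 5
c = cvec.sum()
C = np.tile(cvec, (d, 1))                      # all rows equal to cvec
M_C = (1 - c) * np.eye(d) + C
Q_C = M_C - np.eye(d)

R = np.eye(d) + (1 - (1 - c) ** (1 / n)) / c * Q_C           # formula (3.7)

print(np.allclose(np.linalg.matrix_power(R, n), M_C))        # nth root of M_C
print(R.min() >= 0 and np.isclose(R.sum(axis=1), 1).all())   # R is Markov
```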

Theorem 3.8. Let C, $Q^{}_{C}$ , and $M^{}_{C}$ be as above, and let $c=c^{}_{1} + \ldots + c^{}_{ d}$ be the corresponding parameter sum. If $c\in [ 0,1]$ , $M^{}_{C}$ is Markov and monotone, with the following properties:

  1. 1. $M^{}_{C}$ is an idempotent if and only if $c \in \{ 0,1 \}$ , where $c=0$ means $M^{}_{C} = \mathbb{1}$ .

  2. 2. $M^{}_{C}$ is embeddable if and only if $c\in [ 0,1)$ , with $Q=\mathbb{0}$ for $c=0$ or the generator Q from (3.6) otherwise.

In particular, $M^{}_{C}=\mathbb{1}$ is the only embeddable idempotent. Further, for all $c\in [0, 1]$ and for every $n\in\mathbb{N}$ , $M^{}_{C}$ has a Markov nth root that is both equal-input and monotone, which is to say that $M^{}_{C}$ is infinitely divisible within $\mathcal{M}_{d,\preccurlyeq} \cap \mathcal{C}_d$ .

Note that, also in generalisation of the case $d=2$ , the set of monotone Markov matrices of type $M^{}_{C}$ with summatory parameter $c\in [ 0,1]$ is the closure of the set of equal-input Markov matrices that are embeddable with an equal-input generator, with all non-embeddable boundary cases being nontrivial idempotents. In fact, one has more as follows.

Corollary 3.9. A Markov matrix $M\in \mathcal{C}_d$ is monotone if and only if its summatory parameter satisfies $c \in [ 0,1]$ . So, one obtains the convex set

\begin{equation*} \mathcal{C}_{d,\preccurlyeq} \,:\!=\, \mathcal{C}_d \cap \mathcal{M}_{d,\preccurlyeq} = \big\{ M\in \mathcal{C}_d \,:\, c \in [ 0,1] \big\} ,\end{equation*}

with the $d+ 1$ extremal elements $E^{}_{1}, \ldots , E^{}_{d}$ and $\mathbb{1}$ .

Further, $\mathcal{C}_{d,\preccurlyeq}$ is the disjoint union of the set of equal-input embeddable elements from $\mathcal{C}_d$ with the set $\mathcal{C}_{d,1}$ of nontrivial idempotents in $\mathcal{C}_d$ . The eigenvalues of any $M\in\mathcal{C}_{d,\preccurlyeq}$ are real and nonnegative, and they are positive precisely for the embeddable cases.

Proof. It is clear from Theorem 3.8 that all $M\in \mathcal{C}_d$ with $c\in [ 0,1]$ are monotone, so we need to show that no further element of $\mathcal{C}_d$ is. To this end, consider a Markov matrix of the form $M = (1-c) \mathbb{1} + C$ with $c>1$ , which implies that all $c^{}_{i} > 0$ . Then it is easy to check that $T^{-1} M T$ fails to be a nonnegative matrix, where T is the matrix from (3.1), and M fails to be monotone by Fact 3.1 (3).

When $c\in [ 0,1]$ , we have $0 \leqslant c^{}_{i} \leqslant c^{}_{1} + \ldots + c^{}_{d} = c \leqslant 1$ for all $1 \leqslant i \leqslant d$ , and

\begin{equation*} M^{}_{C} = (1-c) \mathbb{1} + C = (1-c) \mathbb{1} + \sum_{i=1}^{d} c^{}_{i} \, E^{}_{i}\end{equation*}

is a convex combination, where the extremality of $E^{}_{1}, \ldots , E^{}_{d}$ and $\mathbb{1}$ is clear.

Another application of Theorem 3.8 gives the decomposition claimed, while the statement on the spectrum is clear because the eigenvalues of $M^{}_{C}$ are 1 and $1-c$ .

Geometrically, the situation is that the simplex $\mathcal{C}_{d,\preccurlyeq}$ separates the compact set $\mathcal{C}_d$ into the subset with $c\in [ 0,1)$ , which are the ‘good’ cases for embeddability, and the subset with $c\in \Big( 1, \frac{d}{d-1}\Big]$ , where embeddability requires d even and further conditions, but is never possible with an equal-input generator. For $d=2$ , we refer to [Reference Baake and Sumner1, Fig. 1] for an illustration.

One can view $\mathcal{C}_{d,\preccurlyeq}$ differently when starting in $\mathcal{M}_{d,\preccurlyeq}$ . Let $S_d$ be the symmetric (or permutation) group of d elements, and $P_{\pi}$ for $\pi\in S_{d}$ the standard permutation matrix that represents the linear mapping $e^{}_{i} \mapsto e^{}_{\pi (i)}$ under multiplication to the right. $P_{\pi}$ has elements $\delta^{}_{i, \pi (j)}$ and satisfies $P^{-1}_{\pi} = P^{}_{\pi^{-1}}$ . There are $d !$ such matrices, which are the extremal elements among the doubly stochastic matrices mentioned earlier. The conjugation action by such a matrix gives

\begin{equation*} P^{}_{\pi} E^{}_{( k,\ell )} P^{-1}_{\pi} = E^{}_{\left( \pi (k), \pi (\ell) \right)} \quad \text{and} \quad P^{}_{\pi} E^{}_{\ell_1 , \ldots , \ell_d} P^{-1}_{\pi} = E^{}_{\pi \big(\ell_{\pi^{-1}(1)}\big), \ldots , \pi \big(\ell_{\pi^{-1} (d)}\big)} ,\end{equation*}

as follows from a simple calculation with $E^{}_{\ell_1 , \ldots , \ell_d} = E^{}_{(1, \ell_1)} +\ldots + E^{}_{(d, \ell_d)}$ , or, alternatively, from tracing the images of the basis vectors $e_i$ for $1 \leqslant i \leqslant d$ .

Now, a set $\mathcal{F}\subseteq\mathcal{M}_d$ of Markov matrices is called permutation-invariant if $P^{}_{\pi} \mathcal{F} P^{-1}_{\pi} = \mathcal{F}$ holds for all $\pi \in S_d$ . Clearly, $\mathcal{M}_d$ itself is such a set, as is $\mathcal{C}_d$ or its subset of constant-input matrices. The latter are also individually permutation-invariant (which is also called exchangeable in probability theory [Reference Feller8]); that is, $P^{}_{\pi} M P^{-1}_{\pi} = M$ holds for every constant-input matrix M and all $\pi \in S_d$ . In fact, the Markov matrices that are individually permutation-invariant are precisely the constant-input ones, without restriction on the summatory parameter c.

The set of all $\{ 0, 1 \}$ Markov matrices is permutation-invariant as well, and partitions into $S^{}_d $ -orbits of the form $\mathcal{O}^{}_{ S_d} (M) = \{ P^{}_{\pi} M P^{-1}_{\pi} \,:\, \pi \in S^{}_d \}$ . Two such orbits are

\begin{equation*} \mathcal{O}^{}_{ S_d} (\mathbb{1}) = \{ \mathbb{1} \} \quad \text{and} \quad \mathcal{O}^{}_{ S_d} \big(E^{}_{1}\big) = \big\{ E^{}_{1}, \ldots , E^{}_{d} \big\} ,\end{equation*}

which both consist of monotone matrices only. One can check that no other orbit of the decomposition has this property, which implies the following characterisation.

Fact 3.10. The convex set $\mathcal{C}_{d,\preccurlyeq}$ is the maximal subset of $\mathcal{M}_{d,\preccurlyeq}$ that is permutation-invariant. The elements of $\mathcal{C}_{d,\preccurlyeq}$ that are individually permutation-invariant are the constant-input matrices with $c\in [ 0,1]$ .

Let us turn to Markov semigroups. Recall from [Reference Keilson and Kester18] that a (homogeneous) Markov semigroup $\big\{ \textrm{e}^{t Q} \,:\, t \geqslant 0\big\}$ , with generator Q, is called monotone when $\textrm{e}^{t Q}$ is monotone for every $t\geqslant 0$ . Moreover, a generator Q is called monotone if all off-diagonal elements of $T^{-1} Q T$ are nonnegative, where T is the matrix from Equation (3.1). This concept is motivated by the following connection, which is a minor variant of [Reference Keilson and Kester18, Thm. 2.1]. Because of its importance, we include a short proof that is tailored to our later needs in this context.

Proposition 3.11. If Q is a Markov generator, the following properties are equivalent:

  1. 1. The semigroup $\big\{ \textrm{e}^{t Q} \,:\, t \geqslant 0\big\}$ is monotone.

  2. 2. The generator Q is monotone.

Proof. For $(1) \Rightarrow (2)$ , observe that $T^{-1} \textrm{e}^{t Q} T \geqslant \mathbb{0}$ implies $\Big( T^{-1} \frac{1}{t} (\textrm{e}^{tQ} - \mathbb{1}) T \Big)_{ij} \geqslant 0$ for $t>0$ and all $i\ne j$ . Then, taking $t \searrow 0$ establishes this direction.

For $(2) \Rightarrow (1)$ , is it clear that $T^{-1} Q T + \alpha \mathbb{1} \geqslant \mathbb{0}$ holds for any sufficiently large $\alpha > 0$ . Choose $\alpha$ also large enough so that $M^{}_{\alpha} \,:\!=\,\mathbb{1} + \alpha^{-1} Q$ is Markov, which is clearly possible. Then $T^{-1} M^{}_{\alpha} T \geqslant \mathbb{0}$ , and we get $T^{-1} M^{m}_{\alpha} T = \big(T^{-1} M^{}_{\alpha} T \big)^m \geqslant \mathbb{0}$ for all integers $m\geqslant 0$ . Now, observe

\begin{equation*} \textrm{e}^{t Q} = \, \textrm{e}^{-\alpha t} \textrm{e}^{\alpha t M_{\alpha}} = \sum_{m=0}^{\infty} \textrm{e}^{-\alpha t} \frac{(\alpha t)^m}{m !} \, M^{m}_{\alpha} ,\end{equation*}

which, for all $t\geqslant 0$ , constitutes a convergent sum that is a convex combination of monotone Markov matrices. Consequently, $\textrm{e}^{t Q}$ is monotone as well.

Example 3.12. Let $Q = \big(q^{}_{ij}\big)^{}_{1\leqslant i,j \leqslant d}$ be a Markov generator. When $d=2$ , it is always monotone, that is, no extra condition emerges; compare Proposition 3.6. When $d=3$ , being monotone is equivalent to the two conditions

\begin{equation*} q^{}_{23} \, \geqslant \, q^{}_{13} \quad \text{and} \quad q^{}_{21} \, \geqslant \, q^{}_{31} ,\end{equation*}

which provide a surprisingly simple criterion for monotonicity in this case.
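
Numerically, this criterion is easy to test against Proposition 3.11 by sampling the semigroup; the sketch below (Python with numpy and scipy; the helper functions and the particular rate values are ours) checks stochastic monotonicity of $\textrm{e}^{t Q}$ directly via the tail sums of its rows.

```python
import numpy as np
from scipy.linalg import expm

def is_monotone(M, tol=1e-9):
    # stochastic monotonicity: tail sums sum_{k >= j} M[i, k] nondecreasing in i
    tails = np.cumsum(M[:, ::-1], axis=1)[:, ::-1]
    return bool(np.all(np.diff(tails, axis=0) >= -tol))

def generator(q12, q13, q21, q23, q31, q32):
    Q = np.array([[0.0, q12, q13], [q21, 0.0, q23], [q31, q32, 0.0]])
    np.fill_diagonal(Q, -Q.sum(axis=1))
    return Q

# q23 >= q13 and q21 >= q31: the whole semigroup {exp(tQ)} is monotone
Q_good = generator(0.4, 0.1, 0.7, 0.3, 0.2, 0.5)
print(all(is_monotone(expm(t * Q_good)) for t in np.linspace(0.0, 5.0, 51)))

# q23 < q13: monotonicity already fails for small t
Q_bad = generator(0.4, 0.6, 0.7, 0.1, 0.2, 0.5)
print(is_monotone(expm(0.05 * Q_bad)))
```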

The above considerations have the following consequence.

Corollary 3.13. For $M \in \mathcal{M}_{d,\preccurlyeq}$ , the following properties are equivalent:

  1. 1. M is embeddable via a monotone Markov generator.

  2. 2. M is nonsingular and infinitely divisible within $\mathcal{M}_{d,\preccurlyeq}$ .

Proof. (1) $\Rightarrow$ (2): Let $M=\textrm{e}^Q$ with Q a monotone generator, so $\det\!(M) = \textrm{e}^{\textrm{tr} (Q)} >0$ . Now, $\frac{1}{n} Q$ is still a monotone generator, for any $n\in\mathbb{N}$ , and $\exp \bigl( \frac{1}{n} Q\bigr)$ is a monotone Markov matrix (by Proposition 3.11) that is also an nth root of M.

(2) $\Rightarrow$ (1): By Kingman’s characterisation [Reference Kingman20, Prop. 7], M is embeddable, so $M=\textrm{e}^Q$ with some generator Q. We need to show that Q can be chosen to be monotone. Let $R_n$ be an nth root of M that is Markov and monotone, which exists and implies that $A_n \,:\!=\,n ( R_n - \mathbb{1} )$ is a monotone generator. By a standard compactness argument, there is a subsequence $\big(n^{}_{i}\big)^{}_{i\in\mathbb{N}}$ of increasing integers such that $Q^{\prime} = \lim_{i\to\infty} A_{n_i}$ is a monotone generator as well. From here on, we can employ Kingman’s original proof to conclude that

\begin{equation*} M = \bigg( \mathbb{1} + \frac{Q^{\prime}}{n_{i}} \bigg)^{n^{}_{i}} + \, o (1) \qquad \text{as } i\to\infty ,\end{equation*}

which gives $M = \textrm{e}^{Q^{\prime}}$ as claimed.

3.3. Idempotents and infinite divisibility

At this point, it seems worthwhile to take a closer look at infinite divisibility in general. In this context, we refer to [Reference Feller8, Sec. X.9] for the underlying (pseudo-)Poissonian structures.

Proposition 3.14. Let $P^{}_{0} , P \in \mathcal{M}^{}_{d}$ be chosen such that $P^{2}_{0} = P^{}_{0}$ , so $P^{}_{0}$ is an idempotent, and that $P^{}_{0} P = P P^{}_{0} = P$ . Then, the matrix family $\{ M(t) \,:\, t \geqslant 0 \}$ with

\begin{equation*} M(t) \,:\!=\, \textrm{e}^{-t} \Bigg( P^{}_{0} + \sum_{m=1}^{\infty} \frac{{t^m}}{{m!}} \, P^m \Bigg) = \textrm{e}^{-t} \Big( P^{}_{0} - \mathbb{1} + \textrm{e}^{t P} \Big)\end{equation*}

satisfies the following properties, where $A \,:\!=\,P - \mathbb{1}$ is a Markov generator:

  1. 1. The mapping $t\mapsto M(t)$ is continuous, with $M (0) = P^{}_{0}$ .

  2. 2. $M(t) M(s) = M(t+s)$ holds for all $t,s \geqslant 0$ .

  3. 3. M(t) is Markov, for all $t \geqslant 0$ .

  4. 4. $M(t) = \textrm{e}^{-t} \big( P^{}_{0} - \mathbb{1} \big) + \textrm{e}^{t A} = P^{}_{0} \textrm{e}^{tA}$ for all $t\geqslant 0$ .

  5. 5. $P=P^{}_{0}$ if and only if $M(t) = P^{}_{0}$ holds for all $t\geqslant 0$ .

  6. 6. For $t\geqslant 0$ , M (t) is idempotent if and only if $M(t) = P^{}_{0}$ .

In particular, $\big\{ M(t) \,:\, t \geqslant 0 \big\}$ always constitutes a continuous monoid, with $P^{}_{0}$ as its neutral element, while it is a homogeneous Markov semigroup if and only if $P^{}_{0} = \mathbb{1}.$

Proof. (1) is obvious, while (2) follows from a standard calculation with the convergent series. Both $P^{}_{0}$ and P are Markov, and so is $P^m$ for all $m\in\mathbb{N}$ . Now, for any $t\geqslant 0$ , M(t) is a convergent, convex combination of Markov matrices, hence Markov as well, which shows (3).

Next, (4) and the easy direction of (5) follow from elementary calculations, using the expansion $P^{}_{0} \textrm{e}^{tP} = P^{}_{0} + \sum_{m\geqslant 1} \frac{t^m}{m !} P^m$ together with $\textrm{e}^{t P} = \textrm{e}^t \textrm{e}^{t A}$ . When $M(t) = P^{}_{0}$ for all $t\geqslant 0$ , one obtains $P^{}_{0} + \textrm{e}^{tP} = \mathbb{1} + \textrm{e}^t P^{}_{0}$ . But, observing $\textrm{e}^{t P^{}_0} = \mathbb{1} - P^{}_{0} + \textrm{e}^t P_0$ , this implies $\textrm{e}^{t P} = \textrm{e}^{t P^{}_{0}}$ for all $t\geqslant 0$ and thus $P=P^{}_{0}$ .

For (6), one direction is trivial. The other follows from (4) upon the observation that M (t) idempotent implies $P^{}_{0} \textrm{e}^{tA} = M(t) = M(t)^2 = M(2 t) = P^{}_{0} \textrm{e}^{2 t A}$ and hence gives the relation $P^{}_{0} = P^{}_{0} \textrm{e}^{tA} = M(t)$ as claimed.

Finally, while the (abstract) semigroup and monoid properties are clear, the family can only be a homogeneous Markov semigroup when $M(t) = \textrm{e}^{t Q}$ for some generator Q and all $t\geqslant 0$ , so $M(0)=\mathbb{1}$ , which is the only nonsingular idempotent in $\mathcal{M}_d$ by Lemma 2.3. We thus have $P^{}_{0} = \mathbb{1}$ in this case, and (4) implies $M(t) = \textrm{e}^{tA}$ as claimed.

Example 3.15. Let us analyse the meaning of Proposition 3.14 for $d\geqslant 2$ . If $P^{}_{0}=\mathbb{1}$ , there is no further restriction on P, and $M(t) = \textrm{e}^{t A}$ with $A=P -\mathbb{1}$ , which can thus be any generator with diagonal elements in $[-1,0]$ . This way, possibly after rescaling $t$ , all embeddable matrices are covered. For $P\in\mathcal{C}_d$ with $c\in [ 0,1)$ , we see that $A=P-\mathbb{1}$ is a generator of equal-input type, in line with Lemma 2.10 and Proposition 2.12.

If $P^{}_{0} \in \mathcal{C}^{}_{d,1}$ , we get $P^{}_{0} = \sum_{i=1}^{d} \beta_i E_i$ with $\beta_i \geqslant 0$ and $\beta^{}_{1} + \ldots + \beta^{}_{d} = 1$ ; hence $P^{}_{0} = P P^{}_{0}$ for any $P\in\mathcal{M}_d$ . Assuming $P P^{}_{0} = P^{}_{0} P = P$ , we find $P = P^{}_{0}$ , which then gives $M(t) \equiv P^{}_{0}$ by Proposition 3.14 (5).

For $d=2$ , where $\mathcal{M}^{}_{2} = \mathcal{C}^{}_{2}$ , this exhausts all cases because no further idempotents exist. Consequently, $M\in\mathcal{M}_2$ is infinitely divisible if and only if it is embeddable or an idempotent, with $M = \mathbb{1}$ being the only case that is both; compare Proposition 3.6 and Corollary 3.7.

When $d\geqslant 3$ , one obtains mixtures via direct sums, where $\mathbb{1} \oplus P^{}_{0}$ and $(\mathbb{1} + A) \oplus P^{}_{0}$ lead to $M(t) = \textrm{e}^{t A} \oplus P^{}_{0}$ . There are further examples for $d=3$ (compare Example 2.17), such as

\begin{equation*} P^{}_{0} = \alpha E^{}_{1,1,3} + (1 -\alpha) E^{}_{1,3,3} = \begin{pmatrix} 1 & \quad 0 & \quad 0 \\[4pt] \alpha & \quad 0 & \quad 1 - \alpha \\[4pt] 0 & \quad 0 & \quad 1 \end{pmatrix} , \quad \text{with } \det\!\big(P^{}_{0}\big) = 0 ,\end{equation*}

which is idempotent for any $\alpha \in [ 0,1]$ . Then, $P\in\mathcal{M}^{}_{3}$ with $P P^{}_{0} = P^{}_{0} P = P$ leads to

\begin{equation*} P = \begin{pmatrix} a & \quad 0 & \quad 1\! - a \\[4pt] c & \quad 0 & \quad 1 - c \\[4pt] 1 - b & \quad 0 & \quad b \end{pmatrix}\end{equation*}

with $a,b \in [ 0,1]$ and $ c = \alpha a + (1 -\alpha) (1 -b)$ . Here, $M(t) = P^{}_{0} \textrm{e}^{tA}$ is singular for all $t\geqslant 0$ and thus never embeddable. Moreover, for $P\ne P^{}_{0}$ , the matrix M(t) can only be idempotent for $t=0$ and possibly for isolated further values of $t>0$ . This is so because $M(t) = M(t)^2 = M (2 t)$ implies $\textrm{e}^{t P} = \textrm{e}^{ t P^{}_{0}}$ via an elementary calculation. When this holds for t from a set with an accumulation point, $t^{}_0$ say, standard arguments imply $P=P^{}_{0}$ . So, more interesting as well as more complicated cases emerge in $\mathcal{M}_d {\setminus}\mathcal{C}_d$ as d grows.
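
The structure of this example can be verified numerically; the following sketch (Python with numpy and scipy, with illustrative parameter values) checks the hypotheses of Proposition 3.14, the semigroup property of $M(t) = P^{}_{0} \textrm{e}^{tA}$, and its singularity.

```python
import numpy as np
from scipy.linalg import expm

alpha, a, b = 0.4, 0.3, 0.6
c = alpha * a + (1 - alpha) * (1 - b)
P0 = np.array([[1, 0, 0], [alpha, 0, 1 - alpha], [0, 0, 1]], dtype=float)
P = np.array([[a, 0, 1 - a], [c, 0, 1 - c], [1 - b, 0, b]], dtype=float)
A = P - np.eye(3)

def M(t):
    return P0 @ expm(t * A)            # = e^{-t} (P0 - 1 + e^{tP}), by Prop. 3.14(4)

# hypotheses of Proposition 3.14 and the two expressions for M(t)
print(np.allclose(P0 @ P0, P0), np.allclose(P0 @ P, P), np.allclose(P @ P0, P))
print(np.allclose(M(0.9), np.exp(-0.9) * (P0 - np.eye(3) + expm(0.9 * P))))

# semigroup property and singularity (so M(t) is never embeddable)
print(np.allclose(M(0.7) @ M(1.1), M(1.8)))
print(np.isclose(np.linalg.det(M(1.0)), 0.0))
```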

Lemma 3.16. With $P^{}_{0}$ , P, and M(t) as in Proposition 3.14, one has either $\det\!( M(t)) > 0$ for all $t\geqslant 0$ , which happens if and only if $P^{}_{0} = \mathbb{1}$ , or $\det\!( M(t)) = 0$ for all $t\geqslant 0$ , which is true whenever $P^{}_{0} \ne \mathbb{1}$ , or equivalently whenever $P^{}_{0}$ is singular.

Proof. Recall via Lemma 2.3 that $P^{2}_{0} = P^{}_{0}$ either means $\det\!\big(P^{}_{0}\big) = 1$ , which forces $P^{}_{0} = \mathbb{1}$ , or $\det\!\big(P^{}_{0}\big) = 0$ . Now, observe that $P^{}_{0} M(t) = M(t)$ holds for all $t\geqslant 0$ , so

\begin{equation*} \det\!\big(P^{}_{0}\big) \det\!( M(t)) = \det\!( M(t)),\end{equation*}

and $\det\!( M(t)) \equiv 0$ for singular $P^{}_{0}$ is immediate.

The only remaining case is $P^{}_{0} = \mathbb{1}$ . Here, Proposition 3.14 (4) implies $M(t) = \textrm{e}^{tA}$ with $A=P-\mathbb{1}$ and hence $\det\!( M(t)) = \textrm{e}^{\textrm{tr} ( t A)} > 0$ .

A semigroup as in Proposition 3.14 is called Poissonian if $P^{}_{0} = \mathbb{1}$ , and pseudo-Poissonian otherwise [Reference Feller8, Sec. X.1]. We can now recall the central classification result on infinitely divisible, finite-dimensional Markov matrices from [Reference Göndöcs, Michaletzky, Móri and Székely11] as follows. It seems a bit hidden in the literature, but nicely underpins the role of idempotents in the embedding problem.

Theorem 3.17. A Markov matrix $M\in\mathcal{M}_d$ is infinitely divisible if and only if there are Markov matrices $P^{}_{0}, P \in \mathcal{M}^{}_{d}$ , with $P^{2}_{0} = P^{}_{0}$ and $P^{}_{0} P = P P^{}_{0} = P$ , and some $s\geqslant 0$ such that

\begin{equation*} M = \textrm{e}^{-s} \Bigg( P^{}_{0} + \sum_{m=1}^{\infty} \frac{s^m}{{m!}} P^m \Bigg) .\end{equation*}

Moreover, M is embeddable if and only if one also has $P^{}_{0} = \mathbb{1}$ .

Proof. For the proof of the first claim, we refer to [Reference Göndöcs, Michaletzky, Móri and Székely11].

For the second claim, we know that M embeddable implies $\det\!(M) > 0$ , and we are in the case with $P^{}_{0} = \mathbb{1}$ by Lemma 3.16. Conversely, when $P^{}_{0} = \mathbb{1}$ , Proposition 3.14 (4) gives $M = \textrm{e}^{sA}$ with the generator $A=P -\mathbb{1}$ .

Note that the parameter s cannot be avoided in this formulation, because a generator Q can have diagonal entries of arbitrarily large negative value, whence $\mathbb{1} + Q$ need not be Markov, while $\mathbb{1} + s Q$ , for all suitably small $s>0$ , will be; compare Example 3.15 and Proposition 3.14.

Further consequences can be derived from $\big[P^{}_{0}, P \big]=\mathbb{0}$ when P is cyclic, which means that the minimal and characteristic polynomial of P agree. In particular, this is the case when P is simple; see [Reference Baake and Sumner1, Fact 2.10] for a systematic characterisation of cyclic matrices. Whenever $P\in\mathcal{M}_d$ is cyclic, its centraliser is the abelian ring

\begin{equation*} \textrm{cent} (P) = \{ B \in \textrm{Mat} (d, \mathbb{R}) \,:\, [ P , B ] = \mathbb{0} \} = \mathbb{R} [P ] ,\end{equation*}

where each element of this ring is of the form $\sum_{n=0}^{d-1} \alpha^{}_{n} P^n$ with all $\alpha^{}_{n} \in \mathbb{R}$ , as a consequence of the Cayley–Hamilton theorem. In particular, $P^{}_{0}$ is then an idempotent from this ring. We leave further details to the interested reader.

4. Monotone Markov matrices in three dimensions

Let us now look at $d=3$ in more detail, stating the following simple and certainly well-known property, which we also prove owing to the lack of a reference.

Proposition 4.1. The eigenvalues of any $M\in\mathcal{M}^{}_{3,\preccurlyeq}$ are real. Moreover, at most one eigenvalue of M can be negative, which happens if and only if $\det\!(M) < 0$ .

Further, if $d=3$ and Q is a monotone Markov generator, its eigenvalues are nonpositive real numbers.

Proof. First, $M\in\mathcal{M}^{}_{3,\preccurlyeq}\subset \mathcal{M}^{}_{3}$ implies $1\in\sigma (M)$ . As $d=3$ , the characteristic polynomial of M then is $(1-x) \bigl( x^2 - (\textrm{tr} (M) - 1) x + \det\!(M) \bigr)$ , and the remaining two eigenvalues are

(4.1) \begin{equation} \lambda^{}_{\pm} = \frac{1}{2} \Big( \textrm{tr} (M) - 1 \pm \sqrt{\Delta} \, \Big) ,\end{equation}

with the discriminant $\Delta = ( \textrm{tr} (M) - 1 )^2 - 4 \det(M)$ . By an explicit calculation, where one first eliminates $m^{}_{22}$ and later also $m^{}_{11}$ via the row sum condition, one verifies that

\begin{equation*} \Delta = \big( m^{}_{11} - m^{}_{21} + m^{}_{23} - m^{}_{33} \big)^2 + 4 \big( m^{}_{23} - m^{}_{13} \big) \big( m^{}_{21} - m^{}_{31} \big) \, \geqslant \, 0 ,\end{equation*}

where the inequality follows from the monotonicity of M via Example 3.2.

This implies $\sigma (M) \subset \mathbb{R}$ , and the formula for $\lambda^{}_{\pm}$ from Equation (4.1) shows that at most $\lambda^{}_{-}$ can be negative, because $\textrm{tr} (M) \geqslant 1$ by Corollary 3.5. Also, when $\det\!(M) = 0$ , the spectrum is $\sigma (M) = \{ 1, \textrm{tr} (M) - 1, 0 \}$ , which is nonnegative. This establishes the claims on M.

If $Q = \big(q^{}_{ij}\big)^{}_{1\leqslant i,j \leqslant 3}$ is a Markov generator, its spectrum contains 0, while the other two eigenvalues are given by

(4.2) \begin{equation} \mu^{}_{\pm} = \frac{1}{2} \Big( \textrm{tr} (Q) \pm \sqrt{D}\, \Big) ,\end{equation}

where, in analogy to above, one finds

\begin{equation*} D = \big( q^{}_{11} - q^{}_{21} + q^{}_{23} - q^{}_{33} \big)^2 + 4 \big( q^{}_{23} - q^{}_{13} \big) \big( q^{}_{21} - q^{}_{31} \big) \, \geqslant \, 0 .\end{equation*}

Here, the inequality follows via the monotonicity criterion from Example 3.12.

Consequently, all eigenvalues are real. They are then nonpositive because all eigenvalues of Markov generators have real part $\leqslant 0$ (compare [Reference Baake and Sumner1, Prop. 2.3(1)]), as can also be checked explicitly from (4.2), where $\textrm{tr} (Q) \leqslant 0 \leqslant \sqrt{D} \leqslant \lvert \textrm{tr} (Q)\rvert$ .

Considering the convex combinations

\begin{equation*} M(\alpha) \,:\!=\, \alpha E^{}_{1,1,2} + (1 - \alpha) E^{}_{2,3,3} = \begin{pmatrix} \alpha & \quad 1 - \alpha & \quad 0 \\[4pt] \alpha & \quad 0 & \quad 1 - \alpha \\[4pt] 0 & \quad \alpha & \quad 1 - \alpha \end{pmatrix} \quad \text{with } \alpha \in (0,1) ,\end{equation*}

in the notation of Table 1, one finds $\textrm{tr} ( M(\alpha)) = 1$ and $\det\!(M(\alpha)) = - \alpha (1 -\alpha) < 0$ . This shows that cases with a simple negative eigenvalue exist. On the other hand, Proposition 4.1 also means that the spectrum of a matrix $M\in\mathcal{M}^{}_{3,\preccurlyeq}$ is positive if and only if $\det\!(M) > 0$ .
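
A quick numerical check of this family (Python with numpy; the value of $\alpha$ is an arbitrary illustrative choice):

```python
import numpy as np

alpha = 0.3
M = np.array([[alpha, 1 - alpha, 0],
              [alpha, 0, 1 - alpha],
              [0, alpha, 1 - alpha]])
print(np.isclose(M.trace(), 1.0),
      np.isclose(np.linalg.det(M), -alpha * (1 - alpha)))
print(np.sort(np.linalg.eigvals(M).real))   # real spectrum, one simple negative eigenvalue
```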

Corollary 4.2. Consider any $M \in \mathcal{M}^{}_{3,\preccurlyeq}$ with $\det\!(M)<0$ . Then, M is neither embeddable nor can it have a monotone nth Markov root for n even. In fact, M has no Markov or, indeed, real square root at all, and M is not infinitely divisible.

Proof. Since $\det\!(M) < 0$ , embeddability is ruled out immediately, and so is the existence of a real square root because M has only a simple negative eigenvalue; compare [Reference Higham14, Thm. 6.6] or [Reference Higham and Lin15]. Consequently, M cannot be infinitely divisible.

Further, $B \in \mathcal{M}^{}_{3,\preccurlyeq}$ with $B^{2m} = M$ for any $m\in\mathbb{N}$ implies nonnegative spectrum for M since $\sigma (B) \subset \mathbb{R}$ , which contradicts $\det\!(M) < 0$ .

Let us now analyse when a matrix $M\in\mathcal{M}^{}_{3,\preccurlyeq}$ is embeddable. For this, M must be nonsingular and thus, by Corollary 4.2, have positive spectrum. So, all eigenvalues of M must satisfy $0 < \lambda \leqslant 1$ . Further, $A = M - \mathbb{1}$ is a generator that inherits monotonicity from M as all off-diagonal elements of $T^{-1}\! A T$ are nonnegative. Since the spectral radius of A is $\varrho^{}_A < 1$ ,

(4.3) \begin{equation} Q \,:\!=\, \log (\mathbb{1} + A ) \, = \sum_{m=1}^{\infty} \frac{({-}1)^{m-1}}{m} A^m\end{equation}

converges and defines a real matrix with zero row sums and $M =\textrm{e}^Q$ . As we are not interested in other types of solutions, we now introduce the non-unital, real algebra

(4.4) \begin{equation} \mathcal{A}^{_{(3)}}_{\, 0} \,:\!=\, \{ B \in \textrm{Mat} (3, \mathbb{R}) \,:\, \text{all row sums of \textit{B} are 0} \} ,\end{equation}

which certainly contains the Q from (4.3). Since positive spectrum of M means $\sigma (M) \subset ( 0, 1 ]$ , all eigenvalues of Q are nonpositive real numbers by the SMT. It remains to analyse the generator property and potential uniqueness of Q.

Let q be the minimal polynomial of A. Its degree satisfies $\deg\!(q) \in \{ 1,2,3 \}$ and equals the degree of the minimal polynomial of M. Further, let

\begin{equation*} \textrm{alg} (A) \,:\!=\, \langle A^m \,:\, m \in \mathbb{N} \rangle^{}_{\mathbb{R}}\end{equation*}

be the real algebra spanned by the positive powers of A, which does not contain $\mathbb{1}$ , as one can check easily. In fact, one has $\textrm{alg} (A) = \big\langle A, A^2 \big\rangle^{}_{\mathbb{R}}$ by the Cayley–Hamilton theorem, so $\textrm{alg} (A)$ is a subalgebra of $\mathcal{A}^{_{(3)}}_{\, 0} \!$ of dimension $\leqslant 2$ . Indeed, we clearly get

(4.5) \begin{equation} \dim\!\big( \textrm{alg} (A) \big) = \deg\!(q) - 1 \, \in \, \{ 0,1,2 \} \end{equation}

together with $ Q \in \textrm{alg} (A)$ by (4.3), which leads to different situations as follows.

Proposition 4.3. Let the matrix $M \in \mathcal{M}^{}_{3,\preccurlyeq}$ have a minimal polynomial of degree $\leqslant 2$ . Then, the following properties are equivalent:

  1. 1. The spectrum of M is positive.

  2. 2. One has $\det\!(M) > 0$ .

  3. 3. M is embeddable.

  4. 4. M is embeddable with a monotone generator.

If M is embeddable, there is precisely one monotone generator Q with $M = \textrm{e}^Q$ , namely the one given in Equation (4.3).

Proof. The implications $(4) \Rightarrow (3) \Rightarrow (2)$ are clear, while $(2) \Longleftrightarrow (1)$ follows from Proposition 4.1. It remains to show $(1) \Rightarrow (4)$ , so assume $\sigma (M) \subset \mathbb{R}_{+} \,:\!=\,\{x\in\mathbb{R}\,:\, x>0\}$ . As above, let q be the minimal polynomial of $A=M - \mathbb{1}$ , which has the same degree as that of M.

When $\deg\!(q) = 1$ , since 0 is always an eigenvalue of A, the only possibility is $q (x) = x$ ; hence $A=\mathbb{0}$ and thus also $Q=\mathbb{0}$ from (4.3), which gives the trivial case $\mathbb{1} = \exp\!(\mathbb{0})$ , where $\mathbb{1}$ is the only matrix in $\mathcal{M}^{}_{3,\preccurlyeq}$ with a minimal polynomial of degree 1. By Remark 2.11, $Q=\mathbb{0}$ is the only generator with $\textrm{e}^Q=\mathbb{1}$ , which is trivially monotone.

When $\deg\!(q) = 2$ , we have $A\ne \mathbb{0}$ , and hence $\textrm{tr} (A) < 0$ , so by (4.5) we get $A^2 = - \alpha A$ for some $\alpha \in \mathbb{R}$ , where $\textrm{tr} \big(A^2\big) > 0$ implies $\alpha > 0$ . Here, $A=M -\mathbb{1}$ is diagonalisable, with eigenvalues 0 and $-\alpha > -1$ , in line with $\sigma (M) \subset \mathbb{R}_{+}$ . Then (4.3) simplifies to

\begin{equation*}Q = - \frac{\log (1-\alpha)}{\alpha} A,\end{equation*}

which is a positive multiple of A and hence a monotone generator, so M is embeddable as $M=\textrm{e}^Q$ .

To establish the uniqueness claim, consider any $Q^{\prime} \in \mathcal{A}^{_{(3)}}_{\, 0}$ such that $M = \textrm{e}^{Q^{\prime}}\!$ , where M is diagonalisable by assumption, hence also $Q^{\prime}$ by [Reference Baake and Sumner1, Fact 2.15]. We then still have $[Q^{\prime}, A]=\mathbb{0}$ , but not necessarily $Q^{\prime}\in\textrm{alg} (A)$ . Now, by [Reference Baake and Sumner1, Lemma 6.1], there are two possibilities, namely $\dim \bigl(\textrm{alg} (Q^{\prime} ) \bigr) \in \{ 1,2 \}$ . Here, if the dimension is 2, $Q^{\prime}$ must be simple, which is only possible if $Q^{\prime}$ has a complex conjugate pair of (non-real) eigenvalues. But then, $Q^{\prime}$ cannot be monotone, by Proposition 4.1. It remains to consider $ \dim \bigl(\textrm{alg} (Q^{\prime} ) \bigr) =1$ , where we get $\textrm{alg} (Q^{\prime} ) = \textrm{alg} (A)$ from [Reference Baake and Sumner1, Lemma 6.1(1)], and hence $Q^{\prime} = a A$ for some $a>0$ . By taking the determinant on both ends of $\textrm{e}^{Q^{\prime}} \! = M = \textrm{e}^Q$ , which gives a positive number, one finds

\begin{equation*} a \, \textrm{tr} (A) = - \frac{\log (1-\alpha)}{\alpha} \, \textrm{tr} (A). \end{equation*}

As $\textrm{tr} (A) \ne 0$ , this implies $a= - \frac{\log (1-\alpha)}{\alpha}$ and thus $Q^{\prime} = Q$ .

Let us pause to state the asymptotic behaviour of $M^n$ for the embeddable matrices covered by Proposition 4.3. In the above notation, one trivially has $M^n=\mathbb{1}$ for all n when $\deg\!(q) = 1$ , while a simple calculation gives

\begin{equation*} M_{\infty} \, \,:\!=\,\lim_{n\to\infty} M^n = \mathbb{1} + \lim_{n\to\infty} \! \frac{1 - (1-\alpha)^n}{\alpha} A = \mathbb{1} + \frac{1}{\alpha} A\end{equation*}

for the more interesting case that $\deg\!(q) = 2$ .

When $\deg\!(q) = 3$ , the situation becomes a little more complex. Here, $A=M - \mathbb{1}$ is always cyclic, with 0 being a simple eigenvalue. Then, we obtain $A^3 = r A + s A^2$ together with $r = \textrm{tr} (M) - \det\!(M)-2$ and $s=\textrm{tr} (A)$ . Since $\sigma (A) = \{ 0, \mu^{}_{+}, \mu^{}_{-} \}$ , where $\mu^{}_{\pm}$ are negative numbers by Proposition 4.1, we get $r = - \mu^{}_{+} \mu^{}_{-} < 0$ and $s = \mu^{}_{+} + \mu^{}_{-} < 0$ . This remains correct when $\mu^{}_{+} = \mu^{}_{-}$ , where A has a nontrivial Jordan normal form (as it is cyclic). Note that, although A is a real matrix, it is more convenient, and also completely consistent, to always employ the complex Jordan normal form of A in our arguments.

Let us first consider the case that A is simple (and hence also diagonalisable). Here, we have $-1 < \mu^{}_{-} < \mu^{}_{+} < 0$ together with $\sigma (M) = \big\{ 1, 1+\mu^{}_{+}, 1+\mu^{}_{-} \big\}$ . As A is cyclic, any matrix $Q \in \mathcal{A}^{_{(3)}}_{\, 0}$ with $M=\textrm{e}^Q$ must lie in $\textrm{alg} (A) = \mathbb{R} [A] \cap \mathcal{A}^{_{(3)}}_{\, 0}$ , so $Q=\alpha A + \beta A^2$ for some $\alpha,\beta \in \mathbb{R}$ , again by Frobenius’s theorem. Then, the SMT implies

\begin{equation*} \alpha \mu^{}_{\pm} + \beta \mu^{2}_{\pm} = \log \big( 1+ \mu^{}_{\pm}\big) \, \in \, \mathbb{R} ,\end{equation*}

which is an inhomogeneous system of linear equations for $\alpha$ and $\beta$ . As

\begin{equation*} \det \begin{pmatrix} \mu^{}_{+} & \quad \mu^{2}_{+} \\[4pt] \mu^{}_{-} & \quad \mu^{2}_{-} \end{pmatrix} = \mu^{}_{+} \mu^{}_{-}\big(\mu^{}_{-} - \mu^{}_{+} \big) \, < \, 0 ,\end{equation*}

we get a unique solution, which is given by

(4.6) \begin{equation} \alpha \, = \, \frac{\mu^{2}_{-} \log \big(1+\mu^{}_{+}\big) - \mu^{2}_{+} \log \big(1 +\mu^{}_{-}\big)}{ \mu^{}_{+} \mu^{}_{-} \big(\mu^{}_{-} - \mu^{}_{+} \big) } \, , \quad \beta \, = \, \frac{\mu^{}_{+} \log \big(1+\mu^{}_{-}\big) - \mu^{}_{-} \log \big(1 +\mu^{}_{+}\big)}{ \mu^{}_{+} \mu^{}_{-} \big(\mu^{}_{-} - \mu^{}_{+} \big) } .\end{equation}

This shows that $M = \textrm{e}^Q$ has precisely one solution with $Q\in \mathcal{A}^{_{(3)}}_{\, 0}$ , which must be the one from (4.3). One can check via the Taylor series that $\alpha > 0$ and $\beta < 0$ , though this is not sufficient to guarantee the generator or the monotonicity property of Q. So far, we have derived the following constructive result.

Theorem 4.4. Let $M\in\mathcal{M}_{3,\preccurlyeq}$ have simple spectrum with $\det\!(M) > 0$ , and set $A = M - \mathbb{1}$ , with $\sigma (A) = \{ 0, \mu^{}_{+}, \mu^{}_{-} \}$ as above. Then, there is precisely one $Q \in \mathcal{A}^{_{(3)}}_{\, 0}$ such that $M = \textrm{e}^Q$ , namely the matrix Q from (4.3), which also satisfies $ Q = \alpha A + \beta A^2$ with $\alpha, \beta$ from (4.6).

Further, Q is a generator, and M is thus embeddable via $M =\textrm{e}^Q$ , precisely when $\big(\alpha A + \beta A^2\big)^{}_{ij} \geqslant 0$ holds for all $i\ne j$ , and Q is also monotone when the criterion from Example 3.12 is satisfied by $\alpha A + \beta A^2$ .

It remains to consider the case that A is cyclic, but not simple. Here, its eigenvalues are 0 and $-1 < \mu < 0$ , the latter twice; also, any solution of $M=\textrm{e}^Q$ with $Q\in\mathcal{A}^{_{(3)}}_{\, 0}$ must be of the form $Q=\alpha A + \beta A^2$ , which implies the condition $\alpha \mu + \beta \mu^2 = \log (1+\mu)$ by the SMT. Using the standard Jordan normal form of A, which must comprise the Jordan block $\big( \small\begin{array}{c@{\quad}c} \mu & 1 \\ 0 & \mu \end{array}\big)$ because of our assumption, one obtains another condition from $\textrm{e}^Q = \mathbb{1} + A$ , this time from the superdiagonal element of the Jordan block, namely $(1+\mu) (\alpha + 2 \beta \mu) = 1$ . This results in the unique solution

(4.7) \begin{equation} \alpha = 2 \, \frac{\log (1+\mu)}{\mu} - \frac{1}{1+\mu} \quad \text{and} \quad \beta = \frac{1}{\mu (1+\mu)} - \frac{\log (1+\mu)}{\mu^2} .\end{equation}

Note that (4.7) also follows from (4.6) by an approximation argument of de L’Hospital type, via setting $\mu^{}_{-} = \mu = \mu^{}_{+} + x$ and letting $x\to 0$ . So, the Q from (4.3) is once more the only solution for $M=\textrm{e}^Q$ with $Q\in\mathcal{A}^{_{(3)}}_{\, 0}$ . Here, $\alpha > 0$ for $\mu$ sufficiently large (above approximately $-0.7$ ) and $\beta < 0$ , which is more complicated than in the previous case. Nevertheless, we have the following result.

Corollary 4.5. Let $M\in\mathcal{M}_{3,\preccurlyeq}$ be cyclic, but not simple. Then, the matrix $A = M - \mathbb{1}$ has spectrum $\sigma (A) = \{ 0, \mu \}$ with $-1 < \mu < 0$ , where $\mu$ has algebraic multiplicity $2$ , but geometric multiplicity $1$ . Further, all statements of Theorem 4.4 remain true, this time with the coefficients $\alpha, \beta$ from (4.7).
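
The two formulas can be turned into a small computational test; the sketch below (Python with numpy and scipy; the function name and the example matrix are ours) computes $\alpha$ and $\beta$ from the eigenvalues of A via (4.6) or (4.7), forms $Q = \alpha A + \beta A^2$, and checks the generator and monotonicity conditions.

```python
import numpy as np
from scipy.linalg import expm

def log_coefficients(mu_plus, mu_minus):
    """Coefficients (alpha, beta) with Q = alpha*A + beta*A^2, per (4.6)
    for distinct eigenvalues and (4.7) in the confluent case."""
    if np.isclose(mu_plus, mu_minus):
        mu = mu_plus
        alpha = 2 * np.log1p(mu) / mu - 1 / (1 + mu)
        beta = 1 / (mu * (1 + mu)) - np.log1p(mu) / mu ** 2
    else:
        den = mu_plus * mu_minus * (mu_minus - mu_plus)
        alpha = (mu_minus ** 2 * np.log1p(mu_plus)
                 - mu_plus ** 2 * np.log1p(mu_minus)) / den
        beta = (mu_plus * np.log1p(mu_minus)
                - mu_minus * np.log1p(mu_plus)) / den
    return alpha, beta

# a monotone example with simple positive spectrum (1, 0.6, 0.4)
M = np.array([[0.7, 0.2, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.2, 0.7]])
A = M - np.eye(3)
mu = np.sort([x for x in np.linalg.eigvals(A).real if not np.isclose(x, 0.0)])
alpha, beta = log_coefficients(mu[1], mu[0])
Q = alpha * A + beta * A @ A

print(np.allclose(expm(Q), M))                             # e^Q = M
print(Q[~np.eye(3, dtype=bool)].min() >= -1e-12)           # Q is a generator
print(Q[1, 2] >= Q[0, 2] - 1e-12 and Q[1, 0] >= Q[2, 0] - 1e-12)  # Q is monotone (Ex. 3.12)
```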

Not all $M\in\mathcal{M}_{3,\preccurlyeq}$ with positive determinant (and hence spectrum) are embeddable, as there are non-embeddable cases with a single entry 0 in one position; see [Reference Baake and Sumner1, Reference Davies5, Reference Guerry12] and references therein for further examples. For $d\geqslant 4$ , the possibility of complex conjugate pairs of eigenvalues increases the complexity of the treatment, which is nevertheless possible with the recent results from [Reference Casanellas, Fernández-Sánchez and Roca-Lacostena2].

5. Uniqueness of embedding and further directions

The explicit treatment of $\mathcal{M}_{3,\preccurlyeq}$ in the previous section shows that some useful sufficient criteria for unique embeddability should be in store, such as the one stated in [Reference Higham14, Sec. 2.3] for Markov matrices with distinct positive eigenvalues. Let us first recall a classic result on the existence of a real logarithm of a given matrix, which can be found in several places, for instance in [Reference Gantmacher10, Sec. 8.8.2] or as [Reference Culver3, Thm. 1].

Fact 5.1. For $B\in \textrm{GL} (d,\mathbb{R} )$ , the equation $B = \textrm{e}^R$ has a solution $R\in \textrm{Mat} (d,\mathbb{R})$ if and only if every elementary Jordan block of B with an eigenvalue on the negative real axis occurs with even multiplicity. When B is diagonalisable, this simplifies to the condition that each eigenvalue of B on the negative real axis has even algebraic multiplicity.

Any matrix $R\in \textrm{Mat} (d,\mathbb{R})$ that solves $B=\textrm{e}^R$ is called a real logarithm of B. When considering a nonsingular Markov matrix M, we are only interested in real logarithms of M with zero row sums, that is, in elements from the subalgebra $\mathcal{A}^{(d)}_{\, 0}\!\subset\textrm{Mat} (d,\mathbb{R})$ . This is justified by the following observation. Suppose $\textrm{e}^R$ has unit row sums with a real matrix R that fails to have zero row sums. Then the set of matrices $\textrm{e}^{tR}$ with unit row sums and $t\in\mathbb{R}$ forms a discrete subgroup of $\big\{ \textrm{e}^{tR}\,:\,t\in\mathbb{R}\big\}\simeq \mathbb{R}$ . This is so because the existence of an accumulation point with unit row sums, $t^{}_0$ say, would result in $(1, \ldots, 1)^T$ being an eigenvector of R with eigenvalue 0, which is a contradiction.

In analogy to the previous case with $d=3$ , now for $A\in\mathcal{A}^{(d)}_{\, 0}\!$ with arbitrary $d\geqslant 2$ , we define the non-unital algebra

\begin{equation*} \textrm{alg} (A) \,:\!=\, \big\langle A^m \,:\, m \in \mathbb{N} \big\rangle^{}_{\mathbb{R}} = \big\langle A, A^2, \ldots , A^{d-1}\big\rangle^{}_{\mathbb{R}} \, \subset \, \mathcal{A}^{(d)}_{\, 0} ,\end{equation*}

where the second formulation follows from the Cayley–Hamilton theorem in conjunction with the fact that $\mathcal{A}^{(d)}_{\, 0}$ is non-unital.

Lemma 5.2. Let $M \in \mathcal{M}_d$ be cyclic and nonsingular, and assume that M possesses at least one real logarithm, according to Fact 5.1. Then, with $A = M - \mathbb{1}$ , any real logarithm R of M satisfies $R \in \textrm{alg} (A)$ .

Proof. Clearly, we have $[R,M ] = [R,A ]=\mathbb{0}$ , so M cyclic implies $R\in\mathbb{R} [A ]$ by Frobenius’s theorem, and thus $R = \alpha^{}_{0} \mathbb{1} + \sum_{n=1}^{d-1} \alpha^{}_{n} A^{n}$ for some $\alpha^{}_{0}, \ldots , \alpha^{}_{d-1} \in \mathbb{R}$ . Hence, $R=\alpha^{}_{0} \mathbb{1} + X$ with $X\in \mathcal{A}^{(d)}_{\, 0}$ , where $\textrm{e}^X$ then has unit row sums. Consequently, all row sums of $\textrm{e}^R = \textrm{e}^{\alpha^{}_0} \textrm{e}^X$ equal $\textrm{e}^{\alpha^{}_0}$ , which must be 1. So we get $\alpha^{}_{0} = 0$ and $R\in \textrm{alg} (A) \subset \mathcal{A}^{(d)}_{\, 0}$ as claimed.

Now, we can extend the uniqueness result mentioned earlier to cyclic matrices. It is a variant of [Reference Culver3, Thm. 2], but we give a different and constructive proof that later leads to an effective (and numerically stable) criterion for embeddability. It generalises what we saw in Theorem 4.4 and Corollary 4.5, and also differs from the approach used in [Reference Casanellas, Fernández-Sánchez and Roca-Lacostena2].

Theorem 5.3. Suppose $M\in\mathcal{M}_d$ is cyclic and has real spectrum. Then, M possesses a real logarithm R, so $M=\textrm{e}^R$ , if and only if the spectrum of M is positive. In this case, R is unique, and is always an element of $\textrm{alg} (A) \subset \mathcal{A}^{(d)}_{\, 0}$ , where $A = M - \mathbb{1}$ .

Proof. When M is cyclic, no elementary Jordan block can occur twice, and the first implication follows from Fact 5.1. When $\sigma (M) \subset \mathbb{R}_{+}$ , by Fact 5.1 and Lemma 5.2, all real logarithms of M must lie in $\textrm{alg} (A)$ , and there is at least one $R\in\textrm{alg} (A)$ with $\textrm{e}^R=M$ , so we have $R = \sum_{i=1}^{d-1} \alpha^{}_{i} A^i$ for some $\alpha^{}_{1}, \ldots, \alpha^{}_{d-1} \in \mathbb{R}$ . It remains to establish uniqueness.

First, assume that A is simple. As A is a generator, this means $\sigma (A) = \big\{ 0, \mu^{}_{1}, \ldots , \mu^{}_{d-1}\big\}$ , with distinct $\mu^{}_{i} \in ({-}1,0)$ because of our assumptions. Then, since all $\alpha^{}_{i}$ and $\mu^{}_{j}$ are real, the SMT implies the $d- 1$ identities

(5.1) \begin{equation} \sum_{\ell=1}^{d-1} \alpha^{}_{\ell} \, \mu^{\ell}_{i} = \log \big(1 + \mu^{}_{i}\big) , \qquad 1 \leqslant i \leqslant d- 1 .\end{equation}

They constitute an inhomogeneous system of linear equations for the $\alpha^{}_{i}$ with the matrix

(5.2) \begin{equation} B = \begin{pmatrix} \mu^{}_{1} & \quad \mu^{2}_{1} & \quad \cdots & \quad \mu^{d^{}-1}_{1} \\[4pt] \mu^{}_{2} & \quad \mu^{2}_{2} & \quad \cdots & \quad \mu^{d^{}-1}_{2} \\[4pt] \vdots & \quad \vdots & \quad & \quad \vdots \\[4pt] \mu^{}_{d-1} & \quad \mu^{2}_{d-1} & \quad \cdots & \quad \mu^{d^{}-1}_{d-1} \end{pmatrix} .\end{equation}

Since $\det\!(B) = \left( \prod_i \mu^{}_{i} \right) \prod_{k>\ell}(\mu^{}_{k} - \mu^{}_{\ell})$ by an obvious variant of the standard Vandermonde determinant formula, B is invertible when A is simple, and (5.1) has a unique real solution.

When A is cyclic, but not simple, the appearance of nontrivial Jordan blocks necessitates a more refined argument. Clearly, as A is a generator and also cyclic, 0 is a simple eigenvalue of A by [Reference Baake and Sumner1, Prop. 2.3(2)]. Let $\mu \in ({-}1,0)$ be any of the other eigenvalues, say with algebraic multiplicity m. When $m=1$ , we get one condition from the SMT, and nothing else is needed. So, assume $m\geqslant 2$ . As A is cyclic, the geometric multiplicity of $\mu$ is 1, and the corresponding Jordan block in standard form is $\mathbb{J}^{}_{\mu} = \mu \mathbb{1}_m + N_m$ , where $N_m$ is the nilpotent matrix with entries 1 on the first superdiagonal and 0 everywhere else. It satisfies $N^{m}_{m} = \mathbb{0}$ , while $N^{k}_{m}$ , for $1\leqslant k < m$ , has entries 1 on the kth superdiagonal and 0 elsewhere. In this case, we get only one condition from the SMT, namely

(5.3) \begin{equation} \sum_{\ell =1}^{d-1} \alpha^{}_{\ell} \, \mu^{\ell} = \log\! (1 + \mu) ,\end{equation}

as in (5.1), while we need $m-1$ independent further ones. They will come from derivatives of (5.3), which needs a justification as follows.

First, from $\textrm{e}^R = \mathbb{1} + A$ , one concludes that we must have

(5.4) \begin{equation} \exp \Bigg( \,\sum_{\ell=1}^{d-1} \alpha^{}_{\ell} \, \mathbb{J}^{ \ell}_{\mu}\Bigg) = \mathbb{1}^{}_{m} + \mathbb{J}^{}_{\mu} = (1+\mu) \mathbb{1}^{}_{m} + N^{}_{m} .\end{equation}

Now, defining the polynomial $\phi (x) = \sum_{\ell=1}^{d-1} \alpha^{}_{\ell} \, x^{\ell}$ and the function $\psi (x) = \textrm{e}^{\phi (x)}\!$ , one can employ the standard method from [Reference Higham14, Sec. 1.2.1] to calculate the exponential in (5.4) as

\begin{equation*} \psi \bigl( \mathbb{J}^{}_{\mu}\bigr) = \psi (\mu) \, \mathbb{1}^{}_{m} \, + \sum_{k=1}^{m-1} \frac{\psi^{(k)} (\mu)}{k !} N^{k}_{m} ,\end{equation*}

where $\psi^{(k)}$ denotes the kth derivative of $\psi$ . As $\psi (\mu) = 1+\mu $ by (5.3), a comparison with (5.4) leads to the conditions $\psi^{(k)} (\mu) = \delta^{}_{k,1} $ for $1 \leqslant k \leqslant m-1$ , noting that $1+\mu\ne 0$ . Iterating the product rule on $\psi = \textrm{e}^{ \phi}$ and inserting (5.3) then results in

(5.5) \begin{equation} \frac{\textrm{d}^k}{\textrm{d} \mu^k} \, \sum_{\ell=k}^{d-1} \alpha^{}_{\ell} \, \mu^{\ell} = \phi^{(k)} (\mu) = \frac{({-}1)^{k-1} (k-1) !}{(1+\mu)^k} = \frac{\textrm{d}^k}{\textrm{d} \mu^k} \log (1+\mu) ,\end{equation}

which shows that the additional conditions on the $\alpha^{}_{\ell}$ emerge via derivatives of the fundamental relation (5.3). It remains to check when we obtain a unique solution $\alpha^{}_{1}, \ldots , \alpha^{}_{d-1}$ this way.

Here, the matrix B that generalises (5.2) is specified by an n-tuple

\begin{equation*}\big( \big(\mu^{}_{1}, m^{}_{1}\big), \ldots , \big(\mu^{}_{n}, m^{}_{n}\big)\big)\end{equation*}

of distinct nonzero eigenvalues $\mu^{}_i$ of A with their algebraic multiplicities $m^{}_i$ , where $n\leqslant d- 1$ and $m^{}_{1} + \ldots + m^{}_{n} = d- 1$ . A pair $\big(\mu^{}_{i}, m^{}_{i}\big)$ is responsible for $m^{}_{i}$ rows of B of length $d- 1$ , with the first derived from (5.3), followed by $m^{}_{i} - 1$ rows induced by (5.5). Here, each new row emerges from the previous one by differentiation with respect to $\mu^{}_{i}$ . The resulting matrix is a variant of the confluent Vandermonde matrix (see [Reference Higham13, Sec. 22.2] or [Reference Luther and Rost23]), which is known from Hermite interpolation. It is invertible if and only if the $\mu^{}_{i}$ are distinct.

To substantiate the latter claim, define the sequence $(\gamma^{}_{n})^{}_{n\in\mathbb{N}}$ by

\begin{equation*} \gamma^{}_{n} = \det \begin{pmatrix} 1 & \quad 1 & \quad 1 & \quad \cdots & \quad 1 \\[4pt] 1 & \quad 2 & \quad 4 & \quad \cdots & \quad 2^{n-1} \\[4pt] \vdots & \quad \vdots & \quad \vdots & \quad & \quad \vdots \\[4pt] 1 & \quad n & \quad n^2 & \quad \ldots & \quad n^{n-1} \end{pmatrix} ,\end{equation*}

which starts as 1, 1, 2, 12, 288, 34560; compare [Reference Sloane26, A000178]. Then one finds

\begin{equation*} \det\!(B) = \Bigg( \,\prod_{i=1}^{n} \mu^{m_i}_{i} \gamma^{}_{m_i} \Bigg) \prod_{k>\ell} \bigl( \mu^{}_{k} - \mu^{}_{\ell} \bigr)^{m^{}_k m^{}_{\ell}}\end{equation*}

(compare [Reference Higham13, Exc. 22.6]), which reduces to the determinant formula stated previously for the special case $m^{}_{1} = \ldots = m^{}_{n} = 1$ . Since the $\mu^{}_{i}$ are distinct, $\det\!(B)\ne 0$ is clear, and the claimed uniqueness follows.

The benefit of this approach is that one can calculate $B^{-1}$ and thus determine the coefficients purely from the eigenvalues of A. In particular, the unique R is a generator if and only if $\sum_{i=1}^{d-1}\alpha^{}_{i} A^{i}$ satisfies the corresponding conditions.
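
To illustrate this in the case of simple spectrum, the following sketch (Python with numpy and scipy; the function name and the example are ours) solves the Vandermonde system (5.1) for the coefficients, assembles the candidate logarithm $R = \sum_{\ell} \alpha^{}_{\ell} A^{\ell}$, and then tests whether R is a generator. For the birth-death example shown, which is symmetric and tridiagonal and hence has simple real (here positive) spectrum, the generator test comes out negative, so this M is not embeddable despite its positive spectrum.

```python
import numpy as np
from scipy.linalg import expm

def candidate_logarithm(M):
    """Unique real logarithm R = sum_l alpha_l A^l of a Markov matrix M with
    simple positive spectrum, with the alpha_l solving the system (5.1)."""
    d = M.shape[0]
    A = M - np.eye(d)
    mu = np.linalg.eigvals(A)
    mu = np.real(mu[~np.isclose(mu, 0.0)])                  # nonzero eigenvalues of A
    B = np.column_stack([mu ** k for k in range(1, d)])     # the matrix from (5.2)
    alpha = np.linalg.solve(B, np.log1p(mu))
    return sum(a * np.linalg.matrix_power(A, k)
               for a, k in zip(alpha, range(1, d)))

# a monotone birth-death example; symmetric tridiagonal, hence simple real spectrum
M = np.array([[0.8, 0.2, 0.0, 0.0],
              [0.2, 0.6, 0.2, 0.0],
              [0.0, 0.2, 0.6, 0.2],
              [0.0, 0.0, 0.2, 0.8]])
R = candidate_logarithm(M)
print(np.allclose(expm(R), M), np.allclose(R.sum(axis=1), 0.0))
print(R[~np.eye(4, dtype=bool)].min() >= -1e-12)   # generator test: decides embeddability
```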

Remark 5.4. The Vandermonde matrix and its inverse are well known from Lagrange interpolation theory. Therefore, in minor modification of the results from [Reference Horn and Johnson17, Sec. 0.9.11], one finds the matrix elements of the inverse of B from (5.2) as

\begin{equation*} \big( B^{-1}\big)_{ij} = \frac{({-}1)^{i-1} S^{(d-2)}_{d-1-i} \big( \big\{ \mu^{}_{1} , \ldots , \mu^{}_{d-1} \big\} \setminus \big\{ \mu^{}_{j} \big\}\big)}{\mu^{}_{j} \prod_{k\ne j} \big(\mu^{}_{k} - \mu^{}_{j}\big) },\end{equation*}

where $S^{(n)}_{m} \bigl( \bigl\{ a^{}_{1}, \ldots , a^{}_{n} \bigr\} \bigr)$ is the elementary symmetric polynomial as defined by $S^{(n)}_{0} \equiv 1$ , then $S^{(n)}_{1} \bigl( \bigl\{ a^{}_{1}, \ldots , a^{}_{n} \bigr\} \bigr) =a^{}_{1} + \ldots + a^{}_{n}$ , followed by $S^{(n)}_{2} \bigl( \bigl\{ a^{}_{1}, \ldots , a^{}_{n} \bigr\} \bigr) =\sum_{i<j} a^{}_{i} a^{}_{j}$ , and so on, up to the final one, which is $S^{(n)}_{n} \bigl( \bigl\{ a^{}_{1}, \ldots , a^{}_{n} \bigr\} \bigr) =\prod_{i} a^{}_{i}$ .

With a little more effort, this formula can be extended to the cyclic situation as well; see [Reference Luther and Rost23] for a constructive approach to $B^{-1}$ in this case.

The above result leads to the following sufficient criterion for unique embeddability.

Corollary 5.5. Let $M\in\mathcal{M}_d$ be cyclic and have real spectrum, $\sigma (M) \subset \mathbb{R}$ . Then, M has a real logarithm if and only if $\sigma (M) \subset \mathbb{R}_{+}$ .

In this case, the spectral radius of $A=M - \mathbb{1}$ is $\varrho^{}_{A} < 1$ , and there is at most one Markov generator Q such that $M = \textrm{e}^Q$ . The only choice is $ Q = \log\! (\mathbb{1} + A) \in \textrm{alg} (A) \subset \mathcal{A}^{(d)}_{\, 0} \!$ , calculated with the standard branch of the matrix logarithm as a convergent series. In particular, M is embeddable if and only if the matrix $\log (\mathbb{1} + A)$ is a generator.

Proof. The first claim follows from Theorem 5.3. If $\sigma (M) \subset \mathbb{R}_{+}$ , all eigenvalues of A lie in the half-open interval $({-}1, 0 ]$ , so $\varrho^{}_{A}<1$ is clear. By Theorem 5.3, as M is cyclic with $\sigma (M) \subset (0,1]$ , there is precisely one real matrix R with $M=\textrm{e}^R$ . Since $\varrho^{}_{A} < 1$ , the series

\begin{equation*} \log (\mathbb{1} + A) \, = \sum_{m=1}^{\infty} \frac{({-}1)^{m-1}}{m} A^m\end{equation*}

converges. The limit is then an element of $\textrm{alg} (A)$ , because this algebra is a closed subset of $\textrm{Mat} (d,\mathbb{R})$ . So we get $R = \log (\mathbb{1} + A) \in \textrm{alg} (A)$ in this case, which has zero row sums, but need not be a generator.
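
In practice, the series can be summed directly, or one can call a library routine for the principal matrix logarithm; a minimal sketch in Python with numpy and scipy, for a small monotone example with positive spectrum, follows.

```python
import numpy as np
from scipy.linalg import logm

def log_series(A, terms=200):
    """Principal matrix logarithm log(1 + A) via the convergent series,
    valid here since the spectral radius of A is < 1."""
    R, P = np.zeros_like(A), np.eye(A.shape[0])
    for m in range(1, terms + 1):
        P = P @ A
        R += (-1) ** (m - 1) / m * P
    return R

M = np.array([[0.7, 0.2, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.2, 0.7]])
A = M - np.eye(3)
R = log_series(A)
print(np.allclose(R, logm(M)), np.allclose(R.sum(axis=1), 0.0))
print((R[~np.eye(3, dtype=bool)] >= -1e-12).all())   # generator check: this M is embeddable
```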

One can go beyond this result, but care is required with the branches of the complex logarithm; see [Reference Higham14, Ch. 11] for background and [Reference Casanellas, Fernández-Sánchez and Roca-Lacostena2] for recent progress in this direction.

Remark 5.6. The results from Theorem 4.4 and Corollary 4.5 in particular apply to all cyclic matrices $M\in\mathcal{M}_3$ with positive spectrum, in that the embeddability of M can most easily be verified via testing whether $\alpha A + \beta A^2$ is a generator, where $A=M - \mathbb{1}$ and $\alpha,\beta\in\mathbb{R}$ are the numbers from Equation (4.6) if M is simple, or from Equation (4.7) otherwise.

Clearly, uniqueness results have interesting consequences on the structure of Markov roots, as can be seen in the following refinement of [Reference Baake and Sumner1, Ex. 3.9].

Example 5.7. The two-dimensional Markov matrix

\begin{equation*} M = \begin{pmatrix} \tfrac{3}{4} & \quad \tfrac{1}{4} \\[5pt] \tfrac{1}{2} & \quad \tfrac{1}{2} \end{pmatrix}\end{equation*}

is uniquely embeddable by Lemma 2.10, so $M = \textrm{e}^Q$ with a unique generator Q. Nevertheless, as follows from a simple calculation, it has precisely two Markov square roots, namely

\begin{equation*} M^{}_{1} = \begin{pmatrix} \tfrac{5}{6} & \quad \tfrac{1}{6} \\[5pt] \tfrac{1}{3} & \quad \tfrac{2}{3} \end{pmatrix} \quad \text{and} \quad M^{}_{2} = \begin{pmatrix} \tfrac{1}{2} & \quad \tfrac{1}{2} \\[6pt] 1 & \quad 0 \end{pmatrix} .\end{equation*}

Of these, $M^{}_{1} = \exp \bigl( \frac{1}{2} Q \bigr)$ is embeddable, while $M^{}_{2}$ is not. So, in the embeddable case, there is always at least one Markov nth root for every $n\in\mathbb{N}$ of the form $\exp \bigl( \frac{1}{n} Q \bigr)$ , but there can still be others. A uniqueness result for the embedding of M then means that, among all Markov roots, there is precisely one sequence of embeddable Markov nth roots of M.
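
A direct numerical check of this example (Python with numpy):

```python
import numpy as np

M  = np.array([[3/4, 1/4], [1/2, 1/2]])
M1 = np.array([[5/6, 1/6], [1/3, 2/3]])
M2 = np.array([[1/2, 1/2], [1.0, 0.0]])

print(np.allclose(M1 @ M1, M), np.allclose(M2 @ M2, M))   # both are Markov square roots
print(np.linalg.det(M1) > 0, np.linalg.det(M2) > 0)       # only M1 can be embeddable
```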

The set $\mathcal{M}^{\textrm{E}}_{d}$ of embeddable matrices is a relatively closed subset of $\{ M\in\mathcal{M}_d \,:\, \det\!(M) >0\}$ by [Reference Kingman20, Prop. 3], but (for $d\geqslant 2$ ) it is not a closed subset of $\mathcal{M}_d$ . The closure of $\mathcal{M}^{\textrm{E}}_{d}$ is still a subset of the infinitely divisible elements of $\mathcal{M}_d$ , and it is a natural question which matrices lie on the boundary, which we denote by $\partial\mathcal{M}^{\textrm{E}}_{d}$ . Clearly, there can be embeddable cases, such as $\left(\small\begin{array}{c@{\quad}c} 1 & 0 \\ 1-\alpha & \alpha \end{array}\right)$ or $\left(\small\begin{array}{c@{\quad}c} \alpha & 1-\alpha \\ 0 & 1 \end{array}\right)$ for $d=2$ and $0 < \alpha \leqslant 1$ , as well as (singular) idempotent ones, such as $M_{\infty} = \lim_{t\to\infty}\textrm{e}^{t Q}$ for any generator $Q\ne \mathbb{0}$ . For $d>2$ , there are further possibilities. Any $M\in \partial\mathcal{M}^{\textrm{E}}_{d}$ satisfies $M=\lim_{n\to\infty}\textrm{e}^{Q_n}$ for some sequence $(Q_n)^{}_{n\in\mathbb{N}}$ of generators, which implies $\lim_{n\to\infty} \textrm{e}^{\textrm{tr} (Q_n)} = \det\!(M) \geqslant 0$ .

When $M\in \partial\mathcal{M}^{\textrm{E}}_{d}$ has $\det(M) > 0$ , it is embeddable by Kingman’s infinite divisibility criterion. Alternatively, a positive determinant implies that the sequence $\bigl(\textrm{tr}(Q_n)\bigr)_{n\in\mathbb{N}}$ converges and is thus bounded. Since all diagonal elements of a generator are nonpositive, they are bounded as well, hence also all elements of the $Q_n$ thanks to the vanishing row sums. By a standard compactness argument, there is thus a subsequence $(Q_{n_i})^{}_{i\in\mathbb{N}}$ such that $\lim_{i\to\infty}Q_{n_i} = Q$ is a generator with $M=\textrm{e}^Q$ as expected, and M lies in the Markov semigroup $\big\{ \textrm{e}^{t Q}\,:\,t\geqslant 0\big\}$ .

When $\det\!(M)=0$ , the sequence $(\textrm{tr}(Q_n))_{n\in\mathbb{N}}$ must be (negatively) unbounded, and we may assume that, at least at one off-diagonal position, the entries of the $Q_n$ are (positively) unbounded. When $d=2$ , this suffices to show that M is a singular idempotent. Already for $d=3$ , the situation becomes more complex, since one can have a limiting M with $\det\!(M) = 0$ that is not an idempotent, by considering

\begin{equation*} \exp \begin{pmatrix} -a - b & \quad a & \quad b \\[4pt] c & \quad -c - d & \quad d \\[4pt] e & \quad f & \quad -e - \! f \end{pmatrix}\end{equation*}

for $a,b,\ldots,f \geqslant 0$. Then, fixing $b, \ldots, f$ at generic values and letting $a\to\infty$ produces such examples, and similarly for various other choices; indeed, the determinant of this exponential equals $\textrm{e}^{-(a+b+c+d+e+f)}$, which tends to 0 as $a\to\infty$, so any limiting matrix is singular, while generic parameter values prevent it from being an idempotent. When $d=4$, one can have mixtures in block matrix form, such as

\begin{equation*} M = \begin{pmatrix} 1 & \quad 0 \\[4pt] 1 - \alpha & \quad \alpha \end{pmatrix} \oplus \begin{pmatrix} \beta & \quad 1 - \beta \\[4pt] \beta & \quad 1\! -\beta \end{pmatrix}\end{equation*}

for $\alpha\in (0,1)$ and $\beta \in [ 0,1]$ , which is singular but not idempotent.
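Indeed, writing $M = B \oplus C$ for the two blocks displayed above, one finds $\det\!(M) = \det\!(B) \det\!(C) = \alpha \cdot 0 = 0$ together with

\begin{equation*} B^2 \, = \, \begin{pmatrix} 1 & \quad 0 \\[4pt] 1 - \alpha^2 & \quad \alpha^2 \end{pmatrix} \, \ne \, B \quad \text{and} \quad C^2 \, = \, C ,\end{equation*}

so $M^2 \ne M$ for $\alpha \in (0,1)$. One way to see that M lies in $\partial\mathcal{M}^{\textrm{E}}_{4}$, sketched here for convenience, is via $M = \lim_{t\to\infty} \exp \bigl( Q^{}_{B} \oplus \, t Q^{}_{C} \bigr)$ with the generators

\begin{equation*} Q^{}_{B} \, = \, \begin{pmatrix} 0 & \quad 0 \\[4pt] -\log \alpha & \quad \log \alpha \end{pmatrix} \quad \text{and} \quad Q^{}_{C} \, = \, \begin{pmatrix} \beta - 1 & \quad 1 - \beta \\[4pt] \beta & \quad -\beta \end{pmatrix} ,\end{equation*}

which satisfy $\textrm{e}^{Q_B} = B$ and $\lim_{t\to\infty} \textrm{e}^{t Q_C} = C$. Being singular, M is thus a non-embeddable element of the closure of $\mathcal{M}^{\textrm{E}}_{4}$.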

It seems worthwhile to characterise the boundary more completely, for instance by relating semigroups with reducible generators to properties of the boundary, which we leave as an open problem at this point. It is clear, though, that Markov idempotents and their connection with blockwise equal-input matrices will be important here. More generally, a simplified systematic treatment of the embeddability problem for $d\leqslant 4$ would be helpful in view of the applications in phylogeny; see [Reference Casanellas, Fernández-Sánchez and Roca-Lacostena2] for recent progress in this direction. Finally, even some of the elementary questions become much harder in the setting of a countable state space, where many new phenomena occur; see [Reference Eisner and Radl6] and references therein for some recent results.

Acknowledgements

It is our pleasure to thank Frederic Alberti and Martin Möhle for valuable discussions and suggestions, as well as Tanja Eisner, Agnes Radl, and Peter Taylor for helpful comments on the manuscript. Further, we thank two anonymous referees for their thoughtful comments, which helped us to improve the presentation.

Funding Information

This work was supported by the German Research Foundation (DFG), within the CRC 1283 at Bielefeld University, and by the Australian Research Council (ARC), via Discovery Project DP 180102215.

Competing Interests

There were no competing interests to declare during the preparation or publication of this article.

References

Baake, M. and Sumner, J. (2020). Notes on Markov embedding. Linear Algebra Appl. 594, 262–299.
Casanellas, M., Fernández-Sánchez, J. and Roca-Lacostena, J. (2020). The embedding problem for Markov matrices. Preprint. Available at https://arxiv.org/abs/2005.00818.
Culver, W. J. (1966). On the existence and uniqueness of the real logarithm of a matrix. Proc. Amer. Math. Soc. 17, 1146–1151.
Daley, D. J. (1968). Stochastically monotone Markov chains. Z. Wahrscheinlichkeitsth. 10, 305–317.
Davies, E. B. (2010). Embeddable Markov matrices. Electron. J. Prob. 15, 1474–1486.
Eisner, T. and Radl, A. (2021). Embeddability of real and positive operators. To appear in Linear Multilinear Algebra. Available at https://arxiv.org/abs/2003.08186.
Elfving, G. (1937). Zur Theorie der Markoffschen Ketten. Acta Soc. Sci. Fennicae A2, 1–17.
Feller, W. (1971). An Introduction to Probability Theory and Its Applications, Vol. II, 2nd edn. John Wiley, New York.
Fernández-Sánchez, J., Sumner, J. G., Jarvis, P. D. and Woodhams, M. D. (2015). Lie Markov models with purine/pyrimidine symmetry. J. Math. Biol. 70, 855–891.
Gantmacher, F. R. (1986). Matrizentheorie. Springer, Berlin.
Göndöcs, F., Michaletzky, G., Móri, T. F. and Székely, G. J. (1985). A characterization of infinitely divisible Markov chains with finite state space. Ann. Univ. Sci. Budapest R. Eötvös Nom. 27, 137–141.
Guerry, M.-A. (2019). Sufficient embedding conditions for three-state discrete-time Markov chains with real eigenvalues. Linear Multilinear Algebra 67, 106–120.
Higham, N. J. (2002). Accuracy and Stability of Numerical Algorithms, 2nd edn. Society for Industrial and Applied Mathematics, Philadelphia.
Higham, N. J. (2008). Functions of Matrices: Theory and Computation. Society for Industrial and Applied Mathematics, Philadelphia.
Higham, N. J. and Lin, L. (2011). On pth roots of stochastic matrices. Linear Algebra Appl. 435, 448–463.
Högnäs, G. and Mukherjea, A. (2011). Probability Measures on Semigroups: Convolution Products, Random Walks and Random Matrices, 2nd edn. Springer, New York.
Horn, R. A. and Johnson, C. R. (2013). Matrix Analysis, 2nd edn. Cambridge University Press.
Keilson, J. and Kester, A. (1977). Monotone Markov matrices and monotone Markov processes. Stoch. Process. Appl. 5, 231–241.
Kijima, M. (1997). Markov Processes for Stochastic Modeling. Springer, Boston.
Kingman, J. F. C. (1962). The imbedding problem for finite Markov chains. Z. Wahrscheinlichkeitsth. 1, 14–24.
Lang, S. (1993). Algebra, 3rd edn. Addison-Wesley, Reading, MA.
Lindqvist, B. H. (1987). Monotone and associated Markov chains, with applications to reliability theory. J. Appl. Prob. 24, 679–695.
Luther, U. and Rost, K. (2004). Matrix exponentials and inversion of confluent Vandermonde matrices. Electron. Trans. Numer. Anal. 18, 91–100.
Marcus, M. and Minc, H. (1992). A Survey of Matrix Theory and Matrix Inequalities. Dover, New York.
Norris, J. R. (2005). Markov Chains. Cambridge University Press.
Sloane, N. J. A. (1964). The On-Line Encyclopedia of Integer Sequences. Available at https://oeis.org.
Steel, M. (2016). Phylogeny—Discrete and Random Processes in Evolution. Society for Industrial and Applied Mathematics, Philadelphia.
Sumner, J. (2017). Multiplicatively closed Markov models must form Lie algebras. ANZIAM J. 59, 240–246.
Sumner, J. G., Fernández-Sánchez, J. and Jarvis, P. D. (2012). Lie Markov models. J. Theoret. Biol. 298, 16–31.
Webster, R. (1994). Convexity. Oxford University Press.
Yang, Z. and Rannala, B. (2012). Molecular phylogenetics: principles and practice. Nature Rev. Genet. 13, 303–314.

Table 1. The 10 extremal elements $E^{}_{\ell_1 , \ell_2 , \ell_3}$ of $\mathcal{M}^{}_{3,\preccurlyeq}$ with some of their properties. Here, $\sigma(M)$ is the spectrum of M with multiplicities, while p and q are, up to an overall sign, the characteristic and the minimal polynomial of M. Note that $p\ne q$ precisely when M is an idempotent. The last column gives $M^2$ in terms of its index parameters.