
RANDOM FINITE SUBSETS WITH EXPONENTIAL DISTRIBUTIONS

Published online by Cambridge University Press:  15 December 2006

Kyle Siegrist
Affiliation:
Department of Mathematical Sciences, University of Alabama in Huntsville, Huntsville, AL, E-mail: siegrist@math.uah.edu

Abstract

Let S denote the collection of all finite subsets of ℕ₊, the set of positive integers. We define an operation on S that makes S into a positive semigroup with set inclusion as the associated partial order. Positive semigroups are the natural home for probability distributions with exponential properties, such as the memoryless and constant rate properties. We show that there are no exponential distributions on S, but that S can be partitioned into subsemigroups, each of which supports a one-parameter family of exponential distributions. We then find the distribution on S that is closest to exponential, in a certain sense. This work might have applications to the problem of selecting a finite sample from a countably infinite population in the most random way.

Type
Research Article
Copyright
© 2007 Cambridge University Press

1. INTRODUCTION

The motivation for this article is the problem of selecting a finite set of positive integers in “the most random way possible.” The phrase in quotes is ambiguous, of course, but we will at least obtain a partial solution by considering the collection of finite sets as a positive semigroup (S,·) whose associated partial order is set inclusion. Positive semigroups are the natural mathematical home for probability distributions with exponential-type properties, including the memoryless and constant failure rate properties, special properties related to uniform distributions, and maximum entropy properties. All of these relate, in some sense, to the degree of randomness of the distribution.

More specifically, our goal is to describe the semigroup (S,·) and study probability distributions on S that have exponential properties. We will show that there are no exponential distributions on S, but that S naturally partitions into a countable collection of subsemigroups, each of which supports a one-parameter family of exponential distributions. We will then construct a two-parameter family of distributions on S that are close to exponential in a strong sense. For basic information about probability distributions on semigroups, see Högnäs and Mukherjea [2]. For properties and characterizations of the standard exponential distribution, see Azlarov and Volodin [1]. For the general theory of random sets, see Matheron [3].

2. POSITIVE SEMIGROUPS

A semigroup (S, ·) consists of a set S and a binary operation · on S that is associative:

x(yz) = (xy)z,  x, y, z ∈ S.
A positive semigroup is a semigroup (S, ·) that has an identity e, satisfies the left cancellation law, and has no nontrivial inverses; that is, for all x, y, z ∈ S, the following hold:

  1. xe = ex = x.
  2. xy = xz implies y = z.
  3. xy = e implies x = y = e.

The relation ⪯ on S defined by x ⪯ y if and only if y = xt for some t ∈ S is a partial order on S; that is, x ⪯ x for each x ∈ S (the reflexive property), x ⪯ y and y ⪯ x imply x = y (the antisymmetric property), and x ⪯ y and y ⪯ z imply x ⪯ z (the transitive property). If x ⪯ y, then the element t ∈ S satisfying xt = y is unique and is denoted x⁻¹y. For each x ∈ S, the mapping t ↦ xt is an order isomorphism from S onto the set xS = {y ∈ S : x ⪯ y}; that is, s ⪯ t if and only if xs ⪯ xt for all s, t, x ∈ S. Indeed, the algebraic assumptions are precisely the ones needed for the partially ordered set (S, ⪯) to have this self-similarity property.

Positive semigroups are often found embedded in groups. Specifically, suppose that (G, ·, ⪯) is a left-ordered group; that is, (G, ·) is a group and ⪯ is a partial order on G satisfying x ⪯ y implies zx ⪯ zy for x, y, z ∈ G. Let e denote the identity element of G and let S = {x ∈ G : x ⪰ e} denote the set of positive elements of G. Then (S, ·) is a positive semigroup and ⪯ restricted to S is the associated partial order. Conversely, suppose that (G, ·) is a group and that S is a positive subsemigroup of G. For x, y ∈ G, define x ⪯ y if and only if xt = y for some t ∈ S. Then (G, ·, ⪯) is a left-ordered group with S as the set of positive elements. On the other hand, as this article hopefully illustrates, there are interesting positive semigroups that cannot be embedded in groups.

In general, topological and measure-theoretic assumptions are imposed on S as well, but in this article we will only be concerned with discrete semigroups (where S is countably infinite). In this case, the one additional assumption needed is that [e, x] = {t ∈ S : t ⪯ x} is finite for each x ∈ S, so that the partially ordered set (S, ⪯) is locally finite. Note that counting measure # is left-invariant for (S, ·); that is, #(xA) = #(A) for x ∈ S and A ⊆ S. Moreover, # is the unique left-invariant measure, up to multiplication by positive constants.

If T is a nonempty subset of S that contains e and is closed under the operation ·, then (T, ·) is also a positive semigroup, since the other algebraic assumptions are simply inherited. However, the partial order associated with T is not, in general, the restriction of ⪯ to T, but a sub-partial order of this restriction; that is, if x, y ∈ T, then x ⪯_T y implies x ⪯ y, but the converse is not true unless x⁻¹y ∈ T. If T is closed under · but does not contain e, then T ∪ {e} is also closed under · and, hence, is a positive subsemigroup of S.

3. EXPONENTIAL DISTRIBUTIONS

Suppose that (S, ·) is a (discrete) positive semigroup and that X is a random variable taking values in S (so that P(X = x) > 0 for each x ∈ S). The tail probability function of X is the mapping x ↦ P(X ⪰ x). In general, this function does not uniquely determine the distribution of X.

Random variable X has an exponential distribution if

P(X ∈ xA | X ⪰ x) = P(X ∈ A),  x ∈ S, A ⊆ S.    (1)

Equivalently, the conditional distribution of x⁻¹X given X ⪰ x is the same as the distribution of X for each x ∈ S. Random variable X has a memoryless distribution if (1) holds for all x ∈ S and all A of the form yS, where y ∈ S. Equivalently,

P(X ⪰ xy | X ⪰ x) = P(X ⪰ y),  x, y ∈ S,
so that the conditional tail probability function of x⁻¹X given X ⪰ x is the same as the tail probability function of X. In the language of reliability theory, X has constant failure rate (or simply constant rate) if the (discrete) probability density function f is proportional to the tail probability function F:

f(x) = α F(x),  x ∈ S,

for some positive constant α. In general,

∑_{x∈S} P(X ⪰ x) = E(#[e, X]),

so that if X has constant rate, then the rate constant must be

α = 1/E(#[e, X]).
There are a number of nice properties and characterizations of exponential distributions on positive semigroups, perhaps a bit surprising given the minimal algebraic assumptions. We mention a few of these; for more details, see Siegrist [5,6,7]. First, X has an exponential distribution on S if and only if X is memoryless and has constant rate. In general, however, a distribution can have one of these properties (memoryless or constant rate), but not the other. Second, F : S → (0,1] is the tail probability function of an exponential distribution on S if and only if F(xy) = F(x)F(y) for all x, y ∈ S and

∑_{x∈S} F(x) < ∞

(the reciprocal of this sum is then the rate constant). This characterization prescribes a method for finding all exponential distributions on a given positive semigroup. Finally, suppose that X and Y are independent and identically distributed (i.i.d.) on S. Then the common distribution is exponential if and only if the conditional distribution of X given XY = z is uniform on [e, z] for each z ∈ S. Furthermore, if X1, X2, … are i.i.d. exponential and Yn = X1 ··· Xn is the corresponding “gamma” variable of order n ∈ ℕ₊, then the conditional distribution of (Y1, …, Yn) given Yn+1 = z is uniform on the set

{(y1, …, yn) : e ⪯ y1 ⪯ y2 ⪯ ··· ⪯ yn ⪯ z}.
For general positive semigroups, counting measure # is replaced by a left-invariant measure λ; sums become integrals with respect to λ; and density functions and uniform distributions are with respect to λ as well. At least in terms of the algebraic structure, exponential distributions specify the most random way to select elements of S. Of course, the motivating example for this theory is the positive semigroup ([0,∞),+). The associated partial order is the ordinary order and Lebesgue measure is the invariant measure. The exponential distributions for this positive semigroup are the ordinary exponential distributions. Rowell and Siegrist [4] explore positive semigroups isomorphic to the standard one, in the context of reliability theory. We briefly mention a few discrete examples.

Example 1: Let ℕ = {0, 1, 2, …} denote the set of nonnegative integers. Then (ℕ, +) is a positive semigroup, and the associated partial order is the ordinary order ≤. The exponential distribution with rate parameter p ∈ (0,1) has tail probability function and density function given by

F(x) = (1 − p)^x,  f(x) = p(1 − p)^x,  x ∈ ℕ.
Of course, this is the standard geometric distribution with success parameter p.
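This can be checked numerically; the sketch below uses an arbitrary illustrative value p = 0.4:

```python
from math import isclose

# Exponential (= geometric) distribution on (N, +): with rate parameter p,
# tail F(x) = (1 - p)^x and density f(x) = p * (1 - p)^x.  p is illustrative.
p = 0.4
F = lambda x: (1 - p) ** x
f = lambda x: p * (1 - p) ** x

for x in range(10):
    for y in range(10):
        assert isclose(F(x + y), F(x) * F(y))  # memoryless: F(x + y) = F(x)F(y)
    assert isclose(f(x), p * F(x))             # constant rate p
```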

Example 2: Let ℕ₊ = {1, 2, …} denote the set of positive integers. Then (ℕ₊, ·) is a positive semigroup, where · is ordinary multiplication. The associated partial order is the division order: x ⪯ y if and only if x divides y. The exponential distributions turn out to be precisely the Dirichlet distributions with completely multiplicative coefficient functions; the zeta distribution is an important special case (see Siegrist [7]).

Example 3: Let A be a finite alphabet and let S = A* denote the space of all finite words with letters from A. The free semigroup (S, ·) is a positive semigroup where · is the concatenation operation. The identity is the “empty word” e, and x ⪯ y if and only if x is a prefix of y. An exponentially distributed variable has the form

X = X1 X2 ··· XL,

where the random letters X1, X2, … are i.i.d. on the alphabet A and where the length L is independent of (X1, X2, …) and has a geometric distribution on ℕ.
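A small sketch makes the tail function of such a word variable concrete; the letter distribution q and the parameter β (with P(L ≥ n) = β^n) are illustrative assumptions, not values from the text:

```python
from math import isclose

# Tail probability of an exponential word variable on A*: with i.i.d. letters
# drawn from q and an independent geometric length L with P(L >= n) = beta^n,
# the tail is F(w) = beta^len(w) * prod(q[letter]).  beta and q are illustrative.
beta = 0.7
q = {"a": 0.2, "b": 0.5, "c": 0.3}

def F(w):
    out = beta ** len(w)
    for ch in w:
        out *= q[ch]
    return out

# Multiplicative under concatenation, as an exponential tail must be.
for u in ["", "a", "ab", "cab"]:
    for v in ["", "b", "ca"]:
        assert isclose(F(u + v), F(u) * F(v))
```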

4. THE POSITIVE SEMIGROUP OF FINITE SUBSETS

Let S denote the set of all finite subsets of ℕ₊. It is easy to see that the partially ordered set (S, ⊆) satisfies the self-similarity property described in Section 2. Our goal in this section is to define and study the corresponding positive semigroup.

We identify a nonempty subset x of ℕ₊ with the function given by

x(i) = the ith smallest element of x,

with domain {1, 2, …, #(x)} if x is finite or with domain ℕ₊ if x is infinite. If x is nonempty and finite, max(x) denotes the maximum value of x; by convention, we take max(Ø) = 0 and max(x) = ∞ if x is infinite. Thus, #(x) ≤ max(x) for every x. If x and y are nonempty subsets of ℕ₊ with max(y) ≤ #(x), we let x ∘ y denote the subset whose function is the ordinary composition of x and y: (x ∘ y)(i) = x(y(i)). We also define x ∘ Ø = Ø for any x ⊆ ℕ₊. Note that x ∘ y is always defined when x is infinite.

We now define a binary operation · on S by

x · y = x ∪ (xᶜ ∘ y),  x, y ∈ S,

where xᶜ = ℕ₊ ∖ x denotes the complement of x. Note that the operation is well defined since xᶜ is infinite. The operation might seem contrived, but it is not. Up to a relabeling of the positive integers, there is only one way to associate a positive semigroup with the subset partial order.
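The operation is easy to transcribe into code. The following sketch (the function name product and the frozenset representation are mine) lists the complement of x just far enough to index every element of y:

```python
def product(x, y):
    """The product x·y = x ∪ (x^c ∘ y) on finite subsets of {1, 2, ...}."""
    x, y = set(x), set(y)
    if not y:
        return frozenset(x)                    # x · Ø = x
    # List the complement of x far enough to index every element of y.
    xc = [i for i in range(1, max(y) + len(x) + 1) if i not in x]
    return frozenset(x | {xc[j - 1] for j in y})

# Associative, but not commutative, and right cancellation fails:
a, b, c = {1, 3}, {2}, {1, 2, 4}
assert product(product(a, b), c) == product(a, product(b, c))
assert product({1}, {1}) == product({2}, {1}) == frozenset({1, 2})
```

The last assertion is the identity {i}{i} = {i + 1}{i} (with i = 1) that is used repeatedly below.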

Theorem 1: (S,·) is a positive semigroup with the subset partial order.

Proof: The associative rule holds, and in fact

x · y · z = x ∪ (xᶜ ∘ y) ∪ (xᶜ ∘ yᶜ ∘ z),  x, y, z ∈ S.

The empty set is the identity:

Ø · x = ℕ₊ ∘ x = x,  x · Ø = x ∪ (xᶜ ∘ Ø) = x.

The left-cancellation law holds: Suppose that xy = xz. Then x ∪ (xᶜ ∘ y) = x ∪ (xᶜ ∘ z) by definition and, hence, xᶜ ∘ y = xᶜ ∘ z since the pairs of sets in each union are disjoint. However, then y = z. There are no nontrivial inverses: If xy = Ø, then x ∪ (xᶜ ∘ y) = Ø. Hence, we must have x = Ø and, therefore, also y = Ø.

Finally, the associated partial order is the subset order. Suppose first that xu = y. Then x ∪ (xᶜ ∘ u) = y, so x ⊆ y. Conversely, suppose that x ⊆ y. Let u be the set of ranks that the elements of y ∖ x occupy in xᶜ; that is, u = {i ∈ ℕ₊ : xᶜ(i) ∈ y ∖ x}. Then xᶜ ∘ u = y ∖ x, so x ∪ (xᶜ ∘ u) = y and, hence, xu = y. █

Note that the irreducible elements of (S, ·) are the singletons {i}, where i ∈ ℕ₊. Note also that

{i} · {i + 1} = {i, i + 2},  {i + 1} · {i} = {i, i + 1},  {i} · {i} = {i + 1} · {i} = {i, i + 1}.

Thus, the binary operation is not commutative and the right-cancellation law does not hold, so (S, ·) satisfies just the minimal algebraic assumptions of a positive semigroup. In particular, S cannot be embedded in a group. Finally, if i1 < i2 < ··· < in, then

{i1, i2, …, in} = {i1} · {i2 − 1} · {i3 − 2} ··· {in − (n − 1)}.
Proposition 1: For x, y ∈ S,

#(x · y) = #(x) + #(y),    (2)

max(x · y) = max(x) if max(y) ≤ max(x) − #(x);  max(x · y) = max(y) + #(x) if max(y) > max(x) − #(x).    (3)
Proof: #(x · y) = #[x ∪ (xᶜ ∘ y)] = #(x) + #(xᶜ ∘ y), since x and xᶜ ∘ y are disjoint. However, clearly, #(xᶜ ∘ y) = #(y), which gives (2).

Equation (3) is trivial if x or y is the identity (Ø), so we will assume that x and y are nonempty. Note that, by definition,

max(xᶜ ∘ y) = xᶜ(max(y)),

since the function xᶜ is strictly increasing. Let i = #(x) and n = max(x). Then n ∈ x and the remaining i − 1 elements of x are in {1, 2, …, n − 1}. Hence, xᶜ contains n − i elements of {1, 2, …, n − 1}, together with all of the elements of {n + 1, n + 2, …}. If max(y) ≤ n − i, then max(xᶜ ∘ y) = xᶜ(max(y)) ≤ n − 1, so max(x · y) = n = max(x). If max(y) > n − i, then max(x · y) = max(xᶜ ∘ y) = xᶜ(max(y)) = n + [max(y) − (n − i)] = max(y) + i. █
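Both parts of Proposition 1 can be verified mechanically on random pairs of finite sets; product is the same illustrative implementation of the operation · sketched earlier:

```python
import random

def product(x, y):
    """x·y = x ∪ (x^c ∘ y) on finite subsets of the positive integers."""
    x, y = set(x), set(y)
    if not y:
        return frozenset(x)
    xc = [i for i in range(1, max(y) + len(x) + 1) if i not in x]
    return frozenset(x | {xc[j - 1] for j in y})

rng = random.Random(1)
for _ in range(500):
    x = set(rng.sample(range(1, 20), rng.randint(1, 6)))
    y = set(rng.sample(range(1, 20), rng.randint(1, 6)))
    z = product(x, y)
    assert len(z) == len(x) + len(y)            # #(x·y) = #(x) + #(y)
    if max(y) <= max(x) - len(x):               # Eq. (3), first case
        assert max(z) == max(x)
    else:                                       # Eq. (3), second case
        assert max(z) == max(y) + len(x)
```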

The semigroup (S, ·) has an interesting structure, but to see it we need some additional notation. For k ∈ ℕ, let

Sk = {x ∈ S : max(x) − #(x) = k}

and let Tk = {Ø} ∪ Sk. For n, k ∈ ℕ, let

Sn,k = {x ∈ S : #(x) = n, max(x) = n + k}.

Of course, S0,0 = {Ø}. If n ∈ ℕ₊ and k ∈ ℕ, then

#(Sn,k) = C(n + k − 1, n − 1),    (4)

since x ∈ Sn,k must contain the element n + k and n − 1 elements from {1, 2, …, n + k − 1}; here C(a, b) denotes the binomial coefficient. If we interpret the binomial coefficient as 1 when n = k = 0, then (4) is valid for n = k = 0 also.
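A brute-force enumeration confirms the count in (4) for small n and k:

```python
from itertools import combinations
from math import comb

def S_nk(n, k):
    """All n-element subsets of {1, ..., n + k} whose maximum is n + k."""
    return [frozenset(c) for c in combinations(range(1, n + k + 1), n) if max(c) == n + k]

for n in range(1, 6):
    for k in range(5):
        block = S_nk(n, k)
        assert len(block) == comb(n + k - 1, n - 1)       # the count in (4)
        assert all(max(x) - len(x) == k for x in block)   # each x lies in S_k
```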

Theorem 2: Sk is a subsemigroup of S, and hence Tk is a positive subsemigroup of S, for each k ∈ ℕ. The associated partial order on Tk is the subset partial order.

Proof: We first show that xy ∈ Sk for x, y ∈ Sk. The result is trivial if x = Ø or y = Ø (which can only happen when k = 0). Thus, we will assume that x and y are nonempty. Then max(y) > max(x) − #(x), since the left-hand side is k + #(y) and the right-hand side is k. By Proposition 1, max(xy) = max(y) + #(x). Hence,

max(xy) − #(xy) = max(y) + #(x) − [#(x) + #(y)] = max(y) − #(y) = k.

Therefore, xy ∈ Sk. To prove the last statement in the theorem, it suffices to show that if x, y ∈ Sk and x ⊆ y, then x⁻¹y ∈ Sk. Thus, suppose that x ∈ Sm,k, y ∈ Sn,k, where m < n, and xu = y for some u ∈ S (so that x ⊆ y). Then max(y) = n + k > m + k = max(x), so by another application of Proposition 1, max(y) = max(u) + m and, hence, max(u) = n − m + k. However, #(u) = n − m and, hence, u ∈ Sk. █
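The closure property in Theorem 2 can be spot-checked numerically; the helper name defect for the quantity max(x) − #(x) is mine:

```python
import random

def product(x, y):
    """x·y = x ∪ (x^c ∘ y) on finite subsets of the positive integers."""
    x, y = set(x), set(y)
    if not y:
        return frozenset(x)
    xc = [i for i in range(1, max(y) + len(x) + 1) if i not in x]
    return frozenset(x | {xc[j - 1] for j in y})

def defect(x):
    """The index k with x ∈ S_k, namely max(x) − #(x)."""
    return max(x) - len(x)

rng = random.Random(7)
for _ in range(300):
    k = rng.randint(0, 4)
    def draw():
        # An element of S_k: n − 1 elements below n + k, plus n + k itself.
        n = rng.randint(1, 5)
        rest = rng.sample(range(1, n + k), n - 1) if n > 1 else []
        return frozenset(rest) | {n + k}
    x, y = draw(), draw()
    assert defect(x) == defect(y) == k
    assert defect(product(x, y)) == k     # S_k is closed under the product
```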

Note that

S0 = {{1, 2, …, n} : n ∈ ℕ},

where {1, 2, …, 0} = Ø by convention. If y ∈ S, then

{1, 2, …, m} · y = {1, 2, …, m} ∪ {m + j : j ∈ y}.

In particular, {1, 2, …, m} · {1, 2, …, n} = {1, 2, …, m + n}, so (S0, ·) is isomorphic to (ℕ, +), the positive semigroup in Example 1, and x ↦ #(x) is an isomorphism. Finally, note that Ø ∈ S0, so T0 = S0. To characterize the exponential distributions on Tk, we must first characterize the minimal elements of Sk (which are the irreducible elements of Tk).

Proposition 2: The set of irreducible elements of Tk (equivalently, the set of minimal elements of Sk ∖ {Ø}) is

Mk = {x ∈ Sk : x ≠ Ø and x(i) ≤ k for 1 ≤ i < #(x)}.

There are 2^k such minimal elements.

Proof: First, we show that if x ∈ Sk is not a minimal element of Sk, then x ∉ Mk. Thus, suppose that x = uv, where u, v ∈ Sk are nonempty. Then max(u) > k and max(u) ∈ u ⊆ uv = x. Moreover, max(u) < max(x), so the rank of max(u) in x is less than #(x) = #(u) + #(v). Therefore, x ∉ Mk.

Next, we show that if x ∈ Sk ∖ Mk is nonempty, then x is not a minimal element of Sk. Thus, suppose that x ∈ Sk and x(i) > k for some i < #(x). Construct u ∈ S as follows: x(i) ∈ u and u contains x(i) − k − 1 elements of x that are smaller than x(i). This can be done since x(i) − i ≤ k and, hence, x(i) − k − 1 ≤ i − 1, and by definition, x contains i − 1 elements smaller than x(i). Now note that max(u) − #(u) = x(i) − [x(i) − k] = k, so u ∈ Sk. By construction, u ⊆ x, so there exists v ∈ S such that uv = x. By Theorem 2, v ∈ Sk and, hence, x is not a minimal element of Sk.

Next, note that if x ∈ Sk and #(x) ≥ k + 2, then x ∉ Mk, since one of the k + 1 elements of x of rank less than #(x) must be at least k + 1. For n ≤ k + 1, the number of elements x ∈ Mk with #(x) = n is C(k, n − 1), since x must contain n + k and n − 1 elements in {1, 2, …, k}. Hence,

#(Mk) = ∑_{n=1}^{k+1} C(k, n − 1) = 2^k. █
For example, the minimal elements of S2 are {3}, {1,4}, {2,4}, and {1,2,5}.
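The description of Mk translates directly into an enumeration, which reproduces the count 2^k and the example above for k = 2:

```python
from itertools import combinations

def minimal_elements(k):
    """M_k: x with max(x) = #(x) + k whose other elements all lie in {1, ..., k}."""
    out = []
    for n in range(1, k + 2):        # #(x) >= k + 2 is impossible for x in M_k
        for rest in combinations(range(1, k + 1), n - 1):
            out.append(frozenset(rest) | {n + k})
    return out

M2 = minimal_elements(2)
assert sorted(sorted(x) for x in M2) == [[1, 2, 5], [1, 4], [2, 4], [3]]
assert all(len(minimal_elements(k)) == 2 ** k for k in range(8))
```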

5. EXPONENTIAL DISTRIBUTIONS ON Tk

Theorem 3: There are no memoryless distributions on S and, hence, no exponential distributions.

Proof: Suppose that X is a random variable taking values in S and that X has a memoryless distribution, with tail probability function F. By the memoryless property,

F(xy) = F(x)F(y),  x, y ∈ S.

However, {i}{i} = {i + 1}{i}, as noted earlier, so we must have

F({i}) = F({i + 1})

for every i ∈ ℕ₊. Next, note that if i1 < i2 < ··· < in, then by another application of the memoryless property (using the singleton factorization noted earlier),

F({i1, i2, …, in}) = F({i1}) F({i2 − 1}) ··· F({in − (n − 1)}) = [F({1})]^n.

Since F(x) = P(x ⊆ X), it therefore follows that the events

{i ∈ X},  i ∈ ℕ₊,

are independent, with common probability F({1}) > 0. Hence, by the second Borel–Cantelli lemma, infinitely many of the events must occur with probability 1, so X is infinite, a contradiction. █

Although there are no exponential distributions on S, the subsemigroup Tk has a one-parameter family of exponential distributions for each k ∈ ℕ.

Theorem 4: A random variable X taking values in Tk has an exponential distribution if and only if the tail probability function F and density function f have the following form, for some α ∈ (0,1):

F(x) = α^{#(x)},  x ∈ Tk,    (5)

f(x) = [(1 − α)^{k+1}/((1 − α)^{k+1} + α)] α^{#(x)},  x ∈ Tk.    (6)
Proof: The function F(x) = α^{#(x)} takes values in (0,1] and satisfies F(xy) = F(x)F(y) for all x, y ∈ Tk. Moreover, by (4),

∑_{x∈Tk} F(x) = 1 + ∑_{n=1}^∞ C(n + k − 1, n − 1) α^n = 1 + α/(1 − α)^{k+1} < ∞.

It follows that F and f as given earlier are the tail probability function and density function, respectively, of an exponential distribution.

Conversely, suppose that F is the tail probability function of a memoryless distribution on Tk. As noted earlier, T0 is isomorphic to (ℕ, +), with # an isomorphism. Thus, if k = 0, F must have the form F(x) = α^{#(x)}, where α = F({1}) ∈ (0,1). For general k, we will show by induction on #(x) that F(x) = α^{#(x)}, where α = F({k + 1}) ∈ (0,1). The result is trivially true if #(x) = 0, since x = Ø. The result is also trivially true if #(x) = 1, since the only such x ∈ Tk is x = {k + 1}. Suppose now that F(x) = α^{#(x)} for all x ∈ Tk with #(x) ≤ n. Let x ∈ Tk with #(x) = n + 1. If x is not irreducible, then x = uv, where u, v ∈ Tk, #(u) ≤ n, #(v) ≤ n, and #(u) + #(v) = #(x). In this case,

F(x) = F(u)F(v) = α^{#(u)} α^{#(v)} = α^{#(x)}.

On the other hand, if x is irreducible, let j = min{i ∈ x : i + 1 ∉ x}. Note that j < max(x), since every element of x other than max(x) = #(x) + k is at most k. Now let y ∈ Tk be obtained from x by replacing j with j + 1. Note that #(y) = #(x) and, moreover, yᶜ can be obtained from xᶜ by replacing j + 1 with j. We claim that xx = yx; that is, x ∪ (xᶜ ∘ x) = y ∪ (yᶜ ∘ x). To see this, note first that if i ≠ j and i ≠ j + 1, then i ∈ x if and only if i ∈ y, and i ∈ xᶜ if and only if i ∈ yᶜ. On the other hand, j ∈ x and j ∈ yᶜ ∘ x, since j = yᶜ(x(1)) (by definition, there are x(1) − 1 elements less than x(1) in yᶜ; the next element of yᶜ is j). Similarly, j + 1 ∈ y and j + 1 ∈ xᶜ ∘ x, since j + 1 = xᶜ(x(1)). Since xx = yx, it follows from the memoryless property that F(x)² = F(xx) = F(yx) = F(y)F(x) and, hence, F(x) = F(y). Continuing this process, we find that F(x) = F(y) for some y ∈ Sk that is not minimal, but with #(y) = #(x). It then follows that F(x) = F(y) = α^{#(y)} = α^{#(x)}, and the proof is complete. █

To illustrate the last part of the proof, suppose that x = {3,4,5,8,15} ∈ T10. Then j = 5, y = {3,4,6,8,15}, and xx = yx = {3,4,5,6,7,8,9,12,15,20}.
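This identity can be checked mechanically with the same illustrative implementation of the operation used earlier:

```python
def product(x, y):
    """The operation x·y = x ∪ (x^c ∘ y) on finite subsets of the positive integers."""
    x, y = set(x), set(y)
    if not y:
        return frozenset(x)
    xc = [i for i in range(1, max(y) + len(x) + 1) if i not in x]
    return frozenset(x | {xc[j - 1] for j in y})

x = frozenset({3, 4, 5, 8, 15})   # an element of T_10: max(x) - #(x) = 10
y = frozenset({3, 4, 6, 8, 15})   # x with j = 5 replaced by j + 1 = 6
assert product(x, x) == product(y, x) == frozenset({3, 4, 5, 6, 7, 8, 9, 12, 15, 20})
```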

Suppose that X has the exponential distribution on Tk given in Theorem 4. From the general theory in Section 3, the expected number of subsets of X in Tk is the reciprocal of the rate parameter in the density function. Thus,

E(#{t ∈ Tk : t ⊆ X}) = 1 + α/(1 − α)^{k+1}.

If k = 0 (recall that S0 = T0), note that the density in (6) reduces to

f(x) = (1 − α) α^{#(x)},  x ∈ T0.    (7)

On the other hand, suppose that k ∈ ℕ₊. Then

P(X ∈ Sk) = 1 − f(Ø) = α/[(1 − α)^{k+1} + α].

Thus, the conditional distribution of X given X ∈ Sk has density function

x ↦ (1 − α)^{k+1} α^{#(x)−1},  x ∈ Sk.    (8)
The density function of X depends on x ∈ Tk only through #(x). The following corollary gives the distribution of #(X).

Corollary 1: Suppose that X has the exponential distribution in Theorem 4 and let U = #(X). Then

P(U = n) = [(1 − α)^{k+1}/((1 − α)^{k+1} + α)] C(n + k − 1, n − 1) α^n,  n ∈ ℕ,    (9)

E(U) = α(1 + kα)/[(1 − α)((1 − α)^{k+1} + α)],    (10)

where we interpret the binomial coefficient as 1 when n = 0.

When k = 0, (9) gives P(U = n) = (1 − α)α^n for n ∈ ℕ, so U has a geometric distribution on ℕ. In general, U has a modified negative binomial distribution. It is easy to see from (10) that for each k ∈ ℕ, E(U) is a strictly increasing function of α and maps (0,1) onto (0,∞). Thus, the exponential distribution on Tk can be reparameterized by expected cardinality. Moreover, the exponential distribution maximizes entropy with respect to this parameter:

Corollary 2: The exponential distribution in Theorem 4 maximizes entropy over all distributions on Tk with expected value given by (10).

Proof: We use the usual inequality for entropy: If f and g are probability density functions of random variables X and Y, respectively, taking values in Tk, then

−∑_{x∈Tk} g(x) ln g(x) ≤ −∑_{x∈Tk} g(x) ln f(x).    (11)

If X has the exponential distribution in Theorem 4 and if E(#(Y)) = E(#(X)), then substituting into the right-hand side of (11), we see that the entropy of Y is bounded above by

−ln(ck) − μk ln(α),

where ck is the rate parameter of the exponential density in (6) and μk is the mean cardinality in (10). Of course, the entropy of X achieves this upper bound. █
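As a numeric sanity check on Corollary 1, the following sketch (with an illustrative α = 0.3 and series truncation at 400 terms) confirms that the probabilities in (9) sum to 1 and that the mean matches (10):

```python
from math import comb, isclose

# Exponential distribution on T_k: tail F(x) = alpha^#(x), rate
# c_k = (1 - alpha)^(k+1) / ((1 - alpha)^(k+1) + alpha).  alpha is illustrative.
alpha, N = 0.3, 400          # N truncates the series; ample for alpha = 0.3
for k in range(4):
    c = (1 - alpha) ** (k + 1) / ((1 - alpha) ** (k + 1) + alpha)
    # P(U = n), with the n = 0 binomial coefficient read as 1
    pmf = [c] + [c * comb(n + k - 1, n - 1) * alpha ** n for n in range(1, N)]
    assert isclose(sum(pmf), 1.0, abs_tol=1e-9)
    mean = sum(n * q for n, q in enumerate(pmf))
    target = alpha * (1 + k * alpha) / ((1 - alpha) * ((1 - alpha) ** (k + 1) + alpha))
    assert isclose(mean, target, abs_tol=1e-9)
```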

6. ALMOST EXPONENTIAL DISTRIBUTIONS ON S

There are no exponential distributions on S. However, we can define distributions that are “close” to exponential by forming mixtures of the distributions in (7) and (8). Thus, suppose that X takes values in S with probability mass function

f(x) = β0 (1 − α0) α0^{#(x)} for x ∈ S0,  f(x) = βk (1 − αk)^{k+1} αk^{#(x)−1} for x ∈ Sk, k ∈ ℕ₊,    (12)

where αk, βk ∈ (0,1) for each k ∈ ℕ and ∑_{k=0}^∞ βk = 1. Thus, the conditional distribution of X given X ∈ Sk is the same as the corresponding conditional distribution of an exponential variable on Tk (with parameter αk). Note that the conditional distribution of X on Tk itself is not exponential. In fact, we cannot construct a distribution on S by requiring that the conditional distributions on Tk be exponential for each k, essentially because these semigroups share Ø and, thus, are not disjoint. The distribution of X is as close to exponential as possible, in the sense that X is essentially exponential on each of the subsemigroups Sk, and these semigroups partition S.

There is not much that we can say about the general distribution in (12). In the remainder of this section we will study a special case with particularly nice properties. For our first construction, let N have a geometric distribution on ℕ with rate parameter 1 − r ∈ (0,1), as in Example 1. Next, given N = n, the random variable X is distributed on the subsets of {1, 2, …, n} so that i ∈ X, independently, with probability p for each i ∈ {1, 2, …, n}. Of course, if N = 0, then X = Ø.

Theorem 5: For x ∈ S,

P(X = x) = [(1 − r)/(1 − r(1 − p))] p^{#(x)} (1 − p)^{max(x)−#(x)} r^{max(x)},    (13)

P(X ⊇ x) = p^{#(x)} r^{max(x)}.    (14)
Proof: For x ∈ S, conditioning on N gives

P(X = x) = ∑_{n=0}^∞ P(N = n) P(X = x | N = n).

If n < max(x), then x is not a subset of {1, 2, …, n}, so P(X = x | N = n) = 0. If n ≥ max(x), then x is a subset of {1, 2, …, n} and, by assumption, P(X = x | N = n) = p^{#(x)} (1 − p)^{n−#(x)}. Substituting gives

P(X = x) = ∑_{n=max(x)}^∞ (1 − r) r^n p^{#(x)} (1 − p)^{n−#(x)},

which simplifies to (13). By a similar argument,

P(X ⊇ x) = ∑_{n=max(x)}^∞ (1 − r) r^n p^{#(x)},

which simplifies to (14). █
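The simplification in the proof can be confirmed numerically by comparing the closed form with a truncated version of the conditioning sum; r and p below are arbitrary illustrative values:

```python
from math import isclose

r, p = 0.6, 0.35

def pmf(x):
    """Closed form for P(X = x) from the proof above."""
    n, m = len(x), max(x, default=0)
    return (1 - r) / (1 - r * (1 - p)) * p ** n * (1 - p) ** (m - n) * r ** m

def pmf_direct(x, N=2000):
    """Condition on the geometric variable N: sum over n >= max(x)."""
    n, m = len(x), max(x, default=0)
    return sum((1 - r) * r ** j * p ** n * (1 - p) ** (j - n) for j in range(m, N))

for x in [set(), {1}, {2, 5}, {1, 2, 3}, {4, 7, 9}]:
    assert isclose(pmf(x), pmf_direct(x), rel_tol=1e-9)
```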

The distribution of X depends on x ∈ S only through #(x) and max(x) − #(x). As before, let U = #(X) and now let V = max(X) − #(X).

Corollary 3: For n ∈ ℕ₊ and k ∈ ℕ,

P(U = n, V = k) = C(n + k − 1, n − 1) [(1 − r)/(1 − r(1 − p))] (pr)^n [r(1 − p)]^k,    (15)

and P(U = 0, V = 0) = (1 − r)/(1 − r(1 − p)).
Corollary 4: For n ∈ ℕ₊ and k ∈ ℕ, the conditional distribution of X given U = n, V = k is uniform on Sn,k.

Corollary 5: For n ∈ ℕ₊, the conditional distribution of V given U = n is negative binomial with parameters n and r(1 − p) (when n = 0, the conditional distribution of V is point mass at 0).

Corollary 6: The distribution of U is geometric with parameter (1 − r)/(1 − r + rp).

Of course, Corollaries 4–6 determine the distribution of X and give an alternate way of constructing the distribution in the first place: We first give U a geometric distribution with a parameter a ∈ (0,1); given U = n, we give V a negative binomial distribution with parameters n and b ∈ (0,1); and, finally, given U = n and V = k, we give X the uniform distribution on Sn,k. The two constructions are equivalent, since there is a one-to-one correspondence between the pairs of parameters (r,p) and (a,b).
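Corollary 6 can be spot-checked by summing the joint probability mass function of (U, V) (Corollary 3) over k; again r and p are illustrative:

```python
from math import comb, isclose

r, p = 0.5, 0.3
C0 = (1 - r) / (1 - r * (1 - p))

def P_UV(n, k):
    """Joint pmf of (U, V): C(n+k-1, n-1) * C0 * (pr)^n * (r(1-p))^k for n >= 1."""
    if n == 0:
        return C0 if k == 0 else 0.0
    return comb(n + k - 1, n - 1) * C0 * (p * r) ** n * (r * (1 - p)) ** k

a = (1 - r) / (1 - r + r * p)          # success parameter of U (Corollary 6)
for n in range(6):
    P_U = sum(P_UV(n, k) for k in range(400))
    assert isclose(P_U, a * (1 - a) ** n, abs_tol=1e-9)
```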

Our next goal is to study the distribution of the random subset X on the subsemigroups Sk. First, note that for x ∈ Sk,

P(X ⊇ x, X ∈ Sk) = ∑_{u∈Tk} P(X = xu) = P(X = x) ∑_{u∈Tk} (rp)^{#(u)} = P(X = x) [1 + rp/(1 − rp)^{k+1}].

Thus, for k ∈ ℕ, X has constant rate

(1 − rp)^{k+1}/[(1 − rp)^{k+1} + rp]

on the subsemigroup Sk. In particular, for x ∈ S0,

P(X ⊇ x, X ∈ S0) = [1/(1 − rp)] P(X = x) = [(1 − r)/((1 − r(1 − p))(1 − rp))] (rp)^{#(x)}.

Hence, X has the memoryless property on S0 (in addition to the constant rate property). To find the conditional distribution of X given X ∈ Sk, we first need P(X ∈ Sk) or, equivalently, the probability density function of V.

Corollary 7: V has a modified geometric distribution:

P(V = 0) = (1 − r)/[(1 − r(1 − p))(1 − rp)],

P(V = k) = [(1 − r)rp/(1 − r(1 − p))] · [r(1 − p)]^k/(1 − rp)^{k+1},  k ∈ ℕ₊.    (16)
Corollary 8: The conditional distributions of X on Sk are as follows:

P(X = x | X ∈ S0) = (1 − rp)(rp)^{#(x)},  x ∈ S0,

P(X = x | X ∈ Sk) = (1 − rp)^{k+1} (rp)^{#(x)−1},  x ∈ Sk, k ∈ ℕ₊.
Thus, X has an almost exponential distribution in the sense of (12), with αk = rp for each k ∈ ℕ and with the mixing probabilities βk = P(V = k) given in Corollary 7.

From Theorem 3, no exponential distribution on S exists because the events {i ∈ X}, i ∈ ℕ₊, would have to be independent with a common probability. The next corollary explores these events for the random variable in Theorem 5.

Corollary 9: Suppose that X has the distribution in Theorem 5.

1. P(i ∈ X) = p r^i for i ∈ ℕ₊.

2. If i1 < i2 < ··· < in, then

P(i1 ∈ X, i2 ∈ X, …, in ∈ X) = p^n r^{in}.

3. For j ∈ ℕ₊, the events {1 ∈ X}, {2 ∈ X}, …, {j − 1 ∈ X} are conditionally independent given {j ∈ X}, with P(i ∈ X | j ∈ X) = p for i < j.

Property 3 in Corollary 9 is clearly a result of the original construction of X. Property 2 is reminiscent of a Markov property. This property implies that the events {i ∈ X}, i ∈ ℕ₊, are positively correlated, but asymptotically uncorrelated. In fact, the correlation decays exponentially, since for i < j,

cor[1(i ∈ X), 1(j ∈ X)] = p r^{(j−i)/2} (1 − r^i)/√[(1 − p r^i)(1 − p r^j)].
From Corollaries 5 and 6, we can compute the expected values of U = #(X) and W = max(X) = U + V:

E(U) = rp/(1 − r),    (17)

E(W) = rp/[(1 − r)(1 − r(1 − p))].    (18)

It is easy to see from (17) and (18) that (E(U), E(W)), as a function of (r, p), maps (0,1)² one-to-one and onto {(c, d) : 0 < c < d < ∞}. Thus, the distribution of X can be reparameterized by expected cardinality and expected maximum. Moreover, the distribution of X maximizes entropy with respect to these parameters. The proof of the following corollary is essentially the same as the proof of Corollary 2.

Corollary 10: The distribution in Theorem 5 maximizes entropy among all distributions on S with expected cardinality given by (17) and expected maximum given by (18).

Of fundamental importance in the general theory of random sets [3] is the hitting probability function G:

G(x) = P(X ∩ x ≠ Ø),  x ⊆ ℕ₊.

This function completely determines the distribution of a random set, and in the general setting (which lacks the algebraic structure that we have here), it plays the role of a “distribution function.” Note that G is defined for all subsets of the positive integers, not just finite subsets.

Theorem 6: Suppose that X has the almost exponential distribution with parameters p and r given in Theorem 5. Then

G(x) = p ∑_{i≥1} (1 − p)^{i−1} r^{x(i)},

where the sum extends over the domain of x and, as usual, x(i) is the ith smallest element of x.

Proof: Suppose first that x is finite (so that x ∈ S). From the standard inclusion–exclusion formula (or from [3]),

G(x) = P(X ∩ x ≠ Ø) = ∑_{Ø ≠ y ⊆ x} (−1)^{#(y)+1} P(X ⊇ y).

Hence, substituting the tail probability (14), we have

G(x) = ∑_{Ø ≠ y ⊆ x} (−1)^{#(y)+1} p^{#(y)} r^{max(y)} = ∑_{i=1}^{#(x)} r^{x(i)} p ∑_{j=0}^{i−1} C(i − 1, j)(−p)^j = p ∑_{i=1}^{#(x)} (1 − p)^{i−1} r^{x(i)},

where in the second step the subsets y are grouped according to max(y) = x(i). For infinite x, the formula holds by passing to the limit and using the continuity of probability. █
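The formula for G can be checked against the inclusion–exclusion computation directly; r and p below are illustrative values:

```python
from itertools import combinations
from math import isclose

r, p = 0.55, 0.4

def G_formula(x):
    """G(x) = p * sum_i (1-p)^(i-1) * r^x(i), x(i) the ith smallest element of x."""
    return p * sum((1 - p) ** i * r ** xi for i, xi in enumerate(sorted(x)))

def G_inclusion_exclusion(x):
    """Inclusion-exclusion over nonempty subsets y of x, using the tail
    P(X ⊇ y) = p^#y * r^max(y) from Theorem 5."""
    x = sorted(x)
    total = 0.0
    for m in range(1, len(x) + 1):
        for y in combinations(x, m):
            total += (-1) ** (m + 1) * p ** m * r ** max(y)
    return total

for x in [{1}, {2, 5}, {1, 3, 4, 9}]:
    assert isclose(G_formula(x), G_inclusion_exclusion(x))
assert isclose(G_formula({3}), p * r ** 3)   # P(i ∈ X) = p r^i
```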

Acknowledgment

I am grateful to Marcus Pendergrass for suggesting this model.

REFERENCES

[1] Azlarov, A.T. & Volodin, N.A. (1986). Characterization problems associated with the exponential distribution. Berlin: Springer-Verlag.
[2] Högnäs, G. & Mukherjea, A. (1995). Probability measures on semigroups. New York: Plenum Press.
[3] Matheron, G. (1975). Random sets and integral geometry. New York: Wiley.
[4] Rowell, G.H. & Siegrist, K. (1998). Relative aging of distributions. Probability in the Engineering and Informational Sciences 12: 469–478.
[5] Siegrist, K. (1994). Exponential distributions on semigroups. Journal of Theoretical Probability 7: 725–737.
[6] Siegrist, K. (2006). Decomposition of exponential distributions on positive semigroups. Journal of Theoretical Probability 19: 204–220.
[7] Siegrist, K. (2007). Exponential and gamma distributions on positive semigroups, with applications to Dirichlet distributions. Bernoulli (to appear).