
A GENERAL APPROACH TO COMPUTE THE PROBABILITIES OF UNRESOLVED CLONES IN RANDOM POOLING DESIGNS

Published online by Cambridge University Press:  16 April 2004

F. K. Hwang
Affiliation:
Department of Applied Mathematics, National Chiao Tung University, Hsinchu 30050, Taiwan, Republic of China, E-mail: fhwang@math.nctu.edu.tw; u8722518@math.nctu.edu.tw
Y. C. Liu
Affiliation:
Department of Applied Mathematics, National Chiao Tung University, Hsinchu 30050, Taiwan, Republic of China, E-mail: fhwang@math.nctu.edu.tw; u8722518@math.nctu.edu.tw

Abstract

In this paper, we develop a general approach to computing the probabilities of unresolved clones in random pooling designs. This unified and systematic approach gives better insight into handling the dependency issues among the columns and among the rows. Consequently, we identify some faster computation formulas for four random pooling designs proposed in the literature, and we derive some probability distribution functions of the number of unresolved clones that were not available before.

Type
Research Article
Copyright
© 2004 Cambridge University Press

1. INTRODUCTION

A pooling design is a (binary) incidence matrix in which each column represents a clone and each row represents a pool. A 1-entry in cell (i, j) signifies that clone j is contained in pool i. A clone represents a short DNA fragment; it is a positive if it contains a specific DNA sequence as a subsequence and a negative otherwise. All clones contained in a pool are tested together as a group. The outcome is positive if and only if the pool contains a positive (otherwise, the outcome is negative). The goal of a pooling design is to identify all positives with as few pools as possible. Note that all pools in the matrix can be tested simultaneously.

Random pooling designs have been proposed [1–5] for their wide applicability. However, they do not guarantee the identification of all clones. Suppose that there are d positives among n clones. Let N denote the number of unresolved negatives and P the number of unresolved positives. Then, it is important to estimate P and N when evaluating a random pooling design.
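To make these quantities concrete, here is a small simulation sketch (our own code, not from the paper) that counts N and P under the standard decoding rule: a clone is a resolved negative iff it appears in some negative pool, and a positive is identified iff some pool contains it with no other positive and no unresolved negative.

```python
import numpy as np

def unresolved_counts(M, positives):
    """Count unresolved negatives (N) and unresolved positives (P)
    for pooling matrix M (t x n, 0/1) and a list of positive clones."""
    t, n = M.shape
    pos = np.zeros(n, dtype=bool)
    pos[positives] = True
    outcome = M[:, pos].sum(axis=1) > 0          # a pool is positive iff it holds a positive
    in_neg_pool = M[~outcome].sum(axis=0) > 0    # clones appearing in some negative pool
    unres_neg = ~pos & ~in_neg_pool              # negatives never cleared by a negative pool
    N = int(unres_neg.sum())
    P = 0
    for j in np.flatnonzero(pos):                # try to identify each positive
        identified = False
        for i in np.flatnonzero(outcome):        # scan positive pools ...
            row = np.flatnonzero(M[i])
            # ... for one containing j alone among positives, with no unresolved negative
            if j in row and pos[row].sum() == 1 and not unres_neg[row].any():
                identified = True
                break
        P += 0 if identified else 1
    return N, P
```

Averaging these counts over many sampled matrices estimates E(N) and E(P) for any of the designs below.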

Four types of random design have been studied in the literature. Consider a t × n binary matrix M:

1. Random incidence design (RID). Each cell in M has probability p of being one.

2. Random k-set design (RkSD). Each column is a random k-set of the set [t] = {1,…, t}; that is, there are exactly k 1-entries in each column, with the locations of these 1's being equally likely to be any of the C(t, k) possibilities.

3. Random distinct k-set design (RDkSD). RkSD with all columns being distinct.

4. Random r-size design (RrSD). Each row is a random r-set of the set [n] = {1,…, n}.
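The four designs are easy to sample directly from their definitions; the following sketch (function names are ours) generates each type of matrix:

```python
import numpy as np

def rid(t, n, p, rng):
    """RID: each cell is 1 independently with probability p."""
    return (rng.random((t, n)) < p).astype(int)

def rksd(t, n, k, rng):
    """RkSD: each column is an independent uniform k-subset of the t rows."""
    M = np.zeros((t, n), dtype=int)
    for j in range(n):
        M[rng.choice(t, size=k, replace=False), j] = 1
    return M

def rdksd(t, n, k, rng):
    """RDkSD: RkSD conditioned on all columns being distinct (rejection sampling)."""
    seen, cols = set(), []
    while len(cols) < n:
        c = tuple(sorted(rng.choice(t, size=k, replace=False)))
        if c not in seen:
            seen.add(c)
            cols.append(c)
    M = np.zeros((t, n), dtype=int)
    for j, c in enumerate(cols):
        M[list(c), j] = 1
    return M

def rrsd(t, n, r, rng):
    """RrSD: each row is an independent uniform r-subset of the n clones."""
    M = np.zeros((t, n), dtype=int)
    for i in range(t):
        M[i, rng.choice(n, size=r, replace=False)] = 1
    return M
```

Note that RDkSD requires C(t, k) ≥ n so that n distinct columns exist.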

RID was first proposed by Erdős and Rényi [3] for search problems. Balding, Bruno, Knill, and Torney [1] proposed RkSD and showed it to have much smaller P and N (i.e., it identifies many more clones); hence, it is much more powerful than RID. Both [1] and [2] alluded to bounding the number of intersections of two columns (pools in which the two columns coincide) to avoid heavy similarity, without giving any analysis. Hwang [4] carried out this idea by studying RDkSD, in which the sampling is without replacement to avoid some apparent inefficiency. Another motivation to study RDkSD is that some other pooling designs [6,7] are special cases of it. Hwang also proposed RrSD to compare its row structure with the column structure of RkSD. Further, RrSD is suitable for situations in which the size of a pool is restricted. Note that one can also propose a random distinct r-size design (RDrSD). However, since Hwang and Liu [5] found that the performances of RkSD and RDkSD are about the same, and the performance of RrSD is much worse, there is not much motivation to study RDrSD.

The basic probabilities to be computed for these four models are P(N = u) and P(P = v). In particular, we are interested in P(N = 0) and P(P = 0), the cases in which all negatives and all positives, respectively, are identified. From P(N = u) and P(P = v) we also obtain E(N) and E(P), which in turn yield the following:

P− = E(N)/(n − d), the probability that a random negative clone is unresolved, and

P+ = E(P)/d, the probability that a random positive clone is unresolved.

However, since E(N) and E(P) are usually quite messy, it is often easier to argue for P− and P+ directly.

We will also look into the problem of choosing p, k, and r to minimize these probabilities. Due to the messiness of the probability function, only limited results have been obtained.

2. A GENERAL APPROACH TO COMPUTE THE PROBABILITIES OF UNRESOLVED CLONES

Let M be a t × n matrix and D a set of d positives. We say that the rows (columns) are i.d. if the distribution of the number of 1-entries is identical for all rows (columns). Furthermore, if the distribution is independent between the rows (columns), then we say the rows (columns) are i.i.d. In the random designs we study here, the rows and the columns are always i.d.

We say a row and a column intersect if the intersection cell contains a 1-entry. Partition the rows of M into three parts X, Y, Z according to whether a pool intersects D at least twice, exactly once, or not at all. Define f (z) = P(|Z| = z).
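In code, the partition (X, Y, Z) can be read off from the incidence matrix in one pass (a sketch with our own naming):

```python
import numpy as np

def partition_rows(M, D):
    """Split pools into X (>= 2 positives), Y (exactly 1), Z (none)."""
    hits = M[:, list(D)].sum(axis=1)   # how many positives each pool intersects
    X = np.flatnonzero(hits >= 2)
    Y = np.flatnonzero(hits == 1)
    Z = np.flatnonzero(hits == 0)
    return X, Y, Z
```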

Theorem 2.1: Suppose that the rows are i.d. Then,

Proof: f (z) is computed by an inclusion–exclusion formula. █

Corollary 2.2: Suppose that the rows are i.d. and the columns are i.i.d. Then,

Proof: The event “the i rows including Z not intersecting D” can also be expressed as “d columns not intersecting the i rows including Z.” █

When the rows are i.i.d., then f (z) is much simpler.

Lemma 2.3: Suppose that the rows are i.i.d. Then,

Next, we give P(N = u) and then P(P = v).

Theorem 2.4: Suppose that the columns and rows are both i.d. Then,

Proof: A negative column not in Z is an unresolved negative. █

Either the column i.i.d. or the row i.i.d. will bring some simplification.

Corollary 2.5: Suppose that the rows are i.d. and the columns are i.i.d. Then,

Corollary 2.6: Suppose that the columns are i.d. and the rows are i.i.d. Then,

Let f (z,y) denote P(|Z| = z,|Y | = y) and qS the number of rows in S not containing any unresolved negative.

Theorem 2.7: Suppose that the columns and rows are both i.d. Then,

Proof: Note that P(P = v|qY) is the probability that v positives do not appear in the qY rows, hence unresolved. P(P = v|qY) can also be viewed as the probability of getting v empty holes in rolling qY balls into d holes. █

Although, in principle, the computation of y and q can be combined into one step, such a computation would be difficult to carry out unless the rows are independent. Here, we give the i.i.d. version.

Corollary 2.8: Suppose that the columns are i.d. and the rows are i.i.d. Then,

where E is the event that a row in X ∪ Y contains a single positive but no unresolved negative.

We next discuss the computation of P− and P+. For convenience, in computing P−, the negative whose resolvability is in concern will be denoted by C. In computing P+, D1 is the positive in concern. We first give a general formula for computing P−.

Theorem 2.9: Suppose that the rows are i.d. Then,


Corollary 2.10: P− is independent of n if and only if f (z) is independent of n.

Corollary 2.11: P− is independent of n if the columns are independent.

Proof: f (z) is determined by the columns of D, hence independent of n if the columns are independent. █

Corollary 2.12: Suppose that the rows are i.i.d. Then,

However, we can do better by combining the computation of the probabilities of z and of the property.

Theorem 2.13: Suppose that the rows are i.i.d. Then,

Even without the row i.d., we can interpret Theorem 2.9 in a way such that there is no need to compute f (z). Let the weight of a column be the number of its 1-entries.

Theorem 2.14: Suppose that C has weight k. Then,

Proof: The i specified rows are in the k rows contained in C. █

Corollary 2.15: Suppose that the columns are i.i.d. Then,

To compute P+, let Y1 be the subset of Y containing D1 but no other Dj. Define

Theorem 2.16: Suppose that the rows are i.d. Then,

Proof: The last sum gives the probability that no row in Y1 satisfies the condition that every negative either appears in Z (hence resolved) or does not appear in the row of Y1 (hence not obstructing the identification of D1). Note that the condition characterizes the identification of D1. █

Corollary 2.17: Suppose that the rows are i.d. and the columns are i.i.d. Then,

With the row i.i.d., we obtain a different set of formulas.

Theorem 2.18: Suppose that the rows are i.i.d. Then,

Proof: The bracketed term is the probability that D1 is not identified by any of the t − z rows, hence unresolved. █

Theorem 2.19: Suppose that the columns are i.d. and the rows are i.i.d. Then,

Proof:

The second equality is true by Corollary 2.6.

Define the following:

A: the event that, besides N, exactly j − u additional clones are not in Z

B: the event that D1 is not identified given N = u

Since B depends on D and N, and A depends on the j − u clones not in D ∪ N, A and B are independent events. Hence,

Theorem 2.19 can also be obtained from Theorem 2.18 by replacing P(N = u|z) with the terms in Corollary 2.6 and summing over z. The proof we gave here is more insightful.

3. RANDOM INCIDENCE DESIGN

Let M be a t × n RID. Note that both rows and columns are i.i.d. Using the row i.i.d., the following is easily obtained:

Lemma 3.1:

Hwang [4] gave the following theorem.

Theorem 3.2:

Proof: The proof follows immediately from Corollary 2.5 and Lemma 3.1. █

The special case u = 0 was first given by Balding et al. [1].

Corollary 3.3:

Corollary 3.4: E(N) = (n − d)[1 − p(1 − p)^d]^t.

Proof:

Corollary 3.5: P− = [1 − p(1 − p)^d]^t.

Note that Corollary 3.5 can also be argued directly from Theorem 2.13 by noting that p(1 − p)^d is the probability that a row contains C but none of D. Corollary 3.4 is then obtained by multiplying by n − d. We did it the hard way just for demonstration purposes.
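Corollary 3.5 is straightforward to sanity-check by simulation. The sketch below (our own code; the parameter values in the test are arbitrary) exploits the fact that P− does not depend on n, so only the column of C and the d positive columns need to be sampled:

```python
import numpy as np

def p_minus_closed(t, d, p):
    """Corollary 3.5: P- = [1 - p(1-p)^d]^t."""
    return (1 - p * (1 - p) ** d) ** t

def p_minus_mc(t, d, p, trials=20000, seed=1):
    """Monte Carlo estimate: C is resolved iff some row contains C but no positive."""
    rng = np.random.default_rng(seed)
    unres = 0
    for _ in range(trials):
        M = rng.random((t, d + 1)) < p           # column 0 is C, columns 1..d the positives
        if not np.any(M[:, 0] & ~M[:, 1:].any(axis=1)):
            unres += 1
    return unres / trials
```

For t = 30, d = 3, p = 0.25 the two values agree to about two decimal places.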

Let p* minimize P− (or E(N)). Balding et al. [1] gave the following:

Theorem 3.6: p* = (d + 1)^(-1).

Proof: Clearly, to minimize P− is to maximize p(1 − p)^d. Setting the derivative (1 − p)^d − dp(1 − p)^(d−1) to 0, we obtain p* = (d + 1)^(-1). █
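Theorem 3.6 can also be confirmed numerically; a quick grid search (grid and value of d are ours) locates the maximizer of p(1 − p)^d at 1/(d + 1):

```python
import numpy as np

d = 5
p_star = 1 / (d + 1)                   # Theorem 3.6's claimed maximizer
ps = np.linspace(0.001, 0.999, 9999)
vals = ps * (1 - ps) ** d              # the quantity p(1 - p)^d to be maximized
# the grid argmax lands on the grid point nearest p* = 1/(d + 1)
```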

Let p0 minimize P(N = 0). No analytic solution of p0 is known.

The corresponding probabilities of unresolved positives are considerably messier.

Theorem 3.7:

Proof: This is proved by Corollary 2.8. █

Because P(P = v) is unwieldy to maneuver, it is desirable to derive P+ and E(P) independently. We give several such derivations and compare their time complexities. First, a lemma is needed.

Lemma 3.8:

Proof: A pool is not in Z ∪ Y1 if and only if it contains a positive other than D1. █

We can use the column i.i.d. to compute P+.

Theorem 3.9:

Proof: 1 − (1 − p)^z is the probability that a negative appears in Z, and (1 − p)^(z+i) is the probability that a negative appears neither in Z nor in the i specified rows of Y1. Theorem 3.9 follows immediately from Corollary 2.17. █

Note that P+ in Theorem 3.9 can be computed in O(t^3) time.

Alternatively, we can use the row i.i.d. formula in Corollary 2.6 (after summing over z).

Theorem 3.10:

Proof: 1 − (1 − p)^d is the probability that a pool contains a positive; hence, it is positive. In a positive pool, D1 is identified if and only if it is the only positive in the pool and no unresolved negative is in the pool. The probability of this is p(1 − p)^(d−1+u), given that there are u unresolved negatives. Therefore, 1 − (1 − p)^d − p(1 − p)^(d−1+u) is the probability that a pool is positive but does not identify D1, given N = u.

On the other hand, (1 − p)^d is the probability that a pool is negative and (1 − p)^j is the probability that it does not contain the j specified negatives including N. Hence, (1 − p)^(d+j) is the probability that both events happen. Theorem 3.10 follows immediately from Theorem 2.19. █

Note that P+ in Theorem 3.10 can be computed in O(n^2) time.

Finally, we can also use the other row i.i.d. formula.

Theorem 3.11:

Proof: p(1 − p)^(d−1+u) is the unconditional probability that a pool contains D1 but no other Di nor any unresolved negative. Dividing it by 1 − (1 − p)^d gives the same probability conditional on the pool being positive (in X ∪ Y). Theorem 3.11 follows immediately from Theorem 2.18. █

P+ in Theorem 3.11 can be computed in O(tn) time. Since t is usually much smaller than n, Theorem 3.11 seems to be an improvement over Theorem 3.10 with respect to computation. Note that Theorem 3.9 uses the column independence with time complexity a function of t, Theorem 3.10 uses the row independence with time complexity a function of n, and Theorem 3.11 uses both column and row independence with time complexity a function of both t and n.

Corollary 3.12: E(P) = dP+.

No analytic solution has been given to minimize either P+ or P(P = 0).

4. RANDOM k-SET DESIGN

The columns in RkSD are i.i.d., but the rows are only i.d.

Lemma 4.1:

Proof: Since the rows are not independent, the inclusion–exclusion formula is used to compute the exact probability of z. █

Theorem 4.2:

Proof: The probability that a negative does not appear in a row of Z is C(t − z, k)/C(t, k). Theorem 4.2 now follows immediately from Corollary 2.5. █

It is easier to argue for P− independently than to derive it from P(N = u).

Theorem 4.3:

Proof: The probability that a positive does not appear in i of the k appearances of C is C(t − i, k)/C(t, k). Theorem 4.3 follows immediately from Corollary 2.15. █

Corollary 4.4: E(N) = (n − d)P−.

Let k* minimize P−. Our formula for P− is very similar to the one Macula [8] gave for the probability of a positive being unresolved under the representative decoding [6]. Hence, we imitate the approximation he gave: P− ≈ (1 − e^(−kd/t))^k. Then, k′ = (t ln 2)/d minimizes (1 − e^(−kd/t))^k.
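The minimizer of this approximation can be checked numerically; with t = 70 and d = 3 (values are ours), the best integer k sits next to k′ = (t ln 2)/d ≈ 16.17:

```python
import numpy as np

t, d = 70, 3
k = np.arange(1, t + 1)
approx = (1 - np.exp(-k * d / t)) ** k   # approximate P- for RkSD
k_best = int(k[approx.argmin()])         # best integer k
k_prime = t * np.log(2) / d              # continuous minimizer ~ 16.17
```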

To compute P(P = v), we need f (z,y). Let Π(y,d) denote the set of partitions π = (y1,…,yd) of y distinct objects into d distinct parts with 0 ≤ yj ≤ k.

To compute P+, we need f (z, y1).

Lemma 4.7:

Proof: By the definitions of z and y1, each of the remaining t − z − y1 pools must contain a Di, i ≠ 1. The last sum in Lemma 4.7 gives this probability using the inclusion–exclusion formula, where C(t − z − y1 − h, k)/C(t, k) is the probability that Di does not appear in a specified set of z + y1 + h pools (including the pools in Z ∪ Y1). Finally, D1 must appear in the y1 rows of Y1. Its other k − y1 appearances must not be in Z ∪ Y1. █

Theorem 4.8:

Proof: 1 − C(t − z, k)/C(t, k) is the probability that a negative appears in Z, and C(t − z − i, k)/C(t, k) is the probability that a negative appears neither in Z nor in the i specified pools of Y1. Theorem 4.8 follows immediately from Corollary 2.17. █

Note that P+ in Theorem 4.8 can be computed in O(t^2 k^2) time.

Corollary 4.9: E(P) = dP+.

No analytic solution has been given to minimize either P+ or P(P = 0).

5. RANDOM r-SIZE DESIGN

The rows of RrSD are i.i.d., but the columns are only i.d.

Lemma 5.1:

Proof: C(n − d, r)/C(n, r) is the probability that a pool does not contain any positive; hence, it is in Z. █

Theorem 5.2:

Proof: C(n − d − j, r)/C(n − d, r) is the probability that a pool in Z does not contain any of the j specified negatives, including the given u ones. Theorem 5.2 now follows immediately from Corollary 2.6 and Lemma 5.1. █

It is simpler to derive P− directly.

Theorem 5.3: P− = [1 − C(n − d − 1, r − 1)/C(n, r)]^t.

Proof: A pool that contains C but none of D must take its other r − 1 clones from the other n − d − 1 negatives. Theorem 5.3 now follows immediately from Theorem 2.13. █

Let r* minimize P−. Lin (private communication) observed the following:

Theorem 5.4: r* ∈ {⌊r̂⌋, ⌈r̂⌉}, where r̂ = (n − d)/(d + 1).

Proof: Clearly, minimizing P− is the same as maximizing g(r) = C(n − d − 1, r − 1)/C(n, r), the probability that a pool contains C but none of D. When r increases, both factors of the ratio g(r + 1)/g(r) = [(n − d − r)/(n − r)] · [(r + 1)/r] decrease. Hence, the ratio decreases in r, and the maximum of g(r) is attained at one of the two integers flanking the r̂ satisfying g(r + 1)/g(r) = 1; that is, r̂ = (n − d)/(d + 1). █

Note that r̂ = (n − d)/(d + 1) divided by the number of negatives, n − d, yields p* = 1/(d + 1) in RID.
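Following the counting in the proof of Theorem 5.3, the probability that a pool contains C but none of D is C(n − d − 1, r − 1)/C(n, r); maximizing this over integer r numerically reproduces Theorem 5.4 (the values n = 100, d = 4 are ours):

```python
from math import comb

n, d = 100, 4

def g(r):
    """P(a pool contains C but none of D): the other r-1 clones
    come from the n-d-1 remaining negatives (proof of Theorem 5.3)."""
    return comb(n - d - 1, r - 1) / comb(n, r)

r_hat = (n - d) / (d + 1)                  # continuous solution of g(r+1)/g(r) = 1
r_best = max(range(1, n - d), key=g)       # integer maximizer, a neighbor of r_hat
```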

For the unresolved positive, we have the following theorem.

Theorem 5.5:

Proof: Theorem 5.5 follows from Corollary 2.8. We will only comment on the term P(E) (defined as in Corollary 2.8), as the other terms have been obtained earlier.

C(n − d − u, r − 1) is the number of ways of choosing a (positive) pool containing D1 but no other Dj nor any unresolved negative. Multiplying by d counts the number of ways of choosing a pool containing a single positive (not necessarily D1) but no unresolved negative. C(n, r) − C(n − d, r) is the number of ways of choosing a positive pool. Thus, the ratio dC(n − d − u, r − 1)/[C(n, r) − C(n − d, r)] gives the conditional probability that a positive pool contains a single positive and no unresolved negative (hence, the positive is identified). █

Again, we derive P+ independently.

Theorem 5.6:

Proof: C(n, r) − C(n − d, r) is the number of ways of choosing a positive pool, and C(n − d − u, r − 1) is the number of ways of choosing a pool containing D1 but no other Dj nor an unresolved negative (so D1 is identified). Hence, C(n, r) − C(n − d, r) − C(n − d − u, r − 1) is the number of ways of choosing a positive pool not identifying D1. Finally, C(n − d − j, r) is the number of ways of choosing a negative pool not containing any of the j specified negatives including N. Theorem 5.6 now follows immediately from Theorem 2.19. █

P+ in Theorem 5.6 can be computed in O(n^2) time.

Analytic solutions for optimal r to minimize either P+ or P(P = 0) are not known.

6. RANDOM DISTINCT k-SET DESIGN

RDkSD is neither column independent nor row independent. Hence, the computation of the probabilities of unresolved clones poses both a challenge and an opportunity to extend the formulas beyond the independence assumption.

Lemma 6.1:

Proof: All k appearances of a positive must lie outside of Z. There are C(t − z, k) such distinct k-sets from which to choose d. Since the rows are not independent, the inclusion–exclusion formula is required. █

Theorem 6.2:

Proof: There are C(t − z, k) k-sets not intersecting Z; d of them are chosen by the positives. The u unresolved negatives must be chosen from the remaining ones, and the n − d − u resolved negatives must be chosen from the C(t, k) − C(t − z, k) k-sets intersecting Z. Theorem 6.2 now follows from Theorem 2.4. █

We argue for P independently.

Theorem 6.3:

Proof: i is the number of rows intersecting C. The ratio represents the probability that no positive appears in these i rows. █

We do not have a formula for P(P = v); even f (z,y) seems too difficult to attempt. Hwang and Liu [5] gave formulas for P+ and f (z, y1). Let εx = 1 if x = 0 and εx = 0 otherwise.

Lemma 6.4:

Proof: The sum in Lemma 6.4 gives this probability using the inclusion–exclusion formula, where

is the probability that no Dj, j ≠ 1, appears in a specified set of z + y1 + h pools (including the pools in Z ∪ Y1). There are

ways of choosing D1. However, if y1 = 0, then D1 is also chosen from the

k-sets, hence (d − 1) k-sets, which have been selected as positives, should be subtracted. █

Theorem 6.5:

Proof: C(t, k) − C(t − z, k) is the number of k-sets intersecting Z, and C(t − z − i, k) is the number of k-sets intersecting neither Z nor the i specified rows. Thus, a k-set taken from the union of the two sets satisfies the condition in Theorem 2.16. However, the d − 1 positives Dj, j ≠ 1, are also taken from the second set. Therefore, these d − 1 k-sets must be subtracted before the n − d negatives can be chosen. Further, if i = 0, then D1 is also chosen from the second set; hence, one more k-set should be subtracted. █

No analytic solution for the optimal k minimizing any of P−, P(N = 0), P+, or P(P = 0) is known.

7. SUMMARY AND NUMERICAL DATA

The method in [5] first computes the probability u(j) that there are j unresolved negatives and then computes f (z, y1) from Σj u(j) f (z, y1 | j). The summation over j requires O(n) time. The general approach we proposed in this article takes advantage of column independence in RID and RkSD to focus on the probability that a single negative blocks the identification of the positive and then multiplies that probability (n − d)-fold to account for all negative clones. Thus, there is no need to sum over j.

Bruno et al. [2] eliminated the summation over j for RkSD, not through the argument we offered but simply through combinatorial maneuvering. However, they overlooked something that resulted in an unnecessary inflation of the time complexity to O(t^4). For RkSD and RDkSD, the range of y1 is from 1 to k, where k is typically much smaller than t. Thus, the summation over y1, as well as the summation over O(y1) terms when computing the probability that all y1 appearances intersect with some unresolved negatives, should both involve O(k) terms instead of O(t). This reduces the time complexity to O(k^2 t^2). Bruno et al. may have missed this point by using the variable z + y1 instead of y1, which somehow obscured the number of terms. We should also point out that the substitution of O(k) for O(t) in two summations in RkSD and RDkSD was also not observed in Hwang and Liu [5].

For RDkSD, although the columns are not independent, they are structured well enough that we can argue over the n − d negatives collectively; again, there is no need to introduce j. For RrSD, our general approach takes advantage of row independence to focus on the probability that a positive cannot be identified in a certain pool, and then multiplies that probability t-fold to account for all pools. Hence, the time complexity is reduced to O(n^2), which is independent of t; the old method needs O(n^2 t^3) time.

The general approach helps us speed up the computation. We could only compute for n ≤ 100 in [5], whereas we can now compute for n ≥ 1000, and even for n = 10,000 for some designs. Our program is written in Mathematica and is not optimized; hence, there is still room to compute with larger parameters.

We present some numerical data in this section. First, we plot RkSD and RDkSD together in Figure 1 for easier comparison. As mentioned in Section 1, the difference between RkSD and RDkSD is slight. Hence, during pool construction, rejecting any k-set that already occurs in the design [1] becomes unnecessary.

Comparison between RkSD and RDkSD (dashed line) with n = 5000, t = 70, and d = 3 (a) or d = 5 (b).

Then, the comparison of P+ between RID and RkSD is presented in Figure 2. Because the range of possible p is from zero to one and the range of possible k is from zero to t, we set k = t × p for normalization. With the k restriction on the column weight, RkSD performs better than RID. Sometimes, the difference between these two designs is very significant and can be critical for their suitability. For example, in Figure 3, when n = 10,000, t = 85, and d = 5, the optimal P+ of RkSD is less than 0.1, which makes it a good design, whereas that of RID is about 0.69.

Comparison between RID (dashed line) and RkSD (solid line) with n = 5000, t = 70, and d = 3 (a) or d = 5 (b).

The performance difference between RID (dashed line) and RkSD (solid line) with n = 10,000, t = 85, and d = 5.

Figure 4 presents the comparison of P+ between RID and RrSD. Here, we set r = n × p. The performance of RrSD is about the same as that of RID, hence worse than that of RkSD. This seems to suggest that the column structure (of RkSD) is much more important than the row structure (of RrSD), although we have no explanation for it.

Comparison between RID (dashed line) and RrSD (solid line) with n = 1298, t = 47, and d = 2 (a) or d = 4 (b).

The data we show in Figure 4 have parameters n = 1298 and t = 47, which are much smaller than those we use in the comparison between RID and RkSD. This is because the time complexity of computing P+ for RrSD is O(n^2), whereas for RkSD it is O(k^2 t^2). When n is large, kt is usually much smaller than n, so P+ of RkSD is still computable, whereas it takes an unacceptably long time to compute P+ for RrSD.

Although the time complexity of our formula for computing P+ of RkSD is claimed to be O(k^2 t^2), which is independent of n, this ignores the fact that when n grows, the numbers in the formula have more bits, and the division of large numbers takes longer to compute. Right now, we can compute for n ≤ 10,000 with kt ∼ 1500. We still need more efficient formulas to deal with larger parameters (e.g., n on the order of 10^6).

In case no explicit exact formulas can be obtained, we need good approximations in explicit forms. The need for explicit forms stems not only from faster computation but also from being able to solve for the optimal design parameters p, k, and r analytically. The numerical evidence certainly suggests that a unique optimum exists for each design.

Percus, Percus, Bruno, and Torney [9] gave an approximation whose leading term gives P− as if the rows were independent, followed by correction terms and some higher-order terms. For example, the approximation of P− for RkSD is Eq. (42) of [9]. The first term is P− for RkSD if the rows were independent. The second term corrects for this independence assumption, and the third term reflects the consequences of dispersion and nondiscreteness of the number of positives.

Footnotes

Partially supported by Republic of China, National Science Council grant NSC 90-2115-M-009-027.


REFERENCES

Balding, D.J., Bruno, W.J., Knill, E., & Torney, D.C. (1995). A comparative survey of non-adaptive pooling designs. In Genetic Mapping and DNA Sequencing, IMA Volumes in Mathematics and Its Applications. New York: Springer-Verlag, pp. 133–155.
Bruno, W.J., Knill, E., Balding, D.J., Bruce, D.C., Doggett, N.A., Sawhill, W.W., Stallings, R.L., Whittaker, C.C., & Torney, D.C. (1995). Efficient pooling designs for library screening. Genomics 26: 21–30.
Erdős, P. & Rényi, A. (1963). On two problems of information theory. Magyar Tud. Akad. Mat. Kutató Int. Közl. 8: 229–243.
Hwang, F.K. (2000). Random k-set pool designs with distinct columns. Probability in the Engineering and Informational Sciences 14: 49–56.
Hwang, F.K. & Liu, Y.C. (2001). The expected number of unresolved positive clones in various random pool designs. Probability in the Engineering and Informational Sciences 15: 57–68.
Hwang, F.K. & Liu, Y.C. (to appear). Random pooling designs under various structures. Journal of Combinatorial Optimization.
Macula, A.J. (1997). A simple construction of d-disjunct matrices with certain weights. Discrete Mathematics 80: 311–312.
Macula, A.J. (1999). Probabilistic nonadaptive group testing in the presence of errors and DNA library screening. Annals of Combinatorics 1: 61–69.
Percus, J.K., Percus, O.E., Bruno, W.J., & Torney, D.C. (1999). Asymptotics of pooling design performances. Journal of Applied Probability 36: 951–964.