I. INTRODUCTION
How many different crystal structures can have the identical single (powder) diffraction pattern? When a crystal structure fits well to an observed pattern, how one can check whether or not there exist similarly good solutions? Although these problems have been pointed out [e.g., Engel (Reference Engel1988)], they have not been studied so rigorously. In powder indexing, although the problem to be solved is different, the problem about the uniqueness of solutions known as geometrical ambiguities was rigorously studied in Mighell and Santoro (Reference Mighell and Santoro1975). Recently, much progress was also made in the author's two papers (Oishi-Tomiyasu, Reference Oishi-Tomiyasu2014, Reference Oishi-Tomiyasu2016). In particular, the developed method can be used to check results of powder indexing.
For the same purpose, the uniqueness of solutions in crystal structure analysis from single-crystal diffraction patterns is investigated herein. Mathematically, crystal structures that have the same diffraction patterns as the input crystal structure are generated by: (a) enumerating all the sets of atomic coordinates with the identical set of difference vectors, and (b) determining the atomic species of every coordinates from the diffraction pattern of the original crystal structure. Both of (a) and (b) are not straightforward, considering combinatorial explosion that may occur in (a), and the necessity to solve a system of quadratic equations in (b).
Our new software succeeded in: (a) avoiding combinatorial explosion when medium-sized unit cells containing several tens atoms are input, and (b) solving the system of quadratic equations generally and efficiently by using the positive semidefinite programming (SDP) solver SDPA (Yamashita et al., Reference Yamashita, Fujisawa, Nakata, Nakata, Fukuda, Kobayashi and Goto2010). In this paper, the theory and some computational results are introduced.
By using the software, it is now possible to computationally answer the two questions posed in the first paragraph. The introduced results indicate that another plausible solution will not exist for ten crystal structures randomly chosen. We are now proceeding the exhaustive search for crystal structures that may have the same diffraction pattern with another crystal structure, by using crystal-structure database (see the acknowledgments for the information about the database, which currently contains more than 4000 CIF files). The results will be reported in the author's subsequent articles.
If two single crystals are judged to have the perfectly identical diffraction patterns (i.e., crystal lattice and set of structure factors), their powder samples also have the identical powder diffraction pattern. It should be noted that it is more complex mathematically to similarly check the uniqueness of solutions for powder crystal structure analysis. However, the frameworks introduced in this paper can be used for it.
II. PROBLEM TO BE SOLVED
The problem considered herein is how to generate crystal structures having structure factors with the perfectly identical absolute values as a given crystal structure. This is different from the problem normally solved in ab initio crystal structure analysis, in the sense that influence of observational errors can be largely ignored. This simplification drastically reduces the difficulty of the problem. Nevertheless, our problem is also useful for ab initio crystal structure analysis, because it can be used to check results, and understand the mathematics hidden in the phase-retrieval problems.
Throughout this article, the following ρ is used as a simple model of electron/nucleus distributions in a crystal:
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170905091958799-0123:S0885715616000804:S0885715616000804_eqn1.gif?pub-status=live)
where L is the lattice of a crystal, x i (1 ≤ i ≤ n) are positions of atoms in the unit cell. In the neutron case, C i is the scattering factor of each atom. In the X-ray case, the scattering factors are normally approximated by the sum of exponential functions depending on the d-spacings:
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170905091958799-0123:S0885715616000804:S0885715616000804_eqn2.gif?pub-status=live)
which is the Fourier transform of the sum of the modified Lorenzian function:
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170905091958799-0123:S0885715616000804:S0885715616000804_eqnU1.gif?pub-status=live)
where a ik and b ik are real numbers, and c i can be a complex, if anomalous dispersion is considered. From the integral
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170905091958799-0123:S0885715616000804:S0885715616000804_eqnU2.gif?pub-status=live)
it is seen that in the X-ray case, C i can be approximated by
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170905091958799-0123:S0885715616000804:S0885715616000804_eqnU3.gif?pub-status=live)
which enables us to use formula (1), in common to the neutron case. The Patterson function, i.e. the Fourier transform of the diffraction pattern of ρ(x) is given by:
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170905091958799-0123:S0885715616000804:S0885715616000804_eqn3.gif?pub-status=live)
where
$\overline {C_j} $
is the complex conjugate of C
j
, thus
$\overline {C_j} = C_j,\; $
since the scattering factors are now assumed to be real.
As a consequence, two crystal structures have the same diffraction pattern if and only if:
-
(i) their lattices are congruent in
${\opf R}^{\rm 3}$ ,
-
(ii) they have the identical set (denoted by Λ) of difference vectors x i − x j (1 ≤ i, j ≤ n), and
-
(iii) For each Δ in Λ, the values of
$\sum\nolimits_{x_i - x_j = \Delta} {C_i\overline {C_j}} $ (integrated intensity of the peak of the Patterson function corresponding to Δ) are the same.
In the following computation, information about chemical formula is also used.
What follows, for precise, the primitive cell of L (not the conventional cell depending on the centring type) is always used as the unit cell for simplicity. A finite set {x 1, …, x n } provided by all the coordinates in the unit cell is called a configuration, in order to mean that a configuration is the crystal structure without information about atomic species. With regard to (ii), our judgment whether x i − x j = x k − x l always allows a small difference in their values.
In this situation, the following two stages are required for the computation:
-
(1) Enumerate all the configurations with the set of difference vectors perfectly same as that of the input crystal structure. (Let Δ0 = 0, ±Δ1, …, ±Δ N be all the elements of the set.)
-
(2) For each configuration enumerated in (1), determine atomic species at every coordinate, by estimating the scattering factors. More concretely, the following system of equations is solved:
(4)$$\left\{ \eqalign{\mathop \sum \limits_{i = 1}^n x_i = & \; \mathop \sum \limits_{\,j = 1}^m C_i, \cr \mathop \sum \limits_{v_i - {\rm \;} v_j = \Delta _k} x_ix_j = & \; \mathop \sum \limits_{u_i - {\rm \;} u_j = \Delta _k} C_iC_j\quad \left( {0{\rm \;} \le k{\rm \;} \le N} \right),} \right.$$
where C j , uj (1 ≤ j ≤ m) are the scattering factors and the coordinates of the input crystal structure, and x i , vi (1 ≤ i ≤ n) are those of the considered configuration. If the determined x i is close to some C j , then the atomic specie at u j is also disposed at v i .
It is straightforward to solve the system of Eq. (4), if the peaks of the Patterson function are not overlapping, i.e. all the v i − v j (1 ≤ i, j ≤ n) are distinct to each other. However, overlapping peaks seems to appear more frequently than expected, owing to the symmetry of the crystal structure (see Table I). In order to generally solve the system, the method provided by the Gröbner basis will be a candidate, if the coefficients of the equations are exact. If the errors contained in the positions of the input crystal structure and coefficients of scattering factor formula (2) are ignored, this may be considered to hold in our situation. However, the following formulation based on optimization is preferable, in order to fully consider the propagated errors contained in the atom positions determined from observational data. Therefore, the following minimization is adopted in this paper:
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170905091958799-0123:S0885715616000804:S0885715616000804_eqn5.gif?pub-status=live)
Table I. Result of Algorithm III.A; the input parameter ε used for judgment whether any vectors (v
1, v
2, v
3) in
${\opf R}^{\rm 3}/{\opf Z}^{\rm 3}$
equal 0 (i.e., if
$\sum\nolimits_{i = 1}^3 {\left\vert {{\rm exp}(2\pi \sqrt { - 1} v_i) - 1} \right\vert^2} \lt \varepsilon ^2$
) is set to 0.001.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170905095734-82420-mediumThumb-S0885715616000804_tab1.jpg?pub-status=live)
In the next section, the above minimization will be solved by using the method called SDP relaxation. Minimization of
$\sum\nolimits_{k = 0}^{N - 1} {\big\vert\sum\nolimits_{v_i - {\rm \;} v_j = \Delta _k} {x_ix_j -} \sum\nolimits_{u_i - {\rm \;} u_j = \Delta _k} {c_ic_j} \big\vert^2} $
is not considered herein, because a polynomial of less degrees is preferable, in order to apply the SDP relaxation in the following. This is why the sum of absolute values is minimized in (5).
III. ALGORITHMS
A. Method to enumerate configurations with an identical set of difference vectors.
The following is the algorithm implemented in the new software:
Input: Λ = {±Δ1, …, ±Δ N }: set of all the difference vectors of a given crystal structure, not including the vector 0, and assumed to be sorted in the descending order of lengths.
Output: all the configurations in
${\opf R}^{\rm 3}/{\opf Z}^{\rm 3}$
that have Λ as the set of difference vectors.
Algorithm:
-
(i) Set the stage number s = 1, which is incremented, while the algorithm is carried out.
-
(ii) As the initial step of the stage s = 1, make the following configuration:
-
(iii) At the stage s = k, for each node (denoted by v) taken from each configuration that has been constructed, make new configurations, by repeating the following procedures (a), (b):
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170905095734-35183-mediumThumb-S0885715616000804_fig1g.jpg?pub-status=live)
Figure 1. Difference vectors of a configuration of atoms.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170905095734-76892-mediumThumb-S0885715616000804_fig2g.jpg?pub-status=live)
Figure 2. Configuration consisting of two atoms with the different vector Δ1.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170905095734-63712-mediumThumb-S0885715616000804_fig3g.jpg?pub-status=live)
Figure 3. New configuration formed in the stage k.
The formed configurations are saved, as long as they are distinct from the others, and all of their difference vectors are in Λ. Proceed to the next stage s = k + 1, if it becomes impossible to add another Δ k , without violating the rule that all difference vectors are in Λ.
A graph is called a complete graph, if every pair of distinct nodes is connected by a unique edge. If C
1, …, C
m
are all the configurations that have Λ as the set of difference vectors, each C
i
can be identified with a complete graph G with the edges assigned to one of ±Δ1, …, ±Δ
N
. A complete subgraph
${H_k}\! \subset \!G$
is obtained, by connecting all the points in G that are endpoints of ±Δ1, …, ±Δ
k
in G.
In short, in the stage k, the above algorithm aims to generate the H k that is uniquely determined for each C i . This is in order to avoid combinatorial explosion; distinct subgraphs of G i should not be simultaneously saved in each stage. This use of H k succeeded in suppressing the memory usage for medium-sized unit cells containing not more than several tens of atoms.
B. Method to solve quadratic optimization problem (QP)
What follows, the set of all the difference vectors of a given crystal structure Λ = {Δ0, ±Δ1, …, ±Δ N } is assumed to include Δ0 = 0. It is straightforward to verify that the following optimization problem is equivalent to (5):
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170905091958799-0123:S0885715616000804:S0885715616000804_eqn6.gif?pub-status=live)
If we put
$\varepsilon _{1,k} = \varepsilon _k + \sum\nolimits_{v_i - {\rm \;} v_j = \Delta _k} {x_ix_j -} {\rm \;} \sum\nolimits_{u_i - {\rm \;} u_j = \Delta _k} {c_ic_j} $
and
$\varepsilon _{2,k} = \varepsilon _k - \sum\nolimits_{v_i - {\rm \;} v_j = \Delta _k} {x_ix_j} + \sum\nolimits_{u_i - {\rm \;} u_j = \Delta _k} {c_ic_j} $
, the problem (6) is also equivalent to:
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170905091958799-0123:S0885715616000804:S0885715616000804_eqn7.gif?pub-status=live)
The problems (6) and (7) belong to the category of QPs, because the objective function and the constraints are polynomials of degree ≤2. The SDP relaxation is one of the standard methods to solve QP problems. According to the notation of SDP, the symbol • is used to represent the inner product of symmetric matrices X = (X ij ) and Y = (Y ij ) of the same size n:
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170905091958799-0123:S0885715616000804:S0885715616000804_eqn8.gif?pub-status=live)
What follows, X ≥ 0 is used to mean that the symmetric matrices X is positive semidefinite.
An optimization problem is classified in the category of positive SDP, if it can be represented as either of the following forms:
-
(i) Minimize C · X subject to
(9)$$\left\{ {\matrix{ {A_k{\rm \cdot} X = b_k\quad \left( {k = 1, \ldots, N} \right)} \cr {X \ge 0.} \cr}} \right.$$
-
(ii) Maximize
${\rm \;} \sum\nolimits_{k = 1}^N {b_ky_k} $ subject to
(10)$$\left\{ {\matrix{ {Y + \mathop \sum \limits_{k = 1}^N y_iA_i = C} \cr {Y \ge 0.} \cr}} \right.$$
If the symmetric matrices A k (k = 1, …, N), C and the vector (b 1,…, b N ) are common in (i), (ii), the problem (i) [resp. (ii)] is called the primal problem, when (ii) [resp. (i)] is called the dual problem. This is owing to the following facts known as the duality theorems:
Theorem 1 (weak duality theorem)
If some X, (Y, y
1,…, y
N
) satisfies (9), (10), respectively,
$C{\rm \cdot} X \ge \sum\nolimits_{k = 1}^N {b_ky_k} $
always hold.
Theorem 2 (strong duality theorem)
Suppose that some X, (Y, y 1,…, y N ) satisfies (9), (10), respectively. In this case:
-
• If X is positive-definite, there exists an optimum solution (Y*, y 1 *,…, y N *) of (ii), and the optimum value coincides with the lower limit of C · X under the constraint (9).
-
• If Y is positive-definite, there exists an optimum solution X* of (i), and the optimum value coincides with the upper limit of
$\sum\nolimits_{k = 1}^N {b_ky_k} $ under the constraint (10).
-
• If both X and Y are positive-definite, then there exist optimum solutions X*, (Y*, y 1 *,…,y N *), and
$C{\rm \cdot} X^* = \sum\nolimits_{k = 1}^N {b_ky_k^*} \; $ holds.
In general, positive SDP belongs to the category of convex optimization. It is known that interior point methods can efficiently find the global optimum solution.
According to the following steps, a SDP relaxation of (7) is obtained as a result:
-
1. By using the constraint
$\mathop \sum \limits_{i = 1}^n x_i = \mathop \sum \limits_{\,j = 1}^m {\rm \;} c_j,$ remove the variable x n from the problem.
-
2. Now the problem has the variables x 1, …, x N-1 and ε 1,k ≥ 0, ε 2,k ≥ 0 (1 ≤ k ≤ N). The variable set can be identified with the following matrix X, by putting x = (1, x 1, …, x N-1) and
$\tilde X = x^Tx$ (x T is the transposition of x):
(11)$$\eqalign{& X = \left( {\matrix{ {\tilde X} & {} \cr {} & {\matrix{ {\matrix{ {\varepsilon _{1,1}} & {} & {} \cr {} & \ddots & {} \cr {} & {} & {\varepsilon _{1,N}} \cr}} & {} \cr {} & {\matrix{ {\varepsilon _{2,1}} & {} & {} \cr {} & \ddots & {} \cr {} & {} & {\varepsilon _{2,N}} \cr}} \cr}} \cr}} \right)\ge 0,\cr & \tilde X_{11} = 1,\tilde X\;{\rm is}\;{\rm rank\;} \;1,}$$
where
$\tilde X_{ij}$
is the (i, j)-entry of
$\tilde X$
, and all the empty entries of X are assumed to be zero.
As a result, (7) is equivalent to the following problem:
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170905091958799-0123:S0885715616000804:S0885715616000804_eqnU4.gif?pub-status=live)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170905091958799-0123:S0885715616000804:S0885715616000804_eqn12.gif?pub-status=live)
where C is the following symmetric matrix:
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170905091958799-0123:S0885715616000804:S0885715616000804_eqn13.gif?pub-status=live)
A k , bk are the symmetric matrix and the real number uniquely determined from the constraint:
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170905091958799-0123:S0885715616000804:S0885715616000804_eqn14.gif?pub-status=live)
Z is the matrix with 1 in the (1, 1)-entry and 0 in the others; hence, Z
·
X = 1 means
$\tilde X_{11} = 1$
.
If the rank-1 condition is removed from (12), the obtained problem is a SDP problem, because it is represented as in (9). Owing to the convexity, it is not necessary to worry about “local minimums” when the SDP problem is solved. If the objective functions of P and P* converge to a value larger than zero, it is computationally proved that the original problem (5) has no solutions. Otherwise, if the objective function converges to a value close to 0, and X converges to a rank-1 matrix
$\tilde X$
, it is proved that the vector
x
= (1, x
1, …, x
N−1) satisfying
$\tilde X = {}_{}^T {\bi xx}\; $
is the “unique” solution of (5).
In general, one of the obstacles in utilizing the SDP relaxation is that the size n of the variables X, Y and the number N of constraints have a tendency to increase to a large number, although the numbers must be under practical limits. For example, SDPARA-C can handle the cases of n < 20 000 and N < 20 000. In the considered situation, n and N are normally <60 and <2000; hence, public SDPA solvers can efficiently solve the problem on a home PC. Another difficulty is that it is sometimes not straightforward to evaluate results, when the objective function converges to a value close to 0 and X converges to a matrix of rank >1. If influence of the errors is now assumed to be little, it is normally possible to obtain a finite set containing all the solutions from
$\tilde X$
and A
k
by using the Gröbner basis or some other methods.
IV. IMPLEMENTATION AND RESULTS
The methods described in III.A and III.B were implemented into programs with C++. The program for the latter calls SDPA (Yamashita et al., Reference Yamashita, Fujisawa, Nakata, Nakata, Fukuda, Kobayashi and Goto2010) as a solver of the SDP problem. Although the software was able to deal with only unit cells of primitive centring as of June, 2016 when EPDIC-15 was held, it can now handle all the centring types.
In addition to the NaCl sample (case of two atoms in a unit cell), the test data were randomly chosen from a database including more than 2000 CIF files. In Table I, results of Algorithm III.A are presented. It is seen that the Patterson functions have a number of overlapping peaks; otherwise, the value of N (N−1)/2 in the table equals the number of difference vectors.
On the computer with i7-6500 CPU@2.50 GHz, 8 GB memory, the computation times did not exceed 5 s for all the test data. Although memory allocation error did not happen for the unit cells in the tables, the memory usage becomes rather severe, as the number of atoms becomes more than several tens. Except for sample 6, the results indicate that the crystal structures do not have the same set of difference vectors as distinct configurations.
The program outputs the generated configurations in an html file, so that users can see the results with the 3D viewer of web browsers (Figure 4).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170905095734-18290-mediumThumb-S0885715616000804_fig4g.jpg?pub-status=live)
Figure 4. Results for sample 6 (the output displayed on a web browser); No. 1 is the configuration of the original crystal structure and confirmed to have the same set of difference vectors as its two subsets Nos 2 and 3.
With regard to sample 6, newly obtained configurations are subsets of the original configuration, which indicates that their chemical formula must be different from the original one. Therefore, they will not be judged as good candidates experimentally.
Subsequently, Algorithm III.B was applied to the data 1–10 by using original crystal structures as the configuration to determine the atomic species. In this setting, the scattering factors should be the same as those of the original structure. In addition, with regard to the NaCl sample, at least two solutions should be found, because, if a good solution exists, another good solution is generated just by switching the two scattering factors x 1, x 2.
The results of our numerical test are presented in Table II. In all the cases, SDPA carried out its global search very quickly. Except for the NaCl sample, a rank-1 matrix was returned as the optimum solution
$\tilde X\; $
in (11), and the scattering factors obtained as a result were very close to those of the atoms assigned in the original crystal structure, which indicates that the diffraction patterns will not be the same, if the atomic species of the input crystal structures are replaced.
Table II. Result of Algorithm III.B.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170905095734-89051-mediumThumb-S0885715616000804_tab2.jpg?pub-status=live)
Although the solution was not uniquely determined for Sample 1, nothing more difficult than solving quadratic equations is necessary for this case. For further investigation, it will be also necessary to find crystal structures that provide cases of rank >1.
Overall, it may be concluded that our project to develop programs to check the uniqueness of solutions in crystal structure analysis from diffraction patterns, has gained satisfactory results and software. As applications of the methods developed herein, the following researches are planned:
-
• Exhaustive search for distinct crystal structures with the identical diffraction patterns, using crystal-structure database.
-
• Extension of the methods and the programs to the cases of powder diffraction.
-
• Implementation of a new method to handle the cases when the objective function of the SDP problem converges to 0, and the optimum solution a tilde should be on top of the
${\tilde X} $ as in equation 11 is rank >1.
V. CONCLUSION
Methods to check the uniqueness of the solution in single-crystal structure analysis was provided. The developed program works well for unit cells containing not more than several tens of atoms, and was able to provide a check that the crystal structure is the only possible solution. The theory of positive-semidefinite programming and the solver SDPA were used to solve the systems of quadratic equations required for the computation.
ACKNOWLEDGEMENTS
The author would like to thank Professor Kamiyama of High Energy Accelerator Research Organization for offering his private database to this software project. She would also like to extend her gratitude to Mr. Tamura and Mr. Kurosawa of Visible Information Inc. for their collaboration in implementing the software.