Introduction
Soybeans [Glycine max (L.) Merr.] have mostly either purple or white flowers, although there are some colour variations such as purple throat, near white, pink and magenta. The flower colour of soybeans is primarily controlled by six genes (W1, W2, W3, W4, Wp and Wm). These genes, with the exception of W2 (see below), encode enzymes involved in anthocyanin and flavonol biosynthesis and thereby determine the pigmentation of flower petals (Stephens and Nickell, 1992; Palmer et al., 2004; Zabala and Vodkin, 2007; Takahashi et al., 2010; Yan et al., 2014).
Under the W1 genotype, W3W4 soybeans have dark-purple flowers, whereas soybeans with w3W4 and w3w4 alleles have purple and near-white flowers, respectively (Hartwig and Hinson, 1962). In the anthocyanin and flavonol biosynthetic pathway, both W3 and W4 encode dihydroflavonol 4-reductase (DFR) (Palmer and Groose, 1993; Fasoula et al., 1995; Xu et al., 2010; Yan et al., 2014). DFR is required for the synthesis of three anthocyanin classes (delphinidins, pelargonidins and cyanidins). Soybeans with the wp or wm genotype have pink or magenta flowers, respectively (Johnson et al., 1998; Takahashi et al., 2007). Wp encodes flavanone 3-hydroxylase, which is required at an earlier step than DFR in the biosynthetic pathway of anthocyanins (Zabala and Vodkin, 2005). Wm encodes flavonol synthase, which catalyses the conversion of dihydroflavonols to flavonols. The w2 allele results in purple-blue colour, and its wild-type gene encodes a myeloblastosis (MYB) transcription factor required for vacuolar acidification of flower petals (Takahashi et al., 2008, 2011).
Unlike the recessive alleles of other flower pigmentation genes, the recessive allele w1 of the W1 locus eliminates flower colour completely, because the W1 locus encodes flavonoid 3′,5′-hydroxylase, which produces the early precursor dihydroflavonols for anthocyanidin and flavonol synthesis (Buzzell et al., 1987; Zabala and Vodkin, 2007). The w1 allele from the G. max accession Williams 82 arose from a 65 bp insertion with tandem repeats and a 12 bp deletion in the third exon of W1 (a net gain of 53 bp), resulting in the premature termination of translation and thus non-functional W1 proteins. Takahashi et al. (2010) showed that W1 is also an essential gene in anthocyanin biosynthesis in a wild soybean accession (G. soja Sieb. & Zucc).
Of the 19,648 soybean accessions in the United States Department of Agriculture – Germplasm Resource Information Network (USDA-GRIN), 13,132 accessions (67%) have purple flowers and 6344 (32%) have white flowers; only a small fraction of them have flowers of other colours. In contrast, almost all wild soybean accessions have purple flowers. This raises the questions of why and how white-flowered accessions became abundant among cultivated soybeans, and of what their genetic and regional origin is. To address these questions, we first sought to determine the cause of the white flower colour in the white-flowered accessions and found that all 99 white-flowered landraces randomly selected from worldwide germplasm collections carry the same w1 allele as Williams 82. We also analysed the nucleotide sequences of the W1 genes of purple-flowered landraces from Korea, China, Japan and other Asian countries to infer the origin of the white-flowered accessions.
Materials and methods
Plant materials
A total of 153 accessions of G. max and G. soja were used in this study. Among them, 99 landrace accessions from Korea (64), China (17) and Japan (18) had white flowers and were randomly selected from the National Agrobiodiversity Center of Republic of Korea (IT accessions) and from the USDA-GRIN Soybean Germplasm Collections (PI accessions) (Table S1, available online). These accessions were used for the insertion and deletion (indel) detection analysis to distinguish the w1 allele-containing soybeans. Williams 82 and Hwangkeumkong, which have white and purple flowers, respectively, were used as controls. A total of 39 purple-flowered landrace accessions from Korea (10), China (10), Japan (7), India (2), Indonesia (2), Myanmar (2), Nepal (2), Vietnam (2) and Russia (2) were selected for phylogenetic analysis (Table S2, available online). In addition, 15 purple-flowered G. soja accessions from Korea (5), China (5) and Japan (5) were included in the analysis (Table S3, available online).
Indel and phylogenetic analyses
The genomic DNA of soybean accessions was isolated from trifoliolate leaves using the cetyltrimethylammonium bromide method (Doyle and Doyle, 1987). Polymerase chain reaction (PCR) analysis was performed to detect the 53 bp indel of the w1 allele using the following profile: 40 cycles of 94°C for 30 s, 58°C for 30 s and 72°C for 1 min. The sequences of the upstream and downstream primers were 5′-TGGTGCTGGGAGGAGGATTT-3′ and 5′-CTTGCTGCTTTGGTTACCCC-3′, respectively. The nucleotide sequences of the W1 locus of the purple-flowered landraces and wild soybeans were determined. The amplified DNAs were about 4.7 kb in length and included a 25 bp region upstream of the ATG start codon, 2.9 kb of two introns, 1.5 kb of three exons and a 244 bp region downstream of the stop codon (Zabala and Vodkin, 2007). The GenBank accession numbers of the W1 genes of soybean accessions (PI483462B, PI458538, PI464939A, PI407290, PI549036, PI366121, PI507624, PI406684, PI507646, PI378701A, PI407280, PI407271, PI407229, PI424120, PI339731, IT182932, IT115634, IT165401, IT141739, IT022876, PI416833, IT154524, PI518716, IT208306 and IT263338) ranged from KJ911862 to KJ911886. Twenty-seven sequences of the W1 gene were aligned and a neighbour-joining phylogenetic tree was constructed using MEGA 5.2 with a Kimura two-parameter model and 1000 bootstrap replicates (Tamura et al., 2011). A hierarchical likelihood ratio test was carried out using jModelTest to determine which substitution model best described the evolution of W1 sequences (Posada, 2008). The Tamura–Nei model was specified for W1 sequences using the Akaike information criterion (Tamura and Nei, 1993). PhyML 3.0 was used for maximum-likelihood analysis with the model TPM1uf+I+G (Guindon et al., 2010).
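For readers unfamiliar with the Kimura two-parameter model named above, the following is a minimal sketch of how the pairwise distance is computed from the proportions of transitions (P) and transversions (Q) between two aligned sequences, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)]. The function name and the decision to skip gaps and ambiguous bases are assumptions made for illustration; this is not the implementation used in MEGA.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences.

    Assumption: sites where either sequence has a gap or an ambiguous
    base are skipped (pairwise deletion).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    transitions = 0
    transversions = 0
    compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # gap or ambiguous base: not comparable
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # for the distance to be defined.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


if __name__ == "__main__":
    # Toy sequences, not data from this study.
    print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))
```

A neighbour-joining tree is then built from the matrix of such distances over all sequence pairs.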
In addition, nucleotide diversity, or the average number of nucleotide differences per site, was estimated using DnaSP 5.0 (Rozas et al., 2003).
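The definition of nucleotide diversity given above can be illustrated with a short sketch that averages the per-site differences over all pairs of aligned sequences and, optionally, repeats the calculation in windows along the alignment. The function names, the window and step parameters, and the assumption of a gap-free alignment are illustrative choices; DnaSP handles gaps and window settings in its own way.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average number of pairwise nucleotide differences per site (pi).

    Assumption: the sequences are aligned, of equal length and gap-free.
    """
    if len(seqs) < 2:
        raise ValueError("at least two sequences are required")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")

    total_differences = 0
    pair_count = 0
    for s1, s2 in combinations(seqs, 2):
        total_differences += sum(1 for a, b in zip(s1, s2) if a != b)
        pair_count += 1
    return total_differences / (pair_count * length)


def sliding_window_pi(seqs, window=500, step=500):
    """Return (window start, pi) for each full window along the alignment.

    The default window and step sizes are hypothetical settings.
    """
    length = len(seqs[0])
    results = []
    for start in range(0, length - window + 1, step):
        chunk = [s[start:start + window] for s in seqs]
        results.append((start, nucleotide_diversity(chunk)))
    return results


if __name__ == "__main__":
    # Toy alignment, not data from this study.
    toy = ["AAAA", "AAAT", "AATT"]
    print(nucleotide_diversity(toy))
    print(sliding_window_pi(toy, window=2, step=2))
```

Plotting the windowed values against window position gives a diversity profile of the kind used to compare regions such as exons and introns.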
Results
Polymorphism and indel analysis
We asked why the USDA-GRIN database contains such a high proportion of white-flowered accessions and which gene is mutated in white-flowered soybeans. To address this question, 99 soybean landraces, not cultivars, with white flowers from Korea (64), China (17) and Japan (18) were randomly selected (Table S1, available online) and were used for PCR analysis with the primer set spanning the indel. The PCR products of all the 99 landraces were compared with those of Williams 82 with white flowers and of Hwangkeumkong with purple flowers, which served as controls, and polymorphism between them was determined. The sizes of the PCR products of all the landraces examined were identical to those of Williams 82, as shown in 12 representative examples (Fig. 1), and the sizes were greater than those of Hwangkeumkong owing to the additional 53 bp in the third exon (Zabala and Vodkin, 2007). To assess polymorphism among the w1 alleles examined, 20 landraces were randomly selected from among the 99 landraces (Korean (10), Chinese (5) and Japanese (5)) and the approximately 4.7 kb-long nucleotide sequences of their w1 genes were determined. The nucleotide sequence analysis revealed that the w1 genes of all the 20 landraces with white flowers had nucleotide sequences identical to that of the w1 allele of Williams 82 (data not shown), indicating that there is no polymorphism among the w1 alleles examined.

Fig. 1 Results of the indel analysis carried out to determine the presence of w1 alleles in white-flowered landrace accessions. The PCR products of white-flowered Williams 82 and purple-flowered Hwangkeumkong are loaded as controls and their lengths are 236 bp and 183 bp, respectively. M, molecular marker.
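The size-based allele call described in the caption can be expressed as a small sketch. The two expected product lengths (236 bp for w1 and 183 bp for W1) come from the caption; the function name and the tolerance for gel sizing error are hypothetical and would need to be adjusted to the resolution of the gel actually used.

```python
# Expected PCR product lengths taken from the Fig. 1 caption.
W1_PRODUCT_BP = 183        # purple-flowered control (Hwangkeumkong)
W1_MUTANT_PRODUCT_BP = 236  # white-flowered control (Williams 82)


def call_w1_allele(product_bp, tolerance_bp=10):
    """Classify a PCR product as 'W1', 'w1' or 'undetermined' by its size.

    tolerance_bp is a hypothetical allowance for gel sizing error; it is
    kept below half of the 53 bp difference so the two calls cannot overlap.
    """
    if abs(product_bp - W1_MUTANT_PRODUCT_BP) <= tolerance_bp:
        return "w1"
    if abs(product_bp - W1_PRODUCT_BP) <= tolerance_bp:
        return "W1"
    return "undetermined"


if __name__ == "__main__":
    # Illustrative band sizes, not measurements from this study.
    for size in (236, 183, 210):
        print(size, call_w1_allele(size))
```

The 53 bp difference between the two products corresponds to the net effect of the 65 bp insertion and 12 bp deletion in the third exon.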
Phylogenetic analysis
We analysed the nucleotide sequences of the W1 genes of 39 purple-flowered landraces and 15 wild soybean accessions from eight Asian countries and Russia (Tables S2 and S3 and Fig. S1, available online). The sequences of IT182932 (G. soja used in genome sequencing by Kim et al., 2010) and L79-908 (G. max used in the characterization of the W1 gene by Zabala and Vodkin, 2007) were obtained from the GenBank database. A phylogenetic tree was constructed to infer the regional origin of white-flowered soybeans and to evaluate the genetic relationship between G. max and G. soja along with Williams 82 (white flower) (Fig. 2). These soybean accessions were grouped into two main clusters. The soybean accessions examined in this study are shown in Fig. 2(b). The nucleotide sequence analysis revealed that the W1 genes of G. max accessions belonging to the same branch are identical. Cluster I mostly consisted of wild soybeans, including G. soja accession IT182932 (soja16), but not the G. soja accession PI549036 (soja15). However, wild soybeans from China, Japan and Korea, even if they originated from the same country, did not group to form a subcluster, but rather scattered across the branches (Fig. 2(a)), suggesting that the wild accessions from these three countries might have genetically intermingled with each other, which is in agreement with the results of previous studies (Hymowitz and Kaizuma, 1981; Han et al., 1999).

Fig. 2 Phylogenetic relationships between W1 genes. (a) A phylogenetic tree. About 4.7 kb-long nucleotide sequences of the W1 genes of 39 purple-flowered landraces and 15 wild soybeans were determined. Twenty-seven sequences of the W1 genes including those of IT182932 (soja16) and L79-908 (max10) and the w1 gene of a white-flowered soybean (Williams 82) were aligned and a neighbour-joining phylogenetic tree was constructed. The sequences of IT182932 (Glycine soja) and L79-908 (G. max) were obtained from the GenBank database. The number on each node indicates the bootstrap value and the scale bar indicates nucleotide substitutions per site. (b) List of soybean accessions used in the construction of the phylogenetic tree. The nucleotide sequences of the W1 genes of purple-flowered accessions belonging to the same branch are identical to each other. Abbreviations in parentheses of IT and PI accessions: C, China; I, India; IN, Indonesia; J, Japan; K, Korea; M, Myanmar; N, Nepal; R, Russia; and V, Vietnam.
Almost all the purple-flowered soybean accessions (G. max) were grouped along with Williams 82 (white flower) and G. max accession L79-908 (max10) in cluster II. However, two G. max accessions, IT208306 (max1) and IT263338 (max2), were placed in cluster I, suggesting that they are more similar to G. soja than to G. max. On the other hand, one G. soja accession PI549036 (soja15) was grouped together with the purple-flowered G. max in cluster II, indicating that it is more similar to G. max than to G. soja.
The nucleotide sequences of the W1 genes of G. max in the branch max6 [IT022876 (China); PI423935, PI227212, PI200458 and PI561377 (Japan); and IT136306 and IT200606 (Nepal)] perfectly matched that of the w1 gene of Williams 82 when the indel sequence was excluded. Similarly, PI416833 (Japan) of the single-membered branch max7 closely matched the w1 gene of Williams 82, except that their first introns differed by a single nucleotide. Therefore, it is tempting to speculate that the w1 allele of white-flowered soybeans might have diverged from the branch max6. The maximum-likelihood analysis presented a topology similar to the tree derived from the neighbour-joining analysis (data not shown).
Using a sliding-window analysis with a 500 bp interval, the nucleotide sequences of 16 G. soja accessions and 10 G. max accessions (one representative accession from each branch of G. max in the phylogenetic tree) were analysed to investigate nucleotide diversity (Fig. 3). The nucleotide sequences and single nucleotide polymorphisms (SNPs) of G. soja and G. max accessions are shown in Fig. S2 (available online). We found that the distribution of nucleotide diversity was uneven along the whole length of the W1 gene and that the diversity patterns of G. max and G. soja were similar (Fig. 3). The diversity levels of introns were much higher than those of exons; the highest diversity was observed in the approximately 0.4 kb region within the second intron, spanning from 2 to 2.4 kb (Fig. 3(c)).

Fig. 3 Nucleotide diversity along the whole W1 locus. The nucleotide diversity (π) distributions of the W1 genes of (a) 16 Glycine soja accessions and (b) 10 G. max accessions were plotted using the 500 bp interval sliding-window analysis and (c) then combined as a pooled diagram. Numbers in bp correspond to the nucleotide position of W1 gene. CDR, coding region.
Discussion
White-flowered G. max accessions are the second most abundant group in the USDA-GRIN database after the purple-flowered accessions. We found that the white flower colour of all the soybean landraces examined is due to the presence of the same indel mutation that Williams 82 has. The indel analysis showed that all 99 white-flowered landraces carried this indel, the nucleotide sequence analysis showed that the w1 alleles of the 20 landraces sequenced were identical to that of the white-flowered Williams 82, and the phylogenetic analysis suggested that the w1 allele of white-flowered soybeans might have diverged from the branch max6.
Furthermore, the phylogenetic tree revealed a mixed-up pattern of wild and cultivated soybean accessions, which has also been reported previously (Xu and Gai, 2003; An et al., 2009). Molecular analyses based on chloroplast and nuclear simple sequence repeat (SSR) marker variation revealed the occurrence of introgression from cultivated soybean accessions into wild soybeans (Kuroda et al., 2006; Abe et al., 1999; Wang et al., 2010). Similarly, Li et al. (2010) and Lam et al. (2010) found the occurrence of introgression from wild soybeans into cultivated soybeans, which may explain the swapped positions of some accessions in the phylogenetic clusters. With the exception of these accessions, clusters I and II are indicative of a distinct genetic difference between wild and cultivated soybeans, and similar results have been obtained previously through other analyses, such as random amplified polymorphic DNA (Xu and Gai, 2003), SSR (Powell et al., 1996; Wen et al., 2009), amplified fragment length polymorphism (Maughan et al., 1996), SNP (Li et al., 2010) and microsatellite variation (Kuroda et al., 2006).
However, the reason for the high proportion of white-flowered landraces throughout the world is still unclear. At present, we consider three possibilities. First, white flowers may be simply attractive to the farming community when compared with purple flowers. However, chances of pollination are very low because of the small size of soybean flowers. Second, white flowers may be closely linked to agronomic traits, such as seed yield and stress resistance. Ortiz-Perez et al. (2006) reported that white-flowered accessions have more seed set than purple-flowered accessions. In addition, Severson (1983) reported that there is a considerable difference between white-flowered and purple-flowered soybeans with regard to fructose and glucose content, nectar volume and total carbohydrate content per flower. Finally, even if white flowers are not linked to any important agronomic trait, there is the possibility that white-flowered soybeans might have been superior to other cultivated soybeans at the time when seed dispersal of the white-flowered soybean took place in the past.
In spite of these considerations, however, no strong evidence has been found to explain the high proportion of white-flowered landraces among the cultivated soybeans. Therefore, questions regarding the precise origin and abundance of white-flowered soybeans remain to be answered in the future.
Supplementary material
To view supplementary material for this article, please visit http://dx.doi.org/10.1017/S1479262114000938
Acknowledgements
This research was supported by a grant from the Next-Generation Biogreen 21 Program (Plant Molecular Breeding Center No. PJ008137), Rural Development Administration, Republic of Korea.