Introduction
Soybean [Glycine max (L.) Merr.] is a leading oilseed crop, with a global production of 358.77 million metric tonnes in 2018–19 (USDA, 2019). In India, soybean is grown annually as a rainfed crop, with a production of 11.50 million metric tonnes in 2018–19 (USDA, 2019). A suitable mapping population is one of the major prerequisites for identifying genomic locations associated with quantitative traits. Biparental mapping populations have been extensively utilized to understand the genetic architecture of complex traits such as yield (Li et al., 2007), early maturity (Kong et al., 2014) and seed weight (Karikari et al., 2019). However, biparental mapping populations such as F2:3, BC1 and BC2 are not suitable for replicated or multi-location evaluations, which undermines precise phenotyping. Further, the mapping resolution of the biparental approach is poor because of the limited number of recombination events during population development and the small population size (usually <400). To overcome these limitations, association mapping (AM) is utilized in soybean and other crops for mapping complex traits at higher resolution. AM depends on linkage disequilibrium (LD) in natural populations and on historical recombination events accumulated through the evolutionary process. AM has been employed in soybean to genetically dissect quantitative traits such as seed weight (Wang et al., 2016) and seed protein and oil contents (Li et al., 2019). However, the resolution of quantitative trait locus (QTL) mapping in AM is influenced by selection, migration, genetic drift and LD decay.
Therefore, to overcome the shortfalls of biparental mapping and AM, multiparent mapping populations such as multi-parent advanced generation intercross (MAGIC) and nested association mapping (NAM) have been conceptualized in crop breeding. A MAGIC population is developed by three levels (two-way, four-way and eight-way) of inter-crosses among the founder parents, followed by intermating of siblings (Shivakumar et al., 2018). NAM captures the best features of both linkage mapping and AM. It enables high-power, high-resolution QTL mapping through joint linkage-association analysis (McMullen et al., 2009). The NAM population was conceptualized and developed in maize by Buckler et al. (2009), wherein 25 NAM families were developed by crossing 25 diverse maize genotypes with a common reference genotype, B73. Since then, the approach has been employed in a wide range of crops including sorghum (Jordan et al., 2011), wheat (Bajgain et al., 2016) and rice (Fragoso et al., 2017).
The soybean NAM population comprises 5600 recombinant inbred lines (RILs) developed by crossing 40 diverse soybean lines with a common parent (IA3023, a high-yielding MG III cultivar) through the single seed descent method (Song et al., 2017). It has been utilized to study flowering date (Li et al., 2017), yield and agronomic traits (Diers et al., 2018) and drought tolerance (Buezoa et al., 2019). Considering the significance of the NAM approach in soybean improvement, we developed a soybean NAM population using JS 335 (a widely adapted cultivar in India) as the common founder parent. JS 335 was crossed with 20 diverse founder parents, and a population of >2000 RILs (F2:5 generation) was developed through the single seed descent method. For genetic analysis of yield and its attributing characters, 900 RILs derived from 11 crosses were subjected to correlation, regression, principal component analysis (PCA) and cluster analysis.
Materials and methods
Plant material
A NAM population was developed by hybridizing JS 335, a popular variety of central India, with 20 diverse soybean genotypes for mapping and improving drought and water-logging tolerance, yellow mosaic virus (YMV) and rust resistance, early maturity and higher yield (Table 1). The hybridization was carried out pairwise between JS 335 and the 20 parents by pollination without emasculation (Talukdar and Shivakumar, 2012; Shivakumar et al., 2016) to produce 20 different F1s. Because only a small quantity of F1 seed was produced for two combinations (NRC 37 × JS 335 and Valder × JS 335), only 18 F1 combinations along with the 21 parents were grown in a randomized complete block design with three replications in the research field at ICAR-Indian Institute of Soybean Research (ICAR-IISR), Indore, India. The hybridity of the F1 plants was confirmed using morphological markers such as stem pigmentation, flower colour and pubescence on stem and pod. The F2 seeds from each true F1 plant of the respective crosses were grown under field conditions. The seeds from each F2 plant were harvested separately for all cross combinations, the F2:3 families were planted, and a single plant was randomly selected from each family for generation advancement. The same procedure was followed up to the F5 generation. The details of the 11 crosses used for the genetic variability and multivariate analysis are presented in online Supplementary Table S3. Standard cultivation practices recommended for the soybean crop were followed to raise a good crop (ICAR, 2009).
Table 1. Details of NAM families used in the current study

Plant characters and data observations
To estimate heterosis, we recorded days to maturity and grain yield on five randomly selected plants of each of the 18 hybrids and their respective parents in each replication. Similarly, the F2 populations were grown under field conditions, and observations on maturity and grain yield per plant were recorded on 20 F2 populations. A total of 900 NAM-RILs derived from 11 crosses were sown in an augmented design in rows of 3 m length along with six checks (JS 335, JS 20-29, JS 20-34, NRC 86, JS 97-52 and JS 71-05) replicated in 12 blocks. These 900 NAM-RILs were phenotyped for seven quantitative traits: plant height, number of branches, number of nodes, number of pods, biomass (g), grain yield (g) and harvest index. To record biomass, five randomly selected plants were harvested at physiological maturity and oven dried, and the dry weights of the plant samples were recorded and averaged.
Statistical analysis
Heterosis over the standard check (standard heterosis) was calculated as per Hayes et al. (1955). Significance of the heterosis estimates was tested by employing the t test as per the formula given by Arunachalam (2017). Adjusted means for all the traits under consideration for all the 900 NAM-RILs were estimated by using the R package ‘augmentedRCBD’ (Aravind et al., 2018). Genetic variability parameters, frequency distribution and descriptive statistics were also estimated by using the R package ‘augmentedRCBD’. Phenotypic variance, genotypic variance, phenotypic coefficient of variation (PCV), genotypic coefficient of variation (GCV) (Aravind et al., 2018), heritability (H²) (Lush, 1940) and genetic advance (GA) (Johnson et al., 1955) were calculated by using the following formulae:







where k is the selection intensity and σg is the genotypic standard deviation.
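For orientation, the conventional textbook forms of these estimators are sketched below. This is an assumption about notation rather than a transcription of the authors' expressions: σ²p, σ²g and σ²e denote the phenotypic, genotypic and error variances, x̄ is the trait mean, and GAM is GA expressed as a percentage of the mean. The two GA forms are algebraically identical, because σp·H² = σg·H.

```latex
\begin{align*}
\sigma^2_g &= \sigma^2_p - \sigma^2_e, \\
\mathrm{PCV} &= \frac{\sqrt{\sigma^2_p}}{\bar{x}} \times 100, \qquad
\mathrm{GCV} = \frac{\sqrt{\sigma^2_g}}{\bar{x}} \times 100, \\
H^2 &= \frac{\sigma^2_g}{\sigma^2_p}, \\
\mathrm{GA} &= k\,\sigma_p\,H^2 = k\,\sigma_g\,H, \qquad
\mathrm{GAM} = \frac{\mathrm{GA}}{\bar{x}} \times 100 .
\end{align*}
```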
Correlation analysis was performed by using the R function cor(). Correlation plots were obtained by using the R packages ‘corrplot’ (Wei and Simko, 2017) and ‘PerformanceAnalytics’ (Peterson and Carl, 2018). Linear regression analysis was performed using the R function lm(), and PCA was performed by using the R packages ‘corrplot’ (Wei and Simko, 2017), ‘factoextra’ (Kassambara and Mundt, 2017) and ‘FactoMineR’ (Le et al., 2008). The K-means cluster analysis was carried out using SYSTAT software version 13.2.
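As an illustration of what the correlation and simple regression steps compute, the following minimal sketch reimplements the Pearson coefficient and a one-predictor least-squares fit in plain Python. It is not the authors' R code; the function names and the five biomass and grain yield values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simple_ols(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical per-line values (g per plant), for illustration only.
biomass = [60.0, 75.0, 90.0, 105.0, 120.0]
grain_yield = [20.0, 27.0, 31.0, 40.0, 44.0]

r = pearson_r(biomass, grain_yield)
slope, intercept = simple_ols(biomass, grain_yield)
print(f"r = {r:.3f}, slope = {slope:.3f}, intercept = {intercept:.3f}")
```

With a single predictor, the regression slope and the correlation coefficient carry the same sign and differ only by the ratio of the two standard deviations, which is why the two analyses tend to rank traits similarly.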
Results
Estimation of F1 heterosis
Analysis of variance revealed significant differences (P < 0.05) for grain yield per plant (F ratio: 49.09***) and days to maturity (F ratio: 12.47***) across all the genotypes, including the parents and the 18 crosses. For grain yield per plant, the highest standard heterosis was recorded for the cross JS 97-52 × JS 335 (77.35%), followed by Bragg × JS 335 (52.04%), JS 20-38 × JS 335 (35.58%), RVS 2007-6 × JS 335 (33.66%), EC 546882 × JS 335 (26.25%), EC 656641 × JS 335 (13.7%) and IC 15759A × JS 335 (12.6%) (online Supplementary Table S2). For days to maturity, standard heterosis was highest, and equal, for PK 472 × JS 335 and IC 501198 × JS 335 (4.81%), followed by Hardee × JS 335 (4.54%), RVS 2007-6 × JS 335 (4.34%), EC 656641 × JS 335 (3%) and Gaurav 2 × JS 335 (2.73%) (online Supplementary Table S2).
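The percentages above are standard heterosis values, that is, the deviation of the F1 mean from the standard check mean expressed as a percentage of the check mean. A minimal sketch of that calculation follows; the function name and the example means are hypothetical and are not taken from the Supplementary Tables.

```python
def standard_heterosis(f1_mean, check_mean):
    """Standard heterosis (%) of an F1 over the standard check."""
    if check_mean == 0:
        raise ValueError("check mean must be non-zero")
    return (f1_mean - check_mean) / check_mean * 100.0


# Hypothetical grain yield means (g per plant), for illustration only.
print(standard_heterosis(15.0, 12.0))  # F1 out-yields the check: positive value
print(standard_heterosis(9.0, 12.0))   # F1 yields less than the check: negative value
```

A positive value for grain yield is desirable, whereas for days to maturity a negative value indicates an earlier-maturing hybrid.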
Mean and range in F2 generation
The NAM population, consisting of 20 cross combinations (F2 generation), was evaluated for yield and its attributing traits. A total of 2385 progenies from these 20 crosses were evaluated for seed yield and yield-attributing traits. Wide variability was observed among the plants for all the traits under consideration. The crosses G-11 × JS 335 (27.78 g) and JS 97-52 × JS 335 (22.50 g) had the highest mean seed yield. The plant with the highest seed yield (82.7 g) was recovered from the cross JS 97-52 × JS 335. Plants from the cross JS 335 × EC 656641 were the earliest to mature (92 days) compared with the rest of the crosses (online Supplementary Table S4).
Genetic properties of NAM population
Mean, range, variance, coefficient of variance, heritability and genetic advance
The statistical analysis of data on seven quantitative traits indicated a wide range of variability among the accessions (online Supplementary Tables S1 and S5). Number of pods, grain yield and biomass had a wider range than the other traits studied. High PCV and GCV (PCV/GCV) were found for grain yield (46.15/39.88), nodes per plant (34.51/30.52), biomass (36.03/22.21), number of pods (35.63/22.21) and plant height (26.22/23.14) (online Supplementary Table S5). Heritability was highest for number of nodes (87.18%), plant height (77.86%) and grain yield (74.68%), and lowest for number of branches (3.17%). GA as percent of mean was high for grain yield (71.09%), number of nodes (55.66%) and plant height (42.13%), and lowest for branches per plant (1.74%) (online Supplementary Table S5).
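The relationship among PCV, GCV, heritability and GA as percent of mean can be checked with a short calculation. The sketch below assumes the usual identities H² = (GCV/PCV)² and GAM = k × GCV × H, with k = 2.063 (the selection intensity conventionally used for 5% selection; this value is an assumption, as it is not stated in the text). Under these assumptions the reported grain yield and plant height values are recovered to within rounding.

```python
import math

K_5_PERCENT = 2.063  # assumed selection intensity at 5% selection


def heritability_percent(pcv, gcv):
    """Broad-sense heritability (%) implied by PCV and GCV."""
    return (gcv / pcv) ** 2 * 100.0


def ga_percent_mean(gcv, h2_percent, k=K_5_PERCENT):
    """Genetic advance as percent of mean: k * GCV * sqrt(H^2)."""
    return k * gcv * math.sqrt(h2_percent / 100.0)


# (PCV, GCV, reported H2 %, reported GAM %) as given in the text.
reported = {
    "grain yield": (46.15, 39.88, 74.68, 71.09),
    "plant height": (26.22, 23.14, 77.86, 42.13),
}

for trait, (pcv, gcv, h2, gam) in reported.items():
    print(
        f"{trait}: H2 = {heritability_percent(pcv, gcv):.2f}% (reported {h2}%), "
        f"GAM = {ga_percent_mean(gcv, h2):.2f}% (reported {gam}%)"
    )
```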
Frequency distribution of quantitative traits
Frequency distribution showed that 22% of the lines produced a grain yield of 30.1–40 g per plant and 48% of the lines had a harvest index of 40.1–50% (data not shown). The number of pods also showed large variation, and 25% of the lines had 34–44 pods per plant. Biomass also showed considerable variability, and 30% of the lines produced 90–120 g of biomass per plant. The frequency distribution of the seven quantitative traits is presented in online Supplementary Fig. S1.
Correlation and regression
The correlation matrix (online Supplementary Table S7) showed a significant positive correlation of grain yield with biomass (0.86), harvest index (0.60), pods per plant (0.51), branches per plant (0.30), plant height (0.21) and nodes per plant (0.11); no trait was negatively associated with grain yield. However, harvest index was negatively correlated with plant height (−0.11), number of nodes (−0.06) and branches per plant (−0.07). Similarly, nodes and branches (−0.09) were negatively correlated (Fig. 1 and online Supplementary Fig. S2). Simple linear regression analysis of grain yield on each of the other traits showed that biomass had the highest direct effect on grain yield, followed by harvest index, pods per plant, branches per plant, nodes per plant and plant height (data not presented).

Fig. 1. Graphical depiction of correlation among the quantitative traits in NAM-RILs.
Genetic diversity and principal component analysis
Multivariate analysis was performed using genetic diversity and principal component analyses. All the 900 NAM-RILs were grouped into 10 clusters using k-means clustering. The clusters differed in the number of accessions they contained (online Supplementary Table S6). The number of accessions was highest (183) in cluster 2 (C2), followed by C3 (139), C10 (133), C1 (123), C6 (81), C5 (79), C7 (71), C8 (47), C4 (37) and C9 (7). The mean values of the accessions grouped into each cluster (online Supplementary Table S6) showed that accessions in C9 and C4 produced high grain yield per plant and biomass, whereas C4 had the highest harvest index. C9, C8, C7 and C4 produced the highest number of pods per plant (online Supplementary Table S6). PCA, used to eliminate redundancy in the dataset, revealed that all seven quantitatively measured traits loaded on the first five components; however, the major portion of variance (80.9%) in the NAM-RILs was explained by the first three components, which had eigenvalues >1 (Table 2). The first component (PC1) accounted for 42.05% of the variation, largely through biomass, grain yield, number of pods and branches; PC2 accounted for 22% of the variation, contributed through harvest index, nodes, plant height and yield; and PC3 contributed 17% through number of branches, harvest index and number of nodes. PC4, PC5 and PC6 contributed 8, 6 and 3% of the total variation, respectively. The biplot of PC1 and PC2 indicated that grain yield (20.31%) contributed most to the first two principal components, followed by biomass (17.74%) and number of pods (15.79%) (Fig. 2 and online Supplementary Fig. S3). Similarly, biplots of PC1 and PC2 involving only the top hundred RILs contributing to the total variation identified a number of superior genotypes forming clusters on the same axis (online Supplementary Fig. S4). Further, the relationship among the 11 NAM families is depicted in online Supplementary Fig. S5.
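The eigenvalue >1 rule used above to retain components can be illustrated from the reported variance percentages alone. The sketch below assumes the PCA was run on standardized traits, so that the eigenvalues sum to the number of traits (seven) and each eigenvalue equals its variance share multiplied by seven; this is an assumption about the analysis settings, and the rounded percentages in the text are used as given.

```python
N_TRAITS = 7  # seven quantitative traits, assumed standardized before PCA

# Percent of variance for PC1..PC6 as reported in the text (rounded).
variance_percent = [42.05, 22.0, 17.0, 8.0, 6.0, 3.0]

# For a correlation-matrix PCA, eigenvalue = variance share * number of traits.
eigenvalues = [share / 100.0 * N_TRAITS for share in variance_percent]

# Keep components whose eigenvalue exceeds 1 (more variance than one trait).
retained = [i + 1 for i, ev in enumerate(eigenvalues) if ev > 1.0]
cumulative = sum(variance_percent[: len(retained)])

print("Approximate eigenvalues:", [round(ev, 2) for ev in eigenvalues])
print("Retained components:", retained)
print(f"Cumulative variance of retained components: {cumulative:.2f}%")
```

Under this assumption the first three components are retained, and their rounded shares sum to about 81%, in line with the 80.9% reported from the unrounded values.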

Fig. 2. Biplot of the variables loaded on the PC1 and PC2 components.
Table 2. Eigenvalues, variance contribution (%), variable coordinates and percent contribution of each variable on the first five principal components of 900 NAM-RILs

Discussion
Soybean [G. max (L.) Merr.] is one of the most economically important crops in the international market, being the fourth most produced and consumed crop worldwide (Gesteira et al., 2018). It has been studied and improved for most of its economically important traits. The majority of studies on genetic variability and multivariate analysis have been carried out on breeding populations derived from crosses between two parents, commonly known as biparental populations. Conventional plant breeding for the development of new crop varieties is likewise based on the selection of superior progenies derived from biparental populations. Recently, researchers have started developing multiparent populations such as MAGIC and NAM, and several successful examples of the development and utilization of these populations have been demonstrated (Kover et al., 2009; Cook et al., 2012; Huang et al., 2012; Bandillo et al., 2013; Peiffer et al., 2013; Mackay et al., 2014; Bajgain et al., 2016; Bouchet et al., 2017; Fragoso et al., 2017; Li et al., 2017; Ren et al., 2017; Song et al., 2017; Shivakumar et al., 2018).
The most significant use of a NAM population is gene mapping, which exploits two sources of recombination: shuffling of parental alleles over several generations through segregation and genetic recombination, and historical recombination of haplotypes present in the various diversity donors. The current study reports the creation of a NAM population; in the process, we estimated a number of genetic parameters, including F1 heterosis, genetic variability, correlations, regression, PCA and cluster analysis, in NAM-RILs derived from 11 crosses. The soybean NAM developed in this study involves diverse donor founder parents for various complex traits such as yield-attributing traits and biotic and abiotic stress tolerance, which will facilitate the elucidation of the genetic architecture of complex traits in soybean. Further, it helps in the detection of QTLs and their novel alleles, single-nucleotide polymorphisms and candidate genes associated with complex traits, and in broadening the genetic base of the soybean breeding population. Utilization of diverse parents in developing the soybean NAM population enhances the identification and introgression of allelic diversity in the breeding population. With recent advances in soybean genomics and next generation sequencing (NGS) technologies, the soybean NAM can be effectively utilized for fine mapping of complex traits, map-based cloning and genomic selection for the genetic improvement of soybean.
Diers et al. (2018) developed a soybean NAM population and evaluated 5600 inbred lines to broaden the narrow genetic base of the crop. Heterosis does exist in soybean [G. max (L.) Merr.], and soybean hybrid technology is possible if appropriate parent combinations and an economical means of producing hybrid seed are identified (Perez et al., 2009). The estimation of heterosis among the 18 crosses revealed high heterosis in some of the crosses, whereas negative heterosis was found in a few crosses for both grain yield and days to maturity. Differential expression of heterosis in soybean has been reported by other researchers as well (Chaudhary and Singh, 1974; Gadag and Upadhyaya, 1995; Perez et al., 2009). The variability observed in the F2 generation in 18 crosses of the NAM population reflected the diversity among the parents used for hybridization. The genetic variability parameters analysed in the 900 NAM-RILs indicated high PCV and GCV for grain yield, biomass, number of pods, number of nodes and plant height. A similar trend of PCV and GCV for these traits has been reported in biparental populations (Shivakumar et al., 2011; Besufikad, 2018). High heritability coupled with high GA as percent of mean was recorded for plant height, number of nodes and grain yield, and similar values for grain yield have been reported in biparental populations (Shivakumar et al., 2011; Besufikad, 2018; Shruti and Basavaraja, 2019).
Correlation studies revealed that biomass, harvest index and number of pods had the strongest significant positive association with grain yield. The simple regression analysis complemented these results, indicating the potential of indirect selection for biomass, harvest index and number of pods to improve yield. Studies on biparental populations and core collections in soybean have reported a similar association among these traits, indicating that plant breeders should keep these traits in view during early generation selection (Shivakumar et al., 2011; Gireesh et al., 2015). PCA makes it possible to assess the relative contribution of different components to the total divergence, together with the nature of forces operating at intra- and inter-cluster levels (Sharma et al., 2009). In the current study, the first three components explained >80% of the total variation. In the first two PCs, the maximum percentage of variation was contributed by grain yield, biomass and number of pods. Therefore, these traits are important in explaining the genetic variation in the 900 RILs under study. The results are in agreement with other studies that reported the maximum contribution of biomass and grain yield in soybean (Nidhi et al., 2018). Cluster analysis grouped the 900 RILs into 10 clusters, with clusters C1, C2, C3 and C10 comprising the maximum number of breeding lines, indicating close relatedness of the genotypes. A similar grouping of 4274 accessions of common bean into 10 clusters was reported by Rana et al. (2015). Genotypes with the highest grain yield per plant, biomass and harvest index fell under clusters C4 and C9. These genotypes can be utilized in hybridization programmes for further improving yield potential.
Conclusion
Variety development is more likely to succeed if it involves the evaluation of a large number of breeding lines derived from diverse crosses. In this direction, the current study has generated information on a large number of recombinant inbred lines derived from multiple parents possessing a number of useful traits such as water-logging tolerance, drought tolerance, resistance to mechanical damage, wider adaptability, long juvenility, and bacterial pustule and YMV resistance. The 900 RILs derived from 11 crosses of the NAM population will serve as a useful genetic resource for mapping several economic traits segregating in these RILs and can be shared with soybean breeders for improving the crop for the various situations prevailing in different agro-climatic zones. Thus, the NAM population developed in this study will constitute useful genetic material for both applied and basic studies on the genetic improvement of the soybean crop.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S1479262119000352.
Acknowledgements
The authors gratefully acknowledge the support provided by the Director, ICAR-Indian Institute of Soybean Research, Indore. The work was supported and conducted under the Institute Research Council approved project (Code NRCS1.6/92).