Introduction
Triticale (× Triticosecale Wittm.) is an intergeneric hybrid between wheat species (Triticum spp.; AA, AABB and AABBDD) and rye (Secale cereale, RR). Triticale possesses favourable agronomic attributes originating from both its wheat and rye progenitors, including high grain and biomass yields. Although significant DNA losses from the parental genomes have been reported (Ma and Gustafson, 2008), little is known about the coordination of gene expression of the rye and wheat genomes in this intergeneric hybrid. So far, public databases (e.g. GenBank) hold about 14,000 expressed sequence tag (EST) or DNA sequences for rye and only a few dozen sequences for triticale.
Next-generation sequencing provides an efficient tool to address biological questions at the transcriptomic and genomic levels. RNA-Seq for transcriptomics (Wang et al., 2009) has been successfully employed in crop plants (Zenoni et al., 2010; Zhang et al., 2010).
We report the use of Roche 454 sequencing technology to investigate the transcriptomes of triticale and rye in different tissues and at different developmental stages. This preliminary analysis covers gene expression, the identification of triticale- and rye-assembled genes, and the development of an enhanced full-length (FL) cDNA database of wheat to facilitate our analysis.
Materials and methods
Tissues including leaf, root, stem and different reproductive organs (stigma, anthers, pollen and immature heads) from genotypes of rye (Prima and Vacaria) and triticale (AC Certa, hollow stemmed; Triticale 797 and Triticale 1308, both solid stemmed) sampled at different stages of development were used in this study. AC Certa seedling tissues were also exposed to water stress.
RNA-Seq was carried out as follows. Briefly, total RNA was extracted from the different tissues using Trizol (Invitrogen) followed by Qiagen RNeasy midi purification. PolyA+ mRNA was purified from 0.6 mg of total RNA using the Poly(A) Purist™ Kit (Ambion, Inc.), followed by cDNA library synthesis. Five micrograms of double-stranded cDNA was used for 454 sequencing. Eleven triticale and eight rye libraries were sequenced using the 454 GS FLX Titanium (Roche) technology. Several publicly available and proprietary software programs were used to analyse the 454 sequencing results: BLAST (ftp://ftp.ncbi.nih.gov/blast/; Altschul et al., 1997), TGICL (http://compbio.dfci.harvard.edu/tgi/software/; Pertea et al., 2003), CAP3 (Huang and Madan, 1999), SeqClean (http://compbio.dfci.harvard.edu/tgi/software/), OrthoMCL 1.4 (http://www.orthomcl.org/cgi-bin/OrthoMclWeb.cgi), CD-HIT-EST (http://weizhong-lab.ucsd.edu/cdhit_suite/cgi-bin/index.cgi), the BioPerl modules Bio::SeqIO and Bio::SearchIO (http://www.bioperl.org/wiki/Main_Page), and DNASTAR SeqMan and NGEN (DNASTAR, Madison, WI, USA) for de novo assembly.
Results and discussion
Assembling sequences reflects the genome divergence of triticale and rye
Sequencing of the eleven triticale libraries generated 3,310,375 reads with an average length of 320 bp, while the eight rye libraries yielded 3,124,641 reads with an average length of 345 bp. De novo assembly of these two datasets, with parameters set at 92% identity and a minimum match of 30 nt, yielded 162,686 contigs from triticale, 35% more than the 120,416 contigs obtained from rye. Interestingly, the number of long contigs above 2 kb from rye (4657) was about 1.7 times that from triticale (2774). Given the similar number of input sequences for the rye and triticale assemblies, the smaller number of contigs above 2 kb in triticale was not surprising: it is consistent with the divergence between the three subgenomes (AABBRR) of triticale (Chalupska et al., 2008) combined with the stringent parameters used for both assemblies.
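Contig counts of the kind reported above can be recomputed from any assembly output in FASTA format. The following is a minimal sketch, not the pipeline used in this study: the function names `read_fasta` and `summarise_contigs` are illustrative assumptions, and only the 2 kb cutoff and the contig counts are taken from the text.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def summarise_contigs(lines: Iterable[str], long_cutoff: int = 2000) -> Dict[str, int]:
    """Count all contigs and those strictly longer than long_cutoff (in nt)."""
    lengths = [len(seq) for _, seq in read_fasta(lines)]
    return {
        "contigs": len(lengths),
        "long_contigs": sum(1 for n in lengths if n > long_cutoff),
    }


# Ratios implied by the counts reported in the text.
contig_ratio = 162_686 / 120_416  # triticale vs rye, all contigs
long_ratio = 4_657 / 2_774        # rye vs triticale, contigs above 2 kb
print(f"{contig_ratio:.2f} {long_ratio:.2f}")  # prints 1.35 1.68
```

In practice, `summarise_contigs` would be called on an open file handle for each assembly, for example `summarise_contigs(open(path))`, where `path` is whatever contig file the assembler produced.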
To further investigate expressed sequence divergence between triticale and rye, we compared these contigs to the Brachypodium proteome using BLASTX. Triticale contigs matched 17,915 (70.3%) Brachypodium proteins, whereas rye contigs matched 19,142 (75.1%). Triticale and rye sequences both displayed homology to a shared set of 15,904 Brachypodium proteins, while triticale- and rye-assembled sequences identified 2011 and 3777 specific proteins, respectively. These results clearly show that a substantial proportion of rye sequences are not expressed in triticale.
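The shared and species-specific protein sets described above amount to a set comparison of BLASTX subject identifiers. A minimal sketch follows, assuming 12-column tabular BLAST output (subject ID in column 2, E-value in column 11). The function names and the E-value cutoff of 1e-5 are illustrative assumptions, since the text does not state the threshold used.

```python
from typing import Dict, Iterable, Set


def proteins_hit(blast_tab_lines: Iterable[str], max_evalue: float = 1e-5) -> Set[str]:
    """Collect subject (protein) IDs from 12-column tabular BLAST output.

    The 1e-5 cutoff is a placeholder, not a value taken from the study.
    """
    hits: Set[str] = set()
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        if float(fields[10]) <= max_evalue:
            hits.add(fields[1])
    return hits


def compare_hits(triticale: Set[str], rye: Set[str]) -> Dict[str, int]:
    """Count proteins matched by both species or by one species only."""
    return {
        "shared": len(triticale & rye),
        "triticale_only": len(triticale - rye),
        "rye_only": len(rye - triticale),
    }


# Toy input with made-up identifiers, for illustration only.
row = "{q}\t{s}\t90.0\t100\t5\t0\t1\t300\t1\t100\t{e}\t200"
triticale_rows = [
    row.format(q="t1", s="P1", e="1e-30"),
    row.format(q="t2", s="P2", e="1e-20"),
    row.format(q="t3", s="P3", e="0.5"),  # fails the E-value cutoff
]
rye_rows = [
    row.format(q="r1", s="P2", e="1e-40"),
    row.format(q="r2", s="P4", e="1e-10"),
]
summary = compare_hits(proteins_hit(triticale_rows), proteins_hit(rye_rows))
print(summary)
```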
We also combined all the rye and triticale 454 reads and conducted a second de novo assembly using the same parameters as for the individual rye and triticale assemblies. When we parsed the contig makeup, we found that most contigs were made up of sequences from only one species, triticale or rye (Fig. 1), thus clearly indicating the genome origin of the majority of triticale transcripts.
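Parsing the contig makeup of a combined assembly can be sketched as below, assuming a mapping from each contig to the IDs of its member reads. The read-ID prefixes `rye_` and `tcl_` are a hypothetical naming convention introduced for this illustration, and the function names are likewise assumptions rather than part of the original pipeline.

```python
from collections import Counter
from typing import Dict, Iterable, List


def species_of(read_id: str) -> str:
    """Infer the source species from a hypothetical read-ID prefix."""
    if read_id.startswith("rye_"):
        return "rye"
    if read_id.startswith("tcl_"):
        return "triticale"
    raise ValueError(f"unrecognised read ID: {read_id}")


def contig_origin(read_ids: Iterable[str]) -> str:
    """Label a contig as rye_only, triticale_only or mixed."""
    species = {species_of(r) for r in read_ids}
    if not species:
        raise ValueError("contig has no member reads")
    if len(species) == 1:
        return species.pop() + "_only"
    return "mixed"


def tally_origins(contigs: Dict[str, List[str]]) -> Counter:
    """Count contigs in each origin class."""
    return Counter(contig_origin(reads) for reads in contigs.values())


# Toy contig membership, for illustration only.
toy_contigs = {
    "contig1": ["rye_001", "rye_002"],
    "contig2": ["tcl_001"],
    "contig3": ["rye_003", "tcl_002"],
    "contig4": ["tcl_003", "tcl_004"],
}
origins = tally_origins(toy_contigs)
print(dict(origins))
```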
Sequence variation between rye (RR) and triticale (AABBRR) was observed as exemplified in Fig. 2. As indicated by the arrows, two diagnostic nucleotides between rye and triticale were identified. In triticale, five and possibly six reads out of the 14 sequences were clearly related to the rye genome. This can also provide information on relative expression of homeologous sequences from wheat and rye in triticale when applied to datasets from each individual cDNA library.
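Assigning reads by diagnostic nucleotides, as described above, can be illustrated with a short sketch. It assumes gapped reads of equal length taken from a multiple alignment and a table of rye-specific bases at the diagnostic columns. The classification rule (all covered sites rye-like, none rye-like, or otherwise ambiguous) and all names are assumptions made for this example, not the procedure behind Fig. 2.

```python
from collections import Counter
from typing import Dict, Iterable


def classify_read(read: str, rye_bases: Dict[int, str]) -> str:
    """Classify one aligned read from its bases at the diagnostic columns."""
    calls = []
    for pos, rye_base in rye_bases.items():
        base = read[pos].upper()
        if base in "-N":
            continue  # site not covered or not called in this read
        calls.append(base == rye_base.upper())
    if calls and all(calls):
        return "rye"
    if calls and not any(calls):
        return "wheat"
    return "ambiguous"  # conflicting sites, or no diagnostic site covered


def tally_reads(reads: Iterable[str], rye_bases: Dict[int, str]) -> Counter:
    """Count reads in each class across an alignment."""
    return Counter(classify_read(r, rye_bases) for r in reads)


# Toy alignment with two diagnostic columns (0-based positions 2 and 4).
diagnostic = {2: "G", 4: "A"}
toy_reads = ["ACGTA", "ACCTA", "A-GTN", "ACGTT", "ACCTT"]
counts = tally_reads(toy_reads, diagnostic)
print(dict(counts))

# Five to six rye-like reads out of 14, as in the text, correspond to:
print(f"{5 / 14:.0%} to {6 / 14:.0%}")  # prints 36% to 43%
```

Applied per cDNA library, such counts would give a rough estimate of the share of rye-derived transcripts for a given gene, subject to read depth at the diagnostic sites.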
Enhancing the wheat FLcDNA dataset for analysis of triticale and other small grain cereals
To compare the transcriptional units of rye (RR), triticale (AABBRR) and common wheat (AABBDD), we built an enhanced reference FLcDNA dataset starting with 1,067,304 wheat EST sequences available in GenBank. Firstly, we optimized the assembly pipeline by introducing a ‘noise site correction’ step for the individual ESTs and an iterative assembly step based on the widely used EST assembly program TGICL (Pertea et al., 2003). More specifically, we used a first-round CAP3 assembly with an overlap of 40 nt and a per cent identity of 80% to remove the ‘noise’ nt from the aligned sequences. We then ran a second round of CAP3 assembly, still with an overlap of 40 nt but with the per cent identity increased to 95%. Under these conditions, assembly of the dataset of over 1 million wheat ESTs yielded a total of 79,034 consensus sequences. The FLcDNA prediction pipeline, guided by four known grass genomes (Brachypodium, rice, maize and sorghum), identified 21,756 FL wheat cDNAs. The 21,002 publicly available FLcDNAs from TriFLDB (11,877; Mochida et al., 2009) and GenBank (9125) were combined and treated with CD-HIT-EST to remove redundancy at the 95% identity level, yielding 12,715 non-redundant (nr) sequences. This number almost doubled to 24,789 nr wheat FLcDNAs when our FLcDNA-EST dataset was amalgamated under the same parameters. A comparison to the Brachypodium proteome suggests that the expanded nr wheat FLcDNA dataset covers the majority of the proteins of this reference species.
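The assembly and redundancy-removal parameters given above map onto command-line options of CAP3 (`-o` for overlap length, `-p` for per cent identity) and CD-HIT-EST (`-c` for the identity threshold). The sketch below only builds the corresponding command lines. The file names are hypothetical, and the ‘noise site correction’ logic applied between the two CAP3 rounds is not specified in enough detail in the text to be reproduced here.

```python
from typing import List


def cap3_command(fasta: str, overlap_nt: int, percent_identity: int) -> List[str]:
    """Build a CAP3 call: -o overlap length cutoff, -p percent identity cutoff."""
    return ["cap3", fasta, "-o", str(overlap_nt), "-p", str(percent_identity)]


def cd_hit_est_command(fasta_in: str, fasta_out: str, identity: float) -> List[str]:
    """Build a CD-HIT-EST call: -i input, -o output, -c identity threshold."""
    return ["cd-hit-est", "-i", fasta_in, "-o", fasta_out, "-c", str(identity)]


# Hypothetical file names; parameter values are those stated in the text.
round1 = cap3_command("wheat_ests.fasta", overlap_nt=40, percent_identity=80)
round2 = cap3_command("wheat_ests.corrected.fasta", overlap_nt=40, percent_identity=95)
dedup = cd_hit_est_command("flcdna_combined.fasta", "flcdna_nr.fasta", identity=0.95)

for cmd in (round1, round2, dedup):
    print(" ".join(cmd))
```

Where both tools are installed, each list could be passed to `subprocess.run(cmd, check=True)`. Returning argument lists rather than shell strings avoids quoting problems with unusual file names.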
To evaluate the usefulness of the nr wheat FLcDNA dataset, we mapped 384,470 triticale 454 ESTs to it using MegaBLAST. More than 66% of these reads could be aligned to our enhanced nr wheat FLcDNA dataset. This dataset should also be very valuable for any wheat RNA-Seq project.
We have assembled de novo rye and triticale 454 datasets, distinguished rye from wheat sequences and developed an enhanced nr wheat FLcDNA dataset of almost 24,800 distinct elements.
Acknowledgements
Funding from the AAFC-ABIP program to The Cellulosic Biofuel Network (159) and the Canadian Triticale Biorefinery Initiative (227) projects and from Alberta Energy–Genome Alberta is greatly appreciated.