Introduction
The Arabidopsis community has developed comprehensive databases for gene description, annotation and expression analysis (Brazma et al., 2003; Toufighi et al., 2005; Zimmermann et al., 2005). The available information is not limited to transcriptome data but extends to proteome and metabolome data as well (De Vos et al., 2007; Meyer et al., 2007; Baerenfaller et al., 2008). Arabidopsis is also an important model for seed science and provides valuable insight into the processes underlying germination, dormancy and stress resistance (Finkelstein et al., 2008; Holdsworth et al., 2008a, b).
The seed represents a critical stage in the plant life cycle. After fertilization the embryo is formed, surrounded by the endosperm and seed coat (Liu et al., 2005). Seeds acquire desiccation tolerance and dormancy during maturation, which allows them to survive harsh conditions after seed dispersal (Bewley, 1997). Germination at an appropriate time is a critical step for the initiation of the plant life cycle (Huang et al., 2010; Moyers and Kane, 2010). Seeds are equipped with accurate sensors for water, light and temperature that monitor the optimal seasonal timing for germination, successful seedling establishment and further plant development (Franklin and Quail, 2010). Upon imbibition, transcription and protein synthesis resume, and cell wall expansion and degradation facilitate the penetration of the radicle through the endosperm and seed coat (Nonogaki et al., 2007). Finally, energy reserves are remobilized to enable the fast growth of the emerging seedling (Nonogaki, 2006). The concerted operation of these molecular processes is organized by plant hormones, hormone receptors or photoreceptors, and transcription factors (Holdsworth et al., 2008a). Modern '-omics' tools can provide valuable insight into the function and regulatory mechanisms of these molecular processes (Joosen et al., 2009).
Many tools to analyse transcriptome, proteome or metabolome data rely on approaches that detect co-expression or co-existence. Such clustering methods and principal component analysis are efficient tools to summarize data and detect groups of genes, proteins and metabolites with similar behaviour (Rensink and Hazen, 2006). However, more insight into multiple biological processes can be gained by organizing annotations so that profiling datasets are integrated with pre-existing biological knowledge (Zhou and Su, 2007). A good example of this type of approach is Taggit, a set of seed-specific annotations that can be combined with filtered gene-expression datasets (Carrera et al., 2007). Taggit provides pie charts that visualize the relative proportions of functional categories affected by the treatments or developmental stages of interest. A more comprehensive tool that uses a similar approach is MapMan (Thimm et al., 2004). This tool allows users to display genomic datasets on pictorial diagrams, which can be fully customized to depict the biological processes of interest. One of the most critical points in using pre-existing knowledge is the quality of the functional classification of genes, proteins and metabolites. The MapMan tool uses information from the TIGR database (http://compbio.dfci.harvard.edu/tgi/) and input from a number of experts to curate specific biological processes.
It has been employed in different studies and plant species, for example to study barley grain maturation and germination (Sreenivasulu et al., 2008) and diurnal changes in Arabidopsis (Blasing et al., 2005).
Here we describe the development of two new diagrams for MapMan that focus on biological processes important for seed dormancy and germination. Using PageMan (Usadel et al., 2006), a tool included in the MapMan package, we defined the most informative functional categories. We combined these categories in the first diagram, which summarizes transcript and/or metabolite level changes in pathways important for seed germination. The second diagram provides a focused view of cell wall modification and degradation, which are key processes for the completion of seed germination. This comprehensive approach, using the MapMan tools, offers the seed science community an easy way to analyse and visualize transcriptome and metabolome data for Arabidopsis.
Methods
We used publicly available data sources that describe seed dormancy and germination. To study the dormancy transcriptome we used data from Finch-Savage et al. (2007) and Cadman et al. (2006), who compared gene expression in dormant seeds with that in non-dormant seeds under a variety of conditions. Transcriptome changes during seed germination are accurately profiled by the datasets of Nakabayashi et al. (2005), and polar metabolite changes by the data of Fait et al. (2006). We also used the transcriptome datasets of Penfield et al. (2006), who dissected Arabidopsis seeds into embryo and endosperm shortly after radicle protrusion to analyse gene expression. In total, we gathered data from 20 seed-specific transcriptome analyses (Table 1).
Table 1 footnote: * Polar metabolite profiling with gas chromatography–mass spectrometry (GC–MS). All other samples consist of expression profiling using the Affymetrix ATH1 microarray.
All microarray data were normalized using MAS 5.0 (Hennig et al., 2003), and genes were retained only if their raw expression values exceeded a background value of 50 in four or more experiments. This initial screening filter yielded 11,443 seed-expressed genes (see supplementary Table S1, available online only at http://journals.cambridge.org). PageMan v0.12 (http://MapMan.gabipd.org; Usadel et al., 2006) was used to identify functional categories with significant enrichment or depletion of up-regulated genes. Within the PageMan package we used a Wilcoxon test combined with Benjamini–Hochberg correction to calculate P values for enriched categories. The resulting P values were transformed to z-scores and plotted as a heat map (Fig. 1); only significant functional categories are shown. Because we intended to select categories with a general role in dormancy and germination, we excluded mutant transcriptome datasets from the PageMan analysis.
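The filtering and enrichment steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not PageMan's implementation: it assumes a gene passes the filter when its MAS 5.0 value exceeds 50 in at least four arrays, scores each functional category with a two-sided Wilcoxon rank-sum test (normal approximation) of its members' ratios against all other genes, applies the Benjamini–Hochberg correction, and converts each adjusted P value into a signed z-score through the inverse normal distribution. The function names, category names and toy ratios are hypothetical.

```python
import math
from statistics import NormalDist

def passes_background_filter(values, background=50.0, min_experiments=4):
    """True if expression exceeds the background in at least min_experiments arrays."""
    return sum(v > background for v in values) >= min_experiments

def rank_sum_test(in_bin, rest):
    """Two-sided Wilcoxon rank-sum test (normal approximation, average ranks for
    ties, no tie correction of the variance). Returns (p_value, z)."""
    pooled = sorted([(v, 0) for v in in_bin] + [(v, 1) for v in rest])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average 1-based rank of a tied run
        i = j + 1
    n1, n2 = len(in_bin), len(rest)
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    z = (u1 - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return 2 * (1 - NormalDist().cdf(abs(z))), z

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P values (step-up, kept monotone, capped at 1)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda idx: pvals[idx])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

def category_z_scores(ratios, categories):
    """ratios: gene -> expression ratio; categories: name -> set of genes.
    Returns name -> signed z-score (positive = members shifted upwards)."""
    names, raw_p, signs = [], [], []
    for name, members in categories.items():
        in_bin = [ratios[g] for g in members if g in ratios]
        rest = [r for g, r in ratios.items() if g not in members]
        if not in_bin or not rest:
            continue  # need measured members and a background to compare with
        p, z = rank_sum_test(in_bin, rest)
        names.append(name)
        raw_p.append(p)
        signs.append(1.0 if z >= 0 else -1.0)
    adjusted = benjamini_hochberg(raw_p)
    # Clamp tiny P values so inv_cdf never receives exactly 1.0.
    return {n: s * NormalDist().inv_cdf(1 - max(p, 1e-15) / 2)
            for n, s, p in zip(names, signs, adjusted)}

# Hypothetical toy data: one category shifted up, one shifted down.
ratios = {"g1": 3.0, "g2": 2.5, "g3": 2.8, "g4": -3.0, "g5": -2.5,
          "g6": -2.8, "g7": 0.1, "g8": -0.1, "g9": 0.2, "g10": 0.0}
z = category_z_scores(ratios, {"up": {"g1", "g2", "g3"}, "down": {"g4", "g5", "g6"}})
```

Positive z-scores mark categories whose members are shifted towards up-regulation and negative ones towards down-regulation, which is the kind of signed value a heat map such as Fig. 1 colours. For thousands of genes, an established implementation (for example in SciPy or R) would be preferable to this hand-written test.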
To create a detailed view of the enriched functional categories identified with PageMan, we made two custom pathway images (using CorelDRAW Graphics Suite X4, www.corel.com) that can be used in MapMan v3.5.0 (http://MapMan.gabipd.org). First, we created an 'Arabidopsis seed – Molecular Networks' diagram including all enriched functional categories (see Figs 2–4). The diagram of hormonal regulation was adapted from Finkelstein et al. (2002) and simplified to depict hormone signalling. More detailed information about hormone signalling can be found in Kucera et al. (2005) and Holdsworth et al. (2008a). Two functional categories describing genes linked to dormancy or germination were added to the mapping file (data derived from the Taggit ontology; Carrera et al., 2007). Second, we created an 'Arabidopsis seed – Cell wall Networks' diagram that allows a focused view of cell wall changes (synthesis, modification, degradation and cell wall proteins). For this second diagram, the original 'Cell wall' bin was subdivided further: bin 10.5.1, 'Cell wall Proteins AGP (arabinogalactan proteins)', was divided into 'AGPs', 'FLA (fasciclin-like arabinogalactan proteins)' and 'AGP Other', and bin 10.7, 'Cell wall Modification', was divided into 'Expansin A', 'Expansin B' and 'Xyloglucan' (see Fig. 5). All these files are freely available at http://mapman.gabipd.org/web/guest/mapmanstore. Both transcript and metabolite levels can be visualized with this user-friendly package. Each gene within a functional category is represented as a square box, and its expression level is shown on a blue–red colour scale. Metabolites are represented as coloured circles (Fig. 2B).
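Conceptually, placing data on these diagrams is a join between the user's identifiers and the bins of a mapping file, in which dot-separated bin codes form a hierarchy (for example, 10.5.1 lies within 10.5, which lies within bin 10, 'Cell wall'). The sketch below illustrates that join under assumptions: the mapping excerpt is hypothetical, its column names (BINCODE, NAME, IDENTIFIER, DESCRIPTION, TYPE) are assumed to follow the layout of MapMan mapping files, and the sub-bin codes and identifiers (including the deliberately non-existent chromosome-0 AGI codes) are invented for illustration. It is not MapMan's own code.

```python
import csv
import io
from collections import defaultdict

# Hypothetical, tab-separated excerpt of a MapMan-style mapping file.
# Sub-bin codes and the chromosome-0 AGI codes are invented for illustration.
MAPPING_TSV = (
    "BINCODE\tNAME\tIDENTIFIER\tDESCRIPTION\tTYPE\n"
    "10.5.1.1\tcell wall.cell wall proteins.AGPs.AGPs\tat0g00010\thypothetical AGP\tT\n"
    "10.5.1.2\tcell wall.cell wall proteins.AGPs.FLA\tat0g00020\thypothetical FLA\tT\n"
    "10.7.1\tcell wall.modification.expansin A\tat0g00030\thypothetical expansin\tT\n"
)

def load_mapping(text):
    """Parse the tab-separated mapping table into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))

def assign_to_bins(mapping, data):
    """Attach each measured value to its bin and to every ancestor bin.

    data maps AGI codes or metabolite names (any letter case) to values;
    matching is case-insensitive. Returns bin code -> {identifier: value}.
    """
    values = {key.lower(): val for key, val in data.items()}
    by_bin = defaultdict(dict)
    for row in mapping:
        ident = row["IDENTIFIER"].strip().lower()
        if ident not in values:
            continue  # identifier not measured in this experiment
        parts = row["BINCODE"].split(".")
        for depth in range(1, len(parts) + 1):
            by_bin[".".join(parts[:depth])][ident] = values[ident]
    return dict(by_bin)

bins = assign_to_bins(load_mapping(MAPPING_TSV), {"AT0G00010": 1.2, "AT0G00030": -0.5})
```

Because values propagate to every ancestor code, subdividing a bin (as done here for 10.5.1 and 10.7) adds detail without removing genes from the parent category. If a mapping lists an identifier under several bin codes, this sketch places it in each of them.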
Users can load raw expression levels as well as expression ratios. We calculated expression ratios by dividing the log2 expression values of the two samples and subtracting 1 to centre the ratios around 0 ('log2–1'). A two-tailed paired t-test was used to calculate P values for all expression ratios (see supplementary Table S2, available online only at http://journals.cambridge.org). AGI codes (The Arabidopsis Genome Initiative, 2000) or metabolite names are used to match the data with a mapping file that contains the functional categorization of genes and metabolites.
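Read literally, the 'log2–1' ratio divides the log2 expression value of one sample by that of the other and subtracts 1, so identical levels give 0. The sketch below encodes that reading together with a paired t statistic for replicate measurements. Both the reading of the formula and the choice to pair replicates on their log2 values are assumptions, and the function names are hypothetical. A two-tailed P value then follows from the t distribution with n − 1 degrees of freedom (for example via scipy.stats.ttest_rel, which is not used here to keep the example dependency-free).

```python
import math
from statistics import mean, stdev

def log2_minus_one_ratio(treatment, reference):
    """'log2-1' ratio as read from the text: log2(treatment) / log2(reference) - 1.

    Equal expression gives 0. Requires reference > 1 so that the denominator is
    positive, which holds for values above the background filter of 50.
    """
    if treatment <= 0 or reference <= 1:
        raise ValueError("expects treatment > 0 and reference > 1")
    return math.log2(treatment) / math.log2(reference) - 1

def paired_t_statistic(x, y):
    """Paired t statistic and degrees of freedom for replicate pairs (x_i, y_i)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired replicates")
    diffs = [a - b for a, b in zip(x, y)]
    se = stdev(diffs) / math.sqrt(len(diffs))
    return mean(diffs) / se, len(diffs) - 1

ratio = log2_minus_one_ratio(256, 16)  # log2 values 8 and 4, so 8 / 4 - 1 = 1.0
t, df = paired_t_statistic([8.0, 9.0, 10.0], [7.0, 8.0, 8.0])
```

Note that this ratio differs from the more common log2(treatment / reference); anyone reproducing the supplementary values should check which definition their data follow before comparing numbers.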
Here we show the power of efficient data visualization of changes in transcriptome and/or metabolome using four examples: (1) dry seeds versus imbibed seeds resulting in germination (dry versus 24-h-imbibed); (2) dormant imbibed versus non-dormant germinating seeds (PD24h versus LIG); (3) 24-h-imbibed, stored Ler versus 24-h-imbibed, stored cts-1 seeds; and (4) embryo versus endosperm tissue.
Results and discussion
To examine the efficiency of MapMan data visualization, expression ratios were calculated for dry versus 24-h-imbibed seeds (Fig. 2A). In the diagram, global transcriptome changes are obvious at first glance. For example, strong up-regulation of genes related to amino acid biosynthesis ('Amino acid'), energy metabolism ('Energy') and cell wall modification ('Cell wall') in 24-h-imbibed seeds is clearly visible (red squares). In contrast, transcripts related to late embryogenesis abundant (LEA) and seed storage proteins ('Seed storage proteins') rapidly decline (blue squares). A decline in stress-related transcripts was also observed. Most likely, these transcripts accumulated at the end of seed maturation and dehydration and were rapidly lost upon imbibition. In this figure we combined transcript and metabolite level ratios for dry versus germinating seeds, which allowed us to analyse changes in metabolite levels in relation to transcriptional changes. For example, several genes encoding enzymes of the tricarboxylic acid (TCA) cycle in the 'Energy' category were up-regulated in 24-h-imbibed seeds (Fig. 2A), consistent with the higher levels of TCA intermediates, such as citrate, isocitrate, 2-oxoglutarate and malate, known to occur in imbibed Arabidopsis seeds (Fait et al., 2006). Fig. 2B depicts one example: the concomitant accumulation of malate and of a transcript (FUM2) encoding a fumarase that catalyses the conversion of fumarate to malate. This particular example should be interpreted with some caution, since here transcript levels of Arabidopsis Columbia (Col) seeds were compared with metabolite levels of stratified Arabidopsis Wassilewskija (Ws) seeds. Nevertheless, it illustrates how transcript and metabolite data can be combined in MapMan.
Arabidopsis seeds show some degree of primary dormancy immediately after harvest. In our second example, we visualized changes in molecular processes that are affected by dormancy (Fig. 3). To this end, we plotted the expression ratios for primary dormant seeds imbibed for 24 h (PD24h) versus seeds that were afterripened for 120 d and imbibed for 24 h in the dark with a 4-h pulse of red light (LIG). The PD24h seeds do not complete germination, in contrast to the LIG-treated seeds. When dormant and non-dormant seeds were compared, clear transcriptional differences were observed in the gene clusters 'Cell wall', 'Stress', 'Secondary metabolism' and 'Hormones'. Surprisingly, relatively small changes were observed in the Taggit gene clusters 'Dormancy-related' and 'Germination-related'.
The datasets shown in Figs 2 and 3 were also used for the initial selection of functional categories in PageMan. Because this could lead to a self-fulfilling re-detection of differentially regulated genes, we also analysed a transcriptome dataset that was not used for pathway selection. Using our MapMan diagram, we compared transcript profiles of 24-h-imbibed, stored wild-type Landsberg erecta (Ler) seeds (Ler AR) and 24-h-imbibed, stored comatose-1 (cts-1) mutant seeds (cts-1 S) (Fig. 4). As expected, CTS transcript levels were strongly reduced in the mutant. Consistent with the results described by Carrera et al. (2007), effects on PRODUCTION OF ANTHOCYANIN PIGMENT 2 (PAP2 = MYB90), on the GA-responsive GAST1 protein homologues GASA-1 and GASA-4, and on the flavonoid pathway were clearly visible (Fig. 4, 'RNA' and 'Hormones'). Theodoulou et al. (2005) described the jasmonic acid (JA)-deficient phenotype of the cts-1 mutant. Our results suggest that up-regulation of a seed-specific JA biosynthesis gene (putative 12-oxophytodienoic acid reductase, OPR) could be part of this mechanism (Fig. 4, 'Hormones'). The up-regulation of several photosynthesis genes in the 'Energy' category in the mutant is noteworthy and might be an intriguing starting point for new research.
Several studies describe the important role of the endosperm in the regulation of seed germination (e.g. Müller et al., 2006; Penfield et al., 2006). We calculated ratios of gene expression levels between endosperm and embryo to visualize affected genes and molecular processes (see supplementary Fig. F1, available online only at http://journals.cambridge.org). As expected, genes related to photosynthesis and DNA synthesis are mainly expressed in the embryo. Since cell wall changes are crucial for the completion of germination, we created a diagram specifically to map cell wall changes and used it to highlight differences in cell wall modification between embryo and endosperm (Fig. 5). Transcripts encoding arabinogalactan proteins (AGPs) were more abundant in the embryo. AGPs are a diverse class of cell wall proteins implicated in growth, development and plant–microbe interactions (Seifert and Roberts, 2007); it is therefore not surprising to find these transcripts at higher levels in embryo tissues.
Interpreting the overwhelming amount of information currently available for genes, proteins and metabolites, and understanding their function in various biological processes, is a major challenge. Combining gene expression levels with known biological functions or gene classifications can be very helpful for categorizing and prioritizing the data. The freeware tool MapMan has proved to be easy to use and helpful for visualizing multilevel data (transcriptomics, metabolomics and proteomics).
The data used in this study (summarized in Table 1) and the visualization approach we examined (Fig. 1) proved very informative, as together they clearly summarize the molecular processes that are affected. For example, the importance of triacylglycerol (TAG) and protein synthesis during germination can readily be inferred from the diagram. This is in agreement with the role of TAG metabolism in germination control and seedling establishment (Penfield et al., 2007) and the essential role of translation in the completion of germination (Rajjou et al., 2004).
When transcript levels of dry seeds are compared with those of seeds imbibed for 24 h, many important molecular processes can be recognized (Fig. 2A). Genes of the Calvin cycle, glycolysis and the TCA cycle appear to be up-regulated, as do genes involved in redox modification, the cell cycle, cell wall modification and degradation, and protein activation and folding. In contrast, transcripts for seed storage proteins, LEA proteins and TAG synthesis are strongly decreased in imbibed seeds. Notably, expression of genes involved in electron transport and respiration appears to be lower in 24-h-imbibed seeds than in dry seeds. Our analysis indicated that the major differences between dormant and non-dormant seeds were in the 'Hormones' and 'Cell wall' clusters (Fig. 3). Interestingly, only minor differences between dormant and non-dormant seeds were observed in other processes, such as seed storage proteins and LEA proteins. This is because transcript levels of these proteins decrease to a similar extent in both dormant and non-dormant seeds upon imbibition. Seed storage protein and LEA transcripts may be remnants of seed development that are no longer needed during imbibition. These MapMan diagrams allow users not only to observe global changes but also to retrieve detailed information about gene annotation and expression, because users can click on an individual process or gene in the interactive MapMan tools.
By plotting the transcriptome differences between the comatose-1 mutant and wild type, we showed that our selection of molecular processes can clearly visualize the affected genes and pathways as previously described (Fig. 4; Theodoulou et al., 2005; Carrera et al., 2007). For a more detailed view of a particular process or metabolic pathway, users can create a customized diagram (as we did for cell wall changes) or use diagrams already available in the MapMan package.
We have explored the possibility of using MapMan for multilevel data by combining transcriptome data from Nakabayashi et al. (2005) (dry versus 24-h-imbibed seeds) with metabolome data from Fait et al. (2006) (dry versus germinating seeds). In this way, relationships between gene expression and metabolite changes can easily be visualized, as we showed for TCA cycle genes and metabolites (Fig. 2B).
While our analysis demonstrated the usefulness of the Seed – Molecular Networks diagram, it does not cover all functional categories relevant to every possible seed experiment. Moreover, non-annotated genes are rarely selected for visualization, which hampers the discovery of new genes of unknown function. Visualization can also be misleading when the original functional annotation is incorrect. Despite these issues, the annotation used in MapMan is already of high quality, and its usability will continue to improve as knowledge about many genes and biological processes rapidly increases.
In conclusion, the MapMan tool allows quick identification of the molecular processes that are regulated during a developmental programme of interest, from which candidate genes with known annotation can easily be identified. This way of visualizing data, together with the two pathway files that we have created, provides a solid base for the next level of statistical data analysis and offers useful tools for the seed science community.
Supplemental data
Figure F1. Seed – Molecular Networks map showing differences in transcript levels between embryo and endosperm. Log2–1 ratios of endosperm versus embryo samples of seeds shortly after radicle protrusion are used to express level differences on a false colour scale. Red indicates higher levels in embryo, blue indicates higher levels in endosperm. Only ratios with a P value < 0.05 are represented.
Table S1. Transcriptome data of 20 microarray experiments used for PageMan analysis.
Table S2. Log2 ratios with P values used for MapMan analysis.
Acknowledgements
This work was supported by the Technology Foundation STW (R.V.L.J., W.L.) and the ERA-NET Plant Genomics grant vSEED (B.J.W.D.). Raw microarray data were kindly provided by N. Provart and H. Nahal, University of Toronto, Canada. The background photograph in the cell wall diagram was kindly provided by N. Everitt, N. Weston and S. Pearce, University of Nottingham, UK.