Portable search engine for registered crop germplasm: a new concept for enhancing access to information on plant genetic resources

Sunil Archak; Vikas Kumar

doi:10.1017/S147926211200038X

Published online by Cambridge University Press: 26 November 2012

Sunil Archak*: Agricultural Knowledge Management Unit, National Bureau of Plant Genetic Resources, Pusa Campus, New Delhi 110 012, India
Vikas Kumar: Agricultural Knowledge Management Unit, National Bureau of Plant Genetic Resources, Pusa Campus, New Delhi 110 012, India
*Corresponding author. E-mail: sarchak@nbpgr.ernet.in

Abstract

The National Bureau of Plant Genetic Resources (NBPGR), New Delhi registers unique crop germplasm in order to protect the intrinsic intellectual property and to facilitate greater utilization of germplasm in crop improvement programmes. It is therefore imperative to enhance access to information on registered crop germplasm. Here, we present the concept of a search engine that serves the dual functions of a Web-based and a portable search application. The concept entails converting raw data, through a series of transformations, from Microsoft Excel format to eXtensible Markup Language (XML) format. The XML data, once loaded in a compatible Web browser, are queried for the search term by looping regular-expression matching. The results are then displayed in the browser as a table. The concept is implemented as the ‘Inventory of Registered Crop Germplasm’ (IRCG), available on the Web as well as on portable media (compact disk or flash drive). The portable search engine works with minimal hardware and software requirements, which enables its widespread use and ensures greater access to information on registered crop germplasm. The portable search engine can be obtained from the NBPGR, New Delhi and the Web-based search engine can be accessed at http://www.nbpgr.ernet.in/IRCG/index.htm.

Keywords

portable search engine; registered crop germplasm; XML

Type: Research Article
Information: Plant Genetic Resources, Volume 11, Issue 1, April 2013, pp. 62-67

DOI: https://doi.org/10.1017/S147926211200038X
Copyright: Copyright © NIAB 2012

Introduction

The National Bureau of Plant Genetic Resources (NBPGR) is the nodal organization in India for the exchange, quarantine, collection, conservation, evaluation and systematic documentation of plant genetic resources. Since 1993, NBPGR has been implementing a germplasm registration system to document unique germplasm and to give credit to researchers who have (1) developed experimental material including parents and inbred lines or (2) identified promising germplasm (NBPGR, 2006). As an outcome of this institutional mechanism, which recognizes the intellectual property inherent in the identification of elite genetic resources and thereby promotes their use, information on registered germplasm is published regularly in the Indian Journal of Plant Genetic Resources (Singh, 2002) and special crop germplasm inventories are circulated (NBPGR, 2006; Kak et al., 2009; Kak and Tyagi, 2010). The primary objective of this publicity is to promote the flow of unique germplasm into ongoing breeding programmes.

The worldwide revolution in information and communication technology has brought basic computational resources to every researcher in the form of either personal computers or shared computer laboratories. As a result, searching for information in databases (wherever available) instead of in printed registers and inventories has become the norm among students and researchers, even at remote experimental farms. A computer-assisted search identifies the most suitable options quickly and easily from a vast data source, thereby greatly improving access to the desired information.

In order to relay the information on registered germplasm to the desktops of breeders and other interested clients, it is necessary to develop a simple and easy-to-use customized search engine. Search engines are widely used in today's computing world for various purposes, and they vary in form, complexity and the skill they require (Grappone and Couzin, 2011; Wikipedia, 2012). The nature of a search engine depends largely on (1) the type of data, (2) the functionalities of the system and (3) hardware and software sophistication; hence, search engines end up as closed systems that may not always be universally compatible and portable. Search engines can be general (Google, Yahoo!) or subject specific (Entrez, AGRIS). They can be limited (desktop search engines) or extensive (meta-search engines). The more sophisticated the architecture of a search engine, the greater its requirement for computational power and Internet connectivity (Kreinovich, 2012). In contrast, a search engine for registered crop germplasm should serve both clients with adequate Internet access and contemporary computational resources and clients who have only basic computer facilities and no reliable Internet connection.

We have developed a search engine solution that requires no installation procedures, needs no Internet connectivity, runs from a compact disk (CD) or a pen drive, demands nothing from the computer except for a default Internet browser (e.g. Internet Explorer), provides intuitive search facility and requires no specialized experience to use. We report here the employment of such a solution to develop a portable search engine for the Inventory of Registered Crop Germplasm of NBPGR.

Materials and methods

Technologies involved

Data were transformed into eXtensible Markup Language (XML) format, an industry standard for handling a large amount of data in text form without a database management system (Fung, 2000; Chaudhri et al., 2003) and for sharing both the format and the data on the Internet. JavaScript, an open-source programming language, was used to build a standalone or Web-based solution that integrates complex data-related functionality (Bayross, 2005) by interacting with the HyperText Markup Language (HTML) source code. The XML-tagged data were decoded for the JavaScript application by an XML parser. The application output was encapsulated in HTML pages and presented via Web browsers in online and offline modes (Fig. 1).

Fig. 1 Data flow diagram of the portable search engine. A colour version of this figure can be found online at http://journals.cambridge.org/pgr.
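
The spreadsheet-to-XML step in this data flow can be illustrated with a short sketch. The article does not give the conversion procedure, so the following is only one plausible approach, written in Python rather than the JavaScript used by the application. It assumes that the Excel sheet has first been exported as CSV; the element names (inventory, accession), the column headers and the sample values are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical CSV export of the spreadsheet. Column names and the
# registration numbers are made-up placeholders, not real IRCG data.
CSV_TEXT = """crop_name,botanical_name,ingr_number,year
Wheat,Triticum aestivum,EXAMPLE-1,2008
Tomato,Solanum lycopersicum,EXAMPLE-2,2009
"""

def rows_to_xml(csv_text: str) -> str:
    """Convert tabular rows into an XML string, one <accession> per row."""
    root = ET.Element("inventory")  # hypothetical root element name
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = ET.SubElement(root, "accession")
        for field, value in row.items():
            # Each column becomes a child element named after its header.
            ET.SubElement(record, field).text = value.strip()
    return ET.tostring(root, encoding="unicode")

xml_text = rows_to_xml(CSV_TEXT)
print(xml_text)
```

Keeping one element per record and one child element per column means that the resulting text file can be read by any XML parser without a database management system, which is the property the article relies on.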

Architecture of the search engine

The architecture of the portable search engine consists of four distinct units (Fig. 2), which are introduced below.

Fig. 2 Architecture of the portable search engine for registered crop germplasm. A colour version of this figure can be found online at http://journals.cambridge.org/pgr.

Loading XML data application (block 1)

The first unit of the architecture links the XML-based data source (local or Web based) to the search application and then loads it. This module ensures that the XML data are loaded into the computer memory before the search application operates.
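
A minimal sketch of this loading step is given below. It is written in Python for illustration and is not the authors' JavaScript code; the XML layout and element names are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML data source; element names are illustrative assumptions.
XML_TEXT = """<inventory>
  <accession><crop_name>Wheat</crop_name><year>2008</year></accession>
  <accession><crop_name>Tomato</crop_name><year>2009</year></accession>
</inventory>"""

def load_records(xml_text: str) -> list[dict[str, str]]:
    """Parse the XML once and hold every record in memory as a dict."""
    root = ET.fromstring(xml_text)
    return [
        {field.tag: (field.text or "") for field in accession}
        for accession in root.iter("accession")
    ]

RECORDS = load_records(XML_TEXT)  # loaded before any search is run
print(len(RECORDS), "records loaded")
```

Parsing once at start-up and searching the in-memory list afterwards avoids re-reading the file for every query.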

Search input (block 2)

The search interface is loaded in a Web browser, and a query typed into the search box initiates the search process. Currently, a query must contain at least three characters.

Running the algorithm (block 3)

This module accepts the user input and interprets it using Boolean logic. The algorithm searches every record of the XML data file sequentially and transfers all matches to temporary storage, iterating until the end of the file is reached.
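
The sequential scan described above can be sketched as follows. This is an illustrative Python version, not the authors' JavaScript implementation: it assumes that the records are already held in memory, that terms are joined by AND and that matching is case-insensitive. The field names and sample values are hypothetical.

```python
import re

# Hypothetical in-memory records; field names and values are illustrative.
RECORDS = [
    {"crop_name": "Wheat", "year": "2008", "features": "durum type"},
    {"crop_name": "Wheat", "year": "2009", "features": "rust resistance"},
    {"crop_name": "Tomato", "year": "2008", "features": "high TSS"},
]

MIN_QUERY_LENGTH = 3  # the article states a three-character minimum

def search(query: str, records: list[dict[str, str]]) -> list[dict[str, str]]:
    """Scan every record in order and keep those matching all AND-ed terms."""
    if len(query.strip()) < MIN_QUERY_LENGTH:
        raise ValueError("search term must have at least 3 characters")
    # Split on the AND operator (case-insensitive) surrounded by whitespace.
    terms = re.split(r"\s+and\s+", query.strip(), flags=re.IGNORECASE)
    patterns = [re.compile(re.escape(t), re.IGNORECASE) for t in terms if t]
    matches = []  # temporary storage for hits
    for record in records:  # sequential pass until the end of the data
        text = " ".join(record.values())
        if all(p.search(text) for p in patterns):
            matches.append(record)
    return matches

print(len(search("wheat and 2008", RECORDS)))
```

Each term is escaped before being compiled, so characters such as parentheses in a query are matched literally rather than being read as regular-expression syntax.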

Results display (block 4)

Results are displayed as a formatted HTML page. They can be saved by copying and pasting them into applications such as Microsoft (MS) Excel or MS Word, or into a plain text file in comma-separated values (CSV) format.

User interface

The user interface of the search engine opens in the default Web browser of the computer (it is fully compatible with old and new versions of all major Web browsers). A single-line text box accepts the user's query. The search box supports single-word or multiple-word search terms, and the Boolean operator AND can be used to narrow down the search.

Result page

Hits produced by the search engine are presented in a tabulated format. If a search term matches any record, fully or partially, the entire accession record is moved to the result pool, which is arranged row-wise. Each result row contains the crop name, botanical name, national identity (accession number), donor identity, INdian Germplasm Registration (INGR) number (registration number), year, pedigree, developer, developing institute and novel unique features. The result page also displays the search term, the number of matched records and the total number of records available in the IRCG database.
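
A result page of this kind can be assembled as in the following sketch. It is an illustration in Python, not the authors' code; the column keys are hypothetical and only three of the ten fields are shown for brevity.

```python
from html import escape

# Hypothetical subset of the ten fields listed above; keys are assumptions.
COLUMNS = ["crop_name", "botanical_name", "year"]

def render_results(query: str, hits: list[dict[str, str]], total: int) -> str:
    """Build an HTML fragment: a summary line followed by one row per hit."""
    header = "".join(f"<th>{escape(c)}</th>" for c in COLUMNS)
    rows = "".join(
        "<tr>" + "".join(f"<td>{escape(h.get(c, ''))}</td>" for c in COLUMNS) + "</tr>"
        for h in hits
    )
    summary = (
        f"<p>Search term: {escape(query)}; "
        f"{len(hits)} of {total} records matched</p>"
    )
    return f"{summary}<table><tr>{header}</tr>{rows}</table>"

page = render_results(
    "wheat",
    [{"crop_name": "Wheat", "botanical_name": "Triticum aestivum", "year": "2008"}],
    total=2,
)
print(page)
```

Escaping every value before it is placed in the page keeps characters such as < or & in the data from breaking the table markup.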

Results

The portable search tool was implemented to query the registered germplasm data. As of 30 December 2011, 1030 genotypes possessing unique traits had been registered with NBPGR (Tyagi and Kak, 2012). When new records become available, users can update the data file by adding new rows, which keeps the application current. Users cannot, however, add information to existing records.

Search engine

The primary objective of developing a portable search application is to cater to resource-challenged end users. The portable application carries (1) data, (2) portable search engine and (3) Web browser as the interface.

The algorithm used in the application is designed to consume minimal system resources and to respond quickly without affecting system performance. Our tests showed that the application could handle up to 100,000 records without a perceptible difference in responsiveness. For instance, a search with two AND operators (i.e. three search terms) used 2.7 megabytes (MB) of memory and 5 ms for application loading, and 14.7 MB of memory and 7.3 s for scripting and rendering. These observations confirm that the application searches the XML data file with low memory and time overheads, which makes it compatible with even old computers and browser versions.
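
Readers who wish to run a comparable measurement on their own machine could start from a sketch such as the one below. It is written in Python with synthetic placeholder records, so it does not reproduce the figures reported above, which were obtained for the JavaScript application in a browser; the record contents and function names are hypothetical.

```python
import re
import time
import tracemalloc

def make_records(n: int) -> list[str]:
    """Generate synthetic one-line records; the contents are placeholders."""
    return [f"crop{i % 11} {2000 + i % 13} institute{i % 7}" for i in range(n)]

def count_hits(terms: list[str], records: list[str]) -> int:
    """Count records in which every term matches (AND semantics)."""
    patterns = [re.compile(re.escape(t), re.IGNORECASE) for t in terms]
    return sum(1 for text in records if all(p.search(text) for p in patterns))

records = make_records(100_000)

tracemalloc.start()  # trace only the memory allocated during the search
start = time.perf_counter()
hits = count_hits(["crop3", "2008", "institute3"], records)  # two ANDs
elapsed = time.perf_counter() - start
_, peak_bytes = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"{hits} hits in {elapsed * 1000:.1f} ms, peak {peak_bytes / 1e6:.2f} MB")
```

Timings depend strongly on the hardware and on the runtime, so the output is useful only for comparing data sizes on the same machine.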

Search terms for registered crop germplasm

The user is provided with a free-text search box and can search for any word of at least three letters. The search is, however, most effective when the terms target the contents of the germplasm registration data, which are maintained in ten fields: crop name, botanical name, national identity (accession number), donor identity, INGR number (registration number), year, pedigree, developer, developing institute and novel unique features. Hits can be narrowed by placing AND, or expanded by placing OR, between multiple search terms. Hits can be restricted to exact matches by adding a space before or after the search term.
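
One way to interpret these query rules is sketched below in Python. The article does not specify operator precedence or how the space-delimited exact match is implemented, so the sketch makes three assumptions: OR separates groups of AND-ed terms, a space typed next to a term is kept as part of the term, and matching is done by plain substring comparison rather than by the regular expressions used in the application. The function names are hypothetical.

```python
import re

def parse_query(query: str) -> list[list[str]]:
    """Return OR-ed groups of AND-ed terms; extra spaces stay with the term."""
    groups = []
    for part in re.split(r" or ", query, flags=re.IGNORECASE):
        terms = [t for t in re.split(r" and ", part, flags=re.IGNORECASE) if t.strip()]
        if terms:
            groups.append(terms)
    return groups

def matches(record_text: str, groups: list[list[str]]) -> bool:
    """True if any OR group has all of its terms present in the record."""
    # Pad with spaces so a term written as " rice" can match at the start.
    padded = f" {record_text.lower()} "
    return any(all(t.lower() in padded for t in terms) for terms in groups)

print(matches("price index", parse_query("rice")))   # partial match allowed
print(matches("price index", parse_query(" rice")))  # leading space: whole-word start
```

Under these assumptions, a leading space requires the term to begin at a word boundary and a trailing space requires it to end at one, which is how a space can turn a partial match into an exact one.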

Search methodology

It is better to begin the search with a term matching any of the ten fields. For instance, to know whether there is any wheat genotype registered, the search term ‘wheat’ is used, which returns 152 hits. Alternatively, specifying wheat registered in 2008 (‘wheat and 2008’) returns 14 hits. Further narrowing the search with the location of the germplasm (‘wheat and 2008 and flowerdale’) returns three hits. Searching for durum wheat genotypes registered in 2008 (‘wheat and 2008 and durum’) results in a solitary hit. Searching for trait-specific tomato germplasm (‘tomato and high total soluble solids (TSS)’) returns a single hit. The search engine is provided with an illustrative help file (http://www.nbpgr.ernet.in/IRCG/html/IRCG_EXAMPLE.pdf).

Discussion

Search engines are not new in agricultural research; they are an essential tool in several applications. Many genebanks and international organizations maintain databases that facilitate comprehensive germplasm searches. Such Web portals are hosted by, among others, Genesys (www.genesys-pgr.org/), the National Institute of Agrobiological Sciences of Japan (www.gene.affrc.go.jp/databases-plant_search_en.php), the International Rice Research Institute (www.iris.irri.org/germplasm/), the Centre for Genetic Resources of Wageningen (www.cgn.wur.nl/applications/cgngenis/) and the National Plant Germplasm System of the USA (www.ars-grin.gov/npgs/searchgrin.html). Major genebanks around the world have implemented robust databases and Web portals not only to facilitate access to information but also to act as comprehensive genebank management systems. Even a cursory browsing of these excellent online resources shows that using them requires good Internet connectivity. Further, most of these portals employ the latest visualization and data presentation tools, and they can be used to best effect only with adequate computer resources, including the necessary plug-ins and apps.

To enhance the utilization of germplasm, it is imperative that information on germplasm is accessible to field researchers. In reality, though, the absence of adequate infrastructure cuts off field workers who lack the requisite IT facilities from these excellent databases. To overcome this impediment, many organizations including the NBPGR have produced and distributed CDs containing published data on genetic resources. However, (1) this approach always lacked a provision to ‘search’ the data and (2) gaps appeared between the information available online and that available on CDs. We conceived a lightweight specialized search tool, and the concept has been implemented as a portable search engine to access unique crop germplasm registered at the NBPGR. The NBPGR has distributed the portable search engine, along with the data, on CDs to various plant breeders. When new records become available, users can update the Excel file by adding new rows, which keeps the application current. Users cannot, however, add information to existing records.

Conclusion

We present here a concept for developing a simple yet effective search tool as an offline solution for people and places with low-end computational resources. The concept is implemented as a search engine for the NBPGR's registered crop germplasm. The new tool is expected to attract the attention of breeders and other plant researchers, and to facilitate searches and eventually enhance the utilization of plant genetic resources.

Acknowledgements

The authors acknowledge the Director, NBPGR for facilities, Dr R. K. Tyagi for support and encouragement, and Mr Rajeev Gambhir and Mr Vijay Mandal for technical support. Vikas Kumar was financially supported by the Department of Biotechnology, New Delhi in the form of research associateship.

References

Bayross, I (2005) HTML, DHTML, JavaScript, Perl CGI. New Delhi: BPB Publication.

Chaudhri, AB, Rashid, A and Zicari, R (2003) XML Data Management: Native XML and XML-Enabled Database Systems. Reading: Addison Wesley Professional.

Fung, KY (2000) XSLT – Working with XML and HTML. Boston: Addison-Wesley.

Grappone, J and Couzin, G (2011) Search Engine Optimization: An Hour a Day. Indianapolis: John Wiley and Sons.

Kak, A and Tyagi, RK (2010) Inventory of Registered Germplasm (2009–2010). New Delhi: National Bureau of Plant Genetic Resources.

Kak, A, Srinivasan, K and Sharma, SK (2009) Plant Germplasm Registration. New Delhi: National Bureau of Plant Genetic Resources.

Kreinovich, V (2012) Designing, understanding, and analyzing unconventional computation: the important role of logic and constructive mathematics. Applied Mathematical Sciences 6: 629–644.

NBPGR (2006) Plant Germplasm Registration. New Delhi: National Bureau of Plant Genetic Resources.

Singh, AK (2002) Germplasm registration notice. Indian Journal of Plant Genetic Resources 15: 294–305.

Tyagi, RK and Kak, A (2012) Registration of Plant Genetic Resources in India – a review. Indian Journal of Agricultural Sciences 82(8): 651–659.

Wikipedia (2012) Web Search Engine. Wikimedia Foundation. Available at http://en.wikipedia.org/wiki/Web_search_engine.
