RESTful API documentation
For more details on the RESTful API check the documentation and its related OpenAPI specification.
EMBL's European Bioinformatics Institute
Information on how to use the EBI Search service efficiently.
EBI Search [Publications] is a scalable text search engine that provides easy and uniform access to the biological data resources hosted at the European Bioinformatics Institute (EMBL-EBI).
The data resources in EBI Search include: nucleotide and protein sequences at both the genomic and proteomic levels; structures ranging from chemicals to macro-molecular complexes; gene-expression experiments; binary level molecular interactions as well as reaction maps and pathway models; functional classifications; biological ontologies; diseases; and comprehensive literature libraries covering the biomedical sciences and related intellectual property.
EBI Search, based on Apache Lucene, presents search results that are up to date with the data resources and provides easy inter-domain navigation via a network of cross-references. It can be accessed over the web or programmatically using the RESTful Web Services interface. This allows its search and retrieval capabilities to be exploited in workflows and analytical pipelines.
EBI Search is developed and maintained by the Knowledge Management team in collaboration with all the data providers at the EMBL-EBI. For any feedback, please use our support & feedback page.
EMBL-EBI hosts a vast amount of molecular data and other information that is indexed by EBI Search. This includes gene and protein sequences, protein families, structures, gene expression data, protein interactions, pathways and small molecules, to name a few. You can also search across academic literature and patents as well as information about our institute and staff members. In EBI Search boxes you can enter any meaningful term to find relevant information, for example accession numbers/identifiers (such as VAV_HUMAN), gene symbols (for instance tpi1), species or keywords. For more complex queries you can use the EBI Search query syntax.
The results page for a search is organised into three main columns: on the left there is a summary of the hits per category/domain with the available facets displayed underneath; in the middle there is the list of search results; on the right, related data and alternative views are shown. In the left hand column users can filter the search results to the selected category or domain. Once filters are applied, buttons may appear for various operations, for example to save the results or send the results to a tool, and the option to create an RSS alert is displayed.
The navigation summary on the left gives users a compact view of the different categories and domains and easy navigation across them. It provides a means of exploring the search results grouped in relevant subsets and of narrowing down the scope of the results.
Vertical faceted menus, if available, are shown on the left side below the navigation summary. Values across different facets are normally applied conjunctively, whereas values applied within a given facet are applied disjunctively.
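The facet logic described above (AND across facets, OR within a facet) can be illustrated with a small in-memory filter. This is only a sketch of the semantics, not how EBI Search implements faceting; the facet names, values and entries are hypothetical.

```python
def matches_facets(entry_facets, selected):
    """Return True when an entry passes the current facet selection.

    entry_facets: dict mapping a facet name to the set of values on the entry
    selected:     dict mapping a facet name to the set of values ticked by the user
    """
    for facet, wanted in selected.items():
        if not wanted:
            continue  # no value ticked in this facet, so it does not filter
        # Within one facet, any ticked value is enough (disjunction).
        if not (entry_facets.get(facet, set()) & wanted):
            return False  # every facet with a selection must match (conjunction)
    return True


# Hypothetical entries and facet names, for illustration only.
entries = {
    "entry_a": {"organism": {"human"}, "status": {"reviewed"}},
    "entry_b": {"organism": {"mouse"}, "status": {"reviewed"}},
    "entry_c": {"organism": {"human"}, "status": {"unreviewed"}},
}
selected = {"organism": {"human", "mouse"}, "status": {"reviewed"}}

hits = sorted(
    name for name, facets in entries.items() if matches_facets(facets, selected)
)
print(hits)
```

Here entry_c is dropped because it fails the status facet, even though it matches one of the ticked organism values.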
This is a list of the search results found in EBI Search with direct URLs to the data entries in the original portals and cross-references/alternative views of data. If your search query was for a gene or protein, links to summaries are presented above the main search results in the section titled Gene & protein summaries.
These summaries are a useful way to explore the data at EMBL-EBI from the perspective of a gene or protein, for certain key species. A summary collates data from several EMBL-EBI resources and is arranged along the central dogma of molecular biology. The summary page has a stable URL and can be exported/printed as a report. It incorporates information about the gene and its genomic context, its expression within an organism and in response to experimental factors, a wide range of functional information about the protein along with its interaction partners and folded 3D structure. Peer-reviewed publications and patents relevant to the gene or protein are also included. For each gene/protein, a summary comprises five individual sections that you can switch between. These are: gene, expression, protein, protein structure, and literature. You can also switch to another species in order to display equivalent information for a gene's orthologues.
When the user types any text in EBI Search boxes or specifies the value for the query parameter of a RESTful Web Services call, the input is translated into an Apache Lucene query that is then executed to get the search results. The actual query executed is generated following the typical Apache Lucene query syntax in order to provide a generic approach avoiding complex query rearrangements.
Multiple search terms separated by white space are combined by default with AND logic. An input such as glutathione transferase is therefore treated as glutathione AND transferase, and only entries containing both terms will be found.
The default order of results is based on their relevance, i.e. the proximity of the terms in the entries.
In the table below an overview of some useful query syntax elements is presented.
Element | Meaning | Usage | Example | Notes |
---|---|---|---|---|
AND | In addition to | term1 AND term2 | glutathione AND transferase | Matches entries where both glutathione and transferase occur. |
OR | Equivalence | term1 OR term2 | glutathione OR transferase | Matches entries where either glutathione or transferase occurs. |
NOT | Exclusion | term1 NOT term2 | coding NOT fragment | Matches entries containing coding but not fragment. |
* | Wildcard | partialTerm* | gluta* | Matches for instance glutathione, glutamate, glutamic. |
" " | Exact match | "quoted text" | "x-ray diffraction" | Exact matching for entries containing x-ray diffraction. |
( ) | Grouping | (text) | (reductase OR transferase) AND glutathione | |
Field: | Field-specific search | fieldId:term | description:dopamine | Matches for a field description containing dopamine. |
The following characters must be escaped within queries (by placing a ' \ ' before the character) in order to be interpreted correctly:
+ - & | ! ( ) { } [ ] ^ " ~ * ? : \ /
Since Apache Lucene supports regular expression searches (matching a pattern between forward slashes), the forward slash ' / ' has become a special character that must be escaped. For example, to search for cancer/testis use the query cancer\/testis. If special characters are not escaped, the query actually performed may differ from what was expected.
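When queries are built programmatically, the escaping rule above can be applied with a small helper. The following is a minimal sketch; the function name is hypothetical, and it is meant for literal search terms only, since it would also escape characters you intend as operators (such as a trailing wildcard).

```python
# Characters listed above as requiring a backslash escape.
SPECIAL_CHARS = '+-&|!(){}[]^"~*?:\\/'


def escape_term(term):
    """Prefix every special character in a literal term with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in term)


print(escape_term("cancer/testis"))
```

Plain terms such as glutathione pass through unchanged, so the helper can be applied to any user-supplied literal before it is placed in a query.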
Following the aforementioned query syntax, users can easily search and filter results according to data content and characteristics.
A few examples of queries that can be performed using EBI Search are listed below.
As mentioned before, colons are special characters. Some data resources, however, such as Gene Ontology (GO), have colons ' : ' in their main identifiers. When the format [PREFIX]:[number] is used in a search field, issues may arise in query parsing because colons are interpreted as special separators by default. Although some implicit escaping is in place, the advice in case of doubt is to either quote or escape the search terms.
For instance to search for all the cross-references called GO that refer to the entry identifier GO:0005730 you have two equivalent options:
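Following the advice above to either quote or escape such terms, the two forms can be sketched as below. The helper names are hypothetical, and using GO as the field identifier is an assumption based on the wording above; check the field names of the domain you are querying.

```python
def field_query_quoted(field, value):
    """Build fieldId:"value", quoting the whole identifier."""
    return f'{field}:"{value}"'


def field_query_escaped(field, value):
    """Build fieldId:value with the colons inside the value escaped."""
    return f"{field}:" + value.replace(":", "\\:")


print(field_query_quoted("GO", "GO:0005730"))
print(field_query_escaped("GO", "GO:0005730"))
```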
Please consider the following notes:
The order of the results presented on the web pages for a search is mainly based on the Apache Lucene scoring system: hits with closer matches are more relevant.
Although EBI Search can be configured to boost particular domains and/or individual fields, it is recommended to use a boost factor at search time whenever possible. To boost a term at search time, append the caret symbol ' ^ ' and a boost factor (a number) to the term you are searching for. The higher the boost factor, the more relevant the term will be. For instance, to give more weight to the first term in the query prostate AND cancer, you can reformulate the query as prostate^4 AND cancer.
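For queries assembled in code, search-time boosting amounts to appending the caret and a number to a term. A minimal sketch with a hypothetical helper name:

```python
def boost(term, factor):
    """Append a Lucene-style boost factor to a search term."""
    if factor <= 0:
        raise ValueError("boost factor must be a positive number")
    return f"{term}^{factor}"


# Give the first term more weight than the second.
query = " AND ".join([boost("prostate", 4), "cancer"])
print(query)
```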
On the search results page for a category or a domain, a 'Save result' button is shown to save search data in various formats including XML, JSON, TSV, CSV and a simple list of accession numbers/identifiers. Users can use the checkboxes to select the search entries to download as input to other analysis pipelines. If the 'Save result' button is pressed without pre-selected entries, it is assumed that users want to get all available search results. The maximum number of downloadable entries is currently limited to 100. If more than 100 entries are needed, pagination is available through the RESTful Web Services interface.
Search results in some domains such as UniProtKB can be used for launching analysis tools. For instance, it is possible to run a BLAST search with a selected entry from UniProtKB search results. The list of available tools depends on the specific domain, and tool buttons become active or inactive according to the number of selected entries. In UniProtKB, for example, Clustal Omega is available only with multiple entries, whereas BLAST and UniSave are enabled when a single entry is selected. Hovering the mouse cursor over a tool button shows a short description of it. Once a tool button is clicked, the tool web page is opened in a new browser tab or window with the selected entry identifiers pre-filled.
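Paging through a larger result set over the RESTful interface can be sketched as a series of request URLs, one per page. Everything specific here is an assumption to be checked against the RESTful API documentation and OpenAPI specification: the base URL, the parameter names query, format, size and start, the domain identifier uniprot, and a page size of 100. The sketch only builds the URLs and does not contact the service.

```python
from urllib.parse import urlencode

# Assumed endpoint layout; verify against the OpenAPI specification.
BASE_URL = "https://www.ebi.ac.uk/ebisearch/ws/rest"


def page_urls(domain, query, total, page_size=100):
    """Yield one request URL per page needed to cover `total` results."""
    for start in range(0, total, page_size):
        params = {
            "query": query,
            "format": "json",
            "size": page_size,  # assumed name for the page-size parameter
            "start": start,     # assumed name for the zero-based offset
        }
        yield f"{BASE_URL}/{domain}?{urlencode(params)}"


urls = list(page_urls("uniprot", "glutathione AND transferase", total=250))
for url in urls:
    print(url)
```

In practice the total would come from the hit count reported by the first response, and each URL would be fetched in turn with an HTTP client.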
EBI Search allows users to subscribe to query results via RSS feeds. In search result pages query alert links are shown for each category and domain.
Query alerts enable users to stay up-to-date with information and data in particular areas of interest, providing means of monitoring new or updated content.
Alerting systems usually send notifications by email. EBI Search alerts are instead based on the RSS format. Various RSS readers can be used, and modern browsers can also handle and render RSS content. To set up an alert on the web result pages, click the Create alert button and bookmark the resulting URL or save it in an RSS client.
It is possible to check for updates using:
Query alerts can be useful to retrieve the latest publications related to a particular topic in the literature resources, to obtain lists of the latest reviewed proteins in UniProtKB, or to get the latest new or updated macromolecular structures in PDBe.
Example feeds:
EBI Search resources can be accessed programmatically using the RESTful Web Services interface.
Training materials are available as part of EBI Train online:
[1] Madeira F., Pearce M., Tivey ARN., Basutkar P., Lee J., Edbali O., Madhusoodanan N., Kolesnikov A., Lopez R. (2022)
Search and sequence analysis tools services from EMBL-EBI in 2022.
Nucleic Acids Research, published online on April 12, 2022.
DOI: 10.1093/nar/gkac240
[2] Madeira F., Park Y.M., Lee J., Buso N., Gur T., Madhusoodanan N., Basutkar P., Tivey ARN., Potter SC., Finn RD., Lopez R. (2019)
The EMBL-EBI search and sequence analysis tools APIs in 2019.
Nucleic Acids Research, published online on April 12, 2019.
DOI: 10.1093/nar/gkz268
[3] Park Y.M., Squizzato S., Buso N., Gur T., Lopez R. (2017)
The EBI search engine: EBI search as a service—making biological data accessible for all.
Nucleic Acids Research, published online on May 2, 2017.
DOI: 10.1093/nar/gkx359
[4] Squizzato S., Park Y.M., Buso N., Gur T., Cowley A., Li W., Uludag M., Pundir S., Cham J.A., McWilliam H., Lopez R. (2015)
The EBI Search engine: providing search and retrieval functionality for biological data from EMBL-EBI.
Nucleic Acids Research, published online on April 8, 2015.
DOI: 10.1093/nar/gkv316
[5] Valentin F., Squizzato S., Goujon M., McWilliam H., Paern J., Lopez R. (2010)
Fast and efficient searching of biological data resources using EB-eye.
Briefings in Bioinformatics, Advance Access published online on February 11, 2010.
DOI: 10.1093/bib/bbp065
[6] Goujon M., Valentin F., Miyar T., McWilliam H., Lopez R. (2008)
The EB-eye.
EMBnet.news 13.4: 18-21, December 2007.