Help and documentation

Information on how to use the EBI Search service efficiently.

What is EBI Search?

EBI Search [Publications] is a scalable text search engine that provides easy and uniform access to the biological data resources hosted at the European Bioinformatics Institute (EMBL-EBI).

The data resources in EBI Search include: nucleotide and protein sequences at both the genomic and proteomic levels; structures ranging from chemicals to macro-molecular complexes; gene-expression experiments; binary level molecular interactions as well as reaction maps and pathway models; functional classifications; biological ontologies; diseases; and comprehensive literature libraries covering the biomedical sciences and related intellectual property.

EBI Search, based on Apache Lucene, presents search results that are up to date with the data resources and provides easy inter-domain navigation via a network of cross-references. It can be accessed over the web or programmatically using the RESTful Web Services interface. This allows its search and retrieval capabilities to be exploited in workflows and analytical pipelines.

EBI Search is developed and maintained by the Knowledge Management team in collaboration with all the data providers at the EMBL-EBI. For any feedback, please use our support & feedback page.

EBI Search Web interface

What can you search for?

EMBL-EBI hosts a vast amount of molecular data and other information that is indexed by EBI Search. This includes gene and protein sequences, protein families, structures, gene expression data, protein interactions, pathways and small molecules, to name a few. You can also search across academic literature and patents as well as information about our institute and staff members. In any EBI Search box you can enter a meaningful term to find relevant information, for example accession numbers/identifiers (such as VAV_HUMAN), gene symbols (for instance tpi1), species or keywords. For more complex queries you can use the EBI Search query syntax.

Search results page

The results page for a search is organised into three main columns: on the left there is a summary of the hits per category/domain with the available facets displayed underneath; in the middle there is the list of search results; on the right, related data and alternative views are shown. In the left hand column users can filter the search results to the selected category or domain. Once filters are applied, buttons may appear for various operations, for example to save the results or send the results to a tool, and the option to create an RSS alert is displayed.

Summary

The navigation summary on the left gives users a compact view and easy navigation across different categories and domains. It provides a means of exploring the search results grouped in relevant subsets and of narrowing the scope of the results.

Facets

Vertical faceted menus, if available, are shown on the left side below the navigation summary. Values across different facets are normally applied conjunctively, whereas values applied within a given facet are applied disjunctively.
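
The filtering rule described above (AND across facets, OR within a facet) can be sketched as follows. This is only an illustration of the logic, not how EBI Search implements it; the facet names and entries are hypothetical.

```python
def matches_facets(entry, selected):
    """Return True if the entry satisfies the selected facet values.

    entry:    dict mapping facet name -> set of values the entry has
    selected: dict mapping facet name -> set of values the user ticked
    """
    for facet, wanted in selected.items():
        if not wanted:
            continue  # nothing ticked in this facet, so it does not filter
        # OR within a facet: any one of the ticked values is enough.
        if not (entry.get(facet, set()) & wanted):
            return False  # AND across facets: every active facet must match
    return True


# Hypothetical entries and facet names, for illustration only.
entries = [
    {"id": "A", "organism": {"human"}, "status": {"reviewed"}},
    {"id": "B", "organism": {"mouse"}, "status": {"unreviewed"}},
    {"id": "C", "organism": {"yeast"}, "status": {"reviewed"}},
]
selected = {"organism": {"human", "mouse"}, "status": {"reviewed"}}
kept = [e["id"] for e in entries if matches_facets(e, selected)]
print(kept)  # ['A']
```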

List of search results

This is a list of the search results found in EBI Search with direct URLs to the data entries in the original portals and cross-references/alternative views of data. If your search query was for a gene or protein, links to summaries are presented above the main search results in the section titled Gene & protein summaries.

Gene & protein summaries

These summaries are a useful way to explore the data at EMBL-EBI from the perspective of a gene or protein, for certain key species. A summary collates data from several EMBL-EBI resources and is arranged along the central dogma of molecular biology. The summary page has a stable URL and can be exported/printed as a report. It incorporates information about the gene and its genomic context, its expression within an organism and in response to experimental factors, a wide range of functional information about the protein along with its interaction partners and folded 3D structure. Peer-reviewed publications and patents relevant to the gene or protein are also included. For each gene/protein, a summary comprises five individual sections that you can switch between. These are: gene, expression, protein, protein structure, and literature. You can also switch to another species in order to display equivalent information for a gene's orthologues.

Query syntax

Overview

When the user types any text in EBI Search boxes or specifies the value for the query parameter of a RESTful Web Services call, the input is translated into an Apache Lucene query that is then executed to get the search results. The actual query executed is generated following the typical Apache Lucene query syntax in order to provide a generic approach avoiding complex query rearrangements.

Multiple search terms separated by white space are combined with AND logic by default. Therefore an input such as glutathione transferase is treated as glutathione AND transferase, and only entries containing both terms will be found.
The default order of results is based on their relevance, i.e. the proximity of the terms in the entries.

In the table below an overview of some useful query syntax elements is presented.

Element | Meaning               | Usage           | Example                                    | Notes
AND     | In addition to        | term1 AND term2 | glutathione AND transferase                | Matches entries where both glutathione and transferase occur.
OR      | Equivalence           | term1 OR term2  | glutathione OR transferase                 | Matches entries where either glutathione or transferase occur.
NOT     | Exclusion             | term1 NOT term2 | coding NOT fragment                        | Matches entries containing coding but not fragment.
*       | Wildcard              | partialTerm*    | gluta*                                     | Matches for instance glutathione, glutamate, glutamic.
" "     | Exact match           | "quoted text"   | "x-ray diffraction"                        | Exact matching for entries containing x-ray diffraction.
( )     | Grouping              | (text)          | (reductase OR transferase) AND glutathione |
Field:  | Field-specific search | fieldId:term    | description:dopamine                       | Matches for a field description containing dopamine.

Other search engines may provide similar capabilities in their query languages, however the results obtained can differ from EBI Search. These differences are usually related to the way data are searched and the nature of the query systems.
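
As a minimal sketch, the syntax elements in the table can be composed with a few small string helpers. The helper names are hypothetical and not part of EBI Search; they only build query strings and do not escape special characters.

```python
def all_of(*terms):
    """Combine terms with AND (every term must occur)."""
    return " AND ".join(terms)


def any_of(*terms):
    """Combine terms with OR, grouped in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def phrase(text):
    """Wrap text in double quotes for an exact match."""
    return '"' + text + '"'


def field(field_id, term):
    """Restrict a term to one field, using the fieldId:term form."""
    return field_id + ":" + term


def explicit_and(text):
    """Spell out the default AND between whitespace-separated terms."""
    return all_of(*text.split())


print(explicit_and("glutathione transferase"))
# glutathione AND transferase
print(all_of(any_of("reductase", "transferase"), "glutathione"))
# (reductase OR transferase) AND glutathione
print(field("description", "dopamine"))
# description:dopamine
```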
 

Escaping special characters

The following characters must be escaped within queries (by placing a ' \ ' before the character) in order to be interpreted correctly:

+ - & | ! ( ) { } [ ] ^ " ~ * ? : \ /

Since Apache Lucene supports regular expression searches (matching a pattern between forward slashes), the forward slash ' / ' is also a special character that must be escaped. For example, to search for cancer/testis use the query cancer\/testis. If special characters are not escaped, the query actually performed may differ from what was expected.
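
A minimal sketch of this escaping rule in Python is shown below. The function name is hypothetical; it simply puts a backslash before every character from the list above and is meant for literal terms, not for queries that use operators on purpose.

```python
# The special characters listed above; "\\" is a single backslash.
SPECIAL = '+-&|!(){}[]^"~*?:\\/'


def escape_term(text):
    """Put a backslash before each special character in a literal term."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in text)


print(escape_term("cancer/testis"))  # cancer\/testis
```

Escape only the literal part of a query: escaping a whole query string would also neutralise intended wildcards, quotes and field separators.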

Query examples

Following the aforementioned query syntax, users can easily search and filter results according to data content and characteristics.
A few examples of queries that can be performed using EBI Search are listed below.

Identifiers containing colons

As mentioned above, colons are special characters. Some data resources, however, such as Gene Ontology (GO), have colons ' : ' in their main identifiers. When the format [PREFIX]:[number] is used in a search field, query parsing issues may arise because colons are interpreted as field separators by default. Although some implicit escaping is in place, the advice is to quote or escape such search terms whenever in doubt.

For instance to search for all the cross-references called GO that refer to the entry identifier GO:0005730 you have two equivalent options:

  • GO:GO\:0005730
  • GO:"GO:0005730"
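
The two options can be generated programmatically, as in the following sketch. The function names are hypothetical, and the quoted variant assumes the identifier itself contains no double quotes.

```python
def xref_query_escaped(field_id, identifier):
    """Build fieldId:term, escaping colons inside the identifier."""
    return field_id + ":" + identifier.replace(":", "\\:")


def xref_query_quoted(field_id, identifier):
    """Build fieldId:"term", quoting the identifier as a whole."""
    return field_id + ':"' + identifier + '"'


print(xref_query_escaped("GO", "GO:0005730"))  # GO:GO\:0005730
print(xref_query_quoted("GO", "GO:0005730"))   # GO:"GO:0005730"
```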

Notes

Please consider the following notes:

  • Fuzzy queries are deactivated (e.g. gene~0.8).
  • Regular expression queries (e.g. /gene/) are only allowed on explicitly provided fields.
  • Prefix and wildcard queries need at least 3 characters, as in hum*.
  • Range queries can only be applied to a specific field (e.g. publication_date:[2010 TO 2011]).
  • If no field is explicitly indicated, the actual query is executed through an expansion of the search text to all fields for each domain.
  • The execution time for a given query depends on its complexity and scope.
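
Two of these notes lend themselves to a rough client-side check before a query is submitted. The sketch below is a heuristic only, with a hypothetical function name: it splits on white space, ignores quoting and escaping, and assumes that "at least 3 characters" refers to the literal characters of the term, not counting the asterisk.

```python
def check_query(query):
    """Return human-readable warnings for a query string.

    Rough pre-flight check: it does not understand quoted phrases,
    escaped characters or range expressions.
    """
    warnings = []
    for token in query.split():
        # Drop an optional "field:" prefix so only the term is inspected.
        term = token.split(":", 1)[-1]
        if "~" in term:
            warnings.append(token + ": fuzzy queries are deactivated")
        if "*" in term and len(term.replace("*", "")) < 3:
            warnings.append(token + ": wildcard queries need at least 3 characters")
    return warnings


print(check_query("hum*"))      # []
print(check_query("hu*"))       # one warning about the wildcard length
print(check_query("gene~0.8"))  # one warning about fuzzy queries
```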

Relevance

The order of the results presented on the web pages for a search is mainly based on the Apache Lucene scoring system: hits that match the query more closely are ranked as more relevant.

Although EBI Search can be configured to boost particular domains and/or individual fields, it is recommended to apply a boosting factor at search time whenever possible. To boost a term at search time, append the caret symbol ' ^ ' followed by a boost factor (a number) to the term you are searching for. The higher the boost factor, the more weight the term carries in the relevance score. For instance, to give more weight to the first term in the query prostate AND cancer, reformulate it as prostate^4 AND cancer.
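
As a small illustration, a boost can be appended with a helper like the one below. The function is hypothetical; quoting multi-word terms and rejecting non-positive factors are choices made for this sketch, not rules stated by EBI Search.

```python
def boost(term, factor):
    """Append a search-time boost factor to a term using the caret."""
    if factor <= 0:
        raise ValueError("boost factor must be positive")
    if " " in term:
        term = '"' + term + '"'  # keep a multi-word term together
    return f"{term}^{factor}"


query = boost("prostate", 4) + " AND cancer"
print(query)  # prostate^4 AND cancer
```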

Additional features

Save result

On the search results page for a category or a domain, a 'Save result' button lets users save search data in various formats, including XML, JSON, TSV, CSV and a simple list of accession numbers/identifiers. Users can tick checkboxes to select the entries to download, for example as input to other analysis pipelines. If the 'Save result' button is pressed with no entries selected, all available search results are saved. The maximum number of downloadable entries is currently limited to 100. If more than 100 entries are needed, pagination is available through the RESTful Web Services interface.
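
When more than 100 entries are needed, a client has to request the results page by page. The sketch below only computes the page windows. How an offset and a page size are passed to the Web Services is not described on this page, so the (start, size) pairs are an assumption to be mapped onto whatever the API actually expects.

```python
def page_windows(total, page_size=100):
    """Yield (start, size) pairs that together cover `total` results."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    start = 0
    while start < total:
        # The last page may be shorter than page_size.
        yield start, min(page_size, total - start)
        start += page_size


print(list(page_windows(250)))  # [(0, 100), (100, 100), (200, 50)]
```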

Launching a tool

Search results in some domains such as UniProtKB can be used for launching analysis tools. For instance, it is possible to run a BLAST search with a selected entry from UniProtKB search results. The list of available tools depends on the specific domain, and tool buttons become active or inactive according to the number of selected entries. In UniProtKB, for example, Clustal Omega is available only with multiple entries, whereas BLAST and UniSave are enabled when a single entry is selected. Hovering the mouse cursor over a tool button shows a short description of it. Once a tool button is clicked, the tool web page opens in a new browser tab or window with the selected entry identifiers pre-filled.

Query alerts via RSS feeds

EBI Search allows users to subscribe to query results via RSS feeds. In search result pages query alert links are shown for each category and domain.

What are alerts for?

Query alerts enable users to stay up-to-date with information and data in particular areas of interest, providing means of monitoring new or updated content.

How are alerts set up?

Alerting systems usually send notifications by email. EBI Search alerts instead use the RSS format. Various RSS readers can be used, and modern browsers can also handle and render RSS content. To set up an alert on a results page, click the Create alert button and bookmark the resulting URL or save it in an RSS client.
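
For readers who prefer a script to an RSS client, the sketch below shows how the items of an RSS 2.0 feed could be read with the Python standard library. The sample feed is made up for illustration and is not actual EBI Search output; a real alert would be fetched from the URL produced by the Create alert button.

```python
import xml.etree.ElementTree as ET

# Made-up RSS 2.0 document standing in for a downloaded alert feed.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example query alert</title>
    <item><title>Entry one</title><link>https://example.org/1</link></item>
    <item><title>Entry two</title><link>https://example.org/2</link></item>
  </channel>
</rss>"""


def feed_items(rss_text):
    """Return (title, link) pairs for every item in an RSS document."""
    root = ET.fromstring(rss_text)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]


for title, link in feed_items(SAMPLE_FEED):
    print(title, link)
```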

How do I check for updates?

It is possible to check for updates using: 

  • a browser: the RSS feed URL can be stored as a bookmark on a browser. Going back to that bookmark will re-run the query against EBI Search server.
  • an RSS client: stored feeds get re-run and updated every time the user requests to view them.

Examples of alerts

Query alerts can be useful to retrieve the latest publications related to a particular topic in the literature resources, to obtain lists of the latest reviewed proteins in UniProtKB, or to get the latest new or updated macromolecular structures in PDBe.

RESTful Web Services API

EBI Search resources can be accessed programmatically using the RESTful Web Services interface.

  • You can generate your own client from the public WADL or take the RESTful sample clients as a reference implementation.
  • The Web Services API covers everything users can do on the Web interface.
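
As a hedged sketch of a programmatic call, the snippet below builds a request URL with a URL-encoded query parameter. Only the existence of a query parameter comes from this page. The base URL is a placeholder and the domain-as-path-segment layout is an assumption, so consult the WADL for the real endpoints.

```python
from urllib.parse import quote, urlencode

# Placeholder base URL, NOT the real service address.
BASE_URL = "https://example.org/ebisearch-rest"


def search_url(domain, query, **extra):
    """Build a search URL with the query (and any extras) URL-encoded."""
    params = {"query": query}
    params.update(extra)
    # Assumed layout: the domain is the last path segment.
    return f"{BASE_URL}/{quote(domain, safe='')}?{urlencode(params)}"


print(search_url("uniprot", "glutathione AND transferase"))
# https://example.org/ebisearch-rest/uniprot?query=glutathione+AND+transferase
```

Encoding matters for the special characters discussed earlier: a query such as GO:"GO:0005730" contains colons and quotes that must be percent-encoded in a URL.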

Training materials

There are training materials as part of the EBI Train online:

Publications

[1] Madeira F., Pearce M., Tivey ARN., Basutkar P., Lee J., Edbali O., Madhusoodanan N., Kolesnikov A., Lopez R. (2022) 
Search and sequence analysis tools services from EMBL-EBI in 2022.  
Nucleic Acids Research published online on April 12, 2022. 
DOI: 10.1093/nar/gkac240

[2] Madeira F., Park Y.M., Lee J., Buso N., Gur T., Madhusoodanan N., Basutkar P., Tivey ARN., Potter SC., Finn RD., Lopez R. (2019) 
The EMBL-EBI search and sequence analysis tools APIs in 2019.  
Nucleic Acids Research published online on April 12, 2019. 
DOI: 10.1093/nar/gkz268

[3] Park Y.M., Squizzato S., Buso N., Gur T., Lopez R. (2017) 
The EBI search engine: EBI search as a service—making biological data accessible for all.  
Nucleic Acids Research published online on May 2, 2017. 
DOI: 10.1093/nar/gkx359

[4] Squizzato S., Park Y.M., Buso N., Gur T., Cowley A., Li W., Uludag M., Pundir S., Cham J.A., McWilliam H., Lopez R. (2015) 
The EBI Search engine: providing search and retrieval functionality for biological data from EMBL-EBI.  
Nucleic Acids Research published online on April 8, 2015. 
DOI: 10.1093/nar/gkv316

[5] Valentin F., Squizzato S., Goujon M., McWilliam H., Paern J. and Lopez R. (2010) 
Fast and efficient searching of biological data resources using EB-eye.  
Briefings in Bioinformatics Advance Access published online on February 11, 2010. 
DOI: 10.1093/bib/bbp065

[6] Goujon M., Valentin F., Miyar T., McWilliam H. and Lopez, R. (2008) 
The EB-eye  
EMBnet.news 13.4: 18-21 December 2007. 
