Overview

The European Variation Archive is an open-access database of all types of genetic variation data from all species.

All users can download data from any study, or submit their own data to the archive. You can also query all variants in the EVA by study, gene, chromosomal location or dbSNP identifier using our Variant Browser.

We will be adding new features to the EVA on a regular basis, and welcome your comments and feedback.



Search for SNPs

RS ID release 6 is available from our FTP site or through our API. See the release page for details.

Statistics

Short genetic variants studies (<50bp)

Structural variants studies (>50bp)

This web application makes intensive use of new web technologies and standards such as HTML5. Please see the FAQs for further browser compatibility notes.

Submit

Please read our Data Requirements and the Key stages of submission below. All data valid for EVA submission shall be made available via the Study Browser and will be browsable using both the Variant Browser and the EVA API. Variant Effect Predictor annotations shall be available for variants mapped to genome assemblies that are known to Ensembl.

Data submitted to the EVA is brokered to our collaborating databases at NCBI, dbSNP and dbVar. It is therefore unnecessary to submit data to multiple resources.

Data requirements

The EVA accepts all types of precise genetic variants, from any species, provided the following requirements are met:

  1. Data is described in valid VCF file(s). This can be tested prior to submission using the EVA VCF validation suite found here. For help with converting variation data to VCF, please see our help pages.
  2. Data includes sample genotypes and/or allele frequencies
  3. The reference sequence used is INSDC registered, or will be at point of submission. A "reference" can be any of the following, but is not restricted to:
     PLEASE NOTE: Sequence identifiers in the VCF must match those in the reference FASTA file.
  4. If consent was gathered for any individual human genotype data then a consent statement must be completed prior to submission.
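
The note on matching sequence identifiers can be checked locally before submission. The following is a minimal sketch, not the EVA VCF validation suite: the function names are hypothetical, and it assumes an uncompressed VCF whose first tab-separated column holds the sequence identifier and a FASTA file whose identifier is the first whitespace-delimited token of each ">" header line.

```python
import io


def fasta_sequence_ids(fasta_handle):
    """Collect sequence identifiers from FASTA header lines."""
    ids = set()
    for line in fasta_handle:
        if line.startswith(">"):
            header = line[1:].strip()
            if header:
                # Assumption: the identifier is the first token of the header.
                ids.add(header.split()[0])
    return ids


def vcf_sequence_ids(vcf_handle):
    """Collect the CHROM values used in VCF data lines."""
    ids = set()
    for line in vcf_handle:
        # Skip meta-information lines, the column header and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        ids.add(line.split("\t", 1)[0])
    return ids


def unmatched_sequence_ids(vcf_handle, fasta_handle):
    """Return VCF sequence identifiers that are absent from the FASTA file."""
    return sorted(vcf_sequence_ids(vcf_handle) - fasta_sequence_ids(fasta_handle))


# Small in-memory example; real files would be opened with open(path).
example_vcf = io.StringIO(
    "##fileformat=VCFv4.3\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t100\t.\tA\tG\t.\t.\t.\n"
    "2\t200\t.\tC\tT\t.\t.\t.\n"
)
example_fasta = io.StringIO(">chr1 example sequence\nACGT\n>chr2\nACGT\n")
missing = unmatched_sequence_ids(example_vcf, example_fasta)
print(missing)
```

An empty result means every identifier in the VCF has a counterpart in the FASTA file. In this example the VCF uses "2" where the FASTA file uses "chr2", so that identifier is reported.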

Variant accessions (ss# and rs#) and study accessions will only be provided for data which satisfies all data requirements. More details on whether your data is suitable can be found here.

Alternative resources for data not accepted by EVA

  • Submit structural variations that cannot be expressed in VCF(s) to DGVa.
  • Submit variations with sensitive clinical data to EGA.
  • Submit variations with clinically relevant genetic variant data, i.e. data that relates genetic variation(s) with clinical significance values (e.g. pathogenic, benign, etc.), to the ClinVar archive at NCBI.

Key stages of EVA submissions

Prepare

  • Prepare valid VCF file(s), which can be validated prior to submission using the EVA VCF validation suite.
  • Complete a metadata template describing the samples and analyses in your study. Please provide as much metadata as possible since this information is extremely useful for downstream analysis and is directly related to the frequency at which datasets archived at EVA are reused. For reference, here is an example of a completed metadata template.

Please also note that the template requires the submitter to fill in some personal data, which will be used as described in our privacy notice.

Contact

Contact eva-helpdesk@ebi.ac.uk to request a submission. You will receive a custom private FTP account to deposit your data.

Submit

Upload your VCF file(s), metadata template and any associated data file(s) to your private FTP location.

Receive

The EVA aims to process submission requests within two business days. Accession numbers will be sent via email to the submitter upon successful archival of the deposited data.

Feedback

If you have any questions related to the European Variation Archive resource, please contact us.

Follow us on Twitter using @evarchive

API

The general structure of an EVA REST web service URL is one of:

Where:

* version: indicates the version of the API, which defines the available filters and the JSON schema to be returned. Currently there is only version 'v1'.
* category: defines which objects we want to query. Currently there are five categories for variant information queries (variants, segments, genes, files and studies) and two categories for accessioning queries: submitted-variants (to query by SS ID) and clustered-variants (to query by RS ID).
* resource: specifies the resource to be returned, and therefore the JSON data model.
* filters: each specific endpoint allows different filters.

The REST web services have been implemented using the HTTP GET method, since only queries are allowed so far. Several IDs can be concatenated using a comma as the separator.
For more detailed information about the API and its filters, you can visit the project wiki and the Swagger documentation for variant information queries and accessioning locus queries.

Some example queries:

* To fetch all the variants in a segment region:
http://www.ebi.ac.uk/eva/webservices/rest/v1/segments/11:128446-128446/variants?species=hsapiens_grch37


* To fetch all the info for a variant by ID:
http://www.ebi.ac.uk/eva/webservices/rest/v1/variants/rs666/info?species=hsapiens_grch37


* To fetch locus info for a variant by SubSNP ID (SS ID):
https://www.ebi.ac.uk/eva/webservices/identifiers/v1/submitted-variants/99308221


* To fetch locus and type info for a variant by RefSNP ID (RS ID):
https://www.ebi.ac.uk/eva/webservices/identifiers/v1/clustered-variants/17870277


* To fetch associated SubSNP IDs (SS IDs) for a variant by RefSNP IDs (RS IDs):
https://www.ebi.ac.uk/eva/webservices/identifiers/v1/clustered-variants/17870277/submitted


* To fetch all the short genetic variation studies:
http://www.ebi.ac.uk/eva/webservices/rest/v1/meta/studies/all


* To fetch all the structural variation studies:
https://www.ebi.ac.uk/dgva/webservices/rest/v1/meta/studies/all


* To fetch all info of a study:
http://www.ebi.ac.uk/eva/webservices/rest/v1/studies/PRJEB4019/summary


* To fetch all file information of a study:
http://www.ebi.ac.uk/eva/webservices/rest/v1/studies/PRJEB4019/files?species=hsapiens_grch37
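
The variant information URLs above share a common shape, which the following sketch assembles from its parts. The pattern (base, version, category, comma-separated IDs, resource, then filters as a query string) is inferred from the example queries rather than taken from the API documentation, and `build_eva_url` is a hypothetical helper, not part of any EVA client. The second ID in the usage example, rs667, is only a placeholder to show comma concatenation.

```python
from urllib.parse import urlencode

# Base taken from the example queries above.
EVA_REST_BASE = "http://www.ebi.ac.uk/eva/webservices/rest"


def build_eva_url(category, ids, resource, version="v1", **filters):
    """Assemble a query URL of the form base/version/category/ids/resource?filters."""
    if isinstance(ids, str):
        ids = [ids]
    # Several IDs are concatenated using a comma as the separator.
    url = "/".join([EVA_REST_BASE, version, category, ",".join(ids), resource])
    if filters:
        url += "?" + urlencode(filters)
    return url


segment_url = build_eva_url(
    "segments", "11:128446-128446", "variants", species="hsapiens_grch37"
)
multi_id_url = build_eva_url(
    "variants", ["rs666", "rs667"], "info", species="hsapiens_grch37"
)
print(segment_url)
print(multi_id_url)
```

The resulting string can be passed to any HTTP client that issues GET requests. The accessioning queries under /identifiers/ and the /meta/ queries follow different paths and are not covered by this sketch.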

Rate Limiting for Variant Region queries

Rate limiting has been implemented on Variant Region queries in order to ensure fairness when serving multiple client requests. Please therefore limit Variant Region API requests to 5 requests/second. Higher request rates might result in an HTTP 429 (Too Many Requests) response.
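
A client can stay under this rate by spacing out its own requests. The sketch below is one simple way to do that, under stated assumptions: the `Throttle` class is hypothetical, it enforces a minimum interval between calls in a single thread, and the clock and sleep functions are injectable only so the behaviour can be checked without waiting.

```python
import time


class Throttle:
    """Space out calls so that at most `max_per_second` start each second."""

    def __init__(self, max_per_second=5, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None  # earliest time the next call may start

    def wait(self):
        """Block until the next request is allowed to start."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.min_interval


# Demonstration with a fake clock, so the example finishes instantly.
fake_time = {"now": 0.0}
pauses = []


def fake_sleep(seconds):
    pauses.append(seconds)
    fake_time["now"] += seconds


throttle = Throttle(max_per_second=5, clock=lambda: fake_time["now"], sleep=fake_sleep)
for _ in range(3):
    throttle.wait()  # a real client would send one region query after each wait
print(pauses)
```

With the default arguments, calling `Throttle().wait()` before each region query keeps a single-threaded client at or below 5 requests/second. A client that still receives a 429 response should pause before retrying.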

Also, when specifying the "limit" parameter in these region queries, please restrict it to 10000 or lower. Greater values for this parameter are disallowed and will result in an HTTP 400 (Bad Request) response. Please note that this restriction on the "limit" parameter does not mean that large queries are forbidden altogether. It only means that a maximum of 10,000 records will be served in a single request. A client program can therefore use the "limit" parameter in conjunction with the "skip" parameter to "page through" the results from a large region. For example: the following queries can be used to page through the 20,259 results in the variant region 105000001-105500000 in chromosome 1 of the Mouse grcm38 assembly.
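
The paging arithmetic can be sketched as follows. `page_windows` is a hypothetical helper that only computes the "skip" and "limit" values for each request; it does not contact the API, and it assumes the total number of results is already known (20,259 in the mouse example).

```python
MAX_LIMIT = 10000  # largest "limit" value accepted by region queries


def page_windows(total_records, limit=MAX_LIMIT):
    """Yield the skip/limit pair for each request needed to cover all records."""
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError("limit must be between 1 and 10000")
    for skip in range(0, total_records, limit):
        yield {"skip": skip, "limit": limit}


windows = list(page_windows(20259))
for window in windows:
    print(window)
```

For 20,259 results this yields three requests, with "skip" values of 0, 10000 and 20000. The last request returns only the remaining 259 records.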

Help

  • What is the European Variation Archive (EVA)?

    The European Variation Archive (EVA) is EMBL-EBI's open-access genetic variation archive. The EVA accepts submission of all types of precise genetic variants, ranging from single nucleotide polymorphisms to large structural variants, observed in germline or somatic sources, from any organism. The EVA permits access to these data at two distinct levels:

    i) The raw variant data as was submitted to the EVA, via the EVA Study Browser

    ii) The normalised and processed variant data, via the EVA Variant Browser and EVA API

  • What are the EVA normalisation and variant processing steps?
    EVA Variant Level Processing:

    1. Submitted data from the EVA Study Browser
    2. Variants are merged, normalised and annotated for functional consequences and statistical values
    3. Non-human variants are accessioned by the EVA. Human variants are brokered to dbSNP at NCBI and the resulting SS/RS accessions are ingested by the EVA
    4. Data are exposed as JSON objects via either the EVA website GUI or the API

    Normalisation

    Variants submitted to the EVA have been determined by a number of different algorithms and software packages. As a result, the VCF files generated by these differing methodologies describe variants in a number of different ways. The primary processing step of the EVA is to normalise variant representation following two basic rules:

    1. Each variant is shifted to be left-aligned
    2. The Start and End positions represent exactly the range where the variation occurs (which could, in the case of insertions, result in the reference allele being recorded as 'empty')

    Examples of our variant normalisation process can be seen here.
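
    The second rule can be illustrated with a short sketch. This is not the EVA pipeline code: `trim_shared_context` is a hypothetical function that only removes bases shared by the reference and alternate alleles. It strips the shared suffix first, so the remaining difference sits as far left as the supplied alleles allow, but it does not consult the reference sequence and so cannot shift a variant further left through a repeat. How the End position is then recorded is not shown.

```python
def trim_shared_context(start, ref, alt):
    """Remove bases common to both alleles and adjust the start position."""
    # Strip the shared suffix first: this keeps the change as far left
    # as the supplied alleles allow.
    while ref and alt and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Then strip the shared prefix, moving the start position forward.
    while ref and alt and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        start += 1
    return start, ref, alt


insertion = trim_shared_context(100, "GCA", "GCACA")
deletion = trim_shared_context(100, "GCACA", "GCA")
snv = trim_shared_context(100, "A", "G")
print(insertion, deletion, snv)
```

    In the insertion example the reference allele ends up empty, as described in rule 2, and in the deletion example the alternate allele does.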

    Annotation

    Once variants have been normalised, the EVA uses the Variant Effect Predictor (VEP) of Ensembl to annotate variant consequences. The variant consequences are described using Sequence Ontology terms and both the VEP version and Ensembl gene build used are described via the "i" help bubbles on the EVA Variant Browser.

    N.B. Variants that have been mapped to a reference genome sequence that is not supported by Ensembl are not annotated.

    Statistical calculations

    The EVA adopts the classical definition of allele frequency (AF): 'a measure of the relative frequency of an allele at a genetic locus in a given population'. The AF value(s) stored by the EVA for each variant is (are) study specific - i.e. the same variant reported in two distinct studies shall be given two allele frequencies, one for each study. There are two methodologies by which the EVA is able to determine allele frequency values, dependent on the datatype of the study in question:

    Variants associated with genotypes:

    For variants associated with genotypes, the EVA determines the AF values via the calculation:

    AF = (number of alternate allele observations (AC)) / (number of observations (AN))

    The result of this calculation also allows the EVA to store the minor allele frequency (MAF) for each variant (defined as the smaller of the reference and alternate allele frequencies) and the MAF allele (the allele associated with the MAF).
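
    The calculation can be sketched for the simplest case. The function below is a hypothetical illustration, assuming a biallelic variant (one reference and one alternate allele), so that the reference allele frequency is (AN - AC) / AN. When both frequencies are equal it reports the alternate allele as the MAF allele, which is an arbitrary choice made for this sketch.

```python
def allele_frequencies(ac, an, ref_allele, alt_allele):
    """Return (AF, MAF, MAF allele) for a biallelic variant."""
    if an <= 0:
        raise ValueError("AN must be positive")
    if not 0 <= ac <= an:
        raise ValueError("AC must be between 0 and AN")
    af = ac / an                 # alternate allele frequency
    ref_freq = (an - ac) / an    # reference allele frequency
    if af <= ref_freq:
        return af, af, alt_allele
    return af, ref_freq, ref_allele


rare_alt = allele_frequencies(3, 10, "A", "G")
common_alt = allele_frequencies(8, 10, "A", "G")
print(rare_alt)
print(common_alt)
```

    With 3 alternate observations out of 10, the alternate allele G is the minor allele. With 8 out of 10, the reference allele A becomes the minor allele and the MAF is the reference allele frequency.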

    Variants not associated with genotypes:

    For variants that are not associated with genotypes, the EVA is dependent on the AF value(s) estimated from the primary data and provided in the submitted VCF file(s). AF values that are specifically provided in the submitted aggregated VCF file(s) are stored directly. Where no AF is provided, the EVA uses the AC and AN values in the submitted aggregated VCF file(s) to calculate the AF value(s) via the calculation:

    AF = AC / AN

    Population / sample cohort allele frequency values:

    The EVA accepts submission of pedigree files, or structured samples (using "derived_from" and/or "subject" layers), to define populations and cohorts within studies. In cases where such information is associated with variants that have genotypes then the EVA calculates intra-study population/cohort specific AF values via the method described above, with the caveat that the (total number of populations/cohorts):(total number of samples) ratio must be less than 1:10. For studies that do not contain genotypes but instead provide intra-study population/cohort AF values in the submitted aggregated VCF file(s), or AC and AN values, then these are directly stored, or calculated by the EVA using the method described above, again with the caveat that a ratio of 1:10 (total number of populations/cohorts):(total number of samples) must not be exceeded.

    *NB: there is a small number of variants for which the EVA is unable to determine any allele frequency value(s), as the submitted VCF file(s) contain neither genotypes nor AF or AC and AN values. The EVA discourages submission of variants that cannot be associated with an AF.

  • With whom does the EVA collaborate?
    The EVA receives data from direct submissions, and exchanges it with the following international collaborators:


    dbSNP and dbVar

    dbSNP and dbVar maintain the accessions for human variants, while the EVA accessions variants from any non-human species. Human data is routinely exchanged between the resources.

    Database of Genomic Variants Archive

    The sister database of the EVA was initially solely responsible for storing structural variants. This data can now be archived in the EVA, as long as it can be represented in Variant Call Format (VCF).

    Europe PubMed Central

    An archive for life science literature, which can be directly linked to archived data within the EVA using unique study accessions.

    Open Targets

    A public-private partnership aiming to systematically identify and prioritise drug targets. The EVA aids in the automation and manual curation of ClinVar traits for the Open Targets platform.

    Accelerating Medicines Partnership - Type 2 Diabetes

    A knowledge portal enabling observation of human genetic information linked to type 2 diabetes. With the EVA as one of the resources leading the federation component, the software architecture is designed and developed between both resources, to allow submission and consumption across multiple nodes.

    European Nucleotide Archive and BioSamples

    Data archived at the EVA is first brokered to BioSamples and the ENA to provide unique accessions for each submission. Reference sequences used by submitters must also be present within the ENA, which is part of the International Nucleotide Sequence Database Collaboration.

    Ensembl

    Ensembl is a genome browser that provides support for a large diversity of species. The EVA provides non-human variant and genotype information, which can then be viewed and downloaded via the Ensembl web browser and API. Ensembl provides the Variant Effect Predictor (VEP) tool, which the EVA uses to functionally annotate archived variants.

    Global Alliance for Genomics and Health

    An international organization that defines standards for genomic data sharing, with the EVA being one of the driver projects that guide GA4GH development efforts, as well as an active contributor to specifications like the Variant Call Format (VCF).

    Elixir

    An organization that brings together infrastructure and various scientific communities across Europe. Elixir supports both the EVA, as a recommended deposition database, and the FAIR data principles for the accessibility and reusability of genetic data.

  • How can I follow the development of the EVA?
    The following are our GitHub repositories:
    • The EVA VCF validator checks that a file is compliant with the VCF specification. It includes and expands the validations supported by the vcftools suite. It supports versions 4.1, 4.2 and 4.3 of the specification.

    • The EVA pipeline processes VCF files, stores the variation data in a database and post-processes it, in a way that can be later consumed via web services.

    • The EVA web services serve the data generated and stored by the EVA pipeline. They follow the REST paradigm and can be consumed by any external application.

    • The EVA website displays the data served by the EVA REST web services API in a user-friendly way.

    Acknowledgement

    We would like to acknowledge the following software support.