GWAS Catalog

The NHGRI-EBI Catalog of human genome-wide association studies

Frequently Asked Questions

A. General

1. How can I access GWAS Catalog data?

GWAS Catalog data from published studies is available through our search interface. Separate pages are available for each publication, study, trait, variant and gene in the Catalog so that each of these can be explored individually. To get started, search for any text you wish in the search bar, then select a specific page for more information. See Searching in the GWAS Catalog below for further tips on how to find specific types of information, or see the introductory video. GWAS Catalog data can also be downloaded in spreadsheet form. To download full association and study data, see our file downloads page. You can also download specific association data sets from each “Publication”, “Study”, “Trait”, “Variant” and "Gene" page using the “Download Associations” button.

Summary statistics files are available to download from our FTP site via links from the study in the search interface or our designated summary statistics page. Harmonised summary statistics are also available from our summary statistics database via API.

The GWAS Catalog diagram presents a graphical view of the GWAS Catalog data.

We also provide REST API access to the GWAS Catalog data. See Programmatic access below for more information.

From 2020, the GWAS Catalog started accepting submissions for pre-published and unpublished GWAS. For more information, see FAQ A22, “Does the GWAS Catalog include data that isn’t associated with a PubMed-indexed journal publication?”

2. Why did the Catalog move from the NHGRI to the EBI?

From September 2010 to the present, delivery and development of the Catalog has been a collaborative project between EMBL-EBI and NHGRI. In March 2015 the Catalog infrastructure moved to EMBL-EBI to enable delivery of an improved user interface, including ontology driven Catalog searching, and new curatorial infrastructure, supporting improved QC processes. Content available from the NHGRI site was last updated 20 February 2015 and is now frozen. Updated content is available from here. The latest updated download file is now available from here.

3. How can I learn more about using the GWAS Catalog?

Have a look at our Related Resources page for training materials, or see the FAQ sections below for some hints and tips. You can also read a description of our curation methodology, and find a list of publications by the GWAS Catalog.

4. How can I stay informed about new search features and developments?

You can subscribe to our announcement list by sending an e-mail to gwas-announce-join@ebi.ac.uk with subject heading "subscribe". Traffic on this list will be limited to important announcements only so you don’t need to worry about getting bombarded with loads of emails. For queries and user discussion, we have separate mailing lists, gwas-info@ebi.ac.uk to contact the Catalog team and gwas-users@ebi.ac.uk for user discussion (subscribe by emailing gwas-users-join@ebi.ac.uk with subject heading "subscribe"). You can also follow us on Twitter @GWASCatalog.

5. How should I cite the GWAS Catalog?

Please see the About page for citation guidance.

6. What are the terms and conditions for accessing the GWAS Catalog data and code?

Summary statistics are available under CC0 terms unless otherwise indicated - see below for more details. Other GWAS Catalog data can be used under the standard Terms of Use for EBI services which can be found at http://www.ebi.ac.uk/about/terms-of-use. Our code is available under the Apache version 2.0 license.

7. How often are the GWAS Catalog and diagram updated?

New data is added to the GWAS Catalog and diagram every two weeks. Data releases include all downloadable spreadsheets. You can find the date of the most recent data release at the bottom of the Catalog home page. Summary statistics files are made available as soon as possible, even before the study is included in our data release. Therefore, if a manuscript states that summary statistics are available from the Catalog and you cannot find them in the list of studies with summary statistics files or on our FTP site please contact us at gwas-info@ebi.ac.uk and we can give you direct access to the files.

8. A new GWAS paper has just been published. Why is it not in the GWAS Catalog?

Due to the considerable manual curation effort that goes into each publication in the GWAS Catalog, it takes a while for publications to be included in the Catalog after they have been first indexed in PubMed. The GWAS Catalog curation team work as fast as they can to process studies while maintaining the high standard of accuracy our users expect of the Catalog. If your publication of interest is more than a couple of months old, please contact us at gwas-info@ebi.ac.uk to confirm we have identified it and that it is in our curation queue. We will prioritise publications of particular interest to our users.

9. Where can I find the GWAS Catalog infrastructure code?

All our code is freely available from our Github repository.

10. What is the difference between a publication and a study?

A publication refers to an article published in a scientific journal. We use each publication’s unique PubMed ID to keep track of it in the GWAS Catalog. Some publications contain multiple genome-wide association studies with distinct traits, sample cohorts or other unique characteristics. Each of these separate analyses is stored as a study in the Catalog and is given a stable accession number beginning with “GCST”. You can read more about how we curate publications containing multiple analyses in our Curation methods section.

11. How do I find out the accession number for my study of interest?

Each separate study in the GWAS Catalog has an accession number beginning with “GCST”. Study accessions are visible at the top of each “Study” page and in the “Studies” and “Associations” data tables on other pages. Accession numbers are included in the v1.0.2 and v1.0.3 spreadsheets for associations and studies as well as the ancestry spreadsheets. Accession numbers are not provided in the v1.0 spreadsheets as these are legacy formats provided only to support backwards compatibility with the old NHGRI spreadsheet.

12. What is the difference between a trait and a reported trait?

We assign each study in the Catalog one or more standardised trait terms from the Experimental Factor Ontology to represent the disease, phenotype, measurement or drug response under investigation. For more information about how ontologies are used in the Catalog, see our ontology page. Each trait has its own page in the Catalog, where you can see all of the relevant studies, and any variants associated with the trait.

In addition, each study has a reported trait, based on the authors’ description of the phenotype analysed. The reported trait takes the study design into account and is useful for understanding the specific details of the phenotype, especially in complex studies that include SNP-by-environment interactions etc.

13. What is a background trait?

A background trait is a characteristic that is shared by all participants in a study, but is not directly tested in the association analysis. For example, a study of "Allergic rhinitis in asthma" compares cases (individuals with allergic rhinitis) vs controls (individuals without allergic rhinitis), where all samples (both cases and controls) have asthma. The aim of this study would be to identify variants associated with allergic rhinitis - the main trait under investigation. Asthma in this example is a background trait: the study wouldn’t be able to identify variants associated with asthma, but the fact that all participants have asthma may provide important context to the allergic rhinitis associations that are reported. Therefore we display both main and background traits, clearly labelled as such, in the GWAS Catalog.

See FAQ B(iii)-5 for more information about how background traits are displayed in the GWAS Catalog.

14. How do I identify interaction studies in the GWAS Catalog?

The GWAS Catalog contains SNP-by-SNP and SNP-by-environment interaction studies as long as the SNPs analysed meet our criteria of being genome-wide. For both types of study, the term “interaction” is included in the reported trait.

For SNP-by-SNP interaction studies the term “SNP x SNP interaction” is added in parentheses. For SNP-by-environment interaction studies, the environmental component is included in the reported trait. Since July 2018, we have added information to distinguish between the different statistical tests for SNP-by-environment interactions: the 2-degree of freedom test of both the main effect and the interaction term versus the 1-degree of freedom test of just the interaction term. For these recent studies, the reported trait is represented as e.g. “Lung cancer x smoking interaction (1df test)”. Earlier studies do not include the type of test e.g. “Lung cancer (smoking interaction)”.

To identify interaction studies, go to the “Trait” page for either the main phenotype or an interaction term, e.g. “diastolic blood pressure”. You can then use the search box in the “Associations” or “Studies” tables to search for “interaction”.

You can also search for “interaction” in the download spreadsheet.
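As a minimal sketch of the spreadsheet search described above, the following Python keeps rows whose reported trait mentions "interaction" and extracts the test type where one is given. The column name DISEASE/TRAIT and the sample rows are assumptions for illustration; check the header of the file you actually downloaded.

```python
import csv
import io
import re

# Assumed name of the reported-trait column; confirm against your download.
TRAIT_COLUMN = "DISEASE/TRAIT"

# Matches the "(1df test)" or "(2df test)" suffix used for studies curated since July 2018.
DF_TEST = re.compile(r"\((\d)df test\)")


def interaction_rows(tsv_text):
    """Return (reported_trait, test_type) for rows mentioning 'interaction'."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    found = []
    for row in reader:
        trait = row.get(TRAIT_COLUMN) or ""
        if "interaction" in trait.lower():
            match = DF_TEST.search(trait)
            # Earlier studies carry no test type, so report None for them.
            found.append((trait, match.group(1) + "df" if match else None))
    return found


# Illustrative rows only, modelled on the examples in the text above.
sample = (
    "DISEASE/TRAIT\n"
    "Lung cancer x smoking interaction (1df test)\n"
    "Lung cancer (smoking interaction)\n"
    "Lung cancer\n"
)
results = interaction_rows(sample)
```

Matching on the lower-cased trait text catches both the newer and the earlier naming styles; the test type is simply absent for the earlier ones.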

15. How do I identify targeted and exome array studies in the GWAS Catalog?

Targeted/exome array studies included in the Catalog are indicated by a small “target” icon. This icon appears in the search results next to any publication that includes a targeted array study. It is also displayed in the “Studies” table (on the “Publication”, “Trait”, “Variant” or "Gene" page), in the “Study accession” column.

Targeted/exome array studies are identifiable in the download file from the presence of an extra column displaying the field “Genotyping technology (additional array information)”, as described in our download section.

16. Why are only certain targeted or exome array studies available in the GWAS Catalog?

We are working on expanding the scope of the GWAS Catalog to include large-scale targeted/non-genome-wide arrays, including the Metabochip, Immunochip and Exome array. Feedback from our users has indicated a high demand for studies of this type to be included in the Catalog. This is currently in a pilot phase where prioritisation of targeted and exome array studies for inclusion in the Catalog is by 1) relevance of the trait analysed 2) user request.

17. Does the GWAS Catalog include CNV studies?

CNV studies are not currently within the scope of the GWAS Catalog for literature identification and curation of associations. However, we can accept submissions of summary statistics from CNV studies and will make files and metadata available for these studies.

18. How can I separate OR from beta in the associations download?

It is not currently possible to download the entire Catalog with OR and beta in separate columns. However, betas and ORs can be distinguished as all betas have a unit and direction e.g. “unit increase” or “cm decrease”. In the download, this is included in the “95% CI (TEXT)” column.

Alternatively, if you download search results directly from the Associations table on the web interface (using the download button to the top right of the table), the file will replicate what you see in the table, with OR and beta in separate columns.
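The unit-and-direction rule for the full download can be sketched in Python as follows. This is an illustration under stated assumptions, not an official parser: the sample strings are invented, and the function only looks for the words "increase" or "decrease" in the “95% CI (TEXT)” value.

```python
import re

# Betas carry a unit and a direction in the "95% CI (TEXT)" value,
# e.g. "... unit increase" or "... cm decrease"; odds ratios carry neither.
DIRECTION = re.compile(r"\b(increase|decrease)\b", re.IGNORECASE)


def effect_type(ci_text):
    """Classify an association as 'beta' or 'OR' from its CI text.

    Returns None when the CI text is empty, because the rule cannot be
    applied without it.
    """
    text = (ci_text or "").strip()
    if not text:
        return None
    return "beta" if DIRECTION.search(text) else "OR"


# Invented example values for illustration.
examples = ["[1.12-1.30]", "[0.021-0.045] unit increase", "[0.5-1.1] cm decrease", ""]
labels = [effect_type(value) for value in examples]
```

Rows that come back as None need to be checked against the source publication rather than assumed to be odds ratios.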

19. Is each association in the GWAS Catalog unique?

Each association in the Catalog comes from a unique analysis. However, certain cohorts are repeatedly analysed in slightly different ways so the same associations may appear multiple times in the Catalog. Similarly, the association results from component groups of a meta-analysis may be represented in the Catalog, as well as the association results from the meta-analysis itself. Users can check the sample number and ancestry as a clue to “duplicated” analyses, but we encourage users to examine the source publications further for more detail.

20. Are results from sequencing-based association analyses included in the GWAS Catalog?

We welcome user submissions of summary statistics for sequencing-based association analyses, so some sequencing-based associations will appear in the Catalog. However, our full manual curation process is only routinely applied to array-based association analyses at the current time. We are investigating expanding the scope to include more sequencing-based association studies. You can read the results of our review of the sequencing-based association literature in McMahon et al, Sequencing-based genome-wide association studies reporting standards, Cell Genomics (2021), and see our list of studies and curated metadata or give us your input on our pilots page.

21. How does the GWAS Catalog represent extremely small p-values?

Some publications may report a GWAS association p-value of 0 due to the limits of methods or analysis software to compute very small numbers. Where authors are unable to provide the precise p-value we will extract the maximum threshold provided to us, e.g. <1e-300 will be extracted as 1e-300. However, it should be noted that the true p-value may be much smaller.
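The short Python sketch below illustrates both points: why software can report a p-value of 0, and how a threshold such as <1e-300 can be stored as a number while remembering that it is only an upper bound. The function name and return shape are hypothetical, chosen for this example.

```python
def parse_reported_p(text):
    """Return (value, is_upper_bound) for a reported p-value string.

    A leading "<" marks a threshold: the stored value is the threshold
    itself and the true p-value may be much smaller.
    """
    cleaned = text.strip()
    is_upper_bound = cleaned.startswith("<")
    value = float(cleaned.lstrip("<").strip())
    return value, is_upper_bound


# Double-precision floats cannot represent numbers this small, so they
# underflow to exactly 0.0; this is one way a p-value of 0 can arise.
underflowed = float("1e-400")

threshold = parse_reported_p("<1e-300")
exact = parse_reported_p("3e-8")
```

When ranking or filtering associations, treating upper-bound values as "at most this large" avoids reading too much into ties at the threshold.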

22. Does the GWAS Catalog include data that isn’t associated with a PubMed-indexed journal publication?

From 2020, the GWAS Catalog started accepting submissions for unpublished GWAS (including pre-published GWAS i.e. data associated with a pre-print or article in press). Unpublished GWAS metadata is available via the study pages for each accession number and in more detail in the unpublished download files. Unpublished summary statistics are available via our FTP site and summary statistics page as well as via the study pages. Unpublished data is made available exactly as submitted by authors and has not been reviewed by our curators. Upon publication it is curated, annotated, extended to include top associations and incorporated into our main database.

23. How can I find the mouse orthologs in IMPC for the genes in the GWAS Catalog?

Each GWAS Catalog gene with a mouse ortholog in IMPC is linked via a button on the gene page. Orthology predictions are provided by IMPC’s reference database which is rebuilt every week to include the latest HCOP ortholog relationships and data from MGI. For more information on ortholog mapping, refer to IMPC’s documentation and publication. Where no ortholog has been established, the button is not displayed.

24. What are "top associations"?

Top associations are the associations that appear in the Associations tables of the Catalog and have been manually curated from published articles. These are filtered via our curation process to include only those:

  • significant (p<1e-5) in all stages of an analysis

  • either described as independent by the authors, or the peak association within 100kb range

Top associations are distinguished from full summary statistics which contain all associations discovered in a GWAS, regardless of independence or significance.
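To make the two criteria above concrete, here is a rough Python sketch that reduces a set of summary-statistics-style records to "top" ones. It is not the curators' actual procedure, which is manual. It assumes a single p-value per record (the real criterion covers all stages of an analysis), interprets "within 100kb range" as plus or minus 100,000 bp on the same chromosome, and uses hypothetical field names (id, chrom, pos, p, independent).

```python
def top_associations(associations, p_max=1e-5, window_bp=100_000):
    """Keep significant records that are peaks or flagged as independent."""
    significant = [a for a in associations if a["p"] < p_max]
    kept = []
    # Visit the strongest signals first so each window keeps its peak.
    for assoc in sorted(significant, key=lambda a: a["p"]):
        near_stronger_signal = any(
            k["chrom"] == assoc["chrom"]
            and abs(k["pos"] - assoc["pos"]) <= window_bp
            for k in kept
        )
        if assoc.get("independent") or not near_stronger_signal:
            kept.append(assoc)
    return kept


# Invented records for illustration.
records = [
    {"id": "a", "chrom": "1", "pos": 1_000_000, "p": 1e-12},
    {"id": "b", "chrom": "1", "pos": 1_050_000, "p": 1e-7},
    {"id": "c", "chrom": "1", "pos": 1_060_000, "p": 1e-6, "independent": True},
    {"id": "d", "chrom": "2", "pos": 500, "p": 1e-3},
    {"id": "e", "chrom": "1", "pos": 2_000_000, "p": 1e-9},
]
kept_ids = [a["id"] for a in top_associations(records)]
```

In this example record b is dropped because it sits within the window of the stronger signal a, c survives because it is flagged as independent, and d fails the significance threshold.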

B. Searching the GWAS Catalog

1. How do I search the GWAS Catalog?

Type your query, e.g. “breast carcinoma”, into the search box and hit return or click the search icon. You can type any text you wish into the search bar. The search then returns any publications (marked with the letter P), variants (V) or traits (T) in the Catalog that contain an exact string match within a number of data fields. You can use the “Refine search results” box on the left to show only publications, variants or traits. See B(i-iv) below for more details on how to search for each specific document type.

B(i). Searching by publication

1. How do I search by publication?

You can find a publication by searching for the PubMed ID, any author or any word within the publication title. Note that all authors associated with a publication are included in our database, so searching for an author name will return all publications featuring that author, not only first author publications. This means that an author name can return a very large number of results. If you are looking for a specific publication we recommend searching by PubMed ID.

2. I am searching for the author of a publication. Why do my search results also contain traits?

The search returns all publications, traits and variants that contain a match for the text string entered across all fields, so if your search term is for example "Parkinson", you will find publications with an author named Parkinson as well as publications with “Parkinson” in the title and traits related to Parkinson’s disease. If you are looking for a specific publication we recommend searching by PubMed ID.

B(ii). Searching by variant or gene

1. How do I search by variant?

You can find a variant (or single nucleotide polymorphism, SNP) by searching for an rsID, a genomic region or a gene mapped to that variant. As mapped genes and genomic regions can return a large number of results, we recommend searching by rsID if you are looking for a specific variant.

See Genomic mappings below for details of how we map variants to genes.

2. How do I search by gene?

You can search for a gene in the main search bar, e.g. STAT4. This will return any matching genes, as well as variants annotated with that gene by our mapping pipeline. The results may also include publications with the gene name in the title.

The "Gene" page provides a list of all associations mapped to that gene as well as other gene-specific data. See Genomic mappings below for details of how we map variants to genes. Note that this may not always match the gene reported by authors for a given variant, as they may use different criteria.

Author-reported genes can be found in the full data download. Opening the file in Excel and applying a filter for your gene of interest to the REPORTED GENE(S) column will enable you to extract all associations in that gene.
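The same filter can be applied without Excel. The Python sketch below reads tab-separated text and keeps rows whose REPORTED GENE(S) value lists the gene of interest. It assumes that multiple genes in that column are separated by commas; the SNPS column name and the sample rows are placeholders invented for this example.

```python
import csv
import io

GENE_COLUMN = "REPORTED GENE(S)"


def rows_for_reported_gene(tsv_text, gene):
    """Return rows whose author-reported gene list contains `gene`."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    wanted = gene.strip().upper()
    matches = []
    for row in reader:
        # Assumption: several reported genes are comma-separated in one cell.
        genes = {g.strip().upper() for g in (row.get(GENE_COLUMN) or "").split(",")}
        if wanted in genes:
            matches.append(row)
    return matches


# Placeholder rows for illustration.
sample = (
    "SNPS\tREPORTED GENE(S)\n"
    "variant_1\tSTAT4\n"
    "variant_2\tSTAT1, STAT4\n"
    "variant_3\tSTAT1\n"
)
stat4_rows = rows_for_reported_gene(sample, "STAT4")
```

Comparing whole gene symbols rather than substrings avoids accidental hits, for example a search for STAT matching STAT4.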

You can also use our REST API to return associations for a specific gene or genomic region.

3. How do I search by genomic region?

You can search by genomic region using the format chromNumber:bpLocation-bpLocation, for example 6:16000000-25000000. You can also search using cytogenetic nomenclature, for example 2q37.1. These searches will return a list of genes and variants within the region.
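If you build region queries in a script, a small validator can catch malformed strings before they are submitted. This Python sketch checks the chromNumber:bpLocation-bpLocation format described above; accepting X, Y and MT as chromosome names is an assumption, and cytogenetic band queries such as 2q37.1 are deliberately not handled.

```python
import re

# chromNumber:bpLocation-bpLocation, e.g. 6:16000000-25000000
REGION = re.compile(r"^(?P<chrom>[0-9]{1,2}|X|Y|MT):(?P<start>\d+)-(?P<end>\d+)$")


def parse_region(query):
    """Return (chromosome, start, end) or raise ValueError."""
    match = REGION.match(query.strip())
    if match is None:
        raise ValueError("not in chrom:start-end format: {!r}".format(query))
    start, end = int(match["start"]), int(match["end"])
    if start > end:
        raise ValueError("start is greater than end: {!r}".format(query))
    return match["chrom"], start, end


region = parse_region("6:16000000-25000000")
```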

B(iii). Searching by trait

1. How do I search by trait?

To find a trait, type the name of any disease, phenotype, measurement or drug response. The search will return traits matching your search term, synonyms of traits matching your search term and child traits of both of these e.g. a search for “cancer” would also return all cancer subtypes. Note that it will also return publications where the title includes your search term.

If you can’t find your trait of interest, it may be that it is included in the GWAS Catalog under a different name. For example, searching for “general cognitive ability” will return the synonym “intelligence”, which is how that trait is stored in the GWAS Catalog. Note that the search bar offers suggestions as you type, including possible synonyms for your trait of interest.

2. When searching for a trait, do the results include all publications for that trait?

A publication is only returned if the publication title, authors or PubMed ID contain your search term. If you want to find all of the studies on a particular trait, first go to the “Trait” page and then look at the “Studies” table.

3. When I search for a certain trait why are other traits returned?

Sometimes it may not be immediately obvious why your search has returned a particular trait.

In addition to exact string matches and synonyms for your search term, the search results may also include more specific child terms of a trait that matches your search. This can be useful, for example, if you want to look for subtypes of a particular disease, e.g. searching for “thyroid disease” returns the traits “Hashimoto’s thyroiditis” and “Graves disease”, both types of thyroid disease. Hierarchical relationships between traits are based on the Experimental Factor Ontology (EFO). For more information about how ontologies are used in the Catalog, see our ontology page.

The search results may also contain traits that have been studied together with your trait of interest in some way, for example in a GWAS for multiple traits or for a compound trait. For example, searching for “asthma” also returns the trait “response to bronchodilator”. This is because the GWAS Catalog includes a study on response to bronchodilator in a sample of people who all have asthma. See FAQ B(iii)-4 and -5 to find out how more complicated phenotypes are represented in the Catalog.

You may also find a publication in the search results, if the publication title contains your trait of interest.

4. How are multiple or compound traits represented in the Catalog?

Some studies are mapped to more than one trait, usually because those studies involve a more complex definition of the phenotype under investigation. Currently, the best way to understand the relationship between multiple traits in the same study is to look at the reported trait, which is based on the phenotype description used in the original paper.

Where a study has combined groups of individuals with different traits in the same analysis, this is indicated by the use of the word “or” in the reported trait. For example, if individuals with bipolar disorder and individuals with schizophrenia were compared to controls in the same analysis, the reported trait would be "bipolar disorder or schizophrenia”. The study would be mapped to two traits from the ontology: “bipolar disorder” and “schizophrenia”.

Where a study includes individuals each having multiple traits, this is indicated by the word “and” in the reported trait. For example, if individuals diagnosed with bipolar disorder who show binge-eating behaviour were compared to controls, the reported trait would be “bipolar disorder and binge eating”. The study would be mapped to two traits from the ontology: “bipolar disorder” and “binge eating”.

Please note: due to issues of scale with the increasing number of studies associated with biobanks, where reported traits include the words “UKB data field” or ICD codes we cannot guarantee these follow our standard naming conventions as they may have been extracted unedited from the paper. We recommend that users refer to the source of the code (e.g. https://biobank.ndph.ox.ac.uk/ukb/index.cgi) to confirm details in this case.

5. How are background traits represented in the Catalog?

A background trait is a characteristic that is shared by all participants in a study, but is not directly tested in the association analysis. See FAQ A13 for a more detailed introduction to background traits.

Since July 2021, we present Experimental Factor Ontology terms for any background traits in a separate field to the main trait. This can be seen on every Study page, as well as in each Studies and Associations data table. Previously, both traits were displayed but they were not as straightforward for users to distinguish. The reported trait continues to include both components in a single description, usually written as "[main trait] in [background trait]", e.g. "Allergic rhinitis in asthma". For more on the difference between traits and reported traits see FAQ A12 above.

The Trait page, by default, only displays studies and associations where the currently-viewed trait is the main trait of interest in the GWAS. If you would like to also include studies and associations where that trait is a background trait, please check the "Include background traits data" box above the data tables. This option will also update the association plot at the bottom of the page to include background trait data.

In the full GWAS Catalog spreadsheet downloads, the background trait is only included in the most recent version (v1.0.3), under the MAPPED BACKGROUND TRAIT and MAPPED BACKGROUND TRAIT URI columns. Earlier versions of the spreadsheet include only the main trait.

Note that the GWAS Catalog API currently returns only main traits; however, we hope to include an option to access studies and associations by background trait in the future.

6. Why did my search return no results even though I am sure there used to be a trait like this in the Catalog?

Our search functionality searches for exact text string matches, so if you accidentally type "beast cancer" instead of "breast cancer", you will not get any results. Equally, "metabolic disorder" won’t return any results while "metabolic disease" will return a lot. The search bar provides an autocomplete function that will suggest possible search terms as you type. Alternatively, try varying your search term or searching for your term in EFO to get an idea of what other terms might be available.

B(iv). Searching by study

1. Can I search for a study?

Individual studies within a particular publication are not currently displayed in the search results. To find a study, search for a publication, trait or variant and then go to the “Studies” table to click through to the linked studies.

If you already know the accession number of a particular study (beginning with “GCST”), you can search for this on the homepage to return the publication containing that study.

2. How can I search for targeted and exome array studies and associations?

You can enter the genotyping technology of your interest in the search bar, e.g. “targeted genotyping array”, “exome genotyping array”. This will return any publication that uses that specific genotyping technology.

C. Exploring specific pages

1. How do I download a list of associations for a given trait, publication, study or variant?

There are two ways to download association data on the specific “Trait”, “Publication”, “Study”, “Variant” or "Gene" pages. The “Download Associations” button downloads a spreadsheet (.tsv) of the full data for every association displayed on the current page. This data is formatted in the same way as the full Catalog spreadsheets available from our file downloads page and includes study information for each association.

The specific pages also contain “Studies” and “Associations” tables, which display a condensed view of the data with fewer columns. These can be downloaded in .csv format using the “export” button in the top righthand corner of each table. Columns can be added or removed from this table using the “Add/Remove Columns” button – only the selected columns will be included in the exported table.

2. How do I use the association plot on the “Trait” page?

The association plot displays all associations in the Catalog for the selected trait. Individual associations are plotted as circles and are coloured according to the same broad trait categories that are used in the GWAS Catalog Diagram (see the legend in the top left of the plot). You can mouse over or click on one of the circles for more information about a particular variant. You can also download an image of the plot. The plot is constructed using the LocusZoom plugin.

3. How do I use the linkage disequilibrium (LD) plot on the “Variant” page?

The LD plot integrates data from Ensembl with GWAS Catalog data. It shows the degree of linkage disequilibrium between the selected variant and other variants within a 50kb window. You can select the population of interest and LD measurement (r2 or D’) using the drop-down menus and set your own LD threshold. You can also download the data shown in the plot as a .tsv file.

LD information between a variant of interest and the surrounding variants can be accessed programmatically using the Ensembl REST API (http://rest.ensembl.org/documentation/info/ld_pairwise_get) where you can specify a variant ID, a window size of the region surrounding the variant, a population and a cut-off for the calculation results. For a dataset with more than one variant of interest, several independent calls to the same endpoint can be made.
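A minimal Python sketch of making one independent call per variant is shown below. The URL pattern ld/human/{variant}/{population} and the window_size, r2 and d_prime parameters reflect our understanding of the Ensembl REST LD-by-variant endpoint and should be checked against the Ensembl documentation before use. The variant ID and population name in the example are placeholders.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL for the Ensembl REST service.
ENSEMBL_REST = "https://rest.ensembl.org"


def ld_url(variant_id, population, window_kb=None, r2=None, d_prime=None):
    """Build the request URL for LD around one variant (assumed URL pattern)."""
    path = "/ld/human/{}/{}".format(
        urllib.parse.quote(variant_id),
        urllib.parse.quote(population, safe=":"),
    )
    params = {}
    if window_kb is not None:
        params["window_size"] = window_kb
    if r2 is not None:
        params["r2"] = r2
    if d_prime is not None:
        params["d_prime"] = d_prime
    query = "?" + urllib.parse.urlencode(params) if params else ""
    return ENSEMBL_REST + path + query


def fetch_ld(variant_ids, population, **options):
    """Make one request per variant; requires network access."""
    results = {}
    for variant_id in variant_ids:
        request = urllib.request.Request(
            ld_url(variant_id, population, **options),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request) as response:
            results[variant_id] = json.load(response)
    return results


# Placeholder variant ID and population name for illustration.
example_url = ld_url("rs123", "1000GENOMES:phase_3:GBR", window_kb=50, r2=0.8)
```

Only ld_url runs when the block is executed; fetch_ld is defined but not called, so nothing is sent over the network until you call it with your own variant list.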

4. How do I view, search and download information from the "Associations" and "Studies" tables?

These tables can be found on each of the specific "Trait", "Variant", "Gene", "Publication" and "Study" pages. The data displayed is highly customisable. You can refine the results by typing into a) the search box above the table, to search all columns, or b) the filter boxes at the top of each column, to search only within a specific column. You can customise the columns displayed using the "Add/Remove Columns" button. You can sort by clicking on the column header. Finally, you can use the "Export data" button to download the table as a csv file. Note that the csv file will contain the data displayed in the table, taking into account any changes you have made to the rows, columns displayed or sorting.

D. Diagram

1. How do I display SNP information for a given dot?

To view all the SNPs associated with any trait in a given location, simply click on the trait (coloured circle) you are interested in. An interactive pop-up will display the SNPs for that trait, the p-value for each SNP-trait association, the study in which the association was identified, the trait assigned by the GWAS Catalog curators and the EFO term the SNP-trait association is mapped to. The SNP, disease trait, EFO term and study fields are interactive, linking to a search of the full Catalog for that particular field. SNP, EFO term and study also link out via the external link icon to Ensembl, EFO and UKPMC, respectively. Clicking outside the pop-up automatically closes the current pop-up. Alternatively, close the pop-up by clicking on the cross in its top right corner or on the "Close" button.

2. How do I filter the diagram?

The full diagram can be filtered by typing a trait into the search box to the left of the diagram and hitting "Enter" or clicking the "Apply" button. Once you have typed 3 to 4 characters, the text box will offer auto-completed suggestions for your search based on EFO traits. You can navigate the suggestion list using your mouse or the up and down keys.

Once you have filtered the diagram by a selected trait, all other traits will be faded to a lower visibility to highlight the desired trait. A counter in the top left corner of the diagram will indicate how many dots on the diagram correspond to your search term. Searchable traits are based on EFO categories and may not coincide with GWAS Catalog reported traits, e.g. a search for "hair color" will highlight SNP-trait associations labelled hair color as well as "black vs blond hair" and "red vs non-red hair".

3. How do I display the legend?

A legend of the colour scheme is available to the left of the diagram. The legend includes a count of the number of dots of each colour in the diagram. You can hide the sidebar to increase the amount of screen space for the diagram by clicking on the little chevron icon at the top of the sidebar. Click on any item in the legend to filter the diagram by that category. This does not work for any of the "other"-type categories (other measurement, other disease and other trait). Please note that some traits, in particular some diseases, belong to multiple categories, e.g. Crohn’s disease is both a digestive system disease and an immune system disease. Each dot on the diagram can only be assigned one colour, and colour assignment is determined by a term’s most specific ancestor in EFO (the ancestor that itself has the greatest number of ancestors), so it is possible to find dots of a different colour when searching, for example, for "digestive system disease".
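The colour rule can be illustrated with a toy example in Python. The hierarchy below is invented purely to show the mechanics and does not reflect the real structure of EFO; the function names are hypothetical. For each term, the sketch collects its ancestors, keeps those that are colour categories, and picks the one that itself has the most ancestors.

```python
def ancestors(term, parents):
    """Return every ancestor of `term` in a child -> parents mapping."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def colour_category(term, parents, categories):
    """Pick the category ancestor that itself has the most ancestors."""
    candidates = ancestors(term, parents) & set(categories)
    if not candidates:
        return None
    # Sorting first makes ties resolve in a predictable (alphabetical) way.
    return max(sorted(candidates), key=lambda c: len(ancestors(c, parents)))


# Invented toy hierarchy, NOT the real EFO structure.
toy_parents = {
    "Crohn's disease": ["digestive system disease", "immune system disease"],
    "digestive system disease": ["disease"],
    "immune system disease": ["immune system disorder"],
    "immune system disorder": ["disease"],
    "disease": [],
}
toy_categories = ["digestive system disease", "immune system disease"]
chosen = colour_category("Crohn's disease", toy_parents, toy_categories)
```

In this toy hierarchy the term receives the immune system colour because that category sits deeper in the tree, which is how a dot can appear in a different colour from the category you searched for.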

4. How do I display labels?

Chromosomes and traits (coloured circles) have labels that display when hovering the mouse pointer over a given element. The displayed labels correspond to the EFO term mapped to this SNP.

5. How do I zoom and move the diagram?

The diagram was designed to have GoogleMaps-style interactivity. There are two ways to zoom in and out. The easiest option is to use the scroll wheel on the mouse or touch pad on a laptop. Scrolling up zooms in and scrolling down zooms out. This feature may not work with all touch pads. Alternatively, the top right-hand corner of the diagram features a zoom bar which can be used to generate exactly the same effect, by dragging the little square left or right along the bar with the mouse pointer or clicking the plus and minus buttons. The diagram can be moved around the viewing area by clicking on any part of the diagram with the left mouse button and, holding the mouse button down, dragging the diagram around the screen until the desired part is visible. This feature is particularly useful for centring the diagram on a specific location at higher zoom levels.

6. How should I cite the diagram?

Please see the About page for citation guidance.

7. How do I download the diagram?

Download options are listed here.

8. Can I download a filtered version of the diagram?

The diagram can be filtered by trait to present only a subset of specific associations. At present we don’t have a native function for downloading diagrams filtered by trait. We suggest taking a screenshot if a high resolution image is not required.

As a workaround, a high resolution image can be created by saving the web-displayed image as an .svg (scalable vector graphics) file. These instructions are for Firefox; the steps differ slightly in other browsers. Right click on the filtered diagram and click 'inspect element'. In the inspector window, hover over the svg element (this starts <svg), right click and select 'copy - outer html'. Paste this text into a text editor and save. Change the file extension from .txt to .svg. You will then be able to open the image in an image processing program (e.g. Inkscape or Illustrator). From there you can convert it to your preferred format.

E. Genomic mappings

1. How is the genomic annotation for each SNP provided?

We use an Ensembl mapping pipeline that provides the genomic annotation (chromosome location, cytogenetic region and mapped genes), alongside the curated content in the GWAS Catalog. The mapping information is updated at every Ensembl release, every 2-3 months.

The annotation available on our online search interface includes any Ensembl genes in which a SNP maps, or the closest upstream and downstream gene within 50kb. More detailed mapping information is available through our REST API including all Ensembl and RefSeq genes mapping within 50kb upstream and downstream of each GWAS Catalog variant.

2. Which genome build is the Catalog on?

Data in the GWAS Catalog is currently mapped to genome assembly GRCh38.p14 and dbSNP Build 156.

3. How can I access the GWAS Catalog data on alternative genome builds?

You can use the Ensembl API to map the SNP rsIDs in the GWAS Catalog to previous genome builds. For GRCh37 this is available at http://grch37.rest.ensembl.org/. The variation call http://grch37.rest.ensembl.org/documentation/info/variation_id can be used to retrieve the dbSNP mapping of all SNPs on GRCh37. Alternatively, you can also use https://www.ncbi.nlm.nih.gov/genome/tools/remap.
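
As a minimal sketch of the Ensembl route, the Python below builds a request to the GRCh37 variation endpoint and pulls the GRCh37 positions out of the JSON response. The helper names are hypothetical, the example record uses invented values, and the field names ("mappings", "assembly_name", "seq_region_name", "start", "end") should be checked against the Ensembl REST documentation before relying on them.

```python
import json
import urllib.request

GRCH37_SERVER = "https://grch37.rest.ensembl.org"


def variation_url(rsid, species="human"):
    """Build the URL for the variation endpoint on the GRCh37 server."""
    return f"{GRCH37_SERVER}/variation/{species}/{rsid}?content-type=application/json"


def fetch_variation(rsid):
    """Fetch the variation record for one rsID (requires network access)."""
    with urllib.request.urlopen(variation_url(rsid), timeout=30) as response:
        return json.load(response)


def grch37_positions(record):
    """Return (chromosome, start, end) for each GRCh37 mapping in a record."""
    return [
        (m["seq_region_name"], m["start"], m["end"])
        for m in record.get("mappings", [])
        if m.get("assembly_name") == "GRCh37"
    ]


# Invented record in the assumed response shape, used instead of a live request.
example_record = {
    "name": "rs_example",
    "mappings": [
        {"assembly_name": "GRCh37", "seq_region_name": "1", "start": 12345, "end": 12345}
    ],
}
positions = grch37_positions(example_record)
```

For a real lookup, call fetch_variation with an rsID taken from the Catalog and pass the result to grch37_positions. A variant can have more than one mapping, which is why a list is returned.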

4. Why do some SNPs not have any corresponding mapping information?

SNPs are extracted from the literature exactly as reported by the authors of a publication. If there is a typographical error in a publication or the authors report non-standard SNP identifiers, the subsequent mapping pipeline may not be able to provide any mapping information for this SNP. Alternatively, if an older SNP is no longer found on the latest genome build used in the GWAS Catalog, the SNP identifier extracted from the paper will still be reported in the GWAS Catalog but no mapping information for this SNP will be provided.

5. How are SNP-SNP interactions and multi-SNP haplotypes displayed and annotated?

For SNP-SNP interactions, all elements that are specific to a given SNP (rsID, risk allele, mapped gene, chromosome location etc.) are separated by an "x" (e.g. "rs1336472-A x rs4715555-G", "1p31.3 x 6p12.1", "3_prime_UTR_variant x upstream_gene_variant"). For multi-SNP haplotypes, elements are separated by a ";" (e.g. "rs17310467-?; rs6088735-?; rs6060278-?; rs867186-?", "MYH7B; EDEM2 - PROCR; EDEM2 - PROCR; PROCR", "upstream_gene_variant; intergenic_variant; intergenic_variant; missense_variant"). In both cases, the position of each element is the same across all variables, so the first rsID corresponds to the first mapped gene or mapped gene range (for intergenic SNPs), the first bp location etc.
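
Because the elements are positional, the fields can be split and paired up again with a short script. The Python below is a sketch that uses the example values quoted above; split_elements and pair_up are hypothetical helper names, and the code assumes that interactions are written with " x " (spaces on both sides) and haplotypes with ";".

```python
def split_elements(value):
    """Split one field into its per-SNP elements."""
    # Assumption: " x " marks an interaction; otherwise ";" marks a haplotype.
    separator = " x " if " x " in value else ";"
    return [part.strip() for part in value.split(separator)]


def pair_up(*fields):
    """Align elements by position: the first rsID goes with the first gene, etc."""
    parts = [split_elements(field) for field in fields]
    if len({len(p) for p in parts}) != 1:
        raise ValueError("fields do not contain the same number of elements")
    return list(zip(*parts))


interaction = pair_up(
    "rs1336472-A x rs4715555-G",
    "1p31.3 x 6p12.1",
    "3_prime_UTR_variant x upstream_gene_variant",
)

haplotype = pair_up(
    "rs17310467-?; rs6088735-?; rs6060278-?; rs867186-?",
    "MYH7B; EDEM2 - PROCR; EDEM2 - PROCR; PROCR",
    "upstream_gene_variant; intergenic_variant; intergenic_variant; missense_variant",
)
```

Each tuple in the result holds the values for one SNP, so haplotype[1] pairs rs6088735-? with the mapped gene range "EDEM2 - PROCR" and the consequence intergenic_variant.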

While we do provide the mapped gene and position information in this format in both the results page and the download, we excluded some of the additional gene-related information such as upstream/downstream gene IDs and distances from SNPs to genes from the download spreadsheet. This decision was made as it is almost impossible to present this kind of multi-dimensional data cleanly in the current spreadsheet format. In particular in large multi-SNP haplotypes, it is possible for some of the SNPs to be located within a gene while others are intergenic. Splitting gene IDs and distances by in-gene, upstream and downstream position would make the individual values much harder to pair up.

6. What does the most severe consequence/CONTEXT field represent?

The 'CONTEXT/Most severe consequence' column provides information on a variant’s predicted most severe functional effect from Ensembl. The effect of the allele of each variant on different transcripts may differ, but only the most severe consequence is reported here. Definition of terms and the order of severity is provided in Ensembl’s documentation: https://www.ensembl.org/info/genome/variation/prediction/predicted_data.html#consequences

7. Why does the risk allele of a variant not match either of the alleles in the reference genome assembly?

Variants and their risk alleles are extracted exactly as reported in the paper. In a small number of cases, the curated risk allele does not match either of the alleles reported in the reference genome assembly, which are displayed in the Info panel on the variant page. This may be due to strand flipping between genome builds, where the association was originally reported on an older genome build. We are unable to account for this in the curated data as genome build is not consistently reported by authors. For more detail on this issue, please refer to Sheng et al 2022.

F. Population descriptors

1. How are the population descriptors in the GWAS Catalog provided?

The GWAS Catalog team has developed and published a framework to represent sample metadata, including population descriptors, in a standardised manner. Our framework involves representing the samples in two forms: (1) a detailed sample description and (2) an ancestry category label from a controlled list. Detailed descriptions aim to capture informative and comprehensive information regarding the population of each distinct sample based on author-provided information. Label assignment reduces complexity within data sets and enables the placement of samples in context with other samples, groups, and populations. For more information please view our Documentation page.

2. How do I search for population descriptors in the GWAS Catalog?

Sample metadata can be searched for particular population descriptors in the Studies table on any of the Trait, Publication, Gene or Variant pages, by entering relevant text in the Discovery/Replication Sample Number column (displayed by default) or Discovery/Replication Sample Description columns (enabled via the "Column visibility" button). For more information please view our Documentation page.

3. Can I find all associations with a particular ancestry label?

The GWAS Catalog website does not currently have a way to view all associations for a particular ancestry label. We recommend using our REST API. All sample metadata, including Country of Recruitment and Additional information, is also available as a download file from our download page. For an overview of the kind of data found in this file, refer to the file header descriptions.

4. What does the "Pre-2011 ancestry not double-curated" flag next to some of the sample metadata mean?

As of September 2016, we release publicly all population descriptors extracted from the GWAS Catalog. Sample metadata from studies published before 2011 has not been reviewed by a second curator and so may not always conform to the strict standardised way we now present population descriptors.

5. Why is the detailed sample metadata provided in a separate downloadable spreadsheet ("Ancestry data") to the rest of the study-level information?

Most GWAS Catalog studies include at least two sets of sample metadata, one for the initial stage and one for the replication stage, and some studies may have several entries for each stage. As there is no way of usefully representing this multi-dimensional data in a single row in a spreadsheet, this data is instead provided in a separate spreadsheet, with each ancestry label in its own row.
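
The one-row-per-label layout is straightforward to regroup by study and stage. The Python below is a sketch over a small invented table. The column names are assumptions and should be checked against the header descriptions of the real download file, and the accessions and counts are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Column names are assumptions; check the header of the real file.
COLUMNS = ["STUDY ACCESSION", "STAGE", "NUMBER OF INDIVIDUALS", "BROAD ANCESTRAL CATEGORY"]

# Invented rows: one study with two initial-stage entries and one replication entry.
ROWS = [
    ["GCST_EXAMPLE_1", "initial", "1000", "European"],
    ["GCST_EXAMPLE_1", "initial", "500", "East Asian"],
    ["GCST_EXAMPLE_1", "replication", "800", "European"],
    ["GCST_EXAMPLE_2", "initial", "2000", "East Asian"],
]
SAMPLE_TSV = "\n".join("\t".join(row) for row in [COLUMNS] + ROWS) + "\n"


def group_by_study(handle):
    """Collect (label, sample size) entries per study accession and stage."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in csv.DictReader(handle, delimiter="\t"):
        entry = (row["BROAD ANCESTRAL CATEGORY"], row["NUMBER OF INDIVIDUALS"])
        grouped[row["STUDY ACCESSION"]][row["STAGE"]].append(entry)
    return grouped


def studies_with_label(grouped, label):
    """List the accessions that have at least one entry with the given label."""
    return sorted(
        accession
        for accession, stages in grouped.items()
        if any(lab == label for entries in stages.values() for lab, _ in entries)
    )


grouped = group_by_study(io.StringIO(SAMPLE_TSV))
east_asian = studies_with_label(grouped, "East Asian")
```

To run this on the real file, pass an open file handle to group_by_study in place of the in-memory sample. The resulting accessions can then be matched against the study and association downloads.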

6. What is included in the COHORT field?

The COHORT field describes the discovery stage (genome-wide) cohorts used in each study. Cohort abbreviations from discovery stage GWAS are extracted from literature to match a predefined list shared with the PGS Catalog, and are made available in the COHORT field of the studies download file. The initial list of common cohorts used in genetics studies that seeded these annotations is from Mills & Rahal. Communications Biology (2019). A full list of abbreviations and corresponding full cohort names is available to download separately. Since the list is shared with the PGS Catalog, this may include cohorts that are not currently associated with a GWAS Catalog study. Where a sample cohort in the literature was not already in the predefined list, or was not clearly and unambiguously described by authors, “other” will appear in our studies download file. Where a sample in literature had no cohort reported, “NR” will appear in our studies download file. Empty cohort fields will appear for studies which were curated before the extraction of this information began (~2020).

Note: cohorts appearing in the unpublished download files are yet to undergo in-house curation and therefore may not exactly match against the predefined list.

G. Programmatic access

1. How do I use the GWAS Catalog REST API?

The GWAS Catalog REST API is now available for programmatic access to the Catalog. See the full technical documentation here, as well as usage examples.

H. Summary statistics

1. What are summary statistics?

There are thousands of genome-wide association studies, and each study yields association data for hundreds of thousands of variants across the human genome. Manual curation of each GWAS publication by a dedicated team of scientists ensures that the Catalog contains the most significant findings (p-value < 10^-5). Studies are often accompanied by summary statistics, which provide the association data for all the variants analysed across the genome in a given study.
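
To illustrate the relationship between the two data types, the Python sketch below filters a tiny invented summary statistics table down to the rows that would pass the Catalog's p-value threshold. The column names (chromosome, base_pair_location, p_value) are assumptions about the file layout, and the rows are made up.

```python
import csv
import io

THRESHOLD = 1e-5  # Catalog inclusion threshold described above

# Invented rows in an assumed tab-separated layout.
SAMPLE = "\n".join("\t".join(row) for row in [
    ["chromosome", "base_pair_location", "p_value"],
    ["1", "1000", "3e-8"],
    ["1", "2000", "0.2"],
    ["2", "3000", "9e-6"],
    ["2", "4000", "1e-5"],
]) + "\n"


def significant_rows(handle, threshold=THRESHOLD):
    """Yield the rows whose p-value is strictly below the threshold."""
    for row in csv.DictReader(handle, delimiter="\t"):
        if float(row["p_value"]) < threshold:
            yield row


hits = [
    (row["chromosome"], row["base_pair_location"])
    for row in significant_rows(io.StringIO(SAMPLE))
]
```

Only two of the four invented rows fall below the threshold. The row with a p-value of exactly 1e-5 is excluded because the comparison is strict.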

2. How do I find out which publications have full summary statistics available?

Published studies with full summary statistics are indicated by an icon in the “Association count” column of the studies table in the search interface. You can view a full list of studies with summary statistics files (published and pre-published/unpublished) here together with links to other summary statistics resources.

Summary statistics files are available as soon as possible, even before the publication is included in our data release (approximately every two weeks). Therefore, if a manuscript states that summary statistics are available from the Catalog and you cannot find them in the list of studies with summary statistics files or on our FTP site please contact us at gwas-info@ebi.ac.uk and we can give you direct access to the files.

3. How do I access summary statistics?

There are two methods. We have developed a dedicated summary statistics database, which provides users with searchable, filterable, harmonised data via the summary statistics REST API. Alternatively, non-programmatic access to the original, standardised and harmonised data is available on the FTP site (which can be accessed via links in the search interface or the list of studies with summary statistics files).

4. How should I cite summary statistics downloaded from the GWAS Catalog?

Users of summary statistics are requested to cite the data as follows: accession ID of the GWAS Catalog study e.g. “GCST007240”, the GWAS Catalog, and the date the summary statistics were downloaded. If the summary statistics originated from a published GWAS, please also cite the original publication. For example, “Summary statistics were downloaded from the NHGRI-EBI GWAS Catalog (Sollis et al., 2022) downloaded on 01/11/2020 for study GCST007240 (Riveros-McKay et al., 2019)”.

5. What are standardised/harmonised summary statistics?

Please refer to the documentation here.

6. What are the risks of subject identification associated with sharing of summary statistics?

The current view in the community is that the unrestricted sharing of summary statistics offers substantial potential benefits, with low risk to participants’ privacy. A 2008 study by Homer et al. (PMID:18769715) indicated that it was possible to determine if a specific individual participated in a study based on summary-level statistics (including allele frequencies of the study participants) and the genotype information of the individual. However, this would require that an individual had made public their genotype information, and also participated in a study for which summary-level allele frequencies were available. Since this publication there has been widespread discussion in the scientific community, along with several publications (including Craig et al., 2011, PMID:21921928) on the benefits and risks surrounding sharing of summary statistics. After considering the risks and benefits, the NIH has published guidance supporting open sharing of summary statistics information, including allele frequencies (https://osp.od.nih.gov/2018/11/01/provide-access-gsr/, https://grants.nih.gov/grants/guide/notice-files/NOT-OD-19-023.html).

7. Why do some datasets have the CC0 license mark?

Since March 2021, we have asked all submitters to agree to share their data under the terms of CC0. This dedicates the data to the public domain, allowing downstream users to consume the data without restriction. Data submitted prior to March 2021 is made available under the EBI standard terms of use. Whilst these terms do not themselves impose any restrictions on downstream use, the application of the CC0 license removes any ambiguity. A small number of datasets are made available under different license terms. We advise consumers of data hosted by the GWAS Catalog to note the license terms of individual datasets, if applicable to their specific use case, which are accessible via the summary statistics page and the individual study pages. Please ensure that the original data are cited whenever they are used in a publication. If you have any questions or concerns about licensing, please contact us via gwas-info@ebi.ac.uk.

I. Submitting summary statistics

1. How can I submit summary statistics to the GWAS Catalog?

We currently extract summary statistics files from publications where they are made freely available either as Supplementary files or via a web link. We also accept submissions of summary statistics for both published and pre-published/unpublished GWAS. We encourage authors to submit their data directly through our submission page. Detailed instructions can be found in our documentation.

Note that for summary statistics to be made available through the GWAS Catalog, your study must fulfil our eligibility criteria.

2. How should summary statistics be formatted for submission?

Please refer to the documentation for the standard format and to access our summary statistics validation tool.


Got a question that isn’t answered here?

Email us at gwas-info@ebi.ac.uk.


Last updated: 16 March 2021