FAQ
- Which file should I use to get the fullest set of GO annotations for a species?
- Why can't I see an annotation in a UniProtKB record when it appears in the gene association file?
- Why are the species-specific files and the UniProt multi-species gene association file different?
- How do I change between UniProt accessions and other identifiers, e.g. Ensembl, EMBL, RefSeq Gene ID?
- What GO tools can I use to display/compare annotations to my selected proteins/genes?
- Is it correct to assume that gene products annotated to a child of a GO term will automatically be considered part of the parent term?
- Who do I contact if there is an annotation error in one of your files?
- How do I download a bulk set of GO annotations? What formats are there?
- GO slims - What are they? How can I make one?
- What do the evidence codes mean?
- How do I cite GOA?
1. Which file should I use to get the fullest set of GO annotations for a species?
In the GO Consortium, there are a number of model organism databases that are the
authoritative source of GO annotations for their respective species. These groups also integrate
annotations from other sources including GOA (a multi-species resource) on a regular basis.
GOA also provides a number of species-specific files for human, mouse, rat, zebrafish,
Arabidopsis, chicken, dog, cow, pig, Dictyostelium, worm, yeast and Drosophila. The annotations
in these files are based on entries in the
UniProtKB gene-centric reference proteome (GCRP), protein complexes (Complex Portal), and non-coding RNAs (RNACentral). GOA integrates manual annotations from all other GO
Consortium groups, as well as a number of external annotating groups where the annotated gene
product identifier can be mapped to one of the three identifiers we support (UniProtKB, Complex
Portal, and RNACentral IDs).
Both model organism group and GOA species-specific files are available on our FTP site.
2. Why can't I see an annotation in a UniProtKB record when it appears in the gene association file?
There could be a number of reasons for this:
A. If it appears that a manual annotation is missing:
If the GO annotation has been recently created, then UniProtKB may not yet have cross-referenced the annotation; there can be a time lag of up to 3 months.
B. If it appears that an electronic annotation is missing:
If you are looking at a curated UniProtKB entry (i.e. one in the Swiss-Prot section of UniProtKB), then not all electronic annotations are displayed here. Only annotations from certain methods, such as the HAMAP2GO and EC2GO mappings, are included.
In addition, sets of GO annotations displayed in UniProtKB are filtered to try to
provide a comprehensive yet concise set of cross-references. To get from the UniProtKB record to
the QuickGO browser (which will show the most up-to-date and full set of manual and electronic
annotations for a protein) click on the '[View the complete GO annotation on QuickGO]' link at
the bottom of the GO cross-references section of the UniProtKB entry.
However, if none of these reasons appears to apply to your missing annotation, please let us know
and we will investigate!
3. Why are the species-specific files and the UniProt multi-species gene association file different?
The GOA UniProt gene association file contains all manual and electronic annotations
that GOA has assigned to UniProtKB entries. This dataset contains annotations to more than
800,000 different species (https://www.ebi.ac.uk/GOA/uniprot_release) and is redundant for
electronic annotations where two different electronic methods have assigned the same or a less
granular GO term.
The species-specific files are created using the reference complete proteome sets to determine
the protein composition of the files. The species-specific files can contain annotations to both
reviewed (Swiss-Prot) and unreviewed (TrEMBL) UniProtKB accessions. Any user wishing to only
identify the reviewed (Swiss-Prot) UniProt protein annotation subset will be able to continue to do
so using the information supplied in the gp_information.goa_uniprot file, which can be found
here: ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/UNIPROT/gp_
We aim to remove electronic annotations from the species-specific files that have been created by
the same technique and that have predicted the same or less granular GO terms.
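The Swiss-Prot filtering described above can be sketched in a few lines of Python. This is a minimal illustration, not GOA's own tooling. It assumes that the gp_information file is tab-delimited, that comment lines start with "!", that the accession is in the second column, and that the last column holds pipe-separated properties including a "db_subset=Swiss-Prot" entry for reviewed proteins. Check these assumptions against the header of the file you download. The function names and all sample lines are invented for the example.

```python
def reviewed_accessions(gpi_lines):
    """Collect accessions flagged as Swiss-Prot in a gp_information file.

    Assumption: the last tab-separated column holds pipe-separated
    properties, one of which is 'db_subset=Swiss-Prot' for reviewed entries.
    """
    reviewed = set()
    for line in gpi_lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 2:
            continue
        properties = cols[-1].split("|")
        if "db_subset=Swiss-Prot" in properties:
            reviewed.add(cols[1])  # column 2: gene product accession
    return reviewed


def filter_reviewed(annotation_lines, reviewed):
    """Keep annotation lines whose accession (column 2) is in `reviewed`."""
    kept = []
    for line in annotation_lines:
        if line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) > 1 and cols[1] in reviewed:
            kept.append(line)
    return kept


# Invented sample data, for illustration only.
gpi_sample = [
    "!toy header line\n",
    "UniProtKB\tACC_REVIEWED\tSYM1\tName one\t\tprotein\ttaxon:9606\t\t\tdb_subset=Swiss-Prot\n",
    "UniProtKB\tACC_UNREVIEWED\tSYM2\tName two\t\tprotein\ttaxon:9606\t\t\tdb_subset=TrEMBL\n",
]
annotation_sample = [
    "UniProtKB\tACC_REVIEWED\tSYM1\tenables\tGO:0000000\n",
    "UniProtKB\tACC_UNREVIEWED\tSYM2\tenables\tGO:0000000\n",
]

reviewed = reviewed_accessions(gpi_sample)
print(filter_reviewed(annotation_sample, reviewed))
```

For the full multi-species files, it is probably better to stream both files line by line than to load them into memory.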
4. How do I change between UniProt accessions and other identifiers, e.g. Ensembl, EMBL, RefSeq Gene ID?
UniProt provides a mapping service that can convert UniProt accessions to accessions
from multiple databases including the EMBL/Genbank/DDBJ nucleotide sequence databases, Ensembl,
GeneID and RefSeq, and vice versa. This service can be found here. If you require any more information or help
using this service, you can email help@uniprot.org.
5. What GO tools can I use to display/compare annotations to my selected proteins/genes?
Besides QuickGO, the GO Consortium lists other tools on its site that are
useful for GO analysis. They can be found here: http://amigo.geneontology.org/amigo/software_list
6. Is it correct to assume that gene products annotated to a child of a GO term will automatically be considered part of the parent term?
Yes, it is safe to assume this, since every GO term must follow the true path rule: if the child term describes the gene product, then all its parent terms must also apply. So if a gene product is annotated to ‘protein tyrosine kinase activity’, the parent terms such as ‘protein kinase activity’ and ‘peptidyl-tyrosine phosphorylation’ also apply to the gene product.
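The true path rule can be illustrated with a small Python sketch that propagates a direct annotation up to every ancestor term. The ontology fragment below is a toy: the first two parent links come from the example in the answer above, and the extra 'kinase activity' level is added only to show that propagation is transitive. The function names are invented for the example, and a real analysis would read the relationships from the ontology file instead of a hand-written dictionary.

```python
def ancestors(term, parents):
    """Return every term reachable from `term` by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate(direct_annotations, parents):
    """Expand each gene product's direct terms with all their ancestors."""
    implied = {}
    for product, terms in direct_annotations.items():
        full = set(terms)
        for term in terms:
            full |= ancestors(term, parents)
        implied[product] = full
    return implied


# Toy fragment of the ontology (child -> list of parents), illustrative only.
toy_parents = {
    "protein tyrosine kinase activity": [
        "protein kinase activity",
        "peptidyl-tyrosine phosphorylation",
    ],
    "protein kinase activity": ["kinase activity"],
}

# Hypothetical gene product with a single direct annotation.
direct = {"toy_product": {"protein tyrosine kinase activity"}}

for product, terms in propagate(direct, toy_parents).items():
    print(product, sorted(terms))
```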
7. Who do I contact if there is an annotation error in one of your files?
If you find an annotation error, please e-mail GOA (goa@ebi.ac.uk) with as much
detail as possible regarding the annotation in question. We will then either be able to correct
it or pass it on to the database responsible for the annotation so that they may correct
it.
8. How do I download a bulk set of GO annotations? What formats are there?
All GOA GO annotations to UniProtKB accessions are available from:
ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/UNIPROT/
There is a ReadMe.txt file that explains the different annotation files available. We generate
two file formats that are used across the GO Consortium. The Gene Annotation File
(GAF) is a 17-column tab-delimited file. The file format conforms to the specifications
required by the GO Consortium, and therefore GO IDs rather than GO term names are shown. In addition
to the GAF format, we also offer a Gene Product Annotation Data file (GPAD), which is a 12-column
tab-delimited file and is more normalized than GAF. If you are after a more customised version of an
annotation file, our QuickGO tool lets you filter for the
annotations you are interested in and export them as a TSV file. The TSV format will also have
both the GO term and GO term ID, allowing you to quickly see what GO term a gene product has been
annotated to.
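As a rough guide to working with a downloaded GAF file, the sketch below splits each line into its 17 tab-delimited columns. The column labels are shorthand chosen for this example, following the column order of the GAF 2.x specification. The sample line is invented, and the sketch assumes that header and comment lines start with "!". Please check the ReadMe.txt and the file header for the exact version you are using.

```python
# Shorthand labels for the 17 GAF columns, in file order (labels are our own).
GAF_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "db_reference", "evidence_code", "with_or_from", "aspect",
    "db_object_name", "db_object_synonym", "db_object_type", "taxon",
    "date", "assigned_by", "annotation_extension", "gene_product_form_id",
]


def parse_gaf(lines):
    """Turn GAF lines into dictionaries keyed by column label."""
    records = []
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and header/comment lines
        values = line.rstrip("\n").split("\t")
        if len(values) != len(GAF_COLUMNS):
            raise ValueError(
                f"expected {len(GAF_COLUMNS)} columns, got {len(values)}"
            )
        records.append(dict(zip(GAF_COLUMNS, values)))
    return records


# Invented sample line, for illustration only (empty strings are empty columns).
sample_line = "\t".join([
    "UniProtKB", "TOY_ACC", "TOYSYM", "enables", "GO:0000000",
    "PMID:0000000", "IDA", "", "F", "Toy protein name", "", "protein",
    "taxon:9606", "20200101", "TOY_GROUP", "", "",
]) + "\n"

for record in parse_gaf(["!toy header line\n", sample_line]):
    print(record["db_object_id"], record["go_id"], record["evidence_code"])
```

Because the GAF shows only GO IDs, term names have to be looked up separately, for example from the ontology file or from a QuickGO TSV export.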
9. GO slims - What are they? How can I make one?
GO slims are cut-down versions of the GO ontologies containing terms that
cover the main aspects of each of the three GO ontologies. They give a broad overview of the
ontology content without the detail of the specific fine-grained terms.
As each community has different needs, a variety of GO slim files have been archived on the GO
home page by Consortium members. Further documentation and links to these slims can be found
at: http://www.geneontology.org/GO.slims.shtml
The QuickGO tool from GOA can be used to access or modify the GO Consortium's slims or to create one of your own. You can access this functionality from the Explore Biology page on QuickGO.
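To show what mapping annotations to a slim involves, here is a minimal Python sketch of the simplest possible approach: a term maps to itself if it is in the slim, plus every ancestor that is in the slim. This is an illustration under stated assumptions, not the algorithm used by QuickGO or any other tool, which may apply extra rules such as keeping only the most specific slim terms. The term identifiers and the function name are made up for the example.

```python
def slim_terms_for(term, parents, slim):
    """Return the slim terms that `term` maps to (itself or any ancestor)."""
    found = set()
    seen = {term}
    stack = [term]
    while stack:
        current = stack.pop()
        if current in slim:
            found.add(current)
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return found


# Toy ontology (child -> list of parents) and toy slim, illustrative only.
toy_parents = {
    "toy:specific": ["toy:mid"],
    "toy:mid": ["toy:broad"],
    "toy:broad": ["toy:root"],
}
toy_slim = {"toy:broad", "toy:other"}

print(slim_terms_for("toy:specific", toy_parents, toy_slim))
```

A term with no ancestor in the slim maps to an empty set, which is a useful signal that the chosen slim does not cover that part of the ontology.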
10. What do the evidence codes mean?
Every annotation submitted to GO must be attributed to a source - such as a
literature reference, another database or a computational analysis. In addition, these
annotations must indicate what kind of evidence is found in the cited source to support the
association between the gene product and the GO term. If you would like to find more detailed
information on the meaning and usage of evidence codes, documentation can be found at the GO web
site at: http://www.geneontology.org/page/guide-go-evidence-codes
11. How do I cite GOA?
If you use any data obtained from GOA or QuickGO in a publication, please cite the
following paper:
Huntley RP, Sawford T, Mutowo-Meullenet P, Shypitsyna A, Bonilla C, Martin MJ, O’Donovan C.
The GOA database: Gene Ontology annotation updates for 2015.
Nucleic Acids Res. 2015 Jan; 43:D1057-63