
Functional information

UniProtKB attaches as much functional information as possible to each protein sequence, giving users an overview of what is known about a given protein. This information is either added manually by UniProt biocurators, all of whom are trained biologists, or added automatically by annotation systems developed within the group (Figure 3).

Figure 3 Relationship between manual curation and automatic annotation in the UniProt Knowledgebase (UniProtKB).

Manual curation

Manual curation consists of a critical review of the experimental and predicted data for each protein, as well as of the protein sequence itself.

Curation methods applied include:

  • evaluation of each protein sequence, including splice sites and sites of post-translational cleavage
  • manual extraction and structuring of information from the literature
  • manual verification of results from computational analyses
  • mining and integration of large-scale data sets
  • continuous updating as new information becomes available

You can find more information about the manual curation process on the UniProt website.

Automatic annotation

UniProt has developed two prediction systems to automatically annotate UniProtKB/TrEMBL in a scalable manner with a high degree of accuracy:

  1. UniRule is a collection of manually curated annotation rules that define annotations which can be propagated from reviewed to unreviewed entries when specific conditions are met.
  2. The Association-Rule-Based Annotator (ARBA) is a multiclass learning system trained on expertly annotated entries in UniProtKB/Swiss-Prot. ARBA uses rule mining techniques to generate concise annotation models based on the properties of InterPro group membership and taxonomy. 
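The idea behind both systems can be illustrated with a small Python sketch. It is a toy stand-in written for this page, not the actual UniRule or ARBA implementation: the `Entry` and `Rule` classes, the function names, the thresholds, and the placeholder identifiers such as `IPR_A` are all hypothetical. The sketch assumes that a rule is a condition (an InterPro signature plus a taxonomic group) paired with an annotation, that rules can be derived from reviewed entries by counting how often a condition and an annotation occur together, and that a rule fires on an unreviewed entry when its condition is met.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical, simplified protein entry (not the UniProtKB schema)."""
    accession: str
    interpro: frozenset  # InterPro signature IDs matched by the sequence
    taxon: str           # a single taxonomic group, for simplicity
    annotations: frozenset = frozenset()


@dataclass(frozen=True)
class Rule:
    """Condition (signature + taxon) paired with an annotation to propagate."""
    interpro_id: str
    taxon: str
    annotation: str


def mine_rules(reviewed, min_support=2, min_confidence=0.9):
    """Toy association-rule mining over reviewed entries.

    The thresholds are illustrative assumptions, not ARBA's settings.
    """
    condition_counts = Counter()
    joint_counts = Counter()
    for entry in reviewed:
        for ipr in entry.interpro:
            condition = (ipr, entry.taxon)
            condition_counts[condition] += 1
            for annotation in entry.annotations:
                joint_counts[condition + (annotation,)] += 1

    rules = []
    for (ipr, taxon, annotation), joint in joint_counts.items():
        # Confidence: fraction of entries meeting the condition
        # that also carry the annotation.
        confidence = joint / condition_counts[(ipr, taxon)]
        if joint >= min_support and confidence >= min_confidence:
            rules.append(Rule(ipr, taxon, annotation))
    return rules


def apply_rules(entry, rules):
    """Return the annotations whose rule conditions the entry satisfies."""
    return {
        rule.annotation
        for rule in rules
        if rule.interpro_id in entry.interpro and rule.taxon == entry.taxon
    }


# Made-up reviewed entries standing in for UniProtKB/Swiss-Prot.
reviewed = [
    Entry("R1", frozenset({"IPR_A"}), "Bacteria", frozenset({"kinase activity"})),
    Entry("R2", frozenset({"IPR_A"}), "Bacteria", frozenset({"kinase activity"})),
    Entry("R3", frozenset({"IPR_B"}), "Bacteria", frozenset({"transport"})),
]
rules = mine_rules(reviewed)

# Made-up unreviewed entry standing in for UniProtKB/TrEMBL.
unreviewed = Entry("U1", frozenset({"IPR_A", "IPR_B"}), "Bacteria")
print(apply_rules(unreviewed, rules))  # {'kinase activity'}
```

In this example only the `IPR_A` rule is kept, because the `IPR_B` association is seen in a single reviewed entry and falls below the support threshold. In UniRule the rules are written and curated by hand rather than mined, so only the `apply_rules` step is analogous there.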

You can find more information about the automatic annotation pipeline on the UniProt website.