InterPro entry types

InterPro entries are classified into one of five categories, depending on the biological entity they represent: homologous superfamily, protein family, domain, repeat or site.

The entry type is indicated by a specific icon (Figure 4), which can be found at the top left-hand side of an InterPro entry page.

Figure 4 Icons denoting the different types of entries (homologous superfamily, family, domain, repeat or site) that can be found in the InterPro database.

Family and homologous superfamily

An InterPro protein family is a group of proteins that share a common evolutionary origin, reflected by their related functions and similarities in sequence or structure. Protein families are often arranged into hierarchies, with proteins that share a common ancestor subdivided into smaller, more closely related groups. For example, steroid hormone receptors constitute a family of nuclear receptors responsible for signal transduction mediated by steroid hormones, and can be subclassified into different groups, including the liver X receptor subfamily (Figure 5). This subfamily consists of nuclear receptors that regulate the metabolism of several important lipids, including oxysterols.

Figure 5 Example of a protein family hierarchy. The steroid hormone receptor family can be subdivided into a number of smaller, closely related subfamilies.
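The parent and child relationship described above can be pictured as a simple tree. The following is a minimal sketch only: the `FamilyEntry` class and its methods are hypothetical names chosen for illustration, not part of any InterPro software, and the two family names are taken from the example in the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class FamilyEntry:
    """A protein family with an optional parent family (hypothetical model)."""
    name: str
    parent: Optional["FamilyEntry"] = None
    children: List["FamilyEntry"] = field(default_factory=list)

    def add_child(self, name: str) -> "FamilyEntry":
        """Create a more closely related subfamily beneath this family."""
        child = FamilyEntry(name=name, parent=self)
        self.children.append(child)
        return child

    def lineage(self) -> List[str]:
        """Return names from the broadest family down to this entry."""
        names = []
        node = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))


# Names follow the example in the text; the structure is illustrative.
steroid = FamilyEntry("Steroid hormone receptor")
lxr = steroid.add_child("Liver X receptor")
print(" > ".join(lxr.lineage()))
```

Walking from a subfamily up to its root in this way mirrors how a hierarchy places each small, closely related group inside the broader family it belongs to.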

A homologous superfamily is a large, diverse group of proteins that usually share a tertiary structure. Homologous Superfamily entries are composed exclusively of CATH-Gene3D and/or SUPERFAMILY methods, both of which use a collection of underlying profile hidden Markov models (HMMs), rather than a single model, to represent diverse structural families. Unlike Family entries, Homologous Superfamily entries are not arranged in hierarchies.

Domain

Domains are distinct functional and/or structural units in a protein. Usually they are responsible for a particular function or interaction, contributing to the overall role of a protein. Domains may exist in a variety of biological contexts, where similar domains can be found in proteins with different functions. 

For example, the pleckstrin homology (PH) domain is a small modular domain that occurs in a large variety of proteins and is involved in phospholipid binding. One group of proteins containing a PH domain is the beta-adrenergic receptor kinases (Figure 6). Four domains have been identified in these proteins: an RGS (regulator of G protein signalling) domain, a protein kinase (PK) domain, an AGCK domain, involved in regulation by phosphorylation, and a C-terminal PH domain.

Figure 6 Graphical representation of the domain architecture of beta-adrenergic receptor kinases.
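A domain architecture of this kind can be thought of as an ordered list of domain matches along the protein sequence. The sketch below is illustrative only: the `DomainMatch` type and `architecture` function are hypothetical, the coordinates are invented, and the N- to C-terminal order is assumed to follow the order in which the four domains (regulator of G protein signalling, or RGS; PK; AGCK; PH) are listed in the text.

```python
from typing import List, NamedTuple


class DomainMatch(NamedTuple):
    """One domain located on a protein sequence (hypothetical model)."""
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


# Coordinates are invented for illustration; they are not real data.
matches = [
    DomainMatch("PH", 560, 650),
    DomainMatch("RGS", 55, 175),
    DomainMatch("AGCK", 455, 520),
    DomainMatch("PK", 190, 450),
]


def architecture(domain_matches: List[DomainMatch]) -> str:
    """Return domain names ordered from N-terminus to C-terminus."""
    ordered = sorted(domain_matches, key=lambda m: m.start)
    return "-".join(m.name for m in ordered)


print(architecture(matches))
```

Sorting by start position is what turns an unordered set of matches into an architecture string, so that two proteins with the same domains in the same order can be compared directly.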

Sites and repeats

Sites

Sites are groups of amino acids that confer certain characteristics upon a protein, and may be important for its overall function. Sites are usually quite small (often only a few amino acids long). The types of site covered by InterPro are:

  • active sites, which contain amino acids involved in catalytic activity
  • binding sites, containing amino acids that are directly involved in binding molecules or ions
  • post-translational modification (PTM) sites, which contain residues known to be chemically modified (phosphorylated, palmitoylated, acetylated, etc.) after the process of protein translation
  • conserved sites, which are found in specific types of proteins, but whose function is unknown. 

For example, the stretch of residues involved in the catalytic function of the S1B subfamily of serine peptidases is specific to this type of peptidase and constitutes the active site of these proteins.

Repeats

Repeats are typically short amino acid sequences that are repeated within a protein, and may confer binding or structural properties upon it. 

For example, pentapeptide repeats are sequence motifs of five amino acids that occur in multiple tandem copies. They were first identified in cyanobacterial proteins, in which many copies can be present. Their function is currently unknown.
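To make the idea of tandem copies concrete, the sketch below scans a sequence for runs of back-to-back five-residue units. It is an assumption-laden illustration: the `tandem_runs` function is hypothetical, the unit pattern `A[DN]L..` is a toy pattern chosen for the example rather than a curated signature, and the sequence is invented. InterPro member databases detect repeats with their own models, not with a regular expression like this one.

```python
import re
from typing import List, Tuple


def tandem_runs(sequence: str, unit_pattern: str, unit_length: int = 5,
                min_copies: int = 2) -> List[Tuple[int, int]]:
    """Find runs of consecutive copies of a fixed-length unit.

    Returns a list of (0-based start position, number of copies).
    unit_pattern must always match exactly unit_length residues.
    """
    # Build a pattern such as (?:A[DN]L..){2,} for two or more tandem copies.
    run = re.compile(f"(?:{unit_pattern}){{{min_copies},}}")
    runs = []
    for match in run.finditer(sequence):
        copies = len(match.group()) // unit_length
        runs.append((match.start(), copies))
    return runs


# Invented toy sequence: three tandem five-residue units flanked by other residues.
toy_sequence = "MKT" + "ADLSG" + "ANLTE" + "ADLRQ" + "GGH"
print(tandem_runs(toy_sequence, "A[DN]L.."))
```

Requiring at least two adjacent copies is the design choice that separates a tandem repeat from a single chance occurrence of the unit.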