
Modelling in Pfam

Creating a Pfam model is an iterative process. It starts with the selection and alignment of curated example sequences (the seed alignment), from which a profile hidden Markov model (HMM) is calculated (panels 1 and 2 in Figure 3). This profile HMM is then searched against a reference proteomes database to find additional family members that pass the inclusion thresholds. The thresholds are adjusted to avoid including false positives.
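
The build-then-search step can be illustrated with a deliberately simplified sketch. It is not a profile HMM: it assumes an ungapped seed alignment, a uniform background distribution and a fixed pseudocount, and it scores each candidate with per-column log-odds values. The names `build_profile`, `score` and `search`, the toy sequences and the threshold of 5.0 bits are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard amino acids
BACKGROUND = 1.0 / len(ALPHABET)    # assumption: uniform background frequency


def build_profile(seed, pseudocount=1.0):
    """Turn an ungapped seed alignment into per-column log-odds scores (bits)."""
    total = len(seed) + pseudocount * len(ALPHABET)
    profile = []
    for col in range(len(seed[0])):
        counts = Counter(seq[col] for seq in seed)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / BACKGROUND)
            for aa in ALPHABET
        })
    return profile


def score(profile, seq):
    """Sum the column scores for a sequence of the same length as the profile."""
    return sum(column[aa] for column, aa in zip(profile, seq))


def search(profile, database, threshold):
    """Keep only database entries whose score passes the inclusion threshold."""
    return {name: seq for name, seq in database.items()
            if score(profile, seq) >= threshold}


# Toy data (hypothetical): three seed sequences and a tiny "database".
seed = ["ACDEF", "ACDEY", "ACNEF"]
database = {"hit1": "ACDEF", "hit2": "ACDEW", "decoy": "WWWWW"}

hits = search(build_profile(seed), database, threshold=5.0)
print(sorted(hits))
```

Raising the threshold makes the search stricter: fewer false positives are included, at the cost of possibly missing more distant members. This is the trade-off that threshold adjustment addresses.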

The sequences found in this search form the full alignment. The information they carry is used to improve the probabilities in the model, which may lead to a slightly different alignment (panel 3 in Figure 3). The adjusted full alignment is then refined, by determining boundaries and minimising redundancy, to produce a new seed, from which a new profile HMM is generated. The process is repeated until the sequence database search detects no more homologs, i.e. the search has converged.
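
The control flow of this loop, including the convergence test, can be sketched as follows. To keep the example short and runnable, the profile HMM is replaced by a much cruder stand-in: the set of 3-residue words (k-mers) seen in the seed, with a sequence counted as a match when it shares at least two words with the model. The functions `build_model`, `search`, `refine` and `iterate`, the toy database and the parameter values are hypothetical. Only the loop structure reflects the process described above.

```python
def kmers(seq, k=3):
    """Return the set of overlapping k-residue words in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_model(seed, k=3):
    """Stand-in for a profile HMM: the union of k-mers over the seed."""
    model = set()
    for seq in seed:
        model |= kmers(seq, k)
    return model


def search(model, database, min_shared=2, k=3):
    """Stand-in for a thresholded search: keep sequences sharing enough k-mers."""
    return {seq for seq in database if len(kmers(seq, k) & model) >= min_shared}


def refine(sequences):
    """Stand-in for seed refinement: remove exact duplicates, fix the order."""
    return sorted(set(sequences))


def iterate(seed, database, max_rounds=10):
    """Rebuild the model and search again until no new members are found."""
    known = set(seed)
    for round_no in range(1, max_rounds + 1):
        model = build_model(seed)
        new = search(model, database) - known
        if not new:                      # converged: nothing new detected
            return sorted(known), round_no
        known |= new
        seed = refine(known)             # the new seed feeds the next round
    raise RuntimeError("search did not converge within max_rounds")


# Toy database (hypothetical): a chain of overlapping relatives and one decoy.
DATABASE = ["ACDEFG", "CDEFGH", "EFGHIK", "WWYWWY"]
members, rounds = iterate(["ACDEFG"], DATABASE)
print(members, rounds)
```

In this toy run the third sequence is only detected once the second has been added to the seed, which shows why repeating the search with an updated model can find members that the first model missed.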

Figure 3. Creating a Pfam entry with profile HMMs is an iterative process involving seed alignment, construction of profile HMMs and full alignment of sequences against a reference database.