Modelling in Pfam
Creating a Pfam model is an iterative process. It starts with the selection and alignment of curated example sequences (the seed alignment), from which a profile hidden Markov model (HMM) is calculated (see panels 1 and 2 in Figure 3). This profile HMM is then searched against a reference proteomes database to find additional family members that pass the inclusion thresholds. The thresholds are adjusted to avoid including false positives.
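To make the build-then-search step concrete, here is a minimal Python sketch. It is not how Pfam computes its models: a real profile HMM also models insertions and deletions, whereas this toy uses an ungapped position-specific log-odds profile. The function names (`build_profile`, `score`, `search`), the uniform background frequencies, the pseudocount, the example sequences, and the threshold of 5 bits are all assumptions chosen for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(ALPHABET)  # assumption: uniform background frequencies


def build_profile(seed_alignment, pseudocount=1.0):
    """Return per-column log-odds scores (bits) from an ungapped seed alignment."""
    length = len(seed_alignment[0])
    total = len(seed_alignment) + pseudocount * len(ALPHABET)
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in seed_alignment)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / BACKGROUND)
            for aa in ALPHABET
        })
    return profile


def score(profile, sequence):
    """Best ungapped match score of the profile at any offset in the sequence."""
    width = len(profile)
    if len(sequence) < width:
        return float("-inf")
    return max(
        sum(profile[i][sequence[start + i]] for i in range(width))
        for start in range(len(sequence) - width + 1)
    )


def search(profile, database, threshold):
    """Return the names of sequences scoring at or above the inclusion threshold."""
    return [name for name, seq in database.items()
            if score(profile, seq) >= threshold]


# Hypothetical toy data: a three-sequence seed and a two-sequence database.
seed = ["ACDEF", "ACDEY", "ACNEF"]
database = {"hit": "GGACDEFGG", "miss": "KKKKKKKKK"}

profile = build_profile(seed)
members = search(profile, database, threshold=5.0)
print(members)
```

Raising the threshold makes the search stricter, which is the same lever the text describes for keeping false positives out. The sketch assumes every residue is one of the 20 standard amino acids; other characters would need separate handling.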
The information in this new set of sequences (the full alignment) is used to improve the probabilities in the model, which may in turn lead to a slightly different alignment (see panel 3 in Figure 3). The adjusted full alignment is then refined, by determining boundaries and minimising redundancy, to produce a new seed, from which a new profile HMM is generated. This process is repeated until the sequence database search detects no more homologs (i.e. the search has converged).
![](https://www.ebi.ac.uk/training/online/courses/pfam-creating-protein-families/wp-content/uploads/sites/84/2023/03/Screen-Shot-2018-03-26-at-11.55.26.png)
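The iterative cycle described above (build a model, search, refine the seed, repeat until nothing new is found) can be sketched as a simple loop. This is an illustration of the control flow only, not Pfam's actual pipeline. The stand-in "model" is just a set of 3-letter words, and `iterate_to_convergence`, `build_model`, `search`, `refine`, `min_shared`, and the toy sequences are all hypothetical names and values.

```python
def iterate_to_convergence(seed, database, build_model, search, refine,
                           max_rounds=20):
    """Repeat build -> search -> refine until a round finds no new members."""
    previous_hits = set()
    for round_number in range(1, max_rounds + 1):
        model = build_model(seed)
        hits = set(search(model, database))
        new_hits = hits - previous_hits
        if not new_hits:  # converged: the search detected nothing new
            return seed, hits, round_number
        seed = refine(seed, hits, database)
        previous_hits = hits
    raise RuntimeError("search did not converge within max_rounds")


# --- Toy stand-ins (assumptions; a real model would be a profile HMM) ---

def kmers(sequence, k=3):
    """All overlapping words of length k in the sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def build_model(seed):
    """Toy 'model': the set of all 3-letter words seen in the seed."""
    return set().union(*(kmers(s) for s in seed))


def search(model, database, min_shared=2):
    """Toy 'search': sequences sharing at least min_shared words with the model."""
    return {name for name, seq in database.items()
            if len(kmers(seq) & model) >= min_shared}


def refine(seed, hits, database):
    """Toy 'refinement': add the hits to the seed and drop exact duplicates."""
    return sorted(set(seed) | {database[name] for name in hits})


seed = ["ACDEFG"]
database = {
    "p1": "ACDEFGHI",
    "p2": "FGHIKLMN",
    "p3": "KLMNPQRS",
    "x1": "WWWWYYYY",
}

final_seed, hits, rounds = iterate_to_convergence(
    seed, database, build_model, search, refine
)
print(sorted(hits), rounds)
```

In this toy run each round pulls in one more sequence until a round adds nothing, at which point the loop stops. The `max_rounds` guard is a design choice of the sketch, so that a search which keeps expanding fails loudly instead of looping forever.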