AlphaFold2: A high-level overview

Users wishing to predict the 3D structure of a protein only need to supply its amino acid sequence. To analyse this sequence, AlphaFold2 builds a multiple sequence alignment (MSA) that combines the sequences of multiple related proteins. It also generates a set of pair representations that model the relationship between every pair of amino acid residues. The software uses the MSA to refine the pair representations and, from them, predicts the 3D structure of the protein.

Figure 13. High-level overview of how AlphaFold2 predicts a protein’s structure from its amino acid sequence.

The role of multiple sequence alignment (MSA)

From the user’s perspective, the only input AlphaFold2 needs is one or more protein sequences. Internally, however, AlphaFold2 first builds a multiple sequence alignment (MSA), in which multiple similar protein sequences are set alongside each other. The MSA is generated by querying several protein sequence databases with the input sequence.
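As a rough illustration of what "set alongside each other" means, an MSA can be held as a list of equal-length strings, one row per sequence, with "-" marking a gap. The sketch below is a minimal toy, not AlphaFold2's own data format: the sequences are made up and the helper name msa_columns is hypothetical.

```python
# Toy MSA: every row is one aligned sequence, '-' marks a gap.
# The sequences are invented for illustration only.
msa = [
    "MKTAYIAK",  # the query (input) sequence is conventionally the first row
    "MKSAYLAK",
    "MRTA-IAK",
    "MKTAYVGK",
]


def msa_columns(msa):
    """Return the alignment column by column, checking that all rows are aligned."""
    length = len(msa[0])
    if any(len(row) != length for row in msa):
        raise ValueError("all rows of an MSA must have the same length")
    return ["".join(row[i] for row in msa) for i in range(length)]


columns = msa_columns(msa)
print(columns[1])  # residues aligned to the second position of the query: KKRK
```

Reading the alignment by column is what makes it possible to ask how each position varies across related proteins.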

The MSA is then the primary input for AlphaFold2’s neural network. AlphaFold2 uses the MSA to compare the sequences of similar proteins from different organisms. The alignment highlights where these sequences agree and where they differ, which helps reveal the evolutionary relationships between the proteins.

If two amino acids in a protein are in close contact, a mutation in one of them will probably be followed by a compensating mutation in the other. This preserves the structure of the protein, and is known as co-evolution or covariation. The converse also holds: if two regions of a protein are changing and evolving independently of each other, it is likely that they are not in direct contact (Benner & Gerloff, 1991; Göbel et al., 1994; Korber et al., 1993; Taylor & Hatrick, 1994).
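One classical way to quantify covariation between two alignment columns is their mutual information. The sketch below is only an illustration of the idea: AlphaFold2 does not compute this statistic explicitly, since its neural network learns such signals from the MSA. The three columns are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two MSA columns of equal length."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        p_ab = n_ab / n
        mi += p_ab * log2(p_ab / ((count_a[a] / n) * (count_b[b] / n)))
    return mi


# Hypothetical columns taken from an alignment of eight sequences.
col_1 = "KKEEKKEE"
col_2 = "DDRRDDRR"  # changes whenever col_1 changes (co-evolving)
col_3 = "AGAGAGAG"  # varies independently of col_1

mi_coupled = mutual_information(col_1, col_2)      # 1.0 bit
mi_independent = mutual_information(col_1, col_3)  # 0.0 bits
print(mi_coupled, mi_independent)
```

A high value suggests that the two positions change together and may be in contact; a value near zero suggests they evolve independently.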

A high-quality MSA is essential for AlphaFold2 to produce an accurate prediction of protein structure. A diverse and deep MSA, with hundreds or thousands of sequences in the alignment, helps AlphaFold2 to identify co-evolutionary signals and use them to infer the protein’s 3D structure. Conversely, a shallow MSA, with only tens of sequences and low variability among them, is the most common reason for failed, low-confidence or inaccurate AlphaFold2 predictions.
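Depth and diversity can be checked with very simple statistics before running a prediction. The following sketch assumes the MSA is a list of aligned strings with the query in the first row; the function names and the toy sequences are hypothetical, and no particular cut-off values are implied.

```python
def msa_depth(msa):
    """Number of sequences in the alignment, including the query."""
    return len(msa)


def mean_identity_to_query(msa):
    """Average fraction of query positions that each other row matches exactly."""
    query = msa[0]
    others = msa[1:]
    if not others:
        raise ValueError("the MSA contains only the query sequence")
    identities = []
    for row in others:
        matches = sum(1 for q, r in zip(query, row) if q == r and q != "-")
        identities.append(matches / len(query))
    return sum(identities) / len(identities)


# Invented toy alignment; real MSAs contain far more rows.
msa = ["MKTAYIAK", "MKSAYLAK", "MRTA-IAK", "MKTAYVGK"]

depth = msa_depth(msa)
identity = mean_identity_to_query(msa)
print(depth, identity)  # 4 0.75
```

Very few rows, or rows that are almost identical to the query, both indicate that the alignment carries little co-evolutionary information.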

The role of pair representations

When AlphaFold2 predicts the 3D structure of a protein, it creates a set of “pair representations”. Every pair of amino acid residues in the protein, no matter how distant, is represented separately. This enables the software to encode the co-evolutionary relationships between them based on the MSA. This information can ultimately be interpreted as the relative positions of amino acid residues and the distances between them.
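To picture the layout of a pair representation, it can help to think of a square table with one entry for every pair of residues. In AlphaFold2 each entry is a learned feature vector; the sketch below replaces it with a single number, the distance between two residues, purely to show the indexing. The coordinates are hypothetical and do not come from a real protein.

```python
from math import dist

# Hypothetical C-alpha coordinates (in angstroms) for a four-residue toy chain.
ca_coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (3.8, 3.8, 0.0),
    (0.0, 3.8, 0.0),
]


def pairwise_distances(coords):
    """Return an L x L table holding the distance between every pair of residues."""
    return [[dist(a, b) for b in coords] for a in coords]


pair_distances = pairwise_distances(ca_coords)

# Entry [i][j] describes residues i and j, however far apart they are in sequence.
print(round(pair_distances[0][2], 2))
```

The table is symmetric and its size grows with the square of the sequence length, which is one reason long proteins are more demanding to predict.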

AlphaFold2 uses a neural network called Evoformer, which interprets and updates both the MSA and the pair representations. The key aspect of this network is the continuous flow of information between the MSA and the pair representations. This exchange enables reasoning about spatial and evolutionary relationships, which refines the structural hypothesis.

If available, AlphaFold2 can use supplied protein structures (e.g. structures derived from experiment) as templates. However, AlphaFold2 tends to ignore such templates if there is enough information coming from the MSA.

How do we get a structure?

AlphaFold2’s structure module takes both the updated pair representation and the original sequence (which is the first row of the updated MSA) from the Evoformer. The structure module first turns this into a backbone of the 3D structure. It then finishes the modelling by placing the amino acid side chains and refining their positions.

AlphaFold2 then performs an iterative process called “recycling”. It feeds the MSA, the pair representations and the 3D structure back to the neural network, and generates a new 3D structure. By default, this process is repeated three times, allowing AlphaFold2 to improve the accuracy of the final structure.
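The control flow of recycling can be sketched as a simple loop. This is a toy under stated assumptions, not AlphaFold2's implementation: run_network is a hypothetical stand-in for one pass through the Evoformer and structure module, the "structure" is just a list of numbers, and only the structure is fed back here, whereas the real process also feeds back the MSA and pair representations.

```python
def run_network(features, previous_structure):
    """Hypothetical stand-in for one pass through the network.

    It moves a toy one-dimensional "structure" halfway towards a fixed target.
    The real network has no access to the answer; the target only makes the
    loop runnable and shows each pass improving on the last.
    """
    target = features["target"]
    return [p + 0.5 * (t - p) for p, t in zip(previous_structure, target)]


def predict_with_recycling(features, num_recycles=3):
    """Run one initial pass, then feed the output back num_recycles times."""
    structure = [0.0] * len(features["target"])   # blank starting state
    structure = run_network(features, structure)  # initial prediction
    trajectory = [structure]
    for _ in range(num_recycles):
        structure = run_network(features, structure)  # recycle previous output
        trajectory.append(structure)
    return trajectory


trajectory = predict_with_recycling({"target": [8.0]})
print(trajectory)  # [[4.0], [6.0], [7.0], [7.5]]
```

Keeping the intermediate outputs, as the trajectory list does here, is what allows the refinement to be inspected step by step.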

This video presents the intermediate structure trajectory of the CASP14 target T1044, a large (2,180 residues), multi-domain RNA polymerase, as predicted by AlphaFold2. Observe the different folding rates of the individual domains: some fold quickly, while others require more time. Watch how AlphaFold2 recycles its predictions to refine the final structure (Jumper et al., 2021).

For additional technical details, please refer to the Supplementary Information in the original AlphaFold2 paper. This contains a detailed description of the neural network architecture and training set (Jumper et al., 2021).