Bateman Group

Analysis of protein sequence and structure

The group studies the sequences and structures of proteins to understand their evolution and function. Our major focus is on the identification and analysis of adhesive proteins of the bacterial cell surface.

The proteins of the bacterial cell surface pose a challenge for sequence analysis. These proteins are fundamentally important for interactions between bacteria and their environment, including host–pathogen interactions. However, because of the strong selective pressure on these proteins and short bacterial generation times, bacterial surface proteins evolve very rapidly. As a result, they are often specific to narrow ranges of bacterial species and are overlooked by protein family databases because of their limited number of homologues. We can nevertheless harness the billions of protein sequences becoming available from genomic and metagenomic sequencing, which allow us to identify distant similarities and to better understand the structure, function and evolution of these proteins. We have focussed on a recently defined class of surface proteins called fibrillar adhesins, which consist of a terminal adhesive domain and a stalk of repeated domains that is anchored to the cell surface. These long proteins have important roles in host colonisation and biofilm formation, and are potential targets for vaccine design. We aim to comprehensively identify these proteins and work with experimental collaborators to characterise them structurally and functionally.

The group’s second strand of research aims to identify spuriously translated sequences in protein sequence databases. These contaminating sequences may represent up to 5% of some sequence sets and can lead to wasted computational and experimental effort. We aim to develop tools that enable the routine removal of spurious proteins from sequence sets.

FUTURE PROJECTS AND GOALS

There are many exciting developments in the world of protein sequence and structure that are leading the group’s research in new directions. Deep Learning has had a major impact in many areas, including structure prediction. We are using the power of AlphaFold to understand the function and evolution of bacterial cell surface proteins in unprecedented detail. Deep Learning methods can also produce an embedding of sequence space that offers entirely novel ways to investigate protein sequence, structure and evolution. For example, spurious proteins may cluster within the embedding, enabling their detection. These new techniques may have a disruptive influence on the whole field of sequence analysis in the coming years, and we are well placed to harness such developments.


The prediction of novel fibrillar adhesins using machine learning techniques, confirmed with AlphaFold. Each fibrillar adhesin is composed of a series of stalk domains that project the adhesive domain far beyond the cell wall to enhance interactions with host cells.