Course at EMBL-EBI

Mining PDBe and PDBe-KB using a graph database

This workshop covers the use of the PDBe graph database to extract data for solving complex structural biology queries. It will introduce the PDBe graph database and how to write Cypher queries to retrieve data of interest. Workshop participants will be able to use the graph database to explore data relevant to their own research with support and guidance from the development team at PDBe.

The graph database integrates annotations provided by PDBe-KB partners and is implemented in Neo4j. In this graph each PDB entry is represented as a tree: the root is the PDB entry, which is connected to chains and entities, which are in turn connected to residues. Each of the more than 150 million PDB residues is linked to its available annotations (e.g. is the residue part of a catalytic site, or is it on a macromolecular interaction interface?) and is also directly connected to its corresponding UniProt residue. Storing PDBe-KB data as a graph offers significant benefits, in particular by allowing straightforward transfer of annotations between PDB entries that map to the same UniProt accession, as well as between entries that map to UniProt accessions with high sequence identity.
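The annotation transfer described above can be illustrated with a small in-memory sketch. This is not the PDBe graph database or its schema: the entry IDs, accession, residue numbers, annotation labels and the function name transfer_annotations are all made up for illustration, and plain Python dictionaries stand in for the graph's residue-to-UniProt links.

```python
from collections import defaultdict

# Hypothetical toy data, not real PDB or UniProt records.
# Each PDB residue (entry, chain, residue number) maps to a
# UniProt residue (accession, position).
pdb_to_uniprot = {
    ("1abc", "A", 57): ("P00000", 102),
    ("1abc", "A", 58): ("P00000", 103),
    ("2xyz", "B", 12): ("P00000", 102),
    ("2xyz", "B", 13): ("P00000", 103),
}

# Annotations attached directly to individual PDB residues.
annotations = {
    ("1abc", "A", 57): {"catalytic_site"},
    ("2xyz", "B", 13): {"interaction_interface"},
}


def transfer_annotations(pdb_to_uniprot, annotations):
    """Share annotations between PDB residues that map to the same UniProt residue."""
    # Step 1: gather every annotation seen for each UniProt residue.
    by_uniprot = defaultdict(set)
    for pdb_residue, labels in annotations.items():
        uniprot_residue = pdb_to_uniprot.get(pdb_residue)
        if uniprot_residue is not None:
            by_uniprot[uniprot_residue] |= labels

    # Step 2: give each PDB residue the annotations of its UniProt residue.
    transferred = {}
    for pdb_residue, uniprot_residue in pdb_to_uniprot.items():
        transferred[pdb_residue] = set(by_uniprot.get(uniprot_residue, set()))
    return transferred


result = transfer_annotations(pdb_to_uniprot, annotations)
for pdb_residue in sorted(result):
    print(pdb_residue, sorted(result[pdb_residue]))
```

In this toy example the catalytic-site annotation recorded on entry 1abc also appears on the equivalent residue of entry 2xyz, because both residues map to the same UniProt position. In a graph database the same idea is expressed as a path traversal through the shared UniProt residue rather than as dictionary lookups.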

Read the database schema here.

Who is this course for?

This workshop is aimed at bioinformaticians with experience of analysing data from the PDB, either by processing archive files or via API access. We encourage applications from individuals with specific questions relating to PDB data that are difficult to solve using existing data queries. Programming experience is required; familiarity with Python is preferred but not essential.

An example use case might involve research into a specific drug molecule, where protein structure is relevant to drug specificity. The graph database would allow the analysis of all common interaction sites in PDB at the residue level, with the potential to expand this search across ligands containing similar fragments. Additional searches could analyse the protein-protein interaction sites between different isoforms of the same protein, and cross-reference them to sequence conservation data and predicted functional annotations.
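As a rough sketch of the first kind of analysis mentioned above, the following counts how many PDB entries show a contact between a given ligand and each UniProt position. The contact records, the ligand identifiers "LIG" and "OTH", and the function name common_interaction_sites are hypothetical; in practice such records would come from queries against the graph database rather than from a hard-coded list.

```python
from collections import defaultdict

# Hypothetical contact records: (PDB entry, ligand ID, UniProt accession, UniProt position).
contacts = [
    ("1abc", "LIG", "P00000", 102),
    ("1abc", "LIG", "P00000", 150),
    ("2xyz", "LIG", "P00000", 102),
    ("3def", "LIG", "P00000", 102),
    ("3def", "LIG", "P00000", 150),
    ("3def", "OTH", "P00000", 200),
]


def common_interaction_sites(contacts, ligand_id, min_entries=2):
    """Return UniProt residues contacted by the ligand in at least min_entries entries."""
    # Collect the distinct entries in which each UniProt residue contacts the ligand.
    entries_per_site = defaultdict(set)
    for entry, ligand, accession, position in contacts:
        if ligand == ligand_id:
            entries_per_site[(accession, position)].add(entry)

    # Keep only sites seen in enough entries, mapped to their entry counts.
    sites = {
        site: len(entries)
        for site, entries in entries_per_site.items()
        if len(entries) >= min_entries
    }
    # Most frequently observed sites first, ties broken by accession and position.
    return dict(sorted(sites.items(), key=lambda item: (-item[1], item[0])))


sites = common_interaction_sites(contacts, "LIG")
for (accession, position), count in sites.items():
    print(accession, position, count)
```

Counting distinct entries per UniProt position, rather than per PDB residue number, is what makes sites comparable across structures with different residue numbering.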

Applicants should submit a 200-word abstract with their application, describing their work and the queries they would like to run against PDB data. The abstract should explain how they have accessed PDB data previously and the types of questions they are trying to answer.

What will I learn?

Learning outcomes

At the end of this workshop participants will be able to:

  • Access the PDBe graph database using Neo4j
  • Query the database using Cypher queries
  • Find complex data connections
  • Answer complex questions about protein structures

Course content

This course will cover:

Trainers

Tom Hancocks
EMBL-EBI, UK
David Armstrong
EMBL-EBI, UK
Mihaly Varadi
EMBL-EBI, UK
Sreenath Nair
EMBL-EBI, UK
Lukas Pravda
EMBL-EBI, UK
This course has ended

18 - 20 February 2020
European Bioinformatics Institute
United Kingdom
£50
Contact
Meredith Willmott

Organisers
  • Tom Hancocks
    EMBL-EBI, UK
  • David Armstrong
    EMBL-EBI, UK
  • Sameer Velankar
    EMBL-EBI, UK
