Mining PDBe and PDBe-KB using a graph database

Course at EMBL-EBI

This workshop covers the use of the PDBe graph database to extract data for answering complex structural biology questions. It will introduce the PDBe graph database and show how to write Cypher queries to retrieve data of interest. Workshop participants will be able to use the graph database to explore data relevant to their own research, with support and guidance from the development team at PDBe.

The graph database integrates annotations provided by PDBe-KB partners and is implemented in Neo4j. In this graph, each PDB entry is represented as a tree: the root is the PDB entry, which is connected to chains and entities, which are in turn connected to residues. Each of the PDB residues (>150 million) is linked to its available annotations (e.g. whether the residue is part of a catalytic site, or whether it lies on a macromolecular interaction interface) and is also directly connected to its corresponding UniProt residue. Storing PDBe-KB data as a graph offers great benefits, in particular by allowing straightforward transfer of annotations between PDB entries that map to the same UniProt accession, as well as between entries that map to UniProt accessions with highly identical sequences.
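The annotation transfer described above can be illustrated without a database. The following is a minimal sketch in plain Python, not the PDBe implementation: the entry names, the accession `UNP_X`, the residue numbers and the `catalytic_site` label are all invented for illustration, and the function name `transfer_annotations` is hypothetical. It shows how an annotation on one PDB residue reaches residues in other entries through the shared UniProt residue.

```python
from collections import defaultdict

# Invented toy data: (entry, chain, PDB residue number) -> (UniProt accession, position).
# In the real graph these links are edges between PDB and UniProt residue nodes.
PDB_TO_UNIPROT = {
    ("entry_1", "A", 10): ("UNP_X", 35),
    ("entry_1", "A", 11): ("UNP_X", 36),
    ("entry_2", "B", 5): ("UNP_X", 35),
    ("entry_2", "B", 6): ("UNP_X", 36),
}

# Invented annotation attached to a single PDB residue.
ANNOTATIONS = {
    ("entry_1", "A", 10): {"catalytic_site"},
}


def transfer_annotations(pdb_to_uniprot, annotations):
    """Copy annotations to every PDB residue that maps to the same UniProt residue."""
    # Step 1: lift each annotation from the PDB residue to its UniProt residue.
    by_uniprot = defaultdict(set)
    for pdb_residue, labels in annotations.items():
        uniprot_residue = pdb_to_uniprot.get(pdb_residue)
        if uniprot_residue is not None:
            by_uniprot[uniprot_residue] |= labels

    # Step 2: push the collected labels back down to all mapped PDB residues.
    transferred = {}
    for pdb_residue, uniprot_residue in pdb_to_uniprot.items():
        labels = by_uniprot.get(uniprot_residue)
        if labels:
            transferred[pdb_residue] = set(labels)
    return transferred


result = transfer_annotations(PDB_TO_UNIPROT, ANNOTATIONS)
print(sorted(result))
```

In a graph database the same two steps become a single path traversal from an annotated residue, through the UniProt residue, to the residues of other entries, which is why the graph representation makes this kind of transfer straightforward.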

Read the database schema here.

Who is this course for?

This workshop is aimed at bioinformaticians with experience of analysing data from the PDB, either by processing archive files or via API access. We encourage applications from individuals with specific questions relating to PDB data that are difficult to answer using existing data queries. Programming experience is required; familiarity with Python is preferred but not essential.

An example use case might involve research into a specific drug molecule, where protein structure is relevant to drug specificity. The graph database would allow the analysis of all common interaction sites in the PDB at the residue level, with the potential to expand this search across ligands containing similar fragments. Additional searches could analyse the protein-protein interaction sites between different isoforms of the same protein, and cross-reference them to sequence conservation data and predicted functional annotations.

When applying, researchers should submit a 200-word abstract that describes their work and potential queries related to PDB data. The abstract should include details of how they have accessed PDB data previously and the types of questions they are trying to answer.

What will I learn?

Learning outcomes

At the end of this workshop participants will be able to:

Access the PDBe graph database using Neo4j
Query the database using Cypher queries
Find complex data connections
Answer complex questions about protein structures
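As a rough idea of what querying the database with Cypher from Python might look like, here is a minimal sketch using the official `neo4j` Python driver. The node labels (`Entry`, `Entity`, `PDBResidue`, `UNPResidue`), relationship types and `ID` property in the query are assumptions made for illustration and should be checked against the database schema; the helper names `build_request` and `run_query`, and the connection details passed to them, are hypothetical.

```python
# Cypher query walking the tree from a PDB entry down to its residues and on
# to the mapped UniProt residues. All labels, relationship types and property
# names here are ASSUMPTIONS; consult the published schema for the real ones.
QUERY = """
MATCH (entry:Entry {ID: $entry_id})-[:HAS_ENTITY]->(entity:Entity)
      -[:HAS_PDB_RESIDUE]->(pdb_res:PDBResidue)
      -[:MAP_TO_UNIPROT_RESIDUE]->(unp_res:UNPResidue)
RETURN entity.ID AS entity_id,
       pdb_res.ID AS pdb_residue,
       unp_res.ID AS uniprot_residue
LIMIT $limit
"""


def build_request(entry_id, limit=25):
    """Return the query text and its parameters (hypothetical helper)."""
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    # Passing values as parameters avoids pasting user input into the query text.
    return QUERY, {"entry_id": entry_id, "limit": int(limit)}


def run_query(uri, user, password, entry_id, limit=25):
    """Run the query against a Neo4j instance and return rows as dictionaries."""
    # Imported here so the query-building part works without the driver
    # installed (third-party package: pip install neo4j).
    from neo4j import GraphDatabase

    query, params = build_request(entry_id, limit)
    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        with driver.session() as session:
            return [record.data() for record in session.run(query, params)]
    finally:
        driver.close()


query, params = build_request("1cbs", limit=5)
print(params)
```

Calling `run_query("bolt://localhost:7687", "neo4j", "<password>", "1cbs")` would then return a list of dictionaries, one per matched residue, assuming a local instance is running with the graph loaded and the schema matches the assumptions above.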

Course content

This course will cover:

Trainers

Tom Hancocks
EMBL-EBI, UK

David Armstrong
EMBL-EBI, UK

Mihaly Varadi
EMBL-EBI, UK

Sreenath Nair
EMBL-EBI, UK

Lukas Pravda
EMBL-EBI, UK

Programme

Day 1 – Tuesday 18 February 2020
11:30	Shuttle from Cambridge Station (Stop 5)
12:00-13:00	Arrival, registration and lunch
13:00-14:00	Welcome, introductions and networking	Tom Hancocks
14:00-15:30	Overview of PDBe, PDBe-KB and the graph data	Mihaly Varadi
15:30-16:00	Break and group photo
16:00-17:00	Introduction to participant case studies	All
17:00-18:30	Initial exploration of PDBe data on case studies	Mihaly Varadi
18:30	End of day
18:45	Check-in at Conference Centre
19:30	Evening meal	Hinxton Hall

Day 2 – Wednesday 19 February 2020
08:45	Arrival and registration
09:00-10:30	Utilising the graph database	Sreenath Nair
10:30-11:00	Break
11:00-12:30	Utilising the graph database	Sreenath Nair
12:30-13:30	Lunch
13:30-15:30	Project work	All
15:30-16:00	Break
16:00-18:30	Project work	All
18:30	End of day
19:00	Evening meal	Hinxton Hall

Day 3 – Thursday 20 February 2020
08:30	Check-out of Conference Centre
08:45	Arrival and registration
09:00-10:30	Project work	All
10:30-11:00	Break
11:00-12:30	Project work	All
12:30-13:30	Lunch
13:30-14:30	Project discussion	All
14:30-14:45	Wrap-up and feedback	Tom Hancocks
14:45	End of workshop	All
15:00	Shuttle to Cambridge Station (Stop 5)

This course has ended

18 – 20 February 2020

European Bioinformatics Institute

United Kingdom

£50

Contact
Meredith Willmott

Organisers

Tom Hancocks
EMBL-EBI, UK
David Armstrong
EMBL-EBI, UK
Sameer Velankar
EMBL-EBI, UK
