Course at EMBL-EBI
Mining PDBe and PDBe-KB using a graph database
This workshop covers the use of the PDBe graph database to extract data for solving complex structural biology queries. It will introduce the PDBe graph database and how to write Cypher queries to retrieve data of interest. Workshop participants will be able to use the graph database to explore data relevant to their own research with support and guidance from the development team at PDBe.
The graph database integrates annotations provided by PDBe-KB partners and is implemented in Neo4j. In this graph, each PDB entry is represented as a tree: the root is the PDB entry, which is connected to chains and entities, which are in turn connected to residues. Each PDB residue (more than 150 million in total) is linked to its available annotations (e.g. is the residue part of a catalytic site, or is it on a macromolecular interaction interface?) and is also directly connected to its corresponding UniProt residue. Storing PDBe-KB data as a graph makes it straightforward to transfer annotations between PDB entries that map to the same UniProt accession, as well as between entries that map to UniProt accessions with high sequence identity.
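The annotation transfer described above can be illustrated with a small in-memory sketch. This is not the PDBe implementation: the identifiers, annotation labels and the `transfer_annotations` helper are all invented for illustration, and plain dictionaries stand in for the graph. It only shows the idea that two PDB residues mapped to the same UniProt residue can share annotations.

```python
from collections import defaultdict

# Toy data: every identifier and annotation below is invented for illustration.
# Each PDB residue is keyed by (entry, chain, residue number) and maps to a
# UniProt residue keyed by (accession, position).
pdb_to_uniprot = {
    ("1abc", "A", 10): ("P00001", 57),
    ("1abc", "A", 11): ("P00001", 58),
    ("2xyz", "B", 110): ("P00001", 57),
    ("2xyz", "B", 111): ("P00001", 58),
}

# Annotations attached directly to individual PDB residues.
annotations = {
    ("1abc", "A", 10): {"catalytic_site"},
    ("2xyz", "B", 111): {"interface"},
}


def transfer_annotations(pdb_to_uniprot, annotations):
    """Propagate annotations between PDB residues sharing a UniProt residue."""
    # Step 1: pool every annotation seen at each UniProt residue.
    by_uniprot = defaultdict(set)
    for pdb_res, labels in annotations.items():
        by_uniprot[pdb_to_uniprot[pdb_res]].update(labels)
    # Step 2: hand the pooled annotations back to every mapped PDB residue.
    return {
        pdb_res: set(by_uniprot[unp_res])
        for pdb_res, unp_res in pdb_to_uniprot.items()
        if by_uniprot.get(unp_res)
    }


transferred = transfer_annotations(pdb_to_uniprot, annotations)
for residue, labels in sorted(transferred.items()):
    print(residue, sorted(labels))
```

In the real graph the same transfer would be a traversal from one PDB residue through its UniProt residue to the other PDB residues mapped to it, rather than two passes over dictionaries.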
Read the database schema here.
Who is this course for?
This workshop is aimed at bioinformaticians with experience of analysing data from the PDB, either by processing archive files or via API access. We encourage applications from individuals with specific questions relating to PDB data that are difficult to answer using existing data queries. Programming experience is required; familiarity with Python is preferred but not essential.
An example use case might involve research into a specific drug molecule, where protein structure is relevant to drug specificity. The graph database would allow the analysis of all common interaction sites in the PDB at the residue level, with the potential to expand this search to ligands containing similar fragments. Additional searches could analyse the protein-protein interaction sites between different isoforms of the same protein and cross-reference them with sequence conservation data and predicted functional annotations.
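As a rough illustration of the "common interaction sites" analysis in this use case, the sketch below assumes that ligand contacts have already been retrieved and mapped to UniProt positions. The records, the accession and the `common_sites` helper are hypothetical, and the threshold logic is just one possible way to define "common".

```python
from collections import defaultdict

# Invented example records: (PDB entry, UniProt accession, UniProt position)
# for residues observed in contact with the ligand of interest.
contacts = [
    ("1abc", "P00001", 57),
    ("1abc", "P00001", 102),
    ("2xyz", "P00001", 57),
    ("2xyz", "P00001", 195),
    ("3def", "P00001", 57),
    ("3def", "P00001", 102),
]


def common_sites(contacts, min_fraction=1.0):
    """Return UniProt residues contacted in at least min_fraction of entries."""
    entries_by_site = defaultdict(set)
    all_entries = set()
    for entry, accession, position in contacts:
        entries_by_site[(accession, position)].add(entry)
        all_entries.add(entry)
    # Keep sites seen in a large enough share of the entries, in sorted order.
    return sorted(
        site
        for site, entries in entries_by_site.items()
        if len(entries) / len(all_entries) >= min_fraction
    )


print(common_sites(contacts))       # [('P00001', 57)]
print(common_sites(contacts, 0.6))  # [('P00001', 57), ('P00001', 102)]
```

Working at the UniProt position level means that entries with different chain identifiers or residue numbering can still be compared directly.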
Researchers should submit a 200-word abstract with their application that describes their work and the queries they would like to run against PDB data. The abstract should include details of how they have previously accessed PDB data and the types of questions they are trying to answer.
What will I learn?
Learning outcomes
At the end of this workshop participants will be able to:
- Access the PDBe graph database using Neo4j
- Query the database using Cypher queries
- Find complex data connections
- Answer complex questions about protein structures
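To give a flavour of the first two outcomes, here is a minimal sketch of sending a parameterised Cypher query to a Neo4j instance from Python. The node labels, relationship types and property names in the query are placeholders chosen to mirror the entry, entity, residue and annotation tree; they are assumptions and should be checked against the published database schema. The connection details passed to `run_query` are likewise hypothetical.

```python
def residues_with_annotations_query(pdb_id):
    """Build a parameterised Cypher query and its parameters.

    The labels (Entry, Entity, PDBResidue, Annotation), relationship types and
    property names are placeholders, not the confirmed PDBe schema.
    """
    query = (
        "MATCH (e:Entry {ID: $pdb_id})-[:HAS_ENTITY]->(:Entity)"
        "-[:HAS_RESIDUE]->(r:PDBResidue)-[:HAS_ANNOTATION]->(a:Annotation) "
        "RETURN r.ID AS residue, a.TYPE AS annotation "
        "LIMIT 25"
    )
    return query, {"pdb_id": pdb_id}


def run_query(uri, user, password, pdb_id):
    """Run the query with the neo4j Python driver (pip install neo4j)."""
    # Imported lazily so the query builder can be used without the driver.
    from neo4j import GraphDatabase

    query, params = residues_with_annotations_query(pdb_id)
    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        with driver.session() as session:
            # Each record is converted to a plain dict keyed by the RETURN aliases.
            return [record.data() for record in session.run(query, params)]
    finally:
        driver.close()


query, params = residues_with_annotations_query("1cbs")
print(query)
print(params)
```

Passing the PDB identifier as a `$pdb_id` parameter, rather than formatting it into the query string, lets the database reuse the query plan and avoids quoting problems.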
Course content
This course will cover:
Trainers
Tom Hancocks, EMBL-EBI, UK
David Armstrong, EMBL-EBI, UK
Mihaly Varadi, EMBL-EBI, UK
Sreenath Nair, EMBL-EBI, UK
Lukas Pravda, EMBL-EBI, UK
Programme
Day 1 – Tuesday 18 February 2020 | | |
---|---|---|
11:30 | Shuttle from Cambridge Station (Stop 5) | |
12:00-13:00 | Arrival, registration and lunch | |
13:00-14:00 | Welcome, introductions and networking | Tom Hancocks |
14:00-15:30 | Overview of PDBe, PDBe-KB and the graph data | Mihaly Varadi |
15:30-16:00 | Break and group photo | |
16:00-17:00 | Introduction to participant case studies | All |
17:00-18:30 | Initial exploration of PDBe data on case studies | Mihaly Varadi |
18:30 | End of day | |
18:45 | Check-in at Conference Centre | |
19:30 | Evening meal | Hinxton Hall |
Day 2 – Wednesday 19 February 2020 | | |
---|---|---|
08:45 | Arrival and registration | |
09:00-10:30 | Utilising the graph database | Sreenath Nair |
10:30-11:00 | Break | |
11:00-12:30 | Utilising the graph database | Sreenath Nair |
12:30-13:30 | Lunch | |
13:30-15:30 | Project work | All |
15:30-16:00 | Break | |
16:00-18:30 | Project work | All |
18:30 | End of day | |
19:00 | Evening meal | Hinxton Hall |
Day 3 – Thursday 20 February 2020 | | |
---|---|---|
08:30 | Check-out of Conference Centre | |
08:45 | Arrival and registration | |
09:00-10:30 | Project work | All |
10:30-11:00 | Break | |
11:00-12:30 | Project work | All |
12:30-13:30 | Lunch | |
13:30-14:30 | Project discussion | All |
14:30-14:45 | Wrap-up and feedback | Tom Hancocks |
14:45 | End of workshop | All |
15:00 | Shuttle to Cambridge Station (Stop 5) | |
Please read our page on application advice before starting your application. In order to be considered for a place on this course, you must do the following:
- Complete the online application form
- Submit a Microsoft Word (.docx) document containing three short paragraphs with a biography, work history and description of your current research interests; each paragraph should be no more than 100 words
- Submit a letter of support from a supervisor or a senior co-worker explaining why you should be selected for this course
- Submit a 200-word abstract detailing your structural biology research question that you wish to investigate during the workshop
Please submit all documents to Meredith Willmott (meredith@ebi.ac.uk) by midnight GMT on Friday 31 January 2020. We will endeavour to respond to applications as quickly as we can, and may contact you with an invitation to register before the official closing date.
Incomplete applications will not be considered.
The workshop fee covers your catering, refreshments, two nights' accommodation and a shuttle between Station Road, Cambridge and the Wellcome Genome Campus.