M-CSA Mechanism and Catalytic Site Atlas

Documentation

Introduction

On these pages you will find the documentation for the Mechanism and Catalytic Site Atlas (M-CSA). It is broken down into the following sections:

Defining an entry
Navigating M-CSA
Annotation
Data validation
Post-processing
Data structure

For more information on the database, please see the About section.

1 - Defining an Entry

The aim of the Mechanism and Catalytic Site Atlas is to be as representative as possible. Generally speaking, an individual entry represents a mono-functional family that has evolved to perform the same function in the same way. Thus, no two mechanisms should be identical (unless, of course, they’re the result of convergent evolution).

The golden rule: M-CSA is unique at the mechanism, evolution, and reactive centre level.

An entry is represented firstly by a UniProtKB identifier, and then (ideally) by a representative protein crystal structure (see section 1.1 for how this is chosen).

1.1 - Choosing a Representative Protein

We are looking for the “best” protein to act as the representative entry. Ideally, we want a protein with at least one PDB code, as this will be used to display the active site in the 3D models. If there is more than one crystal structure available, then the "best" is chosen using the following criteria:

Relate to the portion of the sequence that is being annotated
Wild type:
1. no mutations
2. full length (at least for the portion of the sequence/function being annotated)
3. single species
Include any catalytic metal ions and any relevant cofactors.
Highest resolution structure available, where there is more than one crystal structure that fulfils the first three selection criteria.
Crystal structures that include substrates, products, or analogues of the substrate or transition state are considered advantageous.
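The selection criteria above can be read as a filter followed by a sort. The following is a minimal Python sketch under that reading. The `Structure` fields, the function name and the example PDB codes are hypothetical placeholders, and both the fallback when no structure passes the filter and the use of ligand presence as a tie-breaker are assumptions of this sketch rather than the curators' documented procedure.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Structure:
    pdb_id: str
    covers_annotated_region: bool  # relates to the annotated portion of the sequence
    is_wild_type: bool             # no mutations, full length, single species
    has_required_cofactors: bool   # catalytic metal ions and relevant cofactors
    resolution: float              # in angstroms; lower is better
    has_ligand: bool = False       # substrate, product or analogue bound


def choose_representative(candidates: List[Structure]) -> Optional[Structure]:
    """Return the preferred structure, or None if there are no candidates."""
    if not candidates:
        return None

    def meets_first_three(s: Structure) -> bool:
        return (s.covers_annotated_region and s.is_wild_type
                and s.has_required_cofactors)

    # Prefer structures that meet the first three criteria; if none do,
    # fall back to all candidates (an assumption of this sketch).
    pool = [s for s in candidates if meets_first_three(s)] or candidates
    # Best (numerically lowest) resolution first; a bound ligand breaks ties.
    return min(pool, key=lambda s: (s.resolution, not s.has_ligand))


# Made-up example data, not real M-CSA choices.
candidates = [
    Structure("1abc", True, True, True, 2.1),
    Structure("2xyz", True, True, True, 1.8, has_ligand=True),
    Structure("3def", True, False, True, 1.2),  # mutant, so filtered out
]
best = choose_representative(candidates)
```

In this example the mutant structure is excluded despite its better resolution, and `best` is the 1.8 Å wild-type structure.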

If there are no available PDB codes, an entry can still be added, as long as there is an available UniProtKB identifier and at least some basic identification of catalytic residues (e.g. M0372, UniProtKB:Q712I6).

There are two levels of annotation in M-CSA:

Full -- these entries have a detailed mechanism as well as active site annotations
Active Site Only -- currently, this represents mostly those entries that were unique to the CSA and have now been migrated across to M-CSA. In many cases, there may well be sufficient information for a full entry, but time and resource limitations mean these have currently simply been transferred with only basic updates. However, this style of entry should primarily only be used where there is enough information to identify an active site but no detailed mechanism as of yet.

1.2 - What's in an Entry

An entry includes the following information:

Overview:

A name. In most cases this is the name of the protein being annotated, but where the same name applies to more than one entry, the family or "class" of enzyme is added to the name in parentheses.
An overall description of the protein, which often includes information on the protein's function.
The Representative Protein Information, which includes:

The UniProtKB and PDB identifier for the reference protein.
A link to a list of homologues to the reference protein. The homologue page shows a list of UniProtKB sequences with an E-value score of less than 1e-6 and a list of the aligned residues colour coded as to whether they are conserved with respect to the reference protein.
The species for the representative protein, including a link to search M-CSA for other entries from the same species.
Links to external resources for the reference protein, including PDBe, RCSB-PDB, PDBSum, UniProtKB, InterPro, IntEnz, and CATH.
An interactive image of the representative protein (rendered using LiteMol) with the option to zoom into the active site.

The primary annotated overall chemical transformation (reaction). This has images of the substrates and products as well as links to ChEBI and a search option to find other entries in M-CSA that utilise a specific compound.

The Enzyme Mechanism Proposals. There may be one or more mechanism proposals that all have the following features:

A star rating. This indicates how confident the primary literature is in the proposed mechanism. Proposals are rated from one star to three (where three is the best) as follows:

Three Stars denotes a proposed mechanism that is consistent with all existing evidence
Two Stars denotes a proposed mechanism that either does not explain all the evidence or there are other similarly good proposals
One Star denotes a proposed mechanism which has been disproved by more recent data

An introduction to the mechanism (often a brief description of the mechanism is included here).
The list of catalytic residues, a description of their function, and a list of functions.
The primary literature from which the information in the entry was gathered. A single "key" reference has been highlighted and is shown first. This reference is the one that the annotators think gives the most and best information on any single entry/mechanism proposal.
Catalytic Site only entries have no further information. However, mechanism entries also have a list of the steps involved in the mechanism with more detailed annotations.

A list of contributors, those annotators that have been involved in creating (and updating) the entry.

1.3 - Downloading the M-CSA

Data in the M-CSA can be downloaded in several formats. The options are listed on the Download page.

1.4 - How are Homologues Determined?

Sequence homologues are determined using PHMMER by running all the M-CSA manually curated UniProtKB sequences against the "Reference Proteomes" database. Alignments with an E-value lower than 1x10^-6 are added to the database. These are then annotated based on the conservation of the catalytic residues on that sequence.

PDB homologues are determined in a similar way. We use PHMMER to compare all the sequences of the M-CSA manually curated PDB structures with the "PDB" database, and save the ones with an E-value lower than 1x10^-6. The alignments are done at the chain sequence level to avoid repetition (many PDB structures and PDB chains share the same sequence). Each catalytic residue in M-CSA is then annotated against all the PDB chain alignments to check if they are conserved.
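To illustrate the annotation step, here is a small Python sketch that takes one pairwise alignment, applies the E-value cutoff and reports whether each catalytic residue is conserved in the hit. It does not run PHMMER. It assumes the alignment has already been parsed into two gapped strings of equal length, that catalytic positions are 1-based indices into the reference sequence, and that "conserved" means an identical residue at the aligned position. All names are hypothetical, and the real M-CSA conservation rules may be more nuanced.

```python
from typing import Dict, Optional, Tuple

E_VALUE_CUTOFF = 1e-6  # threshold stated in the text


def aligned_residues(ref_aln: str, hit_aln: str,
                     ref_start: int = 1) -> Dict[int, Optional[str]]:
    """Map each reference position to the aligned hit residue (None for a gap)."""
    mapping: Dict[int, Optional[str]] = {}
    pos = ref_start
    for ref_char, hit_char in zip(ref_aln, hit_aln):
        if ref_char == "-":
            continue  # insertion in the hit: no reference position is consumed
        mapping[pos] = None if hit_char == "-" else hit_char.upper()
        pos += 1
    return mapping


def annotate_hit(ref_aln: str, hit_aln: str, e_value: float,
                 catalytic: Dict[int, str]
                 ) -> Optional[Dict[int, Tuple[str, Optional[str], bool]]]:
    """Return {position: (reference residue, hit residue, conserved)}.

    Returns None when the hit does not pass the E-value cutoff.
    """
    if e_value >= E_VALUE_CUTOFF:
        return None
    mapping = aligned_residues(ref_aln, hit_aln)
    return {pos: (res, mapping.get(pos), mapping.get(pos) == res)
            for pos, res in catalytic.items()}


# Toy alignment with one insertion in the hit (column 4).
result = annotate_hit("MKS-HDG", "MRSAHEG", 1e-20, {3: "S", 4: "H", 5: "D"})
```

Here positions 3 and 4 are conserved, while position 5 is an Asp in the reference and a Glu in the hit.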

1.5 - How is Type of Life Determined?

Type of life is an M-CSA descriptor of the species that is defined using the following rules:

Green plants - Viridiplantae
Fungi - Fungi
Birds - Aves
Bacteria - Bacteria
Mammals - Mammalia
Archaea - Archaea
Virus - Viruses
Reptiles - Lepidosauria
Fish - this is a combination of the following classes (taken from Wikipedia):
- Class Agnatha (jawless fish)
- Class Chondrichthyes (cartilaginous fish)
- Class Placodermi (armoured fish)
- Class Acanthodii ("spiny sharks", sometimes classified under bony fishes)
- Class Osteichthyes (bony fish)
Insects - Insecta
Worms - Nematoda
Amoebae - Amoebozoa
Spiders - Arachnida
Amphibians - Amphibia
Anything not covered in the above list - "not separately classified"
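The rules above amount to a lookup of the species lineage against a fixed table. A minimal Python sketch follows, assuming the lineage is available as a list of taxon names (for example, as given in a UniProtKB record) and that the first matching rule wins. The function and constant names are hypothetical.

```python
from typing import Iterable

# (label, taxa) pairs taken from the list above, in the same order.
TYPE_OF_LIFE_RULES = [
    ("Green plants", {"Viridiplantae"}),
    ("Fungi", {"Fungi"}),
    ("Birds", {"Aves"}),
    ("Bacteria", {"Bacteria"}),
    ("Mammals", {"Mammalia"}),
    ("Archaea", {"Archaea"}),
    ("Virus", {"Viruses"}),
    ("Reptiles", {"Lepidosauria"}),
    ("Fish", {"Agnatha", "Chondrichthyes", "Placodermi",
              "Acanthodii", "Osteichthyes"}),
    ("Insects", {"Insecta"}),
    ("Worms", {"Nematoda"}),
    ("Amoebae", {"Amoebozoa"}),
    ("Spiders", {"Arachnida"}),
    ("Amphibians", {"Amphibia"}),
]
DEFAULT_LABEL = "not separately classified"


def type_of_life(lineage: Iterable[str]) -> str:
    """Return the type-of-life label for a lineage given as taxon names."""
    lineage_set = set(lineage)
    for label, taxa in TYPE_OF_LIFE_RULES:
        if lineage_set & taxa:  # any taxon of this rule appears in the lineage
            return label
    return DEFAULT_LABEL


human = ["Eukaryota", "Metazoa", "Chordata", "Mammalia", "Primates"]
mollusc = ["Eukaryota", "Metazoa", "Mollusca"]
```

With these inputs, `type_of_life(human)` gives "Mammals" and `type_of_life(mollusc)` falls through to the default label.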

2 - Using the M-CSA Website

In this section you will find basic instructions on how to navigate and search the database.

2.1 - Navigating M-CSA

There are several ways to navigate the M-CSA:

From the Browse page.
From the Statistics pages.

2.1.a - Navigating M-CSA using Browse

Legend: Screenshot of the Browse page (no filters applied).

The Browse page not only offers a way to go directly to a specific M-CSA entry, it also allows the entries to be ordered by entry id, Enzyme Name, UniProtKB identifier, EC number, PDB code or CATH domain.

To access the M-CSA entry, click on the M-CSA id field (far left column of the table). The other links will take you to UniProtKB, InterPro, EC, PDB and CATH respectively.

The Browse page is paginated to speed up the retrieval of the filtered results.

The Browse page also allows users to filter the results displayed according to five different criteria, which can be combined as shown in the figure below.

Legend: Browse page showing only mechanism-type entries with one Ser, one His and one Asp catalytic residue.

The Entry type doughnut can be used to select either all entry types (default) or either mechanism or catalytic site only entry types. The EC sunburst wheel can be used to select all entries of a specific EC number from the top level of the EC hierarchy (class) to the Serial Number (fourth level of the hierarchy). Similarly, the CATH sunburst wheel can be used the same way. The user can also specify the residues present (e.g. in the above example a single Asp, His and Ser residue). Clicking on a residue type multiple times will increase the number of residues present (e.g. clicking on Asp twice will search for two annotated aspartate residues in the same reaction). Finally, a pull down list of cofactors annotated in the database is available so that users can filter on a specific cofactor.
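The residue filter can be thought of as a comparison of residue counts. The sketch below is one possible reading in Python: it assumes that each selected residue type must occur exactly the selected number of times and that residue types which were not selected are unconstrained. The documentation does not state which of these behaviours the site implements, and the entry ids and residue lists are invented for illustration.

```python
from collections import Counter
from typing import Dict, List


def matches_residue_filter(entry_residues: List[str],
                           selected: Dict[str, int]) -> bool:
    """True if the entry has the selected number of each chosen residue type."""
    counts = Counter(entry_residues)
    return all(counts[res] == n for res, n in selected.items())


def filter_entries(entries: Dict[int, List[str]],
                   selected: Dict[str, int]) -> List[int]:
    """Return the ids of all entries that pass the residue filter, sorted."""
    return sorted(entry_id for entry_id, residues in entries.items()
                  if matches_residue_filter(residues, selected))


# Invented entries: id -> annotated catalytic residue types.
example_entries = {
    101: ["Ser", "His", "Asp"],
    102: ["Ser", "His", "Asp", "Asp"],
    103: ["Cys", "His"],
    104: ["Ser", "His", "Asp", "Gly"],
}
triad = filter_entries(example_entries, {"Ser": 1, "His": 1, "Asp": 1})
```

Clicking on Asp twice corresponds to `{"Asp": 2}` in this sketch, which would select only entry 102.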

2.1.b - Navigating M-CSA using Statistics

Statistics are all interactive and can also be used to search. The following is a list of the statistics and the search results that they can return. Statistics are split into seven categories.

Coverage

Entries in M-CSA by EC class. This shows the total count of M-CSA entries by EC class. The underlying dataset can be accessed by clicking on one of the columns (e.g. clicking on the EC:1 column will take the user to a list of all entries in the dataset that are listed as being oxidoreductases).
EC coverage at the third EC level. This shows the coverage of the M-CSA dataset with respect to the third level of the EC classification (sub-subclass) as a percentage. Again, clicking on one of the columns will take the user to a list of the relevant entries.
EC coverage at the fourth EC level. Similarly to the sub-subclass chart, this lists the percentage coverage of M-CSA with respect to the EC nomenclature at the serial number (4th level) of the EC classification.
A list of Third level ECs not represented in M-CSA (this is not searchable against M-CSA)

Database

Number of Proteins of Species shows the top 18 species (by full scientific name) in M-CSA. Currently, this does not link to the underlying data.
References by Journal shows the 20 most cited journals in M-CSA
Number of papers by Year shows the number of citations per year for all M-CSA literature entries.
Reference PDBs Resolution shows a histogram of the distribution of resolution for the reference crystal structures in M-CSA. This currently has no search function associated with it.

Propensity. This chart does not allow a user to navigate to the underlying data.
Residues. This page displays a bar-chart that shows the 20 primary amino acid residues annotated in M-CSA, divided into the three classes of function annotated: interaction types, reactant roles and spectator roles. This chart can be used to change the residue data being shown on the right hand side, which is a bar-chart showing the breakdown of the functional roles for a specific residue type. The default residue is histidine. Clicking on one of the function bars (e.g. metal ligand) will bring up a list (at the bottom of the page) of all the entries in M-CSA that have a histidine annotated as a metal ligand.
Roles. This page shows two bar-charts at the top. As on the Residues page, the graph on the left hand side is an overview and the graph on the right is a more specific view. The left hand side graph shows the number of annotated functions (split into the three function types) for all residues. Clicking on one of these bars will display a list at the bottom of the page of all residues with that function (e.g. proton acceptor) and also modify the graph on the right hand side to show the breakdown of the number of times a residue type is seen performing that function. Clicking on one of those bars (e.g. His) will change the list at the bottom of the page to display all the cases of histidine acting as a proton acceptor in the database.
Components. This shows a bar-chart of all the reaction mechanism components annotated in M-CSA, ranging from the more chemical mechanism descriptions (e.g. bimolecular nucleophilic addition) to more descriptive terms such as inferred return step. Clicking on one of these bars will display a list of entries with that mechanism component at the bottom of the page (note, to see this list you will need to scroll past the bar-chart, which is rather large).
Bond Changes. This bar-chart shows all the bonds changed (formed, cleaved or changed in order) in all M-CSA entries. As with the other bar-charts, clicking on one of the bond type columns (e.g. C-O bonds formed, shown in light blue) will show a list of all the reactions annotated with that bond change in the M-CSA database.

2.2 - Querying M-CSA Data

M-CSA can be queried from the "Search" bar on the top right hand side of the banner:

Some examples of the searches available from this box are shown as clickable links below the box. This basic search queries M-CSA based on:

Enzyme Name
EC Number
Reference UniProtKB identifier
PDB Code

M-CSA can also be searched from the "Advanced Search" page (accessed from the Search option in the top bar or by clicking the "advanced search" link below the search box in the top bar).

The search is broken down into free text and identifier searches. The following query multiple fields in the database:

Enzyme Name searches the Enzyme Name as stored in the M-CSA entry, as well as the EC nomenclature accepted name and synonyms.
Species can be searched by scientific name (e.g. Homo sapiens), common name (e.g. human) and type of life (e.g. mammal).

EC number and CATH code can be searched for partial matches, e.g. "1.", however we recommend that a user use the Browse functionality for searching the M-CSA with partial EC numbers and CATH codes (see above).

Multiple fields can be searched at the same time by clicking on multiple white search fields. Those fields shown in green will be the ones that the database will be queried on:

It should be noted that the more fields that are selected, the longer the search will take.

If you know which M-CSA entry you are interested in, simply type the integer into the search or advanced search boxes, e.g. 32 will take you to entry 32.
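As an illustration of how a single search box might route the kinds of query described in this section, the sketch below classifies a query string in Python. This is not the site's implementation: the function name, the category labels and the order of the checks are assumptions. The UniProtKB accession pattern follows the accession format published by UniProt, the PDB pattern assumes the classic four-character codes, and the EC pattern accepts partial numbers such as "1.".

```python
import re

# UniProtKB accession format as published by UniProt.
UNIPROT_RE = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)
# Classic PDB codes: a digit followed by three alphanumeric characters.
PDB_RE = re.compile(r"^[0-9][A-Za-z0-9]{3}$")
# Full or partial EC numbers, e.g. "1.", "1.1" or "3.4.21.4".
EC_RE = re.compile(r"^\d+\.(?:\d+(?:\.\d+){0,2})?$")


def classify_query(query: str) -> str:
    """Guess which field a free-text query refers to (hypothetical labels)."""
    q = query.strip()
    if q.isascii() and q.isdigit():
        return "entry_id"  # a bare integer goes straight to that entry
    if EC_RE.match(q):
        return "ec_number"
    if UNIPROT_RE.match(q.upper()):
        return "uniprot"
    if PDB_RE.match(q):
        return "pdb"
    return "enzyme_name"


examples = {q: classify_query(q)
            for q in ["32", "1.", "3.4.21.4", "A9CEQ8", "4hpn", "bromoperoxidase"]}
```

Checking for a bare integer first mirrors the behaviour described above, where typing 32 takes the user to entry 32.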

3 - Annotation in M-CSA

This section is designed primarily for M-CSA annotators, but contains much information that may be of interest to our users.

3.1 - Starting a New Entry

Choosing a new entry is as important as how it is annotated. Firstly, the validity of the new entry must be determined: is it truly a new entry, or a different mechanism proposal for an existing entry? Secondly, the best representative proteins and crystal structures (if available) need to be chosen.

There are many different starting points to adding a new entry to M-CSA. However, they boil down to two options:

a specific protein (identified from the primary literature or suggested by a collaborator)
a specific function (termed EC number for simplicity in the figure below).

Legend:The pre-processing protocol for M-CSA. Boxes filled in light green represent those processes that can be automated. Green lines represent “yes” decisions and magenta lines represent “no” decisions.

Determining if the proposed protein is “valid” as a new entry is based on the premise that if we’ve seen it in M-CSA already, it may not be suitable and the existing entry should probably be updated. This determination is done by first assessing if the UniProtKB identifier has been seen before (if it has, the annotator may be interested in a different portion of the protein, not the one already present. Hence, we do not simply terminate the tests here).

Next, is the EC number unique? If it is, then we’ve got a valid new entry, even if we’ve seen the UniProtKB identifier before. If the EC isn’t unique (even if we’ve not seen the UniProtKB identifier before), we need to evaluate if the entry is part of the same family (i.e. evolutionarily related to the existing entry). This is harder to do automatically as the InterPro identifier chosen to represent the family in the database may not be of a sufficiently low level in the evolutionary hierarchy to differentiate well. Thus, at all stages in this process, the curator needs to evaluate the existing entries suggested by the checks.

However, these checks are not intended to be prescriptive, and if the curator is certain that this is a novel entry (and not just an alternative suggestion for the existing entry) then the new entry will be allowed. Once the representative UniProtKB identifier has been added to the entry, then the representative PDB code needs choosing.
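The checks described above can be summarised as a short decision function. The following Python sketch is only an approximation of the protocol: it assumes that family membership is compared through a single InterPro identifier per entry (which, as noted, may be too coarse), and it returns a recommendation plus the existing entries the curator should look at rather than a final decision. The class, field names, verdict strings and example identifiers are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class ExistingEntry:
    mcsa_id: str
    uniprot_ids: Set[str]
    ec_numbers: Set[str]
    interpro_id: str


def precheck(uniprot_id: str, ec_number: str, interpro_id: str,
             existing: List[ExistingEntry]) -> Tuple[str, List[ExistingEntry]]:
    """Return (verdict, existing entries the curator should review)."""
    same_protein = [e for e in existing if uniprot_id in e.uniprot_ids]
    same_ec = [e for e in existing if ec_number in e.ec_numbers]
    if not same_ec:
        # Unique EC number: valid even if the UniProtKB id has been seen,
        # but any entries sharing the protein are still worth a look.
        return "valid", same_protein
    same_family = [e for e in same_ec if e.interpro_id == interpro_id]
    if same_family:
        # Same function and same family: probably an update or an
        # alternative mechanism proposal for an existing entry.
        return "review_same_family", same_family
    # Same function from an apparently different family.
    return "review_same_ec", same_ec


# Placeholder identifiers, not real database records.
existing = [ExistingEntry("ENTRY_A", {"ACC_1"}, {"1.1.1.1"}, "FAM_A")]
verdict, to_review = precheck("ACC_2", "1.1.1.1", "FAM_B", existing)
```

In the example, the new protein shares an EC number with an existing entry but belongs to a different family, so the verdict is "review_same_ec" and the curator decides whether it is a convergently evolved case.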

The golden rule: M-CSA is unique at the mechanism, evolution, and reactive centre level.

Thus: if the same protein using the same active site can perform three different mechanisms then there should be three different entries for it (e.g. bromoperoxidase M0373, M0374 and M0389). If the same protein uses the same active site to perform three different reactions, but the reactive centres and mechanism are identical, then there should only be a single entry for the protein (e.g. M0083). However, if two proteins have the same mechanism but are from two convergently evolved families, then these proteins should both be included in the database.

Now we can add the new entry.

From your home page, click on “Add New” entry. If this is a “new” entry based on an existing entry, then that entry can be chosen as a template (e.g. for the bromoperoxidase enzyme, which uses the same active site to perform three different mechanisms).

The "Repeated ECs" and "Repeated Uniprot" buttons can be used to see where there is duplicated information, but are more for information than annotation. If the entry you add would end up in one of these lists, you will see warning messages as you annotate the entry.

3.2 - Editing new entry info

As soon as you have selected "Add New Entry", the following display will be presented:

Legend:Screen shot of the initial entry page. Fields marked with a red star are required.

Entry Name. This should be as informative as possible. The preferred name should include the accepted name linked to the EC number and, if required, a class or type in parenthesis after. E.g. alcohol reductase (class I).
Description. This is a free text description that gives general and/or high level information on the enzyme/family being annotated.
InterPro identifier. This is the "preferred" InterPro identifier, i.e. it is the one that describes the signature membership as accurately as possible. E.g. the best identifier is a family type signature at the lowest level of the hierarchy available.

Curator notes. These will not be displayed to users. As such, they are intended as notes to aid in any future curation of this entry.
SFLD family identifier. This allows us to provide a direct link to the SFLD database at the appropriate level of the SFLD hierarchy.

Once this process has been completed, the curator has to input the representative UniProtKB identifier. In the vast majority of cases, this will only be a single identifier. However, in the case of heteromeric proteins a user can add a comma separated list of UniProtKB identifiers to represent the entry:

Legend: Screenshot for adding a protein. If the protein is a monomer (or homomer) then only add one UniProtKB identifier (e.g. A9CEQ8). If the protein is a heteromer, then add the required UniProtKB identifiers as a comma separated list (e.g. P18316, P18315, P18314).
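A comma separated list like the one in the legend can be checked before it is submitted. The Python sketch below splits the input, validates each item against the accession format published by UniProt and rejects duplicates. The function name and the decision to reject duplicates are assumptions of this sketch, not a description of what the curation tool does.

```python
import re
from typing import List

# UniProtKB accession format as published by UniProt.
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def parse_uniprot_ids(text: str) -> List[str]:
    """Split a comma separated list of accessions and validate each one."""
    ids = [part.strip().upper() for part in text.split(",") if part.strip()]
    if not ids:
        raise ValueError("At least one UniProtKB accession is required")
    bad = [i for i in ids if not ACCESSION_RE.fullmatch(i)]
    if bad:
        raise ValueError(f"Not valid UniProtKB accessions: {', '.join(bad)}")
    if len(set(ids)) != len(ids):
        raise ValueError("The same accession is listed more than once")
    return ids


monomer = parse_uniprot_ids("A9CEQ8")
heteromer = parse_uniprot_ids("P18316, P18315, P18314")
```

A monomer or homomer yields a one-element list, while the heteromer example yields three accessions in the order given.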

E.g. using UniProtKB identifier A9CEQ8

Legend:Screenshot showing the protein addition page.

The website will automatically populate the name and notes fields (however, the curator can edit these if required). Extra annotation includes if the enzyme “Is Allosteric” (pull down menu, default: unknown) and if it is “Known to Moonlight” (checkbox, default: unchecked).

Click on “Save” and the homepage will now be updated to look something like this:

Legend:Screenshot of the entry page after the addition of a protein.

At this stage, a curator can add/edit any field in any order. However, it is HIGHLY recommended that they “Edit Catalytic Residues” first, as this involves choosing the representative PDB code, and this will form the basis of the model used for the mechanism annotation.

3.3 - Edit Catalytic Residues

There are three stages to adding the active site:

Select the representative UniProtKB identifier using the pull down menu.
Select the representative PDB code using the pull down menu.
Add residues.

Legend:Screenshot of the active site editing page. Use the pull down menus to select the reference protein and PDB code. Then select the desired catalytic residues using the pull down list at the bottom of the page.

If there are residues missing in the PDB code(s), then the curator may opt to use the UniProtKB identifier as the reference source for adding catalytic residues. The curator must choose the UniProtKB identifier before the PDB code, and the PDB code before selecting residues.

The catalytic residue annotation consists of the residue identity (chosen using the pull down menu) and the location of catalytic function (also chosen from a pull down menu).

3.3.a - Location of amino acid residue function.

Legend: Location of amino acid residue function.

If a residue is active through multiple locations, e.g. its side chain and main chain amide, then the residue should be added twice, once for each location of function.

3.3.b - What makes a residue catalytic?

Residues are designated as being catalytic by fulfilling any one of the following criteria:

Has direct involvement in the reaction mechanism, the so-called reactant residues whose chemical structure is modified during the course of the reaction (for example, the residue is involved in covalent catalysis, electron shuttling or acts as a general acid/base).
Has indirect, but essential, involvement in the reaction mechanism, the so-called spectator residues, whose chemical structure does not change during the course of the reaction. These are the residues that:
1. Polarise or alter the pKa of a residue, a water molecule or part of the substrate directly involved in the reaction.
2. Affect the stereospecificity or regiospecificity of the reaction.
3. Stabilise the reactive intermediates (either by stabilising the transition states or the intermediates themselves, or destabilising the ground states of the substrates).
4. Are involved in forming the binding site of a catalytically important metal ion.

Note that this definition does not include residues that are involved solely in ligand binding and thus differs from other resources, such as UniProtKB annotations.

3.4 - Edit Cofactors

Cofactors are small molecules, other than the standard amino acid residues, that assist an enzyme in catalysis. They can be inorganic molecules (e.g. metal ions), or organic molecules (e.g. PLP), which may sometimes be complexes with metal ions (e.g. heme).

Legend:Types of cofactor and the annotation decisions involved in their annotation.

Generally speaking, cofactors are small molecules that are not consumed during the course of a single enzymatic turnover. In the literature, cofactors are called by many different names, including (but not limited to):

Cosubstrates. These are generally those molecules that are considered cofactors (e.g. NAD(P)) but are consumed during the course of the reaction. In M-CSA, these are handled as substrates/products in the overall reaction and not listed as cofactors at all.
Coenzyme. These are cofactors that dissociate from the enzyme after each catalytic cycle.
Prosthetic group. These are cofactors that remain with the enzyme through many (all) of its catalytic cycles. These can be either covalently attached to the active site, or very strongly bound.

M-CSA may differ from other resources in its annotation of cofactors for those cases where the cofactor is part of the substrate complex, e.g. magnesium ions in ATP dependent reactions. Here, we annotate the magnesium ion as a cofactor due to the fact that it is not consumed during the course of the reaction (although it is lost from the active site when the products dissociate).

In terms of M-CSA annotation, we do not include allosteric regulators, inhibitors, structural metals, etc. as cofactors. We require a cofactor to be present in the active site at some point during the course of the reaction. All cofactors are annotated the same way: use the pull-down menu to add one or more cofactors (the list is populated based off cofactors previously seen in MACiE and the CSA using PDB HET codes). If the same cofactor occurs twice, edit the count field, rather than adding the cofactor again.

Legend:Screenshot of the cofactor editing page. The cofactor(s) are selected using the pull down menu.

NB: If the cofactor occurs as a reactant/product in the overall reaction, then it should not be added as a cofactor in the cofactor annotation. This is because cofactors are treated in the same manner as amino acid residues by the annotation process, and if a reactant is also a cofactor then that species will occur twice in the mechanism (see later).
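The two cofactor rules above (use the count field for repeats, and never list a reactant or product as a cofactor) lend themselves to a simple check. The Python sketch below assumes cofactors and reaction species are both identified by the same kind of code, for example PDB HET codes. The function name is hypothetical and this is not the annotation tool's own validation.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def build_cofactor_annotation(selected: Iterable[str],
                              reaction_species: Set[str]) -> Dict[str, int]:
    """Collapse repeated cofactors into counts.

    Raises ValueError if a selected cofactor is also a reactant or product
    of the overall reaction, since it would then appear twice in the mechanism.
    """
    counts = Counter(selected)
    clash = sorted(set(counts) & reaction_species)
    if clash:
        raise ValueError(
            f"Listed in the overall reaction, so not a cofactor: {', '.join(clash)}"
        )
    return dict(counts)


# Two magnesium ions and one PLP; ATP and ADP belong to the overall reaction.
annotation = build_cofactor_annotation(["MG", "MG", "PLP"], {"ATP", "ADP"})
```

The result holds one record per cofactor with its count, rather than the same cofactor added twice.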

3.5 - Edit Reference Reaction

Legend: Screenshot showing the reaction editing page.

The notes, EC number, KEGG reaction and Rhea ids are all optional, but recommended. If the reaction involves polymers, e.g. protein, DNA, RNA, etc. then it should be flagged as a polymeric reaction.

The EC number is a hierarchical code of four numbers separated by periods. It classifies the enzyme function according to the enzyme's overall reaction scheme and is assigned to the enzyme by the Enzyme Commission. The pull down menu is automatically populated from the IUBMB database (ExplorEnz). PDB files often include the EC number in their annotation, and the literature references may also include it. However, this number should always be confirmed with IntEnz, as EC numbers are not completely static and are frequently reviewed. If there isn't a number assigned, a curator is encouraged to assign the EC number as far as possible, but at least to the first level (class):

oxidoreductase
transferase
hydrolase
lyase
isomerase
ligase

The second level represents the sub-class, the exact meaning of which is dependent on the Class, the third number represents the sub-subclass and the fourth number is the Serial Number, which essentially defines the substrate specificity of the enzyme.
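The four levels can be made concrete with a small parser. The Python sketch below splits an EC number into its levels and looks up the class name for the six classes listed above. Partially assigned numbers written with dashes (e.g. "1.-.-.-") stop at the first unassigned level. The function name and the returned dictionary layout are illustrative choices only.

```python
from typing import Dict, Union

EC_CLASSES = {
    1: "oxidoreductase",
    2: "transferase",
    3: "hydrolase",
    4: "lyase",
    5: "isomerase",
    6: "ligase",
}
LEVELS = ("class", "subclass", "sub-subclass", "serial number")


def parse_ec(ec: str) -> Dict[str, Union[int, str]]:
    """Split an EC number into named levels; '-' marks an unassigned level."""
    parts = ec.strip().split(".")
    if not 1 <= len(parts) <= 4:
        raise ValueError(f"Expected one to four levels, got {len(parts)}")
    result: Dict[str, Union[int, str]] = {}
    for level, part in zip(LEVELS, parts):
        if part in ("-", ""):
            break  # nothing is assigned from this level onwards
        result[level] = int(part)  # raises ValueError for non-numeric levels
    if "class" not in result:
        raise ValueError("The first level (class) must be assigned")
    result["class_name"] = EC_CLASSES.get(result["class"], "class not listed here")
    return result


full = parse_ec("3.4.21.4")
partial = parse_ec("1.-.-.-")
```

For "3.4.21.4" this gives class 3 (hydrolase), subclass 4, sub-subclass 21 and serial number 4; for "1.-.-.-" only the class is filled in.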

Once the reaction overview fields have been completed as far as possible, then the reactants and products can be added.

The reactants (aka substrates) and products are all added as ChEBI identifiers. If the specific compound isn’t in ChEBI, then it should be added (although it is possible to add M-CSA only molecules this is NOT recommended).

Once the reactants and products are added, click on “Create and map RXN with EC-BLAST”. This will map the reaction, test for balance and inform the curator of any problem atoms:

Legend:screenshot of a mapped overall reaction.

In this case, the protonation states differ between reactant and product, so either a proton needs adding to the reactant side or the correct protonation states need adding. Protonation states are notoriously challenging to get right as they often depend on the localised pH of the protein and its active site. Rhea has chosen to always represent the reaction using the protonation states of the reactants and products at pH 7. However, it is highly recommended for M-CSA entries that the substrates and products mirror what occurs in the mechanism.
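A protonation mismatch of this kind shows up as a difference in hydrogen count and net charge between the two sides. The Python sketch below performs that bookkeeping from molecular formulae and charges. It is not EC-BLAST and does no atom mapping; it assumes simple formulae without brackets or isotopes, and the function names are hypothetical. The example uses the hydrolysis of ethyl acetate to acetate and ethanol, written first without and then with the released proton.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Tuple

Species = Tuple[str, int]  # (molecular formula, net charge)


def element_counts(formula: str) -> Counter:
    """Count atoms in a simple formula such as 'C4H8O2' (no brackets)."""
    counts: Counter = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def imbalance(reactants: Iterable[Species],
              products: Iterable[Species]) -> Tuple[Dict[str, int], int]:
    """Return (atom differences, charge difference) as products minus reactants."""
    atoms: Counter = Counter()
    charge = 0
    for sign, side in ((-1, reactants), (1, products)):
        for formula, q in side:
            for element, n in element_counts(formula).items():
                atoms[element] += sign * n
            charge += sign * q
    return {el: n for el, n in atoms.items() if n != 0}, charge


reactants = [("C4H8O2", 0), ("H2O", 0)]        # ethyl acetate + water
products = [("C2H3O2", -1), ("C2H6O", 0)]      # acetate + ethanol
unbalanced = imbalance(reactants, products)
balanced = imbalance(reactants, products + [("H", 1)])  # add the proton
```

Without the proton the product side is one hydrogen and one positive charge short; adding H+ to the products removes both differences.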

Once finished editing this page, click on “Go Back” to get back to the “entry page”.

If there are other reactions to add, then click on “Add New Reaction” and follow the same procedure.

3.6 - Edit Mechanism

Finally, we’re ready for the Mechanism. Select the model to be used (in this case A9CEQ8, 4hpn) and click on “Add New Mechanism”. The home page will now look something like this:

Legend:Screenshot of the home page after the reactions and a new mechanism have been added.

Click on the mechanism number on the “entry” page. This will bring up the mechanism annotation page:

Legend:Screenshot of the unedited mechanism page (full mechanism entry).

This is where the meat of the entry goes. All mechanism entries have:

Description. This is the only absolutely required text field. It should describe the general mechanism in broad terms and detail any key assumptions taken in annotating the mechanism.
Curator Notes. Similarly to the main entry level annotation, this field is not shown to the user and is more for curators to see how annotations have been made or other assumptions.
Reversibility tag. This pull down menu describes whether the reaction is known to be reversible or not. The default is "unknown".
Rating pull down menu. This is used to assign a confidence to the mechanism proposal. It is more important when there are multiple mechanism proposals available, but can give users a good indication of how well established the mechanism is even if there is only one proposal annotated. The ratings are:
1. Low confidence in this mechanism
2. Moderate confidence in this mechanism
3. High confidence in this mechanism
Complete check box. Used to flag a mechanism as "finished" or not.
"Catalytic Site Only" check box. This is used to differentiate between an active site only and full mechanism entry.

If there is no stepwise mechanism available in the primary literature, then highlight the “Catalytic Site Only” checkbox, and select “Save”. This will change the page thus:

Legend:Screenshot of the unedited mechanism page (catalytic site entry).

Whether the entry is a catalytic site only or full mechanism, it is necessary to add references. This is done by clicking on the "Add Reference" button at the bottom of the mechanism page.

3.7 - Adding References to a Mechanism

The "Add Reference" button at the bottom of the mechanism page will take you to a new page which looks like:

Legend:Screenshot of reference addition page.

References can be added as DOI or PMID (it is recommended that you use one or the other as mixing them can lead to duplications).

A reference can apply to zero or more residues and/or cofactors. Select the ones the reference applies to using the Ctrl (or Apple equivalent) key.

If the reference has already been added, then the “Reference:” pull down menu can be used to find it again:

Legend:Screenshot of reference editing page.

A reference should always have a “Type of Evidence”. Evidence types are based around the Evidence Code Ontology (ECO) and the following are currently available in M-CSA: [[LINK TO A PAGE BUILT FROM evidenceType TABLE]].

Notes are optional (except for in the case of the csa_annotation, see below), but should be used to give more detail if required.

Once all the fields have been filled in, click on save (for a new reference, clicking on “back” will delete any information you’ve added). The data entry tool will automatically populate authors, title, etc. based on the DOI or PMID added.

Legend:Screenshot of the final reference.

The reference can always be edited once it has been added.

A mechanism should always have a primary reference identified. This is the reference that should be the first port of call for a user that doesn’t necessarily want to read every reference on the list!

All residues should have a csa_annotation reference type, a free-text description of their function, irrespective of the mechanism type:

Legend:Screenshot showing the "csa_annotation" style of reference.

The csa_annotation should have one or more residues (any residue can only have one csa_annotation field, so only group residues when they only have a single function in the reaction) and “Notes” which describes the function being performed.
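The csa_annotation rules (every residue has one, no residue has more than one, and the notes are filled in) can be expressed as a consistency check. The Python sketch below assumes references are available as plain dictionaries with `type`, `residues` and `notes` keys. That layout, the function name and the residue labels in the example are hypothetical and do not describe the M-CSA data model.

```python
from typing import Dict, List, Set


def check_csa_annotations(residues: Set[str],
                          references: List[dict]) -> Dict[str, List[str]]:
    """Report residues whose csa_annotation is missing, duplicated or lacks notes."""
    seen: Dict[str, int] = {}
    empty_notes: Set[str] = set()
    for ref in references:
        if ref.get("type") != "csa_annotation":
            continue  # other evidence types are not checked here
        for res in ref.get("residues", []):
            seen[res] = seen.get(res, 0) + 1
            if not ref.get("notes", "").strip():
                empty_notes.add(res)
    return {
        "missing": sorted(residues - set(seen)),
        "duplicated": sorted(r for r, n in seen.items() if n > 1),
        "empty_notes": sorted(empty_notes),
    }


residues = {"Ser195", "His57", "Asp102"}
references = [
    {"type": "csa_annotation", "residues": ["Ser195"], "notes": "Nucleophile."},
    {"type": "csa_annotation", "residues": ["His57", "Ser195"], "notes": ""},
    {"type": "other", "residues": ["Asp102"], "notes": "Mutagenesis data."},
]
problems = check_csa_annotations(residues, references)
```

In this example Asp102 has no csa_annotation, Ser195 has two, and the second annotation has no notes, so all three lists in `problems` are non-empty.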

NB: if you add alternative mechanisms, you will need to add references to each one as references are linked to mechanism proposals, rather than entries.

3.8 - Annotating a mechanism

As previously stated, there are two levels of detail when annotating a mechanism in M-CSA:

A complete step-wise mechanism
An active site entry

The main difference between the two types of entry is that the first details all the steps involved in completing the overall transformation and the second is an overview of the active site and the roles of the critical components present in that active site. To switch between the two types, simply click on the check box titled "Catalytic Site Only".

3.8.a - 'Detailed mechanism' entry

Adding Steps

Firstly, from the "Chemical Steps" menu, click on "Add New Step":

There are several ways to add a new step:

Default: "Fill with: empty" and "Insert: at the end". This is most appropriate when you are editing a new entry.
Filling a step with data from another step that has already been created. This is most useful when there are alternative mechanisms that utilise similar steps.
Inserting steps at the end or before existing steps.
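
The options above amount to two independent choices: what the new step is filled with, and where it is inserted. A minimal sketch of that behaviour, assuming steps are held as a plain list of dictionaries (the function name, parameters and dictionary keys are hypothetical, not the tool's real implementation):

```python
import copy


def add_step(steps, fill_from=None, insert_before=None):
    """Add a step to `steps` (a list of dicts) and return its index.

    fill_from: index of an existing step to copy, or None for an empty step.
    insert_before: index of the step to insert before, or None to append.
    """
    if fill_from is None:
        new_step = {"description": "", "roles": []}
    else:
        # Deep copy so later edits to the new step do not alter the original.
        new_step = copy.deepcopy(steps[fill_from])
    position = len(steps) if insert_before is None else insert_before
    steps.insert(position, new_step)
    return position
```

Calling `add_step(steps)` with no other arguments corresponds to the default (empty step, inserted at the end).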

Annotating Steps: Step One

Once the first step has been created, you will be able to click on the step number and will be presented with the blank step annotation (unless you created the step by copying another, in which case you will be presented with the annotation of the existing step used).

Click on “Draft From Scratch”. This will place all the components, including catalytic residues, cofactors, substrates and products. It is important to draw the reaction this way, as the inbuilt mapping of the compounds and atoms allows for automatic annotation of certain reaction components and residue functions.

Once the components are all in place, they can be rearranged and the curly arrows added:

REMEMBER: click on “Save Scheme to Database” when you’re finished editing. Otherwise you’ll have to start from scratch!

Although molecules can be deleted at any stage in the reaction drawing process, it is highly recommended that you leave everything in place until the full mechanism has been edited.

Helpful tip: There are two types of arrow in the Marvin JS interface: single headed and double headed. In both cases, if you have the arrow tool selected, click on an arrow to cycle through:

Different atoms that the bond can be formed between
Electron movement (i.e. no bond formed).

There are also many different types of bond. Other than the standard single, double, triple and stereochemical bonds, the only one in use within M-CSA is the coordinate bond. This is used to annotate metal ligands.

Once you’re happy with the image, you need to edit the description, residue function(s) and step components.

Residue Roles

First, use the “Recalculate Automated Roles” to populate those functions that can be automatically assigned. Then, for those functions that are not calculated (mostly spectator roles like electrostatic stabiliser) select the required residue(s) and the required Function, then click on “Add Role”.

See Amino Acid Function Section for more details on the functions available for annotation.

Helpful tip: You can select multiple residues at the same time to assign them a single function

Step Components

Helpful tip: You can see atom numbers in the Marvin JS interface. Click on the gear icon and then check "index atoms". This can be very helpful when adding atom numbers to the step component annotations.

Again, as with the residue roles, use the “Recalculate Auto Components” to annotate the reaction automatically as far as possible. Then you can use the pull down menus to annotate the remaining functions.

For more detail on the step component annotations available, see: Mechanism Attributes Section, Named Mechanism Section, and C.K.Ingold Section

Once you’re happy with the step, move onto the next step. You will have the option of creating a new step from within the edit step page:

Note: if there are any errors on the page, then you will not be able to navigate using the above interface.

Once you’ve created the new step, you have the option to draft the start of step two from the end point of step one using “Draft From Previous Step”.

Again, rearrange the components as you require and draw in the curly arrows. Click on “Save Scheme to Database” when you’re happy with the step mechanism, then annotate the residue functions, step components and description as you require.

Once all the step annotation has been completed click “Go Back to Mechanism Page” and finalise the annotations.

3.8.b - 'Catalytic site' entry

From the mechanism “home” page, click on “Edit Roles and Mechanism Components” to get to the annotation pages.

For CSA style entries, only the residue roles section needs annotating:

Select the residue(s) to be annotated, and a specific function to assign to them. This should be as detailed as possible, but high level, i.e. “proton shuttle” rather than “proton acceptor” and “proton donor”

The main difference between catalytic site only and full mechanism style entries is that all residue functions must be manually annotated for a catalytic site style entry, because no mechanism is physically drawn. Thus: select the residue(s) to be annotated, and a specific function to assign to them.

Helpful tip: You can select multiple residues at the same time to assign them a single function

Step Components can be ignored for this level of annotation. However, if you wish to add them, you can. They are added in the same way as for a "detailed mechanism" step.

Once this annotation has been completed click “Go Back to Mechanism Page” and finalise the annotations.

3.9 - Finalising an entry

Helpful tip: At any point, clicking on “List entries: user” will take the curator back to the “home page”, where the entry being annotated will be shown with any missing annotation highlighted:

Legend:Screenshot of the annotation check.

Once all the annotation has been added, check the “complete” box. Then, to get back to the “entry” page, click on “Go Back”.

At this stage, secondary mechanism proposals can be added if, for example, there is some active debate as to how the reaction proceeds. Simply repeat the mechanism addition steps for a new mechanism. Ensure that you assign an appropriate rating to all the mechanisms. Obviously, this is subjective for the curators, and it is highly encouraged that you add Curator Notes explaining the choice if you can.

If the “main” annotation is also complete and OK, go back to the “Edit Entry” page and check that “complete” box.

On the List Entry page, the completed entry will now show at the end of your entry list. If you are happy, click on the “Go to migrate page” button and select the entry you want to push to the M-CSA user:

Legend:Screenshot of the migration page.

This moves the entry from your personal list to the “public” version of the database. Once there it will be double checked by an M-CSA team curator and made live on the website ASAP.

4 - Data Validation

This section describes the various data checks that are performed during entry creation.

An entry on a user's homepage will look something like:

Legend:Screenshot of the annotation check.

Here, the annotation being checked includes:

Is there a primary reference for an entry (the “if you only read one reference” reference)
Is a “key” InterPro identifier assigned?
Do all active site residues have a csa-style description (this is shown with the "csa_flag" text)
Do all active site residues have at least one assigned role (this is shown with "role" text)
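
The four checks above can be sketched as a function that returns the flags to highlight for an entry. This is only an illustration: the dictionary layout and the flag names `primary_reference` and `interpro` are assumptions, while `csa_flag` and `role` are the texts mentioned in the list above.

```python
def missing_annotation_flags(entry: dict) -> list[str]:
    """Return the flags for annotation that is still missing from an entry."""
    flags = []
    if not entry.get("primary_reference"):
        flags.append("primary_reference")  # hypothetical flag name
    if not entry.get("key_interpro"):
        flags.append("interpro")  # hypothetical flag name
    residues = entry.get("residues", [])
    # Flag the entry if any residue lacks a csa-style description.
    if any(not r.get("csa_description") for r in residues):
        flags.append("csa_flag")
    # Flag the entry if any residue has no assigned role.
    if any(not r.get("roles") for r in residues):
        flags.append("role")
    return flags
```

An entry with no missing annotation yields an empty list.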

Within an entry, the following checks are performed:

Reaction Specific Checks (reference and other reactions):

Does the reaction balance? NB: The answer may well be no at the moment if the reaction is flagged as polymeric, and this is OK for now, as adding n+1 or n+2 entities in ChEBI has not yet been discussed.
Does it have a KEGG and Rhea identifier (OK for the answer to this to be no as not all reactions will be represented in one or more of these resources)
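
To illustrate what a balance check involves, the sketch below compares element counts on both sides of a reaction. It is a deliberately simplified assumption of how such a check might work, not M-CSA's implementation: it handles only plain molecular formulae, and ignores charges, brackets and polymeric "n" units.

```python
import re
from collections import Counter

# An element symbol (one capital, optional lowercase letter) and optional count.
ELEMENT = re.compile(r"([A-Z][a-z]?)(\d*)")


def atom_counts(formula: str) -> Counter:
    """Count the atoms of each element in a simple formula such as 'H2O'."""
    counts = Counter()
    for symbol, number in ELEMENT.findall(formula):
        counts[symbol] += int(number) if number else 1
    return counts


def is_balanced(reactants: list[str], products: list[str]) -> bool:
    """True if both sides of the reaction contain the same atoms."""
    left = sum((atom_counts(f) for f in reactants), Counter())
    right = sum((atom_counts(f) for f in products), Counter())
    return left == right
```

For example, `is_balanced(["CO2", "H2O"], ["H2CO3"])` is true, whereas dropping the water from the left-hand side makes it false.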

For those fields that are required, a user will not be able to navigate to another section of the annotation until they have been filled in, e.g. the entry description and step description.

Other checks performed within the entry include the InterPro identifier, EC number, and UniProtKB identifier duplicate checks:
