What's New in MEROPS?

Release 12.5 8-September-2023

Data update

This release is a data update; there are no new features and only minor changes to the CGI scripts and services. Because of computing issues at EBI, at present the sequence, code and family tables have not yet been updated.

Release 12.4 22-October-2021

Data update

This release is mainly a data update; there are no new features and only minor changes to the CGI scripts and services.

Peptidase and inhibitor bibliographies

This is the first new release for the MEROPS database since Alan Barrett passed away in 2020. Alan was responsible for maintaining the reference collection in MEROPS which he did diligently until May 2020 when it became impossible for him to continue. I have taken over this role as well as maintaining the website, collecting and analysing the sequence and cleavage data, but I have not been able to keep up with the literature to the extent that Alan did. Alan managed to update all the literature once a fortnight; I am not able to do updates as frequently. The literature has been updated, but there are many hundreds of papers that still have to be completely processed. The explosion in literature relating to the COVID-19 pandemic has been partly responsible for this. I apologize for the inconvenience that incomplete bibliographies may cause.

Release 12.3 11-September-2020

More glutamic peptidase families

Three more families of glutamic peptidases have been added, including the first families to include human peptidases. Two of these families had been considered to be metallopeptidases, but structural studies have indicated that the metal ions are structural rather than catalytic. The new families are: G4, which includes Tiki1 peptidase from human (the family formerly considered to be M96); G5, which includes RCE1 peptidase from Saccharomyces cerevisiae and human (the family formerly considered to be M79); and G6, which includes the Ras/Rap1-specific peptidase from the MARTX toxin of Vibrio vulnificus.

Release 12.2 15-June-2020

Data update

This release is a data update; there are no new features and only minor changes to the CGI scripts and services.

Release 12.1 26-April-2019

Data update

This release is mainly a data update; there are no new features and only minor changes to the CGI scripts and services. New homologues were obtained from a search of the NCBI Protein database and the number of sequences has increased by over 220,000 (21.4%).

Modification to pages for peptidase substrates

The MEROPS sequence identifier (MERNUM) for the peptidase performing the cleavage, where known, has been added to the page listing known substrates for a peptidase. The MERNUM is a link to the relevant sequence page. Items in the table can be ordered by MERNUM.

Release 12.0 18-September-2017

A new home for MEROPS and changes to some services

This is the first release to be posted only at the EMBL-European Bioinformatics Institute (EMBL-EBI). Please note that MEROPS is no longer available at the Wellcome Trust Sanger Institute, and all requests to that site will be forwarded to the EMBL-EBI MEROPS homepage.

The MEROPS sequence libraries have been made available for searching on the EMBL-EBI website, and the link from the SEARCHES button on the left hand menu now points to this service. Please select a MEROPS library from the "Other Protein Databases" menu in the "PROTEIN DATABASES" box. "MEROPS-MPRO" is protease.lib (full sequences), "MEROPS-MPEP" is pepunit.lib (sequences of peptidase and inhibitor units only) and "MEROPS-MP" is merops_scan.lib (peptidase and inhibitor units of holotype sequences only). Please remember to deselect UniProt Knowledgebase before initiating the search so that the results are less confusing. It is now also possible to search the MEROPS full-sequence library using the HMMER webserver. Please select "MEROPS" from the "current database selection" pull-down menu. Unfortunately, the MEROPS batch Blast has had to be suspended.

Cross-references to the Panther database

Panther (http://www.pantherdb.org/) is one of the few databases to cluster protein sequences at a level lower than family. Many protein families contain proteins with diverse functions, and the approach Panther has taken to identify sequences within a family that are functionally related is similar to the approach we have adopted in MEROPS: once a family is assembled, an alignment and a phylogenetic tree are made, and sequence clusters within the tree are identified. Panther defines a subfamily to be a cluster of sequences from more than two organisms where gene duplication has preceded speciation. A Panther subfamily is approximately equivalent to a MEROPS identifier. Panther makes full-sequence alignments, so a MEROPS family can equate to more than one Panther family, especially for multidomain peptidases. Panther, of course, also creates families for proteins other than peptidases and peptidase inhibitors. Cross-references to the Panther database are shown on the family, peptidase and inhibitor summaries.

New genomes analysed

With over a million sequences in our collection, there is little point in adding minor sequence variants or sequences from whole genome shotgun assembly where reliability may be low. Hence, we are now only adding sequences of characterized proteins and homologues from completely-sequenced genomes of organisms of evolutionary, medical or commercial relevance. In this release we have analysed five proteomes from the newly identified Asgard superfamily of archaea, of which the Lokiarchaeota are believed to be the closest living relatives of pre-mitochondrial eukaryotes.

Release 11.0 06-January-2017

A new home for MEROPS

The MEROPS website has moved to a new home at the EMBL-European Bioinformatics Institute, and this release is available both at the EMBL-EBI and at the old home of MEROPS at the Wellcome Trust Sanger Institute. Future releases will only be available at EMBL-EBI, so please change your browser bookmark to http://www.ebi.ac.uk/merops.

Users are reminded that the MEROPS database and website no longer receive public funding, and have no paid staff. Alan Barrett is retired and, as of October 2015, Neil Rawlings works full-time as a curator for the InterPro database. Both Alan and Neil work on MEROPS only in their free time. Consequently, major releases are now planned to be less frequent, once a year. We expect to be able to produce occasional minor releases as we did in 2016, but we may not be able to respond to queries and problems as promptly as we would wish.

As we move on, we would like to thank the web team at the Wellcome Trust Sanger Institute for all their help in maintaining and improving MEROPS over the past fourteen years, and special thanks go to Matt Waller and Paul Bevan.

Peptidases and inhibitors from helminth genomes

As a result of a collaboration with the Pathogens team headed by Dr Matt Berriman at the Wellcome Trust Sanger Institute, we are pleased to be able to add in this release the peptidases and inhibitors from over fifty nematode and platyhelminth species.

Minor updates 9-September-2016

Substrate cleavages

Tables for substrate cleavages have been updated.

21-April-2016

Peptidase summaries for World Malaria Day (25 Apr 2016)

New peptidase summaries have been written for various peptidases important for malaria infection. These include the hemoglobin-degrading peptidases plasmepsin-1, plasmepsin-2, plasmepsin-4, histoaspartic peptidase, falcipain-2, falcipain-3, falcilysin and PfLAP aminopeptidase; PfSUB1 peptidase which is important for shedding surface proteins to escape the host immune system; PfROM4 which sheds the surface adhesins essential for erythrocyte recognition prior to entry into the host cell; and plasmepsin-5 which removes the PEXEL motif from parasite proteins destined for export to the host cell.

15-April-2016

Structure pages

We discovered that many of the external links on the Structure pages are obsolete or deprecated. The obsolete links have been replaced with links to PDBe (Protein Data Bank Europe), SCOP, CATH and PDBSum. The links to PDB and Proteopedia remain.

12-April-2016

Community edits

The peptidase summaries provided by invited experts in the field have been reinstated, having been erroneously omitted from release 10.0.

PDB entries

Over 8,500 cross-references to structures in the Protein Data Bank were added to the database. References are now included on each Structure page.

Release 10.0 14-March-2016

New sequences and MERNUMs

Since release 9.13 (July 2015) the number of sequences in the MEROPS database has doubled. HMMER searches against the UniProt protein sequence databases have led to the inclusion of over 500,000 additional homologues. This has forced us to add an additional character to the MERNUM protein sequence identifiers. So the MERNUM for human gastricsin is now MER0000894.
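For anyone holding older identifiers in their own files, the change can be sketched as below. This is only an illustration: it assumes the earlier format was "MER" followed by six digits and that the extra character is a leading zero, neither of which is stated above, and the function name widen_mernum is hypothetical.

```python
import re


def widen_mernum(mernum: str, width: int = 7) -> str:
    """Pad the numeric part of a MERNUM to `width` digits.

    Assumption: identifiers look like 'MER' plus digits, and the
    additional character is a leading zero.
    """
    match = re.fullmatch(r"MER(\d+)", mernum)
    if match is None:
        raise ValueError(f"not a MERNUM: {mernum!r}")
    digits = match.group(1)
    if len(digits) > width:
        raise ValueError(f"{mernum!r} has more than {width} digits")
    return "MER" + digits.zfill(width)
```

Under these assumptions an identifier that is already in the new form, such as MER0000894, is returned unchanged.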

Future releases

The process of gathering more and more homologues from more and more obscure organisms which are unlikely ever to be characterized is not only time consuming, but is unlikely to provide any major insights into the activity of peptidases and their inhibitors. Consequently, this will be the last release of MEROPS to offer a collection of all sequence homologues. In future, only sequences of characterized peptidases and inhibitors, homologues from major new genome sequencing projects (for an organism from an otherwise unrepresented phylum or class for example) and homologues for new families will be added to the data collection. Future releases will also be less regular: we will make a new release only when there is significant new data. We will try to make more minor releases, for example to add cleavage data, or add newly characterized peptidases.

Release 9.13 6-July-2015

Pathways

Links have been made to the KEGG database for biological pathways. The cross-links can be seen on any peptidase or inhibitor summary page. The section headed "Pathway" lists all the KEGG pathways in which the peptidase or inhibitor is involved, and by clicking the name of the pathway the user will be sent to the relevant pathway map in the KEGG database.

An index to all KEGG pathways in which peptidases or peptidase inhibitors are known to be involved has been provided as an item on the left hand green menu. On clicking a pathway name, a pathway page is displayed. The page shows the pathway source, which is clickable and takes the user to the external database page showing the pathway map, and two tables to show the pathway steps that involve peptidases and peptidase inhibitors. In each table there is a row for each step involving a peptidase or peptidase inhibitor. The columns in the peptidase table are the pathway step, the MEROPS identifier for the peptidase, the recommended name of the peptidase, how the peptidase is displayed on the external database page, the name of a substrate, the cleavage site if identified, the UniProt accession for the substrate (if it is a protein) and a reference that describes the cleavage. In the table for peptidase inhibitors, the columns are the pathway step, the MEROPS identifier for the inhibitor, the recommended name of the inhibitor, the recommended name of the peptidase inhibited, the MEROPS identifier for the peptidase inhibited, and a reference that describes the inhibition. The MEROPS identifiers in both tables are clickable and take the user to the relevant summary page.

We intend to extend this feature to other pathway databases in the future.

Alignments and trees

It has been apparent for some time that the full family and subfamily alignments and trees are in many cases unusable because there are so many sequences and the aligned sequences are too long. We have decided that for any family or subfamily with more than 200 sequences, only a representative set will be shown. The representative set includes all family or subfamily holotypes as well as an example from every organism phylum where a member species has a homologue that is predicted to be an active peptidase or peptidase inhibitor. The format of the alignments and trees remains the same. Because we have to generate full alignments and trees to assign MEROPS identifiers correctly, the full alignments and trees which are not being displayed on the website are made available to download from our FTP site.
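The selection rule can be sketched roughly as follows. This is not the code used to build the website; the Member record and its field names are hypothetical, and the sketch assumes that a holotype already counts as the example for its own phylum and that the first predicted-active homologue encountered is as good an example as any other.

```python
from dataclasses import dataclass

MAX_FULL_DISPLAY = 200  # threshold given in the text


@dataclass(frozen=True)
class Member:
    accession: str
    phylum: str
    is_holotype: bool
    predicted_active: bool


def representative_set(members):
    """Return the sequences to display for one family or subfamily."""
    members = list(members)
    if len(members) <= MAX_FULL_DISPLAY:
        return members  # small enough to show in full
    # Always keep every holotype.
    chosen = [m for m in members if m.is_holotype]
    covered = {m.phylum for m in chosen}  # assumption: holotypes cover their phylum
    # Add one predicted-active example for each phylum not yet covered.
    for m in members:
        if m.predicted_active and m.phylum not in covered:
            chosen.append(m)
            covered.add(m.phylum)
    return chosen
```

A phylum whose only homologues are predicted to be inactive contributes nothing to the representative set, which matches the wording above.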

Eukaryote genomes

All new eukaryote genomes (available on 8 May 2015 at NCBI) have been analysed for peptidases and peptidase inhibitors.

UniProt cross-references

The UniProt database has dropped many of the protein sequences from strain variants and this has required a complete rebuild of all MEROPS cross-references to UniProt. This affects the Sequences page for every peptidase and peptidase inhibitor.

Non-peptidase homologues

A new routine to calculate active site residues in genome-derived sequences has been introduced. This has enabled many sequences formerly considered to be fragments or non-peptidase homologues to be reclassified as peptidases.

Release 9.12 22-December-2014

Analysis of substrates

A new service to analyse protein substrate cleavages has been introduced. This service gives an indication of whether the cleavage site is conserved amongst orthologous sequences. The user can upload a tab-delimited file containing on each line the MEROPS identifier of the peptidase responsible for the cleavage, the UniProt accession of the substrate, and the residue number (from the UniProt entry) where cleavage occurs. Homologues of this substrate are downloaded from the UniRef50 database and aligned. The residues P4-P4' around the cleavage site are scored according to whether they are conserved with the known substrate, and whether a replacement amino acid is known in that particular binding pocket from all known substrates for the peptidase in question. A cleavage site containing many unacceptable replacements is unlikely to be derived from a physiological substrate of the peptidase, assuming that for a physiologically relevant substrate the cleavage site would be conserved. A poorly conserved cleavage site is either not physiologically relevant, or represents a pathological cleavage restricted to one or a few organisms. Results are returned by E-mail.
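A file in the layout described above could be prepared with a few lines of Python. The sketch below is only a convenience for users, not part of the service; the function name is hypothetical, and it assumes the three columns appear in the order given in the text with no header line.

```python
import csv


def write_cleavage_file(rows, handle):
    """Write one tab-delimited line per cleavage.

    Each row is (MEROPS identifier, UniProt accession, residue number).
    Assumption: no header line, columns in the order described in the text.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for merops_id, uniprot_acc, residue in rows:
        residue = int(residue)
        if residue < 1:
            raise ValueError("residue numbers are counted from 1")
        writer.writerow([merops_id, uniprot_acc, residue])
```

The handle can be any text file opened for writing, for example one opened with open("cleavages.tsv", "w", newline=""), where the file name is arbitrary.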

Comparison with Pfam

All the sequences of peptidase homologues have been subjected to an automated comparison with the domains as defined in the Pfam database (http://pfam.xfam.org). The consequences of this comparison have been: 1) to identify and remove any false positives that were erroneously filed because they matched domains other than the peptidase domain; 2) to further refine the extent of peptidase units so that sequence that is part of a domain other than the peptidase domain is excluded; 3) to update any sequence with one that has been modified since it was filed in MEROPS; and 4) to correct any miscalculations of active site residues and metal ligands. In total 16,319 sequence records in MEROPS have changed, and a considerable number of former non-peptidase homologues have been reclassified as peptidases.

Eukaryote genomes

All of the proteomes from completely sequenced eukaryote genomes have been re-analysed, because there have been many new builds of eukaryote genomes resulting in the refinement of numerous gene predictions.

Change to MEROPS identifiers index

We have decided that the EST analyses have become less useful with the developments in genome sequencing and visualization and localization techniques. The counts of EST sequences have been removed from the MEROPS identifiers index page for peptidases and replaced by counts of substrate cleavages.

New reference topic

A new reference topic for localization (or visualization) has been introduced to bibliographies.

Release 9.11 21-July-2014

MEROPS identifier assignment

The method for assigning a MEROPS identifier has been changed. Previously, identifiers were assigned based on the phylogenetic tree: an identifier was assigned to all sequences derived from the same node as a holotype. Problems with this approach include the time taken to generate all the alignments and trees, and the method was not applied to non-peptidase homologues and fragments because these were excluded from each alignment and tree. Instead, we are generating a sequence library for each family and using FastA to search this library with each holotype sequence from the same family. Each hit is sorted to the holotype with the lowest E value and given the same MEROPS identifier, provided the sequence identity is 40% or more. To remove fragments and sequences that are near duplicates, a program has been written to perform pairwise comparisons with FastA and merge the sequences if they are more than 94% identical, from the same species, not known to be products of different genes and not tandem duplicates in the same gene.
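The two rules can be sketched as follows. This is an illustration under stated assumptions rather than the program actually used: the Hit record and function names are hypothetical, the FastA search itself is not shown, and the sketch assumes the 40% threshold is tested on the hit to the lowest-E-value holotype rather than used to filter hits beforehand.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

MIN_IDENTITY = 40.0    # percent, from the text
MERGE_IDENTITY = 94.0  # percent, from the text


@dataclass(frozen=True)
class Hit:
    holotype_id: str  # MEROPS identifier of the holotype used as the query
    evalue: float     # E value reported for this library sequence
    identity: float   # percent sequence identity to the holotype


def assign_identifier(hits: Iterable[Hit]) -> Optional[str]:
    """Pick the identifier for one library sequence from its holotype hits."""
    hits = list(hits)
    if not hits:
        return None
    best = min(hits, key=lambda h: h.evalue)  # holotype with the lowest E value
    return best.holotype_id if best.identity >= MIN_IDENTITY else None


def should_merge(identity: float, same_species: bool,
                 known_different_genes: bool, tandem_duplicates: bool) -> bool:
    """Near-duplicate test: all four conditions from the text must hold."""
    return (identity > MERGE_IDENTITY and same_species
            and not known_different_genes and not tandem_duplicates)
```

A sequence for which assign_identifier returns None stays in the family without being assigned to any holotype's identifier.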

Links to the IUPHAR Guide To Pharmacology

Links have been made from peptidases and protein inhibitors in MEROPS to the IUPHAR Guide To Pharmacology. These can be found on the relevant Pharma pages.

New model organism: zebrafish

The zebrafish (Danio rerio) has been added to the list of model organisms in MEROPS. A new MEROPS identifier has been created for each peptidase that could not be assigned to an existing identifier.

Release 9.10 20-December-2013

Small molecule inhibitors expanded

The number of small molecule inhibitors included in the database has been increased: previously, only those with a full summary were included. The small molecule inhibitor pages have been modified so that they conform more to the style used in other pages. There are buttons across the top of each page for some or all of the following: Summary, Structure, References and Inhibits. The Structure page lists all the Protein Data Bank entries that include complexes with a peptidase for the inhibitor in question. Rows in the table are ordered by the peptidase inhibited. The columns in the table are identical to those on the structure pages for peptidases and protein inhibitors. The table in the Inhibits page lists the MEROPS identifier for the peptidase and its recommended name, the name of the inhibitor, the inhibition constant (Ki) for complex formation (if known), conditions under which inhibition occurs and the reference. The table can be re-ordered by clicking the column heading.

The corresponding Inhibitors page for a peptidase now includes all the relevant small molecule inhibitors. Where possible, the small molecule inhibitor name includes a link to the relevant small molecule inhibitor summary.

Release 9.9 23-August-2013

Move to EMBL-EBI

It had been hoped that this release of the MEROPS database would be the first joint release at the Wellcome Trust Sanger Institute and the EMBL-European Bioinformatics Institute (EBI), but getting all the features to run at the EBI has proved to be more difficult than expected, and it will be the next release that is made available at both institutes.

Community input

This release is the first to contain peptidase summaries provided by the peptidase community. All the contributors to the third edition of the Handbook of Proteolytic Enzymes were invited to contribute, and we encourage other peptidase experts to contact us should they wish to provide a succinct summary of the properties of their favourite peptidases. A good example of a summary can be found here.

Gene structures

A new display to present gene structures is now available at the peptidase level. The display shows the known exon and intron structure for a eukaryote gene. An exon is shown as a box, and is numbered. Introns are shown as the thick line between the exons. The phase of the intron is indicated above the intron, where phase 0 means the intron is inserted between codons, phase 1 between the first and second base of the triplet, and phase 2 between the second and third base of the triplet. All gene structures are taken from research articles where the structure was experimentally determined and are not taken from genome sequencing projects, where there may be problems with misidentification of exon-intron junctions, omission of exons and erroneous insertion of introns into coding sequence. The gene sequence displayed is from the initiation ATG to the stop codon, so introns within 5' and 3' untranslated regions are not shown. Alternatively spliced variants are shown where they have been experimentally proved to exist. Peptidase and protein inhibitor gene structures have been collected from the following eight model organisms: human, mouse, rat, Drosophila melanogaster, Caenorhabditis elegans, Arabidopsis thaliana, Saccharomyces cerevisiae and Schizosaccharomyces pombe. An example of the new display can be found here.
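The phase definition above amounts to taking the number of coding bases that precede the intron modulo three. A minimal sketch, assuming exon lengths are given for the coding portion only (from the initiation ATG to the stop codon, as in the display); the function name is hypothetical:

```python
from itertools import accumulate


def intron_phases(coding_exon_lengths):
    """Return the phase of each intron lying between consecutive coding exons.

    Phase 0: intron falls between codons.
    Phase 1: between the first and second base of a codon.
    Phase 2: between the second and third base of a codon.
    """
    # Running total of coding bases at the end of each exon; the last exon
    # is followed by the stop codon, not by an intron, so it is dropped.
    totals = list(accumulate(coding_exon_lengths))[:-1]
    return [total % 3 for total in totals]
```

For coding exons of 90, 100 and 50 bases, the first intron follows base 90 (phase 0) and the second follows base 190 (phase 1).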

Links to Europe PubMed Central

Links to Europe PubMed Central are now provided for all references in MEROPS, not just those where the full text of papers is made available.

A new strategy to detect homologues

It has become apparent that the methods used in the past to detect the homologues within a peptidase family had been failing to return all homologues. Partly this is because of the restrictions imposed on automated BlastP searches at the NCBI server, where only a maximum of 10,000 hits are returned per search, but mostly it is because of the number of genomes that have been and are being sequenced. Searching with just the subfamily type example is failing to return all the homologues that can be detected. A new approach has been developed, where the BlastP search with the subfamily type example is replaced by a HMMER search. For each family and/or subfamily, a "seed" alignment is generated using ClustalX from the protein sequence of the peptidase unit of an example from every phylum where there is a homologue, to maximize the phylogenetic distribution. Consequently, a type organism has been set up for each phylum, and no seed alignment will have more sequences than phyla. The seed alignment is submitted to the HMMER website and used to search the locally installed NCBI non-redundant protein sequence database. Results were analysed and the protein sequences for the hits were downloaded using NCBI's Entrez server. To ensure a full-length peptidase unit was returned and to calculate active site residues, each of these sequences was submitted to a BlastP search against the Merops_Scan database. To file these sequences two preprocessing steps were required: to file species new to MEROPS, and to file sequences identical to those already in MEROPS. A new script was written to prioritize filing of sequences, adding first sequences from species new to each peptidase family, and then adding new paralogues from species with homologues already known in the family.
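The choice of seed sequences can be pictured with the short sketch below. It covers only the selection step, not the ClustalX alignment or the HMMER search, and it rests on assumptions not stated above: that each homologue is described by an (accession, phylum, species) tuple, that the type organism for each phylum is available as a simple lookup, and that the first homologue found in the type organism is the one used. All names are hypothetical.

```python
def pick_seed_sequences(homologues, type_organism):
    """Choose at most one seed sequence per phylum.

    homologues:    iterable of (accession, phylum, species) tuples
    type_organism: dict mapping phylum -> species chosen as its type organism
    Returns a dict mapping phylum -> accession of the seed sequence.
    """
    seeds = {}
    for accession, phylum, species in homologues:
        if phylum in seeds:
            continue  # this phylum already has its example
        if type_organism.get(phylum) == species:
            seeds[phylum] = accession
    return seeds
```

Because the result is keyed by phylum, it can never contain more sequences than there are phyla, which is the property noted in the text.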

Handbook of Proteolytic Enzymes

The third edition of the Handbook of Proteolytic Enzymes, edited by Neil Rawlings and Guy Salvesen, has been published by Elsevier as a three-volume book and an electronic book.

First catalytic type with more than a hundred families

The first family to be assigned an identifier with three digits is the cysteine peptidase family C101, which includes the FAM105B isopeptidase (C101.001).

Release 9.8 17-December-2012

Move to EMBL-EBI

The MEROPS database will be moving from the Wellcome Trust Sanger Institute to the EMBL-European Bioinformatics Institute very soon. The MEROPS team is part of the group run by Alex Bateman, who took up the position of Head of Protein Sequence Resources at EMBL-EBI in November and took his team with him. Please note that in the near future the URL and E-mail addresses for the MEROPS database will change.

The first peptidase family with mixed catalytic types

The recent crystal structure of the precursor of the pantetheinyl hydrolase ThnT from Streptomyces cattleya (Buller et al., 2012) has shown that auto-activation exposes a threonine at the new N-terminus, occupying the same position as a serine in the homologous aminopeptidase DmpA from Ochrobactrum anthropi. This means that the nucleophile in peptidases in this family can be either threonine or serine. In all other known families of peptidases, the nucleophile is absolutely conserved. This means that the family cannot be named according to the convention used so far in MEROPS in which the first letter of the family name represents the nature of the nucleophile. This family has been named P1, which is the first in a new category of families with mixed nucleophiles.
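The naming convention can be expressed as a simple lookup on the first letter of a family name. In the sketch below the letters C, G, M, N and P follow usage on this page; A, S, T and U are taken from general MEROPS usage rather than from this page, and the function name is hypothetical.

```python
import re

# First letter of a family name -> catalytic type.
CATALYTIC_TYPE = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic",
    "M": "metallo",
    "N": "asparagine",
    "P": "mixed",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
}


def catalytic_type(name: str) -> str:
    """Return the catalytic type for a family name (e.g. 'P1') or a
    MEROPS identifier that starts with one (e.g. 'C101.001')."""
    match = re.match(r"([A-Z])\d+", name)
    if match is None or match.group(1) not in CATALYTIC_TYPE:
        raise ValueError(f"unrecognised family name: {name!r}")
    return CATALYTIC_TYPE[match.group(1)]
```

Clan names, which consist of two letters, are deliberately not handled by this sketch.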

More peptidases from model organisms

The number of model organisms has been increased to eleven with the addition of a Gram-positive bacterium (Bacillus subtilis), an archaean (Pyrococcus furiosus), a protozoan (Dictyostelium discoideum) and another yeast (Schizosaccharomyces pombe). A special MEROPS identifier has been created for each putative peptidase from each of these organisms.

Links to Europe PubMed Central

The literature pages for peptidases, inhibitors, families and clans now include links to Europe PubMed Central.

Release 9.7 1-August-2012

Community input

We invite users with specialist knowledge to help improve the peptidase summaries in MEROPS. We hope to be able to present a succinct but detailed summary for every peptidase. The writer will receive full acknowledgement on the peptidase summary page. Please contact us at (merops@sanger.ac.uk) if you would like to help, and you will be sent a username and password, and details of how to use the online editor.

Selection of strain on species page

It has become common practice to sequence the genomes of several different strains of bacteria. The list of strains with completely sequenced genomes can now be displayed on the species page. By selecting one of these strains, the species page filters the results for the selected strain only, and presents only those peptidases or inhibitors detected in that strain. Please be aware that the genome analysis at the foot of the page will display results for the selected strain, NOT the species.

Release 9.6 29-February-2012

Completely sequenced genomes

We consider a genome completely sequenced if an estimate of the number of protein coding genes can be obtained. For genomes from microbes, NCBI's genome database had provided this information. However, restructuring at NCBI has meant that these data are no longer being provided. Instead, for microbial proteomes that can be downloaded from NCBI's FTP site, we now calculate the number of protein coding genes. In this release of MEROPS, an additional 500 completely sequenced microbial genomes have been identified. It is much more difficult to count the number of protein coding genes from completely sequenced eukaryote genomes. Identifying all of the exons and introns is computationally challenging, and misassemblies of genes are frequent. Some genes are missed and others are concatenated. Proteomes include alternatively spliced forms of coding sequences and thus inflate the number of protein-coding genes. For these reasons, the number of eukaryote proteomes annotated as complete in MEROPS lags behind the number of completely sequenced eukaryote genomes.

New reference search

A new item has been added to the search menu. This allows a user to retrieve references by submitting a simple text search. A user can enter an author name, a term from a title, or a journal name. The retrieved list will display the full reference with, where available, links to PubMed, PubMed Central, the full text of the article and clan, family, peptidase or inhibitor summaries in MEROPS.

Release 9.5 4-July-2011

Drosophila melanogaster, Caenorhabditis elegans and Escherichia coli peptidases

We have added Drosophila melanogaster, Caenorhabditis elegans and Escherichia coli to our list of model organisms, and have created a MEROPS identifier for each peptidase.

New substrate cleavage status

In addition to "physiological", "non-physiological" and "synthetic", the new status "pathological" has been added to the substrate pages of peptidase summaries.

Second family of glutamate peptidases

A second family of glutamate peptidases has been discovered, that of the pre-neck appendage protein from Bacillus phage phi29, which is a self-cleaving protein. The family is now included in MEROPS as family G2.

Asparagine catalytic type

Self-cleaving proteins that utilize asparagine as a nucleophile perform a novel form of proteolysis that is not hydrolysis. The asparagine attacks its own carbonyl carbon atom, forming a succinimide ring with simultaneous cleavage of the peptide bond. This activity best fits the enzyme description of a lyase - "an enzyme cleaving a C-C, C-O or C-N bond by other means than hydrolysis or oxidation" (Enzyme Nomenclature, 1992). These self-cleaving proteins therefore belong to subclass EC 4.3, whereas peptidases belong to subclass EC 3.4. In MEROPS, the term "asparagine peptide lyase" will be used to describe these enzymes.

Release 9.4 31-January-2011

Inteins

The inclusion of asparagine peptidases in MEROPS has forced us to reconsider inteins, which had not been considered peptidases. These are self-splicing proteins which are structurally related to the hedgehog protein precursor C-terminal domain (C46). Proteins which contain an intein undergo two cleavage events to release the intein and a splicing event which joins the two portions of the extein. Neither cleavage involves hydrolysis. One of the two cleavages occurs at an asparagine residue and the mechanism is thought to be similar to that of asparaginyl peptidases. Although these proteins are at the limits of what can be considered peptidases, it is now sensible to include them in MEROPS. Three families of inteins have been assembled, N9, N10 and N11, with the majority of the sequences in N10. The structural relationship between these families and C46 means that the clan to which these families belong contains peptidases of mixed catalytic type, and so the existing clan has been renamed to PD, divided into the subclans PD(C) and PD(N) for families of cysteine and asparagine peptidases, respectively.

Combinatorial substrates

Marcin Poreba and Marcin Drag have been collecting peptidase specificities from combinatorial peptide screening studies in the published literature (Poreba & Drag (2010)) and have made this collection available to MEROPS. Where applicable, a peptidase summary now includes a table showing subsite preferences derived from such studies.

Substrates

This release includes the first batch of cleavage sites in protein substrates that are being collected for MEROPS by Molecular Connections, Bangalore, India.

Searches of the MEROPS reference collection

The MEROPS literature collection includes over 42,000 references to articles, book chapters and books on peptidases and their inhibitors, to which MEROPS identifiers are assigned. We will be implementing methods to search this collection, and the first is now included in the Searches page; this is to search by PubMed identifier. Nearly 40,000 of the articles in our collection are included in PubMed. By entering a PubMed identifier, a table showing the full reference, with access to an on-line version (if available) and a list of MEROPS identifiers is returned. The identifiers are linked to the peptidase or family summary.

Release 9.3 07-September-2010

A new catalytic type

Over the past few months it has become apparent that there are several families of proteins that cleave themselves, that cleavage occurs at an asparaginyl bond, and that the side-chain amino group of the asparagine is acting as a nucleophile. All of these cleavages happen in cis. At present seven families have been recognized, including two that had previously been included in MEROPS as aspartic-type endopeptidases. The peptidases included in the families are viral coat proteins and bacterial autotransporters. All of these asparagine-type peptidase family names start with the letter 'N'.

Rotating Richardson diagrams

A structure page at the peptidase level now shows a rotating Richardson image in addition to the static image. There is also an option to display a surface on top of the Richardson cartoon. By holding down the left-hand mouse button, the image can be manually rotated. By clicking the right-hand mouse button, the user gains access to the full range of the Astex viewer commands, allowing further manipulation of the image.

Changes to displays of cleavages in a protein

The display of cleavages in a protein now shows the known cleavages in a table as well as showing the susceptible bonds in the sequence. The rows in the table are arranged by cleavage position in the substrate. Each row shows residue number, the name of the peptidase performing the cleavage (which links to the substrates page for that peptidase), the residue range of the protein portion used in the experiment (for example, the mature protein), whether the cleavage is believed to be physiological or not, how the cleavage position was identified, and a reference.

Release 9.2 30-April-2010

Changes to Small Molecule Inhibitor displays

A unique identifier for each small molecule inhibitor (SMI) has been introduced. The identifier is the letter J followed by a five-digit number. These are displayed on the SMI summaries, the SMI index and the inhibitors pages in the peptidase summaries. Cross-references to the University of Alberta's DrugBank database are now included on the SMI summary pages. Interactions between peptidases and SMIs are now included on the inhibitors pages. This includes not only SMIs for which we have summaries and identifiers, but also other SMIs.

Sequence features

A new page has been added to each peptidase and inhibitor summary to display our predictions of active site residues (peptidases only), metal ligands (metallopeptidases only), extent of the peptidase or inhibitor unit, sequence length, and the source of the sequence used in MEROPS with a link to the relevant primary sequence database.

Changes to substrate displays

It is now possible to filter the list of substrates cleaved by a peptidase to limit the display to protein and peptide substrates of physiological relevance, protein and peptide substrates that are not physiologically relevant, or synthetic substrates.

Links to Wikipedia entries

There are now links from peptidase and inhibitor summaries to Wikipedia entries. There is considerable extra data on a peptidase or an inhibitor in Wikipedia because of the effort Wikipedia users have put into the Molecular and Cellular Biology WikiProject. We at MEROPS encourage our users to register at Wikipedia and to contribute to existing pages and create new pages for peptidases and their inhibitors.

Release 9.1 26-January-2010

New data release

This is the second half of the release that began in December 2009 when all the software was updated. In this release the data have been updated. In future, software and data will be updated simultaneously.

Links to ChEMBL

Links to the ChEMBL database have been added. These are links to drug targets in ChEMBL and can be found on the pharma pages at the peptidase level.

Problems with the new codebase

We apologize that the software update in December did not go smoothly, and several aspects of the MEROPS website were either not working, not working properly or missing. Most of these problems have been fixed and we would like to thank the users who pointed out problems of which we were unaware. There are still issues to be resolved with the MEROPS batch Blast and the dynamic alignments and trees.

Release 9.0 15-December-2009

New codebase

Release 9 of MEROPS will be in two parts. The reason for this is that the software that produces the web pages has been re-written by Matthew Waller and Jody Clements from the Wellcome Trust Sanger Institute web team. In January 2010 the new data will be published. The release has been done in this way so that any programming bugs can be detected and reliably separated from data errors. The FTP site will also be updated in January.

The new software has been designed to resemble the old website, and the changes have mainly been for our benefit, to simplify the maintenance and make it easier to add new features. Users will notice that frames have disappeared, and that the left-hand green menu now scrolls with the rest of the page. Links to external databases now open in the same window so you will have to use the back button to return to MEROPS. References and sequences are now displayed in small windows in the middle of the screen.

Feedback and reporting errors

The MEROPS website now has a ticketing system for reporting errors and making comments. Each page now has a feedback link in the footer. The user will be asked to enter his or her name and an E-mail address when the comment is posted. The user will receive an automated E-mail which should not be replied to. A member of the MEROPS team will then contact the user and when the problem has been fixed the ticket will be closed. The user will receive a second automated E-mail. This should only be replied to if the issue has not been resolved to the user's satisfaction: the ticket will then be automatically re-opened. Please use this system to report programming errors, broken links and any errors or omissions in the data.

Links to PubMed Central from Literature pages

There are now links from the clan, family, peptidase and inhibitor literature pages to the full text of papers stored in PubMed Central.

Release 8.5 21-August-2009

Domain images and architecture pages

We have redesigned the domain images which appear on the peptidase summaries. Each image is scaled according to the sequence length, shown as a blue line. The peptidase unit is shown as a green box, with the active site residues and metal ligands (if any) shown as red and blue "lollipops", respectively, along the bottom edge of the box. The top edge shows disulfide bridges, and known carbohydrate binding sites (as orange lollipops). An inhibitor unit is shown as a large, grey box, with reactive site residues shown on the bottom edge as red lollipops. Other domains that have been annotated by SwissProt or Pfam are shown as smaller boxes. Domains derived from Pfam are shown as red boxes and links to the Pfam database can be accessed by clicking on the domain. Signal peptides and transmembrane domains are shown as small, black boxes. Propeptides are shown as small, grey boxes. Mouse-over text gives details for each feature displayed.

Because these simpler domain images are quicker to generate, we now include at the family level a page showing the different protein architectures known in the family or subfamily, ordered by MEROPS identifier.

Comparisons of peptidase specificity

The MEROPS collection of substrate cleavages now exceeds 38,500. There are over three hundred peptidases for which ten or more substrates are known. In addition to the displays on a peptidase summary, MEROPS now includes displays to compare preferences in binding pockets S4 to S4'. These are items on the substrate index and show preference in terms of all amino acids, amino acid properties and individual amino acids. The first of these shows, for each peptidase, an amino acid if it occurs in the same binding pocket in 40% or more of the substrates, so no more than two amino acids are shown for any one binding pocket. The amino acid is shown with a green background, and the brighter the green the greater the percentage of substrates with the amino acid in that binding pocket. The second display is similar but instead of showing individual amino acids, these are collected into "aliphatic", "aromatic", "acidic", "basic" or "small" groups. In the third option the user is prompted to select an amino acid from a pull-down menu and the display shows the number of substrates with the selected amino acid in each binding pocket for each peptidase. Where an amino acid has not been observed in a binding pocket, this is highlighted in black. In all three displays where no amino acid is possible (for example P4, P3 and P2 for an aminopeptidase, or P2', P3' or P4' for a carboxypeptidase) the binding pocket is highlighted in grey.
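
The 40% rule used by the first display can be sketched in a few lines. This is only an illustration, not the MEROPS implementation: the function name, the eight-character site strings (one residue per pocket, S4 to S4') and the example sequences are all invented for the purpose.

```python
from collections import Counter

# Pocket labels in order; each cleavage site below is a hypothetical
# eight-residue string covering these pockets from left to right.
POCKETS = ["S4", "S3", "S2", "S1", "S1'", "S2'", "S3'", "S4'"]


def preferred_residues(cleavage_sites, threshold=0.40):
    """Return, per pocket, the residues seen in at least `threshold` of the sites."""
    total = len(cleavage_sites)
    result = {}
    for index, pocket in enumerate(POCKETS):
        counts = Counter(site[index] for site in cleavage_sites)
        result[pocket] = sorted(
            residue
            for residue, n in counts.items()
            if residue != "-" and n / total >= threshold  # "-" marks an empty pocket
        )
    return result


# Invented cleavage sites for a single imaginary peptidase.
sites = ["AAPRSLLV", "GAPRGLIV", "TLPKSVLE", "AVPRALAQ", "GGPRSFLD"]
prefs = preferred_residues(sites)
for pocket in POCKETS:
    print(pocket, prefs[pocket])
```

Because each listed residue must account for at least 40% of the substrates, a pocket can never list more than two residues, which matches the limit described above.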

Substrate alignments

If known, the substrate alignments now show protein secondary structure at the foot of the alignment. A helix is shown as a string of "a's" and is highlighted in red, a beta strand is shown as a string of "b's" and is highlighted in green.

MEROPS identifiers for another model organism

We recently expanded MEROPS identifiers to Arabidopsis thaliana, as well as human, mouse and rat, so that every gene product that is likely to be a peptidase has a unique identifier. We have now added identifiers for all probable peptidases in Saccharomyces cerevisiae. Identifiers for peptidases for this organism have the first character after the dot replaced by the letter A. When a homologue is characterized biochemically, we will replace the identifier with one in the standard format (three digits after the dot).

Richardson diagrams

The number of Richardson diagrams showing cartoons of structures has substantially increased, thanks to the hard work of Matthew Jenner, who has been working with us this summer. There is now a Richardson diagram for every peptidase or inhibitor for which a tertiary structure has been solved.

Predicted sequences from the chimpanzee genome

Summer student Matthew Jenner has also been predicting protein sequences from the chimpanzee genome. Protein sequences from eukaryote genomes are collected from the Ensembl database. Although Ensembl has a sophisticated, automated pipeline for predicting protein sequences, some predictions require a further manual stage. These are predictions where exons are missed, introns are mistranslated as exons, or genes are run together. Predicted protein sequences derived from orthologous genes which show the greatest difference between human and chimpanzee have been recalculated using the GeneWise software, with the human sequence as a template and the nucleotide sequence found in the chimpanzee genome by using the Ensembl Blast search service.

Release 8.4 3-April-2009

Substrate indexes

Two indexes for peptidase substrates have been added. These are accessible from the left-hand green menu. The first index shows the count of known substrate cleavages per peptidase, ordered by and linked to the MEROPS identifier. Three counts are shown: the total number in the MEROPS collection, the total number of physiological substrates (peptides and proteins) and the total number of non-physiological substrates (peptides and synthetic substrates). The second index is ordered by substrate name and shows the MEROPS identifier of the peptidases known to cleave each substrate and the total number of cleavages performed by each peptidase. For substrates that can be mapped to the UniProt protein sequence database the UniProt identifier is shown with a link to the MEROPS utility which shows all cleavages of this protein in the MEROPS collection.

Substrate pages

We have introduced "flags" on the substrate pages to indicate the method used to identify the cleavage position. The flags are as follows: NT shows that the cleavage position was determined by N-Terminal sequencing, MS shows that the peptide composition was determined by mass-spectroscopy (MS) and the cleavage position computed, MU shows that the cleavage position was determined by site-directed MUtagenesis, CS indicates that the cleavage position was postulated from a consensus motif (CS) within the protein sequence.

Release of signal and transit peptides and initiating methionine

Cleavages that result in the release of signal and transit peptides and the initiating methionine have been automatically collected from the annotations in the SwissProt protein sequence database. However, cleavages were not previously assigned to a specific MEROPS identifier. This has now changed, and assignments made where possible. Such assignments can only be made for organisms where the genome has been completely sequenced and only one homologue is known from the family in question. Data for the cleavage position is usually determined by N-terminal sequencing, but readers should be aware that at least for chloroplast transit peptides, aminopeptidases are also transported into the chloroplast which may remove amino acids from proteins subsequent to the removal of the transit peptides.

Pharma pages

For peptidases that are drug targets we are now collecting links to databases of interest to the pharmaceutical industry. These are collected together on the new Pharma pages accessible from the peptidase summary. We are currently making links to the PubChem BioAssay database and the Binding Database.

Literature flags

Several new flags have been added to the Literature pages. The full list of flags is:

  • A Assay method,
  • E recombinant Expression,
  • I design of small-molecule Inhibitors,
  • K gene Knockout or other artificial genetic manipulation,
  • M natural Mutation, allelic variant or polymorphism,
  • P Substrate specificity,
  • R RNA splice variation,
  • S three-dimensional Structure,
  • T proposed as a therapeutic Target,
  • U suggested to have therapeutic potential itself,
  • V Review.

    Structure pages

    Structure pages now include a link to Proteopedia.

    MEROPS Blog

    In keeping with many other publicly available databases MEROPS now has a blog. This can be found at http://meropsdb.wordpress.com and is where information about how the database is assembled and its developments will be posted. Users are encouraged to visit this page where they can comment on items posted.

    Release 8.3 21-December-2008

    Genome analysis

    The organism pages now include an analysis of homologues if the genome has been completely sequenced. This analysis is done family by family, indicating an unusual presence (where the family is absent in 90% of closest relatives), an unusual absence (where the family is present in 90% of closest relatives), or where the number of family members is the highest or lowest among the organism's closest relatives. Closest relatives are identified by walking up the taxonomic tree until the number of organisms with completely sequenced genomes is five or more.
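
The walk up the taxonomic tree and the 90% test can be sketched as follows. This is a rough illustration under stated assumptions rather than the MEROPS code: the function names, the data layout (a lineage list plus a taxon-to-organisms mapping), the exclusion of the organism itself from its own relatives, and reading "90%" as "90% or more" are all choices made here.

```python
def closest_relatives(organism, lineage, sequenced_in_taxon, minimum=5):
    """Walk from the most specific taxon upwards until enough relatives are found.

    `lineage` lists taxa from most specific to most general; `sequenced_in_taxon`
    maps each taxon to the organisms with completely sequenced genomes in it.
    """
    for taxon in lineage:
        relatives = sequenced_in_taxon.get(taxon, set()) - {organism}
        if len(relatives) >= minimum:
            return taxon, relatives
    return None, set()


def classify_family(present_in_organism, relatives_with_family, relatives, cutoff=0.90):
    """Flag a family as unusually present or absent relative to close relatives."""
    having = len(relatives_with_family & relatives)
    lacking = len(relatives) - having
    if present_in_organism and lacking / len(relatives) >= cutoff:
        return "unusual presence"
    if not present_in_organism and having / len(relatives) >= cutoff:
        return "unusual absence"
    return "typical"


# Invented taxa and organisms, used only to exercise the functions.
lineage = ["genus", "family", "order"]
sequenced_in_taxon = {
    "genus": {"org1", "org2"},
    "family": {"org1", "org2", "org3", "org4"},
    "order": {"org1", "org2", "org3", "org4", "org5", "org6", "org7"},
}
taxon, relatives = closest_relatives("org1", lineage, sequenced_in_taxon)
print(taxon, sorted(relatives))
print(classify_family(True, set(), relatives))
```

In this toy data the genus and family levels contain too few sequenced relatives, so the comparison set is taken at the order level.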

    Expansion of MEROPS identifiers for model organisms

    Every human or mouse peptidase homologue has a unique MEROPS identifier, and we have recently begun expanding identifiers for other model organisms with completely sequenced genomes. The first is Arabidopsis thaliana and identifiers for peptidases for this organism have the first character after the dot replaced by the letter A. When a homologue is characterized biochemically, we will replace the identifier with one in the standard format (three digits after the dot).

    Database cross-references

    A new item has been added to the Searches menu. The MEROPS database includes many cross-references to other databases and bioinformatics resources. To make it easier for others to map their database entries to MEROPS there is a new CGI that presents the cross-references between MEROPS and any database selected from a pull-down menu. There are a considerable number of cross-references between MEROPS and primary sequence databases, so these are returned in batches of 50,000.

    Modifications to distribution trees

    The displays of peptidase or inhibitor distribution among organisms have been enhanced. There is now mouse-over text at every node which gives the name of the taxon.

    Modifications to protein substrate cleavage display

    The display showing known cleavages in a selected protein substrate depended on the user knowing from which species the substrate was derived. Now if no cleavages are known for the selected protein but are known for the same protein from a different species, there is an option to display the sequence alignment with cleavages highlighted.

    New peptidases and inhibitors

    The full list of identifiers that appear for the first time in the present release of MEROPS can be found here.

    Release 8.2 4-August-2008

    Alignments of protein variants

    The sequence page of the peptidase (or inhibitor) summary now includes an ALIGN VARIANTS button. Many peptidases and inhibitors are sequenced many times and variants exist, either strain-specific or the result of alternative initiation, alternative splicing of exons, allelic variation or single nucleotide polymorphisms (SNPs). Clicking on the ALIGN VARIANTS button will generate a dynamic alignment of all the variants we have collected from the primary sequence databases. Residues that differ from the sequence we have selected for inclusion in our protein sequence collection are highlighted as white text on a black background.

    Gene name index

    A new index of gene names has been added to the main index page (the left hand menu). You can now search for any peptidase or protein inhibitor homologue knowing the name of its gene or its gene locus.

    Protein substrate annotation

    All protein substrates of peptidases are mapped to sequences in the UniProt protein sequence database. This database contains translations of full coding sequences, including the initiating methionine, signal and other targeting peptides. However, the substrates as used by researchers are usually mature proteins and peptides. The substrates page for each peptidase summary now includes an extra column in the table to show the residue range of the protein or peptide used in each respective study. The display of protein substrate alignments also shows the residue ranges graphically in the header lines that include the scissile bond symbols in the form <--+-+--->. A question mark instead of an angled bracket indicates that the terminus has not been determined.

    Peptidase-inhibitor interactions

    MEROPS identifiers have been added to the tables of peptidase-inhibitor interactions, and it is now possible to order the tables according to the identifier or the protein name.

    Chromosome locations added to Organism pages

    For eukaryotes with completely sequenced genomes, the chromosomal location (in megabases) of the peptidase or protein inhibitor homologue gene is now shown on the organism page. These locations are derived from the EnSEMBL database by searching for entries with a cross-reference to the UniProt protein sequence database; a location will therefore not be shown for a gene from any genome where the copy number is low. However, the locations for all homologues from human and mouse should be shown. For human and mouse these locations are also shown in the Genetics table of the peptidase or protein inhibitor summary. Here the locations are linked to the contig view in EnSEMBL, which shows the exon and intron structure of the gene. The name of the chromosome (or genomic scaffold) precedes the location and the strand is indicated by a plus or minus sign in parentheses after the location. Users should be aware that EnSEMBL is automatically generated and is not a curated database.

    New peptidases and inhibitors

    The full list of identifiers that appear for the first time in the present release of MEROPS can be found here.

    Release 8.01 05-May-2008

    Limited alignments

    We have been aware that as more data are collected some of our alignments are becoming very large. Not only will there be hundreds (even thousands) of sequences, but a consequence of aligning so many diverse sequences is that more gap characters are inserted and the alignments get wider. These are difficult to view on a computer screen, and on scrolling the screen the residue numbers or sequence identifiers disappear off screen. To help to alleviate these problems, we have made our dendrograms ("trees") more interactive. The nodes of the tree are now active links and on clicking on the node an alignment of all the sequences derived from that node will be displayed. This alignment also includes the family type example and the sequence numbering derived from the type example sequence. The alignment displayed is not dynamic, but is derived from the full alignment by removing any insert characters common to all the sequences. In order to make this happen, we are now including the aligned peptidase or inhibitor unit sequences and the dendrograms (in New Hampshire format) in the MySQL database. Users who download the MySQL database from our FTP site should be aware that two new tables, aligned_sequence and tree, have been added.
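
Deriving a limited alignment from the full one amounts to dropping every column that holds only insert characters in the selected sequences. The sketch below assumes "-" as the insert character and uses invented sequence names and residues; it is not taken from the MEROPS code.

```python
def strip_common_gaps(aligned, gap="-"):
    """Remove alignment columns in which every selected sequence has a gap."""
    names = list(aligned)
    rows = [aligned[name] for name in names]
    # Keep a column if at least one sequence has a residue there.
    keep = [
        index
        for index, column in enumerate(zip(*rows))
        if any(character != gap for character in column)
    ]
    return {
        name: "".join(row[index] for index in keep)
        for name, row in zip(names, rows)
    }


# A toy full alignment; "type_example" stands in for the family type example.
full = {
    "type_example": "MK--LV-A",
    "seq1": "MR--LI-A",
    "seq2": "MKG-LVSA",
    "seq3": "-K--LV-A",
}

# Sequences under the clicked node (plus the type example), then the trimmed view.
subset = {name: full[name] for name in ("type_example", "seq1", "seq3")}
limited = strip_common_gaps(subset)
for name, row in limited.items():
    print(f"{name:<13}{row}")
```

Columns that were only needed to accommodate seq2 disappear once seq2 is left out, which is why the limited alignments are narrower than the full one.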

    New peptidases and inhibitors

    The full list of identifiers that appear for the first time in the present release of MEROPS can be found here.

    Release 8.00A 14-Feb-2008

    More regular updates

    This is the first of what are intended to be monthly data updates to MEROPS. An update is not a full release, so there are no new features and only a handful of alignments and trees will have changed from the last release. All the sequence data have been updated, however, and this update includes the analysis of several new prokaryote genomes as well as several new families and family summaries.

    New peptidases and inhibitors

    The full list of identifiers that appear for the first time in the present release of MEROPS can be found here.

    Release 8.00 8-Jan-2008

    A major change to MERNUMs

    Every sequence in the MEROPS database has been assigned a unique accession which we call the MERNUM. The MERNUM consists of the letters 'MER' followed by a number. When we set up the system we thought that a five-digit number would be sufficient, but with sequencing becoming easier to do, during our most recent collection of new peptidase and protein inhibitor homologues we collected our 100,000th sequence. So the number in a MERNUM now has six digits. Users who have individual sequences bookmarked as favourites will need to refresh the links in their Web browser.
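
For anyone holding lists of old accessions, a conversion might look like the sketch below. It assumes that a five-digit MERNUM is widened simply by adding a leading zero; that padding rule, the function name and the example accessions are assumptions and are not stated above.

```python
import re

OLD_MERNUM = re.compile(r"MER(\d{5})")  # 'MER' plus five digits
NEW_MERNUM = re.compile(r"MER(\d{6})")  # 'MER' plus six digits


def widen_mernum(mernum):
    """Return the six-digit form of a MERNUM (assumes zero-padding on the left)."""
    if NEW_MERNUM.fullmatch(mernum):
        return mernum
    match = OLD_MERNUM.fullmatch(mernum)
    if match is None:
        raise ValueError(f"not a recognisable MERNUM: {mernum!r}")
    return f"MER{int(match.group(1)):06d}"


# Made-up accessions, shown only to illustrate the format change.
print(widen_mernum("MER00123"))
print(widen_mernum("MER100000"))
```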

    MEROPS DAS server

    A distributed annotation system (DAS) server has been set up for MEROPS. This allows others to extract data directly from the MEROPS MySQL database for inclusion in their own Internet service. The user enters an accession as a parameter on the URL (usually this will be a UniProt accession, but an EMBL/GenBank ProtID will work for MEROPS) and data relating to the sequence stored in our collection will be returned. For a peptidase or protein inhibitor, this will include the MEROPS identifier, family and clan, the extent of the peptidase or inhibitor unit, active site residues (and metal ligands for metallopeptidases), the amino acid sequence and a link to a page in MEROPS for each feature. For a protein substrate, positions of known cleavages and the MEROPS identifiers of the peptidases responsible are returned. Example URLs are:

    http://das.sanger.ac.uk/das/merops/features?segment=P07858 (features for human cathepsin B)

    http://das.sanger.ac.uk/das/merops/sequence?segment=P07858 (sequence for human cathepsin B)

    http://das.sanger.ac.uk/das/merops/features?segment=P05067 (known cleavages for human amyloid beta A4 protein precursor)
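
The example URLs all follow one pattern, which a client could assemble as in the sketch below. The helper name is hypothetical, only the two commands shown above (features and sequence) are allowed, and the code builds the URL without contacting the server.

```python
from urllib.parse import urlencode

# Base address taken from the example URLs above.
DAS_BASE = "http://das.sanger.ac.uk/das/merops"


def das_url(command, accession):
    """Build a DAS request URL for the given command and sequence accession."""
    if command not in ("features", "sequence"):
        raise ValueError(f"unsupported DAS command: {command!r}")
    return f"{DAS_BASE}/{command}?{urlencode({'segment': accession})}"


# Reproduces the first and third example URLs.
print(das_url("features", "P07858"))
print(das_url("features", "P05067"))
```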

    New specificity display

    The summary page for any peptidase with well-characterised specificity now contains an additional display to the logos, with the advantage that amino acids not known to occur within a cleavage site are now shown. This simpler tabular display shows how frequently an amino acid occurs in each substrate binding pocket. There is a column for each substrate residue P4 to P4', and a row for each amino acid (in alphabetical order of the single letter code). If only one amino acid occurs in any position, then the background of the table cell is shown in red. If an amino acid occurs in 75% of all substrates then the cell background is orange. If an amino acid occurs in less than 25% of all substrates, then the cell background is white. If an amino acid is not known to occur, then the cell is shown with a black background.
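
The colouring rules can be written as a small decision function. This is a sketch of the rules exactly as stated, with two assumptions flagged in the comments: "75%" is read as "75% or more", and frequencies from 25% up to 75% are returned as "unspecified" because no colour is given for them above. The function name is invented.

```python
def cell_background(count, total):
    """Pick a background colour for one amino acid in one position.

    `count` is the number of substrates with this amino acid in the position,
    `total` is the number of substrates considered.
    """
    if count == 0:
        return "black"  # amino acid not known to occur here
    if count == total:
        return "red"  # the only amino acid seen in this position
    fraction = count / total
    if fraction >= 0.75:  # assumption: "75%" means 75% or more
        return "orange"
    if fraction < 0.25:
        return "white"
    return "unspecified"  # assumption: the text gives no colour for this range


for count in (0, 3, 10, 16, 20):
    print(count, cell_background(count, 20))
```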

    Submission pages

    Facilities are being set up for our users to contribute to annotation in MEROPS. We are very grateful to all our users who have provided information, pointed out errors, or made helpful suggestions, and intend a series of forms to make user contribution easier. The left-hand green menu now includes a "Submissions" button. At present there are only two submission items, both for advising us of any known protein cleavage sites that we are unaware of. The first is a form for the submission of a single cleavage, the second allows the user to upload a file of known cleavage sites. The latter has been designed with proteomics experiments in mind. The information requested will allow us to map the cleavage to an entry in the UniProt database. We look forward to receiving your submissions.

    New peptidases and inhibitors

    The full list of identifiers that appear for the first time in the present release of MEROPS can be found here.

    Release 7.90 17-Sep-2007

    A dynamic alignment of protein substrate sequences to show conservation around a user-selected cleavage site

    An additional option has been added to the "What are the known cleavage sites in this protein?" page. On this page the user is invited to enter the UniProt sequence database accession of any protein and the display shows known cleavages and the peptidases that perform them. The new option is to make a dynamic alignment of close homologues of the chosen protein substrate sequence. The sequences used for this alignment are taken from the relevant UniRef50 database entry, which lists all sequences from the UniProt and UniParc databases that share 50% sequence identity, and the alignment is generated by MUSCLE. The substrate in which the cleavage is known to occur is highlighted with a green background. Peptidases known to cleave the selected protein substrate are listed above the aligned sequences, and known cleavage positions are marked by a scissile bond symbol above the P1 positions in the substrate. Clicking on one of these symbols will highlight residues P4-P4' in the aligned sequences. Residues identical to those in the known substrate are shown with a pink background; replacements that are known to occur in the same position in other substrates for the selected peptidase are shown with an orange background. Replacements that are unknown in any substrate in the same position for the selected peptidase are shown as white text on a black background. This facility enables the user to assess the evolutionary conservation, and therefore probably the physiological relevance, of the known cleavage.

    Label/key files for sequence alignments and trees

    Label/key files for sequence alignments and trees have been enriched by:
    a) Inclusion of protein architecture subheadings that make use of the architecture "strings" taken from the Pfam database
    b) Inclusion of a link from each organism name to the organism card. This allows the user to find out what sort of organism it is, and what other peptidases or inhibitors it is known to express.

    Release 7.80 23-Apr-2007

    Release 7.70 22-Jan-2007

    Specificity logos

    The summary page for any peptidase with well-characterised specificity now contains a 'logo' that is a diagrammatic representation of the specificity preference in each of the subsites P4 - P4'. To generate the logos, sequences around cleavage sites (ten or more) are aligned, and a hidden Markov model is generated. This is converted to a logo by use of the WebLogo package (Crooks et al., 2004). This feature owes much to the skills of our two rotation students, Jun Kong (jk4@sanger.ac.uk) and Matias Piipari (mp4@sanger.ac.uk).

    Comparative genomics of prokaryote strains

    For many species of bacteria and archaea, genome sequences are available for multiple strains. It can be of great interest to know how the peptidases and inhibitors in the strains compare, for example when some strains are pathogenic and others not. The Comparative Genomics section at the foot of the Searches page now contains the option 'What are the common peptidases in different strains of bacteria or archaea?'. This leads to a page on which a species of bacterium or archaean can be selected to show a side-by-side comparison of the peptidases in the various strains. For each MEROPS identifier, an alignment of the sequences from the different strains is available. An example can be seen here.

    Alignments show holotype sequences

    Now that numbers of known sequences are becoming so large, full alignments of the homologous sequences in a family, which may amount to hundreds, can become confusing. As a response to this, MEROPS now provides an alignment of just the holotypes in each family and subfamily. It may be remembered that a holotype is the single 'type' representative of its MEROPS identifier.

    Peptidase-inhibitor interactions

    The new Inhibitors button above some peptidase summary pages gives access to data on peptidase/inhibitor interactions. For a peptidase, the user is presented with a list of at least some of the known protein inhibitors, with Ki, conditions and a reference where known. The table can be sorted according to any of the column headings. For an inhibitor, the user is presented with an alphabetical list of peptidases inhibited, again with supplementary data.

    Better keys to sequences

    The 'Key to sequences' file associated with each alignment and tree now has a format in which each line contains a link to the sequence in MEROPS where there might previously have been a UniProt accession number.

    Release 7.60 23-Oct-2006

    Changes to clan CA, and a new clan CN

    Prior to Release 7.6, MEROPS included many families of viral cysteine peptidases in clan CA with little supporting evidence. These were families for which crystal structures were not available, but catalytic residues were known to be cysteine and histidine, occurring in this order in the sequence as they do in papain. The correctness of this policy has now been brought into question by the description of the structure of the nsP2 peptidase of Venezuelan equine encephalitis alphavirus (peptidase C09.002). The nsP2 peptidase has a structure significantly different from those of other known cysteine peptidases, and has caused MEROPS to remove family C9 from clan CA and place it in a new clan, CN. At the same time, several other families of viral cysteine peptidases were removed from clan CA pending further evidence.

    Genome statistics show strains

    The pages "Peptidases in Whole Genome Sequences" and "Inhibitors in Whole Genome Sequences", reached from the Genomes entry on the menu bar, now show data by strain as well as species.

    Display of all known cleavage sites in a given protein

    The Substrates page that can be reached from each peptidase summary has now been enhanced. Clicking on the UniProt accession of a protein substrate opens a window in which the complete sequence of the protein is shown together with all the cleavage sites known to MEROPS. Mouse-over text for each cleavage site gives further information about it.

    Work continues on small-molecule inhibitors

    Summary pages for important small-molecule inhibitors (SMI) of peptidases were introduced into MEROPS in Release 7.4, and with the current release the number of SMI included is increased to 158.

    Release 7.50 17-Jul-2006

    Comparison of genomes

    We at MEROPS find that we often want to ask a question like "What peptidases are in the mouse genome but not in the human?". When the sequencing and analysis of the two genomes have been completed, it should be a simple matter to make this sort of comparison, and we now include a new "Searches" page to allow the user to do this. Please open the Searches menu from the left-hand green menu and click the option "Comparative Genomics" at the bottom. Select two species from the drop-down menus at the top, and choose to compare them at the level of Family or MEROPS Identifier. Full details of exactly how the comparisons are made are to be found in the help text.
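
At heart such a comparison is a set difference, as the sketch below illustrates. The function name and the identifiers are placeholders that only have the shape of MEROPS identifiers and say nothing about the real mouse or human genomes. Taking the part before the dot as the family is also a simplification made here; the help text describes how the comparisons are actually made.

```python
def compare_genomes(first, second, level="identifier"):
    """Compare two sets of MEROPS identifiers at identifier or family level."""
    if level == "family":
        # Simplification: treat the part before the dot as the family.
        first = {identifier.split(".")[0] for identifier in first}
        second = {identifier.split(".")[0] for identifier in second}
    return {
        "only_first": sorted(first - second),
        "only_second": sorted(second - first),
        "shared": sorted(first & second),
    }


# Placeholder identifier sets for two imaginary genomes.
genome_a = {"S01.001", "S01.002", "C01.010", "M10.005"}
genome_b = {"S01.001", "C01.010", "A01.003"}

by_identifier = compare_genomes(genome_a, genome_b)
by_family = compare_genomes(genome_a, genome_b, level="family")
print(by_identifier["only_first"])
print(by_family["only_first"])
```

Comparing at family level hides differences between members of a shared family, so the two levels can give quite different answers for the same pair of genomes.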

    Merging of subfamilies in family S1

    We have re-organized the subfamilies within family S1. As more data became available, and alignment methods improved, intermediate sequences were found, and it became clear that several of the former subfamilies should merge. Family S1 is now divided into just two subfamilies, S1A and S1B. The former subfamilies S1D (which included lysyl endopeptidase from Achromobacter lyticus and arginyl endopeptidase from Lysobacter enzymogenes) and S1E (which included streptogrisins) are now included in subfamily S1A (the trypsin subfamily). The former subfamilies S1C (protease Do) and S1F (astrovirus serine peptidase) are now included in subfamily S1B (the glutamyl endopeptidase I subfamily).

    Better BLAST searches

    The sequence library that is used for the BLAST searches in MEROPS now contains more sequences, i.e. all the holotype sequences and also the "linker" sequences that are required to make the transitive links within families. This makes the e-values that are returned more meaningful, and decreases the number of false-positive hits.

    Better tables

    We are progressively enhancing the usability of the tables in MEROPS by making the columns sortable. An example is the table of Images that can be reached from the green side-menu.

    Release 7.40 15-Mar-2006

    Appearance of small-molecule inhibitors (SMI) in MEROPS

    Perhaps the major single reason for the intense research activity on peptidases is their involvement in many disease processes. This makes many of them drug targets, and scientists in hundreds of companies and academic laboratories are working to develop new, small-molecule inhibitors of peptidases for possible use as drugs. MEROPS already contains thousands of literature references detailing the results of the work on SMI (flagged I in the reference lists), but in the present release we add a whole new section to MEROPS that deals with the small-molecule inhibitors in greater depth. The size of the field is vast, and it will be some time before we can do it full justice, but we already include many of the 'big name' inhibitors that are either useful as reagents in the laboratory or are significant drugs.

    The sidebar menu on the Inhibitors side of MEROPS contains a link to the index of SMI by name, and thence to summary pages about the individual inhibitors. The summary pages for individual peptidases may also show a new section "Relevant inhibitors" that contains links to some of the SMI that have been described for this particular enzyme.

    Completion of clan summaries

    In 1993, we introduced the term 'clan' to refer to an evolutionarily-related set of families of peptidases, and this level of classification has been important in MEROPS since the beginning. The detection of distant relationships between proteins is often difficult, and the results can be controversial. We therefore need to be clear about the criteria we have used in assembling each clan so that others can assess the validity of the grouping. With this in mind we have expanded all of the text summaries that describe the clans.

    More consistent numbering of residues in Structure images

    The numbers of amino acid residues (typically active site and metal-ligand residues) cited in the legends for the Richardson-style molecular images have now been made consistent with the numbering schemes used elsewhere in MEROPS for the given peptidases, and may differ from the numbering that was used in the original Protein Data Bank record.

    Display of known cleavage sites in proteins

    There is an additional search option from the Searches button on the green menu bar. This allows a user to enter the UniProt accession of any protein to see the sequence displayed with any known cleavage sites. When the peptidase responsible is known there is a link to the peptidase summary page in MEROPS. A mouse-over displays the name and MEROPS identifier of the peptidase, or, if more than one peptidase cleaves at the same site, shows a pull-down menu of the peptidases.

    More flexible display of EST data

    Greater use of CGI techniques in MEROPS has made it possible for the human and mouse EST data table for each peptidase to be sorted by the table headings according to the user's preference, for example by disease state.

    Richer markup of BLAST results

    The markup of the results of BLAST searches against the MEROPS data has now been further enhanced to include the potential and known glycosylation sites and disulfide bonds that are shown in UniProt.

    Release 7.30 22-Dec-2005

    Batchwise scanning of MEROPS data

    We now provide a facility to BLAST a set of up to 5000 amino acid sequences against the MEROPS sequence collection, to obtain a report that shows the peptidase family and conservation of active site residues for each of the hits. (Find full details under "Batch BLAST" in the About pages.)

    Protein domain images

    The summary page for each peptidase and inhibitor now includes a linear diagram of the whole protein in which the locations of the peptidase unit and other recognised domains are shown. The diagrams include markers for the active site residues, disulfide bonds and other features. These diagrams have been made possible by the availability of data from Pfam, our sister database at the Wellcome Trust Sanger Institute. (Find full details under "Domain images" in the About pages.)

    Clan summaries begin to appear

    Clans form the top level of the hierarchical classification of peptidases and their inhibitors that MEROPS has pioneered. Having completed the summary pages for the families, we have now started work on those for the clans, and the first of these are included in the present release.

    Literature pages: new format, new data

    We have changed the format of the references to a three-line one that we believe is clearer. We have also added new links that use the Digital Object Identifier (DOI) system. The DOI system is designed to provide stable identifiers for journal articles and much else besides. By use of DOIs we make links to the resources on the respective publishers' web sites. For journals that provide free content, or to which the user of MEROPS has a subscription, this link is likely to lead directly to the full text of the article.

    New format for 'About' pages

    The 'About' pages that try to answer a variety of questions about MEROPS, and are our equivalent of FAQ, have been re-organised with their own menu. We trust that users will find the information more accessible.

    Release 7.20 14-Oct-2005

    Completion of summaries for inhibitor families

    Each of the 56 families of proteins that inhibit peptidases now has a text summary in a format similar to that we used for the peptidase families.

    Links from BLAST results

    The markup of results on the BLAST page now includes a direct link to each matching peptidase or inhibitor and its family.

    Second clan of N-terminal nucleophile hydrolases recognised

    DmpA peptidase (S58.001) is a self-processing, serine-type, N-terminal nucleophile (Ntn) hydrolase, but its structure, in the DOM-fold, shows that it is not homologous to the other Ntn hydrolases, which are in clan PB of MEROPS. Accordingly, family S58 has been placed in a new clan, SQ.

    Family U61 moves to S66, in clan SS

    It has been shown that LD-carboxypeptidase, formerly of unknown catalytic type in family U61, is in fact a serine peptidase with a distinctive protein fold. Full details can be found under family S66.

    Links to the HUGO Gene Nomenclature Committee database

    On the peptidase and inhibitor summary pages, each human gene symbol is now linked to the HUGO Gene Nomenclature Committee database, which will provide additional information about it.

    Links to IPfam for peptidase-inhibitor interactions

    Links to the IPfam protein interactions database are made when a structure is available for a peptidase-inhibitor complex.

    Release 7.10 22-Jul-2005

    New peptidases and inhibitors

    Additional MEROPS identifiers have been assigned for all of the known human and mouse peptidase inhibitors. The list of identifiers that appear for the first time in the present release of MEROPS can be found here.

    Release 7.00 4-Apr-2005

    Markup of BLAST results

    The results of BLAST searches are now highlighted to show the presence or absence of catalytic residues. This gives an indication of whether a novel sequence is that of an active peptidase or a non-peptidase homologue.

    Lists of peptidases and inhibitors at higher taxonomic levels

    Having consulted an Organism page to see the list of peptidases known from (for example) Plasmodium falciparum, a user may be interested to see the set for all Plasmodium species, or even all Protozoa. That is now possible: just click the higher level at the top of the Organism page. Please be patient if the pages with these larger sets of peptidases take a little time to appear.

    Yet more family summaries

    The series of expanded Summary pages for peptidase families is complete in Release 7.00. Work on fuller summaries for the inhibitor families is now in progress, and already new data are visible for many of them.

    Alignments and trees for individual peptidases and inhibitors

    With the new Alignment button at the top of each peptidase page MEROPS now provides an alignment of sequences (either full-length or peptidase units only) for each individual peptidase and inhibitor. The Tree button leads to a Neighbor-Joining tree derived from the alignment of peptidase units.

    More informative sequence displays

    We have added more markup to the displays of individual "MER" sequences that are reached from the sequence pages. When a peptidase unit is interrupted by unrelated sequence, that is shown. An example is a plant peptidase in family A1 that contains an inserted saposin-like sequence. Non-catalytic residues that replace catalytic ones are now marked in black (view).

    FAQ become About pages

    In the process of revising the FAQ pages we have re-arranged much of the information they contained, and also have changed their name to "About".

    Release 6.90 16-Dec-2004

    Display of active site residues in family summaries

    Our work on the expansion of the data on the peptidase family summary pages continues, and about three-quarters of them have been re-written during 2004. As an adjunct to this, we have added a display of the active site residues in the family (numbered as in the type peptidase). Following our usual convention, the catalytic residues are shown on a red background, and the metal ligands on blue.

    Alignments of sequences for individual peptidases and inhibitors

    We now include alignments of the full-length amino acid sequences of individual peptidases and inhibitors, reached by the pale blue button. Being full-length alignments, these show the peptidase or inhibitor units in context. Colour-coding marks the extent of the peptidase and inhibitor units: in the sequence of the holotype, the unit is coloured green, and the remainder of the sequence brown. There is highlighting of the catalytic residues (red background) and the metal ligands (blue background). The link to the MEROPS reference sequence shows the species of origin when the cursor is held over it. The alignments are generated dynamically on demand, so please allow a second or two for them to appear.

    Activity status of human and mouse peptidases

    MEROPS Release 6.9 lists just under 700 peptidases and homologues encoded in the human genome, and slightly more from the mouse genome. But in each species only a minority of these potential peptidases are yet known to be catalytically active peptidases on the basis of direct experimental evidence. Others are presumed to be active because of their structural similarity to active peptidases from other species such as rat. Or they can be described as 'putative' because, although there is no closely relevant experimental evidence, they do contain all the residues that are known to be of functional importance in the family. Other homologues are probably not active peptidases, either because the genes are pseudogenes or the expressed proteins lack residues that are believed to be essential to peptidase activity in the family. A new feature of MEROPS introduced in Release 6.9 is the field 'Activity Status' on the PepCard for each peptidase homologue from the human or mouse genomes. This shows whether we currently regard this as an 'active', 'putative' or 'inactive' peptidase homologue. If we are aware that a peptidase has been shown experimentally to be active, we try to give a reference, and if on the other hand we believe it to be inactive because one or more expected active site residues are replaced, we show that too. A few examples:

    MEROPS ID  Peptidase                            Activity status
    A01.007    Renin                                Human: active (Suzuki et al., 2004)
                                                    Mouse: active (Hansen et al., 2004)
    S09.018    Dipeptidyl-peptidase 8               Human: active (Chen et al., 2004)
                                                    Mouse: active (by similarity to human)
    S09.973    Dipeptidylpeptidase homologue DPP6   Human: inactive; S D H
                                                    Mouse: inactive; S D H
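
    The reasoning behind the 'Activity Status' field can be sketched as a small decision function. This is an illustrative reading of the criteria described above, not the procedure used by the MEROPS curators: the class, its field names and the order in which the checks are applied are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Homologue:
    """Hypothetical record of the evidence available for one homologue."""
    experimentally_active: bool = False      # direct experimental evidence
    active_in_related_species: bool = False  # e.g. active in rat or human
    is_pseudogene: bool = False
    has_all_key_residues: bool = True        # residues important in the family


def activity_status(h: Homologue) -> str:
    """Return an activity status following the criteria in the text."""
    if h.experimentally_active:
        return "active"
    if h.is_pseudogene or not h.has_all_key_residues:
        return "inactive"
    if h.active_in_related_species:
        return "active (by similarity)"
    # No closely relevant evidence, but all key residues are present.
    return "putative"
```

    For example, a homologue in which an expected active site residue is replaced would be reported as inactive, whereas one with a complete set of residues but no experimental evidence would be reported as putative.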

    Flagging of topics in Literature pages

    The literature on peptidases is large, and the Literature pages in MEROPS contain well over 20,000 references. So that it may be easier to spot a paper on a particular topic in a Literature page, we have added "flags" for seven important topics. Thus E indicates that the paper contains information on the recombinant Expression of a peptidase, I shows that we found the article to be relevant to the design of Inhibitors for the enzyme, K means that the paper deals with a gene Knockout or other artificial genetic manipulation, M shows that the paper deals with a natural Mutation, allelic variant or polymorphism, R indicates that the article includes information about an RNA splicing variant, S means that the article deals with three-dimensional Structure, and V shows that the article is a Review.

    No doubt some articles deserving of flags do not yet have them, and we are working to make the assignments more complete.
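
    For anyone processing reference lists locally, the flag letters can be held in a simple lookup table. The sketch below is only an illustration: the letters and their meanings are those given above, but the representation of a reference as a dictionary with a "flags" string, and the example references themselves, are hypothetical.

```python
# Flag letters and meanings as described in the text above.
FLAGS = {
    "E": "recombinant Expression",
    "I": "design of Inhibitors",
    "K": "gene Knockout or other artificial genetic manipulation",
    "M": "natural Mutation, allelic variant or polymorphism",
    "R": "RNA splicing variant",
    "S": "three-dimensional Structure",
    "V": "reView",
}


def describe_flags(flags: str) -> list:
    """Expand a string of flag letters, e.g. 'IS', into their meanings."""
    return [FLAGS[letter] for letter in flags if letter in FLAGS]


def with_flag(references: list, letter: str) -> list:
    """Select the references that carry the given flag letter."""
    return [ref for ref in references if letter in ref.get("flags", "")]


# Hypothetical references, for illustration only.
references = [
    {"title": "Example paper 1", "flags": "IS"},
    {"title": "Example paper 2", "flags": "V"},
    {"title": "Example paper 3", "flags": ""},
]
structure_papers = with_flag(references, "S")
```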

    Release 6.80 27-Aug-2004

    Progress with expanded family summaries

    We have been busy producing more of the expanded summaries of peptidase families that were introduced in Release 6.7, and 90 families (just over half of the total peptidase families) are now included.

    Clan cards for all clans

    In the past, MEROPS has not provided summary cards for clans that are divided into subclans; instead there was a separate summary for each subclan. This has now been changed so that we treat clans and subclans very much as we have done families and subfamilies for some time. That is to say, there is a summary page for every clan, and the cards for the clans that have subclans contain the data for each subclan in a subsection. We feel that this is logical and hope that it will make the clan-level data more accessible.

    Release 6.70 30-Jun-2004

    A start on expanded family summaries

    A new format has been adopted for the family summaries that allows the inclusion of much new information. The additional information has been added to about one quarter of the summaries in the present release, and we plan to complete the work during the coming year.

    Better access to information on peptidases and inhibitors by Organism

    Three improvements have been made.

    1. The indexes of Organisms now include English common names as well as the scientific binomial names of the organisms.
    2. The individual Organism pages that list all peptidases or inhibitors known from a given species have been made interactive in regard to sort order. The default order in the lists is by family, but re-sorting by clan, peptidase or inhibitor, or gene name can be achieved simply by clicking the top of the appropriate column.
    3. A link at the top of each Organism card gives access to a full list of the sequences of peptidase or inhibitor units known from the species.

    Marking of "holotypes"

    Many users of MEROPS will be aware that each family or subfamily is built around a "type peptidase" to which all other members of the family must be shown to be related. Similarly, we nominate a type form for each peptidase and inhibitor, and by analogy with the taxonomy of organisms, this is called the "holotype" (formerly "type example"). The identity of the holotype for each peptidase and inhibitor is shown on the relevant Summary page, and as of Release 6.7, the names of the holotypes are also highlighted in the label file below each multiple sequence alignment and tree.

    New book relevant to MEROPS

    The long-awaited new edition of the Handbook of Proteolytic Enzymes edited by Alan J. Barrett, Neil D. Rawlings and J. Fred Woessner is now available. Please see http://books.elsevier.com/proteo. The CD-ROM that accompanies the two-volume book is closely linked to MEROPS.

    Release 6.60 29-Mar-2004

    A new catalytic type of peptidases

    As a result of the exciting paper of Fujinaga, Cherney, Oyama, Oda & James (2004), "The molecular structure and catalytic mechanism of a novel carboxyl peptidase from Scytalidium lignicolum" (PubMed), we now recognise a sixth catalytic type of peptidases: the glutamic peptidases. The known glutamic peptidases are all contained in the family that was formerly A4, and now becomes G1.

    New clans

    In the light of new crystal structures, three new clans have been established: clan MO (containing family M23), clan MP (containing family M67) and clan SJ (containing families S16 and S50).

    New peptidases, inhibitors and families

    New families in this release are G1 (renamed from A4, see above) and M73 (containing only camelysin from Bacillus species). There is a full list of new identifiers here.

    Enhanced Download page

    If you are interested in working with MEROPS data on your own local system, please click Downloads on the menu at left, and see what we are now offering.

    New papers about MEROPS

    And much else besides...

    As usual, MEROPS has undergone a comprehensive update for the new release. The total of sequences listed is now 19,689 (up by 820) for 2,243 peptidases (up by 63) from 2,056 species of organism (up by 49). The alignments of human and mouse ESTs have been completely regenerated, and now contain over 166,000 individual ESTs. There are 2,267 literature pages, containing 20,174 references.
    The amino acid sequences from our own collection (the "MER-sequences") are now returned by use of a CGI script with enhancements such as residue numbering, range of peptidase unit and numbers of catalytic residues. We hope you will find all this useful.

    Release 6.50 22-Dec-2003

    An additional series of identifiers for clans of inhibitors

    We have named the clans of peptidase inhibitors with identifiers from the series IA to IZ, but this has not proved sufficient for the very numerous clans of inhibitors. We have therefore moved on to using the additional series JA - JZ, of which only JA has so far been assigned (for the family of the thrombin inhibitor, triabin).

    Withdrawal of a few families of inhibitors

    The status of all the families of peptidases and inhibitors is kept constantly under review. We have recently taken the view that the inhibitors in three of the families that we have previously recognised do not meet our criteria for retention in MEROPS. Two of the families affected are I22 (BsuPI protease inhibitor) and I23 (BbrPI protease inhibitor), about which we feel too little is yet known, although we shall be watching for new developments that will justify their re-instatement. Family I30 containing the cathepsin B propeptide and its homologues has also been withdrawn, on the grounds that we do not know of any member of the family that is expressed other than as part of a cysteine peptidase. Again, the situation will remain under review.

    New peptidases, inhibitors and families

    The process of collecting data for MEROPS continues and a number of peptidases, inhibitors and families are appearing for the first time in this Release. Amongst the new families are C69 (Lactobacillus dipeptidase A), S62 (PA endopeptidase of influenza A virus), I57 (staphostatin B), I58 (staphostatin A) and I59 (triabin).

    Release 6.40 16-Sep-2003

    Still more peptidases, inhibitors and families

    The process of collecting data for MEROPS continues and a number of peptidases, inhibitors and families are appearing for the first time in this Release.

    Release 6.30 16-Jun-2003

    A facility for better links to MEROPS

    The MEROPS Web site makes extensive use of frames, and this can cause problems for a user wanting to save a link to a specific page from their browser or their own Web site. We have now added a CGI program to the MEROPS Web site to help with this. Adding a page to your Favorites (Microsoft Internet Explorer) or Bookmarks (Netscape) is easy: each page in the Database contains a JavaScript function to do this. Just click the link "Add this page to your Favorites" at the bottom of the page, and follow the prompt. If you want to construct links to MEROPS from your own Web site, please read how to do it here.

    Clearer statistics

    The counts of known peptidases and inhibitors that are shown on each Organism card are now broken down to show active or putative peptidases separately from their catalytically inactive homologues, and active inhibitors separately from homologues without known inhibitory activity. For example, for Homo sapiens we show "Count of known and putative peptidases: 464, inactive homologues: 88" and "Count of inhibitors: 98, homologues: 156".

    Links to Ensembl

    The Summaries for human and mouse peptidases and inhibitors now contain links to gene reports in the Ensembl database. Ensembl is a joint project between EMBL-EBI and the Sanger Institute to develop a software system that produces and maintains automatic annotation on human, mouse and other eukaryotic genomes.

    Still more peptidases, inhibitors and families

    The process of collecting data for MEROPS continues and a number of peptidases, inhibitors and families are appearing for the first time in this Release.

    Release 6.20 24-Mar-2003

    BLAST server for MEROPS

    A user of MEROPS may have a protein or nucleic acid sequence that is possibly that of a peptidase. It is useful to be able to search the MEROPS data with such a sequence to find homologues and see how they are classified in MEROPS. With the help of the Institute's Web team we have now implemented a server to BLAST the MEROPS sequence data in this way. A library has been compiled containing amino acid sequences of peptidase units from our entire collection for peptidases and peptidase homologues. The library also contains inhibitor units from our collection of protein inhibitor sequences. The searches available are BLASTP (protein sequence query against a protein sequence database) and TBLASTX (nucleic acid sequence query against a protein sequence database). Please look for the "BLAST MEROPS" item on the sidebar menu and give it a try.

    Details of distributions of families

    Recently a colleague was reviewing one of the families of peptidases, and asked us for all we knew about its distribution throughout various kinds of organisms. We had a wealth of data, but found that we were not presenting it in a readily-accessible form in MEROPS. Now you will find that the Distribution table at the foot of each family summary card contains "details" links, and these will open up new windows containing the names of all the species from which the family has been reported in the kingdom of organisms. For example:

    Distribution among Kingdoms of Organisms
    Bacteria  Archaea  Archezoa  Protozoa  Fungi    Plants   Animals  Viruses
    -         -        -         details   details  details  details  -

    Substrate cards

    Many of the Pepcards now show a new "Substrates" button that will reveal a page of specificity data mainly for the action of the peptidase on other proteins. Of course the three Searches that provide access to specificity data are still there too, but the new buttons look and work like this:

    Substrates

    New and merged families

    The MEROPS classification of peptidases is constantly evolving. Major changes that have occurred since the last release of MEROPS include the merging of family M46 (pappalysin) into M43 (cytophagalysin). New families recognised in the present release include C61 (small protease of Sulfolobus solfataricus), C62 (gill-associated nidovirus 3C-like proteinase) and C63 (African swine fever virus processing peptidase).

    The Peptidase List

    There are a number of databases that provide various kinds of information about enzymes in general. We encourage them to give their own distinctive treatments to as many peptidases as possible, and to this end we are providing a list of the well-characterised peptidases that they might wish to include. We call this the Peptidase List (or PepList for short), and it can be reached from the PepList item on the sidebar menu. If the curator of any other database would like to have the List in any other format we shall try to help.

    Molecular images of inhibitors

    Twenty-five molecular images of inhibitors have been added to MEROPS in the present release; please see the Images index on the Inhibitors side of MEROPS for the details.

    Release 6.10 10-Jan-2003

    New Location for MEROPS

    The MEROPS team moved to the Wellcome Trust Sanger Institute on the Genome Campus at Hinxton near Cambridge on October 1, 2002. There we are working alongside the Pfam Protein Families database, and are in an ideal environment for database work. Access to the database is now unrestricted under the terms of a GNU library licence. We are grateful for continued financial support from the Medical Research Council and the warm welcome that we have received from Alex Bateman and his Pfam team.

    Addition of inhibitors to MEROPS

    The proteins that inhibit peptidases are arguably as important as the peptidases themselves to any understanding of the balances of proteolysis in any biological system. The compilers of the Database have a long-standing interest in these proteins, and with Release 6.1 have made a start on including them in the database. It is a major challenge to provide proper coverage of this new aspect, and we do not claim to have completed it in one release, but we feel that we have made a useful start. As always, we shall welcome suggestions for further improvements.

    New families

    New families of peptidases appearing in this release of MEROPS are C60 (type example: sortase A) and M67 (type example: Poh1 peptidase).

    Merging of families

    MEROPS recognises transitive relationships when forming families. This means that when a new sequence appears that shows significant relationships to proteins in two existing families the two families are merged. Since Release 6.0 we have merged family C29 into C16, families M25 and M40 into M20, and M37 into M23. More details can be found here.
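
    The transitive rule described above can be illustrated with a union-find structure, in which any significant relationship between two sequences places them in the same group. This is only a sketch of the principle, not the method MEROPS uses to detect or score relationships: the function name and the sequence labels are hypothetical, and the list of links is assumed to contain only relationships already judged significant.

```python
def group_into_families(links):
    """Group sequences into families from pairwise significant links.

    links: iterable of (sequence_a, sequence_b) pairs.
    Returns a sorted list of families, each a sorted list of sequences.
    """
    parent = {}

    def find(x):
        # Find the representative of x, shortening the path as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    families = {}
    for seq in list(parent):
        families.setdefault(find(seq), set()).add(seq)
    return sorted(sorted(members) for members in families.values())


# Two separate hypothetical families...
links = [("a1", "a2"), ("b1", "b2")]
before = group_into_families(links)

# ...merge once a new sequence is related to a member of each.
after = group_into_families(links + [("new", "a2"), ("new", "b1")])
```

    Here `before` contains two groups and `after` contains a single group of five sequences, which mirrors the way a linking sequence causes two families to be merged.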

    More informative 'MER' sequences

    Colleagues familiar with the database will know that each Sequences card contains (in the left-hand 'MERNUM' column) links to our copies of the full-length, 'MER' sequences in FASTA format. In the present release we have added additional information to each header line, and have highlighted the part of the sequence that is the peptidase or inhibitor unit in red for greater clarity.

    Release 6.0 30-Aug-2002

    Clan-level sequence alignments

    In the absence of three-dimensional molecular structures, the evidence that supports the assignment of a peptidase family to a clan can come from the order of catalytic residues in the polypeptide chain and the similarities in amino acid sequences around them. We now show this kind of evidence in MEROPS in the form of a limited sequence alignment for any clan that contains multiple families, e.g. for the metzincins, clan MA(M). (The numbering of residues is according to that of the type example of the clan.)

    Extended family trees

    In the past, MEROPS has contained alignments and trees only for the subfamilies in those families that are divided into subfamilies, not for the complete families. We feel that this has been a deficiency, because we nowhere showed the deep divergences that demand the separation of the subfamilies. With the present release we have started to put this right, by providing trees for the complete families also.

    New families and subfamilies

    Family A22 of presenilin has been divided into two families, following the discovery of many more putative peptidase homologues. Two new families of bacterial peptidases have been created: C58 and M64.

    Type examples for peptidases

    The concept of a "type example" for a taxon is well recognised - this is the individual example to which all other members of the group can be shown to be related, and the definition of type examples enhances the stability of a taxonomic system. Type examples for clans and families have been shown in MEROPS previously, but we have now added the type examples for individual peptidases as well.

    Sequence libraries for human and mouse peptidase units added to the FamCards

    There are now buttons "H-seq" and "M-seq" at the top of the card for each family of peptidases that is known from mammals. These link to FASTA libraries of the sequences of the peptidase units for the family, which can be copied, and are then useful for making one's own alignments and hidden Markov models. The header line for each sequence now shows the residue numbers of the range of amino acids included in the peptidase unit, even when the peptidase unit is one from which an intervening sequence has been removed.

    Release 5.9 21-Jun-2002

    Learning more about peptidases from the complete genomes

    It is fairly obvious that once the sequencing of the genome of an organism is truly complete the data can show which families of proteins are absent from the genome as well as which are present. There are data for about 75 completed genomes in the present release of MEROPS, and we are now making fuller use of these to obtain information about the evolution of peptidase families. The set of buttons linking to information on each family now includes a "Genomes" button (except for the families that are confined to viruses). The distribution data take the form of a tree in which each twig is an organism with a sequenced genome, and the colours are blue if the family is present or black if it is not. In an example we can see that family A1 containing pepsin and its homologues is present in all the eukaryotes, but not in the archaea or bacteria. Conversely, subfamily M24A is present in all the genomes available to date. The genome tree for family M22 is the only other one that is entirely blue at this stage. Many of the other families have produced trees that are much more complex, and the interpretation of these will be food for thought for our users as well as ourselves at MEROPS.

    Please note that the blue (and capitalized) organism names have links to their organism cards, so that the user can click on any one to look up exactly which members of the family occur there. We have added similar links to the Distribution trees for individual peptidases, too.

    New plug-in to view molecules

    Since the early days MEROPS has provided for the use of the excellent RASMOL viewer, but we now also provide links on the Structure pages that make use of CHIME. The user has first to install CHIME from here.

    Release 5.8 19-Mar-2002

    Removal of intervening sequences that interrupt peptidase units

    Most users of MEROPS are probably familiar with the concept of a peptidase unit, and they may have noticed that in some peptidases the peptidase unit is interrupted by the insertion of an unrelated domain. A very clear example of this is gelatinase A (M10.003) in which three copies of a fibronectin-like sequence are found immediately N-terminal to the HEXXH zinc-binding consensus sequence. Most other members of the family (e.g. matrilysin, M10.008) show no such insertion. We think it best to remove these intervening domains from peptidase units for display in MEROPS, because this improves the alignment of the sequences of the peptidase units proper, and makes it easier to compare one peptidase unit with another throughout the family. Removal of such intervening sequences has been done where necessary in subfamilies M10A, M50B and S8A. The sequence segments to be removed were recognisable by the lack of any matching segment in a pairwise alignment with the type example of the family. The residue numbers of the two remaining segments forming the strict peptidase unit are indicated in the label file for the subfamily alignment and tree. In the alignment itself, the point at which the intervening sequence has been removed has been indicated by use of colouring as shown here (for human gelatinase A):

    ...DDELWTCPDQGY...

    This would indicate that an intervening segment has been removed between the residues T and C. The details of this are now included in the key to the sequence alignment, e.g. "(residues: 108-214, 390-461)" for this particular sequence.
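
    As an illustration only, the residue key can be turned back into the strict peptidase unit with a few lines of Python. This is a minimal sketch, not the code used to build MEROPS: the function names are hypothetical, the residue numbers are assumed to be 1-based and inclusive, and a placeholder string stands in for the real gelatinase A sequence.

```python
import re


def parse_residue_ranges(key: str) -> list[tuple[int, int]]:
    """Parse a key such as '(residues: 108-214, 390-461)' into (start, end) pairs."""
    return [(int(start), int(end)) for start, end in re.findall(r"(\d+)-(\d+)", key)]


def strict_peptidase_unit(sequence: str, ranges: list[tuple[int, int]]) -> str:
    """Join the listed segments, dropping whatever lies between them.

    Assumption: residue numbers are 1-based and inclusive.
    """
    return "".join(sequence[start - 1:end] for start, end in ranges)


ranges = parse_residue_ranges("(residues: 108-214, 390-461)")

# Placeholder sequence of 500 residues (NOT the real protein):
# 'B' marks residues 108-214, 'X' the intervening segment, 'C' residues 390-461.
toy_sequence = "A" * 107 + "B" * 107 + "X" * 175 + "C" * 72 + "A" * 39

unit = strict_peptidase_unit(toy_sequence, ranges)
print(ranges, len(unit))
```

    The two segments are simply concatenated, so the join falls at the point where the intervening sequence was removed.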

    A start on larger alignments and trees

    Problems of computation and display have until now prevented us from including alignments and trees containing more than 200 sequences in MEROPS, but with the help of our friends at Pfam we have now been able to handle 740 sequences in subfamily S1A. We find this enormous tree helpful in checking the coding assignments in the subfamily, and anticipate that our users will find it useful too. We plan to add more large alignments in the next release.

    New genomes, and another human chromosome

    MEROPS now contains analyses of the peptidase content of the complete genomes of Brucella melitensis, Nostoc sp. PCC 7120, Clostridium perfringens and Pyrobaculum aerophilum. In addition, the completed human chromosome 20 has been found to contain 9 peptidases and homologues amongst 727 total predicted protein-encoding genes. Incidentally, about 30 new images of the three-dimensional molecular structures of peptidases have been added to MEROPS since the last release.

    Changing families in MEROPS

    Family M51 of intramembrane metallopeptidases has been merged with family M50 following the appearance of linking sequences related to members of both families. We note that family M50 is an exceptionally interesting one, represented in almost all of the genomes so far completely sequenced. Families U61 and U62 are new to this release of MEROPS.

    MEROPS names revisited

    As we noted in Release 5.5, it seems obvious that any one peptidase is almost certain to occur in more than one organism. This means that any species-specific name is not going to work for long, because we shall not know what to call the same enzyme when it turns up in a different organism. Although this seems obvious, many peptidases are still named in just this way, most notably by use of gene symbols as names. A peptidase that is the product of the xyz gene in Escherichia coli might well be called "XYZ protease". But gene names are species-specific, and the gene that encodes this peptidase in another organism will almost inevitably have a different name. So "XYZ protease" will make little sense outside the original species. We know of many peptidases that have no satisfactory name for reasons such as this, and we have found it necessary to devise a system of temporary names for use until the scientific community, or perhaps the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology, produces a satisfactory one. Such a temporary name originally took the form "MEROPS-AA001 peptidase", but we have now replaced this form with "Mername-AA001" because it makes the meaning clearer. As before, we simply increment the letters and numbers to generate new names. We shall look forward to retiring the Mernames as soon as acceptable new names appear.

    Data for EST libraries

    The alignments in MEROPS now contain 82,000 ESTs for peptidases and their homologues (for the three species, human, mouse and rat), and we believe that useful information can be obtained by considering the EST libraries in which they were found. So there is a new item in the sidebar menu "EST cell lines". This provides access to a table for each of the 2557 EST libraries (identified by EMBL Library number) that shows what peptidases were detected in the library. Four separate indexes to the libraries in each species are sorted alphabetically by (1) Library number, (2) Tissue, (3) Developmental stage and (4) Disease. So it is easy, for example, to identify the three libraries from mouse adipose tissue, and to see the lists of peptidase ESTs that they contained by clicking the links to the library numbers on the left. This can be seen here.

    New MEROPS publications

    In recent months, two new publications have described aspects of MEROPS. A free download of the PDF file for the Nucleic Acids Research paper is available at the PubMed link.

    Barrett, A.J., Rawlings, N.D. & O'Brien, E. A. (2001) The MEROPS database as a protease information system. J. Structural Biol. 134, 95-102.  PubMed

    Rawlings, N.D., O'Brien, E.A. & Barrett, A.J. (2002) MEROPS: the protease database. Nucleic Acids Res. 30, 343-346.  PubMed

    Release 5.7 17-Dec-2001

    New families in MEROPS

    The families M60 of enhancin, S46 of dipeptidyl-peptidase 7 from Porphyromonas gingivalis and S54 for the Rhomboid protein have been added to the database. S18 (for omptin) has been moved to A26 (see below).

    First structures for three families

    It is always a landmark when the first three-dimensional structure is published from a family of peptidases. One reason is that it commonly allows the assignment of the family to a clan, or places the existing assignment on a firmer footing. In this release we are happy to be able to show the first structures of peptidases from three families: omptin (A26, clan AF), mitochondrial processing peptidase (M16, clan ME) and anthrax lethal factor (M34, clan MA(E)). The structure of omptin was particularly influential, since it indicated that this is a family of aspartic peptidases, not serine peptidases as had previously been thought, and founded a new clan.

    A "Community" page

    MEROPS is accessed from nearly 10,000 computers every month, so it has the potential to act as a medium for the exchange of the kinds of information that can help to bring the community of scientists who are interested in proteolytic enzymes closer together. We are happy for it to do this, and the new Community page lists some of the societies and conferences that we think our users may like to be aware of.

    Release 5.61 29-Oct-2001

    Statistics of peptidases in completed genomes

    The Statistics page now starts with a table that indicates roughly how much of their coding potential different organisms use to encode proteolytic enzymes. What is shown is the total number of members of peptidase families that we have found in each of the complete genomes we have analysed, as a percentage of the reported total number of genes. The numbers range from 0.68% to 3.57%, but many are close to 1.8%. We see no striking correlation with type of organism, e.g. archaeon, bacterium or eukaryote.
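
    The figure in the table is a simple ratio, and the short Python sketch below shows the arithmetic. The function name and the gene counts in the example are hypothetical, chosen only to illustrate the calculation; they are not taken from the Statistics page.

```python
def peptidase_percentage(peptidase_family_members: int, total_genes: int) -> float:
    """Members of peptidase families as a percentage of all reported genes."""
    if total_genes <= 0:
        raise ValueError("total_genes must be positive")
    return 100.0 * peptidase_family_members / total_genes


# Hypothetical genome: 72 peptidase-family members among 4000 reported genes.
example = peptidase_percentage(72, 4000)
print(f"{example:.2f}%")
```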

    Distribution diagrams

    We have added to MEROPS a feature previously seen only in MEROPS-PRO: a "Distribution" diagram that shows the organisms from which each peptidase is known in comparison to the distribution of the whole family. The diagram is reached by use of a new button on the PepCard.

    More and better EST analyses

    We have added results from analysis of the rat EST collection alongside those for human and mouse ESTs in PepCards. The format is the same: alignments and data tables. The "EST" columns in the Peptidase Identifier index now show "0" when a search was made and found no hits. Also, we have added Comments on many of the EST alignments; these can be found at the top of EST data cards.

    Peptidase gene knockout data

    Because of the great importance of gene knockouts in the understanding of biological functions of peptidases, an additional "Knockout" section has been added under "ACTIVITY" in the PepCards where we have such information. Incidentally, the links for the references cited are to the appropriate year in the Literature file.

    Explanations of assignment of families to clans

    The relationships between the families that are grouped in a single clan are in the "twilight zone" of sequence similarity, and the evidence of their relationship depends upon a variety of less rigorous criteria. We have therefore added a line to the FamCard to explain the reasons for the assignment of the family to the clan.

    Trimming of trees for large subfamilies

    Subfamilies C1A and S1A contain more peptidases than we can reasonably display in the sequence alignments and trees. The method of selection of peptidases for the S1A tree is the same as that described for the last Release, but as an experiment the C1A alignment and tree contain one sequence from each code, human or mammalian if possible, plus all of the peptidases in the subfamily that are not yet assigned to codes.

    Families coming and going

    A new family M61 has been established; the type example is the glycyl aminopeptidase of Sphingomonas capsulata.

    Previous releases of MEROPS have contained a family A13 in which we placed only the retrotransposon peptidase of Drosophila buzzatii. Homologues of the retrotransposon have now appeared that provide statistically significant links to members of family A2, and family A13 has therefore been closed. The divergent member that was A13.001 is now A02.054.

    Release 5.5 15-Jun-2001

    Introduction of "MEROPS-" names

    It seems obvious that any one peptidase is almost certain to occur in more than one organism. This means that a species-specific name like "zebra fish protease A" is not going to work for long, because we shall not know what to call the same enzyme when it turns up in a different fish, or wherever. Although this seems obvious, many peptidases are still named in just this way. As a result, we often find ourselves in the ludicrous position of listing things like "zebra fish protease A from tuna" in MEROPS. At last we decided that this could not go on, so we have started a system of temporary names to use until the scientific community, or perhaps the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology, produces more satisfactory names. Such a temporary name takes the form "MEROPS-AA001 peptidase" in which we simply increment the letters and numbers. We shall look forward to retiring each MEROPS name as soon as an acceptable new name appears (and then we shall not re-use it for anything else). You may notice such names in MEROPS.

    Introduction of subclans

    All the peptidases that are grouped together in a clan are believed to be derived from a single evolutionary ancestor. Nevertheless, some clans contain distinct groups that are so divergent that there is a clear need to recognize them, and we have therefore introduced the concept of "subclan". The identifier for a subclan is formed by adding a letter in parentheses to the clan identifier. One clan split in this way is MA, in which the gluzincin and metzincin groups of HEXXH-containing metallopeptidases are represented by MA(E) and MA(M), respectively. A second example is clan PA, in which the families of serine peptidases are placed in PA(S) and the cysteine peptidases in PA(C).
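
    For anyone handling these identifiers in their own scripts, the sketch below splits a subclan identifier into its clan and subclan parts. It is an illustration under stated assumptions: the names are hypothetical, and the pattern assumes that a clan identifier is two capital letters and a subclan is a single capital letter, as in the examples above.

```python
import re
from typing import NamedTuple, Optional


class ClanId(NamedTuple):
    clan: str
    subclan: Optional[str]  # None when the identifier names a whole clan


# Assumed layout: two capital letters, optionally followed by "(X)".
_CLAN_PATTERN = re.compile(r"^([A-Z]{2})(?:\(([A-Z])\))?$")


def parse_clan_identifier(identifier: str) -> ClanId:
    """Split 'MA(E)' into clan 'MA' and subclan 'E'; 'PA' has no subclan."""
    match = _CLAN_PATTERN.match(identifier.strip())
    if match is None:
        raise ValueError(f"not a clan or subclan identifier: {identifier!r}")
    return ClanId(match.group(1), match.group(2))


print(parse_clan_identifier("MA(E)"))
print(parse_clan_identifier("PA"))
```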

    Selection of sequences for alignments and trees

    A few families and even subfamilies are now so large that it is not practicable to include all the sequences we have in the alignment and tree. In the present release we have had to cut down the number of sequences in subfamily S1A, and what we have done is to use only the human sequences. This will probably be attractive to those who care only about the human peptidases anyway, but will be frustrating to anyone wanting to identify a protein from another species as an orthologue of a human peptidase. Perhaps we will use another set next time.

    A new search of the human and mouse ESTs

    The quantity of expressed sequence tags in the databases is increasing at an impressive rate, and we have made another set of searches of the human and mouse ESTs. We searched a total of about five million ESTs to retrieve about 60,000 that match peptidases, and these are presented in our new set of alignments. In a change from Release 5.4, the existence of EST alignments is now indicated by the number of ESTs found in a column in the "Peptidase by Identifier" index table.

    More informative EST data cards

    If you find the EST alignments useful, don't miss the Data cards (linked from the top of each alignment). Amongst the new things in this release are the live links to Unigene clusters. You will find that we do not always agree with the Unigene assignments, but the links will make it easy to compare.

    Clearer presentation of "unassigned" peptidases and homologues

    We know many peptidases only by their deduced amino acid sequences. The sequence allows us to assign a putative peptidase to a family and often a subfamily, but unless it is closely similar to that of a peptidase that has been characterised biochemically, we commonly cannot assign the peptidase to a specific MEROPS identifier. This means that we are left with numbers of unassigned peptidases, as well as unassigned non-peptidase homologues, in many families. We have now introduced a new style of data card better suited to presenting the information about unassigned peptidases. Suggestions for further improvements in these or any other aspect of MEROPS will always be most welcome.

    Inclusion of LocusLinks

    We have started to include links to the valuable LocusLinks resource at NCBI. These appear in the Human Genetics section of the peptidase Summary cards, and in the Species card for Homo sapiens.

    Better Downloads table

    The Downloads table that you can use to fetch peptidase sequences for your own work now contains comments (in GCG format) giving the family and organism (NCBI Taxonomy code) for each accession number. This makes it easy to filter the lines (perhaps using a few lines of Perl) so that you could import only the sequences for family S1, say, or only human sequences. The Taxonomy identifier can be found at the top of the card for each organism, e.g. "9606" for Homo sapiens.
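
    The same filtering can be done in Python. The sketch below is only a guide: the line layout it assumes (an accession, then "!", then the family and the Taxonomy code separated by spaces) is a hypothetical stand-in for the real comment format, and the accession numbers are invented placeholders, so the parsing will need adjusting to match the actual Downloads table.

```python
from typing import Iterable, Iterator, Optional


def filter_download_lines(
    lines: Iterable[str],
    family: Optional[str] = None,
    taxonomy_id: Optional[str] = None,
) -> Iterator[str]:
    """Yield accessions whose comment matches the requested family and/or organism.

    Assumed (hypothetical) layout of each line: 'ACCESSION ! FAMILY TAXID'.
    Lines that do not fit this layout are skipped.
    """
    for line in lines:
        entry, separator, comment = line.partition("!")
        fields = comment.split()
        if not separator or len(fields) < 2:
            continue
        line_family, line_taxid = fields[0], fields[1]
        if family is not None and line_family != family:
            continue
        if taxonomy_id is not None and line_taxid != taxonomy_id:
            continue
        yield entry.strip()


# Invented placeholder lines, not real entries from the Downloads table.
sample = [
    "ACC0001 ! S1 9606",
    "ACC0002 ! M10 9606",
    "ACC0003 ! S1 10090",
    "a line without a comment",
]

family_s1 = list(filter_download_lines(sample, family="S1"))
human_only = list(filter_download_lines(sample, taxonomy_id="9606"))
print(family_s1, human_only)
```

    Passing both arguments at once narrows the selection to one family in one organism.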

    Release 5.4 23-Mar-2001

    New format for Sequence pages

    We have reorganized the sequence pages to make them smaller, faster to load and easier to use. The links are no longer divided between protein sequence and nucleic acid sequence tables, but combined in one table. This has allowed us to associate each TrEMBL database entry with its corresponding nucleic acid sequence database entry. The re-organisation has yet to be implemented with the PIR database entries, so the PIR database links are temporarily suspended from this release of MEROPS, but will return soon.

    Inclusion of mouse EST alignments

    In Release 5.3 we introduced alignments of human ESTs for peptidases, and we now have pleasure in adding alignments of mouse ESTs in the same format. We anticipate that users will find these helpful in the identification of novel homologues of known peptidases, and in recognising polymorphisms and splice variants.  We are working hard to keep the alignments up to date as new ESTs are added to the databases at a huge rate, so please take a look at the alignments for your favourite peptidases in case important new information has appeared since the last Release.

    Again, many new peptidases are recognised, and some additional families and clans

    New families: C55 (clan CE), type example YopJ protease (Yersinia pseudotuberculosis), C56 (clan CJ) moved from U46, type example PfpI endopeptidase (Pyrococcus furiosus), C57 (clan CE), type example I7 processing peptidase (Vaccinia virus) and M55 type example D-aminopeptidase DppA (Bacillus subtilis).

    New clans: CJ containing family C56, type example PfpI endopeptidase (Pyrococcus furiosus); CK containing family C26, type example gamma-glutamyl hydrolase (Rattus norvegicus), and SN containing family S51, type example dipeptidase E (Escherichia coli).

    New format for Literature files

    We hope that you will find the Literature files easier to scan, with the inclusion of Year headings and titles in blue.

    Addition of two-dimensional structures

    We feel that the essence of a molecular structure can sometimes be captured most easily in a two-dimensional representation.  In particular, two-dimensional representations reflect in a very simple way the similarities in protein fold that are so valuable in detecting the distant relationships at the clan level.  The order of catalytic residues can also be shown well in this format.  So we now provide a depiction of the two-dimensional structure with each three-dimensional structure image, and have assembled new pages of two-dimensional structures at the Family and Clan levels.

    Release 5.3 4-Dec-2000

    Inclusion of a "specificity" search

    We have collected a good deal of data for the specificity of peptidases, and made it available to several kinds of search functions. The searches are now on the "Searches" menu. Of course, one could never have enough specificity data, and if anyone would like to contribute some of their own published data to be included, we should be happy to hear from them.

    Addition of alignments of human ESTs

    We developed a system for screening the human EST collection, initially in order to find novel homologues to follow up in our wet lab. As a result, we now have more novel peptidases to clone and sequence than we can handle, so rather than just sitting on the data we decided to share it. The existence of an alignment of human ESTs for a particular peptidase in MEROPS is indicated by a red "EST" tag in the Identifier index. Then, access from the PepCard is via the red button. Each alignment offers a link to the table of EST data, which includes the Unigene cluster assignments of the ESTs. Coming soon: the same for mouse ESTs!

    Release 5.2 31-Aug-2000

    Identifiers assigned to "multipeptidases"

    Our classification is essentially one of "peptidase units", and when a peptidase molecule contains several different kinds of peptidase units it creates a problem for us: no one location in the classification is right for the whole of such a multipeptidase molecule. The proteasome is an obvious example, but there are half a dozen others. We now use a MEROPS identifier starting with "X" for each of the multipeptidases, and reserve the standard codes for the individual peptidase units. For example, the somatic form of peptidyl-dipeptidase A (angiotensin-converting enzyme) is X06.001, and its two peptidase units are M02.001 and M02.004. There is a PepCard for X06.001 including a Literature button, in addition to the standard PepCards for M02.001 and M02.004. If you care to know which other multipeptidases we recognize, look under "X" in the alphabetical index of MEROPS identifiers.
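
    The relationship between a multipeptidase and its peptidase units can be pictured as a simple lookup, sketched here in Python. The names are hypothetical and the table holds only the one example given above; it is not a list of the multipeptidases in MEROPS.

```python
# Only the example from the text; other multipeptidases are deliberately omitted.
MULTIPEPTIDASE_UNITS: dict[str, tuple[str, ...]] = {
    "X06.001": ("M02.001", "M02.004"),
}


def is_multipeptidase(identifier: str) -> bool:
    """Multipeptidase identifiers are the ones that start with 'X'."""
    return identifier.startswith("X")


def peptidase_units(identifier: str) -> tuple[str, ...]:
    """Return the peptidase units for an identifier.

    A standard identifier is its own single unit; an 'X' identifier that is
    missing from the lookup table yields an empty tuple.
    """
    if is_multipeptidase(identifier):
        return MULTIPEPTIDASE_UNITS.get(identifier, ())
    return (identifier,)


print(peptidase_units("X06.001"))
print(peptidase_units("M02.001"))
```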

    PepCards provided for unsequenced peptidases

    Until a peptidase is sequenced, we cannot classify it as we would wish. Nevertheless, several of the peptidases that are yet to be sequenced are of real interest, and as a way to display more information about them, we now include PepCards for them.

    Release 5.1 15-Jun-2000

    Access arrangements

    Anyone reading this has obviously obtained access to the database, but if you know of anyone who is not able to get access as they wish following the recent changes, please encourage them to contact merops@sanger.ac.uk. We shall do what we can to help.

    New alignments for families of peptidase units

    We now have a new system for generating the sequence alignments and evolutionary trees. We trust that you will find the larger number of these pages, and their new style, helpful.

    Release 5.0 3-Apr-2000

    MEROPS receives approval of ISI!

    We were happy to receive the message:

    "You are publishing important, high-quality material on the Web. For this reason, ISI has selected your site for inclusion in Current Web Contents, a new section of Current Contents Connect™ (CC Connect™). ISI editors -- following carefully structured evaluation criteria -- have visited your site, reviewed it, developed a standardized descriptive record, written an abstract and created a link from CC Connect to your site."

    A start on search facilities

    We have wanted to add search facilities to MEROPS for some time, and with Release 5 we have made a start on this. Let us know what other searches you think we might usefully provide. (Please do not ask for a specificity search, though! This is not feasible at the present time, and we can only refer you to the search function on the CD-ROM of the Handbook of Proteolytic Enzymes).

    Lots of new data!

    Amongst the new data are those for the 450 or so peptidases in the genome of Drosophila melanogaster just completed. There are now over 4000 names in the index of protease names.

    Literature files

    With Release 5 we have made a start on including concise reference lists for all the proteases, families and clans. Many of the references are linked to Medline. This is a large job, but we aim to finish it soon.

    Release 4.0 27-Jan-2000

    Peptidase/Protease - what's in a name?

    "Protease" and "peptidase" are synonymous terms applying to all enzymes that hydrolyze peptide bonds, i.e. proteolytic enzymes. In previous releases of the MEROPS database we have used the term "peptidase" rather than "protease" to describe what it is about. This was because "peptidase" is the term recommended by IUBMB and is familiar as the basis of all the names of subgroups of these enzymes: endopeptidase, exopeptidase, aminopeptidase, etc. However, it has become clear that the majority of scientists working on these enzymes most naturally think of them as proteases, so we have decided to use that more familiar term here also.

    The new URL

    Things are changing at MEROPS! It would be unrealistic for us to expect the public funding of the database that we have enjoyed for several years to continue indefinitely, so we are asking the many commercial users of the database to pay a modest license fee. This will allow it to remain free for academic users, and will enable us to expand our resources somewhat to do justice to the exciting genomic data on proteases that are going to be with us very soon. At this stage, we would only ask you to note the new URL: "merops.co.uk/merops/merops.htm". The old URL will continue to work for a time, but it is due to be phased out in the not too distant future, so it would be smart to change the bookmark in your browser now. Stay tuned for further developments.

    Disappearance of clan MB

    During 1999, clan MB disappeared from the MEROPS classification with no explanation being provided in the database. This confused a number of people, and we are sorry about that. The explanation is that we decided that we should merge clan MB into clan MA. According to our definition of a clan, all families that we suspect of having a common evolutionary origin should be contained within a single clan. We do not feel that the presence of the "HEXXH" sequence or even the more extended motif in which it normally occurs is in itself strong enough evidence of common ancestry to justify putting all of the HEXXH families into a single clan. But as more three-dimensional structures became available, we saw that there is a characteristic arrangement of beta-strands located N-terminally to the zinc-binding site in both "gluzincin" and "metzincin" families. This can be seen in many images provided in the database, and examples would be pseudolysin, astacin and snapalysin. The existence of this common structural motif left us in no significant doubt that these families did indeed have a common origin, despite other differences including the third zinc ligands. Accordingly, the families of clan MB were moved into clan MA, and clan MB disappeared, never to be seen again.

    Species cards

    We have improved the display of peptidases grouped by source organism. There is now a document that we term a "SpecCard" for each of the over 1000 organisms from which a peptidase has been sequenced. Each SpecCard shows an abbreviated taxonomy for the organism and a list of its known peptidases. The taxonomy is derived from the Taxonomy database at NIH, but with some minor modifications of our own, particularly for the higher taxa in the bacteria and viruses. The table of peptidases gives the clan, family, MEROPS identifier, recommended name and gene symbol for each peptidase. Click on a MEROPS identifier to find the PepCard listing sequences and more.