What is ArrayExpress?

This course has now been archived. Please see the updated course, ArrayExpress in BioStudies: Quick tour.

ArrayExpress is one of the major public repositories for functional genomics datasets. Most of the data are genome-wide gene expression measurements, generated on microarray or next-generation sequencing (NGS) platforms. ArrayExpress also hosts a range of DNA assays, such as ChIP-seq and genotyping.

The main object in ArrayExpress is the experiment. An experiment usually groups several assays belonging to one study or publication. Each experiment contains metadata describing the biological specimens and experimental procedures, as well as the resulting data files (Figure 1). The definition of an assay depends on the experiment type: for microarray experiments, an assay represents one hybridisation of biological sample material to an array; for NGS experiments, an assay is the sequencing read-out of one library.
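
The relationship between an experiment and its assays can be pictured as a small data structure. The sketch below is only an illustration of the description above: the class names, field names and example values are all hypothetical and do not reflect the actual ArrayExpress schema or any ArrayExpress API.

```python
from dataclasses import dataclass, field
from enum import Enum


class Technology(Enum):
    """Hypothetical label for the platform an assay was run on."""
    MICROARRAY = "microarray"
    NGS = "ngs"


@dataclass
class Assay:
    """One assay; what it represents depends on the technology."""
    name: str
    technology: Technology

    def unit(self) -> str:
        # Microarray: one hybridisation. NGS: the read-out of one library.
        if self.technology is Technology.MICROARRAY:
            return "one hybridisation of sample material to an array"
        return "the sequencing read-out of one library"


@dataclass
class Experiment:
    """An experiment groups assays, metadata and resulting data files."""
    title: str
    sample_metadata: dict[str, str] = field(default_factory=dict)
    protocols: list[str] = field(default_factory=list)
    assays: list[Assay] = field(default_factory=list)
    data_files: list[str] = field(default_factory=list)


# Made-up example: one NGS experiment with two sequenced libraries.
experiment = Experiment(
    title="Hypothetical expression study",
    sample_metadata={"organism": "Homo sapiens"},
    protocols=["RNA extraction", "library construction"],
    assays=[
        Assay("library_1", Technology.NGS),
        Assay("library_2", Technology.NGS),
    ],
    data_files=["library_1.fastq.gz", "library_2.fastq.gz"],
)

for assay in experiment.assays:
    print(assay.name, "->", assay.unit())
```

In this sketch a microarray experiment would use the same Experiment class, with each Assay standing for one hybridisation rather than one sequenced library.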

Figure 1 Overview of data in ArrayExpress.

All data and files in ArrayExpress are provided by users, either submitted directly or imported from other databases, such as the Gene Expression Omnibus (GEO) at NCBI. Directly submitted datasets are manually curated to promote compliance with the MIAME (Minimum Information About a Microarray Experiment) and MINSEQE (Minimum Information about a high-throughput Nucleotide SEQuencing Experiment) guidelines. These minimum information standards support the sharing and reuse of scientific data.