What are genome-wide association studies (GWAS)?

Genome-wide association studies (GWAS) are hypothesis-free methods for identifying associations between genetic regions (loci) and traits (including diseases).

It has long been known that genetic variation between individuals can cause differences in phenotypes. Causal variants, and variants tightly linked to them in the same region of the chromosome, are therefore present at a higher frequency in cases (individuals with the trait) than in controls (individuals without the trait) (Figure 2).

Figure 2 Diagram to show typical allele distribution which GWAS seek to identify.

A typical GWAS uses genome-wide SNP arrays to genotype common variants across the genome in a number of individuals, both with and without a shared trait (e.g. a disease). Variants associated with the disease, or within the same haplotype as a variant associated with the disease, will be found at a higher frequency in cases than in controls. Statistical analysis is then carried out on each variant to test whether its frequency differs between cases and controls by more than would be expected by chance.
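As a rough illustration of the statistical analysis described above, the sketch below runs a Pearson chi-square test (1 degree of freedom) on a 2x2 table of allele counts for a single variant. The function name, the allele counts, and the choice of an allelic chi-square test are assumptions made for illustration; real GWAS pipelines often use regression models with covariates instead.

```python
import math

def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square test (1 df) on a 2x2 table of allele counts.

    Assumes every row and column total is positive, so that all
    expected counts are non-zero.
    """
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    total = case_alt + case_ref + control_alt + control_ref
    row_sums = [case_alt + case_ref, control_alt + control_ref]
    col_sums = [case_alt + control_alt, case_ref + control_ref]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected if allele frequency were equal in both groups
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical data: 1000 cases and 1000 controls, two alleles per person
chi2, p = allelic_chi_square(case_alt=600, case_ref=1400,
                             control_alt=500, control_ref=1500)
print(f"chi2 = {chi2:.3f}, p = {p:.2e}")
```

In this made-up example the alternative allele has a frequency of 0.30 in cases and 0.25 in controls. In a real study the same test would be repeated for every variant on the array.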

As GWAS analyse common variants, usually typed on commercial SNP arrays (Figure 3), they do not generally identify causal variants. Instead, GWAS identify common variants that tag a region of linkage disequilibrium (LD) containing the causal variant(s). Additional or follow-on studies are usually required to narrow the region of association and identify the causal variant. Find out more about the theory and background of genetic variation here.

Figure 3 Diagram to show the identification of alternative variants in cases and controls using an array-based typing method. Results are subject to statistical analyses to assign a p-value to each variant.

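A p-value indicates the significance of the difference in frequency of the tested allele between cases and controls, i.e. the probability of observing a difference at least this large if the allele were not associated with the trait. The smaller the p-value, the stronger the evidence for an association. GWAS results are often displayed in a Manhattan plot (Figure 3), with -log10(p-value) plotted against position in the genome, so that the most strongly associated variants appear as the highest points.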
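The transformation used for a Manhattan plot can be sketched in a few lines. The example below converts p-values to -log10(p-value) and flags variants passing a significance threshold. The variant list is invented, the function name is hypothetical, and the threshold of 5e-8 is a widely used convention for genome-wide significance rather than something stated in this text.

```python
import math

# Commonly used genome-wide significance threshold (an assumption here)
GENOME_WIDE_THRESHOLD = 5e-8

def manhattan_points(results):
    """Map (chromosome, position, p_value) to (chromosome, position, -log10 p)."""
    return [(chrom, pos, -math.log10(p)) for chrom, pos, p in results]

# Hypothetical per-variant results: chromosome, base-pair position, p-value
results = [
    ("1", 1_000_000, 0.2),
    ("6", 32_000_000, 1e-12),
    ("X", 5_000_000, 3e-5),
]

points = manhattan_points(results)

# Smaller p-values give taller points, so significant variants lie above the line
cutoff = -math.log10(GENOME_WIDE_THRESHOLD)
significant = [(chrom, pos) for chrom, pos, score in points if score > cutoff]

for chrom, pos, score in points:
    print(f"chr{chrom}:{pos}\t-log10(p) = {score:.2f}")
```

Plotting the resulting values on the y-axis against genomic position on the x-axis, chromosome by chromosome, gives the characteristic Manhattan plot.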

The GWAS Catalog is a structured repository which provides summary data from all published human GWAS in a consistent, searchable format.

You can learn about the theory behind GWAS in more detail in a resource by Gil McVean, and in a book chapter written by Bush and Moore.