The G2P VEP plugin identifies likely disease-causing genes based on the knowledge encoded in the G2P database and runs as part of the Variant Effect Predictor (VEP).
Ensembl VEP predicts the molecular consequence of a variant and reports further optional annotation.
If the input file contains variant data for a set of individuals, VEP generates one line of output for each pair of variant allele and overlapping transcript per individual. The G2P VEP plugin adds further annotation to each line of output based on the individual's genotypes and the knowledge contained in the G2P database. The plugin uses a set of filters to identify potentially causal variants. If it counts a sufficient number of potentially causal variants (variant hits) for a G2P gene, it reports the gene as likely disease causing, together with all variants that passed the filters. The sufficient number of variant hits is derived from the allelic requirement of the gene, which is stored in the G2P database.
By default the plugin adds the following information to the VEP output: individual information, gene symbol or HGNC ID, global allele frequency data from 1000 Genomes Phase 3 for any colocated variant, SIFT predictions and PolyPhen-2 predictions.
The plugin by default also checks for existing variants that are colocated with the given variants and will exclude those flagged as failed by Ensembl QC checks.
A variant is considered potentially causal if it passes all filtering steps.
The sufficient number of variant hits is determined by the gene's allelic requirement.
G2P supports the following allelic requirements: biallelic_autosomal, monoallelic_autosomal, mitochondrial, monoallelic_Y_hem, monoallelic_X_hem, monoallelic_X_het, monoallelic_PAR and biallelic_PAR. For compatibility with our old terminology, the plugin still supports these allelic requirements: monoallelic, biallelic, hemizygous, x-linked dominant, x-linked dominance.
Gene classification | G2P allelic requirement | Filtering rules |
---|---|---|
biallelic | | A count of at least 2 heterozygous variants or 1 homozygous variant that pass all other filtering rules. af => 0.005, rules => {HET => 2, HOM => 1} |
monoallelic | | A count of 1 heterozygous variant or 1 homozygous variant that passes all other filtering rules. af => 0.0001, rules => {HET => 1, HOM => 1} |
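The counting rule in the table can be sketched in a few lines of Python. This is only an illustration of the thresholds listed above, not the plugin's actual implementation (the plugin itself is a VEP plugin); the names `RULES` and `gene_is_reported` are hypothetical.

```python
# Thresholds copied from the filtering rules table; the structure is illustrative.
RULES = {
    "biallelic": {"af": 0.005, "HET": 2, "HOM": 1},
    "monoallelic": {"af": 0.0001, "HET": 1, "HOM": 1},
}


def gene_is_reported(requirement, het_hits, hom_hits):
    """Return True if the variant hits satisfy the gene's allelic requirement.

    het_hits and hom_hits count heterozygous and homozygous variants that
    have already passed every other filter (frequency, consequence type, ...).
    """
    rule = RULES[requirement]
    return het_hits >= rule["HET"] or hom_hits >= rule["HOM"]


print(gene_is_reported("biallelic", het_hits=1, hom_hits=0))    # False
print(gene_is_reported("biallelic", het_hits=2, hom_hits=0))    # True
print(gene_is_reported("monoallelic", het_hits=1, hom_hits=0))  # True
```

A single heterozygous hit is enough for a monoallelic gene, while a biallelic gene needs either two heterozygous hits or one homozygous hit.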
For installation and running the VEP script please refer to the VEP GitHub repository and VEP documentation pages. Plugins are installed and configured during the VEP installation. The G2P VEP plugin is located in the VEP plugins repository.
The file used to run the G2P plugin is the panel file from G2P or PanelApp. The plugin cannot be run without this file.
Options are passed to the plugin as key=value pairs.
Key | Description | Input or Default value | Output |
---|---|---|---|
file | Path to G2P data file. The file needs to be uncompressed. Download from http://www.ebi.ac.uk/gene2phenotype/downloads or from PanelApp. | The plugin cannot run without this data file. | Data from this file is used in the filtering process. The text and HTML output are also annotated with data from this file. |
af_monoallelic | maximum allele frequency for inclusion for monoallelic genes | 0.0001 | A different value can be used by ./vep -i input.vcf --plugin G2P,file='DDG2P.csv',af_monoallelic=0.00001 |
af_biallelic | maximum allele frequency for inclusion for biallelic genes | 0.005 | A different value can be used by ./vep -i input.vcf --plugin G2P,file='DDG2P.csv',af_biallelic=0.05 |
confidence_levels | Confidence levels to include: definitive, strong, limited, moderate, confirmed, probable, possible, both RD and IF. Confidence levels from our old terminology are still supported. Separate multiple values with '&'. https://www.ebi.ac.uk/gene2phenotype/terminology | Supported values: definitive, strong, moderate, confirmed, probable, limited. By default the plugin reports: definitive, strong, moderate, confirmed, probable | Confidence levels determine which genes are used in the filtering process. The G2P confidence level is reported in the HTML and text output. Some G2P entries carry the flag "Requires clinical review"; this is reported in the HTML and text output to show that the results require careful consideration. |
all_confidence_levels | Set value to 1 to include all confidence levels: definitive, strong, limited, moderate, confirmed, probable and possible | 0 | |
af_from_vcf | set value to 1 to include allele frequencies from VCF files. The location of the VCF file is configured in ensembl-variation/modules/Bio/EnsEMBL/Variation/DBSQL/vcf_config.json or ensembl-vep/Bio/EnsEMBL/Variation/DBSQL/vcf_config.json depending on how the ensembl-variation API was installed | 0 | This option can be used to filter against population frequency sets (UK10K and TOPMed) which are not in the Ensembl VEP reference data cache but for which VCF files are available. Filtering using additional VCF files takes more time than using the VEP cache only. |
af_from_vcf_keys | Select VCF collections. Separate multiple values with '&'. Should only be used if option af_from_vcf is set. | VCF collections presently supported are: uk10k (assembly GRCh37 and GRCh38), topmed (assembly GRCh37 and GRCh38). | The specified VCF collections are used in the filtering process to determine the maximum allele frequency. For example, if a variant in gnomADg_v3.1.2 has an allele frequency higher than the frequency specified for the G2P gene, it is excluded. |
variant_include_list | A list of variants to include even if they do not pass allele frequency filtering. The include list needs to be a sorted, bgzipped and tabixed VCF file. | | |
types | SO consequence types to include. Separate multiple values with '&'. | splice_donor_variant, splice_acceptor_variant, stop_gained, frameshift_variant, stop_lost, initiator_codon_variant, inframe_insertion, inframe_deletion, missense_variant, coding_sequence_variant, start_lost, transcript_ablation, transcript_amplification, protein_altering_variant | |
log_dir | The log_dir is required to store the log files, which are used for writing intermediate results. The log_dir should be empty. The log files can be consulted for any frequency filtering decisions. | current_working_dir/g2p_log_dir_[year]_[mon]_[mday]_[hour]_[min]_[sec] | log_dir contains information on genes and variants that did not pass all the filtering rules. |
txt_report | Write all G2P complete genes and attributes to txt file | current_working_dir/txt_report_[year]_[mon]_[mday]_[hour]_[min]_[sec].txt | The G2P plugin output that contains a summary report of genes passing VEP-G2P filtering |
html_report | Write all G2P complete genes and attributes to html file | current_working_dir/html_report_[year]_[mon]_[mday]_[hour]_[min]_[sec].html | The G2P plugin output that contains a summary report of genes passing VEP-G2P filtering for visualization in a web browser. |
filter_by_gene_symbol | The plugin by default filters by HGNC ID using G2P panel files. Set this option to 1 to filter by gene symbol | 0 | This is the default option using PanelApp files. |
only_mane | The plugin by default filters every transcript. Set this option to 1 to filter only MANE transcripts | 0 | Information may be lost using this option. |
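As an illustration of the key=value convention, the following Python sketch assembles the string passed to `--plugin` from the options in the table. The helper name `g2p_plugin_arg` is hypothetical and not part of VEP; it only joins keys and values with commas and separates multiple values with '&', as described above.

```python
def g2p_plugin_arg(**options):
    """Join key=value pairs into the string passed to --plugin."""
    parts = ["G2P"]
    for key, value in options.items():
        if isinstance(value, (list, tuple)):
            value = "&".join(value)  # multiple values are separated with '&'
        parts.append(f"{key}={value}")
    return ",".join(parts)


arg = g2p_plugin_arg(
    file="DDG2P.csv",
    af_biallelic=0.05,
    confidence_levels=["definitive", "strong"],
)
print(arg)
# G2P,file=DDG2P.csv,af_biallelic=0.05,confidence_levels=definitive&strong

# Argument list suitable for subprocess.run(); passing a list avoids shell
# quoting issues with the '&' separator.
command = ["./vep", "-i", "input.vcf", "--plugin", arg]
```

When the command is typed directly in a shell instead, a value containing '&' needs to be quoted so that the shell does not interpret it.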
The G2P plugin filters input variants on allele frequencies. The allele frequencies are retrieved from major genotyping projects like the 1000 Genomes project and gnomAD. The VEP provides a cache which contains allele frequencies in order to speed up the variant annotation.
To use the VCF file for filtering, the G2P plugin option af_from_vcf needs to be set to 1.
./vep -i input.vcf --plugin G2P,file='DDG2P.csv',af_from_vcf=1
reference population short name | description | source |
---|---|---|
minor_allele_freq | global allele frequency (AF) from 1000 Genomes Phase 3 data | VEP cache |
AA | Exome Sequencing Project 6500:African_American | VEP cache |
AFR | 1000GENOMES:phase_3:AFR | VEP cache |
AMR | 1000GENOMES:phase_3:AMR | VEP cache |
EA | Exome Sequencing Project 6500:European_American | VEP cache |
EAS | 1000GENOMES:phase_3:EAS | VEP cache |
EUR | 1000GENOMES:phase_3:EUR | VEP cache |
SAS | 1000GENOMES:phase_3:SAS | VEP cache |
gnomADe | Genome Aggregation Database:Total | VEP cache and VCF file |
gnomADe:afr | Genome Aggregation Database exomes r2.1:African/African American | VEP cache and VCF file |
gnomADe:amr | Genome Aggregation Database exomes r2.1:Latino | VEP cache and VCF file |
gnomADe:asj | Genome Aggregation Database exomes r2.1:Ashkenazi Jewish | VEP cache and VCF file |
gnomADe:eas | Genome Aggregation Database exomes r2.1:East Asian | VEP cache and VCF file |
gnomADe:fin | Genome Aggregation Database exomes r2.1:Finnish | VEP cache and VCF file |
gnomADe:NFE | Genome Aggregation Database exomes r2.1:Non-Finnish European | VEP cache and VCF file |
gnomADe:oth | Genome Aggregation Database exomes r2.1:Other (population not assigned) | VEP cache and VCF file |
gnomADe:SAS | Genome Aggregation Database exomes r2.1:South Asian | VEP cache and VCF file |
gnomADg:ALL | Genome Aggregation Database genomes v3:All gnomAD genomes individuals | VEP Cache and VCF file |
gnomADg:afr | Genome Aggregation Database genomes v3:African/African American | VEP Cache and VCF file |
gnomADg:ami | Genome Aggregation Database genomes v3:Amish | VEP Cache and VCF file |
gnomADg:amr | Genome Aggregation Database genomes v3:Latino/Admixed American | VEP Cache and VCF file |
gnomADg:asj | Genome Aggregation Database genomes v3:Ashkenazi Jewish | VEP Cache and VCF file |
gnomADg:eas | Genome Aggregation Database genomes v3:East Asian | VEP Cache and VCF file |
gnomADg:fin | Genome Aggregation Database genomes v3:Finnish | VEP Cache and VCF file |
gnomADg:nfe | Genome Aggregation Database genomes v3:Non-Finnish European | VEP Cache and VCF file |
gnomADg:sas | Genome Aggregation Database genomes v3:South Asian | VEP Cache and VCF file |
gnomADg:oth | Genome Aggregation Database genomes v3:Other (population not assigned) | VEP Cache and VCF file |
TOPMed | Trans-Omics for Precision Medicine (TOPMed) Program | VCF file |
ALSPAC | UK10K:ALSPAC cohort | VCF file |
TWINSUK | UK10K:TWINSUK cohort | VCF file |
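The frequency filter can be pictured as a comparison of every available population frequency against the gene's maximum allele frequency. The Python sketch below is a simplified illustration under that assumption, not the plugin's code; `passes_frequency_filter` is a hypothetical name, the frequencies are made up, and treating a variant with no frequency data as passing is also an assumption.

```python
def passes_frequency_filter(population_afs, threshold):
    """Keep a variant only if no reference population exceeds the threshold.

    population_afs maps a population short name (for example "AFR" or
    "gnomADe:afr") to an allele frequency; populations without data for
    the variant are simply absent from the mapping.
    """
    return all(af <= threshold for af in population_afs.values())


variant_afs = {"AFR": 0.00005, "gnomADe:afr": 0.002}  # made-up frequencies

print(passes_frequency_filter(variant_afs, threshold=0.0001))  # False
print(passes_frequency_filter(variant_afs, threshold=0.005))   # True
```

With these example values the variant is excluded for a monoallelic gene (default threshold 0.0001) but kept for a biallelic gene (default threshold 0.005).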
The G2P VEP plugin accepts PanelApp data files as input. We use the following mappings to translate between the terminologies used by G2P and PanelApp.
G2P | PanelApp |
---|---|
G2P confidence | Gene Ratings |
Definitive | Green |
Strong | Amber |
Moderate | Amber |
Limited | Red |
Allelic requirement | Model of inheritance from PanelApp |
MONOALLELIC, autosomal or pseudoautosomal, not imprinted | |
MONOALLELIC, autosomal or pseudoautosomal, maternally imprinted (paternal allele expressed) | |
MONOALLELIC, autosomal or pseudoautosomal, paternally imprinted (maternal allele expressed) | |
MONOALLELIC, autosomal or pseudoautosomal, imprinted status unknown | |
BOTH monoallelic and biallelic, autosomal or pseudoautosomal | |
BOTH monoallelic and biallelic (but BIALLELIC mutations cause a more SEVERE disease form), autosomal or pseudoautosomal | |
BIALLELIC, autosomal or pseudoautosomal | |
BOTH monoallelic and biallelic, autosomal or pseudoautosomal | |
BOTH monoallelic and biallelic (but BIALLELIC mutations cause a more SEVERE disease form), autosomal or pseudoautosomal | |
X-LINKED: hemizygous mutation in males, biallelic mutations in females | |
X-LINKED: hemizygous mutation in males, monoallelic mutations in females may cause disease (may be less severe, later onset than males) | |
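The confidence part of the mapping can be written as a simple lookup. The Python sketch below only transcribes the Gene Ratings rows of the table above; the names `PANELAPP_TO_G2P_CONFIDENCE` and `g2p_confidence_for` are hypothetical, and how the plugin applies the mapping internally is not shown here.

```python
# PanelApp gene rating -> G2P confidence level(s), transcribed from the table.
PANELAPP_TO_G2P_CONFIDENCE = {
    "Green": ["definitive"],
    "Amber": ["strong", "moderate"],
    "Red": ["limited"],
}


def g2p_confidence_for(rating):
    """Return the G2P confidence levels that correspond to a PanelApp rating."""
    return PANELAPP_TO_G2P_CONFIDENCE[rating]


print(g2p_confidence_for("Amber"))  # ['strong', 'moderate']
```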