Germline Variant Filtration
Implementation of the variant_filtration step.
This step takes annotated variants from the variant_annotation step as input and performs various filtration and postprocessing operations:
- filter to high-confidence variants
    - apply quality filter sets
    - filter for consistency between different callers
- filter to compatible mode of inheritance
- filter by population/cohort frequency, remove polymorphisms
- filter by region
- filter by scores (e.g., conservation)
- filter for compound heterozygous (het. comp.) inheritance or keep all
#
# 1
# stringent
# loose
# 2 # $qual.denovo # $qual.dom # $qual.rec_hom
# 3 # $qual.denovo.denov_freq # $qual.dom.dom_freq # $qual.dom.rec_freq # $qual.rec_hom.rec_freq
# 4 # $qual.denovo.denov_freq.$region # $qual.dom.dom_freq.$region # $qual.dom.rec_freq.$region # $qual.rec_hom.rec_freq.$region
# 5 # $qual.denovo.denov_freq.$region.$scores # $qual.dom.dom_freq.$region.$scores # $qual.dom.rec_freq.$region.$scores # $qual.rec_hom.rec_freq.$region.$scores
# 6 # $qual.denovo.denov_freq.$region.keep_all # $qual.dom.dom_freq.$region.keep_all # $qual.dom.rec_freq.$region.$scores.same_gene # $qual.dom.rec_freq.$region.$scores.same_tad # $qual.dom.rec_freq.$region.$scores.itv_500bp # $qual.rec_hom.rec_freq.$region.keep_all
Filtration Steps
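The combinations of filters to apply are given in the configuration setting filter_combinations as dot-separated values, e.g., AA.BB.CC. Each value names one choice per filter stage, in the order {thresholds}.{inherit}.{freq}.{region}.{score}.{het_comp}.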
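To illustrate the dot-separated format, here is a minimal sketch that splits one filter_combinations entry into its six components. The field names follow the comment on the filter_combinations setting in the default configuration; the FilterCombination type and the parse_filter_combination helper are hypothetical names introduced for this example and are not part of the pipeline.

```python
from typing import NamedTuple


class FilterCombination(NamedTuple):
    """One filter_combinations entry, split into its components (hypothetical type)."""

    thresholds: str
    inherit: str
    freq: str
    region: str
    score: str
    het_comp: str


def parse_filter_combination(value: str) -> FilterCombination:
    """Split a dot-separated filter combination into named components."""
    parts = value.split(".")
    if len(parts) != len(FilterCombination._fields):
        raise ValueError(
            f"expected {len(FilterCombination._fields)} dot-separated values, "
            f"got {len(parts)}: {value!r}"
        )
    return FilterCombination(*parts)


combo = parse_filter_combination(
    "conservative.dominant.recessive_freq.limb_tads.coding.gene"
)
print(combo.region, combo.het_comp)
```

Using a named tuple keeps the order of the components explicit, so a combination with a missing or extra component is rejected instead of being silently misread.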
Step Input
TODO
Step Output
TODO
Global Configuration
TODO
Default Configuration
The default configuration is as follows.
step_config:
  variant_filtration:
    path_variant_annotation: ../variant_annotation
    tools_ngs_mapping: null  # defaults to ngs_mapping tool
    tools_variant_calling: null  # defaults to variant_annotation tool
    thresholds:  # quality filter sets, "keep_all" implicitly defined
      conservative:
        min_gq: 40
        min_dp_het: 10
        min_dp_hom: 5
        include_expressions:
          - 'MEDGEN_COHORT_INCONSISTENT_AC=0'
      relaxed:
        min_gq: 20
        min_dp_het: 6
        min_dp_hom: 3
        include_expressions:
          - 'MEDGEN_COHORT_INCONSISTENT_AC=0'
    frequencies:  # values to use for frequency filtration
      af_dominant: 0.001  # AF (allele frequency) values
      af_recessive: 0.01
      ac_dominant: 3  # AC (allele count in gnomAD) values
    region_beds:  # regions to filter to, "whole_genome" implicitly defined
      all_tads: /fast/projects/medgen_genomes/static_data/GRCh37/hESC_hg19_allTads.bed
      all_genes: /fast/projects/medgen_genomes/static_data/GRCh37/gene_bed/ENSEMBL_v75.bed.gz
      limb_tads: /fast/projects/medgen_genomes/static_data/GRCh37/newlimb_tads.bed
      lifted_enhancers: /fast/projects/medgen_genomes/static_data/GRCh37/all_but_onlyMB.bed
      vista_enhancers: /fast/projects/medgen_genomes/static_data/GRCh37/vista_limb_enhancers.bed
    score_thresholds:  # thresholds on scores to filter to, "all_scores" implicitly defined
      coding:
        require_coding: true
        require_gerpp_gt2: false
        min_cadd: null
      conservative:  # unused; TODO: rename?
        require_coding: false
        require_gerpp_gt2: false
        min_cadd: 0
      conserved:  # TODO: rename?
        require_coding: false
        require_gerpp_gt2: true
        min_cadd: null
    filter_combinations:  # dot-separated {thresholds}.{inherit}.{freq}.{region}.{score}.{het_comp}
      - conservative.de_novo.dominant_freq.lifted_enhancers.all_scores.passthrough
      - conservative.de_novo.dominant_freq.lifted_enhancers.conserved.passthrough
      - conservative.de_novo.dominant_freq.limb_tads.all_scores.passthrough
      - conservative.de_novo.dominant_freq.limb_tads.coding.passthrough
      - conservative.de_novo.dominant_freq.limb_tads.conserved.passthrough
      - conservative.de_novo.dominant_freq.vista_enhancers.all_scores.passthrough
      - conservative.de_novo.dominant_freq.vista_enhancers.conserved.passthrough
      - conservative.de_novo.dominant_freq.whole_genome.all_scores.passthrough
      - conservative.de_novo.dominant_freq.whole_genome.coding.passthrough
      - conservative.de_novo.dominant_freq.whole_genome.conserved.passthrough
      - conservative.dominant.dominant_freq.lifted_enhancers.all_scores.passthrough
      - conservative.dominant.dominant_freq.lifted_enhancers.conserved.passthrough
      - conservative.dominant.dominant_freq.limb_tads.all_scores.passthrough
      - conservative.dominant.dominant_freq.limb_tads.coding.passthrough
      - conservative.dominant.dominant_freq.limb_tads.conserved.passthrough
      - conservative.dominant.dominant_freq.vista_enhancers.all_scores.passthrough
      - conservative.dominant.dominant_freq.vista_enhancers.conserved.passthrough
      - conservative.dominant.dominant_freq.whole_genome.all_scores.passthrough
      - conservative.dominant.dominant_freq.whole_genome.coding.passthrough
      - conservative.dominant.dominant_freq.whole_genome.conserved.passthrough
      - conservative.dominant.recessive_freq.lifted_enhancers.all_scores.intervals500
      - conservative.dominant.recessive_freq.lifted_enhancers.conserved.intervals500
      - conservative.dominant.recessive_freq.lifted_enhancers.conserved.tads
      - conservative.dominant.recessive_freq.limb_tads.all_scores.intervals500
      - conservative.dominant.recessive_freq.limb_tads.coding.gene
      - conservative.dominant.recessive_freq.limb_tads.conserved.intervals500
      - conservative.dominant.recessive_freq.limb_tads.conserved.tads
      - conservative.dominant.recessive_freq.vista_enhancers.all_scores.intervals500
      - conservative.dominant.recessive_freq.vista_enhancers.conserved.intervals500
      - conservative.dominant.recessive_freq.vista_enhancers.conserved.tads
      - conservative.dominant.recessive_freq.whole_genome.all_scores.intervals500
      - conservative.dominant.recessive_freq.whole_genome.coding.gene
      - conservative.dominant.recessive_freq.whole_genome.conserved.intervals500
      - conservative.dominant.recessive_freq.whole_genome.conserved.tads
      - conservative.recessive_hom.recessive_freq.lifted_enhancers.all_scores.passthrough
      - conservative.recessive_hom.recessive_freq.lifted_enhancers.conserved.passthrough
      - conservative.recessive_hom.recessive_freq.limb_tads.all_scores.passthrough
      - conservative.recessive_hom.recessive_freq.limb_tads.coding.passthrough
      - conservative.recessive_hom.recessive_freq.limb_tads.conserved.passthrough
      - conservative.recessive_hom.recessive_freq.vista_enhancers.all_scores.passthrough
      - conservative.recessive_hom.recessive_freq.vista_enhancers.conserved.passthrough
      - conservative.recessive_hom.recessive_freq.whole_genome.all_scores.passthrough
      - conservative.recessive_hom.recessive_freq.whole_genome.coding.passthrough
      - conservative.recessive_hom.recessive_freq.whole_genome.conserved.passthrough
      # The following are for input to variant_combination.
      - conservative.dominant.recessive_freq.whole_genome.coding.passthrough
      - conservative.dominant.recessive_freq.whole_genome.conserved.passthrough
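Because each filter combination refers by name to entries in thresholds, region_beds, and score_thresholds, a typo in a combination is easy to miss. The following is a minimal sketch of a consistency check, not the pipeline's own validation: the undefined_names function and the example_config dictionary are hypothetical, and the implicitly defined names (keep_all, whole_genome, all_scores) are taken from the comments in the configuration above. The inherit, freq, and het_comp components are not checked here because the configuration does not list their allowed values.

```python
def undefined_names(config):
    """Return (combination, field, name) triples for names missing from config.

    config is assumed to be the variant_filtration section as a plain dict.
    """
    # Names usable per field: those defined in the config plus the implicit ones.
    allowed = {
        "thresholds": set(config["thresholds"]) | {"keep_all"},
        "region": set(config["region_beds"]) | {"whole_genome"},
        "score": set(config["score_thresholds"]) | {"all_scores"},
    }
    fields = ("thresholds", "inherit", "freq", "region", "score", "het_comp")
    problems = []
    for combination in config["filter_combinations"]:
        named = dict(zip(fields, combination.split(".")))
        for field, names in allowed.items():
            if named.get(field) not in names:
                problems.append((combination, field, named.get(field)))
    return problems


# Hypothetical, shortened configuration used only for this illustration.
example_config = {
    "thresholds": {"conservative": {}, "relaxed": {}},
    "region_beds": {"limb_tads": "limb.bed"},
    "score_thresholds": {"coding": {}, "conserved": {}},
    "filter_combinations": [
        "conservative.de_novo.dominant_freq.whole_genome.all_scores.passthrough",
        "conservative.dominant.recessive_freq.limb_tads.coding.gene",
        "strict.dominant.dominant_freq.all_tads.conserved.passthrough",
    ],
}

for problem in undefined_names(example_config):
    print(problem)
```

In this example only the third combination is reported, once for the undefined threshold set strict and once for the region all_tads, which is absent from the shortened example configuration.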
Reports
Currently, no reports are generated.