Germline Variant Filtration

Implementation of the variant_filtration step

This step takes the annotated variants produced by the variant_annotation step as input and applies the following filtration and postprocessing operations:

  1. filter to high-confidence variants
    1. apply quality filter sets

    2. filter for consistency between different callers

  2. filter to compatible mode of inheritance

  3. filter by population/cohort frequency, remove polymorphisms

  4. filter by region

  5. filter by scores (e.g., conservation)

  6. filter for compound heterozygous inheritance or keep all

# 1
#     stringent
#     loose
#
# 2
#     $qual.denovo
#     $qual.dom
#     $qual.rec_hom
#
# 3
#     $qual.denovo.denov_freq
#     $qual.dom.dom_freq
#     $qual.dom.rec_freq
#     $qual.rec_hom.rec_freq
#
# 4
#     $qual.denovo.denov_freq.$region
#     $qual.dom.dom_freq.$region
#     $qual.dom.rec_freq.$region
#     $qual.rec_hom.rec_freq.$region
#
# 5
#     $qual.denovo.denov_freq.$region.$scores
#     $qual.dom.dom_freq.$region.$scores
#     $qual.dom.rec_freq.$region.$scores
#     $qual.rec_hom.rec_freq.$region.$scores
#
# 6
#     $qual.denovo.denov_freq.$region.keep_all
#     $qual.dom.dom_freq.$region.keep_all
#     $qual.dom.rec_freq.$region.$scores.same_gene
#     $qual.dom.rec_freq.$region.$scores.same_tad
#     $qual.dom.rec_freq.$region.$scores.itv_500bp
#     $qual.rec_hom.rec_freq.$region.keep_all
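The naming scheme above suggests that each step extends the name of the previous step's result by one dot-separated suffix. The following is a minimal sketch of that idea, assuming exactly one suffix per step; the function name cumulative_names is hypothetical and not part of the pipeline, and the example string is taken from the filter_combinations examples in the default configuration.

```python
def cumulative_names(combination: str) -> list[str]:
    """Return the dot-joined prefix of `combination` after each step.

    Assumption: every filtration step appends exactly one suffix.
    """
    parts = combination.split(".")
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


names = cumulative_names(
    "conservative.dominant.recessive_freq.whole_genome.coding.gene"
)
for step, name in enumerate(names, start=1):
    # One line per step: step number, then the name built up so far.
    print(step, name)
```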

Filtration Steps

The combinations of filters are given in the configuration setting filter_combinations as dot-separated values, e.g., AA.BB.CC.
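As an illustration, a filter_combinations entry can be split into named parts. The sketch below assumes the six-field order documented in the default configuration, {thresholds}.{inherit}.{freq}.{region}.{score}.{het_comp}, and treats AA.BB.CC as a schematic placeholder. FilterCombination and parse_filter_combination are hypothetical names used only for this example.

```python
from typing import NamedTuple


class FilterCombination(NamedTuple):
    """One filter_combinations entry, split into its six parts."""

    thresholds: str
    inherit: str
    freq: str
    region: str
    score: str
    het_comp: str


def parse_filter_combination(value: str) -> FilterCombination:
    """Split a dot-separated entry and check that it has six non-empty parts."""
    parts = value.split(".")
    expected = len(FilterCombination._fields)
    if len(parts) != expected or not all(parts):
        raise ValueError(
            f"expected {expected} non-empty dot-separated parts, got {value!r}"
        )
    return FilterCombination(*parts)


combo = parse_filter_combination(
    "conservative.de_novo.dominant_freq.whole_genome.coding.passthrough"
)
print(combo.region, combo.het_comp)
```

Rejecting entries with the wrong number of parts up front gives a clearer error than a lookup failure later in the workflow.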

Step Input

TODO

Step Output

TODO

Global Configuration

TODO

Default Configuration

The default configuration is as follows.

step_config:
  variant_filtration:
    #path_variant_annotation: ../variant_annotation
    #
    # defaults to ngs_mapping tool
    #tools_ngs_mapping: []
    #
    # defaults to variant_annotation tool
    #tools_variant_calling: []
    #
    # quality filter sets, "keep_all" implicitly defined
    #thresholds:
    #  conservative:
    #    min_gq: 40
    #    min_dp_het: 10
    #    min_dp_hom: 5
    #    include_expressions:
    #      - "'MEDGEN_COHORT_INCONSISTENT_AC=0'"
    #  relaxed:
    #    min_gq: 20
    #    min_dp_het: 6
    #    min_dp_hom: 3
    #    include_expressions:
    #      - "'MEDGEN_COHORT_INCONSISTENT_AC=0'"
    #frequencies:
    #
    #  # AF (allele frequency) values
    #  af_dominant: 0.001
    #
    #  # AF (allele frequency) values
    #  af_recessive: 0.01
    #
    #  # AC (allele count in gnomAD) values
    #  ac_dominant: 3
    #
    # regions to filter to, "whole_genome" implicitly defined
    #region_beds: {}                                   # Examples: {'all_tads': '/fast/projects/medgen_genomes/static_data/GRCh37/hESC_hg19_allTads.bed', 'all_genes': '/fast/projects/medgen_genomes/static_data/GRCh37/gene_bed/ENSEMBL_v75.bed.gz', 'limb_tads': '/fast/projects/medgen_genomes/static_data/GRCh37/newlimb_tads.bed', 'lifted_enhancers': '/fast/projects/medgen_genomes/static_data/GRCh37/all_but_onlyMB.bed', 'vista_enhancers': '/fast/projects/medgen_genomes/static_data/GRCh37/vista_limb_enhancers.bed'}
    #score_thresholds:
    #  coding:
    #    require_coding: true
    #    require_gerpp_gt2: false
    #    min_cadd:
    #  conservative:
    #    require_coding: false
    #    require_gerpp_gt2: false
    #    min_cadd: 0
    #  conserved:
    #    require_coding: false
    #    require_gerpp_gt2: true
    #    min_cadd:
    #
    # dot-separated {thresholds}.{inherit}.{freq}.{region}.{score}.{het_comp}
    #filter_combinations: []                           # Examples: conservative.de_novo.dominant_freq.lifted_enhancers.all_scores.passthrough, conservative.de_novo.dominant_freq.lifted_enhancers.conserved.passthrough, conservative.de_novo.dominant_freq.limb_tads.all_scores.passthrough, conservative.de_novo.dominant_freq.limb_tads.coding.passthrough, conservative.de_novo.dominant_freq.limb_tads.conserved.passthrough, conservative.de_novo.dominant_freq.vista_enhancers.all_scores.passthrough, conservative.de_novo.dominant_freq.vista_enhancers.conserved.passthrough, conservative.de_novo.dominant_freq.whole_genome.all_scores.passthrough, conservative.de_novo.dominant_freq.whole_genome.coding.passthrough, conservative.de_novo.dominant_freq.whole_genome.conserved.passthrough, conservative.dominant.dominant_freq.lifted_enhancers.all_scores.passthrough, conservative.dominant.dominant_freq.lifted_enhancers.conserved.passthrough, conservative.dominant.dominant_freq.limb_tads.all_scores.passthrough, conservative.dominant.dominant_freq.limb_tads.coding.passthrough, conservative.dominant.dominant_freq.limb_tads.conserved.passthrough, conservative.dominant.dominant_freq.vista_enhancers.all_scores.passthrough, conservative.dominant.dominant_freq.vista_enhancers.conserved.passthrough, conservative.dominant.dominant_freq.whole_genome.all_scores.passthrough, conservative.dominant.dominant_freq.whole_genome.coding.passthrough, conservative.dominant.dominant_freq.whole_genome.conserved.passthrough, conservative.dominant.recessive_freq.lifted_enhancers.all_scores.intervals500, conservative.dominant.recessive_freq.lifted_enhancers.conserved.intervals500, conservative.dominant.recessive_freq.lifted_enhancers.conserved.tads, conservative.dominant.recessive_freq.limb_tads.all_scores.intervals500, conservative.dominant.recessive_freq.limb_tads.coding.gene, conservative.dominant.recessive_freq.limb_tads.conserved.intervals500, conservative.dominant.recessive_freq.limb_tads.conserved.tads, 
    # conservative.dominant.recessive_freq.vista_enhancers.all_scores.intervals500, conservative.dominant.recessive_freq.vista_enhancers.conserved.intervals500, conservative.dominant.recessive_freq.vista_enhancers.conserved.tads, conservative.dominant.recessive_freq.whole_genome.all_scores.intervals500, conservative.dominant.recessive_freq.whole_genome.coding.gene, conservative.dominant.recessive_freq.whole_genome.conserved.intervals500, conservative.dominant.recessive_freq.whole_genome.conserved.tads, conservative.recessive_hom.recessive_freq.lifted_enhancers.all_scores.passthrough, conservative.recessive_hom.recessive_freq.lifted_enhancers.conserved.passthrough, conservative.recessive_hom.recessive_freq.limb_tads.all_scores.passthrough, conservative.recessive_hom.recessive_freq.limb_tads.coding.passthrough, conservative.recessive_hom.recessive_freq.limb_tads.conserved.passthrough, conservative.recessive_hom.recessive_freq.vista_enhancers.all_scores.passthrough, conservative.recessive_hom.recessive_freq.vista_enhancers.conserved.passthrough, conservative.recessive_hom.recessive_freq.whole_genome.all_scores.passthrough, conservative.recessive_hom.recessive_freq.whole_genome.coding.passthrough, conservative.recessive_hom.recessive_freq.whole_genome.conserved.passthrough, conservative.dominant.recessive_freq.whole_genome.coding.passthrough, conservative.dominant.recessive_freq.whole_genome.conserved.passthrough
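
To make the quality filter sets more concrete, the following sketch shows one way the thresholds could be applied to a single genotype call. It assumes that min_gq is a lower bound on the genotype quality and that min_dp_het and min_dp_hom are lower bounds on the read depth for heterozygous and homozygous calls, respectively; the step itself may evaluate them differently. QualityThresholds and passes_quality are hypothetical names, and the include_expressions setting is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityThresholds:
    min_gq: int
    min_dp_het: int
    min_dp_hom: int


# Values copied from the commented-out defaults in the configuration above.
THRESHOLDS = {
    "conservative": QualityThresholds(min_gq=40, min_dp_het=10, min_dp_hom=5),
    "relaxed": QualityThresholds(min_gq=20, min_dp_het=6, min_dp_hom=3),
}


def passes_quality(gt: str, gq: int, dp: int, thresholds: QualityThresholds) -> bool:
    """Check one genotype call (GT, GQ, DP) against a threshold set.

    Assumption: het calls use min_dp_het, hom calls use min_dp_hom.
    """
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        # Missing genotype: treated as failing in this sketch.
        return False
    is_het = len(set(alleles)) > 1
    min_dp = thresholds.min_dp_het if is_het else thresholds.min_dp_hom
    return gq >= thresholds.min_gq and dp >= min_dp


# A het call with GQ 30 and depth 8 passes "relaxed" but not "conservative".
print(passes_quality("0/1", 30, 8, THRESHOLDS["relaxed"]))
print(passes_quality("0/1", 30, 8, THRESHOLDS["conservative"]))
```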

Reports

Currently, no reports are generated.