Germline Variant Filtration

Implementation of the variant_filtration step

This step takes the annotated variants from the variant_annotation step as input and performs the following filtration and postprocessing operations:

  1. filter to high-confidence variants
    1. apply quality filter sets

    2. filter for consistency between different callers

  2. filter to compatible mode of inheritance

  3. filter by population/cohort frequency, remove polymorphisms

  4. filter by region

  5. filter by scores (e.g., conservation)

  6. filter for compound heterozygous (het. comp.) inheritance, or keep all
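Step 1.1 (applying a quality filter set) can be sketched as follows. This is a minimal illustration, not the step's actual implementation: the class QualityThresholds and the function passes_quality are hypothetical names, and the reading of min_gq, min_dp_het, and min_dp_hom (minimum genotype quality, and minimum depth for heterozygous and homozygous calls, respectively) is an assumption based on the key names. The numeric values are those of the "conservative" and "relaxed" sets in the default configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityThresholds:
    """Hypothetical container mirroring one entry under `thresholds`."""

    min_gq: int
    min_dp_het: int
    min_dp_hom: int


def passes_quality(gt: str, gq: int, dp: int, thresholds: QualityThresholds) -> bool:
    """Return True if a single genotype call passes the quality filter set.

    Assumption: heterozygous calls are checked against min_dp_het and
    homozygous calls against min_dp_hom.
    """
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return False  # assumption: no-calls never pass
    if gq < thresholds.min_gq:
        return False
    is_het = len(set(alleles)) > 1
    min_dp = thresholds.min_dp_het if is_het else thresholds.min_dp_hom
    return dp >= min_dp


# Values copied from the default configuration.
conservative = QualityThresholds(min_gq=40, min_dp_het=10, min_dp_hom=5)
relaxed = QualityThresholds(min_gq=20, min_dp_het=6, min_dp_hom=3)
```

With these values, a heterozygous call with GQ 45 and depth 8 would fail the "conservative" set but pass the "relaxed" one.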

Result names after each step (placeholders such as $qual, $region, and $scores stand for the chosen filter names):

  1. stringent, loose

  2. $qual.denovo, $qual.dom, $qual.rec_hom

  3. $qual.denovo.denov_freq, $qual.dom.dom_freq, $qual.dom.rec_freq, $qual.rec_hom.rec_freq

  4. $qual.denovo.denov_freq.$region, $qual.dom.dom_freq.$region, $qual.dom.rec_freq.$region, $qual.rec_hom.rec_freq.$region

  5. $qual.denovo.denov_freq.$region.$scores, $qual.dom.dom_freq.$region.$scores, $qual.dom.rec_freq.$region.$scores, $qual.rec_hom.rec_freq.$region.$scores

  6. $qual.denovo.denov_freq.$region.keep_all, $qual.dom.dom_freq.$region.keep_all, $qual.dom.rec_freq.$region.$scores.same_gene, $qual.dom.rec_freq.$region.$scores.same_tad, $qual.dom.rec_freq.$region.$scores.itv_500bp, $qual.rec_hom.rec_freq.$region.keep_all

Filtration Steps

The combinations of filters to apply are given in the configuration setting filter_combinations. Each combination is a string of dot-separated filter names, e.g., AA.BB.CC, with one name per filtration step.
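As an illustration, the following sketch splits one such value into its components. It assumes the six-component layout {thresholds}.{inherit}.{freq}.{region}.{score}.{het_comp} noted in the default configuration (the AA.BB.CC example above is schematic and would be rejected here). FilterCombination, parse_combination, and cumulative_names are hypothetical names, not part of the pipeline's API.

```python
from typing import List, NamedTuple


class FilterCombination(NamedTuple):
    """One entry of filter_combinations, split into its six components."""

    thresholds: str
    inherit: str
    freq: str
    region: str
    score: str
    het_comp: str


def parse_combination(value: str) -> FilterCombination:
    """Split a dot-separated filter combination into named components."""
    parts = value.split(".")
    expected = len(FilterCombination._fields)
    if len(parts) != expected:
        raise ValueError(
            f"expected {expected} dot-separated components, got {len(parts)}: {value!r}"
        )
    return FilterCombination(*parts)


def cumulative_names(combo: FilterCombination) -> List[str]:
    """Return the dot-joined name after each successive filtration step."""
    parts = list(combo)
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


combo = parse_combination(
    "conservative.de_novo.dominant_freq.whole_genome.coding.passthrough"
)
print(combo.region)                # whole_genome
print(cumulative_names(combo)[1])  # conservative.de_novo
```

The cumulative names show how each step appends one component to the name produced by the previous step.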

Step Input

TODO

Step Output

TODO

Global Configuration

TODO

Default Configuration

The default configuration is as follows.

step_config:
  variant_filtration:
    path_variant_annotation: ../variant_annotation
    tools_ngs_mapping: null      # defaults to ngs_mapping tool
    tools_variant_calling: null  # defaults to variant_annotation tool
    thresholds:                  # quality filter sets, "keep_all" implicitly defined
      conservative:
        min_gq: 40
        min_dp_het: 10
        min_dp_hom: 5
        include_expressions:
        - 'MEDGEN_COHORT_INCONSISTENT_AC=0'
      relaxed:
        min_gq: 20
        min_dp_het: 6
        min_dp_hom: 3
        include_expressions:
        - 'MEDGEN_COHORT_INCONSISTENT_AC=0'
    frequencies:             # values to use for frequency filtration
      af_dominant: 0.001     # AF (allele frequency) values
      af_recessive: 0.01
      ac_dominant: 3         # AC (allele count in gnomAD) values
    region_beds:             # regions to filter to, "whole_genome" implicitly defined
      all_tads: /fast/projects/medgen_genomes/static_data/GRCh37/hESC_hg19_allTads.bed
      all_genes: /fast/projects/medgen_genomes/static_data/GRCh37/gene_bed/ENSEMBL_v75.bed.gz
      limb_tads: /fast/projects/medgen_genomes/static_data/GRCh37/newlimb_tads.bed
      lifted_enhancers: /fast/projects/medgen_genomes/static_data/GRCh37/all_but_onlyMB.bed
      vista_enhancers: /fast/projects/medgen_genomes/static_data/GRCh37/vista_limb_enhancers.bed
    score_thresholds:        # thresholds on scores to filter to, "all_scores" implicitly defined
      coding:
        require_coding: true
        require_gerpp_gt2: false
        min_cadd: null
      conservative:  # unused; TODO: rename?
        require_coding: false
        require_gerpp_gt2: false
        min_cadd: 0
      conserved:  # TODO: rename?
        require_coding: false
        require_gerpp_gt2: true
        min_cadd: null
    filter_combinations: # dot-separated {thresholds}.{inherit}.{freq}.{region}.{score}.{het_comp}
    - conservative.de_novo.dominant_freq.lifted_enhancers.all_scores.passthrough
    - conservative.de_novo.dominant_freq.lifted_enhancers.conserved.passthrough
    - conservative.de_novo.dominant_freq.limb_tads.all_scores.passthrough
    - conservative.de_novo.dominant_freq.limb_tads.coding.passthrough
    - conservative.de_novo.dominant_freq.limb_tads.conserved.passthrough
    - conservative.de_novo.dominant_freq.vista_enhancers.all_scores.passthrough
    - conservative.de_novo.dominant_freq.vista_enhancers.conserved.passthrough
    - conservative.de_novo.dominant_freq.whole_genome.all_scores.passthrough
    - conservative.de_novo.dominant_freq.whole_genome.coding.passthrough
    - conservative.de_novo.dominant_freq.whole_genome.conserved.passthrough
    - conservative.dominant.dominant_freq.lifted_enhancers.all_scores.passthrough
    - conservative.dominant.dominant_freq.lifted_enhancers.conserved.passthrough
    - conservative.dominant.dominant_freq.limb_tads.all_scores.passthrough
    - conservative.dominant.dominant_freq.limb_tads.coding.passthrough
    - conservative.dominant.dominant_freq.limb_tads.conserved.passthrough
    - conservative.dominant.dominant_freq.vista_enhancers.all_scores.passthrough
    - conservative.dominant.dominant_freq.vista_enhancers.conserved.passthrough
    - conservative.dominant.dominant_freq.whole_genome.all_scores.passthrough
    - conservative.dominant.dominant_freq.whole_genome.coding.passthrough
    - conservative.dominant.dominant_freq.whole_genome.conserved.passthrough
    - conservative.dominant.recessive_freq.lifted_enhancers.all_scores.intervals500
    - conservative.dominant.recessive_freq.lifted_enhancers.conserved.intervals500
    - conservative.dominant.recessive_freq.lifted_enhancers.conserved.tads
    - conservative.dominant.recessive_freq.limb_tads.all_scores.intervals500
    - conservative.dominant.recessive_freq.limb_tads.coding.gene
    - conservative.dominant.recessive_freq.limb_tads.conserved.intervals500
    - conservative.dominant.recessive_freq.limb_tads.conserved.tads
    - conservative.dominant.recessive_freq.vista_enhancers.all_scores.intervals500
    - conservative.dominant.recessive_freq.vista_enhancers.conserved.intervals500
    - conservative.dominant.recessive_freq.vista_enhancers.conserved.tads
    - conservative.dominant.recessive_freq.whole_genome.all_scores.intervals500
    - conservative.dominant.recessive_freq.whole_genome.coding.gene
    - conservative.dominant.recessive_freq.whole_genome.conserved.intervals500
    - conservative.dominant.recessive_freq.whole_genome.conserved.tads
    - conservative.recessive_hom.recessive_freq.lifted_enhancers.all_scores.passthrough
    - conservative.recessive_hom.recessive_freq.lifted_enhancers.conserved.passthrough
    - conservative.recessive_hom.recessive_freq.limb_tads.all_scores.passthrough
    - conservative.recessive_hom.recessive_freq.limb_tads.coding.passthrough
    - conservative.recessive_hom.recessive_freq.limb_tads.conserved.passthrough
    - conservative.recessive_hom.recessive_freq.vista_enhancers.all_scores.passthrough
    - conservative.recessive_hom.recessive_freq.vista_enhancers.conserved.passthrough
    - conservative.recessive_hom.recessive_freq.whole_genome.all_scores.passthrough
    - conservative.recessive_hom.recessive_freq.whole_genome.coding.passthrough
    - conservative.recessive_hom.recessive_freq.whole_genome.conserved.passthrough
    # The following are for input to variant_combination.
    - conservative.dominant.recessive_freq.whole_genome.coding.passthrough
    - conservative.dominant.recessive_freq.whole_genome.conserved.passthrough
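
Because each component of a filter combination must name something that is defined, a custom configuration can be checked before running the step. The sketch below is one possible check, not part of the pipeline: check_combination is a hypothetical helper, the implicit names ("keep_all", "whole_genome", "all_scores") are taken from the comments in the configuration above, and the allowed values for the inherit, freq, and het_comp components are only inferred from the default filter_combinations and may be incomplete.

```python
from typing import Dict, List

# Assumption: inferred from the default filter_combinations; may be incomplete.
INHERIT = {"de_novo", "dominant", "recessive_hom"}
FREQ = {"dominant_freq", "recessive_freq"}
HET_COMP = {"passthrough", "intervals500", "tads", "gene"}


def check_combination(value: str, config: Dict[str, dict]) -> List[str]:
    """Return a list of problems found in one combination (empty if none)."""
    parts = value.split(".")
    if len(parts) != 6:
        return [f"expected 6 dot-separated components, got {len(parts)}"]
    allowed = [
        # Configured names plus the implicitly defined one for each section.
        ("thresholds", set(config["thresholds"]) | {"keep_all"}),
        ("inherit", INHERIT),
        ("freq", FREQ),
        ("region", set(config["region_beds"]) | {"whole_genome"}),
        ("score", set(config["score_thresholds"]) | {"all_scores"}),
        ("het_comp", HET_COMP),
    ]
    return [
        f"unknown {name} {part!r}"
        for part, (name, names) in zip(parts, allowed)
        if part not in names
    ]


# Only the key names matter for this check, so values are left empty.
config = {
    "thresholds": {"conservative": {}, "relaxed": {}},
    "region_beds": {
        "all_tads": None,
        "all_genes": None,
        "limb_tads": None,
        "lifted_enhancers": None,
        "vista_enhancers": None,
    },
    "score_thresholds": {"coding": {}, "conservative": {}, "conserved": {}},
}

for combination in (
    "conservative.de_novo.dominant_freq.whole_genome.coding.passthrough",
    "strict.de_novo.dominant_freq.whole_genome.coding.passthrough",
):
    print(combination, check_combination(combination, config))
```

Returning a list of problems rather than raising on the first one makes it easier to report every misspelled component at once.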

Reports

Currently, no reports are generated.