Germline Variant De Novo Filtration

Implementation of the variant_denovo_filtration step.

This step filters variants down to de novo variants. It was introduced for the “Ionizing Radiation” study in ca. 2016, and the aim is to obtain a set of high-confidence de novo sequence variants (both SNVs and indels, although the latter turned out to be less reliable). Further, if the variants are phased, assignment to the paternal or maternal allele can be attempted, which makes it possible to study paternal age effects.

Note that, in contrast to variant_calling and variant_annotation but consistent with variant_phasing, the central individuals here are the children and not the index of each pedigree.
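
As a rough illustration of which individuals the step centers on, the following sketch enumerates the children of a pedigree for whom both parents are present. The `Donor` record and its field names are hypothetical and only stand in for whatever pedigree representation the pipeline actually uses.

```python
from typing import NamedTuple, Optional


class Donor(NamedTuple):
    """Hypothetical pedigree member; field names are illustrative only."""

    name: str
    father: Optional[str]
    mother: Optional[str]


def children_with_both_parents(pedigree):
    """Return names of members whose father and mother are both in the pedigree."""
    names = {donor.name for donor in pedigree}
    return [
        donor.name
        for donor in pedigree
        if donor.father in names and donor.mother in names
    ]


pedigree = [
    Donor("P001", "P002", "P003"),  # child with both parents present
    Donor("P002", None, None),      # father
    Donor("P003", None, None),      # mother
    Donor("P004", "P002", None),    # mother missing, so not eligible
]
print(children_with_both_parents(pedigree))
```

Only `P001` qualifies here, because `P004` has no mother in the pedigree.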

Step Input

The step reads in the variant call files from one of the following steps:

  • variant_calling

  • variant_annotation

  • variant_phasing

Of course, assignment to the parental allele can only be performed on phased variants. Further, filtering is only really useful on annotated variants, as one wants to exclude variants in problematic genomic regions.

Step Output

For all children with both parents present, variant de novo annotation will be attempted on the primary DNA NGS library of that child. The name of this library will be used as the identification token in the output file and file name. For each read mapper, variant caller, and pedigree, the following files will be generated:

  • {mapper}.{var_caller}.{annotation}.{phasing}.de_novos.{lib_name}.vcf.gz

  • {mapper}.{var_caller}.{annotation}.{phasing}.de_novos.{lib_name}.vcf.gz.tbi

  • {mapper}.{var_caller}.{annotation}.{phasing}.de_novos.{lib_name}.vcf.gz.md5

  • {mapper}.{var_caller}.{annotation}.{phasing}.de_novos.{lib_name}.vcf.gz.tbi.md5

  • {mapper}.{var_caller}.{annotation}.{phasing}.de_novos_hard.{lib_name}.vcf.gz

  • {mapper}.{var_caller}.{annotation}.{phasing}.de_novos_hard.{lib_name}.vcf.gz.tbi

  • {mapper}.{var_caller}.{annotation}.{phasing}.de_novos_hard.{lib_name}.vcf.gz.md5

  • {mapper}.{var_caller}.{annotation}.{phasing}.de_novos_hard.{lib_name}.vcf.gz.tbi.md5

  • {mapper}.{var_caller}.{annotation}.{phasing}.de_novos_hard.{lib_name}.summary.txt

  • {mapper}.{var_caller}.{annotation}.{phasing}.de_novos_hard.{lib_name}.summary.txt.md5

The annotation and phasing tokens will only be present when the input is read from the variant_annotation or variant_phasing steps, respectively.
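
The naming scheme above can be sketched as follows. This is a minimal illustration, not the pipeline's own code: the helper names `output_prefix` and `output_files` are hypothetical, and the optional tokens are simply left out when they are not given.

```python
def output_prefix(mapper, var_caller, lib_name, annotation=None, phasing=None, hard=False):
    """Join the name tokens with dots, skipping optional tokens that are absent."""
    tokens = [mapper, var_caller]
    if annotation:
        tokens.append(annotation)
    if phasing:
        tokens.append(phasing)
    tokens.append("de_novos_hard" if hard else "de_novos")
    tokens.append(lib_name)
    return ".".join(tokens)


def output_files(prefix, hard=False):
    """List each output file together with its .md5 companion."""
    extensions = ["vcf.gz", "vcf.gz.tbi"]
    if hard:
        # Per the list above, the summary exists only for the de_novos_hard set.
        extensions.append("summary.txt")
    files = []
    for ext in extensions:
        files.append(f"{prefix}.{ext}")
        files.append(f"{prefix}.{ext}.md5")
    return files


# Input taken directly from variant_calling: no annotation or phasing token.
prefix = output_prefix("bwa", "gatk3_hc", "P001-N1-DNA1-WES1")
for name in output_files(prefix):
    print(name)
```

With these inputs the prefix is `bwa.gatk3_hc.de_novos.P001-N1-DNA1-WES1`, which matches the directory name in the listing for input read from variant_calling.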

For example, it might look as follows for the example from above:

output/
+-- bwa.gatk3_hc.de_novos.P001-N1-DNA1-WES1
|   `-- out
|       |-- bwa.gatk3_hc.de_novos.P001-N1-DNA1-WES1.vcf.gz
|       |-- bwa.gatk3_hc.de_novos.P001-N1-DNA1-WES1.vcf.gz.md5
|       |-- bwa.gatk3_hc.de_novos.P001-N1-DNA1-WES1.vcf.gz.tbi
|       |-- bwa.gatk3_hc.de_novos.P001-N1-DNA1-WES1.vcf.gz.tbi.md5
|       |-- bwa.gatk3_hc.de_novos_hard.P001-N1-DNA1-WES1.vcf.gz
|       |-- bwa.gatk3_hc.de_novos_hard.P001-N1-DNA1-WES1.vcf.gz.md5
|       |-- bwa.gatk3_hc.de_novos_hard.P001-N1-DNA1-WES1.vcf.gz.tbi
|       |-- bwa.gatk3_hc.de_novos_hard.P001-N1-DNA1-WES1.vcf.gz.tbi.md5
|       |-- bwa.gatk3_hc.de_novos_hard.P001-N1-DNA1-WES1.summary.txt
|       `-- bwa.gatk3_hc.de_novos_hard.P001-N1-DNA1-WES1.summary.txt.md5
[...]

Global Configuration

No global configuration is in use.

Default Configuration

The default configuration is as follows.

step_config:
  variant_denovo_filtration:
    # One of the following must be given!
    path_variant_phasing: ''
    path_variant_annotation: ''
    path_variant_calling: ''
    path_ngs_mapping: ../ngs_mapping
    tools_ngs_mapping: null          # defaults to ngs_mapping tool
    tools_variant_calling: null      # defaults to variant_annotation tool
    info_key_reliable_regions: []    # optional INFO keys with reliable regions
    info_key_unreliable_regions: []  # optional INFO keys with unreliable regions
    params_besenbacher:              # parameters for Besenbacher quality filter
      min_gq: 50
      min_dp: 10
      max_dp: 120
      min_ab: 0.20
      max_ab: 0.9
      max_ad2: 1
    bad_region_expressions: []
    # e.g.,
    # - 'UCSC_CRG_MAPABILITY36 == 1'
    # - 'UCSC_SIMPLE_REPEAT == 1'
    collect_msdn: True               # whether or not to collect MSDN (requires GATK HC+UG)
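
To give an idea of how the params_besenbacher thresholds could act on a trio, here is a minimal sketch. It reflects one plausible reading of the parameter names and is not taken from the pipeline's code: it assumes that the GQ and DP bounds apply to child and both parents, that the allelic balance bounds apply to the child, that max_ad2 caps the alternate-allele read count in each parent, and that all bounds are inclusive. The function name and the per-sample dictionaries are hypothetical.

```python
DEFAULT_PARAMS = {
    "min_gq": 50,
    "min_dp": 10,
    "max_dp": 120,
    "min_ab": 0.20,
    "max_ab": 0.9,
    "max_ad2": 1,
}


def passes_quality_filter(child, father, mother, params=DEFAULT_PARAMS):
    """Check a candidate de novo call against the thresholds.

    Each sample is a dict with 'gq', 'dp', and 'ad' = (ref depth, alt depth).
    """
    # Assumption: genotype quality and depth bounds hold for all trio members.
    for sample in (child, father, mother):
        if sample["gq"] < params["min_gq"]:
            return False
        if not params["min_dp"] <= sample["dp"] <= params["max_dp"]:
            return False
    # Assumption: allelic balance is the alt fraction of the child's reads.
    ref_depth, alt_depth = child["ad"]
    if ref_depth + alt_depth == 0:
        return False
    allelic_balance = alt_depth / (ref_depth + alt_depth)
    if not params["min_ab"] <= allelic_balance <= params["max_ab"]:
        return False
    # Assumption: parents may carry at most max_ad2 reads of the alt allele.
    return all(parent["ad"][1] <= params["max_ad2"] for parent in (father, mother))


child = {"gq": 99, "dp": 40, "ad": (20, 20)}
father = {"gq": 60, "dp": 30, "ad": (30, 0)}
mother = {"gq": 70, "dp": 35, "ad": (34, 1)}
print(passes_quality_filter(child, father, mother))
```

In this example the call passes; it would fail if, say, the father showed three alternate-allele reads or the child's allelic balance dropped below 0.20.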

Reports

Currently, no reports are generated.