Germline Variant Phasing

Implementation of the germline variant_phasing step

This step takes the result of the variant_annotation step and phases the variants using the GATK tools. Note that the GATK tools that implement this step have some issues:

  • The PhaseByTransmission tool changes the genotype of some variants, which is problematic when trying to phase de novo variants.

  • Read-backed phasing is also not fully reliable at the moment.

Thus, this pipeline step makes the functionality of the tools available, but it is not as fully integrated as it could be because it is unclear how useful phasing is for clinical studies. Also, so far only GATK variant caller results can be phased.

Also note that this step generates one output file for each child in a pedigree where both parents have been sequenced.

Step Input

The variant phasing step uses the output of the following CUBI pipeline steps:

  • ngs_mapping

  • variant_annotation

Step Output

For each input VCF file (i.e., for each mapper and pedigree), a directory output/{mapper}.{caller}.{phaser}.{index_ngs_library}/out will be created with the following output files.

The {phaser} placeholder can take the values gatk_phase_by_transmission, gatk_read_backed_phasing, and gatk_phased_both (for the latter, phasing by transmission is performed first, followed by read-backed phasing).

Global Configuration

  • static_data_config/reference/path must be set appropriately
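As a minimal sketch, the setting named above would sit in the project configuration roughly as follows. The file location is a hypothetical placeholder, and the assumption that the path refers to the reference genome FASTA file is not stated in this section.

```yaml
static_data_config:
  reference:
    # Hypothetical location (assumption); replace with the reference
    # sequence used for read mapping in your project.
    path: /path/to/reference/genome.fa
```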

Default Configuration

The default configuration is as follows.

step_config:
  variant_phasing:
    #path_ngs_mapping: ../ngs_mapping
    #path_variant_annotation: ../variant_annotation    # Examples: ../variant_annotation
    #
    # expected tools for ngs mapping
    #tools_ngs_mapping: []
    #
    # expected tools for variant calling
    #tools_variant_calling: []
    #phasings:
    #  - gatk_phasing_both
    #
    # patterns of chromosome names to ignore
    #ignore_chroms:
    #  - NC_007605
    #  - hs37d5
    #  - chrEBV
    #  - '*_decoy'
    #  - HLA-*
    #gatk_read_backed_phasing:
    #
    #  # quality threshold for phasing
    #  phase_quality_threshold: 20.0
    #
    #  # split input into windows of this size, each triggers a job
    #  window_length: 5000000
    #
    #  # number of windows to process in parallel
    #  num_jobs: 1000
    #
    #  # use Snakemake profile for parallel processing
    #  use_profile: true
    #
    #  # number of times to re-launch jobs in case of failure
    #  restart_times: 0
    #
    #  # throttling of job creation
    #  max_jobs_per_second: 10
    #
    #  # throttling of status checks
    #  max_status_checks_per_second: 10
    #
    #  # truncation to first N tokens (0 for none)
    #  debug_trunc_tokens: 0
    #
    #  # keep temporary directory, {always, never, onerror}
    #  keep_tmpdir: never                              # Options: 'always', 'never', 'onerror'
    #
    #  # memory multiplier
    #  job_mult_memory: 1.0
    #
    #  # running time multiplier
    #  job_mult_time: 1.0
    #
    #  # memory multiplier for merging
    #  merge_mult_memory: 1.0
    #
    #  # running time multiplier for merging
    #  merge_mult_time: 1.0
    #gatk_phase_by_transmission:
    #
    #  # use 1e-6 when interested in phasing de novos
    #  de_novo_prior: 1e-08
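The defaults above are all commented out. As an illustration only, a project that wants to phase de novo variants might override them as sketched below. Only keys that appear in the default configuration are used; the choice of values is an assumption, apart from the 1e-6 prior, which the comment in the default configuration suggests.

```yaml
step_config:
  variant_phasing:
    path_ngs_mapping: ../ngs_mapping
    path_variant_annotation: ../variant_annotation
    phasings:
      - gatk_phasing_both
    gatk_phase_by_transmission:
      # Raised from the default 1e-08 because de novo variants are of interest.
      # Written with a decimal point so that YAML 1.1 parsers read it as a
      # float rather than a string.
      de_novo_prior: 1.0e-6
```

Settings that are not listed, such as the gatk_read_backed_phasing block, keep their default values.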

Reports

Currently, no reports are generated.