Germline Variant Phasing
Implementation of the germline variant_phasing step
This step takes the result of the variant_annotation step and performs phasing of the variants using the GATK tools. Note that there are some issues with the GATK tools implementing this step:
The PhaseByTransmission tool changes the genotype of some variants, which is problematic when trying to phase de novo variants.
Read-backed phasing is also not 100% reliable at the moment.
Thus, this pipeline step makes the functionality of the tools available, but it is not as fully integrated as it could be because it is unclear how useful phasing is for clinical studies. Also, so far only the results of the GATK variant callers can be phased.
Also note that this step generates one output file for each child in a pedigree where both parents have been sequenced.
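The selection rule above (one output per child with both parents sequenced) can be sketched in a few lines of Python. This is only an illustration and not the pipeline's actual code: the `Member` record, its field names, and the assumption that the child itself must also be sequenced are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Member(NamedTuple):
    """Hypothetical pedigree record; not a type from the pipeline."""

    name: str
    father: Optional[str]  # None if the father is unknown
    mother: Optional[str]  # None if the mother is unknown
    sequenced: bool        # True if an NGS library exists for this member


def phasable_children(pedigree: List[Member]) -> List[str]:
    """Return names of sequenced members whose parents were both sequenced."""
    sequenced = {m.name for m in pedigree if m.sequenced}
    return [
        m.name
        for m in pedigree
        if m.sequenced and m.father in sequenced and m.mother in sequenced
    ]


pedigree = [
    Member("father", None, None, True),
    Member("mother", None, None, True),
    Member("child1", "father", "mother", True),
    Member("child2", "father", "mother", True),
    Member("spouse", None, None, False),          # not sequenced
    Member("grandchild", "child1", "spouse", True),
]
print(phasable_children(pedigree))  # ['child1', 'child2']
```

In this example, `grandchild` yields no output file because one parent (`spouse`) was not sequenced.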
Step Input
The variant phasing step uses the output of the following CUBI pipeline steps:
ngs_mapping
variant_annotation
Step Output
For each input VCF file (i.e., for each mapper and pedigree), a directory output/{mapper}.{caller}.{phaser}.{index_ngs_library}/out will be created with the following output files.
The {phaser} placeholder can take the values gatk_phase_by_transmission, gatk_read_backed_phasing, and gatk_phased_both (for the latter, phasing by transmission is performed first, followed by read-backed phasing).
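To make the directory naming concrete, the following sketch expands the pattern for every combination of placeholder values. The pattern string is taken from the text above. The mapper name `bwa`, the caller name `gatk_hc`, and the library name `P001-N1-DNA1-WGS1` are illustrative assumptions, and `output_dirs` is a hypothetical helper rather than a pipeline function.

```python
from itertools import product
from typing import List, Sequence

# Directory pattern as given in the step documentation.
PATTERN = "output/{mapper}.{caller}.{phaser}.{index_ngs_library}/out"


def output_dirs(
    mappers: Sequence[str],
    callers: Sequence[str],
    phasers: Sequence[str],
    libraries: Sequence[str],
) -> List[str]:
    """Expand the pattern for every combination of placeholder values."""
    return [
        PATTERN.format(mapper=m, caller=c, phaser=p, index_ngs_library=lib)
        for m, c, p, lib in product(mappers, callers, phasers, libraries)
    ]


# Example values below are assumptions chosen for illustration only.
for path in output_dirs(
    ["bwa"],
    ["gatk_hc"],
    ["gatk_phase_by_transmission", "gatk_read_backed_phasing"],
    ["P001-N1-DNA1-WGS1"],
):
    print(path)
# output/bwa.gatk_hc.gatk_phase_by_transmission.P001-N1-DNA1-WGS1/out
# output/bwa.gatk_hc.gatk_read_backed_phasing.P001-N1-DNA1-WGS1/out
```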
Global Configuration
static_data_config/reference/path must be set appropriately.
Default Configuration
The default configuration is as follows.
# Default configuration variant_phasing
step_config:
  variant_phasing:
    path_ngs_mapping: ../ngs_mapping
    path_variant_annotation: ../variant_annotation
    tools_ngs_mapping: []      # expected tools for ngs mapping
    tools_variant_calling: []  # expected tools for variant calling
    phasings:
    - gatk_phasing_both
    ignore_chroms:             # patterns of chromosome names to ignore
    - NC_007605                # herpes virus
    - hs37d5                   # GRCh37 decoy
    - chrEBV                   # Epstein-Barr virus
    - '*_decoy'                # decoy contig
    - 'HLA-*'                  # HLA genes
    gatk_read_backed_phasing:
      phase_quality_threshold: 20.0    # quality threshold for phasing
      window_length: 5000000           # split input into windows of this size, each triggers a job
      num_jobs: 1000                   # number of windows to process in parallel
      use_profile: true                # use Snakemake profile for parallel processing
      restart_times: 0                 # number of times to re-launch jobs in case of failure
      max_jobs_per_second: 10          # throttling of job creation
      max_status_checks_per_second: 10 # throttling of status checks
      debug_trunc_tokens: 0            # truncation to first N tokens (0 for none)
      keep_tmpdir: never               # keep temporary directory, {always, never, onerror}
      job_mult_memory: 1               # memory multiplier
      job_mult_time: 1                 # running time multiplier
      merge_mult_memory: 1             # memory multiplier for merging
      merge_mult_time: 1               # running time multiplier for merging
    gatk_phase_by_transmission:
      de_novo_prior: 1e-8              # default, use 1e-6 when interested in phasing de novos
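As a rough illustration of how ignore_chroms and window_length might interact, the sketch below skips contigs whose names match one of the patterns and cuts the remaining contigs into windows, each of which would correspond to one job. It is not the pipeline's implementation: the glob-style matching via `fnmatch`, the 0-based half-open coordinates, and the helper names `is_ignored` and `windows` are all assumptions made for this example.

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterator, Sequence, Tuple

# Patterns and window size copied from the default configuration above.
IGNORE_CHROMS = ["NC_007605", "hs37d5", "chrEBV", "*_decoy", "HLA-*"]
WINDOW_LENGTH = 5_000_000


def is_ignored(contig: str, patterns: Sequence[str]) -> bool:
    """Assumption: patterns are shell-style globs matched on the full name."""
    return any(fnmatchcase(contig, pattern) for pattern in patterns)


def windows(
    contig_lengths: Dict[str, int],
    patterns: Sequence[str],
    window_length: int,
) -> Iterator[Tuple[str, int, int]]:
    """Yield (contig, start, end) windows, 0-based and half-open."""
    for contig, length in contig_lengths.items():
        if is_ignored(contig, patterns):
            continue  # ignored contigs produce no windows
        for start in range(0, length, window_length):
            yield contig, start, min(start + window_length, length)


# Hypothetical contig lengths for illustration.
contigs = {"chr1": 12_000_000, "chrEBV": 171_823, "chrUn_X_decoy": 1_000}
for window in windows(contigs, IGNORE_CHROMS, WINDOW_LENGTH):
    print(window)
# ('chr1', 0, 5000000)
# ('chr1', 5000000, 10000000)
# ('chr1', 10000000, 12000000)
```

With the default window_length of 5000000, a 12 Mbp contig is split into three windows, the last one shorter than the others.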
Reports
Currently, no reports are generated.