Germline Build Target Sequence gCNV Model

Implementation of the helper_gcnv_model_targeted step

The helper_gcnv_model_targeted step takes the results of the ngs_mapping step (aligned germline reads) as input and builds a model that GATK4 gCNV can use for a particular library kit.

Step Input

The step uses Snakemake subworkflows to access the results of the ngs_mapping step (BAM files with aligned reads).

Step Output

All donors are used to generate the two parts of the required gCNV model: ploidy-model and cnv_calls-model. Both parts are required to run gCNV in CASE mode.
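
As a rough illustration of why both parts are needed, the sketch below assembles the two GATK4 command lines that a CASE-mode run typically chains together: DetermineGermlineContigPloidy consumes the ploidy-model, and GermlineCNVCaller consumes the cnv_calls-model together with the resulting ploidy calls. The function name, the read-count file, and all paths are hypothetical placeholders; the pipeline's own wrappers may assemble these calls differently.

```python
def case_mode_commands(counts, ploidy_model, cnv_model, out_dir):
    """Build (but do not run) the two GATK4 calls of a CASE-mode gCNV run.

    All arguments are hypothetical paths chosen for this sketch.
    """
    # Step 1: infer contig ploidy for the case sample using the ploidy-model.
    ploidy_cmd = [
        "gatk", "DetermineGermlineContigPloidy",
        "--model", ploidy_model,
        "--input", counts,
        "--output", out_dir,
        "--output-prefix", "ploidy",
    ]
    # GATK writes the ploidy calls to <output>/<output-prefix>-calls.
    ploidy_calls = f"{out_dir}/ploidy-calls"
    # Step 2: call CNVs using the cnv_calls-model plus the ploidy calls.
    cnv_cmd = [
        "gatk", "GermlineCNVCaller",
        "--run-mode", "CASE",
        "--contig-ploidy-calls", ploidy_calls,
        "--model", cnv_model,
        "--input", counts,
        "--output", out_dir,
        "--output-prefix", "cnv_calls",
    ]
    return ploidy_cmd, cnv_cmd


ploidy_cmd, cnv_cmd = case_mode_commands(
    counts="sample.counts.tsv",          # hypothetical read-count file
    ploidy_model="model/ploidy-model",   # hypothetical location of the ploidy-model
    cnv_model="model/cnv_calls-model",   # hypothetical location of the cnv_calls-model
    out_dir="out",
)
print(" ".join(ploidy_cmd))
print(" ".join(cnv_cmd))
```

Without either model directory, one of the two commands cannot be constructed, which is why the step always builds both.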

For example, the relevant directories might look as follows:

work/
+-- bwa.gcnv_contig_ploidy.<library_kit_name>
    `-- out
        `-- bwa.gcnv_contig_ploidy.<library_kit_name>
            |-- SAMPLE_0
            |   |-- contig_ploidy.tsv
            |   |-- global_read_depth.tsv
            |   |-- mu_psi_s_log__.tsv
            |   |-- sample_name.txt
            |   `-- std_psi_s_log__.tsv
            |-- [...]
            `-- bwa.gcnv_contig_ploidy.<library_kit_name>
                `-- ploidy-model
                    |-- contig_ploidy_prior.tsv
                    |-- gcnvkernel_version.json
                    |-- interval_list.tsv
                    |-- mu_mean_bias_j_lowerbound__.tsv
                    |-- mu_psi_j_log__.tsv
                    |-- ploidy_config.json
                    |-- std_mean_bias_j_lowerbound__.tsv
                    `-- std_psi_j_log__.tsv
+-- bwa.gcnv_call_cnvs.<library_kit_name>.***_of_***
    `-- out
        `-- bwa.gcnv_call_cnvs.<library_kit_name>.***_of_***
            |-- cnv_calls-calls
            |   |-- SAMPLE_0
            |   |   `-- [...]
            |   `-- [...]
            |-- cnv_calls-model
            |   |-- denoising_config.json
            |   |-- gcnvkernel_version.json
            |   |-- interval_list.tsv
            |   |-- log_q_tau_tk.tsv
            |   |-- mu_W_tu.tsv
            |   |-- mu_ard_u_log__.tsv
            |   |-- mu_log_mean_bias_t.tsv
            |   |-- mu_psi_t_log__.tsv
            |   |-- std_W_tu.tsv
            |   |-- std_ard_u_log__.tsv
            |   |-- std_log_mean_bias_t.tsv
            |   `-- std_psi_t_log__.tsv
            `-- cnv_calls-tracking
                `-- [...]
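
A quick sanity check on a finished run is to confirm that each model directory contains the files shown in the listing above. The following is a minimal sketch of such a check; the helper name missing_files is hypothetical, the file lists are copied from the example listing (other GATK versions may write a different set), and the demo uses a temporary directory in place of a real work/ tree.

```python
import tempfile
from pathlib import Path

# File names taken from the example listing; treat them as an assumption
# about the GATK version in use, not as a fixed specification.
PLOIDY_MODEL_FILES = [
    "contig_ploidy_prior.tsv",
    "gcnvkernel_version.json",
    "interval_list.tsv",
    "mu_mean_bias_j_lowerbound__.tsv",
    "mu_psi_j_log__.tsv",
    "ploidy_config.json",
    "std_mean_bias_j_lowerbound__.tsv",
    "std_psi_j_log__.tsv",
]
CNV_MODEL_FILES = [
    "denoising_config.json",
    "gcnvkernel_version.json",
    "interval_list.tsv",
    "log_q_tau_tk.tsv",
    "mu_W_tu.tsv",
    "mu_ard_u_log__.tsv",
    "mu_log_mean_bias_t.tsv",
    "mu_psi_t_log__.tsv",
    "std_W_tu.tsv",
    "std_ard_u_log__.tsv",
    "std_log_mean_bias_t.tsv",
    "std_psi_t_log__.tsv",
]


def missing_files(model_dir, expected):
    """Return the sorted names from `expected` that are absent in `model_dir`."""
    model_dir = Path(model_dir)
    return sorted(name for name in expected if not (model_dir / name).is_file())


# Demo: build an incomplete ploidy-model directory and report what is missing.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp) / "ploidy-model"
    demo_dir.mkdir()
    for name in PLOIDY_MODEL_FILES[:-1]:  # deliberately skip the last file
        (demo_dir / name).touch()
    demo_missing = missing_files(demo_dir, PLOIDY_MODEL_FILES)

print(demo_missing)
```

For a real run, pass the actual ploidy-model or cnv_calls-model path together with the matching list; an empty result means every listed file is present.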

Global Configuration

  • At the moment, no global configuration is used.

Default Configuration

The default configuration is as follows.

# Default configuration helper_gcnv_model_targeted
step_config:

  helper_gcnv_model_targeted:
    path_ngs_mapping: ../ngs_mapping  # REQUIRED

    gcnv:
      path_uniquely_mapable_bed: null  # REQUIRED - path to BED file with uniquely mappable regions.
      path_target_interval_list_mapping: []  # REQUIRED - define one or more sets of target intervals.
      # The following will match both the stock IDT library kit and the ones
      # with spike-ins seen from Yale genomics.  The path above would be
      # mapped to the name "default".
      # - name: IDT_xGen_V1_0
      #   pattern: "xGen Exome Research Panel V1\\.0*"
      #   path: "path/to/targets.bed"
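
To see why the commented-out pattern above covers both the stock kit and its spike-in variants, the sketch below resolves library kit names against the mapping entries. It assumes that patterns are applied as Python regular expressions anchored at the start of the kit name (re.match); that assumption, the resolve_kit_name helper, and the sample kit names are illustrative only and are not taken from the pipeline's code.

```python
import re

# Mirrors the commented-out example entry. In YAML, "\\." inside double
# quotes yields the regex "\." (a literal dot).
MAPPINGS = [
    {
        "name": "IDT_xGen_V1_0",
        "pattern": r"xGen Exome Research Panel V1\.0*",
        "path": "path/to/targets.bed",
    },
]


def resolve_kit_name(library_kit, mappings=MAPPINGS):
    """Return the name of the first entry whose pattern matches, else None.

    Assumption: prefix matching via re.match, so trailing text in the
    kit name (such as a spike-in suffix) does not prevent a match.
    """
    for entry in mappings:
        if re.match(entry["pattern"], library_kit):
            return entry["name"]
    return None


stock = resolve_kit_name("xGen Exome Research Panel V1.0")
# Hypothetical spike-in kit name, used only for illustration.
spiked = resolve_kit_name("xGen Exome Research Panel V1.0 plus spike-ins")
other = resolve_kit_name("Agilent SureSelect Human All Exon V6")

print(stock, spiked, other)
```

Under this assumption, both xGen names resolve to the same model name and therefore share one model, while an unrelated kit matches no entry.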