Somatic Targeted Seq. CNV Calling

Implementation of the somatic_targeted_seq_cnv_calling step

This step detects CNV events in cancer samples from targeted sequencing data (e.g., exomes or large panels). The wrapped tools start from the aligned reads (i.e., the output of ngs_mapping) and generate somatic CNV calls.

The wrapped tools implement different strategies. Some work “reference-free” and use only the cancer BAM file as input, some work in “matched cancer-normal mode” and need both the cancer and the normal BAM files, and others need the cancer and normal BAM files plus a set of non-cancer BAM files as background.
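The three strategies differ in which BAM files a caller needs. The following Python sketch makes that difference explicit. The `InputMode` enum and the `required_bams` helper are hypothetical illustrations, not part of snappy or of any wrapped tool, and this sketch does not state which tool uses which mode.

```python
from enum import Enum


class InputMode(Enum):
    """Hypothetical names for the three input strategies."""
    REFERENCE_FREE = "reference_free"          # cancer BAM only
    MATCHED = "matched"                        # cancer + matched normal BAM
    MATCHED_WITH_BACKGROUND = "background"     # cancer + normal + non-cancer background BAMs


def required_bams(mode, cancer_bam, normal_bam=None, background_bams=()):
    """Return the BAM files a caller in ``mode`` needs; raise if any are missing."""
    bams = [cancer_bam]
    if mode in (InputMode.MATCHED, InputMode.MATCHED_WITH_BACKGROUND):
        if normal_bam is None:
            raise ValueError(f"{mode.value} mode needs a matched normal BAM")
        bams.append(normal_bam)
    if mode is InputMode.MATCHED_WITH_BACKGROUND:
        if not background_bams:
            raise ValueError("background mode needs at least one non-cancer BAM")
        bams.extend(background_bams)
    return bams
```

For example, `required_bams(InputMode.MATCHED, "T.bam", "N.bam")` returns `["T.bam", "N.bam"]`, while the same call without the normal BAM raises a `ValueError`.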

Step Input

Somatic CNV calling for targeted sequencing starts from the aligned reads, i.e., the output of ngs_mapping.

Step Output

Generally, the following links are generated in output/.

Note

Tool-Specific Output

The output described below is tailored to the results of cnvkit. In the future, this section will contain “common” output and tool-specific output subsections.

  • {mapper}.cnvkit.{lib_name}-{lib_pk}/out/
    • {mapper}.cnvkit.{lib_name}-{lib_pk}.bed

    • {mapper}.cnvkit.{lib_name}-{lib_pk}.seg

    • {mapper}.cnvkit.{lib_name}-{lib_pk}.vcf.gz

    • {mapper}.cnvkit.{lib_name}-{lib_pk}.vcf.gz.tbi

  • {mapper}.cnvkit.{lib_name}-{lib_pk}/report/
    • {mapper}.cnvkit.{lib_name}-{lib_pk}.diagram.pdf

    • {mapper}.cnvkit.{lib_name}-{lib_pk}.scatter.pdf

    • {mapper}.cnvkit.{lib_name}-{lib_pk}.heatmap.pdf

    • {mapper}.cnvkit.{lib_name}-{lib_pk}.heatmap.chr1.pdf

    • ...

    • {mapper}.cnvkit.{lib_name}-{lib_pk}.scatter.chrX.pdf

    • {mapper}.cnvkit.{lib_name}-{lib_pk}.breaks.txt

    • {mapper}.cnvkit.{lib_name}-{lib_pk}.genemetrics.txt

    • {mapper}.cnvkit.{lib_name}-{lib_pk}.gender.txt

    • {mapper}.cnvkit.{lib_name}-{lib_pk}.metrics.txt

    • {mapper}.cnvkit.{lib_name}-{lib_pk}.segmetrics.txt
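To make the templates concrete, the following hypothetical Python helper expands them for one library. It is not part of snappy. It assumes that a heatmap and a scatter plot are written for every chromosome passed in, since the list only names chr1 and chrX as examples, and it treats `lib_name` and `lib_pk` as plain strings.

```python
def cnvkit_output_paths(mapper, lib_name, lib_pk, chromosomes=("chr1", "chrX")):
    """Expand the cnvkit output templates for one library (hypothetical helper)."""
    prefix = f"{mapper}.cnvkit.{lib_name}-{lib_pk}"
    # Files under out/
    paths = [f"{prefix}/out/{prefix}.{ext}" for ext in ("bed", "seg", "vcf.gz", "vcf.gz.tbi")]
    # Genome-wide plots under report/
    report = ["diagram.pdf", "scatter.pdf", "heatmap.pdf"]
    # Per-chromosome plots (assumption: one heatmap and one scatter per chromosome)
    for chrom in chromosomes:
        report += [f"heatmap.{chrom}.pdf", f"scatter.{chrom}.pdf"]
    # Text reports under report/
    report += [f"{name}.txt" for name in ("breaks", "genemetrics", "gender", "metrics", "segmetrics")]
    paths += [f"{prefix}/report/{prefix}.{name}" for name in report]
    return paths
```

Calling `cnvkit_output_paths("bwa", "P002-T1-DNA1-WES1", "000016")` yields 16 relative paths, starting with `bwa.cnvkit.P002-T1-DNA1-WES1-000016/out/bwa.cnvkit.P002-T1-DNA1-WES1-000016.bed`.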

For example:

output/
|-- bwa.cnvkit.P001-T1-DNA1-WES1-000007
|   `-- out
|       |-- bwa.cnvkit.P001-T1-DNA1-WES1-000007.bed
|       |-- bwa.cnvkit.P001-T1-DNA1-WES1-000007.seg
|       |-- bwa.cnvkit.P001-T1-DNA1-WES1-000007.vcf.gz
|       `-- bwa.cnvkit.P001-T1-DNA1-WES1-000007.vcf.gz.tbi
|-- bwa.cnvkit.P002-T1-DNA1-WES1-000016
|   `-- report
|       |-- bwa.cnvkit.P002-T1-DNA1-WES1-000016.diagram.pdf
|       |-- bwa.cnvkit.P002-T1-DNA1-WES1-000016.heatmap.pdf
|       |-- bwa.cnvkit.P002-T1-DNA1-WES1-000016.scatter.pdf
|       |-- bwa.cnvkit.P002-T1-DNA1-WES1-000016.heatmap.chr1.pdf
|       |-- ...
|       |-- bwa.cnvkit.P002-T1-DNA1-WES1-000016.scatter.chrX.pdf
|       |-- bwa.cnvkit.P002-T1-DNA1-WES1-000016.breaks.txt
|       |-- bwa.cnvkit.P002-T1-DNA1-WES1-000016.genemetrics.txt
|       |-- bwa.cnvkit.P002-T1-DNA1-WES1-000016.gender.txt
|       |-- bwa.cnvkit.P002-T1-DNA1-WES1-000016.metrics.txt
|       `-- bwa.cnvkit.P002-T1-DNA1-WES1-000016.segmetrics.txt
[...]

Note that cnvetti does not follow the snappy naming convention above: the tool name is followed by an underscore and the action, which is one of coverage, segment, or postprocess. For example, the output directory would contain a directory named bwa.cnvetti_coverage.P002-T1-DNA1-WES1-000016.
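The two naming schemes can be sketched as a small Python function. The function name `output_dir_name` and its parameters are hypothetical; the sketch only encodes the rule stated above (cnvetti appends `_{action}` to the tool name, other tools do not) and is not snappy's own implementation.

```python
CNVETTI_ACTIONS = ("coverage", "segment", "postprocess")


def output_dir_name(mapper, tool, library, action=None):
    """Return the output directory name for one library.

    ``library`` is the "{lib_name}-{lib_pk}" string. For cnvetti, the action is
    appended to the tool name with an underscore; other tools take no action.
    """
    if tool == "cnvetti":
        if action not in CNVETTI_ACTIONS:
            raise ValueError(f"cnvetti action must be one of {CNVETTI_ACTIONS}, got {action!r}")
        tool = f"{tool}_{action}"
    elif action is not None:
        raise ValueError(f"tool {tool!r} does not use an action suffix")
    return f"{mapper}.{tool}.{library}"
```

With this sketch, `output_dir_name("bwa", "cnvetti", "P002-T1-DNA1-WES1-000016", "coverage")` gives `bwa.cnvetti_coverage.P002-T1-DNA1-WES1-000016`, while the cnvkit call without an action gives `bwa.cnvkit.P002-T1-DNA1-WES1-000016`.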

Default Configuration

The default configuration is as follows.

# Default configuration somatic_targeted_seq_cnv_calling
step_config:
  somatic_targeted_seq_cnv_calling:
    tools: ['cnvkit']  # REQUIRED - available: 'cnvkit', 'copywriter', 'cnvetti_on_target' and 'cnvetti_off_target'
    path_ngs_mapping: ../ngs_mapping  # REQUIRED
    cnvkit:
      path_target: REQUIRED             # Usually ../panel_of_normals/output/cnvkit.target/out/cnvkit.target.bed
      path_antitarget: REQUIRED         # Usually ../panel_of_normals/output/cnvkit.antitarget/out/cnvkit.antitarget.bed
      path_panel_of_normals: REQUIRED   # Usually ../panel_of_normals/output/{mapper}.cnvkit.create_panel/out/{mapper}.cnvkit.panel_of_normals.cnn
      plot: True                        # Generate plots (very slow)
      min_mapq: 0                       # [coverage] Minimum mapping quality score to count a read for coverage depth
      count: False                      # [coverage] Alternative counting algorithm
      gc_correction: True               # [fix] Use GC correction
      edge_correction: True             # [fix] Use edge correction
      rmask_correction: True            # [fix] Use rmask correction
      # For reference, bcbio uses either
      # seg_method: haar
      # seg_threshold: 0.0001
      # -- OR
      # seg_method: cbs
      # seg_threshold: 0.000001
      segmentation_method: cbs          # [segment] One of cbs, flasso, haar, hmm, hmm-tumor, hmm-germline, none
      segmentation_threshold: 0.000001  # [segment] Significance threshold (hmm methods: smoothing window size)
      drop_low_coverage: False          # [segment, call, genemetrics] Drop very low coverage bins
      drop_outliers: 10                 # [segment] Drop outlier bins (0 for no outlier filtering)
      smooth_cbs: True                  # [segment] Additional smoothing of CBS segmentation (WARNING: differs from the cnvkit default)
      center: ""                        # [call] One of mean, median, mode, biweight, or a constant log2 ratio value
      filter: ampdel                    # [call] One of ampdel, cn, ci, sem (merging segments flagged with the specified filter), "" for no filtering
      calling_method: threshold         # [call] One of threshold, clonal, none
      call_thresholds: "-1.1,-0.25,0.2,0.7" # [call] Thresholds for calling integer copy number
      ploidy: 2                         # [call] Ploidy of sample cells
      purity: 0                         # [call] Estimated tumor cell fraction (0 to skip tumor purity adjustment)
      gender: ""                        # [call, diagram] Chromosomal sex of all given samples (male or female); guessed when empty
      male_reference: False             # [call, diagram] Assume a male reference
      diagram_threshold: 0.5            # [diagram] Copy number change threshold to label genes
      diagram_min_probes: 3             # [diagram] Min number of covered probes to label genes
      shift_xy: True                    # [diagram] Shift X & Y chromosomes according to sample sex
      breaks_min_probes: 1              # [breaks] Min number of covered probes for a break inside the gene
      genemetrics_min_probes: 3         # [genemetrics] Min number of covered probes to consider a gene
      genemetrics_threshold: 0.2        # [genemetrics] Min abs log2 change to consider a gene
      genemetrics_alpha: 0.05           # [genemetrics] Significance cutoff
      genemetrics_bootstrap: 100        # [genemetrics] Number of bootstraps
      segmetrics_alpha: 0.05            # [segmetrics] Significance cutoff
      segmetrics_bootstrap: 100         # [segmetrics] Number of bootstraps
      smooth_bootstrap: False           # [segmetrics] Smooth bootstrap results
    copywriter:
      path_target_regions: REQUIRED # REQUIRED
      bin_size: 20000 # TODO: make actually configurable
      plot_genes: REQUIRED  # Path to CIViC annotation
      genome: hg19          # Could be hg38 (consider setting prefix to 'chr' when using GRCh38.v1)
      features: EnsDb.Hsapiens.v75::EnsDb.Hsapiens.v75
      prefix: ''
      nThread: 8
    cnvetti_on_target:
      path_target_regions: REQUIRED # REQUIRED
    cnvetti_off_target:
      path_target_regions: REQUIRED # REQUIRED
      window_length: 20000
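With calling_method: threshold, call_thresholds is a list of log2-ratio upper bounds for integer copy numbers 0, 1, 2, 3. The following simplified Python sketch is modeled on cnvkit's threshold method; `threshold_call` is a hypothetical function, not cnvkit's API. It ignores sex chromosomes and purity, and it assumes that values above the last threshold are estimated as ceil(ploidy * 2^log2).

```python
import math

# call_thresholds from the default configuration, stored there as a string
DEFAULT_THRESHOLDS = tuple(float(x) for x in "-1.1,-0.25,0.2,0.7".split(","))


def threshold_call(log2_ratio, thresholds=DEFAULT_THRESHOLDS, ploidy=2):
    """Map a segment log2 ratio to an integer copy number (simplified sketch).

    The i-th threshold is the inclusive upper bound for copy number i.
    Above the last threshold, fall back to ceil(ploidy * 2**log2_ratio).
    """
    for copy_number, upper in enumerate(thresholds):
        if log2_ratio <= upper:
            return copy_number
    return math.ceil(ploidy * 2 ** log2_ratio)
```

Under these assumptions, log2 ratios of -2.0, -0.5, 0.0, 0.5 and 1.0 map to copy numbers 0, 1, 2, 3 and 4 for a diploid autosome.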
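Several settings above default to the placeholder REQUIRED and must be overridden in the project configuration. A minimal Python sketch of that check follows, assuming the configuration has already been loaded into nested dicts and that the literal string "REQUIRED" marks an unset value. `merge_config`, `missing_required` and the `DEFAULTS` excerpt are hypothetical and not snappy's own configuration loader.

```python
import copy

# Tool names listed as available in the default configuration
AVAILABLE_TOOLS = {"cnvkit", "copywriter", "cnvetti_on_target", "cnvetti_off_target"}

# Hypothetical excerpt of the defaults; values copied from the configuration above
DEFAULTS = {
    "tools": ["cnvkit"],
    "path_ngs_mapping": "../ngs_mapping",
    "cnvkit": {
        "path_target": "REQUIRED",
        "path_antitarget": "REQUIRED",
        "path_panel_of_normals": "REQUIRED",
        "plot": True,
    },
    "copywriter": {"path_target_regions": "REQUIRED", "plot_genes": "REQUIRED"},
}


def merge_config(defaults, overrides):
    """Recursively overlay user settings onto a copy of the defaults."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


def missing_required(step_config):
    """List 'tool.key' settings still set to REQUIRED for the selected tools."""
    unknown = set(step_config["tools"]) - AVAILABLE_TOOLS
    if unknown:
        raise ValueError(f"unknown tools: {sorted(unknown)}")
    missing = []
    for tool in step_config["tools"]:
        for key, value in step_config.get(tool, {}).items():
            if value == "REQUIRED":
                missing.append(f"{tool}.{key}")
    return missing
```

Only the sections of the tools listed under tools are checked, so the unset copywriter paths are ignored as long as only cnvkit is selected.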

Available Somatic Targeted CNV Callers

  • cnvkit