Somatic Targeted Seq. CNV Calling
Implementation of the somatic_targeted_seq_cnv_calling step
This step detects CNV events in cancer samples from targeted sequencing data (e.g., exomes or large panels). The wrapped tools start from the aligned reads (i.e., the output of ngs_mapping) and generate somatic CNV calls.
The wrapped tools implement different strategies. Some work “reference free” and use only the somatic BAM files as input. Some work in “matched cancer normal mode” and need both the cancer and the normal BAM files. Others need both the normal and the cancer BAM files plus a set of non-cancer BAM files as background.
Step Input
Somatic CNV calling for targeted sequencing starts off the aligned reads, i.e., ngs_mapping.
Step Output
Generally, the following links are generated in output/.
Note
Tool-Specific Output
As cnvkit is the only integrated tool at the moment, the output is tailored to the results of this tool. In the future, this section will contain “common” output and tool-specific output subsections.
{mapper}.cnvkit.{lib_name}-{lib_pk}/out/
{mapper}.cnvkit.{lib_name}-{lib_pk}.bed
{mapper}.cnvkit.{lib_name}-{lib_pk}.seg
{mapper}.cnvkit.{lib_name}-{lib_pk}.vcf.gz
{mapper}.cnvkit.{lib_name}-{lib_pk}.vcf.gz.tbi
{mapper}.cnvkit.{lib_name}-{lib_pk}/report
{mapper}.cnvkit.{lib_name}-{lib_pk}.diagram.pdf
{mapper}.cnvkit.{lib_name}-{lib_pk}.scatter.pdf
{mapper}.cnvkit.{lib_name}-{lib_pk}.heatmap.pdf
{mapper}.cnvkit.{lib_name}-{lib_pk}.heatmap.chr1.pdf
…
{mapper}.cnvkit.{lib_name}-{lib_pk}.scatter.chrX.pdf
{mapper}.cnvkit.{lib_name}-{lib_pk}/report
{mapper}.cnvkit.{lib_name}-{lib_pk}.breaks.txt
{mapper}.cnvkit.{lib_name}-{lib_pk}.genemetrics.txt
{mapper}.cnvkit.{lib_name}-{lib_pk}.gender.txt
{mapper}.cnvkit.{lib_name}-{lib_pk}.metrics.txt
{mapper}.cnvkit.{lib_name}-{lib_pk}.segmetrics.txt
For example:
output/
|-- bwa.cnvkit.P001-T1-DNA1-WES1-000007
|   `-- out
|       |-- bwa.cnvkit.P001-T1-DNA1-WES1-000007.bed
|       |-- bwa.cnvkit.P001-T1-DNA1-WES1-000007.seg
|       `-- bwa.cnvkit.P001-T1-DNA1-WES1-000007.vcf
|-- bwa.cnvkit.P002-T1-DNA1-WES1-000016
|   `-- report
|       |-- bwa.cnvkit.P002-T1-DNA1-WES1-000016.diagram.pdf
|       |-- bwa.cnvkit.P002-T1-DNA1-WES1-000016.heatmap.pdf
|       |-- bwa.cnvkit.P002-T1-DNA1-WES1-000016.scatter.pdf
|       |-- bwa.cnvkit.P002-T1-DNA1-WES1-000016.heatmap.chr1.pdf
|       |-- ...
|       `-- bwa.cnvkit.P002-T1-DNA1-WES1-000016.scatter.chrX.pdf
|-- bwa.cnvkit.P002-T1-DNA1-WES1-000016
|   `-- report
|       |-- bwa.cnvkit.P002-T1-DNA1-WES1-000016.breaks.txt
|       |-- bwa.cnvkit.P002-T1-DNA1-WES1-000016.genemetrics.txt
|       |-- bwa.cnvkit.P002-T1-DNA1-WES1-000016.gender.txt
|       |-- bwa.cnvkit.P002-T1-DNA1-WES1-000016.metrics.txt
|       `-- bwa.cnvkit.P002-T1-DNA1-WES1-000016.segmetrics.txt
[...]
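The naming scheme listed above can be sketched in Python. This is only an illustration of the pattern, not code from the pipeline: the helper name expected_cnvkit_outputs and its arguments are hypothetical, and the per-chromosome plots are left out because the chromosome list is elided in the listing.

```python
from pathlib import PurePosixPath

# Suffixes copied from the listing above, grouped by sub directory.
OUT_SUFFIXES = ("bed", "seg", "vcf.gz", "vcf.gz.tbi")
REPORT_SUFFIXES = (
    "diagram.pdf",
    "scatter.pdf",
    "heatmap.pdf",
    "breaks.txt",
    "genemetrics.txt",
    "gender.txt",
    "metrics.txt",
    "segmetrics.txt",
)


def expected_cnvkit_outputs(mapper, lib_name, lib_pk, base="output"):
    """Hypothetical helper: list the expected cnvkit result paths for one library.

    lib_pk is passed as a string so that zero padding (e.g. "000007") is kept.
    """
    name = f"{mapper}.cnvkit.{lib_name}-{lib_pk}"
    root = PurePosixPath(base) / name
    paths = [root / "out" / f"{name}.{suffix}" for suffix in OUT_SUFFIXES]
    paths += [root / "report" / f"{name}.{suffix}" for suffix in REPORT_SUFFIXES]
    return [str(path) for path in paths]


if __name__ == "__main__":
    for path in expected_cnvkit_outputs("bwa", "P001-T1-DNA1-WES1", "000007"):
        print(path)
```

Such a list can be used, for instance, to check after a run which of the expected links are present in output/.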
Note that the cnvetti tool doesn’t follow the snappy convention above: the tool name is followed by an underscore and the action, where the action is one of coverage, segment and postprocess.
For example, the output directory would contain a directory named bwa.cnvetti_coverage.P002-T1-DNA1-WES1-000016.
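As a small illustration of the cnvetti naming described above, the following Python sketch builds such a directory name. The function name cnvetti_dir_name is hypothetical and not part of the pipeline; only the three actions and the name pattern are taken from the text.

```python
# Actions named in the text above.
CNVETTI_ACTIONS = ("coverage", "segment", "postprocess")


def cnvetti_dir_name(mapper, action, library):
    """Hypothetical helper: build '{mapper}.cnvetti_{action}.{library}'."""
    if action not in CNVETTI_ACTIONS:
        raise ValueError(f"unknown cnvetti action: {action!r}")
    return f"{mapper}.cnvetti_{action}.{library}"


if __name__ == "__main__":
    print(cnvetti_dir_name("bwa", "coverage", "P002-T1-DNA1-WES1-000016"))
```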
Default Configuration
The default configuration is as follows.
# Default configuration somatic_targeted_seq_cnv_calling
step_config:
  somatic_targeted_seq_cnv_calling:
    tools: ['cnvkit'] # REQUIRED - available: 'cnvkit', 'copywriter', 'cnvetti_on_target' and 'cnvetti_off_target'
    path_ngs_mapping: ../ngs_mapping # REQUIRED
    cnvkit:
      path_target: REQUIRED # Usually ../panel_of_normals/output/cnvkit.target/out/cnvkit.target.bed
      path_antitarget: REQUIRED # Usually ../panel_of_normals/output/cnvkit.antitarget/out/cnvkit.antitarget.bed
      path_panel_of_normals: REQUIRED # Usually ../panel_of_normals/output/{mapper}.cnvkit.create_panel/out/{mapper}.cnvkit.panel_of_normals.cnn
      plot: True # Generate plots (very slow)
      min_mapq: 0 # [coverage] Minimum mapping quality score to count a read for coverage depth
      count: False # [coverage] Alternative counting algorithm
      gc_correction: True # [fix] Use GC correction
      edge_correction: True # [fix] Use edge correction
      rmask_correction: True # [fix] Use rmask correction
      # BCBIO uses
      # seg_method: haar
      # seg_threshold: 0.0001
      # -- OR
      # seg_method: cbs
      # seg_threshold: 0.000001
      segmentation_method: cbs # [segment] One of cbs, flasso, haar, hmm, hmm-tumor, hmm-germline, none
      segmentation_threshold: 0.000001 # [segment] Significance threshold (hmm methods: smoothing window size)
      drop_low_coverage: False # [segment, call, genemetrics] Drop very low coverage bins
      drop_outliers: 10 # [segment] Drop outlier bins (0 for no outlier filtering)
      smooth_cbs: True # [segment] Additional smoothing of CBS segmentation (WARNING- not the default value)
      center: "" # [call] Either one of mean, median, mode, biweight, or a constant log2 ratio value.
      filter: ampdel # [call] One of ampdel, cn, ci, sem (merging segments flagged with the specified filter), "" for no filtering
      calling_method: threshold # [call] One of threshold, clonal, none
      call_thresholds: "-1.1,-0.25,0.2,0.7" # [call] Thresholds for calling integer copy number
      ploidy: 2 # [call] Ploidy of sample cells
      purity: 0 # [call] Estimated tumor cell fraction (0 for discarding tumor cell purity)
      gender: "" # [call, diagram] Specify the chromosomal sex of all given samples as male or female. Guess when missing
      male_reference: False # [call, diagram] Create male reference
      diagram_threshold: 0.5 # [diagram] Copy number change threshold to label genes
      diagram_min_probes: 3 # [diagram] Min number of covered probes to label genes
      shift_xy: True # [diagram] Shift X & Y chromosomes according to sample sex
      breaks_min_probes: 1 # [breaks] Min number of covered probes for a break inside the gene
      genemetrics_min_probes: 3 # [genemetrics] Min number of covered probes to consider a gene
      genemetrics_threshold: 0.2 # [genemetrics] Min abs log2 change to consider a gene
      genemetrics_alpha: 0.05 # [genemetrics] Significance cutoff
      genemetrics_bootstrap: 100 # [genemetrics] Number of bootstraps
      segmetrics_alpha: 0.05 # [segmetrics] Significance cutoff
      segmetrics_bootstrap: 100 # [segmetrics] Number of bootstraps
      smooth_bootstrap: False # [segmetrics] Smooth bootstrap results
    copywriter:
      path_target_regions: REQUIRED # REQUIRED
      bin_size: 20000 # TODO: make actually configurable
      plot_genes: REQUIRED # Path to civic annotation
      genome: hg19 # Could be hg38 (consider setting prefix to 'chr' when using GRCh38.v1)
      features: EnsDb.Hsapiens.v75::EnsDb.Hsapiens.v75
      prefix: ''
      nThread: 8
    cnvetti_on_target:
      path_target_regions: REQUIRED # REQUIRED
    cnvetti_off_target:
      path_target_regions: REQUIRED # REQUIRED
      window_length: 20000
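Several entries in the default configuration carry the placeholder value REQUIRED and have to be replaced by real paths. The Python sketch below shows one way to list the placeholders that are still present in an already loaded configuration. It is an illustration only: the function find_required_placeholders and the example values (including the path /path/to/antitarget.bed) are hypothetical, and whether the pipeline performs a similar check itself is not stated here.

```python
def find_required_placeholders(config, prefix=""):
    """Hypothetical helper: return dotted keys whose value is still 'REQUIRED'."""
    missing = []
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Descend into nested sections such as 'cnvkit' or 'copywriter'.
            missing.extend(find_required_placeholders(value, path))
        elif value == "REQUIRED":
            missing.append(path)
    return missing


# Made-up excerpt of a step configuration, already parsed into a dict.
example = {
    "tools": ["cnvkit"],
    "path_ngs_mapping": "../ngs_mapping",
    "cnvkit": {
        "path_target": "REQUIRED",
        "path_antitarget": "/path/to/antitarget.bed",  # hypothetical path
        "path_panel_of_normals": "REQUIRED",
    },
}

if __name__ == "__main__":
    for key in find_required_placeholders(example):
        print(f"still unset: {key}")
```

Note that the sketch reports placeholders in every section, including sections of tools that are not listed under tools.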
Available Somatic Targeted CNV Caller
cnvkit