Somatic Gene Fusion Calling
Implementation of the somatic_gene_fusion_calling step
The somatic_gene_fusion_calling step detects gene fusions in cancer from RNA-seq data. The wrapped tools start from the raw RNA-seq reads and generate filtered lists of predicted gene fusions.
Step Input
Gene fusion calling starts from the raw RNA-seq reads, so the input is very similar to that of the ngs_mapping step.
See the Step Input section of the ngs_mapping step for more information.
Note
The step requires cancer_matched configuration and sample sheet files.
This requirement is unnecessary and might be dropped in the future.
Step Output
There is no standard for reporting gene fusions, so the output format differs between the implemented tools.
arriba returns two tab-separated files: arriba.<library name>.fusions.tsv and arriba.<library name>.discarded_fusions.tsv.gz.
Both files list the affected genes, the reads supporting the fusion, and a confidence level.
The discarded fusions file contains all candidate fusions that were discarded because of insufficient evidence.
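As a sketch of how the arriba output might be inspected downstream, the Python below builds the two file names for a library and summarises fusions by confidence level. It assumes the column names of the arriba v2 output format (a header starting with #gene1, plus gene2 and confidence); the helper names are hypothetical and not part of the pipeline, and the demo rows are made up.

```python
import csv
import gzip
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple


def arriba_output_paths(library_name: str) -> Dict[str, str]:
    """File names arriba writes for one library, following the naming described above."""
    return {
        "fusions": f"arriba.{library_name}.fusions.tsv",
        "discarded": f"arriba.{library_name}.discarded_fusions.tsv.gz",
    }


def parse_fusions(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Parse tab-separated arriba output; the header line provides the dict keys.

    ASSUMPTION: the header follows arriba v2 ('#gene1', 'gene2', ..., 'confidence', ...).
    """
    yield from csv.DictReader(lines, delimiter="\t")


def read_fusions(path: str) -> Iterator[Dict[str, str]]:
    """Open a plain or gzip-compressed arriba file and yield its rows."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        yield from parse_fusions(handle)


def confidence_counts(rows: Iterable[Dict[str, str]]) -> Counter:
    """Count fusions per confidence level (e.g. high, medium, low)."""
    return Counter(row["confidence"] for row in rows)


def high_confidence_pairs(rows: Iterable[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Return (gene1, gene2) for every fusion whose confidence is 'high'."""
    return [(row["#gene1"], row["gene2"]) for row in rows if row["confidence"] == "high"]


if __name__ == "__main__":
    # Made-up, minimal rows; real arriba files contain many more columns.
    demo = [
        "#gene1\tgene2\tconfidence\n",
        "GENE_A\tGENE_B\thigh\n",
        "GENE_C\tGENE_D\tlow\n",
    ]
    rows = list(parse_fusions(demo))
    print(dict(confidence_counts(rows)))  # {'high': 1, 'low': 1}
    print(high_confidence_pairs(rows))  # [('GENE_A', 'GENE_B')]
```

On real data, read_fusions(arriba_output_paths(name)["discarded"]) would read the gzip-compressed file the same way as the plain one.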
Default Configuration
The default configuration is as follows.
step_config:
  somatic_gene_fusion_calling:
    # Override data set configuration search paths for FASTQ files
    #path_link_in: ''
    #tools: # Options: 'fusioncatcher', 'jaffa', 'arriba', 'defuse', 'hera', 'pizzly', 'star_fusion'
    #  - fusioncatcher
    #  - jaffa
    #  - arriba
    #  - defuse
    #  - hera
    #  - pizzly
    #  - star_fusion
    #fusioncatcher:
    #  data_dir: # REQUIRED
    #  configuration: ''
    #  num_threads: 16
    #jaffa: {}
    #arriba:
    #
    #  # STAR path index (preferably 2.7.10 or later)
    #  path_index: # REQUIRED
    #
    #  # provided in the arriba distribution, see /fast/work/groups/cubi/projects/biotools/static_data/app_support/arriba/v2.3.0
    #  blacklist: ''
    #  known_fusions: ''
    #
    #  # can be set to the same path as known_fusions
    #  tags: ''
    #  structural_variants: ''
    #  protein_domains: ''
    #  num_threads: 8
    #  trim_adapters: false
    #  num_threads_trimming: 2
    #  star_parameters:
    #    - ' --outFilterMultimapNmax 50'
    #    - ' --peOverlapNbasesMin 10'
    #    - ' --alignSplicedMateMapLminOverLmate 0.5'
    #    - ' --alignSJstitchMismatchNmax 5 -1 5 5'
    #    - ' --chimSegmentMin 10'
    #    - ' --chimOutType WithinBAM HardClip'
    #    - ' --chimJunctionOverhangMin 10'
    #    - ' --chimScoreDropMax 30'
    #    - ' --chimScoreJunctionNonGTAG 0'
    #    - ' --chimScoreSeparation 1'
    #    - ' --chimSegmentReadGapMax 3'
    #    - ' --chimMultimapNmax 50'
    #defuse:
    #  path_dataset_directory: # REQUIRED
    #hera:
    #  path_index: # REQUIRED
    #  path_genome: # REQUIRED
    #pizzly:
    #  kallisto_index: # REQUIRED
    #  transcripts_fasta: # REQUIRED
    #  annotations_gtf: # REQUIRED
    #  kmer_size: 31
    #star_fusion:
    #  path_ctat_resource_lib: # REQUIRED
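Several keys in the configuration above are marked # REQUIRED. The Python sketch below is a hypothetical helper, not the pipeline's own validation: it lists which required keys are still unset for each tool enabled under tools. It assumes the step configuration (the mapping under somatic_gene_fusion_calling) has already been loaded into a dict, for example with PyYAML's yaml.safe_load, where an empty "key: # REQUIRED" entry becomes None.

```python
from typing import Dict, List, Mapping

# Keys marked "# REQUIRED" in the default configuration, per tool.
REQUIRED_KEYS: Dict[str, List[str]] = {
    "fusioncatcher": ["data_dir"],
    "jaffa": [],
    "arriba": ["path_index"],
    "defuse": ["path_dataset_directory"],
    "hera": ["path_index", "path_genome"],
    "pizzly": ["kallisto_index", "transcripts_fasta", "annotations_gtf"],
    "star_fusion": ["path_ctat_resource_lib"],
}


def missing_required_keys(step_config: Mapping) -> Dict[str, List[str]]:
    """Return {tool: [unset required keys]} for every enabled tool that is incomplete.

    Hypothetical helper; raises ValueError for a tool name not listed in the options.
    """
    missing: Dict[str, List[str]] = {}
    for tool in step_config.get("tools", []):
        if tool not in REQUIRED_KEYS:
            raise ValueError(f"unknown gene fusion caller: {tool!r}")
        # A tool section may be absent or empty (None after YAML loading).
        tool_config = step_config.get(tool) or {}
        absent = [key for key in REQUIRED_KEYS[tool] if not tool_config.get(key)]
        if absent:
            missing[tool] = absent
    return missing


if __name__ == "__main__":
    # arriba enabled, but its STAR index is not configured yet.
    print(missing_required_keys({"tools": ["arriba"], "arriba": {}}))  # {'arriba': ['path_index']}
```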
Available Gene Fusion Callers
arriba
The fusioncatcher implementation is broken. The tool's computational resource requirements are so large that it might not be advisable to try to re-enable it.
The status of defuse, hera, jaffa and pizzly is unknown; they are probably broken or not implemented.
The status of star_fusion is also unknown, but it apparently returns results fairly similar to those of arriba, though not quite as accurate. Therefore, arriba should be preferred.
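The caller status described above could be encoded as a small lookup, for example to warn when a configuration selects a caller other than arriba. This is a hypothetical Python sketch; the status labels only restate the text and are not read from the pipeline.

```python
from typing import Dict, List

# Status of each caller as described in this section (hypothetical encoding).
CALLER_STATUS: Dict[str, str] = {
    "arriba": "available",
    "fusioncatcher": "broken",
    "defuse": "unknown",
    "hera": "unknown",
    "jaffa": "unknown",
    "pizzly": "unknown",
    "star_fusion": "unknown",
}


def review_tools(tools: List[str]) -> List[str]:
    """Return one warning per selected tool that is not known to work."""
    warnings = []
    for tool in tools:
        status = CALLER_STATUS.get(tool, "not a supported caller")
        if status != "available":
            warnings.append(f"{tool}: {status}; arriba should be preferred")
    return warnings


if __name__ == "__main__":
    for warning in review_tools(["arriba", "star_fusion"]):
        print(warning)  # star_fusion: unknown; arriba should be preferred
```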