Somatic Gene Fusion Calling

Implementation of the somatic_gene_fusion_calling step

The somatic_gene_fusion_calling step detects gene fusions in cancer from RNA-seq data. The wrapped tools start from the raw RNA-seq reads and produce filtered lists of predicted gene fusions.

Step Input

Gene fusion calling starts from the raw RNA-seq reads. Thus, the input is very similar to that of the ngs_mapping step.

See Step Input for more information.

Note

The step requires a cancer_matched configuration and samplesheet files. This requirement is not strictly necessary and might be dropped in the future.

Step Output

There is no standard format for reporting gene fusions, so the output differs between the implemented tools.

arriba produces two tab-separated files: arriba.<library name>.fusions.tsv and the gzip-compressed arriba.<library name>.discarded_fusions.tsv.gz. Both files list the affected genes, the reads supporting each fusion and a confidence level. The discarded fusions file contains all fusion candidates that were discarded because of insufficient evidence.
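
For downstream processing, the arriba output can be loaded with the Python standard library alone. The following is a minimal sketch, not part of the pipeline: it assumes, following the arriba documentation, that the header line starts with '#gene1', that the columns split_reads1, split_reads2, discordant_mates and confidence are present, and that confidence takes the values low, medium or high (check this against the arriba version you run). The function names read_arriba_fusions and supporting_reads are hypothetical.

```python
import csv
import gzip
from pathlib import Path

# Assumed ordering of arriba's confidence levels.
CONFIDENCE_RANK = {"low": 0, "medium": 1, "high": 2}


def read_arriba_fusions(path, min_confidence="medium"):
    """Return arriba fusion records whose confidence is >= min_confidence.

    Hypothetical helper. Works for both the plain fusions.tsv and the
    gzip-compressed discarded_fusions.tsv.gz (chosen by file suffix).
    """
    threshold = CONFIDENCE_RANK[min_confidence]
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        if reader.fieldnames is None:  # empty file: no header, no records
            return []
        # Assumption: the first header field is '#gene1'; strip the '#'.
        reader.fieldnames = [name.lstrip("#") for name in reader.fieldnames]
        return [
            row
            for row in reader
            if CONFIDENCE_RANK.get(row["confidence"], -1) >= threshold
        ]


def supporting_reads(row):
    """Total supporting reads: split reads on both sides plus discordant mates."""
    return (
        int(row["split_reads1"])
        + int(row["split_reads2"])
        + int(row["discordant_mates"])
    )
```

For example, read_arriba_fusions("arriba.<library name>.fusions.tsv", min_confidence="high") keeps only high-confidence calls, and the same function reads the discarded fusions file with min_confidence="low" to inspect everything arriba filtered out.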

Default Configuration

The default configuration is as follows.

step_config:
  somatic_gene_fusion_calling:

    # Override data set configuration search paths for FASTQ files
    #path_link_in: ''
    #tools:                                            # Options: 'fusioncatcher', 'jaffa', 'arriba', 'defuse', 'hera', 'pizzly', 'star_fusion'
    #  - fusioncatcher
    #  - jaffa
    #  - arriba
    #  - defuse
    #  - hera
    #  - pizzly
    #  - star_fusion
    #fusioncatcher:
    #  data_dir:                                       # REQUIRED
    #  configuration: ''
    #  num_threads: 16
    #jaffa: {}
    #arriba:
    #
    #  # Path to the STAR index (preferably built with STAR 2.7.10 or later)
    #  path_index:                                     # REQUIRED
    #
    #  # provided in the arriba distribution, see /fast/work/groups/cubi/projects/biotools/static_data/app_support/arriba/v2.3.0
    #  blacklist: ''
    #  known_fusions: ''
    #
    #  # can be set to the same path as known_fusions
    #  tags: ''
    #  structural_variants: ''
    #  protein_domains: ''
    #  num_threads: 8
    #  trim_adapters: false
    #  num_threads_trimming: 2
    #  star_parameters:
    #    - ' --outFilterMultimapNmax 50'
    #    - ' --peOverlapNbasesMin 10'
    #    - ' --alignSplicedMateMapLminOverLmate 0.5'
    #    - ' --alignSJstitchMismatchNmax 5 -1 5 5'
    #    - ' --chimSegmentMin 10'
    #    - ' --chimOutType WithinBAM HardClip'
    #    - ' --chimJunctionOverhangMin 10'
    #    - ' --chimScoreDropMax 30'
    #    - ' --chimScoreJunctionNonGTAG 0'
    #    - ' --chimScoreSeparation 1'
    #    - ' --chimSegmentReadGapMax 3'
    #    - ' --chimMultimapNmax 50'
    #defuse:
    #  path_dataset_directory:                         # REQUIRED
    #hera:
    #  path_index:                                     # REQUIRED
    #  path_genome:                                    # REQUIRED
    #pizzly:
    #  kallisto_index:                                 # REQUIRED
    #  transcripts_fasta:                              # REQUIRED
    #  annotations_gtf:                                # REQUIRED
    #  kmer_size: 31
    #star_fusion:
    #  path_ctat_resource_lib:                         # REQUIRED
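
Each tool enabled under tools has keys marked # REQUIRED in the configuration above. A quick pre-flight check can catch missing entries before a run is launched. The sketch below is hypothetical: the REQUIRED_KEYS table and the missing_required helper are not part of the pipeline, the configuration is shown as the nested dictionary a YAML loader would return, and the STAR index path is a placeholder.

```python
# Hypothetical table of the keys marked "# REQUIRED" in the default configuration.
REQUIRED_KEYS = {
    "fusioncatcher": ["data_dir"],
    "jaffa": [],
    "arriba": ["path_index"],
    "defuse": ["path_dataset_directory"],
    "hera": ["path_index", "path_genome"],
    "pizzly": ["kallisto_index", "transcripts_fasta", "annotations_gtf"],
    "star_fusion": ["path_ctat_resource_lib"],
}


def missing_required(config):
    """List '<tool>.<key>' entries that are required but missing or empty."""
    step = config.get("step_config", {}).get("somatic_gene_fusion_calling", {})
    missing = []
    for tool in step.get("tools") or []:
        if tool not in REQUIRED_KEYS:
            raise ValueError(f"unknown gene fusion caller: {tool!r}")
        # A tool section left empty in YAML loads as None; treat it as {}.
        tool_cfg = step.get(tool) or {}
        for key in REQUIRED_KEYS[tool]:
            if not tool_cfg.get(key):
                missing.append(f"{tool}.{key}")
    return missing


# Minimal configuration enabling arriba only (placeholder path).
config = {
    "step_config": {
        "somatic_gene_fusion_calling": {
            "tools": ["arriba"],
            "arriba": {"path_index": "/path/to/star_index"},
        }
    }
}
print(missing_required(config))  # -> []
```

Only the tools actually listed under tools are checked, which matches the recommendation below to run arriba alone.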

Available Gene Fusion Callers

  • arriba

  • fusioncatcher: the implementation is broken. The tool's computational resource requirements are so large that it may not be advisable to try to re-enable it.

  • defuse, hera, jaffa and pizzly: their status is unknown; they are probably broken or not implemented.

  • star_fusion: its status is also unknown. It apparently returns results fairly similar to those of arriba, but less accurate, so arriba should be preferred.