Germline Targeted Seq. MEI Calling

Implementation of the targeted_seq_mei_calling step

The targeted_seq_mei_calling step takes the results of the ngs_mapping step (aligned reads in BAM format) as input and performs germline mobile element insertion (MEI) identification. The results are VCF files containing the identified mobile element insertions.

Stability

This step is considered experimental; use it at your own discretion.

Step Input

The MEI identification step uses Snakemake sub workflows to access the results of the ngs_mapping step.

Step Output

For all samples, MEI identification is performed on the primary DNA NGS libraries, separately for each configured read mapper and MEI identification tool. The name of the primary DNA NGS library is used as an identification token in the output file names.

For each read mapper, MEI tool, and sample, the following files will be generated:

  • {mapper}.{mei_tool}.{lib_name}.vcf.gz

  • {mapper}.{mei_tool}.{lib_name}.vcf.gz.md5

For example, for the library P001-N1-DNA1-WES1 mapped with bwa and processed with scramble, the output might look as follows:

output/
+-- bwa.scramble.P001-N1-DNA1-WES1
|   `-- out
|       |-- bwa.scramble.P001-N1-DNA1-WES1.vcf.gz
|       |-- bwa.scramble.P001-N1-DNA1-WES1.vcf.gz.md5
[...]
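
The naming pattern above can be sketched in a few lines of Python. This is only an illustration of how the mapper, MEI tool, and library name combine into output paths. The function name expected_output_files and the base argument are hypothetical and not part of the pipeline.

```python
from pathlib import PurePosixPath


def expected_output_files(mapper, mei_tool, lib_name, base="output"):
    """Return the two expected output paths for one mapper/tool/library.

    Hypothetical helper for illustration; it mirrors the pattern
    {mapper}.{mei_tool}.{lib_name}.vcf.gz documented above.
    """
    token = f"{mapper}.{mei_tool}.{lib_name}"
    out_dir = PurePosixPath(base) / token / "out"
    vcf = out_dir / f"{token}.vcf.gz"
    # The checksum file sits next to the VCF with an added .md5 suffix.
    return [str(vcf), str(vcf) + ".md5"]


paths = expected_output_files("bwa", "scramble", "P001-N1-DNA1-WES1")
for path in paths:
    print(path)
```

Calling the helper with the values from the tree above yields the two paths shown there, which can be handy when checking that a run produced everything it should.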

Global Configuration

Not applicable.

Default Configuration

The default configuration is as follows.

# Default configuration
step_config:
  targeted_seq_mei_calling:
    # Path to the ngs_mapping step
    path_ngs_mapping: ../ngs_mapping

    tools: [scramble]  # REQUIRED - available: 'scramble'

    scramble:
      blast_ref: null  # REQUIRED: path to FASTA reference with BLAST DB (`makeblastdb`)
      mei_refs: null  # OPTIONAL: MEI reference file (FASTA), if none provided will use default.
      n_cluster: 5  # OPTIONAL: minimum cluster size, depth of soft-clipped reads.
      mei_score: 50  # OPTIONAL: minimum MEI alignment score.
      indel_score: 80  # OPTIONAL: minimum INDEL alignment score.
      mei_polya_frac: 0.75  # OPTIONAL: minimum fraction of clipped length for calling polyA tail.
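
Because blast_ref defaults to null but is marked REQUIRED, a configuration that only copies the defaults is incomplete. The following is a minimal sketch of a pre-flight check on an already-loaded configuration dictionary. The names AVAILABLE_TOOLS and check_mei_config are hypothetical, and this is not the validation that the pipeline itself performs.

```python
# Assumption: the only available tool is 'scramble', as listed in the
# default configuration above.
AVAILABLE_TOOLS = {"scramble"}


def check_mei_config(config):
    """Return a list of problems found in the step configuration dict."""
    problems = []
    step = config.get("step_config", {}).get("targeted_seq_mei_calling", {})
    tools = step.get("tools") or []
    if not tools:
        problems.append("tools: at least one tool is required")
    for tool in tools:
        if tool not in AVAILABLE_TOOLS:
            problems.append(f"tools: unknown tool {tool!r}")
    # blast_ref is REQUIRED whenever scramble is selected.
    if "scramble" in tools and not (step.get("scramble") or {}).get("blast_ref"):
        problems.append("scramble.blast_ref: required path is not set")
    return problems


# A configuration that keeps the default blast_ref of null (None in Python).
default_like = {
    "step_config": {
        "targeted_seq_mei_calling": {
            "path_ngs_mapping": "../ngs_mapping",
            "tools": ["scramble"],
            "scramble": {"blast_ref": None, "n_cluster": 5},
        }
    }
}
print(check_mei_config(default_like))
```

For the dictionary shown, the check reports the unset blast_ref; once a path is supplied, the returned list is empty.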

Available MEI Identification Tools

The following germline MEI identification tool is currently available:

  • "Scramble"

Reports

Not applicable.

Parallel Execution

Not available.