Fusions / Structural Variants¶
vflank fusion builds the chimeric junction sequence for a fusion so a ddPCR
probe can span it.
Input¶
A tab-separated breakpoint table, columns matched by name (any order, extra columns ignored):
| Column | Meaning |
|---|---|
chr1, pos1, str1 |
breakpoint 1: chromosome, 1-based position, strand |
chr2, pos2, str2 |
breakpoint 2 |
name, sample |
optional labels |
str follows the iCallSV convention: 0 = plus/reference, 1 = minus/complement.
This is the format produced by iCallSV /
iAnnotateSV. Rename columns with
--chr1-col, --pos1-col, … if needed.
Run¶
vflank fusion run breakpoints.tsv \
--ref-genome GRCh37.fasta \
--genome-build hg19 \
--flank 200 \
--pop-source api \ # optional: mask common SNPs in the junction flanks
--output fusion_junctions.fasta
Add --ref-source api to fetch the reference from the UCSC API instead of a
local FASTA (then --ref-genome is optional) — see
Small Variants for the trade-offs.
How the junction is built¶
The fused product reads 5'→3' as partner1 + partner2 (no separator). With
L = --flank bases per side:
| partner 1 (ends at junction) | partner 2 (starts at junction) | |
|---|---|---|
| str = 0 | ref[pos−L+1 .. pos] (+) |
revcomp(ref[pos−L+1 .. pos]) |
| str = 1 | revcomp(ref[pos .. pos+L−1]) |
ref[pos .. pos+L−1] (+) |
The probe is designed to span junction_index (the FASTA header records it as
__j{index}). Junctions that run off a contig end are emitted but flagged.
Output¶
One raw + one Masked__ record per fusion (the {SAMPLE}__ prefix appears only
when a per-sample BAM consensus is used, mirroring vflank small):
Masking¶
The same masking applies to the junction flanks (a real upgrade over per-fusion scripts that ignore polymorphisms). Masking is computed in genomic space before any reverse-complement — see SNP Masking.