Recommended Usage¶

vflank exposes many flags, but a real run comes down to a few decisions. Good defaults cover the rest (--genome-build hg19, --flank 200, --pop-data genome, --af-threshold 0.001). This page gives the recommended command for each situation; reach for the CLI reference only when you need an override.

Which mode should I use?¶

flowchart TD
    START["Small variants (MAF)
or fusions (TSV)?"] --> Q1{Reference + gnomAD
already on disk?}
    Q1 -->|"No / one-off / hosted"| API["APIs
--ref-source api --pop-source api"]
    Q1 -->|"Yes / bulk / must be reproducible"| LOCAL["Local files
-r ref.fa -d gnomad/"]
    API --> Q2{Patient BAM
available?}
    LOCAL --> Q2
    Q2 -->|No| RUN["Run → raw + Masked FASTA"]
    Q2 -->|"Yes (best fidelity)"| BAM["add --bam-map samples.tsv"]
    BAM --> RUN
    RUN --> Q3{Feed a designer?}
    Q3 -->|Primer3| EMIT["add --emit-primer3 primers.txt"]
    Q3 -->|No| DONE["Done"]
    EMIT --> DONE

Use mouse to pan and zoom

The first split is the one that matters most — where the data comes from — and there is no single right answer; it depends on your situation.

The two recommended setups¶

Quick / hosted / small cohort Reproducible / bulk / offline

No local data, small inputs, or a hosted/one-off check. Fetch the reference and gnomAD over the network — nothing to download:

vflank small run variants.maf -g hg19 \
    --ref-source api --pop-source api

Use when: a handful of variants, an ad-hoc check, a CI/demo, or a hosted service with no reference on disk.

Zero setup; both builds supported.
Rate-limited (~1 req/s reference, ~10/min gnomAD) → small inputs only. Needs network, and the served data can change over time (not reproducible) — record the run for provenance.

Real cohort work, an HPC node, or anything that must be auditable and repeatable. Use a local indexed FASTA + local gnomAD VCFs:

vflank small run variants.maf \
    --ref-genome GRCh37.fasta \
    --pop-vcf-dir gnomad_v2.1.1/ \
    --genome-build hg19

Use when: more than a few dozen variants, a pipeline, or results you must reproduce later.

Offline, unlimited scale, reproducible (pin the data version); the build is sanity-checked against the FASTA's chr1 length.
Requires the FASTA (.fai alongside) and per-chromosome gnomAD VCFs on disk — see Installation and SNP Masking for download commands.

Mix and match

The two backends are independent: --ref-source and --pop-source can each be file or api. A common middle ground is a local FASTA with the gnomAD API for masking (-r ref.fa --pop-source api), or vice versa.

Add patient consensus (best assay fidelity)¶

When you have the patient's reads, build the flank from them so primers match the real template (catches private/rare variants gnomAD never saw). Add one flag to either setup above:

vflank small run variants.maf -r GRCh37.fasta -d gnomad_v2.1.1/ -g hg19 \
    --bam-map samples.tsv          # sample<TAB>bam_path

Output becomes one record per (variant, sample). See BAM Consensus.

Fusions¶

Identical decisions, different input (a breakpoint TSV):

vflank fusion run breakpoints.tsv -r GRCh37.fasta -g hg19 \
    --pop-source api               # optional: mask the junction flanks

Hand off to a designer¶

Add --emit-primer3 primers.txt to either command to also write a Primer3 Boulder-IO file alongside the FASTA — ready for probe/primer design. See Small Variants → Emit for Primer3.

The flags worth knowing¶

Everything else is an override with a sensible default. These are the ones you might actually change:

Flag	Default	Change it when
`--genome-build` / `-g`	`hg19`	your data is GRCh38 → `hg38`
`--flank` / `-f`	`200`	you need longer/shorter design windows
`--pop-data`	`genome`	masking near exons → `both` (genome ∪ exome)
`--af-threshold`	`0.001`	you want stricter/looser SNP masking
`--report`	off	you want a per-variant TSV + skip breakdown

The remaining flags (MAF column remapping, BAM-consensus tuning, sample filtering) are grouped into panels in vflank small run --help — you rarely need them.