Skip to content

Recommended Usage

vflank exposes many flags, but a real run comes down to a few decisions. Good defaults cover the rest (--genome-build hg19, --flank 200, --pop-data genome, --af-threshold 0.001). This page gives the recommended command for each situation; reach for the CLI reference only when you need an override.

Which mode should I use?

flowchart TD
    START["Small variants (MAF)
or fusions (TSV)?"] --> Q1{Reference + gnomAD
already on disk?} Q1 -->|"No / one-off / hosted"| API["APIs
--ref-source api --pop-source api"] Q1 -->|"Yes / bulk / must be reproducible"| LOCAL["Local files
-r ref.fa -d gnomad/"] API --> Q2{Patient BAM
available?} LOCAL --> Q2 Q2 -->|No| RUN["Run → raw + Masked FASTA"] Q2 -->|"Yes (best fidelity)"| BAM["add --bam-map samples.tsv"] BAM --> RUN RUN --> Q3{Feed a designer?} Q3 -->|Primer3| EMIT["add --emit-primer3 primers.txt"] Q3 -->|No| DONE["Done"] EMIT --> DONE
Use mouse to pan and zoom

The first split is the one that matters most — where the data comes from — and there is no single right answer; it depends on your situation.

No local data, small inputs, or a hosted/one-off check. Fetch the reference and gnomAD over the network — nothing to download:

vflank small run variants.maf -g hg19 \
    --ref-source api --pop-source api

Use when: a handful of variants, an ad-hoc check, a CI/demo, or a hosted service with no reference on disk.

  • Zero setup; both builds supported.
  • Rate-limited (~1 req/s reference, ~10/min gnomAD) → small inputs only. Needs network, and the served data can change over time (not reproducible) — record the run for provenance.

Real cohort work, an HPC node, or anything that must be auditable and repeatable. Use a local indexed FASTA + local gnomAD VCFs:

vflank small run variants.maf \
    --ref-genome GRCh37.fasta \
    --pop-vcf-dir gnomad_v2.1.1/ \
    --genome-build hg19

Use when: more than a few dozen variants, a pipeline, or results you must reproduce later.

  • Offline, unlimited scale, reproducible (pin the data version); the build is sanity-checked against the FASTA's chr1 length.
  • Requires the FASTA (.fai alongside) and per-chromosome gnomAD VCFs on disk — see Installation and SNP Masking for download commands.

Mix and match

The two backends are independent: --ref-source and --pop-source can each be file or api. A common middle ground is a local FASTA with the gnomAD API for masking (-r ref.fa --pop-source api), or vice versa.

Add patient consensus (best assay fidelity)

When you have the patient's reads, build the flank from them so primers match the real template (catches private/rare variants gnomAD never saw). Add one flag to either setup above:

vflank small run variants.maf -r GRCh37.fasta -d gnomad_v2.1.1/ -g hg19 \
    --bam-map samples.tsv          # sample<TAB>bam_path

Output becomes one record per (variant, sample). See BAM Consensus.

Fusions

Identical decisions, different input (a breakpoint TSV):

vflank fusion run breakpoints.tsv -r GRCh37.fasta -g hg19 \
    --pop-source api               # optional: mask the junction flanks

Hand off to a designer

Add --emit-primer3 primers.txt to either command to also write a Primer3 Boulder-IO file alongside the FASTA — ready for probe/primer design. See Small Variants → Emit for Primer3.

The flags worth knowing

Everything else is an override with a sensible default. These are the ones you might actually change:

Flag Default Change it when
--genome-build / -g hg19 your data is GRCh38hg38
--flank / -f 200 you need longer/shorter design windows
--pop-data genome masking near exons → both (genome ∪ exome)
--af-threshold 0.001 you want stricter/looser SNP masking
--report off you want a per-variant TSV + skip breakdown

The remaining flags (MAF column remapping, BAM-consensus tuning, sample filtering) are grouped into panels in vflank small run --help — you rarely need them.