
Title: Quality Metrics — Canonical Source
Version: 0.1.0-draft
Status: Draft
Owner: Quality Lead
Last Reviewed: 2026-05-06
Next Review: 2026-08-06

Quality Metrics — Canonical Source

This document is the single source of truth for the platform's in-scope quality numbers. Any other dossier doc (PHASE_1_READINESS.md, customer/PILOT_AGREEMENT.md, customer/DISCLAIMERS.md, validation reports) that cites quality numbers must either inline these values verbatim or reference this file by name. When the numbers tighten or shift, the canonical update happens here first; downstream docs are synchronized as a single change-controlled act per sops/CHANGE_CONTROL.md.

Citing rule. No quality number from this doc may be cited in a customer-facing context without the accompanying intended-use disclaimer from intended-use/INTENDED_USE.md §1 and the reportable-range exclusion from §1.3. The numbers are conditional on the platform staying inside its locked intended use.


1. Headline numbers (HG002 30x; Parabricks 4.7.0-1; DeepVariant)

These are the substrate-baseline numbers for 0.1.0-substrate per technical/PIPELINE_LOCK.md §1.

1.1 Against GIAB v4.2.1 truth (full benchmark BED, no exclusion)

| Metric | Value | Source artifact |
|---|---|---|
| Aggregate F1 | 0.9954 | data/hg002_30x/output/benchmark_deepvariant_v4_2_1/summary.txt |
| Missed truth variants (FN total) | 30,084 | benchmark_deepvariant_v4_2_1/fn.vcf.gz |
| SNP F1 | (split not pinned at 30x; aggregate dominates) | RTG vcfeval snp_roc.tsv.gz |
| Indel F1 | (split not pinned at 30x; aggregate dominates) | RTG vcfeval non_snp_roc.tsv.gz |

Suitable for: Phase 1 pilot positioning ("credible for pilot work"). NOT suitable for: clinical-quality claims that exceed industry standards (per intended-use/QUALITY_CLAIMS.md F-009).

1.2 Against GIAB v5.0q truth (full benchmark BED, NO exclusion)

These are the raw v5.0q numbers — they look bad because v5.0q is an assembly-based truth set that asserts truth in regions where the caller architecture has known limits. Do not cite these values without the §1.3 in-scope numbers in the same sentence.

| Metric | Value | Source artifact |
|---|---|---|
| SNP F1 | 0.9906 | benchmark_deepvariant_v5_0q/snp_roc.tsv.gz |
| Indel F1 | 0.9408 | benchmark_deepvariant_v5_0q/non_snp_roc.tsv.gz |
| Missed truth variants (FN total) | 121,994 | benchmark_deepvariant_v5_0q/fn.vcf.gz |

1.3 Against GIAB v5.0q truth, in-scope complement (after exclusion BED)

This is the headline clinical-quality posture. The exclusion BED is empirically constructed to capture the v5.0q-specific truth content that the caller architecture cannot meet: (alldifficultregions minus MHC) ∪ chrX/Y non-PAR/XTR/ampliconic. PAR remains in scope, and MHC was lifted to in-scope per ADR-0006 on 2026-05-11. See investigations/V5_0Q_GAP_ANALYSIS.md v0.3.0+ §5.10 for the full per-stratum decomposition and decisions/0006-mhc-exclusion-lift.md for the MHC-lift rationale.

| Metric | Value | Source artifact |
|---|---|---|
| In-scope SNP F1 | 0.9993 (arithmetic est.; hap.py confirmation pending) | per-stratum decomposition + ADR-0006 |
| In-scope Indel F1 | 0.9959 (arithmetic est.; hap.py confirmation pending) | per-stratum decomposition + ADR-0006 |
| Exclusion BED capture | 118,748 of 121,994 FNs (97.3 %) | investigations/v5_0q_excluded_regions.bed |
| Exclusion BED region count | 4,571,604 merged intervals | same file |
| Exclusion BED coverage | 747,356,696 bp | same file |
| In-scope quality vs v4.2.1 aggregate | exceeds (0.9993 SNP and 0.9959 Indel vs 0.9954 aggregate) | comparison |
| In-scope range now includes MHC (HLA region) | yes; SNP F1 0.9897 / Indel F1 0.9498 in-stratum | V5_0Q_GAP_ANALYSIS.md §5.10; ADR-0006 |
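The three exclusion-BED rows (capture share, interval count, covered bp) are simple to recompute from the BED file and the FN counts. The following is a minimal sketch, not the platform's actual tooling: the helper names `bed_stats` and `capture_share` are hypothetical, and it assumes the BED is already merged (non-overlapping intervals), as the "merged intervals" wording suggests.

```python
from typing import Iterable, Tuple


def bed_stats(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (interval_count, covered_bp) for an already-merged BED.

    Assumption: intervals do not overlap, so summing lengths gives coverage.
    """
    count = 0
    covered = 0
    for line in lines:
        line = line.strip()
        # Skip blank lines and the optional header lines BED allows.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        _chrom, start, end = line.split()[:3]
        # BED coordinates are 0-based, half-open: length is end - start.
        covered += int(end) - int(start)
        count += 1
    return count, covered


def capture_share(captured_fn: int, total_fn: int) -> float:
    """Percentage of truth FNs that fall inside the exclusion BED."""
    return 100.0 * captured_fn / total_fn


if __name__ == "__main__":
    # Toy intervals for illustration only; not taken from the real BED.
    sample = ["chr1\t100\t200", "chr1\t500\t650", "chrX\t0\t10"]
    print(bed_stats(sample))
    # FN counts from the table above.
    print(f"{capture_share(118_748, 121_994):.1f} %")
```

Passing an open handle on investigations/v5_0q_excluded_regions.bed to `bed_stats` would be expected to return the interval count and bp coverage listed in the table, provided the file matches the pinned SHA-256 in §2.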

1.4 Per-stratum FN concentration (top 5)

| Rank | Stratum | Total FN | Share | SNP F1 | Indel F1 | v5.0q-only share |
|---|---|---|---|---|---|---|
| 1 | notinrefseq_cds | 121,385 | 99.5 % | 0.9896 | 0.9447 | 81.2 % |
| 2 | HG002_v4.2.1_complexandSVs_alldifficultregions | 120,562 | 98.8 % | 0.9646 | 0.9323 | 81.2 % |
| 3 | alldifficultregions | 118,859 | 97.4 % | 0.9521 | 0.9308 | 81.2 % |
| 4 | AllAutosomes | 115,893 | 95.0 % | 0.9899 | 0.9458 | 80.2 % |
| 5 | notinsegdups | 93,652 | 76.8 % | 0.9930 | 0.9475 | 82.5 % |

alldifficultregions is the dominant stratum and the one driving the exclusion BED design.
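The Share column can be cross-checked against the §1.2 FN total. A minimal sketch, assuming Share is each stratum's FN count divided by the 121,994 total (the name `fn_share_pct` is illustrative, not part of any existing script):

```python
TOTAL_FN = 121_994  # v5.0q missed truth variants, from §1.2

# Per-stratum FN counts copied from the §1.4 table.
STRATUM_FN = {
    "notinrefseq_cds": 121_385,
    "HG002_v4.2.1_complexandSVs_alldifficultregions": 120_562,
    "alldifficultregions": 118_859,
    "AllAutosomes": 115_893,
    "notinsegdups": 93_652,
}


def fn_share_pct(stratum_fn: int, total_fn: int = TOTAL_FN) -> float:
    """Share of all FNs that fall in one stratum, as a percentage."""
    return 100.0 * stratum_fn / total_fn


# Format to one decimal place, matching the table's precision.
shares = {name: f"{fn_share_pct(fn):.1f}" for name, fn in STRATUM_FN.items()}

if __name__ == "__main__":
    for name, share in shares.items():
        print(f"{name}: {share} %")
```

The strata overlap, so the shares are not expected to sum to 100 %.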


2. Provenance + integrity

The numbers above are reproducible. To verify them, recompute against the pinned artifacts:

| Artifact | SHA-256 |
|---|---|
| HG002 v4.2.1 truth VCF | adb4d4a5...e81175c (see technical/PIPELINE_LOCK.md §5.1) |
| HG002 v5.0q truth VCF | c7f9d7a4...f9c50dc8 (PIPELINE_LOCK.md §5.1.1) |
| Reference FASTA | 9cce8b92...8702b7 (PIPELINE_LOCK.md §4) |
| Per-stratum decomposition TSV | 2badc993243a8807abbe005c5b7800cbe26adacd5bfbfc24353a2c9a95f2383a |
| Exclusion BED (uncompressed; per ADR-0006 post-MHC-lift) | 7dc4d16b1d0eb1d171713bc272c9a3f3b881dddb1f305faba02dac25a3932c1c |
| Exclusion BED (uncompressed; pre-MHC-lift; historical, ADR-0004) | 3c079df0d7a2e40876c7e18a87e8a9d541ae63f18a026b76812df715523ae795 |
| GIAB v3.6 stratifications bundle | c5a1eceac54aac2c438af21825223d2a71e64b3db6b1c9e923849babb38063d8 |

Full SHA-256 manifest pins live in technical/PIPELINE_LOCK.md §4 (reference) and §5 (truth sets, exclusion).
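Checking an artifact against its pinned digest needs only the standard library. The sketch below is one way to do it, not the project's verification script: the function names are hypothetical, and the example path assumes the working directory is the dossier root. Only the full (untruncated) digests from the table can be checked this way; the abbreviated ones must be read from PIPELINE_LOCK.md.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so multi-GB artifacts fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex: str) -> bool:
    """True if the file's digest equals the pinned value (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Assumed location relative to the dossier root; adjust as needed.
    bed = Path("investigations/v5_0q_excluded_regions.bed")
    pinned = "7dc4d16b1d0eb1d171713bc272c9a3f3b881dddb1f305faba02dac25a3932c1c"
    if bed.exists():
        print("exclusion BED matches pin:", verify(bed, pinned))
    else:
        print("exclusion BED not found at", bed)
```

The table pins the uncompressed BED, so a gzipped copy would need to be decompressed before hashing.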


3. How these numbers change

The numbers in §1 update on any of:

  1. Pipeline version bump (Parabricks image, DeepVariant model, reference, parameters) — see sops/CHANGE_CONTROL.md. Material changes per PIPELINE_LOCK.md §6 trigger revalidation; new numbers land here as part of the revalidation report.
  2. Truth-set update (GIAB v5.0q → v5.x, or v4.2.1 → newer Q-suffix release). New truth-set SHA-256 lands in PIPELINE_LOCK.md §5; gap analysis re-runs in V5_0Q_GAP_ANALYSIS.md; new numbers here.
  3. Exclusion BED revision (adding or removing strata from the exclusion). This is a clinical-claim-affecting change and requires a written customer acknowledgement before going live.
  4. New benchmark cells (40x and 50x v5.0q HG002, currently pending GPU compute; see validation/PROTOCOL_GIAB.md §6.2). Coverage slope (H5) is non-gating but is expected to tighten the in-scope SNP residual of ~0.07 % (1 − 0.9993 = 0.0007, per §1.3), which is expected to halve at 50x.
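The residual arithmetic in item 4 can be reproduced exactly with decimal arithmetic, which avoids binary floating-point noise in the fourth decimal place. This is an illustrative sketch only: the `residual` helper is hypothetical, the input is the §1.3 in-scope SNP F1, and the halving at 50x is the document's stated expectation (H5), not a measured result.

```python
from decimal import Decimal


def residual(f1: str) -> Decimal:
    """Residual error 1 - F1, computed exactly from a decimal string."""
    return Decimal(1) - Decimal(f1)


IN_SCOPE_SNP_F1 = "0.9993"  # from §1.3 (arithmetic estimate)

current = residual(IN_SCOPE_SNP_F1)
# Hypothesis H5: the residual halves at 50x. Expectation, not a measurement.
halved = current / 2
implied_f1_at_50x = Decimal(1) - halved

if __name__ == "__main__":
    print(f"residual now: {current} ({current * 100} %)")
    print(f"residual if halved: {halved}")
    print(f"implied in-scope SNP F1 at 50x: {implied_f1_at_50x}")
```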

When any of these triggers fires, the change-control sequence is:

  1. Run revalidation per the relevant protocol.
  2. Update this file (QUALITY_METRICS.md) with new values, bump front-matter version (e.g. 0.1.0-draft → 0.2.0-draft), and update Last Reviewed.
  3. Sync downstream docs in a single PR:
     - PHASE_1_READINESS.md §2 / §4
     - customer/PILOT_AGREEMENT.md (success-criteria block)
     - customer/DISCLAIMERS.md (quality-claim posture block)
     - Any open validation/VALIDATION_REPORT_*.md instances
  4. Customer notification per customer/RELEASE_NOTES_TEMPLATE.md if the change is material.
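The version bump in step 2 can be scripted so the front-matter string is never edited by hand. A minimal sketch, assuming the convention implied by the single example given (increment the minor component, reset patch to 0, keep the `-draft` suffix); `bump_minor` is a hypothetical helper, and sops/CHANGE_CONTROL.md remains the authority on which component to bump.

```python
import re

# MAJOR.MINOR.PATCH with an optional "-suffix" such as "-draft".
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(-[0-9A-Za-z.-]+)?$")


def bump_minor(version: str) -> str:
    """Increment the minor component, reset patch, preserve any suffix."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, _patch, suffix = match.groups()
    return f"{major}.{int(minor) + 1}.0{suffix or ''}"


if __name__ == "__main__":
    print(bump_minor("0.1.0-draft"))
```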

4. What MAY be cited in customer-facing material

Always paired with the intended-use disclaimer (INTENDED_USE.md §1) and the reportable-range exclusion (INTENDED_USE.md §1.3 + the v5_0q_excluded_regions.bed reference):

What MUST NOT be cited (per intended-use/QUALITY_CLAIMS.md F-009 and related Forbidden rows):


5. Pending non-gating numbers

These would tighten or extend the headline once they land but do NOT gate Phase 1:


6. Changelog

| Date | Change | Authority |
|---|---|---|
| 2026-05-06 | Initial canonical metrics doc populated from V5_0Q_GAP_ANALYSIS.md v0.3.0+ §5.10 per-stratum decomposition and benchmark_deepvariant_v4_2_1/summary.txt. | Quality Lead |