
Reading DVH Curves: What Plan Quality Metrics Actually Tell You

Yuto Kimura, Founder & CEO, Airato. December 17, 2025

The dose-volume histogram is the most compressed representation of a treatment plan that still contains clinically actionable information. A single DVH curve collapses a structure's three-dimensional dose distribution into one curve, with volume fraction on the y-axis and dose on the x-axis, and discards where in the structure each dose is delivered. From that curve, a physicist or oncologist can read the metrics that determine whether a plan meets protocol objectives. Understanding what those metrics actually mean, and what their deviations signal, is foundational to productive plan evaluation.

Cumulative vs. Differential DVH: Reading the Right Curve

There are two DVH representations in common use: cumulative and differential. The cumulative DVH, the standard in treatment plan evaluation, shows for each dose level D the fraction of the structure's volume that receives at least D. The differential DVH shows the volume fraction receiving a dose within a small bin around D. It resembles a probability distribution of voxel doses and is less common in clinical plan review, though it can show OAR hotspots more clearly than the cumulative form.
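To make the two representations concrete, here is a minimal Python sketch that builds both from a list of voxel doses. It assumes every voxel in the structure has the same volume, so volume fractions are voxel counts divided by the total, and it uses a fixed bin width for the differential form. The function names and the toy doses are illustrative and do not come from any planning system.

```python
"""Cumulative and differential DVH from a list of voxel doses (sketch).

Assumption: every voxel in the structure has the same volume, so a volume
fraction is simply a voxel count divided by the total number of voxels.
"""


def cumulative_dvh(doses_gy, dose_levels_gy):
    """Fraction of voxels receiving at least each dose level."""
    n = len(doses_gy)
    return [sum(1 for d in doses_gy if d >= level) / n for level in dose_levels_gy]


def differential_dvh(doses_gy, bin_width_gy):
    """Fraction of voxels whose dose falls in each bin [k*w, (k+1)*w)."""
    n = len(doses_gy)
    n_bins = int(max(doses_gy) // bin_width_gy) + 1
    counts = [0] * n_bins
    for d in doses_gy:
        counts[int(d // bin_width_gy)] += 1
    return [c / n for c in counts]


if __name__ == "__main__":
    # Toy structure with ten equal-volume voxels (invented doses in Gy).
    doses = [5.0, 12.0, 18.0, 22.0, 25.0, 31.0, 40.0, 44.0, 48.0, 52.0]
    levels = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]
    print(cumulative_dvh(doses, levels))
    print(differential_dvh(doses, 10.0))
```

Summing the differential bins from a given bin edge upward reproduces the cumulative value at that dose. In other words, the cumulative curve is the running tail sum of the differential histogram.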

When a protocol specifies "V20 < 30%" for lung, it is specifying a point on the cumulative DVH: the volume receiving at least 20 Gy should be less than 30% of the total lung volume. When a protocol specifies "D95 > 95% of prescription dose", it is specifying that the dose received by at least 95% of the PTV volume (equivalently, the dose below which only the coldest 5% of the PTV falls) should be at least 95% of the prescription dose. These two notations, D_V and V_D, read the cumulative DVH from opposite axes and are the standard clinical shorthand.

The notation matters because it is asymmetric. D_V is a dose at a given volume percentage, so D95 is expressed in Gy or as a percentage of prescription. V_D is a volume percentage at a given dose, so V20 is expressed as a percentage of the structure's volume. Confusing the two in plan review causes clinical errors more often than it should, especially in departments where physicists and oncologists review plans together without shared training on the shorthand.
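Code makes the asymmetry easy to see. The sketch below again assumes equal voxel volumes and uses hypothetical function names. It computes V_D as a percentage of volume and D_V as a dose in Gy. The toy lung and PTV dose lists are invented for illustration.

```python
import math


def v_at_dose(doses_gy, dose_gy):
    """V_D: percentage of the structure's volume receiving at least dose_gy."""
    return 100.0 * sum(1 for d in doses_gy if d >= dose_gy) / len(doses_gy)


def d_at_volume(doses_gy, volume_pct):
    """D_V: highest dose that at least volume_pct % of the structure receives.

    With voxels sorted hottest to coldest, this is the dose of the voxel at
    which the hottest volume_pct % of the structure has been accumulated.
    """
    hot_to_cold = sorted(doses_gy, reverse=True)
    k = max(1, math.ceil(volume_pct * len(hot_to_cold) / 100.0))
    return hot_to_cold[k - 1]


if __name__ == "__main__":
    lung = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 15.0, 18.0, 21.0, 25.0]
    ptv = [57.0, 59.5] + [60.0 + 0.2 * i for i in range(18)]
    print(f"Lung V20 = {v_at_dose(lung, 20.0):.1f}% of volume")  # a volume
    print(f"PTV D95  = {d_at_volume(ptv, 95.0):.1f} Gy")         # a dose
```

Assuming a 60 Gy prescription, the toy PTV's D95 of 59.5 Gy is about 99% of prescription. That satisfies a "D95 > 95%" objective but would not satisfy a stricter "D95 at or above prescription" one.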

PTV Coverage Metrics: D95, D98, D2

PTV coverage evaluation typically centers on three metrics. D95, the minimum dose received by 95% of the PTV, is the primary coverage metric in most IMRT protocols. When D95 equals the prescription dose, at least 95% of the planning target volume receives the prescribed dose. The remaining 5% may be under-covered, which is clinically acceptable only if the under-covered region lies at the PTV margin rather than near the center of the target.

D98 tightens this to the coldest 2% of the PTV. If D98 falls below prescription, that entire 2% is under-covered, and the physicist needs to examine where it sits. A 2% cold spot in the lateral margin of a prostate PTV, adjacent to a bowel loop that constrains dose from one direction, may be clinically acceptable. The same 2% cold spot at the seminal vesicle tip in a patient with T3 disease is a different clinical situation entirely.

D2, the minimum dose received by the hottest 2% of the PTV volume and often used as the near-maximum dose, is the hotspot metric. ICRU report guidance specifies that hotspots within the PTV should generally not exceed 107% of the prescription dose. Significant violations, such as D2 above 110%, can indicate problems with beam arrangement, optimizer artifacts, or anatomy-driven dose accumulation that warrant clinical review. In practice, head-and-neck IMRT plans for targets adjacent to bone frequently show D2 values of 105–108% because of electron density effects at tissue interfaces. Telling when this is physically expected and when it is algorithmically problematic requires dosimetric context that pure metric review cannot provide.
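The three coverage metrics can be combined into a small report. This sketch assumes equal voxel volumes and expresses each metric as a percentage of prescription. It uses the 107% and 110% hotspot levels from the text. The triggers that flag D95 or D98 below 100% are illustrative assumptions, because protocols set their own coverage targets. The function names and toy doses are hypothetical.

```python
import math


def d_at_volume(doses_gy, volume_pct):
    """D_V: highest dose received by at least volume_pct % of the structure."""
    hot_to_cold = sorted(doses_gy, reverse=True)
    k = max(1, math.ceil(volume_pct * len(hot_to_cold) / 100.0))
    return hot_to_cold[k - 1]


def ptv_coverage_report(doses_gy, prescription_gy):
    """Return D95, D98 and D2 as % of prescription, plus flags for review."""
    metrics = {
        name: 100.0 * d_at_volume(doses_gy, volume_pct) / prescription_gy
        for name, volume_pct in (("D95", 95.0), ("D98", 98.0), ("D2", 2.0))
    }
    flags = []
    # Illustrative coverage triggers (assumption): protocols define their own.
    if metrics["D95"] < 100.0:
        flags.append("D95 below prescription: locate the cold 5%")
    if metrics["D98"] < 100.0:
        flags.append("D98 below prescription: locate the cold 2%")
    # Hotspot levels (107%, 110%) as discussed in the text.
    if metrics["D2"] > 110.0:
        flags.append("D2 above 110%: hotspot warrants clinical review")
    elif metrics["D2"] > 107.0:
        flags.append("D2 above 107%: exceeds the usual hotspot guideline")
    return metrics, flags


if __name__ == "__main__":
    # Toy 50-voxel PTV with a 60 Gy prescription (invented doses).
    ptv = [55.0, 58.0] + [60.0] * 45 + [63.0, 64.0, 65.0]
    metrics, flags = ptv_coverage_report(ptv, 60.0)
    for name, value in metrics.items():
        print(f"{name}: {value:.1f}% of prescription")
    for flag in flags:
        print("FLAG:", flag)
```

A flag here only says that a threshold was crossed. Deciding whether the cold 2% sits at a bowel-constrained margin or at the seminal vesicle tip still requires looking at the dose distribution.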

OAR Mean Dose vs. Volume-Based Constraints

For OAR evaluation, the choice between mean dose and volume-based constraints (V20, V40, V60, etc.) reflects different dose-response assumptions. Mean dose is most appropriate for parallel organs, whose response tracks the average dose across the organ rather than the dose to any single point. This is a valid approximation for the salivary glands, where xerostomia severity correlates with mean parotid dose. It is a working approximation for the heart, where mean heart dose is widely reported but dose-volume correlations for cardiac toxicity are still being characterized in several ongoing datasets.

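Serial organs call for a different kind of constraint. In a serial organ, function depends on every segment, like links in a chain, so damage to a small volume can cause a complication no matter how much of the organ is spared. Volume-based constraints such as lung V20 suit the opposite case: parallel organs, where function is preserved as long as a sufficient volume remains undamaged. The spinal cord is the canonical serial OAR, because dose to any point above the tolerance threshold risks myelopathy regardless of what the rest of the cord receives. For the spinal cord, the relevant metric is a maximum-dose constraint, Dmax (or D0.1cc to reduce voxel-size sensitivity). Mean dose is clinically misleading for serial structures.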
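A toy cord shows the gap between mean dose and a near-maximum metric for a serial organ. This sketch assumes equal voxel volumes of `voxel_cc` (0.02 cc here, an invented value) and computes D0.1cc as the minimum dose within the hottest 0.1 cc. The doses are made up: a short segment runs close to a 45 Gy limit while most of the cord stays low. The helper names are hypothetical.

```python
import math


def mean_dose(doses_gy):
    """Volume-weighted mean dose, assuming equal voxel volumes."""
    return sum(doses_gy) / len(doses_gy)


def d_at_abs_volume(doses_gy, volume_cc, voxel_cc):
    """Minimum dose within the hottest volume_cc of a structure (e.g. D0.1cc).

    The small epsilon stops a ratio that lands a hair above an integer
    (floating-point rounding) from adding an extra voxel.
    """
    hot_to_cold = sorted(doses_gy, reverse=True)
    k = max(1, math.ceil(volume_cc / voxel_cc - 1e-9))
    k = min(k, len(hot_to_cold))  # cannot exceed the whole structure
    return hot_to_cold[k - 1]


if __name__ == "__main__":
    # Invented cord: 190 low-dose voxels plus a short high-dose segment.
    cord = [8.0] * 190 + [30.0, 35.0, 38.0, 40.0, 41.0, 42.0, 43.0, 43.5, 44.0, 44.2]
    print(f"mean   = {mean_dose(cord):.1f} Gy")                  # looks reassuring
    print(f"Dmax   = {max(cord):.1f} Gy")                         # near a 45 Gy limit
    print(f"D0.1cc = {d_at_abs_volume(cord, 0.1, 0.02):.1f} Gy")  # less voxel-sensitive
```

The mean comes out under 10 Gy, while the maximum is within 1 Gy of the limit. Only the maximum-type metrics reveal the near-tolerance segment.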

At a community oncology center in the Kyushu region, a physicist evaluating a head-and-neck plan noted that the protocol template specified both a spinal cord Dmax < 45 Gy limit and spinal cord mean dose as a secondary metric. The mean cord dose was 12 Gy, comfortably within range, but Dmax was 44.2 Gy, just under the hard limit. Focusing on the cord's mean dose without checking Dmax would have missed a near-tolerance situation that required a beam weight adjustment. The lesson: apply the metric type that matches the organ's biological response mechanism, and treat mean dose metrics for serial organs with skepticism.

Conformity Index and Gradient Index

Beyond DVH curves, plan quality metrics include the conformity index (CI). In its simplest form, the CI is the ratio of the prescription isodose volume to the PTV volume. A CI of 1.0 means the two volumes are equal in size. That is necessary for perfect conformality but not sufficient, because an isodose of the right size can still be shifted off the target. Values above 1.0 mean that more volume than the PTV receives the prescription dose, so normal tissue outside the target is irradiated to prescription dose. Values below 1.0 mean the prescription isodose is smaller than the PTV, so part of the target is necessarily under-covered. The RTOG and Paddick conformity indices are the most common formulations. The Paddick CI also accounts for target coverage by using the intersection of the prescription isodose volume and the PTV, rather than relying on volume matching alone.
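Voxel index sets are a simple way to show how the two formulations differ and why an RTOG CI of 1.0 does not guarantee conformality. This sketch assumes uniform voxels and represents the PTV and the prescription isodose volume (PIV) as Python sets of voxel indices. It uses the commonly cited forms RTOG CI = PIV / TV and Paddick CI = TV_PIV² / (TV × PIV), where TV_PIV is the overlap of the two volumes. The Paddick CI runs from 0 to 1, with 1 ideal. The function name and toy sets are hypothetical.

```python
def conformity_indices(ptv_voxels, piv_voxels, voxel_cc=1.0):
    """Return (RTOG CI, Paddick CI) for sets of voxel indices.

    RTOG CI    = PIV / TV
    Paddick CI = TV_PIV**2 / (TV * PIV), where TV_PIV is the PTV volume
                 covered by the prescription isodose.
    """
    tv = len(ptv_voxels) * voxel_cc
    piv = len(piv_voxels) * voxel_cc
    tv_piv = len(ptv_voxels & piv_voxels) * voxel_cc
    return piv / tv, tv_piv ** 2 / (tv * piv)


if __name__ == "__main__":
    ptv = set(range(0, 10))
    shifted = set(range(2, 12))    # right size, shifted off target
    oversized = set(range(0, 12))  # covers target plus 2 voxels of spill
    print(conformity_indices(ptv, shifted))
    print(conformity_indices(ptv, oversized))
```

The shifted isodose scores an RTOG CI of exactly 1.0 even though 20% of the target lies outside it. Its Paddick CI of 0.64 exposes the miss.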

The gradient index (GI) — the ratio of the half-prescription-dose isodose volume to the prescription-dose isodose volume — characterizes how sharply dose falls off beyond the prescription isodose. A lower GI indicates sharper falloff, which reduces dose to tissues immediately surrounding the target. In SBRT planning, where steep dose gradients are a primary dosimetric goal, GI is a critical metric that conformity index alone does not capture.
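Here is a minimal sketch of the gradient index as defined above. It assumes the dose list covers the whole calculation grid around the target, not just the PTV, because the 50% isodose extends beyond it. It also assumes that all voxels share one volume. The two toy dose distributions are invented: both have the same prescription isodose volume, but the second spreads intermediate dose more widely.

```python
def gradient_index(doses_gy, prescription_gy):
    """GI = V(>= 50% of Rx) / V(>= 100% of Rx), with equal voxel volumes."""
    v100 = sum(1 for d in doses_gy if d >= prescription_gy)
    v50 = sum(1 for d in doses_gy if d >= 0.5 * prescription_gy)
    if v100 == 0:
        raise ValueError("no voxel reaches the prescription dose")
    return v50 / v100


if __name__ == "__main__":
    rx = 50.0
    steep = [50.0] * 10 + [30.0] * 20 + [10.0] * 70    # tight 50% isodose
    shallow = [50.0] * 10 + [30.0] * 40 + [10.0] * 50  # wider 50% isodose
    print(gradient_index(steep, rx), gradient_index(shallow, rx))
```

Both toy plans would have identical conformity to the prescription isodose. Only the GI, 3.0 versus 5.0 here, separates the sharper falloff from the shallower one.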

We are not suggesting that any single metric provides a complete picture of plan quality — the DVH is a projection of the three-dimensional dose distribution, and no set of one-dimensional metrics fully characterizes a 3D plan. The practice of reviewing a set of key metrics (D95, D2, key OAR values, CI, GI for SBRT) as a structured checklist is necessary but not sufficient; it identifies plans that fail protocol criteria but does not substitute for visual dose inspection, which can reveal spatially localized dose concerns that aggregate metrics obscure.
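A structured checklist of this kind is easy to encode. Encoding it also makes the "necessary but not sufficient" point concrete, because the code can only compare numbers against limits. The sketch below is hypothetical: the `Check` class, the 3% near-limit margin, and the example values and limits are invented for illustration and are not taken from a protocol. The exception is the cord numbers, which are borrowed from the Kyushu example.

```python
from dataclasses import dataclass


@dataclass
class Check:
    """One checklist item; limits here are illustrative, not protocol values."""
    name: str
    value: float
    limit: float
    kind: str  # "max": value must not exceed limit; "min": value must reach limit

    def passed(self) -> bool:
        if self.kind == "max":
            return self.value <= self.limit
        return self.value >= self.limit

    def near_limit(self, margin_pct: float) -> bool:
        """True if the check passes but sits within margin_pct % of its limit."""
        gap = abs(self.limit - self.value)
        return self.passed() and gap <= margin_pct / 100.0 * abs(self.limit)


def review(checks, margin_pct=3.0):
    """Split checks into failures and near-limit passes for the reviewer."""
    failures = [c.name for c in checks if not c.passed()]
    near = [c.name for c in checks if c.near_limit(margin_pct)]
    return failures, near


if __name__ == "__main__":
    checks = [
        Check("PTV D95 (% Rx)", 99.2, 100.0, "min"),
        Check("PTV D2 (% Rx)", 106.0, 107.0, "max"),
        Check("Cord Dmax (Gy)", 44.2, 45.0, "max"),
        Check("Lung V20 (%)", 22.0, 30.0, "max"),
    ]
    failures, near = review(checks)
    print("fail:", failures)
    print("near limit:", near)
```

Nothing in this output says where a cold spot or hotspot sits. Visual dose inspection supplies exactly that information.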

How AI Planning Systems Surface DVH Information

One underappreciated aspect of AI-assisted planning is the change in how DVH information is presented during plan evaluation. In conventional planning, the physicist evaluates DVH metrics after the optimizer has converged — retrospectively checking whether the plan meets criteria. In AI-assisted planning workflows that provide predicted DVH envelopes alongside the planned curves, the physicist can see simultaneously what the plan achieves and what was predicted as achievable for this anatomy.

This comparison changes the diagnostic question from "does this plan meet protocol?" to "does this plan meet protocol, and is it achieving close to the geometrically achievable optimum for this patient?" A plan that meets all protocol constraints but is 8 Gy above the predicted achievable mean dose for the parotid is technically compliant but clinically suboptimal — the physics team could do better, and the AI prediction makes that visible.
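The comparison described here amounts to a per-metric difference between planned and predicted values. This sketch assumes the predicted achievable values are already available as numbers; how an AI system produces them is out of scope. It treats every OAR metric as lower-is-better and uses a hypothetical per-metric tolerance. The metric names and values are invented, with the 8 Gy parotid gap chosen to mirror the example above.

```python
def flag_suboptimal(planned, predicted, tolerance, default_tol=0.0):
    """Return {metric: excess} where planned exceeds predicted by more than tolerance.

    planned, predicted: dicts of OAR metric name -> value (lower is better).
    tolerance: dict of metric name -> allowed excess over the prediction.
    """
    flagged = {}
    for name, value in planned.items():
        if name not in predicted:
            continue  # no prediction available for this metric
        excess = value - predicted[name]
        if excess > tolerance.get(name, default_tol):
            flagged[name] = excess
    return flagged


if __name__ == "__main__":
    planned = {"Parotid L mean (Gy)": 26.0, "Cord D0.1cc (Gy)": 38.0}
    predicted = {"Parotid L mean (Gy)": 18.0, "Cord D0.1cc (Gy)": 37.0}
    tolerance = {"Parotid L mean (Gy)": 3.0, "Cord D0.1cc (Gy)": 2.0}
    print(flag_suboptimal(planned, predicted, tolerance))
```

A plan can pass every protocol check and still appear in this output. That is the distinction the section draws between compliance and achievable quality.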

The practical implication is that AI-predicted DVH envelopes function as a quality floor, not just a planning target. Departments using predicted DVH benchmarks as part of their plan review workflow develop a more granular understanding of plan quality variation across patients and planners — which, over time, drives upward convergence on dosimetric outcomes that protocol compliance metrics alone cannot capture.