Plan Quality Metrics: Conformity and Homogeneity in IMRT

April 18, 2026 · Airato Clinical Team

Every IMRT plan eventually comes down to numbers. Not impressions, not visual inspections of the dose wash, not the physicist's gut feeling after years at the console. Numbers. The conformity index, the homogeneity index, the gradient index — these are the objective language the field has settled on, and getting fluent with them matters more than most training programs acknowledge.

The Conformity Index Family

Three variants appear most often in the literature, and they are not interchangeable. Knowing which one a published benchmark is using changes whether your plan passes or fails on paper.

CI_RTOG is the original formulation from the RTOG radiosurgery protocols: the ratio of the total volume receiving the prescription dose or more (V_Rx) to the planning target volume (PTV). Simple, fast to compute, and present in virtually every treatment planning system. The acceptable range is typically 1.0 to 2.0 for SRS targets; for IMRT of larger volumes such as prostate, the acceptable band tightens to roughly 0.95 to 1.15. A value below 1.0 means the prescription isodose volume is smaller than the PTV, so it cannot cover the entire target. A value above 2.0 means the volume receiving the prescription dose is more than twice the PTV, with the excess receiving a dose that was meant only for the target.

Paddick CI corrects a real limitation of CI_RTOG. CI_RTOG does not account for where V_Rx falls relative to the PTV — a plan can place prescription dose entirely outside the PTV and still report a CI_RTOG of 1.0. The Paddick index penalizes that geometry: Paddick CI = (TV_PIV)² / (TV × PIV), where TV_PIV is the intersection of target volume and prescription isodose volume. Perfect conformity gives Paddick CI = 1; lower values indicate geographic miss or unnecessary spillover. For frameless SRS and ablative SBRT we find Paddick CI more clinically informative than CI_RTOG, precisely because those treatments tolerate smaller margins and less forgiving geometries.
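
A minimal sketch of that limitation, assuming volumes in cubic centimetres. The function names and the example volumes are illustrative assumptions, not taken from any TPS.

```python
def ci_rtog(v_rx: float, tv: float) -> float:
    """RTOG conformity index: prescription isodose volume over target volume."""
    return v_rx / tv


def paddick_ci(tv_piv: float, tv: float, piv: float) -> float:
    """Paddick conformity index: TV_PIV^2 / (TV * PIV)."""
    return tv_piv ** 2 / (tv * piv)


# Hypothetical geometry: a 10 cc target and a 10 cc prescription isodose volume.
tv, piv = 10.0, 10.0

# Case 1: the isodose volume sits exactly on the target (10 cc of overlap).
print(ci_rtog(piv, tv), paddick_ci(10.0, tv, piv))  # 1.0 1.0

# Case 2: same two volumes, but the isodose volume misses the target entirely.
print(ci_rtog(piv, tv), paddick_ci(0.0, tv, piv))   # 1.0 0.0
```

CI_RTOG cannot tell the two cases apart because it never looks at the overlap; the Paddick index drops to zero for the geographic miss.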

CI_van't Riet (usually called the conformation number, or CN) uses the same intersection volume and can be expressed as: CN = (TV_PIV / TV) × (TV_PIV / PIV). Multiplied out, the product is numerically equal to the Paddick expression; what the van't Riet form adds is the decomposition into two interpretable fractions: coverage of the target, and precision of the prescription volume. Some departments prefer this formulation because it tells you explicitly whether a deficit comes from undercoverage or from excess irradiation of surrounding tissue. Acceptable CN values cluster above 0.6 for most IMRT indications.
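
The decomposition can be sketched as follows. The function name, the term "selectivity" for the second fraction, and the two example plans are assumptions made for illustration.

```python
def conformation_number(tv_piv: float, tv: float, piv: float):
    """Return (coverage, selectivity, CN) for the van't Riet conformation number."""
    coverage = tv_piv / tv        # fraction of the target inside the prescription isodose
    selectivity = tv_piv / piv    # fraction of the prescription isodose inside the target
    return coverage, selectivity, coverage * selectivity


# Hypothetical plan A: the isodose lies inside the target but leaves 20% uncovered.
plan_a = conformation_number(tv_piv=40.0, tv=50.0, piv=40.0)

# Hypothetical plan B: full coverage, but 20% of the isodose volume is outside the target.
plan_b = conformation_number(tv_piv=50.0, tv=50.0, piv=62.5)

for label, (cov, sel, cn) in (("A", plan_a), ("B", plan_b)):
    print(f"plan {label}: coverage={cov:.2f} selectivity={sel:.2f} CN={cn:.2f}")
```

Both plans report CN = 0.80, but the two factors show that plan A has an undercoverage problem while plan B has a spillover problem.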

The practical question is which one your TPS reports and which one your protocol benchmarks assume. Using Paddick CI thresholds against a CI_RTOG output will produce systematic errors. Not subtle errors. Errors that matter clinically.

Homogeneity Index: The ICRU 83 Definition

While the CI family addresses the spatial extent of dose, the homogeneity index (HI) addresses internal dose distribution within the PTV. The ICRU Report 83 definition is now the standard for IMRT:

HI = (D_2% − D_98%) / D_50%

Here, D_2% is the minimum dose received by the hottest 2% of the PTV volume (the near-maximum), D_98% is the dose received by at least 98% of the PTV volume, so only the coldest 2% falls below it (the near-minimum), and D_50% is the median dose. A perfectly homogeneous plan yields HI = 0. In practice, acceptable HI thresholds depend on site and technique.
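
As a rough illustration, the ICRU 83 formula can be evaluated from a list of PTV voxel doses. This sketch assumes equal-volume voxels and takes each D_x% directly from the ranked list without interpolation; a TPS working from an interpolated DVH may report slightly different values. The function names and the synthetic dose list are hypothetical.

```python
def dose_at_volume(doses, percent: int) -> float:
    """D_x%: minimum dose received by the hottest `percent` of equal-volume voxels."""
    if not doses:
        raise ValueError("no voxel doses supplied")
    ranked = sorted(doses, reverse=True)
    # Ceiling of percent * n / 100 in integer arithmetic, at least one voxel.
    k = max(1, (percent * len(ranked) + 99) // 100)
    return ranked[k - 1]


def homogeneity_index(doses) -> float:
    """ICRU 83 homogeneity index: (D_2% - D_98%) / D_50%."""
    d2 = dose_at_volume(doses, 2)
    d98 = dose_at_volume(doses, 98)
    d50 = dose_at_volume(doses, 50)
    return (d2 - d98) / d50


# Synthetic 100-voxel PTV (doses in Gy): a small hot region and a small cold region.
ptv_doses = [82.0] * 2 + [78.0] * 95 + [74.0] * 3

print(round(homogeneity_index(ptv_doses), 3))  # (82 - 74) / 78, roughly 0.10
```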

For prostate IMRT, HI < 0.15 is a commonly cited benchmark. For head and neck IMRT, where the PTV is often irregular and adjacent to serially organized structures, HI < 0.20 reflects realistic planning constraints. Going beyond 0.20 in H&N typically means there is a hot spot (or, less often, a cold spot) inside the PTV that warrants attention, even though it lies entirely within the target boundary.

Worth stating plainly: HI and CI measure different dimensions of plan quality. A plan can have excellent conformity and poor homogeneity. It can have an ideal HI and completely inadequate conformity. We see both failure modes in clinical data.

Gradient Index

The gradient index (GI) quantifies how steeply dose falls off outside the prescription isodose surface. The commonly used formulation is GI = V_50%Rx / V_Rx — the ratio of the volume receiving half the prescription dose to the volume receiving the full prescription dose. A lower GI indicates a sharper dose gradient and less low-dose spillage into surrounding tissue.
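
A minimal sketch of the GI calculation from a dose grid, assuming equal-volume voxels so that the voxel size cancels in the ratio. The function name and the synthetic dose values are illustrative assumptions.

```python
def gradient_index(doses, rx_dose: float) -> float:
    """GI = V_50%Rx / V_Rx, counted over equal-volume voxels of the whole dose grid."""
    v_rx = sum(1 for d in doses if d >= rx_dose)
    v_half = sum(1 for d in doses if d >= 0.5 * rx_dose)
    if v_rx == 0:
        raise ValueError("no voxel reaches the prescription dose")
    return v_half / v_rx


# Synthetic grid (Gy): 20 voxels at prescription, 40 in the 50% shell, 100 low-dose voxels.
grid = [50.0] * 20 + [30.0] * 40 + [10.0] * 100

print(gradient_index(grid, rx_dose=50.0))  # 60 / 20 = 3.0
```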

GI becomes most relevant in SRS and SBRT contexts where the rationale for treatment often depends on achieving ablative doses within a small target while protecting adjacent critical structures. For IMRT of prostate or pelvic targets, GI is tracked but carries less weight in protocol-level evaluation than CI or HI. For SBRT of lung or spine, a GI above 3.5 would prompt a serious look at beam arrangement and optimization objectives.

When Each Metric Drives the Decision

Planners sometimes treat CI and HI as parallel quality checks that can be optimized independently. That framing causes avoidable problems.

CI is the primary criterion for OAR-sparing decisions. When the PTV is close to a serial structure — spinal cord, optic nerve, bowel — reducing CI means pulling the prescription isodose tighter, which means reducing dose margin around the PTV, which means accepting slightly reduced target coverage or increasing the number of beams. The tradeoff is real and deliberate.

HI governs hot-spot control. Hot spots arise from beam convergence and are often unavoidable in plans with complex geometry. But hot spots inside or adjacent to critical structures matter clinically. Pursuing a low CI without an explicit check on HI can push D_2% well above prescription dose in exactly the wrong anatomical locations. This is the trap.

Optimizing CI aggressively can drive HI out of the acceptable range. Not occasionally. Systematically, when CI constraints are tightened without explicit HI constraints in the optimization objective. We have seen this pattern repeatedly: a planner achieves an excellent CI_RTOG of 0.98, passes conformity review, and the HI is sitting at 0.23 for a prostate case where the protocol asks for < 0.15. The hot spot is buried in the interior of the PTV, away from the bladder and rectum, so it passes OAR review. It does not pass a rigorous plan quality review.
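
The pattern above is easy to catch when both metrics are checked in the same pass. This is a sketch only: the function name is hypothetical, and the default limits are the prostate figures quoted in this article, not universal standards.

```python
def review_plan(ci_rtog: float, hi: float,
                ci_band=(0.95, 1.15), hi_max: float = 0.15):
    """Return a list of findings; an empty list means both checks passed."""
    findings = []
    low, high = ci_band
    if not low <= ci_rtog <= high:
        findings.append(f"CI_RTOG {ci_rtog:.2f} outside {low:.2f} to {high:.2f}")
    if hi >= hi_max:
        findings.append(f"HI {hi:.2f} not below {hi_max:.2f}")
    return findings


# The prostate case described above: conformity passes, homogeneity does not.
print(review_plan(ci_rtog=0.98, hi=0.23))  # ['HI 0.23 not below 0.15']
```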

Acceptable Ranges and TG-218 Benchmarks

AAPM TG-218 establishes tolerance limits and action limits for patient-specific IMRT QA, and that two-tier framework provides a useful reference structure for plan quality metrics even beyond its gamma analysis context. The report distinguishes tolerance limits (the range within which a process is considered to be operating normally; results outside it trigger investigation and documentation) from action limits (results outside them require intervention before treatment).

For conformity, widely cited acceptable ranges are:

CI_RTOG: roughly 0.95 to 1.15 for IMRT of larger volumes such as prostate (wider for SRS: 1.0 to 2.0)
Paddick CI: ≥ 0.75 for SBRT targets; ≥ 0.65 for IMRT of larger volumes
CN (van't Riet): ≥ 0.60 for standard fractionation IMRT

For homogeneity by site:

Site	Technique	Acceptable HI
Prostate	IMRT (normofractionation)	< 0.15
Head and neck	IMRT (simultaneous integrated boost)	< 0.20
Lung SBRT	Ablative, 3-5 fractions	< 0.25
Breast (partial)	APBI	< 0.20
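
One way to keep site-specific limits like these applied consistently is to hold them in a single lookup rather than in each planner's memory. The sketch below copies the HI limits from the table above; the dictionary keys and the function name are hypothetical, and a department would substitute its own approved values.

```python
# HI upper limits from the table above, keyed by hypothetical site identifiers.
HI_LIMITS = {
    "prostate": 0.15,
    "head_and_neck": 0.20,
    "lung_sbrt": 0.25,
    "breast_apbi": 0.20,
}


def hi_acceptable(site: str, hi: float) -> bool:
    """True when the plan's HI is below the configured limit for the site."""
    if site not in HI_LIMITS:
        # Fail loudly rather than silently approving an unconfigured site.
        raise ValueError(f"no HI limit configured for site {site!r}")
    return hi < HI_LIMITS[site]


print(hi_acceptable("prostate", 0.12))       # True
print(hi_acceptable("head_and_neck", 0.23))  # False
```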

These are not universal standards — protocol documents from RTOG, JCOG, and institutional physics committees vary. What matters is that your department has defined thresholds, that those thresholds are applied consistently, and that exceptions are documented with clinical rationale rather than silently approved.

Automated vs. Manual Computation

Manual extraction of CI and HI from DVH reports is prone to transcription error and does not scale as the number of plans grows. Most modern treatment planning systems compute these metrics automatically, though the definitions vary by vendor and version. Eclipse reports a conformity number using a specific definition that may not match the Paddick or van't Riet formulations a protocol assumes. RayStation allows configurable objective functions but requires explicit setup to report standardized HI per ICRU 83. Pinnacle and Monaco have their own metric dashboards.

The risk of vendor heterogeneity is real. A department running multiple TPS platforms — common after system transitions or in multi-site groups — can end up comparing metrics that were computed with different formulas. Standardized external computation, whether through in-house scripting or a dedicated QA platform, removes this ambiguity. Consistent. Auditable. Required for multi-center trial participation where protocol compliance is verified centrally.
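
In an in-house script, one simple guard against mixing formulas is to store the definition alongside every value and refuse comparisons across definitions. This is a sketch under that assumption; the class, field names, and example values are hypothetical and do not describe any vendor's data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricValue:
    name: str        # metric family, e.g. "CI"
    definition: str  # formula used, e.g. "RTOG" or "Paddick"
    value: float


def difference(a: MetricValue, b: MetricValue) -> float:
    """Return a.value - b.value, but only when both use the same definition."""
    if (a.name, a.definition) != (b.name, b.definition):
        raise ValueError(
            f"cannot compare {a.name}/{a.definition} with {b.name}/{b.definition}"
        )
    return a.value - b.value


# Hypothetical values exported from two planning systems for the same plan.
site_one = MetricValue("CI", "RTOG", 1.10)
site_two = MetricValue("CI", "Paddick", 0.82)

try:
    difference(site_one, site_two)
except ValueError as err:
    print(err)  # cannot compare CI/RTOG with CI/Paddick
```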

How AIVOT Reports These Metrics

AIVOT computes CI_RTOG, Paddick CI, ICRU 83 HI, and GI for every plan it evaluates, using definitions that are fixed and version-controlled rather than dependent on the TPS configuration state. The values appear in the planner's review screen alongside color-coded status indicators that reference the department-configured acceptable ranges, not generic defaults. Yellow means approaching the action limit. Red means the plan requires documented justification or replanning before approval.
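
To make the traffic-light idea concrete, here is a generic sketch of a three-state check against a configured band. It is not AIVOT's implementation: the function name, the rule that "approaching" means within a fixed fraction of the band width, and the 10% default are all assumptions chosen for illustration.

```python
def status(value: float, low: float, high: float, warn_fraction: float = 0.10) -> str:
    """Classify a metric against an acceptable band [low, high]."""
    if value < low or value > high:
        return "red"      # outside the band: justification or replanning needed
    margin = warn_fraction * (high - low)
    if value < low + margin or value > high - margin:
        return "yellow"   # inside the band but close to one of its limits
    return "green"


# Hypothetical department band for CI_RTOG.
for ci in (1.05, 1.14, 1.20):
    print(ci, status(ci, low=0.95, high=1.15))
```

With this band the warning margin is about 0.02, so 1.05 is green, 1.14 is yellow, and 1.20 is red.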

In our audit of 240 IMRT plans processed through AIVOT across four community hospital sites, 91% fell within the CI acceptable band of 0.80 to 1.15. Of the 9% outside that range, roughly two-thirds were prostate plans from a single site where the planning protocol had not been updated to reflect a template change introduced the prior year. Protocol drift. Detectable only through systematic metric tracking.

The remaining one-third were anatomically complex cases — pelvic sidewall involvement, post-surgical anatomy — where deviation from the CI target was deliberate and documented. That distinction matters. A deviation with documentation is clinical judgment. A deviation without documentation is a quality gap.

The Takeaway

Numbers are not neutral. They encode specific assumptions about what good planning looks like, and those assumptions have to match across the definition your TPS uses, the benchmark your protocol cites, and the threshold your QA system flags. When those three are aligned, metrics work. When they drift apart, you can pass every review and still have a plan that falls short by the standard you intended to apply.

Conformity and homogeneity are not in tension by nature. They come into tension only when optimization drives one without accounting for the other. Set both as explicit objectives. Track both systematically. And when a plan deviates from either, make sure the documentation reflects a decision, not an oversight.