We ran the numbers last autumn. Twelve months of prostate plans, 87 cases, one institution. The mean rectal V70 had drifted 4.2 percentage points above the written protocol limit — not in any single plan dramatically enough to trigger peer review, but steadily, across the whole cohort. Nobody set out to loosen the constraint. It happened through accumulated small decisions: a planner copying a prior plan as a starting point, a physician approving a case that was close enough, a vendor upgrade that changed the optimizer's default weighting. The protocol document had not changed once.
That is protocol drift. And it is far more common than most departments realize.
Defining the Problem
Protocol drift is the divergence between what an institution's written planning protocol specifies and what is actually delivered to patients over time. It is not a matter of individual outliers; those are handled through incident learning. Drift is a population-level shift: the institutional distribution of key DVH metrics moves away from the protocol target, gradually, without any single case crossing a hard constraint.
Several mechanisms drive it. Dosimetrist turnover is the most common. When a senior planner leaves, the informal knowledge of why certain constraint values were chosen leaves with them. New planners inherit starting-point plans rather than first principles. Each generation of copy-from-previous carries the accumulated preferences of whoever last touched the case.
Vendor upgrades are the second major driver. Optimizer behavior changes across software versions. A constraint that reliably achieved mean parotid <26 Gy on version N may trend toward 28 Gy on version N+1 because the internal penalty weighting shifted. If nobody runs a before-and-after cohort comparison, the change is invisible until the next audit — if there is one.
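A before-and-after cohort comparison can be sketched in a few lines. The example below is a minimal illustration, not a validated statistical procedure: the function name `cohort_shift` is hypothetical, the dose values are invented, and Welch's t statistic is just one reasonable choice for comparing two cohort means.

```python
from math import sqrt, nan
from statistics import mean, stdev

def cohort_shift(before, after):
    """Return (mean shift, Welch t statistic) for one DVH metric across two cohorts."""
    shift = mean(after) - mean(before)
    # Standard error of the difference, allowing unequal variances (Welch).
    se = sqrt(stdev(before) ** 2 / len(before) + stdev(after) ** 2 / len(after))
    return shift, (shift / se if se > 0 else nan)

# Illustrative values only (mean parotid dose in Gy), not real patient data.
version_n = [24.8, 25.5, 26.1, 25.0, 25.9, 24.6]
version_n_plus_1 = [27.2, 28.1, 27.6, 28.4, 27.9, 28.0]

shift, t_stat = cohort_shift(version_n, version_n_plus_1)
print(f"mean shift: {shift:+.2f} Gy, Welch t: {t_stat:.1f}")
```

A large shift with a large t statistic is a prompt to look at the optimizer settings, not a verdict. With real cohorts, case mix between the two periods should be checked before attributing the change to the software.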
The third mechanism is physician preference layering. Attending physicians develop working relationships with specific dosimetrists. They approve plans that reflect their individual priorities. Over time, these preferences accumulate in the case archive without being formally incorporated into the protocol. The protocol says one thing; the living practice says another.
What an Annual Retrospective Audit Looks Like
The audit methodology is not complicated. Pull 12 months of completed plans by disease site. Start with the four highest-volume sites: prostate, head and neck, breast, lung. For each site, extract the key OAR metrics specified in the written protocol — not the constraints you think planners use, the ones actually in the document.
Compute the institutional distribution for each metric. Mean, standard deviation, 10th and 90th percentile. Then compare those distributions to the protocol target values. Flag any metric where the institutional mean sits more than one cohort standard deviation from the protocol target. Flag any case where a specific metric falls more than two standard deviations from the institutional mean; those are the cases that warrant individual clinical review.
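The two flags can be written down compactly. What follows is a minimal sketch using only the Python standard library. The function name `audit_metric`, the sample values, and the 20% protocol limit are assumptions made for illustration, not values from any real protocol.

```python
from statistics import mean, stdev, quantiles

def audit_metric(values, protocol_target):
    """Summarise one DVH metric for a cohort and apply the two audit flags."""
    mu, sd = mean(values), stdev(values)
    deciles = quantiles(values, n=10)  # nine cut points: 10th ... 90th percentile
    return {
        "mean": mu,
        "sd": sd,
        "p10": deciles[0],
        "p90": deciles[-1],
        # Metric-level flag: cohort mean more than 1 SD from the protocol target.
        "metric_flag": abs(mu - protocol_target) > sd,
        # Case-level flags: indices of cases more than 2 SD from the cohort mean.
        "case_flags": [i for i, v in enumerate(values) if abs(v - mu) > 2 * sd],
    }

# Invented rectal V70 values (%), with an assumed protocol limit of 20%.
rectal_v70 = [24, 25, 24, 23, 25, 24, 26, 24, 25, 23, 24, 33]
summary = audit_metric(rectal_v70, protocol_target=20)
print(summary["metric_flag"], summary["case_flags"])
```

One design caveat: a single extreme case inflates the standard deviation it is being measured against, so in small cohorts a robust spread estimate may be worth considering.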
The >2 SD flag is not a compliance violation by definition. It is a question: was this a documented clinical override, or an undocumented planning decision? That distinction matters. Valid physician overrides with chart documentation are not drift. They are intentional, case-specific departures. The audit is designed to separate those from cases where nobody chose anything — the plan just landed where it landed because of accumulated institutional habit.
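Separating the two groups is a simple join between the flagged cases and the override record. The sketch below assumes both are available as lists or sets of case identifiers; the function name and the identifiers themselves are hypothetical.

```python
def split_flagged_cases(flagged_ids, documented_override_ids):
    """Partition flagged cases into documented overrides and undocumented deviations."""
    documented_set = set(documented_override_ids)
    documented = [c for c in flagged_ids if c in documented_set]
    undocumented = [c for c in flagged_ids if c not in documented_set]
    return documented, undocumented

# Hypothetical case identifiers, for illustration only.
flagged = ["HN-014", "HN-027", "PR-003", "PR-041"]
overrides_on_file = {"HN-027", "PR-003", "BR-009"}

documented, undocumented = split_flagged_cases(flagged, overrides_on_file)
print("documented overrides:", documented)
print("needs follow-up:", undocumented)
```

The second list is the one the audit exists to produce: cases where the plan departed from the cohort and nothing on file says why.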
Specific Examples Worth Naming
Parotid mean dose creep is one of the most reliably documented patterns. A protocol specifying bilateral mean <26 Gy frequently shows a cohort mean of 27–29 Gy within 18 months of a planner transition. No hard constraint is being violated in any individual plan; the optimizer respects hard limits. But the soft target is being systematically deprioritized as planners optimize toward PTV coverage first and accept whatever OAR sparing results.
Bladder V65 creep in prostate plans follows the same pattern. The written protocol specifies V65 <17%. The institutional cohort mean drifts to 19–22% over two years. Each plan was reviewed. None were rejected. The drift is invisible plan-by-plan and visible only in aggregate.
Lung mean dose is a third example. QUANTEC guidance on mean lung dose is specific. But if the planning system's beam angle optimization defaults have changed, or a new CT scanner has altered Hounsfield unit calibration, the mean dose distribution can shift without any planner making a deliberate choice.
Tools and Data Sources
The audit requires three data sources working together. First, the plan archive — every approved plan from the audit period, with DVH data intact. Mosaiq and Aria both support scripted DVH extraction; the Aria scripting API will pull structured data for batch cohort analysis without manual export. Second, the written protocol document, converted into a structured constraint table that the analysis can query against. Third, a record of documented overrides — chart notes, physics consult documentation, anything that formally records a physician's decision to deviate.
Airato's plan-library analytics module handles the cohort extraction and distribution computation. You define the protocol table, the tool pulls the relevant DVH points, and the output is a per-metric distribution plot with protocol reference lines and individual case flags. For departments without that tooling, the same analysis can be done with Aria scripts and a spreadsheet — it takes longer and the visualization is less clean, but the methodology is identical.
The critical input is the constraint table. I have seen audits fail because the analyst used constraint values from memory rather than the current protocol document. Protocol documents themselves drift; the written version and the version in people's heads diverge. The audit should start with a fresh read of the actual document.
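One way to keep the constraint table honest is to store each value together with the document it was read from. The structure below is a sketch under that assumption. The class and field names are hypothetical, the two limits are the ones quoted in this article, and the source strings are placeholders for a real document title and revision.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Constraint:
    site: str
    structure: str
    metric: str
    limit: float
    unit: str
    source: str  # document and revision the value was read from

# Limits are the two quoted in this article; "source" strings are placeholders.
PROTOCOL_TABLE = [
    Constraint("head_and_neck", "parotid", "mean_dose", 26.0, "Gy",
               "H&N protocol, current revision"),
    Constraint("prostate", "bladder", "V65", 17.0, "%",
               "Prostate protocol, current revision"),
]

def lookup(site, structure, metric):
    """Return the matching constraint, or raise if the protocol does not define one."""
    for c in PROTOCOL_TABLE:
        if (c.site, c.structure, c.metric) == (site, structure, metric):
            return c
    raise KeyError(f"no written constraint for {site}/{structure}/{metric}")

print(lookup("prostate", "bladder", "V65").limit)
```

Raising an error for a missing entry is deliberate. If the analysis asks for a constraint the document does not contain, that should stop the audit rather than fall back to a remembered value.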
What the Tohoku Audit Found
In a retrospective audit we conducted across a 14-month plan archive at a Tohoku-region institution, 23% of cases sat outside the intended protocol window for at least one key metric without any documented clinical rationale. That is roughly one in four patients receiving a plan that diverged from institutional intent — not catastrophically, not in a way that would have been caught by standard peer review, but measurably.
The distribution was not uniform across sites. Head and neck showed the highest drift rate: 31% of cases had at least one parotid or spinal cord metric outside the protocol target band. Prostate was second at 24%. Breast and lung were lower, 15% and 18% respectively, possibly because those sites had more recently undergone protocol revision.
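A per-site rate of this kind is a straightforward count once each case has been checked against the protocol band. The sketch below assumes one record per case with hypothetical field names, and counts a case as drifted only when at least one metric is out of band and no override is documented. The records are invented for illustration.

```python
from collections import Counter

def drift_rate_by_site(cases):
    """Fraction of cases per site with an out-of-band metric and no documented override."""
    totals, drifted = Counter(), Counter()
    for case in cases:
        totals[case["site"]] += 1
        if case["out_of_band_metrics"] and not case["override_documented"]:
            drifted[case["site"]] += 1
    return {site: drifted[site] / totals[site] for site in totals}

# Invented records; field names are assumptions for this sketch.
cases = [
    {"site": "head_and_neck", "out_of_band_metrics": ["parotid_mean"], "override_documented": False},
    {"site": "head_and_neck", "out_of_band_metrics": [], "override_documented": False},
    {"site": "prostate", "out_of_band_metrics": ["bladder_V65"], "override_documented": True},
    {"site": "prostate", "out_of_band_metrics": [], "override_documented": False},
]

rates = drift_rate_by_site(cases)
print(rates)
```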
One finding surprised the clinical team: the cases most likely to sit outside protocol were not new planners' cases. They were cases planned by the most experienced dosimetrists, the ones whose informal knowledge had accumulated over years and whose intuitions had diverged furthest from the written document. Drift is not a training problem. It is a documentation and feedback problem.
Policy Response: What Actually Works
The Tohoku institution implemented three changes after the audit. First, a quarterly peer review cohort pull — not individual plan review, but a statistical summary of the prior quarter's cases against protocol targets, presented at the physics QA meeting. Second, a formal protocol renewal cycle: every disease-site protocol is reviewed and re-ratified annually, with the renewal date documented and the responsible physician named. Third, an onboarding checklist for new dosimetrists that requires reading the current protocol document and completing a protocol-constraint quiz before planning independently.
The quarterly pull is the most operationally important of the three. It creates a regular feedback loop between what is being planned and what the protocol specifies. Drift that would have taken two years to accumulate visibly is now caught within a quarter.
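The statistical core of a quarterly pull is a grouping of approved plans by calendar quarter. The following is a minimal sketch, assuming each plan is available as an approval date plus one metric value. The function name and the data are illustrative only.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

def quarterly_means(plans):
    """plans: iterable of (approval_date, metric_value). Returns {(year, quarter): mean}."""
    buckets = defaultdict(list)
    for approved, value in plans:
        quarter = (approved.month - 1) // 3 + 1
        buckets[(approved.year, quarter)].append(value)
    return {key: mean(vals) for key, vals in sorted(buckets.items())}

# Invented mean parotid doses (Gy) with approval dates.
plans = [
    (date(2024, 1, 15), 25.0),
    (date(2024, 3, 2), 26.0),
    (date(2024, 4, 20), 27.0),
    (date(2024, 6, 11), 28.0),
]

by_quarter = quarterly_means(plans)
for (year, quarter), value in by_quarter.items():
    print(f"{year} Q{quarter}: {value:.1f} Gy")
```

Plotting these quarterly means against the protocol target gives the physics QA meeting a trend line rather than a stack of individual plans.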
Protocol renewal matters because it forces the question: do we still believe this constraint is right? Sometimes the answer is no. The correct response to finding that the institutional mean parotid dose has drifted to 28 Gy is not always to tighten back to 26 Gy. Sometimes the clinical team reviews the evidence, consults updated literature, and concludes that 28 Gy is appropriate given their patient population. That is a different outcome from drift. That is protocol evolution. The audit distinguishes one from the other.
What Drift Is Not
Worth being explicit. A physician reviewing a case and deciding that the protocol constraint should not govern this patient — that is not drift. That is clinical judgment. The audit is not designed to penalize it. What matters is that the decision is documented: why this case, what the tradeoff was, what the physician chose.
An institution where 8% of cases have documented overrides and 2% have undocumented deviations is in a different position than one where 23% of cases sit outside protocol with no documentation. The first is a functioning clinical governance system. The second is drift.
Annual retrospective audits are the mechanism for knowing which one you are.