We did not expect it to work this well. That was the honest reaction from the dosimetry team at Tohoku University Hospital after the first 90-day read-out of the AIVOT pilot. Six months in, the numbers held.
Between March and September 2025, Airato ran a prospective pilot of AIVOT, its automated dose prediction engine, across Tohoku University Hospital and three regional satellite clinics in Miyagi Prefecture. The study enrolled 300 patients: 180 prostate and 120 head-and-neck (H&N) cases. Both cohorts used standard fractionated IMRT. SBRT cases were deliberately excluded: the training dataset contained fewer than 40 SBRT patients per site, too few to characterise the wider variance of SBRT dose-volume distributions reliably. Re-irradiation cases were flagged for manual dosimetrist override from day one.
Study design
The pilot was structured as a prospective single-arm comparison against institutional historical controls. Each participating clinic's historical median plan KPIs, namely approval rate, time-to-first-plan, dosimetrist time, and organ-at-risk (OAR) mean dose, were pre-locked before AIVOT was activated. AIVOT generated dose prediction maps, which the treatment planning system (TPS; Eclipse or RayStation, depending on site) used as optimisation objectives. Dosimetrists could accept, modify, or reject any AIVOT-proposed plan; acceptance logging was mandatory.
Primary endpoints were:
- First-pass plan approval rate (plan accepted without dosimetrist objective modification)
- Time-to-first-plan (CT export to first optimised plan, calendar hours)
- Dosimetrist hours per case (active time logged in TPS)
- Parotid gland mean dose reduction (H&N cohort only, both parotids averaged, Gy)
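As a rough illustration of how the first two endpoints could be derived from the mandatory acceptance log, here is a minimal sketch. The `CaseLog` record, its field names, and both helper functions are assumptions made for illustration; they are not AIVOT's actual logging schema, and the example records are made up.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median


@dataclass
class CaseLog:
    """Hypothetical per-case record; field names are illustrative assumptions."""
    cohort: str                 # "prostate" or "hn"
    ct_exported: datetime       # CT export timestamp
    first_plan_ready: datetime  # first optimised plan timestamp
    decision: str               # "accept", "modify", or "reject"


def first_pass_rate(cases):
    """Share of cases accepted without dosimetrist objective modification."""
    if not cases:
        raise ValueError("no cases logged")
    accepted = sum(1 for c in cases if c.decision == "accept")
    return accepted / len(cases)


def median_time_to_first_plan_hours(cases):
    """Median calendar hours from CT export to first optimised plan."""
    hours = [
        (c.first_plan_ready - c.ct_exported).total_seconds() / 3600
        for c in cases
    ]
    return median(hours)


# Made-up example records, not pilot data.
example = [
    CaseLog("prostate", datetime(2025, 3, 3, 9, 0), datetime(2025, 3, 3, 9, 30), "accept"),
    CaseLog("prostate", datetime(2025, 3, 3, 10, 0), datetime(2025, 3, 3, 10, 45), "accept"),
    CaseLog("prostate", datetime(2025, 3, 4, 9, 0), datetime(2025, 3, 4, 10, 0), "modify"),
    CaseLog("prostate", datetime(2025, 3, 4, 11, 0), datetime(2025, 3, 4, 11, 40), "accept"),
]

print(first_pass_rate(example))  # 0.75
print(median_time_to_first_plan_hours(example))
```

Counting only `"accept"` decisions mirrors the endpoint definition: a plan that a dosimetrist modified before approval does not count as first-pass.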
What the numbers showed
Prostate first-pass approval rate: 94% (±3.1%, 95% CI). Historical control: 71%. The 23-point improvement exceeded the team's internal target of 15 percentage points. H&N approval rate: 89% (±4.2%). Historical control: 63%, a 26-point improvement.
Time-to-first-plan dropped from a historical median of 2.3 hours to 38 minutes across all four sites. Standard deviation on the AIVOT arm was 11 minutes versus 54 minutes historically — the variance compression was arguably as important as the median shift. Complex H&N cases that used to occasionally run past six hours now sat reliably under 75 minutes.
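To put the median shift and the variance compression on a common scale, the figures quoted above can be converted to minutes and compared directly. This is only arithmetic on the reported summary numbers; the variable names are illustrative.

```python
# Figures quoted in the paragraph above.
historical_median_min = round(2.3 * 60)  # 2.3 hours expressed in minutes
aivot_median_min = 38
historical_sd_min = 54
aivot_sd_min = 11

median_saving_min = historical_median_min - aivot_median_min
median_saving_pct = 100 * median_saving_min / historical_median_min
sd_ratio = historical_sd_min / aivot_sd_min

print(f"median saving: {median_saving_min} min ({median_saving_pct:.0f}%)")
print(f"spread narrowed by a factor of {sd_ratio:.1f}")
```

On these numbers the median falls by 100 minutes (about 72%), while the standard deviation shrinks by a factor of roughly 4.9.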
Dosimetrist hours per case fell from 4.2 to 1.8 hours (prostate) and from 5.1 to 2.3 hours (H&N). Across the 300-case cohort, that is approximately 770 dosimetrist hours recovered (180 × 2.4 + 120 × 2.8 = 768).
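The recovered-hours total can be checked by multiplying each cohort's per-case saving by its case count. The sketch below assumes every enrolled case realised its cohort's average saving, which slightly overstates the total if manually planned cases are counted; the dictionary layout is illustrative.

```python
# Cohort sizes and per-case dosimetrist hours as reported above.
cohorts = {
    "prostate": {"cases": 180, "hours_before": 4.2, "hours_after": 1.8},
    "head_and_neck": {"cases": 120, "hours_before": 5.1, "hours_after": 2.3},
}


def hours_recovered(cohort):
    """Per-case saving multiplied by the number of cases in the cohort."""
    return cohort["cases"] * (cohort["hours_before"] - cohort["hours_after"])


# Round to whole hours to hide floating-point noise.
per_cohort = {name: round(hours_recovered(c)) for name, c in cohorts.items()}
total = sum(per_cohort.values())

print(per_cohort)  # {'prostate': 432, 'head_and_neck': 336}
print(total)       # 768
```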
Parotid mean dose: the contralateral parotid fell 1.8 Gy (±0.6 Gy) relative to historical controls, and the ipsilateral parotid fell 0.9 Gy (±0.7 Gy). Neither figure was dramatic in absolute terms, but both were statistically significant and consistent across sites. The satellite clinics, which historically showed more variability in parotid sparing than the university centre, narrowed their inter-site spread substantially.
What did not work
Honesty requires talking about the edges as much as the centre. SBRT was excluded not as a failing of AIVOT's architecture but as a deliberate constraint: the model's confidence intervals widen when training sample sizes per fractionation scheme drop below roughly 80 cases. The pilot sites did not have that volume for SBRT. This is a dataset question, not a model question, but clinicians deploying AIVOT in low-volume SBRT settings should treat dose predictions as starting points requiring closer review, not as ready-to-optimise objectives.
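One way a clinic might operationalise that guidance is a simple gate on training volume per fractionation scheme. The sketch below is hypothetical: the threshold constant reuses the "roughly 80 cases" figure quoted above, and the function name, labels, and case counts are invented for illustration rather than taken from AIVOT.

```python
# Approximate threshold quoted above; an assumption, not a product specification.
MIN_CASES_PER_SCHEME = 80


def review_mode(training_cases, threshold=MIN_CASES_PER_SCHEME):
    """Suggest how a dose prediction should be treated for one fractionation scheme."""
    if training_cases < 0:
        raise ValueError("training_cases must be non-negative")
    if training_cases >= threshold:
        return "ready-to-optimise"
    return "starting-point"  # needs closer dosimetrist review


# Hypothetical per-site training volumes, not pilot data.
site_counts = {"fractionated_imrt": 450, "sbrt": 38}

for scheme, count in site_counts.items():
    print(scheme, "->", review_mode(count))
```

A hard cut-off is a simplification: in practice the confidence intervals widen gradually, so a site might prefer a graded review priority over a binary label.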
Re-irradiation was harder. The model has no reliable way to account for prior dose history from an external plan, because in most clinic configurations it lacks access to archived cumulative DVH data. Five re-irradiation cases were enrolled, and all five required full manual override. We flagged this clearly in the interim report, and it remains an open engineering item.
There were also three prostate cases in the first four weeks where AIVOT proposed rectum mean dose objectives that the treating physician considered insufficiently conservative. The plans were manually tightened. We traced this to a data preprocessing artefact in one satellite clinic's CT export pipeline that briefly inflated the rectum structure volume. Resolved in week five; no similar cases occurred afterward.
Deploying AI in a community setting: what the training actually required
The satellite clinics ranged from a 4-linac department in Ishinomaki to a 2-linac unit in Shiroishi. None of the three had prior experience with AI-assisted planning tools. The dosimetry teams were experienced (average 11 years per FTE) but sceptical.
Training time per dosimetrist: eight hours. Not weeks. The eight-hour figure is worth examining. It breaks down as two hours of conceptual orientation (what AIVOT predicts and how confidence intervals should influence review priority), three hours of supervised hands-on review of historical cases with known outcomes, and three hours of live case practice with a senior dosimetrist present. That was sufficient for independent AIVOT-assisted planning at all four sites.
We had internally budgeted for 20-hour onboarding. The actual eight-hour figure emerged from user observation: dosimetrists who had been planning manually for a decade did not need extensive model theory. They needed to trust the output, know when to distrust it, and understand the modification controls. Those are operational questions, not statistical ones.
Dr. Tanaka, the dosimetry lead at Tohoku University Hospital, put it this way in the post-pilot debrief: the tool felt like getting a very well-prepared first draft that still needed her clinical eye — which is exactly the right relationship between a planning aid and a trained dosimetrist. The concern before the pilot was that automation would reduce clinical engagement with each case. What happened was the opposite: with routine objective-setting handled, she had more cognitive bandwidth for the cases that genuinely needed it.
Regulatory classification
AIVOT is classified under Japan's PMDA framework as Class I medical software (一般医療機器, "general medical device"). It generates dose prediction data for use as optimisation objectives by a licensed treatment planning system operated by a qualified dosimetrist. It does not directly generate treatment parameters, does not interface with treatment delivery systems, and does not produce machine-executable output. This classification is consistent with the system's design intent and with how comparable AI planning aids have been treated in other PMDA submissions.
It is not Class II. The distinction matters for deployment timelines. Class II submission requires clinical evidence review cycles and pre-market notification that typically add 12–18 months to a deployment pathway. Clinics evaluating AI planning tools should understand the classification basis of any product they are considering — and verify it independently rather than accepting vendor characterisation at face value.
What comes next
The pilot data is being prepared for submission. The SBRT exclusion will remain until per-site SBRT case volume crosses our minimum threshold, which we expect at Tohoku University Hospital by late 2026. Re-irradiation support requires an architectural change in how cumulative dose history is ingested; that work is underway.
The satellite clinic data is the result I find most meaningful. A 2-linac department in a regional city, with dosimetrists working across multiple tumour sites, achieving 89–94% first-pass approval rates and sub-75-minute planning cycles for H&N cases: that is where the gap between academic centres and community practice starts to close. Not through staffing additions most departments cannot afford. Through tools that make existing clinical judgment go further.
The numbers above are real. The pilot constraints are real. Both matter equally.
Three months after the pilot concluded, all four sites opted into commercial deployment. The satellite clinic in Shiroishi — the smallest of the four — processed its 50th AIVOT-assisted plan in December 2025. First-pass approval on that cohort was 92%. Variance stayed tight: standard deviation 13 minutes on time-to-first-plan. That kind of consistency in a low-volume community department, across a mix of junior and senior dosimetrists, is what we designed for. Not a demonstration result. A repeatable operational baseline.