Pilots often succeed because a small team “holds it together” with informal knowledge, heroic workarounds, and high-touch oversight. Scaling fails when that informal layer is removed and the model is copied without the operating conditions that made it work. This article sits within Scaling What Works and connects to commissioning structures in Integrated Funding Pilots, focusing on the operational redesign required to make a model reliable across multiple sites, workforces, and partner systems.
Why scaling fails even when the pilot “worked”
Most scaling failures are operational, not clinical. A pilot can compensate for unclear roles, missing data, and inconsistent escalation because leaders are close to the work. Once scaled, small inconsistencies become system defects: missed follow-ups, uneven eligibility decisions, and different interpretations of “what good looks like.”
Scaling is the process of converting a successful practice into an operating model: defined workflows, training and supervision routines, governance, and evidence artifacts that hold up in audits and contract management.
System expectations leaders must meet
Expectation 1: A defined, auditable model of care (not just an intervention)
Funders and commissioners increasingly expect services to describe the model as an end-to-end pathway: how clients enter, how risk is assessed, what touchpoints are required, what escalation thresholds exist, and what documentation proves delivery. Scaling is evaluated on whether the pathway is repeatable and auditable, not just whether outcomes look good in one site.
Expectation 2: Governance that controls drift, quality risk, and equity performance
When models scale, they drift. Oversight bodies expect leaders to show how they control drift through fidelity monitoring, supervision structures, incident review, and subgroup performance monitoring (e.g., whether access and outcomes differ by language, disability, rurality, or housing instability). “We will monitor quality” is not enough; the governance mechanism must be explicit.
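One way to make subgroup performance monitoring explicit is a routine comparison of each subgroup's rate against a reference rate. The following is a minimal sketch, not a prescribed method: the record fields (`language`, `follow_up_completed`), the function names, and the 10-percentage-point tolerance are all illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def subgroup_rates(records: Iterable[dict], group_key: str,
                   outcome_key: str = "follow_up_completed") -> Dict[str, float]:
    """Completion rate per subgroup (field names are hypothetical)."""
    totals = defaultdict(int)
    completed = defaultdict(int)
    for record in records:
        group = record[group_key]
        totals[group] += 1
        if record[outcome_key]:
            completed[group] += 1
    return {group: completed[group] / totals[group] for group in totals}


def flag_gaps(rates: Dict[str, float], reference: float,
              max_gap: float = 0.10) -> List[str]:
    """Subgroups whose rate falls more than `max_gap` below the reference rate."""
    return sorted(g for g, rate in rates.items() if reference - rate > max_gap)


# Illustrative data only: eight made-up records.
records = (
    [{"language": "English", "follow_up_completed": True}] * 3
    + [{"language": "English", "follow_up_completed": False}]
    + [{"language": "Spanish", "follow_up_completed": True}]
    + [{"language": "Spanish", "follow_up_completed": False}] * 3
)
rates = subgroup_rates(records, "language")
overall = sum(r["follow_up_completed"] for r in records) / len(records)
flagged = flag_gaps(rates, overall)
```

In practice, small subgroups produce unstable rates, so a governance group would likely pair a check like this with a minimum sample size before treating a gap as a finding.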
What a scalable operating model contains
A scalable model has three layers. First is the core workflow: the minimum steps that must happen every time for safety and intended outcomes. Second is the enabling system: staffing roles, training, tools, and partner interfaces that make the workflow possible. Third is assurance: the evidence trail, audits, and review routines that detect failures early and trigger corrective action.
Scaling decisions should also separate “core” from “adaptable” elements. The core should be protected through fidelity controls; adaptations should be permitted but documented, so variation is intentional rather than accidental.
Operational example 1: Converting a pilot into a standard workflow with time standards
What happens in day-to-day delivery: Leaders translate pilot practice into a step-by-step workflow: eligibility criteria, intake script, risk stratification tool, required touchpoints, and escalation rules. Each step has a time standard (e.g., intake completed within 48 hours; follow-up within 24 hours after a missed contact in a high-risk tier). Supervisors run daily exception reports (missed time standards, incomplete assessments) and hold brief huddles to assign corrective actions. Documentation templates mirror the workflow so staff can evidence completion without rewriting notes.
Why the practice exists (failure mode it addresses): In pilots, reliability is often achieved through informal memory and leadership proximity. Scaling exposes variability: different sites interpret steps differently, and time-critical actions slip.
What goes wrong if it is absent: The model becomes “style-based.” Some teams do all steps, others skip or delay them, and outcomes become inconsistent. Commissioners see uneven performance, and safety incidents rise because escalation triggers are applied late or not at all.
What observable outcome it produces: The program can report workflow completion rates, timeliness compliance, and reductions in missed-contact escalations. Audit samples show consistent delivery across sites, not just strong outcomes in one location.
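The time standards and daily exception report in this example can be sketched in code. This is a minimal illustration under stated assumptions: the `Case` record and its field names are hypothetical, and only the two time standards quoted above (intake within 48 hours; follow-up within 24 hours of a missed contact in a high-risk tier) are modelled.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional

# Time standards quoted in the example above; adjust to the contracted values.
INTAKE_STANDARD = timedelta(hours=48)
HIGH_RISK_FOLLOW_UP_STANDARD = timedelta(hours=24)


@dataclass
class Case:
    """Hypothetical case record; field names are illustrative."""
    case_id: str
    risk_tier: str  # "high" or "standard"
    referred_at: datetime
    intake_completed_at: Optional[datetime] = None
    missed_contact_at: Optional[datetime] = None
    follow_up_at: Optional[datetime] = None


def breaches(case: Case, now: datetime) -> List[str]:
    """List the time standards this case has missed as of `now`."""
    found = []
    # A step that is still open is measured against the current time.
    intake_end = case.intake_completed_at or now
    if intake_end - case.referred_at > INTAKE_STANDARD:
        found.append("intake_over_48h")
    if case.risk_tier == "high" and case.missed_contact_at is not None:
        follow_up_end = case.follow_up_at or now
        if follow_up_end - case.missed_contact_at > HIGH_RISK_FOLLOW_UP_STANDARD:
            found.append("follow_up_over_24h")
    return found


def exception_report(cases: List[Case], now: datetime) -> Dict[str, List[str]]:
    """Map case_id to its breaches, keeping only cases with at least one."""
    report = {}
    for case in cases:
        missed = breaches(case, now)
        if missed:
            report[case.case_id] = missed
    return report


def timeliness_compliance(cases: List[Case], now: datetime) -> Optional[float]:
    """Share of cases with no breach, or None when there are no cases."""
    if not cases:
        return None
    compliant = sum(1 for case in cases if not breaches(case, now))
    return compliant / len(cases)
```

A supervisor's huddle list would be the output of `exception_report`, while `timeliness_compliance` gives the site-level figure that can be compared across locations.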
Operational example 2: Building a replication package (playbook) that survives staff turnover
What happens in day-to-day delivery: The scaling team creates a replication package: role descriptions, training modules, shadowing plans, supervision checklists, and a “first 30 days” onboarding pathway for new staff. A designated implementation lead runs weekly case-based coaching sessions using real service data (e.g., missed touchpoints, high escalation rates, documentation defects). Site leads receive a launch checklist and must demonstrate readiness (staff trained, tools configured, partner contacts established) before going live.
Why the practice exists (failure mode it addresses): Staff turnover and rapid hiring are the norm in community services. Without a replication package, the model degrades as experienced staff leave and tacit knowledge disappears.
What goes wrong if it is absent: Training becomes inconsistent, new staff rely on informal advice, and critical steps are missed. Quality becomes person-dependent, and scaling amplifies that fragility across sites.
What observable outcome it produces: Leaders can evidence training completion, supervision frequency, and competency checks. Variation in key metrics narrows across teams over time, indicating the model is being delivered consistently rather than depending on “star” staff.
Operational example 3: Establishing an assurance loop that detects drift and fixes root causes
What happens in day-to-day delivery: The program sets a monthly assurance cycle: (1) a dashboard review of core process measures (touchpoint completion, timeliness, escalation adherence), (2) a sample-based audit of records to validate documentation quality, and (3) a structured incident/near-miss review that maps failures to workflow steps. Corrective actions are tracked in an improvement log with named owners and deadlines. If drift is detected at a site, the response is standardized: targeted coaching, additional supervision, and a short re-audit window to confirm improvement.
Why the practice exists (failure mode it addresses): Drift is predictable when services scale. Without an assurance loop, leaders discover problems only after outcomes worsen, complaints rise, or contract performance is challenged.
What goes wrong if it is absent: Problems persist long enough to become systemic (e.g., escalations routinely delayed, assessments inconsistently completed, or partner interfaces failing). The program then requires a disruptive “reset,” damaging trust and continuity.
What observable outcome it produces: Earlier detection of defects, fewer repeated incidents, improved documentation completeness, and demonstrable corrective action closure. Commissioners gain confidence because the program can show not only performance but also how it manages risk.
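The dashboard review and improvement log in this example can be sketched as a threshold check that opens a tracked corrective action for each finding. This is an illustration only: the threshold values, the 30-day re-audit window, and all names below are assumptions, not figures from any contract or standard.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, List, Tuple

# Hypothetical fidelity floors for the core process measures named above.
THRESHOLDS = {
    "touchpoint_completion": 0.90,
    "timeliness": 0.85,
    "escalation_adherence": 0.95,
}
REAUDIT_WINDOW = timedelta(days=30)  # assumed length of the short re-audit window


@dataclass
class CorrectiveAction:
    """One improvement-log entry with a named owner and a deadline."""
    site: str
    measure: str
    owner: str
    due: date
    closed: bool = False


def detect_drift(site_measures: Dict[str, Dict[str, float]]) -> List[Tuple[str, str, float]]:
    """Return (site, measure, value) for each measure below its floor."""
    drift = []
    for site in sorted(site_measures):
        for measure, floor in THRESHOLDS.items():
            value = site_measures[site].get(measure)
            if value is not None and value < floor:
                drift.append((site, measure, value))
    return drift


def open_actions(drift: List[Tuple[str, str, float]], owners: Dict[str, str],
                 review_date: date) -> List[CorrectiveAction]:
    """Create one improvement-log entry per drift finding."""
    return [
        CorrectiveAction(site, measure, owners[site], review_date + REAUDIT_WINDOW)
        for site, measure, _value in drift
    ]


def closure_rate(log: List[CorrectiveAction]) -> float:
    """Share of logged actions that are closed (1.0 for an empty log)."""
    if not log:
        return 1.0
    return sum(action.closed for action in log) / len(log)
```

The closure rate is the kind of figure a program could report alongside performance measures to show that detected drift leads to completed corrective action, which is the evidence commissioners are described as looking for.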
Making scaling contractable
Scaling becomes contractable when the operating model is written into measurable requirements: workflow steps, time standards, evidence artifacts, audit rights, and governance cadence. This shifts scaling from a promise (“we will expand”) to a managed delivery method (“this is how we will deliver, prove, and improve”).
What “scaled” really means
A model is scaled when it can perform reliably without heroic effort: new sites can launch using the replication package, staff can deliver using the workflow and tools, and leaders can evidence quality and outcomes through routine assurance. That is what turns “what worked” into “what works here, every day.”