A pilot that runs on a frozen snapshot of clean data, with no auth, no rate limits, and a forgiving audience, will almost always succeed. That is what makes it dangerous. It proves the easy half of the problem and quietly defers the hard half.
The sandbox lets you skip the real work
Production is not a bigger sandbox. It is a different system. The data is live, partial, and contradictory. The same question arrives a thousand times an hour from people who did not read the demo script. Permissions matter. Latency matters. Being wrong in front of a customer matters.
- The demo used a curated set of inputs. Production sends you the inputs nobody anticipated.
- The demo had no notion of who is asking. Production has to honor what each person is allowed to see.
- The demo measured whether it looked impressive. Production measures whether it can be trusted twice in a row.
Build the boring half first
The teams whose pilots ship are the ones who treat the unglamorous parts — access control, error handling, the audit trail, the path for when the model is unsure — as the actual project, not as cleanup. The model was never the risk. The system around it was.
A pilot proves the idea is possible. It tells you almost nothing about whether it will survive contact with a Tuesday.
So we scope pilots to answer the question that actually decides the project: not can this work once, but will this hold under real load, real data, and real consequences. Everything else is theater.