When the AI Worked Fine. We Were the Problem.

There is a version of the AI-in-banking story that gets told at conferences. The version with the elegant architecture diagram, the impressive accuracy metrics, the CTO on stage explaining how they transformed their compliance operations in eighteen months. Applause. Networking drinks. Everyone flies home feeling slightly inadequate.

Then there is the version I lived.

In 2022, I was part of a team running AI pilots across GRC functions inside a large bank. Six pilots. Reasonable budget. Good vendor relationships. Genuine executive appetite — the rare kind where people actually showed up to the steering committee rather than sending a delegate to take notes. By every measure that matters before you start, we had the conditions for success.

Every single pilot stalled at the same wall.

The Wall Nobody Puts in the Business Case

The wall was not the technology. I want to be clear about that, because the easy narrative — the one that protects everyone’s prior decisions — is to blame the vendor, blame the model, blame AI for not being ready. That narrative is comfortable and wrong.

The wall was our data. Specifically, the state of it. Which is to say: the state of twenty years of accumulated decisions, mergers, system migrations, and the quiet institutional habit of solving urgent problems without fixing underlying ones.

Here is the moment I remember most clearly. We had a transaction monitoring model that was genuinely impressive in the sandbox. Clean inputs, clean outputs, the kind of performance that makes a proof-of-concept presentation feel like a TED talk. We moved it toward live deployment and ran the first serious data profiling exercise against our actual production environment.

Our data taxonomy contained seventeen different definitions of “beneficial owner” across legacy systems.

Seventeen.

The model was not confused. The model was doing exactly what models do — processing the inputs it received according to the logic it had learned. We were confused. We had built seventeen slightly different answers to the same regulatory question, across different systems, over different years, and we had simply never needed to look at them all in the same room before. AI did not create that problem. It just turned on the lights.
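For readers who want to picture what that profiling step looks like, here is a minimal sketch in Python. It is not the tooling we used. The system names, the field name `beneficial_owner`, and the definition texts are all invented for illustration, and it assumes you can already extract field definitions from each system's data dictionary.

```python
from collections import defaultdict

# Hypothetical metadata extract: (system, field, definition text).
# System names and definitions are invented for illustration only.
FIELD_DEFINITIONS = [
    ("core_banking", "beneficial_owner", "Natural person holding 25% or more of shares"),
    ("kyc_platform", "beneficial_owner", "natural person holding 25% or more of shares "),
    ("trade_finance", "beneficial_owner", "Any person holding more than 10% of voting rights"),
    ("legacy_crm", "beneficial_owner", "Person who ultimately controls the account"),
]


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so trivial formatting differences do not count."""
    return " ".join(text.lower().split())


def distinct_definitions(rows, field):
    """Map each distinct normalised definition of `field` to the systems that use it."""
    systems_by_definition = defaultdict(list)
    for system, name, definition in rows:
        if name == field:
            systems_by_definition[normalise(definition)].append(system)
    return dict(systems_by_definition)


if __name__ == "__main__":
    found = distinct_definitions(FIELD_DEFINITIONS, "beneficial_owner")
    print(f"{len(found)} distinct definitions of beneficial_owner")
    for definition, systems in found.items():
        print(f"- {definition!r}: {', '.join(systems)}")
```

Normalising only case and whitespace is deliberately conservative: it removes cosmetic differences and leaves every substantive disagreement visible, which is the part that needs a human decision.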

I spent approximately forty-eight hours after that discovery feeling what I can only describe as professionally humbled. I had been in regulated financial services long enough to know that data quality was important. I had said the words “data governance” in enough board papers to have strong opinions about font size. And yet here I was, learning that my organisation did not have a consistent answer to a question that regulators had been asking for years. The AI pilot did not fail. It diagnosed.

What the POC Was Actually Testing

The proof-of-concept works because you give it clean, curated data. That is not a criticism of how pilots are designed — it is simply what they are. You select a representative sample, you prepare it, you load it, and you measure performance against a problem you have defined carefully. The POC answers the question: can this technology do what we hope, given the right conditions?

That is a useful question. It is not the question that matters.

The question that matters is: what are the actual conditions inside this organisation? And the answer to that question, in most large banks I have worked with or alongside, is some variation of: complicated, undocumented, and older than the team currently responsible for it. The gap between sandbox and production is not a technical gap. It is a data archaeology gap.

This is the insight that took me too long to reach, and I say that as someone who had read enough about data quality to have formed opinions about it. Knowing that data quality matters is not the same as knowing what it looks like when your specific data estate is the problem. Those are different kinds of knowing, and only one of them is available before you start the work.

Why Transformation Announcements Age Badly

There is a related pattern I have observed across organisations. I wrote about something adjacent to this when reflecting on regulatory conversations in Toronto around crypto adoption, where the same dynamic appears in a different form. The gap between announcing transformation and executing it tends to open exactly at the point where unglamorous remediation work needs to happen and nobody wants to own it.

AI deployment in regulated banking is not primarily a technology programme. It is a data remediation programme with a model at the end of it. The model is, in some ways, the reward for doing the hard work — not the hard work itself.

The six months of data taxonomy reconciliation, legacy system mapping, beneficial ownership standardisation, and governance framework alignment that has to precede meaningful AI deployment: that work does not make it into the press release. It does not generate a conference talk. It does not produce a metric that looks good in a board update. It produces the conditions under which the actual work becomes possible.

Senior leaders who understand this — and I have met perhaps a handful who genuinely do — structure their AI programmes accordingly. They do not budget for a pilot and a deployment. They budget for a data programme, a remediation phase, a governance layer, and then, eventually, a model. The timeline feels slower. It also actually works.

The connection to cyber risk is worth naming too: as I explored in a previous piece on the human side of cyber risk, the most dangerous vulnerabilities in complex organisations are rarely the ones the technology missed. They are the ones the organisation had quietly decided not to look at. Data debt in AI deployment has the same character.

What This Means If You Are Planning the Next Pilot

If you are a risk, compliance, or technology leader in a regulated institution and you are looking at your AI roadmap for the next eighteen months, here is the most useful reframe I can offer: before you approve the next proof-of-concept, run a data readiness assessment against the domain you are targeting. Not a tick-box exercise, but an honest one, with the people who actually maintain the systems in the room. You will learn more in two weeks of that work than in six months of a pilot that is quietly testing curated data rather than your real data estate.
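As a rough illustration of what the first pass of such an assessment might measure, here is a small Python sketch. Everything in it is an assumption made for the example: the two system names, the customer ids, and the sample values are hypothetical, and a real assessment would cover far more than completeness and cross-system agreement.

```python
# Hypothetical extracts of one field (beneficial owner name) from two systems,
# keyed by customer id. None marks a missing value. All values are invented.
SYSTEM_RECORDS = {
    "core_banking": {"C001": "Alice Tan", "C002": None, "C003": "Omar Haddad"},
    "kyc_platform": {"C001": "Alice Tan", "C002": "Bea Ruiz", "C003": "O. Haddad Holdings"},
}


def completeness(records):
    """Share of records in one system where the field is populated."""
    if not records:
        return 0.0
    populated = sum(1 for value in records.values() if value)
    return populated / len(records)


def conflicts(system_records):
    """Customer ids whose populated values disagree between systems."""
    ids = set().union(*(records.keys() for records in system_records.values()))
    conflicting = []
    for customer_id in sorted(ids):
        values = {
            records[customer_id]
            for records in system_records.values()
            if records.get(customer_id)
        }
        if len(values) > 1:
            conflicting.append(customer_id)
    return conflicting


if __name__ == "__main__":
    for system, records in SYSTEM_RECORDS.items():
        print(f"{system}: {completeness(records):.0%} populated")
    print("conflicting ids:", conflicts(SYSTEM_RECORDS))
```

The numbers matter less than who is in the room when they are read out. A conflict list like this is a prompt for the people who maintain the systems to explain why the values differ.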

The findings will be uncomfortable. In my experience, they are always uncomfortable. That discomfort is the value. It tells you what the transformation programme actually is, rather than what the vendor deck implied it would be.

Fix that first. Then talk about transformation.

For more on what AI deployment actually looks like inside regulated industries — including the parts that do not make the conference agenda — follow Laksh Vaswani on LinkedIn.