Why Your Modern Data Platform Is Not Enough
A conversation with Mihir Shah
The premise sounds provocative, but the argument is precise: the modern data stack, as most firms have built it, was never designed to run a business. It was designed to analyse one. That distinction, and what to do about it, was the thread running through this conversation between Gus Sekhon of FINBOURNE Technology and Mihir Shah, who spent three decades as CTO and CIO at one of the world’s largest investment organisations, overseeing not one but two major re-architecture programmes.
The blind spots in the modern data stack
Mihir opened by tracing the phrase “modern data platform” back to its origins, a well-circulated paper from a16z that mapped out a reference architecture for data teams. The paper was influential, but it had two significant gaps. First, it addressed only the analytics side of the house: warehouses, lakes, pipelines. It said little about the operational systems that actually run a business day to day. Second, and more fundamentally, it did not mention data models. It catalogued technologies while leaving out the most important ingredient.
Those two omissions, operational data and the data model, are precisely what Mihir identifies as the foundations of a genuinely modern data architecture. And both have become more consequential, not less, in the age of AI.
Two transformations, two different problems
Over his tenure, Mihir led two separate re-architecture programmes, each driven by a distinct problem.
The first came when he became CTO of the asset management division. The business had grown through rapid organic expansion into nine separate business units, each with its own IT stack. The result was hundreds of applications sitting on top of an equal number of databases. This caused problems for many functions: risk and exposure, for example, could not be aggregated at firm level without overnight batch jobs. When the firm asked whether it could grow AUM fourfold on the existing platform, the answer was no: the platform could not scale. A third constraint was capacity: covering more names, sectors and geographies to deploy larger pools of capital without a proportional increase in research headcount.
The second transformation, more recently, was a firm-wide initiative. The firm had six or seven business units spanning brokerage, pensions, asset management and more, sitting on approximately 180 separate data warehouses. Corporate functions like finance, risk and audit could not aggregate data across units to do their jobs. The strategic motivation, identified around 2018, was that AI was coming, that every firm would have access to the same models and algorithms, and that the real differentiator would be 50 years of proprietary data, provided it was organised and accessible.
Two transformations, two different motivations. One to run the business better. One to enable AI.
Why you start with the data model
The default way applications get built is a function-first approach: define the business process, build the app, add the database last. The result is data fragmentation, with each application owning its own slice of data and integration happening as an afterthought, usually in a downstream warehouse.
The alternative is to start with the data model. As Mihir pointed out, the conceptual data model for an asset manager has perhaps ten core entities, among them portfolios, positions, trades and securities. Those entities and their relationships have not changed in 50 years. Building a shared operational data store on a stable data model, and building applications on top of that common data platform, means integration is built in from the start, not retrofitted. You get a firm-wide view of your business while you are running it, not only when you query a warehouse the following morning.
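To make the idea concrete, here is a minimal sketch of a shared model built from the entities named above. The field names, sample data and helper functions are illustrative assumptions, not FINBOURNE's or Mihir's actual schema. Because every portfolio uses the same entities, a firm-wide exposure figure is a plain aggregation rather than an integration project.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical core entities; all field names are assumptions for illustration.
@dataclass(frozen=True)
class Security:
    security_id: str
    name: str
    price: float  # latest price, single currency for simplicity


@dataclass(frozen=True)
class Portfolio:
    portfolio_id: str
    business_unit: str


@dataclass(frozen=True)
class Trade:
    portfolio_id: str
    security_id: str
    quantity: float  # positive = buy, negative = sell


def positions(trades):
    """Derive positions (net quantity per portfolio and security) from trades."""
    net = defaultdict(float)
    for t in trades:
        net[(t.portfolio_id, t.security_id)] += t.quantity
    return dict(net)


def firm_exposure(trades, securities):
    """Market value per security, summed across every portfolio in the firm."""
    exposure = defaultdict(float)
    for (_, security_id), qty in positions(trades).items():
        exposure[security_id] += qty * securities[security_id].price
    return dict(exposure)


def exposure_by_unit(trades, securities, portfolios):
    """Market value per business unit, using the same shared entities."""
    exposure = defaultdict(float)
    for (portfolio_id, security_id), qty in positions(trades).items():
        unit = portfolios[portfolio_id].business_unit
        exposure[unit] += qty * securities[security_id].price
    return dict(exposure)


securities = {
    "ACME": Security("ACME", "Acme Corp", 10.0),
    "GLOBEX": Security("GLOBEX", "Globex Inc", 20.0),
}
portfolios = {
    "P1": Portfolio("P1", "equities"),
    "P2": Portfolio("P2", "pensions"),
}
trades = [
    Trade("P1", "ACME", 100),
    Trade("P1", "ACME", -40),
    Trade("P2", "ACME", 50),
    Trade("P2", "GLOBEX", 10),
]

exposure = firm_exposure(trades, securities)
by_unit = exposure_by_unit(trades, securities, portfolios)
```

Both views come from one set of records, which is the point of the approach: neither business unit needs its own copy of the data or its own definition of a position.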
The warehouse still has a very significant role, specifically for deep history, behavioural analysis, and machine learning workloads. But the design principles are fundamentally different.
Operational systems are designed around known queries and known transaction patterns, so you can fine-tune for speed and reliability. Warehouses are designed for unknown, ad hoc queries and workloads. Trying to run your operations off a data warehouse is architecturally wrong.
Why this matters now: AI changes the stakes
Large language models operate on language and context. For unstructured data, documents provide that context. For structured data, the context comes from table and column names, from a glossary of business terms, from an ontology that says what a position is, what a portfolio is, how they relate. Without that semantic layer, an LLM operating on a fragmented set of databases with inconsistent naming conventions has very little to work with.
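As a rough illustration of what that semantic layer supplies, the sketch below assembles table names, column names, glossary definitions and relationships into a block of text that could be placed ahead of a user's question. The schema, the glossary wording and the `build_context` helper are all assumptions made for this example. No model is called here.

```python
# Hypothetical glossary of business terms (wording is illustrative only).
GLOSSARY = {
    "portfolio": "A collection of positions managed under one mandate.",
    "position": "The net holding of one security within one portfolio.",
}

# Hypothetical table and column names.
SCHEMA = {
    "portfolio": ["portfolio_id", "name"],
    "position": ["portfolio_id", "security_id", "quantity"],
}

# A minimal ontology: (child table, child column, parent table, parent column).
RELATIONSHIPS = [("position", "portfolio_id", "portfolio", "portfolio_id")]


def build_context(schema, glossary, relationships):
    """Render schema, definitions and relationships as plain text for a prompt."""
    lines = []
    for table, columns in schema.items():
        lines.append(f"Table {table} ({', '.join(columns)}): {glossary[table]}")
    for child, child_col, parent, parent_col in relationships:
        lines.append(f"{child}.{child_col} references {parent}.{parent_col}")
    return "\n".join(lines)


context = build_context(SCHEMA, GLOSSARY, RELATIONSHIPS)
prompt = f"{context}\n\nQuestion: Which portfolios hold security ACME?"
```

With fragmented databases and inconsistent names there is no single glossary or relationship list to draw on, so the equivalent context cannot be assembled this simply.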
This is why, as Mihir observed, data modelling, which had become something of a dying art, has suddenly become critical again. The firms that invested in getting their data models right are now in a significantly stronger position than those that did not.
For FINBOURNE, this validation is tangible. The argument for starting with the data model, maintaining a correct ontology, and building a shared operational store is one the company has been making since its founding. It is increasingly the argument the market is making back.
The headless asset management platform
One of the more forward-looking ideas to emerge from the conversation was the concept of FINBOURNE as a headless asset management platform. The principle is that of any headless system: decouple the data layer from the presentation and workflow layer. The data model, the APIs, the connections to market data are stable and long-lived. The UI and the workflows on top of them change constantly, through reorganisations, changing preferences, and new tooling.
If the platform exposes clean APIs and a well-defined semantic model, organisations can build their own interfaces, their own workflows, and increasingly their own AI-driven query layers on top. Natural language querying of operational data is already happening with some FINBOURNE clients. The barrier between technical and operational users starts to fall away.
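The decoupling can be sketched in a few lines. In this hypothetical example, `get_positions` stands in for a stable data-layer API, and the two render functions stand in for interfaces that can be rewritten or replaced without touching it. The names and data are invented for illustration and do not reflect FINBOURNE's actual APIs.

```python
import json

# Stand-in for the operational store (illustrative data).
POSITIONS = [
    {"portfolio_id": "P1", "security_id": "ACME", "quantity": 60},
    {"portfolio_id": "P1", "security_id": "GLOBEX", "quantity": 10},
    {"portfolio_id": "P2", "security_id": "ACME", "quantity": 50},
]


def get_positions(portfolio_id):
    """Data layer: returns plain records and knows nothing about presentation."""
    return [p for p in POSITIONS if p["portfolio_id"] == portfolio_id]


def render_text(rows):
    """One possible front end: a plain-text listing."""
    return "\n".join(f"{r['security_id']}: {r['quantity']}" for r in rows)


def render_json(rows):
    """Another possible front end: JSON for a web UI or an AI-driven query layer."""
    return json.dumps(rows)


rows = get_positions("P1")
text_view = render_text(rows)
json_view = render_json(rows)
```

Either renderer can change, or a third can be added, while `get_positions` and the records it returns stay the same.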
This also sharpens the challenge for incumbents. Tightly coupled front-to-back systems, where UI, middle tier and data are all tangled together, cannot easily be modernised. Even with investment and intent, changing one layer means touching the others. With a large installed base, that creates a near-paralysis. Mihir was direct about this: incumbents are stuck, and it will be slow going regardless of how much resource they apply.
On implementation
For organisations that cannot do a big-bang replacement (which is most of them), Mihir’s recommended path is pragmatic: start by building an operational data store, stream data from existing source systems into it, and use that as the first instantiation of a unified enterprise data model. Onboard first the functions that most need a firm-wide view, typically corporate risk, compliance and finance, then migrate others progressively over a period of years. Snowflake or Databricks as a downstream warehouse is entirely compatible with this approach; the key is that the semantic model defined in the operational layer carries through.
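The first step might look something like the sketch below, which uses an in-memory SQLite database as a stand-in for the operational data store and a Python list as a stand-in for the event stream. The two source systems, their field names and the `MAPPINGS` table are hypothetical. A real implementation would sit behind a streaming or change-data-capture tool, which is not shown here.

```python
import sqlite3

# Hypothetical mappings from each legacy system's field names to the unified model.
MAPPINGS = {
    "brokerage": {"acct": "portfolio_id", "ticker": "security_id", "qty": "quantity"},
    "pensions": {"fund_code": "portfolio_id", "instrument": "security_id", "units": "quantity"},
}


def to_unified(source, record):
    """Translate one source record into the shared position shape."""
    row = {target: record[field] for field, target in MAPPINGS[source].items()}
    row["source_system"] = source
    return row


def load(conn, events):
    """Apply events in arrival order; a later event for the same key replaces the earlier one."""
    for source, record in events:
        conn.execute(
            "INSERT OR REPLACE INTO position "
            "(source_system, portfolio_id, security_id, quantity) "
            "VALUES (:source_system, :portfolio_id, :security_id, :quantity)",
            to_unified(source, record),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE position (
        source_system TEXT NOT NULL,
        portfolio_id  TEXT NOT NULL,
        security_id   TEXT NOT NULL,
        quantity      REAL NOT NULL,
        PRIMARY KEY (source_system, portfolio_id, security_id)
    )"""
)

events = [
    ("brokerage", {"acct": "B-1", "ticker": "ACME", "qty": 100}),
    ("pensions", {"fund_code": "PF-9", "instrument": "ACME", "units": 250}),
    ("brokerage", {"acct": "B-1", "ticker": "ACME", "qty": 120}),  # later update
]
load(conn, events)

# Firm-wide view across both source systems, available as soon as events land.
firm_view = conn.execute(
    "SELECT security_id, SUM(quantity) FROM position GROUP BY security_id"
).fetchall()
```

The source systems keep running unchanged; only the mapping into the shared model is new, which is what lets functions such as risk or finance be onboarded before any application is migrated.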
About Mihir Shah
Mihir served as CTO of Fidelity Asset Management, where he led the transformation of the entire investment platform, and as CIO responsible for Data, where he shaped the Data and Analytics strategy across one of the world’s largest investment organisations. Since retirement, he has advised leading asset managers and advisory firms on data strategy through his association with EY.