Last week the Bank of England (BoE) and Financial Conduct Authority (FCA) released the results of their joint survey, Machine learning in UK financial services. The paper confirmed what most of us already knew: machine learning (ML) is increasingly being used in the investment management industry. But it also confirmed a less discussed fact: legacy systems and data limitations are major barriers to implementing an effective machine learning strategy. In this blog post, we explore how the right technology can help investment managers overcome these barriers.
The #1 barrier to ML deployment
Many investment management firms suffer from legacy system issues. These often arise after a merger, when systems are ‘stitched’ together, or after expansion into new asset classes, when expertise and systems become siloed. The end result is a highly inefficient, gargantuan IT system with endless workarounds, manual processes and production issues. The BoE and FCA report reflects this, identifying legacy systems as the main barrier to ML deployment.
Legacy systems restrict your access to data
Effective machine learning models need to be fed large volumes of clean, structured data. For investment managers this means transactions, strategies, positions, market data and economic data, to name just a few sources. However, Chief Data Officers at firms with legacy systems face several challenges in maintaining this data pipeline:
- “How can our stakeholders access data?” – A legacy system usually isn’t one enterprise system but a ‘hodgepodge’ of smaller systems. Critical data could be stored almost anywhere: locked into siloed databases or even (god forbid!) in an Excel file on an individual user’s local hard drive. The result is an access problem, because you could have dozens of data stores, each with its own access protocols and procedures.
- “Does our data have meaning?” – This is fundamentally a question of data ontology. The data within legacy systems may lack formal definitions of attributes and properties, or any standard description of the relations between data sets in the same environment. Undefined data sets can lead to incorrect results in downstream processes. For example, an analytics engine might request a “bond price” from two separate sources, each with a different definition of price (say, clean versus dirty price).
- “How can we integrate new sources of data?” – Legacy systems often have rigid data schemas, so building adapters for new data sources can take significant time and resources. You may also be duplicating effort if two separate systems require the same data (for example, separate UK and US equity IBORs that both need the same new ESG data set).
- “Who can see and use our data?” – Regulations and expensive data licences impose strict rules on who can access which data. Many legacy systems cannot support the fine-grained entitlements model needed to give users the access they need without exposing restricted or sensitive data.
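The bond price problem above is easier to see with a concrete illustration. The following is a minimal Python sketch, not LUSID’s API: `PropertyDefinition`, `definitions_match`, `combine` and the `"Instrument/BondPrice"` key are all hypothetical names chosen for this example. The idea is that once every field carries an explicit definition and data type, a downstream process can refuse to mix two values that share a name but not a meaning. Averaging stands in for any downstream calculation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PropertyDefinition:
    """Hypothetical metadata record that pins down what a field means."""
    key: str         # shared field name, e.g. "Instrument/BondPrice"
    data_type: type  # expected Python type of the value
    definition: str  # human-readable meaning, e.g. "clean price, % of par"


# Two hypothetical legacy sources that both publish a "bond price" field
SOURCE_A = PropertyDefinition("Instrument/BondPrice", float, "clean price, % of par")
SOURCE_B = PropertyDefinition("Instrument/BondPrice", float, "dirty price, % of par")


def definitions_match(a: PropertyDefinition, b: PropertyDefinition) -> bool:
    """True only if both fields share the same key, type and meaning."""
    return a.key == b.key and a.data_type is b.data_type and a.definition == b.definition


def combine(defn_a: PropertyDefinition, value_a: float,
            defn_b: PropertyDefinition, value_b: float) -> float:
    """Average two observations of a field, but only if they mean the same thing."""
    if not definitions_match(defn_a, defn_b):
        raise ValueError(
            f"Cannot combine '{defn_a.key}': "
            f"'{defn_a.definition}' vs '{defn_b.definition}'"
        )
    # Reject values whose type does not match the declared data type
    for defn, value in ((defn_a, value_a), (defn_b, value_b)):
        if not isinstance(value, defn.data_type):
            raise TypeError(f"{defn.key} expects {defn.data_type.__name__}")
    return (value_a + value_b) / 2


if __name__ == "__main__":
    try:
        combine(SOURCE_A, 99.5, SOURCE_B, 101.2)
    except ValueError as err:
        print(err)  # the clean/dirty mismatch is caught instead of silently averaged
```

Without the definition field, both sources would look identical and the mismatch would pass through unnoticed.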
Make the most of your resources and people
We believe investment managers should focus on what they do best: managing investments and delivering risk-adjusted returns to their clients. They shouldn’t spend undue energy and resources maintaining legacy IT systems that are inadequate for today’s data landscape.
That’s why we’ve been busy building a modern investment data platform from the ground up, around core principles that future-proof your data for whatever machine learning you want to throw at it. Here’s what you need from your technology stack to implement an effective machine learning strategy:
- A platform that can consume new sources of data with ease. You want a platform with a flexible data model to accommodate the loading of new data sources, removing the need for lengthy software deployments or painful mapping exercises.
- A framework to build a full ontology of your data. You want your data structured around clear properties and attributes. Fields should have distinct data types (e.g. “text” versus “numeric”), and the platform should let users define relations between data sets. You also want to understand how the data is used, so you can answer the main data ontology questions: What data do I have? What does it mean? Where did it come from, and who loaded it? Who can access it? How can it be used?
- A fully open-API platform. All your data should be available to you via standardised APIs, and SDKs for the major programming languages make it easy to integrate that data into your own suite of applications. You also want a granular entitlements model so platform administrators have precise control over all read and write access.
- A platform born in the cloud. You can log in and create an account within minutes, adopt the platform incrementally and avoid committing to lengthy integrations. Your environment also scales quickly and benefits from the latest cloud security.
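To make the idea of a granular entitlements model more concrete, here is a minimal Python sketch of a default-deny policy, where a user can perform an action on a data set only if that exact permission was granted. This is an illustration under stated assumptions, not how LUSID implements entitlements: `EntitlementPolicy`, its methods, and the user and data set names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class EntitlementPolicy:
    """Hypothetical default-deny store mapping (user, dataset) to allowed actions."""
    grants: dict[tuple[str, str], set[str]] = field(default_factory=dict)

    def grant(self, user: str, dataset: str, *actions: str) -> None:
        """Allow `user` to perform each of `actions` (e.g. "read", "write") on `dataset`."""
        self.grants.setdefault((user, dataset), set()).update(actions)

    def revoke(self, user: str, dataset: str, action: str) -> None:
        """Remove a single permission; revoking something never granted is a no-op."""
        self.grants.get((user, dataset), set()).discard(action)

    def is_allowed(self, user: str, dataset: str, action: str) -> bool:
        """Return True only if the action was explicitly granted (default deny)."""
        return action in self.grants.get((user, dataset), set())


if __name__ == "__main__":
    policy = EntitlementPolicy()
    # A quant analyst may read licensed ESG data but not modify it
    policy.grant("quant_analyst", "esg_scores", "read")
    print(policy.is_allowed("quant_analyst", "esg_scores", "read"))   # True
    print(policy.is_allowed("quant_analyst", "esg_scores", "write"))  # False
    print(policy.is_allowed("quant_analyst", "client_pii", "read"))   # False
```

Defaulting to deny means a newly loaded or restricted data set stays hidden until an administrator explicitly opens it up, which is the behaviour licensing and regulatory rules usually call for.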
If a platform like this sounds like something you’d like to try, speak to us today to learn all about how LUSID works and the value it can deliver to your firm.