Machine Learning (ML) dominates the narrative in financial services, just as in practically every other industry. In this piece we examine its current use cases and why the take up in Investment Management hasn’t been more widespread. We’ll then look at practical and achievable steps to harness relevant aspects of the innovation.
Current Use Cases for ML
- Recommender systems – Recommender systems are trained to understand preferences, previous decisions, and trade characteristics. Recommender systems are commonly used in e-commerce for bundling purchases, and have also been transitioned into finance for trade processing/fails spotting.
- New data sets – ML accelerates and enhances integration of diverse data sets in investment decision-making, reducing reliance on expensive and inconsistent human effort. ML can quickly identify patterns, trends and correlations for deeper data-driven insights. Better insights are required to support Research, Portfolio Construction, and Portfolio Management, providing a comprehensive ‘Whole Portfolio View’ with enriched context for alternative assets and private equity investments. This includes managing existing data sets like trades & positions, as well as alternative data sets such as weather forecasts for hurricane frequency. The latter is particularly important in an ESG context as regulators formalise climate stress testing. Investment teams can use this information to predict asset value fluctuations and make informed decisions.
Why hasn’t the take up in Investment Management been more widespread?
Despite the potential benefits, there are reasons why the adoption hasn’t been more widespread:
- Integration of new data sets. Incorporating new data sets into investment processes can be challenging. For example, operational data has not typically been available for ML models. While data lakes aim to address this issue, there are flaws such as the inability to understand financial information such as building positions and holdings from underlying trades. Lack of synchronization, and data being received after it has been cleaned, often resulting in missing key characteristics. The fallacy of the “golden copy” can also remove important facets of the data before learning can take place. Moreover, technical deficiencies, legitimate caution around licensing issues and quality of the answers have caused concern.
- Portfolio managers are slow to establish trust in new data sets: Promising datasets should be thoroughly tested as part of the investment model. An extensible data model which can connect to both internal operational and external alternative data is essential for risk identification. The ability to scale infrastructure and reproduce historical data is also crucial for alpha generation and once the risks are understood, decision models can be adjusted to incorporate new data inputs.
Continuous monitoring of investment decisions with a robust feedback loop will help track the effectiveness of changes.
Introduction to MLOps and Luminesce
MLOps, a shortened form of “Machine Learning Operations”, represents a comprehensive set of practices and tools specifically designed to facilitate the seamless integration, deployment, and management of machine learning models within production environments. By merging principles from both DevOps and data engineering, MLOps streamlines the entire machine learning lifecycle, promoting effective collaboration among data scientists, engineers, and operations teams. The ultimate goal of MLOps is to enhance model reliability, scalability, and monitoring, empowering organizations to deploy and maintain machine learning systems more efficiently.
Luminesce’s Role in MLOps
Luminesce (FINBOURNE’s data virtualization platform), in the context of MLOps, serves as a valuable platform that addresses the most challenging aspects of the model development cycle: data preparation and model deployment. When preparing data, the process entails exploration and understanding of sources or producers of data and then gathering, constructing and cleaning the dataset. Luminesce offers an array of functions and tools, such as lumipy syntax and stats functions, to aid this preparation process. Additionally, it packages up the workflow, enabling its repeatability through views.
Deploying a model after its training phase can pose significant difficulties; Luminesce addresses this with its native python providers, simplifying the deployment process. Users can simply write the same python code they used to test the model and then initiate the deployment process. For users who prefer visual tools or SQL, Luminesce allows seamless integration with visualization tools like Tableau or Power BI, making these models available to and usable by a broader set of stakeholders.
Large Language Models (LLMs) and AI Assistant
We are actively working on a proof-of-concept assistant for FINBOURNE’s services and the LLM engine that will support it. This upcoming capability is expected to simplify the usage of FINBOURNE’s systems while maintaining a high level of security. Notably, users can rest assured that their data will remain within our ecosystem and will not be shared with any third-party vendors like OpenAI.
Continuous Improvements
Luminesce is continuously evolving to cater to the diverse needs of users and further enhance the machine learning experience. One of the ongoing developments is to elevate tensors as a first-class data type within Luminesce. This enhancement will unlock additional machine learning use cases and make it even easier to execute trained models.
What can we change?
Luminesce combined with LUSID, FINBOURNE’s core data consolidation platform, plays a pivotal role in enabling effective MLOps by simplifying data preparation and model deployment. LUSID offers a complete and robust solution for managing your operational data. Through its bitemporality your data can always be reproduced from any historical point and easily corrected when needed. LUSID’s extensible data model further enhances this flexibility and power, enabling you to confidently represent and validate your data at any stage of its development. Luminesce seamlessly connects to LUSID and various other data sets, empowering you to explore new combinations of data. Through Lumipy, our high-level Python library, these combinations always yield results in the popular and widely-supported pandas DataFrames format, which is highly advantageous for machine learning and data science applications.
Our fusion of accessibility, flexibility, and robustness, along with the incorporation of bitemporality, greatly accelerates the MLOps development cycle while simultaneously enhancing safety and reliability.
To learn more about how our products enable effective MLOps, get in touch here.