Consolidated Tape: Breaking bad data



Neil Ryan


In our recent views shared on Consolidated Tape (‘CT’), the issue of data quality was highlighted as central to the CT debate. Some may argue it is even hampering the development of a Consolidated Tape Provider (CTP) in the UK and the EU.  

This concern also features in ESMA’s MiFID II/MiFIR Review Report of September 2020, where, in relation to changes to the regime, the sell-side ‘group’ noted that “priority should be given to working on accessibility, readability and quality of market data”. Similarly, a section of the larger buy-side ‘group’ highlighted “the need to improve the standardisation, the accessibility, and the quality of MiFIR market data”.

It is clear from this that the problems relating to data quality impact all participants across capital markets, and must be addressed as part of the process of creating a CT, in order to bring about greater transparency and visibility of market liquidity.

Data quality – what are the issues?

As a data-driven technology provider, we decided to do what we know best and investigate the data, to pinpoint the specific issues with the transactions being reported. Over a six-month period, FINBOURNE has broken down data from across equities, fixed income and derivatives transactions. That’s over 42 million transactions – from various public sources – collected and analysed since March 2021.

As we continue this work through our Design Council, our aim is to recognise, manage and problem-solve for the underlying barriers, to ensure that the transaction data that will feed into a CT is fit for purpose.

Using publicly available, post-trade transaction data from a number of the largest APAs and Trading Venues (the ‘transaction data providers’) from early 2021, we’ve broken down the issues relating to transaction data into three component parts:

Consolidation and aggregation

Consolidation and aggregation challenges create barriers to market participants unless they have technical SMEs to translate a myriad of codes, formats and conventions. Crucially, this is NOT a one-off exercise; it requires significant monitoring and maintenance, given that the entities publishing the post-trade data can change their structure or delivery mechanisms at any given time.


The consistency of data is another area of concern, where we see the same transaction fields completed in different ways across the market, leading to inconsistent treatment that requires remediation. Interestingly, the levels of completeness from the transaction data providers also vary, with no obvious patterns to the missing values.


The coherence of data is perhaps the area of most concern, where we see incorrect data reported in a manner that either distorts any aggregates or averages of data, or leads to incorrect/incoherent output.

Then there is the issue of waivers or deferrals, which we regard as a ‘policy’ issue. The data may be available, but the nature and timing of the reporting of transactions is determined by those policy frameworks.

Why can’t today’s data support a CT?

We know data is central to the CT, but in our analysis we found several reasons why, in its current form, it could not support a CT for market use:


1. Coverage of trade fields

While some previous market reports have indicated that fields seemed to be missing from the ‘feeds’ they receive from various sources, FINBOURNE found that:

– We could access all the required data fields from the transaction data providers.

– In most cases, the fields were completed, although there were significant issues with the quality of the data provided.

– While some records did not contain values, there were no obvious patterns to the missing values, so simple remedies to the problems are unlikely.
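The completeness checks described above can be sketched in code. The following is a minimal, hypothetical Python example of counting blank values per field across a batch of records; the field names (`isin`, `price`, `venue`) and sample data are illustrative assumptions, not any provider’s actual schema.

```python
# Hypothetical sketch: measure per-field completeness across records,
# to look for patterns (or the absence of patterns) in missing values.
# Field names and sample records are illustrative assumptions.
from collections import Counter

def missing_counts(records, fields):
    """Count how often each field is blank or absent across the records."""
    missing = Counter()
    for rec in records:
        for field in fields:
            if not rec.get(field, "").strip():
                missing[field] += 1
    return missing

records = [
    {"isin": "GB00B03MLX29", "price": "99.5", "venue": ""},
    {"isin": "", "price": "17.2", "venue": "XLON"},
]

print(dict(missing_counts(records, ["isin", "price", "venue"])))
# {'venue': 1, 'isin': 1}
```

A real pipeline would run this per provider and per day, to test whether the gaps correlate with a source or a time window – in our analysis, they did not.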

2. Publication method

FINBOURNE found that:

– One of the APAs published the information in JSON, while the others published in CSV format; handling both formats requires additional effort from market participants.

– We also observed, during the time that we tracked the reported data, that the formats of some providers’ CSV files changed, requiring technical adjustment to earlier records to ensure that they could be maintained consistently.

– During the tracking, we noted that at least one APA changed the API without notification or explanation, which led to ‘dropping’ of data and required participants to re-connect to the API.

– Finally, in terms of capturing the data, some APAs required constant monitoring to record the data (i.e. data ‘grabs’ every five minutes); however, these ‘grabs’ duplicated data, which then resulted in significant ‘de-duplication’ effort.
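To illustrate the de-duplication effort, here is a minimal, hypothetical Python sketch of merging overlapping five-minute data ‘grabs’ while keeping only the first copy of each record. The key fields (`trade_id`, `publication_time`) are assumptions for illustration, not any APA’s actual schema.

```python
# Hypothetical sketch: de-duplicating overlapping CSV data "grabs".
# Field names are illustrative assumptions.
import csv
import io

def dedupe_grabs(grabs):
    """Merge a sequence of overlapping CSV grabs, keeping the first
    occurrence of each record, keyed on (trade_id, publication_time)."""
    seen = set()
    merged = []
    for grab in grabs:
        for row in csv.DictReader(io.StringIO(grab)):
            key = (row["trade_id"], row["publication_time"])
            if key not in seen:
                seen.add(key)
                merged.append(row)
    return merged

# Two grabs taken five minutes apart; T2 appears in both.
grab_1 = "trade_id,publication_time,price\nT1,09:00:01,99.5\nT2,09:03:12,99.7\n"
grab_2 = "trade_id,publication_time,price\nT2,09:03:12,99.7\nT3,09:06:45,99.6\n"

rows = dedupe_grabs([grab_1, grab_2])
print([r["trade_id"] for r in rows])  # ['T1', 'T2', 'T3']
```

In practice the de-duplication key has to be chosen carefully: without a reliable unique trade identifier, legitimately identical trades risk being collapsed together.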

3. Formatting

FINBOURNE found that:

– The formatting of the trade fields themselves differed across the transaction data providers, making it difficult for participants to aggregate data easily.

– Trade flags were represented differently in every APA – in some cases, the fields were contained in one data field while in other cases, there was an individual field for each flag.

– Some feeds were over FIX while others were not, calling for a different format and way of connecting.

– One APA had some date configurations that were inconsistent with the ISO 8601 standard.

– Another APA had trade fields populated in a manner that causes difficulty with aggregation, e.g. a PRICE field with 20 decimal places: “99.29596743210102030405”.
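Normalising these formatting differences is mechanical but fiddly. The following is a minimal, hypothetical Python sketch of two such steps: coercing non-ISO dates to ISO 8601 and rounding over-precise prices to a fixed scale. The candidate date layouts and the six-decimal target scale are assumptions for illustration.

```python
# Hypothetical sketch: normalising non-ISO dates and over-precise
# prices before aggregation. Formats and scale are assumptions.
from datetime import datetime
from decimal import Decimal, ROUND_HALF_EVEN

def normalise_date(raw):
    """Try a few date layouts and emit an ISO 8601 date string."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y", "%Y%m%d"):
        try:
            return datetime.strptime(raw, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {raw!r}")

def normalise_price(raw, places=6):
    """Round a price reported with excessive precision to a fixed scale,
    using banker's rounding to avoid systematic bias in aggregates."""
    return Decimal(raw).quantize(Decimal(10) ** -places,
                                 rounding=ROUND_HALF_EVEN)

print(normalise_date("23/03/2021"))                # 2021-03-23
print(normalise_price("99.29596743210102030405"))  # 99.295967
```

Using `Decimal` rather than floating point matters here: a 20-decimal price string cannot be represented exactly as a `float`, so naive conversion would silently distort averages.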

4. Parsing

FINBOURNE found that:

– Several numeric fields contained non-numeric data, e.g. ‘N/A’, which has the effect of slowing run-time.
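A common defence against this is to coerce such sentinel strings explicitly rather than letting numeric conversion fail record by record. The sketch below is a minimal, hypothetical Python example; the list of sentinel values is an assumption, not an observed specification.

```python
# Hypothetical sketch: coercing numeric fields that contain sentinel
# strings such as 'N/A'. The sentinel set is an assumption.
SENTINELS = {"N/A", "NA", "", "-"}

def to_float(raw):
    """Return a float, or None when the field holds a sentinel value."""
    value = raw.strip()
    if value in SENTINELS:
        return None
    return float(value)

quantities = ["1000", "N/A", "250.5", "-"]
cleaned = [to_float(q) for q in quantities]
print(cleaned)  # [1000.0, None, 250.5, None]
```

Mapping sentinels to an explicit null (rather than zero) keeps downstream averages and totals honest, at the cost of having to handle missing values everywhere they are consumed.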


5. Self-aggregation

For any market participants that have considered the opportunity to maintain a CT internally, practical considerations include:

– Handling the CSV/JSON Issue

– Normalising the formatting points

– Correcting and eliminating the parsing issues

– Given the industry’s love of spreadsheets, it’s worth noting that Excel cannot handle more than 1,048,576 rows, and its ability to filter data at that volume is limited.
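The first of these practical steps – reconciling the CSV and JSON feeds – can be sketched as follows. This is a minimal, hypothetical Python example of loading both feed types into one record shape; the field names (`isin`, `price`, `quantity`) and sample payloads are illustrative assumptions.

```python
# Hypothetical sketch: loading one APA's JSON feed and another's CSV
# feed into a single record shape. Field names are assumptions.
import csv
import io
import json

FIELDS = ("isin", "price", "quantity")

def from_csv(text):
    """Project a CSV feed onto the common field set."""
    return [{f: row[f] for f in FIELDS}
            for row in csv.DictReader(io.StringIO(text))]

def from_json(text):
    """Project a JSON feed (a list of objects) onto the same field set."""
    return [{f: str(rec[f]) for f in FIELDS} for rec in json.loads(text)]

csv_feed = "isin,price,quantity\nGB00B03MLX29,99.5,1000\n"
json_feed = '[{"isin": "DE0005557508", "price": "17.2", "quantity": "500"}]'

tape = from_csv(csv_feed) + from_json(json_feed)
print(len(tape))  # 2
```

Even this toy version shows why self-aggregation is a maintenance burden: every projection function breaks the moment a provider renames a field or changes its delivery format.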

6. Using the data

While the RTS 2 data provide 24 fields of detail:

– It does not include relevant basic data such as the issuer name. To make more sense of the transaction data, we connected to the FIRDS database to ensure a basic level of utility – although there were blanks from that source as well.

– There are limits as to what the data shows in terms of granularity or detail e.g. a ‘SINT’ designation does not identify which of the 216 Systematic Internalisers is actually the ‘Venue of Execution’.

The FINBOURNE way forward 

We agree with industry bodies, regulators and market participants that the issues of data quality need to be solved. However, we don’t believe this analysis needs to wait until a decision has been made on the form of a CTP. If anything, starting this process now will improve the quality and efficiency of the CTP while it is being developed, and contribute to the resilience of the end product.

The time for engagement is now and we’re inviting market participants to join our Design Council to take the first steps to solve these issues and build a better CTP.


For details on how to join the Design Council or to speak to us about CT, click here.
