What is bitemporal data anyway?
Bitemporal data is a fundamental principle in our LUSID platform, yet the term is not very common amongst technologists or business users. It can be a complex concept to describe, so I’ve been thinking of a simple example to help illustrate it…
As it happens, I’ve taken to going running with some of the team at lunchtime. I’ve never really enjoyed running, so it’s good to have people to go with, because frankly I wouldn’t go if I were left to my own devices! I always opt for the short route, and leave the ‘enthusiasts’ to enjoy the full experience of going all the way around Victoria Park and back. I use the solitary jog back to the office as a useful opportunity to think about interesting things.
Introducing the FINBOURNE Joggometer!
Imagine we created an app to track how many kilometres each of us racks up on our lunchtime trips with the FINBOURNE Harriers. Every time I return from a run, I tell the app how far I went:
Now, over time, I start to build up a history of all my efforts:
Nothing very sophisticated so far – this is just a simple time-series of my run distances. Nevertheless, we can still ask some interesting questions of our data:
· How far did I run in total in February?
· On which day of the week do I run the most distance?
· How do I compare to my teammates?
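The questions above need nothing more than a list of (date, distance) pairs. As a minimal Python sketch, using the February runs from the table further down and an assumed year of 2020 (the post does not give one), the monthly total is a simple group-and-sum:

```python
from collections import defaultdict
from datetime import date

# One (run date, distance in km) pair per run. These are the February runs
# from the table later in this post; the year 2020 is an assumption.
runs = [
    (date(2020, 2, 1), 4.8),
    (date(2020, 2, 7), 7.1),
    (date(2020, 2, 18), 5.0),
    (date(2020, 2, 21), 7.6),
]


def total_by_month(runs):
    """Sum the distance run in each (year, month), rounded to 0.1 km."""
    totals = defaultdict(float)
    for run_date, km in runs:
        totals[(run_date.year, run_date.month)] += km
    return {month: round(km, 1) for month, km in totals.items()}


print(total_by_month(runs))  # {(2020, 2): 24.5}
```

With only one date per run, though, there is no way to tell when each run was recorded.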
Now I’m hooked!
It turns out that the gamification of exercise is a powerful thing! I look through my running data and realise my paltry totals are dwarfed by those of my fellow runners. I need to boost my totals!
Then I remember that I had actually put in some hard yards but forgot to enter them. Thankfully, that is very easy to fix with the app, because each run is already stored alongside the date it happened:
In fact, having back-populated my runs for January, I now realise that I have elevated myself from the bottom of the January Joggometer leader board!
The thing is, the leader board results were last published at the end of January, when the data still showed me firmly at the bottom of the list. This is a shame because, obviously, I would like people to see the leader board using the latest data available, so that all my hard-earned kilometres are included! However, the poor person I have just relegated to last place prefers the table as it was published at the end of January…
Can we have both?
Introducing the ‘2nd time’ in bitemporal data
Stepping back a moment, what is the difference between the ‘official’ January leader board, and my preferred one, where I’m no longer in last place? The difference is all those extra runs I added retrospectively. If I ignore all those retrospective additions, the leader board results are the same as they were before. If I include them, we get a new leader board which reflects my improved standing!
If we capture BOTH the time that the run happened AND the time I entered the record into the app, we have all the information we need to reproduce both the old and new versions of the leader board results:
|Run Date|Distance|Date Entered|
|---|---|---|
|13 Jan|11.5km|24 Feb|
|24 Jan|5.1km|24 Feb|
|1 Feb|4.8km|1 Feb|
|7 Feb|7.1km|7 Feb|
|18 Feb|5.0km|18 Feb|
|21 Feb|7.6km|21 Feb|
(The first two rows, the January runs entered on 24 Feb, are the retrospective additions.)
This is the basis of bitemporal data – we store two dates against every piece of data:
· The Effective Date – when the ‘thing’ actually happened
· The As-At Date – the date the ‘thing’ was recorded in the system
By using both dates together we can answer even more sophisticated questions:
· How far did I run in January, using all the latest information?
· How far did I run in January, as it looked when I originally evaluated the results on 1 Feb?
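To make these two questions concrete, here is a minimal Python sketch. It loads the table above into a hypothetical `Run` record that carries both dates (again assuming the year 2020) and answers each question with a single hypothetical `distance` function: the effective date selects which runs count, and the as-at date selects which version of the data we look at.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Run:
    effective: date  # when the run actually happened
    as_at: date      # when the run was entered into the app
    km: float


# The table above; the year 2020 is an assumption.
RUNS = [
    Run(date(2020, 1, 13), date(2020, 2, 24), 11.5),
    Run(date(2020, 1, 24), date(2020, 2, 24), 5.1),
    Run(date(2020, 2, 1), date(2020, 2, 1), 4.8),
    Run(date(2020, 2, 7), date(2020, 2, 7), 7.1),
    Run(date(2020, 2, 18), date(2020, 2, 18), 5.0),
    Run(date(2020, 2, 21), date(2020, 2, 21), 7.6),
]


def distance(runs, start: date, end: date, as_at: Optional[date] = None) -> float:
    """Total km run with an effective date in [start, end], counting only
    records entered on or before `as_at` (None means 'as-at latest')."""
    matching = [
        r.km
        for r in runs
        if start <= r.effective <= end and (as_at is None or r.as_at <= as_at)
    ]
    return round(sum(matching, 0.0), 1)


jan_start, jan_end = date(2020, 1, 1), date(2020, 1, 31)
print(distance(RUNS, jan_start, jan_end))                          # 16.6
print(distance(RUNS, jan_start, jan_end, as_at=date(2020, 2, 1)))  # 0.0
```

Nothing is overwritten or snapshotted: the 1 Feb view is reproduced from the same records simply by ignoring anything entered after 1 Feb.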
Bitemporal Data is Everywhere
It turns out, you can model many business processes bitemporally…
An example from our LUSID platform is how to record the contents of an investment portfolio. You can add trades to a portfolio in a very similar manner to how we added runs to our fictitious Joggometer app. We can enter back-dated trades, or delete and re-book incorrect trades. By storing the data bitemporally, we have perfect clarity on what we owned, effective at any given point in history, as-at any point in history. Questions such as these are now simple to answer:
Q: How many units of that security were in my portfolio at month end, as-at month end? No problem: just filter out all those records which were entered into the system after our as-at cut-off date.
Q: How many units did I hold at month end, but this time including all the corrections I made yesterday? Again, easy – we can ask for all the data: as-at latest.
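The same idea carries over to a portfolio. The sketch below is not LUSID's data model or API; it is a hypothetical Python illustration in which each trade records its effective (trade) date, the as-at date it was entered, and, if it was later deleted, the as-at date of that deletion. The security `ABC`, the trades and the dates are all invented for illustration: a 50-unit buy is deleted and re-booked as 40 units on 14 Feb, and a back-dated sale is entered on the same day.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Trade:
    security: str
    units: int                      # positive for a buy, negative for a sell
    effective: date                 # when the trade took effect
    entered: date                   # as-at date the record was added
    removed: Optional[date] = None  # as-at date it was deleted (None = still live)


def is_visible(trade: Trade, as_at: Optional[date]) -> bool:
    """Was this record part of the system's view at `as_at`? None means 'as-at latest'."""
    if as_at is None:
        return trade.removed is None
    return trade.entered <= as_at and (trade.removed is None or trade.removed > as_at)


def holding(trades, security: str, effective: date, as_at: Optional[date] = None) -> int:
    """Units of `security` held at `effective`, as the system saw it at `as_at`."""
    return sum(
        t.units
        for t in trades
        if t.security == security and t.effective <= effective and is_visible(t, as_at)
    )


# Hypothetical trades in an invented security "ABC".
TRADES = [
    Trade("ABC", 100, date(2020, 1, 10), entered=date(2020, 1, 10)),
    # Booked with the wrong size, then deleted on 14 Feb.
    Trade("ABC", 50, date(2020, 1, 20), entered=date(2020, 1, 20), removed=date(2020, 2, 14)),
    # Re-booked with the correct size on the same day.
    Trade("ABC", 40, date(2020, 1, 20), entered=date(2020, 2, 14)),
    # A back-dated sale, entered after month end.
    Trade("ABC", -30, date(2020, 1, 28), entered=date(2020, 2, 14)),
]

month_end = date(2020, 1, 31)
print(holding(TRADES, "ABC", month_end, as_at=month_end))  # 150
print(holding(TRADES, "ABC", month_end))                   # 110
```

Deleting a trade does not erase it; it only closes the record's as-at range, which is what lets the month-end view be reproduced exactly.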
We apply these principles to all the data we hold in our system, which gives us an unparalleled audit capability.
What’s the catch?
The difficulty with bitemporal models is that you cannot simply keep a running total: every query has to aggregate every relevant record, filtering on both its effective date and its as-at date. Making this process consistently fast is a non-trivial engineering challenge. Thankfully, we enjoy non-trivial engineering challenges! Watch this space to find out how we are tackling this head on…