Engineering Blog

What is bitemporal data anyway?

By Chris on February 28, 2018

Bitemporal data is a fundamental principle in our LUSID platform, yet the term is not very common among either technologists or business users. It can be a complex concept to describe, and I've been thinking of a simple example to help illustrate it…

As it happens, I've taken to going running with some of the team at lunchtime. I've never really enjoyed running, so it's good to have people to go with, because frankly I wouldn't go if I were left to my own devices! I always opt for the short route, and leave the 'enthusiasts' to enjoy the full experience of going all the way around Victoria Park and back. I use the solitary jog back to the office as a useful opportunity to think about interesting things.

Introducing the FINBOURNE Joggometer!

Imagine we created an app to track how many kilometres each of us racks up on our lunchtime trips with the FINBOURNE Harriers. Every time I return from a run, I tell the app how far I went:

Run Date    Distance
1 Feb       4.8km

Now, over time, I start to build up a history of all my efforts:

Run Date    Distance
1 Feb       4.8km
7 Feb       7.1km
18 Feb      5.0km
21 Feb      7.6km

Nothing very sophisticated so far - this is just a simple time-series of my run distances. Nevertheless, we can still ask some interesting questions of our data:

· How far did I run in total in February?

· On which day of the week do I cover the most distance?

· How do I compare to my teammates?
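The first two questions can be sketched in a few lines of Python. This is only an illustration, not how the Joggometer would really be built: the function names are made up for the example, and the year 2018 is an assumption, since the table only gives day and month.

```python
from collections import defaultdict
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Run history from the table above; the year (2018) is an assumption.
runs = [
    (date(2018, 2, 1), 4.8),
    (date(2018, 2, 7), 7.1),
    (date(2018, 2, 18), 5.0),
    (date(2018, 2, 21), 7.6),
]

def total_for_month(runs, year, month):
    """Sum the distance of every run dated in the given month."""
    total = sum(km for day, km in runs if (day.year, day.month) == (year, month))
    return round(total, 1)  # round away floating-point noise

def distance_by_weekday(runs):
    """Total distance grouped by the weekday of the run date."""
    totals = defaultdict(float)
    for day, km in runs:
        totals[DAY_NAMES[day.weekday()]] += km
    return {name: round(km, 1) for name, km in totals.items()}

february_total = total_for_month(runs, 2018, 2)
by_weekday = distance_by_weekday(runs)
best_day = max(by_weekday, key=by_weekday.get)

print(f"February total: {february_total}km")
print(f"Best weekday: {best_day} ({by_weekday[best_day]}km)")
```

With a single date per record, every question is answered from the one and only version of history.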

Now I'm hooked!

Turns out that the gamification of exercise is a powerful thing! I look through my running data, and realise my paltry totals are dwarfed by those of my fellow runners. I need to boost my totals!

I then remember that I actually put in some hard yards in January but forgot to enter them. Thankfully, adding them is very easy with the app, because each run is already stored alongside its date:

Run Date    Distance
1 Feb       4.8km
7 Feb       7.1km
18 Feb      5.0km
21 Feb      7.6km
13 Jan      11.5km
24 Jan      5.1km

In fact, having back-populated my runs for January, I now realise that I have elevated myself from the bottom of the January Joggometer leader board!

The thing is, the leader board results were last published at the end of January, when the data still showed me firmly at the bottom of the list. This is a shame because, obviously, I would like people to see the leader board using the latest data available, so that all my hard-earned kilometres are included! However, the poor person I just relegated to last place prefers the table as it was published at the end of January…

Can we have both?

Introducing the '2nd time' in bitemporal data

Stepping back a moment, what is the difference between the 'official' January leader board, and my preferred one, where I'm no longer in last place? The difference is all those extra runs I added retrospectively. If I ignore all those retrospective additions, the leader board results are the same as they were before. If I include them, we get a new leader board which reflects my improved standing!

If we capture BOTH the time that the run happened AND the time I entered the record into the app, we have all the information we need to reproduce both the old and new versions of the leader board results:

Run Date    Distance    Date Entered
13 Jan      11.5km      24 Feb
24 Jan      5.1km       24 Feb
1 Feb       4.8km       1 Feb
7 Feb       7.1km       7 Feb
18 Feb      5.0km       18 Feb
21 Feb      7.6km       21 Feb
(the two January runs, both entered on 24 Feb, are the retrospective additions)

This is the basis of bitemporal data - we store two dates against every piece of data:

· The Effective Date - when the 'thing' actually happened

· The As-At Date - the date the 'thing' was recorded in the system

By using both dates together we can answer even more sophisticated questions:

· How far did I run in January, using all the latest information?

· How far did I run in January, as it looked when I originally evaluated the results on 1 Feb?
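Both questions come down to one query with two filters. Here is a minimal sketch using the table above; the `Run` class and `total_km` function are hypothetical names for this example, and the year 2018 is again an assumption.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Run:
    effective: date  # when the run actually happened
    km: float
    as_at: date      # when the run was entered into the app

# Data from the table above; the year (2018) is an assumption.
runs = [
    Run(date(2018, 1, 13), 11.5, date(2018, 2, 24)),
    Run(date(2018, 1, 24), 5.1, date(2018, 2, 24)),
    Run(date(2018, 2, 1), 4.8, date(2018, 2, 1)),
    Run(date(2018, 2, 7), 7.1, date(2018, 2, 7)),
    Run(date(2018, 2, 18), 5.0, date(2018, 2, 18)),
    Run(date(2018, 2, 21), 7.6, date(2018, 2, 21)),
]

def total_km(runs, year, month, as_at=date.max):
    """Distance run in a month, using only records entered on or before as_at.

    The default, date.max, means "as-at latest": use everything we know now.
    """
    total = sum(
        r.km
        for r in runs
        if (r.effective.year, r.effective.month) == (year, month)
        and r.as_at <= as_at
    )
    return round(total, 1)  # round away floating-point noise

january_latest = total_km(runs, 2018, 1)
january_as_at_1_feb = total_km(runs, 2018, 1, as_at=date(2018, 2, 1))

print(f"January, as-at latest: {january_latest}km")
print(f"January, as-at 1 Feb:  {january_as_at_1_feb}km")
```

Nothing is ever overwritten: the late entries are simply new records with a later as-at date, so the old answer stays reproducible.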

Bitemporal Data is Everywhere

It turns out, you can model many business processes bitemporally...

An example from our LUSID platform is how to record the contents of an investment portfolio. You can add trades to a portfolio in a very similar manner to how we added runs to our fictitious Joggometer app. We can enter back-dated trades, or delete and re-book incorrect trades. By storing the data bitemporally, we have perfect clarity on what we owned, effective at any given point in history, as-at any point in history. Questions such as these are now simple to answer:

Q: How many units of that security were in my portfolio at month end, as-at month end? No problem: just filter out all the records that were entered into the system after our as-at cut-off date.

Q: How many units did I hold at month end, but this time including all the corrections I made yesterday? Again, easy - we can ask for all the data: as-at latest.
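To make those two questions concrete, here is a small sketch with made-up trades in a single security. It is an illustration only, not LUSID's API or data model: the `Trade` class, the `holding` function, the `cancelled_as_at` field and all the figures are assumptions. A wrongly booked trade is cancelled and re-booked, and one trade is back-dated.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class Trade:
    trade_id: str
    units: int                              # positive = buy, negative = sell
    effective: date                         # when the trade took place
    as_at: date                             # when it was entered into the system
    cancelled_as_at: Optional[date] = None  # when it was deleted, if ever

# Hypothetical trades, invented for this example.
trades = [
    # Booked with the wrong quantity, then deleted on 5 Feb...
    Trade("T1", 100, date(2018, 1, 10), date(2018, 1, 10),
          cancelled_as_at=date(2018, 2, 5)),
    # ...and re-booked correctly on the same day.
    Trade("T2", 120, date(2018, 1, 10), date(2018, 2, 5)),
    # A back-dated trade that was entered late.
    Trade("T3", 50, date(2018, 1, 25), date(2018, 2, 5)),
]

def holding(trades, effective, as_at=date.max):
    """Units held on `effective`, using only what the system knew at `as_at`."""
    total = 0
    for t in trades:
        if t.effective > effective or t.as_at > as_at:
            continue  # not yet traded, or not yet entered
        if t.cancelled_as_at is not None and t.cancelled_as_at <= as_at:
            continue  # already deleted by the as-at date
        total += t.units
    return total

month_end = date(2018, 1, 31)
held_as_at_month_end = holding(trades, month_end, as_at=month_end)
held_as_at_latest = holding(trades, month_end)

print(f"Held at month end, as-at month end: {held_as_at_month_end}")
print(f"Held at month end, as-at latest:    {held_as_at_latest}")
```

Recording the deletion as a date, rather than removing the row, is what keeps the month-end view reproducible after the correction.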

We apply these principles to all the data we hold in our system, which gives us an unparalleled audit capability.

What's the catch?

The difficulty with bitemporal models is that every query has to aggregate the underlying records afresh for its particular combination of effective and as-at dates. Making this process consistently fast is a non-trivial engineering challenge. Thankfully, we enjoy non-trivial engineering challenges! Watch this space to find out how we are tackling this head-on...