The Riddle of the Lancaster House: a hands-on exercise with Mercury-Reels
Mercury-reels, developed at BBVA AI Factory, is a component of our Python library, written in C++, that lets us analyze event sequences. To work with events (understood as any action occurring at a given moment), we must identify each one with a specific code and a timestamp. Events triggered by the same user or client (for example, in the BBVA mobile app) form an individual sequence of events.
In other words, we can understand a sequence of events as a video clip, where the events are stored in each frame. The video clips are stored in a reel where clips of other users have also been recorded.
Using these clips, or sequences of events, we can build an analytical model that predicts future events and the time at which they will occur. We call these events “target events”. We apply a simple algorithm to predict them: aligning each sequence or clip in reverse order. Starting from the last event, we recreate the complete tree: the decision path that led a user to the target event. We also store the frequencies and the elapsed times until this event occurs.
Then, through an efficient search within the tree, the algorithm identifies all partial or complete matches between the different sequences and consolidates its results using predefined aggregation options. This way, we can predict which previous steps (events) must occur for the target event.
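The reverse alignment and tree search described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the mercury-reels implementation: the `Node` class, the `build_tree` and `longest_match` helpers, and the room/occupancy codes are all hypothetical, and each node is assumed to store the elapsed time between the last event of a sequence and its target.

```python
class Node:
    """One node of the reversed-sequence tree (hypothetical structure)."""
    def __init__(self):
        self.children = {}        # event code -> Node
        self.count = 0            # how many target sequences pass through here
        self.total_elapsed = 0.0  # summed seconds from last event to target


def build_tree(sequences):
    """Build the tree from (events, target_time) pairs.

    events is a chronological list of (code, time) tuples, all before the target.
    """
    root = Node()
    for events, target_time in sequences:
        if not events:
            continue
        # Assumption: elapsed time is measured from the last event to the target.
        elapsed = target_time - events[-1][1]
        node = root
        # Insert the sequence backwards, starting from the last event.
        for code, _ in reversed(events):
            node = node.children.setdefault(code, Node())
            node.count += 1
            node.total_elapsed += elapsed
    return root


def longest_match(root, events):
    """Follow the observed events backwards as deep as the tree allows.

    Returns (matched depth, mean elapsed seconds) or None if nothing matches.
    """
    node, depth = root, 0
    for code, _ in reversed(events):
        if code not in node.children:
            break
        node = node.children[code]
        depth += 1
    if depth == 0:
        return None
    return depth, node.total_elapsed / node.count


# Hypothetical sequences that ended in a target event.
killers = [
    ([("sitting_room/1", 0), ("study/1", 60), ("billiard/2", 120)], 121),
    ([("hall/3", 0), ("sitting_room/1", 30), ("study/1", 50), ("billiard/2", 90)], 91),
]
tree = build_tree(killers)

observed = [("kitchen/2", 5), ("study/1", 40), ("billiard/2", 70)]
innocent = [("kitchen/2", 5), ("hall/3", 10)]

partial = longest_match(tree, observed)   # partial match on the last two events
no_match = longest_match(tree, innocent)  # last event never preceded a target
```

Here the observed sequence shares its two most recent events with both stored sequences, so the sketch returns a match of depth 2 with a mean elapsed time of 1.0 seconds, while a sequence whose most recent event never preceded a target returns `None`.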
Mercury-Reels offers other advantages, including its high efficiency, interpretability, and ease of testing the available options and adjusting the hyperparameters to each use case. Today, we will try its functionalities through a fictionalized story.
The enigma of Cedric Lancaster’s mansion
Although it looks like any other mansion, Cedric Lancaster’s old house holds a mystery. Over the last few years, several murders have occurred with the same modus operandi: the victim and the executioner alone in the billiard room, a gun as a weapon, and a single shot.
We know who the perpetrators of the latest murders were. Now, we have to find out the route the culprits took before the crime so that we can prevent it on future occasions. Our target event, the event we will predict, is the shooting in the billiard room when two people are in it.
From this target event, we will reconstruct the previous sequence of events. We analyze the movements through the rooms of the mansion of some innocent guests, such as Princess Anastasia Romanov, Cedric Lancaster (the owner of the house), and the Marquis Lorenzo di Sant-Angelo, as well as the movements of the culprits, among them Countess Adelaide Beaumont.
To record the movements, we have created a dataset that collects the actions of each guest (individual events). An event comprises a specific time, a person who enters a room, and the number of people in it.
Events alone do not tell you much, but it is possible to identify patterns when analyzing the activity history of guests as a sequence of individual events. The challenge lies in efficiently analyzing the millions of event sequences stored to detect patterns of culprit behavior.
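To make the difference between individual events and sequences concrete, here is a small plain-Python sketch. The `Event` type and the rows are hypothetical; only the field names (`time`, `guest`, `room`, `num_occ`) mirror the columns used later in this exercise.

```python
from collections import defaultdict
from typing import NamedTuple


class Event(NamedTuple):
    time: int      # hypothetical unit: seconds since the evening started
    guest: str     # who entered the room
    room: str      # which room was entered
    num_occ: int   # number of people in the room at that moment


# An unordered log of individual events (made-up rows).
log = [
    Event(30, "Guest A", "billiard", 2),
    Event(10, "Guest A", "sitting", 1),
    Event(20, "Guest B", "kitchen", 3),
    Event(15, "Guest A", "study", 1),
]

# Sort chronologically, then collect each guest's events into one sequence.
sequences = defaultdict(list)
for event in sorted(log, key=lambda e: e.time):
    sequences[event.guest].append((event.room, event.num_occ))
```

Taken one by one, the rows say little; grouped by guest and ordered in time, Guest A's path (sitting room alone, study alone, billiard room with company) becomes visible.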
Hands-on: solving the mystery with mercury-reels
Reels is a framework for which there are two tutorials: a general one and one on event optimization; both can be run in Google Colab without first installing anything. This hands-on exercise offers a basic example demonstrating how all the components work together.
We need two datasets to carry out our exercise. The first simulates the movement of the guests through the rooms of the mansion, alone or in groups, each with an identifier and a timestamp. The second lists the guests who actually fired: those who reached the target event.
Our mission is to predict when another potential killer will likely shoot. In the shooting dataset, that happens precisely one second after the culprit has a gun and is in the billiard room with another guest.
Before proceeding, open a console window. In our example, we use a Linux system; the exact syntax may vary depending on the platform. Ensure you have Python 3.8 or higher and `pip` installed. Then update `pip` to the latest version and install `mercury-reels`. If you prefer, you can skip these steps and go straight to the conclusions of the case below.
pip install --upgrade pip
pip install mercury-reels
If you have decided to proceed, make sure your preferred Python environment (Jupyter, an IDE, or the Python interpreter itself) is open and ready to execute the code samples. We now download the two datasets directly into pandas DataFrames.
import reels
import pandas as pd
room_access = pd.read_csv('https://raw.githubusercontent.com/BBVA/mercury-reels/master/notebooks/data/reels_mansion.csv', sep = '\t')
target_data = pd.read_csv('https://raw.githubusercontent.com/BBVA/mercury-reels/master/notebooks/data/reels_mansion_targets.csv', sep = '\t')
Now, we need to define the events. To do so, we create an `Intake` object, a tool to iterate over the dataset, and use it to populate an `Events` object. An event can be viewed as an edge in a graph; therefore, the object expects an origin node ID, a destination ID, and a weight. In this case, we do not have two nodes; we simply give the same ID as source and destination.
events = reels.Events()
access_in = reels.Intake(room_access)
access_in.insert_rows(events, columns = ['room', 'room', 'num_occ'])
# To examine the object's content, we could print `list(events.describe_events())`.
Once the events are defined, we need to compute the clips (which events happened to each guest, and at what time) for all the guests. We populate the `Clips` object with the same `Intake`, passing the column names accordingly, and give it an empty `Clients` object to select all the participants.
clips = reels.Clips(reels.Clients(), events)
access_in.scan_events(clips, columns = ['room', 'room', 'num_occ', 'guest', 'time'])
# If we wanted to examine a clip, we could just call `clips.describe_clip('Sir Cedric Lancaster')`.
This `Clips` object and the target times are all we need to fit the model. We do so by using a new `Intake` for the targets.
targets = reels.Targets(clips)
targets_in = reels.Intake(target_data)
targets_in.insert_targets(targets, columns = ['guest', 'time'])
targets.fit(x_form = 'linear', agg = 'longest', p = 0, depth = 100)
Conclusions of the case: the sequence of events from the source
Calculating the events, clips, and targets will show us the sequence of events the killer followed: the culprit needed to be alone in the sitting room to acquire a key to a gun cabinet in the study room. Once they had the key, they waited until they were alone in the study room to open the cabinet and take the gun. The killer then went to the billiard room, hiding the weapon until another guest entered. That’s when the gun was fired. Thus, we know the pattern of critical events that lead to the crime.
Since the algorithm arranges the events in reverse order, as a tree rooted at the target event, we can finally see which path the culprit, in this case Adelaide Beaumont, took before firing.
In this simple case, we illustrated mercury-reels’ functionalities with complete sequences identifying each person. However, in a real case with millions of sequences, we will work with partial matches.
To find out when the culprits will act, we ask the model for a prediction:
list(targets.predict_clients(['Countess Adelaide Beaumont']))
The predicted time is very close to one second, which is the expected outcome, so we can say that Adelaide Beaumont follows the same sequential and temporal pattern as the other killers.
We can also predict times for innocent guests by using other names from the dataset, and we will see that these are usually predicted as in a survival analysis. Since no target is observed for them, the division by zero is handled by using a constant (100 years in seconds).
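One way to read such predictions is to separate plausible times from values of the order of that constant. The sketch below is only an illustration: it assumes predictions are plain numbers of seconds and that a year has 365 days, and the cutoff, the `label_predictions` helper, and the sample values are hypothetical rather than output from the library.

```python
# Assumption: 365-day years; the exact constant used by the library may differ.
SECONDS_PER_YEAR = 365 * 24 * 3600
HUNDRED_YEARS = 100 * SECONDS_PER_YEAR

# Hypothetical cutoff: anything beyond half the constant is read as "never".
CUTOFF = HUNDRED_YEARS / 2


def label_predictions(names, predicted_seconds, cutoff=CUTOFF):
    """Pair each name with a coarse reading of its predicted time."""
    return {
        name: "no target expected" if secs >= cutoff else "target expected"
        for name, secs in zip(names, predicted_seconds)
    }


# Made-up values for illustration only, not actual model output.
labels = label_predictions(["Guest A", "Guest B"], [1.02, float(HUNDRED_YEARS)])
```

Under these assumptions, a prediction of about one second is labeled as an expected target, while a value of the order of a century is read as no target being expected.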
list(targets.predict_clients(['Princess Anastasia Romanov','Sir Cedric Lancaster']))
If you have come this far, you may want to try it with your data. Before you do, we wholeheartedly recommend that you complete the tutorials first. To see everything reels can do, check out our GitHub repository.
Mercury-reels at BBVA: use case
This story is one way to showcase mercury-reels, but the library becomes even more valuable when applied to real use cases. In fact, mercury-reels came about to better understand how BBVA customers navigate the bank’s mobile app.
Faced with this need, we launched a data exploration initiative in the form of Project X, which allowed us to investigate alternatives to graph embeddings and time series analysis. In this way, we obtained actionable information about the browsing behavior of our customers and users.
Notes
- We are only storing 50 sequences, each long enough to be unique. Ergo, we are overfitting. This, of course, is not the intended way to use `reels`. Reels provides all the tools to control the length of sequences and search for meaningful patterns, as explained in the tutorials.
- The sequences for the assassins intentionally end after they shoot. It is important to remember that when there is a target, everything after it in the clip is not used. The clip is aligned with the last event before the target.
- In this case, we chose neither to do an aggregation nor a transformation and simply matched the longest fit. See the documentation about the `fit()` arguments.
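The second note, on discarding everything after the target, can be illustrated with a short plain-Python sketch. It is not the library's code: the `trim_at_target` helper and the sample clip are hypothetical, and events are assumed to count only when they happen strictly before the target time.

```python
def trim_at_target(events, target_time):
    """Keep only the events that happened strictly before the target.

    events is a chronological list of (code, time) tuples.
    """
    return [(code, t) for code, t in events if t < target_time]


# Made-up clip: the last event happens after the shot and must be ignored.
clip = [("sitting/1", 0), ("study/1", 60), ("billiard/2", 120), ("hall/4", 300)]
target_time = 121

used = trim_at_target(clip, target_time)
aligned_event = used[-1]                  # last event before the target
elapsed = target_time - aligned_event[1]  # seconds from that event to the target
```

In this example the visit to the hall is dropped, the clip is aligned on the billiard room event, and the elapsed time to the target is one second.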