What we saw (and what we showed) at KDD 2019
One of our first and most successful applications of machine learning to a retail financial tool in the BBVA app is the feature that gives customers a forecast of their recurring expenses and income for the coming month. Knowing the day on which you will receive the car insurance charge or a recurring transfer, and its amount, is key to managing your accounts and monitoring your financial health.
A team from BBVA AI Factory, of which BBVA Data & Analytics is part, is now investigating how to enrich this forecast with more information for customers, including forecasts of unusual movements. Deep Neural Networks are a great approach to forecasting problems, but mainstream applications of the technology focus on producing point estimates, a clear limitation in scenarios where knowing the uncertainty of a prediction is crucial. In the short article presented at the workshop on “Anomaly Detection in Finance”, part of the KDD 2019 conference held last August in Anchorage (USA), our colleagues José Rodríguez and Axel Brando, in collaboration with the University of Barcelona, propose the use of Deep Neural Networks that yield distributions rather than point estimates, allowing us to detect unusual transactions more efficiently.
This work is still in an exploratory phase and is based on a previous model for Uncertainty Estimates in Deep Regression Networks (published in ECML/PKDD 2018). This line of work continues with a paper very recently accepted at NeurIPS.
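To make the difference between a point estimate and a predicted distribution concrete, here is a minimal sketch. It is not the model from the article: it assumes, purely for illustration, that the network outputs a mean and a standard deviation of a Gaussian for each expense, and the category names, amounts, and the threshold of 3 standard deviations are all hypothetical.

```python
import math


def gaussian_nll(y, mean, std):
    """Negative log-likelihood of y under a Gaussian N(mean, std^2)."""
    var = std ** 2
    return 0.5 * math.log(2 * math.pi * var) + (y - mean) ** 2 / (2 * var)


def is_unusual(y, mean, std, z_threshold=3.0):
    """Flag y when it lies more than z_threshold predicted standard
    deviations away from the predicted mean (threshold is an assumption)."""
    return abs(y - mean) / std > z_threshold


# Hypothetical model outputs: (predicted mean, predicted std) per expense.
# Both have the same point estimate but very different uncertainty.
forecasts = {
    "car_insurance": (50.0, 2.0),  # stable recurring charge, narrow distribution
    "groceries": (50.0, 30.0),     # variable spending, wide distribution
}

observed = 80.0  # the same observed amount for both expenses

flags = {name: is_unusual(observed, m, s) for name, (m, s) in forecasts.items()}
scores = {name: gaussian_nll(observed, m, s) for name, (m, s) in forecasts.items()}

for name in forecasts:
    print(f"{name}: unusual={flags[name]}, nll={scores[name]:.2f}")
```

With a point estimate alone, both movements miss the forecast by the same amount. With a predicted distribution, only the deviation on the stable charge is flagged, because it falls far outside the range the model considers plausible.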
Beyond the presentation of this short article, this year’s “Knowledge Discovery and Data Mining” conference (KDD 2019) was packed with new applications based on data science. Here are some highlights of what we saw, with a bias, of course, towards applications to real-world systems and products, and towards applications in the banking and financial industries.
A conference on… Data Science
Many data scientists feel that most of the top ML/AI conferences rated A* do not talk in the “Data Science” language. KDD is an exception: it describes itself explicitly as targeting the “Data Science” community. The differentiators are: an applied data science track, a new track on invited Data Science speakers, and hands-on tutorials. The bar is still very high technically, but the problems to solve come from the real world. All articles of the conference are available here. Some highlights of the main conference were:
- The work “150 successful Machine Learning models: 6 lessons learned at Booking.com”, a lesson on applying machine learning in a very focused way to solve real business challenges and make a big impact on users.
- This year, there were lots of articles on graphs, especially Knowledge Graphs and Graph Convolutional Networks. Companies seem to be adopting graphs: Alibaba presented their AliGraph platform and, in the financial domain, we saw this other example from Capital One. Some interesting articles were:
  - “Universal Representation Learning of Knowledge Bases by Jointly Embedding Instances and Ontological Concepts”: proposing graphs to jointly model concepts and instances in a knowledge base.
  - “OAG: Toward Linking Large-scale Heterogeneous Entity Graphs”, a deep learning algorithm for record linkage in large entity graphs, with GitHub repository.
- We are happy to see applications in banking, finance and transactional data at major conferences. Articles in this category were:
  - “E.T.-RNN: Applying Deep Learning to Credit Loan Applications”, from Sberbank.
  - “AlphaStock: A Buying-Winners-and-Selling-Losers Investment Strategy using Interpretable Deep Reinforcement Attention Networks”: quantitative trading with Deep Reinforcement Learning.
  - “Anomaly Detection for an e-commerce pricing system”, by Walmart Labs.
- Smart replies and text generation in products: the Gmail team presented the engine behind Google Smart Compose, while Uber implemented Smart Replies.
- Methods to alleviate the “cold-start” problem, which is often faced with in-house text datasets and sometimes overlooked in academia. The paper “How to Invest my Time: Lessons from Human-in-the-Loop Entity Extraction” is an empirical study of lessons learned on how to label entities in a new corpus, discover new entities, and learn incrementally.
- Rich Caruana’s talk “Friends don’t let friends deploy black-box models”, which is becoming a classic: a tutorial on interpretability that explores its “shades of gray” in technical depth through real examples.
Workshop on Anomaly Detection in Finance
In addition to the main conference, the workshop on Anomaly Detection in Finance had representation from Data Science teams in banks and other financial institutions. As an example, Rabobank, in collaboration with the University of Eindhoven, presented a visualization prototype to help analysts spot fraudulent transactions (using SHAP feature importances and MDS projections), while Deutsche Bundesbank, PwC and Univ. St. Gallen proposed a method to detect fraud in corporate operations by detecting anomalies in ERP transactions (using adversarial autoencoders).
Capital One, which co-organized the workshop, presented a method to detect changepoints in time series, and the MIT-IBM Research Centre presented an algorithm to detect money laundering in financial graphs (related article). They have also released a public graph dataset of Bitcoin transactions, the Elliptic Dataset, and mentioned that they were interested in piloting this algorithm with banks. Uber was also present, with articles on fake ridership detection and fake account detection. After all, Uber processes financial transactions.