A research team at Spotify develops a new ML model for decision-making using counterfactuals. Why that’s exciting!

Self-driving cars, medical diagnosis, investment planning, and game theory are all applications of decision-making AI. Decision theory is the study of algorithms for making correct decisions, but it comes with a few obstacles. Making the right decision is difficult under all the uncertainties involved: we do not always know all the potential outcomes, nor all the probabilities of the events we are considering.

Judea Pearl is a computer scientist and philosopher who has made significant contributions to the study of causality and counterfactuals. His work is focused on developing a mathematical framework for reasoning about cause-and-effect relationships.

Pearl’s approach to causality involves the use of counterfactuals, which are statements about what would have happened under different conditions. For example, we might ask what would have happened if we had administered a particular treatment to a patient who did not receive it. By considering these counterfactual scenarios, Pearl argues we can reason about causality and better understand the effects of different interventions.

Counterfactuals are probably familiar to most of us. What would have happened if, under certain conditions, certain parameters had been different? It’s like in every movie with a time machine: you go back in time, change a few details, and voilà! The outcome can be totally different. The interesting part is that, with the right parameters and setup, we can categorically distinguish between causation and correlation. Which parameters actually make a change in the outcome of things? What sounds simple is actually a pretty complicated mathematical model.
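To make this concrete, here is a minimal Python sketch of Pearl’s three-step recipe for counterfactuals (abduction, action, prediction) on a toy structural causal model. The variables, equations, and numbers are hypothetical illustrations, not taken from any real study.

```python
# Toy structural causal model (SCM), made up for illustration:
#   treatment T = u_t            (exogenous: did the patient get the drug?)
#   recovery  R = 0.8 * T + u_r  (recovery depends on treatment plus noise)

def recovery(t, u_r):
    """Structural equation for recovery, given treatment t and noise u_r."""
    return 0.8 * t + u_r

# Observed (factual) world: the patient was NOT treated, recovery was 0.1.
t_factual = 0
r_factual = 0.1

# Step 1 -- Abduction: infer the exogenous noise from the observation.
u_r = r_factual - 0.8 * t_factual   # here u_r = 0.1

# Step 2 -- Action: intervene, setting treatment to 1 ("what if we had treated?").
t_counterfactual = 1

# Step 3 -- Prediction: rerun the model with the SAME noise but the new action.
r_counterfactual = recovery(t_counterfactual, u_r)

print(f"Factual recovery:        {r_factual}")
print(f"Counterfactual recovery: {r_counterfactual}")  # 0.9
```

The key step is abduction: the noise term carries everything specific to this individual, so rerunning the model with the same noise but a different action answers the “what if” question for that same individual.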

A research team at the streaming platform Spotify has now developed an ML model for counterfactuals to optimize user-specific recommendations. The model is based on so-called twin networks.

Twin networks were invented by Judea Pearl (together with Alexander Balke) and consist of two copies of the same causal model, with the same structure and parameters, evaluated side by side on different inputs. One copy is fed the facts as they actually happened, while the other is fed the hypothetical intervention.

The idea behind twin networks is to couple the two copies through their shared background (noise) variables, so that the two worlds stay identical except for the parameters that we want to change. One network represents the real world and the other the fictional world, and because everything else is held fixed, reading off the second copy directly answers the counterfactual question.
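Below is a hand-rolled sketch of that coupling, assuming a made-up toy model of a listener’s mood and skip behavior. It illustrates the twin-network idea only; it is not Spotify’s actual model or code.

```python
import random

def twin_run(u_mood, u_skip):
    """Evaluate the factual and counterfactual worlds side by side.

    Toy causal graph (made up for illustration): mood -> skip_song,
    and we intervene on whether an upbeat song was recommended.
    Both worlds share the same exogenous noise (u_mood, u_skip);
    only the intervened variable differs between them.
    """
    mood = u_mood  # exogenous mood, identical in both worlds

    # Factual world: no upbeat recommendation was made.
    skip_factual = mood + u_skip < 0.3

    # Counterfactual twin: same noise, but an upbeat song (assumed here
    # to lift mood by 0.4) was recommended instead.
    skip_counterfactual = mood + 0.4 + u_skip < 0.3

    return skip_factual, skip_counterfactual

# Monte Carlo over the shared noise to estimate a counterfactual quantity.
random.seed(0)
n, prevented = 10_000, 0
for _ in range(n):
    factual, cf = twin_run(random.random(), random.gauss(0, 0.1))
    prevented += factual and not cf  # skip happened, but wouldn't have

print(f"P(skip factually, but not under the intervention) ~ {prevented / n:.3f}")
```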

The researchers at Spotify used the twin-network construction as the architecture of a neural network and trained this network to make predictions about the fictional world they created. The new model now lets us answer counterfactual questions for a specific scenario.

Decision theory has always faced a lot of problems and paradoxes. One example is the prisoner’s dilemma:

Two members of a criminal gang are arrested and imprisoned. Each prisoner is in solitary confinement with no means of communicating with the other. The prosecutors lack sufficient evidence to convict the pair on the principal charge, but they have enough to convict both on a lesser charge. Simultaneously, the prosecutors offer each prisoner a bargain. Each prisoner is given the opportunity either to betray the other by testifying that the other committed the crime or to cooperate with the other by remaining silent. The potential outcomes are:

If A and B each betray the other, each of them serves two years in prison.

If A betrays B but B remains silent, A will be set free and B will serve three years in prison.

If A remains silent but B betrays A, A will serve three years in prison and B will be set free.

If A and B both remain silent, both of them will serve only one year in prison (on the lesser charge). (Wikipedia)

It seems intuitive that the best choice for both prisoners should be to cooperate, since they will only have to spend one year in prison each. However, the Nash equilibrium tells us that the better choice in this scenario is to betray the other gang member. Say you are gang member A: if B betrays you, you are better off betraying him too, since you will serve two years instead of three. And if B cooperates and remains silent, you are still better off betraying him, since you will be set free instead of serving one year.

In game theory, this dilemma can be modeled as a game between two competing players:

Player A \ Player B    Cooperate (C)    Defect (D)

Cooperate (C)          R, R             S, T

Defect (D)             T, S             P, P

This matrix shows the different outcomes. Here, R represents the reward for mutual cooperation, S represents the sucker’s payoff for cooperating when the other player defects, T represents the temptation payoff for defecting when the other player cooperates, and P represents the punishment payoff for mutual defection.

The values of the payoffs are typically chosen such that T > R > P > S, which means that mutual cooperation is the best joint outcome for both players, but each player is tempted to defect to gain a higher payoff at the expense of the other.

The dilemma arises because the best individual strategy for each player (defecting) leads to a suboptimal outcome for both players when both players choose it. This is known as the Nash equilibrium of the game, where neither player can improve their payoff by unilaterally changing their strategy.
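To see the equilibrium mechanically, here is a minimal sketch in Python. The concrete payoff values 5, 3, 1, 0 for T, R, P, S are the usual textbook choices, picked here purely for illustration; the check itself works for any payoffs satisfying T > R > P > S.

```python
from itertools import product

# Payoffs (to the "me" player) satisfying T > R > P > S.
T, R, P, S = 5, 3, 1, 0
payoff = {  # (my move, their move) -> my payoff
    ("C", "C"): R, ("C", "D"): S,
    ("D", "C"): T, ("D", "D"): P,
}

def is_nash(a, b):
    """Neither player can do better by unilaterally changing their move."""
    a_ok = all(payoff[(a, b)] >= payoff[(alt, b)] for alt in "CD")
    b_ok = all(payoff[(b, a)] >= payoff[(alt, a)] for alt in "CD")
    return a_ok and b_ok

for a, b in product("CD", repeat=2):
    print(a, b, "-> Nash equilibrium" if is_nash(a, b) else "")
```

Running this prints a Nash-equilibrium marker only for (D, D): even though (C, C) pays more to both players, each can improve their own payoff by switching away from it, so mutual defection is the only stable cell.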

And this is part of what makes decision-making so hard. Uncertainty about probabilities plays into it as well, and can be illustrated by the so-called Simpson’s paradox.

Simpson’s paradox is a statistical phenomenon in which a trend or association that appears in different groups of data disappears or even reverses when the groups are combined or aggregated. For example, imagine you and your friend are having a race. You run faster than your friend in the first and second laps, but your friend runs faster than you in the third lap. Even though you were faster in two out of the three laps, your friend might still win the race overall, because their lead in the third lap was large enough to outweigh yours.

Simpson’s paradox is the statistical version of this: when you look at each group individually, one trend holds, but when you add everything up, the opposite trend appears, because the groups carry different weights.
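For a version with numbers, here is a small Python sketch using the classic kidney-stone treatment figures often cited for this paradox: treatment A is better in both subgroups, yet treatment B looks better overall, because the subgroups are unevenly sized.

```python
# Classic kidney-stone example of Simpson's paradox:
# success counts for two treatments, split by case severity.
data = {
    #                (successes, trials)
    ("A", "mild"):   (81, 87),     # A: 93% on mild cases
    ("B", "mild"):   (234, 270),   # B: 87% on mild cases
    ("A", "severe"): (192, 263),   # A: 73% on severe cases
    ("B", "severe"): (55, 80),     # B: 69% on severe cases
}

for severity in ("mild", "severe"):
    for drug in ("A", "B"):
        s, n = data[(drug, severity)]
        print(f"{drug} on {severity:6}: {s/n:.0%}")  # A beats B in BOTH groups...

for drug in ("A", "B"):
    s = sum(v[0] for k, v in data.items() if k[0] == drug)
    n = sum(v[1] for k, v in data.items() if k[0] == drug)
    print(f"{drug} overall: {s/n:.0%}")  # ...yet B beats A overall (78% vs 83%).
```

The reversal happens because treatment A was mostly given the hard (severe) cases and treatment B the easy ones, so aggregating mixes unevenly weighted groups.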

With AI tools like ChatGPT, we also need to stress that GPT is not actually understanding language or context; it uses statistical models to “guess” the next word of a phrase, based on the input it receives and the probabilities it learned from its training data.
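To see what “guessing the next word from learned probabilities” means in its simplest possible form, here is a toy bigram model. Real GPT models use transformer networks over subword tokens trained on vastly more data, so this illustrates only the statistical idea, not GPT’s actual architecture.

```python
import random
from collections import Counter, defaultdict

# A tiny made-up "training corpus". A real model trains on billions of words.
corpus = "the cat sat on the mat and the cat ran to the door".split()

# Count, for each word, which words follow it and how often (a bigram model).
follows = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    follows[w1][w2] += 1

def next_word(word):
    """Sample the next word in proportion to how often it followed `word`."""
    counts = follows[word]
    if not counts:  # dead end: word never appeared mid-corpus
        return random.choice(corpus)
    return random.choices(list(counts), weights=list(counts.values()))[0]

# Generate a short phrase by repeatedly "guessing" a likely continuation.
random.seed(1)
word, phrase = "the", ["the"]
for _ in range(5):
    word = next_word(word)
    phrase.append(word)
print(" ".join(phrase))
```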

So, have you ever been in a situation where you asked yourself: what would have happened if I had done that one thing differently in that one situation? Wanna tell us what it was in the comments, perhaps? In that sense, have a good day and be good!

References:

https://plato.stanford.edu/entries/prisoner-dilemma/

Pearl, J. (2009). Causal inference in statistics: An overview. Statistics Surveys, 3, 96–146. This paper provides an overview of Pearl’s approach to causal inference and discusses its applications in statistics.

Hi, I'm Yildiz Culcu, a student of Computer Science and Philosophy based in Germany. My mission is to help people discover the joy of learning about science and explore new ideas. As a 2x Top Writer on Medium and an active voice on LinkedIn and this blog, I love sharing insights and sparking curiosity. I'm an emerging decision science researcher associated with the Max Planck Institute for Cognitive and Brain Sciences and the University of Kiel. I am also a mentor and a public speaker available for booking. Let's connect and inspire one another to be our best!
