Infinite Games: Comparing the Combination of The Value of Defecting + The Value of Los

By Kyla Scanlon of Scanlon on Stocks

Sunday, May 5, 2019 2:00 AM EDT

Consider the infinitely repeated version of the two-player stage game in the following figure. The first number in a cell is Player 1’s single-period payoff; the second number is Player 2’s single-period payoff.

Figure: Stage-game payoff matrix (Source: Games, Strategies, and Decision Making)

Assume that past actions of each player are common knowledge. Each player’s payoff is the present value of the stream of single-period payoffs, where the discount factor is δ_i for player i. The stage game has a unique Nash equilibrium, (c, x), with payoffs (4, 3).
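For reference, the present value described above can be written out explicitly. In this sketch, u_i^t (our notation, not the source’s) is player i’s single-period payoff in period t and δ_i is player i’s discount factor, with 0 ≤ δ_i < 1. A constant stream of u per period sums to u/(1 − δ_i), which is the form behind the comparisons that follow.

```latex
V_i = \sum_{t=1}^{\infty} \delta_i^{\,t-1} u_i^t,
\qquad
\text{and for a constant stream } u_i^t = u:\quad
V_i = \frac{u}{1-\delta_i}.
```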

However, both players would be better off at (a, z), which yields payoffs (5, 6). So the punishment in this game will be the Nash equilibrium payoff (4, 3), and the reward will be the higher payoff (5, 6).

To sustain the higher-payoff outcome (a, z), each player’s strategy would look like this:

Player 1: In period 1, play a. In every later period, play a if (a, z) was the outcome in all previous periods; otherwise, play c.
Player 2: In period 1, play z. In every later period, play z if (a, z) was the outcome in all previous periods; otherwise, play x.
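A minimal sketch of these two strategies, assuming the standard grim-trigger reading: each player cooperates only if every past outcome was (a, z), and punishes forever after any other outcome. All function and variable names here (player1_strategy, simulate, p1_deviates_in, and so on) are hypothetical, chosen for illustration.

```python
from typing import List, Optional, Tuple

# A history is the list of past outcomes, one (player1_action, player2_action) pair per period.
History = List[Tuple[str, str]]

COOPERATIVE_OUTCOME = ("a", "z")


def cooperated_so_far(history: History) -> bool:
    """True if every past period ended in (a, z); an empty history (period 1) counts."""
    return all(outcome == COOPERATIVE_OUTCOME for outcome in history)


def player1_strategy(history: History) -> str:
    # Play a while (a, z) has always been the outcome; after any other outcome, play c forever.
    return "a" if cooperated_so_far(history) else "c"


def player2_strategy(history: History) -> str:
    # Play z while (a, z) has always been the outcome; after any other outcome, play x forever.
    return "z" if cooperated_so_far(history) else "x"


def simulate(periods: int, p1_deviates_in: Optional[int] = None) -> History:
    """Play both strategies for some periods, optionally forcing Player 1 to play c once."""
    history: History = []
    for t in range(1, periods + 1):
        action1 = player1_strategy(history)
        if t == p1_deviates_in:
            action1 = "c"  # hypothetical one-time defection by Player 1
        action2 = player2_strategy(history)
        history.append((action1, action2))
    return history


print(simulate(4))                    # cooperation in every period
print(simulate(4, p1_deviates_in=1))  # defection in period 1, punishment afterwards
```

With no deviation the path is (a, z) in every period. If Player 1 plays c in period 1, the path is (c, z) followed by (c, x) forever, which is the 8-once-then-4-forever stream for Player 1 in the defection analysis.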

The Value of Defecting: Player 1

What if Player 1 doesn’t want to settle for a payoff of 5 from (a, z)? What if they want to defect to a higher payoff, and basically stick the middle finger in Player 2’s face?

To do that, Player 1’s most profitable deviation would be to play c while Player 2 is still playing z, which gives Player 1 a payoff of 8. But that would be cheating (defecting), and Player 2 would NOT want to cooperate with Player 1 anymore. They would switch to the second part of their strategy, playing x, so Player 1 would be stuck with a lower payoff of 4 forever after.

So today, Player 1 can get 8 by cheating instead of the 5 they would get from (a, z). Thus, the advantage of cheating today is 8 – 5, or 3.

But what if they decide to play nice, and what would it take to give them the incentive to do so? They give up the extra 3 today, but they avoid punishment from Player 2. The advantage of cooperating is the value of the reward forever minus the value of the punishment forever, counted from tomorrow on and discounted by Player 1’s discount factor.
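A quick numerical check of this trade-off for Player 1, as a minimal sketch: it values each payoff stream with the constant-stream present value u / (1 − δ) and compares cooperating (5 every period) with cheating (8 today, then 4 every period after). The function names are hypothetical, chosen for illustration.

```python
def pv_constant(u: float, delta: float) -> float:
    """Present value of receiving u every period, starting today (requires 0 <= delta < 1)."""
    return u / (1 - delta)


def pv_cooperate(reward: float, delta: float) -> float:
    # The reward payoff in every period, starting today.
    return pv_constant(reward, delta)


def pv_cheat(temptation: float, punishment: float, delta: float) -> float:
    # The temptation payoff today, then the punishment payoff from tomorrow on.
    # A constant stream that starts tomorrow is worth delta times one that starts today.
    return temptation + delta * pv_constant(punishment, delta)


# Player 1: reward 5 from (a, z), temptation 8 from (c, z), punishment 4 from (c, x).
for delta in (0.5, 0.75, 0.9):
    print(f"delta={delta}: cooperate={pv_cooperate(5, delta):.2f}, cheat={pv_cheat(8, 4, delta):.2f}")
```

At δ = 0.5 cheating is worth more (12 versus 10), at δ = 0.75 the two streams tie at 20, and at δ = 0.9 cooperating wins (50 versus 44).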

Here, from tomorrow on, Player 1 can get 5 forever by cooperating, or be punished with 4 forever if they choose to grab 8 today. So let’s set the advantage of cheating against the advantage of cooperating as an inequality and solve for Player 1’s discount factor.
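Written out, with δ_1 as Player 1’s discount factor, the condition is that the one-period gain from cheating (8 − 5) be no larger than the discounted per-period loss (5 − 4) from tomorrow on. This is a reconstruction from the payoffs quoted above:

```latex
\underbrace{8 - 5}_{\text{gain from cheating today}}
\;\le\;
\underbrace{\frac{\delta_1}{1-\delta_1}\,(5 - 4)}_{\text{reward minus punishment, from tomorrow on}}
\;\Longleftrightarrow\;
3(1-\delta_1) \le \delta_1
\;\Longleftrightarrow\;
\delta_1 \ge \frac{3}{4}.
```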

Player 1 would need at least a 75% chance that tomorrow is happening (δ_1 ≥ 0.75) to keep them from defecting today. This makes sense, because they gain 3 points from defecting and lose only 1 point per period afterwards. They basically have to be pretty sure that tomorrow is happening in order to stop themselves from cheating. No one wants to stick the middle finger in someone’s face if there is at least a 75% chance of seeing them again tomorrow.

So let’s compare that to Player 2.

The Value of Defecting: Player 2

For Player 2, the most profitable deviation is to play w while Player 1 is still playing a, which gives Player 2 a payoff of 7; the payoff pair would be (2, 7). This 7 is one point higher than the 6 that Player 2 would get by sticking to the cooperative strategy. Thus, the advantage of cheating for Player 2 is 1.

If Player 2 chose to cooperate instead of cheating, they would get 6 forever as a reward. But if they chose to deviate, they would be punished with a payoff of 3 forever after, based on the second part of Player 1’s strategy.

Setting the advantage of cheating against the advantage of cooperating as an inequality, we get δ_2 ≥ 0.25. There only needs to be a 25% chance (or more) of tomorrow happening for Player 2 to play the cooperative strategy.
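Written out, with δ_2 as Player 2’s discount factor, the one-period gain from cheating (7 − 6) must be no larger than the discounted per-period loss (6 − 3) from tomorrow on. This is a reconstruction from the payoffs quoted above:

```latex
\underbrace{7 - 6}_{\text{gain from cheating today}}
\;\le\;
\underbrace{\frac{\delta_2}{1-\delta_2}\,(6 - 3)}_{\text{reward minus punishment, from tomorrow on}}
\;\Longleftrightarrow\;
1-\delta_2 \le 3\delta_2
\;\Longleftrightarrow\;
\delta_2 \ge \frac{1}{4}.
```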

Conclusion: Player 2 NEEDS Player 1

What does this mean, when comparing Player 1’s 75% versus Player 2’s 25%?

Player 1 needs a much stronger incentive to cooperate: there has to be at least a 75% chance of tomorrow happening for them to play the cooperative strategy. This makes sense. Player 1 can gain 3 points by deviating, whereas Player 2 can only gain 1. Player 1 also loses only 1 point per period after deviating, whereas Player 2 loses 3 per period.
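Both thresholds come from the same inequality, gain ≤ δ/(1 − δ) × loss, which rearranges to δ ≥ gain / (gain + loss). A minimal sketch that reproduces the 75% and 25% figures from the per-period gains and losses above; the function name min_discount_factor is hypothetical, and Fraction is used only to keep the thresholds exact.

```python
from fractions import Fraction


def min_discount_factor(gain: int, loss: int) -> Fraction:
    """Smallest discount factor at which cooperating is at least as good as cheating once.

    gain: one-period gain from cheating (temptation minus reward)
    loss: per-period loss from tomorrow on (reward minus punishment)
    Rearranges gain <= delta / (1 - delta) * loss into delta >= gain / (gain + loss).
    """
    return Fraction(gain, gain + loss)


player1 = min_discount_factor(gain=8 - 5, loss=5 - 4)
player2 = min_discount_factor(gain=7 - 6, loss=6 - 3)
print(f"Player 1 needs delta >= {player1} ({float(player1):.0%})")
print(f"Player 2 needs delta >= {player2} ({float(player2):.0%})")
```

The formula makes the asymmetry explicit: a big gain paired with a small loss (Player 1) pushes the threshold up, while a small gain paired with a big loss (Player 2) pulls it down.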

Those per-period losses add up over time. Player 2 faces a worse trade-off on both sides: a smaller gain from cheating and a bigger loss afterwards. Thus, they only need a 25% chance of tomorrow happening in order to stay the course. They really want to hang onto this working relationship with Player 1.

A combination of a lower value from cheating in the first place + a higher loss from doing so indefinitely makes Player 2 pretty vulnerable to the decisions that Player 1 makes.

TL;DR: Always be Player 1. High value from defecting in the first place and minimal losses over time from doing so.

Disclaimer: These views are not investment advice, and should not be interpreted as such. These views are my own, and do not represent my employer. Trading has risk. Big risk. Make sure that you can ...
