Game Theory - freeCodeCamp.org

Introduction to Evolutionary Game Theory

freeCodeCamp — Tue, 13 Apr 2021 17:12:33 +0000

By Peter Gleeson

For the longest time, the evolution of cooperative social behaviour has fascinated evolutionary biologists.

The mathematical field of game theory helps shed light on how it emerges. Game theory is “the study of mathematical models of strategic interaction among rational decision-makers” (according to Wikipedia).

Game theory applies to “games” as varied as economics, politics, chess and tic-tac-toe. In each case, there are some rules, some “players” or “agents”, and a set of strategies available to them.

Each player has a concept of “utility” – a “currency” they seek to individually maximise through the strategies they play.

The currency of evolution is the concept of fitness.

That is, the chance of being represented in the next generation. Genes and traits which increase the odds of survival to reproductive age are more likely to be passed on to future generations. Therefore, they confer a greater fitness to the individual which "hosts” them.

Evolutionary game theory takes the concepts from game theory and applies them in an evolutionary context.

For a given model, it lets you ask questions about which strategy prevails, and whether certain strategies can coexist. And if so – at what frequencies?

Replicator dynamics

Evolutionary game theory plays “games” over many generations.

Each game will alter the utility (that is, fitness) of the players. The next generation is produced, with players being represented proportionally to their overall fitness.

This set up is called “replicator dynamics”. It is easy to simulate and explore different models of evolutionary games.

The classic model of evolutionary game theory is the “Hawk-Dove” game, popularised by John Maynard Smith in the 1970s.

In this game, there exists a population of animals which compete for a finite resource (for example, food). The more resources an individual wins, the greater its fitness.

Each animal can play one of two strategies:

Hawks are aggressive, and will fight for a resource at all costs.
Doves are passive, and will share instead of fight for a resource.

The animals are all the same kind – “hawk” and “dove” refer to their behaviour.

There are three pairwise competitions that can exist:

Hawk vs Hawk

If two hawks compete, they will engage in a 50:50 battle to win the resource. This is a winner-takes-all scenario – the winner obtains the full value of the resource. The injured loser pays a price, and loses a certain amount of fitness.

Hawk vs Dove

If a hawk meets a dove, the dove will back down immediately. The hawk wins the full value of the resource, and the dove walks (or, rather flies) away with nothing. But they do not pay any cost.

Dove vs Dove

When two doves meet, they agree to share the resource evenly. No one gets hurt.

This can be modelled mathematically. Doing so allows us to understand whether these strategies can coexist (or if one of them prevails).

The math of replicator dynamics

Let V be the value of winning a contest, and C be the cost of injury in a contest.

Represent the frequency of hawks in the population as p, and the frequency of doves as 1-p.

Now, define two functions F(H) and F(D) which define the expected fitness of playing the hawk and dove strategies, respectively.

Playing as a hawk will mean engaging in a hawk-vs-hawk contest with frequency p. The expected utility of doing so is understood as the average outcome. Half the time the hawk wins V, half the time it loses C.

The rest of a hawk’s contests will be against doves. This guarantees an easy win V.

Playing as a dove will win nothing against hawks. But a dove will encounter another dove with frequency 1-p. Under this scenario, the expected utility is the shared resource, worth V/2.

Now, set F(H) equal to F(D) and solve for p.

This reveals the frequency p at which the hawk strategy confers no more or less fitness than the dove strategy.

At this frequency, there is no advantage to either strategy, so this is the equilibrium at which both strategies may coexist.

Some algebraic rearrangement gives:

Which provides the ratio of hawks-to-doves that exists at equilibrium.

Just a little more rearranging gives the equilibrium in terms of p:

Considering the properties of this expression reveals two things:

Whenever the cost C of losing a contest is less than or equal to the value V of winning, the hawk strategy will dominate. Neither strategy can coexist.
If the cost C is greater than the value V, the strategies will coexist in equilibrium.

Plugging in the values V=4 and C=6 shows the equilibrium occurs when 2/3 of the population play the hawk strategy.

You can test this by simulating the model in Python.

The code

In the file called bird.py:

import random

class Bird:
    def __init__(self, strategy):
        """
        Each bird has a strategy type (hawk or dove)
        And a small starting fitness
        """
        self.strategy = strategy
        self.fitness = 10

    def contest(self, opponent, v, c):
       """
       Simulate the outcomes depending on the strategies
       """

        # both hawks --> 50:50 battle

        if self.strategy == opponent.strategy == "hawk":
            if random.randint(0, 1) == 1:
                self.fitness = self.fitness + v
                opponent.fitness = opponent.fitness - c
            else:
                self.fitness = self.fitness - c
                opponent.fitness = opponent.fitness + v

        # hawk meets dove

        elif self.strategy == "hawk" != opponent.strategy:
            self.fitness = self.fitness + v
            opponent.fitness = opponent.fitness
        elif self.strategy == "dove" != opponent.strategy:
            self.fitness = self.fitness
            opponent.fitness = opponent.fitness + v

        # both doves --> share the resource

        else:
            self.fitness = self.fitness + v/2
            opponent.fitness = opponent.fitness + v/2

    def spawn(self):
        """
        Allow a small chance of mutation to flip strategy
        Otherwise, return offspring of the same type
        """

        mutation = random.randint(0, 1000) > 999
        if mutation:
            if self.strategy == "dove":
                return Bird("hawk")
            else:
                return Bird("dove")
        else:
            return Bird(self.strategy)

The next file is called simulation.py.

Initialise a population of all doves.
Define a timestep function to simulate randomised contests.
Draw the next generation weighted by their relative fitnesses.
Rinse and repeat a thousand times over, then save the output as a graph.

from bird import Bird
import random
import numpy as np
import pandas as pd
import matplotlib


def initialise():
    """
    Create a population of birds - all dove to begin
    """

    birds = []

    for _ in range(1000):
        birds.append(Bird("dove"))

    return (birds)


def timestep(birds, value, cost):
    """
    Pair up the birds, make them compete
    Then produce next generation, weighted by fitness
    """

    next_generation = []

    random.shuffle(birds)

    for _ in range(1000):

        # pair up random birds to contest
        a, b = random.sample(birds, 2)
        a.contest(b, value, cost)

    # generate next generation
    fitnesses = [bird.fitness for bird in birds]

    draw = random.choices(birds, k=1000, weights=fitnesses)
    next_generation = [bird.spawn() for bird in draw]

    return next_generation


def main():

    birds = initialise()

    rows = []

    V = 4 ; C = 6

    for _ in range(1000):

        # add the counts to a new row
        strategy = [bird.strategy for bird in birds]
        n_hawks = strategy.count("hawk")
        n_doves =  strategy.count("dove")
        row = {'n_hawks': n_hawks, 'n_doves': n_doves}
        rows.append(row)

        # run the timestep function
        birds = timestep(birds, V, C)


    # create dataframe and save output

    df = pd.DataFrame(rows)
    df.to_csv('simulation.csv')
    fig = df.plot(y=["n_hawks", "n_doves"]).get_figure()
    fig.savefig('simulation.pdf')

if __name__ == "__main__":
    main()

And voilà - here’s an example of the output for V=4 and C=6:

Exactly as the theory predicts.

Outro

The evolution of complex systems is a fascinating field of study. Understanding how natural forces and competitive pressures can shape individual-level traits to give rise to complex social behaviours has been one of the major areas of research in biological sciences in the last few decades.

The ability of relatively simple mathematical models to accurately predict outcomes of dynamic systems is also a key point to take away.

In this case, it is the existence of a feedback loop that results in the two strategies reaching an equilibrium. The advantage conferred by either strategy varies depending on how many others in the population are playing that strategy.

In other words, when more individuals play "dove", there is an advantage to playing "hawk". However, as more individuals play "hawk", the expected value of playing "dove" increases.

Finally, the availability of programming and software tools makes it possible to test theoretical predictions through simulation.

If you found this article interesting, you may also find How to Model an Epidemic With R worth checking out, too.

You can follow more of my writing at gleeson.substack.com

In need of evolution: game theory and AI

freeCodeCamp — Sat, 12 May 2018 21:07:00 +0000

By Elena Nisioti

Artificial Intelligence (AI) is full of questions that cannot be answered and answers that cannot be assigned to the correct questions. In the past, it paid for its persistence to wrong practices with periods of stagnation, known as AI winters. The calendar of AI, however, has just reached spring, and the applications are flourishing.

Yet, there is a branch of AI that has long been neglected. The talk is about reinforcement learning, that has recently exhibited impressive results on games like AlphaGo and Atari. But let’s be honest: these were not reinforcement learning wins. What got deeper in these cases was the deep neural networks, and not our understanding of reinforcement learning, which maintains the depth it achieved decades ago.

Even worse is the case of reinforcement learning when applied to real life problems. If training a robot to balance on a rope sounds hard, try training a team of robots to win a football game, or a team of drones to monitor a moving target.

Before we lose the branch, or even worse the tree, we must sharpen our understanding of these applications. Game theory is the most common approach to studying teams of players that share a common goal. It can lend us tools to guide learning algorithms in these settings.

But let’s see why the common approach is not a common sense approach.

To kill an error is as good a service as, and sometimes even better than, the establishing of a new truth or fact. — Charles Darwin

First, let’s dirty our hands with some terminology and basics of these areas.

Game theory

Some useful terms

Game: like games in popular understanding, it can be any setting where players take actions and its outcome will depend on them.
Player: a strategic decision-maker within a game.
Strategy: a complete plan of actions a player will take, given the set of circumstances that might arise within the game.
Payoff: the gain a player receives from arriving at a particular outcome of a game.
Equilibrium: the point in a game where both players have made their decisions and an outcome is reached.
Nash equilibrium: an equilibrium in which no player can gain by changing their own strategy if the strategies of the other players remain unchanged.
Dominant strategy: occurs when one strategy is better than another strategy for one player, no matter how that player’s opponents may play.

Prisoner’s dilemma

This is probably the most famous game in the literature. The figure below presents its payoff matrix. Now, a payoff matrix is worth a thousand words. It is sufficient, to an experienced eye, to provide all the information necessary to describe a game. But let’s be a bit less laconic.

Prisoner’s dilemma payoff matrix

The police arrest two criminals, criminal A and criminal B. Although quite notorious, the criminals cannot be imprisoned for the crime under investigation due to lack of evidence. But they can be held for lesser charges.

The length of their imprisonment will depend on what they will say in the interrogation room, which gives rise to the game. Each criminal (player) is given the chance to either stay silent or snitch on the other criminal (player). The payoff matrix depicts how many years each player will be imprisoned depending on the outcome. For example, if player A stays silent and player B snitches on them, player A will serve 3 years (-3) and player B will serve none (0).

If you reviews the payoff matrix carefully, you will find out that the logical action of a player is to betray the other person or, in game-theoretic terms, betraying is the dominant strategy. This will lead to the Nash equilibrium of the game, where each player has a payoff of -2.

Does something feel odd? Yes, or at least it should. If both players somehow agreed to remain silent they would both get a higher reward of -1. Prisoner’s dilemma is an example of a game where rationality leads to a worse result than cooperation would.

Some historical remarks

Game theory originated in economics, but is today an interdisciplinary area of study. Its father, John von Neumann (you will notice that Johns have serious career prospects in this area), was the first to give a strict formulation to the common notion of a game. He restricted his studies to games of two players, as they were easier to analyze.

He then co-authored a book with Oskar Morgenstern, which laid the foundations for expected utility theory and shaped the course of game theory. Around that time, John Nash introduced the concept of Nash equilibria, which helps describe the outcome of a game.

Reinforcement learning

It did not take long to realize how vast the applications of game theory can be. From games to biology, philosophy and, wait for it, artificial intelligence. Game theory is nowadays closely related to settings where multiple players learn through reinforcement, an area called multi-agent reinforcement learning. Examples of applications in this case are teams of robots, where each player has to learn how to behave in favor of its team.

Some useful terms

Agent: equivalent to a player.
Reward: equivalent to a payoff.
State: all the information necessary to describe the situation an agent is in.
Action: equivalent of a move in a game.
Policy: similar to a strategy, it defines the action an agent will make when in particular states
Environment: everything the agent interacts with during learning.

Applications

Imagine the following scenario: a team of drones is unleashed into a forest in order to predict and locate fires early enough for the firefighters to respond. The drones are autonomous and must explore the forest, learn which conditions are likely to cause fire, and cooperate with each other, so that they cover wide areas of the forest using little battery and communication.

This application belongs to the area of environmental monitoring, where AI can lend its predictive skills to human intervention. In a technological world that is becoming increasingly complex and a physical world under threat, we can paraphrase Kipling’s quote to “Man could not be everywhere, and therefore he made drones.”

Decentralized architectures are another interesting application field. Technologies like the Internet of Things and Blockchain create immense networks. Information and processing is distributed in different physical entities, a trait that has been acknowledged to offer privacy, efficiency and democratization.

Regardless of whether you want to use sensors to minimize energy consumption in the households of a country, or replace the banking system, decentralized is the new sexy.

Making these networks smart, however, is challenging, as most of the AI algorithms we are proud of are data- and computation-hungry. Reinforcement learning algorithms can be employed for efficient data processing and rendering the network adaptive to changes in its environment. In this case, it is interesting, and to the benefit of overall efficiency, to study how the individual algorithms will cooperate.

Deep or collective learning? AI research has based its harvest on increasingly deeper networks, but it could be that the answers to challenging problems come from collective knowledge, not deep-rooted individuals. Did we miss the forest?

Not just a game

Translating AI problems to simple games like the prisoner’s dilemma is tempting. This is a usual practice when testing new techniques, as it offers a computationally cheap and intuitive testbed. Nevertheless, it is important not to ignore the effect that the practical characteristics of the problem, such as noise, delays, and finite memory, have on the algorithm.

Perhaps the most misleading assumption in AI research is that of representing interaction with iterated static games. For example, an algorithm can apply the prisoner’s dilemma game every time it wants to make a decision, a formulation that assumes that the agent has not learned, or changed, along the way. But what about the effect learning will have on the behavior of the agent? Won’t interaction with others affect its strategy?

Research in this area has focused on evolution of cooperation and Robert Axelrod has studied optimal strategies that arise in the iterated version of prisoner’s dilemma. The tournaments that Axelrod organized revealed that strategies that adapt with time and interaction, even as simple as Tit-for-Tat may sound, are very effective.The AI community has recently investigated learning under the sequential prisoner’s dilemma, but research in this area is still in a premature state.

What differentiates multi-agent from single-agent learning is the increased complexity. Training one deep neural network is already enough of a pain, while adding new networks, as parts of the agents, makes the problem exponentially harder.

One less obvious, but more important concern, is the lack of theoretical properties for this kind of problem. Single-agent reinforcement learning is a well-understood area, as Richard Bellman and Christopher Watkins have offered the algorithms and proofs necessary to learn. In the multi-agent case, however, the proofs lose their validity.

Just to illustrate some of the mind-puzzling difficulties that arise: an agent executes a learning algorithm to learn how to react optimally to its environment. In our case, the environment includes the other agents, which also execute the learning algorithm. Thus, the algorithm has to consider the effect of its action before it acts.

The early concerns

The concerns start where game theory started: in economics. Let’s begin with some assumptions made when studying a system under classical game theory.

Rationality: generally in game theory, and in order to derive Nash equilibria, perfect rationality is assumed. This roughly means that agents always act for their own sake.

Complete information: each agent knows everything about the game, including the rules, what the other players know, and what their strategies are.

Common knowledge: there is common knowledge of a fact p in a group of agents when: all the agents know p, they all know that all agents know p, they all know that they all know that all agents know p, and so on ad infinitum. There are interesting puzzles, like the blue-eyed islanders, that describe the effect common knowledge has on a problem.

In 1986 Kenn Arrow expressed his reservations towards classical game theory.

In this paper, I want to disentangle some of the senses in which the hypothesis of rationality is used in economic theory. In particular, I want to stress that rationality is not a property of the individual alone, although it is usually presented that way. Rather, it gathers not only its force but also its very meaning from the social context in which it is embedded. It is most plausible under very ideal conditions. When these conditions cease to hold, the rationality assumptions become strained and possibly even self-contradictory.

If you find that Arrow is a bit harsh with classical game theory, how rational would you say your last purchases have been? Or, how much consciousness and effort did you put into your meal today?

But Arrow is not so much worried about the assumption of rationality. He is worried about the implications of it. For an agent to be rational, you need to provide them with all the information necessary to make their decisions. This calls for omniscient players, which is bad in two ways: first, it creates impractical requirements for information storing and processing of players. Second, game theory is no longer a game theory, as you can replace all players by a central ruler (and where is the fun in that?).

The value of information in this view is another point of interest. We have already discussed that possessing all the information is infeasible. But what about assuming players with limited knowledge? Would that help?

You may ask anyone involved in this area, but it suffices to say that optimization under uncertainty is tough. Yes, there still are the good-old Nash equilibria. The problem is that they are infinite. Game theory does not provide you with arguments to evaluate them. So, even if you reach one, you shouldn't make it such a big deal.

Reinforcement learning concerns

By this point you should suspect that AI applications are much more complicated than the examples classical game theory concerns itself with. Just to mention a few obstacles on the path of applying the Nash equilibrium approach in a robotic application: imagine being the captain of a team of robots playing football in RoboCup. How fast, strong, and intelligent are your players and your opponents? What strategies does the opponent team use? How should you reward your players? Is a goal the only reason for congratulating, or will applauding a good pass also improve the team’s behavior? Clearly, just being familiar with the rules of football will not win you the game.

If game theory has been raising debates for decades, if it has been founded on unrealistic assumptions and, for realistic tasks, if it offers complicated and little-understood solutions, why are we still going for it? Well, plainly enough, it’s the only thing we’ve got when it comes to group reasoning. If we actually understood how groups interact and cooperate to achieve their goals, psychology and politics would be much clearer.

Researchers in the area of multi-agent reinforcement learning either completely emit a discussion on the theoretical properties of their algorithms (and nevertheless often exhibit good results) or traditionally study the existence of Nash equilibria. The latter approach seems, to the eyes of a young researcher in the field, like a struggle to prove, under severe, unrealistic assumptions, the theoretical existence of solutions that — being infinite and of questionable value — will never be leveraged in practice.

Evolutionary game theory

The inception of evolutionary game theory is not recent, yet its far-reaching applications in the area of AI took long to be acknowledged. Originating in biology, it was introduced in 1973, by John M. Smith and George R. Price, as an alternative to classical game theory. The alterations are so profound that we can talk about a whole new approach.

The subject of reasoning is no longer the player itself, but the population of players. Thus, probabilistic strategies are defined as the percentage of players that make a choice, not the probability of one player choosing an action as in classical game theory. This removes the necessity for rational, omniscient agents, as strategies evolve as patterns of behavior. The evolution process resembles Darwinian theory. Players reproduce following the principles of survival of the fittest and random mutations, and can be elegantly described by a set of differential equations, termed the replicator dynamics.

We can see the three important parts of this system in the illustration below. A population represents the team of agents, and is characterized by a mixture of strategies. The game rules determine the payoffs of the population, which can also be seen as the fitness values of an evolutionary algorithm. Finally, the replicator rules describe how the population will evolve based on the fitness values and the mathematical properties of the evolution process.

_Image credit: By HowieKor [CC BY-SA 3.0 ([https://creativecommons.org/licenses/by-sa/3.0](https://creativecommons.org/licenses/by-sa/3.0" rel="noopener" target="blank" title="))], from Wikimedia Commons

The notion and pursuit of Nash equilibria is replaced by evolutionary stable strategies. A strategy can bear this characterization if it is immune to an invasion by a population of agents that follow another strategy, provided that the invading population is small. Thus, the behavior of the team can be studied under the well-understood area of stability of dynamical systems, such as Lyapunov stability.

The attainment of equilibrium requires a disequilibrium process. What does rational behavior mean in the presence of disequilibrium? Do individuals speculate on the equilibrating process? If they do, can the disequilibrium be regarded as, in some sense, a higher-order equilibrium process?

In the above passage, Arrow seems to be struggling to pinpoint the dynamic properties of a game. Could evolutionary game theory be an answer to his questions?

Quite recently, famous reinforcement learning algorithms, such as Q-learning, were studied under this new approach and significant conclusions were drawn. How this new tool is used ultimately depends on the application.

We can follow the forward approach, to derive the dynamic model of a learning algorithm. Or the inverse, where we start from some desired dynamic properties and engineer a learning algorithm that exhibits them.

We can use the replicator dynamics descriptively, to visualize convergence. Or prescriptively, to tune the algorithm in order to converge to optimal solutions. The latter can immensely reduce the complexity entailed in training deep networks for tough tasks that we face today, by removing the need for blind tuning.

Conclusion

It’s not hard to trace when and why the paths of game theory and AI became convoluted. What’s harder, however, is to overlook the restrictions AI, and in particular multi-agent reinforcement learning, has to face when following classical game theoretic approaches.

Evolutionary game theory sounds promising, offering both theoretical tools and practical advantages, but we won’t really know until we try it. In this case, evolution will not arise naturally, but out of a conscious struggle of the research community for improvement. But isn’t that the essence of evolution?

It takes some effort to deviate from where inertia is pushing you, but reinforcement learning, despite general successes in AI, is in serious need of a lift.