APA Newsletters, Spring 2001
| Action of A \ Action of B | Player B cooperates | Player B defects |
|---|---|---|
| Player A cooperates | R=3 (reward for mutual cooperation) | S=0 (sucker's payoff) |
| Player A defects | T=5 (temptation to defect) | P=1 (punishment for mutual defection) |
Table 1: Iterated Prisoner's Dilemma (IPD) scoring for player A.
The two participants have two options: to cooperate or to defect. The payoff to player A is shown. Obviously, A gains more by defecting, not only if player B cooperates (5 instead of 3 points) but also if player B defects (1 instead of 0 points). Since the same holds for player B, both end up defecting and score 1 point each, instead of the 3 points each they would have scored had they cooperated. Hence the dilemma.
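The dominance argument can be checked mechanically. The Python sketch below encodes the payoffs of Table 1 for the player whose move is listed first; the names `PAYOFF`, `score`, `defection_dominates` and `is_prisoners_dilemma` are our own, chosen for illustration.

```python
# Payoffs from Table 1, keyed by (my move, other player's move).
# "C" = cooperate, "D" = defect.
PAYOFF = {
    ("C", "C"): 3,  # R: reward for mutual cooperation
    ("C", "D"): 0,  # S: sucker's payoff
    ("D", "C"): 5,  # T: temptation to defect
    ("D", "D"): 1,  # P: punishment for mutual defection
}


def score(my_move: str, other_move: str) -> int:
    """Payoff to a player who plays my_move while the other plays other_move."""
    return PAYOFF[(my_move, other_move)]


def defection_dominates() -> bool:
    """True if defecting pays strictly more than cooperating against either move."""
    return all(score("D", other) > score("C", other) for other in ("C", "D"))


def is_prisoners_dilemma() -> bool:
    """Usual IPD conditions: T > R > P > S, and 2R > T + S so that mutual
    cooperation beats taking turns at exploiting each other."""
    t, r, p, s = score("D", "C"), score("C", "C"), score("D", "D"), score("C", "D")
    return t > r > p > s and 2 * r > t + s
```

Both functions return `True` for the values of Table 1: defection is the better reply to either move, yet two defectors each score `score("D", "D") == 1` while two cooperators each score 3.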
Using software implementations of evolutionary dynamics, the IPD can be simulated as an iterated game in which the players (virtual or real) learn from the results of previous iterations and dynamically select an appropriate strategy from among various possible strategies. Strategies can also be subjected to an evolutionary process in which the fittest strategies survive.
3. The Emergence of Cooperation
Axelrod, the best-known researcher in the area of the evolution of cooperation, says [1, page 110]: "The advice takes the form of four simple suggestions for how to do well in a durable Iterated Prisoner's Dilemma:
1. Don't be envious,
2. Don't be the first to defect,
3. Reciprocate both cooperation and defection,
4. Don't be too clever."
With the last suggestion, Axelrod means that a strategy has to be understandable, hence simple, mainly so that its opponents can see that all it wants is to establish a period of cooperation. He argues that if a strategy clearly announces how it will act, the other player will be able to start cooperating more quickly.
Axelrod and Hamilton [4] used a computer tournament to identify strategies that would favor cooperation among individuals engaged in the IPD. In the first round, 14 more or less sophisticated strategies and one totally random strategy competed against each other for the highest average score in an IPD of 200 moves. Unexpectedly, a very simple strategy did outstandingly well: cooperate on the first move and then copy your opponent's last move for all subsequent moves.
This strategy was called 'Tit for tat' (TFT) and became the basis of an ever-growing number of successful strategies. In a second, similar competition with 62 contestants, TFT won again. It has three characteristics that account for its impressive performance: it is nice (it cooperates on the first move), retaliatory (it punishes a defection on the previous move with a defection) and forgiving (it returns to cooperation immediately after a single cooperative move by the adversary).
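To make the rule concrete, here is a minimal Python sketch of Tit-for-tat playing a 200-move match, the length used in the first tournament. The history-based function signature and the `always_defect` opponent are illustrative assumptions, not Axelrod's tournament code.

```python
from typing import Callable, List, Tuple

Move = str  # "C" (cooperate) or "D" (defect)
# A strategy receives its own past moves and the opponent's past moves.
Strategy = Callable[[List[Move], List[Move]], Move]

PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}  # Table 1


def tit_for_tat(mine: List[Move], theirs: List[Move]) -> Move:
    """Cooperate on the first move, then copy the opponent's previous move."""
    return theirs[-1] if theirs else "C"


def always_defect(mine: List[Move], theirs: List[Move]) -> Move:
    """Illustrative opponent that never cooperates."""
    return "D"


def play_match(a: Strategy, b: Strategy, moves: int = 200) -> Tuple[int, int]:
    """Play an iterated game and return the total scores of a and b."""
    hist_a: List[Move] = []
    hist_b: List[Move] = []
    score_a = score_b = 0
    for _ in range(moves):
        move_a = a(hist_a, hist_b)
        move_b = b(hist_b, hist_a)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b
```

`play_match(tit_for_tat, tit_for_tat)` returns `(600, 600)`: two nice strategies cooperate throughout. Against `always_defect` it returns `(199, 204)`: Tit-for-tat is exploited only on the first move and retaliates on every move after that.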
In order to study the behavior of strategies, two types of computation can be made. The first is a simple round-robin tournament, in which each strategy meets all other strategies. Its final score is the sum of the scores obtained in each encounter, and a strategy's strength is measured by its rank in the tournament. This is how Axelrod singled out the tit-for-tat strategy. The second is an evolutionary approach, in which, at the beginning, the same number of agents hold each of the strategies. A round-robin tournament is played, and then the populations of inefficient strategies are decreased while efficient strategies gain new members. The simulation is repeated until the population has stabilized, i.e. until it no longer changes. It has been found that "nasty" strategies (those that take the initiative of defecting first) are less effective than "kind" ones!
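Both computation methods can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions: only three strategies, 200-move encounters, a fixed number of generations instead of a stability test, and an evolutionary step in which each strategy's share of the population grows in proportion to the score it earns against the current population (a replicator-style update; the exact rule used in the published experiments may differ). The names `round_robin` and `evolve` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Move = str  # "C" (cooperate) or "D" (defect)
Strategy = Callable[[List[Move], List[Move]], Move]  # (own history, opponent's) -> move
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}  # Table 1


def cooperate(mine: List[Move], theirs: List[Move]) -> Move:
    return "C"


def defect(mine: List[Move], theirs: List[Move]) -> Move:
    return "D"


def tit_for_tat(mine: List[Move], theirs: List[Move]) -> Move:
    return theirs[-1] if theirs else "C"


def play_match(a: Strategy, b: Strategy, moves: int) -> Tuple[int, int]:
    """Play `moves` rounds between a and b and return their total scores."""
    hist_a: List[Move] = []
    hist_b: List[Move] = []
    score_a = score_b = 0
    for _ in range(moves):
        move_a, move_b = a(hist_a, hist_b), b(hist_b, hist_a)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


def score_table(strategies: Dict[str, Strategy], moves: int) -> Dict[Tuple[str, str], int]:
    """Score obtained by strategy x against strategy y, for every ordered pair."""
    return {
        (x, y): play_match(strategies[x], strategies[y], moves)[0]
        for x in strategies
        for y in strategies
    }


def round_robin(strategies: Dict[str, Strategy], moves: int = 200) -> Dict[str, int]:
    """Method 1: each strategy meets every other one; its scores are summed."""
    table = score_table(strategies, moves)
    return {x: sum(table[(x, y)] for y in strategies if y != x) for x in strategies}


def evolve(strategies: Dict[str, Strategy], generations: int = 100,
           moves: int = 200) -> Dict[str, float]:
    """Method 2: start with equal shares; in each generation every agent meets
    the whole population, and each strategy's share grows in proportion to
    the score it earns."""
    table = score_table(strategies, moves)
    shares = {x: 1 / len(strategies) for x in strategies}
    for _ in range(generations):
        fitness = {x: sum(shares[y] * table[(x, y)] for y in strategies)
                   for x in strategies}
        total = sum(shares[x] * fitness[x] for x in strategies)
        shares = {x: shares[x] * fitness[x] / total for x in strategies}
    return shares


field = {"Cooperate": cooperate, "Defect": defect, "Tit-for-tat": tit_for_tat}
```

With this deliberately tiny field, `round_robin(field)` gives Defect the highest total (1204, against 799 for Tit-for-tat and 600 for Cooperate), yet `evolve(field)` ends with Tit-for-tat as the most common strategy and Defect shrinking: as the unconditional cooperators it exploits become relatively rarer, Defect mostly meets opponents that retaliate.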
J.P. Delahaye and P. Mathieu [10-14] carried out experiments using the evolutionary approach.
Twelve strategies were tested:
| Strategy | Description |
|---|---|
| 1. Cooperate | always cooperates |
| 2. Defect | always defects |
| 3. Random | cooperates with a probability of 0.5 |
| 4. Tit-for-tat | cooperates on the first move and then plays what its opponent played on the previous move |
| 5. Spite | cooperates until the opponent defects, then defects all the time |
| 6. Per-kind | plays periodically [cooperate, cooperate, defect] |
| 7. Per-nasty | plays periodically [defect, defect, cooperate] |
| 8. Soft-majo | plays the opponent's most used move and cooperates in case of equality (the first move counts as a case of equality) |
| 9. Mistrust | has the same behavior as tit-for-tat but defects on the first move |
| 10. Prober | begins by playing [cooperate, defect, defect]; then, if the opponent cooperated on the second and third moves, continues to defect, else plays tit-for-tat |
| 11. Gradual | acts as tit-for-tat, except for the following: after the first defection of the other player, it defects one time and cooperates two times; after the second defection of the opponent, it defects two times and cooperates two times, and so on |
| 12. Pavlov | cooperates on the first move and then cooperates only if the two players made the same move on the previous move; this strategy was studied in [27] |
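Most of the deterministic strategies in this list can be written as short functions of the two move histories. The Python sketch below uses a hypothetical convention throughout (each function receives its own past moves, then the opponent's, and returns "C" or "D"); it is our reading of the descriptions above, not Delahaye and Mathieu's code. Random and Gradual are left out, the latter because it needs extra state to track its punishment phases.

```python
from typing import List

Move = str  # "C" (cooperate) or "D" (defect)


def spite(mine: List[Move], theirs: List[Move]) -> Move:
    """Cooperate until the opponent defects once, then defect forever."""
    return "D" if "D" in theirs else "C"


def per_kind(mine: List[Move], theirs: List[Move]) -> Move:
    """Play the cycle [C, C, D] regardless of the opponent."""
    return "CCD"[len(mine) % 3]


def per_nasty(mine: List[Move], theirs: List[Move]) -> Move:
    """Play the cycle [D, D, C] regardless of the opponent."""
    return "DDC"[len(mine) % 3]


def soft_majo(mine: List[Move], theirs: List[Move]) -> Move:
    """Play the opponent's most used move; cooperate on ties (including move 1)."""
    return "D" if theirs.count("D") > theirs.count("C") else "C"


def mistrust(mine: List[Move], theirs: List[Move]) -> Move:
    """Tit-for-tat, except that it defects on the first move."""
    return theirs[-1] if theirs else "D"


def prober(mine: List[Move], theirs: List[Move]) -> Move:
    """Open with [C, D, D]; if the opponent cooperated on moves 2 and 3,
    keep defecting, otherwise play tit-for-tat."""
    if len(mine) < 3:
        return "CDD"[len(mine)]
    if theirs[1] == "C" and theirs[2] == "C":
        return "D"
    return theirs[-1]


def pavlov(mine: List[Move], theirs: List[Move]) -> Move:
    """Cooperate first; afterwards cooperate only if both players made the
    same move in the previous round."""
    if not mine:
        return "C"
    return "C" if mine[-1] == theirs[-1] else "D"
```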
Each game (experiment) consisted of 1000 moves, with the classical payoffs shown in Table 1. The results were as follows (the full results are in [11]):
| Strategy | Final score |
|---|---|
| Gradual | 33416 |
| Tit-for-tat | 31411 |
| Soft-majo | 31210 |
| Spite | 30013 |
| Prober | 29177 |
| Pavlov | 28910 |
| Mistrust | 25921 |
| Cooperate | 25484 |
| Per-kind | 24796 |
| Defect | 24363 |
| Per-nasty | 23835 |
| Random | 22965 |
|
The results of these experiments show that strategies with an inclination to cooperate gain more than the “nasty” ones! Gradual, Tit-for-tat and Soft-majo (the three leading strategies) all have a strong tendency to cooperate, while Defect and Per-nasty are defect-oriented (and Random is simply dumb).
Several other experiments have been carried out with virtual systems to show the emergence of cooperation among software agents. The conclusions drawn from such results are far-reaching and beyond the scope of this paper. I’ll just mention Daniel Dennett [15], who explains that Darwin’s idea is considered dangerous because it challenges the belief that there is something special about life, and in particular about human life, consciousness, emotions, etc. Instead, it shows how these essences are the result of billions and billions of applications of a simple, mindless, mechanistic process.
4. Simulating Ethics
A major question I’d like to raise is whether ethical norms can emerge in a virtual system, and how this may affect our philosophy.
Let’s examine a simple game called the Ultimatum game. A cake is divided into ten pieces. Player A can demand either five or nine pieces, and player B can either accept or reject the demand. If B accepts, A receives the pieces demanded and B the rest; if B rejects, neither player gets anything. We will analyse this game within the context of evolutionary game theory.
What strategies are available in this game? Each player can be either A or B. As A, there are only two options: Demand5 and Demand9. As B, there are four options: Accept all, Reject all, Accept if 9 is demanded but reject if 5 is demanded, and Accept if 5 is demanded but reject if 9 is demanded. In total, each player has 2 × 4 = 8 strategies, each of which tells her how to behave as A and as B.
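The count 2 × 4 = 8 is simply the Cartesian product of the two sets of choices, which Python's `itertools.product` makes explicit. The labels follow the rows S1-S8 of the table; the variable names are illustrative.

```python
from itertools import product

DEMANDS = (9, 5)  # the two options when playing A
RULES = (         # the four options when playing B
    "Accept all",
    "Reject all",
    "Accept 5, Reject 9",
    "Accept 9, Reject 5",
)

# Each strategy pairs one demand (as A) with one response rule (as B).
STRATEGIES = {
    f"S{i}": (demand, rule)
    for i, (demand, rule) in enumerate(product(DEMANDS, RULES), start=1)
}
```

For example, `STRATEGIES["S7"]` is `(5, "Accept 5, Reject 9")`.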
We can summarize the strategies in the following table:
| Strategy | If being A | If being B |
|---|---|---|
| S1 | Demand 9 | Accept all |
| S2 | Demand 9 | Reject all |
| S3 | Demand 9 | Accept 5, Reject 9 |
| S4 | Demand 9 | Accept 9, Reject 5 |
| S5 | Demand 5 | Accept all |
| S6 | Demand 5 | Reject all |
| S7 | Demand 5 | Accept 5, Reject 9 |
| S8 | Demand 5 | Accept 9, Reject 5 |
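Given the rules of the game, the payoff of any strategy against any other can be computed directly. The Python sketch below assumes that in each encounter either player is equally likely to be A; the encoding `(demand, accepts 5?, accepts 9?)` and the function names are our own.

```python
from typing import Dict, Tuple

CAKE = 10  # the cake is divided into ten pieces
Strat = Tuple[int, bool, bool]  # (demand as A, accepts a demand of 5?, accepts a demand of 9?)

STRATEGIES: Dict[str, Strat] = {
    "S1": (9, True, True),
    "S2": (9, False, False),
    "S3": (9, True, False),
    "S4": (9, False, True),
    "S5": (5, True, True),
    "S6": (5, False, False),
    "S7": (5, True, False),  # Fairman
    "S8": (5, False, True),
}


def accepts(responder: Strat, demand: int) -> bool:
    """Does this strategy, playing B, accept the given demand?"""
    return responder[1] if demand == 5 else responder[2]


def one_round(proposer: Strat, responder: Strat) -> Tuple[int, int]:
    """Return (payoff to A, payoff to B): A gets its demand and B the rest
    if B accepts; if B rejects, neither gets anything."""
    demand = proposer[0]
    if accepts(responder, demand):
        return demand, CAKE - demand
    return 0, 0


def expected_payoff(x: str, y: str) -> float:
    """Average payoff to x against y, each role being equally likely."""
    as_a = one_round(STRATEGIES[x], STRATEGIES[y])[0]
    as_b = one_round(STRATEGIES[y], STRATEGIES[x])[1]
    return (as_a + as_b) / 2
```

For example, `expected_payoff("S7", "S1")` is 2.5: when S1 demands 9, Fairman rejects and forgoes the single piece it could have kept.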
This game has been simulated by Brian Skyrms [32] using “genetic algorithms”. In this method, replication is governed by success, judged by some standard relevant to the problem. From time to time, the code of a program is cut into two pieces, and the pieces are swapped with those of other programs, creating new programs. Most of these programs will be useless, but some will be successful. The task for the programs is defined in advance, so there is a clear criterion for whether a program is successful or not. This “crossover” takes place again and again, and a few of the generated programs come closer and closer to the target. This method was brought into computer science by John Holland [18] and extended by John Koza [19].
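The crossover cycle described above can be illustrated with a toy genetic algorithm in Python. Everything here is an assumption made for illustration: programs are replaced by bit strings, the fixed task is to match a hypothetical target string, and selection simply keeps the fitter half of the population. It is not Skyrms's, Holland's or Koza's actual setup.

```python
import random
from typing import List

TARGET = "1" * 16  # hypothetical task, fixed in advance


def fitness(candidate: str) -> int:
    """Clear success criterion: the number of positions that match TARGET."""
    return sum(c == t for c, t in zip(candidate, TARGET))


def crossover(parent_a: str, parent_b: str, rng: random.Random) -> str:
    """One-point crossover: cut both parents at the same random point and
    join the head of one to the tail of the other."""
    cut = rng.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:]


def run(generations: int = 50, size: int = 20, seed: int = 0) -> List[int]:
    """Evolve a random population and return the best fitness per generation."""
    rng = random.Random(seed)
    population = ["".join(rng.choice("01") for _ in TARGET) for _ in range(size)]
    best: List[int] = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        best.append(fitness(population[0]))
        parents = population[: size // 2]  # replication governed by success
        children = [
            crossover(rng.choice(parents), rng.choice(parents), rng)
            for _ in range(size - len(parents))
        ]
        population = parents + children
    return best
```

Because the fitter half is always kept, the best fitness never decreases from one generation to the next. Without mutation, crossover can only recombine bits already present in the population, which is one reason practical genetic algorithms usually add a small mutation rate.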
How can genetic algorithms be applied to the Ultimatum game? We can view strategies as strings. Thus, the strategy If A: demand 9; If B: accept 5, reject 9 has the following sub-strategies: If A: demand 9 and If B: accept 5, reject 9. In addition, the second sub-strategy can be split into If B and 5 is demanded, accept it and If B and 9 is demanded, reject it. The idea is to cut the strategies into the smallest pieces, recombine them arbitrarily, and so create as many new strategies as possible. After each cycle of recombination, the strategies are played against each other and only the fittest survive.
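The string view of strategies can be sketched in Python. We assume a three-character encoding (our own, for illustration): the demand made as A, then the reply as B to a demand of 5 and to a demand of 9, with `A` for accept and `R` for reject. Recombining piece by piece, as described above, means taking each position from either parent.

```python
from itertools import product
from typing import List, Set

# Hypothetical encoding: position 0 = demand as A ("5" or "9"),
# position 1 = reply as B to a demand of 5, position 2 = reply to a demand of 9
# ("A" = accept, "R" = reject).
NAMES = {
    "9AA": "S1", "9RR": "S2", "9AR": "S3", "9RA": "S4",
    "5AA": "S5", "5RR": "S6", "5AR": "S7", "5RA": "S8",
}


def sub_strategies(strategy: str) -> List[str]:
    """Split a strategy into its smallest pieces."""
    reply = {"A": "accept", "R": "reject"}
    return [
        f"If A: demand {strategy[0]}",
        f"If B and 5 is demanded: {reply[strategy[1]]}",
        f"If B and 9 is demanded: {reply[strategy[2]]}",
    ]


def recombinations(parent_a: str, parent_b: str) -> Set[str]:
    """Every strategy obtainable by taking each piece from either parent."""
    return {"".join(genes) for genes in product(*zip(parent_a, parent_b))}
```

Two parents that differ at every position, such as S1 (`9AA`) and S6 (`5RR`), can between them produce all eight strategies, including Fairman (`5AR`).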
Skyrms performed several experiments with this game. In each experiment he started with groups of agents holding different strategies, let them play against each other, and applied the recombination again and again. In many of the experiments he observed that the strategy S7 persisted. This strategy (hereafter called Fairman) is the fairest of the eight!
In most cases Fairman was not the winning strategy: S1 and S4 did better when the population started with equal proportions of all strategies. But, unlike other strategies such as S2, it did not go extinct. Under certain initial conditions, the share of the population playing Fairman grew. One experiment started with 30% of the population using Fairman and the remaining 70% divided equally among the other strategies. Fairman grew to 64% of the population, while S1 and S4 did not survive. In another experiment, the initial proportions of the population for strategies S1-S8 were <.32, .02, .10, .02, .10, .02, .40, .02>, respectively. The final population consisted of 56.5% Fairman and 43.5% S5.
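The kind of population dynamics behind these figures can be sketched with replicator dynamics, a standard model of evolutionary change in which a strategy's share grows in proportion to its average payoff against the current population. This is a swapped-in, simplified model (discrete time, each player equally likely to be A), not the genetic-algorithm procedure described above, and the names are our own.

```python
from typing import List, Tuple

CAKE = 10
Strat = Tuple[int, bool, bool]  # (demand as A, accepts 5?, accepts 9?)
# S1..S8, in the order of the strategy table.
STRATEGIES: List[Strat] = [
    (9, True, True), (9, False, False), (9, True, False), (9, False, True),
    (5, True, True), (5, False, False), (5, True, False), (5, False, True),
]


def payoff(x: Strat, y: Strat) -> float:
    """Average payoff to x against y when each role (A or B) is equally likely."""
    def one_round(proposer: Strat, responder: Strat) -> Tuple[int, int]:
        demand = proposer[0]
        accepted = responder[1] if demand == 5 else responder[2]
        return (demand, CAKE - demand) if accepted else (0, 0)
    return (one_round(x, y)[0] + one_round(y, x)[1]) / 2


MATRIX = [[payoff(x, y) for y in STRATEGIES] for x in STRATEGIES]


def replicator(shares: List[float], steps: int = 10_000) -> List[float]:
    """Discrete-time replicator dynamics: each share is multiplied by its
    fitness relative to the population's mean fitness."""
    for _ in range(steps):
        fitness = [sum(m * s for m, s in zip(row, shares)) for row in MATRIX]
        mean = sum(f * s for f, s in zip(fitness, shares))
        shares = [s * f / mean for s, f in zip(shares, fitness)]
    return shares


skyrms_start = [0.32, 0.02, 0.10, 0.02, 0.10, 0.02, 0.40, 0.02]
```

`replicator(skyrms_start)` runs the second initial condition quoted above. We have not checked that this simplified model reproduces the reported split of 56.5% Fairman and 43.5% S5, so its output should be read as illustrative only. One property does hold by construction: S4 never earns less than S2 (they make the same demand, and S4 additionally accepts a demand of 9), so starting from equal shares S2 can never overtake S4.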
The above does not claim to be a full evolutionary explanation of the fairness effect. Fairness is not a rational strategy, in the sense that when confronted with an unfair offer it chooses the lower payoff: by rejecting a demand of 9, Fairman receives nothing instead of one piece. However, the above shows that such a strategy can persist in evolution.
5. Utilitarianism: are Kant and Mill really different?
John Stuart Mill argues that moral theories are divided between two distinct approaches: the intuitive and the inductive schools. Although both schools agree that there is a single, highest normative principle, they disagree about whether we know that principle intuitively (without appeal to experience) or inductively (through experience and observation). Kant represents the best of the intuitive school, and Mill himself defends the inductive school. Mill criticizes Kant’s categorical imperative, arguing that it is essentially the same as utilitarianism, since applying it involves calculating the good or bad consequences of an action to determine the morality of that action.
The simulations I described, as well as many others, show that the categorical imperative performs quite well as a strategy. The cooperative strategies actually follow the rule: “Act only so that if others act likewise fitness is maximized.” Strategies that violate this imperative are driven to extinction. However, I am not arguing that Kant is indeed a utilitarian, as Mill claims. In my opinion, Kant and Mill are not really different. We can see how ethics emerges, through an evolutionary process, from a system ruled by purely utilitarian relationships. The intuitive and the inductive attitudes are two faces of the same essence. They describe the same phenomena in two different languages, at two different levels, complementing each other.
Human morality has emerged from the complex system of life and society. Complexity theory tries to understand complex systems “bottom-up”, from the basic elements and their relations up to the general phenomena, as opposed to the traditional “top-down” analysis, which looks at the logic and mechanisms that control the behaviour of the system. Complexity brings a new kind of understanding. We can simulate a system as a whole, but we cannot predict the state of each of its elements at a given time. However, we can see how the general properties of the system have emerged. In this way we can see how morality is an evolutionary consequence of utilitarianism; hence the two are two sides of the same coin.
6. The Potential of Simulating Complex Systems
Agent-based modelling of complex systems is not limited to game theory. Such simulations usually contain a physical landscape and a definition of environmental conditions. The agents are organized into socially related groups and assigned basic rules of social behaviour. They are then given a range of alternative states and subjected to competitive fitness tests.
Such ‘artificial societies’ will, hopefully, open new windows onto the variables that influence the evolution of human behaviour in the contemporary world. Models are being developed today to explain the evolution of cultures, the creation of sociological structures and the behaviour of economic systems.
The new discipline of complexity brings new insights into our philosophical discussions and problems. Today we use neural networks to simulate brain behaviour. We cannot define the rules that govern the brain, but we can try to simulate it based on the elementary relations between neurons. The same approach can be applied to ethics, as well as to other philosophical issues.
Notes
1. R. Axelrod. The Evolution of Cooperation. Basic Books, New York, 1984.
2. R. Axelrod. The evolution of strategies in the iterated prisoner’s dilemma. In L. Davis, editor, Genetic Algorithms and Simulated Annealing, chapter 3, pages 32-41. Pitman, London, 1987.
3. R. Axelrod and D. Dion. The further evolution of cooperation. Science, 242:1385-1390, 1988.
4. R. Axelrod and W. D. Hamilton. The evolution of cooperation. Science, 211:1390-1396, 1981.
5. S. Bankes. Exploring the foundations of artificial societies. In R. A. Brooks and P. Maes, editors, Artificial Life, Proc. 4th International Workshop on the Synthesis and Simulation of Living Systems, volume 4, pages 337-342. MIT Press, 1994.
6. J. Batali and P. Kitcher. Evolutionary dynamics of altruistic behavior in optional and compulsory versions of the iterated prisoner’s dilemma. In R. A. Brooks and P. Maes, editors, Artificial Life, Proc. 4th International Workshop on the Synthesis and Simulation of Living Systems, volume 4, pages 344-348. MIT Press, 1994.
7. J. Bendor. In good times and bad: Reciprocity in an uncertain world. American J. of Political Science, 31:531-558, 1987.
8. Robert Boyd and Jeffrey P. Lorberbaum. No pure strategy is evolutionarily stable in the repeated prisoner’s dilemma game. Nature, 327:58-59, 1987.
9. E. Ann Stanley, D. Ashlock, M. D. Smucker and L. Tesfatsion. Preferential partner selection in an evolutionary study of prisoner’s dilemma. Economics R. No. 35, submitted for publication, 1994.
10. J.P. Delahaye. L’altruisme récompensé ? Pour La Science, 181:150-156, 1992.
11. J.P. Delahaye and P. Mathieu. Expériences sur le dilemme itéré des prisonniers. Rapport de Recherche 233, LIFL Lille CNRS (URA 369), 1992.
12. J.P. Delahaye and P. Mathieu. L’altruisme perfectionné. Pour La Science, 187:102-107, 1993.
13. J.P. Delahaye and P. Mathieu. L’altruisme perfectionné. Rapport de Recherche 249, LIFL Lille CNRS (URA 369), 1993.
14. J.P. Delahaye and P. Mathieu. Complex strategies in the iterated prisoner’s dilemma. In A. Albert, editor, Chaos and Society. IOS Press, Amsterdam, 1995.
15. D. Dennett. Darwin’s Dangerous Idea: Evolution and the Meanings of Life. Simon and Schuster, NY, 1995.
16. M. R. Frean. The prisoner’s dilemma without synchrony. Proc. Royal Society London, 257(B):75-79, 1994.
17. H. C. J. Godfray. The evolution of forgiveness. Nature, 355:206-207, 1992.
18. J. Holland. Adaptation in Natural and Artificial Systems. Ann Arbor: University of Michigan Press, 1975.
19. J. Koza. Genetic Programming: On the Programming of Computers by Natural Selection. Cambridge, Mass.: MIT Press, 1992.
20. D. Ashlock, M. D. Smucker and E. Ann Stanley. Analyzing social network structures in the iterated prisoner’s dilemma with choice and refusal. Technical Report CS-TR-94-1259, Department of Computer Sciences, University of Wisconsin-Madison, 1994.
21. C. Martino. Emergent nastiness in iterated prisoner’s dilemma games. 2.725: Design and Automation, 1995.
22. R. M. May. More evolution of cooperation. Nature, 327:15-17, 1987.
23. P. Molander. The optimal level of generosity in a selfish, uncertain environment. J. of Conflict Resolution, 29(4):611-618, 1985.
24. M. Nowak. Stochastic strategies in the prisoner’s dilemma. Theoretical Population Biology, 38:93-112, 1990.
25. M. Nowak and K. Sigmund. The evolution of stochastic strategies in the prisoner’s dilemma. Acta Applicandae Mathematicae, 20:247-265, 1990.
26. M. Nowak and K. Sigmund. Tit for tat in heterogeneous populations. Nature, 355:250-253, 1992.
27. M. Nowak and K. Sigmund. A strategy of win-stay, lose-shift that outperforms tit-for-tat in the prisoner’s dilemma game. Nature, 364:56-58, 1993.
28. M. Oliphant. Evolving cooperation in the non-iterated prisoner’s dilemma. In R. A. Brooks and P. Maes, editors, Artificial Life, Proc. 4th International Workshop on the Synthesis and Simulation of Living Systems, volume 4, pages 350-352. MIT Press, 1994.
29. R. Pool. Putting game theory to the test. Science, 267:1591-1593, 1995.
30. W. Poundstone. Prisoner’s Dilemma: John von Neumann, Game Theory, and the Puzzle of the Bomb. Oxford University Press, Oxford, 1993. ISBN 0-19-286162-X.
31. Xin Yao and P. J. Darwen. An experimental study of n-person iterated prisoner’s dilemma game. Informatica, 18:435-450, 1994.
32. B. Skyrms. The Evolution of the Social Contract. Cambridge University Press, 1996.
Copyright 2000, The American Philosophical
Association.
Last revised:
August 28, 2001