[ Return to APA Home Page ]

Guidelines for Submissions

APA NEWSLETTERS
    American Indians
        Viola F. Cordova &
        Anne Waters, Co-Editors
    Black Experience
        Jesse Taylor, Editor
   
Philosophy and Computers
        Jon Dorbolo, Editor
    Feminism and Philosophy
        Joan Callahan, Editor
    Hispanic/Latino Issues in
    Philosophy
        Eduardo Mendieta, Editor
    Philosophy and Law
        Richard Nunan, Editor
    Philosophy and Lesbian,
    Gay, Bisexual and
    Transgender Issues
        Timothy Murphy, Editor
    Philosophy and Medicine
        Rosamond Rhodes, Editor
    Teaching Philosophy
        Tziporah Kasachkoff &
        Eugene Kelly, Co-Editors

Navigation
   
Newsletters Index (00:2)
    apaOnline Home Page

 

APA Newsletters

Spring 2001
Volume 00, Number 2


Newsletter on Philosophy and Computers

Computer Ethics

Previous Article | Index | Next Article



Simulating Philosophy

Shai Ophir Lev-Hasharon
ophir@starhome.com

1. Cooperation and Game Theory 

   Ever since Darwin the evolutionary success of a seemingly obvious contradiction to his statement has raised the interests of naturalists all over the world: some cooperating animals are clearly mutualistic or even altruistic. Before the 1960s only a few scientists attempted to understand the evolutionary processes underlying cooperation, since group selection seemed to explain cooperative societies. Yet research in later years could not support a pervasive group-benefit view of selection; how then can cooperative genotypes spread in an environment of selfish genes? Currently, the evolution of cooperation can be divided into several general categories: e.g. (1) by-product mutualism where cooperation is an incidental outcome from genuinely selfish behaviour, (2) kin-selected altruism with its climax in the social insects and (3) reciprocal altruism among unrelated individuals. The Prisoner's Dilemma is used as the standard metaphor to conceptualize the conflict between mutual support and selfish exploitation among interacting non-relatives in biological communities.

   The Prisoner's Dilemma game is issued from the Game Theory of John von Neumann (see [30]) and Oskar Morgenstern and was introduced by RAND game theorists Merrill Flood and Melvin Dresher in 1952. The idea was to introduce some irrationality in Game Theory, which is used as a way of modelling interactions between individuals.

   The game got its name from the following hypothetical situation: imagine two criminals arrested under the suspicion of having committed a crime together. However, the police do not have sufficient proof in order to have them convicted. The two prisoners are isolated from each other, and the police visit each of them and offer a deal: the one who offers evidence against the other one will be freed. If none of them accepts the offer, they are in fact cooperating against the police, and both of them will get only a small punishment because of lack of proof. They both gain. However, if one of them betrays the other one, by confessing to the police, the defector will gain more, since he is freed; the one who remained silent, on the other hand, will receive the full punishment, since he did not help the police, and there is sufficient proof. If both betray, both will be punished, but less severely than if they had refused to talk. The dilemma resides in the fact that each prisoner has a choice between only two options, but cannot make a good decision without knowing what the other one will do.

   This game has been the found to be a very good way of studying cooperation and evolution of cooperation and thus a sort of theory of cooperation based upon reciprocity is found in a wide literature, such as in [1, 3, 4]. The experimental studies of the game and its strategies need a lot of time computation and thus with the progress of computers, computer-scientists and mathematicians have studied it as they have been able to use specific methods, like genetic algorithms, see [9, 2, 5, 8, 19, 20, 21, 26, 27, 28, 31].

   As cooperation is a topic of continuing interest for the social, zoological and biological sciences, a lot of works in those different fields have been made on the Prisoner's Dilemma: [6, 7, 16, 17, 22, 23, 25, 29, 24]. Although all people who have studied the game come from different research fields, it could be said that they are all working on the same topic, which belong to the Artificial Life field, the bottom-up study of Life.

2. The Iterated Prisoner's Dilemma (IPD)

   Let two artificial agents have the choice between cooperation and defection. They play one against the other, in a synchronous manner, so that they do not know what the other will play. They get a score according to the situation of the move:

  • They both cooperate and then both get the cooperation reward, let evaluate it to R points; 
  • They both defect and then get both the selfish punishment, let evaluate it to P points; 
  • One chooses to defect while the other chooses to cooperate, then the one who has defected gets the selfish temptation salary, let it be T points, and the one who has cooperated gets the sucker score, let it be S points.

   To have a dilemma, temptation must be better than cooperation, which must be better than punishment, which must be better than to be the sucker. This can be formalized as:

           T > R > P > S

   Since this one-shot version of the Prisoner's Dilemma is not very interesting (the most rational choice is to defect), the game is iterated, and the final score being the sum of all the moves scored. Each player does not know how many moves there will be, thus each agent's strategy can be studied, to look, for instance, how each player tries to put cooperation in the game.

   To avoid the one-shot Prisoner's Dilemma solution to influence strategies, by giving too much importance to temptation regarding cooperation, it is useful to add the following restriction :

          2R > T + S

   With this restriction, strategies have no advantage in alternatively cooperate and defect. The classical payoff matrix used is shown in table 1.

Action of B \
Action of A
Player A cooperates Player A defects
Player B cooperates R=3 (reward for mutual cooperation) S=0 (sucker's payoff)
Player B defects  T=5 (temptation to defect) P=1 (punishment for mutual defection)

   Table 1: Iterated Prisoner's Dilemma (IPD) scoring for player A.

   The two participants have two options: to cooperate or to defect. The payoff to player A is shown. Obviously, he gains more by defection not only if player two cooperates (5 instead of 3 points), but also if player two defects (1 instead of 0 points). Since this is also valid for player two, both end up defecting and score instead 3 points each if they had cooperated. Hence the dilemma.

   Using evolutionary dynamics software implementation, the IPD can be simulated as an iterated game, where the players (virtual or real players) learn from the results of previous iterations and dynamically select the appropriate strategy among various possible strategies. Strategies can be applied in an evolutionary process, where the fittest strategy survives.

3. The Emergence of Cooperation

   Axelrod, the most famous researcher in the area of Evolution of Cooperation [1, page 110] says: "The advice takes the form of four simple suggestions for how to do well in a durable Iterated Prisoner's Dilemma:

1. Don't be envious,
2. Don't be the first to defect,
3. Reciprocate both cooperation and defection,
4. Don't be too clever."

   When Axelrod writes that, he states that a strategy has to be understandable, hence simple, mainly for its opponents to understand that all it wants is to establish a cooperation period. He thinks that if the strategy clearly announces how it will act, then the other player will be able to cooperate more quickly.

   Axelrod and Hamilton [4] used a computer tournament to detect strategies that would favor cooperation among individuals engaged in the IPD. In a first round, 14 more or less sophisticated strategies and one totally random strategy competed against each other for the highest average scores in an IPD of 200 moves. Unexpectedly, a very simple strategy did outstandingly well: cooperate on the first move and then copy your opponent's last move for all subsequent moves.

   This strategy was called 'Tit for tat' (TFT) and became the basis of an ever growing number of successful strategies. In a similar competition with 62 contestants, TFT won again. It has three characteristics that account for its impressive performance: it is nice (cooperates on the first move), retaliatory (punishes defection in the prior move with defection) and forgiving (immediate return to cooperation after one C of the adversary).

   In order to study the behavior of strategies, two types of computation can be made: The first one is a simple round-robin tournament, in which each strategy meets all other strategies. Its final score is then the sum of all scores done in each confrontation. At the end, the strategy's strength measurement is given by its range in the tournament. This is the way the tit-for-tat strategy has been isolated by Axelrod. The second one is an evolutionary approach, where at the beginning the same number of agents hold each one of the strategies. A round- robin tournament is made and then the population of inefficient strategies is decreased whereas efficient strategies obtain new elements. The simulation is repeated until the population has been stabilized, i.e. the population does not change anymore. It has been found that "nasty" strategies (those who take the initiative of the first defection) are less effective than "kind" ones!

   Delahaye and P. Mathieu [10-14] did some experiments with the evolutionary approach computation method.

   Twelve strategies have been tested:

1. Cooperate

always cooperates

2. Defect

always defects

3. Random

cooperates with a probability of 0.5

4. Tit-for-tat

cooperates on the first move and then plays what its opponent played on the previous move

5. Spite

cooperates until the opponent defects, then defects all the time

6. Per-kind

plays periodically [cooperate, cooperate, defect]

7. Per-nasty

plays periodically [defect, defect, cooperate]

8. Soft-majo plays the opponent's most used move and cooperates in case of equality (first move considered as equality)
9. Mistrust has the same behavior as tit-for-tat but defects on the first move
10. Prober begins by playing [cooperate, defect, defect], then if the opponent cooperates on the second and the third move
11. Gradual continues to defect, else plays tit-for-tat. This strategy acts as tit-for-tat, except of the following: after the first defection of the other player, defects one time and cooperates two times; After the second defection of the opponent, defect two times and cooperate two times,...
12. Pavlov cooperates on the first move and then cooperates only if the two players made the same move, this strategy was studied in [27].

Each game (experiment) consisted of 1000 moves, with the classical payoff shown in table 1. The results were as follows: (the full results are in [11]).

Strategy
Final Score
Gradual
33416
Tit-for-tat
31411
Soft majo
31210
Spite
30013
Prober
29177
Pavlov
28910
Mistrust
25921
Cooperate
25484
Per kind
24796
Defect
24363
Per nasty
23835
Random
22965

   The results of these experiments show that strategies with an inclination to cooperate gain more than the “nasty” ones! Gradual, Tit-for-Tat, and Soft_majo (the leading three strategies) all have a strong tendency to cooperate, while Defect and Per_nasty are defect-oriented (and Random is simply a dumb).

   Several other experiments have been made with virtual systems in order to show the emergence of cooperation among software agents. The conclusions derived from such results are far-reaching, and are beyond the scope and the goal of this paper. I’ll just mention Daniel Dennett [15], who explains that Darwin’s idea is considered dangerous because it challenges the belief that there is something special about life, and in particular about human life, consciousness, emotions, etc. Instead, it shows how these essences are the result of billions and billions of applications of a simple, mindless, mechanistic process.

4. Simulating Ethics

   A major question I’d like to raise is whether norms of ethics can emerge in a virtual system, and how it may affect our Philosophy.

   Let’s examine a simple game called the Ultimatum game. A cake is divided into ten pieces. Player A can demand either five or nine pieces, player B can either accept or reject the proposal. We will analyse this game within the context of evolutionary game theory.

   What are the strategies applicable for this game? Each player can be either A or B. While being A there are only two options: Demand5 and Demand9. Being B enables four options: Accept all, Reject all, Accept if 9 is demanded but reject if 5 is demanded, and Accept if 5 is demanded but reject if 9 is demanded. In total, we have 2X4 strategies for a certain player, where each strategy tells her how to behave while being A or B.

   We can summarize the strategies in the following table:

 
IF BEING A
IF BEING B
S1
Demand 9
Accept All
S2
Demand 9
Reject All
S3
Demand 9
Accept 5, Reject 9
S4
Demand 9
Accept 9, Reject 5
S5

Demand 5

Accept All
S6
Demand 5
Reject All
S7
Demand 5
Accept 5, Reject 9
S8
Demand 5
Accept 9, Reject 5

   This game has been simulated by Brian Skyrms [32], using “genetic algorithms”.In this method, replication is governed by success, judged by some standards relevant for the problem. Once in a while, the code of the application is cut into two pieces, and the pieces are swapped between other programs, creating new programs. Most of these programs will be useless, but some will be successful. The task for the programs is pre-defined, so there is a clear criteria whether the program is successful or not. This “crossover” takes place again and again, and few of the generated programs are approaching closer and closer towards the target. This method has been brought into the computer sciences by John Holland [18] and expanded by John Koza [19].

   How can genetic algorithms be applied to the Ultimatum game? We can view strategies as strings. Thus, the strategy If A: demand 9; If B: accept 5, reject has the following sub-strategies: If A: demand 9 and if B: accept 5, reject 9. In addition, the second sub-strategy can be split into: If B and 5 is required, accept it and If B and 9 is required, reject it. The idea is to cut the strategies into the smallest pieces and recombine them arbitrarily, and to create new strategies as many as possible. After a cycle of recombination the strategies are played against each other and only the fittest will survive.

   Skyrms performed several experiments with this game. In each experiment he initiated groups of agents with different strategies, let them play against each other, and operated the recombination again and again. In many of the experiments he observed a persistence of the strategy S7. This strategy (known later as Fairman) is the most fair strategy among the eight!

   In most cases Fairman was not the winning strategy. S1 and S4 did better when starting with a population with equal proportions of the strategies. But it did not go extinct, in opposition to other strategies, S2 for example. In certain initial conditions, the range of population holding Fairman has been expanded. One experiment started with 30% of the population using Fairman, with the remaining strategies having equal proportions of the rest of the population. Fairman gained 64% of the population, while S1 and S4 did not survive. In another experiment, the initial proportions of the population for S1-S8 strategies where <.32,.02,.10,.02,.10,.02,.40,.02> , respectively. The results state show 56.5% of Fairman and 43.5% of S5.

   The above does not pretend to be a full evolutionary explanation of the fairness effect. Fairness is not a rational strategy, in the sense that if confronted with an unfair offer, it chooses the lower payoff. However the above shows that such a strategy can persist in evolution.

5. Utilitarianism: are Kant and Mill really different?

   John Stuart Mill argues that moral theories are divided between two distinct approaches: the intuitive and inductive schools. Although both schools agree that there is a single and highest normative principle, they disagree about whether we have knowledge of that principle intuitively (without appeal to experience), or inductively (though experience and observation). Kant represents the best of the intuitive school, and Mill himself defends the inductive school. Mill criticizes Kant’s categorical imperative noting that it is essentially the same as utilitarianism since it involves calculating the good or bad consequences of an action to determine the morality of that action.

   The simulations I described, as well as many others, show that the categorical imperative is doing pretty well as a strategy. The cooperative strategies actually follow the rule: “Act only so that if others act likewise fitness is maximized.” Strategies that violate this imperative are driven to extinction. However I am not arguing that Kant is indeed a utilitarian, as Mill claims. In my opinion, Kant and Mill are not really different. We could see how moral ethics emerges from a system ruled by pure utilitarian relationships, in an evolutionary process. The intuitive and the inductive attitudes are the two faces of the same essence. They describe the same phenomena in two different languages, in two different levels, complementing each other.

   Human morality has emerged from the complex system of life and society. Complexity theory is trying to understand complex systems “bottom-up”, from the base elements and their relations up to the general phenomena, in oppose to the traditional “top-down” analysis, which is looking at the logic and mechanism that controls the behaviour of the system. Complexity brings a new kind of understanding. We can simulate a system as a whole, but we can not predict the state of each one of its elements at a given time. However we can see how the general properties of the system have emerged. We can see how morality is an evolutionary consequence of utilitarianism, hence they are the two sides of the same coin.

6. The Potential of Simulating Complex Systems

   Agent-based modelling of complex systems is not limited to game theory. Various simulations usually contain a physical landscape and a definition of environmental conditions. The agents are grouped into social-related groups and are assigned basic rules of social behaviour. The agents then are associated with a range of alternative states and then subjected to competitive fitness tests.

   Such ‘artificial societies’ will hopefully open new windows into the kind of variables that are influencing the evolution of human behaviour today in the contemporary world. Models are being developed today in order to explain the evolution of cultures, the creation of sociological structures and the behaviour of economic systems.

   The new discipline of complexity brings new insights into our philosophical discussions and problems. Today we are using neural networks in order to simulate brain behaviour. We cannot define the rules that govern the brain, but we can try to simulate it based on the elementary relations between neurons. The same can be applied for ethics, as well as for other philosophical issues.

Notes

1. R. Axelrod. The Evolution of Cooperation. Basic Books, New York, 1984.
2. R. Axelrod. The evolution of strategies in the iterated prisoner’s dilemma. In L. Davis, editor, Genetic Algorithms and the Simulated Annealing, chapter 3, pages 32—41. Pitman, London, 1987.
3. R. Axelrod and D. Dion. The further evolution of cooperation. Science, 242:1385—1390, 1988.
4. R. Axelrod and W. D. Hamilton. The evolution of cooperation. Science, 211:1390—1396, 1981.
5. S. Bankes. Exploring the foundations of artificial societies. In R. A. Brooks and P. Maes, editors, Artificial Life, Proc. 4th International Workshop on the Synthesis and Simulation of Living Systems, volume 4, pages 337—342. MIT Press, 1994.
6. J. Batali and P. Kitcher. Evolutionary dynamics of altruistic behavior in optional and compulsory versions of the iterated prisoner’s dilemma. In R. A. Brooks and P. Maes, editors, Artificial Life, Proc. 4th International Workshop on the Synthesis and Simulation of Living Systems, volume 4, pages 344-348. MIT Press, 1994.
7. J. Bendor. In good times and bad: Reciprocity in an uncertain world. American J. of Political Science, 31:531—558, 1987.
8. Robert Boyd and Jeffrey P. Lorberbaum. No pure strategy is evolutionarily stable in the repeated prisoner’s dilemma game. Nature, 327:58—59, 1987.
9. E. Ann Stanley D. Ashlock, M. D. Smucker and L. Tesfatsion. Preferential partner selection in an evolutionnary study of prisoner’s dilemma. Economics R. No 35, Submitted for publication, 1994.
10. J.P. Delahaye. L’altruisme r’ecompens’e ? Pour La Science, 181:150—156, 1992.
11. J.P. Delahaye and P. Mathieu. Exp’eriences sur le dilemme it’er’e des prisonniers. Rapport de Recherche 233, LIFL Lille CNRS (URA 369), 1992.
12. J.P. Delahaye and P. Mathieu. L’altruisme perfectionn’e. Pour La Science, 187:102—107, 1993.
13. J.P. Delahaye and P. Mathieu. L’altruisme perfectionn’e. Rapport de Recherche 249, LIFL Lille CNRS (URA 369), 1993.
14. J.P. Delahaye and P. Mathieu. Complex strategies in the iterated prisoner’s dilemma. In A. Albert, editor, Chaos and Society, Amsterdam, 1995. IOS Press.
15. D. Dennett. Darwin’s Dangerous Idea: Evolution and the Meaning of Life. Simon and Schuster, NY, 1995.
16. M. R. Frean. The prisoner’s dilemma without synchrony. Proc. Royal Society London, 257(B):75—79, 1994.
17. H. C. J. Godfray. The evolution of forgiveness. Nature, 355:206—207, 1992.
18. J. Holland. Adaptations in Neural and Artificial Systems. Ann Harbor: University of Michigan Press, 1975.
19. J. Koza. Genetic Programming: On the Programming of Computers by Natural Selection. Cambridge, Mass.: MIT Press, 1992.
20. D. Ashlock M., D. Smucker, E. Ann Stanley. Analyzing social network structures in the iterated prisoner’s dilemma with choice and refusal. RR. CS- TR-94-1259, University of Wisconsin-Madison, Department of Computer-Sciences, 1994.
21. C. Martino. Emergent nastiness in iterated prisoner’s dilemma games. 2.725: Design and Automation, 1995.
22. R. M. May. More evolution of cooperation. Nature, 327:15-17, 1987.
23. P. Molander. The optimal level of generosity in a selfish, uncertain environment. J. of Conflict Resolution, 29(4):611—618, 1985.
24. M. Nowak. Stochastic strategies in the prisoner’s dilemma. Theoretical Population Biology, 38:93-112, 1990.
25. M. Nowak and K. Sigmund. The evolution of stochastic strategies n the prisoner’s dilemma. Acta ApplicandaeMathematicae, 20:247—265, 1990.
26. M. Nowak and K. Sigmund. Tit for tat in heterogeneous populations. Nature, 355:250—253, 1992.
27. M. Nowak and K. Sigmund. A strategy of win-stay, lose-shift that outperforms tit-for-tat in the prisoner’s dilemma game. Nature, 364:56—58, 1993.
28. M. Oliphant. Evolving cooperation in the non-iterated prisoner’s dilemma. In R. A. Brooks and P. Maes, editors, Artificial Life, Proc. 4th International Workshop on the Synthesis and Simulation of Living Systems, volume 4, pages 350—352. MIT Press, 1994.
29. R. Pool. Putting game theory to the test. Science, 267:1591—1593, 1995.
30. W. Poundstone. Prisoner’s Dilemma : John von Neumann, Game Theory, and the Puzzle of the Bomb. Number 0-19-286162-X. Oxford University Press, Oxford, 1993.
31. Xin Yao and P. J. Darwen. An experimental study of n-person iterated prisoner’s dilemma game. Informatica, 18:435—450, 1994. 132. B. Skyrms. The Evolution of the Social Contract. Cambridge University Press, 1996.


Previous Article | Index | Next Article


Copyright 2000, The American Philosophical Association.
Last revised: August 28, 2001