Prisoner's Dilemma & Tit for Tat Winning Strategy
Will the two prisoners cooperate to minimize
total loss of liberty, or will one of them, trusting the other to
cooperate, betray him so as to go free?
In game theory, the prisoner's dilemma (sometimes abbreviated PD) is a type of non-zero-sum
game in which two players may each "cooperate" with or "defect" (i.e.,
betray) the other player. In this game, as in all game theory, the only
concern of each individual player ("prisoner") is maximizing his/her
own payoff, without any concern for the other player's payoff. The
unique equilibrium for this game is a Pareto-suboptimal solution—that is, rational choice leads the two players to both play defect even though each player's individual reward would be greater if they both played cooperate.
Tit for tat is a highly effective strategy in game theory for the iterated prisoner's dilemma. Based on the English saying meaning "equivalent retaliation" ("tit for tat"), an agent
using this strategy will initially cooperate, then respond in kind to
an opponent's previous action. If the opponent previously was
cooperative, the agent is cooperative. If not, the agent is not. This
is similar to reciprocal altruism in biology.
In the classic form of this game, cooperating is strictly dominated by defecting, so that the only possible equilibrium
for the game is for all players to defect. In simpler terms, no matter
what the other player does, one player will always gain a greater
payoff by playing defect. Since in any situation playing defect is more beneficial than cooperating, all rational players will play defect, all things being equal.
In the iterated prisoner's dilemma the game is played repeatedly.
Thus each player has an opportunity to "punish" the other player for
previous non-cooperative play. Cooperation may then arise as an
equilibrium outcome. The incentive to defect is overcome by the threat
of punishment, leading to the possibility of a cooperative outcome. So
if the game is infinitely repeated, cooperation may be a subgame perfect Nash equilibrium although both players defecting always remains an equilibrium and there are many other equilibrium outcomes.
The classical prisoner's dilemma
The Prisoner's Dilemma was originally framed by Merrill Flood and Melvin Dresher working at RAND in 1950. Albert W. Tucker formalized the game with prison sentence payoffs and gave it the "Prisoner's Dilemma" name (Poundstone, 1992).
The classical prisoner's dilemma (PD) is as follows:
- Two suspects, A and B, are arrested by the police. The police have
insufficient evidence for a conviction, and, having separated both
prisoners, visit each of them to offer the same deal: if one testifies
for the prosecution against the other and the other remains silent, the
betrayer goes free and the silent accomplice receives the full 10-year
sentence. If both remain silent, both prisoners are sentenced to only
six months in jail for a minor charge. If each betrays the other, each
receives a five-year sentence. Each prisoner must make the choice of
whether to betray the other or to remain silent. However, neither
prisoner knows for sure what choice the other prisoner will make. So
this dilemma poses the question: How should the prisoners act?
The dilemma can be summarized thus:
|
Prisoner B Stays Silent |
Prisoner B Betrays |
| Prisoner A Stays Silent |
Each serves six months |
Prisoner A serves ten years
Prisoner B goes free |
| Prisoner A Betrays |
Prisoner A goes free
Prisoner B serves ten years |
Each serves five years |
The dilemma arises when one assumes that both prisoners only care
about minimizing their own jail terms. Each prisoner has two and only
two options: either to cooperate with his accomplice and stay quiet, or
to defect from their implied pact and betray his accomplice in return
for a lighter sentence. The outcome of each choice depends on the
choice of the accomplice, but each prisoner must choose without knowing
what his accomplice has chosen.
In deciding what to do in strategic situations, it is normally
important to predict what others will do. This is not the case here. If
you knew the other prisoner would stay silent, your best move is to
betray as you then walk free instead of receiving the minor sentence.
If you knew the other prisoner would betray, your best move is still to
betray, as you receive a lesser sentence than by silence. Betraying is
a dominant strategy.
The other prisoner reasons similarly, and therefore also chooses to
betray. By both defecting they get a lower payoff than they would get
by staying silent. Rational self-interested play results in each
prisoner being worse off than if they had stayed silent. In more
technical language, this demonstrates very elegantly that in a non-zero sum game a Nash Equilibrium need not be a Pareto optimum.
Note that the paradox of the situation lies in that the prisoners
are not defecting in hope that the other will not. Even when they both
know the other to be rational and selfish, they will both play defect.
Defect is what they will play no matter what, even though they
know fully well that the other player is playing defect as well and
that they will both be better off with a different result.
The "Stay Silent" and "Betray" strategies are also known as "don't confess" and "confess", or the more standard "cooperate" and "defect."
One experiment based on the simple dilemma found that approximately 40% of participants cooperated (i.e., stayed silent).[1]
Generalized form
We can expose the skeleton of the game by stripping it of the prisoner framing device. The generalized form of the game has been used frequently in experimental economics. The following rules give a typical realization of the game.
There are two players and a banker. Each player holds a set of two
cards: one printed with the word "Cooperate", the other printed with
"Defect" (the standard terminology for the game). Each player puts one
card face-down in front of the banker. By laying them face down, the
possibility of a player knowing the other player's selection in advance
is eliminated (although revealing one's move does not affect the
dominance analysis[2]). At the end of the turn, the banker turns over both cards and gives out the payments accordingly.
If player 1 (red) defects and player 2 (blue) cooperates, player 1
gets the Temptation to Defect payoff of 5 points while player 2
receives the Sucker's payoff of 0 points. If both cooperate they get
the Reward for Mutual Cooperation payoff of 3 points each, while if
they both defect they get the Punishment for Mutual Defection payoff of
1 point. The checker board payoff matrix showing the payoffs is given below.
Canonical PD payoff matrix
|
Cooperate |
Defect |
| Cooperate |
3, 3 |
0, 5 |
| Defect |
5, 0 |
1, 1 |
In "win-lose" terminology the table looks like this:
|
Cooperate |
Defect |
| Cooperate |
win-win |
lose much-win much |
| Defect |
win much-lose much |
lose-lose |
These point assignments are given arbitrarily for illustration. It is possible to generalize them. Let T stand for Temptation to defect, R for Reward for mutual cooperation, P for Punishment for mutual defection and S for Sucker's payoff. The following inequalities must hold:
T > R > P > S
In addition to the above condition, if the game is repeatedly played by two players, the following condition should be added.[3]
2 R > T + S
If that condition does not hold, then full cooperation is not necessarily Pareto optimal, as the players are collectively better off by having each player alternate between cooperate and defect.
These rules were established by cognitive scientist Douglas Hofstadter and form the formal canonical description of a typical game of Prisoner's Dilemma.
The iterated prisoner's dilemma
If two players play Prisoner's Dilemma more than once in succession
(that is, having memory of at least one previous game), it is called
iterated Prisoner's Dilemma. Amongst results shown by Nobel Prize
winner Robert Aumann
in his 1959 paper, rational players repeatedly interacting for
indefinitely long games can sustain the cooperative outcome. Popular
interest in the iterated prisoners dilemma (IPD) was kindled by Robert Axelrod in his book The Evolution of Cooperation
(1984). In this he reports on a tournament he organized in which
participants have to choose their mutual strategy again and again, and
have memory of their previous encounters. Axelrod invited academic
colleagues all over the world to devise computer strategies to compete
in an IPD tournament.
The programs that were entered varied widely in algorithmic complexity;
initial hostility; capacity for forgiveness; and so forth.
Axelrod discovered that when these encounters were repeated over a
long period of time with many players, each with different strategies,
greedy strategies tended to do very poorly in the long run while more altruistic
strategies did better, as judged purely by self-interest. He used this
to show a possible mechanism for the evolution of altruistic behaviour
from mechanisms that are initially purely selfish, by natural selection.
The best deterministic strategy was found to be "Tit for Tat," which Anatol Rapoport
developed and entered into the tournament. It was the simplest of any
program entered, containing only four lines of BASIC, and won the
contest. The strategy is simply to cooperate on the first iteration of
the game; after that, the player does what his opponent did on the
previous move. Depending on the situation, a slightly better strategy
can be "Tit for Tat with forgiveness." When the opponent defects, on
the next move, the player sometimes cooperates anyway, with a small
probability (around 1%-5%). This allows for occasional recovery from
getting trapped in a cycle of defections. The exact probability depends
on the line-up of opponents.
By analysing the top-scoring strategies, Axelrod stated several conditions necessary for a strategy to be successful.
- Nice
- The most important condition is that the strategy must be "nice",
that is, it will not defect before its opponent does. Almost all of the
top-scoring strategies were nice; therefore a purely selfish strategy
will not "cheat" on its opponent, for purely utilitarian reasons first.
- Retaliating
- However, Axelrod contended, the successful strategy must not be a
blind optimist. It must sometimes retaliate. An example of a
non-retaliating strategy is Always Cooperate. This is a very bad
choice, as "nasty" strategies will ruthlessly exploit such softies.
- Forgiving
- Another quality of successful strategies is that they must be
forgiving. Though they will retaliate, they will once again fall back
to cooperating if the opponent does not continue to play defects. This
stops long runs of revenge and counter-revenge, maximizing points.
- Non-envious
- The last quality is being non-envious, that is not striving to
score more than the opponent (impossible for a ‘nice’ strategy, i.e., a
'nice' strategy can never score more than the opponent).
Therefore, Axelrod reached the Utopian-sounding
conclusion that selfish individuals for their own selfish good will
tend to be nice and forgiving and non-envious. One of the most
important conclusions of Axelrod's study of IPDs is that Nice guys can
finish first.
The optimal (points-maximizing) strategy for the one-time PD game is
simply defection; as explained above, this is true whatever the
composition of opponents may be. However, in the iterated-PD game the
optimal strategy depends upon the strategies of likely opponents, and
how they will react to defections and cooperations. For example,
consider a population where everyone defects every time, except for a
single individual following the Tit-for-Tat strategy. That individual
is at a slight disadvantage because of the loss on the first turn. In
such a population, the optimal strategy for that individual is to
defect every time. In a population with a certain percentage of
always-defectors and the rest being Tit-for-Tat players, the optimal
strategy for an individual depends on the percentage, and on the length
of the game.
Deriving the optimal strategy is generally done in two ways:
- Bayesian Nash Equilibrium:
If the statistical distribution of opposing strategies can be
determined (e.g. 50% tit-for-tat, 50% always cooperate) an optimal
counter-strategy can be derived analytically.[4]
- Monte Carlo simulations of populations have been made, where individuals with low scores die off, and those with high scores reproduce (a genetic algorithm
for finding an optimal strategy). The mix of algorithms in the final
population generally depends on the mix in the initial population. The
introduction of mutation (random variation during reproduction) lessens
the dependency on the initial population; empirical experiments with
such systems tend to produce Tit-for-Tat players (see for instance
Chess 1988), but there is no analytic proof that this will always occur.
Although Tit-for-Tat is considered to be the most robust basic strategy, a team from Southampton University in England (led by Professor Nicholas Jennings [1]
and consisting of Rajdeep Dash, Sarvapali Ramchurn, Alex Rogers,
Perukrishnen Vytelingum) introduced a new strategy at the
20th-anniversary Iterated Prisoner's Dilemma competition, which proved
to be more successful than Tit-for-Tat. This strategy relied on
cooperation between programs to achieve the highest number of points
for a single program. The University submitted 60 programs to the
competition, which were designed to recognize each other through a
series of five to ten moves at the start. Once this recognition was
made, one program would always cooperate and the other would always
defect, assuring the maximum number of points for the defector. If the
program realized that it was playing a non-Southampton player, it would
continuously defect in an attempt to minimize the score of the
competing program. As a result,[5] this strategy ended up taking the top three positions in the competition, as well as a number of positions towards the bottom.
This strategy takes advantage of the fact that multiple entries were
allowed in this particular competition, and that the performance of a
team was measured by that of the highest-scoring player (meaning that
the use of self-sacrificing players was a form of minmaxing).
In a competition where one has control of only a single player,
Tit-for-Tat is certainly a better strategy. Because of this new rule,
this competition also has little theoretical significance when
analysing single agent strategies as compared to Axelrod's seminal
tournament. However, it provided the framework for analysing how to
achieve cooperative strategies in multi-agent frameworks, especially in
the presence of noise. In fact, long before this new-rules tournament
was played, Richard Dawkins in his book The Selfish Gene
pointed out the possibility of such strategies winning if multiple
entries were allowed, but remarked that most probably Axelrod would not
have allowed them if they had been submitted. It also relies on
circumventing rules about the prisoner's dilemma in that there is no
communication allowed between the two players. When the Southampton
programs engage in an opening "ten move dance" to recognize one
another, this only reinforces just how valuable communication can be in
shifting the balance of the game.
If an iterated PD is going to be iterated exactly N times, for some
known constant N, then it is always optimal to defect in all rounds.
The only possible Nash equilibrium
is to always defect. The proof goes like this: one might as well defect
on the last turn, since the opponent will not have a chance to punish
the player. Therefore, both will defect on the last turn. Thus, the
player might as well defect on the second-to-last turn, since the
opponent will defect on the last no matter what is done, and so on. For
cooperation to emerge the total number of rounds must be random, or at
least unknown to the players. However, even in this case always defect
is no longer a strictly dominant strategy, only a Nash equilibrium.
Another odd case is "play forever" prisoner's dilemma. The game is
repeated infinitely many times, and the player's score is the average
(suitably computed).
The prisoner's dilemma game is fundamental to certain theories of
human cooperation and trust. On the assumption that the PD can model
transactions between two people requiring trust, cooperative behaviour
in populations may be modelled by a multi-player, iterated, version of
the game. It has, consequently, fascinated many scholars over the
years. In 1975, Grofman and Pool estimated the count of scholarly
articles devoted to it at over 2,000. The iterated prisoner's dilemma
has also been referred to as the "Peace-War game".[6]
Learning psychology and game theory
Where game players can learn to estimate the likelihood of other
players defecting, their own behaviour is influenced by their
experience of the others' behaviour. Simple statistics show that
inexperienced players are more likely to have had, overall, atypically
good or bad interactions with other players. If they act on the basis
of these experiences (by defecting or cooperating more than they would
otherwise) they are likely to suffer in future transactions. As more
experience is accrued a truer impression of the likelihood of defection
is gained and game playing becomes more successful. The early
transactions experienced by immature players are likely to have a
greater effect on their future playing than would such transactions
affect mature players. This principle goes part way towards explaining
why the formative experiences of young people are so influential and
why, for example, those who are particularly vulnerable to bullying
sometimes become bullies themselves.
The likelihood of defection in a population may be reduced by the experience of cooperation in earlier games allowing trust to build up.[7]
Hence self-sacrificing behaviour may, in some instances, strengthen the
moral fibre of a group. If the group is small the positive behaviour is
more likely to feed back in a mutually affirming way, encouraging
individuals within that group to continue to cooperate. This is allied
to the twin dilemma of encouraging those people whom one would aid to
indulge in behaviour that might put them at risk. Such processes are
major concerns within the study of reciprocal altruism, group selection, kin selection and moral philosophy.
Rationality and super-rationality
One resolution of the dilemma proposed by Douglas Hofstadter in his Metamagical Themas is to reject the definition of "rational" that led to the "rational" decision to defect. In this view, truly rational (or "superrational")
players take into account that the other person is (presumably)
superrational, like them, and thus they behave identically, and thus
they cooperate. This analysis of the one-shot game is in complete
contradiction to classical game theory, but according to this view
follows naturally from the symmetry between the two players:
- an optimal strategy must be the same for both players (unlike the terms of the classical prisoner's game)
- the result must lie on the diagonal of the payoff matrix
- maximize return from solutions on the diagonal
- cooperate
Morality
While it is sometimes thought that morality must involve the constraint of self-interest, David Gauthier
famously argues that co-operating in the prisoners dilemma on moral
principles is consistent with self-interest and the axioms of game
theory. It is most prudent to give up straightforward maximizing and
instead adopt a disposition of constrained maximization, according to
which one resolves to cooperate with all similarly disposed persons and
defect on the rest. In other words, moral constraints are justified
because they make us all better off, in terms of our preferences
(whatever they may be). This form of contractarianism
claims that good moral thinking is just an elevated and subtly
strategic version of plain old means-end reasoning. Those that defect
can be predicted because people are not completely opaque.
Douglas Hofstadter expresses a strong personal belief that the mathematical symmetry is reinforced by a moral symmetry, along the lines of the Kantian categorical imperative:
defecting in the hope that the other player cooperates is morally
indefensible. If players treat each other as they would treat
themselves, then off-diagonal results cannot occur.
Real-life examples
These particular examples, involving prisoners and bag switching and
so forth, may seem contrived, but there are in fact many examples in
human interaction as well as interactions in nature that have the same
payoff matrix. The prisoner's dilemma is therefore of interest to the social sciences such as economics, politics and sociology, as well as to the biological sciences such as ethology and evolutionary biology.
Many natural processes have been abstracted into models in which living
beings are engaged in endless games of Prisoner's Dilemma (PD). This
wide applicability of the PD gives the game its substantial importance.
In political science, for instance, the PD scenario is often used to illustrate the problem of two states engaged in an arms race. Both will reason that they have two options, either to increase military expenditure
or to make an agreement to reduce weapons. Neither state can be certain
that the other one will keep to such an agreement; therefore, they both
incline towards military expansion. The paradox is that both states are acting rationally, but producing an apparently irrational result. This could be considered a corollary to deterrence theory.
In sociology or criminology,
the PD may be applied to an actual dilemma facing two inmates. The game
theorist Marek Kaminski, a former political prisoner, analysed the
factors contributing to payoffs in the game set up by a prosecutor for
arrested defendants (cf. References).
He concluded that while the PD is the ideal game of a prosecutor,
numerous factors may strongly affect the payoffs and potentially change
the properties of the game.
In program management and technology development, the PD applies to
the relationship between the customer and the developer. Capt Dan Ward,
an officer in the US Air Force, examined The Program Manager's Dilemma in an article published in Defense AT&L, a defense technology journal.[8]
Another example concerns a well-known concept in cycling races, for instance in the Tour de France. Consider two cyclists halfway in a race, with the peloton (larger group) at great distance behind them. The two cyclists often work together (mutual cooperation)
by sharing the tough load of the front position, where there is no
shelter from the wind. If neither of the cyclists makes an effort to
stay ahead, the peloton will soon catch up (mutual defection). An often-seen scenario is one cyclist doing the hard work alone (cooperating), keeping the two ahead of the peloton. In the end, this will likely lead to a victory for the second cyclist (defecting) who has an easy ride in the first cyclist's slipstream.
Also in athletics, there is a widespread practice in high school
wrestling where the participants intentionally lose unnaturally large
amounts of weight so as to compete against lighter opponents. In doing
so, the participants are clearly not at their top level of physical and
athletic fitness and yet often end up competing against the same
opponents anyway, who have also followed this practice (mutual defection). The result is a reduction in the level of competition. Yet if a participant maintains their natural weight (cooperating), they will most likely compete against a stronger opponent who has lost considerable weight.
Advertising is sometimes cited as a real life example of the prisoner’s dilemma. When cigarette
advertising was legal in the United States, competing cigarette
manufacturers had to decide how much money to spend on advertising. The
effectiveness of Firm A’s advertising was partially determined by the
advertising conducted by Firm B. Likewise, the profit derived from
advertising for Firm B is affected by the advertising conducted by Firm
A. If both Firm A and Firm B chose to advertise during a given period
the advertising cancels out, receipts remain constant, and expenses
increase due to the cost of advertising. Both firms would benefit from
a reduction in advertising. However, should Firm B choose not to
advertise, Firm A could benefit greatly by advertising. Nevertheless,
the optimal amount of advertising by one firm depends on how much
advertising the other undertakes. As the best strategy is dependent on
what the other firm chooses there is no dominant strategy and this is
not a prisoner's dilemma but rather is an example of a stag hunt.
The outcome is similar, though, in that both firms would be better off
were they to advertise less than in the equilibrium. Sometimes
cooperative behaviors do emerge in business situations. For instance,
cigarette manufacturers endorsed the creation of laws banning cigarette
advertising, understanding that this would reduce costs and increase
profits across the industry.[7] This analysis is likely to be pertinent in many other business situations involving advertising.
Large software projects under the GPL (such as Linux) can force cooperation in an otherwise standard PD situation. Given a piece of Free Software,
you can study the (modifiable) source code and make improvements. Then
you can keep secret the improved version, i.e. keep the modified source
code to yourself and distribute it in an unmodifiable binary form
(defect). Alternatively, you could share the improved version in a
modifiable source code form (cooperate). If everyone defects, then many
are probably making exactly the same improvements. For any software
that is under the GPL, it is illegal to distribute only the
unmodifiable form, including any changes made, thus forcing
cooperation. Hence, rival parties can all work on it and know that none
will defect, and all share in the improvements made by the others.
Many real-life dilemmas involve multiple players. Although metaphorical, Hardin's tragedy of the commons
may be viewed as an example of a multi-player generalization of the PD:
Each villager makes a choice for personal gain or restraint. The
collective reward for unanimous (or even frequent) defection is very
low payoffs (representing the destruction of the "commons"). Such
multi-player PDs are not formal as they can always be decomposed into a
set of classical two-player games. The commons are not always
exploited: William Poundstone,
in a book about the Prisoner's Dilemma (see References below),
describes a situation in New Zealand where newspaper boxes are left
unlocked. It is possible for someone to take a paper without paying (defecting) but very few do, perhaps feeling that if they do not pay then nor will others, destroying the system. (Because there is no mechanism for personal choice to influence others' decisions this widespread line of reasoning is called "magical thinking".)[9] Newspapers are less risky to distribute under the honour system than other consumables because taking more than one offers very little extra benefit. Another real-life example is gridlock.
The theoretical conclusion of PD is one reason why, in many countries, plea bargaining
is forbidden. Often, precisely the PD scenario applies: it is in the
interest of both suspects to confess and testify against the other
prisoner/suspect, even if each is innocent of the alleged crime.
Arguably, the worst case is when only one party is guilty — here, the
innocent one is unlikely to confess, while the guilty one is likely to
confess and testify against the innocent.
Related games
Closed-bag exchange
Hofstadter[10]
once suggested that people often find problems such as the PD problem
easier to understand when it is illustrated in the form of a simple
game, or trade-off. One of several examples he used was "closed bag
exchange":
- Two people meet and exchange closed bags, with the understanding
that one of them contains money, and the other contains a purchase.
Either player can choose to honour the deal by putting into his bag
what he agreed, or he can defect by handing over an empty bag.
In this game, defection is always the best course, implying that
rational agents will never play, and that "closed bag exchange" will be
a missing market due to adverse selection.
However, in this case both players cooperating and both players
defecting actually give the same result, so chances of mutual
cooperation, even in repeated games, are few.
Friend or Foe?
Friend or Foe? is a game show that aired from 2002 to 2005 on the Game Show Network in the United States.
It is an example of the prisoner's dilemma game tested by real people,
but in an artificial setting. On the game show, three pairs of people
compete. As each pair is eliminated, they play a game of Prisoner's
Dilemma to determine how their winnings are split. If they both
cooperate (Friend), they share the winnings 50-50. If one cooperates
and the other defects (Foe), the defector gets all the winnings and the
cooperator gets nothing. If both defect, both leave with nothing.
Notice that the payoff matrix is slightly different from the standard
one given above, as the payouts for the "both defect" and the
"cooperate while the opponent defects" cases are identical. This makes
the "both defect" case a weak equilibrium, compared with being a strict
equilibrium in the standard prisoner's dilemma. If you know your
opponent is going to vote Foe, then your choice does not affect your
winnings. In a certain sense, Friend or Foe has a payoff model between "Prisoner's Dilemma" and "Chicken".
The payoff matrix is
|
Cooperate |
Defect |
| Cooperate |
1, 1 |
0, 2 |
| Defect |
2, 0 |
0, 0 |
This payoff matrix was later used on the British television programmes Shafted and Golden Balls.
See also
Notes
- ^ Tversky, Amos (2004). Preference, Belief, and Similarity: Selected Writings. MIT Press. ISBN 026270093X.
- ^ A simple "tell"
that partially or wholly reveals one player's choice — such as the Red
player playing their Cooperate card face-up — does not change the fact
that Defect is the dominant strategy. When one is considering the game
itself, communication has no effect whatsoever. However, when the game
is being played in real life considerations outside of the game itself
may cause communication to matter. It is a point of utmost importance
to the full implications of the dilemma that when we do not need to
take into account external considerations, single-instance Prisoner's
Dilemma is not affected in any way by communications.
Even in single-instance Prisoner's Dilemma, meaningful prior
communication about issues external to the game could alter the play
environment, by raising the possibility of enforceable side contracts
or credible threats. For example, if the Red player plays their
Cooperate card face-up and simultaneously reveals a binding commitment
to blow the jail up if and only if Blue Defects (with additional payoff
-11,-10),
then Blue's Cooperation becomes dominant. As a result, players are
screened from each other and prevented from communicating outside of
the game.
- ^ Dawkins, Richard (1989). The Selfish Gene. Oxford University Press. ISBN 0-19-286092-5. Page: 204 of Paperback edition
- ^ For example see the 2003 study “Bayesian Nash equilibrium; a statistical test of the hypothesis” for discussion of the concept and whether it can apply in real economic or strategic situations (from Tel Aviv University).
- ^ The 2004 Prisoner's Dilemma Tournament Results show University of Southampton's
strategies in the first three places, despite having fewer wins and
many more losses than the GRIM strategy. (Note that in a PD tournament,
the aim of the game is not to “win” matches - that can easily be
achieved by frequent defection). It should also be pointed out that
even without implicit collusion between software strategies
(exploited by the Southampton team) tit-for-tat is not always the
absolute winner of any given tournament; it would be more precise to
say that its long run results over a series of tournaments outperform
its rivals. (In any one event a given strategy can be slightly better
adjusted to the competition than tit-for-tat, but tit-for-tat is more
robust). The same applies for the tit-for-tat-with-forgiveness variant,
and other optimal strategies: on any given day they might not 'win'
against a specific mix of counter-strategies.
An alternative way of putting it is using the Darinian ESS
simulation. In such a simulation Tit-for-Tat will almost always come to
dominate, though nasty strategies will drift in and out of the
population because a Tit-for-Tat population is penetratable by
non-retaliating nice strategies which in turn are easy prey for the
nasty strategies. Richard Dawkins showed that here no static mix of
strategies form a stable equilibrium and the system will always
oscillate between bounds.
- ^ Shy, O., 1996, Industrial Organization: Theory and Applications, Cambridge, Mass.: The MIT Press.
- ^ a b This argument for the development of cooperation through trust is given in The Wisdom of Crowds , where it is argued that long-distance capitalism was able to form around a nucleus of Quakers,
who always dealt honourably with their business partners. (Rather than
defecting and reneging on promises – a phenomenon that had discouraged
earlier long-term unenforceable overseas contracts). It is argued that
dealings with reliable merchants allowed the meme
for cooperation to spread to other traders, who spread it further until
a high degree of cooperation became a profitable strategy in general commerce
- ^ Ward, D. (2004) The Program Manager's Dilemma The Program Manager's Dilemma (Defense AT&L, Defense Acquisition University Press).
- ^ As well as being an explanation for the lack of petty-theft, magical thinking has been used to explain such things as voluntary voting (where a non-voter is considered a free rider). Potentially, it might be used to explain Wikipedia
contributions: Text may be added under the assumption that if
contributions are not made, then similar people will also fail to
contribute (i.e. arguing from effect to cause). Alternatively, the
explanation could depend on expected future actions (and not require a
magical connection). Modelling future interactions requires the
addition of the temporal dimension, as given in the Iterated prisoner’s dilemma section.
- ^ Hofstadter, Douglas R. (1985). Metamagical Themas: questing for the essence of mind and pattern. Bantam Dell Pub Group. ISBN 0-465-04566-9. - see Ch.29 The Prisoner's Dilemma Computer Tournaments and the Evolution of Cooperation.
References
- Robert Aumann,
“Acceptable points in general cooperative n-person games”, in R. D.
Luce and A. W. Tucker (eds.), Contributions to the Theory 23 of Games
IV, Annals of Mathematics Study 40, 287–324, Princeton University
Press, Princeton NJ.
- Axelrod, R. (1984). The Evolution of Cooperation. ISBN 0-465-02121-2
- Kenneth Binmore, Fun and Games.
- David M. Chess (1988). Simulating the evolution of behavior: the
iterated prisoners' dilemma problem. Complex Systems, 2:663–670.
- Dresher, M. (1961). The Mathematics of Games of Strategy: Theory and Applications Prentice-Hall, Englewood Cliffs, NJ.
- Flood, M.M. (1952). Some experimental games. Research memorandum RM-789. RAND Corporation, Santa Monica, CA.
- Kaminski, Marek M. (2004) Games Prisoners Play Princeton University Press. ISBN 0-691-11721-7 http://webfiles.uci.edu/mkaminsk/www/book.html
- Poundstone, W. (1992) Prisoner's Dilemma Doubleday, NY NY.
- Greif, A. (2006). Institutions and the Path to the Modern Economy: Lessons from Medieval Trade. Cambridge University Press, Cambridge, UK.
- Rapoport, Anatol and Albert M. Chammah (1965). Prisoner's Dilemma. University of Michigan Press.
- Le, S. & Boyd, R. (In press). Evolutionary dynamics of the continuous iterated Prisoner's Dilemma, Journal of Theoretical Biology Full text
- A. Rogers, R. K. Dash, S. D. Ramchurn, P. Vytelingum and N. R.
Jennings (2007) “Coordinating team players within a noisy iterated
Prisoner’s Dilemma tournament” Theoretical Computer Science 377 (1-3)
243-259. [2]
Further reading
- Plous, S. (1993). Prisoner's Dilemma or Perceptual Dilemma? Journal of Peace Research, Vol. 30, No. 2, 163-179.
External links
Tit for Tat Winning Strategy
Tit for tat is a highly effective strategy in game theory for the iterated prisoner's dilemma. It was first introduced by Anatol Rapoport in Robert Axelrod's two tournaments, held around 1980. Based on the English saying meaning "equivalent retaliation" ("tit for tat"), an agent
using this strategy will initially cooperate, then respond in kind to
an opponent's previous action. If the opponent previously was
cooperative, the agent is cooperative. If not, the agent is not. This
is similar to reciprocal altruism in biology.
Overview
This strategy is dependent on four conditions that has allowed it to
become the most prevalent strategy for the prisoner's dilemma:
- Unless provoked, the agent will always cooperate
- If provoked, the agent will retaliate
- The agent is quick to forgive
- The agent must have a good chance of competing against the opponent more than once.
In the last condition, the definition of "good chance" depends on the payoff matrix
of the prisoner's dilemma. The important thing is that the competition
continues long enough for repeated punishment and forgiveness to
generate a long-term payoff higher than the possible loss from
cooperating initially.
A fifth condition applies to make the competition meaningful: if an
agent knows that the next play will be the last, it should naturally
defect for a higher score. Similarly if it knows that the next two
plays will be the last, it should defect twice, and so on. Therefore
the number of competitions must not be known in advance to the agents.
Against a variety of alternative strategies, tit for tat was the
most effective, winning in several annual automated tournaments against
(generally far more complex) strategies created by teams of computer
scientists, economists, and psychologists. Game theorists informally
believed the strategy to be optimal (although no proof was presented).
It is important to know that tit for tat still is the most effective
strategy if the average performance of each competing team is compared.
The team which recently won over a pure tit for tat team only
outperformed it with some of their algorithms because they submitted
multiple algorithms which would recognize each other and assume a
master and slave relationship (one algorithm would "sacrifice" itself
and obtain a very poor result in order for the other algorithm to be
able to outperform Tit for Tat on an individual basis, but not as a
pair or group). Still, this "group" victory illustrates an important
limitation of the Prisoner's Dilemma in representing social reality,
namely, that it does not include any natural equivalent for friendship
or alliances. The advantage of "tit for tat" thus pertains only to a Hobbesian world of rational solutions, not to a world in which humans are inherently social.
Example of play
|
Cooperate |
Defect |
| Cooperate |
3, 3 |
0, 5 |
| Defect |
5, 0 |
1, 1 |
| Prisoner's dilemma example |
Assume there are 4 agents: 2 are Tit for Tat players ("variables")
and 2 are "Defectors", simply trying to maximize their own winnings by
always giving evidence against the other. Assume that each player faces
the other 3 in a match lasting 6 games. If one player gives evidence
against a player who does not, the former gains 5 points and the latter
nets 0. If both refrain from giving evidence, both gain 3 points. If
both give evidence against each other, both gain 1 point.
When a variable faces off against a defector, the former refrains
from giving evidence in the first game while the defector does the
opposite, gaining the control 5 points. In the remaining 5 games, both
players give evidence against each other, netting 1 point each game.
The final score is: Defector - 10 | Variable - 5.
When the variables face off against each other, each refrains from
giving evidence in all 6 games. 6 * 3 = 18 points, the final score
being Variable(1) - 18 | Variable(2) - 18.
When the defectors face off, each gives evidence against the other
in all 6 games. 6 * 1 = 6 points, the final score being Defector(1) - 6
| Defector(2) - 6.
The final score for each variable is 5 (game against defector(1)) +
5 (game against defector(2)) + 18 (game against variable) = 28 points.
The final score for each defector is 10 (against variable(1)) + 10
(against variable(2)) + 6 (against defector) = 26 points.
Despite the fact that the variables never won a match and the
defectors never lost a match, the variables still came out ahead,
because the final score is not determined by the winner of matches, but
the scorer of points. Simply put, the variables gained more points
tying with each other than they lost to the defectors. The more
variables that there are in the game, the more advantage it is to be a
variable.
(This example was taken from Piers Anthony's novel, Golem in the Gears.)
Implications
The success of the strategy, which is largely cooperative, took many
by surprise. In successive competitions various teams produced complex
strategies which attempted to "cheat" in a variety of cunning ways, but
Tit for Tat eventually prevailed in every competition.
Some theorists believe this result may give insight into how groups
of animals (and particularly human societies) have come to live in
largely (or entirely) cooperative societies, rather than the
individualistic "red in tooth and claw" way that might be expected from
individual engaged in a Hobbesian state of nature. This, and particularly its application to human society and politics, is the subject of Robert Axelrod's book The Evolution of Cooperation.
Problems
While it has been empirically shown (by Axelrod) that the strategy
is optimal in some cases, two agents playing tit for tat remain
vulnerable. A one-time, single-bit error in either player's
interpretation of events can lead to an unending "death spiral". In
this symmetric situation, each side perceives itself as preferring to
cooperate, if only the other side would. But each is forced by the
strategy into repeatedly punishing an opponent who continues to attack
despite being punished in every game cycle. Both sides come to think of
themselves as innocent and acting in self-defense, and their opponent
as either evil or too stupid to learn to cooperate.
This situation frequently arises in real world conflicts, ranging
from schoolboy fights to civil and regional wars. Tit for two tats
could be used to avoid this problem.
"Tit for Tat with forgiveness" is sometimes superior. When the
opponent defects, on the next move, the player sometimes cooperates
anyway, with a small probability (around 1%-5%). This allows for
occasional recovery from getting trapped in a cycle of defections. The
exact probability depends on the line-up of opponents.
Tit for two tats
Tit for Two Tats is similar to Tit for Tat in that it is
nice, retaliating, forgiving and non-envious, the only difference
between the two being how nice the strategy is.
In a tit for tat strategy once an opponent defects, the tit for tat
player immediately responds by defecting on the next move. This has the
unfortunate consequence of causing two retaliatory strategies to
continuously defect against one another resulting in a poor outcome for
both players. A tit for two tats player will let the first defection go
unchallenged as a means to avoid the "death spiral" of the previous
example. If the opponent defects twice in a row, the tit for two tats
player will respond by defecting.
This strategy was put forward by Robert Axelrod during his second round of computer simulations at RAND.
After analyzing the results of the first experiment he determined that
had a participant entered the tit for two tats strategy it would have
emerged with a higher cumulative score than any other program. As a
result he himself entered it with high expectations in the second
tournament. Unfortunately due to the more aggressive nature of the
programs entered in the second round, tit for two tats did
significantly worse than tit for tat due to aggressive strategies being
able to take advantage of its highly forgiving nature.
Popular culture
The tit for tat strategy was employed in an episode of Numb3rs, where FBI
agents were interrogating and attempting to obtain information from an
inmate on death row. The strategy was working, but the FBI would not
implement a "tit for two tats".
Live and Let Live
The tit for tat strategy has been detected by analysts in the spontaneous non-violent behaviour, called "live and let live" that arose during the First World War. The Christmas truce of 1914 appears to be an example.
See also
External links
References
- The Evolution of Cooperation, Robert Axelrod, Basic Books, ISBN 0-465-02121-2
- The Selfish Gene, Richard Dawkins (1990), second edition -- includes two chapters about the evolution of cooperation, ISBN 0-19-286092-5
- The Origins of Virtue, Matt Ridley, Penguin Books Ltd, ISBN 0-14-024404-2
- How Are We to Live?, Peter Singer, Prometheus Books, ISBN 0-87975-966-6
This article is licensed under the GNU Free Documentation License. It uses material from Wikipedia Encyclopedia article "Prisoner's Dilemma"
|