BedRock

May 18, 2021

Scientific Exploration of Strategy Selection - Reflections on Investment Thinking

Updated: Apr 8

Mathematical Considerations on Social Cooperation

I had the opportunity to read a rather obscure yet groundbreaking book published in 1984, Robert Axelrod's "The Evolution of Cooperation." It explores many controversial topics, such as the inherent nature of humanity: is it inherently good or evil? In human society and international politics, what strategies should be adopted to navigate our interactions with others? Should we follow parental advice, national guidance, or religious doctrines to determine how we should engage with the world? These questions are incredibly complex, profound, and seemingly impossible to answer definitively. However, this book manages to address them in a more scientific, engineering-oriented, and logical manner, providing profound insights.

One of the key questions tackled in the book is how to foster cooperative relationships in an environment without a central governing authority. Typically, helping others involves some level of sacrifice, and in the absence of a central authority, there is no guarantee that others will reciprocate the favor.

Betrayal is often chosen in the Prisoner's Dilemma.

In a standard Prisoner's Dilemma, the best joint outcome is clearly mutual cooperation. However, because neither party can guarantee the other's choice, each makes the individually best choice for maximizing self-interest, which is betrayal, and both end up worse off than if they had cooperated. In fact, as long as the game is played for a known, finite number of rounds, the same logic applies backward from the last round, and both parties will consistently choose betrayal.
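To make the dilemma concrete, here is a minimal sketch of a single round. The payoff numbers (5, 3, 1, 0) are an assumption: they are the values commonly associated with Axelrod's tournament and are not stated in this article. The names `PAYOFF` and `best_reply` are illustrative only.

```python
# Assumed payoffs (not given in the article):
# T = temptation to betray, R = reward for mutual cooperation,
# P = punishment for mutual betrayal, S = sucker's payoff.
T, R, P, S = 5, 3, 1, 0

# PAYOFF[(my_move, their_move)] -> my score; "C" = cooperate, "D" = betray
PAYOFF = {
    ("C", "C"): R,
    ("C", "D"): S,
    ("D", "C"): T,
    ("D", "D"): P,
}

def best_reply(their_move):
    """Return the move that maximizes my one-round payoff against their_move."""
    return max("CD", key=lambda my_move: PAYOFF[(my_move, their_move)])

# Betrayal is the best reply whatever the other side does...
replies = {their_move: best_reply(their_move) for their_move in "CD"}

# ...yet mutual betrayal pays each player less than mutual cooperation.
mutual_betrayal = PAYOFF[("D", "D")]
mutual_cooperation = PAYOFF[("C", "C")]
```

Under these assumed numbers, `replies` maps both "C" and "D" to "D", while `mutual_betrayal` (1) is lower than `mutual_cooperation` (3), which is exactly the trap described above.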

Why does cooperation arise then?

Things change under a slightly different, and very common, condition: the participants may interact for an unknown number of rounds, or the future may be uncertain. In other words, if today's decisions affect not only the immediate outcome but also the decisions to be made later, then the prospect of those future decisions can in turn shape today's choices.

However, the future matters less than the present. On the one hand, people value a future payoff less than the same payoff today. On the other hand, there is always a chance that the participants will not meet again. Therefore, the present value of each future round diminishes the further away it is, which can be represented by a discount parameter, denoted 'w': the weight of the next round relative to the current one.
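As a rough illustration of how 'w' works, the sketch below weights round n of a payoff stream by w to the power n and adds everything up. The function name `discounted_total` and the stream of 3 points per round are hypothetical choices for the example.

```python
def discounted_total(payoffs, w):
    """Sum a stream of per-round payoffs, weighting round n by w**n."""
    return sum(p * w**n for n, p in enumerate(payoffs))

# Hypothetical stream: 3 points per round for 200 rounds
stream = [3] * 200

# With a long stream the total approaches 3 / (1 - w).
total_patient = discounted_total(stream, 0.9)    # close to 30
total_impatient = discounted_total(stream, 0.5)  # close to 6
```

The same stream of payoffs is worth about five times more to a player with w = 0.9 than to one with w = 0.5, which is why a large 'w' makes long-term cooperation worth protecting.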

Interesting Strategy Tournament

Robert Axelrod conducted a simulated strategy tournament where various participants were tasked with devising game strategies, and the effectiveness of these strategies was evaluated. Surprisingly, a very simple strategy called "Tit for Tat" emerged as the winner. The Tit for Tat strategy begins with a cooperative choice and then mirrors the opponent's previous move in subsequent rounds, essentially reciprocating their actions.
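The rule is short enough to write down directly. The following is a minimal sketch, not Axelrod's tournament code: the payoff values (5, 3, 1, 0) are assumed, and `play_match`, `tit_for_tat`, and `always_betray` are illustrative names.

```python
# Assumed payoffs: PAYOFF[(move_a, move_b)] -> (score_a, score_b)
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def tit_for_tat(their_history):
    """Cooperate first, then copy the opponent's previous move."""
    return their_history[-1] if their_history else "C"

def always_betray(their_history):
    """A purely hostile opponent, used here only for comparison."""
    return "D"

def play_match(strategy_a, strategy_b, rounds=10):
    """Play a fixed number of rounds and return both scores and A's moves."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_b)  # each side sees only the other's past moves
        move_b = strategy_b(hist_a)
        gain_a, gain_b = PAYOFF[(move_a, move_b)]
        score_a += gain_a
        score_b += gain_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b, "".join(hist_a)

friendly = play_match(tit_for_tat, tit_for_tat)
hostile = play_match(tit_for_tat, always_betray)
```

Two Tit for Tat players cooperate throughout and earn 30 points each in ten rounds. Against the always-betraying opponent, Tit for Tat is exploited only once (its moves are "CDDDDDDDDD") and loses by a small margin, 9 to 14, but both players end up far below the cooperative score.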

What is intriguing is that among the various strategies tested, those that started off with a friendly and cooperative approach in the first round significantly outperformed those that started with betrayal. Can we say that humans are inherently good-natured? It can be reasonably inferred that throughout the course of evolution, adopting strategies that increase the chances of survival has a higher likelihood of becoming the ultimate genetic trait or a default part of our character.

But is it enough to simply "play nice"?

There is another fascinating strategy that seems to be more aligned with human nature, based on the principle of "maximizing expected outcomes." It's called DOWNING. This strategy makes different choices based on actively predicting the strategies of other players. For example, if other players' strategies are perceived as weak or lack countermeasures, the DOWNING strategy will opportunistically choose betrayal to gain an advantage. On the other hand, if other players have countermeasures in place, the DOWNING strategy will choose cooperation. Initially, since the DOWNING strategy doesn't know if other strategies have countermeasures, it will inevitably choose the common-sense option of betrayal (as mentioned earlier) to test the waters. This process can lead to severe punishment from strategies with countermeasures, resulting in a relatively low score for the DOWNING strategy.

However, the existence of the DOWNING strategy also means that purely naive "play nice" strategies can be heavily exploited and suffer the consequences. This explains why excessively naive individuals who solely adopt a "play nice" approach often suffer greatly when they encounter malicious individuals.

In modern society, the presence of central governments or social oversight institutions helps to scrutinize and penalize wrongdoers and fraudsters, enabling genuinely naive and good-natured individuals to fare well. Since they are inclined to cooperate rather than betray, their contribution to the overall societal values is significant. However, in the absence of central control or weak social monitoring, such as in areas without government or in international communities lacking enforceable agreements, relying solely on "play nice" can easily be exploited by those employing the DOWNING strategy.

Revenge and Forgiveness

One of the most instructive strategies in the tournament was called FRIEDMAN, which employed an uncompromising "never forgive, always retaliate" approach. In other words, once betrayed, it held a grudge indefinitely and retaliated for the rest of the game. Surprisingly, this seemingly foolproof way of never being taken advantage of scored the lowest among the strategies that never betray first.

It's interesting because a strategy that focuses on never being exploited appears to have an advantage, yet it gives up the benefits of restored cooperation and ends up with a poor long-term outcome over the lifespan of the game.

In comparison, the Tit for Tat strategy operates on a short memory pattern. If the opponent betrayed in the previous round, it retaliates; however, if the opponent chose cooperation, it forgets the previous unpleasantness and chooses to cooperate. This combination of effective retaliation and the ability to quickly resume positive cooperation makes it a more forgiving approach than the forever retaliation strategy.

Of course, in addition to the simple Tit for Tat strategy, there are more cunning strategies or ways of conducting oneself, such as one called JOSS. It generally behaves well but has a 10% chance of betraying when the opponent chooses to cooperate, aiming to gain a small advantage occasionally. The problem with this strategy is that while JOSS can indeed take advantage in many instances, when it encounters the Tit for Tat strategy, its proactive choice to betray leads to retaliation, potentially creating a cycle of retaliation from which it cannot escape.

However, these retaliatory strategies, whether it's JOSS or Tit for Tat, tend to get trapped in a vicious cycle of retaliation once they enter it due to various reasons, including misunderstandings or errors in feedback loops.
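A small sketch shows how a single mistake can trap two retaliatory players. Both sides below play Tit for Tat; the only disturbance is one forced betrayal, standing in for a misunderstanding or a feedback error. The function name and the choice of round are hypothetical.

```python
def tit_for_tat(their_history):
    """Cooperate first, then copy the opponent's previous move."""
    return their_history[-1] if their_history else "C"

def play_with_one_error(rounds=8, error_round=2):
    """Two Tit for Tat players; player A betrays once by mistake."""
    hist_a, hist_b = [], []
    for r in range(rounds):
        move_a = tit_for_tat(hist_b)
        move_b = tit_for_tat(hist_a)
        if r == error_round:
            move_a = "D"  # the single mistaken or misread move
        hist_a.append(move_a)
        hist_b.append(move_b)
    return "".join(hist_a), "".join(hist_b)

moves_a, moves_b = play_with_one_error()
```

Player A's moves come out as "CCDCDCDC" and player B's as "CCCDCDCD": after the one error, the two keep punishing each other on alternate rounds and never return to mutual cooperation on their own.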

In fact, some more forgiving strategies have emerged to avoid falling into a cycle of retaliation. For example, "Tit for Two Tats" only retaliates when the opponent betrays twice in a row. This may seem contrary to our usual expectations, as this strategy appears to incur short-term losses (allowing the opponent to deceive twice), but more forgiving strategies actually benefit in the long run. Another strategy, slightly different from DOWNING, is more optimistic towards others: instead of opening with betrayal, it starts out cooperatively. According to Axelrod's analysis of the first round of the tournament, these more forgiving strategies would have outperformed the Tit for Tat strategy had they been entered, by avoiding the "echo" effect and escaping the cycle of retaliation.
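The difference forgiveness makes can be sketched by letting a Tit for Tat player make one mistaken betrayal and comparing how two different partners respond. This is an illustrative sketch with hypothetical names, not the tournament implementation.

```python
def tit_for_tat(their_history):
    """Cooperate first, then copy the opponent's previous move."""
    return their_history[-1] if their_history else "C"

def tit_for_two_tats(their_history):
    """Retaliate only after two betrayals in a row."""
    return "D" if their_history[-2:] == ["D", "D"] else "C"

def play_against_erring_tft(responder, rounds=8, error_round=2):
    """Tit for Tat (with one forced betrayal) against `responder`."""
    hist_tft, hist_resp = [], []
    for r in range(rounds):
        move_tft = "D" if r == error_round else tit_for_tat(hist_resp)
        move_resp = responder(hist_tft)
        hist_tft.append(move_tft)
        hist_resp.append(move_resp)
    return "".join(hist_tft), "".join(hist_resp)

with_forgiveness = play_against_erring_tft(tit_for_two_tats)
without_forgiveness = play_against_erring_tft(tit_for_tat)
```

Against Tit for Two Tats the slip is absorbed: the histories are "CCDCCCCC" and "CCCCCCCC". Against another Tit for Tat the same slip echoes back and forth for the rest of the match ("CCDCDCDC" and "CCCDCDCD").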

These strategy outcomes seem to confirm the saying, "Greed for small advantages leads to significant losses."

The Second Round of the Strategy Tournament

It is evident that the success or degree of success of a strategy, apart from its own characteristics, depends to a large extent on the environment in which it operates and the choices of other strategies.

During the long process of biological and social evolution, it is clear that different organisms, populations, and strategists continually evolve to adapt better. The worst-performing strategies are eliminated, resulting in a constantly changing overall strategic environment.

In the second round of the strategy tournament, all participants could see the results of the first round. Which strategies had proved effective, and which were ineffective and should be discarded, was openly known and understood. In theory, the second-round strategies should be more complex and more effective (or earn higher total rewards). For example, some conclusions drawn from the first round, such as being kind to others, not initiating betrayal, and showing more forgiveness, all led to higher scores in overall interactions. Does this also mean that in the process of continuous evolution, society as a whole tends to develop towards greater overall rewards?

However, surprisingly, in the second round of the strategy tournament, no strategy emerged as superior to the simple Tit for Tat strategy. This includes the more forgiving strategies that looked so promising after the first round. The main reason these strategies underperformed in the second round was that they were easily targeted by other strategies. For example, if a strategy intentionally betrays once and then immediately switches to cooperation, the Tit for Two Tats strategy can be taken advantage of. Another strategy called TESTER specifically exploits weaker strategies but apologizes and returns to cooperation when facing a stronger opponent. Similarly, another strategy called TRANQUILIZER tests the opponent's forgiveness towards occasional betrayal and tries to take advantage of it. Although these strategies themselves may not rank high, they make the "good" strategies suffer.
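The weakness of Tit for Two Tats can be shown with a deliberately simple exploiter that betrays on every other round, so its victim never sees two betrayals in a row. The exploiter here is a hypothetical stand-in, much cruder than TESTER or TRANQUILIZER, and the payoffs (5, 3, 1, 0) are assumed.

```python
# Assumed payoffs: PAYOFF[(exploiter_move, victim_move)] -> (exploiter, victim)
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def tit_for_tat(their_history):
    """Cooperate first, then copy the opponent's previous move."""
    return their_history[-1] if their_history else "C"

def tit_for_two_tats(their_history):
    """Retaliate only after two betrayals in a row."""
    return "D" if their_history[-2:] == ["D", "D"] else "C"

def alternator_score(victim, rounds=10):
    """Score a hypothetical exploiter that betrays on every other round."""
    exploiter_moves = []
    exploiter_score = victim_score = 0
    for r in range(rounds):
        move_e = "D" if r % 2 == 0 else "C"
        move_v = victim(exploiter_moves)
        gain_e, gain_v = PAYOFF[(move_e, move_v)]
        exploiter_score += gain_e
        victim_score += gain_v
        exploiter_moves.append(move_e)
    return exploiter_score, victim_score

versus_forgiving = alternator_score(tit_for_two_tats)
versus_tit_for_tat = alternator_score(tit_for_tat)
```

Over ten rounds the exploiter beats Tit for Two Tats 40 to 15, because the forgiving player never retaliates. Against Tit for Tat the same trick yields only 25 to 25, less than the 30 each that plain cooperation would have earned.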

If there is no way to identify and eliminate these cunning, advantage-seeking individuals, the "nice" individuals in a society will be taken advantage of. However, these advantage-seeking strategies themselves are not particularly successful because when they encounter less friendly strategies, they lose more points than they gain, and they score less than the conservative cooperative strategies.

In the end, despite the strategists wracking their brains and having the lessons learned from the first round of success, it was still the Tit for Tat strategy that emerged as the winner. Its concept is so simple: be kind to others first, do not initiate betrayal, but if betrayed, respond firmly, while still maintaining forgiveness and letting go of past grievances as soon as the other party shows remorse.

The Evolution of Strategies and "Survival of the Fittest"

As the tournament progresses round by round, strategies that are clearly ineffective will be eliminated by the participants, while successful strategies will be chosen more and more frequently, ultimately converging on a few highly successful strategies.

This dynamic process of strategy selection is very similar to natural selection in biology. An organism does not need much sophistication to take part: a simple ability to remember certain features of the individuals it has encountered, such as whether they were beneficial or harmful, is enough. In fact, scientists have demonstrated that even single-celled organisms like bacteria possess feedback mechanisms and memory systems to respond to their environment.

Therefore, the process of learning, imitation, and selection gradually focuses on successful strategies as participants engage in multiple rounds of the game over time. Unsuccessful strategies are eliminated. This process is similar in tournaments, biological development, and social evolution. It is the process of "survival of the fittest."

The ultimate demise of advantage-seeking strategies

In the evolutionary process of multi-round strategy tournaments, the development of some particularly interesting strategies is observed. For example, Strategy 8 in the figure, called HARRINGTON, is based on advantage-seeking thinking. This strategy is initially very successful because there are many strategies that are not very successful or overly friendly, allowing HARRINGTON to take advantage. However, as multiple interactions or generations occur, these easy victims gradually disappear, and HARRINGTON's success rate declines significantly once it mostly faces more successful strategies and copies of itself, ultimately leading to its demise.

This is similar to being in a community of acquaintances, where initially taking advantage of others can yield benefits. However, as the awareness of the community increases (equivalent to the disappearance of friendly strategies), the effectiveness of such advantage-seeking strategies diminishes over a longer time horizon.
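The rise and fall described in this section can be sketched with a toy population model. Everything here is an assumption made for illustration: ALWAYS BETRAY is a crude stand-in for an advantage-seeking strategy (it is not HARRINGTON), the match scores assume 200 rounds with payoffs of 5, 3, 1, and 0, and each generation a strategy's share grows in proportion to its average score.

```python
NAMES = ["TIT FOR TAT", "ALWAYS COOPERATE", "ALWAYS BETRAY"]

# SCORE[i][j]: points strategy i earns against strategy j in one
# 200-round match under the assumed payoffs (5, 3, 1, 0).
SCORE = [
    [600, 600, 199],   # TIT FOR TAT is exploited only in the first round
    [600, 600, 0],     # ALWAYS COOPERATE is exploited in every round
    [204, 1000, 200],  # ALWAYS BETRAY thrives only on easy victims
]

def next_generation(shares):
    """Grow each strategy's share in proportion to its average score."""
    fitness = [
        sum(share * SCORE[i][j] for j, share in enumerate(shares))
        for i in range(len(shares))
    ]
    mean_fitness = sum(s * f for s, f in zip(shares, fitness))
    return [s * f / mean_fitness for s, f in zip(shares, fitness)]

# Hypothetical starting mix: mostly over-friendly players
shares = [0.2, 0.6, 0.2]
history = [shares]
for _ in range(100):
    shares = next_generation(shares)
    history.append(shares)

peak_exploiter_share = max(generation[2] for generation in history)
final_shares = history[-1]
```

In this toy model the exploiter first grows to well over half the population while there are cooperators to feed on, then collapses once they become scarce, leaving Tit for Tat dominant. The exact numbers depend entirely on the assumed starting mix and scores.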

Why is the Tit for Tat strategy robust?

After many rounds of testing, it is observed that the Tit for Tat strategy performs consistently well in different environments. One important reason for this is how other strategies treat users of the Tit for Tat strategy. Tit for Tat has some distinguishing characteristics:

It is a common strategy.
It is easily recognizable.
Once recognized, the Tit for Tat strategy cannot be betrayed without cost (it will inevitably retaliate), and this trait is respected.

Therefore, the Tit for Tat strategy benefits from its transparency. On the other hand, the Tit for Tat strategy also gives up the opportunity to take advantage of other strategies. Although occasional gains may be achieved through taking advantage, in a broader range of strategic opponents and over a longer time horizon, it is not worth it.

In summary, the success of the Tit for Tat strategy comes from its friendliness (not initiating betrayal), the inevitability of retaliation when betrayed, forgiveness (not holding grudges when opponents regret and return to cooperation), and transparency.

The variation and evolution of strategy populations

We can imagine a scenario where a strategy population, which is most successful in a certain environment (likely converging to a single strategy), encounters a mutated strategy. The question is whether this mutated strategy can achieve higher scores and lead to changes in the overall population. Or in other words, whether the strategy population's resilience allows it to resist the invasion of external strategies or the impact of its own mutated strategies. This logic is similar to the process of biological evolution.

For example, let's consider an all-friendly strategy (always choosing cooperation). When invaded by advantage-seeking or betraying strategies, it lacks resilience and is easily taken advantage of. This is similar to placing a wolf among a group of sheep.

Only strategy populations with resilience can persist over a longer time horizon.

The resilience of the Tit for Tat strategy can be easily proven mathematically because it only responds to the opponent's previous move, and as long as the weight assigned to the future (denoted as "w") is sufficiently large, an opponent has to weigh how its current move will be answered in the next round.

When w is sufficiently small (e.g., less than 1/2 under the payoffs Axelrod used), the future matters so little that neither player needs to consider it, and every choice will be self-serving betrayal. For example, when there is no expectation of future encounters, such as when the parties are unlikely to meet again or during an apocalypse, there is no need to plan for the future. In such cases, living in the present moment is what matters most.
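The role of w can be checked with a short calculation. The sketch below compares the discounted totals that three simple rivals earn against Tit for Tat, using closed-form sums of the repeating payoff patterns. The payoffs (T=5, R=3, P=1, S=0) are assumed, and the function names are illustrative.

```python
def value_vs_tit_for_tat(w, T=5, R=3, P=1, S=0):
    """Discounted totals earned against Tit for Tat by three simple rivals."""
    return {
        # cooperate forever: R every round
        "tit_for_tat": R / (1 - w),
        # exploit once, then mutual betrayal forever: T, P, P, ...
        "always_betray": T + w * P / (1 - w),
        # betray and cooperate in turn: T, S, T, S, ...
        "alternate": (T + w * S) / (1 - w * w),
    }

def stability_threshold(T=5, R=3, P=1, S=0):
    """Smallest w at which neither simple rival beats cooperating."""
    return max((T - R) / (T - P), (T - R) / (R - S))

patient = value_vs_tit_for_tat(0.9)
impatient = value_vs_tit_for_tat(0.4)
threshold = stability_threshold()
```

With w = 0.9, cooperating is worth 30 against 14 for always betraying. With w = 0.4, always betraying (about 5.67) beats cooperating (5.0). Under these assumed payoffs the threshold comes out at 2/3: below 1/2 outright betrayal pays, and between 1/2 and 2/3 the alternating pattern still does slightly better than steady cooperation.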

However, when a participant is perceived as unlikely to be around in the long term, or as having lost the ability to retaliate, the Tit for Tat strategy may become unstable. What matters is not only the intention behind the strategy but also the ability to carry it out and how others perceive that ability. In history, Pompey's allies abandoned their alliance because they saw Pompey's dim prospects, turning their cooperative relationship into a hostile one. Similarly, when a manufacturing company starts to go bankrupt, its customers, suppliers, and banks will shift from cooperation to hostility (as there is no longer any need to honor contracts), with customers refusing to pay, suppliers refusing to deliver, and banks refusing to provide loans.

The foundation of cooperation

Based on the previous discussion, we can understand that sustainable long-term cooperation requires two key foundations:

It must be based on a sufficiently large weight (w) that gives significant influence to future considerations. The mutual relationship must have a long-term foundation.
Cooperation must be based on reciprocity. Whether it is cooperation or retaliation, each side must respond in kind and on equal footing.

Cooperation does not require the participants to be rational, to understand why and how they cooperate, or to exchange messages. Their actions themselves serve as a more effective language. Trust is not necessary as a foundation for cooperation; as long as responses are reciprocal, betrayal does not pay. Cooperation does not require a central authority; it can arise spontaneously.

The conditions for cooperation are simply that (1) participants can identify others and their past behaviors, which is achievable even for single-celled organisms like bacteria, and (2) the future relationship between them must be long enough so that betrayal in the present is not beneficial.

The disadvantages of the Tit for Tat strategy

After a long process of strategy evolution and continuous challenges, the Tit for Tat strategy, or equivalently, a strategy of reciprocity, has demonstrated its superiority and resilience. However, it is not without weaknesses:

Because it is purely reactive, when it meets another strategy with the same reactive mechanism, a single betrayal can set off an endless cycle of retaliation. In such cases, the intervention of a central authority can break the cycle, and a sufficiently high proportion of "good" individuals in the population can mitigate the phenomenon to some extent.
Against strategies that behave completely at random or have no feedback mechanism, the Tit for Tat strategy may be overly generous, cooperating when it shouldn't. Such strategies tend to be unsuccessful and marginalized over the long process of evolution, so meeting them is unlikely, but it can still happen.

Overall, however, across a wide range of opponents and over a long time horizon, the performance of the Tit for Tat strategy is already quite good. After all, in our social interactions, we cannot expect to take advantage of others in every situation. Trying to do so is often a case of gaining small advantages but suffering greater losses.

In conclusion, let's summarize the foundation of the Tit for Tat strategy's success to help us make better decisions in the future: friendliness (not betraying first), immediate retaliation when faced with betrayal, forgiveness (not holding grudges when the opponent shows remorse and returns to cooperation), and transparency.

BR Partners
