{:policies [{:key "pavlov", :name "Pavlov (Win-Stay, Lose-Shift)", :description "Repeat your last action if the payoff was satisfactory, otherwise switch. Also known as Win-Stay, Lose-Shift (Nowak & Sigmund, 1993). More adaptive than Tit for Tat: it can correct mistakes and exploit unconditional cooperators.", :category "reciprocal", :config_keys ["threshold"]} {:key "grim-trigger", :name "Grim Trigger", :description "Cooperate until the opponent defects once, then punish forever. The harshest reciprocal strategy: a single deviation triggers permanent retaliation. Powerful as a deterrent threat in the folk theorem, but unforgiving of noise or mistakes.", :category "reciprocal", :config_keys ["cooperative-action" "cooperative-response" "punishment-action"]} {:key "generous-tit-for-tat", :name "Generous Tit for Tat", :description "Like Tit for Tat, but occasionally forgives defection. With probability `forgiveness` (default 0.1), cooperates even when the opponent defected last round. Breaks cycles of mutual retaliation that can trap standard TfT.", :category "reciprocal", :config_keys ["default-action" "forgiveness"]} {:key "nash-equilibrium", :name "Nash Equilibrium", :description "Play according to a computed Nash equilibrium mixed strategy. In two-player zero-sum games this guarantees at least the minimax value in expectation, regardless of what the opponent does; in general-sum games no such guarantee holds. The equilibrium strategy is provided via the config (fetched from solver results by the orchestration layer).", :category "equilibrium", :config_keys ["equilibrium"]} {:key "random", :name "Random", :description "Play each action with equal probability. The simplest baseline: impossible to predict, but it never adapts to the opponent and is unexploitable only in games where uniform play is an equilibrium (e.g., Rock-Paper-Scissors).", :category "simple", :config_keys []} {:key "tit-for-tat", :name "Tit for Tat", :description "Start cooperatively, then copy the opponent's previous action each round. 
A simple and historically successful reciprocal strategy (it won Axelrod's iterated Prisoner's Dilemma tournaments): it is nice (cooperates first), retaliatory (punishes defection), forgiving (returns to cooperation once the opponent does), and clear (the opponent can predict your behavior).", :category "reciprocal", :config_keys ["default-action"]} {:key "best-response", :name "Best Response", :description "Play the best pure response to the opponent's observed action frequencies. Adapts over time as more data accumulates. Exploits predictable opponents but can be exploited in turn by adaptive opponents.", :category "exploitative", :config_keys []} {:key "fictitious-play", :name "Fictitious Play", :description "Best response to the opponent's action frequencies, weighted by recency. Uses exponential decay (default λ=0.9) so recent actions matter more. Adapts faster than flat best-response to opponents who change strategy.", :category "exploitative", :config_keys ["decay"]} {:key "always", :name "Always", :description "Unconditionally play a fixed action every round. Useful as a baseline or when you have domain-specific knowledge that one action dominates.", :category "simple", :config_keys ["action"]}]}