Adversarial Search
Acknowledgement:
Tom Lenaerts, SWITCH, Vlaams Interuniversitair Instituut voor Biotechnologie
Outline
What are games, and why study them?
Search – no adversary
– Solution is a (heuristic) method for finding a goal
– Heuristics and CSP techniques can find the optimal solution
– Evaluation function: estimate of the cost from start to goal through a given node
– Examples: path planning, scheduling activities
Games – adversary
– Solution is a strategy (a strategy specifies a move for every possible opponent reply)
– Time limits force an approximate solution
– Evaluation function: evaluates the “goodness” of a game position
– Examples: chess, checkers, Othello, backgammon
Types of Games
Game setup
Partial Game Tree for Tic-Tac-Toe
Optimal strategies
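\[
\text{MINIMAX-VALUE}(n) =
\begin{cases}
\text{UTILITY}(n) & \text{if } n \text{ is a terminal node} \\
\max_{s \in \text{successors}(n)} \text{MINIMAX-VALUE}(s) & \text{if } n \text{ is a MAX node} \\
\min_{s \in \text{successors}(n)} \text{MINIMAX-VALUE}(s) & \text{if } n \text{ is a MIN node}
\end{cases}
\]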
Two-Ply Game Tree
What if MIN does not play optimally?
Minimax Algorithm
function MINIMAX-DECISION(state) returns an action
  inputs: state, current state in game
  v ← MAX-VALUE(state)
  return the action in SUCCESSORS(state) with value v

function MAX-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← −∞
  for a, s in SUCCESSORS(state) do
    v ← MAX(v, MIN-VALUE(s))
  return v

function MIN-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for a, s in SUCCESSORS(state) do
    v ← MIN(v, MAX-VALUE(s))
  return v
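As a rough illustration, the pseudocode can be sketched in Python. The nested-list game tree, its leaf values, and the helper names (`TREE`, `is_terminal`) are assumptions made for this example, not part of the slides: inner lists stand for MIN nodes and numbers for terminal utilities from MAX's point of view.

```python
import math

# Hypothetical two-ply game tree (illustrative values only):
# the root is a MAX node, each inner list is a MIN node,
# and each number is a terminal utility for MAX.
TREE = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]


def is_terminal(state):
    """A state is terminal when it is a bare utility value."""
    return not isinstance(state, list)


def max_value(state):
    if is_terminal(state):
        return state
    v = -math.inf  # MAX starts from the worst possible value
    for successor in state:
        v = max(v, min_value(successor))
    return v


def min_value(state):
    if is_terminal(state):
        return state
    v = math.inf  # MIN starts from the worst possible value for MIN
    for successor in state:
        v = min(v, max_value(successor))
    return v


def minimax_decision(state):
    """Return the index of the successor with the highest minimax value."""
    return max(range(len(state)), key=lambda i: min_value(state[i]))


print(minimax_decision(TREE), max_value(TREE))  # 0 3
```

The three MIN nodes evaluate to 3, 2 and 2, so MAX picks the first move and the root value is 3.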
Properties of Minimax
Criterion   Minimax
Time        O(b^m)
Space       O(bm) ☺
(b: branching factor, m: maximum depth of the tree)
Multiplayer games
Problem of minimax search
Revisit example …
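Alpha-Beta Example
[Figure sequence: alpha-beta trace on the two-ply game tree, one step per slide. Each interval is the [lower, upper] bound on a node's value; only the surviving annotations are listed.]
– Step 1: root [-∞,+∞]; first MIN node [-∞,+∞]
– Step 2: root [-∞,+∞]; first MIN node [-∞,3] (leaf 3 seen)
– Step 3: root [-∞,+∞]; first MIN node [-∞,3]
– Step 4: root [3,+∞]; first MIN node [3,3]
– Step 5: root [3,+∞]; first MIN node [3,3]; second MIN node [-∞,2]. This node is worse for MAX.
– Step 6: root [3,14]
– Step 7: root [3,5]
– Step 8: root [3,3]
– Step 9: root [3,3]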
Alpha-Beta Algorithm
General alpha-beta pruning
Consider a node n somewhere in the tree.
If the player has a better choice m at
– the parent node of n, or
– any choice point further up,
then n will never be reached in actual play.
Hence, once enough is known about n, it can be pruned.
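The pruning rule can be sketched in Python as follows. This is a minimal illustration rather than the algorithm exactly as shown on the slides: the nested-list tree, its leaf values, and the `visited` list (which records the leaves that are actually evaluated) are assumptions made for the example.

```python
import math

# Hypothetical two-ply tree (illustrative values only): inner lists are
# MIN nodes, numbers are terminal utilities for MAX.
TREE = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
visited = []  # leaves evaluated, in order of evaluation


def alphabeta(node, alpha, beta, maximizing):
    """Return the minimax value of node, skipping branches that cannot matter.

    alpha: best value MAX can already guarantee on the path to the root.
    beta:  best value MIN can already guarantee on the path to the root.
    """
    if not isinstance(node, list):
        visited.append(node)
        return node
    if maximizing:
        v = -math.inf
        for child in node:
            v = max(v, alphabeta(child, alpha, beta, False))
            if v >= beta:  # MIN has a better choice further up: prune
                return v
            alpha = max(alpha, v)
        return v
    v = math.inf
    for child in node:
        v = min(v, alphabeta(child, alpha, beta, True))
        if v <= alpha:  # MAX has a better choice further up: prune
            return v
        beta = min(beta, v)
    return v


value = alphabeta(TREE, -math.inf, math.inf, True)
print(value, visited)  # 3 [3, 12, 8, 2, 14, 5, 2]
```

In this sketch the second MIN node is abandoned after its first leaf (2), because MAX can already secure 3 elsewhere, so only seven of the nine leaves are evaluated.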
Games of imperfect information
Heuristic EVAL
Idea: produce an estimate of the expected utility of
the game from a given position.
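One common way to build such an estimate is a weighted sum of position features. The sketch below assumes that form and uses the conventional chess material values as weights. Both the feature set and the dictionary-based position encoding are illustrative assumptions and are not taken from the slides.

```python
# Hypothetical feature weights: conventional chess material values.
WEIGHTS = {"pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}


def evaluate(own, opponent):
    """Weighted material difference; positive values favour `own`."""
    return sum(
        weight * (own.get(piece, 0) - opponent.get(piece, 0))
        for piece, weight in WEIGHTS.items()
    )


# Assumed piece counts for an example position.
white = {"pawn": 8, "knight": 2, "bishop": 2, "rook": 2, "queen": 1}
black = {"pawn": 7, "knight": 2, "bishop": 1, "rook": 2, "queen": 1}

print(evaluate(white, black))  # 4
```

Here White is a pawn and a bishop ahead, which gives 1 + 3 = 4. A stronger evaluation function would also weigh positional features, and the weights would need tuning.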
Heuristic difficulties
Horizon effect
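A fixed-depth search thinks it can avoid the queening move, because it can push that move beyond its search horizon even though the move remains unavoidable.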
Discussion
Combining all these techniques results in a program that can play reasonable chess (or another game).
– We can generate and evaluate around a million nodes per second on the latest PC, allowing us to search roughly 200 million nodes per move under standard time controls (3 minutes per move).
– On average, the branching factor for chess is 35.
– With minimax search we could look ahead only about five plies (35^5 ≈ 50 million).
– With alpha-beta search we could look ahead about ten plies.
– To reach grandmaster status we need an extensively tuned evaluation function.
– A supercomputer would perform better.
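The ply estimates can be checked with a short calculation. The branching factor and node budget come from the bullets above. The assumption that alpha-beta with good move ordering examines roughly b^(d/2) nodes, and so reaches about twice the depth, is a standard result that the slides do not state explicitly.

```python
import math

b = 35                 # average branching factor for chess (from the text)
budget = 200_000_000   # nodes searched per move (from the text)

# Nodes in a full five-ply minimax tree.
print(b ** 5)  # 52521875

# Depth d such that b**d equals the node budget.
minimax_depth = math.log(budget, b)

# Assumption: with near-perfect move ordering, alpha-beta examines about
# b**(d/2) nodes, so the same budget reaches roughly twice the depth.
alphabeta_depth = 2 * minimax_depth

print(round(minimax_depth, 1), round(alphabeta_depth, 1))
```

Under these assumptions the budget covers a little over five plies of plain minimax and a little over ten plies of alpha-beta.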
Games that include chance
Backgammon: luck + skill
chance nodes
Doubles (e.g. [1,1], [6,6]) each have probability 1/36; every other roll has probability 1/18.
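These probabilities can be verified by enumerating all 36 ordered outcomes of two dice and merging rolls that differ only in order. This is a small sketch, and the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count ordered outcomes per unordered roll: (1, 2) and (2, 1) are the same roll.
counts = Counter(tuple(sorted(roll)) for roll in product(range(1, 7), repeat=2))
probs = {roll: Fraction(n, 36) for roll, n in counts.items()}

print(len(probs))     # 21
print(probs[(1, 1)])  # 1/36
print(probs[(1, 2)])  # 1/18
```

Each chance node in backgammon therefore has 21 distinct successors: 6 doubles and 15 non-doubles.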
Expected minimax value
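\[
\text{EXPECTIMINIMAX}(n) =
\begin{cases}
\text{UTILITY}(n) & \text{if } n \text{ is a terminal node} \\
\max_{s \in \text{successors}(n)} \text{EXPECTIMINIMAX}(s) & \text{if } n \text{ is a MAX node} \\
\min_{s \in \text{successors}(n)} \text{EXPECTIMINIMAX}(s) & \text{if } n \text{ is a MIN node} \\
\sum_{s \in \text{successors}(n)} P(s) \cdot \text{EXPECTIMINIMAX}(s) & \text{if } n \text{ is a chance node}
\end{cases}
\]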
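A minimal Python sketch of this recursion is given below. The tuple-based node encoding (`("leaf", value)`, `("max", children)`, `("min", children)`, `("chance", [(probability, child), ...])`) and the example tree are hypothetical, chosen only to keep the example short.

```python
def expectiminimax(node):
    """Evaluate a node encoded as (kind, payload); the encoding is an assumption."""
    kind, payload = node
    if kind == "leaf":
        return payload
    if kind == "max":
        return max(expectiminimax(child) for child in payload)
    if kind == "min":
        return min(expectiminimax(child) for child in payload)
    if kind == "chance":
        # Probability-weighted average over the chance outcomes.
        return sum(p * expectiminimax(child) for p, child in payload)
    raise ValueError(f"unknown node kind: {kind}")


# Illustrative tree: MAX chooses between two chance nodes.
tree = ("max", [
    ("chance", [
        (0.5, ("min", [("leaf", 3), ("leaf", 5)])),
        (0.5, ("min", [("leaf", 1), ("leaf", 4)])),
    ]),
    ("chance", [
        (0.5, ("leaf", 2)),
        (0.5, ("leaf", 0)),
    ]),
])

print(expectiminimax(tree))  # 2.0
```

The first chance node is worth 0.5 · 3 + 0.5 · 1 = 2.0 and the second 0.5 · 2 + 0.5 · 0 = 1.0, so MAX prefers the first.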
Left: A1 wins. Right: A2 wins.
The move chosen may change when the evaluation values are scaled differently, even if their order is preserved.
Avoiding this sensitivity: EVAL must be a positive linear transformation of the probability of winning from a position.
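The sensitivity can be reproduced with a small calculation. The leaf values and the 0.9/0.1 chance probabilities below are illustrative assumptions in the style of the usual textbook figure, not values recovered from the slide. The "right" values keep the same ordering as the "left" ones (1 < 2 < 3 < 4 becomes 1 < 20 < 30 < 400).

```python
def best_move(leaves, probs=(0.9, 0.1)):
    """Pick the move whose chance node has the highest expected value."""
    expected = {
        move: sum(p * v for p, v in zip(probs, values))
        for move, values in leaves.items()
    }
    return max(expected, key=expected.get)


# Assumed evaluation values at the outcomes of each move's chance node.
left = {"A1": (2, 3), "A2": (1, 4)}
right = {"A1": (20, 30), "A2": (1, 400)}  # same ordering, different scale

# A positive linear transformation (10 * v + 5) of the left values.
linear = {move: tuple(10 * v + 5 for v in values) for move, values in left.items()}

print(best_move(left), best_move(right), best_move(linear))  # A1 A2 A1
```

With the left values the expectations are 2.1 for A1 and 1.3 for A2. With the right values they are 21 and 40.9, so the preferred move flips. The linearly transformed values give 26 and 18 and leave the choice unchanged.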
Discussion
Summary