291 Minimax

Make it
Theory workshop
Behind the minimax algorithm

Ever fancied programming two-player zero-sum games? Here's everything you need to know
One of the most interesting avenues of computer science is that of programming a computer to play a game against a human opponent. Examples abound, with the most famous that of programming a computer to play chess. But no matter what the game is, the programming tends to follow an algorithm called minimax, with various attendant sub-algorithms in tow.

First, a definition: a two-player zero-sum game is one played between two players where the players play alternately, the whole game is visible to both and there's a winner and a loser (or there's a draw). It's zero-sum because if the game is played for money, the loser pays the winner and overall there's no loss of money. (A bit like energy in a reaction: no money is created or destroyed.)

One of the simplest two-player zero-sum games is noughts and crosses, where the players alternately place Xs and Os in a 3x3 grid, with the winner being the first player to place three of their symbol in a row, column or diagonal line. Like me, you probably played this as a child and, as you played it, you learned how to force a win or draw every time. In fact, once both players get that insight, every game is guaranteed to result in a draw. The only way to win is to play a novice player.
The algorithm
Analysing noughts and crosses with the minimax algorithm is pretty standard in game theory, so I'll discuss a different game called Nim to illustrate minimax and its variants. Nim is interesting because it's easily understood, fairly unfamiliar and simply modelled. Plus, there are no draws in Nim, so the whole winner/loser thing is much simpler: someone always wins. But who?

In Nim, the players face three piles of stones with, say, five stones in each pile. Each player takes it in turn to play by removing from a single pile anything from one stone to the entire pile. The loser is the one who is forced to remove the final stone from the final pile, leaving all three piles empty. (Another way of looking at it is that the winner is the first player to be faced with three empty piles.)

For example, suppose our two players are named Max and Minnie. Max starts (he always does, not being a gentleman) and decides to remove all the stones from pile one. Minnie then removes all but two stones from pile two. Max thinks for a while, then removes all but two stones from pile three. Minnie resigns, because no matter what she does, Max will win. (If she removes one stone from a pile, Max removes both stones from the other, and she's left with the final stone. If she removes both stones from a pile, Max removes one stone from the other, leaving her with the final stone.)

Figure 1: The first few levels in the noughts and crosses game tree.
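As a rough sketch of the rules just described, a Nim position can be held as a tuple of pile sizes and the available plays enumerated directly. The names `Piles` and `legal_moves` are invented for this illustration and do not come from the article; Python is used purely for convenience.

```python
from typing import Iterator, Tuple

Piles = Tuple[int, ...]  # hypothetical representation: one entry per pile


def legal_moves(piles: Piles) -> Iterator[Piles]:
    """Yield every position reachable in one play.

    A play removes between one stone and the whole pile from a single pile.
    """
    for i, n in enumerate(piles):
        for take in range(1, n + 1):
            yield piles[:i] + (n - take,) + piles[i + 1:]


start = (5, 5, 5)
print(len(list(legal_moves(start))))  # 15

# Replay the example game: Max empties pile one, Minnie leaves two stones
# in pile two, Max leaves two stones in pile three.
position = start
for nxt in [(0, 5, 5), (0, 2, 5), (0, 2, 2)]:
    assert nxt in legal_moves(position)  # each play is allowed by the rules
    position = nxt
print(position)  # (0, 2, 2)
```

Tuples are used rather than lists so that a position is immutable and could later serve as a dictionary key if positions need to be cached.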
Traversing nodes
Games such as Nim are modelled as game trees. You start off with the initial state of the game as a node, the root of the tree. From this node, each possible move is modelled as a link to another node, which stands in for another state or position of the game.

So, for example, in noughts and crosses, the root node is the empty grid. Traditionally X starts and there are three possible moves: the centre, a corner and the middle cell along an edge (all the cells are equivalent to one of those three). So, the initial root node has three links to other game states. Each of those new nodes has different possible moves for O, as shown in Figure 1. You can imagine going further and drawing more levels.

Nim's tree is more complex. The initial state has 15 possible links, corresponding to removing one, two, three, four or five stones from each of the three piles. Each of these 15 possible states of the game then has up to 14 possible links to other states for the second player, and so on. You can imagine that the number of game states (that is, nodes in the game tree) explodes pretty quickly.

If you happened to have a big enough piece of paper, it would be possible to map out the entire game tree for the version of Nim that I described. For the leaf nodes of the tree (that is, the nodes with no links coming out of them), you would be able to identify the loser of the game for the path taken through the tree to each particular leaf. Figure 2 shows a particularly daft path through the tree where the players take all the stones from each pile in turn (not exactly an insightful game, but nevertheless a possible one under the rules).
Figure 2: An allowable but idiotic game play for Nim, resulting in Max losing.
The loser is Max, because he takes all the stones from pile three in the third move.

We can assign a value to each leaf node to indicate who wins (or loses). To make sure we don't get completely confused, we assign a monetary value from the viewpoint of the first player, Max. Let's say the winner of the path to the leaf receives 1 and the loser has to pay out that amount, so if the winner is Max, the value of the node is 1, while if the winner is Minnie, the value is -1 (since Max has to pay that amount to her).
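The leaf valuation can be written down in a couple of lines. This is only an illustrative sketch: the function name `leaf_value` and the `max_to_move` flag are assumptions made here, and the path is the daft one from Figure 2, where each player empties a pile in turn.

```python
def leaf_value(max_to_move: bool) -> int:
    """Value of a leaf (all piles empty) from Max's viewpoint.

    The player who faces the empty piles wins, because the opponent was
    forced to take the final stone.
    """
    return 1 if max_to_move else -1


# Figure 2's path: Max, Minnie and Max each empty one pile in turn.
path = [(5, 5, 5), (0, 5, 5), (0, 0, 5), (0, 0, 0)]
max_to_move = True  # Max faces the root position
for _ in path[1:]:
    max_to_move = not max_to_move  # the turn passes after every play

print(leaf_value(max_to_move))  # -1: Minnie faces the empty piles, so Max pays
```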
Player one
Let's imagine that we set up the entire game tree from the viewpoint of Max, the player who makes his move first. Each game position corresponds to a node in the tree, and if you think about it, a whole level of the tree will correspond to a given player. So, the root of the tree is what Max is faced with at the very start of the game: five stones in each of the three piles, and 15 possible game positions to leave for Minnie. What does Max choose to play in this situation?

What he should do is analyse all possible moves from the bottom up and assign a value to each node as he works his way up the tree,
Claude Shannon
In 1950, Claude Shannon published a paper called 'Programming a Computer for Playing Chess', which was the first such paper to consider this particular game tree. In it, he estimated the number of nodes in a game tree for chess to be about 10^120, which meant, as he put it, that 'a machine operating at the rate of one [node] per micro-second would require over 10^90 years to calculate the first move'. Shannon's paper was remarkable because it contained the insight of an evaluation function for the strength of a position, and using it to be able to calculate node values to several levels deep.
Spotlight on Alpha-beta pruning

First proposed by John McCarthy at a conference in 1956 (although only named as such later on), alpha-beta pruning is a method for cutting off whole branches of the game tree so that they don't have to be evaluated with minimax. In essence, the algorithm maintains two extra values during the minimax recursion: alpha and beta. Alpha is the minimum value that Max is already assured of (the worst result he need accept) and beta is the maximum value that Minnie can already hold him to (the biggest win for Max that she need allow). They start out as negative infinity for alpha and positive infinity for beta. As the minimax recursion proceeds, the value for alpha is replaced when a new minimax value that is larger is found at a Max node (ditto for beta, when a smaller value is calculated at a Minnie node). If they cross at any time (that is, alpha is no longer less than beta), the branch of the tree currently being investigated is no good for either player and can be further ignored, or pruned. It can be shown that this algorithm doesn't mistakenly prune branches that will benefit either player and so it's widely used in minimax implementations.
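To make the pruning concrete, here is a minimal sketch on a small hand-made tree. The tree, its payouts and the `seen` list (used only to record which leaves get evaluated) are all invented for illustration; a node is either an integer payout from Max's viewpoint or a list of child nodes.

```python
import math


def alphabeta(node, max_to_move, alpha=-math.inf, beta=math.inf, seen=None):
    """Return the minimax value of node, skipping branches that cannot matter."""
    if isinstance(node, int):          # a leaf: its payout is its value
        if seen is not None:
            seen.append(node)          # record that this leaf was evaluated
        return node
    if max_to_move:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta, seen))
            alpha = max(alpha, best)
            if alpha >= beta:          # alpha and beta have crossed: prune
                break
        return best
    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, True, alpha, beta, seen))
        beta = min(beta, best)
        if alpha >= beta:              # alpha and beta have crossed: prune
            break
    return best


# Max plays at the root; each inner list is a Minnie node with two leaves.
tree = [[3, 5], [2, 9], [0, 1]]
seen = []
print(alphabeta(tree, True, seen=seen))  # 3
print(seen)                              # [3, 5, 2, 0]
```

The leaves 9 and 1 are never looked at: once Minnie has a reply worth 2 (or 0) in a branch, Max already knows that branch cannot beat the 3 he is assured of on the left.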
Expectiminimax
Games such as Nim and chess have outcomes that are solely dependent on the skill of the players, or, presumably, their access to a well-written minimax analyser. Games such as backgammon are different, because their outcomes also depend on a randomisation factor such as the roll of the dice. The minimax algorithm has been expanded to suit such games (leading to the expectiminimax algorithm) by including what are known as chance nodes that incorporate the expected value of the randomisation agent (for backgammon, this would be the dice).
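A chance node can be sketched as a probability-weighted average of its children. The node encoding below (a number for a leaf, or a pair of a kind and a list of children) and the example payouts are assumptions made purely for illustration, not a model of backgammon.

```python
def expectiminimax(node):
    """Value of node from Max's viewpoint, allowing for chance nodes."""
    if isinstance(node, (int, float)):   # a leaf: its payout is its value
        return node
    kind, children = node
    if kind == "max":
        return max(expectiminimax(child) for child in children)
    if kind == "min":
        return min(expectiminimax(child) for child in children)
    if kind == "chance":                 # children are (probability, node) pairs
        return sum(p * expectiminimax(child) for p, child in children)
    raise ValueError(f"unknown node kind: {kind!r}")


# Max may take a safe payout of 1, or pick one of two gambles.
coin_flip = ("chance", [(0.5, ("min", [4, 2])), (0.5, ("min", [-2, 6]))])
long_shot = ("chance", [(0.25, 8), (0.75, 0)])
root = ("max", [1, coin_flip, long_shot])

print(expectiminimax(coin_flip))  # 0.0
print(expectiminimax(long_shot))  # 2.0
print(expectiminimax(root))       # 2.0
```

Here the long shot is worth 2 on average, so it beats the safe payout of 1 even though it pays nothing three times out of four.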
according to the amount he could win on that node if he played optimally.

Let's take a look at a made-up example, shown in Figure 3. Here, the root node shows a game position from which Max must play. There are two possibilities: playing the left-hand option goes to a game position that he's already worked out means he wins 1; playing the right-hand option goes to a game position where he loses 1. (Remember, all payouts are from Max's viewpoint.) I don't know about you, but I'd choose the first play. This means that the current game position also has a value of 1. For every game position where it's his turn to play, Max would choose the option that would maximise his winnings.

Minnie, who is just as perceptive as Max, would, of course, choose plays that would result in the best result for her and ignore all the others. So she would always choose a play that maximised her winnings, which, from Max's perspective, means minimising his.

If you had the entire tree, you could work out a value for each node working from the bottom up. If it was a Max node (that is, Max had to play from it), it would have a value that was the maximum of the child nodes. If it was a Minnie node it would have a value that was the smallest (the minimum) of the child nodes.

This, in essence, is the minimax algorithm: build the tree, work out the value of each node using an alternate minimise/maximise constraint, and the value of the root is the value of the entire game for player one (Max, as we called him).
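The rule just stated (maximum at Max nodes, minimum at Minnie nodes) can be sketched on a tree that has already been built. The representation is an assumption for this example: a leaf is an integer payout from Max's viewpoint, and an inner node is simply a list of its children.

```python
from typing import List, Union

Tree = Union[int, List["Tree"]]  # a leaf payout, or a list of child subtrees


def minimax(node: Tree, max_to_move: bool) -> int:
    """Value of node for Max, with the players alternating level by level."""
    if isinstance(node, int):
        return node
    values = [minimax(child, not max_to_move) for child in node]
    return max(values) if max_to_move else min(values)


# The Figure 3 choice: the left option wins 1, the right option loses 1.
print(minimax([1, -1], True))              # 1

# One level deeper: Max at the root, Minnie at the two nodes below it.
print(minimax([[1, -1], [1, 1]], True))    # 1
print(minimax([[1, -1], [-1, 1]], True))   # -1
```

In the last tree Minnie can steer both branches to -1, so the position is lost for Max whichever option he picks.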
Figure 4: The complete game tree for a simplified Nim game.
The recursive method

Instead of building the entire tree and then analysing it, the best approach is to traverse the tree recursively (a postfix traversal, in fact) and calculate what you need when you need it (and destroy the stuff you don't need when you're done). In essence, since a tree is defined recursively, you calculate the minimax value by calculating the maximum (or minimum) of the minimaxes of all the child trees. Remember that the levels alternate between maximising and minimising (sometimes you look at it from Minnie's viewpoint instead of Max's).

Figure 4 shows a very simplified Nim game (one pile of five stones, you can remove one, two or three stones each play), fully expanded into a game tree. The number inside each node is the number of stones left in the pile after the move, and the letter alongside each node is the minimax value for Max (W = win, L = lose). Note that the value of the game is L, that is, Max will always lose (if you like, this simplified Nim is always a win for the second player).

Although the minimax algorithm is always guaranteed to find the best play for Max, there is a big problem. The game tree can be huge, mind-bogglingly huge. Consider chess, the classic archetype of a two-player zero-sum game. At each game position there could be something like 30 possible moves. Since each chess game is made up of about 80 plays (40 back-and-forths), it would mean that the lowest level of the tree would have something like 10^118 nodes. (Note that in tournaments it's rare for a game to go to checkmate; the losing player is likely to resign well before then.) As a comparison, there are around 10^80 atoms in the observable universe, meaning that, in essence, there's no possible way for a computer to map the entire chess game tree. So what can we do?

The first optimisation is to limit the depth to which we evaluate the game tree using the minimax algorithm. Since we may not actually reach a leaf node in doing this, we make use of an approximation function, a heuristic, to approximate the value of the node or game position. Of necessity, this value is not going to be accurate, but it will enable us to apply the minimax algorithm without having to evaluate all the nodes down to the leaves. The better the heuristic, the better the chances of devising a winning game play and the more accurate our minimax values will be.
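Both ideas, the recursive traversal and the depth limit, can be sketched on the single-pile game of Figure 4 (five stones, one to three removed per play). One simple way to cap the search is a depth budget handed down the recursion. The `heuristic` below is a deliberately empty placeholder that returns 0 for "unknown"; that choice, like the function names, is an assumption of this sketch rather than anything the article prescribes.

```python
def heuristic(stones: int, max_to_move: bool) -> int:
    """Placeholder estimate for a position we decline to search further.

    Returning 0 ("unknown") is an assumption made for this sketch; a real
    program would put game-specific knowledge here.
    """
    return 0


def minimax(stones: int, max_to_move: bool, depth: int) -> int:
    """Minimax value for Max, searching at most depth plays ahead."""
    if stones == 0:
        # The player facing the empty pile wins: the opponent took the last stone.
        return 1 if max_to_move else -1
    if depth == 0:
        return heuristic(stones, max_to_move)  # out of budget: estimate instead
    values = [minimax(stones - take, not max_to_move, depth - 1)
              for take in (1, 2, 3) if take <= stones]
    return max(values) if max_to_move else min(values)


print(minimax(5, True, 5))  # -1: a full search agrees with Figure 4, Max loses
print(minimax(4, True, 5))  # 1: with four stones the player to move wins
print(minimax(5, True, 1))  # 0: one level deep, only the heuristic's guess
```

Five plays is the longest possible game here, so a depth of 5 always reaches the leaves; with a smaller budget the answer is only as good as the heuristic.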
Limiting depth
In our recursive algorithm for minimax, we'll need to limit the depth of the recursion instead of allowing the recursion to reach the leaves. The simplest way to do this is to pass a depth parameter to the recursive minimax function and decrement its value at every recursive call. At the lowest level of the recursion, we use the heuristic function to calculate the minimax value of the current game position.

Now, the resulting minimax value at the root of the game tree is only going to be an approximation. The deeper we allow the partial minimax algorithm to go, the more accurate its value will be (because we're more likely to find leaf nodes in our traversal), but the longer the traversal will take. We have to strike a balance between accuracy and the time taken to calculate the minimax value (and hence the move to play).

Once it's our turn again to make a move, we should recalculate the minimax value at our new game position, making it, in effect, the root of the current state of the game. Every move would be made after a new minimax calculation based on the current game state.

In many chess programs that run on standard PC hardware, the depth of the minimax search is limited to some six full-width levels, around a billion possible game positions. Any more than that and the time taken to analyse the game positions would be far too long to be practical. For example, analysing positions at a rate of a million per second, six full-width levels would take about a quarter of an hour.

Julian M Bucknall has worked for companies ranging from TurboPower to Microsoft and is now CTO for Developer Express. feedback@pclus.co.uk
291 February 2010
Chess programs
Computer chess is one of the most fertile avenues of game research. Modern chess programs running on standard hardware use sophisticated pruning techniques to cull unprofitable areas of the game tree, use advanced heuristics to evaluate a game position and have large databases of standard opening games (so that they don't have to analyse an opening game at all) and common endgames (so that they can more easily force checkmates without having to analyse a multitude of game positions towards the end of a game).
Figure 3: A simple choice in a game tree, to calculate the minimax value of the root node. (Diagram labels: Max at the root, with two options on Minnie's level, Win 1 and Lose 1.)