Machine Intelligence Research 20(3), June 2023, 299-317
DOI: 10.1007/s11633-022-1384-6

AI in Human-computer Gaming: Techniques, Challenges and Opportunities

Qi-Yue Yin 1,2, Jun Yang 3, Kai-Qi Huang 1,2,4, Mei-Jing Zhao 1, Wan-Cheng Ni 1,2, Bin Liang 3, Yan Huang 1,2, Shu Wu 1,2, Liang Wang 1,2,4

1 Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
2 School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
3 Department of Automation, Tsinghua University, Beijing 100084, China
4 Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Beijing 100190, China

Abstract: With the breakthrough of AlphaGo, human-computer gaming AI has ushered in a big explosion, attracting more and more researchers all over the world. As a recognized standard for testing artificial intelligence, various human-computer gaming AI systems (AIs) have been developed, such as Libratus, OpenAI Five, and AlphaStar, which beat professional human players. The rapid development of human-computer gaming AIs indicates a big step for decision-making intelligence, and it seems that current techniques can handle very complex human-computer games. So, one natural question arises: What are the possible challenges of current techniques in human-computer gaming, and what are the future trends? To answer the above question, in this paper, we survey recent successful game AIs, covering board game AIs, card game AIs, first-person shooting game AIs, and real-time strategy game AIs. Through this survey, we 1) compare the main difficulties among different kinds of games and the corresponding techniques utilized for achieving professional human-level AIs; 2) summarize the mainstream frameworks and techniques that can be properly relied on for developing AIs for complex human-computer games; 3) raise the challenges or drawbacks of current techniques in the successful AIs; and 4) try to point out future trends in human-computer gaming AIs. Finally, we hope that this brief review can provide an introduction for beginners and inspire insight for researchers in the field of AI in human-computer gaming.

Keywords: Human-computer gaming, AI, intelligent decision making, deep reinforcement learning, self-play.

Citation: Q. Y. Yin, J. Yang, K. Q. Huang, M. J. Zhao, W. C. Ni, B. Liang, Y. Huang, S. Wu, L. Wang. AI in human-computer gaming: Techniques, challenges and opportunities. Machine Intelligence Research, vol. 20, no. 3, pp. 299-317, 2023. DOI: 10.1007/s11633-022-1384-6.
1 Introduction
Human-computer gaming has a long history and has been a main tool for verifying key artificial intelligence technologies [1, 2]. The Turing test [3], proposed in 1950, was the first human-computer game to judge whether a machine has human intelligence. This has inspired researchers to develop AI systems (AIs) that can challenge professional human players. A typical example is a draughts AI called Chinook, which was developed in 1989 to defeat the world champion, a target achieved by beating Marion Tinsley in 1994 [4]. Afterward, Deep Blue from IBM beat the chess grandmaster Garry Kasparov in 1997, setting a new era in the history of human-computer gaming [5].
In recent years, we have witnessed the rapid
development of human-computer gaming AIs, from the DQN agent [6], AlphaGo [7], Libratus [8], and OpenAI Five [9] to AlphaStar [10]. These AIs can defeat professional human players in certain games with a combination of modern techniques, indicating a big step in decision-making intelligence [11-13]. For example, AlphaGo Zero [14], which uses Monte Carlo tree search, self-play, and deep learning, defeats dozens of professional Go players, representing powerful techniques for large-state perfect information games. OpenAI Five [9], using self-play, deep reinforcement learning, and continual transfer via surgery, became the first AI to beat the world champions at an eSports game, displaying useful techniques for complex imperfect information games.
After the success of AlphaStar and OpenAI Five, which reached the professional human player level in the games StarCraft and Dota 2, respectively, it seems that current techniques can solve very complex games. In particular, the breakthroughs of the most recent human-computer gaming AIs for games such as Honor of Kings [15] and Mahjong [16] follow frameworks similar to those of AlphaStar and OpenAI Five, indicating a certain degree of
universality of current techniques. So, one natural question arises: What are the possible challenges of current techniques in human-computer gaming, and what are the future trends? This paper aims to review recent successful human-computer gaming AIs and tries to answer the question through a thorough analysis of current techniques.
Based on the current breakthroughs of human-computer gaming AIs (most published in journals such as Science and Nature), we survey four typical types of games, i.e., board games with Go; card games such as heads-up no-limit Texas hold'em (HUNL), DouDiZhu, and Mahjong; first-person shooting (FPS) games with Quake III Arena in capture the flag (CTF) mode; and real-time strategy (RTS) games with StarCraft, Dota 2, and Honor of Kings. The corresponding AIs cover AlphaGo [7], AlphaGo Zero [14], AlphaZero [17], Libratus [8], DeepStack [18], DouZero [19], Suphx [16], FTW [20], AlphaStar [10], OpenAI Five [9], JueWu (a name known by the public) [15], and Commander [21]. A brief summary is displayed in Fig. 1.
The remainder of the paper is organized as follows. In Section 2, we describe the games and AIs covered in this paper. Sections 3-6 elaborate on the AIs for board games, card games, FPS games, and RTS games, respectively. In Section 7, we summarize and compare the different techniques utilized. In Section 8, we show the challenges in current game AIs, which may be future research directions of this field. Finally, we conclude the paper in Section 9.
2 Typical games and AIs
Based on the recent progress of human-computer gaming AIs, this paper reviews four types of games and their corresponding AIs, i.e., board games, card games, FPS games, and RTS games. To measure how hard it is to develop professional human-level AI for a game, we extract several key factors that challenge intelligent decision-making [22], which are shown in Table 1.
Imperfect information. Except for the board games, almost all the card games, FPS games, and RTS games are imperfect information games, which means that players do not know exactly how they came to the current states, e.g., because of the opponent's hidden cards in HUNL. Accordingly, players need to make decisions under partial observation. This leads to more than one node in an information set if the game is expanded into a tree. For example, the average information set sizes for the card games HUNL and Mahjong are 10^3 and 10^15, respectively. Moreover, compared with perfect information games such as Go, the subgames of an imperfect information game cannot be solved in isolation from each other [23], which makes solving for the Nash equilibrium of imperfect information games more difficult [24].
Long time horizon. In real-time games, such as StarCraft, Dota 2, and Honor of Kings, a game lasts several minutes or even more than an hour. Accordingly, an
AI needs to make thousands of decisions. For example, Dota 2 games run at 30 frames per second for about 45 minutes, resulting in approximately 20,000 steps in a game if a decision is made every four frames. In contrast, players in card games usually make far fewer decisions. The long time horizon leads to an exponential increase in the number of decision points, which brings a series of problems, such as balancing exploration and exploitation, when optimizing a strategy.
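For reference, the quoted step count follows directly from these numbers:

$$30~\tfrac{\text{frames}}{\text{s}}\times 60~\tfrac{\text{s}}{\text{min}}\times 45~\text{min}\;\big/\;4~\tfrac{\text{frames}}{\text{decision}}\approx 2\times 10^{4}~\text{decisions}.$$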
In-transitive game. If the performance of different players is transitive, a game is called a transitive game [25]. Mathematically, if v_t can beat v_{t-1} and v_{t+1} can beat v_t, then v_{t+1} outperforms v_{t-1}, and the game is strictly transitive. However, most games in the real world are in-transitive. For example, in the simple game Rock-Paper-Scissors, the strategy ordering is in-transitive, or cyclic. Commonly, most games consist of transitive and in-transitive parts, i.e., they obey the spinning-top structure [26]. The in-transitive characteristic makes the standard self-play technique, widely used for agent ability evolution, fail to iteratively approach the Nash equilibrium strategy.
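As a toy illustration of in-transitivity (not taken from the paper), the Rock-Paper-Scissors payoff matrix below has no dominating strategy; every pure strategy beats one and is beaten by another, which is exactly the cyclic structure that breaks naive self-play improvement.

```python
import numpy as np

strategies = ["Rock", "Paper", "Scissors"]
# payoff[i][j]: +1 if strategy i beats strategy j, -1 if it loses, 0 for a tie.
payoff = np.array([
    [ 0, -1,  1],   # Rock  loses to Paper, beats Scissors
    [ 1,  0, -1],   # Paper beats Rock,     loses to Scissors
    [-1,  1,  0],   # Scissors beats Paper, loses to Rock
])

for i, s in enumerate(strategies):
    beats = strategies[int(np.argmax(payoff[i]))]
    beaten_by = strategies[int(np.argmin(payoff[i]))]
    print(f"{s} beats {beats} and is beaten by {beaten_by}")
# The output shows a cycle Rock -> Scissors -> Paper -> Rock, so there is no
# transitive ordering v_{t+1} > v_t > v_{t-1} over these strategies.
```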
Multi-agent cooperation. Most board games and card games are purely competitive, where no cooperation between players is required. An exception is DouDiZhu, which needs two Peasant players playing as a team to fight against the Landlord player. In contrast, almost all real-time games, i.e., FPS games and RTS games, rely on players' cooperation to win the game. For example, five players in Dota 2 and Honor of Kings form a camp to fight against another camp. Even though StarCraft is a two-player competitive game, each player needs to control a large number of units, which need to cooperate well to win. Overall, how to obtain the Nash equilibrium strategy or a better learned strategy under multi-agent cooperation is a hard problem, because specially designed agent interaction or alignment needs to be considered.
In summary, different games share different characteristics and aim to find different kinds of solutions, so distinct learning strategies are developed to build AI systems. In Sections 3-6, we will see that behind the game types is the evolution of techniques designed for perfect information games, imperfect information games, and more complex real-time, long time horizon imperfect information games. So, a taxonomy based on the different kinds of games is utilized. Finally, in this paper, the AIs cover: AlphaGo, AlphaGo Zero, and AlphaZero for the board game Go; Libratus, DeepStack, DouZero, and Suphx for the card games HUNL, DouDiZhu, and Mahjong, respectively; FTW for the FPS game Quake III Arena in capture the flag mode; and AlphaStar, Commander, OpenAI Five, and JueWu for StarCraft, Dota 2, and Honor of Kings, respectively.
3 Board game AIs
The AlphaGo series is built based on Monte Carlo
[Fig. 1 Games and AIs surveyed in this paper. The figure groups the surveyed AIs by game type: board games (AlphaGo, AlphaGo Zero, AlphaZero for Go, chess, and Shogi), card games (DeepStack and Libratus for heads-up no-limit Texas hold'em, Suphx for Mahjong, DouZero for DouDiZhu), the FPS game (FTW for Quake III Arena in capture the flag mode), and RTS games (AlphaStar and Commander for StarCraft, OpenAI Five and JueWu for Dota 2 and Honor of Kings), together with their publication venues (Nature, Science, NeurIPS, ICML, arXiv) and release dates ranging from Jan 2016 to Jul 2021.]
Table 1 Characteristics of four typical kinds of games

Characteristic             Board games   Card games                    FPS games   RTS games
                           Go series     HUNL   DouDiZhu   Mahjong     CTF         StarCraft   Dota 2   Honor of Kings
Imperfect information      ×             √      √          √           √           √           √        √
Long time horizon          √             ×      ×          ×           √           √           √        √
In-transitive game         √             √      √          √           √           √           √        √
Multi-agent cooperation    ×             ×      √          ×           √           √           √        √
tree search (MCTS) [27, 28], which was widely utilized in previous Go programs. AlphaGo came out in 2015 and beat the European Go champion Fan Hui by 5:0, which was the first time that an AI won against professional players in the full-size game of Go without handicap stones (Renzi). Afterward, an advanced version called AlphaGo Zero was developed using a different learning framework, which needs no prior professional human confrontation data and reaches superhuman performance. AlphaZero uses a similar learning framework to AlphaGo Zero and explores a general reinforcement learning algorithm, which masters Go along with two other board games, chess and Shogi. A brief summarization is shown in Fig. 2.
3.1 MCTS for AlphaGo series
One of the key factors of the AlphaGo series is MCTS, which is a typical tree search-based method. Generally, a simulation of MCTS consists of four steps, repeated hundreds or thousands of times for one decision step. The four steps are selection, expansion, evaluation, and backup, which are operated on a tree as shown in the lower right corner of Fig. 2. In the selection step, a leaf node is selected by starting from the root node, i.e., the state where an action needs to be decided, and descending based on the evaluation of the nodes in the tree. Next is the expansion of the tree by adding a new node. Finally, starting from the expanded node, a rollout is performed to obtain a value for the node, which is used to update the values of all nodes along the traversed path.
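To make the four steps concrete, the following is a minimal, generic MCTS sketch in Python (not the AlphaGo implementation): the `game` object with `legal_actions`, `next_state`, `is_terminal`, and `outcome` methods is an assumed interface, the UCB1 rule stands in for the Q + u(p) rule discussed next, and values are kept single-perspective for brevity (a two-player version would flip the sign at alternating depths).

```python
import math
import random

class Node:
    """One tree node: accumulated value and visit statistics."""
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = {}            # action -> Node
        self.visits, self.value_sum = 0, 0.0

def mcts_decision(root_state, game, n_simulations=1000, c=1.4):
    root = Node(root_state)
    for _ in range(n_simulations):
        node = root
        # 1) Selection: walk down the tree, taking the child with the highest
        #    exploration-adjusted average value.
        while node.children:
            node = max(
                node.children.values(),
                key=lambda ch: ch.value_sum / (ch.visits + 1e-8)
                + c * math.sqrt(math.log(node.visits + 1) / (ch.visits + 1e-8)))
        # 2) Expansion: add children for the legal actions of the reached leaf.
        if not game.is_terminal(node.state):
            for a in game.legal_actions(node.state):
                node.children[a] = Node(game.next_state(node.state, a), parent=node)
            node = random.choice(list(node.children.values()))
        # 3) Evaluation: random rollout from the expanded node to a terminal state
        #    (AlphaGo augments/replaces this with value- and rollout-network estimates).
        state = node.state
        while not game.is_terminal(state):
            state = game.next_state(state, random.choice(game.legal_actions(state)))
        value = game.outcome(state)   # e.g., +1 / -1 from the root player's view
        # 4) Backup: propagate the value to every node on the traversed path.
        while node is not None:
            node.visits += 1
            node.value_sum += value
            node = node.parent
    # Finally, play the most visited action at the root.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]
```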
In the AlphaGo series, traditional MCTS is improved via deep learning to limit the width and depth of the search, so as to handle the huge game tree complexity. Firstly, in the selection stage, a node is selected based on the sum of the action value Q and a bonus u(p). The action value is the average node value over all simulations passing through that node, and the node value is the evaluation of a node based on the prediction of the value network and the rollout results of the rollout network. The bonus is proportional to the policy value (the probability of selecting points in Go) calculated via the policy network, but inversely proportional to the visit count. Secondly, in the expansion stage, a node is expanded and its value is initialized through the policy value. Finally, when making an estimate of the expanded node, the rollout results based on the rollout network and the predicted results based on the value network are combined. As noted in AlphaGo Zero and AlphaZero, the rollout is removed and the evaluation of the expanded node is based solely on the prediction results of the value network, which will be explained in the following subsection.
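Written out, the selection rule and the mixed leaf evaluation described above take the following form (notation follows the AlphaGo paper: P(s, a) is the policy-network prior, N(s, a) the visit count, v_theta the value network, z_L the rollout outcome from leaf s_L, and lambda the mixing weight):

$$a_t=\operatorname*{arg\,max}_a\bigl(Q(s_t,a)+u(s_t,a)\bigr),\qquad u(s,a)\propto\frac{P(s,a)}{1+N(s,a)},$$

$$V(s_L)=(1-\lambda)\,v_\theta(s_L)+\lambda\,z_L .$$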
3.2 Learning for AlphaGo series
3.2.1 Learning for AlphaGo
Learning of AlphaGo consists of several steps. Firstly,
[Fig. 2 A brief framework of the AlphaGo series. AlphaGo trains a rollout policy and a policy network from human expert data, improves the policy via self-play reinforcement learning, and fits a value network from self-play games; AlphaGo Zero initializes and updates the policy and value networks purely from self-play of MCTS embedded with the current networks; AlphaZero is similar to AlphaGo Zero except for training details (no data augmentation or board position transforms, purely self-play, and other details). The trained networks are embedded into MCTS to calculate Q and u (based on p and v) during the selection, expansion, evaluation, and backup steps.]
a supervised learning policy network and a rollout policy network are trained with human expert data; both output the probability of the next move position and are trained on 160,000 games played by KGS 6 to 9 dan human players. The differences between them are the neural network architectures and features used. Specifically, the supervised policy consists of several convolutional layers using a 19×19×48 image stack of 48 feature planes as input, whereas the rollout policy is just a linear softmax policy using fast, incrementally computed, local pattern-based features. With the above high-quality data, a very good initialization of the supervised learning policy network is obtained, which reaches amateur level, i.e., about amateur 3 dan (d).
With the supervised learning policy network trained, a reinforcement learning policy network is initialized (with the same network) and then improved through self-play, in which the network of the current version fights against its previous versions. Based on conventional policy gradient methods that maximize the winning signal, the reinforcement learning policy network reaches better performance than the supervised learning network, i.e., an 80% winning rate against the supervised learning policy.
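As a rough illustration of this self-play policy-gradient step, the sketch below applies a REINFORCE-style update with the win/loss outcome as the only reward. Here `policy_net`, `opponent_pool`, and `play_game` are assumed helpers (a network mapping board features to move logits, a pool of earlier parameter snapshots, and a routine returning the states, chosen moves, and +1/-1 outcome of one game), not code from the AlphaGo paper.

```python
import random
import torch

def reinforce_step(policy_net, optimizer, opponent_pool, play_game):
    # Self-play: the current network plays one game against a randomly chosen
    # earlier version of itself, to avoid overfitting to a single opponent.
    opponent = random.choice(opponent_pool)
    states, actions, outcome = play_game(policy_net, opponent)   # outcome: +1 win, -1 loss

    logits = policy_net(torch.stack(states))                     # (T, num_moves)
    log_probs = torch.log_softmax(logits, dim=-1)
    chosen = log_probs[torch.arange(len(actions)), torch.tensor(actions)]

    # Policy gradient: increase the log-probability of every move in a won game,
    # decrease it in a lost game (the winning signal is the return for all steps).
    loss = -(outcome * chosen).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```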
In the third step of AlphaGo, a value network is trained to evaluate the state, which shares the same features and neural network architecture with the supervised learning policy network except for the last two layers, due to different output dimensionalities. In particular, a dataset consisting of 30 million state-outcome pairs is collected through the self-play of the reinforcement learning network. Then, a regression task is set up by
minimizing the mean squared error between the predicted result of the value network and the corresponding outcome (win or loss signal). With the value network, MCTS can reach a better performance than by just using the supervised learning policy network. Finally, the well-trained supervised learning policy, value network, and rollout network are embedded into MCTS, which reaches a professional level of 1 to 3 dan (p).
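A minimal sketch of this regression step might look as follows, assuming `value_net` maps a batch of board feature stacks to a scalar in [-1, 1] and `loader` yields (states, outcomes) mini-batches drawn from the 30 million self-play state-outcome pairs:

```python
import torch
import torch.nn.functional as F

def train_value_network(value_net, loader, epochs=1, lr=1e-2):
    opt = torch.optim.SGD(value_net.parameters(), lr=lr)
    for _ in range(epochs):
        for states, outcomes in loader:          # outcomes are the +1 / -1 game results
            pred = value_net(states).squeeze(-1)
            loss = F.mse_loss(pred, outcomes)    # mean squared error to the win/loss signal
            opt.zero_grad()
            loss.backward()
            opt.step()
    return value_net
```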
3.2.2 Learning for AlphaGo Zero and AlphaZero
Unlike AlphaGo, whose policy network and value network are trained through supervised learning and self-play between the policy networks, AlphaGo Zero trains its policy and value networks through self-play of MCTS embedded with the current version of the networks. Besides, a different neural network architecture is adopted compared with AlphaGo, i.e., residual networks. As for the input, more simplified features are used, without relying on human player experience. AlphaZero shares the same learning framework as AlphaGo Zero. Overall, they consist of two alternately repeated steps: automatically generating data, and training the policy and value networks.
When generating training data, self-play of MCTS is performed. MCTS embedded with the current policy and value networks is used to select each move for the two players at each state. Generally, MCTS selects an action based on the maximum visit count, but AlphaGo Zero normalizes the counts into a probability distribution so that more actions are explored. Accordingly, state-move probability pairs are stored. Finally, when a game ends, the winning signal (+1 or -1) is recorded for value network training.
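The data-generation loop described above can be sketched as follows; `game` and `run_mcts` (which returns the root visit count per move under the current networks) are assumed interfaces, and the temperature `tau` controls how sharply the normalized counts concentrate on the most-visited move.

```python
import numpy as np

def self_play_game(game, run_mcts, tau=1.0):
    examples, state = [], game.initial_state()
    while not game.is_terminal(state):
        counts = np.asarray(run_mcts(state), dtype=np.float64)  # visit count per move
        probs = counts ** (1.0 / tau)
        probs /= probs.sum()                                    # normalized counts as move probabilities
        examples.append((state, probs, game.to_play(state)))
        move = np.random.choice(len(probs), p=probs)            # sample instead of argmax to explore
        state = game.next_state(state, move)
    winner = game.winner(state)                                 # id of the winner, or None for a draw
    # Attach the +1 / -1 winning signal to every stored position, from the
    # perspective of the player to move at that position.
    return [(s, p, 0.0 if winner is None else (1.0 if player == winner else -1.0))
            for s, p, player in examples]
```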
Relying on the collected state-move probabilities and winning signals, the policy and value networks are trained.
More specifically, the distance between the predicted probability of the policy network and the collected probability for each state is minimized. Besides, the distance between the predicted value of the value network and the winning signal is minimized. The overall optimization objective also contains an L2 weight regularization term to prevent overfitting.
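Put together, this is the standard AlphaGo Zero objective, where (p, v) = f_theta(s) are the network's move probabilities and value for state s, pi is the stored search probability, z the recorded winning signal, and c the L2 coefficient:

$$l=(z-v)^2-\boldsymbol{\pi}^{\top}\log\mathbf{p}+c\lVert\theta\rVert^2 .$$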
3.2.3 Learning differences
Based on MCTS, deep learning, reinforcement learning, and self-play are nicely evolved in the AlphaGo series, as shown in Fig. 2. The main difference is the learning framework utilized, elaborated in the following paragraphs. To sum up, AlphaGo uses human expert data to obtain the supervised policy network; self-play starting from the supervised policy network is then performed to obtain the reinforcement learning policy, whose further self-play provides the data for the value network; and all the trained networks are embedded into MCTS for decision making. In contrast, AlphaGo Zero uses no human expert data and trains the policy and value networks on data generated through self-play of MCTS embedded with the current version of the policy and value networks. AlphaZero shares the same training framework as AlphaGo Zero, except for several small training settings.
Apart from the training framework, there are several factors in which AlphaGo Zero differs from AlphaGo. Firstly, no rollout policy network is used to evaluate the expanded node, and the benefit is a speedup of the MCTS simulation. With the higher-quality data generated by the new learning framework, the values of leaf nodes can be well estimated without using a rollout policy. Besides, no human expert data are utilized for deep neural network training. Secondly, the policy and value networks in AlphaGo Zero share most of the parameters (the convolutional layers) instead of being two separate networks, which yields a better Elo rating [29]. What is more, residual blocks, a powerful module for deep learning, are utilized in AlphaGo Zero and show much better performance than just using convolutional blocks as in AlphaGo. Finally, the input to the policy of AlphaGo Zero is a 19×19×17 image stack instead of the 19×19×48 image stack, which rarely uses human-engineered features compared with AlphaGo, e.g., the designed ladder capture and ladder escape features.
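For illustration, a shared residual trunk with separate policy and value heads can be sketched as below. The channel count and number of blocks are illustrative, not those of AlphaGo Zero, but the 17-plane 19×19 input and the two heads follow the description above (the extra policy output accounts for the pass move).

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """A basic residual block: two 3x3 convolutions with a skip connection."""
    def __init__(self, ch=64):
        super().__init__()
        self.c1 = nn.Conv2d(ch, ch, 3, padding=1); self.b1 = nn.BatchNorm2d(ch)
        self.c2 = nn.Conv2d(ch, ch, 3, padding=1); self.b2 = nn.BatchNorm2d(ch)
    def forward(self, x):
        h = torch.relu(self.b1(self.c1(x)))
        return torch.relu(x + self.b2(self.c2(h)))

class PolicyValueNet(nn.Module):
    """Shared residual trunk with a policy head and a value head."""
    def __init__(self, in_planes=17, ch=64, board=19, n_blocks=4):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(in_planes, ch, 3, padding=1),
                                  nn.BatchNorm2d(ch), nn.ReLU())
        self.trunk = nn.Sequential(*[ResBlock(ch) for _ in range(n_blocks)])
        self.policy_head = nn.Sequential(nn.Conv2d(ch, 2, 1), nn.Flatten(),
                                         nn.Linear(2 * board * board, board * board + 1))
        self.value_head = nn.Sequential(nn.Conv2d(ch, 1, 1), nn.Flatten(),
                                        nn.Linear(board * board, 64), nn.ReLU(),
                                        nn.Linear(64, 1), nn.Tanh())
    def forward(self, x):                       # x: (N, 17, 19, 19) feature stack
        h = self.trunk(self.stem(x))
        return self.policy_head(h), self.value_head(h)
```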
AlphaZero aims to develop a more general reinforcement learning algorithm for various board games such as Go, chess, and Shogi. Since the rules of chess and Shogi are very different from Go, AlphaZero makes several changes to the training details to fit the above goal. As for the game Go, there are two main training details that differ from AlphaGo Zero. Firstly, no data augmentation or transformations such as rotation or reflection of the positions are applied. Secondly, AlphaZero uses a pure self-training framework by maintaining only a single neural network instead of saving a better model in each iteration of training.
4 Card game AIs
The card game, as a typical imperfect information game, has been a long-standing challenge for artificial intelligence. DeepStack and Libratus are two typical AI systems that defeated professional poker players in HUNL. They share the same basic technique, i.e., counterfactual regret minimization (CFR) [30]. Afterward, researchers focused on Mahjong and DouDiZhu, which raise new challenges for artificial intelligence. Suphx, developed by Microsoft Research Asia, is the first AI system that outperforms most top human players in Mahjong. DouZero, designed for DouDiZhu, is an AI system that was ranked first on the Botzone leaderboard among 344 AI agents. A brief introduction is shown in Fig. 3.
4.1 DeepStack and Libratus for HUNL
HUNL is one of the most popular poker games in the world, and plenty of world-level competitions are held every year, such as the World Series of Poker. Before DeepStack and Libratus came out, HUNL was a primary benchmark and challenge of imperfect information games, with no AIs having defeated professional players.
4.1.1 CFR for DeepStack and Libratus
Since being proposed in 2007, CFR has been introduced into poker games. CFR minimizes counterfactual regret for large extensive games, which can be used to compute a Nash equilibrium. Generally, it decomposes the regret of an extensive game into a set of additive regret terms on information sets that can be minimized independently. Due to the high cost in time and space, basic CFR is not applicable to HUNL, which is much more complex than limit poker. Various improved CFR approaches have been developed, considering improving computing speed or compressing the required storage space [31, 32]. For example, based on CFR, continual re-solving [18], and safe and nested subgame solving [8], are k