DECIDING WHAT TO LEARN IN COMPLEX ENVIRONMENTS

A DISSERTATION SUBMITTED TO THE DEPARTMENT OF COMPUTER SCIENCE AND THE COMMITTEE ON GRADUATE STUDIES OF STANFORD UNIVERSITY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

Dilip Arumugam

© 2024 by Dilip Srimal Arumugam. Re-distributed by Stanford University [...]. This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License. /licens
This dissertation is online at: /fh7

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

This signature page was generated electronically upon submission of this dissertation in electronic format.

Abstract

Reinforcement learning is the paradigm of machine learning dedicated to sequential decision-making problems. Like many other areas of machine learning and statistics, there is often a prevalent concern over data efficiency; that is, how much trial-and-error interaction data does a sequential decision-making agent require in order to learn desired behaviors? One of the key obstacles to data-efficient reinforcement learning is the challenge of exploration, whereby a sequential decision-making agent must balance between gaining new knowledge about an environment and exploiting current knowledge to maximize near-term performance. The traditional literature on balancing exploration and exploitation focuses on environments in which an agent can approach optimal performance within a relevant time frame. However, modern artificial decision-making agents engage with complex environments, such as the World Wide Web, in which there is no hope of approaching optimal performance within any relevant time frame. This dissertation focuses on developing principled, practical approaches for addressing the exploration problem in complex environments. Our methods focus on the simple observation that, rather than endeavor to obtain enough information for achieving optimal behavior, an agent confronted with such a complex environment should instead target a modest corpus of information that, while capable of facilitating behavioral improvement, is itself insufficient to enable near-optimal performance. We design an agent that modulates exploration in this way and provide both a theoretical as well as an empirical analysis of its behavior. In effect, at each time period, this agent decides what to learn so as to strike a desired trade-off between information requirements and performance. As we elucidate in this dissertation, central to the design of such an agent are classic tools from information theory and lossy compression, which not only facilitate principled theoretical guarantees but also remain amenable to practical implementation at scale.

Acknowledgments

First and foremost, I would like to thank my parents, who have always provided me with every opportunity to succeed in life. As two immigrants to this country, they did so through tremendous hard work and sacrifice, for which I will never be able to fully express my gratitude in words.

For the last six years, I have had the honor and privilege of being advised by Benjamin Van Roy. The work outlined in this dissertation would not have been possible without his meticulous insight and seemingly unlimited patience. Ben's profound ability to abstract away the minutiae of a [...] standard that I'll be aspiring to meet for the rest of my research career.

I would like to thank the members of my dissertation committee: Chelsea Finn, Emma Brunskill, [...] always been incredibly generous with her time and career advice. I've certainly felt her presence during my PhD journey while thinking about research ideas that are not only principled but also practical. I'm proud to say that all eight occasions when I have served [...] a breath of fresh air to temporarily leave the narrow conceptual bubble of my research and share my passion for reinforcement learning with eager students. I have appreciated these opportunities
to think more holistically about reinforcement learning as a field and the chance to develop my pedagogical skills, especially as he [...]. As the contributions of this dissertation greatly draw on tools from information theory, I was honored when Tsachy Weissman agreed to be on my dissertation committee. Having taken two information theory courses from him and through numerous conversations, I've always appreciated his clarity of thought and look forward to seeing how our forthcoming research collaborations turn out. I've enjoyed three separate statistics seminars with Art over my tenure here at Stanford and, in each one, he has always been incredibly supportive of my efforts to intertwine reinforcement learning with statistical machinery from the course. The most recent of these efforts will hopefully be the first of many papers we write together on statistical reinforcement learning.

Despite not being my official co-advisor, I'm incredibly grateful to have had Noah Goodman along for my entire PhD journey as an unofficial second advisor. [...] the "probabilistic programming languages guy" on my meeting sheet during PhD admit weekend, but our short 20-minute meeting over shared interests in abstraction and the role of natural language in decision making was probably my favorite of the entire event. While my dissertation work ended up taking me in a different direction, Noah has graciously let me make a second home out of the CoCoLab and I am leaving Stanford a much better researcher for the time I've spent there. I'm so excited to have taken my first (baby) steps into cognitive science research with Noah's support and [...]

My research journey here at Stanford rests upon a solid foundation established at the Brown University CS Department, which I owe largely to Michael Littman and Stefanie Tellex. Michael is an absolute legend and I could not have been luckier to have him as my advisor during both my undergraduate and Master's degrees. My time with Michael instilled a love of reinforcement learning and I'll always be proud to have been one of his students. Even after departing Brown, Michael continues to be an amazing mentor who is always willing to hop on a call and offer words of wisdom whenever I'm feeling stuck. Also, as if being an incredible researcher and mentor wasn't enough, Michael is also one of the funniest people I know and bringing his level of witty lightheartedness to deep and complex research is a feat I will likely never achieve. Stefanie gave me my first opportunity [...] The first H2R lab meeting I attended was the one where I first heard about reinforcement learning and MDPs, very much leading to where I am today. Stefanie's commitment to rigorous research and advancing the state of robotic system capabilities with each paper is something I continue to admire, and the papers we wrote together were the springboard that allowed me to come do a PhD at Stanford. I would also like to thank Eugene Charniak (in memoriam), Tom Doeppner, George Konidaris, Shriram Krishnamurthi, Anna Lysyanskaya, Eli Upfal, and Paul Valiant for being helpful and positive influences on me during my time at Brown.

My research career has been enriched by the wisdom and support of many fantastic [...] and collaborators including Dave Abel, Kavosh Asadi, Pierre-Luc Bacon, Debadeepta Dey, Shi Dong, Nakul Gopalan, Peter Henderson, Mark Ho, Saurabh Kumar, James MacGlashan, Brendan O'Donoghue, Ben Prystawski, Satinder Singh, Lawson Wong, and Wanqiao Xu.

My time at Brown and Stanford would not have been possible without Sidd Karamcheti. Thank you for being a true friend and brother. I'm excited to see what's next for you in both career and life. Sally Hosokawa has always been [...] I'll always be lucky to have you in my life and, as always, I cannot wait for our next adventure. Other Stanford friends who have made the last six years memorable and fun include Kristy Choi, Chris Cundy, Kawin Ethayarajh, Karan Goel, Albert Gu, Peter Henderson, John Hewitt, Ananya Kumar, Minae Kwon, Eric Mitchell, Andy Palan, Suraj Nair, Rafael Rafailov, Shreya Rajpal, Shiori Sagawa, Archit Sharma, Megha Srivastava, Sanjana Srivastava, Alex Ta[...] Xie, and Allan Zhou. Additionally, I could not have made it here without amazing friends from Brown like Charles
Raymond Gleason, Chris Grimm, Ellis Hershkowitz, Adam Hoff, Daniel Keliher, Joyce Kim, Olabade Omole, Harrison Pincket, Eimi Satoh, Beverly Tai, Keshav Vemuri, and Greg Yauney.

Contents

Abstract
Acknowledgments
1 Introduction
  1.1 Data-Efficient Reinforcement Learning
  1.2 Exploration in Complex Environments
  1.3 Organization
  1.4 Contributions
2 Preliminaries
  2.1 Information Theory
  2.2 Rate-Distortion Theory
3 Multi-Armed Bandits
  3.1 Problem Formulation
  3.2 Theoretical Analysis
    3.2.1 Thompson Sampling & Satisficing
    3.2.2 Target Actions
    3.2.3 Rate-Distortion Thompson Sampling
  3.3 Empirical Results
    3.3.1 Blahut-Arimoto Satisficing Thompson Sampling
    3.3.2 Computational Experiments
4 Reinforcement Learning
  4.1 Theoretical Analysis
    4.1.1 Problem Formulation
    4.1.2 The Value Equivalence Principle
    4.1.3 Value-Equivalent Sampling for Reinforcement Learning
  4.2 Empirical Results
    4.2.1 Problem Formulation
    4.2.2 Randomized Value Functions
    4.2.3 Blahut-Arimoto Randomized Value Functions
    4.2.4 Experiments
    4.3.2 Greater Compression via State Abstraction
5 Conclusion
A Multi-Armed Bandit Proofs
B Reinforcement Learning Proofs
  B.1 Proof of Theorem 2
  B.2 Proof of Lemma 8
  B.3 Proof of Theorem 3
  B.4 Proof of Lemma 1
  B.5 Proof of Lemma 2
  B.6 Proof of Theorem 4
  B.7 Proof of Theorem 5
  B.8 Proof of Lemma 3
  B.9 Proof of Theorem 6
C Deep RL Experiment Details
  C.1 Algorithm Implementations
    C.1.1 DQN
    C.1.2 RVF & Blahut-Arimoto RVF
  C.2 Hyperparameter Selection
  C.3 Compute Details

List of Tables

C.1 DQN Hyperparameters
C.2 RVF Hyperparameters

List of Figures

3.1 Cumulative regret curves for Bernoulli and Gaussian bandits with 10 independent arms comparing traditional Thompson Sampling (TS) against Blahut-Arimoto STS (BLASTS), sweeping over the β hyperparameter of the latter
3.2 Rate-distortion curves for target actions computed via BLASTS (Ãt) and STS (Aε) in the first time periods of Bernoulli and Gaussian bandits with 1000 independent arms
3.3 Rate curves for Bernoulli and Gaussian bandits with 10 independent arms comparing traditional Thompson Sampling (TS) against Blahut-Arimoto STS (BLASTS), sweeping over the β hyperparameter of the latter
3.4 Cumulative discounted regret values comparing TS, BLASTS, and Gittins' indices for the 2-arm Bernoulli bandit across three discount factors
3.5 Cumulative discounted regret values comparing TS, BLASTS, and Gittins' indices for the 10-arm Bernoulli bandit across three discount factors
3.6 Cumulative discounted regret values comparing TS and BLASTS for the 20-arm logistic bandit with dimension d = 3 across three discount factors
4.1 (Top) MiniGrid environments used in our empirical evaluation of Blahut-Arimoto RVF. An observation is a partial image of the whole grid indicated by the shaded region. Black tiles represent empty squares, gray tiles represent [...], [...] tiles represent goal states. The [...] terminates when the agent [...]. (Bottom) Learning curves of DQN, RVF, and Blahut-Arimoto RVF
4.2 The RiverSwim MDP of Strehl and Littman [2008] as studied by Osband et al. [2013]
4.3 Learning curves for BA-RVF varying β values in the Confluence Swim environment
1 Introduction

1.1 Data-Efficient Reinforcement Learning

Reinforcement learning (RL) [Sutton and Barto, 1998, Kaelbling et al., 1996] is the paradigm of machine learning dedicated to sequential decision-making problems. Like many other areas of machine learning and statistics, there is often a prevalent concern over data efficiency; that is, how much trial-and-error interaction data does a sequential decision-making agent require in order to learn desired behaviors? Unlike any other paradigm of machine learning, however, data efficiency in RL requires tackling three fundamental challenges simultaneously:

1. Exploration: judiciously prioritizing what data is collected from the environment to improve long-term performance.

2. Generalization: robustly distilling transferable information from sampled data that extends [...]

3. Credit Assignment: accurately relating long sequences of per-step decisions to temporally [...]

Unlike traditional supervised learning, where a learner is provided with a fixed, static dataset, an RL agent adaptively collects data through its interactions with the environment. Broadly speaking, the exploration challenge boils down to a binary choice faced by a sequential decision-maker: either gain new knowledge about the world or exploit current knowledge in an effort to maximize immediate performance. While the challenges posed by generalization and credit assignment are often significant, this dissertation will maintain an exclusive focus on tackling exploration in RL. Nevertheless, our solution concepts are designed in such a way that future work may find it fruitful to combine these ideas with methods for handling the other two challenges into a more holistic, data-efficient RL agent.

1.2 Exploration in Complex Environments

Agents that learn to identify optimal behaviors represent the predominant focus of the sequential decision-making literature. Indeed, there is a long history of reinforcement-learning algorithms that guide exploratory decisions with the goal of learning optimal behavior. However, learning is a process of acquiring information, and so anything that an agent may endeavor to learn requires obtaining a precise amount of information from its interactions with the environment; naturally, as measured by this requisite information, some things are easier to learn than others. When interacting with a complex environment, identifying an optimal policy may present an exceedingly difficult challenge, as there is simply too much to learn within any reasonable time frame. Consequently, a bounded agent is forced to prioritize. A simple approach is to designate a learning target, which can be thought of as a corpus of information that, while insufficient to yield optimal performance within the environment, suffices to guide effective decisions and facilitate behavioral improvement. Then, the agent can reorient its exploration to prioritize gathering of information about this learning target in lieu of optimal behavior.

Rather than place the onus upon the agent designer to fabricate a learning target for the agent, each agent considered in this dissertation is designed to select its own learning target in an automated, data-driven fashion. This shifts the agent designer's role from specifying one to endowing the agent with the ability to designate and to suitably adapt the target as learning progresses. The designer can specify the general form of this learning target as part of the scaffold for a learning algorithm. More traditional, fixed-target learning algorithms can then be repurposed as subroutines that the agent may use to achieve its own goals. We introduce in this dissertation a line of work, spanning multi-armed bandit problems all the way to deep reinforcement learning, to address a fundamental question: How should an agent interacting with a complex environment decide what to learn?

1.3 Organization

We begin this dissertation by providing an introduction to information theory [Shannon, 1948, Cover and Thomas, 2012], the field dedicated to the study of communication and compression, which offers a suite of tools paramount to the presented work. It is perhaps intuitive that the ability to measure and quantify
information is useful in the context of exploration, when an agent must be able to distinguish between what is known and what is novel. For dealing with exploration in complex environments specifically, we find that tools from rate-distortion theory geared towards lossy compression provide a crisp intuition and mathematically-precise language for balancing the simplicity of what an agent aspires to learn against its utility.

In Chapter 3, we restrict our focus to complex bandit environments, a quintessential setting for studying the exploration challenge in isolation without the additional burdens of generalization and credit assignment. We introduce two bandit algorithms that extend classic Thompson Sampling but can be parameterized to account for complex environments, where the default approach of exploring to learn an optimal action is demonstrably ineffective. We provide a theoretical analysis with information-theoretic Bayesian regret bounds that generalize existing results through the use of rate-distortion theory. Additionally, we offer complementary empirical results which verify the efficacy of our agent design in retaining the scalability of Thompson Sampling while more closely approximating the performance of the Bayes-optimal policy.

In Chapter 4, we extend the insights developed for exploration in complex bandit environments to the full reinforcement-learning problem. We begin by introducing a novel extension of the Posterior Sampling for Reinforcement Learning algorithm, which can be coupled with rate-distortion theory to orient exploration around lossy surrogate models that eschew details of the true environment in order to improve data efficiency at the cost of near-optimality. We provide an information-theoretic regret analysis for our algorithm, paralleling the theoretical guarantees established in the multi-armed bandit setting, before turning our attention to the design of a scalable deep reinforcement-learning agent. By amortizing multiple lossy compression problems across the state space, our agent leverages non-linear function approximation while being able to induce a broad spectrum of behaviors achieving varying degrees of near-optimality.
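The algorithms of Chapter 3 extend classic Thompson Sampling, and the posterior-sampling approach of Chapter 4 is its reinforcement-learning analogue. As a concrete reference point, here is a minimal sketch of that baseline for a Bernoulli bandit; the Beta(1, 1) priors, arm means, and horizon are illustrative assumptions made only for this sketch and do not reflect the experimental settings studied later.

```python
import numpy as np

def thompson_sampling_bernoulli(true_means, horizon, rng=None):
    """Classic Thompson Sampling for a Bernoulli bandit with independent
    Beta(1, 1) priors over each arm's mean reward."""
    rng = np.random.default_rng() if rng is None else rng
    n_arms = len(true_means)
    # Beta posterior parameters per arm: 1 + successes, 1 + failures.
    alpha = np.ones(n_arms)
    beta = np.ones(n_arms)
    best_mean = max(true_means)
    cumulative_regret = 0.0
    regret_curve = []
    for _ in range(horizon):
        # Sample one plausible environment from the posterior and act greedily in it.
        sampled_means = rng.beta(alpha, beta)
        arm = int(np.argmax(sampled_means))
        reward = float(rng.random() < true_means[arm])
        # Conjugate posterior update for the pulled arm.
        alpha[arm] += reward
        beta[arm] += 1.0 - reward
        cumulative_regret += best_mean - true_means[arm]
        regret_curve.append(cumulative_regret)
    return regret_curve

if __name__ == "__main__":
    # Example usage with an arbitrary 10-arm Bernoulli bandit.
    means = np.linspace(0.1, 0.9, 10)
    print(thompson_sampling_bernoulli(means, horizon=1000)[-1])
```

The satisficing variants developed later retain this posterior-sampling structure but redirect it toward a learning target computed via rate-distortion theory rather than toward the optimal arm itself.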
The chapters of this dissertation are based on the following publications:

1. Chapter 3.2: Dilip Arumugam, Mark K. Ho, Noah D. Goodman, and Benjamin Van Roy. "Bayesian Reinforcement Learning with Limited Cognitive Load." Open Mind 8 (2024).
2. Chapter 3.3: Dilip Arumugam and Benjamin Van Roy. "Deciding What to Learn: A Rate-Distortion Approach." In International Conference on Machine Learning, pp. 373-382. PMLR, 2021.
3. Chapter 4.1: Dilip Arumugam and Benjamin Van Roy. "Deciding What to Model: Value-Equivalent Sampling for Reinforcement Learning." Advances in Neural Information Processing Systems [...].
4. Chapter 4.2: Dilip Arumugam, Saurabh Kumar, Ramki Gummadi, and Benjamin Van Roy. "Satisficing Exploration for Deep Reinforcement Learning." Under review (2024).

Other publications I have worked on during my PhD which have not been covered within this dissertation include:

1. David Abel, Dilip Arumugam, Kavosh Asadi, Yuu Jinnai, Michael L. Littman, and Lawson L.S. Wong. "State Abstraction as Compression in Apprenticeship Learning." In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 01, pp. 3134-3142. 2019.
2. David Abel, Nate Umbanhowar, Khimya Khetarpal, Dilip Arumugam, Doina Precup, and Michael Littman. "Value Preserving State-Action Abstractions." In International Conference on Artificial Intelligence and Statistics, pp. 1639-1650. PMLR, 2020.
3. Dilip Arumugam and Benjamin Van Roy. "Randomized Value Functions via Posterior State-Abstraction Sampling." NeurIPS Workshop on Biological & Artificial Intelligence (2020).
4. Dilip Arumugam, Peter Henderson, and Pierre-Luc Bacon. "An Information-Theoretic Perspective on Credit Assignment in Reinforcement Learning." NeurIPS Workshop on Biological & Artificial Intelligence (2020).
5. Aidan Curtis, Minjian Xin, Dilip Arumugam, Kevin Feigelis, and Daniel Yamins. "Flexible and Efficient Long-Range Planning Through Curious Exploration." In International Conference on Machine Learning, pp. 2238-2249. PMLR, 2020.
6. David Abel, Cameron Allen, Dilip Arumugam, D. Ellis Hershkowitz, Michael L. Littman, and Lawson L.S. Wong. "Bad-Policy Density: A Measure of Reinforcement Learning Hardness." ICML Workshop on Reinforcement Learning Theory (2021).
7. Rose E. Wang, Jesse Mu, Dilip Arumugam, Natasha Jaques, and Noah Goodman. "In the ZONE: Measuring Difficulty and Progression in Curriculum Generation." NeurIPS Deep Reinforcement Learning Workshop, 2022.
8. Dilip Arumugam and Satinder Singh. "Planning to the Information Horizon of BAMDPs via Epistemic State Abstraction." Advances in Neural Information Processing Systems 35 (2022).
9. Ben Prystawski, Dilip Arumugam, and Noah Goodman. "Cultural Reinforcement Learning: A Framework for Modeling Cumulative Culture on a Limited Channel." In Proceedings of the [...]
10. Jan-Philipp Fränken, Sam Kwok, Peixuan Ye, Kanishk Gandhi, Dilip Arumugam, Jared Moore, Alex Tamkin, Tobias Gerstenberg, and Noah D. Goodman. "Social Contract AI: Aligning AI Assistants with Implicit Group Norms." NeurIPS Workshop on Socially Responsible [...]

1.4 Contributions

We summarize the essential contributions of this dissertation as follows:

• We challenge the default assumption in the sequential decision-making literature that exploration is always conducted with respect to optimal behavior. We highlight how complex environments that encapsulate real-world decision-making systems cannot always presume that optimal behavior is a tractable learning goal.

• We demonstrate how tools from rate-distortion theory for lossy compression problems can be brought to bear on the more general exploration problem within complex environments. The fundamental rate-distortion function gives a mathematically-precise characterization for how an agent should attempt to balance the simplicity of what it aims to learn against the utility of that information with respect to the task at hand (a minimal illustration of the underlying computation follows this list).

• We elucidate how to bridge the learning targets computed via rate-distortion theory with the action selections made by an agent over time. Our theoretical analyses clarify how these ideas from information theory yield principled, statistically-efficient decision-making agents for both multi-armed bandits and reinforcement-learning problems.

• We provide an empirical analysis which illustrates that these principled ideas from information theory can also give rise to practical, scalable agents that remain amenable to implementation in real-world decision-making systems.
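To make the rate-distortion computation referenced in the second bullet concrete, the following is a minimal sketch of the classic Blahut-Arimoto iterations for tracing out a rate-distortion trade-off over a discrete source. The uniform source, squared-error distortion, and β values are placeholder assumptions chosen purely for illustration; the agents in later chapters couple such iterations with posterior beliefs and task-relevant distortion measures instead.

```python
import numpy as np

def blahut_arimoto(p_x, distortion, beta, n_iters=200):
    """Classic Blahut-Arimoto iterations for the rate-distortion trade-off.

    p_x:        source distribution over symbols x (shape [n_x]).
    distortion: matrix d(x, x_hat) (shape [n_x, n_xhat]).
    beta:       Lagrange multiplier trading off rate against distortion.
    Returns (rate in nats, expected distortion).
    """
    n_x, n_xhat = distortion.shape
    q_xhat = np.full(n_xhat, 1.0 / n_xhat)  # marginal over reconstructions
    for _ in range(n_iters):
        # Conditional q(x_hat | x) proportional to q(x_hat) * exp(-beta * d(x, x_hat)).
        log_cond = np.log(q_xhat)[None, :] - beta * distortion
        log_cond -= log_cond.max(axis=1, keepdims=True)  # numerical stability
        cond = np.exp(log_cond)
        cond /= cond.sum(axis=1, keepdims=True)
        # Marginal update: q(x_hat) = sum_x p(x) q(x_hat | x); clip to avoid log(0).
        q_xhat = np.clip(p_x @ cond, 1e-300, None)
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(cond > 0, cond / q_xhat[None, :], 1.0)
        rate = float(np.sum(p_x[:, None] * cond * np.log(ratio)))
    expected_distortion = float(np.sum(p_x[:, None] * cond * distortion))
    return rate, expected_distortion

if __name__ == "__main__":
    # Example: uniform source over 8 symbols with squared-error distortion.
    xs = np.arange(8, dtype=float)
    d = (xs[:, None] - xs[None, :]) ** 2
    for beta in [0.1, 1.0, 10.0]:
        print(beta, blahut_arimoto(np.full(8, 1 / 8), d, beta))
```

Sweeping β traces out the trade-off between rate (how much must be learned) and distortion (how much utility is conceded), which is precisely the simplicity-versus-utility balance described in the bullets above.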
2 Preliminaries

In this section, we outline our notation as well as provide brief background on information theory and rate-distortion theory. All random variables are defined on a probability space (Ω, F, P). For any random variable X: Ω → X taking values on the measurable space (X, 𝒳), we use σ(X) ≜ {X^{-1}(A) | A ∈ 𝒳} ⊆ F to denote the σ-algebra generated by X. For any natural number N ∈ ℕ, we denote the index set as [N] ≜ {1, 2, ..., N}. For any arbitrary set X, Δ(X) denotes the set of all probability distributions with support on X. For any two arbitrary sets X and Y, we denote the class of all functions mapping from X to Y as {X → Y} ≜ {f | f: X → Y}. While our exposition throughout this dissertation will consistently refer to bits of information, it will be useful for the purposes of analysis that all logarithms be in base e.

2.1 Information Theory

Here we introduce various concepts in probability theory and information theory used throughout this dissertation. We encourage readers to consult [Cover and Thomas, 2012, Gray, 2011, Duchi, 2021, Polyanskiy and Wu, 2022] for more background. We define the mutual information between any two random variables X, Y through the Kullback-Leibler (KL) divergence:

I(X; Y) = D_KL(P((X, Y) ∈ ·) || P(X ∈ ·) × P(Y ∈ ·)),

where, for probability measures P and Q with P absolutely continuous with respect to Q, D_KL(P || Q) = ∫ log(dP/dQ) dP and dP/dQ denotes the Radon-Nikodym derivative of P with respect to Q. An analogous definition of conditional mutual information holds through the expected KL-divergence for any three random variables X, Y, Z:

I(X; Y | Z) = E[D_KL(P((X, Y) ∈ · | Z) || P(X ∈ · | Z) × P(Y ∈ · | Z))].

With these definitions in hand, we may define the entropy and conditional entropy for any two random variables X, Y as

H(X) = I(X; X),    H(Y | X) = H(Y) − I(X; Y).

This yields the following identities for mutual information and conditional mutual information for any three arbitrary random variables X, Y, and Z:

I(X; Y) = H(X) − H(X | Y) = H(Y) − H(Y | X),
I(X; Y | Z) = H(X | Z) − H(X | Y, Z).
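As a quick numerical illustration of these definitions (in nats, matching the base-e convention above), the following sketch computes entropy and mutual information for a small discrete joint distribution and checks the identities stated above; the particular joint pmf is an arbitrary example, not one drawn from the dissertation.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats of a discrete distribution p."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def mutual_information(p_xy):
    """Mutual information I(X; Y) in nats from a joint pmf p(x, y), computed as
    the KL divergence between the joint and the product of its marginals."""
    p_xy = np.asarray(p_xy, dtype=float)
    p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)
    mask = p_xy > 0
    return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / np.outer(p_x, p_y)[mask])))

if __name__ == "__main__":
    # A small, arbitrary joint distribution over (X, Y).
    p_xy = np.array([[0.30, 0.10],
                     [0.05, 0.25],
                     [0.10, 0.20]])
    h_x = entropy(p_xy.sum(axis=1))
    h_y = entropy(p_xy.sum(axis=0))
    h_xy = entropy(p_xy.ravel())   # joint entropy H(X, Y)
    i_xy = mutual_information(p_xy)
    # Verify I(X; Y) = H(X) - H(X | Y) = H(Y) - H(Y | X),
    # using H(X | Y) = H(X, Y) - H(Y) and H(Y | X) = H(X, Y) - H(X).
    assert np.isclose(i_xy, h_x - (h_xy - h_y))
    assert np.isclose(i_xy, h_y - (h_xy - h_x))
    print(i_xy)
```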
