
Towards Data-Efficient Deep Learning with Meta-Learning and Symmetries

Jin Xu

Balliol College

University of Oxford

A thesis submitted for the degree of Doctor of Philosophy in Statistics

Trinity 2023


Acknowledgements

First and foremost, I want to express my deep gratitude to my supervisors, Prof. Yee Whye Teh and Dr. Tom Rainforth. Their unwavering support, careful guidance, and constant inspiration have been invaluable throughout my PhD journey. It has been a privilege to be mentored by them, whom I regard as research role models. Their depth and breadth of knowledge have been both humbling and enlightening. Special acknowledgement goes to Yee Whye, who has always been considerate and ready to help in tough times. My heartfelt thanks go to Tom for his guidance during the challenging times brought on by the pandemic.

I would like to extend my gratitude to all my collaborators: Hyunjik Kim, Jean-Francois Ton, Adam Kosiorek, Emilien Dupont, and Kaspar Märtens. Their expertise and feedback have been crucial in improving my work, and I learned a great deal from them. A big thank you to Prof. Ryan Adams from Princeton University and to my internship hosts, James Hensman and Max Croci at Microsoft Research. Their mentorship outside of my PhD life has been an indispensable part of my research experience.

Moreover, I feel extremely fortunate to be surrounded by amazing and caring friends, whose names I cannot possibly enumerate in full here. Among them are Emilien Dupont, Jean-Francois Ton, Charline Le Lan, Bobby He, Sheheryar Zaidi, Qinyi Zhang, Guneet Dhillon, Andrew Campbell, Chris Williams, Carlo Alfano, Faaiz Taufiq, Anna Menacher and others from our lovely office 1.17; Hanwen Xing, Yanzhao Yang, Ning Miao, Chao Zhang, Yutong Lu, Yixuan He, Xi Lin, Yuan Zhou, Fan Wu, Bohao Yao from the Department of Statistics; Dunhong Jin, Sihan Zhou, Sijia Yao, Huining Yang, Kevin Wang, Natalia Hong, Hang Yuan, Kangning Zhang, Chengyang Wang and many others from other departments at Oxford; Deniz Oktay, Sulin Liu, Jenny Zhan and others from Princeton University; and internship peers at Microsoft Research, including Alexander Meulemans and Saleh Ashkboos from ETH.

A special thanks to all university and department staff, especially Chris Cullen for his kind and patient support during difficult times, and to Joanna Stoneham, Stuart McRobert, and others who ensured a smooth PhD experience.

Finally, above all, my deepest thanks go to Yifan Yu for her love and companionship. She immensely enriched my time in Oxford, bringing colour and joy to my life. Additionally, I am eternally grateful to my parents Chengxiang Xu and Feng Chen for giving me the freedom to pursue my passions and for their unquestioning support throughout this journey.


Abstract

Recent advances in deep learning have been significantly propelled by the increasing availability of data and computational resources. While the abundance of data enables models to perform well in certain domains, there are real-world applications, such as in the medical field, where the data is scarce or difficult to collect. Furthermore, there are also scenarios where a large dataset is better viewed as lots of related small datasets, and the data becomes insufficient for the task associated with one of the small datasets. It is also noteworthy that human intelligence often requires only a handful of examples to perform well on new tasks, emphasizing the importance of designing data-efficient AI systems. This thesis delves into two strategies to address this challenge: meta-learning and symmetries. Meta-learning approaches the data-rich environment as a collection of many small, individual datasets. Each of these small datasets represents a distinct task, yet there is underlying shared knowledge between them. Harnessing this shared knowledge allows for the design of learning algorithms that can efficiently address new tasks within similar domains. In comparison, symmetry is a form of direct prior knowledge. By ensuring that models' predictions remain consistent despite any transformation to their inputs, these models enjoy better sample efficiency and generalization.

In the subsequent chapters, we present novel techniques and models which all aim at improving the data efficiency of deep learning systems. Firstly, we demonstrate the success of encoder-decoder style meta-learning methods based on Conditional Neural Processes (CNPs). Secondly, we introduce a new class of expressive meta-learned stochastic process models which are constructed by stacking sequences of neural parameterised Markov transition operators in function space. Finally, we propose group equivariant subsampling/upsampling layers which tackle the loss of equivariance in conventional subsampling/upsampling layers. These layers can be used to construct end-to-end equivariant models with improved data efficiency.


Contents

1 Introduction
    1.1 Motivation
    1.2 Thesis outline
    1.3 Papers

2 Background
    2.1 Meta-learning
        2.1.1 Conventional supervised learning and meta-learning
        2.1.2 Different views of meta-learning
        2.1.3 Common approaches to meta-learning
    2.2 Neural processes
        2.2.1 Stochastic processes
        2.2.2 Neural processes as stochastic processes
        2.2.3 Neural process training objectives
        2.2.4 A meta-learning perspective
    2.3 Symmetries in deep learning
        2.3.1 Group, coset and quotient space
        2.3.2 Group homomorphism, group actions and group equivariance
        2.3.3 Homogeneous spaces and lifting feature maps
        2.3.4 Feature maps in G-CNNs
        2.3.5 Group equivariant neural networks

3 MetaFun: Meta-Learning with Iterative Functional Updates
    3.1 Introduction
    3.2 MetaFun
        3.2.1 Learning functional task representation
        3.2.2 MetaFun for regression and classification
    3.3 Related work
    3.4 Experiments
        3.4.1 1-D function regression
        3.4.2 Classification: miniImageNet and tieredImageNet
        3.4.3 Ablation study
    3.5 Conclusions and future work
    3.6 Supplementary materials
        3.6.1 Functional gradient descent
            Reproducing kernel Hilbert space
            Functional gradients
            Functional gradient descent
        3.6.2 Experimental details

4 Deep Stochastic Processes via Functional Markov Transition Operators
    4.1 Introduction
    4.2 Background
    4.3 Markov neural processes
        4.3.1 A more general form of Neural Process density functions
        4.3.2 Markov chains in function space
        4.3.3 Parameterisation, inference and training
    4.4 Related work
    4.5 Experiments
        4.5.1 1D function regression
        4.5.2 Contextual bandits
        4.5.3 Geological inference
    4.6 Discussion
    4.7 Supplementary materials
        4.7.1 Proofs
        4.7.2 Implementation details
        4.7.3 Data
            Model architectures and hyperparameters
            Computational costs and resources
        4.7.4 Broader impacts

5 Group Equivariant Subsampling
    5.1 Introduction
    5.2 Equivariant subsampling and upsampling
        5.2.1 Translation equivariant subsampling for CNNs
        5.2.2 Group equivariant subsampling and upsampling
        5.2.3 Constructing Φ
    5.3 Application: Group equivariant autoencoders
    5.4 Related work
    5.5 Experiments
        5.5.1 Basic properties: Equivariance, disentanglement and out-of-distribution generalization
        5.5.2 Single object
        5.5.3 Multiple objects
    5.6 Conclusions, limitations and future work
    5.7 Supplementary materials
        5.7.1 Equivariant subsampling and upsampling
            Constructing Φ
            Multiple subsampling layers
        5.7.2 Group equivariant autoencoders
        5.7.3 Proofs
        5.7.4 Implementation details
            Data
            Model architectures
            Hyperparameters
            Computational resources

6 Conclusions and Future Outlook

Bibliography


Chapter 1

Introduction

1.1 Motivation

Recent breakthroughs in deep learning can be largely attributed to the vast amount of data available and the advancement of computational resources [Deng et al., 2009, Raina et al., 2009, Silver et al., 2016, Jumper et al., 2021, Brown et al., 2020a]. While training on large datasets enables deep learning models to excel in certain tasks, many real-world applications only provide limited data for a specific task. For instance, in medical fields, obtaining data, especially for rare diseases, is challenging and often expensive. In drug development or recommendation systems, there will always be insufficient data for new drugs/users, even though abundant data exists for other drugs or users. Therefore, to apply deep learning to these fields, it is vital to develop systems that are data-efficient. Moreover, for advanced AI systems, data-efficiency can be a crucial ingredient: Firstly, AI systems should be able to generalize beyond specific data distributions without relying on data; for instance, an image recognition system should recognize objects regardless of their position or orientation. Secondly, human intelligence can often solve new tasks with just a few examples. Thus, for AI to emulate human-like intelligence, it should also have such capability.

From a Bayesian perspective, learning involves updating our beliefs about a model (represented by $\theta$) given the data, i.e. $p(\theta \mid \mathcal{D}_{\text{data}})$. For a model to learn efficiently from a small amount of data, it is important to start with a good initial guess or "prior" $p(\theta)$. In this thesis, we look at two directions to obtain such a prior for data-efficient learning: The first is meta-learning, which learns the prior (or the shared knowledge) from similar tasks. It can be understood as "learning to learn more efficiently". The second is symmetries in deep learning, which serve as a known prior for certain problems. Symmetry, a fundamental concept in physics, represents a form of prior knowledge that is ubiquitously observed throughout our physical world.

Meta-learning tackles a specific scenario in which the vast pool of data can be viewed as many small datasets, each representing a distinct task. Yet, these tasks contain underlying shared knowledge that can be harnessed to address new tasks within the same category. This scenario is prevalent in many applications. Take, for instance, an online retail company with data from customers worldwide. The data associated with each user is typically sparse. In this context, predicting behaviours for each user constitutes an individual task, but patterns among different users often exhibit similarities. Meta-learning algorithms are designed to handle such circumstances. The goal of meta-learning is to learn data-efficient learning algorithms that can later be applied to a particular task. The training data for meta-learning comprises numerous related tasks, each with a limited set of data points. After the meta-learning phase, the learned learning algorithms can solve a new task in a data-efficient manner. In contrast, the aim of conventional supervised learning is just to learn a predictive model.

Meta-learning problems can be tackled from various perspectives, and these approaches can be understood through different viewpoints such as optimization-based approaches [Ravi and Larochelle, 2016, Finn et al., 2017a], metric-based approaches [Koch, 2015, Vinyals et al., 2016, Sung et al., 2018, Snell et al., 2017], and model-based approaches [Santoro et al., 2016, Mishra et al., 2018, Garnelo et al., 2018a], among others. Note that these views are not exclusive. For example, methods such as Prototypical Networks [Snell et al., 2017], MAML [Finn et al., 2017a], ML-PIP [Gordon et al., 2018], etc. can be reformulated under a model-based framework that uses an encoder-decoder setup. In this setup, the encoder produces a task representation using training data, and the decoder then makes predictions based on the task representation. These approaches transform the meta-learning challenge to resemble a regular learning problem involving sequences, and it is also more computationally efficient if no gradient computation is involved in either the encoder or the decoder, as in CNP-type models [Garnelo et al., 2018a]. Our study in Chapter 3 explicitly adopts this encoder-decoder framework for meta-learning. By using a functional task representation, and iteratively updating the representation directly in function space, we demonstrate that encoder-decoder approaches without gradient information can also be competitive with other approaches, which had not been shown before.
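To make the encoder-decoder setup concrete, here is a minimal CNP-style sketch (a hypothetical Python/NumPy illustration with untrained random weights, not the MetaFun architecture of Chapter 3): the encoder averages per-point embeddings of the context into a task representation, and the decoder conditions its predictions on that representation, with no gradient computation at prediction time.

```python
# Untrained forward pass of a CNP-style encoder-decoder meta-learner:
# encode (x_i, y_i) context pairs, average them into a task representation r,
# then decode predictions for target inputs conditioned on r.
import numpy as np

rng = np.random.default_rng(0)
W_enc = rng.normal(size=(2, 16)) * 0.5          # embeds a (x, y) context pair
W_dec = rng.normal(size=(17, 1)) * 0.5          # maps [x, r] to a prediction

def encoder(x_ctx, y_ctx):
    pairs = np.stack([x_ctx, y_ctx], axis=1)    # (m, 2) context pairs
    emb = np.tanh(pairs @ W_enc)                # per-point embeddings, (m, 16)
    return emb.mean(axis=0)                     # permutation-invariant task rep r

def decoder(x_tgt, r):
    inp = np.concatenate([x_tgt[:, None], np.tile(r, (len(x_tgt), 1))], axis=1)
    return (np.tanh(inp) @ W_dec).squeeze(-1)   # one prediction per target x

x_ctx, y_ctx = np.array([0.1, 0.5, 0.9]), np.array([1.0, 0.0, -1.0])
r = encoder(x_ctx, y_ctx)                       # no gradient steps at test time
print(decoder(np.array([0.2, 0.7]), r))
```

In a real model the encoder and decoder would be trained networks; the point of the sketch is that adaptation to a new task is a single forward pass through the encoder.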

Furthermore, because training data for each task in meta-learning is often limited, uncertainty estimation becomes crucial. Stochastic Processes (SPs) (e.g. Gaussian Processes (GPs)) can be used to make predictions with uncertainty estimation. Thus, learning these processes can be seen as a way to approach meta-learning with uncertainty in mind. In Chapter 4, we propose a new framework to construct expressive neural parameterised SPs by parameterising Markov transitions in function space.

Unlike meta-learning above, which discovers shared knowledge from related tasks, symmetry serves as a direct form of prior or inductive bias, integrated into deep learning models without the need for pre-training. Symmetries refer to transformations that leave certain properties of an object of interest unchanged. These include transformations such as image translation, rotation, or permutation of set elements. By incorporating these symmetries into deep learning models, ensuring that the outputs remain consistent (the same, or undergoing the corresponding transformation) despite input transformations, the model inherently generalizes to transformed inputs. Consequently, deep learning models equipped with these symmetries not only become more data-efficient but also generalize better. A simple example of this is Convolutional Neural Networks (CNNs), which are invariant to input translations for classification tasks, and perform significantly better compared to plain feed-forward networks. Earlier research has introduced many methods to build convolutional [Cohen and Welling, 2016, 2017, Cohen et al., 2019] and attention blocks [Hutchinson et al., 2021, Fuchs et al., 2020] that are equivariant w.r.t. various symmetries. However, the pooling layers or subsampling/upsampling layers commonly used in various deep learning architectures break these symmetries [Zhang, 2019]. In Chapter 5, we present group equivariant subsampling/upsampling layers that have exact equivariance.
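The failure of equivariance under subsampling can be seen in a few lines. The following toy NumPy sketch (an illustrative assumption, not code from the thesis) compares translating-then-subsampling with subsampling-then-translating a 1D signal:

```python
# Strided subsampling breaks translation equivariance: shifting the input by
# one position and then subsampling is not the same as subsampling and then
# shifting the output.
import numpy as np

def subsample(x, stride=2):
    """Keep every `stride`-th element, as a strided conv/pooling layer would."""
    return x[::stride]

def shift(x, s=1):
    """Circularly translate a 1D signal by s positions."""
    return np.roll(x, s)

x = np.arange(8)                     # a toy 1D feature map: [0, 1, ..., 7]
a = subsample(shift(x, 1))           # translate, then subsample -> [7 1 3 5]
b = shift(subsample(x), 1)           # subsample, then translate -> [6 0 2 4]
print(a, b, np.array_equal(a, b))    # False: equivariance is lost
```

The two results differ, and this broken symmetry is precisely what the equivariant subsampling/upsampling layers of Chapter 5 are designed to repair.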

1.2 Thesis outline

In Chapter 2, we provide a short introduction to meta-learning, neural processes and symmetries in deep learning, to set the stage for later chapters.

In Chapter 3, we introduce an iterative functional encoder-decoder method for supervised meta-learning, which is based on Neural Processes (NPs) [Garnelo et al., 2018a,b]. On standard few-shot classification benchmarks like miniImageNet and tieredImageNet, it is demonstrated that meta-learning methods based on the neural process family can be competitive with or even outperform gradient-based methods such as MAML [Finn et al., 2017a] and LEO [Rusu et al., 2019].

In Chapter 4, we introduce Markov Neural Processes (MNPs), a new class of Stochastic Processes (SPs) which are constructed by stacking sequences of neural parameterised Markov transition operators in function space. The proposed iterative construction adds substantial flexibility and expressivity to the original framework of Neural Processes (NPs) without compromising consistency or adding restrictions. Our experiments demonstrate clear advantages of MNPs over baseline models on a variety of tasks. It is noteworthy that SP models can be viewed through a meta-learning lens, so the proposed method can also be seen as a meta-learning approach with principled uncertainty estimation.

In Chapter 5, we first introduce translation equivariant subsampling/upsampling layers that can be used to construct exactly translation equivariant CNNs. We then generalise these layers beyond translations to general groups, thus proposing group equivariant subsampling/upsampling. We use these layers to construct group equivariant autoencoders (GAEs) that allow us to learn low-dimensional equivariant representations. We empirically verify on images that the representations are indeed equivariant to input translations and rotations, and thus generalise well to unseen positions and orientations. We further use GAEs in models that learn object-centric representations on multi-object datasets, and show improved data efficiency and decomposition compared to non-equivariant baselines.

In Chapter 6, we summarize our findings and explore potential avenues for future research to further advance the field.

1.3 Papers

This is an integrated thesis and includes the following published papers:

Chapter 3 contains:

Xu, J., Ton, J.F., Kim, H., Kosiorek, A., & Teh, Y. W. (2020). MetaFun: Meta-Learning with Iterative Functional Updates. International Conference on Machine Learning (ICML), 2020 [Xu et al., 2020]

Chapter 4 contains:

Xu, J., Dupont, E., Märtens, K., Rainforth, T., & Teh, Y. W. (2023). Deep Stochastic Processes via Functional Markov Transition Operators. Advances in Neural Information Processing Systems (NeurIPS), 2023 [Xu et al., 2023]

Chapter 5 contains:

Xu, J., Kim, H., Rainforth, T., & Teh, Y. W. (2021). Group Equivariant Subsampling. Advances in Neural Information Processing Systems (NeurIPS), 2021 [Xu et al., 2021]


Chapter 2

Background

2.1 Meta-learning

2.1.1 Conventional supervised learning and meta-learning

In conventional supervised learning, the objective is to learn a function $f$ that maps an input feature vector $x \in \mathcal{X}$ to an output label $y \in \mathcal{Y}$. Learning is based on example input-output pairs in a training set $\mathcal{D}_{\text{train}} = \{(x_i, y_i)\}_{i=1}^{m}$. Common types of supervised learning tasks include regression, where output labels are real-valued, and classification, where the output labels represent different classes. The function $f$, often referred to as the predictive model, is a member of a hypothesis class $\mathcal{H} := \{f \mid f(x; \phi),\ \phi \in \mathbb{R}^{d_\phi}\}$.

For each task, there is a risk function $\ell(y, f(x))$ which measures prediction error. As an example, in the context of a regression task, $\ell$ often takes the form of a squared error, $\ell(y, f(x)) = (y - f(x))^2$. The training process of the model $f$ translates to solving an optimization problem defined as follows:

$$\phi^{\star} = \arg\min_{\phi} \frac{1}{m} \sum_{i=1}^{m} \ell\big(y_i, f(x_i; \phi)\big) \tag{2.1}$$

It is called empirical risk minimization because this objective is an estimate of the population risk $\mathbb{E}_{(x_i, y_i) \sim p(x, y)}[\ell(y_i, f(x_i))]$ based on the empirical distribution of the training data.
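As a concrete illustration of Eq. (2.1), the following minimal sketch (a hypothetical NumPy toy example, not from the thesis) minimizes the empirical risk of a linear model under the squared error by gradient descent:

```python
# Empirical risk minimization sketch: fit f(x; phi) = phi^T x by gradient
# descent on the average squared error over the training set.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                   # training inputs x_i
true_phi = np.array([1.0, -2.0, 0.5])
y = X @ true_phi + 0.1 * rng.normal(size=100)   # noisy labels y_i

phi = np.zeros(3)
for _ in range(500):
    residual = X @ phi - y                      # f(x_i; phi) - y_i
    grad = 2 * X.T @ residual / len(y)          # gradient of the empirical risk
    phi -= 0.1 * grad                           # gradient descent step

print(phi)                                      # close to true_phi
```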


After training, the model should generalize effectively when presented with a test set, denoted as $\mathcal{D}_{\text{test}} = \{(x_i, y_i)\}_{i=m+1}^{n}$. The model's performance can be assessed using the test risk $\hat{\mathcal{R}}(f; \mathcal{D}_{\text{test}})$, which serves as an estimate of the overall population risk using unseen data.

Figure 2.1: Data for a meta-classification problem. Both the meta-training and meta-test sets consist of tasks (red rectangles) and are presumed to come from the same task distribution $p(\mathcal{T})$. Each of these tasks encompasses its own task-specific training and test sets, which are commonly referred to as the context (yellow labels) and the target (grey labels) respectively.

In practice, it is common to have scenarios where lots of supervised learning tasks are related to each other, yet the number of data points for each individual task is limited. Meta-learning emerges as a new learning paradigm to address such challenges.

Specifically, we have a meta-training set defined as $\mathcal{M}_{\text{train}} = \{(\mathcal{D}_{\text{train}}^{(j)}, \mathcal{D}_{\text{test}}^{(j)}, \ell^{(j)})\}_{j=1}^{M}$ and a meta-test set given by $\mathcal{M}_{\text{test}} = \{(\mathcal{D}_{\text{train}}^{(j)}, \mathcal{D}_{\text{test}}^{(j)}, \ell^{(j)})\}_{j=M+1}^{N}$. Each element in these meta-datasets is a tuple consisting of a training set (called the context), a test set (called the target) and a risk function (typically the same within a meta-dataset). This 3-tuple characterizes a task $\mathcal{T}_j$ (see Figure 2.1 for an illustration). In supervised learning, we use training data to train a predictive model, hoping it can generalize across the entire data distribution. In meta-learning, the assumption is that there is a common task distribution, denoted as $p(\mathcal{T})$, from which both the meta-training set and the meta-test set are drawn. Meta-learning algorithms aim to use meta-training data to discover learning algorithms that can generalize across the entire task distribution.

More specifically, a learning algorithm for a supervised learning task takes in a training set $\mathcal{D}_{\text{train}}$ and a risk function $\ell$, and outputs a predictive model, written as:

$$f = \Phi_{\text{ALGO}}(\mathcal{D}_{\text{train}}, \ell). \tag{2.2}$$

Since $\ell$ is usually fixed, we will omit the dependency on it in subsequent discussions. For a particular task, the learning algorithm $\Phi_{\text{ALGO}}$ can be evaluated by the test risk of the learned predictive model, denoted as:

$$\hat{\mathcal{R}}(f; \mathcal{D}_{\text{test}}). \tag{2.3}$$

Meta-learning finds a learning algorithm based on tasks from the meta-training set $\mathcal{M}_{\text{train}}$, so that this learning algorithm can be more efficiently applied to new tasks, and generalizes across the task distribution $p(\mathcal{T})$. The meta-learning algorithm can be represented as:

$$\Phi_{\text{ALGO}} = \text{MetaAlgo}(\mathcal{M}_{\text{train}}). \tag{2.4}$$

To evaluate the meta-learning algorithm, we can compute:

$$\frac{1}{|\mathcal{M}_{\text{test}}|} \sum_{\mathcal{T}_j \in \mathcal{M}_{\text{test}}} \hat{\mathcal{R}}\big(\Phi_{\text{ALGO}}(\mathcal{D}_{\text{train}}^{(j)});\ \mathcal{D}_{\text{test}}^{(j)}\big). \tag{2.5}$$

While it resembles the test loss in supervised learning, the aggregated test risk for a task replaces the traditional risk function for a data point.
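Putting Eqs. (2.2)-(2.5) together, the meta-learning pipeline can be summarized by the following skeleton (hypothetical Python pseudocode made runnable; `meta_algo`, `algo` and the dataset structures are illustrative assumptions):

```python
# Skeleton mirroring Eqs. (2.2)-(2.5): MetaAlgo produces a learning algorithm;
# the algorithm maps a context set to a predictor; evaluation averages each
# task's test risk over the meta-test set.
def evaluate_meta_learner(meta_algo, meta_train, meta_test, risk):
    algo = meta_algo(meta_train)                 # Eq. (2.4): learn the learner
    total = 0.0
    for d_train, d_test in meta_test:            # tasks T_j ~ p(T)
        f = algo(d_train)                        # Eq. (2.2): adapt to the context
        task_risk = sum(risk(y, f(x)) for x, y in d_test) / len(d_test)  # Eq. (2.3)
        total += task_risk
    return total / len(meta_test)                # Eq. (2.5): aggregate over tasks
```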

It is worth noting that while we focus on supervised learning tasks here, meta-learning can be extended to unsupervised learning [Edwards and Storkey, 2016, Reed et al., 2018, Hsu et al., 2018] or reinforcement learning [Wang et al., 2016, Finn et al., 2017a,b].

2.1.2 Different views of meta-learning

Bi-level optimization view. Let us assume both the predictive model $f$ and the learning algorithm $\Phi_{\text{ALGO}}$ can be parameterised, and the parameters are denoted as $\phi$ and $\theta$ accordingly. That is to say, the learning algorithm can be written as:

$$\phi = \Phi_{\text{ALGO}}(\mathcal{D}_{\text{train}}; \theta). \tag{2.6}$$

Meta-learning can be formulated as the following bi-level optimization problem:

$$\theta^{\star} = \arg\min_{\theta} \sum_{j=1}^{M} \hat{\mathcal{R}}\big(f(\cdot\,; \phi_j(\theta));\ \mathcal{D}_{\text{test}}^{(j)}\big) \tag{2.7}$$

where the task-specific parameter $\phi_j$ depends on $\theta$ through the inner-loop optimization:

$$\phi_j(\theta) = \Phi_{\text{ALGO}}(\mathcal{D}_{\text{train}}^{(j)}; \theta) \tag{2.8}$$

Many meta-learning algorithms are developed based on this bi-level optimization view, such as Finn et al. [2017a], Nichol et al. [2018], Ravi and Larochelle [2016].
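As an illustration of the inner/outer loop structure, here is a minimal first-order sketch in the spirit of MAML [Finn et al., 2017a] on toy linear regression (a hypothetical NumPy example; a faithful MAML implementation would differentiate through the inner loop rather than use the first-order approximation):

```python
# First-order sketch of the bi-level view (Eqs. 2.6-2.8): the inner loop adapts
# the meta-parameters theta to each task's context set; the outer loop updates
# theta using the target-set gradient evaluated at the adapted parameters.
import numpy as np

def grad_mse(phi, X, y):
    """Gradient of the mean squared error of a linear model."""
    return 2 * X.T @ (X @ phi - y) / len(y)

rng = np.random.default_rng(0)
w0 = np.array([1.0, -1.0])                        # structure shared across tasks
theta = np.zeros(2)
for _ in range(2000):                             # outer loop over sampled tasks
    w = w0 + 0.1 * rng.normal(size=2)             # task-specific true weights
    Xc, Xt = rng.normal(size=(5, 2)), rng.normal(size=(5, 2))
    yc, yt = Xc @ w, Xt @ w                       # context / target sets
    phi = theta - 0.1 * grad_mse(theta, Xc, yc)   # one-step inner loop, Eq. (2.8)
    theta -= 0.01 * grad_mse(phi, Xt, yt)         # first-order outer update, Eq. (2.7)

print(theta)                                      # ends up near the shared w0
```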

Hierarchical model view. From a probabilistic perspective, the generative process for each task $\mathcal{T}_j$ can be expressed as:

$$\theta \sim p(\theta), \quad \phi_j \sim p(\phi_j \mid \theta), \quad y_i^{(j)} \sim p(y_i^{(j)} \mid x_i^{(j)}, \phi_j, \theta) \tag{2.9}$$

Both the training set $\mathcal{D}_{\text{train}}^{(j)}$ and the test set $\mathcal{D}_{\text{test}}^{(j)}$ follow the same distribution (as illustrated in Figure 2.2). This can be seen as a probabilistic hierarchical model where $\theta$ indicates the high-level global parameters for all tasks and $\phi_j$ denotes the low-level local parameters for each task. In this context, meta-learning is about inferring $\theta$ from lots of tasks in the meta-training set, that is $p(\theta \mid \mathcal{M}_{\text{train}})$. Learning, on the other hand, infers $\phi_j$ given the training set $\mathcal{D}_{\text{train}}^{(j)}$ for task $\mathcal{T}_j$, that is $p(\phi_j \mid \theta, \mathcal{D}_{\text{train}}^{(j)})$.

Figure 2.2: Meta-learning as hierarchical models (a remake of Figure 1 in Gordon et al. [2018]). The task-specific parameter $\phi_j$ depends on the global parameter $\theta$. Data points in both the context and the target have the same generative process, which depends on both $\theta$ and $\phi_j$.

Note that $p(\phi_j \mid \theta)$ can be seen as a prior for task $\mathcal{T}_j$ conditioned on $\theta$. Therefore, meta-learning can be seen as learning an empirical prior from the meta-training set. Finn et al. [2018] and Requeima et al. [2019] adopt this view.
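To make the generative story concrete, the following sketch samples data according to the hierarchy in Eq. (2.9) (a hypothetical NumPy instantiation; the Gaussian and linear choices are illustrative assumptions only):

```python
# Sampling from the hierarchical generative process of Eq. (2.9): global theta,
# then task-specific phi_j, then data points for each task.
import numpy as np

rng = np.random.default_rng(0)
theta = rng.normal(size=2)                          # theta ~ p(theta)

tasks = []
for j in range(4):                                  # tasks T_j
    phi_j = theta + 0.1 * rng.normal(size=2)        # phi_j ~ p(phi_j | theta)
    x = rng.uniform(-1.0, 1.0, size=10)             # inputs x_i^(j)
    y = phi_j[0] * x + phi_j[1] + 0.05 * rng.normal(size=10)  # y_i^(j) ~ p(y | x, phi_j)
    tasks.append((x, y))
```

Under this view, meta-learning infers $\theta$ from many such tasks, while per-task learning infers $\phi_j$ given $\theta$ and the context set.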


Model-based view. A learning algorithm $f = \Phi_{\text{ALGO}}(\mathcal{D}_{\text{train}})$ can be seen as a function that takes in the entire training set and outputs a predictive model. The model is then used to make predictions on test data in $\mathcal{D}_{\text{test}}$. The learning and prediction processes can thus be conceptualized as sequence-to-sequence mappings. For the sake of brevity, let us use a concise notation for data sequences, such as $x_{1:n} = \{x_1, x_2, \ldots, x_n\}$. For a specific task $\mathcal{T}_j$, making predictions for test set data points based on those from the training set can be described as the following inference task:

$$p(y_{m+1:n} \mid x_{m+1:n}, x_{1:m}, y_{1:m}). \tag{2.10}$$

From this perspective, meta-learning is about creating this conditional model. Meta-learning only differs from conventional supervised learning in that both the inp…
