Full text available at: http://dx.doi.org/10.1561/2200000006
Learning Deep Architectures for AI
Yoshua Bengio
Dept. IRO, Université de Montréal
C.P. 6128, Montreal, Qc
Canada
yoshua.bengio@umontreal.ca

Boston – Delft
Foundations and Trends® in Machine Learning

Published, sold and distributed by:
now Publishers Inc.
PO Box 1024
Hanover, MA 02339 USA
Tel. +1-781-985-4510
sales@
Outside North America:
now Publishers Inc.
PO Box 179
2600 AD Delft, The Netherlands
Tel. +31-6-51115274

The preferred citation for this publication is Y. Bengio, Learning Deep Architectures for AI, Foundations and Trends® in Machine Learning, vol 2, no 1, pp 1–127, 2009.
ISBN: 978-1-60198-294-0
© 2009 Y. Bengio
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, mechanical, photocopying, recording or otherwise, without prior written permission of the publishers.

Photocopying. In the USA: This journal is registered at the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923. Authorization to photocopy items for internal or personal use, or the internal or personal use of specific clients, is granted by now Publishers Inc. for users registered with the Copyright Clearance Center (CCC). The 'services' for users can be found on the internet at:

For those organizations that have been granted a photocopy license, a separate system of payment has been arranged. Authorization does not extend to other kinds of copying, such as that for general distribution, for advertising or promotional purposes, for creating new collective works, or for resale. In the rest of the world: Permission to photocopy must be obtained from the copyright owner. Please apply to now Publishers Inc., PO Box 1024, Hanover, MA 02339, USA; Tel. +1-781-871-0245; sales@

now Publishers Inc. has an exclusive license to publish this material worldwide. Permission to use this content must be obtained from the copyright license holder. Please apply to now Publishers, PO Box 179, 2600 AD Delft, The Netherlands; e-mail: sales@
Foundations and Trends® in Machine Learning, Volume 2 Issue 1, 2009

Editorial Board

Editor-in-Chief:
Michael Jordan
Department of Electrical Engineering and Computer Science
Department of Statistics
University of California, Berkeley
Berkeley, CA 94720-1776
Editors
Peter Bartlett (UC Berkeley)
Yoshua Bengio (Université de Montréal)
Avrim Blum (Carnegie Mellon University)
Craig Boutilier (University of Toronto)
Stephen Boyd (Stanford University)
Carla Brodley (Tufts University)
Inderjit Dhillon (University of Texas at Austin)
Jerome Friedman (Stanford University)
Kenji Fukumizu (Institute of Statistical Mathematics)
Zoubin Ghahramani (Cambridge University)
David Heckerman (Microsoft Research)
Tom Heskes (Radboud University Nijmegen)
Geoffrey Hinton (University of Toronto)
Aapo Hyvarinen (Helsinki Institute for Information Technology)
Leslie Pack Kaelbling (MIT)
Michael Kearns (University of Pennsylvania)
Daphne Koller (Stanford University)
John Lafferty (Carnegie Mellon University)
Michael Littman (Rutgers University)
Gabor Lugosi (Pompeu Fabra University)
David Madigan (Columbia University)
Pascal Massart (Université de Paris-Sud)
Andrew McCallum (University of Massachusetts Amherst)
Marina Meila (University of Washington)
Andrew Moore (Carnegie Mellon University)
John Platt (Microsoft Research)
Luc de Raedt (Albert-Ludwigs Universitaet Freiburg)
Christian Robert (Université Paris-Dauphine)
Sunita Sarawagi (IIT Bombay)
Robert Schapire (Princeton University)
Bernhard Schoelkopf (Max Planck Institute)
Richard Sutton (University of Alberta)
Larry Wasserman (Carnegie Mellon University)
Bin Yu (UC Berkeley)
Editorial Scope

Foundations and Trends® in Machine Learning will publish survey and tutorial articles in the following topics:

Adaptive control and signal processing
Applications and case studies
Behavioral, cognitive and neural learning
Bayesian learning
Classification and prediction
Clustering
Data mining
Dimensionality reduction
Evaluation
Game theoretic learning
Graphical models
Independent component analysis
Inductive logic programming
Kernel methods
Markov chain Monte Carlo
Model choice
Nonparametric methods
Online learning
Optimization
Reinforcement learning
Relational learning
Robustness
Spectral methods
Statistical learning theory
Variational inference
Visualization
Information for Librarians

Foundations and Trends® in Machine Learning, 2009, Volume 2, 4 issues. ISSN paper version 1935-8237. ISSN online version 1935-8245. Also available as a combined paper and online subscription.
Foundations and Trends® in Machine Learning
Vol. 2, No. 1 (2009) 1–127
© 2009 Y. Bengio
DOI: 10.1561/2200000006
Learning Deep Architectures for AI

Yoshua Bengio
Dept. IRO, Université de Montréal, C.P. 6128, Montreal, Qc, H3C 3J7, Canada, yoshua.bengio@umontreal.ca
Abstract
Theoretical results suggest that in order to learn the kind of complicated functions that can represent high-level abstractions (e.g., in vision, language, and other AI-level tasks), one may need deep architectures. Deep architectures are composed of multiple levels of non-linear operations, such as in neural nets with many hidden layers or in complicated propositional formulae re-using many sub-formulae. Searching the parameter space of deep architectures is a difficult task, but learning algorithms such as those for Deep Belief Networks have recently been proposed to tackle this problem with notable success, beating the state-of-the-art in certain areas. This monograph discusses the motivations and principles regarding learning algorithms for deep architectures, in particular those exploiting as building blocks unsupervised learning of single-layer models such as Restricted Boltzmann Machines, used to construct deeper models such as Deep Belief Networks.
Contents

1 Introduction 1
1.1 How do We Train Deep Architectures? 4
1.2 Intermediate Representations: Sharing Features and Abstractions Across Tasks 6
1.3 Desiderata for Learning AI 9
1.4 Outline of the Paper 10
2 Theoretical Advantages of Deep Architectures 13
2.1 Computational Complexity 16
2.2 Informal Arguments 18
3 Local vs Non-Local Generalization 21
3.1 The Limits of Matching Local Templates 21
3.2 Learning Distributed Representations 27
4 Neural Networks for Deep Architectures 31
4.1 Multi-Layer Neural Networks 31
4.2 The Challenge of Training Deep Neural Networks 32
4.3 Unsupervised Learning for Deep Architectures 40
4.4 Deep Generative Architectures 41
4.5 Convolutional Neural Networks 44
4.6 Auto-Encoders 46
5 Energy-Based Models and Boltzmann Machines 49
5.1 Energy-Based Models and Products of Experts 49
5.2 Boltzmann Machines 54
5.3 Restricted Boltzmann Machines 56
5.4 Contrastive Divergence 60
6 Greedy Layer-Wise Training of Deep Architectures 69
6.1 Layer-Wise Training of Deep Belief Networks 69
6.2 Training Stacked Auto-Encoders 72
6.3 Semi-Supervised and Partially Supervised Training 73
7 Variants of RBMs and Auto-Encoders 75
7.1 Sparse Representations in Auto-Encoders and RBMs 75
7.2 Denoising Auto-Encoders 81
7.3 Lateral Connections 83
7.4 Conditional RBMs and Temporal RBMs 84
7.5 Factored RBMs 86
7.6 Generalizing RBMs and Contrastive Divergence 87
8 Stochastic Variational Bounds for Joint Optimization of DBN Layers 91
8.1 Unfolding RBMs into Infinite Directed Belief Networks 92
8.2 Variational Justification of Greedy Layer-wise Training 94
8.3 Joint Unsupervised Training of All the Layers 97
9 Looking Forward 101
9.1 Global Optimization Strategies 101
9.2 Why Unsupervised Learning is Important 107
9.3 Open Questions 108
10 Conclusion 113
Acknowledgments 115
References 117
1 Introduction
Allowing computers to model our world well enough to exhibit what we call intelligence has been the focus of more than half a century of research. To achieve this, it is clear that a large quantity of information about our world should somehow be stored, explicitly or implicitly, in the computer. Because it seems daunting to formalize manually all that information in a form that computers can use to answer questions and generalize to new contexts, many researchers have turned to learning algorithms to capture a large fraction of that information. Much progress has been made to understand and improve learning algorithms, but the challenge of artificial intelligence (AI) remains. Do we have algorithms that can understand scenes and describe them in natural language? Not really, except in very limited settings. Do we have algorithms that can infer enough semantic concepts to be able to interact with most humans using these concepts? No. If we consider image understanding, one of the best specified of the AI tasks, we realize that we do not yet have learning algorithms that can discover the many visual and semantic concepts that would seem to be necessary to interpret most images on the web. The situation is similar for other AI tasks.
Fig. 1.1 We would like the raw input image to be transformed into gradually higher levels of representation, representing more and more abstract functions of the raw input, e.g., edges, local shapes, object parts, etc. In practice, we do not know in advance what the "right" representation should be for all these levels of abstractions, although linguistic concepts might help guessing what the higher levels should implicitly represent.
Consider for example the task of interpreting an input image such as the one in Figure 1.1.
When humans try to solve a particular AI task (such as machine vision or natural language processing), they often exploit their intuition about how to decompose the problem into sub-problems and multiple levels of representation, e.g., in object parts and constellation models [138, 179, 197] where models for parts can be re-used in different object instances. For example, the current state-of-the-art in machine vision involves a sequence of modules starting from pixels and ending in a linear or kernel classifier [134, 145], with intermediate modules mixing engineered transformations and learning,
e.g., first extracting low-level features that are invariant to small geometric variations (such as edge detectors from Gabor filters), transforming them gradually (e.g., to make them invariant to contrast changes and contrast inversion, sometimes by pooling and sub-sampling), and then detecting the most frequent patterns. A plausible and common way to extract useful information from a natural image involves transforming the raw pixel representation into gradually more abstract representations, e.g., starting from the presence of edges, the detection of more complex but local shapes, up to the identification of abstract categories associated with sub-objects and objects which are parts of the image, and putting all these together to capture enough understanding of the scene to answer questions about it.
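To make the two ingredients of such a processing chain concrete (local filtering followed by pooling and sub-sampling), here is a minimal sketch in Python with numpy. It illustrates the general idea only, not the pipeline of any system cited above; the edge kernel, pooling size, and random input are all arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def convolve2d_valid(image, kernel):
    """Naive 'valid' 2D cross-correlation, written out for clarity."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    """Down-sample by taking the max over non-overlapping size x size blocks."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

# Stage 1: respond to local structure (a crude horizontal edge detector).
edge_kernel = np.array([[1.0, 1.0], [-1.0, -1.0]])
image = rng.random((8, 8))                 # stand-in for raw pixels
edges = np.abs(convolve2d_valid(image, edge_kernel))
# Stage 2: pooling makes the response invariant to small translations.
pooled = max_pool(edges, size=2)
```

Stacking several such filter-and-pool stages is what yields gradually more abstract, more invariant representations.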
Here, we assume that the computational machinery necessary to express complex behaviors (which one might label "intelligent") requires highly varying mathematical functions, i.e., mathematical functions that are highly non-linear in terms of raw sensory inputs, and display a very large number of variations (ups and downs) across the domain of interest. We view the raw input to the learning system as a high dimensional entity, made of many observed variables, which are related by unknown intricate statistical relationships. For example, using knowledge of the 3D geometry of solid objects and lighting, we can relate small variations in underlying physical and geometric factors (such as position, orientation, lighting of an object) with changes in pixel intensities for all the pixels in an image. We call these factors of variation because they are different aspects of the data that can vary separately and often independently. In this case, explicit knowledge of the physical factors involved allows one to get a picture of the mathematical form of these dependencies, and of the shape of the set of images (as points in a high-dimensional space of pixel intensities) associated with the same 3D object. If a machine captured the factors that explain the statistical variations in the data, and how they interact to generate the kind of data we observe, we would be able to say that the machine understands those aspects of the world covered by these factors of variation. Unfortunately, in general and for most factors of variation underlying natural images, we do not have an analytical understanding of these factors of variation. We do not have enough formalized
prior knowledge about the world to explain the observed variety of images, even for such an apparently simple abstraction as MAN, illustrated in Figure 1.1. A high-level abstraction such as MAN has the property that it corresponds to a very large set of possible images, which might be very different from each other from the point of view of simple Euclidean distance in the space of pixel intensities. The set of images for which that label could be appropriate forms a highly convoluted region in pixel space that is not even necessarily a connected region. The MAN category can be seen as a high-level abstraction with respect to the space of images. What we call abstraction here can be a category (such as the MAN category) or a feature, a function of sensory data, which can be discrete (e.g., the input sentence is at the past tense) or continuous (e.g., the input video shows an object moving at 2 meters/second). Many lower-level and intermediate-level concepts (which we also call abstractions here) would be useful to construct a MAN-detector. Lower-level abstractions are more directly tied to particular percepts, whereas higher-level ones are what we call "more abstract" because their connection to actual percepts is more remote, and through other, intermediate-level abstractions.
In addition to the difficulty of coming up with the appropriate intermediate abstractions, the number of visual and semantic categories (such as MAN) that we would like an "intelligent" machine to capture is rather large. The focus of deep architecture learning is to automatically discover such abstractions, from the lowest level features to the highest level concepts. Ideally, we would like learning algorithms that enable this discovery with as little human effort as possible, i.e., without having to manually define all necessary abstractions or having to provide a huge set of relevant hand-labeled examples. If these algorithms could tap into the huge resource of text and images on the web, it would certainly help to transfer much of human knowledge into machine-interpretable form.
1.1 How do We Train Deep Architectures?
Deep learning methods aim at learning feature hierarchies with features from higher levels of the hierarchy formed by the composition of
lower level features. Automatically learning features at multiple levels of abstraction allows a system to learn complex functions mapping the input to the output directly from data, without depending completely on human-crafted features. This is especially important for higher-level abstractions, which humans often do not know how to specify explicitly in terms of raw sensory input. The ability to automatically learn powerful features will become increasingly important as the amount of data and range of applications of machine learning methods continue to grow.
Depth of architecture refers to the number of levels of composition of non-linear operations in the function learned. Whereas most current learning algorithms correspond to shallow architectures (1, 2 or 3 levels), the mammal brain is organized in a deep architecture [173] with a given input percept represented at multiple levels of abstraction, each level corresponding to a different area of cortex. Humans often describe such concepts in hierarchical ways, with multiple levels of abstraction. The brain also appears to process information through multiple stages of transformation and representation. This is particularly clear in the primate visual system [173], with its sequence of processing stages: detection of edges, primitive shapes, and moving up to gradually more complex visual shapes.
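To make the definition concrete, the following sketch (illustrative only; the layer sizes and the tanh non-linearity are arbitrary choices) computes a function of depth 3: each affine map followed by a non-linearity adds one level of composition.

```python
import numpy as np

def neuron_layer(x, W, b):
    # One level of composition: affine map followed by a non-linearity.
    return np.tanh(W @ x + b)

rng = np.random.default_rng(0)
x = rng.normal(size=4)                     # raw input
sizes = [4, 8, 8, 2]                       # three weight layers => depth 3
params = [(rng.normal(size=(m, n)), np.zeros(m))
          for n, m in zip(sizes[:-1], sizes[1:])]

h = x
for W, b in params:                        # each pass adds one level of depth
    h = neuron_layer(h, W, b)
# h is a depth-3 composition:
#   tanh(W3 @ tanh(W2 @ tanh(W1 @ x + b1) + b2) + b3)
```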
Inspired by the architectural depth of the brain, neural network researchers had wanted for decades to train deep multi-layer neural networks [19, 191], but no successful attempts were reported before 2006¹: researchers reported positive experimental results with typically two or three levels (i.e., one or two hidden layers), but training deeper networks consistently yielded poorer results. Something that can be considered a breakthrough happened in 2006: Hinton et al. at University of Toronto introduced Deep Belief Networks (DBNs) [73], with a learning algorithm that greedily trains one layer at a time, exploiting an unsupervised learning algorithm for each layer, a Restricted Boltzmann Machine (RBM) [51]. Shortly after, related algorithms based on auto-encoders were proposed [17, 153], apparently exploiting the same principle: guiding the training of intermediate levels of representation using unsupervised learning, which can be performed locally at each level. Other algorithms for deep architectures were proposed more recently that exploit neither RBMs nor auto-encoders and that exploit the same principle [131, 202] (see Section 4).

¹ Except for neural networks with a special structure called convolutional networks, discussed in Section 4.5.
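The following sketch illustrates the greedy layer-wise idea with a binary RBM trained by one step of contrastive divergence (CD-1, discussed in detail later in the monograph). It is a toy illustration under simplifying assumptions, not Hinton et al.'s exact procedure: the learning rate, epoch count, layer sizes, per-example updates, and random toy data are all arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

class RBM:
    """Binary restricted Boltzmann machine trained with CD-1."""
    def __init__(self, n_visible, n_hidden, lr=0.1):
        self.W = 0.01 * rng.normal(size=(n_visible, n_hidden))
        self.b = np.zeros(n_visible)   # visible biases
        self.c = np.zeros(n_hidden)    # hidden biases
        self.lr = lr

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.c)

    def cd1_update(self, v0):
        # Positive phase: sample hidden units given the data.
        h0_p = self.hidden_probs(v0)
        h0 = (rng.random(h0_p.shape) < h0_p).astype(float)
        # Negative phase: one step of Gibbs sampling back to the visibles.
        v1_p = sigmoid(h0 @ self.W.T + self.b)
        h1_p = self.hidden_probs(v1_p)
        # Approximate gradient step on the log-likelihood.
        self.W += self.lr * (v0[:, None] * h0_p[None, :]
                             - v1_p[:, None] * h1_p[None, :])
        self.b += self.lr * (v0 - v1_p)
        self.c += self.lr * (h0_p - h1_p)

def greedy_pretrain(data, layer_sizes, epochs=5):
    """Train one RBM per level; each level models the level below it."""
    rbms, reps = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(reps.shape[1], n_hidden)
        for _ in range(epochs):
            for v in reps:
                rbm.cd1_update(v)
        rbms.append(rbm)
        reps = rbm.hidden_probs(reps)  # this stage's output feeds the next
    return rbms

data = (rng.random((100, 16)) < 0.5).astype(float)   # toy binary data
stack = greedy_pretrain(data, layer_sizes=[12, 8])
```

The key point is in greedy_pretrain: each RBM sees only the representation produced by the level below it, so each level's training is local and unsupervised.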
Since 2006, deep networks have been applied with success not only in classification tasks [2, 17, 99, 111, 150, 153, 195], but also in regression [160], dimensionality reduction [74, 158], modeling textures [141], modeling motion [182, 183], object segmentation [114], information retrieval [154, 159, 190], robotics [60], natural language processing [37, 130, 202], and collaborative filtering [162]. Although auto-encoders, RBMs and DBNs can be trained with unlabeled data, in many of the above applications, they have been successfully used to initialize deep supervised feedforward neural networks applied to a specific task.
1.2 Intermediate Representations: Sharing Features and Abstractions Across Tasks
Since a deep architecture can be seen as the composition of a series of processing stages, the immediate question that deep architectures raise is: what kind of representation of the data should be found as the output of each stage (i.e., the input of another)? What kind of interface should there be between these stages? A hallmark of recent research on deep architectures is the focus on these intermediate representations: the success of deep architectures belongs to the representations learned in an unsupervised way by RBMs [73], ordinary auto-encoders [17], sparse auto-encoders [150, 153], or denoising auto-encoders [195]. These algorithms (described in more detail in Section 7.2) can be seen as learning to transform one representation (the output of the previous stage) into another, at each step maybe disentangling better the factors of variations underlying the data. As we discuss at length in Section 4, it has been observed again and again that once a good representation has been found at each level, it can be used to initialize and successfully train a deep neural network by supervised gradient-based optimization.
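A sketch of the same stage-by-stage scheme with ordinary auto-encoders (illustrative only: plain gradient descent on squared reconstruction error, with arbitrary layer sizes, learning rate, and toy data; the sparse and denoising variants cited above add a sparsity penalty or input corruption on top of this skeleton):

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

def train_autoencoder(X, n_hidden, lr=0.1, epochs=50):
    """One stage: learn h = sigmoid(X W + b) that can reconstruct its input."""
    n_in = X.shape[1]
    W = 0.01 * rng.normal(size=(n_in, n_hidden)); b = np.zeros(n_hidden)
    W2 = 0.01 * rng.normal(size=(n_hidden, n_in)); b2 = np.zeros(n_in)
    for _ in range(epochs):
        H = sigmoid(X @ W + b)                 # encode
        R = sigmoid(H @ W2 + b2)               # decode (reconstruction)
        dR = (R - X) * R * (1 - R)             # grad of squared error w.r.t. pre-activation
        dH = (dR @ W2.T) * H * (1 - H)         # backpropagate one step
        W2 -= lr * (H.T @ dR) / len(X); b2 -= lr * dR.mean(axis=0)
        W -= lr * (X.T @ dH) / len(X); b -= lr * dH.mean(axis=0)
    return W, b

X = rng.random((200, 20))                       # toy data
encoders, reps = [], X
for n_hidden in [16, 8]:                        # two unsupervised stages
    W, b = train_autoencoder(reps, n_hidden)
    encoders.append((W, b))
    reps = sigmoid(reps @ W + b)                # this stage's code feeds the next
# 'encoders' would then initialize the hidden layers of a supervised
# feedforward network, to be fine-tuned by gradient-based optimization.
```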
Each level of abstraction found in the brain consists of the "activation" (neural excitation) of a small subset of a large number of features that are, in general, not mutually exclusive. Because these features are not mutually exclusive, they form what is called a distributed representation [68, 156]: the information is not localized in a particular neuron but distributed across many. In addition to being distributed, it appears that the brain uses a representation that is sparse: only around 1–4% of the neurons are active together at a given time [5, 113]. Section 3.2 introduces the notion of sparse distributed representation and Section 7.1 describes in more detail the machine learning approaches, some inspired by the observations of the sparse representations in the brain, that have been used to build deep architectures with sparse representations.
Whereas dense distributed representations are one extreme of a spectrum, and sparse representations are in the middle of that spectrum, purely local representations are the other extreme. Locality of representation is intimately connected with the notion of local generalization. Many existing machine learning methods are local in input space: to obtain a learned function that behaves differently in different regions of data-space, they require different tunable parameters for each of these regions (see more in Section 3.1). Even though statistical efficiency is not necessarily poor when the number of tunable parameters is large, good generalization can be obtained only when adding some form of prior (e.g., that smaller values of the parameters are preferred). When that prior is not task-specific, it is often one that forces the solution to be very smooth, as discussed at the end of Section 3.1. In contrast to learning methods based on local generalization, the total number of patterns that can be distinguished using a distributed representation may scale exponentially with the dimension of the representation (i.e., the number of learned features).
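A toy enumeration makes the counting argument explicit: n binary features can in principle distinguish 2^n input patterns, whereas a purely local (one-hot) representation of the same size distinguishes only n.

```python
from itertools import product

n = 4  # number of learned binary features

# Distributed: every combination of feature activations is a distinct pattern.
distributed_patterns = list(product([0, 1], repeat=n))
assert len(distributed_patterns) == 2 ** n   # 16 patterns

# Local (one-hot): each pattern claims its own dedicated unit.
local_patterns = [tuple(int(i == j) for j in range(n)) for i in range(n)]
assert len(local_patterns) == n              # only 4 patterns
```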
In many machine vision systems, learning algorithms have been limited to specific parts of such a processing chain. The rest of the design remains labor-intensive, which might limit the scale of such systems. On the other hand, a hallmark of what we would consider intelligent machines includes a large enough repertoire of concepts. Recognizing MAN is not enough. We need algorithms that can tackle a very large set of such tasks and concepts. It seems daunting to manually define that many tasks, and learning becomes essential in this context. Furthermore, it would seem foolish not to exploit the underlying commonalities between these tasks and between the concepts they require. This has been the focus of research on multi-task learning [7, 8, 32, 88, 186].
Architectures with multiple levels naturally provide such sharing and re-use of components: the low-level visual features (like edge detectors) and intermediate-level visual features (like object parts) that are useful to detect MAN are also useful for a large group of other visual tasks. Deep learning algorithms are based on learning intermediate representations which can be shared across tasks. Hence they can leverage unsupervised data and data from similar tasks [148] to boost performance on large and challenging problems that routinely suffer from a poverty of labelled data, as has been shown by [37], beating the state-of-the-art in several natural language processing tasks. A similar multi-task approach for deep architectures was applied in vision tasks by [2].
Consider a multi-task setting in which there are different outputs for different tasks, all obtained from a shared pool of high-level features. The fact that many of these learned features are shared among m tasks provides sharing of statistical strength in proportion to m. Now consider that these learned high-level features can themselves be represented by combining lower-level intermediate features from a common pool. Again statistical strength can be gained in a similar way, and this strategy can be exploited for every level of a deep architecture.
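Schematically (a sketch with arbitrary shapes and names, not a trainable implementation), the setting looks like this: m task-specific output heads all read from one shared pool of high-level features, which is itself computed from a shared pool of lower-level features.

```python
import numpy as np

rng = np.random.default_rng(2)
relu = lambda a: np.maximum(a, 0.0)

x = rng.normal(size=32)                       # raw input
W_low = rng.normal(size=(64, 32))             # shared low-level feature pool
W_high = rng.normal(size=(16, 64))            # shared high-level feature pool
task_heads = [rng.normal(size=(1, 16)) for _ in range(5)]  # m = 5 tasks

low = relu(W_low @ x)                          # features reused by every task
high = relu(W_high @ low)                      # higher-level shared abstractions
outputs = [(head @ high).item() for head in task_heads]
# Every example, whichever task it is labeled for, contributes learning
# signal to W_low and W_high: statistical strength is shared across tasks.
```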
In addition, learning about a large set of interrelated concepts might provide a key to the kind of broad generalizations that humans appear able to do, which we would not expect from separately trained object detectors, with one detector per visual category. If each high-level category is itself represented through a particular distributed configuration of abstract features from a common pool, generalization to unseen categories could follow naturally from new configurations of these features. Even though only some configurations of these features would be present in the training examples, if they represent different aspects of the data, new examples could meaningfully be represented by new configurations of these features.
1.3 Desiderata for Learning AI
Summarizing some of the above issues, and trying to put them in the broader perspective of AI, we put forward a number of requirements we believe to be important for learning algorithms to approach AI, many of which motivate the research described here:
- Ability to learn complex, highly-varying functions, i.e., with a number of variations much greater than the number of training examples.
- Ability to learn with little human input the low-level, intermediate, and high-level abstractions that would be useful to represent the kind of complex functions needed for AI tasks.
- Ability to learn from a very large set of examples: computation time for training should scale well with the number of examples, i.e., close to linearly.
- Ability to learn from mostly unlabeled data, i.e., to work in the semi-supervised setting, where not all the examples come with complete and correct semantic labels.
- Ability to exploit the synergies present across a large number of tasks, i.e., multi-task learning. These synergies exist because all the AI tasks provide different views on the same underlying reality.
- Strong unsupervised learning (i.e., capturing most of the statistical structure in the observed data), which seems essential in the limit of a large number of tasks and when future tasks are not known ahead of time.
Other elements are equally important but are not directly connected to the material in this monograph. They include the ability to learn to represent context of varying length and structure [146], so as to allow machines to operate in a context-dependent stream of observations and produce a stream of actions, the ability to make decisions when actions influence the future observations and future rewards [181], and the ability to influence future observations so as to collect more relevant information about the world, i.e., a form of active learning [34].
1.4 Outline of the Paper
Section 2 reviews theoretical results (which can be skipped without hurting the understanding of the remainder) showing that an architecture with insufficient depth can require many more computational elements, potentially exponentially more (with respect to input size), than architectures whose depth is matched to the task.