Full text available at: http://dx.doi.org/10.1561/2200000006
Learning Deep Architectures for AI
Yoshua Bengio
Dept. IRO, Université de Montréal
C.P. 6128, Montreal, Qc
Canada
yoshua.bengio@umontreal.ca

Boston – Delft
Foundations and Trends® in Machine Learning

Published, sold and distributed by:
now Publishers Inc.
PO Box 1024
Hanover, MA 02339 USA
Tel. +1-781-985-4510
sales@
Outside North America:
now Publishers Inc.
PO Box 179
2600 AD Delft, The Netherlands
Tel. +31-6-51115274

The preferred citation for this publication is Y. Bengio, Learning Deep Architectures for AI, Foundations and Trends® in Machine Learning, vol 2, no 1, pp 1–127, 2009.
ISBN: 978-1-60198-294-0
© 2009 Y. Bengio
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, mechanical, photocopying, recording or otherwise, without prior written permission of the publishers.

Photocopying. In the USA: This journal is registered at the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923. Authorization to photocopy items for internal or personal use, or the internal or personal use of specific clients, is granted by now Publishers Inc. for users registered with the Copyright Clearance Center (CCC). The 'services' for users can be found on the internet at:

For those organizations that have been granted a photocopy license, a separate system of payment has been arranged. Authorization does not extend to other kinds of copying, such as that for general distribution, for advertising or promotional purposes, for creating new collective works, or for resale. In the rest of the world: Permission to photocopy must be obtained from the copyright owner. Please apply to now Publishers Inc., PO Box 1024, Hanover, MA 02339, USA; Tel. +1-781-871-0245; sales@

now Publishers Inc. has an exclusive license to publish this material worldwide. Permission to use this content must be obtained from the copyright license holder. Please apply to now Publishers, PO Box 179, 2600 AD Delft, The Netherlands; e-mail: sales@
Foundations and Trends® in Machine Learning, Volume 2 Issue 1, 2009

Editorial Board

Editor-in-Chief:
Michael Jordan
Department of Electrical Engineering and Computer Science
Department of Statistics
University of California, Berkeley
Berkeley, CA 94720-1776
Editors
Peter Bartlett (UC Berkeley)
Yoshua Bengio (Université de Montréal)
Avrim Blum (Carnegie Mellon University)
Craig Boutilier (University of Toronto)
Stephen Boyd (Stanford University)
Carla Brodley (Tufts University)
Inderjit Dhillon (University of Texas at Austin)
Jerome Friedman (Stanford University)
Kenji Fukumizu (Institute of Statistical Mathematics)
Zoubin Ghahramani (Cambridge University)
David Heckerman (Microsoft Research)
Tom Heskes (Radboud University Nijmegen)
Geoffrey Hinton (University of Toronto)
Aapo Hyvarinen (Helsinki Institute for Information Technology)
Leslie Pack Kaelbling (MIT)
Michael Kearns (University of Pennsylvania)
Daphne Koller (Stanford University)
John Lafferty (Carnegie Mellon University)
Michael Littman (Rutgers University)
Gabor Lugosi (Pompeu Fabra University)
David Madigan (Columbia University)
Pascal Massart (Université de Paris-Sud)
Andrew McCallum (University of Massachusetts Amherst)
Marina Meila (University of Washington)
Andrew Moore (Carnegie Mellon University)
John Platt (Microsoft Research)
Luc de Raedt (Albert-Ludwigs Universitaet Freiburg)
Christian Robert (Université Paris-Dauphine)
Sunita Sarawagi (IIT Bombay)
Robert Schapire (Princeton University)
Bernhard Schoelkopf (Max Planck Institute)
Richard Sutton (University of Alberta)
Larry Wasserman (Carnegie Mellon University)
Bin Yu (UC Berkeley)
Editorial Scope

Foundations and Trends® in Machine Learning will publish survey and tutorial articles in the following topics:

Adaptive control and signal processing
Applications and case studies
Behavioral, cognitive and neural learning
Bayesian learning
Classification and prediction
Clustering
Data mining
Dimensionality reduction
Evaluation
Game theoretic learning
Graphical models
Independent component analysis
Inductive logic programming
Kernel methods
Markov chain Monte Carlo
Model choice
Nonparametric methods
Online learning
Optimization
Reinforcement learning
Relational learning
Robustness
Spectral methods
Statistical learning theory
Variational inference
Visualization
Information for Librarians

Foundations and Trends® in Machine Learning, 2009, Volume 2, 4 issues. ISSN paper version 1935-8237. ISSN online version 1935-8245. Also available as a combined paper and online subscription.
Foundations and Trends® in Machine Learning
Vol. 2, No. 1 (2009) 1–127
© 2009 Y. Bengio
DOI: 10.1561/2200000006
Learning Deep Architectures for AI

Yoshua Bengio
Dept. IRO, Université de Montréal, C.P. 6128, Montreal, Qc, H3C 3J7, Canada, yoshua.bengio@umontreal.ca
Abstract
Theoretical results suggest that in order to learn the kind of complicated functions that can represent high-level abstractions (e.g., in vision, language, and other AI-level tasks), one may need deep architectures. Deep architectures are composed of multiple levels of non-linear operations, such as in neural nets with many hidden layers or in complicated propositional formulae re-using many sub-formulae. Searching the parameter space of deep architectures is a difficult task, but learning algorithms such as those for Deep Belief Networks have recently been proposed to tackle this problem with notable success, beating the state-of-the-art in certain areas. This monograph discusses the motivations and principles regarding learning algorithms for deep architectures, in particular those exploiting as building blocks unsupervised learning of single-layer models such as Restricted Boltzmann Machines, used to construct deeper models such as Deep Belief Networks.
Contents

1 Introduction 1
1.1 How do We Train Deep Architectures? 4
1.2 Intermediate Representations: Sharing Features and Abstractions Across Tasks 6
1.3 Desiderata for Learning AI 9
1.4 Outline of the Paper 10
2 Theoretical Advantages of Deep Architectures 13
2.1 Computational Complexity 16
2.2 Informal Arguments 18
3 Local vs Non-Local Generalization 21
3.1 The Limits of Matching Local Templates 21
3.2 Learning Distributed Representations 27
4 Neural Networks for Deep Architectures 31
4.1 Multi-Layer Neural Networks 31
4.2 The Challenge of Training Deep Neural Networks 32
4.3 Unsupervised Learning for Deep Architectures 40
4.4 Deep Generative Architectures 41
4.5 Convolutional Neural Networks 44
4.6 Auto-Encoders 46
5 Energy-Based Models and Boltzmann Machines 49
5.1 Energy-Based Models and Products of Experts 49
5.2 Boltzmann Machines 54
5.3 Restricted Boltzmann Machines 56
5.4 Contrastive Divergence 60
6 Greedy Layer-Wise Training of Deep Architectures 69
6.1 Layer-Wise Training of Deep Belief Networks 69
6.2 Training Stacked Auto-Encoders 72
6.3 Semi-Supervised and Partially Supervised Training 73
7 Variants of RBMs and Auto-Encoders 75
7.1 Sparse Representations in Auto-Encoders and RBMs 75
7.2 Denoising Auto-Encoders 81
7.3 Lateral Connections 83
7.4 Conditional RBMs and Temporal RBMs 84
7.5 Factored RBMs 86
7.6 Generalizing RBMs and Contrastive Divergence 87
8 Stochastic Variational Bounds for Joint Optimization of DBN Layers 91
8.1 Unfolding RBMs into Infinite Directed Belief Networks 92
8.2 Variational Justification of Greedy Layer-wise Training 94
8.3 Joint Unsupervised Training of All the Layers 97
9 Looking Forward 101
9.1 Global Optimization Strategies 101
9.2 Why Unsupervised Learning is Important 107
9.3 Open Questions 108
10 Conclusion 113
Acknowledgments 115
References 117
1 Introduction
Allowing computers to model our world well enough to exhibit what we call intelligence has been the focus of more than half a century of research. To achieve this, it is clear that a large quantity of information about our world should somehow be stored, explicitly or implicitly, in the computer. Because it seems daunting to formalize manually all that information in a form that computers can use to answer questions and generalize to new contexts, many researchers have turned to learning algorithms to capture a large fraction of that information. Much progress has been made to understand and improve learning algorithms, but the challenge of artificial intelligence (AI) remains. Do we have algorithms that can understand scenes and describe them in natural language? Not really, except in very limited settings. Do we have algorithms that can infer enough semantic concepts to be able to interact with most humans using these concepts? No. If we consider image understanding, one of the best specified of the AI tasks, we realize that we do not yet have learning algorithms that can discover the many visual and semantic concepts that would seem to be necessary to interpret most images on the web. The situation is similar for other AI tasks.
Fig. 1.1 We would like the raw input image to be transformed into gradually higher levels of representation, representing more and more abstract functions of the raw input, e.g., edges, local shapes, object parts, etc. In practice, we do not know in advance what the "right" representation should be for all these levels of abstractions, although linguistic concepts might help guessing what the higher levels should implicitly represent.
Consider for example the task of interpreting an input image such as the one in Figure 1.1.
When humans try to solve a particular AI task (such as machine vision or natural language processing), they often exploit their intuition about how to decompose the problem into sub-problems and multiple levels of representation, e.g., in object parts and constellation models [138, 179, 197] where models for parts can be re-used in different object instances. For example, the current state-of-the-art in machine vision involves a sequence of modules starting from pixels and ending in a linear or kernel classifier [134, 145], with intermediate modules mixing engineered transformations and learning,
e.g., first extracting low-level features that are invariant to small geometric variations (such as edge detectors from Gabor filters), transforming them gradually (e.g., to make them invariant to contrast changes and contrast inversion, sometimes by pooling and sub-sampling), and then detecting the most frequent patterns. A plausible and common way to extract useful information from a natural image involves transforming the raw pixel representation into gradually more abstract representations, e.g., starting from the presence of edges, the detection of more complex but local shapes, up to the identification of abstract categories associated with sub-objects and objects which are parts of the image, and putting all these together to capture enough understanding of the scene to answer questions about it.
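To make the two ingredients of such a processing chain concrete (local filtering followed by pooling and sub-sampling), here is a minimal sketch in Python with numpy. It illustrates the general idea only, not the pipeline of any system cited above; the edge kernel, pooling size, and random input are all arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def convolve2d_valid(image, kernel):
    """Naive 'valid' 2D cross-correlation, written out for clarity."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    """Down-sample by taking the max over non-overlapping size x size blocks."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

# Stage 1: respond to local structure (a crude horizontal edge detector).
edge_kernel = np.array([[1.0, 1.0], [-1.0, -1.0]])
image = rng.random((8, 8))                 # stand-in for raw pixels
edges = np.abs(convolve2d_valid(image, edge_kernel))
# Stage 2: pooling makes the response invariant to small translations.
pooled = max_pool(edges, size=2)
```

Stacking several such filter-and-pool stages is what yields gradually more abstract, more invariant representations.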
Here, we assume that the computational machinery necessary to express complex behaviors (which one might label "intelligent") requires highly varying mathematical functions, i.e., mathematical functions that are highly non-linear in terms of raw sensory inputs, and display a very large number of variations (ups and downs) across the domain of interest. We view the raw input to the learning system as a high dimensional entity, made of many observed variables, which are related by unknown intricate statistical relationships. For example, using knowledge of the 3D geometry of solid objects and lighting, we can relate small variations in underlying physical and geometric factors (such as position, orientation, lighting of an object) with changes in pixel intensities for all the pixels in an image. We call these factors of variation because they are different aspects of the data that can vary separately and often independently. In this case, explicit knowledge of the physical factors involved allows one to get a picture of the mathematical form of these dependencies, and of the shape of the set of images (as points in a high-dimensional space of pixel intensities) associated with the same 3D object. If a machine captured the factors that explain the statistical variations in the data, and how they interact to generate the kind of data we observe, we would be able to say that the machine understands those aspects of the world covered by these factors of variation. Unfortunately, in general and for most factors of variation underlying natural images, we do not have an analytical understanding of these factors of variation. We do not have enough formalized
prior knowledge about the world to explain the observed variety of images, even for such an apparently simple abstraction as MAN, illustrated in Figure 1.1. A high-level abstraction such as MAN has the property that it corresponds to a very large set of possible images, which might be very different from each other from the point of view of simple Euclidean distance in the space of pixel intensities. The set of images for which that label could be appropriate forms a highly convoluted region in pixel space that is not even necessarily a connected region. The MAN category can be seen as a high-level abstraction with respect to the space of images. What we call abstraction here can be a category (such as the MAN category) or a feature, a function of sensory data, which can be discrete (e.g., the input sentence is at the past tense) or continuous (e.g., the input video shows an object moving at 2 meters/second). Many lower-level and intermediate-level concepts (which we also call abstractions here) would be useful to construct a MAN-detector. Lower-level abstractions are more directly tied to particular percepts, whereas higher-level ones are what we call "more abstract" because their connection to actual percepts is more remote, and through other, intermediate-level abstractions.
In addition to the difficulty of coming up with the appropriate intermediate abstractions, the number of visual and semantic categories (such as MAN) that we would like an "intelligent" machine to capture is rather large. The focus of deep architecture learning is to automatically discover such abstractions, from the lowest level features to the highest level concepts. Ideally, we would like learning algorithms that enable this discovery with as little human effort as possible, i.e., without having to manually define all necessary abstractions or having to provide a huge set of relevant hand-labeled examples. If these algorithms could tap into the huge resource of text and images on the web, it would certainly help to transfer much of human knowledge into machine-interpretable form.
1.1 How do We Train Deep Architectures?
Deep learning methods aim at learning feature hierarchies with features from higher levels of the hierarchy formed by the composition of
lower level features. Automatically learning features at multiple levels of abstraction allows a system to learn complex functions mapping the input to the output directly from data, without depending completely on human-crafted features. This is especially important for higher-level abstractions, which humans often do not know how to specify explicitly in terms of raw sensory input. The ability to automatically learn powerful features will become increasingly important as the amount of data and range of applications of machine learning methods continue to grow.
Depth of architecture refers to the number of levels of composition of non-linear operations in the function learned. Whereas most current learning algorithms correspond to shallow architectures (1, 2 or 3 levels), the mammal brain is organized in a deep architecture [173] with a given input percept represented at multiple levels of abstraction, each level corresponding to a different area of cortex. Humans often describe such concepts in hierarchical ways, with multiple levels of abstraction. The brain also appears to process information through multiple stages of transformation and representation. This is particularly clear in the primate visual system [173], with its sequence of processing stages: detection of edges, primitive shapes, and moving up to gradually more complex visual shapes.
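To make the definition concrete, the following sketch (illustrative only; the layer sizes and the tanh non-linearity are arbitrary choices) computes a function of depth 3: each affine map followed by a non-linearity adds one level of composition.

```python
import numpy as np

def neuron_layer(x, W, b):
    # One level of composition: affine map followed by a non-linearity.
    return np.tanh(W @ x + b)

rng = np.random.default_rng(0)
x = rng.normal(size=4)                     # raw input
sizes = [4, 8, 8, 2]                       # three weight layers => depth 3
params = [(rng.normal(size=(m, n)), np.zeros(m))
          for n, m in zip(sizes[:-1], sizes[1:])]

h = x
for W, b in params:                        # each pass adds one level of depth
    h = neuron_layer(h, W, b)
# h is a depth-3 composition:
#   tanh(W3 @ tanh(W2 @ tanh(W1 @ x + b1) + b2) + b3)
```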
Inspired by the architectural depth of the brain, neural network researchers had wanted for decades to train deep multi-layer neural networks [19, 191], but no successful attempts were reported before 2006¹: researchers reported positive experimental results with typically two or three levels (i.e., one or two hidden layers), but training deeper networks consistently yielded poorer results. Something that can be considered a breakthrough happened in 2006: Hinton et al. at University of Toronto introduced Deep Belief Networks (DBNs) [73], with a learning algorithm that greedily trains one layer at a time, exploiting an unsupervised learning algorithm for each layer, a Restricted Boltzmann Machine (RBM) [51]. Shortly after, related algorithms based on auto-encoders were proposed [17, 153], apparently exploiting the same principle: guiding the training of intermediate levels of representation using unsupervised learning, which can be performed locally at each level. Other algorithms for deep architectures were proposed more recently that exploit neither RBMs nor auto-encoders and that exploit the same principle [131, 202] (see Section 4).

¹ Except for neural networks with a special structure called convolutional networks, discussed in Section 4.5.
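The following sketch illustrates the greedy layer-wise idea with a binary RBM trained by one step of contrastive divergence (CD-1, discussed in detail later in the monograph). It is a toy illustration under simplifying assumptions, not Hinton et al.'s exact procedure: the learning rate, epoch count, layer sizes, per-example updates, and random toy data are all arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

class RBM:
    """Binary restricted Boltzmann machine trained with CD-1."""
    def __init__(self, n_visible, n_hidden, lr=0.1):
        self.W = 0.01 * rng.normal(size=(n_visible, n_hidden))
        self.b = np.zeros(n_visible)   # visible biases
        self.c = np.zeros(n_hidden)    # hidden biases
        self.lr = lr

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.c)

    def cd1_update(self, v0):
        # Positive phase: sample hidden units given the data.
        h0_p = self.hidden_probs(v0)
        h0 = (rng.random(h0_p.shape) < h0_p).astype(float)
        # Negative phase: one step of Gibbs sampling back to the visibles.
        v1_p = sigmoid(h0 @ self.W.T + self.b)
        h1_p = self.hidden_probs(v1_p)
        # Approximate gradient step on the log-likelihood.
        self.W += self.lr * (v0[:, None] * h0_p[None, :]
                             - v1_p[:, None] * h1_p[None, :])
        self.b += self.lr * (v0 - v1_p)
        self.c += self.lr * (h0_p - h1_p)

def greedy_pretrain(data, layer_sizes, epochs=5):
    """Train one RBM per level; each level models the level below it."""
    rbms, reps = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(reps.shape[1], n_hidden)
        for _ in range(epochs):
            for v in reps:
                rbm.cd1_update(v)
        rbms.append(rbm)
        reps = rbm.hidden_probs(reps)  # this stage's output feeds the next
    return rbms

data = (rng.random((100, 16)) < 0.5).astype(float)   # toy binary data
stack = greedy_pretrain(data, layer_sizes=[12, 8])
```

The key point is in greedy_pretrain: each RBM sees only the representation produced by the level below it, so each level's training is local and unsupervised.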
Since 2006, deep networks have been applied with success not only in classification tasks [2, 17, 99, 111, 150, 153, 195], but also in regression [160], dimensionality reduction [74, 158], modeling textures [141], modeling motion [182, 183], object segmentation [114], information retrieval [154, 159, 190], robotics [60], natural language processing [37, 130, 202], and collaborative filtering [162]. Although auto-encoders, RBMs and DBNs can be trained with unlabeled data, in many of the above applications, they have been successfully used to initialize deep supervised feedforward neural networks applied to a specific task.
1.2 Intermediate Representations: Sharing Features and Abstractions Across Tasks
Since a deep architecture can be seen as the composition of a series of processing stages, the immediate question that deep architectures raise is: what kind of representation of the data should be found as the output of each stage (i.e., the input of another)? What kind of interface should there be between these stages? A hallmark of recent research on deep architectures is the focus on these intermediate representations: the success of deep architectures belongs to the representations learned in an unsupervised way by RBMs [73], ordinary auto-encoders [17], sparse auto-encoders [150, 153], or denoising auto-encoders [195]. These algorithms (described in more detail in Section 7.2) can be seen as learning to transform one representation (the output of the previous stage) into another, at each step maybe disentangling better the factors of variations underlying the data. As we discuss at length in Section 4, it has been observed again and again that once a good representation has been found at each level, it can be used to initialize and successfully train a deep neural network by supervised gradient-based optimization.
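A sketch of the same stage-by-stage scheme with ordinary auto-encoders (illustrative only: plain gradient descent on squared reconstruction error, with arbitrary layer sizes, learning rate, and toy data; the sparse and denoising variants cited above add a sparsity penalty or input corruption on top of this skeleton):

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

def train_autoencoder(X, n_hidden, lr=0.1, epochs=50):
    """One stage: learn h = sigmoid(X W + b) that can reconstruct its input."""
    n_in = X.shape[1]
    W = 0.01 * rng.normal(size=(n_in, n_hidden)); b = np.zeros(n_hidden)
    W2 = 0.01 * rng.normal(size=(n_hidden, n_in)); b2 = np.zeros(n_in)
    for _ in range(epochs):
        H = sigmoid(X @ W + b)                 # encode
        R = sigmoid(H @ W2 + b2)               # decode (reconstruction)
        dR = (R - X) * R * (1 - R)             # grad of squared error w.r.t. pre-activation
        dH = (dR @ W2.T) * H * (1 - H)         # backpropagate one step
        W2 -= lr * (H.T @ dR) / len(X); b2 -= lr * dR.mean(axis=0)
        W -= lr * (X.T @ dH) / len(X); b -= lr * dH.mean(axis=0)
    return W, b

X = rng.random((200, 20))                       # toy data
encoders, reps = [], X
for n_hidden in [16, 8]:                        # two unsupervised stages
    W, b = train_autoencoder(reps, n_hidden)
    encoders.append((W, b))
    reps = sigmoid(reps @ W + b)                # this stage's code feeds the next
# 'encoders' would then initialize the hidden layers of a supervised
# feedforward network, to be fine-tuned by gradient-based optimization.
```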
Each level of abstraction found in the brain consists of the "activation" (neural excitation) of a small subset of a large number of features that are, in general, not mutually exclusive. Because these features are not mutually exclusive, they form what is called a distributed representation [68, 156]: the information is not localized in a particular neuron but distributed across many. In addition to being distributed, it appears that the brain uses a representation that is sparse: only around 1–4% of the neurons are active together at a given time [5, 113]. Section 3.2 introduces the notion of sparse distributed representation and Section 7.1 describes in more detail the machine learning approaches, some inspired by the observations of the sparse representations in the brain, that have been used to build deep architectures with sparse representations.
Whereas dense distributed representations are one extreme of a spectrum, and sparse representations are in the middle of that spectrum, purely local representations are the other extreme. Locality of representation is intimately connected with the notion of local generalization. Many existing machine learning methods are local in input space: to obtain a learned function that behaves differently in different regions of data-space, they require different tunable parameters for each of these regions (see more in Section 3.1). Even though statistical efficiency is not necessarily poor when the number of tunable parameters is large, good generalization can be obtained only when adding some form of prior (e.g., that smaller values of the parameters are preferred). When that prior is not task-specific, it is often one that forces the solution to be very smooth, as discussed at the end of Section 3.1. In contrast to learning methods based on local generalization, the total number of patterns that can be distinguished using a distributed representation may scale exponentially with the dimension of the representation (i.e., the number of learned features).
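A toy enumeration makes the counting argument explicit: n binary features can in principle distinguish 2^n input patterns, whereas a purely local (one-hot) representation of the same size distinguishes only n.

```python
from itertools import product

n = 4  # number of learned binary features

# Distributed: every combination of feature activations is a distinct pattern.
distributed_patterns = list(product([0, 1], repeat=n))
assert len(distributed_patterns) == 2 ** n   # 16 patterns

# Local (one-hot): each pattern claims its own dedicated unit.
local_patterns = [tuple(int(i == j) for j in range(n)) for i in range(n)]
assert len(local_patterns) == n              # only 4 patterns
```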
In many machine vision systems, learning algorithms have been limited to specific parts of such a processing chain. The rest of the design remains labor-intensive, which might limit the scale of such systems. On the other hand, a hallmark of what we would consider intelligent machines includes a large enough repertoire of concepts. Recognizing MAN is not enough. We need algorithms that can tackle a very large set of such tasks and concepts. It seems daunting to manually define that many tasks, and learning becomes essential in this context. Furthermore, it would seem foolish not to exploit the underlying commonalities between these tasks and between the concepts they require. This has been the focus of research on multi-task learning [7, 8, 32, 88, 186].
Architectures with multiple levels naturally provide such sharing and re-use of components: the low-level visual features (like edge detectors) and intermediate-level visual features (like object parts) that are useful to detect MAN are also useful for a large group of other visual tasks. Deep learning algorithms are based on learning intermediate representations which can be shared across tasks. Hence they can leverage unsupervised data and data from similar tasks [148] to boost performance on large and challenging problems that routinely suffer from a poverty of labelled data, as has been shown by [37], beating the state-of-the-art in several natural language processing tasks. A similar multi-task approach for deep architectures was applied in vision tasks by [2].
Consider a multi-task setting in which there are different outputs for different tasks, all obtained from a shared pool of high-level features. The fact that many of these learned features are shared among m tasks provides sharing of statistical strength in proportion to m. Now consider that these learned high-level features can themselves be represented by combining lower-level intermediate features from a common pool. Again statistical strength can be gained in a similar way, and this strategy can be exploited for every level of a deep architecture.
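Schematically (a sketch with arbitrary shapes and names, not a trainable implementation), the setting looks like this: m task-specific output heads all read from one shared pool of high-level features, which is itself computed from a shared pool of lower-level features.

```python
import numpy as np

rng = np.random.default_rng(2)
relu = lambda a: np.maximum(a, 0.0)

x = rng.normal(size=32)                       # raw input
W_low = rng.normal(size=(64, 32))             # shared low-level feature pool
W_high = rng.normal(size=(16, 64))            # shared high-level feature pool
task_heads = [rng.normal(size=(1, 16)) for _ in range(5)]  # m = 5 tasks

low = relu(W_low @ x)                          # features reused by every task
high = relu(W_high @ low)                      # higher-level shared abstractions
outputs = [(head @ high).item() for head in task_heads]
# Every example, whichever task it is labeled for, contributes learning
# signal to W_low and W_high: statistical strength is shared across tasks.
```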
In addition, learning about a large set of interrelated concepts might provide a key to the kind of broad generalizations that humans appear able to do, which we would not expect from separately trained object detectors, with one detector per visual category. If each high-level category is itself represented through a particular distributed configuration of abstract features from a common pool, generalization to unseen categories could follow naturally from new configurations of these features. Even though only some configurations of these features would be present in the training examples, if they represent different aspects of the data, new examples could meaningfully be represented by new configurations of these features.
1.3 Desiderata for Learning AI
Summarizing some of the above issues, and trying to put them in the broader perspective of AI, we put forward a number of requirements we believe to be important for learning algorithms to approach AI, many of which motivate the research described here:
- Ability to learn complex, highly-varying functions, i.e., with a number of variations much greater than the number of training examples.
- Ability to learn with little human input the low-level, intermediate, and high-level abstractions that would be useful to represent the kind of complex functions needed for AI tasks.
- Ability to learn from a very large set of examples: computation time for training should scale well with the number of examples, i.e., close to linearly.
- Ability to learn from mostly unlabeled data, i.e., to work in the semi-supervised setting, where not all the examples come with complete and correct semantic labels.
- Ability to exploit the synergies present across a large number of tasks, i.e., multi-task learning. These synergies exist because all the AI tasks provide different views on the same underlying reality.
- Strong unsupervised learning (i.e., capturing most of the statistical structure in the observed data), which seems essential in the limit of a large number of tasks and when future tasks are not known ahead of time.
Other elements are equally important but are not directly connected to the material in this monograph. They include the ability to learn to represent context of varying length and structure [146], so as to allow machines to operate in a context-dependent stream of observations and produce a stream of actions, the ability to make decisions when actions influence the future observations and future rewards [181], and the ability to influence future observations so as to collect more relevant information about the world, i.e., a form of active learning [34].
1.4 Outline of the Paper
Section 2 reviews theoretical results (which can be skipped without hurting the understanding of the remainder) showing that an architecture with insufficient depth can require many more computational elements, potentially exponentially more (with respect to input size), than architectures whose depth is matched to the task.