Building Agentic Systems in an Era of Large Language Models

Charles Packer

Electrical Engineering and Computer Sciences
University of California, Berkeley

Technical Report No. UCB/EECS-2024-223

/Pubs/TechRpts/2024/EECS-2024-223.html

December 19, 2024

All rights reserved.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission.

Fall 2024

Building Agentic Systems in an Era of Large Language Models

Charles Packer

A dissertation submitted in partial satisfaction of the requirements for the degree of

Doctor of Philosophy

Computer Science

in the

Graduate Division

of the

University of California, Berkeley

Committee in charge:

Professor Joseph E. Gonzalez, Chair
Professor Ion Stoica
Professor Matei Zaharia
Doctor Yuandong Tian

Abstract

Building Agentic Systems in an Era of Large Language Models

Charles Packer

Doctor of Philosophy in Computer Science

University of California, Berkeley

Professor Joseph E. Gonzalez, Chair

Building intelligent autonomous systems that can reason, adapt, and interact with their environment has been a long-standing goal in artificial intelligence. This thesis explores the evolution of agentic systems through the deep learning revolution, from reinforcement learning to modern Large Language Models (LLMs), focusing on the critical components needed to create reliable autonomous agents.

First, we address the fundamental challenge of generalization in deep reinforcement learning (RL), introducing a systematic framework for evaluating and improving how learned policies transfer across environments. Building on this foundation, we present Hindsight Task Relabeling (HTR), a novel approach that enables meta-RL algorithms to learn adaptation strategies in sparse reward settings without requiring dense reward signals during training.
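As a rough illustration of the hindsight idea behind HTR, the sketch below re-scores a trajectory that earned no sparse reward for its intended goal under a goal it did reach. This is generic hindsight relabelling on a toy 2D point task, not the thesis's HTR algorithm; the function names, the 0.2 success radius, and the choice of the final position as the hindsight task are all assumptions made for illustration.

```python
import math


def sparse_reward(position, goal, radius=0.2):
    """Return 1.0 only when the position lies within `radius` of the goal."""
    return 1.0 if math.dist(position, goal) <= radius else 0.0


def relabel_in_hindsight(trajectory, radius=0.2):
    """Re-score a trajectory under a hindsight task.

    Assumption: the hindsight task is simply the final position reached.
    """
    hindsight_goal = trajectory[-1]
    rewards = [sparse_reward(p, hindsight_goal, radius) for p in trajectory]
    return hindsight_goal, rewards


# A short trajectory that never gets near the true goal (hypothetical values).
trajectory = [(0.0, 0.0), (0.5, 0.1), (1.0, 0.3)]
true_goal = (3.0, 3.0)

# Under the true task, every step receives zero sparse reward.
original = [sparse_reward(p, true_goal) for p in trajectory]

# Under the hindsight task, the final step becomes a success.
hindsight_goal, relabelled = relabel_in_hindsight(trajectory)

print(original)    # [0.0, 0.0, 0.0]
print(relabelled)  # [0.0, 0.0, 1.0]
```

The relabelled trajectory carries a non-zero learning signal even though the original attempt failed, which is what makes training from sparse reward feasible in this toy setting.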

Finally, we address the emerging challenges of building reliable agents using Large Language Models. While LLMs demonstrate unprecedented reasoning capabilities, their effectiveness as autonomous agents is limited by fundamental constraints in their architecture, most notably their stateless nature and fixed context windows. We present MemGPT, an operating system-inspired framework that enables LLMs to manage their own memory and state, introducing concepts like virtual context management and self-directed memory operations. MemGPT demonstrates that by treating LLMs as a new fundamental unit of compute (analogous to how CPUs were the fundamental unit in traditional operating systems) we can build more reliable and capable autonomous agents.
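To make the idea of virtual context management concrete, the following minimal sketch keeps a fixed token budget for in-context messages and moves the oldest messages to out-of-context storage when the budget is exceeded, where they remain searchable. It is a simplification under stated assumptions, not MemGPT's implementation: the class and method names are hypothetical, whitespace splitting stands in for a real tokenizer, and eviction here is automatic rather than directed by the LLM.

```python
from collections import deque


class VirtualContext:
    """Toy fixed-budget context with overflow to external storage."""

    def __init__(self, max_tokens):
        self.max_tokens = max_tokens
        self.queue = deque()       # in-context messages, oldest first
        self.recall_storage = []   # evicted (out-of-context) messages

    @staticmethod
    def count_tokens(message):
        # Assumption: whitespace-separated words approximate tokens.
        return len(message.split())

    def tokens_in_context(self):
        return sum(self.count_tokens(m) for m in self.queue)

    def append(self, message):
        """Add a message, evicting the oldest ones until the budget is met."""
        self.queue.append(message)
        # Always keep the newest message, even if it alone exceeds the budget.
        while self.tokens_in_context() > self.max_tokens and len(self.queue) > 1:
            self.recall_storage.append(self.queue.popleft())

    def search_recall(self, query):
        """Case-insensitive substring search over evicted messages."""
        return [m for m in self.recall_storage if query.lower() in m.lower()]


ctx = VirtualContext(max_tokens=8)
ctx.append("my favorite color is teal")  # 5 tokens
ctx.append("I live in Berkeley")         # 9 tokens total, oldest is evicted
ctx.append("what should I paint")        # 8 tokens total, fits the budget

print(list(ctx.queue))             # the two most recent messages
print(ctx.search_recall("color"))  # the evicted message is still retrievable
```

The design point is that information leaving the context window is paged out rather than lost, so a later lookup can bring it back when it becomes relevant.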

Together, these systems trace the evolution of agentic AI systems and provide key building blocks for creating more reliable and capable autonomous agents. By addressing core challenges in generalization, adaptation, and memory management, this thesis establishes a foundation for engineering the next generation of AI systems that can effectively reason and interact with the world.

To my parents

Contents

List of Figures

List of Tables

Acknowledgments

1 Introduction
1.1 Background
1.1.1 The Deep Learning Revolution in Robotics and Control
1.1.2 The Rise of Foundation Models
1.2 Deep Learning for Agentic Systems
1.3 The LLM Agent Paradigm

2 Assessing Generalization in Deep Reinforcement Learning
2.1 Introduction
2.2 Background
2.3 Notation
2.4 Algorithms
2.5 Environments
2.6 Experimental setup
2.7 Experimental setup
2.8 Results and discussion
2.9 Conclusion
2.10 Additional details
2.10.1 Environment Details
2.10.2 Training Hyperparameters
2.10.3 Detailed Experimental Results
2.10.4 Behavior of MountainCar
2.10.5 Training Curves
2.10.6 Videos of trained agents

3 Hindsight Task Relabelling: Experience Replay for Sparse Reward Meta-
3.1 Introduction
3.2 Related work
3.3 Background
3.3.1 Meta-Reinforcement Learning (Meta-RL)
3.3.2 Off-Policy Meta-Reinforcement Learning
3.3.3 Hindsight Experience Replay
3.4 Leveraging Hindsight in Meta-Reinforcement Learning
3.4.1 Algorithm Design
3.4.2 Single Episode Relabeling (SER) strategy
3.4.3 Episode Clustering (EC) strategy
3.4.4 Comparison of HTR and HER
3.4.5 Limitations
3.5 Experiments
3.5.1 Environments
3.5.2 HTR enables meta-training using only sparse reward
3.5.3 Varying key hyperparameters
3.6 Conclusion
3.7 Experimental Setup (additional details)
3.7.1 Computing Infrastructure
3.7.2 Hyperparameters
3.7.3 Reward Functions
3.7.4 Changing the Distance to Goal
3.8 Algorithm Specifics
3.8.1 Sample-Time vs Data Generation Relabelling
3.8.2 Single Episode Relabelling Implementation Details
3.8.3 Episode Clustering Implementation Details
3.8.4 Time and Space Complexity

4 MemGPT: Towards LLMs as Operating Systems
4.1 Introduction
4.2 MemGPT (MemoryGPT)
4.2.1 Main context (prompt tokens)
4.2.2 Queue Manager
4.2.3 Function executor (handling of completion tokens)
4.2.4 Control flow and function chaining
4.3 Experiments
4.4 Experiments
4.4.1 MemGPT for conversational agents
4.4.2 MemGPT for document analysis
4.5 Related work

4.6 Conclusion
4.7 Additional details
4.7.1 Limitations
4.7.2 MemGPT pseudocode
4.7.3 MemGPT function set
4.7.4 Prompts and instructions
4.7.5 Balancing Working Context and the FIFO Queue

5 From Serving Models to Serving Agents: The Missing Pieces for Supporting Agentic Workloads
5.1 Introduction
5.1.1 The Existing Stateless LLM Programming Model
5.1.2 Agentic Programming Model
5.1.3 Agent State
5.2 The Agent Hosting Layer
5.2.1 LLM Inference: Co-optimization with the inference layer
5.2.2 State & Context Management
5.2.3 Multi-agent communication and orchestration

6 Conclusion & Future Work

Bibliography

List of Figures

2.1 Schematic of the three versions of an environment

2.2 MountainCar: heatmap of the rewards achieved by A2C with the FF architecture on DR and DE. The axes are the two environment parameters varied in R and E.

2.3 Pendulum: heatmap of the rewards achieved by A2C with the FF architecture on DR and DE. The axes are the two environment parameters varied in R and E.

2.4 PPO with FF architecture

2.5 PPO with RC architecture

2.6 EPOpt-PPO with FF architecture

2.7 EPOpt-PPO with RC architecture

2.8 RL²-PPO

2.9 Training curves for the PPO-based algorithms on CartPole, all three environment versions. Note that the decrease in mean episode reward at 10000 episodes in the two EPOpt-PPO plots is due to the fact that it transitions from being computed using all generated episodes (ε = 1) to only the 10% with lowest reward (ε = 0.1).

2.10 Video frames of agents trained with A2C on HalfCheetah, trained in the Deterministic (D), Random (R), and Extreme (E) settings (from top to bottom). All agents evaluated in the D setting

2.11 Video frames of agents trained with PPO on HalfCheetah, trained in the Deterministic (D), Random (R), and Extreme (E) settings (from top to bottom). All agents evaluated in the D setting

3.1 In goal-conditioned RL (a), an agent must navigate to a provided goal location g (filled circle, revealed to the agent). An unsuccessful attempt for goal g provides no sparse reward signal, but can be relabelled as a successful attempt for goal g′, creating sparse reward that can be used to train the agent. In meta-RL (b), the task T (i.e., goal, hollow circle) is never revealed to the agent, and instead must be inferred using experience on prior tasks and limited experience (τ_{1:t-1}) on the new task. In (b), there is no shared optimal task T′ to relabel all attempts with. HTR relabels each attempt τ under its own hindsight task T′, and modifies the underlying meta-RL training loop to learn adaptation strategies on the relabelled tasks. Note that we include multiple trajectories τ in (b) vs a single trajectory in (a) to highlight the adaptation stage in meta-RL, which does not exist in goal-conditioned RL and requires significantly different sampling and relabeling procedures

3.2 Sparse reward environments for meta-RL that require temporally-extended exploration. In each environment, the task (the top-left circle in (a), the green sphere in (b) and (c)) is not revealed to the agent via the observation. The agent must instead infer the task through temporally-extended exploration (illustrated by the dotted lines in (a)), since no reward signal is provided until the task is successfully completed. Prior meta-RL methods such as PEARL (Rakelly et al. 2019) and MAESN (Gupta et al. 2018b) are only able to (meta-)learn meaningful adaptation strategies using dense reward functions. Our approach, Hindsight Task Relabeling (HTR), can (meta-)train with the original sparse reward function and does not require additional dense reward functions

3.3 Illustration of Hindsight Task Relabeling (HTR) in a meta-RL training loop. HTR is agnostic to the underlying (off-policy) meta-RL algorithm; the agent architecture and/or training specifics (e.g., the encoder φ, actor π and Q-function neural networks shown in blue) can be modified independently of the relabeling scheme. HTR can also be performed in an ‘eager’ fashion at the data collection stage (as opposed to ‘lazy’ relabeling in the data sampling stage), see Section 3 for details

3.4 HTR algorithm

3.5 Evaluating adaptation to train tasks progressively during meta-training. Y-axis measures average sparse return during adaptation throughout meta-training (shaded std dev), though the oracle is still trained using dense reward. Conventional meta-RL methods struggle to learn using sparse reward. Hindsight Task Relabeling (HTR) is comparable to dense reward meta-training performance

3.6 Evaluating adaptation to test tasks after meta-training. Y-axis measures average (sparse) return during adaptation using context collected online, using sparse reward only. Adaptation strategies learned with Hindsight Task Relabeling (HTR) generalize to held-out tasks as well as the oracle which is learned using shaped reward functions. Without HTR or access to a shaped reward during meta-training, the agent is unable to learn a reasonable strategy

3.7 Visualizing exploration behavior learned during meta-training using 300 pre-adaptation trajectories (i.e., sampled from the latent task prior). In the sparse reward setting, without HTR (middle row) the agent is unable to learn a meaningful exploration strategy and appears to explore randomly near the origin. With HTR (bottom row), the agent learns to explore near the true task distribution (grey circles), similar to an agent trained with a shaped dense reward function (top row)

3.8 Comparing HTR with SER vs EC on Point Robot

3.9 Average return when varying K on Point Robot

3.10 Average task distance when varying K on Point Robot

3.11 Relative reward signal from hindsight vs ground truth tasks using Point Robot.

3.12 Meta-training on Point Robot with varying goal distances. If the distance to the goal is short enough for random exploration to lead to sparse reward, meta-training is possible using only the sparse reward function. Once this is no longer the case, meta-training is only possible with a proxy dense reward function, or by using Hindsight Task Relabelling on the original sparse reward function

3.13 Illustration of Hindsight Task Relabeling (HTR) using Episode Clustering (EC) in a meta-RL training loop, where relabelling occurs at the data collection stage.

4.1 MemGPT writes data to persistent memory after it receives a system alert about limited context space

4.2 MemGPT can search out-of-context data to bring relevant information into the current context window

4.3 In MemGPT, a fixed-context LLM processor is augmented with a hierarchical memory system and functions that let it manage its own memory. The LLM’s prompt tokens (inputs), or main context, consist of the system instructions, working context, and a FIFO queue. The LLM completion tokens (outputs) are interpreted as function calls by the function executor. MemGPT uses functions to move data between main context and external context (the archival and recall storage databases). The LLM can request immediate follow-up LLM inference to chain function calls together by generating a special keyword argument (request_heartbeat=true) in its output; function chaining is what allows MemGPT to perform multi-step retrieval to answer user queries

4.4 Comparing context lengths of commonly used models and LLM APIs (data collected 1/2024). *Approximate message count assuming a preprompt of 1k tokens, and an average message size of 50 tokens (250 characters)

4.5 An example conversation snippet where MemGPT updates stored information. Here the information is stored in working context memory (located within the prompt tokens)

4.6 Document QA task performance. MemGPT’s performance is unaffected by increased context length. Methods such as truncation can extend the effective context lengths of fixed length models such as GPT-4, but such compression methods will lead to performance degradation as the necessary compression grows. Running MemGPT with GPT-4 and GPT-4 Turbo have equivalent results on this task

4.7 An example of MemGPT solving the document QA task. A database of Wikipedia documents is uploaded to archival storage. MemGPT queries archival storage via function calling, which pulls paginated search results into main context

4.8 Nested KV retrieval task performance. MemGPT is the only approach that is able to consistently complete the nested KV task beyond 2 nesting levels. While GPT-4 Turbo performs better as a baseline, MemGPT with GPT-4 Turbo performs worse than MemGPT with GPT-4

4.9 An example of MemGPT solving the nested KV task (UUIDs shortened for readability). The example key-value pair has two nesting levels, and the MemGPT agent returns the final answer when a query for the final value (f37 617) only returns one result (indicating that it is not also a key)

4.10 MemGPT algorithm pseudocode
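The approximate message count described in the Figure 4.4 caption can be reproduced with simple arithmetic: subtract the preprompt from the context length and divide by the average message size. The sketch below uses the caption's stated assumptions (a 1k-token preprompt and 50 tokens per message); the function name and the 8192-token example context length are hypothetical and are not taken from the figure's data.

```python
def approx_message_count(context_tokens, preprompt_tokens=1000, tokens_per_message=50):
    """Estimate how many average-sized messages fit after the preprompt."""
    # Clamp at zero in case the preprompt alone exceeds the context length.
    available = max(context_tokens - preprompt_tokens, 0)
    return available // tokens_per_message


# Hypothetical 8192-token context window: (8192 - 1000) // 50
print(approx_message_count(8192))  # 143
```

Under these assumptions, a context window of a few thousand tokens holds only on the order of a hundred messages, which is the constraint that motivates managing context explicitly.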

List of Tables

2.1 Generalization performance (in % success) of each algorithm, averaged over all environments (mean and standard deviation over five runs)

2.2 Ranges of parameters for each version of each environment, using set notation

2.3 Mean and standard deviation over five runs of generalization performance (in % success) on Acrobot

2.4 Mean and standard deviation over five runs of generalization performance (in % success) on CartPole

2.5 Mean and standard deviation over five runs of generalization performance (in % success) on MountainCar

2.6 Mean and standard deviation over five runs of generalization performance (in % success) on Pendulum

2.7 Mean and standard deviation over five runs of generalization performance (in % success) on HalfCheetah

2.8 Mean and standard deviation over five runs of generalization performance (in % success) on Hopper

4.1 Deep memory retrieval (DMR) performance. In this task, the agent is asked a specific question about a topic discussed in a prior conversation (sessions 1–5). The agent’s response is scored against the gold answer. MemGPT significantly outperforms the fixed-context baselines. ‘R-L’ is ROUGE-L

4.2 Conversation opener performance. The agent’s conversation opener is evaluated using similarity scores to the gold persona labels (SIM-1/3) and to the human-created opener (SIM-H). MemGPT is able to exceed the performance of the human-created conversation opener with a variety of underlying models

Acknowledgments

First and foremost, I want to thank my family, who always pushed me to achieve more. They are the reason I love to do hard things.

Next I would like to thank my advisor, Professor Joseph E. Gonzalez. Joey helped me achieve my one true goal in the PhD: to make science fiction into science reality. His flexibility and encouragement, regardless of where my research interests led (even when not directly in his critical research path), were instrumental to my success. I could not have asked for a better PhD advisor.

I am also deeply grateful to my other thesis committee members: Ion Stoica, Matei Zaharia, and Yuandong Tian. Having such renowned world experts in AI and systems research on my committee was an incredible honor.

My journey in AI research began at UC San Diego, where I worked with Professors Julian McAuley and Kamalika Chaudhuri as an undergraduate. This led to my work with Professor Lawrence Holder during an REU at Washington State University, where I wrote my first first-author paper. After graduation, Professor Dawn Song took a chance on me, hiring me after a brief chat at a Starbucks in Hayes Valley, a moment that brought me to Berkeley and set me on my path toward the PhD.

Several mentors were crucial to my development as a researcher during my time at Berkeley. Vladlen Koltun taught me invaluable lessons about research discipline, particularly about knowing when to abandon ‘zombie’ research projects, advice I wish I had followed more closely. Richard Shin and Katelyn Gao worked closely with me during my first two years at Berkeley and were great mentors. Once I began the PhD, Rowan McAllister and Nick Rhinehart guided my research in autonomous vehicles and helped maintain my research momentum during the challenging middle years of my PhD. I’m also grateful to Pieter Abbeel and Sergey Levine, who, though not my formal advisors, provided crucial feedback that helped several papers cross the finish line to publication.

The RISE Lab was an incredible home for my research. I was fortunate to work alongside amazing colleagues in Joey’s group: Kevin Lin, Lisa Dunlap, Justin Wong, Shishir Patil, Tianjun Zhang, Paras Jain, Sukrit Kalra, and Suzie Petryk. The infamous "Star Factory" cubicle, which allegedly housed the Databricks founders and later the Anyscale founders, became the birthplace of MemGPT, Gorilla, and SkyPlane during my time there, an unmatched density of open source research contributions in a single cubicle space.

And finally, I would like to thank Sarah Wooders and Kevin Lin, who are joining me on an exciting new adventure post-PhD, where we’ll be taking our research on context management for LLM agents into the real world.

This thesis, and the journey it represents, would not have been possible without the support, guidance, and encouragement of all these incredible people. Thank you.

Additional context around this thesis: This thesis was written during an extraordinary period in artificial intelligence research (2017-2024). When I began my PhD, deep reinforcement learning was at the forefront of autonomous systems research, with breakthroughs like AlphaGo and OpenAI Five demonstrating superhuman performance in complex games.

Then came the transformer revolution. What started as incremental improvements in natural language processing rapidly evolved into something far more profound. The release of ChatGPT in late 2022 marked a paradigm shift not just in AI research, but in how society viewed artificial intelligence. Large Language Models demonstrated capabilities that seemed impossible just a few years earlier: sophisticated reasoning and intelligence that was general.

I had the unique privilege of not just witnessing this revolution, but actively participating in it. My research journey paralleled this transition: from working on fundamental challenges in deep reinforcement learning, to ultimately helping pioneer new approaches for building reliable autonomous systems using Large Language Models. This thesis reflects both the ‘before’ and ‘after’ of this pivotal moment in AI history; a time that will likely be remembered as the beginning of the foundation model era.

The speed of progress during this period was unprecedented. Papers that seemed cutting-edge when I started my PhD quickly became historical artifacts. Research directions that appeared promising were suddenly obsolete. Yet this rapid evolution created extraordinary opportunities to contribute to genuinely new directions in computer science: to help establish the foundations for how we build AI systems in this new era.

This thesis represents my small contribution to this remarkable period in computing history.

Chapter 1

Introduction

Building intelligent autonomous systems that can effectively reason, adapt, and interact with their environment has been a longstanding goal in artificial intelligence. The recent deep learning revolution, particularly the emergence of Large Language Models (LLMs), has dramatically changed our approach to building such systems. This thesis traces this evolution through several key advances in building agentic systems, from deep reinforcement learning to modern LLM-based approaches, focusing on the critical components needed to create reliable autonomous agents.

1.1 Background

The development of agentic systems has undergone several significant paradigm shifts, each introducing new capabilities and challenges. Understanding these shifts and their implications is crucial for building effective autonomous agents.

1.1.1 The Deep Learning Revolution in Robotics and Control

The integration of deep neural networks with reinforcement learning marked a significant advancement in autonomous systems. This combination enabled:

• End-to-End Learning: Deep RL allowed systems to learn directly from raw sensory input, eliminating the need for hand-engineered features.

• Complex Policy Learning: Neural networks as function approximators enabled learning sophisticated control policies for high-dimensional tasks.

• Improved Generalization: Deep architectures promised better transfer of learned behaviors across similar tasks.

However, several key challenges emerged:

• Limited Generalization: Learned policies often failed to transfer beyond their specific training conditions

• Sample Inefficiency: Deep RL systems required extensiv
