
AWS Whitepaper

Accenture Enterprise AI – Scaling Machine Learning and Deep Learning Models

Copyright © 2024 Amazon Web Services, Inc. and/or its affiliates. All rights reserved.


Amazon's trademarks and trade dress may not be used in connection with any product or service that is not Amazon's, in any manner that is likely to cause confusion among customers, or in any manner that disparages or discredits Amazon. All other trademarks not owned by Amazon are the property of their respective owners, who may or may not be affiliated with, connected to, or sponsored by Amazon.


Table of Contents

Abstract and introduction
Abstract
Are you Well-Architected?
Introduction
Frictionless ideation to production
Workforce analytics use cases
An intelligent platform approach
ML architecture on AWS
Feature engineering
The feature store
The algorithms
Data engineering and data quality
Hyper-parameter tuning (HPT)
Model registry
Optimization
Optimization drivers
Fine-tuning and reuse of models
Scaling with distributed training
Avoiding common missteps to reduce rework
Machine learning pipelines
Going from POC to large-scale deployments
Applying software engineering principles to data science
Machine learning automation through pipelines
Tracking lineage
Monitoring for performance and bias
Post-training bias metrics
Monitoring performance
Data quality monitoring
Dealing with drifts
Augmented AI
Human-in-the-loop workflows
Updating model versions
Conclusion
Contributors
Further reading
About Accenture
Document revisions
Notices
AWS Glossary


Accenture Enterprise AI – Scaling Machine Learning and Deep Learning Models

Publication date: July 27, 2022 (Document revisions)

Abstract

Today, there is a real-time, global, tectonic shift in the workplace caused by digital transformation. Accelerated by the COVID-19 pandemic, this digital transformation has created never-seen-before opportunities and significant workplace disruption. Fully realizing the new market opportunities demands a modernized workforce. A skills gap, contributed to by several factors, exists in today's labor market. Some of these factors are the increase in the number of people entering the workforce each year, a lack of relevant education, and the rise in technology, which requires workers to be equipped with new skills to keep up with advancements. Addressing this widening gap between the current workforce skills and those needed for tomorrow is front and center in the minds of every C-suite.

This whitepaper outlines an innovative, scalable, and automated solution using deep learning (DL) and machine learning (ML) on Amazon Web Services (AWS) to help solve the problem of bridging the existing talent and skills gap for both workers and organizations. Combining advanced data science, ML engineering, deep learning, ethical artificial intelligence (AI), and MLOps on AWS, this whitepaper provides a roadmap to enterprises and teams to help build production-ready ML solutions, and derive business value out of the same.

Are you Well-Architected?

The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.

In the Machine Learning Lens, we focus on how to design, deploy, and architect your machine learning workloads in the AWS Cloud. This lens adds to the best practices described in the Well-Architected Framework.


For more expert guidance and best practices for your cloud architecture—reference architecture deployments, diagrams, and whitepapers—refer to the AWS Architecture Center.

Introduction

Today's rapidly changing environment demands that organizations adapt to change in order to create a sustainable and productive workforce. Thriving in this environment requires rapid adaptation and readiness to upskill the workforce for tomorrow.

According to VentureBeat, about 87% of ML models never make it to production. Even though nine out of ten business executives believe that AI will be at the center of the next technological revolution, completion and successful production deployment are seen as a big challenge, because they require specific engineering expertise and collaboration between several teams (ML engineering, IT, data science, DevOps, and so on).

Accenture has built a scalable, industrialized, AI-powered solution that is a key component in helping solve the talent and skilling problem of today and tomorrow to create a productive workforce. The solution takes an innovative, cloud-native approach on AWS to industrialize ML and help organizations bridge the skills gap.

This whitepaper describes a technical solution (also referred to as an industry solution) for building and scaling ML, and specifically DL, models for these use cases, and how Accenture is industrializing the end-to-end process to achieve the technical goals previously detailed. The technical thought process explained here can be expanded and applied to most problems in other industries. You can also use it to create a stable and sustainable Enterprise AI system.

Frictionless ideation to production

The goal of Enterprise AI and MLOps is to reduce friction and get all models from ideation to production in the shortest possible time, with as little risk as possible. Integrating AI technologies into business operations can prove to be a game-changer for organizations, with the benefits of reducing costs, boosting efficiency, generating actionable, precise insights, and creating new revenue streams. This requires not only creating efficient models, but also creating a complete, end-to-end, stable, resilient, and repeatable Enterprise AI system that can provide sustainable value and be amenable to continuous improvements to adapt to changing environments.


Workforce analytics use cases

Workforce analytics is an advanced set of data analysis tools and metrics for comprehensive workforce performance measurement and improvement. Workforce analytics are highly sensitive by nature and require trusted ML and DL algorithms to create scalable, productionized solutions for harnessing human potential.

To bring about a balance between responsible AI and computing at scale, Accenture created an advanced, reusable AWS MLOps architecture for multiple workforce productivity use cases, including:

• Solutions.AI for Talent and Skilling — An AI-powered solution that delivers intelligent insights to help close the skills gap in any organization. Through Solutions.AI, Accenture created enterprise-wide AI solutions that deliver game-changing results, fast.

• Future of U: Skills. Jobs. Growth — An AI-enabled platform solution created in collaboration with Accenture partners, who are committed to accelerating a smooth transition to help get diverse talent skilled and integrated into the AI workforce.

• Intelligent Workforce Insights – Uses the power of cutting-edge AI/ML technologies to develop role maps that identify declining, stable, and emerging skills, both internally and in the market, and assess roles for upskilling.

The challenge with these use cases is that the backend needs massive amounts of computing resources to infer and produce the ML score results. For optimal user experience, the scoring needs to happen fast enough, with near real-time intermediate scores. All three of these use cases have three common technical traits:

• ML engineering at scale
• AI/ML automation with responsible AI
• End-to-end productionized, large-scale, resilient AI/ML systems

An intelligent platform approach

Deep learning is set to achieve transformation across industries and create new opportunities on a scale not seen since the industrial revolution in the 19th century. It comes with great promise, but with it are significant challenges. Most DL models are built on richer algorithmic primitives, and therefore lend themselves to greater reuse between tasks, rather than training a model from scratch each time there is a new problem at hand, or a new dataset.

For businesses to derive value, ML and DL models need to be productionized, run at scale, and reused across organizations. Lack of scalability and repeatability, and manual processes, diminish any value that would otherwise be realized from these powerful models.

Complete solution for scaling and productionizing DL models with automated pipelines

The proposed architecture in the previous diagram is designed to help achieve three goals:

• Greater, systematic reuse of features and architectures
• Reduced manual processes
• Increased speed to market

ML architecture on AWS

Expanding on the previous architecture, the following architecture is a drill-down of how a complete solution can be built with various AWS components and connected for a seamless, resilient, production-grade solution that is driven by performance and easy to maintain.

Complete solution with cloud-native AWS components

(Copyright © 2022 Accenture. All rights reserved)


Feature engineering

Many DL and ML models are used for the workforce productivity solution; however, text classification and sentence prediction are inherently the main classifiers you need. Given the superior performance of neural language models, and because they enable machines to understand qualitative information, they fit the need of building neural network-based DL models for assessing people's skills proficiency, and for recommending new career pathways.

Bidirectional Encoder Representations from Transformers (BERT) is the first Natural Language Processing (NLP) technique to rely solely on a self-attention mechanism, which is made possible by the bidirectional transformers at the center of BERT's design. This is significant because a word may change meaning as a sentence develops. Each word added augments the overall meaning of the sentence, and the context may completely alter the meaning of a specific word.
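As a minimal illustration of this contextual behavior (the checkpoint name and sentences are placeholders and use the Hugging Face Transformers library, which is not prescribed by the solution itself), the same word produces different embeddings in different contexts:

import numpy as np
import tensorflow as tf
from transformers import BertTokenizer, TFBertModel

# Load a pretrained BERT checkpoint; any BERT variant behaves the same way.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = TFBertModel.from_pretrained("bert-base-uncased")

sentences = ["She sat by the river bank.", "He deposited cash at the bank."]
inputs = tokenizer(sentences, return_tensors="tf", padding=True)
outputs = model(inputs)

# Locate the token "bank" in each sentence and extract its contextual embedding.
embeddings = []
for i, sentence_ids in enumerate(inputs["input_ids"].numpy()):
    tokens = tokenizer.convert_ids_to_tokens(sentence_ids.tolist())
    bank_index = tokens.index("bank")
    embeddings.append(outputs.last_hidden_state[i, bank_index].numpy())

# The two vectors differ because each reflects its surrounding context.
cosine = np.dot(embeddings[0], embeddings[1]) / (
    np.linalg.norm(embeddings[0]) * np.linalg.norm(embeddings[1])
)
print(f"Cosine similarity between the two 'bank' embeddings: {cosine:.3f}")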

The feature store

One of the key needs for the industry use cases listed in this whitepaper is to provide the C-suite and organizations with a roadmap to accelerate, scale, and sustain digital adoption. To enable individual talent mobility using AI, it is necessary to collect data points at the individual level.

Making AI models understand people's strengths, interests, and other personal criteria results in better career recommendations that benefit the workforce and organizations alike. One of the first steps in the journey of creating a productionized, stable AI/ML platform is to focus on a centralized feature store.

After Amazon SageMaker Processing applies the transformations defined in SageMaker Data Wrangler, the normalized features are stored in an offline feature store so the features can be shared and reused consistently across the organization among collaborating data scientists. This means SageMaker Processing and Data Wrangler can be used to generate features, and then store them in a feature store. This standardization is often key to creating a normalized, reusable set of features that can be created, shared, and managed as input into training ML models. You can use this feature consistency across the maturity spectrum, whether you are a startup or an advanced organization with an ML center of excellence.

The Amazon SageMaker Feature Store is accessible across the organization for different teams to collaborate, promoting reuse, reducing overall cost, and avoiding silos with duplicate work efforts. The following query is a sample of the central Feature Store created with BERT embeddings. A SageMaker Feature Group and a Feature Store are created. Multiple downstream teams can retrieve and use features from this central store instead of redoing feature engineering repeatedly, adding to the organization's operational costs and non-standardization issues.

Feature Store with BERT embeddings ready for reuse across the organization
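Complementing the sample shown above, the following is a minimal sketch of this pattern using the SageMaker Python SDK. The feature group name, S3 locations, and column names are illustrative assumptions, not the solution's actual schema:

import time
import pandas as pd
import sagemaker
from sagemaker.feature_store.feature_group import FeatureGroup

session = sagemaker.Session()
role = sagemaker.get_execution_role()

# Illustrative feature records: one row per employee; in practice there would be
# one column per BERT embedding dimension.
features_df = pd.DataFrame(
    {
        "record_id": ["emp-001", "emp-002"],
        "skill_text_embedding_0": [0.12, -0.08],
        "event_time": [time.time()] * 2,
    }
)
# Feature Store does not accept the pandas 'object' dtype, so cast IDs to string.
features_df["record_id"] = features_df["record_id"].astype("string")

feature_group = FeatureGroup(name="skills-bert-embeddings", sagemaker_session=session)
feature_group.load_feature_definitions(data_frame=features_df)
feature_group.create(
    s3_uri=f"s3://{session.default_bucket()}/feature-store",  # offline store location
    record_identifier_name="record_id",
    event_time_feature_name="event_time",
    role_arn=role,
    enable_online_store=True,
)

# Creation is asynchronous; wait before ingesting records.
while feature_group.describe().get("FeatureGroupStatus") == "Creating":
    time.sleep(5)
feature_group.ingest(data_frame=features_df, max_workers=2, wait=True)

# Downstream teams can query the shared offline store through Athena.
query = feature_group.athena_query()
query.run(
    query_string=f'SELECT * FROM "{query.table_name}" LIMIT 10',
    output_location=f"s3://{session.default_bucket()}/athena-results",
)
query.wait()
print(query.as_dataframe().head())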


The algorithms

There are three key pillars to building successful ML applications. If not done correctly, in spite of all the state-of-the-art continuous integration/continuous delivery (CI/CD), feature store, feature engineering, and graphics processing unit (GPU)-accelerated DL or automated pipelines, the end-to-end Enterprise AI platform is bound to fail:

• The quality of the data
• The minimum level of complexity employed to solve the problem
• The ability of the solution to be measured and monitored

Data engineering and data quality

The talent and skilling industry use case requires ingesting data from over 20 sources. One of the main challenges is to fix data quality before feeding the raw datasets into your DL models for classification and recommendation. Data quality issues can deeply impact not just the data engineering pipelines, but all the ML pipelines downstream as well.

Deequ helps in analyzing the datasets across all the stages of feature engineering, training, and deployment. Training-serving skew is aptly shown by Deequ, by detecting deviation from baseline statistics. Deequ can create schema constraints and statistics for each input feature. Completeness, Correlation, Uniqueness, and Compliance Deequ metrics can be tracked in the Metrics Repository, and Spark processing alerts can be set on detecting anomalies for immediate action.
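A minimal sketch of such checks using PyDeequ (the Python wrapper for Deequ) on a Spark DataFrame is shown below; the input path, column names, and thresholds are illustrative assumptions:

from pyspark.sql import SparkSession
import pydeequ
from pydeequ.checks import Check, CheckLevel
from pydeequ.verification import VerificationSuite, VerificationResult

# Spark session configured with the Deequ jar, as PyDeequ requires.
spark = (
    SparkSession.builder
    .config("spark.jars.packages", pydeequ.deequ_maven_coord)
    .config("spark.jars.excludes", pydeequ.f2j_maven_coord)
    .getOrCreate()
)

raw_df = spark.read.parquet("s3://my-bucket/raw/skills/")  # illustrative input path

# Completeness, uniqueness, and compliance constraints on the input features.
check = (
    Check(spark, CheckLevel.Error, "Skills dataset quality")
    .isComplete("employee_id")                 # completeness
    .isUnique("employee_id")                   # uniqueness
    .isContainedIn("proficiency_level", ["beginner", "intermediate", "expert"])  # compliance
    .hasCompleteness("skill_name", lambda completeness: completeness >= 0.95)
)

result = VerificationSuite(spark).onData(raw_df).addCheck(check).run()
VerificationResult.checkResultsAsDataFrame(spark, result).show(truncate=False)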

Hyper-parameter tuning (HPT)

As hyper-parameters control how the ML algorithm learns the model parameters during training, it's important to define optimization metrics and create SageMaker Hyperparameter Tuning jobs to converge on the best combination of hyper-parameters. Based on ML build experience, AWS recommends using a Bayesian hyper-parameter optimization strategy over manual, random, or grid search, as it usually yields better results using fewer compute resources.

For the talent and skilling industry use cases defined earlier, the DL models need to classify millions of jobs and skills to predict a good match and user learning sequence. The following details are some of the things that we found useful and that were key in our thought leadership for creating AI solutions. We define the objective metric that the HPT job will try to optimize, which is validation accuracy for the talent and skilling use cases.


The following is an example code snippet for the metrics definition:

objective_metric_name = "validation:accuracy"

metrics_definitions = [
    {"Name": "train:loss", "Regex": "loss:([0-9\\.]+)"},
    {"Name": "train:accuracy", "Regex": "accuracy:([0-9\\.]+)"},
    {"Name": "validation:loss", "Regex": "val_loss:([0-9\\.]+)"},
    {"Name": "validation:accuracy", "Regex": "val_accuracy:([0-9\\.]+)"},
]

Next, we set up the HyperparameterTuner with the estimator and hyperparameter ranges.

A crucial setting is the early_stopping_type, which you set so that SageMaker can stop the tuning job when it starts to overfit, which helps save the cost of the overall tuning job. Inaccurate hyperparameter tuning can not only result in excessive costs, but also an ineffective model, even after hours of training.
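The tuner below also expects a hyperparameter_ranges dictionary. A minimal sketch is shown here; the parameter names and bounds are illustrative assumptions rather than the solution's actual search space:

from sagemaker.tuner import ContinuousParameter, CategoricalParameter

# Illustrative search space for the classifier layer.
hyperparameter_ranges = {
    "learning_rate": ContinuousParameter(1e-5, 1e-3, scaling_type="Logarithmic"),
    "train_batch_size": CategoricalParameter([64, 128, 256]),
}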

from sagemaker.tuner import HyperparameterTuner

objective_metric_name = "validation:accuracy"

tuner = HyperparameterTuner(
    estimator=estimator,
    objective_type="Maximize",
    objective_metric_name=objective_metric_name,
    hyperparameter_ranges=hyperparameter_ranges,
    metric_definitions=metrics_definitions,
    max_jobs=2,
    max_parallel_jobs=10,
    strategy="Bayesian",
    early_stopping_type="Auto",
)

Combining all of this together, you have the following build and training process, taking BERT as an example. Other DL models built with PyTorch, MXNet, or TensorFlow follow the same process. It is essential to get the following three stages (within the box under MACHINE LEARNING ENGINEERING) correct to move on to productionizing the system with large-scale model deployments.


Complete ML engineering process and fine-tuning deep learning models

Model registry

It is important to catalog models to explain the model predictions and insights. It is also important that all models promoted to production are cataloged, all model versions are managed, metadata such as training metrics is associated with a model, and the approval status of a model is managed.

This is especially needed when organizations want to move from ad-hoc, one-off proofs of concept to embedding AI in their enterprise systems, with multiple teams doing daily DL experiments. This is implemented in the solution using SageMaker Model Registry.
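A minimal sketch of cataloging a trained model version with SageMaker Model Registry through boto3 is shown below; the group name, container image, and model artifact location are illustrative placeholders:

import boto3

sm_client = boto3.client("sagemaker")

# A model package group collects all versions of a model for a use case.
sm_client.create_model_package_group(
    ModelPackageGroupName="skills-classifier",
    ModelPackageGroupDescription="BERT-based skills classification models",
)

# Register one model version with its inference container, artifacts, and approval status.
response = sm_client.create_model_package(
    ModelPackageGroupName="skills-classifier",
    ModelPackageDescription="BERT classifier trained on the latest skills dataset",
    InferenceSpecification={
        "Containers": [
            {
                "Image": "<tensorflow-inference-image-uri>",                 # placeholder
                "ModelDataUrl": "s3://my-bucket/models/bert/model.tar.gz",   # placeholder
            }
        ],
        "SupportedContentTypes": ["application/json"],
        "SupportedResponseMIMETypes": ["application/json"],
    },
    ModelApprovalStatus="PendingManualApproval",
)
print(response["ModelPackageArn"])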


Optimization

DL is simple in essence. In the last few years, AWS has achieved astonishing results on machine-perception problems with the help of simple parametric models trained with gradient descent (GD). Extending that, all that is needed at the core is sufficiently large parametric models trained with GD on a large dataset.

Creating a DL algorithm, or identifying the algorithm to use and fine-tune, is the first step. The next step for an enterprise is to derive business value out of the algorithm. That can be achieved only when the models are appropriately industrialized, scaled, and continuously improved. Ill-performing models negatively impact a business or organization's bottom line. In Accenture's talent and skilling solution, there are over 50 models running in production, making a large-scale, smooth operationalization process a necessity.

Optimization drivers

DL has positioned itself as an AI revolution and is here to stay. Some of the benefits of using DL models are:

• Reusability
• Scalability

Optimizing and scaling ML and DL models in production is a crucial set of tasks, and one that must be done with finesse. To maximize the benefits listed previously, a proper implementation approach must be taken.

The following are details on how it should be implemented for the industry use cases, taking the example of a few of the models. The same approach can be used for scaling many other DL models for new problems.

Fine-tuning and reuse of models

Periodically, businesses get updated training data from market intelligence data sources on new market trends. There is always a need to optimize the hyper-parameters of the TensorFlow BERT classifier layer. For such cases, where the tuning job must be run again with an updated dataset or a new version of the algorithm, warm start with TRANSFER_LEARNING as the start type


helps reuse the previous HPT job results along with new hyperparameters. This speeds up convergence on the best model.

This is particularly important in Enterprise AI systems, as multiple teams may want to reuse the models created. Training DL models from scratch requires a lot of GPU, compute, and storage resources. Model reuse across the organization helps in reducing costs. Therefore, a useful technique for model reuse is fine-tuning. The fine-tuning methodology involves unfreezing a few of the top layers of a frozen model base used for feature extraction, and then jointly training both the newly added part of the model (the fully connected classifier) and those top layers. With this, a model can be reused for a different problem and does not have to be retrained from scratch, saving costs for the company.
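The following is a minimal sketch of this fine-tuning pattern in TensorFlow, using a pretrained BERT encoder from the Hugging Face Transformers library. The checkpoint name, sequence length, number of classes, and the choice of which layers to unfreeze are illustrative assumptions:

import tensorflow as tf
from transformers import TFBertModel

num_skill_classes = 20   # illustrative number of target classes
max_seq_length = 128     # illustrative sequence length

# Pretrained encoder used as the model base for feature extraction.
bert = TFBertModel.from_pretrained("bert-base-uncased")

# Freeze the embeddings and all encoder blocks except the top two, which stay trainable.
bert.bert.embeddings.trainable = False
for layer in bert.bert.encoder.layer[:-2]:
    layer.trainable = False

input_ids = tf.keras.Input(shape=(max_seq_length,), dtype=tf.int32, name="input_ids")
attention_mask = tf.keras.Input(shape=(max_seq_length,), dtype=tf.int32, name="attention_mask")

# Newly added fully connected classifier head on top of the [CLS] token representation.
sequence_output = bert(input_ids, attention_mask=attention_mask).last_hidden_state
cls_token = sequence_output[:, 0, :]
outputs = tf.keras.layers.Dense(num_skill_classes, activation="softmax")(cls_token)

model = tf.keras.Model(inputs=[input_ids, attention_mask], outputs=outputs)
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)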

In the following sections, you will see how you can implement and scale the model fine-tuning strategies previously discussed, while maintaining a laser focus on the business metrics we need to attain.

WarmStartConfig uses one or more of the previous hyper-parameter tuning job runs, called the parent jobs, and needs a WarmStartType.

from sagemaker.tensorflow import TensorFlow

estimator = TensorFlow(
    entry_point="tf_bert_reviews.py",
    source_dir="src",
    role=role,
    instance_count=train_instance_count,
    instance_type=train_instance_type,
    volume_size=train_volume_size,
    py_version="py37",
    framework_version="2.3.1",
    hyperparameters={
        "epochs": epochs,
        "epsilon": epsilon,
        "validation_batch_size": validation_batch_size,
        "test_batch_size": test_batch_size,
        "train_steps_per_epoch": train_steps_per_epoch,
        "validation_steps": validation_steps,
        "test_steps": test_steps,
        "use_xla": use_xla,
        "use_amp": use_amp,
        "max_seq_length": max_seq_length,
        "enable_sagemaker_debugger": enable_sagemaker_debugger,
        "enable_checkpointing": enable_checkpointing,
        "enable_tensorboard": enable_tensorboard,
        "run_validation": run_validation,
        "run_test": run_test,
        "run_sample_predictions": run_sample_predictions,
    },
    input_mode=input_mode,
    metric_definitions=metrics_definitions,
)

Setting up the HyperparameterTuner with WarmStartConfig, including new hyper-parameter ranges:
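The snippet that follows references a warm_start_config object. A minimal sketch of constructing one with the SageMaker Python SDK is shown here; the parent tuning job name is a placeholder:

from sagemaker.tuner import WarmStartConfig, WarmStartTypes

# Reuse the results of a previous tuning job (the parent) via transfer-learning warm start.
warm_start_config = WarmStartConfig(
    warm_start_type=WarmStartTypes.TRANSFER_LEARNING,
    parents={"previous-bert-hpt-job-name"},  # placeholder parent job name
)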

objective_metric_name = "train:accuracy"

tuner = HyperparameterTuner(
    estimator=estimator,
    objective_type="Maximize",
    objective_metric_name=objective_metric_name,
    hyperparameter_ranges=hyperparameter_ranges,
    metric_definitions=metrics_definitions,
    max_jobs=2,
    max_parallel_jobs=1,
    strategy="Bayesian",
    early_stopping_type="Auto",
    warm_start_config=warm_start_config,
)

Scaling with distributed training

For efficient parallel computing during distributed training, employ both data parallelism and model parallelism. SageMaker supports distributed PyTorch. You can use the Hugging Face Transformers library, which natively supports the SageMaker distributed training framework for both TensorFlow and PyTorch. The SageMaker built-in, distributed, all-reduce communication strategy should be used to achieve data parallelism by scaling PyTorch training jobs to multiple instances in a cluster.
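A minimal sketch of enabling the SageMaker data parallel (all-reduce) library for a PyTorch training job is shown below. The entry point, instance type, instance count, and channel locations are illustrative assumptions; the library requires specific multi-GPU instance types such as ml.p3.16xlarge or larger:

from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train_bert.py",           # illustrative training script
    source_dir="src",
    role=role,
    framework_version="1.10.2",
    py_version="py38",
    instance_count=2,                       # scale out across multiple instances
    instance_type="ml.p3.16xlarge",         # multi-GPU instance supported by the library
    # Enable the SageMaker distributed data parallel (all-reduce) library.
    distribution={"smdistributed": {"dataparallel": {"enabled": True}}},
    hyperparameters={"epochs": 3, "max_seq_length": 128},
)

estimator.fit({"train": "s3://my-bucket/train/", "validation": "s3://my-bucket/validation/"})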

Avoiding common missteps to reduce rework

The biggest driving factor in making a successful, productionized ML project with minimal to no rework is collaborative involvement between the ML team and the business unit.


Secondly, transforming data science prototype scripts from the experimentation phase into modular, performant code for production is a deeply involved task, and if not done right, it will not produce a stable production system.

Finally, the ecosystem of ML engineering and MLOps is a culmination of multiple processes and standards from within DevOps, adding in ML-specific tooling and domain-specific elements, thereby building repeatable, resilient, production-capable data science solutions on the cloud. These three tenets alone distinguish a matured AI/ML enterprise from one that has just started on the journey of using ML for deriving business value.

For industry solutions, as mentioned in the Workforce analytics use cases section of this document, the following are some optimizations that have proved useful for having an efficient, enterprise-grade, end-to-end, industrialized ML solution:

• Remove monolithic prototype scripts
• Identify difficult-to-test code in large, tightly coupled codebases
• Introduce effective encapsulation and abstraction techniques

In a scaled, industrialized production version, the full end-to-end automated data engineering and ML engineering pipeline is the product, built on the data science scripts from the experimentation phase.


Machine learning pipelines

Despite many companies going all in on ML, hiring massive teams of highly compensated data scientists, and devoting huge amounts of financial and other resources, their projects end up failing at high rates.

Moving from a single-laptop setup toward a scalable, production-grade data science platform is a completely different challenge from the proof-of-concept stage, and arguably one of the most difficult, as it involves collaborating with different teams across an organization. Accenture has devised a scalable and unique approach for the Accenture talent and skilling AI solutions discussed in this whitepaper, to go from prototype to full-scale productionized systems in a very short period of time, enhancing "speed to market" and generating value.

ML technical debt may rapidly accumulate. Without strict enforcement of ML engineering principles (built on rigorous software engineering principles) in data science code, the result may be messy pipelines, and managing these pipelines (detecting errors and recovering from failures) becomes extremely difficult and costly. Comprehensive live monitoring of system behavior in near real time, combined with automated response, is critical for long-term system reliability.

This section and the following sections address these problems, and provide a solution.

Going from POC to large-scale deployments

The main challenges for companies looking to move beyond the realm of basic AI proofs of concept (POCs), manual data science POCs, and pilot programs to Enterprise AI can be grouped around the need to achieve the following at the enterprise level:

• Repeatability
• Scalability
• Transparency/Explainability

