2024 Microsoft Artificial Intelligence Systems Overview

Deep learning is changing the world

Applications: self-driving, surveillance detection, medical diagnostics, games, personal assistants, art
Techniques: image recognition, speech recognition, natural language, generative models, reinforcement learning

[Figure: image classification training example. Class labels: cat, dog, honey badger, raccoon. Annotations: loss, errors, and five numbered symbols that are not recoverable.]

Massive (labeled) data: 14M images

Advances in deep learning algorithms
Languages and frameworks
Compute power
Deep learning + systems advances: programming languages, optimization, computer architecture, parallel computing, and distributed systems

E.g., the image classification problem

Dataset      Samples             Categories
MNIST        60K                 10
ImageNet     16M                 1000
Web Images   Billions of images  Open categories

[Figure: test error rate (%) by model]
LeNet: convolution, max-pooling, softmax, 1998
AlexNet, 16.4%: ReLU, Dropout, 2012
Inception, 6.7%: Batch normalization, 2015
ResNet, 3.57%: Residual way, 2015
EfficientNet, 3.1%: NAS

Application areas: image recognition, speech recognition, natural language, reinforcement learning

[Figure: Performance (Op/Sec) over time; x-axis 1970, 1980, 1990, 2000, 2019]
CPU (Moore's law): ENIAC, 5 Kops; Xeon E5, ~500 Gops
GPU: V100, 125 Tops
Dedicated hardware (TPU): TPUv1, 90 Tops; TPUv3, 360 Tops
Marked speedups: 10^8x and 10^5x
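The speedups in the performance figure can be checked with simple arithmetic. The sketch below uses only the throughput values quoted in the figure; pairing the 10^8x label with the ENIAC-to-Xeon E5 gap is an assumption, since the figure layout did not survive, and the constant and function names are chosen here for illustration.

```python
# Throughput values quoted in the figure, in operations per second.
ENIAC_OPS = 5e3        # 5 Kops
XEON_E5_OPS = 500e9    # ~500 Gops
TPU_V1_OPS = 90e12     # 90 Tops
TPU_V3_OPS = 360e12    # 360 Tops


def speedup(new_ops, old_ops):
    """Return how many times faster new_ops is than old_ops."""
    return new_ops / old_ops


# Assumption: the 10^8x label refers to the CPU curve from ENIAC to Xeon E5.
cpu_gain = speedup(XEON_E5_OPS, ENIAC_OPS)
tpu_v3_vs_cpu = speedup(TPU_V3_OPS, XEON_E5_OPS)
tpu_v3_vs_v1 = speedup(TPU_V3_OPS, TPU_V1_OPS)

print(f"CPU, ENIAC to Xeon E5: {cpu_gain:.0e}x")
print(f"TPUv3 over Xeon E5:    {tpu_v3_vs_cpu:.0f}x")
print(f"TPUv3 over TPUv1:      {tpu_v3_vs_v1:.0f}x")
```

Under these numbers the CPU gain comes out at 10^8, matching one of the marked labels, while dedicated hardware adds a further factor of several hundred over the CPU endpoint.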

Custom-purpose machine learning algorithms: Theano, DistBelief, Caffe
Deep learning frameworks provide easier ways to leverage various libraries: MXNet, CNTK, PyTorch
Language frontend: Swift for TensorFlow
Compiler backend: TVM, TensorFlow XLA
Algebra & linear libs: CPU, GPU
Dense matmul engine: GPU, FPGA
Special AI accelerators: TPU, GraphCore, other ASICs

Machine Learning Language and Compiler
A full-featured programming language for ML: expressive and flexible; control flow, recursion, sparsity
Powerful compiler infrastructure: code optimization, sparsity optimization, hardware targeting
AI framework hardware support: SIMD → MIMD, sparsity support, control flow and dynamicity, associated memory

End-to-end AI user experiences: model, algorithm, pipeline, experiment, life cycle management
Programming interfaces: computation graph, (auto) gradient calculation
IR, compiler infrastructure
Deep learning runtime: optimizer, planner, executor
Resource management / scheduler
Hardware APIs (GPU, CPU, FPGA, ASIC)
Scalable network stack (RDMA, IB, NVLink)
Architecture (single node and cloud)

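[Figure: AI systems stack, annotated with class 3 through class 8]

Broader AI system ecosystem
  New machine learning paradigms (RL)
  Automated machine learning (AutoML)
  Security and privacy
  Model inference, compression, and optimization
Deep learning algorithms and frameworks
  General AI algorithm support and evolution
  Deep neural network compilation architecture and optimization
Deep learning job runtime and optimization environment
General resource management and scheduling system
New hardware and the related high-performance network and compute stack

(2) Start training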

Define the network structure
Fully connected: the last few layers
Convolutional neural network: […] and other data with strong locality
Recurrent neural network: […] data, such as text and knowledge graphs
Transformer neural network: for example, text

# A recursive TreeBank model in a dozen lines of JPL code
# Walk the tree, accumulating embedding vecs
# Word embedding model is used at the leaf node to map word
# index into high-dimensional semantic word representation.
# Get semantic representations for left and right children.
# A composition function is used to learn semantic
# representation for phrase at the internal node.
# Map tree embedding to sentiment
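The code that those comments annotate did not survive extraction. As an illustration only, here is a minimal Python sketch of the steps the comments describe: walk the tree, embed words at the leaves, compose the children at internal nodes, then map the tree embedding to a sentiment. It is not the original JPL code. The sizes, the random untrained weights, the tanh composition, and every name (`embed_tree`, `sentiment`, `W_compose`, and so on) are assumptions made for the example.

```python
import math
import random

random.seed(0)
VOCAB, DIM, CLASSES = 10, 4, 2  # hypothetical sizes, chosen only for the demo


def rand_matrix(rows, cols, scale=0.1):
    """Random (untrained) weights; a real model would learn these."""
    return [[random.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


embedding = rand_matrix(VOCAB, DIM, scale=1.0)      # word index -> word vector
W_compose = rand_matrix(DIM, 2 * DIM)               # composition function weights
W_sentiment = rand_matrix(CLASSES, DIM, scale=1.0)  # phrase vector -> class scores


def embed_tree(tree):
    """Walk the tree and return one embedding vector for the whole subtree."""
    if isinstance(tree, int):
        # Leaf: map the word index to its semantic representation.
        return embedding[tree]
    left, right = tree
    # Internal node: get both children's representations, then compose them.
    children = embed_tree(left) + embed_tree(right)  # list concatenation
    return [math.tanh(x) for x in matvec(W_compose, children)]


def sentiment(tree):
    """Map the tree embedding to a probability distribution over sentiments."""
    scores = matvec(W_sentiment, embed_tree(tree))
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# A parse tree written as nested pairs of word indices: ((1 2) (3 (4 5)))
probs = sentiment(((1, 2), (3, (4, 5))))
print(probs)
```

The recursion depth follows the shape of each input tree, so the computation differs from sample to sample. That is the kind of dynamic control flow a static, fixed-shape graph handles poorly.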

More diverse structures, more complex dependencies, finer-grained computation patterns

[Figure: framework architecture]
Front-end: language bindings for Python, Lua, R, C++
Graph definition (IR): x, w, *, b, +, y
Optimization: batching, cache, overlap
Execution runtime: CPU, GPU, RDMA devices

Data-Flow Graph (DFG) as Intermediate Representation

[Figure: DFG with inputs x, y, z; operators *, +, Σ; values a, b, c]

TensorFlow: add gradient backpropagation to the Data-Flow Graph (DFG)

[Figure: the DFG over x, y, z (operators *, +, Σ; values a, b, c) extended with gradient nodes ∂x, ∂y, ∂z, ∂a, ∂b, ∂c; annotated with CPU code and GPU code]
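To make the idea of adding gradient backpropagation to a data-flow graph concrete, here is a small pure-Python sketch. The wiring a = x * y, b = a + z, c = Σ b is an assumption reconstructed from the node labels in the figure, and the dictionary-based graph format and the function names are hypothetical. This is not how TensorFlow represents graphs internally.

```python
# Each operator node: name -> (op, input names).
# Assumed wiring: a = x * y (elementwise), b = a + z, c = sum(b).
GRAPH = {
    "a": ("mul", ("x", "y")),
    "b": ("add", ("a", "z")),
    "c": ("sum", ("b",)),
}
ORDER = ["a", "b", "c"]  # a topological order of the operator nodes


def forward(inputs):
    """Evaluate every operator node in topological order."""
    values = dict(inputs)
    for name in ORDER:
        op, args = GRAPH[name]
        if op == "mul":
            values[name] = [p * q for p, q in zip(values[args[0]], values[args[1]])]
        elif op == "add":
            values[name] = [p + q for p, q in zip(values[args[0]], values[args[1]])]
        elif op == "sum":
            values[name] = sum(values[args[0]])
    return values


def backward(values):
    """Visit operators in reverse order, accumulating d(c)/d(node)."""
    grads = {"c": 1.0}

    def accumulate(name, grad):
        # A node feeding several operators receives the sum of their gradients.
        if name in grads:
            grads[name] = [g + h for g, h in zip(grads[name], grad)]
        else:
            grads[name] = grad

    for name in reversed(ORDER):
        op, args = GRAPH[name]
        g = grads[name]
        if op == "sum":
            accumulate(args[0], [g] * len(values[args[0]]))
        elif op == "add":
            accumulate(args[0], list(g))
            accumulate(args[1], list(g))
        elif op == "mul":
            p, q = values[args[0]], values[args[1]]
            accumulate(args[0], [gi * qi for gi, qi in zip(g, q)])
            accumulate(args[1], [gi * pi for gi, pi in zip(g, p)])
    return grads


values = forward({"x": [1.0, 2.0], "y": [3.0, 4.0], "z": [5.0, 6.0]})
grads = backward(values)
print(values["c"])                          # 22.0
print(grads["x"], grads["y"], grads["z"])   # [3.0, 4.0] [1.0, 2.0] [1.0, 1.0]
```

Each forward operator contributes one local rule to the backward pass. That is why a framework can generate the gradient nodes automatically once the forward graph is defined.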

??zxyz??x??y*a*????z+bΣc+????a??bΣ??xyz??x??y*a*????z+bΣc+????a??bΣ??......1......1Operators IDEProgrammingwith:VSCode,JupiterNotebookIDEProgrammingwith:VSCode,JupiterNotebookLanguageIntegratedwithmainstreamPL:PyTorchandTensorFlowinsidePythonCompilerIntermediaterepresentationCompilationOptimizationBasicdatastructure:TensorLexicalanalysis:TokenUsercontrolled:mini-batchBasiccomputation:DAGParsing:ASTDataparallelismandmodelparallelismAdvancefeatures:controlflowSemanticanalysis:SymbolicADLoopnetsanalysis:pipelineparallelism,controlflowGeneralIRs:MLIRCodeoptimizationDataflowanalysis:Arithmetic,FusionCodegenerationHardwaredependentoptimizations:matrixcomputation,layoutResourceallocationandscheduler:memory,recomputation,RuntimesSinglenode:CuDNNMultimode:Parameterservers,AllreducerComputationclusterresourcemanagementandjobschedulerHardwareHardwareaccelerators:CPU/GPU/ASIC/FPGANetworkaccelerators:RDMA/IB/NVLinkFrameworksArchitectureCompilerBackendTVMTensorFlowXLALanguageFrontendSwiftforTensorFlowMxNetTensorFlowCNTKPyTorch 
Frameworks architecture
Language frontend: Swift for TensorFlow
Compiler backend: TVM, TensorFlow XLA
Deep learning frameworks: MXNet, TensorFlow, CNTK, PyTorch
AI framework targets: dense matmul engine (GPU, FPGA); special AI accelerators (TPU, GraphCore, other ASICs)

import "tensorflow/core/framework/to";

// Syntactically similar to LLVM:
func @testFunction(%arg0: i32) {
  %x = call @thingToCall(%arg0) : (i32) -> i32
  br ^bb1
^bb1:
  %y = addi %x, %x : i32
  return %y : i32
}

Deep learning depends heavily on data scale and model scale

Domain  Model          Year  Scale               Compute    Error
Image   AlexNet        2012  8 layers            1.4 GFLOP  16%
Image   ResNet         2015  152 layers          … GFLOP    …%
Speech  Deep Speech 1  2014  7,000 hrs of data   80 GFLOP   8%
Speech  Deep Speech 2  2015  12,000 hrs of data  465 GFLOP  5%

Faster training speeds up the development of deep learning models.
Deploying deep learning models at scale requires faster and more efficient inference.
Inference performance → Serving latency

Different architectures: CNN, RNN, Transformer, …
High computation resource requirements: model size, …
Different goals: throughput, accuracy, …

Be transparent to various user requirements
Apply over heterogeneous hardware environments

Scale-out, local efficiency, memory effectiveness

Hardware: SSD, CPU/GPU/FPGA, InfiniBand/NVLink
Hyper-params: optimizer, mini-batch, learning rate
Optimizations
