2024 Artificial Intelligence (AI) Technology Tutorial
Lecture | Title | Notes
1 | Course introduction | Overview and system/AI basics
2 | AI systems overview (System perspective of System for AI) | System for AI: a historic view; Fundamentals of neural networks; Fundamentals of System for AI
3 | Computation frameworks for DNN | Backprop and AD, Tensor, DAG, Execution graph. Papers and systems: PyTorch, TensorFlow
4 | Computer architecture for matrix computation | Matrix computation, CPU/SIMD, GPGPU, ASIC/TPU. Papers and systems: BLAS, TPU
5 | Distributed training algorithms | Data parallelism, model parallelism, distributed SGD. Papers and systems:
6 | Distributed training systems | MPI, parameter servers, all-reduce, RDMA. Papers and systems: Horovod
7 | Scheduling and resource management systems for heterogeneous compute clusters | Running DNN jobs on a cluster: container, resource allocation, scheduling. Papers and systems: KubeFlow, OpenPAI, Gandiva, HiveD
8 | Deep learning inference systems | Efficiency, latency, throughput, and deployment
9 | Computation graph compilation and optimization | IR, sub-graph pattern matching, matrix multiplication and memory optimization. Papers and systems: XLA, MLIR, TVM, NNFusion
10 | Efficiency via compression and sparsity | Model compression, sparsity, pruning
11 | AutoML systems | Hyperparameter tuning, NAS. Papers and systems: Hyperband, SMAC, ENAS, AutoKeras, NNI
12 | Reinforcement learning systems | Theory of RL, systems for RL. Papers and systems: A3C, RLlib, AlphaZero
13 | Model security and privacy | Federated learning, security, privacy. Papers and systems: DeepFake
14 | AI for systems (optimizing computer systems with AI) | AI for traditional systems problems, for system algorithms. Papers and systems: Learned Indexes, Learned query path
Lab | Title | Notes
Lab 1 (for weeks 1, 2) | Introductory example of frameworks and tools: a simple end-to-end AI example, from a system perspective | Understand the systems from debugger info and system logs
Lab 2 (for week 3) | Customize operators (define a new tensor operation) | Design and implement a customized operator (both forward and backward) in Python
Lab 3 (for week 4) | CUDA implementation and optimization | Add a CUDA implementation for the customized operator
Lab 4 (for weeks 5, 6) | AllReduce implementation and optimization | Improve the implementation of one of the AllReduce operators on Horovod
Lab 5 (for weeks 7, 8) | Configure containers for customized training and inference (preparation for training or inference in the cloud) | Configure containers
Lab 6 | Scheduling and resource management system | Get familiar with OpenPAI or KubeFlow
Lab 7 | Distributed training | Try different kinds of all-reduce implementations
Lab 8 | AutoML | Search for a new neural network (NN) structure for image/NLP tasks
Lab 9 | RL systems | Configure and get familiar with one of the following RL systems: RLlib, …
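Lab 2 asks for a customized operator with both a forward and a backward pass in Python. The sketch below shows one possible shape of such an operator in plain Python, without any framework. The operator (`Sigmoid`), the helper `numeric_grad`, and the input values are all assumptions chosen for illustration; the lab itself does not prescribe them.

```python
import math


class Sigmoid:
    """A hypothetical custom operator with explicit forward and backward passes."""

    def forward(self, xs):
        # Cache the outputs: the backward pass reuses them.
        self.ys = [1.0 / (1.0 + math.exp(-x)) for x in xs]
        return self.ys

    def backward(self, grad_out):
        # d sigmoid(x)/dx = y * (1 - y); the chain rule multiplies by the upstream gradient.
        return [g * y * (1.0 - y) for g, y in zip(grad_out, self.ys)]


def numeric_grad(f, xs, eps=1e-6):
    """Central-difference gradient of sum(f(xs)) with respect to each input."""
    grads = []
    for i in range(len(xs)):
        hi = xs[:i] + [xs[i] + eps] + xs[i + 1:]
        lo = xs[:i] + [xs[i] - eps] + xs[i + 1:]
        grads.append((sum(f(hi)) - sum(f(lo))) / (2 * eps))
    return grads


op = Sigmoid()
xs = [-1.0, 0.0, 2.0]
ys = op.forward(xs)
analytic = op.backward([1.0] * len(xs))        # upstream gradient of sum() is all ones
numeric = numeric_grad(Sigmoid().forward, xs)  # independent check of the backward pass
```

Comparing `analytic` with `numeric` is a common way to check a hand-written backward pass before moving the operator into a framework or onto a GPU.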
Deep learning is changing the world
Applications: self-driving, surveillance detection, translation, medical diagnostics, games, personal assistants, art
Deep learning areas: image recognition, speech recognition, natural language, generative models, reinforcement learning
[Figure: training an image classifier. Labels: cat, dog, honey badger; predictions: cat, dog, raccoon; weights w1 to w5. The loss measures the errors, and the gradients ∂error/∂w1 through ∂error/∂w5 are propagated back to the weights.]
Massive (labeled) data, e.g. 14M images
Advances in deep learning algorithms; languages and frameworks
Computing power
Deep learning + advances in systems: programming languages, optimization, computer architecture, parallel computing, and distributed systems
E.g., image classification problem
Dataset | Samples | Categories
MNIST | 60K samples | 10 categories
ImageNet | 16M samples | 1000 categories
Web Images | Billions of images | Open categories
[Chart: test error rate (%); plotted values: 12, 5, 7.7, 3.3, 1.4, 4.7, 1.7, 0.23]
LeNet, 1998: convolution, max-pooling, softmax
AlexNet, 2012: 16.4%, ReLU, Dropout
Inception, 2015: 6.7%, batch normalization
ResNet, 2015: 3.57%, residual way
EfficientNet, 2019: 3.1%, NAS
Areas: image recognition, speech recognition, natural language, reinforcement learning
[Chart: performance (Op/Sec) by year, 1960 to 2019, with a Moore's law trend line. CPU: ENIAC, 5 Kops; Xeon E5, ~500 Gops. GPU: V100, 125 Tops. TPU (dedicated hardware): TPUv1, 90 Tops; TPUv3, 360 Tops. Annotated gains: 10^8x and 10^5x.]
[Figure: deep learning software and hardware stack]
Custom purpose machine learning algorithms
Deep learning frameworks: Theano, DistBelief, Caffe; MxNet, TensorFlow, CNTK, PyTorch
Language frontend: Swift for TensorFlow
Compiler backend: TVM, TensorFlow XLA
Hardware: algebra & linear libs (CPU, GPU); dense matmul engine (GPU, FPGA); special AI accelerators (TPU, GraphCore, other ASICs)
Custom purpose machine learning algorithms
Deep learning frameworks (Theano, DistBelief, Caffe) provide easier ways to leverage various libraries
Machine learning language and compiler
Powerful compiler infrastructure: code optimization, sparsity optimization, hardware targeting
A full-featured programming language for ML, expressive and flexible: control flow, recursion, sparsity
Hardware: algebra & linear libs (CPU, GPU); AI framework dense matmul engine; SIMD, MIMD; sparsity support; control flow and dynamicity; associated memory
Experience: end-to-end AI user experiences (model, algorithm, pipeline, experiment, tool, life cycle management)
Frameworks: programming interfaces; computation graph, (auto) gradient calculation; IR, compiler infrastructure
Runtime: deep learning runtime (optimizer, planner, executor)
Architecture (single node and cloud): hardware APIs (GPU, CPU, FPGA, ASIC); resource management/scheduler; scalable network stack (RDMA, IB, NVLink)
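[Figure: course map. Class labels shown: class 3, class 4, class 5, class 6, class 7, class 8, class 10, class 11, class 12, class 13]
Broader AI system ecosystem: new machine learning paradigms (RL), class 12; automated machine learning (AutoML), class 11; security and privacy, class 13; model inference, compression, and optimization, class 10; efficient, new general-purpose AI algorithms for a wide range of uses
Deep learning algorithms and frameworks: support for and evolution of multiple deep learning frameworks; deep neural network compilation architecture and optimization
Core system software and hardware: runtime and optimization environment for deep learning jobs; general-purpose resource management and scheduling systems; new hardware and the related high-performance network and compute stack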
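(2) Start training
Define the network structure
Fully connected: typically used as the last few layers in classification problems
Convolutional neural network: typically used for data with strong locality, such as images and speech
Recurrent neural network: typically used for sequential and structured data, such as text and knowledge graphs
Transformer neural network: typically used for sequence data, such as text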
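As a small illustration of the first entry in the list above, the sketch below shows a fully connected layer followed by softmax acting as the final classification layers. The weights, bias, and input features are made-up values, and the function names `dense` and `softmax` are chosen here for illustration only.

```python
import math


def dense(x, weights, bias):
    """Fully connected layer: every output unit sees every input feature."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Turn logits into class probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 3-feature input and 2-class classifier head.
features = [1.0, 2.0, 3.0]
weights = [[0.1, 0.2, 0.3],
           [0.0, 0.0, 0.0]]
bias = [0.0, 0.0]

logits = dense(features, weights, bias)
probs = softmax(logits)
```

In a real network the `features` vector would come from earlier layers, for example convolutional or recurrent ones, rather than being written by hand.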
# A recursive TreeBank model in a dozen lines of JPL code
# Walk the tree, accumulating embedding vecs
# Word embedding model is used at the leaf node to map word
# index into high-dimensional semantic word representation.
# Get semantic representations for left and right children.
# A composition function is used to learn semantic
# representation for phrase at the internal node.
# Map tree embedding to sentiment
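Only the comments of the original listing survive here. The following is a minimal sketch of the steps those comments describe, written in plain Python rather than the language of the original slide. The embedding table, the weights, the example tree, and the names `embed_tree` and `sentiment` are all assumptions made for illustration.

```python
import math

# Hypothetical 2-dimensional word embeddings.
EMBED = {"not": [-0.5, 0.1], "good": [0.9, 0.4], "movie": [0.1, 0.2]}

# Hypothetical composition weights (2 x 4) and sentiment weights (length 2).
W_COMP = [[0.5, -0.3, 0.8, 0.1],
          [0.2, 0.4, -0.6, 0.7]]
W_SENT = [1.0, -1.0]


def embed_tree(tree):
    """Walk the tree bottom-up and return its embedding vector."""
    if isinstance(tree, str):
        # Leaf node: map the word to its embedding.
        return EMBED[tree]
    left, right = tree
    # Internal node: get the representations of the left and right children...
    children = embed_tree(left) + embed_tree(right)  # list concatenation
    # ...then compose them into a phrase representation.
    return [math.tanh(sum(w * c for w, c in zip(row, children)))
            for row in W_COMP]


def sentiment(tree):
    """Map the tree embedding to a sentiment score in (0, 1)."""
    vec = embed_tree(tree)
    score = sum(w * v for w, v in zip(W_SENT, vec))
    return 1.0 / (1.0 + math.exp(-score))


# A binary parse tree for "not (good movie)".
tree = ("not", ("good", "movie"))
phrase_vec = embed_tree(tree)
p = sentiment(tree)
```

Because the recursion follows the shape of each input tree, the computation differs from sample to sample, which is the kind of control flow and fine-grained computation the surrounding slides point to.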
More diverse structures
More powerful modeling capability
More complex dependencies
Finer-grained computation patterns
Front-end: language binding (Python, Lua, R, C++)
Graph definition (IR): nodes x, w, *, b, +, y, i.e. y = x * w + b
Optimization: batching, cache, overlap
Execution runtime: CPU, GPU, RDMA devices
[Figure: TensorFlow data-flow graph (DFG) as intermediate representation. Nodes: inputs x, y, z; * produces a; + produces b; Σ produces c.]
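[Figure: TensorFlow. Add gradient backpropagation to the data-flow graph (DFG). The forward nodes (x, y, z, *, a, +, b, Σ, c) gain gradient nodes for Σ, +, and * that produce the gradients of b, a, z, x, and y.]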
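The idea of adding gradient nodes to a data-flow graph can be sketched in a few lines of plain Python. The wiring below (a = x * y, b = a + z, c = Σ b) is an assumption read off the node labels in the figure, and the `Node`, `forward`, and `backward` names are hypothetical; this is a toy model, not how TensorFlow is implemented.

```python
class Node:
    """A vertex in a toy data-flow graph: an op name, input nodes, value, gradient."""

    def __init__(self, op, inputs=(), value=None):
        self.op = op
        self.inputs = list(inputs)
        self.value = value
        self.grad = None


def forward(order):
    """Evaluate every node in topological order."""
    for n in order:
        if n.op == "mul":
            lhs, rhs = n.inputs
            n.value = [p * q for p, q in zip(lhs.value, rhs.value)]
        elif n.op == "add":
            lhs, rhs = n.inputs
            n.value = [p + q for p, q in zip(lhs.value, rhs.value)]
        elif n.op == "sum":
            n.value = sum(n.inputs[0].value)


def backward(order):
    """Visit nodes in reverse order, pushing gradients to each node's inputs."""
    for n in order:
        n.grad = 0.0 if n.op == "sum" else [0.0] * len(n.value)
    order[-1].grad = 1.0  # gradient of the output with respect to itself
    for n in reversed(order):
        if n.op == "sum":    # every element contributes with weight 1
            src = n.inputs[0]
            src.grad = [g + n.grad for g in src.grad]
        elif n.op == "add":  # the gradient passes through unchanged
            for src in n.inputs:
                src.grad = [g + u for g, u in zip(src.grad, n.grad)]
        elif n.op == "mul":  # each side is scaled by the other side's value
            lhs, rhs = n.inputs
            lhs.grad = [g + u * q for g, u, q in zip(lhs.grad, n.grad, rhs.value)]
            rhs.grad = [g + u * p for g, u, p in zip(rhs.grad, n.grad, lhs.value)]


x = Node("input", value=[1.0, 2.0, 3.0])
y = Node("input", value=[4.0, 5.0, 6.0])
z = Node("input", value=[7.0, 8.0, 9.0])
a = Node("mul", [x, y])
b = Node("add", [a, z])
c = Node("sum", [b])

order = [x, y, z, a, b, c]  # a topological order of the graph
forward(order)
backward(order)
```

After the two passes, `x.grad` holds the values of `y`, `y.grad` holds the values of `x`, and `z.grad` is all ones, which matches the product and sum rules applied by hand.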
[Figure: the same data-flow graph with its gradient nodes, with operators mapped to CPU code and GPU code.]
IDE: programming with VS Code, Jupyter Notebook
Language: integrated with mainstream PL: PyTorch and TensorFlow inside Python
Compiler, intermediate representation: basic data structure: Tensor; basic computation: DAG; advanced features: control flow; general IRs: MLIR
Compiler, compilation: lexical analysis: token; parsing: AST; semantic analysis: symbolic AD; code optimization; code generation
Compiler, optimization: user controlled: mini-batch; data parallelism and model parallelism; loop nest analysis: pipeline parallelism, control flow; dataflow analysis: CSE, arithmetic, fusion; hardware-dependent optimizations: matrix computation, layout; resource allocation and scheduler: memory, recomputation
Runtimes: single node: CuDNN; multi-node: parameter servers, all-reduce; computation cluster resource management and job scheduler
Hardware: hardware accelerators: CPU/GPU/ASIC/FPGA; network accelerators: RDMA/IB/NVLink
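The multi-node runtime entry above mentions all-reduce. As a rough illustration, the sketch below simulates the ring variant of all-reduce inside a single Python process: each worker's gradient vector is split into as many chunks as there are workers, partial sums travel around the ring, and then the finished chunks are circulated. The function name `ring_allreduce` and the sample gradients are assumptions; real systems exchange these chunks over the network instead of copying lists.

```python
def ring_allreduce(worker_grads):
    """Simulate a ring all-reduce (sum) over equally sized gradient vectors."""
    n = len(worker_grads)
    size = len(worker_grads[0])
    assert size % n == 0, "sketch assumes the vector splits evenly into n chunks"
    step = size // n
    # chunks[r][c] is worker r's copy of chunk c.
    chunks = [[list(g[c * step:(c + 1) * step]) for c in range(n)]
              for g in worker_grads]

    # Phase 1, scatter-reduce: after n - 1 steps, worker r holds the
    # complete sum of chunk (r + 1) % n.
    for s in range(n - 1):
        sent = [(r, (r - s) % n, list(chunks[r][(r - s) % n])) for r in range(n)]
        for r, c, data in sent:
            dst = (r + 1) % n
            chunks[dst][c] = [p + q for p, q in zip(chunks[dst][c], data)]

    # Phase 2, all-gather: pass the finished chunks around the ring so that
    # every worker ends up with all of them.
    for s in range(n - 1):
        sent = [(r, (r + 1 - s) % n, list(chunks[r][(r + 1 - s) % n]))
                for r in range(n)]
        for r, c, data in sent:
            chunks[(r + 1) % n][c] = data

    return [[v for chunk in worker for v in chunk] for worker in chunks]


# Hypothetical gradients from three data-parallel workers.
grads = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
reduced = ring_allreduce(grads)
```

In data-parallel training each worker would typically divide the reduced vector by the number of workers to obtain the average gradient before applying the update.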
import "tensorflow/core/framework/to";
import "tensorflow/core/framework/op_to";
import "tensorflow/core/framework/tensor_to
// Syntactically similar to LLVM:
func @testFunction(%arg0: i32) {
  %x = call @thingToCall(%arg0) : (i32) -> i32
  br ^bb1
^bb1:
  %y = addi %x, %x : i32
  return %y : i32
}
Deep learning depends heavily on data scale and model scale
Image
AlexNet (2012): 8 layers, 1.4 GFLOP, 16% error
ResNet (2015): 152 layers, 22.6 GFLOP, 3.5% error
Speech
Deep Speech 1 (2014): 80 GFLOP, 7,000 hrs of data, 8% error
Deep Speech 2 (2015): 465 GFLOP, 12,000 hrs of data, 5% error
Faster training speeds up the development of deep learning models
Large-scale deployment of deep learning models needs faster and more efficient inference
Inference performance, serving latency
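To make the growth in scale concrete, the short calculation below works out the ratios implied by the figures quoted above. It uses only those numbers; the variable and function names are chosen here for illustration.

```python
# (GFLOP, error in %) as quoted above for each model.
alexnet = (1.4, 16.0)
resnet = (22.6, 3.5)
deep_speech_1 = (80.0, 8.0)
deep_speech_2 = (465.0, 5.0)


def growth(old, new):
    """Return (compute growth factor, error reduction factor) from old to new."""
    compute_factor = new[0] / old[0]
    error_factor = old[1] / new[1]
    return compute_factor, error_factor


image_growth = growth(alexnet, resnet)                # AlexNet to ResNet
speech_growth = growth(deep_speech_1, deep_speech_2)  # Deep Speech 1 to 2
speech_data_growth = 12000 / 7000                     # hours of training data

print(f"Image:  {image_growth[0]:.1f}x compute, {image_growth[1]:.1f}x lower error")
print(f"Speech: {speech_growth[0]:.1f}x compute, {speech_growth[1]:.1f}x lower error, "
      f"{speech_data_growth:.1f}x data")
```

In both domains the quoted compute grows faster than the error shrinks, which is one way to read the claim that progress leans on data scale and model scale.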
Different architectures: CNN, RNN, Transformer, …
High computation resource requirements: model size, …
Different goals: latency, throughput, accuracy, …
Be transparent to various user requirements
Transparently apply over heterogeneous hardware environments
Scale-out, local efficiency, memory effectiveness
Systems, algorithms, and hardware must work together:
Algorithm level: model structure, whether the model can be compressed or sparsified, batch size, learning algorithm
System level: parallelization at every level, deduplication, overlap, scheduling and resource…