




The Convergence of Autonomous Driving and System 1 & 2 Thinking
Peng Jia
Li Auto, China
Contents
01 Li AD Overview
02 Li AD Technology Highlights
Li Auto's View on Autonomous Driving
Rule-driven L2: 2D / Mono3D
Data-driven L3: BEV / End2End
Knowledge-driven L4: VLM / World Model
[Figure: scenario coverage within "Real World Every Day Driving Scenarios". L2: Known Scenarios; L3: Expanded Driving Scenarios; L4: Unknown Scenarios.]
Li AD Framework
SYSTEM 1: Intuition & instinct (95%)
Unconscious, Fast, Associative, Automatic pilot
SYSTEM 2: Rational thinking (5%)
Takes effort, Slow, Logical, Lazy, Indecisive
System 1 -- End-to-End Model for L3 AD
Fast end-to-end response to the surrounding environment.
System 2 -- Large Multimodal-Action Model
Explores and reasons logically in unknown environments. Modalities include language, vision, point clouds, CAN bus and navigation, used to solve L4 unknown scenes.
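One way to picture the fast/slow split is a dual-rate loop: a fast policy acts on every frame while a slower reasoner refreshes its guidance only every few frames. The sketch below is purely illustrative. The `Frame` type, both policy stubs, the action strings and the `slow_every` interval are hypothetical and are not taken from the Li AD system.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Frame:
    index: int
    obstacle_ahead: bool


def system1_fast(frame: Frame, guidance: Optional[str]) -> str:
    """Fast policy stub (hypothetical): reacts on every frame."""
    if frame.obstacle_ahead:
        return "brake"
    # With nothing urgent, follow the latest slow-system guidance if any.
    return guidance or "keep_lane"


def system2_slow(frame: Frame) -> str:
    """Slow reasoning stub (hypothetical) standing in for a large model."""
    return "slow_down" if frame.obstacle_ahead else "keep_lane"


def drive(frames: List[Frame], slow_every: int = 5) -> List[str]:
    """Run System 1 on every frame and System 2 every `slow_every` frames."""
    guidance: Optional[str] = None
    actions: List[str] = []
    for frame in frames:
        if frame.index % slow_every == 0:
            guidance = system2_slow(frame)  # infrequent, expensive update
        actions.append(system1_fast(frame, guidance))
    return actions


# Toy run: an obstacle is visible on frames 5 and 6 only.
frames = [Frame(i, obstacle_ahead=(i in {5, 6})) for i in range(8)]
actions = drive(frames)
print(actions)
```

In this toy run the fast stub brakes immediately on frames 5 and 6, while the guidance computed by the slow stub on frame 5 still shapes the action on frame 7.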
System 1 / System 2 Training Loop
[Figure: training loop diagram. Labels: Vehicle; Sensors; L3 End-to-End Model; Perception; Decision & Planning; Control; Short-term Memory; L4 Multimodal LLM; Recognition; General Knowledge; Cloud; Sim Reinforcement Learning Model; Evaluation Network; Generative World Model.]
Well-recognized works from the Li AD team
MUTR3D: 2021 world's 1st in camera-based 3D tracking
FUTR3D: industry-leading multi-sensor 3D detection model
DenseTNT: 1st place solution in the ICCV 2021 INTERPRET Challenge
DETR3D: 2021 world's 1st in camera-based 3D detection
HDMapNet: CVPR 2021 ADP3 Workshop (Best Paper Nomination)
CoRL 2021 DETR3D: /pdf/2110.06922.pdf
CVPR 2022 MUTR3D: /pdf/2205.00613.pdf
ICML 2023 VectorMapNet: /pdf/2206.08920.pdf
CVPR 2023 NPN: /pdf/2304.08481.pdf
CVPR 2023 FUTR3D: /pdf/2203.10642.pdf
ICRA 2022 HDMapNet: /pdf/2107.06307.pdf
CVPR 2023 VIP3D: /pdf/2208.01582.pdf
Architecture of AD Max 3.0
[Figure: architecture diagram. Labels: Safety Perception; Safety Planner; One Model (Multi-task Perception); Prediction & Planning Network; Shadow App 1; Shadow App 2; Shadow; Static BEV; Object BEV; Occupancy; MPC; ……; Spatio-Temporal Planner; End2End Traffic Signal Network; Camera ×7; LiDAR; Radar; Navigation Map; NVIDIA DRIVE Orin ×2.]
Inference Perf Optimization
For Perception Pipeline: 111 ms / 9 fps => 48 ms / 21 fps
Item | Optimization Action | Opt Type | Latency (percentage)
1 | Apply MPS to avoid CUDA context switch overhead (system of multiple processes with GPU calls). | Pipeline | -9.91%
2 | Remove unneeded CUDA calls (e.g., cudaWaitExternalSemaphore). | Pipeline | -5.60%
3 | Enlarge CUDA_DEVICE_MAX_CONNECTIONS to resolve false dependency among CUDA streams. | Pipeline | -4.80%
4 | Schedule model heads with different inference frequency. | Pipeline | -8.10%
5 | Replace D2D with H2D copy in imaging streams to utilize the GPU Copy Engine instead of CUDA cores. | Pipeline | -3.25%
6 | Optimize bevpoolv2 plugin by reducing warp divergence & fp16 (from Lidar_AI_Solution). | Model | -10.80%
7 | Eliminate reformatting kernels due to unfused QAT nodes. | Model | -3.60%
8 | Apply TRT MHA fast kernel to accelerate transformer blocks. | Model | -5.60%
9 | Use Sparsity. | Model | -3.60%
10 | Offload ROI-align to PVA. | Model | -1.80%
Final perf improvement: -57.06%
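The headline figures can be cross-checked with a little arithmetic. The sketch below assumes the ten per-item percentages are additive shares of the original latency (an assumption, since the slide does not say how each was measured) and compares their sum with the end-to-end change from 111 ms to 48 ms. The short dictionary keys are labels invented for this example.

```python
# Per-item latency reductions from the table, in percent of baseline latency.
reductions_pct = {
    "mps": 9.91,
    "remove_cuda_calls": 5.60,
    "max_connections": 4.80,
    "head_scheduling": 8.10,
    "h2d_copy": 3.25,
    "bevpoolv2": 10.80,
    "qat_reformat": 3.60,
    "trt_mha": 5.60,
    "sparsity": 3.60,
    "roi_align_pva": 1.80,
}

baseline_ms = 111.0
optimized_ms = 48.0

# Sum of the itemized reductions (the slide reports -57.06%).
total_pct = sum(reductions_pct.values())

# Reduction implied by the two end-to-end latency figures.
measured_pct = (baseline_ms - optimized_ms) / baseline_ms * 100.0

# Frame rate is the reciprocal of per-frame latency.
fps_before = 1000.0 / baseline_ms
fps_after = 1000.0 / optimized_ms

print(f"itemized total: {total_pct:.2f}%")
print(f"end-to-end:     {measured_pct:.2f}%")
print(f"fps: {fps_before:.1f} -> {fps_after:.1f}")
```

The itemized total matches the reported 57.06%, and the two latency figures imply roughly the same reduction, so the rounded 9 fps and 21 fps values are consistent with the table.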
Li AD L4 Research - Drive-VLM
Pipeline: Sequence of Images, DriveVLM, Scene Description, Critical Objects Analysis, Hierarchical Planning
[Figure labels: Vision Encoder; Feature Alignment]

Scene Description
Input:
Picture: <img>
Please indicate the scene condition.
Drive-VLM:
Weather: Sunny.
Time: Day.
Road Environment: Urban.
Alternative Lane: Own lane ahead impassable, occupied by parked vehicles, left lane passable, oncoming lane, proceed with caution.
Ego Lane Position: Do not apply.

Critical Objects Analysis
Input:
Please list the key objects.
Drive-VLM:
Critical objects: <Two construction workers>
Box: <(x1,y1),(x2,y2)>

Matching with 3D Perception Results
Input:
Matched objects: Two construction workers
The position and history trajectory of matched objects in BEV region: <…>
Unmatched objects: None
Describe the object and indicate its influence.
Drive-VLM:
Characteristic: Garbage cleaning in progress on the right lane ahead.
Influence: Blocking the route of our vehicle.

Hierarchical Planning
Input:
Ego-states and historical trajectory: <…>
Based on the analysis of scene and critical objects, determine the driving meta actions and decisions.
Input:
Based on the summarized analysis, please determine the meta actions and decisions.
Drive-VLM:
Meta Actions: ['Slow down', 'Stop', 'Reverse'].
Decisions: Drive slowly, stop in front of the construction workers, and then reverse.
Input:
Based on the above decisions and analysis, please indicate the 3s future trajectory of the ego vehicle.
Drive-VLM:
Future trajectory: [(x1,y1),(x2,y2),…,(xn,yn)]

Slow-Fast Dual System Collaboration
Traditional AV Pipeline: 3D Perception, Motion Prediction, Trajectory Planning
*Submitted to CVPR24
https://openreview.net/forum?id=jL4YMzXYII
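The slide shows Drive-VLM's critical-object boxes being matched against 3D perception results, but it does not state the matching rule. The sketch below is one plausible reading, not the authors' method. It assumes the 3D detections have already been projected to 2D image boxes in the same (x1, y1, x2, y2) corner format, and that pairs are matched greedily by intersection over union with a threshold of 0.5. The function names, track IDs, coordinates and threshold are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_objects(
    vlm_boxes: Dict[str, Box],
    perception_boxes: Dict[int, Box],
    threshold: float = 0.5,
) -> Tuple[Dict[str, int], List[str]]:
    """Greedily pair each VLM object with the best unused perception track."""
    matched: Dict[str, int] = {}
    unmatched: List[str] = []
    used = set()
    for name, box in vlm_boxes.items():
        best_id: Optional[int] = None
        best_score = threshold
        for track_id, pbox in perception_boxes.items():
            if track_id in used:
                continue
            score = iou(box, pbox)
            if score >= best_score:
                best_id, best_score = track_id, score
        if best_id is None:
            unmatched.append(name)  # left to the language description alone
        else:
            matched[name] = best_id
            used.add(best_id)
    return matched, unmatched


# Toy example with made-up coordinates.
vlm = {
    "construction workers": (100.0, 200.0, 180.0, 400.0),
    "traffic cone": (500.0, 300.0, 520.0, 340.0),
}
perception = {7: (105.0, 205.0, 185.0, 395.0), 9: (300.0, 100.0, 360.0, 220.0)}
matched, unmatched = match_objects(vlm, perception)
print(matched, unmatched)
```

Matched objects can then carry their tracked position and history into the prompt, while unmatched ones are described from the image alone, mirroring the "Matched objects" and "Unmatched objects" fields in the slide's prompt.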
LLM Deployed on NVIDIA DRIVE Orin
LLaMA2-3B (BS=1, Input_len=128, Output_len=128)
Platform | Config | Context Latency (ms) | Decoder Perf (tokens/s)
DriveOS Linux, Orin | INT4 (GPTQ) | 52.5 | 65.6
LLaMA2-7B (BS=1, Input_len=128, Output_len=128)
Platform | Config | Context Latency (ms) | Decoder Perf (tokens/s)
DriveOS Linux, Orin | INT4 (GPTQ) | 73.15 | 41.8
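A rough way to read these numbers is to estimate how long a full 128-token reply takes. The sketch below reads the fused table cells as 52.5 ms / 65.6 tokens/s for the 3B model and 73.15 ms / 41.8 tokens/s for the 7B model. It also assumes the context phase costs the stated latency once and every output token is then produced at the steady decoder rate. Both are simplifying assumptions, since the slide does not describe the measurement protocol, and the helper name is hypothetical.

```python
def generation_time_s(
    context_latency_ms: float,
    decode_tokens_per_s: float,
    output_len: int,
) -> float:
    """Estimated wall time: one context pass plus steady-rate decoding."""
    return context_latency_ms / 1000.0 + output_len / decode_tokens_per_s


# Figures as read from the two tables (INT4 GPTQ, batch size 1).
t_3b = generation_time_s(52.5, 65.6, output_len=128)
t_7b = generation_time_s(73.15, 41.8, output_len=128)

print(f"LLaMA2-3B: {t_3b:.2f} s for 128 tokens")
print(f"LLaMA2-7B: {t_7b:.2f} s for 128 tokens")
```

Under these assumptions decoding dominates: a 128-token reply takes about 2 s on the 3B model and about 3 s on the 7B model, which helps explain why a language-model planner would sit on the slow side of a slow-fast system.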
Li AD Sim Research - Street Gaussians
[Figure: scene editing and simulation examples. Panels: Original scene / Street-gaussian swapping (two pairs); Real log / Camera simulation; Origin scene / Linre scene / Unisim scene.]
[Figure: method overview. Labels: Rendering Images; Decomposition; Semantic maps; Geometry model (Position μ, Rotation R, Opacity α, Scale S); Point-based Rendering; Background model; Composition; Dynamic appearance model (Time basis, SH basis); Optimizable Tracked boxes; Object model; Scene representation.]
Metric | 3DGS [16] | NSG [31] | MARS [51] | Ours
PSNR↑ | 29.95 | 30.23 | 31.37 | 34.54
PSNR*↑ | 17.74 | 22.05 | 23.07 | 25.16
SSIM↑ | 0.907 | 0.866 | 0.904 | 0.936
LPIPS↓ | 0.140 | 0.331 | 0.246 | 0.091
FPS↑ | 277 | 0.47 | 0.68 | 133
Table 1. Quantitative results on the Waymo [40] dataset. The rendering image resolution is 1066×1600. "PSNR*" denotes the PSNR of moving objects.
*Submitted to CVPR24
https://openreview.net/forum?id=jL4YMzXYII
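For readers unfamiliar with the headline metric, PSNR is computed from the mean squared error between a rendered image and its reference: PSNR = 10 · log10(MAX² / MSE). The sketch below uses plain Python over flattened pixel values. Restricting the error to a boolean mask is one plausible way to obtain a moving-object score like PSNR*, but the slide does not say how that variant was computed, so the mask handling is an assumption.

```python
import math
from typing import Optional, Sequence


def psnr(
    rendered: Sequence[float],
    reference: Sequence[float],
    mask: Optional[Sequence[bool]] = None,
    max_value: float = 1.0,
) -> float:
    """PSNR in dB over flattened pixels; only masked pixels count if given."""
    if mask is None:
        mask = [True] * len(reference)
    sq_errors = [
        (r - g) ** 2 for r, g, m in zip(rendered, reference, mask) if m
    ]
    if not sq_errors:
        raise ValueError("mask selects no pixels")
    mse = sum(sq_errors) / len(sq_errors)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy 4-pixel "image" with values in [0, 1]; the last two pixels are "moving".
reference = [0.0, 0.5, 1.0, 0.5]
rendered = [0.0, 0.5, 0.9, 0.6]
moving = [False, False, True, True]

full_psnr = psnr(rendered, reference)
moving_psnr = psnr(rendered, reference, mask=moving)
print(f"full: {full_psnr:.2f} dB, moving only: {moving_psnr:.2f} dB")
```

In the toy data all the error sits on the "moving" pixels, so the masked score is lower than the full-image score, the same direction as the gap between PSNR and PSNR* in Table 1.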
Li AD Research - BEV-CLIP: Multimodal Data Retrieval
[Figure labels: BEV Encoder; Weight matrix; Θ; Shared Cross-modal Prompt; Language Encoder; Text embedding (example: "Peds crossing crosswalk, many cars……"); LoRA; KG Embedding; Knowledge graph; BEV Caption Generation Head; Contrastive loss; panels (a), (b), (c).]
Figure 2. Overall structure of BEV-CLIP. (a) Processing of BEV and text features. The images from 6 surrounding cameras are encoded into a BEV feature by the BEV Encoder with frozen parameters. At the same time, the input text embedding is concatenated with the keyword-matched Knowledge Graph node embedding and fed into the Language Encoder with a LoRA branch for processing. (b) Shared cross-modal prompt (SCP), which aligns the BEV and linguistic features in the same hidden space. (c) Joint supervision of caption generation and retrieval tasks. ⊙ denotes dot product.
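The figure names a contrastive loss built on dot products but gives no formula. As an illustration only, the sketch below implements the symmetric cross-entropy loss popularized by CLIP, applied to paired BEV and text embeddings. Using L2-normalized features and a temperature of 0.07 are assumptions, and nothing here is confirmed as the exact loss used in BEV-CLIP.

```python
import math
from typing import List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_entropy_diag(logits: List[List[float]]) -> float:
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[i]
    return total / len(logits)


def contrastive_loss(
    bev_feats: List[Vector],
    text_feats: List[Vector],
    temperature: float = 0.07,
) -> float:
    """Symmetric loss: BEV-to-text and text-to-BEV retrieval, averaged."""
    bev = [normalize(v) for v in bev_feats]
    txt = [normalize(v) for v in text_feats]
    logits = [[dot(b, t) / temperature for t in txt] for b in bev]
    logits_t = [list(col) for col in zip(*logits)]  # transpose
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))


# Toy batch of two pairs: correctly aligned versus swapped captions.
bev_batch = [[1.0, 0.0], [0.0, 1.0]]
aligned = contrastive_loss(bev_batch, [[1.0, 0.0], [0.0, 1.0]])
swapped = contrastive_loss(bev_batch, [[0.0, 1.0], [1.0, 0.0]])
print(f"aligned: {aligned:.6f}, swapped: {swapped:.4f}")
```

The loss is near zero when each BEV feature is closest to its own caption and large when the captions are swapped, which is the property a retrieval model needs.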