Driving Automotive Technology Innovation and Development, Talk Materials: Li Auto Autonomous Driving, 2024-04

The Convergence of Autonomous Driving and System 1 & 2 Thinking

Peng Jia

Li Auto, China

Contents

01 Li AD Overview

02 Li AD Technology Highlights

Li Auto's View on Autonomous Driving

Real-World Everyday Driving Scenarios

L2 (rule-driven): 2D / Mono3D. Known scenarios.

L3 (data-driven): BEV / End2End. Expanded driving scenarios.

L4 (knowledge-driven): VLM / World Model. Unknown scenarios.

Li AD Framework

SYSTEM 1: Intuition & instinct

95%: Unconscious, Fast, Associative, Automatic pilot

SYSTEM 2: Rational thinking

5%: Takes effort, Slow, Logical, Lazy, Indecisive

System 1 -- End-to-End Model for L3 AD

Fast end-to-end response to the surrounding environment.

System 2 -- Large Multimodal-Action Model

Explores and reasons logically in unknown environments. Modalities include language, vision, point clouds, CAN bus, and navigation, used to solve L4 unknown scenes.

System 1 / System 2 Training Loop

Diagram components:

Vehicle: Sensors, L3 End-to-End Model, L4 Multimodal LLM

Cloud: Sim Reinforcement Learning Model, Evaluation Network, Generative World Model

Function labels: Perception, Decision & Planning, Control, Short-term Memory, Recognition, General Knowledge

Well-recognized works from the Li AD team

DETR3D: 2021 world's 1st in camera-based 3D detection

MUTR3D: 2021 world's 1st in camera-based 3D tracking

FUTR3D: Industry-leading multi-sensor 3D detection model

DenseTNT: 1st place solution in the ICCV 2021 INTERPRET Challenge

HDMapNet: CVPR 2021 ADP3 Workshop (Best Paper Nomination)

CoRL 2021 DETR3D: /pdf/2110.06922.pdf

CVPR 2022 MUTR3D: /pdf/2205.00613.pdf

ICML 2023 VectorMapNet: /pdf/2206.08920.pdf

CVPR 2023 NPN: /pdf/2304.08481.pdf

CVPR 2023 FUTR3D: /pdf/2203.10642.pdf

ICRA 2022 HDMapNet: /pdf/2107.06307.pdf

CVPR 2023 VIP3D: /pdf/2208.01582.pdf

Architecture of AD Max 3.0

Diagram components:

Inputs: Camera ×7, LiDAR, Radar, Navigation Map

One Model (Multi-task Perception): Static BEV, Object BEV, Occupancy

End2End Traffic Signal Network

Prediction & Planning Network: Spatio-Temporal Planner, MPC

Safety Perception, Safety Planner

Shadow: Shadow App 1, Shadow App 2, ……

Compute: NVIDIA DRIVE Orin ×2

Inference Perf Optimization

For Perception Pipeline: 111 ms / 9 fps => 48 ms / 21 fps
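The headline figures above pair a per-frame latency with a frame rate. As a quick sanity check, the sketch below converts latency to frames per second and computes the relative latency reduction. It assumes frames are processed back to back (fps = 1000 / latency in ms); the function names are illustrative and not from the slides.

```python
def fps_from_latency_ms(latency_ms: float) -> float:
    """Frames per second, assuming one frame is processed at a time, back to back."""
    return 1000.0 / latency_ms


def relative_reduction(before_ms: float, after_ms: float) -> float:
    """Fraction of the original latency that was removed."""
    return (before_ms - after_ms) / before_ms


# Latency values quoted on the slide.
before_ms, after_ms = 111.0, 48.0

fps_before = fps_from_latency_ms(before_ms)
fps_after = fps_from_latency_ms(after_ms)
reduction_pct = 100.0 * relative_reduction(before_ms, after_ms)

print(f"{fps_before:.1f} fps -> {fps_after:.1f} fps, latency -{reduction_pct:.1f}%")
# prints "9.0 fps -> 20.8 fps, latency -56.8%"
```

The rounded values match the 9 fps and 21 fps quoted on the slide, and the end-to-end reduction of about 56.8% is close to the 57.06% total reported for the individual optimizations.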

Item | Optimization Action | Opt Type | Latency (percentage)
1 | Apply MPS to avoid CUDA context switch overhead (system of multiple processes with GPU calls). | Pipeline | -9.91%
2 | Remove unneeded CUDA calls (e.g., cudaWaitExternalSemaphore). | Pipeline | -5.60%
3 | Enlarge CUDA_DEVICE_MAX_CONNECTIONS to resolve false dependency among CUDA streams. | Pipeline | -4.80%
4 | Schedule model heads with different inference frequency. | Pipeline | -8.10%
5 | Replace D2D with H2D copy in imaging streams to utilize GPU Copy Engine instead of CUDA core. | Pipeline | -3.25%
6 | Optimize bevpoolv2 plugin by reducing warp divergence and fp16 (from Lidar_AI_Solution). | Model | -10.80%
7 | Eliminate reformatting kernels due to unfused QAT nodes. | Model | -3.60%
8 | Apply TRT MHA fast kernel to accelerate transformer blocks. | Model | -5.60%
9 | Use Sparsity. | Model | -3.60%
10 | Offload ROI-align to PVA. | Model | -1.80%

Final perf improvement: -57.06%
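Item 4 above (scheduling model heads at different inference frequencies) can be illustrated with a small sketch. The slides do not describe how the scheduling is implemented, so everything here is an assumption: the `HeadScheduler` class is hypothetical, the head names are borrowed from the AD Max 3.0 architecture slide, and the periods are made up for illustration.

```python
from typing import Dict, List


class HeadScheduler:
    """Hypothetical scheduler: each head runs once every `period` frames."""

    def __init__(self) -> None:
        self._periods: Dict[str, int] = {}

    def register(self, name: str, period: int) -> None:
        if period < 1:
            raise ValueError("period must be >= 1")
        self._periods[name] = period

    def due(self, frame_idx: int) -> List[str]:
        """Names of the heads that should run on this frame."""
        return [n for n, p in self._periods.items() if frame_idx % p == 0]


sched = HeadScheduler()
sched.register("object_bev", 1)   # assumed: fast-changing output, run every frame
sched.register("static_bev", 2)   # assumed: slower-changing, run every 2nd frame
sched.register("occupancy", 3)    # assumed: run every 3rd frame

schedule = [sched.due(i) for i in range(6)]
head_runs = sum(len(heads) for heads in schedule)
print(head_runs, "head executions instead of", 3 * len(schedule))
```

With these assumed periods, six frames need 11 head executions rather than 18. The latency saved in practice depends on the relative cost of each head, which the slides do not give.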

Li AD L4 Research: Drive-VLM

Pipeline stages: Sequence of Images, DriveVLM, Scene Description, Critical Objects Analysis, Hierarchical Planning

Input:
Based on the summarized analysis, please determine the meta actions and decisions.

Drive-VLM:
Meta Actions: ['Slow down', 'Stop', 'Reverse'].
Decisions: Drive slowly, stop in front of the construction workers, and then reverse.

Input:
Based on the above decisions and analysis, please indicate the 3 s future trajectory of the ego vehicle.

Drive-VLM:
Future trajectory: [(x1, y1), (x2, y2), …, (xn, yn)]

Input:
Picture: <img>
Please indicate the scene condition.

Drive-VLM:
Weather: Sunny.
Time: Day.
Road Environment: Urban.
Alternative Lane: Own lane ahead impassable, occupied by parked vehicles, left lane passable, oncoming lane, proceed with caution.
Ego Lane Position: Do not apply.

Model blocks: Vision Encoder, Feature Alignment

Input:
Please list the key objects.

Drive-VLM:
Critical objects: <Two construction workers>
Box: <(x1, y1), (x2, y2)>

Matching against 3D Perception Results
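The slide shows a "Matching" step between the boxes named by Drive-VLM and the 3D perception results, without stating the matching criterion. One common choice is intersection-over-union (IoU) between the VLM box and each 3D detection projected into the image. The sketch below assumes that criterion, a hypothetical threshold of 0.5, and made-up box coordinates.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image pixels


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_boxes(vlm_boxes: List[Box], det_boxes: List[Box],
                thresh: float = 0.5) -> Tuple[Dict[int, int], List[int]]:
    """Assign each VLM box to its best-overlapping detection, if any.

    Returns (matched, unmatched): matched maps VLM index -> detection index.
    This simple version does not enforce a one-to-one assignment.
    """
    matched: Dict[int, int] = {}
    unmatched: List[int] = []
    for i, vb in enumerate(vlm_boxes):
        scores = [iou(vb, db) for db in det_boxes]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is not None and scores[best] >= thresh:
            matched[i] = best
        else:
            unmatched.append(i)
    return matched, unmatched


# Hypothetical boxes: the first VLM box overlaps detection 0, the second has no match.
vlm_boxes = [(100.0, 100.0, 200.0, 300.0), (400.0, 120.0, 460.0, 260.0)]
det_boxes = [(105.0, 110.0, 195.0, 290.0), (800.0, 100.0, 900.0, 200.0)]

matched, unmatched = match_boxes(vlm_boxes, det_boxes)
print(matched, unmatched)  # prints "{0: 0} [1]"
```

Matched objects can then carry the position and history trajectory from the 3D perception stack, while unmatched objects are described from the image alone, which mirrors the "Matched objects" and "Unmatched objects" fields in the prompts on this slide.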

Input:
Matched objects: Two construction workers
The position and history trajectory of matched objects in BEV region: <…>
Unmatched objects: None
Describe the object and indicate its influence.

Drive-VLM:
Characteristic: Garbage cleaning in progress on the right lane ahead.
Influence: Blocking the route of our vehicle.

Input:
Ego-states and historical trajectory: <…>
Based on the analysis of scene and critical objects, determine the driving meta actions and decisions.

Slow-Fast Dual System Collaboration

Traditional AV Pipeline: 3D Perception, Motion Prediction, Trajectory Planning

*Submitted to CVPR24

https://openreview.net/forum?id=jL4YMzXYII

LLM Deployed on NVIDIA DRIVE Orin

LLaMA2-3B (BS=1, Input_len=128, Output_len=128)

Platform | Config | Context Latency (ms) | Decoder Perf (tokens/s)
DriveOS Linux, Orin | INT4 (GPTQ) | 52.5 | 65.6

LLaMA2-7B (BS=1, Input_len=128, Output_len=128)

Platform | Config | Context Latency (ms) | Decoder Perf (tokens/s)
DriveOS Linux, Orin | INT4 (GPTQ) | 73.15 | 41.8
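To put the two tables in terms of response time, the sketch below estimates how long one full reply takes: the context (prefill) latency plus the output length divided by the decoding rate. It assumes the fused numbers split as 52.5 ms / 65.6 tokens/s for the 3B model and 73.15 ms / 41.8 tokens/s for the 7B model, and that the decoding rate stays constant over all 128 output tokens. The function name is illustrative.

```python
def generation_time_s(context_latency_ms: float,
                      decode_tokens_per_s: float,
                      output_len: int) -> float:
    """Prefill latency plus time to decode `output_len` tokens at a constant rate."""
    return context_latency_ms / 1000.0 + output_len / decode_tokens_per_s


OUTPUT_LEN = 128  # Output_len from the slide

# (context latency in ms, decoder tokens/s), under the split assumed above.
configs = {
    "LLaMA2-3B INT4": (52.5, 65.6),
    "LLaMA2-7B INT4": (73.15, 41.8),
}

times = {name: generation_time_s(ctx, rate, OUTPUT_LEN)
         for name, (ctx, rate) in configs.items()}

for name, t in times.items():
    print(f"{name}: about {t:.2f} s for {OUTPUT_LEN} output tokens")
```

Under these assumptions a 128-token reply takes roughly 2.0 s on the 3B model and 3.1 s on the 7B model, which is far slower than the 48 ms perception frame time and fits the slow role of System 2 in the slow-fast design.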

Li AD Sim Research: Street Gaussians

Figure panels: Original scene vs. Street-Gaussian swapping (two examples); Real log vs. Camera simulation; Origin scene, Linre scene, Unisim scene.

Method diagram labels: Rendering Images; Decomposition; Semantic maps; Scene representation; Background model; Object model; Geometry model (Position μ, Rotation R, Opacity α, Scale S); Dynamic appearance model (Time basis, SH basis); Optimizable Tracked boxes; Composition; Point-based Rendering.

Metric | 3DGS [16] | NSG [31] | MARS [51] | Ours
PSNR↑ | 29.95 | 30.23 | 31.37 | 34.54
PSNR*↑ | 17.74 | 22.05 | 23.07 | 25.16
SSIM↑ | 0.907 | 0.866 | 0.904 | 0.936
LPIPS↓ | 0.140 | 0.331 | 0.246 | 0.091
FPS↑ | 277 | 0.47 | 0.68 | 133

Table 1. Quantitative results on the Waymo [40] dataset. The rendering image resolution is 1066 × 1600. "PSNR*" denotes the PSNR of moving objects.

*Submitted to CVPR24

https://openreview.net/forum?id=jL4YMzXYII
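Table 1 reports PSNR over the whole image and PSNR* over moving objects only. As a reminder of what the metric measures, the sketch below computes PSNR = 10 · log10(MAX² / MSE) on a flattened image, with an optional mask that restricts the error to selected pixels. How the moving-object mask is obtained is not stated on the slide, so the mask here is an assumption, and the pixel values are made up.

```python
import math
from typing import List, Optional


def psnr(pred: List[float], target: List[float],
         mask: Optional[List[bool]] = None, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB over the pixels selected by `mask`."""
    if mask is None:
        mask = [True] * len(pred)
    sq_err = [(p - t) ** 2 for p, t, m in zip(pred, target, mask) if m]
    if not sq_err:
        raise ValueError("mask selects no pixels")
    mse = sum(sq_err) / len(sq_err)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


# Toy 4-pixel "image": the static half is rendered perfectly, the moving half is not.
pred = [0.5, 0.5, 0.2, 0.9]
target = [0.5, 0.5, 0.4, 0.7]
moving = [False, False, True, True]  # hypothetical moving-object mask

full_psnr = psnr(pred, target)
moving_psnr = psnr(pred, target, mask=moving)
print(f"PSNR {full_psnr:.2f} dB, PSNR* {moving_psnr:.2f} dB")
```

Because the error is concentrated on the moving pixels, the masked value is lower than the full-image value. The same pattern appears in Table 1, where every method scores lower on PSNR* than on PSNR.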

Li AD Research: BEV-CLIP (Multimodal Data Retrieval)

Diagram labels: (a) BEV Encoder, Weight matrix, Language Encoder, LoRA, Text embedding, KG Embedding, Knowledge graph, example text "Peds crossing crosswalk, many cars ……"; (b) Shared Cross-modal Prompt; (c) BEV Caption Generation Head, Contrastive loss.

Figure 2. Overall structure of BEV-CLIP.

(a) Processing of BEV and text features. The images from 6 surrounding cameras are encoded into a BEV feature by the BEV Encoder with frozen parameters. At the same time, the input text embedding is concatenated with the keyword-matched Knowledge Graph node embedding and fed into the Language Encoder with a LoRA branch for processing. (b) Shared cross-modal prompt (SCP), which aligns the BEV and linguistic features in the same hidden space. (c) Joint supervision of caption generation and retrieval tasks. ⊙ denotes dot product.
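The caption names a contrastive loss over dot products between BEV and text features but does not give its exact form. A common choice for this kind of retrieval training is the symmetric CLIP-style loss, sketched below in plain Python. The symmetric form, the temperature of 0.07, and the toy 2-D embeddings are all assumptions.

```python
import math
from typing import List

Vec = List[float]


def _normalize(v: Vec) -> Vec:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def _cross_entropy_diag(logits: List[Vec]) -> float:
    """Mean cross-entropy where the correct class of row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_style_loss(bev: List[Vec], text: List[Vec], temperature: float = 0.07) -> float:
    """Symmetric contrastive loss over a batch of paired (BEV, text) embeddings."""
    b = [_normalize(v) for v in bev]
    t = [_normalize(v) for v in text]
    logits = [[_dot(bi, tj) / temperature for tj in t] for bi in b]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-BEV direction
    return 0.5 * (_cross_entropy_diag(logits) + _cross_entropy_diag(logits_t))


# Toy batch: pair i of `text_aligned` points the same way as pair i of `bev`.
bev = [[1.0, 0.0], [0.0, 1.0]]
text_aligned = [[0.9, 0.1], [0.1, 0.9]]
text_shuffled = list(reversed(text_aligned))

loss_aligned = clip_style_loss(bev, text_aligned)
loss_shuffled = clip_style_loss(bev, text_shuffled)
print(loss_aligned < loss_shuffled)  # prints "True"
```

The loss is small when each BEV feature is closest to its own caption and large when the pairs are mismatched, which is the property a retrieval model needs.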
