Conditional Generative Neural System for Probabilistic Trajectory Prediction

Jiachen Li, Hengbo Ma and Masayoshi Tomizuka

Abstract: Effective understanding of the environment and accurate trajectory prediction of surrounding dynamic obstacles are critical for intelligent systems such as autonomous vehicles and wheeled mobile robots navigating in complex scenarios to achieve safe, high-quality decision making, motion planning and control. Due to the uncertain nature of the future, it is desirable to make inference from a probabilistic perspective instead of producing deterministic predictions. In this paper, we propose a conditional generative neural system (CGNS) for probabilistic trajectory prediction that approximates the data distribution, from which realistic, feasible and diverse future trajectory hypotheses can be sampled. The system combines the strengths of conditional latent space learning and variational divergence minimization, and leverages both static context and interaction information with soft attention mechanisms. We also propose a regularization method for incorporating soft constraints into deep neural networks with differentiable barrier functions, which can regulate and push the generated samples into the feasible regions. The proposed system is evaluated on several public benchmark datasets for pedestrian trajectory prediction and a roundabout naturalistic driving dataset collected by ourselves. The experimental results demonstrate that our model achieves better prediction accuracy than various baseline approaches.

I. INTRODUCTION

A multi-agent prediction system should satisfy the following requirements in order to generate diverse, realistic future trajectories:

1) Context-aware: The system should forecast trajectories that stay inside the traversable regions and are collision-free with static obstacles in the environment. For instance, vehicles navigating in a roundabout (see Fig. 1(a)) need to advance along the curves and avoid collisions with road boundaries.

2) Interaction-aware: The system needs to generate reasonable trajectories compliant with traffic or social rules, which takes into account interactions and reactions among multiple entities. For instance, vehicles approaching an unsignalized intersection (see Fig. 1(b)) need to anticipate others' possible intentions and motions as well as the influences of their own behaviors on surrounding entities.

3) Feasibility-aware: The system should anticipate naturalistic and physically feasible trajectories that comply with vehicle kinematics or dynamics constraints, although these constraints can be ignored for pedestrians due to the large flexibility of their motions.

4) Probabilistic prediction: Since the future is full of uncertainty, the system should be able to learn an approximate distribution of future trajectories close to the data distribution and generate diverse samples which represent various possible behavior patterns.

J. Li, H. Ma and M. Tomizuka are with the Department of Mechanical Engineering, University of California, Berkeley, CA 94720, USA.

Fig. 1. Typical urban traffic scenarios with large uncertainty and interactions among multiple entities. The shaded areas represent the reachable sets of possible trajectories. (a) Unsignalized roundabout with four-way yield signs; (b) Unsignalized intersection with four-way stop signs.
In this work, we propose a generative neural system that satisfies all the aforementioned requirements for predicting trajectories in highly interactive scenarios. The system takes advantage of both explicit and implicit density learning in a unified generative framework to predict the distributions of trajectories for multiple interactive agents, from which the sampled hypotheses are not only reasonable and feasible but also cover diverse possible motion patterns. The main contributions of this paper are as follows:

- A Conditional Generative Neural System (CGNS) is proposed to jointly predict future trajectories of multiple highly interactive agents, which takes into account the static context information, interactions among multiple entities, and feasibility constraints.
- A block attention mechanism and a Gaussian mixture attention mask are proposed and applied to historical trajectories and scene image sequences, respectively, both of which are computationally efficient.
- An effective strategy for incorporating soft constraints into deep neural networks is presented.
- The latent space learning and variational divergence minimization approaches are integrated into a unified framework in a novel fashion, which combines their strengths in distribution learning.
- The proposed CGNS is validated on multiple pedestrian trajectory forecasting benchmarks and is used to anticipate the motions of on-road vehicles navigating in highly interactive scenarios.

II. RELATED WORK

In this section, we provide a brief overview of related research and illustrate the distinctions and advantages of the proposed generative system.

A. Trajectory and Sequence Prediction

Many research efforts have been devoted to predicting behaviors and trajectories of pedestrians and on-road vehicles. Classical approaches have been employed to make time-series predictions, such as variants of the Kalman filter based on system process models, time-series analysis, and auto-regressive models. However, such methods only suffice for short-term prediction in simple scenarios where interactions among entities can be ignored. More advanced learning-based models have been proposed to cope with more complicated scenarios, such as hidden Markov models [1], [2], Gaussian mixture regression [3], [4], Gaussian processes, dynamic Bayesian networks, and rapidly-exploring random trees. However, it is nontrivial for these approaches to handle high-dimensional data, and they require hand-designed input features, which confines the flexibility of representation learning. Moreover, these methods only predict the behavior of a single entity. A few works also took advantage of both recurrent neural networks [5], [6] and generative modeling to learn an explicit or implicit trajectory distribution, which achieved better performance [7]-[9]. However, they leveraged either only static context images or only trajectories of agents, which is not sufficient for predicting agents that interact with both static and dynamic obstacles. In this paper, we propose a conditional generative neural system which can leverage both historical scene evolution information and the trajectories of multiple interactive agents to generate realistic and diverse trajectory hypotheses.
B. Soft Attention Mechanisms

Soft attention mechanisms have been widely used in neural networks to enable the capability of focusing on a subset of input features, and have been extensively studied in the fields of image captioning [10], visual object tracking [11] and natural language processing. Several works also brought attention mechanisms into trajectory prediction tasks to figure out the most informative and relevant obstacles [12]-[15]. In this paper, we put forward a block attention mask mechanism for trajectories to extract the most critical features of each entity, as well as a Gaussian mixture attention mechanism for context images to extract the most crucial static features.

C. Deep Bayesian Generative Modeling

The objective of generative models is to approximate the true data distribution, with which one can generate new samples similar to real data points with a proper variance. Generative models have been widely employed for representation learning and distribution approximation in the literature, and basically fall into two categories: explicit density models and implicit density models [16]. In recent years, since deep neural networks can be leveraged as universal distribution approximators thanks to their high flexibility, two deep generative models have been widely studied: the Variational Auto-Encoder (VAE) [17] and the Generative Adversarial Network (GAN) [18]. Since in trajectory forecasting tasks the predicted trajectories are sampled from the posterior distribution conditioned on historical information, the two models were extended to their conditional versions, resulting in the conditional VAE (CVAE) [19] and conditional GAN (CGAN) [12], [20]. In this paper, we combine the strengths of conditional latent space learning via CVAE and variational divergence minimization via adversarial training.

III. PROBLEM FORMULATION

The objective of this paper is to develop a deep generative system that can accurately forecast motions and trajectories for multiple agents simultaneously. The system should take into account the historical state information, static context and interactions among dynamic entities. Assume there are N entities in total in the observation area, where N may vary across different cases. We denote a set of trajectories covering the history and prediction horizons (T_h and T_f) as

\[ \mathcal{T}_{k-T_h:k+T_f} = \left\{ \tau^i_{k-T_h:k+T_f} \;\middle|\; \tau^i_k = (x^i_k, y^i_k),\ i = 1,\dots,N \right\}, \quad (1) \]

where (x, y) is the 2D coordinate in the pixel space or world space. The latent random variable is denoted as z_k, where k is the current time step. The sequence of context images up to time step k is denoted as I_{k-T_h:k}. Our goal is to predict the conditional distribution of future trajectories given the historical context images and trajectories, p(\mathcal{T}_{k+1:k+T_f} | \mathcal{T}_{k-T_h:k}, I_{k-T_h:k}). Long-term prediction is realized by propagating the generative system multiple times into the future. To simplify the notation in the following sections, we denote the condition variable as C = \{\mathcal{T}_{k-T_h:k}, I_{k-T_h:k}\} and the sequence of predicted variables as Y = \mathcal{T}_{k+1:k+T_f}.
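To make the notation concrete, below is a minimal sketch of how the condition variable C and a sampling interface for p(Y | C) might be laid out as tensors. The shape conventions, variable names, and the generator interface are our own illustrative assumptions, not details taken from the paper.

```python
import torch

# Illustrative shapes (our assumptions): N agents, history/prediction
# horizons T_h/T_f, and 3-channel context images of size H x W.
N, T_h, T_f, H, W = 4, 8, 12, 64, 64

past_traj = torch.randn(N, T_h, 2)     # tau^i_{k-T_h:k}, one (x, y) per step
past_imgs = torch.randn(T_h, 3, H, W)  # context image sequence I_{k-T_h:k}

# Condition variable C = {T_{k-T_h:k}, I_{k-T_h:k}}.
C = (past_traj, past_imgs)

def sample_hypotheses(generator, C, num_samples=20, z_dim=32):
    """Draw diverse future-trajectory hypotheses Y = T_{k+1:k+T_f}.

    The CGNS represents p(Y | C) implicitly: each draw of the latent
    variable z_k ~ N(0, I), decoded by the generator G(C, z_k), yields
    one future trajectory for all N agents, of shape (N, T_f, 2).
    """
    samples = [generator(C, torch.randn(z_dim)) for _ in range(num_samples)]
    return torch.stack(samples)        # (num_samples, N, T_f, 2)
```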
IV. METHODOLOGY

In this section, we first provide an overview of the key components and the architecture of the proposed Conditional Generative Neural System (CGNS). The detailed theories and models of each component are then illustrated.

A. System Overview

The architecture of the CGNS is shown in Fig. 2. It comprises a deep feature extractor (DFE) with an environment attention mechanism (EAM), as well as a generative neural sampler (GNS). First, the DFE extracts deep features from a sequence of historical context images and the trajectories of multiple interactive agents to obtain information about both static and dynamic obstacles, where the EAM determines which areas and dynamic entities should receive more attention than others when predicting the trajectory of a certain entity. This information is then used as the input of the GNS, which takes advantage of a deep latent variable model and a variational divergence minimization approach to generate a set of feasible, realistic and diverse future trajectories for all the involved entities. All the components are implemented with deep neural networks and thus can be trained end-to-end efficiently and consistently.

Fig. 2. The overview of the proposed conditional generative neural system (CGNS), which consists of four key components: (a) a deep feature extractor with soft attention mechanisms, which extracts multi-level features from scene context image sequences and trajectories; (b) an encoder to learn conditional latent space representations; (c) a generator (decoder) to sample future trajectory hypotheses; (d) a discriminator to distinguish predicted trajectories from the ground truth.

B. Environment-Aware Deep Feature Extraction

We take advantage of both context images and the historical trajectories of interactive agents to extract deep features of both the static and dynamic environments. In order to figure out the most crucial parts to consider when forecasting the behaviors of certain agents, we propose a soft block attention mechanism applied to trajectories and a Gaussian mixture attention mechanism applied to context images. The details are illustrated below.

The historical and future trajectories are constructed as matrices which are treated as 2D images. The former is fed into a convolutional neural network (CNN) and an average pooling layer to obtain a condensed attention mask over the whole trajectory matrix, which is then expanded to the same size as the trajectory matrix by duplicating each column for the x and y coordinates. The original trajectory matrix is multiplied elementwise by the block attention mask. This mechanism is not applied to the future trajectory matrix, since it is unreasonable to place particular attention on the future evolution. The context image sequences are also fed into a CNN followed by fully connected layers to obtain a set of parameters of a Gaussian mixture distribution, which is used to calculate the context attention mask. The elementwise multiplication of the original images and the attention masks is fed to a pre-trained feature extractor, which in this paper is the convolutional base of VGG-19 [21]. The interaction-aware features and context-aware features are concatenated and fed into a recurrent layer followed by fully connected layers to obtain a comprehensive and consistent feature embedding.
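As a concrete illustration of the block attention described above, the following is a minimal PyTorch sketch. The layer sizes, kernel shapes, and the sigmoid normalization of the mask are our own assumptions; the paper specifies only the CNN, the average pooling, the column-wise expansion, and the elementwise multiplication.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlockAttention(nn.Module):
    """Soft block attention over a trajectory matrix (illustrative sketch).

    The history of N agents is treated as a 2D "image" of shape (T_h, 2N):
    one row per time step and one (x, y) column pair per agent. A small CNN
    followed by average pooling yields one attention weight per block; each
    weight is duplicated across the agent's x and y columns and multiplied
    elementwise into the original trajectory matrix.
    """

    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(8, 1, kernel_size=3, padding=1),
        )

    def forward(self, traj):                       # traj: (B, T_h, 2N)
        scores = self.conv(traj.unsqueeze(1))      # (B, 1, T_h, 2N)
        # Average-pool each agent's (x, y) column pair into one block score.
        scores = F.avg_pool2d(scores, kernel_size=(1, 2))   # (B, 1, T_h, N)
        mask = torch.sigmoid(scores)               # attention weights in (0, 1)
        # Expand back by duplicating each column for the x and y coordinates.
        mask = mask.repeat_interleave(2, dim=-1)   # (B, 1, T_h, 2N)
        return traj * mask.squeeze(1)              # elementwise re-weighting
```

For example, `BlockAttention()(torch.randn(16, 8, 2 * 4))` re-weights a batch of 16 histories of 4 agents over 8 steps and returns a tensor of the same shape, so the mechanism drops into a pipeline without changing downstream interfaces.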
C. Deep Generative Sampling

The GNS is composed of an encoder E and a generator G. The goal of the encoder is to learn a consistent distribution in a lower-dimensional latent space, from which the latent variable can be sampled efficiently. The generator aims to produce trajectories that are as realistic as possible. An auxiliary discriminator D is adopted, which aims to distinguish generated trajectories from the ground truth. The generator G and discriminator D form a minimax game. The three components can be optimized jointly via conditional latent space learning and variational divergence minimization.

Conditional Latent Space Learning (CLSL)

The conditional latent variable model defined in this paper contains three classes of variables: the condition variable C, the predicted variable Y and the latent variable z. We aim to obtain the conditional distribution p(Y | C). Given the training data (C, Y), the model first samples z from an arbitrary distribution Q. Our goal is to maximize the variational lower bound, which is written as

\[ \log p(Y|C) - D_{KL}\big(Q(z|C,Y) \,\|\, p(z|C,Y)\big) = \mathbb{E}_{z \sim Q}\big[\log p(Y|z,C)\big] - D_{KL}\big(Q(z|C,Y) \,\|\, p(z|C)\big), \quad (2) \]

where p(z|C) = \mathcal{N}(0, I). This process can be realized with a conditional variational auto-encoder, which consists of an encoder network E to model Q(z|C,Y) and a decoder (generator) network G to model p(Y|z,C). The loss function can be formulated as a weighted sum of the reconstruction error and the KL divergence:

\[ \mathcal{L}^{G,E}_{RC} = \mathbb{E}_{\tau_{k+1:k+T_f},\, z_k \sim Q}\Big[ \big\| \tau_{k+1:k+T_f} - G(C_k, z_k) \big\|_2 \Big], \quad (3) \]

\[ \mathcal{L}^{E}_{KL} = \mathbb{E}_{\tau_{k+1:k+T_f}}\Big[ D_{KL}\big( E(C_k, \tau_{k+1:k+T_f}) \,\|\, p(z_k) \big) \Big], \quad (4) \]

where z_k \sim \mathcal{N}(0, I). The optimal encoder and generator can be obtained by

\[ G^*, E^* = \arg\min_{G,E}\ \lambda_1 \mathcal{L}^{G,E}_{RC} + \lambda_2 \mathcal{L}^{E}_{KL}. \quad (5) \]

Variational Divergence Minimization (VDM)

Given two conditional distributions P_{data}(Y|C) and P_{GNS}(Y|C) with absolutely continuous density functions p_{data}(Y|C) and p_{GNS}(Y|C), which denote the real data distribution and its approximation by the GNS, the f-divergence [22] is defined as

\[ D_f(P_{data} \,\|\, P_{GNS}) = \int_{\mathcal{Y}} p_{GNS}(Y|C)\, f\!\left( \frac{p_{data}(Y|C)}{p_{GNS}(Y|C)} \right) dY, \quad (6) \]

where f: \mathbb{R}_+ \rightarrow \mathbb{R} is a convex, lower-semicontinuous function with f(1) = 0. A lower bound on the f-divergence can be derived with the convex conjugate function f^*:

\[ D_f(P_{data} \,\|\, P_{GNS}) \ge \sup_{T \in \mathcal{T}} \left( \int_{\mathcal{Y}} p_{data}(Y|C)\, T(Y|C)\, dY - \int_{\mathcal{Y}} p_{GNS}(Y|C)\, f^*\big(T(Y|C)\big)\, dY \right) = \sup_{T \in \mathcal{T}} \Big( \mathbb{E}_{Y \sim P_{data}}\big[T(Y|C)\big] - \mathbb{E}_{Y \sim P_{GNS}}\big[f^*\big(T(Y|C)\big)\big] \Big), \quad (7) \]

where \mathcal{T} is an arbitrary class of mappings T: \mathcal{Y} \rightarrow \mathbb{R}. In order to minimize the divergence via its variational lower bound in (7), we can formulate a minimax game between p_{GNS}(Y|C) and T(Y|C), parameterized by \theta and \omega, respectively. The optimal \theta and \omega can then be obtained by

\[ \theta^*, \omega^* = \arg\min_{\theta} \max_{\omega}\ \mathbb{E}_{Y \sim p_{data}(Y|C)}\big[T_{\omega}(Y|C)\big] - \mathbb{E}_{Y \sim p_{\theta}(Y|C)}\big[f^*\big(T_{\omega}(Y|C)\big)\big]. \quad (8) \]

In this work, we propose to minimize the Pearson \chi^2 divergence between P_{data} + P_{GNS} and 2P_{GNS}:

\[ D^{Pearson}_{\chi^2} = \int_{\mathcal{Y}} \frac{\big( 2 p_{GNS} - (p_{data} + p_{GNS}) \big)^2}{p_{data} + p_{GNS}}\, dY. \quad (9) \]

Since (9) is intractable, we leverage adversarial learning techniques with a generator G and a discriminator D implemented as deep networks. The adversarial loss functions are derived as

\[ \mathcal{L}^{G}_{VDM} = \frac{1}{2}\, \mathbb{E}_{z_k \sim p(z)}\Big[ D\big(G(C_k, z_k)\big)^2 \Big], \quad (10) \]

\[ \mathcal{L}^{D}_{VDM} = \frac{1}{2}\, \mathbb{E}_{\tau_{k+1:k+T_f}}\Big[ \big( D(\tau_{k+1:k+T_f}) - 1 \big)^2 \Big] + \frac{1}{2}\, \mathbb{E}_{z_k \sim p(z)}\Big[ \big( D\big(G(C_k, z_k)\big) + 1 \big)^2 \Big]. \quad (11) \]

To incorporate the effect of latent space learning, we also involve two additional terms \mathcal{L}^{G,E}_{VDM} and \mathcal{L}^{D,E}_{VDM}, in which the input z_k is sampled from the encoded latent distribution instead of the prior. Thus, the optimal encoder, generator and discriminator under variational divergence minimization can be obtained as

\[ E^*, G^*, D^* = \arg\min_{G,E} \max_{D}\ \lambda_3 \big( \mathcal{L}^{G}_{VDM} + \mathcal{L}^{D}_{VDM} \big) + \lambda_4 \big( \mathcal{L}^{G,E}_{VDM} + \mathcal{L}^{D,E}_{VDM} \big). \quad (12) \]
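The two training objectives above translate almost directly into code. Below is a minimal PyTorch sketch of the CLSL terms (3)-(4) and the least-squares adversarial terms (10)-(11); the diagonal-Gaussian encoder interface, the conditioning of the discriminator on C, and the detach on the generator output in the discriminator loss are our own assumptions, not specifics from the paper.

```python
import torch

def clsl_losses(encoder, generator, C, Y):
    """CLSL sketch: reconstruction (3) + KL (4), assuming the encoder
    returns a diagonal Gaussian (mu, log_var) for Q(z | C, Y)."""
    mu, log_var = encoder(C, Y)
    z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)  # reparameterize
    loss_rc = (generator(C, z) - Y).norm(dim=-1).mean()       # eq. (3)
    # Closed-form KL(N(mu, sigma^2) || N(0, I)), eq. (4)
    loss_kl = 0.5 * (mu.pow(2) + log_var.exp() - 1.0 - log_var).sum(-1).mean()
    return loss_rc, loss_kl

def vdm_losses(generator, discriminator, C, Y, z_dim=32):
    """VDM sketch: least-squares GAN losses (10)-(11), whose fixed point
    minimizes the Pearson chi^2 divergence of eq. (9)."""
    z = torch.randn(Y.size(0), z_dim)                  # z_k ~ p(z) = N(0, I)
    Y_fake = generator(C, z)
    loss_g = 0.5 * discriminator(C, Y_fake).pow(2).mean()              # (10)
    loss_d = 0.5 * (discriminator(C, Y) - 1.0).pow(2).mean() \
           + 0.5 * (discriminator(C, Y_fake.detach()) + 1.0).pow(2).mean()  # (11)
    return loss_g, loss_d
```

The real/fake targets of 1 and -1 with a generator target of 0 mirror equations (10)-(11); in the LSGAN family this choice of coding is what yields the Pearson chi-squared divergence between p_data + p_GNS and 2p_GNS.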
D. Soft Constraint Incorporation

In order to make the generated samples compliant with the feasibility constraints of vehicle kinematics, we propose to incorporate a differentiable barrier (indicator) function I(·) into the loss function, which enables soft constraints in deep neural networks by pushing predicted trajectories toward the feasible regions. In this work, we denote the empirical upper bounds on the absolute values of the accelerations a_{k+1:k+T_f} and path curvatures \kappa_{k+1:k+T_f} as a_{max} and \kappa_{max}, respectively. The feasibility loss can then be calculated as

\[ \mathcal{L}^{G,E}_{F} = \beta_1\, \mathbb{E}_{a_{k+1:k+T_f}}\left[ \sum_{t=k+1}^{k+T_f} \max\big(0, \operatorname{sgn}(|a_t| - a_{max})\big) \right] + \beta_2\, \mathbb{E}_{\kappa_{k+1:k+T_f}}\left[ \sum_{t=k+1}^{k+T_f} \max\big(0, \operatorname{sgn}(|\kappa_t| - \kappa_{max})\big) \right], \quad (13) \]

where sgn(·) refers to the sign function, and a_t and \kappa_t can be calculated from the predicted waypoints. This loss term is not applied to human trajectory prediction.
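For illustration, here is a minimal PyTorch sketch of a feasibility penalty in the spirit of (13). Estimating a_t and kappa_t by finite differences with a fixed timestep dt is our assumption about how they are "calculated from the predicted waypoints", and we use a hinge relaxation ReLU(|a_t| - a_max) in place of the sign-based indicator so that infeasible samples receive a nonzero gradient; the paper's exact formulation may differ.

```python
import torch

def feasibility_loss(traj, dt, a_max, k_max, beta1=1.0, beta2=1.0):
    """Hinge-relaxed feasibility penalty (sketch of the role of eq. (13)).

    traj: (B, T, 2) predicted waypoints; dt: sampling interval in seconds.
    """
    v = (traj[:, 1:] - traj[:, :-1]) / dt        # (B, T-1, 2) velocities
    a_vec = (v[:, 1:] - v[:, :-1]) / dt          # (B, T-2, 2) accelerations
    a = a_vec.norm(dim=-1)                       # |a_t|

    # Curvature kappa = |x' y'' - y' x''| / (x'^2 + y'^2)^(3/2),
    # using velocities and accelerations aligned at the same time steps.
    xd, yd = v[:, 1:, 0], v[:, 1:, 1]
    xdd, ydd = a_vec[..., 0], a_vec[..., 1]
    kappa = (xd * ydd - yd * xdd).abs() / (xd.pow(2) + yd.pow(2) + 1e-6).pow(1.5)

    # ReLU(|a_t| - a_max) is zero inside the feasible region and grows with
    # the violation, softly pushing generated samples back into feasibility.
    loss_a = torch.relu(a - a_max).sum(dim=1).mean()
    loss_k = torch.relu(kappa - k_max).sum(dim=1).mean()
    return beta1 * loss_a + beta2 * loss_k
```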
E. Conditional Generative