Robot Learning of Shifting Objects for Grasping in Cluttered Environments

Lars Berscheid, Pascal Meißner, and Torsten Kröger
Intelligent Process Automation and Robotics Lab (IPR), Karlsruhe Institute of Technology (KIT)

Abstract: Robotic grasping in cluttered environments is often infeasible due to obstacles preventing possible grasps. Then, pre-grasping manipulation like shifting or pushing an object becomes necessary. We developed an algorithm that can learn, in addition to grasping, to shift objects in such a way that their grasp probability increases. Our research contribution is threefold: First, we present an algorithm for learning the optimal pose of manipulation primitives like clamping or shifting. Second, we learn non-prehensile actions that explicitly increase the grasping probability. Making one skill (shifting) directly dependent on another (grasping) removes the need for sparse rewards, leading to more data-efficient learning. Third, we apply a real-world solution to the industrial task of bin picking, resulting in the ability to empty bins completely. The system is trained in a self-supervised manner with around 25,000 grasp and 2,500 shift actions. Our robot is able to grasp and file objects with 274 ± 3 picks per hour. Furthermore, we demonstrate the system's ability to generalize to novel objects.

I. INTRODUCTION

Grasping is an essential task in robotics, as it is the key to successfully interacting with the robot's environment and enables further object manipulation. The fundamental challenges of grasping are particularly visible in bin picking, the task of grasping objects out of unsystematic environments like a randomly filled bin. It emphasizes challenges such as partially hidden objects and an obstacle-rich environment. Furthermore, bin picking is of enormous significance in today's industrial and logistic automation, enabling pick-and-place applications or automatic assembly. To enable future robotic trends like service or domestic robotics, bin picking and therewith robotic grasping need to be solved robustly.

In general, grasping is more complex than a single clamping action. For example, surrounding obstacles might prevent all possible grasps of a specific object. In this case, it needs to be moved first so that it can be grasped afterwards. While pre-grasping manipulations are trivial for humans, they require interactions like sliding, pushing or rolling, which are complex for robots. In the context of bin picking, pre-grasping is essential to empty a bin completely, since the bin itself might block grasps in its corners. Additionally, when items are stored as space-efficiently as possible, objects often prevent each other from being grasped in densely filled bins.

Fig. 1: Our setup of a Franka robotic arm including the standard force-feedback gripper (a), an Ensenso depth camera (b), custom 3D-printed gripper jaws with anti-slip tape (c), and two industrial bins with objects (d). The robot learns first grasping (2) and then shifting objects in order to explicitly increase grasp success (1).

Our work is structured as follows: First, we present a vision-based algorithm for learning the most rewarding pose for applying object manipulation primitives. In our case, we define five primitives: three for grasping at different gripper widths and two for shifting. Second, we use that approach to learn grasping by estimating the grasp probability at a given pose. Third, we derive both a grasping-dependent reward function and a training procedure for shifting. This way, sparse rewards of the grasp success can be bypassed for more data-efficient learning of shifting. Fourth, we present a robotic system (Fig. 1) which learns the industrial task of bin picking. Beyond the capabilities of the first two steps, the system is able to empty bins completely and achieves arbitrary grasp rates at the expense of picks per hour (PPH). Furthermore, we evaluate the system's ability to grasp novel objects from non-graspable positions.
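For illustration, the discrete primitive set M can be sketched as a small Python enumeration. Only the split into three clamping primitives with different pre-shaped gripper widths and two shift directions is given by the paper; the member names and ordering are assumptions.

```python
from enum import Enum


class MotionPrimitive(Enum):
    """Discrete motion primitive set M: three clamping actions started from
    different pre-shaped gripper widths and two planar shift directions.
    Names and ordering are illustrative assumptions."""
    GRASP_NARROW = 0   # clamp, small pre-shaped gripper width
    GRASP_MEDIUM = 1   # clamp, medium pre-shaped gripper width
    GRASP_WIDE = 2     # clamp, large pre-shaped gripper width
    SHIFT_X = 3        # close gripper, push 30 mm along the gripper x-axis
    SHIFT_Y = 4        # close gripper, push 30 mm along the gripper y-axis
```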
II. RELATED WORK

Object manipulation and in particular grasping are well-researched fields within robotics. Bohg et al. [1] differentiate between analytical and data-driven approaches to grasping. Historically, grasp synthesis was based on analytical constructions of force-closure grasps [2]. In comparison, data-driven approaches are defined by sampling and ranking possible grasps. Popular ranking functions include classical mechanics and model-based grasp metrics [3], [2]. As modeling grasps itself is challenging, even more complex interactions like motion planning of pre-grasping actions were studied less frequently. Within this scope, Dogar and Srinivasa [4] combined pushing and grasping into a single action, enabling them to grasp more cluttered objects from a table. Chang et al. [5] presented a method for rotating objects to find more robust grasps for transport tasks.

In recent years, the progress of machine learning in computer vision enabled robot learning based on visual input [6]. As most approaches, in particular deep learning, are limited by their data consumption, data generation becomes a fundamental challenge. Several solutions have been proposed: First, training in simulation with subsequent sim-to-real transfer showed great results for grasp quality estimation [7], [8]. However, as contact forces are difficult to simulate, training of more complex object interactions for pre-grasping manipulation might be challenging. Second, imitation learning deals with integrating expert knowledge by observing demonstrations [9]. Third, training a robot using real-world object manipulation in a self-supervised manner showed great results for generalizing to novel objects [10]. Levine et al. [11] improved the grasp rate to 82.5%, at the cost of upscaling the training to a multitude of robots for two months. Data consumption of learning for robotic grasping can be minimized by utilizing spatial invariances and improving the data exploration strategy [12].

More recently, robot learning of manipulation skills has been formulated as a reinforcement learning (RL) problem. For differentiation, we find the action space either to be discrete, defined by a few motion primitives [12], [13], or continuous as a low-level control [14], [15]. While the latter allows for an end-to-end approach for general grasping strategies, it comes with the cost of very sparse rewards and a high data consumption. Kalashnikov et al. [14] trained an expensive multi-robot setup for 580,000 grasp attempts, resulting in an impressive grasp rate of 96% for unknown objects. Furthermore, the robots implicitly learned pre-grasping manipulation like singularization, pushing and reactive grasping. In contrast, Zeng et al. [13] introduced a pushing motion primitive and learned grasping and pushing in synergy by rewarding the sparse grasp success. Using significantly less training data than [14], their robot was able to clear a table of tightly packed objects.
III. SHIFT OBJECTS FOR GRASPING

Reinforcement learning (RL) provides a powerful framework for robot learning. We introduce a Markov decision process (MDP) (S, A, T, r, p0) with the state space S, the action space A, the transition distribution T, the reward function r and the initial configuration p0. Similar to other data-driven approaches, RL is limited by its challenging data consumption, and even more so for time-dependent tasks with sparse rewards. For this reason, we reduce our process to a single time step. Then, a solution to this MDP is a policy π: S → A mapping the current state s ∈ S to an action a ∈ A.

Fig. 2: Our fully-convolutional neural network (NN) architecture, making use of both batch normalization (BN) and dropout for a given motion primitive set M. Layers: input (single-channel depth image) → 5×5 conv, 32 channels, stride (2, 2), BN, dropout 0.4 → 5×5 conv, 48 channels, BN, dropout 0.4 → 5×5 conv, 64 channels, BN, dropout 0.3 → 6×6 conv, 142 channels, BN, dropout 0.3 → 1×1 conv, 128 channels, dropout 0.3 → 1×1 conv, |M| channels.

A. Spatial Learning of Object Manipulation

Given the visual state space S, let s denote the orthographic depth image of the scene. We simplify the action space to four parameters (x, y, a, d) ∈ A = R³ × N in the planar subspace. The spatial coordinates (x, y, a) are given in the image frame, using the usual x- and y-coordinates and the rotation a around the axis z orthogonal to the image frame. While the relative transformation between the camera frame and the tool center point (TCP) needs to be known, the absolute extrinsic calibration is learned. To get a full overview image of the object bin, we set the remaining angles b = c = 0, resulting in planar object manipulation. The fourth parameter d corresponds to the index within the discrete set of motion primitives M.

The policy π(s) = σ(Q(s, a)) is split into an action-value function Q and a selection function σ. Q(s, a) estimates the reward r for an action a given an orthographic depth image s. We introduce a sliding window s′ ⊂ s that crops the orthographic image at the given translation (x, y) and rotation a. The x-y-translation is implemented efficiently as a fully convolutional NN, the rotation parameter by applying the NN to multiple pre-rotated images. The motion primitive set M is represented by the number of output channels of the last convolutional layer. Fig. 2 shows the detailed architecture of the used NN. The training output has a size of (1 × 1 × |M|), corresponding to the size of the motion primitive set M. During inference, the NN calculates an output of size (40 × 40 × |M|), corresponding to the (x, y, d) parameters. For the rotation a, the input image is pre-transformed and the NN is recalculated for 20 angles. This way, 32,000 reward estimations are calculated for each motion primitive. Overall, the NN approximates the action-value function Q for a discrete set of actions a within four dimensions.
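As a reading aid, the following is a minimal Keras sketch of the fully-convolutional architecture from Fig. 2. Only the layer sizes, stride, batch normalization and dropout rates are taken from the figure; the padding mode, ReLU activations, sigmoid output and the 32×32 training crop are assumptions not fixed by the text.

```python
import tensorflow as tf
from tensorflow.keras import layers


def build_q_network(num_primitives: int) -> tf.keras.Model:
    """Maps an orthographic depth-image crop to one reward estimate per motion primitive."""
    inputs = tf.keras.Input(shape=(None, None, 1))  # single-channel depth image
    x = layers.Conv2D(32, 5, strides=2, padding="valid", activation="relu")(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(0.4)(x)
    x = layers.Conv2D(48, 5, activation="relu")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(0.4)(x)
    x = layers.Conv2D(64, 5, activation="relu")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(0.3)(x)
    x = layers.Conv2D(142, 6, activation="relu")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(0.3)(x)
    x = layers.Conv2D(128, 1, activation="relu")(x)
    x = layers.Dropout(0.3)(x)
    # Sigmoid so each of the |M| output channels can be read as a reward
    # estimate in [0, 1]; the activation choice is an assumption.
    outputs = layers.Conv2D(num_primitives, 1, activation="sigmoid")(x)
    return tf.keras.Model(inputs, outputs)


# With valid padding, an assumed 32x32 crop reduces to the stated 1x1x|M|
# training output, while a full-bin image yields the 40x40x|M| reward map,
# re-evaluated for 20 pre-rotated copies of the image.
```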
Fig. 3: Examples of depth images before (left) and after (right) an applied motion primitive. The maximal grasp probability within the red window, ψ_w before and ψ′_w after, is given for each pair; their difference is then estimated by a fully-convolutional neural network. (a) ψ_w = 0.046%, ψ′_w = 93.2%; (b) ψ_w = 14.7%, ψ′_w = 75.9%; (c) ψ_w = 40.2%, ψ′_w = 97.7%; (d) ψ_w = 92.1%, ψ′_w = 43.9%; (e) ψ_w = 90.5%, ψ′_w = 95.8%; (f) ψ_w = 90.8%, ψ′_w = 45.8%.

The selection function σ maps the four-dimensional tensor of reward predictions to an action a. In RL, the greedy strategy using arg max_a Q(s, a) is commonly used for sufficiently trained systems. By returning one of the N maximum elements uniformly, we integrate a stochastic component so that possible manipulation failures are not repeated. The indices of the selected element at (x, y, a) are then transformed into the camera frame using the intrinsic calibration. The height z is read trivially from the depth image at position (x, y), adding a fixed height offset for each specific motion primitive d. Finally, the algorithm outputs the pose (x, y, z, a, b = const, c = const) of a defined motion primitive d with an estimated reward Q.

B. Action Space Exploration

As the design of an exploration strategy is key to fast and data-efficient learning for robotic grasping [12], we introduce high-level strategies generalized for further object manipulation. We strictly divide between an exploration (training) and an exploitation (application) phase. Without prior information, the system explores the environment by sampling continuously and uniformly random poses within the hull of the action space A. Let ε define the fraction of random samples. Due to our off-policy algorithm, we are able to combine an ε-based strategy with the following set of high-level strategies:

1) Maximize self-information, corresponding to max_x (−log P(x)) with the estimated probability mass function P(x). For manipulation tasks, actions with high absolute rewards are usually more seldom. Therefore, the action with the maximum reward estimation max_a |Q(s, a)| should be chosen. This conforms with the common ε-greedy strategy. In comparison, we find that sampling according to p(a) ∝ |Q(s, a)| yields a more extensive exploration.

2) Minimize uncertainty of prediction, given by max_a Var[Q(s, a)]. In RL, this is usually added to the action-value function itself (for exploitation), leading to the common upper confidence bound (UCB) algorithm. We approximate the Bayesian uncertainty of our NN using Monte-Carlo dropout for variance sampling [16].

3) Minimize uncertainty of outcome for binary rewards r ∈ {0, 1} by choosing min_a |Q(s, a) − 1/2|. The system is not able to predict the outcome of those actions reliably, e.g. due to missing information or stochastic physics.

C. Learning for Grasping

A major contribution of our work is making one skill (shifting) explicitly dependent on the other (grasping). This way, we bypass the problem of sparse rewards in time-dependent MDPs. Besides faster and more data-efficient training, this also allows learning the skills successively. Therefore, we can reuse successful approaches of learning for grasping [12] and focus on pre-grasping manipulation. Briefly, we define the set of motion primitives M as gripper clamping actions starting from three different pre-shaped gripper widths. The robot's trajectory is given by the grasp pose (x, y, z, a, b, c) and its approach vector parallel to the gripper jaws. If the robot detects a collision with its internal force sensor, it retracts a few millimeters and closes the gripper. Then, the object is lifted and the robot moves to a random pose above the filing bin. Finally, the grasp success is measured using the force feedback of the gripper. We define the binary reward function

r_g(s) = 1 if grasp and filing successful, 0 else.    (1)

For binary rewards, the grasping action-value function Q_g(s, a) can be interpreted as a grasp probability ψ. We train a NN mapping the image s to ψ and use it for: (1) estimating the grasp probability at a given position (x, y, a), (2) calculating the best grasp (x, y, a, d), (3) calculating the maximum grasp probability ψ in the entire bin, and (4) calculating the maximum grasp probability ψ_w(s, x, y, a) in a window with a given side length centered around a given pose (x, y, a).
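These four queries can be read directly off the inferred reward map. A minimal NumPy sketch, assuming the map psi has shape (20, 40, 40, |M|) with axis order (angle, y, x, primitive); the axis order, the index-based window and its restriction to a single rotation are illustrative assumptions.

```python
import numpy as np


def grasp_probability_at(psi: np.ndarray, a: int, y: int, x: int, d: int) -> float:
    """(1) Grasp probability at a given discretized pose and primitive."""
    return float(psi[a, y, x, d])


def best_grasp(psi: np.ndarray):
    """(2) Indices (a, y, x, d) of the most promising grasp in the bin."""
    return np.unravel_index(int(np.argmax(psi)), psi.shape)


def max_grasp_probability(psi: np.ndarray) -> float:
    """(3) Maximum grasp probability over the entire bin."""
    return float(psi.max())


def window_max(psi: np.ndarray, a: int, y: int, x: int, half: int) -> float:
    """(4) Maximum grasp probability psi_w in a window centered on (x, y, a)."""
    return float(psi[a,
                     max(y - half, 0):y + half + 1,
                     max(x - half, 0):x + half + 1].max())
```

The windowed maximum ψ_w is the quantity compared before and after a manipulation primitive by the shifting reward defined in the next subsection.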
D. Learning for Shifting

We integrate prior knowledge about the relationship between shifting and grasping by making the reward function for shifting explicitly dependent on the grasping probability. More precisely, the system predicts the influence of a motion primitive on the maximum grasp probability ψ. We train a second NN using the reward function

r_s(s) = ( ψ_w(s′, x, y, a) − ψ_w(s, x, y, a) + 1 ) / 2    (2)

mapping the image s to the difference of the maximum grasping probability ψ_w in a window before (s) and after (s′) the manipulation primitive. Therefore, depth images before and after the shifting attempt are recorded and applied to the grasp probability NN. Additionally, the reward is re-normalized to r_s ∈ [0, 1]. The window is approximately 50% larger than for the grasping action, the latter corresponding roughly to the maximum gripper width. In contrast to the grasping reward function r_g ∈ {0, 1}, estimating shift rewards is a regression task. We further denote the estimated reward for shifting Q_s(s, a) as ρ. The NN is trained by optimizing the mean squared loss between the predicted reward ρ and the actual reward r_s.

We define two motion primitives for pre-grasping object manipulation. In both cases, the gripper closes completely and approaches the shift pose parallel to its gripper jaws. If a collision occurs, the approach motion stops. Then, the robot moves either 30 mm in the positive x-direction or the positive y-direction of the gripper frame. Those two motion primitives are distinct as the gripper is asymmetric.

Since data generation is the limiting factor in deep RL, it is important that training is self-supervised and requires as little human intervention as possible. To guarantee continuous training, the system first estimates the overall maximum grasping probability ψ. If ψ < 0.2, the system tries to increase the grasping probability until ψ > 0.8 is reached. Then, the system tries to decrease the maximum grasping probability until ψ < 0.2 again. This is done by exploring the negative action-value function −Q_s(s, a) while keeping the selection function constant. Training started with a single object in the bin; further ones were added over time.

E. Combined Learning and Inference

For the task of bin picking, grasping and shifting need to be combined into a single controller. Besides inference itself, combined learning also enables matching data distributions for training and application. Firstly, let λ_g be a threshold probability deciding between a grasping and a shifting attempt. Secondly, let λ_s denote a threshold between a shift attempt and the assumption of an empty bin. As shown in Fig. 4, the system first infers the maximum grasping probability ψ. If ψ is higher than λ_g, the robot grasps; else it evaluates the shifting NN and estimates the maximum shifting reward ρ. If ρ is larger than λ_s, the robot shifts and restarts the grasp attempt; else it assumes the bin to be empty.

Fig. 4: Flowchart of the combined controller: estimate the maximum grasp probability, grasp if it exceeds λ_g; otherwise estimate the maximum shift reward, shift if it exceeds λ_s; otherwise report the bin as empty.

… 0.8. For this reason, the grasp rate loses its significance as the major evaluation metric. Instead, we can directly optimize the industrially important metric of picks per hour (PPH).
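The decision logic of Fig. 4 can be summarized in a short sketch. The callables wrapping the two networks and the robot, as well as their names and return types, are hypothetical placeholders; only the threshold comparisons and the control flow follow the text.

```python
from typing import Any, Callable, Tuple


def bin_picking_loop(
    capture: Callable[[], Any],                        # returns a depth image s
    infer_grasp: Callable[[Any], Tuple[float, Any]],   # (max psi, grasp pose)
    infer_shift: Callable[[Any], Tuple[float, Any]],   # (max rho, shift pose)
    execute_grasp: Callable[[Any], bool],
    execute_shift: Callable[[Any], None],
    lambda_g: float,
    lambda_s: float,
) -> bool:
    """One bin-picking cycle: shift until a grasp looks promising, then grasp.
    Returns False once the bin is assumed to be empty."""
    while True:
        s = capture()
        psi_max, grasp_pose = infer_grasp(s)
        if psi_max > lambda_g:
            return execute_grasp(grasp_pose)   # grasp and file the object
        rho_max, shift_pose = infer_shift(s)
        if rho_max < lambda_s:
            return False                       # below lambda_s: assume the bin is empty
        execute_shift(shift_pose)              # shift, then restart the grasp attempt
```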
At low λ_g, frequent failed grasps take a lot of time. For high λ_g, the improved grasp rate comes with the cost of more shifts. Since some shifts might be superfluous and worsen the PPH, an optimal threshold has to exist. Fig. 6 confirms our expectation, resulting in an optimal threshold λ_g ≈ 0.75. Interestingly, the corresponding grasp rate is less than 1.

D. Generali…