首页 /研究 /Semiautomatic Labeling for Deep Learning in Robotics
PERCEPTION

Semiautomatic Labeling for Deep Learning in Robotics

Daniele De Gregorio, Alessio Tonioni, Gianluca Palli, Luigi Di Stefano

发表年份
2019
引用次数
2

摘要

In this article, we propose an augmented reality semiautomatic labeling (ARS), a semiautomatic method which leverages on moving a 2-D camera by means of a robot, proving precise camera tracking, and an augmented reality pen (ARP) to define initial object bounding box, to create large labeled data sets with minimal human intervention. By removing the burden of generating annotated data from humans, we make the deep learning technique applied to computer vision, which typically requires very large data sets, truly automated and reliable. With the ARS pipeline, we created two novel data sets effortlessly, one on electromechanical components (industrial scenario) and other on fruits (daily-living scenario) and trained two state-of-the-art object detectors robustly, based on convolutional neural networks, such as you only look once (YOLO) and single shot detector (SSD). With respect to conventional manual annotation of 1000 frames that takes us slightly more than 10 h, the proposed approach based on ARS allows to annotate 9 sequences of about 35 000 frames in less than 1 h, with a gain factor of about 450. Moreover, both the precision and recall of object detection is increased by about 15% with respect to manual labeling. All our software is available as a robot operating system (ROS) package in a public repository alongside with the novel annotated data sets. Note to Practitioners-This article was motivated by the lack of a simple and effective solution for the generation of data sets usable to train a data-driven model, such as a modern deep neural network, so as to make them accessible in an industrial environment. Specifically, a deep learning robot guidance vision system would require such a large amount of manually labeled images that it would be too expensive and impractical for a real use case, where system reconfigurability is a fundamental requirement. With our system, on the other hand, especially in the field of industrial robotics, the cost of image labeling can be reduced, for the first time, to nearly zero, thus paving the way for self-reconfiguring systems with very high performance (as demonstrated by our experimental results). One of the limitations of this approach is the need to use a manual method for the detection of objects of interest in the preliminary stages of the pipeline (ARP or graphical interface). A feasible extension, related to the field of collaborative robotics, could be used to exploit the robot itself, manually moved by the user, even for this preliminary stage, so as to eliminate any source of inaccuracy.

关键词

Artificial intelligenceComputer sciencePipeline (software)Deep learningConvolutional neural networkMinimum bounding boxObject (grammar)Object detectionComputer visionRobotics

相关论文

查看 PERCEPTION 分类全部论文