Authors
Chen Zhu*, Takafumi Matsumaru
Graduate School of Information, Production and Systems, Waseda University,
2-7 Hibikino, Wakamatsu Kitakyushu, Fukuoka 808-0135, Japan
*Corresponding author. Email: [email protected]
Corresponding Author
Chen Zhu
Received 9 November 2018, Accepted 19 November 2018, Available Online 25
June 2019.
DOI
https://doi.org/10.2991/jrnal.k.190531.008
Keywords
Image processing; robotics picking; deep learning; COCO dataset
Abstract
In this research, a robot equipped with a monocular Red Green Blue (RGB)
camera picks up soft drinks of six brands. The drink bottles need to be
located and classified by brand before being picked up. The Mask
Region-based Convolutional Neural Network (Mask R-CNN), a mask generation
network extended from Faster R-CNN, is trained with the Common Objects in
Context (COCO) dataset to detect the bottles in the image and generate
their masks. Inception v3 is selected for the brand classification task.
Around 200 images are taken or collected at first; the images are then
augmented to 1500 images per brand by random cropping and perspective
transformation. The results show that the masked image can be labeled with
its brand name with at least 85% accuracy in the experiment.
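
The augmentation step described in the abstract (random cropping plus perspective transform, expanded to 1500 images per brand) could be planned roughly as in the sketch below. This is only an illustration, not the authors' implementation: the abstract does not state the crop ratio, the size of the perspective shift, the image resolution, or how many source images belong to each brand, so `min_scale`, `max_shift`, the 640x480 size, and the function names are all assumptions. The sketch only generates the random parameters; the actual cropping and warping would be done with an image library (for example, OpenCV's `cv2.getPerspectiveTransform` and `cv2.warpPerspective`).

```python
import random


def random_crop_box(width, height, min_scale=0.8, rng=random):
    """Return (left, top, right, bottom) of a random crop inside the image.

    min_scale is an assumed lower bound on the crop size relative to the
    full image; the abstract does not give this value.
    """
    scale = rng.uniform(min_scale, 1.0)
    crop_w = max(1, int(width * scale))
    crop_h = max(1, int(height * scale))
    left = rng.randint(0, width - crop_w)
    top = rng.randint(0, height - crop_h)
    return (left, top, left + crop_w, top + crop_h)


def jittered_corners(width, height, max_shift=0.1, rng=random):
    """Return four destination corners for a perspective warp.

    Each image corner is moved inward by a random amount of at most
    max_shift (assumed value) times the image width or height.
    """
    dx = max_shift * width
    dy = max_shift * height
    return [
        (rng.uniform(0, dx), rng.uniform(0, dy)),                   # top-left
        (width - rng.uniform(0, dx), rng.uniform(0, dy)),           # top-right
        (width - rng.uniform(0, dx), height - rng.uniform(0, dy)),  # bottom-right
        (rng.uniform(0, dx), height - rng.uniform(0, dy)),          # bottom-left
    ]


def augmentation_plan(num_sources, target, width, height, seed=0):
    """Build `target` augmentation records by cycling over the source images."""
    rng = random.Random(seed)  # fixed seed so the plan is reproducible
    plan = []
    for i in range(target):
        plan.append({
            "source_index": i % num_sources,
            "crop_box": random_crop_box(width, height, rng=rng),
            "corners": jittered_corners(width, height, rng=rng),
        })
    return plan


# Hypothetical usage: 30 source images of one brand, expanded to 1500.
plan = augmentation_plan(num_sources=30, target=1500, width=640, height=480)
```

Cycling over the source images with `i % num_sources` spreads the augmented copies evenly, so no single photograph dominates the training set for a brand.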
Copyright
© 2019 The Authors. Published by ALife Robotics Corp. Ltd.
Open Access
This is an open access article distributed under the CC BY-NC 4.0 license
(http://creativecommons.org/licenses/by-nc/4.0/).