Nattapat Koomklang1, Prem Gamolped1, Eiji Hayashi1, Abbe Mowshowitz2
1Department of Mechanical Information Science and Technology, Kyushu Institute
of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
2Department of Computer Science, The City College of New York, 160 Convent
Avenue, New York, NY 10031, USA
ABSTRACT
This paper presents a novel approach to accurate weight estimation
in robotic manipulation of noodle-like objects. The proposed approach combines
vision transformer and autoencoder techniques with robot action data and RGB-D
encoding to enhance the ability of robots to manipulate objects of varying weight.
A deep neural network is introduced to estimate the grasping action of a robot
picking up noodle-like objects using RGB-D camera input, a six-finger gripper,
and Cartesian movement. The hardware setup and the characteristics of the
noodle-like objects are described. The study builds upon previous work in RGB-D
perception, weight estimation, and deep learning, and addresses the limitations
of existing methods by incorporating robot actions. The effectiveness of vision
transformers, autoencoders, self-supervised deep reinforcement learning, and deep
residual learning in robotic manipulation is discussed. The proposed approach
leverages a Transformer network to encode sequential and spatial information for
weight estimation. Experimental evaluation on a dataset of 20,000 samples collected
in real environments demonstrates the effectiveness and accuracy of the proposed
approach in grasping noodle-like objects. This research contributes to advancements
in robotic manipulation, enabling robots to handle objects of varying weight in
real-world scenarios.
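To make the fusion idea concrete, the following is a minimal sketch of how RGB-D
image features and grasping-action data might be combined through a Transformer
encoder to regress weight, in the spirit of the approach summarized above. It is
not the authors' implementation: the layer sizes, the 7-dimensional action vector,
the class names (RGBDEncoder, WeightEstimator), and the two-token fusion scheme are
illustrative assumptions.

    # Sketch only: fuses an RGB-D image token with a robot-action token via a
    # Transformer encoder and regresses a scalar weight. All dimensions are assumed.
    import torch
    import torch.nn as nn

    class RGBDEncoder(nn.Module):
        """Autoencoder-style convolutional encoder for a 4-channel RGB-D image."""
        def __init__(self, embed_dim=128):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(4, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(64, embed_dim, kernel_size=3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
            )

        def forward(self, x):                      # x: (B, 4, H, W)
            return self.conv(x).flatten(1)         # (B, embed_dim)

    class WeightEstimator(nn.Module):
        """Transformer over [RGB-D token, action token] that regresses grasped weight."""
        def __init__(self, action_dim=7, embed_dim=128, n_heads=4, n_layers=2):
            super().__init__()
            self.rgbd_encoder = RGBDEncoder(embed_dim)
            # Action vector is an assumption, e.g. gripper pose plus opening width.
            self.action_proj = nn.Linear(action_dim, embed_dim)
            layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=n_heads,
                                               batch_first=True)
            self.transformer = nn.TransformerEncoder(layer, num_layers=n_layers)
            self.head = nn.Linear(embed_dim, 1)    # scalar weight estimate

        def forward(self, rgbd, action):
            img_tok = self.rgbd_encoder(rgbd).unsqueeze(1)   # (B, 1, D)
            act_tok = self.action_proj(action).unsqueeze(1)  # (B, 1, D)
            tokens = torch.cat([img_tok, act_tok], dim=1)    # (B, 2, D)
            fused = self.transformer(tokens).mean(dim=1)     # (B, D)
            return self.head(fused).squeeze(-1)              # (B,)

    # Example forward pass with random data.
    model = WeightEstimator()
    rgbd = torch.randn(8, 4, 96, 96)     # batch of RGB-D images
    action = torch.randn(8, 7)           # batch of grasping-action vectors
    print(model(rgbd, action).shape)     # torch.Size([8])

In such a design the Transformer attends jointly over the image and action tokens,
which is one plausible way to encode the spatial and sequential information the
abstract refers to; the actual network architecture is detailed in the full paper.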
ARTICLE INFO
Article History
Received 02 December 2022
Accepted 17 August 2023
Keywords
Robotic manipulation
Weight estimation
Noodle-like objects
Vision transformer
Autoencoder
RGB-D encoding
Deep learning
Transformer network