Tokyo University of Science Unveils HEAPGrasp: A Breakthrough for Robotic Handling of Transparent and Mirror-Like Objects

Tokyo University of Science develops HEAPGrasp, which uses object silhouettes captured by a single RGB camera to help robots grasp transparent and shiny objects with 96% success.

By: AXL Media

Published: Mar 30, 2026, 7:20 AM EDT

Source: Information for this report was sourced from Tokyo University of Science via EurekAlert!


The Challenge of Optical Properties in Automation

As industries ranging from logistics to food service move toward total automation, enabling robots to handle diverse materials has become a primary hurdle. Conventional 3D measurement systems, such as depth sensors and LiDAR, struggle significantly with transparent glass, clear plastics, and highly reflective metal parts. Because these materials distort light or fail to reflect it in predictable ways, they often appear "invisible" or "malformed" to standard sensors, leading to grasping failures that require human intervention.

Introducing the HEAPGrasp Methodology

To overcome these optical bottlenecks, Associate Professor Shogo Arai and Mr. Ginga Kennis from the Tokyo University of Science developed HEAPGrasp (Hand-Eye Active Perception to Grasp). Unlike traditional methods that rely on light-bouncing sensors, HEAPGrasp operates on the principle that an object's contour or silhouette remains reliable even if its surface is clear or mirror-like. By focusing strictly on the edges of objects against a background, the system bypasses the complexities of transparency and reflectivity entirely.

Reconstructing 3D Shapes from 2D Silhouettes

The core of the HEAPGrasp system involves a computer vision technique known as "Shape from Silhouette" (SfS). The process begins with a single hand-eye RGB camera capturing images from multiple viewpoints. A deep learning architecture—specifically DeepLabv3+ with ResNet-50—performs semantic segmentation to separate the object's pixels from the background. By intersecting the "visual cones" created by these silhouettes from different angles, the system reconstructs a 3D volume that represents the object’s true shape and position in space.
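To make the visual-cone intersection concrete, the sketch below shows a plain voxel-carving version of Shape from Silhouette in Python with NumPy. It is not the authors' implementation: it assumes that binary silhouette masks (which in HEAPGrasp would come from the DeepLabv3+ segmentation step) and calibrated 3x4 camera projection matrices for each viewpoint are already available, and the function and parameter names are illustrative only.

```python
import numpy as np

def carve_visual_hull(masks, projections, grid_min, grid_max, resolution=64):
    """Minimal Shape-from-Silhouette sketch via voxel carving.

    masks:       list of (H, W) boolean silhouette images, one per viewpoint
                 (assumed to come from a segmentation network upstream).
    projections: list of (3, 4) camera projection matrices P = K [R | t]
                 mapping homogeneous world points to image pixels.
    grid_min/grid_max: (3,) corners of the axis-aligned working volume.
    Returns a (res, res, res) boolean occupancy grid: a voxel survives only
    if it projects inside the silhouette in every view (the visual hull).
    """
    # Build the voxel grid: centres of a regular lattice over the work volume.
    axes = [np.linspace(lo, hi, resolution) for lo, hi in zip(grid_min, grid_max)]
    xs, ys, zs = np.meshgrid(*axes, indexing="ij")
    voxels = np.stack([xs, ys, zs, np.ones_like(xs)], axis=-1).reshape(-1, 4)

    occupied = np.ones(len(voxels), dtype=bool)
    for mask, P in zip(masks, projections):
        h, w = mask.shape
        # Project every voxel centre into this view (homogeneous pixels).
        pix = voxels @ P.T
        depth = np.maximum(pix[:, 2], 1e-9)   # avoid divide-by-zero; behind-camera
        u = pix[:, 0] / depth                 # points are filtered out below anyway
        v = pix[:, 1] / depth
        inside = (u >= 0) & (u < w) & (v >= 0) & (v < h) & (pix[:, 2] > 0)
        # Carve away voxels that fall outside the image or onto background pixels.
        in_silhouette = np.zeros(len(voxels), dtype=bool)
        idx = np.where(inside)[0]
        in_silhouette[idx] = mask[v[idx].astype(int), u[idx].astype(int)]
        occupied &= in_silhouette

    return occupied.reshape(resolution, resolution, resolution)
```

The resulting occupancy grid is only the reconstruction stage: choosing grasp poses from it, and tuning the grid resolution and working-volume bounds to a specific workcell, are separate steps not covered by this sketch.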
