Articulated object manipulation is ubiquitous in daily life. In this paper, we present DexSim2Real2, a novel robot learning framework for goal-conditioned articulated object manipulation using both two-finger grippers and multi-finger dexterous hands.
The key to our framework is the construction of an explicit world model of unseen articulated objects through active one-step interactions. This explicit world model enables sampling-based model predictive control to plan trajectories that achieve different manipulation goals without human demonstrations or reinforcement learning.
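Sampling-based model predictive control of this kind is often instantiated with the cross-entropy method (CEM). The sketch below is a minimal, hypothetical illustration: `simulate_rollout` is a stand-in 1-DoF toy dynamics (the joint state integrates the action sequence), not the actual digital-twin simulator, and all hyperparameters are placeholders.

```python
import numpy as np

def simulate_rollout(actions, q0=0.0):
    """Toy stand-in world model: joint state integrates the actions."""
    return q0 + np.sum(actions)

def cem_plan(goal, horizon=10, n_samples=64, n_elite=8, n_iters=20):
    """Cross-entropy method: sample action sequences, refit a Gaussian
    to the lowest-cost (elite) samples, and repeat."""
    mu, sigma = np.zeros(horizon), np.ones(horizon)
    rng = np.random.default_rng(0)
    for _ in range(n_iters):
        samples = rng.normal(mu, sigma, size=(n_samples, horizon))
        costs = np.array([abs(simulate_rollout(a) - goal) for a in samples])
        elites = samples[np.argsort(costs)[:n_elite]]
        mu, sigma = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mu  # mean of the final elite distribution as the plan

plan = cem_plan(goal=1.2)
```

In a real deployment, the cost would be evaluated by rolling the sampled trajectories out in the simulated digital twin and measuring the distance between the resulting joint state and the goal state.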
The framework first predicts an interaction motion using an affordance estimation network trained on self-supervised interaction data or on internet videos of human manipulation. After executing this interaction on the real robot, the framework constructs a digital twin of the articulated object in simulation from the two point clouds captured before and after the interaction. For dexterous multi-finger manipulation, we propose using an eigengrasp representation to reduce the high-dimensional action space, enabling more efficient trajectory search. Extensive experiments validate the framework's effectiveness for precise articulated object manipulation in both simulation and the real world using a two-finger gripper and a 16-DoF dexterous hand.
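Eigengrasps are commonly obtained by principal component analysis over recorded hand configurations, so that a few coefficients span the dominant coordinated finger motions. The following is a hedged sketch under that assumption; the random `grasp_data` is a placeholder for real grasp recordings, and `n_eigen = 3` is an illustrative choice, not the paper's setting.

```python
import numpy as np

# Placeholder for N recorded configurations of a 16-DoF hand.
rng = np.random.default_rng(0)
grasp_data = rng.standard_normal((500, 16))

# PCA: center the data; the top right-singular vectors are the eigengrasps.
mean = grasp_data.mean(axis=0)
_, _, vt = np.linalg.svd(grasp_data - mean, full_matrices=False)
n_eigen = 3                    # dimensionality of the reduced action space
eigengrasps = vt[:n_eigen]     # shape (3, 16)

def to_full_config(alpha):
    """Map a low-dimensional action alpha (3,) back to 16 joint angles."""
    return mean + alpha @ eigengrasps

q = to_full_config(np.array([0.5, -0.2, 0.1]))
```

The planner then searches over the 3-dimensional coefficients `alpha` instead of the full 16-dimensional joint space, which shrinks the sampling space for trajectory search.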
The robust generalizability of the explicit world model also enables advanced manipulation strategies, such as manipulating with different tools.
Our framework consists of three phases. (1) Given a partial point cloud of an unseen articulated object, in the Interactive Perception phase, we train an affordance prediction module and use it to change the object's joint state through a one-step interaction. Training data can be acquired through self-supervised interaction in simulation or from egocentric human demonstration videos. (2) In the Explicit Physics Model Construction phase, we build a digital twin of the object in a physics simulator from the two point clouds captured before and after the interaction. (3) In the Sampling-based Model Predictive Control phase, we use the model to plan a long-horizon trajectory in simulation and then execute the trajectory on the real robot to complete the task. For dexterous hands, an additional eigengrasp module reduces the dimensionality of the action space.
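Phase (2) hinges on recovering the articulation from the two observations. As a minimal illustration, if point correspondences on the moving part were known, a revolute joint axis could be estimated by fitting a rigid rotation (Kabsch algorithm) and extracting its axis. This is a simplified, hypothetical sketch: the synthetic noise-free data and known correspondences stand in for the partial point clouds the actual system works from.

```python
import numpy as np

# Synthetic data: a part rotated by 0.4 rad about the z-axis.
rng = np.random.default_rng(1)
theta = 0.4
c, s = np.cos(theta), np.sin(theta)
R_true = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
P = rng.standard_normal((100, 3))   # points before the interaction
Q = P @ R_true.T                    # corresponding points after

# Kabsch: least-squares rotation R with Q ~= P @ R.T.
H = P.T @ Q
U, _, Vt = np.linalg.svd(H)
d = np.sign(np.linalg.det(Vt.T @ U.T))
R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

# The joint axis is the rotation axis of R (eigenvector with eigenvalue 1).
w, v = np.linalg.eig(R)
axis = np.real(v[:, np.argmin(np.abs(w - 1))])
```

Real partial point clouds lack correspondences and contain noise, so the full system must also segment the moving part and handle prismatic joints; this sketch only conveys the geometric core of the revolute case.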