HandMeThat: Human-Robot Communication in Physical and Social Environments

Abstract

We introduce HandMeThat, a benchmark for holistic evaluation of instruction understanding and following in physical and social environments. While previous datasets primarily focused on language grounding and planning, HandMeThat focuses on resolving ambiguous human instructions using physical (object states and relations) and social (human actions and goals) information. HandMeThat contains 10,000 episodes of human-robot interactions. In each episode, the robot first observes a trajectory of the human's actions toward an internal goal. Next, the robot receives a human instruction and should take actions to accomplish the subgoal set through the instruction. In this paper, we present a textual interface for our benchmark, where the robot interacts with a virtual environment through textual commands. We evaluate several baseline models on HandMeThat and show that both offline and online reinforcement learning algorithms perform poorly on it, suggesting significant room for future work on physical and social human-robot communication and interaction.

Publication
In Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track

Please refer to our website http://handmethat.csail.mit.edu for more details.

Yanming Wan
宛彦明

My research interest is to build AI systems that can better understand human intentions and take actions in the physical world for humans and for themselves.