HandMeThat: Human-Robot Communication in Physical and Social Environments

Yanming Wan, Jiayuan Mao, Joshua B. Tenenbaum

November 2022

Abstract

We introduce HandMeThat, a benchmark for a holistic evaluation of instruction understanding and following in physical and social environments. While previous datasets primarily focused on language grounding and planning, HandMeThat considers the resolution of ambiguous human instructions based on physical (object states and relations) and social (human actions and goals) information. HandMeThat contains 10,000 episodes of human-robot interactions. In each episode, the robot first observes a trajectory of the human's actions toward an internal goal. Next, the robot receives a human instruction and should take actions to accomplish the subgoal set through the instruction. In this paper, we present a textual interface for our benchmark, where the robot interacts with a virtual environment through textual commands. We evaluate several baseline models on HandMeThat and show that both offline and online reinforcement learning algorithms perform poorly on it, suggesting significant room for future work on physical and social human-robot communication and interaction.

Type

Conference paper

Publication

In Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track

Please refer to our website http://handmethat.csail.mit.edu for more details.

Yanming Wan

宛彦明