Grounding Large Language Models in Robot Control: Facilitating Human-Robot Collaboration in Construction

Xiayu Zhao, Houtan Jebelli

Year: 2025
Citations: 1

Abstract

Human-robot collaboration (HRC) is gaining traction in the construction industry, driven by ongoing efforts to improve efficiency, safety, and precision in complex construction tasks. A key obstacle to implementing HRC is that human workers cannot easily interact and communicate intuitively with pre-programmed construction robots, which hinders HRC’s broader adoption in construction environments. This study presents an AI-driven framework for generating robot control code, enabling efficient interaction between workers and robots through the understanding of unrestricted spoken instructions. The framework uses Large Language Models (LLMs) to interpret both captured site images and spoken instructions, and generates code by synthesizing the natural language and visual inputs. It builds on Code as Policies, a class of language model programs that translate natural language into executable robot code that accounts for both environmental perception and execution on the robot. To evaluate the system, the case study used a robot arm to perform a series of construction tasks in a simulated environment. The robot arm’s end effectors include a pen holder, a 3-finger gripper, a nail gun with a 3D-printed custom adapter, and an RGB-D camera for window panel installation; spoken instructions from human workers are captured through a microphone. The method achieved pass rates of 92.6%, 88.2%, and 70.1% across four distinct tasks, demonstrating its ability to comprehend unstructured human speech. This research paves the way for integrating LLMs into construction robotics, significantly lowering the operational barriers to effective worker-robot collaboration.
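
The Code as Policies idea mentioned in the abstract can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: `MockRobot`, `build_prompt`, `stub_llm`, and `run_policy` are hypothetical names, the LLM call is replaced by a canned response, and speech transcription and image understanding are assumed to have already produced a text instruction and a dictionary of object positions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Position = Tuple[float, float, float]


@dataclass
class MockRobot:
    """Hypothetical stand-in for a robot arm API; it only records calls."""
    log: List[str] = field(default_factory=list)

    def move_to(self, x: float, y: float, z: float) -> None:
        self.log.append(f"move_to({x}, {y}, {z})")

    def grip(self) -> None:
        self.log.append("grip()")

    def release(self) -> None:
        self.log.append("release()")


def build_prompt(instruction: str, scene: Dict[str, Position]) -> str:
    """Merge the transcribed instruction and perceived objects into one prompt."""
    objects = "\n".join(f"# {name} at {pos}" for name, pos in sorted(scene.items()))
    return (
        "# Available: robot.move_to(x, y, z), robot.grip(), robot.release()\n"
        f"# Scene objects:\n{objects}\n"
        f"# Instruction: {instruction}\n"
    )


def stub_llm(prompt: str) -> str:
    """Assumption: placeholder for a real LLM call; returns a canned policy."""
    return (
        "x, y, z = scene['window_panel']\n"
        "robot.move_to(x, y, z)\n"
        "robot.grip()\n"
        "x, y, z = scene['frame']\n"
        "robot.move_to(x, y, z)\n"
        "robot.release()\n"
    )


def run_policy(code: str, robot: MockRobot, scene: Dict[str, Position]) -> None:
    """Run generated code with only `robot` and `scene` in scope.

    Emptying the builtins limits what the code can reach, but this is not a
    real sandbox; a deployed system would need proper validation.
    """
    exec(code, {"__builtins__": {}}, {"robot": robot, "scene": scene})


def demo() -> List[str]:
    # Hypothetical scene, standing in for what a perception module might output.
    scene = {"window_panel": (0.4, 0.1, 0.2), "frame": (0.6, -0.2, 0.5)}
    robot = MockRobot()
    prompt = build_prompt("put the window panel into the frame", scene)
    run_policy(stub_llm(prompt), robot, scene)
    return robot.log


if __name__ == "__main__":
    print(demo())
```

The design choice worth noting is that the language model emits code against a small, fixed robot interface rather than raw motor commands, so the generated policy can be inspected or rejected before it is run.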

Keywords

Executable, Robot, Natural language, Spoken language, Code (set theory), Window (computing), Human–robot interaction, Natural language generation
