VST-LLM HRI: Multimodal Human-Robot Interaction via Large Language Model Prompts

Weikai Ding, Shijun Xiao, Zhengguo Zhu, Teng Chen, Guoteng Zhang

Year of publication: 2025
Citations: 2

Abstract

This paper proposes VST-LLM HRI, a Visual-Speech-Text Large Language Model framework for human-robot interaction. By designing a Modality Language Model (MLM), the framework closes the loop across robot perception, task planning, and control. It requires no fine-tuning of the Large Language Model (LLM); instead, it relies on visual semantic extraction, speech command conversion, and prompt engineering to accomplish tasks. We conducted experiments on a bipedal robot to validate the framework's adaptability and control performance in complex-terrain task scenarios. The experimental results show that the proposed method generalizes well. The project files and programs are available at https://github.com/dwk-Suga/LLMandVLM.git.
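
As a rough illustration of the prompt-driven pipeline the abstract describes, the sketch below merges a visual scene description and a transcribed speech command into a single LLM prompt, then filters the reply down to a known action set. It is not the authors' implementation: the `Observation` class, the `build_prompt` and `parse_plan` helpers, the action names, and the prompt wording are all hypothetical, and the vision, speech-to-text, and LLM calls are assumed to happen elsewhere.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Text produced by the perception front ends (both fields are assumptions)."""
    scene_description: str  # output of some visual semantic extractor
    spoken_command: str     # output of some speech-to-text step


# Hypothetical action vocabulary; a real robot would define its own.
ALLOWED_ACTIONS = ("walk_forward", "turn_left", "turn_right", "step_up", "stop")


def build_prompt(obs: Observation, actions=ALLOWED_ACTIONS) -> str:
    """Combine both modalities into one text prompt for an off-the-shelf LLM."""
    action_list = ", ".join(actions)
    return (
        "You are the task planner for a bipedal robot.\n"
        f"Scene: {obs.scene_description}\n"
        f"User command: {obs.spoken_command}\n"
        f"Reply with a comma-separated plan using only these actions: {action_list}."
    )


def parse_plan(reply: str, actions=ALLOWED_ACTIONS) -> list:
    """Keep only the steps in the reply that belong to the allowed action set."""
    steps = [step.strip() for step in reply.split(",")]
    return [step for step in steps if step in actions]
```

Restricting the reply to a fixed action list is one simple way to keep an untuned LLM's output executable by a controller; the paper's actual prompt format and action interface may differ.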

Keywords

Task (project management), Robot, Modality (human–computer interaction), Generalization, Task analysis, Language understanding, Adaptability, Terrain, Natural language
