Real-Time Textless Dialogue Generation

Long Mai, Julie Carson-Berndsen

Publication year: 2025
Access: Open access

Abstract

Recent advancements in large language models (LLMs) have led to significant progress in text-based dialogue systems. These systems can now generate high-quality responses that are accurate and coherent across a wide range of topics and tasks. However, spoken dialogue systems still lag behind in naturalness. They tend to produce robotic interactions, with slow response times, overly generic or cautious replies, and a lack of natural rhythm and fluid turn-taking. This shortcoming stems largely from over-reliance on the traditional cascaded design, which chains separate, sequential components, and from the use of text as an intermediate representation. This paper proposes a real-time, textless spoken dialogue generation model (RTTL-DG) that aims to overcome these challenges. Our system enables fluid turn-taking and generates responses with minimal delay by processing streaming spoken conversation directly. Additionally, our model incorporates backchannels, fillers, laughter, and other paralinguistic signals, which are often absent from cascaded dialogue systems, to create more natural, human-like interactions. The implementation and generated samples are available in our repository: https://github.com/mailong25/rts2s-dg

Keywords

cs.CL, cs.AI, cs.SD, eess.AS
