Home /Research /Can LLM-Reasoning Models Replace Classical Planning? A Benchmark Study

OTHER

Can LLM-Reasoning Models Replace Classical Planning? A Benchmark Study

Kai Goebel, Patrik Zips

Year: 2025
Access: Open access

Abstract

Recent advancements in Large Language Models have sparked interest in their potential for robotic task planning. While these models demonstrate strong generative capabilities, their effectiveness in producing structured and executable plans remains uncertain. This paper presents a systematic evaluation of a broad spectrum of current state of the art language models, each directly prompted using Planning Domain Definition Language domain and problem files, and compares their planning performance with the Fast Downward planner across a variety of benchmarks. In addition to measuring success rates, we assess how faithfully the generated plans translate into sequences of actions that can actually be executed, identifying both strengths and limitations of using these models in this setting. Our findings show that while the models perform well on simpler planning tasks, they continue to struggle with more complex scenarios that require precise resource management, consistent state tracking, and strict constraint compliance. These results underscore fundamental challenges in applying language models to robotic planning in real world environments. By outlining the gaps that emerge during execution, we aim to guide future research toward combined approaches that integrate language models with classical planners in order to enhance the reliability and scalability of planning in autonomous robotics.

Keywords

cs.ROcs.AI

Can LLM-Reasoning Models Replace Classical Planning? A Benchmark Study

Abstract

Keywords

Related papers

Statistical Learning Theory

Fractional Differential Equations

Applied Nonlinear Control

Genetic Programming: On the Programming of Computers by Means of Natural Selection