A Benchmark on LLM-Based Power Flow Computation: Do More Structured Prompts Help?
Tingwei Chen, Kaiyang Huang, Kai Sun
- 发表年份
- 2026
- 访问权限
- 开放获取
摘要
We present a controlled benchmark evaluating three LLMs -- Claude Sonnet 4.5, Gemini 2.5 Pro, and GPT-3.5 Turbo -- across four prompt formats (from concise narrative to structured JSON with explicit iteration trace) on Gauss--Seidel AC power flow computation for a three-bus system. Against 50 test cases with reference solutions computed numerically, Gemini 2.5 Pro with the simplest narrative prompt achieves the lowest mean absolute error (MAE = 0.257 MW/MVar, 54\% of cases within 5\% relative error), while the same model with a JSON-structured prompt raises MAE to 0.789 -- a 3.1$\times$ increase. Adding a worked example degrades accuracy for Gemini but provides a marginal gain for Claude. GPT-3.5 Turbo fails on at least 90\% of cases under all prompt formats. An independent 100-case replication with related prompt-format families confirms the qualitative ordering (Gemini $>$ Claude $>$ GPT-3.5): the best 100-case configuration (Gemini with explicit iteration trace) achieves MAE = 0.402 and 53\% within 5\%, while Claude Sonnet 4.5's near-flat accuracy profile ($\approx$38\% within 5\% across formats) and GPT-3.5's near total ineffectiveness (92--97\% above 20\% error) both replicate. In neither evaluation does any configuration achieve sufficient reliability for use as a direct numerical solver. These findings offer a diagnostic baseline for practitioners and researchers evaluating LLMs for smart-grid decision-support assistance.
关键词
相关论文
一种面向线弧增材制造的电动汽车结构可制造性拓扑优化的双环框架
Qiang Cui, Chuan Yu, Daoqian Yang 等 5 位作者
Robotics and Computer-Integrated Manufacturing · 2026
几何数字孪生:一种用于航空发动机装配精度预测的数字智能模型
Ke Shang, Xin Jin, Teli Xu 等 7 位作者
Robotics and Computer-Integrated Manufacturing · 2026
通过人工智能驱动的机器人技术革新产业
Aryan Chaudhary
Recent Advances in Computer Science and Communications · 2026
新型大口径偏置馈电可展开天线设计与动态性能预测
Chuang Shi, Tianming Liu, Ning Xue 等 9 位作者
Aerospace Science and Technology · 2026