Benchmarking Large Language Models in Evaluating Workforce Risk of Robotization: Insights from Agriculture
Lefteris Benos, Vasso Marinoudi, Patrizia Busato, Dimitrios Kateris, Simon Pearson, Dionysis Bochtis
- Year: 2025
- Citations: 1
- Access: Open access
Abstract
Understanding the impact of robotization on workforce dynamics has become increasingly urgent. While expert assessments provide valuable insights, they are often time-consuming and resource-intensive. Large language models (LLMs) offer a scalable alternative; however, their accuracy and reliability in evaluating workforce robotization potential remain uncertain. This study systematically compares assessments generated by general-purpose LLMs with expert evaluations in the agricultural sector, treating human judgments as the ground truth. Three LLMs (ChatGPT, Copilot, and Gemini) followed a three-step evaluation process covering (a) task importance, (b) potential for task robotization, and (c) task attribute indexing for 15 agricultural occupations, mirroring the methodology used by the human assessors. The findings indicate a significant tendency for LLMs to overestimate robotization potential, with most of the errors falling within the range of 0.229 ± 0.174. This can be attributed primarily to the LLMs' reliance on grey literature and idealized technological scenarios, as well as their limited capacity to account for the complexities of agricultural work. Future research should focus on integrating expert knowledge into LLM training, improving bias detection and mitigation in agricultural datasets, and expanding the range of LLMs studied to enhance assessment reliability.
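The abstract does not say how the reported error range was computed. As a minimal sketch, assuming the range is the mean and sample standard deviation of absolute differences between LLM and expert scores on a common scale, the comparison could look like the following. The function name `error_summary` and all score values are hypothetical and chosen only for illustration; they are not data from the study.

```python
from statistics import mean, stdev


def error_summary(llm_scores, expert_scores):
    """Summarize LLM-vs-expert disagreement for paired scores.

    Returns (signed mean error, mean absolute error, sample standard
    deviation of the absolute errors). Expert scores are treated as the
    ground truth, so a positive signed mean error means overestimation.
    """
    if len(llm_scores) != len(expert_scores):
        raise ValueError("score lists must have the same length")
    if len(llm_scores) < 2:
        raise ValueError("at least two paired scores are required")

    errors = [l - e for l, e in zip(llm_scores, expert_scores)]
    abs_errors = [abs(d) for d in errors]
    return mean(errors), mean(abs_errors), stdev(abs_errors)


# Hypothetical robotization-potential scores (illustrative only).
expert = [0.20, 0.35, 0.50, 0.10]
llm = [0.45, 0.50, 0.60, 0.40]

bias, mae, sd = error_summary(llm, expert)
print(f"signed mean error: {bias:.3f}")
print(f"absolute error: {mae:.3f} ± {sd:.3f}")
```

Reporting the signed mean error alongside the absolute error separates direction from magnitude: a consistently positive signed error is what an overestimation tendency would look like under these assumptions.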