Evaluating the readability, quality, and reliability of responses generated by ChatGPT, Gemini, and Perplexity on the most commonly asked questions about Ankylosing spondylitis

Mete Kara, Erkan Özduran, Müge Mercan Kara, İlhan Celil Özbek, Volkan Hancı

Year: 2025
Citations: 29

Abstract

Ankylosing spondylitis (AS), a chronic inflammatory joint disease that usually begins in the second or third decade of life, is associated with chronic pain, limited mobility, and a severe decrease in quality of life. This study compared the readability, information accuracy, and quality of the answers that artificial intelligence (AI)-based chatbots (ChatGPT, Perplexity, and Gemini), which have become popular as access to medical information has widened, give to user questions about AS. The 25 most frequently queried AS-related keywords, identified through Google Trends, were submitted to each of the 3 AI-based chatbots. The readability of the resulting responses was evaluated using readability indices such as Gunning Fog (GFOG), Flesch Reading Ease Score (FRES), and Simple Measure of Gobbledygook (SMOG). The quality of the responses was measured with the Ensuring Quality Information for Patients (EQIP) and Global Quality Score (GQS) instruments, and reliability was measured with the modified DISCERN (mDISCERN) and Journal of the American Medical Association (JAMA) scales. According to Google Trends data, the most frequently searched AS-related keywords were, in order, "Ankylosing spondylitis pain", "Ankylosing spondylitis symptoms", and "Ankylosing spondylitis disease". The readability of the answers produced by the AI-based chatbots was above the 6th grade level, with a statistically significant difference (p < 0.001). In the EQIP, JAMA, mDISCERN, and GQS evaluations, Perplexity stood out for information quality and reliability, scoring higher than the other chatbots (p < 0.05). The answers given by AI chatbots to AS-related questions therefore exceed the recommended readability level, and some low scores in the reliability and quality assessments raise concerns. With an audit mechanism in place, future AI chatbots could reach sufficient quality, reliability, and an appropriate readability level.
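The three readability indices named in the abstract are simple functions of text counts. The sketch below uses their commonly published formulas; the abstract does not say which tool or implementation the authors used, so the function names and the choice to take pre-computed counts (words, sentences, syllables, polysyllabic words) as inputs are assumptions made for illustration only.

```python
import math


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease Score (FRES): higher values mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def gunning_fog(words: int, sentences: int, complex_words: int) -> float:
    """Gunning Fog index (GFOG): approximate school grade needed to read the text.

    complex_words is the number of words with three or more syllables.
    """
    return 0.4 * ((words / sentences) + 100.0 * (complex_words / words))


def smog_grade(polysyllables: int, sentences: int) -> float:
    """SMOG grade: approximate school grade, based on polysyllabic word density.

    polysyllables is the number of words with three or more syllables.
    """
    return 1.0430 * math.sqrt(polysyllables * (30.0 / sentences)) + 3.1291


if __name__ == "__main__":
    # Hypothetical counts for a short chatbot answer, not data from the study.
    print(flesch_reading_ease(words=100, sentences=5, syllables=150))
    print(gunning_fog(words=100, sentences=5, complex_words=10))
    print(smog_grade(polysyllables=30, sentences=30))
```

GFOG and SMOG are read as approximate school grade levels, so they can be compared directly against the 6th grade threshold mentioned in the abstract. FRES is on a different scale, where higher scores indicate easier text.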

Keywords

Readability; Ankylosing spondylitis; Medicine; Perplexity; Quality Score; Reliability (semiconductor); Quality (philosophy); Physical therapy; Medical physics; Internal medicine
