Tech
02/06/2025
Dr. Alexander Börve

When AI Helps in Healthcare: Chatbots Wrote More Empathetic Replies to Patient Questions


The landscape of virtual healthcare is rapidly transforming, with patients increasingly turning online for answers to their health questions. This rising tide of digital messages has unfortunately left many physicians feeling overwhelmed, contributing to burnout and potentially resulting in delayed or lower-quality responses for patients.

These days, it's hard to find someone who hasn't asked an AI chatbot a health question. In this evolving context, AI that assists doctors in managing patient inquiries is emerging as a potential solution, but the crucial question in the ongoing doctors-vs-AI debate remains:

Are these tools a genuinely helpful aid for healthcare professionals, or do they present a risky shortcut in patient care?

To shed light on this crucial inquiry, a recent study led by Dr. John W. Ayers and his colleagues rigorously investigated how well AI chatbots could answer patient health questions compared to human physicians, utilizing real-world interactions from a public social media platform. This article will focus on the key findings of that research.[1]

Why This Matters Now

The arrival of advanced artificial intelligence (AI) models, like OpenAI's ChatGPT-4omni (GPT-4o), marks a significant step forward for virtual healthcare and telemedicine. GPT-4o can process audio, visual, and text inputs in real time, greatly improving its ability to understand natural language across different languages [2]. New features such as "Temporary Chat" also offer improved privacy during virtual interactions, which could help these tools integrate more seamlessly into healthcare systems.

These advancements promise clearer communication, easier integration of medical images, and stronger data privacy in online consultations, all of which highlight just how relevant AI is becoming in modern healthcare.

The Problem: Too Many Messages, Too Little Time

The rapid rise in electronic patient messages has created a heavy burden for healthcare professionals. This surge contributes to physician burnout and can lead to slower or less thorough responses for patients. Many of these messages are complex requests for medical advice, demanding considerable time and expertise. In fact, reports show a significant increase in these messages, with each message adding an average of 2.3 minutes of work to a doctor's day, often pushing work into after hours [1].

While some strategies like limiting notifications, billing for responses, or delegating tasks to support staff have been tried to ease this load, they often come with drawbacks that can limit patients' access to quality care. Given these challenges, artificial intelligence (AI) assistants are emerging as a promising, though still largely unexplored, resource for managing this overwhelming volume of communications.

ChatGPT, a new generation of AI powered by advanced large language models, gained widespread use shortly after its release. Because the system wasn't specifically designed for healthcare, its potential to assist with patient questions had not been examined. This research therefore offers timely insights into many of the challenges facing healthcare communication today.

The Study: Comparing AI and Doctors Head-to-Head

Where the Data Came From

To conduct this cross-sectional study, researchers utilized a public and non-identifiable database of patient questions from Reddit's r/AskDocs, a social media forum where verified physicians provide responses. From this source, 195 exchanges from October 2022, each containing a public question and a physician's verified response, were randomly selected. To ensure a fair comparison, responses from an AI chatbot (ChatGPT 3.5) were generated by entering each original question into a new, clean session on December 22 and 23, 2022, preventing any influence from prior interactions. 

How Responses Were Rated

A rigorous evaluation process was undertaken by a team of licensed healthcare professionals who assessed the original questions along with the anonymized and randomly ordered physician and chatbot responses in triplicate. Evaluators were asked to determine "which response was better" and specifically judged two key aspects:

  1. The quality of information provided
  2. The empathy or bedside manner provided

Quality was rated on a 1 to 5 scale (very poor to very good), and empathy was similarly assessed on a 1 to 5 scale (not empathetic to very empathetic). The mean outcomes for both criteria were then compared between the chatbot and physician responses.
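As a rough illustration, the aggregation described above (triplicate 1-to-5 ratings, mean scores per criterion, and the share of evaluations scoring 4 or higher) can be sketched in a few lines of Python. The data and function below are hypothetical, not the study's actual analysis code:

```python
# Illustrative sketch (hypothetical, not the study's code): summarizing
# triplicate 1-5 ratings into a mean score and a "good/very good" proportion.
from statistics import mean

def summarize(ratings_per_response):
    """ratings_per_response: list of [r1, r2, r3] triplicate scores (1-5)."""
    # Mean score for each response, then averaged over responses.
    means = [mean(triplicate) for triplicate in ratings_per_response]
    overall_mean = mean(means)
    # Proportion of individual evaluations scoring >= 4 (good or very good).
    flat = [r for triplicate in ratings_per_response for r in triplicate]
    prop_high = sum(r >= 4 for r in flat) / len(flat)
    return overall_mean, prop_high

# Toy data: three responses, each rated in triplicate by different evaluators.
quality_ratings = [[4, 5, 4], [3, 4, 4], [2, 3, 3]]
overall, prop = summarize(quality_ratings)
```

The same summary would be computed separately for the quality and empathy scales, once over chatbot responses and once over physician responses, before comparing the two.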

Key Findings: AI Outperformed Doctors in Most Cases

The study's findings revealed a notable preference for AI chatbot responses over those provided by physicians. Evaluators preferred the chatbot response in 78.6% (95% CI, 75.0%-81.8%) of the 585 evaluations [1].

Crucially, chatbot responses were rated as significantly higher quality than physician responses (P < .001) [1]. The proportion of responses rated good or very good (scores ≥4) was substantially higher for the chatbot (78.5%; 95% CI, 72.3%-84.1%) than for physicians (22.1%; 95% CI, 16.4%-28.2%), a 3.6-fold higher prevalence of high-quality responses from the AI [1].

Physician responses were also significantly shorter, averaging 52 words (interquartile range [IQR], 17-62 words), compared with chatbot responses, which averaged 211 words (IQR, 168-245 words; P < .001) [1].

Furthermore, chatbot responses were rated significantly more empathetic than physician responses (P < .001) [1]. The proportion of responses rated empathetic or very empathetic (scores ≥4) was considerably higher for the chatbot (45.1%; 95% CI, 38.5%-51.8%) than for physicians (4.6%; 95% CI, 2.1%-7.7%), a 9.8-fold higher prevalence of empathetic responses from the AI [1].

Statistics at a Glance

Metric                             Physician Responses   Chatbot Responses
Average Word Count                 52                    211
Preferred by Evaluators            –                     78.6% of the time
Rated Good/Very Good Quality       22.1%                 78.5%
Rated Empathetic/Very Empathetic   4.6%                  45.1%
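The "fold" figures quoted above are simply ratios of the reported proportions. A quick back-of-the-envelope check in Python, using only the percentages reported in the study and no other assumptions:

```python
# Sanity check: the reported fold differences are ratios of the proportions
# of responses rated >= 4 on each scale (chatbot vs physician).
chatbot_quality, physician_quality = 78.5, 22.1   # % rated good/very good
chatbot_empathy, physician_empathy = 45.1, 4.6    # % rated empathetic/very empathetic

quality_fold = chatbot_quality / physician_quality   # ~3.6-fold
empathy_fold = chatbot_empathy / physician_empathy   # ~9.8-fold
```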

Why the AI Responses Were Preferred

The study's evaluators consistently preferred AI chatbot responses over those from physicians, largely due to several key characteristics demonstrated by the AI.

Longer, More Complete Answers 

The chatbot provided significantly more detailed and comprehensive responses, which evaluators found highly valuable. This ability to generate more extensive information could potentially assist clinicians by drafting initial messages for physicians or support staff to review and edit, thereby saving time and enhancing productivity in managing patient inquiries.

Higher Empathy 

Chatbot responses frequently employed reassuring and compassionate language, leading to significantly higher empathy ratings compared to physician replies. This empathetic communication could play a crucial role in improving patient satisfaction and potentially fostering better adherence to medical advice.

Consistency 

ChatGPT demonstrated a more uniform tone and depth across its responses, contrasting with the variability observed in physician replies. This consistency in communication could help standardize information delivery and ensure a more reliable patient experience.

What This Could Mean for Healthcare

The promising results from the study suggest several significant implications for the future of healthcare, particularly in how patient communications are managed. Integrating AI assistants into clinical workflows could offer substantial benefits:

Boosting Productivity 

AI's ability to draft initial responses allows healthcare professionals to efficiently review and edit communications. This could lead to considerable time savings, reducing the administrative burden on clinicians and helping to alleviate physician burnout. It frees up valuable time that can then be redirected toward more complex patient care tasks.

Better Patient Access 

By providing high-quality, empathetic, and consistent responses, AI assistants could potentially reduce the need for unnecessary in-person visits, optimizing resource allocation within healthcare systems. This also holds particular promise for fostering greater equity in patient access, especially for individuals facing mobility limitations, irregular work hours, or concerns about medical costs, who may rely more heavily on digital messaging.

Enhanced Patient Outcomes 

Faster, clearer, and more empathetic digital communication may positively influence patient behaviors. This could lead to improved medication adherence, better compliance with treatment plans (e.g., diet), and fewer missed appointments, ultimately contributing to better overall patient health outcomes.

"While the potential benefits are substantial, it is important to recognize that framing the discussion as 'AI vs doctors' is unnecessary. Instead, the focus should be on how AI can assist healthcare providers, enhancing their capabilities and supporting the delivery of care."

Limitations to Keep in Mind

While the study offers valuable insights, it's crucial to acknowledge its limitations when considering the broader implications of AI in healthcare messaging.

  1. Contextual Differences: The study utilized questions from a public online forum (Reddit), which may not fully reflect typical patient-physician interactions in a clinical setting with comprehensive patient histories.
  2. AI Access and Verification: The AI chatbot did not have access to patients' medical records, and its accuracy was not independently verified beyond being a subcomponent of the quality evaluation [1].
  3. Personalization and Relationships: Physicians often personalize answers based on established patient relationships, a nuanced aspect that AI cannot yet fully replicate.
  4. Evaluator Perspective: The evaluators were healthcare professionals, not patients themselves, meaning real-world patient perceptions of empathy and helpfulness might differ [1].
  5. Scope of Inquiry: The study primarily focused on general clinical questions, not other common patient messages like appointment requests or medication refills.
  6. Methodological Nuances: The summary measures for quality and empathy were not pilot-tested or validated, and the longer length of chatbot responses could have inadvertently influenced empathy ratings [1].
  7. Ethical Considerations: The use of AI in healthcare raises ethical concerns, including the potential for hallucinations, misinformation, and bias in AI-generated content, underscoring the critical need for human review of all AI-generated replies for accuracy [1].

What's Next? Responsible Use of AI in Medicine

While this cross-sectional study has demonstrated promising results for the use of AI assistants in responding to patient questions, it is crucial to acknowledge that further comprehensive research is necessary before definitive conclusions can be drawn regarding their full potential and impact in diverse clinical settings.

Moving forward, responsible implementation of AI in medicine will require focus on several key areas:

  1. Future Trials: Robust, large-scale trials are essential to rigorously test AI-assisted messaging in real-world clinical environments, moving beyond simulated settings.
  2. Patient Reactions and Oversight: It is vital to explore patient reactions to AI-generated communications, establish comprehensive safety checks, and develop clear oversight processes to ensure patient well-being and trust.
  3. Addressing Ethical Concerns: Proactive measures must address critical ethical considerations, including the potential for AI "hallucinations," the spread of misinformation, and inherent biases within AI-generated health content.
  4. Mandatory Human Review: Crucially, every AI-generated reply intended for patient consumption must undergo thorough review by a qualified healthcare professional to ensure accuracy, safety, and appropriate context.

Conclusion: A New Tool, Not a New Doctor

The findings from this study suggest that artificial intelligence, particularly advanced chatbot technology, holds significant promise as a valuable tool within healthcare. As demonstrated, AI can generate helpful and empathetic responses to patient questions, at times even surpassing the quality observed from human physicians navigating overwhelming workloads.

It is crucial to emphasize that AI is not positioned to replace the invaluable role of a General Practitioner or any healthcare professional. Instead, these technologies can evolve into trusted assistants working behind the scenes, augmenting human capabilities rather than substituting them.

With careful implementation, robust clinical oversight, and a commitment to addressing the ethical considerations and limitations identified, chatbot technology has the potential to make healthcare more efficient, accessible, and, paradoxically, more human, not less. By thoughtfully integrating AI into existing workflows, we can empower clinicians, enhance patient communication, and ultimately contribute to improved health outcomes.

References

1. Ayers, J. W., Poliak, A., Dredze, M., Leas, E. C., Zhu, Z., Kelley, J. B., Faix, D. J., Goodman, A. M., Longhurst, C. A., Hogarth, M., & Smith, D. M. (2023). Comparing physician and artificial intelligence chatbot responses to patient questions posted to a public social media forum. JAMA Internal Medicine, 183(6), 589. https://doi.org/10.1001/jamainternmed.2023.1838

2. Temsah, M., Jamal, A., Alhasan, K., Aljamaan, F., Altamimi, I., Malki, K. H., Temsah, A., Ohannessian, R., & Al-Eyadhy, A. (2024). Transforming virtual healthcare: The potentials of ChatGPT-4omni in telemedicine. Cureus. https://doi.org/10.7759/cureus.61377