Background: Since its launch in November 2022, ChatGPT has received recognition for its use in medical writing, with some suggesting it could write scientific abstracts. However, its real-world effectiveness in helping researchers improve conference abstracts remains unexplored.
Objective: To evaluate whether ChatGPT-4 could improve the quality of abstracts submitted to a medical conference by clinical researchers with varying levels of experience.
Design/Methods: In October 2023, we conducted an experimental study involving 24 international researchers who each provided one “original” abstract intended for submission to the 2024 Pediatric Academic Societies (PAS) conference. Our research team created a prompt asking ChatGPT-4 to improve the quality of the abstract while adhering to PAS submission guidelines. Researchers received the revised version and were asked to produce a “final” abstract. The quality of each version (original, ChatGPT, and final) was rated by the researchers themselves on a numeric scale (0-100). In addition, three co-investigators rated all abstracts while blinded to version. The primary analysis was the mean difference in scores between the final and original abstracts. We also compared scores across the three versions and determined the proportion of researchers who reported that ChatGPT improved their abstract and its probability of acceptance. Based on an anticipated effect size of 0.7, we estimated that at least 20 pairs of abstracts were needed.
Results: Abstract quality varied across the three versions, with mean scores of 82, 65, and 90 for the original, ChatGPT, and final versions, respectively. Overall, the final version showed significantly better quality than the original (mean difference 8.0 points; 95% CI: 5.6-10.3). Of note, 3 (14%) abstracts remained unchanged between the original and final versions. Independent ratings by the co-investigators confirmed a statistically significant improvement (mean difference 0.94 points; 95% CI: 0.44-1.44). Researchers identified minor (n=10) and major (n=3) factual errors in the ChatGPT abstracts. Nevertheless, 18 (75%) participants reported that ChatGPT contributed to improving their final abstract, and 10 (42%) believed it increased their abstract’s chances of acceptance. Eighteen (75%) researchers reported that they would be uncomfortable submitting the ChatGPT version.
Conclusion(s): While ChatGPT-4 does not produce abstracts of better quality than those crafted by researchers, it serves as a valuable tool to help researchers enhance the quality of their own abstracts. Using such tools is a potential strategy for researchers seeking to improve their abstracts.