Human vs AI: Evaluating Large Language Models (LLMs) in the Grading of Scientific Inquiry Assessments (83651)

Session Information: Design, Implementation & Assessment of Innovative Technologies in Education
Session Chair: Ivan Cherh Chiet Low

Wednesday, 27 November 2024 12:15
Session: Session 2
Room: Room 603 (6F)
Presentation Type: Oral Presentation

All presentation times are UTC + 9 (Asia/Tokyo)

The advent of AI, particularly Large Language Models (LLMs) like ChatGPT and Gemini, has significantly impacted the educational landscape, offering unique opportunities for learning and assessment. Grading written assessments is traditionally viewed as a laborious and subjective process; this study therefore sought to evaluate the accuracy and reliability of these LLMs against human graders in an interdisciplinary course on scientific inquiry. Human graders and three LLMs (GPT-3.5, GPT-4, and Gemini) were tasked with scoring submitted student assignments according to a set of rubrics aligned with various cognitive domains, namely ‘Understand’, ‘Analyse’ and ‘Evaluate’ from the revised Bloom’s taxonomy, and ‘Scientific inquiry competency’. Our findings revealed that whilst LLMs demonstrated some level of competency, they do not yet meet the assessment standards of human graders. Specifically, inter-rater reliability (percentage agreement and correlation analysis) between human graders was superior to the reliability between two grading rounds for each LLM. Furthermore, concordance and correlation between human and LLM graders were moderate at best, and mostly poor, both for overall scores and across the pre-specified cognitive domains. The results suggest a future where AI could complement human expertise in educational assessment, but underscore the importance of adaptive learning by educators and continuous improvement in current AI technologies to fully realise this potential.
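For readers unfamiliar with the reliability metrics named in the abstract, the short sketch below illustrates how percentage agreement and a correlation coefficient between two graders are typically computed. The grader names, placeholder scores, exact-match agreement criterion, and choice of Pearson correlation are illustrative assumptions for this sketch, not the study's actual analysis code or data.

```python
# Illustrative sketch of two inter-rater reliability metrics: percentage
# agreement and Pearson correlation between two sets of rubric scores.
# All scores below are made-up placeholders, not data from the study.
from statistics import correlation  # Pearson correlation (Python 3.10+)

def percentage_agreement(scores_a, scores_b, tolerance=0):
    """Share of items on which two graders give the same score
    (or scores differing by no more than `tolerance` points)."""
    matches = sum(abs(a - b) <= tolerance for a, b in zip(scores_a, scores_b))
    return matches / len(scores_a)

# Hypothetical rubric scores for ten assignments from two graders.
human_grader = [4, 3, 5, 2, 4, 3, 5, 4, 2, 3]
llm_grader   = [4, 2, 5, 3, 4, 3, 4, 4, 2, 2]

print(f"Percentage agreement: {percentage_agreement(human_grader, llm_grader):.0%}")
print(f"Pearson correlation:  {correlation(human_grader, llm_grader):.2f}")
```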

Authors:
Ivan Cherh Chiet Low, National University of Singapore, Singapore
Swapna Haresh Teckwani, National University of Singapore, Singapore
Amanda Huee-Ping Wong, National University of Singapore, Singapore
Nathasha Luke, National University of Singapore, Singapore


About the Presenter(s)
Dr. Ivan Low is a Senior Lecturer and the Director for Continuing Education and Training (CET) in the Department of Physiology, Yong Loo Lin School of Medicine, National University of Singapore (NUS).

Connect on LinkedIn
https://www.linkedin.com/in/ivan-low-8a599878

See this presentation on the full schedule: Wednesday Schedule


A Note to Presenters

To enhance academic profiles and showcase research, we encourage all presenters and co-presenters to include links to their public LinkedIn and ResearchGate profiles and research websites. Presenters may update their bio for their presentation by completing the form linked below by October 22, 2024.
- Presenter Information Update Form
Submitted changes will be reflected on November 01, 2024

Additionally, presenters should update their IAFOR account details if their affiliation or biography has changed.
- https://submit.iafor.org/my-account/edit-account


