Automatic Short Answer Grading (ASAG) has become a vital research area in educational technology, aiming to
provide scalable, efficient, and fair evaluation of student responses. Early studies using machine learning demonstrated the
feasibility of reducing grading workload through handcrafted features and statistical classifiers. However, such approaches
struggled with semantic variability and domain adaptation. The introduction of deep learning enabled richer semantic
representation, improving grading accuracy and robustness across tasks. In recent years, transformer-based models have
become the dominant paradigm. BERT and its variants, including Sentence-BERT and hybrid extensions, have consistently
outperformed traditional neural networks by capturing deep contextual embeddings and semantic similarity more effectively.
Comparative studies further confirm BERT's superiority over earlier embedding-based and RNN-based approaches, while
also revealing challenges related to interpretability and domain transfer. More recent explorations into large language models,
such as GPT and T5, demonstrate strong zero-shot and few-shot capabilities, extending the potential of ASAG but raising
concerns around transparency, fairness, and multilingual support. This review synthesizes findings across two decades of
research, emphasizing the evolution from feature-driven methods to BERT-centered deep learning approaches and recent
advances with LLMs. Open challenges remain in dataset scarcity, interpretability, multilingual grading, and trustworthy
deployment in real-world classrooms. The paper concludes by outlining future directions that integrate hybrid deep learning
and LLM approaches, benchmark development, and ethical frameworks to advance reliable and equitable ASAG.
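The similarity-based grading idea underlying the embedding approaches surveyed above can be sketched minimally as follows. This toy example uses bag-of-words count vectors and cosine similarity in place of learned contextual embeddings (a real system would substitute Sentence-BERT or similar encoders); the score mapping and threshold are illustrative assumptions, not any cited system's method.

```python
import math
from collections import Counter

def vectorize(text):
    """Toy bag-of-words vector; real ASAG systems replace this
    with learned contextual embeddings (e.g. Sentence-BERT)."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def grade(reference, answer, max_points=5, threshold=0.2):
    """Map similarity to a point score; thresholds are illustrative only."""
    sim = cosine_similarity(vectorize(reference), vectorize(answer))
    return round(max_points * sim) if sim >= threshold else 0

reference = "photosynthesis converts light energy into chemical energy"
print(grade(reference, "plants turn light energy into chemical energy"))  # 4
```

The sketch makes the survey's central contrast concrete: surface-form vectors like these fail on paraphrases that share few words, which is precisely the semantic-variability gap that contextual embeddings were introduced to close.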
[1] Krithika, R., and Jayasree Narayanan. "Learning to grade short answers using machine learning techniques."
Proceedings of the Third International Symposium on Women in Computing and Informatics. 2015.
[2] Gaddipati, Sasi Kiran, Deebul Nair, and Paul G. Plöger. "Comparative evaluation of pretrained transfer learning
models on automatic short answer grading." arXiv preprint arXiv:2009.01303 (2020).
[3] Ndukwe, Ifeanyi G., et al. "Automatic grading system using sentence-BERT network." International Conference on
Artificial Intelligence in Education. Cham: Springer, 2020.
[4] Camus, Leon, and Anna Filighera. "Investigating transformers for automatic short answer grading." International
Conference on Artificial Intelligence in Education. Cham: Springer, 2020.
[5] Ahmed, Abbirah, Arash Joorabchi, and Martin J. Hayes. "On Deep Learning Approaches to Automated
Assessment: Strategies for Short Answer Grading." CSEDU (2) (2022): 85-94.
[6] Garg, Jai, et al. "Domain-specific hybrid BERT-based system for automatic short answer grading." 2022 2nd
International Conference on Intelligent Technologies (CONIT). IEEE, 2022.
[7] Zhu, Xinhua, Han Wu, and Lanfang Zhang. "Automatic short-answer grading via BERT-based deep neural
networks." IEEE Transactions on Learning Technologies 15, no. 3 (2022): 364-375.
[8] Haller, Stefan, et al. "Survey on automated short answer grading with deep learning: from word embeddings to
transformers." arXiv preprint arXiv:2204.03503 (2022).
[9] Del Gobbo, Emiliano, Alfonso Guarino, Barbara Cafarelli, and Luca Grilli. "GradeAid: a framework for automatic
short answers grading in educational contexts—design, implementation and evaluation." Knowledge and
Information Systems 65, no. 10 (2023): 4295-4334.
[10] Schneider, Johannes, Robin Richner, and Micha Riser. "Towards trustworthy autograding of short, multi-lingual,
multi-type answers." International Journal of Artificial Intelligence in Education 33, no. 1 (2023): 88-118.
[11] Weegar, Rebecka, and Peter Idestam-Almquist. "Reducing workload in short answer grading using machine
learning." International Journal of Artificial Intelligence in Education 34, no. 2 (2024): 247-273.
[12] Mardini G., Ivan D., et al. "A deep-learning-based grading system (ASAG) for reading comprehension
assessment by using aphorisms as open-answer-questions." Education and Information Technologies 29.4 (2024):
4565-4590.
[13] Kaya, Mustafa, and Ilyas Cicekli. "A hybrid approach for automated short answer grading." IEEE Access 12
(2024): 96332-96341.
[14] Kortemeyer, Gerd. "Performance of the pre-trained large language model GPT-4 on automated short answer
grading." Discover Artificial Intelligence 4, no. 1 (2024): 47.
[15] Zaki, Muhammad Zayyanu. "Revolutionising translation technology: A comparative study of variant
transformer models–BERT, GPT and T5." Computer Science and Engineering–An International Journal 14.3
(2024): 15-27.
[16] Chaudhari, Rupal, and Manish Patel. "Deep Learning in Automated Short Answer Grading: A Comprehensive
Review." ITM Web of Conferences. Vol. 65. EDP Sciences, 2024.
[17] Chen, Xieling, et al. "Automatic Classification of Online Learner Reviews Via Fine-Tuned BERTs."
International Review of Research in Open and Distributed Learning 26.1 (2025): 57-79.
[18] Jung, Ji Yoon, Lillian Tyack, and Matthias von Davier. "Combining machine translation and automated
scoring in international large-scale assessments." Large-scale Assessments in Education 12, no. 1 (2024): 10.
[19] Jing, Shumin. "Automatic Grading of Short Answers for MOOC via Semi-supervised Document Clustering."
In EDM, pp. 554-555. 2015.
[20] Zhang, Yuan, Rajat Shah, and Min Chi. "Deep Learning + Student Modeling + Clustering: A Recipe for
Effective Automatic Short Answer Grading." International Educational Data Mining Society (2016).
[21] Jiang, Lan, and Nigel Bosch. "Short answer scoring with GPT-4." In Proceedings of the Eleventh ACM
Conference on Learning @ Scale, pp. 438-442. 2024.
[22] Chamieh, Imran, Torsten Zesch, and Klaus Giebermann. "LLMs in short answer scoring: Limitations and
promise of zero-shot and few-shot approaches." In Proceedings of the 19th Workshop on Innovative Use of NLP for
Building Educational Applications (BEA 2024), pp. 309-315. 2024.
[23] Tulu, Cagatay Neftali, Ozge Ozkaya, and Umut Orhan. "Automatic short answer grading with SemSpace sense
vectors and MaLSTM." IEEE Access 9 (2021): 19270-19280.
[24] Balaha, Hossam Magdy, and Mahmoud M. Saafan. "Automatic exam correction framework (AECF) for the
MCQs, essays, and equations matching." IEEE Access 9 (2021): 32368-32389.
[25] Sychev, Oleg, Anton Anikin, and Artem Prokudin. "Automatic grading and hinting in open-ended text
questions." Cognitive Systems Research 59 (2020): 264-272.
[26] Bennouar, Djamal. "An automatic grading system based on dynamic corpora." Int. Arab J. Inf. Technol. 14, no.
4A (2017): 552-564.
[27] Liu, Tianyi, Julia Chatain, Laura Kobel-Keller, Gerd Kortemeyer, Thomas Willwacher, and Mrinmaya Sachan.
"AI-assisted automated short answer grading of handwritten university level mathematics exams." arXiv preprint
arXiv:2408.11728 (2024).
[28] Bernard, Jason, Ranil Sonnadara, Anthony N. Saraco, Josh P. Mitchell, Alex B. Bak, Ilana Bayer, and Bruce C.
Wainman. "Automated grading of anatomical objective structured practical examinations using decision trees: An
artificial intelligence approach." Anatomical Sciences Education 17, no. 5 (2024): 967-978.
[29] Dadu, Niharika, Harsh Vardhan Singh, and Romi Banerjee. "Grade Guard: A Smart System for Short Answer
Automated Grading." arXiv preprint arXiv:2504.01253 (2025).
[30] Meyer, Gérôme, Philip Breuer, and Jonathan Fürst. "ASAG2024: A combined benchmark for short answer
grading." In Proceedings of the 2024 on ACM Virtual Global Computing Education Conference V. 2, pp. 322-323.
2024.