EduGuardBench: A Holistic Benchmark for Evaluating the Pedagogical Fidelity and Adversarial Safety of LLMs as Simulated Teachers

Yilin Jiang; Mingzi Zhang; Xuanyu Yin; Sheng Jin; Suyu Lu; Zuocan Ying; Zengyi Yu; Xiangjie Kong

doi:10.1609/aaai.v40i37.40399

AAAI 2026

Conference Paper; AAAI Technical Track on Natural Language Processing II; Artificial Intelligence

Abstract

Large Language Models for Simulating Professions (SP-LLMs), particularly as teachers, are pivotal for personalized education. However, ensuring their professional competence and ethical safety remains a major challenge, as existing benchmarks fail to measure role-playing fidelity or address the unique teaching harms inherent in educational scenarios. To address this gap, we propose EduGuardBench, a dual-component benchmark that evaluates professional fidelity through the Role-playing Fidelity Score (RFS) and diagnoses harms specific to the teaching profession. It also probes safety vulnerabilities using persona-based adversarial prompts targeting both general harms and academic misconduct, with metrics such as Attack Success Rate (ASR) and a three-tier Refusal Quality assessment. Extensive experiments on 14 leading models reveal a stark polarization in performance. While reasoning-oriented models generally demonstrate higher fidelity, incompetence remains the dominant failure mode across most models. Adversarial testing uncovered a counterintuitive scaling paradox, where mid-sized models appear more vulnerable, challenging monotonic safety assumptions. Notably, we identify an Educational Transformation Effect, where the safest models convert harmful requests into teachable moments through ideal educational refusals. This ability is strongly negatively correlated with ASR, revealing a new dimension of advanced AI safety. EduGuardBench thus provides a reproducible framework for holistic assessment of professional, ethical, and pedagogical alignment, uncovering dynamics critical to deploying trustworthy AI in education.
