ECAI 2025 Conference Paper
DiFair-LLM: Evaluating Fairness Disparities in LLMs Toward Demographic Groups
- Nurit Cohen-Inger
- Roei Zaady
- Adir Solomon
- Lior Rokach
- Bracha Shapira
Large Language Models (LLMs) are increasingly integrated into real-world applications, making equitable treatment of all demographic groups a critical concern. Existing fairness evaluations often rely on binary, template-based tests, which overlook subtle disparities in open-ended responses. We present DiFair-LLM, a model-agnostic framework for detecting and quantifying fairness disparities, defined as any unequal treatment that benefits or disadvantages a demographic group. DiFair-LLM uses open-ended, group-specific and neutral prompts, measures semantic distances between groups' responses, applies non-parametric statistical tests, and ranks groups by their deviation from a neutral baseline. Evaluations across eight state-of-the-art LLMs and multiple demographic attributes reveal minimal disparities for gender but significant differences for age, with older adults most affected, and for ethnicity, where the largest gaps affect certain non-Caucasian groups. By mapping nuanced patterns of differential treatment rather than flagging only overt bias, DiFair-LLM offers a practical, reproducible approach for auditing fairness and guiding more inclusive LLM deployments.
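The pipeline outlined above (embed responses, measure semantic distance to a neutral baseline, apply a non-parametric test, rank groups by deviation) can be illustrated with a minimal toy sketch. The embedding vectors, group names, and the choice of a permutation test below are illustrative assumptions, not the paper's actual implementation, which is described in the full text.

```python
import math
import random

def cosine_distance(u, v):
    """Cosine distance between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def permutation_test(xs, ys, n_perm=5000, seed=0):
    """Two-sided permutation test on the difference of means
    (a simple non-parametric test, standing in for whichever
    test the framework actually uses)."""
    rng = random.Random(seed)
    observed = abs(sum(xs) / len(xs) - sum(ys) / len(ys))
    pooled = list(xs) + list(ys)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        px, py = pooled[:len(xs)], pooled[len(xs):]
        if abs(sum(px) / len(px) - sum(py) / len(py)) >= observed:
            count += 1
    return count / n_perm

# Toy 2-d vectors standing in for sentence embeddings of LLM
# responses to neutral vs. group-specific prompts (hypothetical data).
neutral = [[0.90, 0.10], [0.85, 0.15], [0.92, 0.08]]
groups = {
    "group_a": [[0.88, 0.12], [0.90, 0.10], [0.86, 0.14]],
    "group_b": [[0.30, 0.70], [0.25, 0.75], [0.35, 0.65]],
}

base = centroid(neutral)
deviations = {
    g: [cosine_distance(v, base) for v in vecs]
    for g, vecs in groups.items()
}
# Rank groups by mean deviation from the neutral baseline (largest first).
ranking = sorted(
    deviations, key=lambda g: -sum(deviations[g]) / len(deviations[g])
)
p_value = permutation_test(deviations["group_a"], deviations["group_b"])
```

With this toy data, `group_b` ranks highest because its responses sit farthest from the neutral centroid; the permutation test then checks whether the two groups' deviation distributions differ beyond chance. In practice one would use a real sentence-embedding model and far more responses per group.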