Arrow Research search

Author name cluster

Arka Dutta

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

4 papers

Possible papers


AAAI 2026 · Conference Paper

How Can You Tell if Your Large Language Model Could Be a Closet Antisemite? An Explainability-Based Audit Framework for Implicit Bias

  • Arka Dutta
  • Reza Fayyazi
  • Shanchieh Yang
  • Ashiqur R. KhudaBukhsh

Auditing large language models (LLMs) for biases is an ongoing and dynamic process, resembling a proverbial cat-and-mouse game. As researchers identify new vulnerabilities in LLMs, guardrails are updated to address them, prompting the need for innovative approaches to audit the increasingly fortified LLMs for biases. This paper makes three contributions. First, it introduces a scalable, explainable framework to measure biases against various identity groups across multiple open large language models. Second, it conducts a bias audit considering five well-known open LLMs and demonstrates their bias inclinations towards several historically disadvantaged groups. Our audit reveals disturbing antisemitic, Islamophobic, and xenophobic biases present in several well-known LLMs. Finally, we release a dataset of 1,000 probes curated under the supervision of an expert social scientist that can facilitate similar audits.

AAAI 2025 · Short Paper

All You Need Is S P A C E: When Jailbreaking Meets Bias Audit and Reveals What Lies Beneath the Guardrails (Student Abstract)

  • Arka Dutta
  • Aman Priyanshu
  • Ashiqur R. KhudaBukhsh

This paper presents a novel combination of a recently proposed bias audit framework and a recently proposed jailbreaking technique for Llama3. In an audit spanning several disadvantaged groups, our experiments reveal that a jailbroken Llama3 exhibits worrisome antisemitism, racism, misogyny, and homophobia (to list a few), much like a broad suite of LLMs previously found susceptible to similar biases.

IJCAI 2025 · Conference Paper

Towards a Bipartisan Understanding of Peace and Vicarious Interactions

  • Arka Dutta
  • Syed Mohammad Sualeh Ali
  • Usman Naseem
  • Ashiqur R. KhudaBukhsh

Human input plays a critical role in modern AI systems. As machines take on increasingly nuanced tasks, it becomes essential for the community to embrace subjectivity and diverse perspectives. However, research on sensitive topics often fails to incorporate diverse and balanced perspectives. This paper makes a key contribution to participatory AI design in the context of conflicts between nuclear adversaries (India and Pakistan), where disagreement between stakeholders is anticipated. The paper explores the notion of hope speech detection -- detecting de-escalating content in the context of nuclear adversaries on the brink of war -- through the lens of participatory AI design and vicarious interactions. We release a dataset of 10,081 social web posts annotated by raters from India and Pakistan and examine the bipartisan nature of the language of de-escalation. Our study reveals that vicarious perspectives can be useful for modeling out-group preferences.

IJCAI 2024 · Conference Paper

Down the Toxicity Rabbit Hole: A Framework to Bias Audit Large Language Models with Key Emphasis on Racism, Antisemitism, and Misogyny

  • Arka Dutta
  • Adel Khorramrouz
  • Sujan Dutta
  • Ashiqur R. KhudaBukhsh

This paper makes three contributions. First, it presents a generalizable, novel framework dubbed toxicity rabbit hole that iteratively elicits toxic content from a wide suite of large language models. Spanning a set of 1,266 identity groups, we first conduct a bias audit of PaLM 2 guardrails presenting key insights. Next, we report generalizability across several other models. Through the elicited toxic content, we present a broad analysis with a key emphasis on racism, antisemitism, misogyny, Islamophobia, homophobia, and transphobia. We release a massive dataset of machine-generated toxic content with a view toward safety for all. Finally, driven by concrete examples, we discuss potential ramifications.