Arrow Research search

Author name cluster

Xiaodan Zhu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

11 papers
1 author row

Possible papers

11

AAAI Conference 2026 Conference Paper

SPARE: Single-Pass Annotation with Reference-Guided Evaluation for Automatic Process Supervision and Reward Modelling

  • Md Imbesat Hassan Rizvi
  • Xiaodan Zhu
  • Iryna Gurevych

Process or step-wise supervision has played a crucial role in advancing complex multi-step reasoning capabilities of Large Language Models (LLMs). However, efficient, high-quality automated process annotation remains a significant challenge. To address this, we introduce Single-Pass Annotation with Reference-Guided Evaluation (SPARE), a novel structured framework that enables efficient per-step annotation by jointly aligning solution steps to reference solutions and determining their accuracy with explicit reasoning in a single generation pass. We demonstrate SPARE's effectiveness across four diverse datasets spanning mathematical reasoning (GSM8K, MATH), multi-hop question answering (MuSiQue-Ans), and spatial reasoning (SpaRP), showing consistent improvements in two applications: (1) training Process Reward Models (PRMs) for ranking and aggregating multiple generations, and (2) fine-tuning models via offline reinforcement learning for greedy decoding. On PROCESSBENCH, SPARE demonstrates data-efficient out-of-distribution generalization, using only ~16% of the training samples required by human-labeled and other synthetically trained baselines. Additionally, it achieves competitive performance with MCTS-based methods while offering a 2.3x speedup in terms of total token count. Manual analysis reveals complementary precision-recall characteristics with MCTS approaches, suggesting potential for ensemble methods. These results establish SPARE as a practical and scalable solution for automatic process supervision in LLM reasoning.

AAAI Conference 2025 Conference Paper

Error Diversity Matters: An Error-Resistant Ensemble Method for Unsupervised Dependency Parsing

  • Behzad Shayegh
  • Hobie H.-B. Lee
  • Xiaodan Zhu
  • Jackie Chi Kit Cheung
  • Lili Mou

We address unsupervised dependency parsing by building an ensemble of diverse existing models through post hoc aggregation of their output dependency parse structures. We observe that these ensembles often suffer from low robustness against weak ensemble components due to error accumulation. To tackle this problem, we propose an efficient ensemble-selection approach that considers error diversity and avoids error accumulation. Results demonstrate that our approach outperforms each individual model as well as previous ensemble techniques. Additionally, our experiments show that the proposed ensemble-selection method significantly enhances the performance and robustness of our ensemble, surpassing previously proposed strategies, which have not accounted for error diversity.

NeurIPS Conference 2025 Conference Paper

On the Robustness of Verbal Confidence of LLMs in Adversarial Attacks

  • Stephen Obadinma
  • Xiaodan Zhu

Robust verbal confidence generated by large language models (LLMs) is crucial for the deployment of LLMs to help ensure transparency, trust, and safety in many applications, including those involving human-AI interactions. In this paper, we present the first comprehensive study on the robustness of verbal confidence under adversarial attacks. We introduce attack frameworks targeting verbal confidence scores through both perturbation and jailbreak-based methods, and demonstrate that these attacks can significantly impair verbal confidence estimates and lead to frequent answer changes. We examine a variety of prompting strategies, model sizes, and application domains, revealing that current verbal confidence is vulnerable and that commonly used defence techniques are largely ineffective or counterproductive. Our findings underscore the need to design robust mechanisms for confidence expression in LLMs, as even subtle semantic-preserving modifications can lead to misleading confidence in responses.

TMLR Journal 2024 Journal Article

Calibration Attacks: A Comprehensive Study of Adversarial Attacks on Model Confidence

  • Stephen Obadinma
  • Xiaodan Zhu
  • Hongyu Guo

In this work, we highlight and perform a comprehensive study on calibration attacks, a form of adversarial attack that aims to push victim models into heavy miscalibration without altering their predicted labels, thereby endangering the trustworthiness of the models and the follow-up decision making based on their confidence. We propose four typical forms of calibration attacks: underconfidence, overconfidence, maximum-miscalibration, and random-confidence attacks, conducted in both black-box and white-box setups. We demonstrate that the attacks are highly effective on both convolutional and attention-based models: with a small number of queries, they seriously skew confidence without changing the predictive performance. Given the potential danger, we further investigate the effectiveness of a wide range of adversarial defence and recalibration methods, including our proposed defences specifically designed to mitigate calibration attacks. Judging by ECE and KS scores, we observe that there are still significant limitations in handling calibration attacks. To the best of our knowledge, this is the first dedicated study that provides a comprehensive investigation of calibration-focused attacks. We hope it attracts more attention to these types of attacks and thereby helps mitigate their potentially serious harm. To this end, this work also provides detailed analyses to understand the characteristics of the attacks.
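For context, the ECE score the abstract uses to quantify miscalibration is the standard binned Expected Calibration Error: a weighted average of the gap between confidence and accuracy over equal-width confidence bins. A minimal sketch (the function name and bin count are illustrative, not the paper's implementation):

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    # ECE: sum over bins of (bin weight) * |bin accuracy - bin mean confidence|
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        idx = [i for i, c in enumerate(confidences) if lo < c <= hi]
        if not idx:
            continue  # empty bins contribute nothing
        acc = sum(correct[i] for i in idx) / len(idx)
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        ece += (len(idx) / n) * abs(acc - avg_conf)
    return ece
```

An underconfidence attack in the paper's sense would lower `confidences` while leaving the predicted labels (and hence `correct`) unchanged, which drives this score up.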

AAAI Conference 2022 Conference Paper

Improving Zero-Shot Phrase Grounding via Reasoning on External Knowledge and Spatial Relations

  • Zhan Shi
  • Yilin Shen
  • Hongxia Jin
  • Xiaodan Zhu

Phrase grounding is a multi-modal problem that localizes a particular noun phrase in an image referred to by a text query. In the challenging zero-shot phrase grounding setting, the existing state-of-the-art grounding models have limited capacity for handling unseen phrases. Humans, however, can ground novel types of objects in images with little effort, significantly benefiting from reasoning with commonsense. In this paper, we design a novel phrase grounding architecture that builds multi-modal knowledge graphs using external knowledge and then performs graph reasoning and spatial relation reasoning to localize the referred noun phrases. We perform extensive experiments on different zero-shot grounding splits sub-sampled from the Flickr30K Entities and Visual Genome datasets, demonstrating that the proposed framework is orthogonal to backbone image encoders and outperforms the baselines by 2∼3% in accuracy, a significant improvement under the standard evaluation metrics.

AAAI Conference 2022 Conference Paper

Interpretable Low-Resource Legal Decision Making

  • Rohan Bhambhoria
  • Hui Liu
  • Samuel Dahan
  • Xiaodan Zhu

Over the past several years, legal applications of deep learning have been on the rise. However, as in other high-stakes decision-making areas, the requirement for interpretability is of crucial importance. Models currently used by legal practitioners are mostly of the conventional machine learning type, which is inherently interpretable yet unable to harness the performance of data-driven deep learning models. In this work, we utilize deep learning models in the area of trademark law to shed light on the issue of likelihood of confusion between trademarks. Specifically, we introduce a model-agnostic interpretable intermediate layer, a technique which proves to be effective for legal documents. Furthermore, we utilize weakly supervised learning by means of a curriculum learning strategy, effectively demonstrating the improved performance of a deep learning model. This is in contrast to conventional models, which can only utilize the limited number of expensive samples manually annotated by legal experts. Although the methods presented in this work tackle the task of risk of confusion for trademarks, it is straightforward to extend them to other fields of law, or more generally, to other similar high-stakes application scenarios.

AAAI Conference 2021 Conference Paper

Dynamic Hybrid Relation Exploration Network for Cross-Domain Context-Dependent Semantic Parsing

  • Binyuan Hui
  • Ruiying Geng
  • Qiyu Ren
  • Binhua Li
  • Yongbin Li
  • Jian Sun
  • Fei Huang
  • Luo Si
  • Xiaodan Zhu

Semantic parsing has long been a fundamental problem in natural language processing. Recently, cross-domain context-dependent semantic parsing has become a new focus of research. Central to the problem is the challenge of leveraging the contextual information of both natural language utterances and database schemas in the interaction history. In this paper, we present a dynamic graph framework that is capable of effectively modelling contextual utterances, tokens, database schemas, and their complicated interaction as the conversation proceeds. The framework employs a dynamic memory decay mechanism that incorporates inductive bias to integrate enriched contextual relation representation, which is further enhanced with a powerful reranking model. At the time of writing, the proposed framework outperforms all existing models by large margins, achieving new state-of-the-art performance on two large-scale benchmarks, the SParC and CoSQL datasets. Specifically, the model attains 55.8% question-match and 30.8% interaction-match accuracy on SParC, and 46.8% question-match and 17.0% interaction-match accuracy on CoSQL.
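The two accuracy figures quoted above follow the standard SParC/CoSQL convention: question match is the fraction of individual questions whose predicted query exactly matches the gold query, while interaction match requires every question in a conversation to match. A minimal sketch of that distinction (the function name is illustrative; real evaluation uses the benchmarks' exact-set-match scripts):

```python
def question_and_interaction_match(interactions):
    # interactions: list of conversations, each a list of booleans
    # (True = the predicted query for that question exactly matched gold)
    questions = [ok for conv in interactions for ok in conv]
    question_match = sum(questions) / len(questions)
    # an interaction counts only if *all* of its questions are correct
    interaction_match = sum(all(conv) for conv in interactions) / len(interactions)
    return question_match, interaction_match
```

The gap between the two numbers (e.g. 55.8% vs. 30.8% on SParC) reflects how hard it is to get an entire multi-turn interaction right rather than isolated turns.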

IJCAI Conference 2020 Conference Paper

End-to-End Transition-Based Online Dialogue Disentanglement

  • Hui Liu
  • Zhan Shi
  • Jia-Chen Gu
  • Quan Liu
  • Si Wei
  • Xiaodan Zhu

Dialogue disentanglement aims to separate intermingled messages into detached sessions. Existing research focuses on two-step architectures, in which a model first retrieves the relationships between message pairs and then divides the message stream into separate clusters. Almost all existing work puts significant effort into selecting features for message-pair classification and clustering, while ignoring the semantic coherence within each session. In this paper, we introduce the first end-to-end transition-based model for online dialogue disentanglement. Our model captures the sequential information of each session as the online algorithm proceeds through a dialogue. The coherence within a session is hence modeled as messages are sequentially added to their best-matching sessions. Meanwhile, the research field still lacks data for studying end-to-end dialogue disentanglement, so we construct a large-scale dataset by extracting coherent dialogues from online movie scripts. We evaluate our model on both the dataset we developed and the publicly available Ubuntu IRC dataset [Kummerfeld et al., 2019]. The results show that our model significantly outperforms the existing algorithms. Further experiments demonstrate that our model better captures the sequential semantics and obtains more coherent disentangled sessions.

AAAI Conference 2020 Conference Paper

Learning Cross-Modal Context Graph for Visual Grounding

  • Yongfei Liu
  • Bo Wan
  • Xiaodan Zhu
  • Xuming He

Visual grounding is a ubiquitous building block in many vision-language tasks, yet it remains challenging due to large variations in the visual and linguistic features of grounding entities, strong context effects, and the resulting semantic ambiguities. Prior works typically focus on learning representations of individual phrases with limited context information. To address these limitations, this paper proposes a language-guided graph representation to capture the global context of grounding entities and their relations, and develops a cross-modal graph matching strategy for the multiple-phrase visual grounding task. In particular, we introduce a modular graph neural network to compute context-aware representations of phrases and object proposals respectively via message propagation, followed by a graph-based matching module to generate globally consistent localization of grounding phrases. We train the entire graph neural network jointly in a two-stage strategy and evaluate it on the Flickr30K Entities benchmark. Extensive experiments show that our method outperforms the prior state of the art by a sizable margin, evidencing the efficacy of our grounding framework. Code is available at https://github.com/youngfly11/LCMCG-PyTorch.

IJCAI Conference 2017 Conference Paper

Cause-Effect Knowledge Acquisition and Neural Association Model for Solving A Set of Winograd Schema Problems

  • Quan Liu
  • Hui Jiang
  • Andrew Evdokimov
  • Zhen-Hua Ling
  • Xiaodan Zhu
  • Si Wei
  • Yu Hu

This paper focuses on the Winograd Schema (WS), a challenging problem which has been proposed for measuring progress in commonsense reasoning. Due to the lack of commonsense knowledge and training data, very little work on WS problems has appeared in recent years. Indeed, there is no shortcut to solving this problem other than collecting more commonsense knowledge and designing suitable models. This paper therefore addresses a set of WS problems by proposing a knowledge acquisition method and a general neural association model. To avoid the sparseness issue, the knowledge we aim to collect is the cause-effect relationships between thousands of commonly used words. The knowledge acquisition method enables us to extract hundreds of thousands of cause-effect pairs from large text corpora automatically. Meanwhile, a neural association model (NAM) is proposed to encode the association relationships between any two discrete events. Based on the extracted knowledge and the NAM models, we successfully build a system for solving WS problems from scratch and achieve 70.0% accuracy. Most importantly, this paper provides a flexible framework for solving WS problems based on event association and neural network methods.

IJCAI Conference 2016 Conference Paper

Distraction-Based Neural Networks for Modeling Document

  • Qian Chen
  • Xiaodan Zhu
  • ZhenHua Ling
  • Si Wei
  • Hui Jiang

Distributed representations learned with neural networks have recently been shown to be effective in modeling natural language at fine granularities such as words, phrases, and even sentences. Whether and how such an approach can be extended to help model larger spans of text, e.g., documents, is intriguing, and further investigation is still desirable. This paper aims to enhance neural network models for this purpose. A typical problem of document-level modeling is automatic summarization, which aims to model documents in order to generate summaries. In this paper, we propose neural models that train computers not just to pay attention to specific regions and content of input documents with attention models, but also to distract them to traverse between different content of a document so as to better grasp the overall meaning for summarization. Without engineering any features, we train the models on two large datasets. The models achieve state-of-the-art performance and significantly benefit from the distraction modeling, particularly when input documents are long.