Arrow Research search

Author name cluster

Lei Sha

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

13 papers
2 author rows

Possible papers

13

AAAI Conference 2026 Conference Paper

Large Language Models Struggle with Unreasonability in Math Problems

  • Jingyuan Ma
  • Damai Dai
  • Zihang Yuan
  • Rui Li
  • Weilin Luo
  • Bin Wang
  • Qun Liu
  • Lei Sha

Large Language Models (LLMs) have shown remarkable success on a wide range of math and reasoning benchmarks. However, we observe that they often struggle when faced with unreasonable math problems. Instead of recognizing these issues, models frequently proceed as if the problem were well-posed, producing incorrect answers or falling into overthinking and verbose self-correction. To systematically investigate this overlooked vulnerability, we propose the Unreasonable Math Problems (UMP) benchmark, designed to evaluate LLMs' ability to detect and respond to unreasonable math problem statements. Based on extensive experiments covering 19 LLMs, we find that even state-of-the-art general models such as GPT-4o struggle on UMP. While reasoning models such as DeepSeek-R1 demonstrate a higher sensitivity to unreasonable inputs, this often comes at the cost of generating overly long and meaningless responses that fail to converge. We further find that prompting and fine-tuning enhance the detection of unreasonable inputs, with minor and acceptable trade-offs, making them practical solutions in this challenging setting.

ICLR Conference 2025 Conference Paper

Omni-MATH: A Universal Olympiad Level Mathematic Benchmark for Large Language Models

  • Bofei Gao
  • Feifan Song 0001
  • Zhe Yang 0013
  • Zefan Cai
  • Yibo Miao
  • Qingxiu Dong
  • Lei Li 0039
  • Chenghao Ma

Recent advancements in large language models (LLMs) have led to significant breakthroughs in mathematical reasoning capabilities. However, existing benchmarks like GSM8K or MATH are now being solved with high accuracy (e.g., OpenAI o1 achieves 94.8% on the MATH dataset), indicating their inadequacy for truly challenging these models. To bridge this gap, we propose a comprehensive and challenging benchmark specifically designed to assess LLMs' mathematical reasoning at the Olympiad level. Unlike existing Olympiad-related benchmarks, our dataset focuses exclusively on mathematics and comprises a vast collection of 4428 competition-level problems with rigorous human annotation. These problems are meticulously categorized into over 33 sub-domains and span more than 10 distinct difficulty levels, enabling a holistic assessment of model performance in Olympiad-mathematical reasoning. Furthermore, we conducted an in-depth analysis based on this benchmark. Our experimental results show that even the most advanced models, OpenAI o1-mini and OpenAI o1-preview, struggle with highly challenging Olympiad-level problems, with 60.54% and 52.55% accuracy, respectively, highlighting significant challenges in Olympiad-level mathematical reasoning.

ICLR Conference 2024 Conference Paper

A Stable, Fast, and Fully Automatic Learning Algorithm for Predictive Coding Networks

  • Tommaso Salvatori
  • Yuhang Song 0001
  • Yordan Yordanov
  • Beren Millidge
  • Lei Sha
  • Cornelius Emde
  • Zhenghua Xu 0001
  • Rafal Bogacz

Predictive coding networks are neuroscience-inspired models with roots in both Bayesian statistics and neuroscience. Training such models, however, is quite inefficient and unstable. In this work, we show that simply changing the temporal scheduling of the update rule for the synaptic weights leads to an algorithm that is much more efficient and stable than the original one, and that has theoretical guarantees in terms of convergence. The proposed algorithm, which we call incremental predictive coding (iPC), is also more biologically plausible than the original one, as it is fully automatic. In an extensive set of experiments, we show that iPC consistently performs better than the original formulation on a large number of benchmarks for image classification, as well as for the training of both conditional and masked language models, in terms of test accuracy, efficiency, and convergence with respect to a large set of hyperparameters.
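
The scheduling change the abstract describes can be illustrated with a toy example. The following is a minimal NumPy sketch, not the authors' implementation: the one-layer linear model, learning rates, and step count are all illustrative. Classic predictive coding would relax the latent activities to convergence before each weight update, whereas an iPC-style loop updates the weights at every inference step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-layer linear predictive coding model: prediction x_hat = W @ z.
W = rng.normal(scale=0.1, size=(4, 3))  # synaptic weights
z = rng.normal(size=3)                  # latent activities (value nodes)
x = rng.normal(size=4)                  # observed data point
lr_z, lr_w = 0.1, 0.01

for t in range(200):
    eps = x - W @ z                     # prediction error
    z = z + lr_z * (W.T @ eps)          # activity (inference) update
    # iPC-style scheduling: the weights move at every inference step;
    # classic PC would instead run the activity updates to convergence
    # before touching the weights.
    W = W + lr_w * np.outer(eps, z)

err = float(eps @ eps)
print(f"final squared prediction error: {err:.6f}")
```

Because the weight update is interleaved with inference, no outer loop waits for the activities to settle, which is where the efficiency gain the abstract claims comes from.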

TMLR Journal 2024 Journal Article

Correcting Flaws in Common Disentanglement Metrics

  • Louis Mahon
  • Lei Sha
  • Thomas Lukasiewicz

Disentangled representations are those in which distinct features, such as size or shape, are represented by distinct neurons. Quantifying the extent to which a given representation is disentangled is not straightforward; multiple metrics have been proposed. In this paper, we identify two failings of existing metrics, which mean they can assign a high score to a model which is still entangled, and we propose two new metrics, which redress these problems. First, we use hypothetical toy examples to demonstrate the failure modes we identify for existing metrics. Then, we show that similar situations occur in practice. Finally, we validate our metrics on the downstream task of compositional generalization. We measure the performance of six existing disentanglement models on this downstream compositional generalization task, and show that performance is (a) generally quite poor, (b) correlated, to varying degrees, with most disentanglement metrics, and (c) most strongly correlated with our newly proposed metrics. Anonymous code to reproduce our results is available at https://github.com/anon296/anon.

NeurIPS Conference 2021 Conference Paper

Associative Memories via Predictive Coding

  • Tommaso Salvatori
  • Yuhang Song
  • Yujian Hong
  • Lei Sha
  • Simon Frieder
  • Zhenghua Xu
  • Rafal Bogacz
  • Thomas Lukasiewicz

Associative memories in the brain receive and store patterns of activity registered by the sensory neurons, and are able to retrieve them when necessary. Due to their importance in human intelligence, computational models of associative memories have been developed for several decades now. In this paper, we present a novel neural model for realizing associative memories, which is based on a hierarchical generative network that receives external stimuli via sensory neurons. It is trained using predictive coding, an error-based learning algorithm inspired by information processing in the cortex. To test the model's capabilities, we perform multiple retrieval experiments from both corrupted and incomplete data points. In an extensive comparison, we show that this new model outperforms popular associative memory models, such as autoencoders trained via backpropagation and modern Hopfield networks, in retrieval accuracy and robustness. In particular, in completing partial data points, our model achieves remarkable results on natural image datasets, such as ImageNet, with a surprisingly high accuracy, even when only a tiny fraction of pixels of the original images is presented. Our model provides a plausible framework to study learning and retrieval of memories in the brain, as it closely mimics the behavior of the hippocampus as a memory index and generative model.

AAAI Conference 2021 Conference Paper

Learning from the Best: Rationalizing Predictions by Adversarial Information Calibration

  • Lei Sha
  • Oana-Maria Camburu
  • Thomas Lukasiewicz

Explaining the predictions of AI models is paramount in safety-critical applications, such as in legal or medical domains. One form of explanation for a prediction is an extractive rationale, i.e., a subset of features of an instance that lead the model to give its prediction on the instance. Previous works on generating extractive rationales usually employ a two-phase model: a selector that selects the most important features (i.e., the rationale) followed by a predictor that makes the prediction based exclusively on the selected features. One disadvantage of these works is that the main signal for learning to select features comes from the comparison of the answers given by the predictor and the ground-truth answers. In this work, we propose to squeeze more information from the predictor via an information calibration method. More precisely, we train two models jointly: one is a typical neural model that solves the task at hand in an accurate but black-box manner, and the other is a selector-predictor model that additionally produces a rationale for its prediction. The first model is used as a guide to the second model. We use an adversarial-based technique to calibrate the information extracted by the two models such that the difference between them is an indicator of the missed or over-selected features. In addition, for natural language tasks, we propose to use a language-model-based regularizer to encourage the extraction of fluent rationales. Experimental results on a sentiment analysis task as well as on three tasks from the legal domain show the effectiveness of our approach to rationale extraction.

AAAI Conference 2021 Conference Paper

Multi-type Disentanglement without Adversarial Training

  • Lei Sha
  • Thomas Lukasiewicz

Controlling the style of natural language by disentangling the latent space is an important step towards interpretable machine learning. After the latent space is disentangled, the style of a sentence can be transformed by tuning the style representation without affecting other features of the sentence. Previous works usually use adversarial training to guarantee that disentangled vectors do not affect each other. However, adversarial methods are difficult to train. Especially when there are multiple features (e.g., sentiment, or tense, which we call style types in this paper), each feature requires a separate discriminator for extracting a disentangled style vector corresponding to that feature. In this paper, we propose a unified distribution-controlling method, which provides each specific style value (the value of style types, e.g., positive sentiment, or past tense) with a unique representation. This method contributes a solid theoretical basis to avoid adversarial training in multi-type disentanglement. We also propose multiple loss functions to achieve a style-content disentanglement as well as a disentanglement among multiple style types. In addition, we observe that if two different style types always have some specific style values that occur together in the dataset, they will affect each other when transferring the style values. We call this phenomenon training bias, and we propose a loss function to alleviate such training bias while disentangling multiple types. We conduct experiments on two datasets (Yelp service reviews and Amazon product reviews) to evaluate the style-disentangling effect and the unsupervised style-transfer performance on two style types: sentiment and tense. The experimental results show the effectiveness of our model.

AAAI Conference 2018 Conference Paper

A Multi-View Fusion Neural Network for Answer Selection

  • Lei Sha
  • Xiaodong Zhang
  • Feng Qian
  • Baobao Chang
  • Zhifang Sui

Community question answering aims at choosing the most appropriate answer for a given question, which is important in many NLP applications. Previous neural network-based methods consider several different aspects of information through calculating attentions. These different kinds of attentions are always simply summed up and can be seen as a “single view”, causing severe information loss. To overcome this problem, we propose a Multi-View Fusion Neural Network, where each attention component generates a “view” of the QA pair and a fusion RNN integrates the generated views to form a more holistic representation. In this fusion RNN method, a filter gate collects important information of input and directly adds it to the output, which borrows the idea of residual networks. Experimental results on the WikiQA and SemEval-2016 CQA datasets demonstrate that our proposed model outperforms the state-of-the-art methods.

AAAI Conference 2018 Conference Paper

Jointly Extracting Event Triggers and Arguments by Dependency-Bridge RNN and Tensor-Based Argument Interaction

  • Lei Sha
  • Feng Qian
  • Baobao Chang
  • Zhifang Sui

Event extraction plays an important role in natural language processing (NLP) applications including question answering and information retrieval. Traditional event extraction relies heavily on lexical and syntactic features, which require intensive human engineering and may not generalize to different datasets. Deep neural networks, on the other hand, are able to automatically learn underlying features, but existing networks do not make full use of syntactic relations. In this paper, we propose a novel dependency bridge recurrent neural network (dbRNN) for event extraction. We build our model upon a recurrent neural network, but enhance it with dependency bridges, which carry syntactically related information when modeling each word. We illustrate that simultaneously applying tree structure and sequence structure in an RNN yields much better performance than using a sequential RNN alone. In addition, we use a tensor layer to simultaneously capture the various types of latent interaction between candidate arguments as well as identify/classify all arguments of an event. Experiments show that our approach achieves competitive results compared with previous work.

AAAI Conference 2018 Conference Paper

Order-Planning Neural Text Generation From Structured Data

  • Lei Sha
  • Lili Mou
  • Tianyu Liu
  • Pascal Poupart
  • Sujian Li
  • Baobao Chang
  • Zhifang Sui

Generating texts from structured data (e.g., a table) is important for various natural language processing tasks such as question answering and dialog systems. In recent studies, researchers use neural language models and encoder-decoder frameworks for table-to-text generation. However, these neural network-based approaches typically do not model the order of content during text generation. When a human writes a summary based on a given table, he or she would probably consider the content order before wording. In this paper, we propose an order-planning text generation model, where order information is explicitly captured by link-based attention. Then a self-adaptive gate combines the link-based attention with traditional content-based attention. We conducted experiments on the WIKIBIO dataset and achieved higher performance than previous methods in terms of BLEU, ROUGE, and NIST scores; we also performed ablation tests to analyze each component of our model.
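
The gating step described in the abstract can be sketched in a few lines of NumPy. This is a hypothetical illustration, not the authors' code: the link-based scores are stand-in random values, and the gate parameter `w_g` is an invented name for whatever learned parameters drive the gate.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

rng = np.random.default_rng(1)
n_fields, d = 5, 8
h_dec = rng.normal(size=d)                     # current decoder state
keys = rng.normal(size=(n_fields, d))          # encoded table fields

content_att = softmax(keys @ h_dec)            # content-based attention
link_att = softmax(rng.normal(size=n_fields))  # stand-in for link-based scores

w_g = rng.normal(size=d)                       # illustrative gate parameters
g = 1.0 / (1.0 + np.exp(-(w_g @ h_dec)))       # self-adaptive gate in (0, 1)
att = g * link_att + (1.0 - g) * content_att
# A convex combination of two distributions is still a distribution.
```

The design point is that the gate is recomputed from the decoder state at each step, so the model can lean on link-based order information early in a sentence and on content later, rather than fixing one mixture globally.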

AAAI Conference 2018 Conference Paper

Table-to-Text Generation by Structure-Aware Seq2seq Learning

  • Tianyu Liu
  • Kexiang Wang
  • Lei Sha
  • Baobao Chang
  • Zhifang Sui

Table-to-text generation aims to generate a description for a factual table which can be viewed as a set of field-value records. To encode both the content and the structure of a table, we propose a novel structure-aware seq2seq architecture which consists of a field-gating encoder and a description generator with dual attention. In the encoding phase, we update the cell memory of the LSTM unit by a field gate and its corresponding field value in order to incorporate field information into the table representation. In the decoding phase, a dual attention mechanism which contains word-level attention and field-level attention is proposed to model the semantic relevance between the generated description and the table. We conduct experiments on the WIKIBIO dataset which contains over 700k biographies and corresponding infoboxes from Wikipedia. The attention visualizations and case studies show that our model is capable of generating coherent and informative descriptions based on a comprehensive understanding of both the content and the structure of a table. Automatic evaluations also show our model outperforms the baselines by a large margin. Code for this work is available at https://github.com/tyliupku/wiki2bio.
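
The field-gating update in the encoding phase amounts to one extra gate on top of a standard LSTM cell. The NumPy sketch below is an assumption-laden paraphrase of the abstract, not the released code: the weight shapes, gate names, and the exact form of the field term are illustrative.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

rng = np.random.default_rng(2)
d = 6
x_t = rng.normal(size=d)                 # word embedding at this step
z_t = rng.normal(size=d)                 # field (column-name) embedding
h_prev, c_prev = np.zeros(d), np.zeros(d)
W = {k: rng.normal(scale=0.1, size=(d, 2 * d)) for k in "ifocl"}

xh = np.concatenate([x_t, h_prev])
i_t = sigmoid(W["i"] @ xh)               # input gate
f_t = sigmoid(W["f"] @ xh)               # forget gate
o_t = sigmoid(W["o"] @ xh)               # output gate
c_hat = np.tanh(W["c"] @ xh)             # candidate cell content
l_t = sigmoid(W["l"] @ xh)               # field gate
# The extra field-gate term writes field information into the cell memory,
# alongside the usual forget/input contributions.
c_t = f_t * c_prev + i_t * c_hat + l_t * np.tanh(z_t)
h_t = o_t * np.tanh(c_t)
```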

AAAI Conference 2017 Conference Paper

Attentive Interactive Neural Networks for Answer Selection in Community Question Answering

  • Xiaodong Zhang
  • Sujian Li
  • Lei Sha
  • Houfeng Wang

Answer selection plays a key role in community question answering (CQA). Previous research on answer selection usually ignores the problems of redundancy and noise prevalent in CQA. In this paper, we propose to treat different text segments differently and design a novel attentive interactive neural network (AI-NN) to focus on those text segments useful for answer selection. The representations of the question and answer are first learned by convolutional neural networks (CNNs) or other neural network architectures. Then AI-NN learns the interactions of each pair of segments from the two texts. Row-wise and column-wise pooling are used afterwards to collect the interactions. We adopt an attention mechanism to measure the importance of each segment and combine the interactions to obtain fixed-length representations for the question and answer. Experimental results on the SemEval-2016 CQA dataset demonstrate that AI-NN outperforms state-of-the-art methods.