Author name cluster

Hao-Ran Wei

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

3 papers

1 author row

NeurIPS Conference 2025 Conference Paper

PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts

Yiming Wang
Pei Zhang
Jialong Tang
Hao-Ran Wei
Baosong Yang
Rui Wang
Chenshu Sun
Feitong Sun

In this paper, we introduce PolyMath, a multilingual mathematical reasoning benchmark covering 18 languages and 4 easy-to-hard difficulty levels. Our benchmark ensures difficulty comprehensiveness, language diversity, and high-quality translation, making it a highly discriminative multilingual mathematical benchmark in the era of reasoning LLMs. We conduct a comprehensive evaluation of advanced LLMs and find that even Qwen-3-235B-A22B-Thinking and Gemini-2.5-pro achieve benchmark scores of only 54.6 and 52.2, respectively, with about 40% accuracy at the highest level. From a language perspective, our benchmark reveals several key challenges of LLMs in multilingual reasoning: (1) reasoning performance varies widely across languages for current LLMs; (2) input-output language consistency is low in reasoning LLMs and may be correlated with performance; (3) thinking length differs significantly by language for current LLMs. Additionally, we demonstrate that controlling the output language in the instructions has the potential to affect reasoning performance, especially for some low-resource languages, suggesting a promising direction for improving multilingual capabilities in LLMs.

AAAI Conference 2020 Conference Paper

Generating Diverse Translation by Manipulating Multi-Head Attention

Zewei Sun
Shujian Huang
Hao-Ran Wei
Xin-yu Dai
Jiajun Chen

The Transformer model (Vaswani et al. 2017) has been widely used in machine translation tasks and has obtained state-of-the-art results. In this paper, we report an interesting phenomenon in its encoder-decoder multi-head attention: different attention heads of the final decoder layer align to different word translation candidates. We empirically verify this discovery and propose a method to generate diverse translations by manipulating heads. Furthermore, we use these diverse translations with the back-translation technique for better data augmentation. Experimental results show that our method generates diverse translations without a severe drop in translation quality. Experiments also show that back-translation with these diverse translations can bring a significant improvement in performance on translation tasks. An auxiliary experiment on a conversation response generation task also demonstrates the effect of diversity.

NeurIPS Conference 2020 Conference Paper

Incorporating BERT into Parallel Sequence Decoding with Adapters

Junliang Guo
Zhirui Zhang
Linli Xu
Hao-Ran Wei
Boxing Chen
Enhong Chen

While large-scale pre-trained language models such as BERT have achieved great success on various natural language understanding tasks, how to efficiently and effectively incorporate them into sequence-to-sequence models and the corresponding text generation tasks remains a non-trivial problem. In this paper, we propose to address this problem by taking two different BERT models as the encoder and decoder respectively, and fine-tuning them by introducing simple and lightweight adapter modules, which are inserted between BERT layers and tuned on the task-specific dataset. In this way, we obtain a flexible and efficient model that is able to jointly leverage the information contained in the source-side and target-side BERT models, while bypassing the catastrophic forgetting problem. Each component in the framework can be considered a plug-in unit, making the framework flexible and task-agnostic. Considering the bidirectional and conditionally independent nature of BERT, our framework is based on a parallel sequence decoding algorithm named Mask-Predict, and it can easily be adapted to traditional autoregressive decoding. We conduct extensive experiments on neural machine translation tasks, where the proposed method consistently outperforms autoregressive baselines while halving the inference latency, and achieves 36.49/33.57 BLEU on IWSLT14 German-English/WMT14 German-English translation. When adapted to autoregressive decoding, the proposed method achieves 30.60/43.56 BLEU on WMT14 English-German/English-French translation, on par with state-of-the-art baseline models.
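The Mask-Predict decoding loop that this framework builds on can be illustrated with a toy sketch. This is not the authors' code. It assumes a fixed, already-known target length, a hypothetical `predict` function standing in for the BERT-based decoder, and the linear re-masking schedule described in the original Mask-Predict paper (Ghazvininejad et al. 2019). The names `mask_predict`, `Predictor`, and `MASK` are illustrative only.

```python
from typing import Callable, List, Tuple

MASK = "<mask>"

# Hypothetical model interface (an assumption, not a real library API):
# given the current, partially masked target, return a (token, probability)
# prediction for every target position.
Predictor = Callable[[List[str]], List[Tuple[str, float]]]


def mask_predict(predict: Predictor, length: int, iterations: int) -> List[str]:
    """Toy sketch of Mask-Predict decoding with a known target length.

    Iteration 0 predicts every position in parallel from a fully masked
    target. Each later iteration t re-masks the n = N * (T - t) / T tokens
    with the lowest confidence and re-predicts only those positions.
    """
    tokens = [MASK] * length
    probs = [0.0] * length

    # Iteration 0: every position is masked, so predict all of them at once.
    for i, (tok, p) in enumerate(predict(tokens)):
        tokens[i], probs[i] = tok, p

    for t in range(1, iterations):
        # Linear decay of the number of re-masked tokens.
        n = length * (iterations - t) // iterations
        if n == 0:
            break
        # Re-mask the positions with the lowest current confidence.
        masked = sorted(range(length), key=lambda i: probs[i])[:n]
        for i in masked:
            tokens[i] = MASK
        predictions = predict(tokens)
        # Update only the re-masked positions; the others keep their tokens.
        for i in masked:
            tokens[i], probs[i] = predictions[i]
    return tokens
```

Keeping high-confidence tokens fixed while shrinking the set of revised positions each round is what lets the decoder refine a parallel draft in a constant number of passes. A full system would also predict the target length and decode several length candidates in parallel, which this sketch leaves out.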

PDF Details