Author name cluster

Mao Yang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

8 papers

1 author row

NeurIPS Conference 2025 Conference Paper

rStar-Coder: Scaling Competitive Code Reasoning with a Large-Scale Verified Dataset

Yifei Liu
Li Lyna Zhang
Yi Zhu
Bingcheng Dong
Xudong Zhou
Ning Shang
Fan Yang
Cheng Li

Advancing code reasoning in large language models (LLMs) is fundamentally limited by the scarcity of high-difficulty datasets, especially those with verifiable input-output test cases necessary for rigorous solution validation at scale. We introduce rStar-Coder, which significantly improves LLM code reasoning capabilities by constructing a large-scale, verified dataset of 418K competition-level code problems, 580K long-reasoning solutions along with rich test cases of varying difficulty. This is achieved through three core contributions: (1) we curate competitive programming code problems and solutions to synthesize new, solvable problems; (2) we introduce a reliable input-output test case synthesis pipeline that decouples the generation into a three-step input generation method and a mutual verification mechanism for effective output labeling; (3) we augment problems with high-quality, test-case-verified long-reasoning solutions. Extensive experiments on Qwen models (1. 5B-14B) across various code reasoning benchmarks demonstrate the superiority of rStar-Coder dataset, achieving leading performance comparable to frontier reasoning LLMs with significantly smaller model sizes. On LiveCodeBench, rStar-Coder improves Qwen2. 5-7B from 17. 4% to an impressive 57. 3%, and Qwen2. 5-14B from 23. 3% to 62. 5%, surpassing o3-mini (low) by 3. 1%. On the more challenging USA Computing Olympiad, our 7B model achieves an average pass@1 accuracy of 16. 15%, outperforming the frontier-level QWQ-32B. rStar-Coder dataset is publicly available at https: //huggingface. co/datasets/microsoft/rStar-Coder.

PDF Details

NeurIPS Conference 2025 Conference Paper

SeerAttention: Self-distilled Attention Gating for Efficient Long-context Prefilling

Yizhao Gao
Zhichen Zeng
DaYou Du
Shijie Cao
Peiyuan Zhou
Jiaxing Qi
Junjie Lai
Hayden So

Attention is the cornerstone of modern Large Language Models (LLMs). Yet its quadratic complexity hinders efficiency and scalability, especially for long-context processing. A promising approach is to leverage sparsity in attention. However, existing sparsity-based solutions predominantly rely on predefined patterns or heuristics at the attention head level, struggling to adapt dynamically to different contexts efficiently. We propose SeerAttention, a simple yet effective attention mechanism that directly learns the block-level attention sparsity from the LLM itself. Inspired by the gating mechanism in Mixture of Experts (MoE), SeerAttention augments the conventional attention with a learnable gate that selectively activates important blocks within the attention map. Specifically, the gate first pools the query (Q) and key (K) tensors along the sequence dimension and processes them through learnable linear layers. The resulting matrices are then multiplied together to produce the gating scores, which are used to predict block-level attention sparsity. Combined with our block-sparse FlashAttention kernel, SeerAttention can achieve significant speedup on GPUs. When applied to pre-trained LLMs, SeerAttention only requires training the gate parameters in a lightweight self-distillation manner, allowing rapid convergence. Our evaluation results demonstrate that SeerAttention achieves better model accuracy and lower latency for long-context pre-filling compared to prior methods. Code is available at: https: //github. com/microsoft/SeerAttention.

PDF Details

NeurIPS Conference 2023 Conference Paper

Model-enhanced Vector Index

Hailin Zhang
Yujing Wang
Qi Chen
Ruiheng Chang
Ting Zhang
Ziming Miao
Yingyan Hou
Yang Ding

Embedding-based retrieval methods construct vector indices to search for document representations that are most similar to the query representations. They are widely used in document retrieval due to low latency and decent recall performance. Recent research indicates that deep retrieval solutions offer better model quality, but are hindered by unacceptable serving latency and the inability to support document updates. In this paper, we aim to enhance the vector index with end-to-end deep generative models, leveraging the differentiable advantages of deep retrieval models while maintaining desirable serving efficiency. We propose Model-enhanced Vector Index (MEVI), a differentiable model-enhanced index empowered by a twin-tower representation model. MEVI leverages a Residual Quantization (RQ) codebook to bridge the sequence-to-sequence deep retrieval and embedding-based models. To substantially reduce the inference time, instead of decoding the unique document ids in long sequential steps, we first generate some semantic virtual cluster ids of candidate documents in a small number of steps, and then leverage the well-adapted embedding vectors to further perform a fine-grained search for the relevant documents in the candidate virtual clusters. We empirically show that our model achieves better performance on the commonly used academic benchmarks MSMARCO Passage and Natural Questions, with comparable serving latency to dense retrieval solutions.

PDF Details

NeurIPS Conference 2022 Conference Paper

A Neural Corpus Indexer for Document Retrieval

Yujing Wang
Yingyan Hou
Haonan Wang
Ziming Miao
Shibin Wu
Qi Chen
Yuqing Xia
Chengmin Chi

Current state-of-the-art document retrieval solutions mainly follow an index-retrieve paradigm, where the index is hard to be directly optimized for the final retrieval target. In this paper, we aim to show that an end-to-end deep neural network unifying training and indexing stages can significantly improve the recall performance of traditional methods. To this end, we propose Neural Corpus Indexer (NCI), a sequence-to-sequence network that generates relevant document identifiers directly for a designated query. To optimize the recall performance of NCI, we invent a prefix-aware weight-adaptive decoder architecture, and leverage tailored techniques including query generation, semantic document identifiers, and consistency-based regularization. Empirical studies demonstrated the superiority of NCI on two commonly used academic benchmarks, achieving +21. 4% and +16. 8% relative enhancement for Recall@1 on NQ320k dataset and R-Precision on TriviaQA dataset, respectively, compared to the best baseline method.

PDF Details

AAAI Conference 2021 Conference Paper

OpEvo: An Evolutionary Method for Tensor Operator Optimization

Xiaotian Gao
Wei Cui
Lintao Zhang
Mao Yang

Training and inference efficiency of deep neural networks highly rely on the performance of tensor operators on hardware platforms. Manually optimizing tensor operators has limitations in terms of supporting new operators or hardware platforms. Therefore, automatically optimizing device code configurations of tensor operators is getting increasingly attractive. However, current methods for tensor operator optimization usually suffer from poor sample-efficiency due to the combinatorial search space. In this work, we propose a novel evolutionary method, OpEvo, which efficiently explores the search spaces of tensor operators by introducing a topology-aware mutation operation based on q-random walk to leverage the topological structures over the search spaces. Our comprehensive experiment results show that compared with state-of-the-art (SOTA) methods OpEvo can find the best configuration with the lowest variance and least efforts in the number of trials and wall-clock time. All code of this work is available online.

PDF Details

NeurIPS Conference 2021 Conference Paper

SPANN: Highly-efficient Billion-scale Approximate Nearest Neighborhood Search

Qi Chen
Bing Zhao
Haidong Wang
Mingqin Li
ChuanJie Liu
Zengzhong Li
Mao Yang
Jingdong Wang

The in-memory algorithms for approximate nearest neighbor search (ANNS) have achieved great success for fast high-recall search, but are extremely expensive when handling very large scale database. Thus, there is an increasing request for the hybrid ANNS solutions with small memory and inexpensive solid-state drive (SSD). In this paper, we present a simple but efficient memory-disk hybrid indexing and search system, named SPANN, that follows the inverted index methodology. It stores the centroid points of the posting lists in the memory and the large posting lists in the disk. We guarantee both disk-access efficiency (low latency) and high recall by effectively reducing the disk-access number and retrieving high-quality posting lists. In the index-building stage, we adopt a hierarchical balanced clustering algorithm to balance the length of posting lists and augment the posting list by adding the points in the closure of the corresponding clusters. In the search stage, we use a query-aware scheme to dynamically prune the access of unnecessary posting lists. Experiment results demonstrate that SPANN is 2X faster than the state-of-the-art ANNS solution DiskANN to reach the same recall quality 90% with same memory cost in three billion-scale datasets. It can reach 90% recall@1 and recall@10 in just around one millisecond with only about 10% of original memory cost. Code is available at: https: //github. com/microsoft/SPTAG.

PDF Details

NeurIPS Conference 2021 Conference Paper

WRENCH: A Comprehensive Benchmark for Weak Supervision

Jieyu Zhang
Yue Yu
Yujing Wang
Yaming Yang
Mao Yang
Alexander Ratner

Recent Weak Supervision (WS) approaches have had widespread success in easing the bottleneck of labeling training data for machine learning by synthesizing labels from multiple potentially noisy supervision sources. However, proper measurement and analysis of these approaches remain a challenge. First, datasets used in existing works are often private and/or custom, limiting standardization. Second, WS datasets with the same name and base data often vary in terms of the labels and weak supervision sources used, a significant "hidden" source of evaluation variance. Finally, WS studies often diverge in terms of the evaluation protocol and ablations used. To address these problems, we introduce a benchmark platform, WRENCH, for thorough and standardized evaluation of WS approaches. It consists of 22 varied real-world datasets for classification and sequence tagging; a range of real, synthetic, and procedurally-generated weak supervision sources; and a modular, extensible framework for WS evaluation, including implementations for popular WS methods. We use WRENCH to conduct extensive comparisons over more than 120 method variants to demonstrate its efficacy as a benchmark platform. The code is available at https: //github. com/JieyuZ2/wrench.

PDF Details

AAAI Conference 2020 Conference Paper

TextNAS: A Neural Architecture Search Space Tailored for Text Representation

Yujing Wang
Yaming Yang
Yiren Chen
Jing Bai
Ce Zhang
Guinan Su
Xiaoyu Kou
Yunhai Tong

Learning text representation is crucial for text classiﬁcation and other language related tasks. There are a diverse set of text representation networks in the literature, and how to ﬁnd the optimal one is a non-trivial problem. Recently, the emerging Neural Architecture Search (NAS) techniques have demonstrated good potential to solve the problem. Nevertheless, most of the existing works of NAS focus on the search algorithms and pay little attention to the search space. In this paper, we argue that the search space is also an important human prior to the success of NAS in different applications. Thus, we propose a novel search space tailored for text representation. Through automatic search, the discovered network architecture outperforms state-of-the-art models on various public datasets on text classiﬁcation and natural language inference tasks. Furthermore, some of the design principles found in the automatic network agree well with human intuition.

PDF Details