Arrow Research

Author name cluster

Qin Lu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

7 papers
1 author row

Possible papers (7)

NeurIPS 2025 Conference Paper

Ask a Strong LLM Judge when Your Reward Model is Uncertain

  • Zhenghao Xu
  • Qin Lu
  • Qingru Zhang
  • Liang Qiu
  • Ilgee Hong
  • Changlong Yu
  • Wenlin Yao
  • Yao Liu

Reward models (RMs) play a pivotal role in reinforcement learning with human feedback (RLHF) for aligning large language models (LLMs). However, classical RMs trained on human preferences are vulnerable to reward hacking and generalize poorly to out-of-distribution (OOD) inputs. By contrast, strong LLM judges equipped with reasoning capabilities demonstrate superior generalization, even without additional training, but incur significantly higher inference costs, limiting their applicability in online RLHF. In this work, we propose an uncertainty-based routing framework that efficiently complements a fast RM with a strong but costly LLM judge. Our approach formulates advantage estimation in policy gradient (PG) methods as pairwise preference classification, enabling principled uncertainty quantification to guide routing. Uncertain pairs are forwarded to the LLM judge, while confident ones are evaluated by the RM. Experiments on RM benchmarks demonstrate that our uncertainty-based routing strategy significantly outperforms random judge calling at the same cost, and downstream alignment results showcase its effectiveness in improving online RLHF.
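The routing rule the abstract describes can be illustrated with a toy uncertainty criterion: score each pair with the cheap RM, escalate to the LLM judge only when the RM's pairwise preference is near a coin flip. This is a minimal stdlib sketch, not the paper's actual method; the function names and the entropy threshold are assumptions.

```python
import math

def preference_entropy(p: float) -> float:
    """Binary entropy of the RM's pairwise preference probability p = P(A beats B).

    Entropy is 0 when the RM is certain (p near 0 or 1) and 1 at p = 0.5.
    """
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def route(p: float, threshold: float = 0.9) -> str:
    """Send the pair to the costly LLM judge only when the RM is uncertain."""
    if preference_entropy(p) >= threshold:
        return "llm_judge"
    return "reward_model"

# A confident RM (p = 0.98) keeps the pair; a near-tie (p = 0.55) is escalated.
```

The threshold trades judge-call budget against label quality; calibrating it on held-out pairs would stand in for the paper's principled uncertainty quantification.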

NeurIPS 2025 Conference Paper

Think-RM: Enabling Long-Horizon Reasoning in Generative Reward Models

  • Ilgee Hong
  • Changlong Yu
  • Liang Qiu
  • Weixiang Yan
  • Zhenghao Xu
  • Haoming Jiang
  • Qingru Zhang
  • Qin Lu

Reinforcement learning from human feedback (RLHF) has become a powerful post-training paradigm for aligning large language models with human preferences. A core challenge in RLHF is constructing accurate reward signals, where the conventional Bradley-Terry reward models (BT RMs) often suffer from sensitivity to data size and coverage, as well as vulnerability to reward hacking. Generative reward models (GenRMs) offer a more robust alternative by generating chain-of-thought (CoT) rationales followed by a final verdict. However, existing GenRMs rely on shallow, vertically scaled reasoning, limiting their capacity to handle nuanced or complex tasks. Moreover, their pairwise preference outputs are incompatible with standard RLHF algorithms that require pointwise reward signals. In this work, we introduce Think-RM, a training framework that enables long-horizon reasoning in GenRMs by modeling an internal thinking process. Rather than producing structured, externally provided rationales, Think-RM generates flexible, self-guided reasoning traces that support advanced capabilities such as self-reflection, hypothetical reasoning, and divergent reasoning. To elicit these reasoning abilities, we first warm up the models by supervised fine-tuning (SFT) over long CoT data. We then further improve the model's long-horizon abilities by rule-based reinforcement learning (RL). In addition, we propose a novel pairwise RLHF pipeline that directly optimizes policies from pairwise comparisons, eliminating the need for pointwise reward conversion. Experiments show that Think-RM outperforms baselines on both in-distribution and out-of-distribution tasks, with particularly strong gains on reasoning-heavy benchmarks: more than 10% and 5% on RewardBench's Chat Hard and Reasoning, and 12% on RM-Bench's Math domain. When combined with our pairwise RLHF pipeline, it demonstrates superior end-policy performance compared to traditional approaches. This depth-oriented approach not only broadens the GenRM design space but also establishes a new paradigm for preference-based policy optimization in RLHF.
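For reference, the conventional Bradley-Terry objective that the abstract contrasts GenRMs against reduces to a logistic loss on the reward margin between the chosen and rejected responses. A minimal sketch, with illustrative names; this is the standard BT formulation, not code from the paper:

```python
import math

def bt_nll(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry negative log-likelihood for one preference pair:
    -log sigmoid(r_chosen - r_rejected).

    log1p(exp(-margin)) is the softplus form of the same quantity.
    """
    margin = r_chosen - r_rejected
    return math.log1p(math.exp(-margin))
```

A tied pair (zero margin) costs log 2, and the loss shrinks monotonically as the chosen response's reward pulls ahead, which is what pushes a pointwise RM to separate preferred from rejected outputs.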

IJCAI 2016 Conference Paper

Intersubjectivity and Sentiment: From Language to Knowledge

  • Lin Gui
  • Ruifeng Xu
  • Yulan He
  • Qin Lu
  • Zhongyu Wei

Intersubjectivity is an important concept in psychology and sociology. It refers to sharing conceptualizations through social interactions in a community and using such shared conceptualization as a resource to interpret things that happen in everyday life. In this work, we make use of intersubjectivity as the basis to model shared stance and subjectivity for sentiment analysis. We construct an intersubjectivity network which links review writers, the terms they used, and the polarities of those terms. Based on this network model, we propose a method to learn writer embeddings which are subsequently incorporated into a convolutional neural network for sentiment analysis. Evaluations on the IMDB, Yelp 2013 and Yelp 2014 datasets show that the proposed approach achieves state-of-the-art performance.

AAAI 2016 Conference Paper

ROOT13: Spotting Hypernyms, Co-Hyponyms and Randoms

  • Enrico Santus
  • Alessandro Lenci
  • Tin-Shing Chiu
  • Qin Lu
  • Chu-Ren Huang

In this paper, we describe ROOT13, a supervised system for the classification of hypernyms, co-hyponyms and random words. The system relies on a Random Forest algorithm and 13 unsupervised corpus-based features. We evaluate it with a 10-fold cross validation on 9,600 pairs, equally distributed among the three classes and involving several parts of speech (i.e., adjectives, nouns and verbs). When all the classes are present, ROOT13 achieves an F1 score of 88.3%, against a baseline of 57.6% (vector cosine). When the classification is binary, ROOT13 achieves the following results: hypernyms-co-hyponyms (93.4% vs. 60.2%), hypernyms-random (92.3% vs. 65.5%) and co-hyponyms-random (97.3% vs. 81.5%). Our results are competitive with state-of-the-art models.
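The 10-fold cross-validation protocol used in the evaluation can be sketched with stdlib index splitting: partition the 9,600 pairs into 10 folds, train on 9 and test on the held-out one, rotating through all folds. A generic sketch, not the authors' code:

```python
def k_fold_indices(n: int, k: int = 10):
    """Yield (train, test) index lists for k-fold cross validation.

    Folds are contiguous index ranges; the first n % k folds absorb the
    remainder so every example lands in exactly one test fold.
    """
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    for i, test in enumerate(folds):
        train = [idx for j, f in enumerate(folds) if j != i for idx in f]
        yield train, test
```

In practice the 9,600 pairs would be shuffled (stratified by class) before indexing, so each fold keeps the three classes balanced.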

AAAI 2016 Conference Paper

Unsupervised Measure of Word Similarity: How to Outperform Co-Occurrence and Vector Cosine in VSMs

  • Enrico Santus
  • Alessandro Lenci
  • Tin-Shing Chiu
  • Qin Lu
  • Chu-Ren Huang

In this paper, we claim that vector cosine – which is generally considered among the most efficient unsupervised measures for identifying word similarity in Vector Space Models – can be outperformed by an unsupervised measure that calculates the extent of the intersection among the most mutually dependent contexts of the target words. To prove it, we describe and evaluate APSyn, a variant of the Average Precision that, without any optimization, outperforms the vector cosine and the co-occurrence on the standard ESL test set, with an improvement ranging between +9.00% and +17.98%, depending on the number of chosen top contexts.
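The intersection measure described above can be approximated as follows: take each word's top-N contexts ranked by mutual dependence, and reward the contexts both lists share, weighted by the inverse of their average rank. This is a sketch of the general idea; the exact APSyn weighting in the paper may differ.

```python
def apsyn(top_contexts_a, top_contexts_b):
    """APSyn-style similarity: sum 1 / avg_rank(c) over contexts c shared
    by both words' top-N context lists (ranks start at 1, best first)."""
    rank_a = {c: i + 1 for i, c in enumerate(top_contexts_a)}
    rank_b = {c: i + 1 for i, c in enumerate(top_contexts_b)}
    shared = rank_a.keys() & rank_b.keys()
    return sum(1.0 / ((rank_a[c] + rank_b[c]) / 2.0) for c in shared)
```

Unlike cosine over full vectors, the score depends only on the overlap of the most salient contexts, so a shared top-ranked context contributes far more than a shared low-ranked one.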

IJCAI 2011 Conference Paper

A Wikipedia Based Semantic Graph Model for Topic Tracking in Blogsphere

  • Jintao Tang
  • Ting Wang
  • Qin Lu
  • Ji Wang
  • Wenjie Li

There are two key issues for information diffusion in the blogosphere: (1) blog posts are usually short, noisy, and contain multiple themes; (2) information diffusion through the blogosphere is primarily driven by the "word-of-mouth" effect, thus making topics evolve very fast. This paper presents a novel topic tracking approach to deal with these issues by modeling a topic as a semantic graph in which the semantic relatedness between terms is learned from Wikipedia. For a given topic/post, the named entities, Wikipedia concepts, and the semantic relatedness are extracted to generate the graph model. Noise is filtered out through a graph clustering algorithm. To handle topic evolution, the topic model is enriched using Wikipedia as background knowledge. Furthermore, graph edit distance is used to measure the similarity between a topic and its posts. The proposed method is tested on real-world blog data. Experimental results show the advantage of the proposed method in tracking topics in short, noisy text.
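Exact graph edit distance is expensive to compute in general, but the kind of topic-to-post comparison described above can be illustrated with a crude unit-cost proxy over node and edge sets. The graph representation and costs here are assumptions for illustration, not the paper's algorithm:

```python
def graph_edit_cost(g1, g2):
    """Unit-cost insert/delete count between two graphs, each given as
    (nodes, edges): nodes is a set of term labels, edges a set of
    frozenset pairs. A crude proxy for true graph edit distance.
    """
    nodes1, edges1 = g1
    nodes2, edges2 = g2
    # symmetric difference = elements present in one graph but not the other
    return len(nodes1 ^ nodes2) + len(edges1 ^ edges2)

def graph_similarity(g1, g2):
    """Map the edit cost into (0, 1]: identical graphs score 1.0."""
    return 1.0 / (1.0 + graph_edit_cost(g1, g2))
```

A post whose concept graph shares most nodes and edges with the topic graph gets a high score, while a post on a drifting theme accumulates insert/delete cost and falls away.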