Author name cluster

Junjie Ye

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

11 papers

1 author row

AAAI Conference 2026 Conference Paper

MetaAct-RL: Training Language Models for Reasoning Through Meta-Action-Based Reinforcement Learning

Zhiheng Xi
Yuhui Wang
Yiwen Ding
Guanyu Li
Senjie Jin
Shichun Liu
Jixuan Huang
Dingwen Yang

Outcome-based reinforcement learning has made notable advances in training language models (LMs) for reasoning. However, without explicit incentives and controls, this paradigm has limitations and instability in eliciting high-quality reasoning trajectories with diverse actions—particularly for models whose pretraining lacked extensive reasoning-related data. To this end, we introduce MetaAct-RL, a new RL framework that frames LMs’ thinking as sequential decision making over meta-actions. In this framework, the model chooses and executes a high-level action at each step—such as forward reasoning, critique, or refinement—to gradually reach the correct answer. To encourage deeper exploration, richer action diversity, and to improve sampling efficiency in the RL optimization process, MetaAct-RL incorporates appropriate length-based reward and regularization, and a key-state restart mechanism. Extensive experiments across six benchmarks show that MetaAct-RL improves reasoning performance by 7.99 on Llama3.2-1B and 7.17 on Llama3.1-8B relative to vanilla RL method. Moreover, on the challenging AIME-2024, our method outperforms the vanilla RL by 7.5 with Qwen2.5-1.5B.

PDF Details DOI

AAAI Conference 2026 Conference Paper

What Makes a Good Speech Tokenizer for LLM-Centric Speech Generation? A Systematic Study

Xiaoran Fan
Zhichao Sun
Yangfan Gao
Jingfei Xiong
Hang Yan
Yifei Cao
Jiajun Sun
Shuo Li

Speech-language models (SLMs) offer a promising path toward unifying speech and text understanding and generation. However, challenges remain in achieving effective cross-modal alignment and high-quality speech generation. In this work, we systematically investigate the role of speech tokenizer designs in LLM-centric SLMs, augmented by speech heads and speaker modeling. We compare coupled, semi-decoupled, and fully decoupled speech tokenizers under a fair SLM framework and find that decoupled tokenization significantly improves alignment and synthesis quality. To address the information density mismatch between speech and text, we introduce multi-token prediction (MTP) into SLMs, enabling each hidden state to decode multiple speech tokens. This leads to up to 12× faster decoding and a substantial drop in word error rate (from 6.07 to 3.01). Furthermore, we propose a speaker-aware generation paradigm and introduce RoleTriviaQA, a large-scale role-playing knowledge QA benchmark with diverse speaker identities. Experiments demonstrate that our methods enhance both knowledge understanding and speaker consistency.

PDF Details DOI

AAAI Conference 2025 Conference Paper

Alleviating Shifted Distribution in Human Preference Alignment through Meta-Learning

Shihan Dou
Yan Liu
Enyu Zhou
Songyang Gao
Tianlong Li
Limao Xiong
Xin Zhao
Haoxiang Jia

The capability of the reward model (RM) is crucial for the success of Reinforcement Learning from Human Feedback (RLHF) in aligning with human preferences. However, as training progresses, the output space distribution of the policy model shifts. The RM, initially trained on responses sampled from the output distribution of the early policy model, gradually loses its ability to distinguish between responses from the newly shifted distribution. This issue is further compounded when the RM, trained on a specific data distribution, struggles to generalize to examples outside of that distribution. These two issues can be united as a challenge posed by the shifted distribution of the environment. To surmount this challenge, we introduce MetaRM, a novel method leveraging meta-learning to adapt the RM to the shifted environment distribution. MetaRM optimizes the RM in an alternating way, by preserving both the preferences of the original preference pairs, as well as maximizing discrimination power over new examples of the shifted distribution. Extensive experiments demonstrate that MetaRM can iteratively enhance the performance of human preference alignment by improving the RM's capacity to identify subtle differences in samples of shifted distributions.

PDF Details DOI

AAAI Conference 2024 Conference Paper

OVD-Explorer: Optimism Should Not Be the Sole Pursuit of Exploration in Noisy Environments

Jinyi Liu
Zhi Wang
Yan Zheng
Jianye Hao
Chenjia Bai
Junjie Ye
Zhen Wang
Haiyin Piao

In reinforcement learning, the optimism in the face of uncertainty (OFU) is a mainstream principle for directing exploration towards less explored areas, characterized by higher uncertainty. However, in the presence of environmental stochasticity (noise), purely optimistic exploration may lead to excessive probing of high-noise areas, consequently impeding exploration efficiency. Hence, in exploring noisy environments, while optimism-driven exploration serves as a foundation, prudent attention to alleviating unnecessary over-exploration in high-noise areas becomes beneficial. In this work, we propose Optimistic Value Distribution Explorer (OVD-Explorer) to achieve a noise-aware optimistic exploration for continuous control. OVD-Explorer proposes a new measurement of the policy's exploration ability considering noise in optimistic perspectives, and leverages gradient ascent to drive exploration. Practically, OVD-Explorer can be easily integrated with continuous control RL algorithms. Extensive evaluations on the MuJoCo and GridChaos tasks demonstrate the superiority of OVD-Explorer in achieving noise-aware optimistic exploration.

PDF Details DOI

AAAI Conference 2024 Conference Paper

PreRoutGNN for Timing Prediction with Order Preserving Partition: Global Circuit Pre-training, Local Delay Learning and Attentional Cell Modeling

Ruizhe Zhong
Junjie Ye
Zhentao Tang
Shixiong Kai
Mingxuan Yuan
Jianye Hao
Junchi Yan

Pre-routing timing prediction has been recently studied for evaluating the quality of a candidate cell placement in chip design. It involves directly estimating the timing metrics for both pin-level (slack, slew) and edge-level (net delay, cell delay), without time-consuming routing. However, it often suffers from signal decay and error accumulation due to the long timing paths in large-scale industrial circuits. To address these challenges, we propose a two-stage approach. First, we propose global circuit training to pre-train a graph auto-encoder that learns the global graph embedding from circuit netlist. Second, we use a novel node updating scheme for message passing on GCN, following the topological sorting sequence of the learned graph embedding and circuit graph. This scheme residually models the local time delay between two adjacent pins in the updating sequence, and extracts the lookup table information inside each cell via a new attention mechanism. To handle large-scale circuits efficiently, we introduce an order preserving partition scheme that reduces memory consumption while maintaining the topological dependencies. Experiments on 21 real world circuits achieve a new SOTA R2 of 0.93 for slack prediction, which is significantly surpasses 0.59 by previous SOTA method. Code will be available at: https://github.com/Thinklab-SJTU/EDA-AI.

PDF Details DOI

TCS Journal 2022 Journal Article

A 5k-vertex kernel for P2-packing

Wenjun Li
Junjie Ye
Yixin Cao

The P 2 -packing problem asks whether a graph contains k vertex-disjoint (not necessarily induced) paths each of length two. We continue the study of its kernelization algorithms, and develop a 5k-vertex kernel.

Details DOI

NeurIPS Conference 2022 Conference Paper

The Policy-gradient Placement and Generative Routing Neural Networks for Chip Design

Ruoyu Cheng
Xianglong Lyu
Yang Li
Junjie Ye
Jianye Hao
Junchi Yan

Placement and routing are two critical yet time-consuming steps of chip design in modern VLSI systems. Distinct from traditional heuristic solvers, this paper on one hand proposes an RL-based model for mixed-size macro placement, which differs from existing learning-based placers that often consider the macro by coarse grid-based mask. While the standard cells are placed via gradient-based GPU acceleration. On the other hand, a one-shot conditional generative routing model, which is composed of a special-designed input-size-adapting generator and a bi-discriminator, is devised to perform one-shot routing to the pins within each net, and the order of nets to route is adaptively learned. Combining these techniques, we develop a flexible and efficient neural pipeline, which to our best knowledge, is the first joint placement and routing network without involving any traditional heuristic solver. Experimental results on chip design benchmarks showcase the effectiveness of our approach, with code that will be made publicly available.

PDF Details

AAAI Conference 2021 Conference Paper

Beyond Class-Conditional Assumption: A Primary Attempt to Combat Instance-Dependent Label Noise

Pengfei Chen
Junjie Ye
Guangyong Chen
Jingwei Zhao
Pheng-Ann Heng

Supervised learning under label noise has seen numerous advances recently, while existing theoretical findings and empirical results broadly build up on the class-conditional noise (CCN) assumption that the noise is independent of input features given the true label. In this work, we present a theoretical hypothesis testing and prove that noise in real-world dataset is unlikely to be CCN, which confirms that label noise should depend on the instance and justifies the urgent need to go beyond the CCN assumption. The theoretical results motivate us to study the more general and practical-relevant instancedependent noise (IDN). To stimulate the development of theory and methodology on IDN, we formalize an algorithm to generate controllable IDN and present both theoretical and empirical evidence to show that IDN is semantically meaningful and challenging. As a primary attempt to combat IDN, we present a tiny algorithm termed self-evolution average label (SEAL), which not only stands out under IDN with various noise fractions, but also improves the generalization on realworld noise benchmark Clothing1M. Our code is released1. Notably, our theoretical analysis in Section 2 provides rigorous motivations for studying IDN, which is an important topic that deserves more research attention in future.

PDF Details

AAAI Conference 2021 Conference Paper

Robustness of Accuracy Metric and its Inspirations in Learning with Noisy Labels

Pengfei Chen
Junjie Ye
Guangyong Chen
Jingwei Zhao
Pheng-Ann Heng

For multi-class classification under class-conditional label noise, we prove that the accuracy metric itself can be robust. We concretize this finding’s inspiration in two essential aspects: training and validation, with which we address critical issues in learning with noisy labels. For training, we show that maximizing training accuracy on sufficiently many noisy samples yields an approximately optimal classifier. For validation, we prove that a noisy validation set is reliable, addressing the critical demand of model selection in scenarios like hyperparameter-tuning and early stopping. Previously, model selection using noisy validation samples has not been theoretically justified. We verify our theoretical results and additional claims with extensive experiments. We show characterizations of models trained with noisy labels, motivated by our theoretical results, and verify the utility of a noisy validation set by showing the impressive performance of a framework termed noisy best teacher and student (NTS). Our code is released1.

PDF Details

TCS Journal 2019 Journal Article

Two edge-disjoint paths with length constraints

Leizhen Cai
Junjie Ye

We consider the problem of finding, for two pairs ( s 1, t 1 ) and ( s 2, t 2 ) of vertices in an undirected graph, an ( s 1, t 1 ) -path P 1 and an ( s 2, t 2 ) -path P 2 such that P 1 and P 2 share no edges and the length of each P i satisfies constraint L i, where L i ∈ { ≤ k i, = k i, ≥ k i, ⁎ } with L i = “ ⁎ ” indicating no length constraint on P i. We regard k 1 and k 2 as parameters and investigate the parameterized complexity of the above problem when at least one of P 1 and P 2 has a length constraint. For the 9 different cases of ( L 1, L 2 ), we obtain FPT algorithms for 7 of them by using random partition backed by some structural results. On the other hand, we prove that the problem admits no polynomial kernel for all 9 cases unless N P ⊆ c o N P / p o l y.

Details DOI

TCS Journal 2015 Journal Article

Parameterized complexity of finding connected induced subgraphs

Leizhen Cai
Junjie Ye

For a graph property Π, i. e. , a family Π of graphs, the Connected Induced Π -Subgraph problem asks whether an input graph G contains k vertices V ′ such that the induced subgraph G [ V ′ ] is connected and satisfies property Π. In this paper, we study the parameterized complexity of Connected Induced Π -Subgraph for decidable hereditary properties Π, and give a nearly complete characterization in terms of whether Π includes all complete graphs, all stars, and all paths. As a consequence, we obtain a complete characterization of the parameterized complexity of our problem when Π is the family of H-free graphs for a fixed graph H with h ≥ 3 vertices: W[1]-hard if H is K h, K h ‾, or K 1, h − 1; and FPT otherwise. Furthermore, we also settle the parameterized complexity of the problem for many well-known families Π of graphs: FPT for perfect graphs, chordal graphs, and interval graphs, but W[1]-hard for forests, bipartite graphs, planar graphs, line graphs, and degree-bounded graphs.

Details DOI