Arrow Research

Author name cluster

Junyu Lu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

7 papers
1 author row

Possible papers (7)

AAAI Conference 2026 Conference Paper

Multi-Agent VLMs Guided Self-Training with PNU Loss for Low-Resource Offensive Content Detection

  • Han Wang
  • Deyi Ji
  • Junyu Lu
  • Lanyun Zhu
  • Hailong Zhang
  • Haiyang Wu
  • Liqun Liu
  • Peng Shu

Accurate detection of offensive content on social media demands high-quality labeled data; however, such data is often scarce due to the low prevalence of offensive instances and the high cost of manual annotation. To address this low-resource challenge, we propose a self-training framework that leverages abundant unlabeled data through collaborative pseudo-labeling. Starting with a lightweight classifier trained on limited labeled data, our method iteratively assigns pseudo-labels to unlabeled instances with the support of Multi-Agent Vision-Language Models (MA-VLMs). Unlabeled data on which the classifier and MA-VLMs agree are designated as the Agreed-Unknown set, while conflicting samples form the Disagreed-Unknown set. To enhance label reliability, MA-VLMs simulate dual perspectives, moderator and user, capturing both regulatory and subjective viewpoints. The classifier is optimized using a novel Positive-Negative-Unlabeled (PNU) loss, which jointly exploits labeled, Agreed-Unknown, and Disagreed-Unknown data while mitigating pseudo-label noise. Experiments on benchmark datasets demonstrate that our framework substantially outperforms baselines under limited supervision and approaches the performance of large-scale models.
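
Illustrative sketch (not the paper's code): one way a PNU-style objective could combine the three data pools described above is a supervised term on the labeled set, a pseudo-label term on the Agreed-Unknown set, and a prior-matching term on the Disagreed-Unknown set. The weighting scheme, the `prior` / `agree_weight` / `disagree_weight` hyperparameters, and the binary-logit setup are assumptions made for illustration.

```python
# Illustrative PNU-style loss over labeled, Agreed-Unknown, and
# Disagreed-Unknown batches; weights and the prior term are assumptions.
import torch
import torch.nn.functional as F

def pnu_loss(logits_lab, y_lab, logits_agree, y_agree, logits_disagree,
             prior=0.2, agree_weight=0.5, disagree_weight=0.1):
    # Supervised risk on the small labeled set.
    sup = F.cross_entropy(logits_lab, y_lab)
    # Pseudo-labeled risk on the Agreed-Unknown set (classifier and MA-VLMs agree).
    agree = F.cross_entropy(logits_agree, y_agree)
    # Unlabeled risk on the Disagreed-Unknown set: nudge the predicted positive
    # rate toward an assumed class prior instead of trusting noisy labels.
    p_pos = torch.softmax(logits_disagree, dim=-1)[:, 1].mean()
    unl = (p_pos - prior).abs()
    return sup + agree_weight * agree + disagree_weight * unl

# Toy usage with random binary logits.
loss = pnu_loss(torch.randn(8, 2), torch.randint(0, 2, (8,)),
                torch.randn(16, 2), torch.randint(0, 2, (16,)),
                torch.randn(32, 2))
```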

IJCAI Conference 2025 Conference Paper

CoderAgent: Simulating Student Behavior for Personalized Programming Learning with Large Language Models

  • Yi Zhan
  • Qi Liu
  • Weibo Gao
  • Zheng Zhang
  • Tianfu Wang
  • Shuanghong Shen
  • Junyu Lu
  • Zhenya Huang

Personalized programming tutoring, such as exercise recommendation, can enhance learners' efficiency, motivation, and outcomes, which is increasingly important in modern digital education. However, the lack of sufficient and high-quality programming data, combined with the mismatch between offline evaluation and real-world learning, hinders the practical deployment of such systems. To address this challenge, many approaches attempt to simulate learner practice data, yet they often overlook the fine-grained, iterative nature of programming learning, resulting in a lack of interpretability and granularity. To fill this gap, we propose an LLM-based agent, CoderAgent, to simulate students' programming processes in a fine-grained manner without relying on real data. Specifically, we equip each human learner with an intelligent agent, the core of which lies in capturing the cognitive states of the human programming practice process. Inspired by ACT-R, a cognitive architecture framework, we design the structure of CoderAgent to align with human cognitive architecture by focusing on the mastery of programming knowledge and the application of coding ability. Recognizing the inherent patterns in multi-layered cognitive reasoning, we introduce the Programming Tree of Thought (PTOT), which breaks down the process into four steps: why, how, where, and what. This approach enables a detailed analysis of iterative problem-solving strategies. Finally, experimental evaluations on real-world datasets demonstrate that CoderAgent provides interpretable insights into learning trajectories and achieves accurate simulations, paving the way for personalized programming education.
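
Illustrative sketch (not the authors' code): the four PTOT steps (why, how, where, what) could be chained as successive prompts around a generic LLM call, with each step conditioned on the previous answers. The prompt wording and the `ask_llm` callable are assumptions for illustration.

```python
# Minimal sketch of chaining the four PTOT reasoning steps; prompts and the
# ask_llm helper are assumptions, not CoderAgent's implementation.
PTOT_STEPS = {
    "why":   "Why does the current attempt fall short on this exercise? Name the misconception.",
    "how":   "How should the learner's plan change to address that misconception?",
    "where": "Where in the code (which function or region) does the change apply?",
    "what":  "What concrete edit should be made? Output the revised snippet.",
}

def ptot_iteration(ask_llm, exercise, attempt, learner_state):
    """Simulate one practice iteration as four chained reasoning steps."""
    context = (f"Exercise:\n{exercise}\n\nCurrent attempt:\n{attempt}\n\n"
               f"Learner state: {learner_state}")
    trace = {}
    for name, question in PTOT_STEPS.items():
        trace[name] = ask_llm(f"{context}\n\n{question}\n\nPrior steps: {trace}")
    return trace  # the "what" entry holds the simulated next code edit
```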

AAAI Conference 2025 Conference Paper

GenAL: Generative Agent for Adaptive Learning

  • Rui Lv
  • Qi Liu
  • Weibo Gao
  • Haotian Zhang
  • Junyu Lu
  • Linbo Zhu

Adaptive learning, also known as adaptive teaching, relies on learning path recommendations that sequentially suggest personalized learning items (such as lectures and exercises) to meet the unique needs of each learner. Despite the extensive research in this field, previous approaches have primarily modeled the interaction sequences between learners and items using simple indexing, leading to three issues: (1) The utilization of information from both learners and items is not sufficient. For instance, these models are unable to leverage the semantic information contained within the textual content of the items. (2) Models need to be retrained on different datasets separately, which makes it difficult to adapt to the continuously expanding item pool in online educational scenarios. (3) The existing recommendation paradigm, based on trained reinforcement learning frameworks, suffers from unstable recommendation performance on sparse learning logs. To address these challenges, we propose a generalized Generative Agent for Adaptive Learning (GenAL), which integrates educational tools with LLMs' semantic understanding to enable effective and generalizable learning path recommendations across diverse data distributions. Specifically, our framework consists of two components: a Global Thinking Agent, which updates the learner profile and reflects on recommendation outcomes based on the learner's historical learning records, and a Local Teaching Agent, which recommends items using educational prior knowledge. Leveraging the LLM's robust semantic understanding, our framework does not rely on item indexing but instead extracts relevant information from the textual content. We evaluated our approach on three real-world datasets, and the experimental results demonstrate that our GenAL not only consistently outperforms all baselines but also exhibits strong generalization ability.
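
Illustrative sketch (not GenAL's implementation): the two-agent loop can be pictured as a Local Teaching Agent that picks the next item from its textual content and a Global Thinking Agent that reflects on the observed outcome and updates the learner profile. The `llm` and `observe` callables, the prompts, and the index-reply protocol are assumptions.

```python
# Rough sketch of a two-agent recommendation loop; all prompt wording and
# helper callables are illustrative assumptions.
def recommend_path(llm, observe, item_pool, learning_log, steps=5):
    # Global Thinking Agent: build an initial learner profile from the history.
    profile = llm(f"Summarize this learner's knowledge state:\n{learning_log}")
    path = []
    for _ in range(steps):
        # Local Teaching Agent: pick the next item from its textual content.
        catalog = "\n".join(f"[{i}] {text}" for i, text in enumerate(item_pool))
        choice = llm(f"Learner profile:\n{profile}\n\nCandidate items:\n{catalog}\n"
                     "Reply with the index of the single most suitable next item.")
        item = item_pool[int(choice.strip())]
        path.append(item)
        outcome = observe(item)  # learner feedback, e.g. correct / incorrect
        # Global Thinking Agent: reflect on the outcome and revise the profile.
        profile = llm(f"Previous profile:\n{profile}\nRecommended item:\n{item}\n"
                      f"Observed outcome:\n{outcome}\nUpdate the learner profile.")
    return path
```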

NeurIPS Conference 2025 Conference Paper

SEEA-R1: Tree-Structured Reinforcement Fine-Tuning for Self-Evolving Embodied Agents

  • Wanxin Tian
  • Shijie Zhang
  • Kevin Zhang
  • Xiaowei Chi
  • Chun-Kai Fan
  • Junyu Lu
  • Yulin Luo
  • Qiang Zhou

Self-evolution, the ability of agents to autonomously improve their reasoning and behavior, is essential for the embodied domain with long-horizon, real-world tasks. Despite current advancements in reinforcement fine-tuning (RFT) showing strong performance in enhancing reasoning in LLMs, its potential to enable self-evolving embodied intelligence with multi-modal interactions remains largely unexplored. Specifically, reinforcement fine-tuning faces two fundamental obstacles in embodied settings: (i) the lack of accessible intermediate rewards in multi-step reasoning tasks limits effective learning signals, and (ii) reliance on hand-crafted reward functions restricts generalization to novel tasks and environments. To address these challenges, we present Self-Evolving Embodied Agents-R1 (SEEA-R1), the first RFT framework designed for enabling the self-evolving capabilities of embodied agents. Specifically, to convert sparse delayed rewards into denser intermediate signals that improve multi-step reasoning, we propose Tree-based Group Relative Policy Optimization (Tree-GRPO), which integrates Monte Carlo Tree Search into GRPO. To generalize reward estimation across tasks and scenes, supporting autonomous adaptation and reward-driven self-evolution, we further introduce a Multi-modal Generative Reward Model (MGRM). To holistically evaluate the effectiveness of SEEA-R1, we evaluate it on the ALFWorld benchmark, surpassing state-of-the-art methods with scores of 85.07% (textual) and 46.27% (multi-modal), outperforming prior models including GPT-4o. SEEA-R1 also achieves scores of 80.3% (textual) and 44.03% (multi-modal) without ground truth reward, surpassing all open-source baselines and highlighting its scalability as a self-evolving embodied agent. Additional experiments and qualitative analysis further support the potential of SEEA-R1 for future research in scalable embodied intelligence. Project page is at https://seea-r1.github.io/.
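
Toy illustration (not SEEA-R1's implementation): the "group-relative" part of GRPO standardizes rollout returns within each group; in a tree-structured variant, a group could correspond to sibling branches expanded from the same search node and scored by a reward model. The grouping and normalization details below are assumptions.

```python
# Group-relative advantage computation over groups of rollout returns;
# grouping by tree node and the epsilon constant are assumptions.
import numpy as np

def group_relative_advantages(returns_per_group):
    """Standardize returns within each group of sibling rollouts."""
    advantages = []
    for returns in returns_per_group:
        r = np.asarray(returns, dtype=float)
        advantages.append((r - r.mean()) / (r.std() + 1e-8))
    return advantages

# Example: two tree nodes, each expanded into rollouts scored by a reward
# model (e.g., an MGRM-style learned scorer).
print(group_relative_advantages([[1.0, 0.0, 0.5], [0.2, 0.8]]))
```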

AAAI Conference 2025 Conference Paper

VERSE: Verification-based Self-Play for Code Instructions

  • Hao Jiang
  • Qi Liu
  • Rui Li
  • Yuze Zhao
  • Yixiao Ma
  • Shengyu Ye
  • Junyu Lu
  • Yu Su

Instruction-tuned Code Large Language Models (Code LLMs) have excelled in diverse code-related tasks, such as program synthesis, automatic program repair, and code explanation. To collect training datasets for instruction-tuning, a popular method involves having models autonomously generate instructions and corresponding responses. However, the direct generation of responses does not ensure functional correctness, a crucial requirement for generating responses to code instructions. To overcome this, we present Verification-Based Self-Play (VERSE), aiming to enhance model proficiency in generating correct responses. VERSE establishes a robust verification framework that covers various code instructions. Employing VERSE, Code LLMs engage in self-play to generate instructions and corresponding verifications. They evaluate execution results and self-consistency as verification outcomes, using them as scores to rank generated data for self-training. Experiments show that VERSE improves multiple base Code LLMs (average 7.6%) across various languages and tasks on many benchmarks, affirming its effectiveness.
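
Simplified sketch (not VERSE's implementation): each generated solution can be scored by executing model-generated tests and by a crude self-consistency proxy, here the fraction of sampled solutions that pass. The helper names, the 50/50 score mix, and the `exec`-based runner are assumptions; untrusted code should be sandboxed in practice.

```python
# Illustrative verification-based scoring of generated code; exec'ing
# untrusted code like this requires sandboxing in any real setting.
def run_tests(solution_code, test_code):
    """Return 1.0 if the generated tests pass against the solution, else 0.0."""
    env = {}
    try:
        exec(solution_code, env)
        exec(test_code, env)  # tests are assumed to assert on the solution
        return 1.0
    except Exception:
        return 0.0

def verse_scores(solutions, tests, consistency_weight=0.5):
    """Combine execution results with a simple self-consistency proxy."""
    exec_scores = [run_tests(s, t) for s, t in zip(solutions, tests)]
    consistency = sum(exec_scores) / max(len(exec_scores), 1)
    return [(1 - consistency_weight) * e + consistency_weight * consistency
            for e in exec_scores]
```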

IJCAI Conference 2025 Conference Paper

WDMIR: Wavelet-Driven Multimodal Intent Recognition

  • Weiyin Gong
  • Kai Zhang
  • Yanghai Zhang
  • Qi Liu
  • Xinjie Sun
  • Junyu Lu
  • Linbo Zhu

Multimodal intent recognition (MIR) seeks to accurately interpret user intentions by integrating verbal and non-verbal information across video, audio and text modalities. While existing approaches prioritize text analysis, they often overlook the rich semantic content embedded in non-verbal cues. This paper presents a novel Wavelet-Driven Multimodal Intent Recognition (WDMIR) framework that enhances intent understanding through frequency-domain analysis of non-verbal information. To be more specific, we propose: (1) a wavelet-driven fusion module that performs synchronized decomposition and integration of video-audio features in the frequency domain, enabling fine-grained analysis of temporal dynamics; (2) a cross-modal interaction mechanism that facilitates progressive feature enhancement from bimodal to trimodal integration, effectively bridging the semantic gap between verbal and non-verbal information. Extensive experiments on MIntRec demonstrate that our approach achieves state-of-the-art performance, surpassing previous methods by 1.13% in accuracy. Ablation studies further verify that the wavelet-driven fusion module significantly improves the extraction of semantic information from non-verbal sources, with a 0.41% increase in recognition accuracy when analyzing subtle emotional cues.
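
For intuition only, the snippet below applies a one-level Haar wavelet along the time axis of two feature sequences and fuses the matching low- and high-frequency bands. The feature shapes and the averaging-based fusion rule are assumptions and do not reproduce the paper's module.

```python
# One-level Haar decomposition of (T, D) feature sequences and a simple
# band-wise fusion of video and audio features; shapes and fusion rule are
# illustrative assumptions.
import numpy as np

def haar_dwt(x):
    """Split a (T, D) sequence into low- and high-frequency halves over time."""
    x = x[: (len(x) // 2) * 2]                  # keep an even number of steps
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)   # low frequency: slow trends
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)   # high frequency: fast changes
    return approx, detail

def fuse(video_feats, audio_feats):
    v_lo, v_hi = haar_dwt(video_feats)
    a_lo, a_hi = haar_dwt(audio_feats)
    # Combine matching frequency bands across modalities, then concatenate.
    return np.concatenate([(v_lo + a_lo) / 2, (v_hi + a_hi) / 2], axis=-1)

fused = fuse(np.random.randn(16, 32), np.random.randn(16, 32))
print(fused.shape)  # (8, 64)
```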

NeurIPS Conference 2024 Conference Paper

Towards Comprehensive Detection of Chinese Harmful Memes

  • Junyu Lu
  • Bo Xu
  • Xiaokun Zhang
  • Hongbo Wang
  • Haohao Zhu
  • Dongyu Zhang
  • Liang Yang
  • Hongfei Lin

Harmful memes have proliferated on the Chinese Internet, while research on detecting Chinese harmful memes significantly lags behind due to the absence of reliable datasets and effective detectors. To this end, we present the comprehensive detection of Chinese harmful memes. We introduce ToxiCN MM, the first Chinese harmful meme dataset, which consists of 12,000 samples with fine-grained annotations for meme types. Additionally, we propose a baseline detector, Multimodal Knowledge Enhancement (MKE), designed to incorporate contextual information from meme content, thereby enhancing the model's understanding of Chinese memes. In the evaluation phase, we conduct extensive quantitative experiments and qualitative analyses on multiple baselines, including LLMs and our MKE. Experimental results indicate that detecting Chinese harmful memes is challenging for existing models, while demonstrating the effectiveness of MKE.
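
Schematic sketch (not the MKE architecture): one generic way to incorporate contextual information is to concatenate meme image and text embeddings with an embedding of retrieved context before classification. The dimensions, the late-fusion design, and the class name below are assumptions made only to illustrate the general idea.

```python
# Generic knowledge-augmented fusion classifier for meme detection;
# all dimensions and the fusion design are illustrative assumptions.
import torch
import torch.nn as nn

class KnowledgeEnhancedDetector(nn.Module):
    def __init__(self, img_dim=768, txt_dim=768, know_dim=768, num_classes=2):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(img_dim + txt_dim + know_dim, 512),
            nn.ReLU(),
            nn.Linear(512, num_classes),
        )

    def forward(self, img_feat, txt_feat, know_feat):
        # img_feat / txt_feat: encoder outputs for the meme image and its text;
        # know_feat: an embedding of retrieved contextual knowledge.
        return self.fuse(torch.cat([img_feat, txt_feat, know_feat], dim=-1))

logits = KnowledgeEnhancedDetector()(
    torch.randn(4, 768), torch.randn(4, 768), torch.randn(4, 768))
```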