Arrow Research search

Author name cluster

Yu Yang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

46 papers
2 author rows

Possible papers


AAAI Conference 2026 Conference Paper

Continuous Vision-Language-Action Co-Learning with Semantic-Physical Alignment for Behavioral Cloning

  • Xiuxiu Qi
  • Yu Yang
  • Jiannong Cao
  • Luyao Bai
  • Chongshan Fan
  • Chengtai Cao
  • Hongpeng Wang

Language-Conditioned Manipulation (LCM) facilitates human-robot interaction via Behavioral Cloning (BC), which learns control policies from human demonstrations and serves as a cornerstone of embodied AI. Overcoming compounding errors in sequential action decisions remains a central challenge to improving BC performance. Existing approaches mitigate compounding errors through data augmentation, expressive representation, or temporal abstraction. However, they suffer from physical discontinuities and semantic-physical misalignment, leading to inaccurate action cloning and intermittent execution. In this paper, we present Continuous vision-language-action Co-Learning with Semantic-Physical Alignment (CCoL), a novel BC framework that ensures temporally consistent execution and fine-grained semantic grounding. It generates robust and smooth action trajectories through continuous co-learning across vision, language, and proprioceptive inputs (i.e., robot internal states). Meanwhile, it anchors language semantics to visuomotor representations through bidirectional cross-attention, learning contextual information for action generation and thereby overcoming semantic-physical misalignment. Extensive experiments show that CCoL achieves an average 8.0% relative improvement across three simulation suites, with up to 19.2% relative gain in human-demonstrated bimanual insertion tasks. Real-world tests on a 7-DoF robot further confirm CCoL’s generalization under unseen and noisy object states.

EAAI Journal 2026 Journal Article

Extracting damage Poisson's ratio model from concrete constitutive relation using Kolmogorov-Arnold network

  • Yu Yang

Accurately predicting how concrete's Poisson's ratio varies with damage is of great significance in structural health monitoring. Typically, obtaining an accurate and rational Poisson's ratio model for concrete requires extensive, time-consuming, and costly experimental studies on numerous specimens. As an alternative, this paper proposes extracting damage Poisson's ratio models from known concrete constitutive relationship models, validates the feasibility of this approach, and, for this purpose, develops a derived-theory extraction algorithm framework based on Kolmogorov-Arnold networks. The framework consists of four steps: (1) summarizing key influencing factors; (2) generating a training dataset via numerical simulations; (3) extracting interpretable models using Kolmogorov-Arnold networks; and (4) filtering and validating the extracted models. As a demonstration, we derived a damage Poisson's ratio model from a widely used complex concrete constitutive model, incorporating three critical factors: stress level, concrete strength, and initial Poisson's ratio. Validation and comparison on five test datasets from different sources show that the model achieves an average root mean square error of 0.0853, outperforming the best-performing classical model (average root mean square error of 0.1908). Furthermore, the model has a relatively concise formula (a string length of only 71) and good interpretability, making it practically valuable. Comparisons with other symbolic regression frameworks further validate the comprehensive advantages of the proposed Kolmogorov-Arnold network-based derived-theory extraction framework in accuracy, conciseness, interpretability, and efficiency.
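As a quick check on the two averages quoted above (our arithmetic, not a figure reported by the authors), the extracted model's average RMSE is roughly 55% lower than that of the best classical model:

```latex
\frac{0.1908 - 0.0853}{0.1908} = \frac{0.1055}{0.1908} \approx 0.553
```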

AAAI Conference 2026 Conference Paper

LiDARCrafter: Dynamic 4D World Modeling from LiDAR Sequences

  • Alan Liang
  • Youquan Liu
  • Yu Yang
  • Dongyue Lu
  • Linfeng Li
  • Lingdong Kong
  • Huaici Zhao
  • Wei Tsang Ooi

Generative world models have become essential data engines for autonomous driving, yet most focus on videos or occupancy grids and overlook the unique challenges of LiDAR. Extending LiDAR generation to dynamic 4D modeling requires addressing controllability, temporal coherence, and standardized evaluation. We present LiDARCrafter, a unified framework for controllable 4D LiDAR generation and editing. Free-form language instructions are converted into ego-centric scene graphs that guide a tri-branch diffusion model to generate object geometry, motion, and structural priors. An autoregressive module further produces temporally coherent and stable LiDAR sequences with improved global consistency. To enable fair comparison, we introduce a comprehensive benchmark covering scene-, object-, and sequence-level metrics for rigorous and reproducible evaluation. Experiments on nuScenes show that LiDARCrafter achieves state-of-the-art fidelity, controllability, and temporal consistency, paving the way for scalable data augmentation and realistic simulation in diverse scenarios. Code is publicly available at https://lidarcrafter.github.io.

AAAI Conference 2026 Conference Paper

Multi-Window Gabor Transform Network for Ground Penetrating Radar B-Scan Image Reconstruction

  • Huabin Wang
  • Yu Yang
  • Xinran Zhong
  • Zilong Ling

Ground-penetrating radar (GPR) detects the structure of subsurface defects by transmitting electromagnetic waves into the ground and receiving the reflected signals. However, GPR imaging is highly susceptible to interference from complex underground environments, which causes nonlinear attenuation and noise. This makes it challenging to locate and identify defect types directly from raw reflected radar waveform images. Currently, mainstream methods based on manual radar signal gain and filtering rely heavily on expert experience, while common end-to-end generative models are typically designed for optical images. This paper proposes a defect-guided Multi-window Gabor Transform Network (MGT-Net) for GPR B-Scan image reconstruction, which achieves automatic gain and defect enhancement of raw GPR signals. First, a Multi-window Gabor Transform Module (MGTM) is designed to represent and extract spatial-frequency features of defects of various types and at different locations. Second, a defect guidance network (DG-Net) is constructed to direct the reconstruction of defect areas and enhance the saliency and discriminability of defect features. Additionally, we construct a large-scale GPR B-Scan image dataset (GRD) containing 41,613 images across 7 defect categories. Experimental results show the superior performance of MGT-Net, which achieves a state-of-the-art (SOTA) SSIM of 81.72% ± 3.5% and a PSNR of 30.50 ± 0.442 dB.
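MGT-Net's internals are not described beyond the abstract. As a rough illustration of what a "multi-window" Gabor representation means, the sketch below computes the magnitude of a 1-D Gabor coefficient (a Gaussian-windowed complex exponential) at one time index and several frequencies, for several window widths. The helper names and widths are hypothetical; the paper's module works on 2-D B-Scan images with learned parameters.

```python
import cmath
import math


def gabor_coefficient(signal, center, freq, sigma):
    """Magnitude of one Gabor atom: a Gaussian window (width sigma, centred at
    index `center`) times a complex exponential at `freq` cycles per sample."""
    total = 0j
    for n, x in enumerate(signal):
        window = math.exp(-((n - center) ** 2) / (2 * sigma ** 2))
        total += x * window * cmath.exp(-2j * math.pi * freq * n)
    return abs(total)


def multi_window_features(signal, center, freqs, sigmas):
    """One row per window width, one column per frequency. Narrow windows
    localize better in time; wide windows resolve frequency more finely."""
    return [[gabor_coefficient(signal, center, f, s) for f in freqs] for s in sigmas]


# A pure tone at 0.1 cycles/sample should respond far more strongly at f = 0.1 than at f = 0.3.
tone = [math.cos(2 * math.pi * 0.1 * n) for n in range(64)]
features = multi_window_features(tone, center=32, freqs=[0.1, 0.3], sigmas=[4, 8])
```

Stacking the responses from several window widths is one simple way to get features that are both time-localized and frequency-selective, which is the trade-off a multi-window design targets.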

TMLR Journal 2026 Journal Article

Return Augmented Decision Transformer for Off-Dynamics Reinforcement Learning

  • Ruhan Wang
  • Yu Yang
  • Zhishuai Liu
  • Dongruo Zhou
  • Pan Xu

We study offline off-dynamics reinforcement learning (RL), which uses data from an easily accessible source domain to enhance policy learning in a target domain with limited data. Our approach centers on return-conditioned supervised learning (RCSL), particularly Decision Transformer (DT)-type frameworks, which predict actions conditioned on desired return guidance and the complete trajectory history. Previous works address the dynamics shift problem by augmenting the reward in source-domain trajectories to match the optimal trajectory in the target domain. However, this strategy cannot be applied directly to RCSL owing to (1) the unique form of the RCSL policy class, which explicitly depends on the return, and (2) the absence of a straightforward representation of the optimal trajectory distribution. We propose the Return Augmented (REAG) method for DT-type frameworks, which augments the return in the source domain by aligning its distribution with that in the target domain. We provide a theoretical analysis demonstrating that the RCSL policy learned with REAG achieves the same level of suboptimality as would be obtained without a dynamics shift. We introduce two practical implementations, $REAG^{*}_{Dara}$ and $REAG^{*}_{MV}$. Thorough experiments on D4RL datasets with various DT-type baselines demonstrate that our methods consistently enhance the performance of DT-type frameworks in off-dynamics RL.
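The abstract does not spell out either implementation. Purely as an illustrative assumption, suppose an "MV"-style variant matches the mean and variance of source-domain returns to those of the target domain. A minimal return-relabelling step could then look like the following. `align_returns_mv` is a hypothetical helper, not the authors' code.

```python
import statistics


def align_returns_mv(source_returns, target_returns):
    """Affinely map source returns so that their mean and (population) standard
    deviation match the target-domain returns. Illustrative assumption only."""
    mu_s = statistics.fmean(source_returns)
    mu_t = statistics.fmean(target_returns)
    sd_s = statistics.pstdev(source_returns)
    sd_t = statistics.pstdev(target_returns)
    # Fall back to a pure shift when the source returns are constant.
    scale = sd_t / sd_s if sd_s > 0 else 1.0
    return [mu_t + (g - mu_s) * scale for g in source_returns]


# Relabelled source returns would then serve as the return tokens of a DT-style model.
augmented = align_returns_mv([0.0, 2.0, 4.0], [9.0, 10.0, 11.0])
```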

EAAI Journal 2026 Journal Article

Spatio-temporal traffic accidents detection via graph based generative adversarial network

  • Lyuyi Zhu
  • Qixin Zhang
  • Xiangru Jian
  • Yu Yang
  • Lishuai Li

Due to urbanization and economic growth, traffic accidents have become a severe social problem. With the development of intelligent transportation systems and Internet of Things (IoT) devices, detecting traffic accidents from big data is becoming increasingly important. However, accident detection faces several main challenges. First, traffic data is complex due to its spatial and temporal correlations. Second, traffic accidents are spatially and temporally dispersed, making them difficult to capture; in addition, the high cost of labeling leads to a scarcity of available labels. Third, unsupervised anomaly detection requires approximating the distribution of normal samples, which is challenging for high-dimensional time series collected from IoT devices. To address these problems, we propose a novel spatio-temporal graph generative adversarial network framework comprising a discriminator and a generator. The discriminator learns to distinguish fake samples from real ones by learning a representation of each input and its spatio-temporal context. The generator produces fake data from the spatio-temporal context in an attempt to fool the discriminator. Through adversarial training, the model learns to identify anomalous samples. We validate the proposed model on two real-world traffic accident datasets. The experimental results show that our model surpasses the baselines, demonstrating its effectiveness. Furthermore, a case study analyzes the characteristics and potential impact of traffic accidents, providing valuable insights for improving this field and for future research.

EAAI Journal 2026 Journal Article

Spectrum-envelope attention-driven adaptive time-frequency enhancement network and its application in trustworthy cross-machine fault diagnosis

  • Xin Kang
  • Zhengyang Cheng
  • Yi Gong
  • Junsheng Cheng
  • Yu Yang

The performance degradation of cross-domain fault diagnosis is mainly caused by domain-related noise induced by variations in mechanical structures and operating conditions, which obscures domain-invariant fault features. Existing domain generalization methods often ignore the mechanism-based priors of rotating machinery, leading to overfitting and poor interpretability under large domain shifts. To address this issue, we propose a mechanism-constrained cross-domain fault diagnosis framework that exploits the sparsity discrepancy between fault-relevant components and domain-related noise in the subband envelope spectrum. Specifically, an improved wavelet transform is used to extract multi-scale subband envelope information, followed by a spectrum-envelope attention module (SEAM) that adaptively enhances fault-relevant subbands while suppressing domain-related noise under sparsity guidance. Experimental results demonstrate clear performance gains over state-of-the-art methods in challenging cross-machine diagnosis tasks. Comprehensive analyses of the SEAM inputs, outputs, and attention distributions, together with CAM visualizations of the enhanced time–frequency representations and diagnostic results, further verify the interpretability and reliability of the proposed model. Ablation experiments confirm the effectiveness of each component in improving cross-domain generalization.
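SEAM itself is a learned module. As a hedged illustration of the sparsity prior alone, the sketch below scores each subband envelope spectrum with a simple sparsity index and converts the scores into softmax weights. The index is the ratio of the L2 norm to the L1 norm; that choice is ours. It is larger for spectra dominated by a few peaks, such as fault characteristic frequencies, than for flat, noise-like spectra. The helper names and the `temperature` parameter are hypothetical.

```python
import math


def sparsity_index(spectrum):
    """L2/L1 ratio of magnitudes: 1/sqrt(N) for a flat spectrum, 1.0 for a single spike."""
    mags = [abs(x) for x in spectrum]
    l1 = sum(mags)
    if l1 == 0:
        return 0.0
    return math.sqrt(sum(m * m for m in mags)) / l1


def subband_weights(subband_spectra, temperature=0.1):
    """Softmax over sparsity scores, so sparser (more impulsive) subbands get more weight."""
    scores = [sparsity_index(s) / temperature for s in subband_spectra]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


flat = [1.0] * 8                                  # broadband, noise-like subband (toy values)
peaky = [0.1, 0.1, 4.0, 0.1, 0.1, 2.0, 0.1, 0.1]  # a few dominant peaks (toy values)
weights = subband_weights([flat, peaky])
```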

AAAI Conference 2026 Conference Paper

Towards Training-Free and Accurate ANN-to-SNN Conversion via Activation-Aware Redistribution

  • Honglin Cao
  • Shuai Wang
  • Zijian Zhou
  • Ammar Belatreche
  • Wenjie Wei
  • Yu Liang
  • Yu Yang
  • Rui Xi

Conversion represents an effective approach for obtaining low-power models by transforming Artificial Neural Networks (ANNs) into event-driven Spiking Neural Networks (SNNs) without additional training. However, existing training-free conversion methods often incur substantial conversion errors. Here, we first reveal that these conversion errors primarily arise from a distributional mismatch, as the activation distributions of ANNs exhibit channel-wise shifts and scaling, whereas spike rates lack corresponding channel-specific characteristics. To address this limitation, we propose Adaptive Integrate-and-Fire (AIF) neurons with channel-specific thresholds and membrane-potential offsets that dynamically adjust spike rates. These parameters are optimized to jointly minimize conversion errors and maximize information entropy, enabling AIF neurons to capture the activation distribution characteristics of the original ANN. Moreover, AIF neurons can be seamlessly integrated into Transformer architectures with only negligible additional computational cost. Our method achieves state-of-the-art results on multiple vision and natural language processing benchmarks, in particular attaining a notable top-1 accuracy of 85.52% on ImageNet-1K.
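The abstract describes AIF neurons as integrate-and-fire units with channel-specific thresholds and membrane-potential offsets. The sketch below simulates one such neuron per channel with a constant input. We assume a soft reset, meaning the threshold is subtracted after each spike. It shows how the rate-coded output (spike count times threshold, divided by the number of time steps) approximates the ANN activation. How the paper optimizes thresholds and offsets is not reproduced, and `aif_rate_output` is a hypothetical name.

```python
def aif_rate_output(activations, thresholds, offsets, time_steps):
    """Rate-coded output of integrate-and-fire neurons, one per channel.

    activations[c]: constant input for channel c (the ANN activation).
    thresholds[c], offsets[c]: channel-specific firing threshold and initial
    membrane-potential offset (optimized in AIF; simply given here).
    """
    outputs = []
    for a, theta, offset in zip(activations, thresholds, offsets):
        v = offset          # membrane potential starts at the channel-specific offset
        spikes = 0
        for _ in range(time_steps):
            v += a          # integrate the input current
            if v >= theta:  # fire, then soft-reset by subtracting the threshold (assumed)
                spikes += 1
                v -= theta
        outputs.append(spikes * theta / time_steps)
    return outputs


# Two channels whose activations differ in scale, each with its own threshold and offset.
out = aif_rate_output([0.5, 2.0], thresholds=[1.0, 4.0], offsets=[0.5, 2.0], time_steps=4)
```

A single shared threshold would quantize one of these channels poorly, which is the kind of channel-wise mismatch the abstract identifies as the source of conversion error.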

EAAI Journal 2025 Journal Article

A reliable degradation prediction method for proton exchange membrane fuel cells based on uncertainty Bayesian self-attention

  • Mengyu Liu
  • Zhe Cheng
  • Yu Yang
  • Niaoqing Hu
  • Guoji Shen
  • Yi Yang

Health state prediction of Proton Exchange Membrane Fuel Cells (PEMFCs) is a critical technology to ensure their long-term reliable operation. Prediction accuracy directly influences the effectiveness of maintenance strategies and risk management. However, existing PEMFC degradation prediction methods based on Recurrent Neural Networks (RNNs) or Transformer architectures mostly focus on point estimation while neglecting uncertainty quantification. This limitation makes it difficult to assess the confidence level of predictions in practical engineering applications, reducing the models' reliability in decision support. To address this issue, this paper proposes a novel Bayesian Patch Time Series Transformer (B-PatchTST) method. By deeply integrating Bayesian variational inference with time series patch modeling, the method enables probabilistic prediction of PEMFC degradation trajectories and disentangled analysis of uncertainty sources. Unlike traditional Bayesian Neural Networks (BNNs) that primarily apply Bayesian modeling to fully connected layers, B-PatchTST introduces a Bayesian Self-Attention Mechanism, which models epistemic uncertainty in three stages: patch embedding, uncertainty-aware self-attention computation, and adaptive regularization. This design significantly enhances the credibility of the model. Extensive experiments on fuel cell datasets demonstrate the proposed method's outstanding performance. It achieves an average reduction of 36.31% in root mean square error and an average compression of 83.39% in the 95% confidence interval, significantly outperforming existing methods. This approach offers a trustworthy basis for predictive maintenance in PEMFC systems, promoting a shift from “experience-based maintenance” to “reliable prognostics” in hydrogen energy applications.
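Two generic ingredients mentioned above can be illustrated without the Bayesian layers. The first is PatchTST-style patching, which splits a series into overlapping windows. The second is a 95% interval computed from Monte Carlo samples of a probabilistic forecast, using a normal approximation (mean ± 1.96 standard deviations). Patch length, stride, the toy data, and the helper names are illustrative assumptions, not the paper's settings.

```python
import statistics


def make_patches(series, patch_len, stride):
    """Split a 1-D series into (possibly overlapping) patches, as in patch-based Transformers."""
    return [series[i:i + patch_len] for i in range(0, len(series) - patch_len + 1, stride)]


def interval_95(samples):
    """Normal-approximation 95% interval from Monte Carlo forecast samples."""
    mu = statistics.fmean(samples)
    sd = statistics.stdev(samples)
    return mu - 1.96 * sd, mu + 1.96 * sd


# Toy stack-voltage values (made up for illustration).
voltage = [1.00, 0.99, 0.99, 0.98, 0.97, 0.97, 0.96, 0.95, 0.95, 0.94]
patches = make_patches(voltage, patch_len=4, stride=2)

# e.g. next-step predictions from repeated stochastic forward passes of a Bayesian model
low, high = interval_95([0.93, 0.94, 0.935, 0.945, 0.94])
```

A narrower interval at equal coverage is what the abstract's "compression" of the 95% confidence interval refers to.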

AAAI Conference 2025 Conference Paper

Adaptive Multi-Faceted Service Capabilities Co-Prediction for Nationwide Terminal Stations in Logistics

  • Shuxin Zhong
  • Kimberly Liu
  • Wenjun Lyu
  • Haotian Wang
  • Guang Wang
  • Yunhuai Liu
  • Tian He
  • Yu Yang

Estimating service capabilities for logistics terminal stations is essential for guiding operations adjustments to enhance customer experience. However, existing studies often focus on isolated metrics like on-time delivery or complaint rates, each reflecting a specific aspect of service capabilities. To provide a more comprehensive evaluation, we design AdaService, an Adaptive multi-faceted Service capabilities co-estimation framework. We begin by constructing a Multi-faceted Hypergraph that encodes stations using multiple performance metrics. We then introduce a Multi-faceted Hypergraph Convolution Network (MHCN) to capture the heterogeneous service capabilities across stations, providing a comprehensive capabilities representation. Finally, we apply an Adaptive Multi-faceted Estimation module that uses multi-task learning to model dynamic interactions among these metrics, enhancing predictive accuracy. Extensive evaluation with real-world data collected from nationwide stations in a leading logistics company in China demonstrates that AdaService significantly outperforms state-of-the-art methods, improving estimation accuracy for on-time delivery, on-time pick-up, and complaint rates by up to 18.98%, 9.30%, and 39.62%, respectively.

EAAI Journal 2025 Journal Article

Adaptive reconstruct feature difference network for open set domain generalization fault diagnosis

  • Mengyu Liu
  • Zhe Cheng
  • Yu Yang
  • Niaoqing Hu
  • Guoji Shen
  • Yi Yang

Data-driven deep learning methods for intelligent fault diagnosis are highly regarded for their capability to perform end-to-end diagnosis. However, their practical application is often constrained by challenges such as dependency on extensive labeled data, inconsistent data distributions, and the inability to handle unknown test conditions. To address these limitations, this paper introduces an Adaptive Feature Reconstruction Difference Network (AFRDN) designed for open-set domain generalization in the fault diagnosis of rotating machinery. AFRDN achieves robust performance through three key innovations: first, it maps multi-source domain features to a common standard normal distribution, enhancing feature alignment and enabling generalization to unseen target domains. Second, it employs a feature reconstructor to calculate reconstruction differences, adaptively learning decision thresholds for each known health condition category. Finally, the network integrates these components to build a domain generalization model capable of accurately detecting faults, including unknown categories, under previously unseen operating conditions. The effectiveness of the proposed AFRDN was evaluated through extensive experiments on both public and private rotating machinery datasets. The method achieved average accuracies of 90.59% and 85.10%, respectively, outperforming several state-of-the-art open-set domain generalization methods. AFRDN demonstrated superior capabilities in detecting unknown fault categories and improving generalization performance, making it a promising solution for real-world fault diagnosis challenges.
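The abstract says AFRDN learns a decision threshold for each known health condition from reconstruction differences. The sketch below shows the resulting open-set decision rule in its simplest form. Each class threshold is set to the mean plus k standard deviations of that class's training reconstruction errors. That statistic, `k`, the toy error values, and the helper names are assumptions standing in for the paper's learned, adaptive thresholds.

```python
import statistics


def fit_thresholds(errors_by_class, k=3.0):
    """Per-class threshold on reconstruction error: mean + k * std of training errors."""
    return {
        label: statistics.fmean(errs) + k * statistics.pstdev(errs)
        for label, errs in errors_by_class.items()
    }


def open_set_predict(predicted_class, recon_error, thresholds):
    """Keep the closed-set prediction unless the reconstruction difference exceeds its threshold."""
    return predicted_class if recon_error <= thresholds[predicted_class] else "unknown"


# Toy training reconstruction errors for two known health conditions (made up for illustration).
thresholds = fit_thresholds({"normal": [0.10, 0.12, 0.11], "inner_race": [0.20, 0.22, 0.21]})
decision_known = open_set_predict("normal", 0.115, thresholds)   # within the normal-class threshold
decision_unknown = open_set_predict("normal", 0.50, thresholds)  # rejected as an unknown fault
```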

NeurIPS Conference 2025 Conference Paper

AutoRedTeamer: Autonomous Red Teaming with Lifelong Attack Integration

  • Andy Zhou
  • Kevin Wu
  • Francesco Pinto
  • Zhaorun Chen
  • Yi Zeng
  • Yu Yang
  • Shuang Yang
  • Sanmi Koyejo

As large language models (LLMs) become increasingly capable, security and safety evaluation are crucial. While current red teaming approaches have made strides in assessing LLM vulnerabilities, they often rely heavily on human input and lack comprehensive coverage of emerging attack vectors. This paper introduces AutoRedTeamer, a novel framework for fully automated, end-to-end red teaming against LLMs. AutoRedTeamer combines a multi-agent architecture with a memory-guided attack selection mechanism to enable continuous discovery and integration of new attack vectors. The dual-agent framework consists of a red teaming agent that can operate from high-level risk categories alone to generate and execute test cases, and a strategy proposer agent that autonomously discovers and implements new attacks by analyzing recent research. This modular design allows AutoRedTeamer to adapt to emerging threats while maintaining strong performance on existing attack vectors. We demonstrate AutoRedTeamer’s effectiveness across diverse evaluation settings, achieving 20% higher attack success rates on HarmBench against Llama-3.1-70B while reducing computational costs by 46% compared to existing approaches. AutoRedTeamer also matches the diversity of human-curated benchmarks in generating test cases, providing a comprehensive, scalable, and continuously evolving framework for evaluating the security of AI systems.

ICML Conference 2025 Conference Paper

BSO: Binary Spiking Online Optimization Algorithm

  • Yu Liang
  • Yu Yang
  • Wenjie Wei
  • Ammar Belatreche
  • Shuai Wang 0058
  • Malu Zhang
  • Yang Yang 0002

Binary Spiking Neural Networks (BSNNs) offer promising efficiency advantages for resource-constrained computing. However, their training algorithms often incur substantial memory overhead due to latent-weight storage and temporal processing requirements. To address this issue, we propose the Binary Spiking Online (BSO) optimization algorithm, a novel online training algorithm that significantly reduces training memory. BSO directly updates weights through flip signals under the online training framework. These signals are triggered when the product of gradient momentum and weights exceeds a threshold, eliminating the need for latent weights during training. To enhance performance, we propose T-BSO, a temporal-aware variant that leverages the inherent temporal dynamics of BSNNs by capturing gradient information across time steps for adaptive threshold adjustment. Theoretical analysis establishes convergence guarantees for both BSO and T-BSO, with formal regret bounds characterizing their convergence rates. Extensive experiments demonstrate that both BSO and T-BSO achieve superior optimization performance compared to existing training methods for BSNNs. The code is available at https://github.com/hamingsi/BSO.
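The flip rule described above can be sketched as follows. Weights are binary in {-1, +1}, momentum is an exponential moving average of gradients, and a weight flips wherever momentum times weight exceeds a threshold. A positive product means gradient descent would push that weight toward the opposite sign. The decay `beta`, the threshold `tau`, and leaving the momentum untouched after a flip are our assumptions. T-BSO's time-step-aware threshold is not modelled, and `bso_step` is a hypothetical name.

```python
def bso_step(weights, momentum, grads, beta=0.9, tau=1e-3):
    """One latent-weight-free update for binary weights in {-1, +1}.

    Returns the new weights and the updated gradient momentum.
    """
    new_momentum = [beta * m + (1 - beta) * g for m, g in zip(momentum, grads)]
    new_weights = [
        -w if m * w > tau else w  # flip only when momentum strongly opposes the current sign
        for w, m in zip(weights, new_momentum)
    ]
    return new_weights, new_momentum


# With beta = 0 the momentum equals the current gradient, which makes the rule easy to trace.
w, m = bso_step([1, -1, 1], [0.0, 0.0, 0.0], grads=[1.0, 1.0, -1.0], beta=0.0, tau=0.5)
```

Only the binary weights and one momentum value per weight are stored, which is where the memory saving over latent-weight training comes from.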

TCS Journal 2025 Journal Article

Discriminating code and set cover with k-bend paths

  • Yu Yang
  • Cai-Xia Wang
  • Shou-Jun Xu

In this paper, we explore geometric discriminating code and set cover problems with k-bend paths. We first demonstrate that both the discriminating code problem and the set cover problem with unit 0-bend paths are NP-hard. Additionally, we establish that the set cover problem is NP-hard with unit 1-bend paths restricted to type ⌜, where horizontal segments intersect a vertical line and vertical segments intersect a horizontal line. Furthermore, we show that the discriminating code problem remains NP-hard with unit 1-bend paths of type ⌜, where all vertical segments intersect a horizontal line. Finally, we provide approximation algorithms for these two problems, specifically for 0-bend paths of uniform length and for 1-bend paths of type ⌜, where all horizontal and vertical segments are of equal length.
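The two problems differ as follows. In set cover, every point must lie on at least one chosen path. A discriminating code additionally requires that distinct points be covered by distinct sets of chosen paths, so each point is identified by the paths that contain it. The checker below verifies both conditions for 0-bend paths (axis-parallel segments). It illustrates the definitions only; it is not the paper's approximation algorithms, and the segment representation is our own.

```python
def on_segment(point, seg):
    """True if point lies on the axis-parallel segment seg = ((x1, y1), (x2, y2))."""
    (x, y), ((x1, y1), (x2, y2)) = point, seg
    return min(x1, x2) <= x <= max(x1, x2) and min(y1, y2) <= y <= max(y1, y2)


def is_set_cover(points, code):
    """Every point lies on at least one chosen segment."""
    return all(any(on_segment(p, s) for s in code) for p in points)


def is_discriminating_code(points, code):
    """Every point is covered, and no two (distinct) points see the same set of chosen segments."""
    signatures = [frozenset(i for i, s in enumerate(code) if on_segment(p, s)) for p in points]
    return all(signatures) and len(set(signatures)) == len(points)


points = [(0, 0), (1, 0)]
horizontal = ((0, 0), (1, 0))  # unit 0-bend path through both points
vertical = ((0, 0), (0, 1))    # unit 0-bend path through (0, 0) only
```

Here `[horizontal]` is a set cover but not a discriminating code, because both points are covered by exactly the same segment. Adding `vertical` separates them.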

AAAI Conference 2025 Conference Paper

Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving

  • Yu Yang
  • Jianbiao Mei
  • Yukai Ma
  • Siliang Du
  • Wenqing Chen
  • Yijie Qian
  • Yuxiang Feng
  • Yong Liu

World models envision potential future states based on various ego actions. They embed extensive knowledge about the driving environment, facilitating safe and scalable autonomous driving. Most existing methods primarily focus on either data generation or the pretraining paradigms of world models. Unlike these prior works, we propose Drive-OccWorld, which adapts a vision-centric 4D forecasting world model to end-to-end planning for autonomous driving. Specifically, we first introduce a semantic and motion-conditional normalization in the memory module, which accumulates semantic and dynamic information from historical BEV embeddings. These BEV features are then conveyed to the world decoder for future occupancy and flow forecasting, considering both geometry and spatiotemporal modeling. Additionally, we propose injecting flexible action conditions, such as velocity, steering angle, trajectory, and commands, into the world model to enable controllable generation and facilitate a broader range of downstream applications. Furthermore, we explore integrating the generative capabilities of the 4D world model with end-to-end planning, enabling continuous forecasting of future states and the selection of optimal trajectories using an occupancy-based cost function. Extensive experiments on the nuScenes dataset demonstrate that our method can generate plausible and controllable 4D occupancy, opening new avenues for driving world generation and end-to-end planning.
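The final step, choosing a trajectory with an occupancy-based cost, can be illustrated with a simple setup. Each candidate trajectory is a sequence of grid cells, one per future time step. Its cost is the sum of the forecast occupancy probabilities of the cells it visits. The grid layout, the plain sum, and the helper names are assumptions; the abstract does not specify the paper's actual cost function.

```python
def trajectory_cost(occupancy, trajectory):
    """Sum the forecast occupancy along a trajectory.

    occupancy[t][row][col]: predicted probability that cell (row, col) is occupied at step t.
    trajectory[t]: the (row, col) cell the ego vehicle would occupy at step t.
    """
    return sum(occupancy[t][r][c] for t, (r, c) in enumerate(trajectory))


def select_trajectory(occupancy, candidates):
    """Pick the candidate trajectory with the lowest occupancy cost."""
    return min(candidates, key=lambda traj: trajectory_cost(occupancy, traj))


# Two forecast steps on a 2x2 grid; an obstacle is predicted to move into cell (0, 1).
occupancy = [
    [[0.0, 0.1], [0.0, 0.0]],
    [[0.0, 0.9], [0.0, 0.0]],
]
keep_lane = [(0, 0), (0, 1)]
change_lane = [(0, 0), (1, 1)]
best = select_trajectory(occupancy, [keep_lane, change_lane])
```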

AAAI Conference 2025 Conference Paper

FairTP: A Prolonged Fairness Framework for Traffic Prediction

  • Jiangnan Xia
  • Yu Yang
  • Jiaxing Shen
  • Senzhang Wang
  • Jiannong Cao

Traffic prediction is pivotal in intelligent transportation systems. Existing works focus mainly on improving overall accuracy, overlooking a crucial question: whether prediction results will lead to biased decisions by transportation authorities. In practice, the uneven deployment of traffic sensors across urban areas produces imbalanced data, causing traffic prediction models to fail in some areas and leading to unfair regional decision-making that ultimately affects equity and residents’ quality of life. Existing fair machine learning models struggle to maintain fair traffic prediction over prolonged periods: although they may achieve fairness at certain time slots, this static fairness breaks down as traffic conditions change. To fill this research gap, we investigate prolonged fair traffic prediction and introduce two novel fairness metrics, region-based static fairness and sensor-based dynamic fairness, tailored to fairness fluctuations over time and across areas. We then propose FairTP, an innovative prolonged fairness traffic prediction framework. FairTP achieves prolonged fairness by alternately “sacrificing” and “benefiting” the prediction accuracy of each traffic sensor or area, ensuring that the numbers of these two actions are balanced over time. Specifically, FairTP incorporates a state identification module that determines whether each traffic sensor or area is in a “sacrifice” or “benefit” state, thereby enabling prolonged fairness-aware traffic predictions. Additionally, we devise a state-guided balanced sampling strategy that selects training examples to further enhance prediction fairness by mitigating performance disparities among areas with uneven sensor distributions over time. Extensive experiments on two real-world datasets show that FairTP significantly improves prediction fairness without significant accuracy degradation.

EAAI Journal 2025 Journal Article

Graph convolutional network for traffic incidents duration classification

  • Lyuyi Zhu
  • Qixin Zhang
  • Xiangru Jian
  • Yu Yang

Traffic incidents are a primary cause of severe congestion in urban areas, making accurate forecasting of incident duration essential for effective traffic management systems. However, the inherent uncertainty associated with incidents presents significant challenges in predicting their durations. In this paper, we propose a novel deep neural network model for predicting and classifying traffic incident durations. To capture the dynamic nature of incidents, the model learns from time series data on traffic flow, speed, and occupancy. Additionally, it employs a graph neural network architecture to model the spatial relationships between sensors, while also accounting for factors such as time and incident type. By training the model with cross-entropy loss, we enable it to predict whether an incident’s duration will be long or short. Experimental results show that our model outperforms existing baselines, confirming the effectiveness of the proposed approach. Furthermore, we conduct a case study to visualize the impact of incidents and further validate the model’s predictive capability.

IJCAI Conference 2025 Conference Paper

HCRide: Harmonizing Passenger Fairness and Driver Preference for Human-Centered Ride-Hailing

  • Lin Jiang
  • Yu Yang
  • Guang Wang

Order dispatch systems play a vital role in ride-hailing services, directly influencing operator revenue, driver profit, and passenger experience. Most existing work focuses on improving system efficiency in terms of operator revenue, which can degrade the experience of both passengers and drivers. Hence, in this work, we aim to design a human-centered ride-hailing system that considers both passenger fairness and driver preference without compromising overall system efficiency. However, achieving this target is nontrivial because passenger fairness and driver preference can conflict: optimizing one may sacrifice the other. To address this challenge, we design HCRide, a Human-Centered Ride-hailing system based on a novel multi-agent reinforcement learning algorithm called Harmonization-oriented Actor-Bi-Critic (Habic), which includes three major components (i.e., a multi-agent competition mechanism, a dynamic Actor network, and a Bi-Critic network) to optimize system efficiency and passenger fairness while accounting for driver preference. We extensively evaluate HCRide using two real-world ride-hailing datasets from Shenzhen and New York City. Experimental results show that HCRide improves system efficiency by 2.02%, fairness by 5.39%, and driver preference by 10.21% compared to state-of-the-art baselines.

IJCAI Conference 2025 Conference Paper

LLM-based Collaborative Agents with Pedagogy-guided Interaction Modeling for Timely Instructive Feedback Generation in Task-oriented Group Discussions

  • Qihao Yang
  • Yu Yang
  • Sixu An
  • Tianyong Hao
  • Guandong Xu

Large language models (LLMs) are fundamentally reshaping models of learning and teaching, shifting tutoring systems from supporting individual learning to facilitating collaborative learning (CL), such as task-oriented group discussions. However, existing AI tutors struggle to guide CL because they seldom model the interactions between AI tutors and students. As a result, they cannot scaffold students to complete tasks collaboratively, which impairs learning outcomes and pedagogical adaptability. Additionally, existing AI tutors fail to use CL theories to generate instructive feedback, which leads to undesirable interactions such as over-instruction and limits students' autonomy. In this paper, we propose an LLM-based collaborative agent that leverages pedagogical strategies to sense discussion stages, detect learning issues, identify the timing of intervention, and generate instructive feedback. To develop the agent, we first design a prompting strategy based on a CL theory, the Community of Inquiry, that enables the agent to understand the discussion status. Second, we propose a multi-agent interaction framework to simulate collaborative learning behavior between AI tutors and students. Meanwhile, we generate a synthetic task-oriented group discussion dataset, CLTeach, which consists of 27k manually verified multi-party dialogues with fine-grained annotations of instructive feedback and explanations. Lastly, we use CLTeach to fine-tune the LLM agent, enabling it to generate instructive feedback at the right time to support students in CL. Extensive experiments demonstrate that our agent achieves state-of-the-art performance in feedback generation and has the potential to mimic human teachers effectively.

AAAI Conference 2025 Conference Paper

Mixture of Knowledge Minigraph Agents for Literature Review Generation

  • Zhi Zhang
  • Yan Liu
  • Sheng-hua Zhong
  • Gong Chen
  • Yu Yang
  • Jiannong Cao

Literature reviews play a crucial role in scientific research for understanding the current state of research, identifying gaps, and guiding future studies on specific topics. However, conducting a comprehensive literature review remains time-consuming. This paper proposes a novel framework, collaborative knowledge minigraph agents (CKMAs), to automate scholarly literature reviews. A novel prompt-based algorithm, the knowledge minigraph construction agent (KMCA), is designed to identify relations between concepts in academic literature and automatically construct knowledge minigraphs. By leveraging the capabilities of large language models on the constructed knowledge minigraphs, the multiple path summarization agent (MPSA) efficiently organizes concepts and relations from different viewpoints to generate literature review paragraphs. We evaluate CKMAs on three benchmark datasets. Experimental results show the effectiveness of the proposed method, further revealing promising applications of LLMs in scientific research.

NeurIPS Conference 2025 Conference Paper

Nemotron-CLIMB: Clustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training

  • Shizhe Diao
  • Yu Yang
  • Yonggan Fu
  • Xin Dong
  • Dan Su
  • Markus Kliegl
  • ZIJIA CHEN
  • Peter Belcak

Pre-training datasets are typically collected from web content and lack inherent domain divisions. For instance, widely used datasets like Common Crawl do not include explicit domain labels, while manually curating labeled datasets such as The Pile is labor-intensive. Consequently, identifying an optimal pre-training data mixture remains a challenging problem, despite its significant benefits for pre-training performance. To address these challenges, we propose CLustering-based Iterative Data Mixture Bootstrapping (Nemotron-CLIMB), an automated framework that discovers, evaluates, and refines data mixtures in a pre-training setting. Specifically, Nemotron-CLIMB embeds and clusters large-scale datasets in a semantic space and then iteratively searches for optimal mixtures using a smaller proxy model and a predictor. This strategy enables effective domain adaptation without relying solely on curated data. When continuously trained on 400B tokens with this mixture, our 1B model exceeds the state-of-the-art Llama-3.2-1B by 2.0%. Moreover, we observe that optimizing for a specific domain (e.g., Social Sciences) yields a 5% improvement over random sampling. Finally, we introduce Nemotron-ClimbLab, a filtered 1.2-trillion-token corpus with 20 clusters as a research playground, and Nemotron-ClimbMix, a compact yet powerful 400-billion-token dataset designed for efficient pre-training that delivers superior performance under an equal token budget. We analyze the final data mixture, elucidating the characteristics of an optimal data mixture.

TMLR Journal 2025 Journal Article

Pre-trained Language Models Improve the Few-shot Prompt Ability of Decision Transformer

  • Yu Yang
  • Pan Xu

Decision Transformer (DT) has emerged as a promising class of algorithms in offline reinforcement learning (RL) tasks, leveraging pre-collected datasets and Transformer's capability to model long sequences. Recent works have demonstrated that using parts of trajectories from training tasks as prompts in DT enhances its performance on unseen tasks, giving rise to Prompt-DT methods. However, collecting data from specific environments can be both costly and unsafe in many scenarios, leading to suboptimal performance and limited few-shot prompt abilities due to the data-hungry nature of Transformer-based models. Additionally, the limited datasets used in pre-training make it challenging for Prompt-DT-type methods to distinguish between various RL tasks through prompts alone. To address these challenges, we introduce the Language model-initialized Prompt Decision Transformer (LPDT) framework, which leverages pretrained language models providing rich prior knowledge for RL tasks and fine-tunes the sequence model using Low-rank Adaptation (LoRA) for meta-RL problems. We further incorporate prompt regularization to effectively differentiate between tasks based on prompt feature representations. Comprehensive empirical studies demonstrate that initializing with a pre-trained language model provides the prior knowledge and achieves performance similar to Prompt-DT with only $10\%$ of the data in some MuJoCo control tasks. We also provide a thorough ablation study to validate the effectiveness of each component, including sequence modeling, language models, prompt regularizations, and prompt strategies.

IROS Conference 2025 Conference Paper

Robust Wrench-Feasible Control for Multiple UAVs Aerial Transportation System with Adaptive Cable Configuration

  • Yu Yang
  • Shuaiping Zhao
  • Yuchen Zhang
  • Liaoni Wu

Due to the bounded thrust, motion acceleration, and external disturbances inherent in quadrotor UAVs, traditional hierarchical control methods for multiple UAVs aerial transportation systems (MUATS) with cable-suspended payloads often struggle to guarantee dynamic performance and payload wrench feasibility. To address these challenges, this paper proposes a robust wrench-feasible control framework. First, we design a payload controller based on an extended state observer to estimate and compensate for the total disturbance acting on the payload. Next, a piecewise wrench adjustment strategy (PWAS) is proposed to ensure that the payload wrench balances feasibility and path-tracking ability. Finally, we propose an adaptive cable configuration strategy (ACCS) inspired by capacity margin. When external disturbances approach the system’s wrench capacity, ACCS can dynamically adjust cable configuration to maximize the capacity margin. Experimental results demonstrate the effectiveness and superiority of the proposed method.
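For readers unfamiliar with extended state observers (ESOs), the idea of estimating and then cancelling a lumped "total disturbance" can be sketched on a toy 1-D double integrator. This is not the paper's cable-suspended payload model: the plant, the common bandwidth-parameterized linear ESO (all observer poles placed at -omega), the PD gains, and the function name are illustrative assumptions.

```python
def simulate_eso(disturbance=2.0, b0=1.0, omega=20.0, kp=100.0, kd=20.0,
                 dt=1e-3, steps=5000):
    """Toy plant x1'' = d + b0*u with an unknown constant disturbance d.

    A third-order linear ESO estimates position (z1), velocity (z2) and the
    lumped disturbance (z3); the controller cancels z3 and adds PD feedback.
    """
    l1, l2, l3 = 3 * omega, 3 * omega**2, omega**3  # all observer poles at -omega
    x1 = x2 = 0.0          # true plant state
    z1 = z2 = z3 = 0.0     # observer state
    for _ in range(steps):
        u = (-kp * x1 - kd * x2 - z3) / b0       # PD feedback + disturbance compensation
        e = x1 - z1                               # output estimation error
        # Forward-Euler observer update (right-hand sides use the old values)
        z1, z2, z3 = (z1 + dt * (z2 + l1 * e),
                      z2 + dt * (z3 + b0 * u + l2 * e),
                      z3 + dt * (l3 * e))
        # Forward-Euler update of the true plant
        x1, x2 = x1 + dt * x2, x2 + dt * (disturbance + b0 * u)
    return x1, z3


position, estimate = simulate_eso()
print(f"final position {position:.2e}, disturbance estimate {estimate:.4f}")
```

Without the `- z3` term, the same PD loop would settle at an offset of `d / kp` (0.02 here) rather than at zero, which is the kind of steady-state error that disturbance estimation is meant to remove.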

NeurIPS Conference 2025 Conference Paper

SECODEPLT: A Unified Benchmark for Evaluating the Security Risks and Capabilities of Code GenAI

  • Yuzhou Nie
  • Zhun Wang
  • Yu Yang
  • Ruizhe Jiang
  • Yuheng Tang
  • Xander Davies
  • Yarin Gal
  • Bo Li

Existing benchmarks for evaluating the security risks and capabilities (e.g., vulnerability detection) of code-generating large language models (LLMs) face several key limitations: (1) limited coverage of risk and capabilities; (2) reliance on static evaluation metrics such as LLM judgments or rule-based detection, which lack the precision of dynamic analysis; and (3) a trade-off between data quality and benchmark scale. To address these challenges, we introduce a general and scalable benchmark construction framework that begins with manually validated, high-quality seed examples and expands them via targeted mutations. Each mutated sample retains the seed's security semantics while providing diverse, unseen instances. The resulting benchmark bundles every artifact required for dynamic evaluation, including prompts, vulnerable and patched code, test cases, and ground-truth proofs of concept, enabling rigorous measurement of insecure coding, vulnerability detection, and patch generation. Applying this framework to Python, C/C++, and Java, we build SECODEPLT, a dataset of more than 5.9k samples spanning 44 CWE-based risk categories and three security capabilities. Compared with state-of-the-art benchmarks, SECODEPLT offers broader coverage, higher data fidelity, and substantially greater scale. We use SECODEPLT to evaluate leading code-generation LLMs and agents, revealing their strengths and weaknesses in both generating secure code and identifying or fixing vulnerabilities. We provide our code in \url{https://github.com/ucsb-mlsec/SeCodePLT} and data in \url{https://huggingface.co/datasets/UCSB-SURFI/SeCodePLT}.

ICLR Conference 2025 Conference Paper

VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents

  • Xiao Liu 0036
  • Tianjie Zhang
  • Yu Gu 0016
  • Iat Long Iong
  • Xixuan Song
  • Yifan Xu 0014
  • Shudan Zhang
  • Hanyu Lai

Large Multimodal Models (LMMs) have ushered in a new era in artificial intelligence, merging capabilities in both language and vision to form highly capable Visual Foundation Agents that are postulated to excel across a myriad of tasks. However, existing benchmarks fail to sufficiently challenge or showcase the full potential of LMMs as visual foundation agents in complex, real-world environments. To address this gap, we introduce VisualAgentBench (VAB), a comprehensive and unified benchmark specifically designed to train and evaluate LMMs as visual foundation agents across diverse scenarios in one standard setting, including Embodied, Graphical User Interface, and Visual Design, with tasks formulated to probe the depth of LMMs' understanding and interaction capabilities. Through rigorous testing across 9 proprietary LMM APIs and 9 open models (18 in total), we demonstrate the considerable yet still developing visual agent capabilities of these models. Additionally, VAB explores synthesizing visual agent trajectory data through hybrid methods including Program-based Solvers, LMM Agent Bootstrapping, and Human Demonstrations, offering insights into obstacles, solutions, and trade-offs one may meet in developing open LMM agents. Our work not only aims to benchmark existing models but also provides an instrumental playground for future development into visual foundation agents. Code, train, and test data are available at \url{https://github.com/THUDM/VisualAgentBench}.

ICLR Conference 2025 Conference Paper

WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning

  • Zehan Qi
  • Xiao Liu 0036
  • Iat Long Iong
  • Hanyu Lai
  • Xueqiao Sun
  • Jiadai Sun
  • Xinyue Yang
  • Yu Yang

Large language models (LLMs) have shown remarkable potential as autonomous agents, particularly in web-based tasks. However, existing LLM web agents face significant limitations: high-performing agents rely on expensive proprietary LLM APIs, while open LLMs lack the necessary decision-making capabilities. This paper introduces WebRL, a novel self-evolving online curriculum reinforcement learning framework designed to train high-performance web agents using open LLMs. Our approach addresses key challenges in this domain, including the scarcity of training tasks, sparse feedback signals, and policy distribution drift in online learning. WebRL incorporates a self-evolving curriculum that generates new tasks from unsuccessful attempts, a robust outcome-supervised reward model (ORM), and adaptive reinforcement learning strategies to ensure consistent improvement. We apply WebRL to transform Llama-3.1 models into proficient web agents, achieving remarkable results on the WebArena-Lite benchmark. Our Llama-3.1-8B agent improves from an initial 4.8% success rate to 42.4%, while the Llama-3.1-70B agent achieves a 47.3% success rate across five diverse websites. These results surpass the performance of GPT-4-Turbo (17.6%) by over 160% relatively and significantly outperform previous state-of-the-art web agents trained on open LLMs (AutoWebGLM, 18.2%). Our findings demonstrate WebRL's effectiveness in bridging the gap between open and proprietary LLM-based web agents, paving the way for more accessible and powerful autonomous web interaction systems.
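As a quick arithmetic check on the relative margins (computed here from the success rates quoted above, not taken from the paper), the gain over GPT-4-Turbo's 17.6% works out to:

$$
\frac{47.3 - 17.6}{17.6} \approx 1.69 \quad (\text{70B agent}), \qquad \frac{42.4 - 17.6}{17.6} \approx 1.41 \quad (\text{8B agent})
$$

So the "over 160%" relative margin corresponds to the 70B agent, while the 8B agent's relative margin is roughly 141%.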

NeurIPS Conference 2025 Conference Paper

X-Scene: Large-Scale Driving Scene Generation with High Fidelity and Flexible Controllability

  • Yu Yang
  • Alan Liang
  • Jianbiao Mei
  • Yukai Ma
  • Yong Liu
  • Gim Hee Lee

Diffusion models are advancing autonomous driving by enabling realistic data synthesis, predictive end-to-end planning, and closed-loop simulation, with a primary focus on temporally consistent generation. However, large-scale 3D scene generation requiring spatial coherence remains underexplored. In this paper, we present X-Scene, a novel framework for large-scale driving scene generation that achieves geometric intricacy, appearance fidelity, and flexible controllability. Specifically, X-Scene supports multi-granular control, including low-level layout conditioning driven by user input or text for detailed scene composition, and high-level semantic guidance informed by user intent and LLM-enriched prompts for efficient customization. To enhance geometric and visual fidelity, we introduce a unified pipeline that sequentially generates 3D semantic occupancy and corresponding multi-view images and videos, ensuring alignment and temporal consistency across modalities. We further extend local regions into large-scale scenes via consistency-aware outpainting, which extrapolates occupancy and images from previously generated areas to maintain spatial and visual coherence. The resulting scenes are lifted into high-quality 3DGS representations, supporting diverse applications such as simulation and scene exploration. Extensive experiments demonstrate that X-Scene substantially advances controllability and fidelity in large-scale scene generation, empowering data generation and simulation for autonomous driving.

AAAI Conference 2024 Conference Paper

Explore 3D Dance Generation via Reward Model from Automatically-Ranked Demonstrations

  • Zilin Wang
  • Haolin Zhuang
  • Lu Li
  • Yinmin Zhang
  • Junjie Zhong
  • Jun Chen
  • Yu Yang
  • Boshi Tang

This paper presents an Exploratory 3D Dance generation framework, E3D2, designed to address the exploration capability deficiency in existing music-conditioned 3D dance generation models. Current models often generate monotonous and simplistic dance sequences that misalign with human preferences because they lack exploration capabilities. The E3D2 framework involves a reward model trained from automatically-ranked dance demonstrations, which then guides the reinforcement learning process. This approach encourages the agent to explore and generate high-quality and diverse dance movement sequences. The soundness of the reward model is both theoretically and experimentally validated. Empirical experiments demonstrate the effectiveness of E3D2 on the AIST++ dataset.

RLJ Journal 2024 Journal Article

More Efficient Randomized Exploration for Reinforcement Learning via Approximate Sampling

  • Haque Ishfaq
  • Yixin Tan
  • Yu Yang
  • Qingfeng Lan
  • Jianfeng Lu
  • A. Rupam Mahmood
  • Doina Precup
  • Pan Xu

Thompson sampling (TS) is one of the most popular exploration techniques in reinforcement learning (RL). However, most TS algorithms with theoretical guarantees are difficult to implement and not generalizable to Deep RL. While approximate sampling-based exploration schemes are promising, most existing algorithms are specific to linear Markov Decision Processes (MDP) with suboptimal regret bounds, or only use the most basic samplers such as Langevin Monte Carlo. In this work, we propose an algorithmic framework that incorporates different approximate sampling methods with the recently proposed Feel-Good Thompson Sampling (FGTS) approach (Zhang, 2022; Dann et al., 2021), which was previously known to be intractable. When applied to linear MDPs, our regret analysis yields the best known dependency of regret on dimensionality, surpassing existing randomized algorithms. Additionally, we provide explicit sampling complexity for each employed sampler. Empirically, we show that in tasks where deep exploration is necessary, our proposed algorithms that combine FGTS and approximate sampling perform significantly better than other strong baselines. On several challenging games from the Atari 57 suite, our algorithms achieve performance that is either better than or on par with other strong baselines from the deep RL literature.
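For orientation, Thompson sampling in its simplest exact form, a Bernoulli multi-armed bandit with conjugate Beta posteriors, can be sketched as below. This is the textbook baseline that the paper generalizes, not the FGTS framework: exact posterior draws like these are what become intractable in deep RL, which is why the paper substitutes approximate samplers such as Langevin Monte Carlo. The arm probabilities and the function name are illustrative.

```python
import random


def thompson_sampling(true_probs, horizon, seed=0):
    """Beta-Bernoulli Thompson sampling; returns how often each arm was pulled."""
    rng = random.Random(seed)
    k = len(true_probs)
    successes = [0] * k  # observed reward-1 counts per arm
    failures = [0] * k   # observed reward-0 counts per arm
    pulls = [0] * k
    for _ in range(horizon):
        # Draw one plausible mean per arm from its Beta(1 + s, 1 + f) posterior,
        # then act greedily with respect to that single draw.
        samples = [rng.betavariate(1 + successes[i], 1 + failures[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


pulls = thompson_sampling([0.1, 0.5, 0.9], horizon=2000)
print(pulls)  # the 0.9 arm should receive the large majority of pulls
```

Randomness in the posterior draw is what drives exploration: an arm with few observations has a wide posterior and is occasionally sampled as the best.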

RLC Conference 2024 Conference Paper

More Efficient Randomized Exploration for Reinforcement Learning via Approximate Sampling

  • Haque Ishfaq
  • Yixin Tan
  • Yu Yang
  • Qingfeng Lan
  • Jianfeng Lu
  • A. Rupam Mahmood
  • Doina Precup
  • Pan Xu

Thompson sampling (TS) is one of the most popular exploration techniques in reinforcement learning (RL). However, most TS algorithms with theoretical guarantees are difficult to implement and not generalizable to Deep RL. While approximate sampling-based exploration schemes are promising, most existing algorithms are specific to linear Markov Decision Processes (MDP) with suboptimal regret bounds, or only use the most basic samplers such as Langevin Monte Carlo. In this work, we propose an algorithmic framework that incorporates different approximate sampling methods with the recently proposed Feel-Good Thompson Sampling (FGTS) approach (Zhang, 2022; Dann et al., 2021), which was previously known to be intractable. When applied to linear MDPs, our regret analysis yields the best known dependency of regret on dimensionality, surpassing existing randomized algorithms. Additionally, we provide explicit sampling complexity for each employed sampler. Empirically, we show that in tasks where deep exploration is necessary, our proposed algorithms that combine FGTS and approximate sampling perform significantly better than other strong baselines. On several challenging games from the Atari 57 suite, our algorithms achieve performance that is either better than or on par with other strong baselines from the deep RL literature.

NeurIPS Conference 2024 Conference Paper

Optimal Batched Best Arm Identification

  • Tianyuan Jin
  • Yu Yang
  • Jing Tang
  • Xiaokui Xiao
  • Pan Xu

We study the batched best arm identification (BBAI) problem, where the learner's goal is to identify the best arm while switching the policy as few times as possible. In particular, we aim to find the best arm with probability $1-\delta$ for some small constant $\delta>0$ while minimizing both the sample complexity (total number of arm pulls) and the batch complexity (total number of batches). We propose the three-batch best arm identification (Tri-BBAI) algorithm, which is the first batched algorithm that achieves the optimal sample complexity in the asymptotic setting (i.e., $\delta\rightarrow 0$) and runs in $3$ batches in expectation. Based on Tri-BBAI, we further propose the almost optimal batched best arm identification (Opt-BBAI) algorithm, which is the first algorithm that achieves the near-optimal sample and batch complexity in the non-asymptotic setting (i.e., $1/\delta$ is finite), while enjoying the same batch and sample complexity as Tri-BBAI when $\delta$ tends to zero. Moreover, in the non-asymptotic setting, the complexity of previous batch algorithms is usually conditioned on the event that the best arm is returned (with a probability of at least $1-\delta$), which is potentially unbounded in cases where a sub-optimal arm is returned. In contrast, the complexity of Opt-BBAI does not rely on such an event. This is achieved through a novel procedure that we design for checking whether the best arm is eliminated, which is of independent interest.
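To make the batch-versus-sample trade-off concrete: in the batched setting, the learner fixes the whole pull schedule of a batch before seeing any of its outcomes. The sketch below is plain batched successive elimination with doubling batch sizes and Hoeffding confidence radii on Bernoulli arms. It is not Tri-BBAI or Opt-BBAI (it needs a logarithmic rather than constant number of batches), and the function name and parameters are hypothetical.

```python
import math
import random


def batched_elimination(means, delta=0.05, seed=0, max_batches=30):
    """Return (best_arm, batches_used, total_pulls) for Bernoulli arms."""
    rng = random.Random(seed)
    k = len(means)
    active = list(range(k))
    totals = [0.0] * k
    counts = [0] * k
    samples = 0
    for batch in range(1, max_batches + 1):
        per_arm = 2 ** batch  # schedule fixed before observing this batch; doubles each time
        for arm in active:
            for _ in range(per_arm):
                totals[arm] += 1.0 if rng.random() < means[arm] else 0.0
            counts[arm] += per_arm
            samples += per_arm
        # Hoeffding radius with a union bound over arms and batches.
        radius = {a: math.sqrt(math.log(4 * k * batch * batch / delta) / (2 * counts[a]))
                  for a in active}
        emp = {a: totals[a] / counts[a] for a in active}
        best_lcb = max(emp[a] - radius[a] for a in active)
        # Drop arms whose upper bound falls below the best lower bound.
        active = [a for a in active if emp[a] + radius[a] >= best_lcb]
        if len(active) == 1:
            return active[0], batch, samples
    return max(active, key=lambda a: totals[a] / counts[a]), max_batches, samples


best, batches, pulls = batched_elimination([0.2, 0.5, 0.8])
print(best, batches, pulls)
```

Doubling the batch size keeps the number of policy switches logarithmic in the sample budget; shrinking that count to a constant while keeping near-optimal sample complexity is what the paper's algorithms achieve.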

NeurIPS Conference 2024 Conference Paper

SmallToLarge (S2L): Scalable Data Selection for Fine-tuning Large Language Models by Summarizing Training Trajectories of Small Models

  • Yu Yang
  • Siddhartha Mishra
  • Jeffrey Chiang
  • Baharan Mirzasoleiman

Despite the effectiveness of data selection for pretraining and instruction fine-tuning large language models (LLMs), improving data efficiency in supervised fine-tuning (SFT) for specialized domains poses significant challenges due to the complexity of fine-tuning data. To bridge this gap, we introduce an effective and scalable data selection method for SFT, SmallToLarge (S2L), which trains a small model, clusters loss trajectories of the examples, and samples from these clusters to guide data selection for larger models. We prove that during fine-tuning, samples within the same loss trajectory cluster exhibit similar gradients. Then, we show that S2L subsets have a bounded gradient error w.r.t. the full data, hence guaranteeing convergence to the neighborhood of the optimal solution. We demonstrate through extensive experiments that S2L significantly improves data efficiency in SFT for mathematical problem-solving, reducing the training data requirement to just $11$% of the original MathInstruct dataset to match full dataset performance while outperforming state-of-the-art data selection algorithms by an average of $4.7$% across $6$ in- and out-domain evaluation datasets. Remarkably, selecting only 50K examples for SFT, S2L achieves a $32.7$% accuracy on the challenging MATH benchmark, improving Phi-2 by $16.6$%. In clinical text summarization on the MIMIC-III dataset, S2L again outperforms training on the full dataset using only $50$% of the data. Notably, S2L can perform scalable data selection using a reference model $100\times$ smaller than the target model, proportionally reducing the computational cost.
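The selection loop described above (record each example's loss at several checkpoints of a small model, cluster those loss trajectories, then draw the subset across clusters) can be sketched in plain Python. The k-means routine, the round-robin rule for drawing from clusters, and the function names are illustrative assumptions rather than the authors' code; in practice each trajectory would come from real training checkpoints.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Minimal Lloyd's k-means on lists of floats; returns a cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        for i, p in enumerate(points):
            labels[i] = min(range(k),
                            key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
        for c in range(k):
            members = [points[i] for i in range(len(points)) if labels[i] == c]
            if members:  # keep the old center if a cluster goes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def select_by_loss_trajectory(trajectories, k, budget, seed=0):
    """Cluster per-example loss trajectories, then pick `budget` indices across clusters."""
    labels = kmeans(trajectories, k, seed=seed)
    rng = random.Random(seed)
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    pools = [rng.sample(members, len(members)) for members in clusters.values()]  # shuffled
    selected = []
    # Round-robin over clusters so small clusters are not crowded out by large ones.
    while len(selected) < budget and any(pools):
        for pool in pools:
            if pool and len(selected) < budget:
                selected.append(pool.pop())
    return selected


# 90 "easy" examples whose loss falls quickly, 10 "hard" ones whose loss stays flat.
trajs = [[2.0, 1.0, 0.5]] * 90 + [[2.0, 2.0, 2.0]] * 10
sel = select_by_loss_trajectory(trajs, k=2, budget=20)
print(sorted(sel))
```

In this toy example, uniform random sampling of 20 examples would include only about 2 of the rare flat-loss examples, while drawing across clusters includes all 10.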

TIST Journal 2023 Journal Article

Fast Real-Time Video Object Segmentation with a Tangled Memory Network

  • Jianbiao Mei
  • Mengmeng Wang
  • Yu Yang
  • Yanjun Li
  • Yong Liu

In this article, we present a fast real-time tangled memory network that segments the objects effectively and efficiently for semi-supervised video object segmentation (VOS). We propose a tangled reference encoder and a memory bank organization mechanism based on a state estimator to fully utilize the mask features and alleviate memory overhead and computational burden brought by the unlimited memory bank used in many memory-based methods. First, the tangled memory network exploits the mask features that uncover abundant object information like edges and contours but are not fully explored in existing methods. Specifically, a tangled two-stream reference encoder is designed to extract and fuse the features from both RGB frames and the predicted masks. Second, to indicate the quality of the predicted mask and feed back the online prediction state for organizing the memory bank, we devise a target state estimator to learn the IoU score between the predicted mask and ground truth. Moreover, to accelerate the forward process and avoid memory overflow, we use a memory bank of fixed size to store historical features by designing a new efficient memory bank organization mechanism based on the mask state score provided by the state estimator. We conduct comprehensive experiments on the public benchmarks DAVIS and YouTube-VOS, demonstrating that our method obtains competitive results while running at high speed (66 FPS on the DAVIS16-val set).

AAAI Conference 2023 Conference Paper

ILSGAN: Independent Layer Synthesis for Unsupervised Foreground-Background Segmentation

  • Qiran Zou
  • Yu Yang
  • Wing Yin Cheung
  • Chang Liu
  • Xiangyang Ji

Unsupervised foreground-background segmentation aims at extracting salient objects from cluttered backgrounds, where Generative Adversarial Network (GAN) approaches, especially layered GANs, show great promise. However, without human annotations, they are typically prone to produce foreground and background layers with non-negligible semantic and visual confusion, dubbed "information leakage", resulting in notable degeneration of the generated segmentation mask. To alleviate this issue, we propose a simple-yet-effective explicit layer independence modeling approach, termed Independent Layer Synthesis GAN (ILSGAN), pursuing independent foreground-background layer generation by encouraging their discrepancy. Specifically, it targets minimizing the mutual information between visible and invisible regions of the foreground and background to spur interlayer independence. Through in-depth theoretical and experimental analyses, we justify that explicit layer independence modeling is critical to suppressing information leakage and contributes to impressive segmentation performance gains. Also, our ILSGAN achieves strong state-of-the-art generation quality and segmentation performance on complex real-world data.

NeurIPS Conference 2023 Conference Paper

MeGraph: Capturing Long-Range Interactions by Alternating Local and Hierarchical Aggregation on Multi-Scaled Graph Hierarchy

  • Honghua Dong
  • Jiawei Xu
  • Yu Yang
  • Rui Zhao
  • Shiwen Wu
  • Chun Yuan
  • Xiu Li
  • Chris J. Maddison

Graph neural networks, which typically exchange information between local neighbors, often struggle to capture long-range interactions (LRIs) within the graph. Building a graph hierarchy via graph pooling methods is a promising approach to address this challenge; however, hierarchical information propagation cannot entirely take over the role of local information aggregation. To balance locality and hierarchy, we integrate the local and hierarchical structures, represented by intra- and inter-graphs respectively, of a multi-scale graph hierarchy into a single mega graph. Our proposed MeGraph model consists of multiple layers alternating between local and hierarchical information aggregation on the mega graph. Each layer first performs local-aware message-passing on graphs of varied scales via the intra-graph edges, then fuses information across the entire hierarchy along the bidirectional pathways formed by inter-graph edges. By repeating this fusion process, local and hierarchical information could intertwine and complement each other. To evaluate our model, we establish a new Graph Theory Benchmark designed to assess LRI capture ability, in which MeGraph demonstrates dominant performance. Furthermore, MeGraph exhibits superior or equivalent performance to state-of-the-art models on the Long Range Graph Benchmark. The experimental results on commonly adopted real-world datasets further demonstrate the broad applicability of MeGraph.

IROS Conference 2023 Conference Paper

PANet: LiDAR Panoptic Segmentation with Sparse Instance Proposal and Aggregation

  • Jianbiao Mei
  • Yu Yang
  • Mengmeng Wang 0005
  • Xiaojun Hou
  • Laijian Li
  • Yong Liu 0007

Reliable LiDAR panoptic segmentation (LPS), including both semantic and instance segmentation, is vital for many robotic applications, such as autonomous driving. This work proposes a new LPS framework named PANet to eliminate the dependency on the offset branch and improve the performance on large objects, which are always over-segmented by clustering algorithms. First, we propose a non-learning Sparse Instance Proposal (SIP) module with the "sampling-shifting-grouping" scheme to directly group thing points into instances from the raw point cloud efficiently. More specifically, balanced point sampling is introduced to generate sparse seed points with a more uniform point distribution over the distance range. A shift module, termed bubble shifting, is proposed to shrink the seed points to the clustered centers. Then we utilize the connected component labeling algorithm to generate instance proposals. Furthermore, an instance aggregation module is devised to integrate potentially fragmented instances, improving the performance of the SIP module on large objects. Extensive experiments show that PANet achieves state-of-the-art performance among published works on the SemanticKITTI validation and nuScenes validation sets for the panoptic segmentation task. Code is available at https://github.com/Jieqianyu/PANet.git.
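The grouping step of SIP, connected component labeling over points, can be sketched with a union-find that links every pair of points closer than a radius. Balanced sampling and bubble shifting are not shown, the O(n^2) pair loop is only for clarity (a voxel grid or k-d tree would be used at LiDAR scale), and the `radius` parameter and function name are hypothetical.

```python
import math


def connected_component_labels(points, radius):
    """Label points so that any two within `radius` (directly or via a chain) share a label."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= radius:
                parent[find(i)] = find(j)  # merge the two components

    roots = {}
    # Relabel component roots as consecutive integers 0, 1, 2, ...
    return [roots.setdefault(find(i), len(roots)) for i in range(len(points))]


pts = [(0, 0, 0), (0.1, 0, 0), (0.2, 0, 0), (5, 5, 0), (5.1, 5, 0)]
print(connected_component_labels(pts, radius=0.15))
```

The first three points form one instance even though the outer two are 0.2 apart, because they are chained through the middle point.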

NeurIPS Conference 2023 Conference Paper

Robust Learning with Progressive Data Expansion Against Spurious Correlation

  • Yihe Deng
  • Yu Yang
  • Baharan Mirzasoleiman
  • Quanquan Gu

While deep learning models have shown remarkable performance in various tasks, they are susceptible to learning non-generalizable spurious features rather than the core features that are genuinely correlated with the true label. In this paper, beyond existing analyses of linear models, we theoretically examine the learning process of a two-layer nonlinear convolutional neural network in the presence of spurious features. Our analysis suggests that imbalanced data groups and easily learnable spurious features can lead to the dominance of spurious features during the learning process. In light of this, we propose a new training algorithm called PDE that efficiently enhances the model's robustness for a better worst-group performance. PDE begins with a group-balanced subset of training data and progressively expands it to facilitate the learning of the core features. Experiments on synthetic and real-world benchmark datasets confirm the superior performance of our method on models such as ResNets and Transformers. On average, our method achieves a $2.8\%$ improvement in worst-group accuracy compared with the state-of-the-art method, while enjoying up to $10\times$ faster training efficiency.
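The data schedule PDE describes (start from a group-balanced subset, then progressively add the rest of the training set) can be sketched as a list of nested index sets. The abstract does not say how examples are chosen at each expansion step, so the uniform random expansion, the fixed stage size, and the function name are assumptions; a model would be trained for some epochs on each successive subset.

```python
import random
from collections import defaultdict


def pde_schedule(groups, expand_size, num_stages, seed=0):
    """Return cumulative index lists: a group-balanced warm-up set, then expansions."""
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for idx, g in enumerate(groups):
        by_group[g].append(idx)
    # Warm-up: the same number of examples from every group (size of the smallest group).
    per_group = min(len(v) for v in by_group.values())
    current = []
    for members in by_group.values():
        current.extend(rng.sample(members, per_group))
    chosen = set(current)
    remaining = [i for i in range(len(groups)) if i not in chosen]
    rng.shuffle(remaining)  # assumed: expand with uniformly random leftover examples
    stages = [list(current)]
    for _ in range(num_stages):
        batch, remaining = remaining[:expand_size], remaining[expand_size:]
        current.extend(batch)
        stages.append(list(current))
    return stages


groups = ["majority"] * 90 + ["minority"] * 10
stages = pde_schedule(groups, expand_size=20, num_stages=3)
print([len(s) for s in stages])
```

Starting balanced means the easy spurious feature is not yet dominant when training begins, which is the intuition the analysis above gives for why the expansion helps.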

TCS Journal 2023 Journal Article

Secure connected domination and secure total domination in unit disk graphs and rectangle graphs

  • Cai-Xia Wang
  • Yu Yang
  • Shou-Jun Xu

Given a graph G with vertex set V, a secure connected (resp. total) dominating set of G is a connected (resp. total) dominating set S ⊆ V with the property that for each u ∈ V ∖ S, there exists v ∈ S adjacent to u such that (S ∪ {u}) ∖ {v} is a connected (resp. total) dominating set of G. The minimum secure connected dominating set (MSCDS) (resp. minimum secure total dominating set (MSTDS)) problem is to find an MSCDS (resp. MSTDS) in a given graph. In this paper, we initiate the study of the complexity and algorithmic aspects of the MSCDS and MSTDS problems in unit disk graphs and rectangle graphs. Firstly, we show that the decision version of the MSCDS problem is NP-complete in unit square graphs and unit disk graphs. Then we show that the decision version of the MSTDS problem is NP-complete even in grid graphs (a subclass of unit square graphs and unit disk graphs). Secondly, we give linear-time constant-approximation algorithms for the two problems in unit square graphs, unit disk graphs, and unit-height rectangle graphs. Thirdly, we propose a PTAS for the MSTDS problem in unit square graphs and unit disk graphs. Finally, we show that the two problems in proper rectangle graphs are APX-hard. Further, we give an explicit lower bound of 1.00147 on efficient approximability for the two problems in proper rectangle graphs unless P = NP.
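Because these definitions are easy to misread, a brute-force checker that tests a candidate set S directly against them may help. It only verifies a given S (finding a minimum one is the hard problem studied here), and the adjacency-dict representation and function names are our own.

```python
from collections import deque


def is_total_dominating(adj, s):
    """Every vertex of G, including those in S, has a neighbour in S."""
    return all(adj[v] & s for v in adj)


def is_connected_dominating(adj, s):
    """Every vertex outside S has a neighbour in S, and S induces a connected subgraph."""
    if not s or any(not (adj[v] & s) for v in adj if v not in s):
        return False
    start = next(iter(s))
    seen, queue = {start}, deque([start])
    while queue:  # BFS restricted to vertices of S
        v = queue.popleft()
        for w in adj[v] & s:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen == s


def is_secure(adj, s, check):
    """S passes `check`, and every u outside S can swap with some neighbour v in S."""
    s = set(s)
    if not check(adj, s):
        return False
    return all(any(check(adj, (s | {u}) - {v}) for v in adj[u] & s)
               for u in adj if u not in s)


# 4-cycle a-b-c-d-a as an adjacency dict of sets.
c4 = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
print(is_secure(c4, {"a", "b"}, is_total_dominating))       # False
print(is_secure(c4, {"a", "b", "c"}, is_total_dominating))  # True
```

On the 4-cycle, {a, b} is a total dominating set but not a secure one: swapping c in for b leaves {a, c}, and a then has no neighbour in the set.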

IROS Conference 2023 Conference Paper

SSC-RS: Elevate LiDAR Semantic Scene Completion with Representation Separation and BEV Fusion

  • Jianbiao Mei
  • Yu Yang
  • Mengmeng Wang 0005
  • Tianxin Huang
  • Xuemeng Yang
  • Yong Liu 0007

Semantic scene completion (SSC) jointly predicts the semantics and geometry of the entire 3D scene, which plays an essential role in 3D scene understanding for autonomous driving systems. SSC has achieved rapid progress with the help of semantic context in segmentation. However, how to effectively exploit the relationships between the semantic context in semantic segmentation and geometric structure in scene completion remains under exploration. In this paper, we propose to solve outdoor SSC from the perspective of representation separation and BEV fusion. Specifically, we present the network, named SSC-RS, which uses separate branches with deep supervision to explicitly disentangle the learning procedure of the semantic and geometric representations. A BEV fusion network equipped with the proposed Adaptive Representation Fusion (ARF) module is then presented to aggregate the multi-scale features effectively and efficiently. Owing to its low computational burden and powerful representation ability, our model generalizes well while running in real time. Extensive experiments on SemanticKITTI demonstrate that SSC-RS achieves state-of-the-art performance. Code is available at https://github.com/Jieqianyu/SSC-RS.git.

NeurIPS Conference 2022 Conference Paper

Distilling Representations from GAN Generator via Squeeze and Span

  • Yu Yang
  • Xiaotian Cheng
  • Chang Liu
  • Hakan Bilen
  • Xiangyang Ji

In recent years, generative adversarial networks (GANs) have been actively studied and shown to successfully produce high-quality realistic images in various domains. The controllable synthesis ability of GAN generators suggests that they maintain informative, disentangled, and explainable image representations, but leveraging and transferring their representations to downstream tasks is largely unexplored. In this paper, we propose to distill knowledge from GAN generators by squeezing and spanning their representations. We squeeze the generator features, through a network, into representations that are invariant to semantic-preserving transformations before they are distilled into the student network. We span the distilled representation of the synthetic domain to the real domain by also using real training data, which remedies the mode collapse of GANs and boosts the student network's performance in the real domain. Experiments justify the efficacy of our method and reveal its significance for self-supervised representation learning. Code is available at https://github.com/yangyu12/squeeze-and-span.

NeurIPS Conference 2022 Conference Paper

Friendly Noise against Adversarial Noise: A Powerful Defense against Data Poisoning Attack

  • Tian Yu Liu
  • Yu Yang
  • Baharan Mirzasoleiman

A powerful category of (invisible) data poisoning attacks modifies a subset of training examples with small adversarial perturbations to change the prediction of certain test-time data. Existing defense mechanisms are not desirable to deploy in practice, as they often either drastically harm generalization performance or are attack-specific and prohibitively slow to apply. Here, we propose a simple but highly effective approach that, unlike existing methods, breaks various types of invisible poisoning attacks with only a slight drop in generalization performance. We make the key observation that attacks introduce local sharp regions of high training loss, which, when minimized, result in learning the adversarial perturbations and make the attack successful. To break poisoning attacks, our key idea is to alleviate the sharp loss regions introduced by poisons. To do so, our approach comprises two components: an optimized friendly noise that is generated to maximally perturb examples without degrading performance, and a randomly varying noise component. The combination of both components builds a very lightweight but extremely effective defense against the most powerful triggerless targeted and hidden-trigger backdoor poisoning attacks, including Gradient Matching, Bulls-eye Polytope, and Sleeper Agent. We show that our friendly noise is transferable to other architectures, and that adaptive attacks cannot break our defense due to its random noise component.

NeurIPS Conference 2022 Conference Paper

Sequence-to-Set Generative Models

  • Longtao Tang
  • Ying Zhou
  • Yu Yang

In this paper, we propose a sequence-to-set method that can transform any sequence generative model based on maximum likelihood into a set generative model under which the utility/probability of any set can be evaluated. An efficient importance sampling algorithm is devised to tackle the computational challenge of learning our sequence-to-set model. We present GRU2Set, an instance of our sequence-to-set method that employs the well-known GRU model as the sequence generative model. To further obtain permutation-invariant representations of sets, we devise the SetNN model, which is also an instance of the sequence-to-set model. A direct application of our models is to learn an order/set distribution from a collection of e-commerce orders, an essential step in many important operational decisions such as inventory arrangement for fast delivery. Based on the intuition that small sets are usually easier to learn than large ones, we propose a size-bias trick that helps learn better set distributions with respect to the ℓ1-distance evaluation metric. Two e-commerce order datasets, TMALL and HKTVMALL, are used in extensive experiments to show the effectiveness of our models. The experimental results demonstrate that our models learn better set/order distributions from order data than the baselines. Moreover, regardless of the model used, applying the size-bias trick consistently improves the quality of the set distribution learned from data.
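
One way to read the sequence-to-set idea is that the probability of a set is the total probability of all the orderings of that set. The sketch below assumes exactly that reading and substitutes a toy first-order Markov model (hypothetical `start`, `trans`, `stop` tables) for a GRU; it is not GRU2Set or SetNN. Exact enumeration needs |S|! sequence evaluations, which illustrates the computational challenge that the importance sampling algorithm mentioned above is meant to address.

```python
from itertools import permutations
from typing import Dict, Iterable, Sequence

# Hypothetical toy sequence model: P(first item), P(next | previous), P(stop | last item).
Start = Dict[str, float]
Trans = Dict[str, Dict[str, float]]
Stop = Dict[str, float]


def sequence_prob(seq: Sequence[str], start: Start, trans: Trans, stop: Stop) -> float:
    """Probability of one ordered, non-empty sequence under the toy Markov model."""
    p = start.get(seq[0], 0.0)
    for prev, nxt in zip(seq, seq[1:]):
        p *= trans.get(prev, {}).get(nxt, 0.0)
    return p * stop.get(seq[-1], 0.0)


def set_prob(items: Iterable[str], start: Start, trans: Trans, stop: Stop) -> float:
    """Sum the sequence probability over every ordering of the (non-empty) set: |S|! terms."""
    return sum(sequence_prob(order, start, trans, stop) for order in permutations(items))
```

For example, with `start = {"A": 0.6, "B": 0.4}`, `trans = {"A": {"B": 0.5}, "B": {"A": 0.25}}` and `stop = {"A": 0.5, "B": 0.75}`, the set {A, B} gets 0.6·0.5·0.75 + 0.4·0.25·0.5 = 0.275, combining the orderings AB and BA.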

TCS Journal 2021 Journal Article

Enumeration of subtrees and BC-subtrees with maximum degree no more than k in trees

  • Yu Yang
  • Xiao-xiao Li
  • Meng-yuan Jin
  • Long Li
  • Hua Wang
  • Xiao-Dong Zhang

Subtrees and BC-subtrees (subtrees in which any two leaves are at even distance) have been intensively studied in recent years. Such structures, under special constraints on degrees, have a wide range of applications in many fields. Using an approach based on generating functions, we present novel recursive algorithms for enumerating various subtrees and BC-subtrees of maximum degree ≤ k in trees. The algorithms are explained through detailed examples. We also briefly discuss the density of subtrees (resp. BC-subtrees) with maximum degree ≤ k among all subtrees (resp. BC-subtrees) of a tree. For a tree of order n, the newly proposed algorithms have several advantages. (1) Novel (k+2)-variable (resp. (2k+3)-variable) generating functions are introduced to construct the algorithms. (2) The proposed algorithms solve the problem of fast enumeration of subtrees (resp. BC-subtrees) under a maximum degree constraint, and the subtree (resp. BC-subtree) enumeration algorithms proposed by Yan and Yeh [1] (resp. Yang et al. [2]) become the special case k = n − 1 of ours. (3) The time complexity of our algorithm for subtrees (resp. BC-subtrees) is O(kn) (resp. O(kn²)), which is much faster than the O(n²) (resp. O(kn³)) time method based on the algorithm proposed in [1] (resp. [2]).
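
To make the degree-constrained counting problem concrete, here is a plain dynamic program over a rooted tree that counts subtrees of maximum degree ≤ k. It is an illustrative sketch, not the paper's multivariate generating-function algorithm, and it does not handle BC-subtrees. For each vertex v it builds a truncated polynomial whose j-th coefficient counts the ways to attach exactly j child branches, reserving one degree for the edge to the parent when v is not the top vertex; this costs O(k) work per edge. Setting k = n − 1 removes the constraint, so the function then counts all subtrees.

```python
from typing import Dict, List


def count_subtrees_max_degree(adj: Dict[int, List[int]], k: int, root: int = 0) -> int:
    """Count subtrees (non-empty connected vertex subsets) of a tree with maximum degree <= k."""
    # BFS order from the root; the list grows while the loop iterates over it.
    parent = {root: None}
    order = [root]
    for v in order:
        for w in adj[v]:
            if w not in parent:
                parent[w] = v
                order.append(w)

    up: Dict[int, int] = {}  # subtrees topped at v that leave one degree free for the edge to parent(v)
    total = 0
    for v in reversed(order):  # children are processed before their parents
        # poly[j] = number of ways to attach exactly j child branches to v, truncated at j = k.
        poly = [1]
        for w in adj[v]:
            if w == parent[v]:
                continue
            nxt = poly + [0]
            for j, ways in enumerate(poly):
                nxt[j + 1] += ways * up[w]
            poly = nxt[: k + 1]
        total += sum(poly)     # v is the top vertex: up to k child edges are allowed
        up[v] = sum(poly[:k])  # at most k - 1 child edges, so the parent edge fits
    return total
```

For the star with centre 0 and leaves 1, 2, 3 there are 11 subtrees in total; with k = 2 the whole star is excluded (10), and with k = 1 only the 4 single vertices and 3 edges remain (7).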

TCS Journal 2015 Journal Article

Enumeration of BC-subtrees of trees

  • Yu Yang
  • Hongbo Liu
  • Hua Wang
  • Scott Makeig

A BC-tree (block-cutpoint tree) is a tree (with at least two vertices) in which the distance between any two leaves is even. Motivated by the study of the “core” of a graph, BC-trees form an interesting class of trees. We provide a comprehensive study of questions related to BC-trees. As an analogue of the study of extremal questions on subtrees of trees, we first characterize the general extremal trees that maximize or minimize the number of BC-subtrees or leaf-containing BC-subtrees. We further discuss the “middle part” of a tree with respect to the number of BC-subtrees, namely the BC-subtree-core, which behaves rather differently from all previously known “middle parts” of a tree. Last but not least, we propose fast algorithms (following ideas similar to those used for enumerating subtrees) for enumerating various classes of BC-subtrees of a tree.
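
A brute-force counter is a handy reference for checking any fast BC-subtree enumeration on small trees. The sketch below is not the paper's algorithm; it relies on the fact that in a tree the distance between two vertices is even exactly when they receive the same colour in a 2-colouring, so a subtree is a BC-subtree iff it has at least two vertices and all its leaves share one colour. The adjacency-list input format and helper names are assumptions.

```python
from collections import deque
from itertools import combinations
from typing import Dict, Hashable, List, Optional, Set

Tree = Dict[Hashable, List[Hashable]]  # hypothetical format: vertex -> list of neighbours


def _two_colour(adj: Tree, nodes: Set[Hashable]) -> Optional[Dict[Hashable, int]]:
    """BFS 2-colouring of the subgraph induced by `nodes`; None if that subgraph is disconnected."""
    start = next(iter(nodes))
    colour = {start: 0}
    queue = deque([start])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y in nodes and y not in colour:
                colour[y] = 1 - colour[x]
                queue.append(y)
    return colour if len(colour) == len(nodes) else None


def is_bc_subtree(adj: Tree, nodes: Set[Hashable]) -> bool:
    if len(nodes) < 2:
        return False
    colour = _two_colour(adj, nodes)
    if colour is None:  # not connected, hence not a subtree
        return False
    leaves = [v for v in nodes if sum(1 for y in adj[v] if y in nodes) == 1]
    # All pairwise leaf distances are even iff all leaves have the same colour.
    return len({colour[v] for v in leaves}) == 1


def count_bc_subtrees(adj: Tree) -> int:
    """Exponential-time reference count over all vertex subsets with at least two vertices."""
    vertices = list(adj)
    return sum(
        is_bc_subtree(adj, set(c))
        for r in range(2, len(vertices) + 1)
        for c in combinations(vertices, r)
    )
```

For a path on three vertices, only the whole path qualifies (its leaves are at distance 2), while single edges fail because their two leaves are at distance 1.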

IJCAI Conference 2015 Conference Paper

Maximizing the Coverage of Information Propagation in Social Networks

  • Zhefeng Wang
  • Enhong Chen
  • Qi Liu
  • Yu Yang
  • Yong Ge
  • Biao Chang

Social networks, due to their popularity, have been studied extensively in recent years. A rich body of these studies concerns influence maximization, which aims to select a set of seed nodes that maximizes the expected number of active nodes at the end of the propagation process. However, the set of active nodes cannot fully represent the true coverage of information propagation. A node may be informed of the information when any of its neighbours becomes active and tries to activate it, even though this node (namely, an informed node) remains inactive. Therefore, we need to consider both active nodes and informed nodes that are aware of the information when we study the coverage of information propagation in a network. Along this line, in this paper we propose a new problem called Information Coverage Maximization, which aims to maximize the expected number of both active nodes and informed ones. After proving that this problem is NP-hard and that its objective is submodular under the independent cascade model, we design two algorithms to solve it. Extensive experiments on three real-world data sets demonstrate the performance of the proposed algorithms.
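
The coverage objective can be estimated by Monte Carlo simulation of the independent cascade model. This is a generic sketch, not either of the paper's two algorithms. It assumes that a node counts as informed if at least one active neighbour attempted to activate it and every such attempt failed, and that the network is a dict of per-edge activation probabilities; both are assumptions made here for illustration.

```python
import random
from typing import Dict, Hashable, Iterable, Tuple

Graph = Dict[Hashable, Dict[Hashable, float]]  # hypothetical: u -> {v: activation probability p(u, v)}


def simulate_once(graph: Graph, seeds: Iterable[Hashable], rng: random.Random) -> Tuple[int, int]:
    """One independent-cascade run; returns (#active nodes, #informed-but-inactive nodes)."""
    active = set(seeds)
    frontier = list(active)
    attempted = set()  # nodes that received at least one activation attempt
    while frontier:
        next_frontier = []
        for u in frontier:
            # Each newly activated node gets one chance per still-inactive out-neighbour.
            for v, p in graph.get(u, {}).items():
                if v in active:
                    continue
                attempted.add(v)
                if rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    informed = attempted - active
    return len(active), len(informed)


def estimate_coverage(graph: Graph, seeds: Iterable[Hashable], runs: int = 1000, seed: int = 0) -> float:
    """Average of |active| + |informed| over independent simulation runs."""
    seeds = list(seeds)
    rng = random.Random(seed)
    total = 0
    for _ in range(runs):
        n_active, n_informed = simulate_once(graph, seeds, rng)
        total += n_active + n_informed
    return total / runs
```

With deterministic edges a →(1.0) b →(0.0) c →(1.0) d and seed set {a}, every run activates a and b and leaves c informed but inactive, so the estimated coverage is 3.0 while the classic influence spread is only 2. Greedy seed selection on such an estimate is the usual way to exploit submodularity, though the source does not say which approach the two proposed algorithms take.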

IJCAI Conference 2013 Conference Paper

PageRank with Priors: An Influence Propagation Perspective

  • Biao Xiang
  • Qi Liu
  • Enhong Chen
  • Hui Xiong
  • Yi Zheng
  • Yu Yang

Recent years have witnessed increased interest in measuring authority and modelling influence in social networks. For a long time, PageRank has been widely used for authority computation and has also been adopted as a solid baseline for evaluating social-influence-related applications. However, the connection between authority measurement and influence modelling has not been clearly established. To this end, in this paper we provide a focused study of PageRank and of the relationship between PageRank and social influence analysis. Along this line, we first propose a linear social influence model and reveal that this model is essentially PageRank with priors. We also show that authority computation by PageRank can be enhanced with more generalized priors. Moreover, to deal with the computational challenge of PageRank with general priors, we provide an upper bound for identifying the top authoritative nodes. Finally, experimental results on a scientific collaboration network validate the effectiveness of the proposed social influence model.
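
For readers unfamiliar with the term, PageRank with priors replaces the uniform teleport vector of ordinary PageRank with a prior distribution over nodes. The power-iteration sketch below shows that standard formulation only; it does not reproduce the paper's linear influence model or its upper bound for top-node identification. The damping value alpha = 0.15, the rule that sends mass from nodes without out-links back according to the prior, and the requirement that `prior` sums to 1 and covers every node are all assumptions made here.

```python
from typing import Dict, List


def pagerank_with_priors(
    out_links: Dict[str, List[str]],
    prior: Dict[str, float],
    alpha: float = 0.15,
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> Dict[str, float]:
    """Power iteration for r = alpha * prior + (1 - alpha) * (r propagated along out-links)."""
    nodes = list(prior)
    rank = dict(prior)  # start from the prior itself
    for _ in range(max_iter):
        # Teleport step: with probability alpha, jump to a node drawn from the prior.
        new = {v: alpha * prior[v] for v in nodes}
        dangling = 0.0
        for u in nodes:
            targets = out_links.get(u, [])
            if targets:
                # Otherwise follow a uniformly chosen out-link.
                share = (1 - alpha) * rank[u] / len(targets)
                for v in targets:
                    new[v] += share
            else:
                # Mass stuck at nodes without out-links is redistributed by the prior.
                dangling += (1 - alpha) * rank[u]
        for v in nodes:
            new[v] += dangling * prior[v]
        converged = sum(abs(new[v] - rank[v]) for v in nodes) < tol
        rank = new
        if converged:
            break
    return rank
```

On the two-node cycle a ↔ b with all prior mass on a, the fixed point satisfies r_a = 0.15 + 0.85·r_b and r_b = 0.85·r_a, so r_a = 0.15 / 0.2775 ≈ 0.54: the prior tilts authority toward a even though the link structure is symmetric. With a uniform prior the same graph gives 0.5 to each node, as in ordinary PageRank.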