Arrow Research search

Author name cluster

Yang Wei

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

14 papers
2 author rows

Possible papers

14

AAAI Conference 2026 Conference Paper

Clear Nights Ahead: Towards Multi-Weather Nighttime Image Restoration

  • Yuetong Liu
  • Yunqiu Xu
  • Yang Wei
  • Xiuli Bi
  • Bin Xiao

Restoring nighttime images affected by multiple adverse weather conditions is a practical yet under-explored research problem, as multiple weather degradations usually coexist in the real world alongside various lighting effects at night. This paper first explores the challenging multi-weather nighttime image restoration task, where various types of weather degradations are intertwined with flare effects. To support the research, we contribute the AllWeatherNight dataset, featuring large-scale nighttime images with diverse compositional degradations. By employing illumination-aware degradation generation, our dataset significantly enhances the realism of synthetic degradations in nighttime scenes, providing a more reliable benchmark for model training and evaluation. Additionally, we propose ClearNight, a unified nighttime image restoration framework, which effectively removes complex degradations in one go. Specifically, ClearNight extracts Retinex-based dual priors and explicitly guides the network to focus on uneven illumination regions and intrinsic texture contents respectively, thereby enhancing restoration effectiveness in nighttime scenarios. Moreover, to more effectively model the common and unique characteristics of multiple weather degradations, ClearNight performs weather-aware dynamic specificity and commonality collaboration that adaptively allocates optimal sub-networks associated with specific weather types. Comprehensive experiments on both synthetic and real-world images demonstrate the necessity of the AllWeatherNight dataset and the superior performance of ClearNight.

AAAI Conference 2026 Conference Paper

Inter-Client Dependency Recovery with Hidden Global Components for Federated Traffic Prediction

  • Hang Zhou
  • Wentao Yu
  • Yang Wei
  • Guangyu Li
  • Sha Xu
  • Chen Gong

Traffic prediction plays an important role in urban management. However, existing methods rely on centralized traffic data, which may raise privacy concerns. Federated traffic prediction offers a promising solution for clients (e.g., traffic management administrations) in different regions to collaboratively train models in a distributed manner without exposing private data. Nonetheless, data isolation inherently breaks the correlations between nodes (i.e., traffic sensors collecting data) from different regions, which leads to missing inter-client dependency. Consequently, current works either fail to capture the missing inter-client dependency or compromise data privacy to recover it. To address this issue, we propose a novel Federated method which recovers the inter-client dependency with HIdden global componeNTs (FedHINT). We find that the traffic data from different local regions actually contain hidden global components that reflect cross-regional traffic changes. Therefore, our FedHINT aims to extract hidden global components from each client to generate proxy nodes that represent global information, which are then utilized to recover the inter-client dependency. To be specific, we employ an attention module, guided by shared global queries to capture hidden global components from local traffic data, to generate proxy nodes. Subsequently, our FedHINT adaptively learns the correlations between proxy nodes and local nodes through a global encoder. During this process, the global information in proxy nodes compensates for the loss of information from cross-regional nodes, thereby recovering the missing inter-client dependency. Extensive experiments on multiple datasets demonstrate that our FedHINT significantly outperforms state-of-the-art methods, with an average decrease of 3.73 and 4.81 in MAE and RMSE, respectively.
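The proxy-node idea in this abstract can be illustrated with a minimal sketch: shared global queries attend over a client's local node features to produce a few summary vectors. All names here are hypothetical and this is not FedHINT's implementation, only a generic scaled dot-product attention under assumed shapes.

```python
# Hedged sketch (hypothetical names, not FedHINT's code): shared global
# queries attend over one client's local node features and emit a small
# set of proxy nodes that summarize the client for the global encoder.
import numpy as np

def proxy_nodes(local_feats, global_queries):
    """local_feats: (n_nodes, d); global_queries: (n_proxy, d).
    Returns (n_proxy, d) proxy nodes via scaled dot-product attention."""
    d = local_feats.shape[1]
    scores = global_queries @ local_feats.T / np.sqrt(d)  # (n_proxy, n_nodes)
    scores -= scores.max(axis=1, keepdims=True)           # numerically stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)               # rows sum to 1
    return attn @ local_feats                             # convex combos of local nodes

rng = np.random.default_rng(1)
feats = rng.standard_normal((8, 4))    # 8 local sensors, 4-dim features
queries = rng.standard_normal((2, 4))  # 2 shared global queries
proxies = proxy_nodes(feats, queries)
```

Because each proxy is a convex combination of local features, the proxies stay inside the range of the client's data while compressing it to a fixed, shareable size.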

AAAI Conference 2026 Conference Paper

Well Begun, Half Done: Reinforcement Learning with Prefix Optimization for LLM Reasoning

  • Yiliu Sun
  • Zicheng Zhao
  • Yang Wei
  • Yanfang Zhang
  • Chen Gong

Reinforcement Learning with Verifiable Rewards (RLVR) significantly enhances the reasoning capability of Large Language Models (LLMs). Current RLVR approaches typically conduct training across all generated tokens, but neglect to explore which tokens (e.g., prefix tokens) actually contribute to reasoning. This uniform training strategy spends substantial effort on optimizing low-return tokens, which in turn impedes the potential improvement from high-return tokens and reduces overall training effectiveness. To address this issue, we propose a novel RLVR approach called Progressive Prefix-token Policy Optimization (PPPO), which highlights the significance of the prefix segment of generated outputs. Specifically, inspired by the well-established human thinking theory of Path Dependence, where early-stage thoughts substantially constrain the subsequent thinking trajectory, we identify an analogous phenomenon in LLM reasoning termed the Beginning Lock-in Effect (BLE). PPPO leverages this finding by focusing its optimization objective on the prefix reasoning process of LLMs. This targeted optimization strategy can positively influence subsequent reasoning processes and ultimately improve final results. To improve the learning effectiveness of LLMs on how to start reasoning with high quality, PPPO introduces two training strategies: (a) Progressive Prefix Retention, which shapes a progressive learning process by increasing the proportion of retained prefix tokens during training; (b) Continuation Accumulated Reward, which mitigates reward bias by sampling multiple continuations for one prefix token sequence and accumulating their scores as the reward signal. Extensive experimental results on various reasoning tasks (e.g., math, physics, chemistry, and biology) demonstrate that our proposed PPPO outperforms representative RLVR methods, with accuracy improvements of 18.02% while using only 26.17% of the training tokens.
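The two training strategies named in the abstract can be sketched in a few lines. This is a toy illustration under assumed interfaces, not the PPPO implementation: `retained_prefix_len` grows the retained prefix fraction linearly over training, and `accumulated_reward` averages verifier scores over several sampled continuations of one prefix.

```python
# Hedged toy sketch of the two PPPO strategies (hypothetical functions,
# not the paper's code).
import random

def retained_prefix_len(seq_len, step, total_steps, start=0.25):
    """Progressive Prefix Retention: linearly grow the retained prefix
    fraction from `start` to 1.0 over the course of training."""
    frac = start + (1.0 - start) * min(step / total_steps, 1.0)
    return max(1, int(seq_len * frac))

def accumulated_reward(prefix, sample_continuation, score, n_samples=4):
    """Continuation Accumulated Reward: sample several continuations of
    one prefix and use their average score as the prefix's reward."""
    total = 0.0
    for _ in range(n_samples):
        continuation = sample_continuation(prefix)
        total += score(prefix + continuation)
    return total / n_samples

# Toy usage: a "model" that continues with one random digit, and a
# verifier that rewards sequences ending in an even digit.
random.seed(0)
sampler = lambda p: [random.randint(0, 9)]
verifier = lambda seq: 1.0 if seq[-1] % 2 == 0 else 0.0
r = accumulated_reward([1, 2, 3], sampler, verifier)
```

Averaging over several continuations reduces the variance of judging a prefix by a single (possibly lucky or unlucky) completion, which is the bias the abstract says this strategy mitigates.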

AAAI Conference 2025 Conference Paper

Boosting Consistency in Story Visualization with Rich-Contextual Conditional Diffusion Models

  • Fei Shen
  • Hu Ye
  • Sibo Liu
  • Jun Zhang
  • Cong Wang
  • Xiao Han
  • Yang Wei

Recent research showcases the considerable potential of conditional diffusion models for generating consistent stories. However, current methods, which primarily generate stories in a caption-dependent manner, often overlook the importance of contextual consistency and the relevance of frames during sequential generation. To address this, we propose novel Rich-contextual Conditional Diffusion Models (RCDMs), a two-stage approach designed to enhance the semantic and temporal consistency of story generation. Specifically, in the first stage, the frame-prior transformer diffusion model is presented to predict the frame semantic embedding of the unknown clip by aligning the semantic correlations between the captions and frames of the known clip. The second stage establishes a robust model with rich contextual conditions, including reference images of the known clip, the predicted frame semantic embedding of the unknown clip, and text embeddings of all captions. By jointly injecting these rich contextual conditions at the image and feature levels, RCDMs can generate semantically and temporally consistent stories. Moreover, RCDMs can generate consistent stories with a single forward inference, in contrast to autoregressive models. Our qualitative and quantitative results demonstrate that our proposed RCDMs outperform existing methods in challenging scenarios.

ICLR Conference 2025 Conference Paper

Diverse Policies Recovering via Pointwise Mutual Information Weighted Imitation Learning

  • Hanlin Yang
  • Jian Yao 0008
  • Weiming Liu 0004
  • Qing Wang
  • Hanmin Qin
  • Hansheng Kong
  • Kirk Tang
  • Jiechao Xiong

Recovering a spectrum of diverse policies from a set of expert trajectories is an important research topic in imitation learning. After determining a latent style for a trajectory, previous diverse-policy recovering methods usually employ a vanilla behavioral cloning learning objective conditioned on the latent style, treating each state-action pair in the trajectory with equal importance. Based on the observation that in many scenarios, behavioral styles are often highly relevant to only a subset of state-action pairs, this paper presents a new principled method for recovering diverse policies. In particular, after inferring or assigning a latent style for a trajectory, we enhance vanilla behavioral cloning by incorporating a weighting mechanism based on pointwise mutual information. This additional weighting reflects the significance of each state-action pair's contribution to learning the style, thus allowing our method to focus on the state-action pairs most representative of that style. We provide theoretical justifications for our new objective, and extensive empirical evaluations confirm the effectiveness of our method in recovering diverse policies from expert data.
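The weighting idea can be made concrete with a count-based estimate of pointwise mutual information between a state-action pair and its latent style. This is a toy sketch with hypothetical names, not the paper's estimator; in practice the probabilities would come from learned models rather than counts.

```python
# Hedged sketch: PMI(pair; style) = log( p(pair, style) / (p(pair) p(style)) ),
# estimated from empirical counts. Pairs characteristic of their style get
# positive weight; pairs common across styles get weight near or below zero.
import math
from collections import Counter

def pmi_weights(pairs, styles):
    """pairs[i] = (state, action); styles[i] = latent style label."""
    n = len(pairs)
    joint = Counter(zip(pairs, styles))
    pair_count = Counter(pairs)
    style_count = Counter(styles)
    return [
        math.log((joint[(x, z)] / n) / ((pair_count[x] / n) * (style_count[z] / n)))
        for x, z in zip(pairs, styles)
    ]

# Toy data: action "a" is characteristic of style 0, "b" of style 1;
# the last pair is an "a" occurring under style 1, so it is atypical.
pairs  = [("s", "a"), ("s", "a"), ("s", "b"), ("s", "b"), ("s", "a")]
styles = [0, 0, 1, 1, 1]
w = pmi_weights(pairs, styles)
```

Scaling each behavioral-cloning loss term by such a weight (optionally clipped at zero) focuses learning on the style-defining pairs, which is the mechanism the abstract describes.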

AAAI Conference 2025 Conference Paper

Power of Diversity: Enhancing Data-Free Black-Box Attack with Domain-Augmented Learning

  • Yang Wei
  • Jingyu Tan
  • Guowen Xu
  • Zhuoran Ma
  • Zhuo Ma
  • Bin Xiao

Substitute training-based data-free black-box attacks pose a significant threat to enterprise-deployed models. These attacks use a generator to synthesize data and query APIs, then train a substitute model to approximate the target model's decision boundary based on the returned results. However, existing attack methods often struggle to produce sufficiently diverse data, particularly for complex target models and extensive target data domains, severely limiting their practical application. To address this gap, we design domain-augmented learning to improve the quality of the synthetic data domain (SDD) generated by the generator from two perspectives. Specifically, (1) to broaden the SDD's coverage, we introduce textual semantic embeddings into the generator for the first time; (2) to enhance the SDD's discretization, we propose a competitive optimization strategy that forces the generator to self-compete, along with heterogeneity excitation to overcome the constraints of information entropy on diversity. Comprehensive experiments demonstrate that our method is more effective than prior attacks. In non-targeted attacks on the CIFAR-10 and Tiny-ImageNet datasets, our method outperforms the state-of-the-art by 14% and 7% in attack success rate, respectively.

TIST Journal 2025 Journal Article

Robust Learning under Hybrid Noise

  • Yang Wei
  • Shuo Chen
  • Shanshan Ye
  • Bo Han
  • Chen Gong

Feature noise and label noise are ubiquitous in practical scenarios, which pose great challenges for training a robust machine learning model. Most previous approaches deal with only a single problem of either feature noise or label noise. However, in real-world applications, hybrid noise, which contains both feature noise and label noise, is very common due to the unreliable data collection and annotation processes. Although some results have been achieved by a few representation learning based attempts, this issue is still far from being addressed with promising performance and guaranteed theoretical analyses. To address the challenge, we propose a novel unified learning framework called Feature and Label Recovery (FLR) to combat the hybrid noise from the perspective of data recovery, where we concurrently reconstruct both the feature matrix and the label matrix of input data. Specifically, the clean feature matrix is discovered by the low-rank approximation, and the ground-truth label matrix is embedded based on the recovered features with a nuclear norm regularization. Meanwhile, the feature noise and label noise are characterized by their respective adaptive matrix norms to satisfy the corresponding maximum likelihood. As this framework leads to a non-convex optimization problem, we develop the non-convex Alternating Direction Method of Multipliers (ADMM) with a convergence guarantee to solve our learning objective. We also provide the theoretical analysis to show that the generalization error of FLR can be upper-bounded in the presence of hybrid noise. Experimental results on several typical benchmark datasets clearly demonstrate the superiority of our proposed method over state-of-the-art robust learning approaches under various types of noise.
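A standard building block for the low-rank recovery with nuclear-norm regularization that this abstract describes is singular-value soft-thresholding, the proximal operator of the nuclear norm that appears as a sub-step in typical ADMM solvers. The sketch below shows that generic operator, not FLR's specific algorithm.

```python
# Hedged illustration (not the paper's code): singular-value thresholding,
# the proximal operator of the nuclear norm and the usual ADMM sub-step
# for low-rank matrix recovery.
import numpy as np

def svt(M, tau):
    """Soft-threshold the singular values of M by tau."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s = np.maximum(s - tau, 0.0)   # shrink, zeroing out small singular values
    return (U * s) @ Vt            # equivalent to U @ diag(s) @ Vt

# Toy usage: a rank-1 matrix plus small noise; thresholding suppresses the
# small noise singular values and keeps the dominant low-rank structure.
rng = np.random.default_rng(0)
L = np.outer(rng.standard_normal(6), rng.standard_normal(5))
noisy = L + 0.01 * rng.standard_normal((6, 5))
denoised = svt(noisy, 0.1)
```

Within an ADMM loop, this step recovers the clean feature matrix while the other sub-steps handle the noise terms and the dual updates.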

ICLR Conference 2024 Conference Paper

Advancing Pose-Guided Image Synthesis with Progressive Conditional Diffusion Models

  • Fei Shen
  • Hu Ye
  • Jun Zhang 0018
  • Cong Wang 0034
  • Xiao Han 0011
  • Yang Wei

Recent work has showcased the significant potential of diffusion models in pose-guided person image synthesis. However, owing to the inconsistency in pose between the source and target images, synthesizing an image with a distinct pose, relying exclusively on the source image and target pose information, remains a formidable challenge. This paper presents Progressive Conditional Diffusion Models (PCDMs) that incrementally bridge the gap between person images under the target and source poses through three stages. Specifically, in the first stage, we design a simple prior conditional diffusion model that predicts the global features of the target image by mining the global alignment relationship between pose coordinates and image appearance. Then, the second stage establishes a dense correspondence between the source and target images using the global features from the previous stage, and an inpainting conditional diffusion model is proposed to further align and enhance the contextual features, generating a coarse-grained person image. In the third stage, we propose a refining conditional diffusion model to utilize the coarsely generated image from the previous stage as a condition, achieving texture restoration and enhancing fine-detail consistency. The three-stage PCDMs work progressively to generate the final high-quality and high-fidelity synthesized image. Both qualitative and quantitative results demonstrate the consistency and photorealism of our proposed PCDMs under challenging scenarios. The code and model will be available at https://github.com/tencent-ailab/PCDMs.

ICLR Conference 2024 Conference Paper

VersVideo: Leveraging Enhanced Temporal Diffusion Models for Versatile Video Generation

  • Jinxi Xiang
  • Ricong Huang
  • Jun Zhang 0018
  • Guanbin Li
  • Xiao Han 0011
  • Yang Wei

Creating stable, controllable videos is a complex task due to the need for significant variation in temporal dynamics and cross-frame temporal consistency. To address this, we enhance the spatial-temporal capability and introduce a versatile video generation model, VersVideo, which leverages textual, visual, and stylistic conditions. Current video diffusion models typically extend image diffusion architectures by supplementing 2D operations (such as convolutions and attentions) with temporal operations. While this approach is efficient, it often restricts spatial-temporal performance due to the oversimplification of standard 3D operations. To counter this, we incorporate two key elements: (1) multi-excitation paths for spatial-temporal convolutions with dimension pooling across different axes, and (2) multi-expert spatial-temporal attention blocks. These enhancements boost the model's spatial-temporal performance without significantly escalating training and inference costs. We also tackle the issue of information loss that arises when a variational autoencoder is used to transform pixel space into latent features and then back into pixel frames. To mitigate this, we incorporate temporal modules into the decoder to maintain inter-frame consistency. Lastly, by utilizing the innovative denoising UNet and decoder, we develop a unified ControlNet model suitable for various conditions, including image, Canny, HED, depth, and style. Examples of the videos generated by our model can be found at https://jinxixiang.github.io/versvideo/.

ICML Conference 2023 Conference Paper

Future-conditioned Unsupervised Pretraining for Decision Transformer

  • Zhihui Xie 0002
  • Zichuan Lin
  • Deheng Ye
  • Qiang Fu 0016
  • Yang Wei
  • Shuai Li 0010

Recent research in offline reinforcement learning (RL) has demonstrated that return-conditioned supervised learning is a powerful paradigm for decision-making problems. While promising, return conditioning is limited to training data labeled with rewards and therefore faces challenges in learning from unsupervised data. In this work, we aim to utilize generalized future conditioning to enable efficient unsupervised pretraining from reward-free and sub-optimal offline data. We propose Pretrained Decision Transformer (PDT), a conceptually simple approach for unsupervised RL pretraining. PDT leverages future trajectory information as a privileged context to predict actions during training. The ability to make decisions based on both present and future factors enhances PDT's capability for generalization. Besides, this feature can be easily incorporated into a return-conditioned framework for online finetuning, by assigning return values to possible futures and sampling future embeddings based on their respective values. Empirically, PDT outperforms or performs on par with its supervised pretraining counterpart, especially when dealing with sub-optimal data. Further analysis reveals that PDT can extract diverse behaviors from offline data and controllably sample high-return behaviors by online finetuning. Code is available here.

TMLR Journal 2023 Journal Article

RLTF: Reinforcement Learning from Unit Test Feedback

  • Jiate Liu
  • Yiqin Zhu
  • Kaiwen Xiao
  • Qiang Fu
  • Xiao Han
  • Yang Wei
  • Deheng Ye

The goal of program synthesis, or code generation, is to generate executable code based on given descriptions. Recently, there has been an increasing number of studies employing reinforcement learning (RL) to improve the performance of large language models (LLMs) for code. However, some of the current representative RL methods have only used offline frameworks, limiting the exploration of new sample spaces. Additionally, the utilization of unit test signals is limited, not accounting for specific error locations within the code. To address these issues, we propose RLTF (Reinforcement Learning from Unit Test Feedback), a novel online RL framework with multi-granularity unit test feedback for refining code LLMs. Our approach generates data in real-time during training and simultaneously utilizes fine-grained feedback signals to guide the model towards producing higher-quality code. Extensive experiments show that RLTF achieves state-of-the-art performance on the APPS and MBPP benchmarks. Our code is available at: https://github.com/Zyq-scut/RLTF.

IJCAI Conference 2019 Conference Paper

Revealing Semantic Structures of Texts: Multi-grained Framework for Automatic Mind-map Generation

  • Yang Wei
  • Honglei Guo
  • Jinmao Wei
  • Zhong Su

A mind-map is a diagram used to represent ideas linked to and arranged around a central concept. It is easier to visually access the knowledge and ideas in a text by converting it to a mind-map. However, highlighting the semantic skeleton of an article remains a challenge. The key issue is to detect the relations amongst concepts beyond the intra-sentence level. In this paper, we propose a multi-grained framework for automatic mind-map generation. First, a novel neural network is employed to detect the relations, using multi-hop self-attention and a gated recurrence network to reveal the directed semantic relations across sentences. A recursive algorithm is then designed to select the most salient sentences to constitute the hierarchy. The human-like mind-map is automatically constructed with the key phrases in the salient sentences. Promising results have been achieved in comparison with manual mind-maps. The case studies demonstrate that the generated mind-maps reveal the underlying semantic structures of the articles.

IJCAI Conference 2018 Conference Paper

Adversarial Metric Learning

  • Shuo Chen
  • Chen Gong
  • Jian Yang
  • Xiang Li
  • Yang Wei
  • Jun Li

In the past decades, intensive efforts have been put into designing various loss functions and metric forms for the metric learning problem. These improvements have shown promising results when the test data is similar to the training data. However, the trained models often fail to produce reliable distances on ambiguous test pairs due to the different samplings of the training and test sets. To address this problem, Adversarial Metric Learning (AML) is proposed in this paper, which automatically generates adversarial pairs to remedy the sampling bias and facilitate robust metric learning. Specifically, AML consists of two adversarial stages, i.e., confusion and distinguishment. In the confusion stage, ambiguous but critical adversarial data pairs are adaptively generated to mislead the learned metric. In the distinguishment stage, a metric is exhaustively learned to distinguish both adversarial pairs and original training pairs. Thanks to the challenges posed by the confusion stage in this competing process, the AML model is able to grasp plentiful difficult knowledge not contained in the original training pairs, so the discriminability of AML can be significantly improved. The entire model is formulated into an optimization framework, of which the global convergence is theoretically proved. The experimental results on toy data and practical datasets clearly demonstrate the superiority of AML over representative state-of-the-art metric learning models.