Arrow Research search

Author name cluster

Wei Yang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

64 papers
2 author rows

Possible papers (64)

AAAI Conference 2026 Conference Paper

Learning to Deliberate: Meta-policy Collaboration for Agentic LLMs with Multi-agent Reinforcement Learning

  • Wei Yang
  • Jesse Thomason

Multi-agent systems of large language models (LLMs) show promise for complex reasoning, but their effectiveness is often limited by fixed collaboration protocols. These frameworks typically focus on macro-level orchestration while overlooking agents’ internal deliberative capabilities. This critical meta-cognitive blind spot treats agents as passive executors unable to adapt their strategy based on internal cognitive states like uncertainty or confidence. We introduce the Meta-Policy Deliberation Framework (MPDF), where agents learn a decentralized policy over a set of high-level meta-cognitive actions: Persist, Refine, and Concede. To overcome the instability of traditional policy gradients in this setting, we develop SoftRankPO, a novel reinforcement learning algorithm. SoftRankPO stabilizes training by shaping advantages based on the rank of rewards mapped through smooth normal quantiles, making the learning process robust to reward variance. Experiments show that MPDF with SoftRankPO achieves a 4-5% absolute gain in average accuracy across six mathematical and general reasoning benchmarks compared to state-of-the-art heuristic and learning-based multi-agent reasoning algorithms. Our work presents a paradigm for learning adaptive, meta-cognitive policies for multi-agent LLM systems, shifting the focus from designing fixed protocols to learning dynamic, deliberative strategies.
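
The rank-shaping step described in the abstract has a compact form. Below is a minimal sketch of the idea, assuming a mid-rank quantile convention; the function name and details are illustrative readings of the description, not the paper's code.

```python
import numpy as np
from scipy.stats import norm

def softrank_advantages(rewards):
    """Sketch: shape advantages from the RANK of rewards, not their raw values.
    Ranks are mapped to mid-rank quantiles in (0, 1), then through the standard
    normal inverse CDF, so the result is insensitive to reward scale/variance."""
    r = np.asarray(rewards, dtype=float)
    ranks = r.argsort().argsort()          # 0 = worst reward, n-1 = best
    quantiles = (ranks + 0.5) / len(r)     # smooth mid-rank quantiles
    return norm.ppf(quantiles)             # normal scores used as advantages

print(softrank_advantages([0.1, 5.0, 2.0, -3.0]))  # ordering, not scale, matters
```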

TMLR Journal 2026 Journal Article

TS-Reasoner: Domain-Oriented Time Series Inference Agents for Reasoning and Automated Analysis

  • Wen Ye
  • Wei Yang
  • Defu Cao
  • Yizhou Zhang
  • Lumingyuan Tang
  • Jie Cai
  • Yan Liu

Time series analysis is crucial in real-world applications, yet traditional methods focus on isolated tasks only, and recent studies on time series reasoning remain limited either to single-step inference or to natural language answers. In this work, we introduce TS-Reasoner, a domain-specialized agent designed for multi-step time series inference. By integrating large language model (LLM) reasoning with domain-specific computational tools and an error feedback loop, TS-Reasoner enables domain-informed, constraint-aware analytical workflows that combine symbolic reasoning with precise numerical analysis. We assess the system’s capabilities along two axes: 1) fundamental time series understanding, assessed by TimeSeriesExam, and 2) complex, multi-step inference, evaluated by a newly proposed dataset designed to test both compositional reasoning and computational precision in time series analysis. Experiments show that our approach outperforms standalone general-purpose LLMs in both basic time series concept understanding and the multi-step time series inference task, highlighting the promise of domain-specialized agents for automating real-world time series reasoning and analysis.
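
The plan-execute-feedback loop the abstract describes can be illustrated as follows. This is a hypothetical sketch of such an agent loop; the `llm.plan` interface and the tool dictionary are assumed names for illustration, not TS-Reasoner's actual API.

```python
def run_time_series_agent(question, series, llm, tools, max_steps=8):
    """Hypothetical multi-step loop: an LLM plans tool calls over the series,
    numerical tools execute them, and failures are fed back for revision."""
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        step = llm.plan(history, tool_names=list(tools))   # next action (assumed API)
        if step["tool"] == "final_answer":
            return step["args"]["answer"]
        try:
            result = tools[step["tool"]](series, **step["args"])  # precise numeric call
            history.append(f"{step['tool']}({step['args']}) -> {result}")
        except Exception as err:   # error feedback loop: the model sees the failure
            history.append(f"{step['tool']} failed: {err}; please revise the plan")
    return None
```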

AAAI Conference 2025 Conference Paper

AAKR: Adversarial Attack-based Knowledge Retention for Continual Semantic Segmentation

  • Zhidong Yu
  • Xiaoman Liu
  • Jiajun Hu
  • Zhenbo Shi
  • Wei Yang

In the context of Continual Semantic Segmentation (CSS), replay-based methods tend to achieve better performance than knowledge distillation-based ones, as the former utilizes additional data to transfer old knowledge. However, this advantage comes at the cost of additional space for storing the generative model and extra time for continual training. To address this predicament, we propose a novel CSS framework, namely Adversarial Attack-based Knowledge Retention (AAKR). The AAKR framework generates specific adversarial samples by perturbing images, and uses them to retain old knowledge. Specifically, we leverage adversarial attacks to generate adversarial images for incremental samples. By imposing additional constraints within these attacks, we enhance the transfer of old knowledge, thereby reinforcing the understanding of previously learned information. Furthermore, we design an attack probability module that adjusts adversarial attack directions based on training feedback. This module effectively encourages the new model to learn old knowledge from poorly protected classes, significantly improving knowledge transfer effectiveness. Our comprehensive experiments demonstrate the efficacy of AAKR, and showcase that AAKR surpasses state-of-the-art competitors on benchmark datasets.

IROS Conference 2025 Conference Paper

Energy-Efficient Omnidirectional Locomotion for Wheeled Quadrupeds via Predictive Energy-Aware Nominal Gait Selection

  • Xu Yang 0044
  • Wei Yang
  • Kaibo He
  • Bo Yang 0064
  • Yanan Sui
  • Yilin Mo

Wheeled-legged robots combine the efficiency of wheels with the versatility of legs, but face significant energy optimization challenges when navigating diverse environments. In this work, we present a hierarchical control framework that integrates predictive power modeling with residual reinforcement learning to optimize omnidirectional locomotion efficiency for wheeled quadrupedal robots. Our approach employs a novel power prediction network that forecasts energy consumption across different gait patterns over a 1-second horizon, enabling intelligent selection of the most energy-efficient nominal gait. A reinforcement learning policy then generates residual adjustments to this nominal gait, fine-tuning the robot’s actions to balance energy efficiency with performance objectives. Comparative analysis shows our method reduces energy consumption by up to 35% compared to fixed-gait approaches while maintaining comparable velocity tracking performance. We validate our framework through extensive simulations and real-world experiments on a modified Unitree Go1 platform, demonstrating robust performance even under external disturbances. Videos and implementation details are available at https://sites.google.com/view/switching-wpg.
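
The nominal gait selection step reduces to scoring each candidate gait with the power predictor and taking the cheapest. A minimal sketch, where `power_net` and the per-gait encodings are stand-ins for the paper's power prediction network and gait library:

```python
import torch

@torch.no_grad()
def select_nominal_gait(power_net, state, velocity_cmd, gait_encodings):
    """Sketch: forecast energy over a 1 s horizon for each candidate gait
    and return the most energy-efficient one (names are assumptions)."""
    best_gait, best_cost = None, float("inf")
    for name, enc in gait_encodings.items():
        x = torch.cat([state, velocity_cmd, enc])   # features for the predictor
        cost = power_net(x).item()                  # predicted energy consumption
        if cost < best_cost:
            best_gait, best_cost = name, cost
    return best_gait   # an RL policy then adds residual adjustments to this gait
```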

JBHI Journal 2025 Journal Article

Frozen Large-Scale Pretrained Vision-Language Models are the Effective Foundational Backbone for Multimodal Breast Cancer Prediction

  • Hung Q. Vo
  • Lin Wang
  • Kelvin K. Wong
  • Chika F. Ezeana
  • Xiaohui Yu
  • Wei Yang
  • Jenny Chang
  • Hien V. Nguyen

Breast cancer is a pervasive global health concern among women. Leveraging multimodal data from enterprise patient databases—including Picture Archiving and Communication Systems (PACS) and Electronic Health Records (EHRs)—holds promise for improving prediction. This study introduces a multimodal deep-learning model leveraging mammogram datasets to evaluate breast cancer prediction. Our approach integrates frozen large-scale pretrained vision-language models, showcasing superior performance and stability compared to traditional image-tabular models across two public breast cancer datasets. The model consistently outperforms conventional full fine-tuning methods by using frozen pretrained vision-language models alongside a lightweight trainable classifier. The observed improvements are significant. In the CBIS-DDSM dataset, the Area Under the Curve (AUC) increases from 0.867 to 0.902 during validation and from 0.803 to 0.830 for the official test set. Within the EMBED dataset, AUC improves from 0.780 to 0.805 during validation. In scenarios with limited data, using Breast Imaging-Reporting and Data System category three (BI-RADS 3) cases, AUC improves from 0.91 to 0.96 on the official CBIS-DDSM test set and from 0.79 to 0.83 on a challenging validation set. This study underscores the benefits of vision-language models in jointly training diverse image-clinical datasets from multiple healthcare institutions, effectively addressing challenges related to non-aligned tabular features. Combining training data enhances breast cancer prediction on the EMBED dataset, outperforming all other experiments. In summary, our research emphasizes the efficacy of frozen large-scale pretrained vision-language models in multimodal breast cancer prediction, offering superior performance and stability over conventional methods, reinforcing their potential for breast cancer prediction.
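
The frozen-backbone design is simple to express. A minimal PyTorch sketch, assuming a generic pretrained image encoder; the head architecture here is illustrative, not the paper's exact classifier:

```python
import torch.nn as nn

class FrozenVLMClassifier(nn.Module):
    """Sketch: a frozen pretrained vision-language image encoder feeding a
    lightweight trainable classification head (only the head is updated)."""
    def __init__(self, vlm_image_encoder, embed_dim, n_classes=2):
        super().__init__()
        self.encoder = vlm_image_encoder
        for p in self.encoder.parameters():   # freeze the large backbone
            p.requires_grad = False
        self.head = nn.Sequential(            # small trainable classifier
            nn.Linear(embed_dim, 256), nn.ReLU(), nn.Linear(256, n_classes))

    def forward(self, images):
        feats = self.encoder(images)          # frozen features, no fine-tuning
        return self.head(feats)
```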

NeurIPS Conference 2025 Conference Paper

Leaving No OOD Instance Behind: Instance-Level OOD Fine-Tuning for Anomaly Segmentation

  • Yuxuan Zhang
  • Zhenbo Shi
  • han ye
  • Shuchang Wang
  • Zhidong Yu
  • Shaowei Wang
  • Wei Yang

Out-of-distribution (OOD) fine-tuning has emerged as a promising approach for anomaly segmentation. Current OOD fine-tuning strategies typically employ global-level objectives, aiming to guide segmentation models to accurately predict a large number of anomaly pixels. However, these strategies often perform poorly on small anomalies. To address this issue, we propose an instance-level OOD fine-tuning framework, dubbed LNOIB (Leaving No OOD Instance Behind). We start by theoretically analyzing why global-level objectives fail to segment small anomalies. Building on this analysis, we introduce a simple yet effective instance-level objective. Moreover, we propose a feature separation objective to explicitly constrain the representations of anomalies, which are prone to be smoothed by their in-distribution (ID) surroundings. LNOIB integrates these objectives to enhance the segmentation of small anomalies and serves as a paradigm adaptable to existing OOD fine-tuning strategies, without introducing additional inference cost. Experimental results show that integrating LNOIB into various OOD fine-tuning strategies yields significant improvements, particularly in component-level results, highlighting its strength in comprehensive anomaly segmentation.
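
An instance-level objective of the kind the abstract describes can be sketched by averaging a loss per OOD connected component, so small anomalies weigh as much as large ones. The loss form below is an assumption for illustration, not LNOIB's exact objective:

```python
import torch
from scipy import ndimage

def instance_level_ood_loss(anomaly_scores, ood_mask):
    """Sketch: split the OOD mask into connected components (instances) and
    average a per-instance loss, so a tiny anomaly counts as much as a large
    one. anomaly_scores: (H, W) in [0, 1]; ood_mask: (H, W) boolean."""
    labels, n = ndimage.label(ood_mask.cpu().numpy())   # instance decomposition
    losses = []
    for k in range(1, n + 1):
        inst = torch.from_numpy(labels == k).to(anomaly_scores.device)
        s = anomaly_scores[inst]
        losses.append(-torch.log(s.clamp_min(1e-6)).mean())  # raise scores per instance
    return torch.stack(losses).mean() if losses else anomaly_scores.sum() * 0.0
```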

IJCAI Conference 2025 Conference Paper

Optimized View and Geometry Distillation from Multi-view Diffuser

  • Youjia Zhang
  • Zikai Song
  • Junqing Yu
  • Yawei Luo
  • Wei Yang

Generating multi-view images from a single input view using image-conditioned diffusion models is a recent advancement and has shown considerable potential. However, issues such as the lack of consistency in synthesized views and over-smoothing in extracted geometry persist. Previous methods integrate multi-view consistency modules or impose additional supervision to enhance view consistency while compromising on the flexibility of camera positioning and limiting the versatility of view synthesis. In this study, we consider the radiance field optimized during geometry extraction as a more rigid consistency prior, compared to volume and ray aggregation used in previous works. We further identify and rectify a critical bias in the traditional radiance field optimization process through score distillation from a multi-view diffuser. We introduce an Unbiased Score Distillation (USD) that utilizes unconditioned noises from a 2D diffusion model, greatly refining the radiance field fidelity. We leverage the rendered views from the optimized radiance field as the basis and develop a two-step specialization process of a 2D diffusion model, which is adept at conducting object-specific denoising and generating high-quality multi-view images. Finally, we recover faithful geometry and texture directly from the refined multi-view images. Empirical evaluations demonstrate that our optimized geometry and view distillation technique generates comparable results to the state-of-the-art models trained on extensive datasets, all while maintaining freedom in camera positioning. Source code of our work is publicly available at: https://youjiazhang.github.io/USD/.

AAAI Conference 2025 Conference Paper

RP-PGD: Boosting Segmentation Robustness with a Region-and-Prototype Based Adversarial Attack

  • Yuxuan Zhang
  • Zhenbo Shi
  • Shuchang Wang
  • Wei Yang
  • Shaowei Wang
  • Yinxing Xue

Adversarial attack and defense have been extensively explored in classification tasks, but their study in semantic segmentation remains limited. Moreover, current attacks fail to act as strong underlying attacks for adversarial training (AT), making it difficult to achieve segmentation robustness against strong attacks. In this paper, we present RP-PGD, a novel Region-and-Prototype based Projected Gradient Descent attack tailored to fool segmentation models. In particular, we propose a region-based attack, which leverages a spatial-temporal scheme to separate the pixels into three disjoint regions, and highlights the attack on the crucial True Region and Boundary Region. Moreover, we introduce a prototype-based attack to disrupt the feature space, further enhancing the attack capability. To boost the robustness of segmentation models, we inject adversaries generated by RP-PGD into the clean data and perform AT. Extensive experiments on multiple datasets showcase that RP-PGD generates adversaries with faster convergence and stronger attack effectiveness, surpassing state-of-the-art attacks by a large margin. Consequently, RP-PGD serves as a strong underlying attack for segmentation models to perform AT, assisting them in defending against a variety of strong attacks without incurring additional computational costs during inference.

AAAI Conference 2025 Conference Paper

Stop Diverse OOD Attacks: Knowledge Ensemble for Reliable Defense

  • Zhenbo Shi
  • Xiaoman Liu
  • Yuxuan Zhang
  • Shuchang Wang
  • Rui Shu
  • Zhidong Yu
  • Wei Yang
  • Liusheng Huang

Enhancing defense through model ensemble is an emerging trend, where the challenge lies in how to use ensemble knowledge to counter Out-of-Distribution (OOD) attacks. In this paper, we propose the Reliable Defense Ensemble (REE) to address this issue. REE optimizes the ensemble knowledge of models through aggregation and enhances multidimensional robust performance through collaboration. It employs Dynamic Synergy Amplification for weight allocation and strategy adjustment. Furthermore, we design a new Kernel Anomaly Smoothing Detection Module, which detects anomalous attacks using a smoothing feature function based on Gaussian kernel mean embedding and a multi-layer feedback structure. In particular, we build a framework that uses reinforcement learning to iteratively fine-tune the parameters of inter-model communication and consensus. Extensive experimental results show that REE outperforms current state-of-the-art methods by a large margin in defending against OOD attacks.

NeurIPS Conference 2025 Conference Paper

Structured Spectral Reasoning for Frequency-Adaptive Multimodal Recommendation

  • Wei Yang
  • Rui Zhong
  • Yiqun Chen
  • Chi Lu
  • Peng Jiang

Multimodal recommendation aims to integrate collaborative signals with heterogeneous content such as visual and textual information, but remains challenged by modality-specific noise, semantic inconsistency, and unstable propagation over user–item graphs. These issues are often exacerbated by naive fusion or shallow modeling strategies, leading to degraded generalization and poor robustness. While recent work has explored the frequency domain as a lens to separate stable from noisy signals, most methods rely on static filtering or reweighting, lacking the ability to reason over spectral structure or adapt to modality-specific reliability. To address these challenges, we propose a Structured Spectral Reasoning (SSR) framework for frequency-aware multimodal recommendation. Our method follows a four-stage pipeline: (i) Decompose graph-based multimodal signals into spectral bands via graph-guided transformations to isolate semantic granularity; (ii) Modulate band-level reliability with spectral band masking, a training-time masking with representation-consistency objective that suppresses brittle frequency components; (iii) Fuse complementary frequency cues using hyperspectral reasoning with low-rank cross-band interaction; and (iv) Align modality-specific spectral features via contrastive regularization to promote semantic and structural consistency. Experiments on three real-world benchmarks show consistent gains over strong baselines, particularly under sparse and cold-start settings. Additional analyses indicate that structured spectral modeling improves robustness and provides clearer diagnostics of how different bands contribute to performance.
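
Stage (i), decomposing graph-based signals into spectral bands, can be illustrated with the normalized graph Laplacian. The band thresholds and function below are assumptions for a small dense graph, not the paper's implementation:

```python
import numpy as np

def spectral_bands(adjacency, signal, band_edges=(0.5, 1.2)):
    """Sketch: project node features onto low/mid/high graph-frequency bands
    of the normalized Laplacian (thresholds illustrative).
    adjacency: (n, n) symmetric; signal: (n,) or (n, d) node features."""
    d = adjacency.sum(1)
    d_inv = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L = np.eye(len(d)) - d_inv @ adjacency @ d_inv   # normalized Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)             # graph frequencies
    coeffs = eigvecs.T @ signal                      # graph Fourier transform
    bands = []
    for lo, hi in zip((-np.inf,) + band_edges, band_edges + (np.inf,)):
        sel = (eigvals > lo) & (eigvals <= hi)
        bands.append(eigvecs[:, sel] @ coeffs[sel])  # band-limited component
    return bands                                     # [low, mid, high]
```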

AAAI Conference 2025 Conference Paper

Temporal Coherent Object Flow for Multi-Object Tracking

  • Zikai Song
  • Run Luo
  • Lintao Ma
  • Ying Tang
  • Yi-Ping Phoebe Chen
  • Junqing Yu
  • Wei Yang

Multi-object tracking is a challenging vision task that requires simultaneous reasoning about object detection and object association. Conventional solutions use the frame as the basic unit and typically rely on a motion predictor that exploits the appearance features to associate detected candidates, leading to insufficient adaptability to long-term associations. In this study, we propose a section-based multi-object tracking approach that integrates a temporal coherent Object Flow Tracker (OFTrack), capable of achieving simultaneous multi-frame tracking by treating multiple consecutive frames as the basic processing unit, denoted as a “section”. Our OFTrack boosts the optical flow to the object flow by employing object perception and section-based motion estimation strategies. Object perception adopts object-aware sampling and scale-aware correlation to enable precise target discrimination. Motion estimation models the correlation of different objects across multiple frames via specialized temporal-spatial attention to achieve robust association in very long videos. Additionally, to address the oscillation of unpredictable trajectories in multi-frame estimation, we design temporal coherence enhancements, including trajectory masking pre-training and a smoothing constraint on trajectory curves. Comprehensive experiments on several widely used benchmarks demonstrate the superior performance of our approach.

AAAI Conference 2025 Conference Paper

Video Anomaly Detection with Motion and Appearance Guided Patch Diffusion Model

  • Hang Zhou
  • Jiale Cai
  • Yuteng Ye
  • Yonghui Feng
  • Chenxing Gao
  • Junqing Yu
  • Zikai Song
  • Wei Yang

A recent endeavor in one class of video anomaly detection is to leverage diffusion models and posit the task as a generation problem, where the diffusion model is trained to recover normal patterns exclusively, thus reporting abnormal patterns as outliers. Yet, existing attempts neglect the various forms of anomalies and predict normal samples at the feature level, despite the fact that abnormal objects in surveillance videos are often relatively small. To address this, a novel patch-based diffusion model is proposed, specifically engineered to capture fine-grained local information. We further observe that anomalies in videos manifest themselves as deviations in both appearance and motion. Therefore, we argue that a comprehensive solution must consider both of these aspects simultaneously to achieve accurate frame prediction. To address this, we introduce innovative motion and appearance conditions that are seamlessly integrated into our patch diffusion model. These conditions are designed to guide the model in generating coherent and contextually appropriate predictions for both semantic content and motion relations. Experimental results on four challenging video anomaly detection datasets empirically substantiate the efficacy of our proposed approach, demonstrating that it consistently outperforms most existing methods in detecting abnormal behaviors.

AAAI Conference 2024 Conference Paper

AMD: Anatomical Motion Diffusion with Interpretable Motion Decomposition and Fusion

  • Beibei Jing
  • Youjia Zhang
  • Zikai Song
  • Junqing Yu
  • Wei Yang

Generating realistic human motion sequences from text descriptions is a challenging task that requires capturing the rich expressiveness of both natural language and human motion. Recent advances in diffusion models have enabled significant progress in human motion synthesis. However, existing methods struggle to handle text inputs that describe complex or long motions. In this paper, we propose the Adaptable Motion Diffusion (AMD) model, which leverages a Large Language Model (LLM) to parse the input text into a sequence of concise and interpretable anatomical scripts that correspond to the target motion. This process exploits the LLM’s ability to provide anatomical guidance for complex motion synthesis. We then devise a two-branch fusion scheme that balances the influence of the input text and the anatomical scripts on the inverse diffusion process, which adaptively ensures the semantic fidelity and diversity of the synthesized motion. Our method can effectively handle texts with complex or long motion descriptions, where existing methods often fail. Experiments on datasets with relatively more complex motions, such as CLCD1 and CLCD2, demonstrate that our AMD significantly outperforms existing state-of-the-art models.

AAAI Conference 2024 Conference Paper

Attacking Transformers with Feature Diversity Adversarial Perturbation

  • Chenxing Gao
  • Hang Zhou
  • Junqing Yu
  • Yuteng Ye
  • Jiale Cai
  • Junle Wang
  • Wei Yang

Understanding the mechanisms behind Vision Transformer (ViT), particularly its vulnerability to adversarial perturbations, is crucial for addressing challenges in its real-world applications. Existing ViT adversarial attackers rely on labels to calculate the gradient for perturbation, and exhibit low transferability to other structures and tasks. In this paper, we present a label-free white-box attack approach for ViT-based models that exhibits strong transferability to various black-box models, including most ViT variants, CNNs, and MLPs, even for models developed for other modalities. Our inspiration comes from the feature collapse phenomenon in ViTs, where the critical attention mechanism overly depends on the low-frequency component of features, causing the features in middle-to-end layers to become increasingly similar and eventually collapse. We propose the feature diversity attacker to naturally accelerate this process and achieve remarkable performance and transferability.
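
The attack direction suggested by this observation can be sketched as gradient descent on a token-diversity measure of mid-layer features, with no labels involved. The diversity loss and step sizes below are assumptions for illustration, not the paper's exact objective:

```python
import torch

def feature_diversity_attack(model_feats, x, eps=8/255, steps=10, alpha=2/255):
    """Sketch: perturb the input to REDUCE the diversity (token-wise variance)
    of mid-layer ViT features, accelerating the feature-collapse described
    above. `model_feats` maps an image batch to (tokens, dim) features."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        feats = model_feats(x + delta)           # mid-layer patch tokens
        diversity = feats.var(dim=0).mean()      # label-free diversity measure
        diversity.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()   # descend: collapse features
            delta.clamp_(-eps, eps)              # stay within the L-inf budget
            delta.grad = None
    return (x + delta).detach()
```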

AAAI Conference 2024 Conference Paper

Attacks on Continual Semantic Segmentation by Perturbing Incremental Samples

  • Zhidong Yu
  • Wei Yang
  • Xike Xie
  • Zhenbo Shi

As an essential computer vision task, Continual Semantic Segmentation (CSS) has received a lot of attention. However, security issues regarding this task have not been fully studied. To bridge this gap, we study the problem of attacks in CSS in this paper. We first propose a new task, namely, attacks on incremental samples in CSS, and reveal that the attacks on incremental samples corrupt the performance of CSS in both old and new classes. Moreover, we present an adversarial sample generation method based on class shift, namely Class Shift Attack (CS-Attack), which is an offline and easy-to-implement approach for CSS. CS-Attack is able to significantly degrade the performance of models on both old and new classes without knowledge of the incremental learning approach, which undermines the original purpose of the incremental learning, i.e., learning new classes while retaining old knowledge. Experiments show that on the popular datasets Pascal VOC, ADE20k, and Cityscapes, our approach easily degrades the performance of currently popular CSS methods, which reveals the importance of security in CSS.

NeurIPS Conference 2024 Conference Paper

Coupled Mamba: Enhanced Multimodal Fusion with Coupled State Space Model

  • Wenbing Li
  • Hang Zhou
  • Junqing Yu
  • Zikai Song
  • Wei Yang

The essence of multi-modal fusion lies in exploiting the complementary information inherent in diverse modalities. However, most prevalent fusion methods rely on traditional neural architectures and are inadequately equipped to capture the dynamics of interactions across modalities, particularly in the presence of complex intra- and inter-modality correlations. Recent advancements in State Space Models (SSMs), notably exemplified by the Mamba model, have emerged as promising contenders. In particular, its state evolving process implies a stronger modality fusion paradigm, making multi-modal fusion on SSMs an appealing direction. However, fusing multiple modalities is challenging for SSMs due to their hardware-aware parallelism designs. To this end, this paper proposes the Coupled SSM model for coupling state chains of multiple modalities while maintaining the independence of intra-modality state processes. Specifically, in our coupled scheme, we devise an inter-modal hidden states transition scheme, in which the current state is dependent on the states of its own chain and that of the neighbouring chains at the previous time-step. To fully comply with the hardware-aware parallelism, we obtain the global convolution kernel by deriving the state equation while introducing the historical state. Extensive experiments on CMU-MOSEI, CH-SIMS, and CH-SIMSV2 with multi-domain input verify the effectiveness of our model compared to current state-of-the-art methods, improving F1-score by 0.4%, 0.9%, and 2.3% on the three datasets respectively, with 49% faster inference and 83.7% GPU memory savings. The results demonstrate that the Coupled Mamba model is capable of enhanced multi-modal fusion.
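
The inter-modal state transition can be sketched in a few lines: each modality's state update mixes its own previous state with the neighbouring chains' states. The recurrent form below is illustrative; the paper derives a global convolution kernel to retain hardware-aware parallelism, which this sketch omits:

```python
import torch

def coupled_ssm_step(h_prev, x_t, A, B, C_couple):
    """One recurrent step of a coupled state space model (sketch).
    h_prev: (M, d) hidden states for M modality chains.
    x_t:    (M, p) current inputs per modality.
    A, B, C_couple: per-modality transition, input, and coupling matrices."""
    M = h_prev.shape[0]
    h_new = torch.empty_like(h_prev)
    for m in range(M):
        others = h_prev[[i for i in range(M) if i != m]].mean(dim=0)  # neighbours
        h_new[m] = A[m] @ h_prev[m] + C_couple[m] @ others + B[m] @ x_t[m]
    return h_new   # own-chain dynamics plus cross-chain coupling
```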

AAAI Conference 2024 Conference Paper

DiffusionTrack: Diffusion Model for Multi-Object Tracking

  • Run Luo
  • Zikai Song
  • Lintao Ma
  • Jinlin Wei
  • Wei Yang
  • Min Yang

Multi-object tracking (MOT) is a challenging vision task that aims to detect individual objects within a single frame and associate them across multiple frames. Recent MOT approaches can be categorized into two-stage tracking-by-detection (TBD) methods and one-stage joint detection and tracking (JDT) methods. Despite the success of these approaches, they also suffer from common problems, such as harmful global or local inconsistency, poor trade-off between robustness and model complexity, and lack of flexibility in different scenes within the same video. In this paper we propose a simple but robust framework that formulates object detection and association jointly as a consistent denoising diffusion process from paired noise boxes to paired ground-truth boxes. This novel progressive denoising diffusion strategy substantially augments the tracker's effectiveness, enabling it to discriminate between various objects. During the training stage, paired object boxes diffuse from paired ground-truth boxes to random distribution, and the model learns detection and tracking simultaneously by reversing this noising process. In inference, the model refines a set of paired randomly generated boxes to the detection and tracking results in a flexible one-step or multi-step denoising diffusion process. Extensive experiments on three widely used MOT benchmarks, including MOT17, MOT20, and DanceTrack, demonstrate that our approach achieves competitive performance compared to the current state-of-the-art methods. Code is available at https://github.com/RainBowLuoCS/DiffusionTrack.

AAAI Conference 2024 Conference Paper

Dynamic Feature Pruning and Consolidation for Occluded Person Re-identification

  • Yuteng Ye
  • Hang Zhou
  • Jiale Cai
  • Chenxing Gao
  • Youjia Zhang
  • Junle Wang
  • Qiang Hu
  • Junqing Yu

Occluded person re-identification (ReID) is a challenging problem due to contamination from occluders. Existing approaches address the issue with prior knowledge cues, such as human body key points and semantic segmentations, which easily fail in the presence of heavy occlusion and other humans as occluders. In this paper, we propose a feature pruning and consolidation (FPC) framework to circumvent explicit human structure parsing. The framework mainly consists of a sparse encoder, a multi-view feature matching module, and a feature consolidation decoder. Specifically, the sparse encoder drops less important image tokens, mostly related to background noise and occluders, solely based on correlation within the class token attention. Subsequently, the matching stage relies on the preserved tokens produced by the sparse encoder to identify k-nearest neighbors in the gallery by measuring the image and patch-level combined similarity. Finally, we use the feature consolidation module to compensate for pruned features using identified neighbors, recovering essential information while disregarding disturbance from noise and occlusion. Experimental results demonstrate the effectiveness of our proposed framework on occluded, partial, and holistic Re-ID datasets. In particular, our method outperforms state-of-the-art results by at least 8.6% mAP and 6.0% Rank-1 accuracy on the challenging Occluded-Duke dataset.
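
The sparse encoder's pruning rule can be illustrated directly: rank patch tokens by class-token attention and keep the top fraction. The keep ratio below is an assumption for illustration:

```python
import torch

def prune_tokens_by_cls_attention(tokens, cls_attn, keep_ratio=0.7):
    """Sketch: keep the patch tokens most attended to by the class token,
    dropping tokens likely tied to background noise or occluders.
    tokens: (N, d) patch tokens; cls_attn: (N,) CLS-to-patch attention."""
    k = max(1, int(keep_ratio * tokens.shape[0]))
    idx = torch.topk(cls_attn, k).indices    # most relevant tokens
    return tokens[idx], idx                  # preserved tokens for matching
```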

IJCAI Conference 2024 Conference Paper

GenSeg: On Generating Unified Adversary for Segmentation

  • Yuxuan Zhang
  • Zhenbo Shi
  • Wei Yang
  • Shuchang Wang
  • Shaowei Wang
  • Yinxing Xue

Great advancements in semantic, instance, and panoptic segmentation have been made in recent years, yet the top-performing models remain vulnerable to imperceptible adversarial perturbation. Current attacks on segmentation primarily focus on a single task, and these methods typically rely on iterative instance-specific strategies, resulting in limited attack transferability and low efficiency. In this paper, we propose GenSeg, a Generative paradigm that creates unified adversaries for Segmentation tasks. In particular, we propose an intermediate-level objective to enhance attack transferability, including a mutual agreement loss for feature deviation, and a prototype obfuscating loss to disrupt intra-class and inter-class relationships. Moreover, GenSeg crafts an adversary in a single forward pass, significantly boosting the attack efficiency. Besides, we unify multiple segmentation tasks into GenSeg in a novel category-and-mask view, which makes it possible to attack these segmentation tasks within this unified framework, and conduct cross-domain and cross-task attacks as well. Extensive experiments demonstrate the superiority of GenSeg in black-box attacks compared with state-of-the-art attacks. To the best of our knowledge, GenSeg is the first approach capable of conducting cross-domain and cross-task attacks on segmentation tasks, which are closer to real-world scenarios.

AAAI Conference 2024 Conference Paper

Progressive Text-to-Image Diffusion with Soft Latent Direction

  • Yuteng Ye
  • Jiale Cai
  • Hang Zhou
  • Guanwen Li
  • Youjia Zhang
  • Zikai Song
  • Chenxing Gao
  • Junqing Yu

In spite of the rapidly evolving landscape of text-to-image generation, the synthesis and manipulation of multiple entities while adhering to specific relational constraints pose enduring challenges. This paper introduces an innovative progressive synthesis and editing operation that systematically incorporates entities into the target image, ensuring their adherence to spatial and relational constraints at each sequential step. Our key insight stems from the observation that while a pre-trained text-to-image diffusion model adeptly handles one or two entities, it often falters when dealing with a greater number. To address this limitation, we propose harnessing the capabilities of a Large Language Model (LLM) to decompose intricate and protracted text descriptions into coherent directives adhering to stringent formats. To facilitate the execution of directives involving distinct semantic operations—namely insertion, editing, and erasing—we formulate the Stimulus, Response, and Fusion (SRF) framework. Within this framework, latent regions are gently stimulated in alignment with each operation, followed by the fusion of the responsive latent components to achieve cohesive entity manipulation. Our proposed framework yields notable advancements in object synthesis, particularly when confronted with intricate and lengthy textual inputs. Consequently, it establishes a new benchmark for text-to-image generation tasks, further elevating the field's performance standards.

IJCAI Conference 2024 Conference Paper

PTDE: Personalized Training with Distilled Execution for Multi-Agent Reinforcement Learning

  • Yiqun Chen
  • Hangyu Mao
  • Jiaxin Mao
  • Shiguang Wu
  • Tianle Zhang
  • Bin Zhang
  • Wei Yang
  • Hongxing Chang

Centralized Training with Decentralized Execution (CTDE) has emerged as a widely adopted paradigm in multi-agent reinforcement learning, emphasizing the utilization of global information for learning an enhanced joint Q-function or centralized critic. In contrast, our investigation delves into harnessing global information to directly enhance individual Q-functions or individual actors. Notably, we discover that applying identical global information universally across all agents proves insufficient for optimal performance. Consequently, we advocate for the customization of global information tailored to each agent, creating agent-personalized global information to bolster overall performance. Furthermore, we introduce a novel paradigm named Personalized Training with Distilled Execution (PTDE), wherein agent-personalized global information is distilled into the agent's local information. This distilled information is then utilized during decentralized execution, resulting in minimal performance degradation. PTDE can be seamlessly integrated with state-of-the-art algorithms, leading to notable performance enhancements across diverse benchmarks, including the SMAC benchmark, Google Research Football (GRF) benchmark, and Learning to Rank (LTR) task.
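
The distillation step can be sketched as a simple regression from local information onto the personalized global information. Module names below are hypothetical stand-ins for the paper's networks:

```python
import torch.nn.functional as F

def ptde_distill_loss(global_info_net, local_net, state, agent_obs, agent_id):
    """Sketch: distill agent-personalized global information into a student
    network that sees only the agent's local observation, so the global
    signal is no longer needed at decentralized execution time."""
    target = global_info_net(state, agent_id).detach()  # personalized global info
    pred = local_net(agent_obs, agent_id)               # local-only student
    return F.mse_loss(pred, target)                     # training-time objective
```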

AAAI Conference 2024 Conference Paper

TIKP: Text-to-Image Knowledge Preservation for Continual Semantic Segmentation

  • Zhidong Yu
  • Wei Yang
  • Xike Xie
  • Zhenbo Shi

Continual Semantic Segmentation (CSS) is an emerging trend, where catastrophic forgetting has been a perplexing problem. In this paper, we propose a Text-to-Image Knowledge Preservation (TIKP) framework to address this issue. TIKP applies Text-to-Image techniques to CSS by automatically generating prompts and content adaptation. It extracts associations between the labels of seen data and constructs text-level prompts based on these associations, which are preserved and maintained at each incremental step. During training, these prompts generate correlated images to mitigate the catastrophic forgetting. Particularly, as the generated images may have different distributions from the original data, TIKP transfers the knowledge by a content adaption loss, which determines the role played by the generated images in incremental training based on the similarity. In addition, for the classifier, we use the previous model from a different perspective: misclassifying new classes into old objects instead of the background. We propose a knowledge distillation loss based on wrong labels, enabling us to attribute varying weights to individual objects during the distillation process. Extensive experiments conducted in the same setting show that TIKP outperforms state-of-the-art methods by a large margin on benchmark datasets.

AAAI Conference 2024 Conference Paper

Towards Detailed Text-to-Motion Synthesis via Basic-to-Advanced Hierarchical Diffusion Model

  • Zhenyu Xie
  • Yang Wu
  • Xuehao Gao
  • Zhongqian Sun
  • Wei Yang
  • Xiaodan Liang

Text-guided motion synthesis aims to generate 3D human motion that not only precisely reflects the textual description but also reveals motion details as much as possible. Pioneering methods explore the diffusion model for text-to-motion synthesis and obtain significant superiority. However, these methods conduct diffusion processes either on the raw data distribution or in a low-dimensional latent space, which typically suffer from modality inconsistency or detail scarcity. To tackle this problem, we propose a novel Basic-to-Advanced Hierarchical Diffusion Model, named B2A-HDM, to collaboratively exploit low-dimensional and high-dimensional diffusion models for high-quality, detailed motion synthesis. Specifically, the basic diffusion model in the low-dimensional latent space provides an intermediate denoising result that is consistent with the textual description, while the advanced diffusion model in the high-dimensional latent space focuses on the subsequent detail-enhancing denoising process. Besides, we introduce a multi-denoiser framework for the advanced diffusion model to ease the learning of the high-dimensional model and fully explore the generative potential of the diffusion model. Quantitative and qualitative experiment results on two text-to-motion benchmarks (HumanML3D and KIT-ML) demonstrate that B2A-HDM can outperform existing state-of-the-art methods in terms of fidelity, modality consistency, and diversity.

NeurIPS Conference 2023 Conference Paper

A Robust and Opponent-Aware League Training Method for StarCraft II

  • Ruozi Huang
  • Xipeng Wu
  • Hongsheng Yu
  • Zhong Fan
  • Haobo Fu
  • Qiang Fu
  • Wei Yang

It is extremely difficult to train a superhuman Artificial Intelligence (AI) for games of similar size to StarCraft II. AlphaStar is the first AI that beat human professionals in the full game of StarCraft II, using a league training framework that is inspired by a game-theoretic approach. In this paper, we improve AlphaStar's league training in two significant aspects. We train goal-conditioned exploiters, whose abilities of spotting weaknesses in the main agent and the entire league are greatly improved compared to the unconditioned exploiters in AlphaStar. In addition, we endow the agents in the league with the new ability of opponent modeling, which makes the agent more responsive to the opponent's real-time strategy. Based on these improvements, we train a better and superhuman AI with orders of magnitude fewer resources than AlphaStar (see Table 1 for a full comparison). Considering the iconic role of StarCraft II in game AI research, we believe our method and results on StarCraft II provide valuable design principles on how one would utilize the general league training framework for obtaining a least-exploitable strategy in various large-scale, real-world games.

NeurIPS Conference 2023 Conference Paper

Act As You Wish: Fine-Grained Control of Motion Diffusion Model with Hierarchical Semantic Graphs

  • Peng Jin
  • Yang Wu
  • Yanbo Fan
  • Zhongqian Sun
  • Wei Yang
  • Li Yuan

Most text-driven human motion generation methods employ sequential modeling approaches, e.g., transformer, to extract sentence-level text representations automatically and implicitly for human motion synthesis. However, these compact text representations may overemphasize the action names at the expense of other important properties and lack fine-grained details to guide the synthesis of subtly distinct motion. In this paper, we propose hierarchical semantic graphs for fine-grained control over motion generation. Specifically, we disentangle motion descriptions into hierarchical semantic graphs including three levels of motions, actions, and specifics. Such global-to-local structures facilitate a comprehensive understanding of motion description and fine-grained control of motion generation. Correspondingly, to leverage the coarse-to-fine topology of hierarchical semantic graphs, we decompose the text-to-motion diffusion process into three semantic levels, which correspond to capturing the overall motion, local actions, and action specifics. Extensive experiments on two benchmark human motion datasets, including HumanML3D and KIT, with superior performances, justify the efficacy of our method. More encouragingly, by modifying the edge weights of hierarchical semantic graphs, our method can continuously refine the generated motion, which may have a far-reaching impact on the community. Code and pre-trained weights are available at https://github.com/jpthu17/GraphMotion.

AAAI Conference 2023 Conference Paper

Compact Transformer Tracker with Correlative Masked Modeling

  • Zikai Song
  • Run Luo
  • Junqing Yu
  • Yi-Ping Phoebe Chen
  • Wei Yang

The Transformer framework has shown superior performance in visual object tracking for its great strength in information aggregation across the template and search image with the well-known attention mechanism. Most recent advances focus on exploring attention mechanism variants for better information aggregation. We find these schemes are equivalent to or even just a subset of the basic self-attention mechanism. In this paper, we prove that the vanilla self-attention structure is sufficient for information aggregation, and structural adaption is unnecessary. The key is not the attention structure, but how to extract the discriminative feature for tracking and enhance the communication between the target and search image. Based on this finding, we adopt the basic vision transformer (ViT) architecture as our main tracker and concatenate the template and search image for feature embedding. To guide the encoder to capture the invariant feature for tracking, we attach a lightweight correlative masked decoder which reconstructs the original template and search image from the corresponding masked tokens. The correlative masked decoder serves as a plugin for the compact transformer tracker and is skipped in inference. Our compact tracker uses the simplest structure, consisting only of a ViT backbone and a box head, and can run at 40 fps. Extensive experiments show the proposed compact transformer tracker outperforms existing approaches, including advanced attention variants, and demonstrates the sufficiency of self-attention in tracking tasks. Our method achieves state-of-the-art performance on five challenging benchmarks: VOT2020, UAV123, LaSOT, TrackingNet, and GOT-10k. Our project is available at https://github.com/HUSTDML/CTTrack.

JBHI Journal 2023 Journal Article

Coupled Contour Regression for Efficient Delineation of Lumen and External Elastic Lamina in Intravascular Ultrasound Images

  • Yuan Yang
  • Wei Yu
  • Haiyan Du
  • Li Ling
  • Qianjin Feng
  • Shengxian Tu
  • Wei Yang

Automatic delineation of the lumen and vessel contours in intravascular ultrasound (IVUS) images is crucial for the subsequent IVUS-based analysis. Existing methods usually address this task through mask-based segmentation, which cannot effectively handle the anatomical plausibility of the lumen and external elastic lamina (EEL) contours and thus limits their performance. In this article, we propose a contour encoding based method called coupled contour regression network (CCRNet) to directly predict the lumen and EEL contour pairs. The lumen and EEL contours are resampled, coupled, and embedded into a low-dimensional space to learn a compact contour representation. Then, we employ a convolutional network backbone to predict the coupled contour signatures and reconstruct the signatures to the object contours by a linear decoder. Assisted by the implicit anatomical prior of the paired lumen and EEL contours in the signature space and contour decoder, CCRNet has the potential to avoid producing unreasonable results. We evaluated our proposed method on a large IVUS dataset consisting of 7204 cross-sectional frames from 185 pullbacks. The CCRNet can rapidly extract the contours at 100 fps. Without any post-processing, all produced contours are anatomically reasonable in the 19 test pullbacks. The mean Dice similarity coefficients of our CCRNet for the lumen and EEL are 0.940 and 0.958, which are comparable to the mask-based models. In terms of the contour metric Hausdorff distance, our CCRNet achieves 0.258 mm for the lumen and 0.268 mm for the EEL, which outperforms the mask-based models.
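
The signature-space construction can be approximated with a PCA-style linear embedding of concatenated lumen and EEL contour vectors; the backbone then regresses signatures and the linear decoder maps them back to coupled contours. A sketch, with the dimensionality chosen arbitrarily:

```python
import numpy as np

def fit_contour_signature_space(paired_contours, n_dims=24):
    """Sketch: paired_contours is (n_samples, D), each row the concatenated
    lumen+EEL contour point coordinates. PCA stands in for the paper's
    low-dimensional coupled embedding; the decoder is purely linear."""
    mean = paired_contours.mean(axis=0)
    X = paired_contours - mean
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    basis = Vt[:n_dims]                     # rows span the signature space

    def encode(contours):                   # contours -> compact signatures
        return (contours - mean) @ basis.T

    def decode(signatures):                 # linear decoder back to contour pairs
        return signatures @ basis + mean

    return encode, decode
```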

AAAI Conference 2023 Conference Paper

Dual Memory Units with Uncertainty Regulation for Weakly Supervised Video Anomaly Detection

  • Hang Zhou
  • Junqing Yu
  • Wei Yang

Learning discriminative features for effectively separating abnormal events from normality is crucial for weakly supervised video anomaly detection (WS-VAD) tasks. Existing approaches, both video and segment-level label oriented, mainly focus on extracting representations for anomaly data while neglecting the implication of normal data. We observe that such a scheme is sub-optimal, i.e., to better distinguish anomalies one needs to understand what constitutes a normal state, and such approaches may yield a higher false alarm rate. To address this issue, we propose an Uncertainty Regulated Dual Memory Units (UR-DMU) model to learn both the representations of normal data and discriminative features of abnormal data. To be specific, inspired by the traditional global and local structure on graph convolutional networks, we introduce a Global and Local Multi-Head Self Attention (GL-MHSA) module for the Transformer network to obtain more expressive embeddings for capturing associations in videos. Then, we use two memory banks, one additional abnormal memory for tackling hard samples, to store and separate abnormal and normal prototypes and maximize the margins between the two representations. Finally, we propose an uncertainty learning scheme to learn the normal data latent space, which is robust to noise from camera switching, object changing, scene transforming, etc. Extensive experiments on XD-Violence and UCF-Crime datasets demonstrate that our method outperforms the state-of-the-art methods by a sizable margin.

IJCAI Conference 2023 Conference Paper

FGNet: Towards Filling the Intra-class and Inter-class Gaps for Few-shot Segmentation

  • Yuxuan Zhang
  • Wei Yang
  • Shaowei Wang

Current few-shot segmentation (FSS) approaches have made tremendous achievements based on prototypical learning techniques. However, due to the scarcity of the support data provided, FSS methods still suffer from the intra-class and inter-class gaps. In this paper, we propose a uniform network to fill both the gaps, termed FGNet. It consists of the novel design of a Self-Adaptive Module (SAM) to emphasize the query feature to generate an enhanced prototype for self-alignment. Such a prototype caters to each query sample itself since it contains the underlying intra-instance information, which gets around the intra-class appearance gap. Moreover, we design an Inter-class Feature Separation Module (IFSM) to separate the feature space of the target class from other classes, which contributes to bridging the inter-class gap. In addition, we present several new losses and a method termed B-SLIC, which help to further enhance the separation performance of FGNet. Experimental results show that FGNet reduces both the gaps for FSS by SAM and IFSM respectively, and achieves state-of-the-art performances on both PASCAL-5i and COCO-20i datasets compared with previous top-performing approaches.

NeurIPS Conference 2023 Conference Paper

Hokoff: Real Game Dataset from Honor of Kings and its Offline Reinforcement Learning Benchmarks

  • Yun Qu
  • Boyuan Wang
  • Jianzhun Shao
  • Yuhang Jiang
  • Chen Chen
  • Zhenbin Ye
  • Liu Linc
  • Yang Feng

The advancement of Offline Reinforcement Learning (RL) and Offline Multi-Agent Reinforcement Learning (MARL) critically depends on the availability of high-quality, pre-collected offline datasets that represent real-world complexities and practical applications. However, existing datasets often fall short, being overly simplistic and lacking realism. To address this gap, we propose Hokoff, a comprehensive set of pre-collected datasets that covers both offline RL and offline MARL, accompanied by a robust framework, to facilitate further research. This data is derived from Honor of Kings, a recognized Multiplayer Online Battle Arena (MOBA) game known for its intricate nature, closely resembling real-life situations. Utilizing this framework, we benchmark a variety of offline RL and offline MARL algorithms. We also introduce a novel baseline algorithm tailored for the inherent hierarchical action space of the game. We reveal the incompetency of current offline RL approaches in handling task complexity, generalization, and multi-task learning.

AAAI Conference 2023 Conference Paper

Maximum Entropy Population-Based Training for Zero-Shot Human-AI Coordination

  • Rui Zhao
  • Jinming Song
  • Yufeng Yuan
  • Haifeng Hu
  • Yang Gao
  • Yi Wu
  • Zhongqian Sun
  • Wei Yang

We study the problem of training a Reinforcement Learning (RL) agent that is collaborative with humans without using human data. Although such agents can be obtained through self-play training, they can suffer significantly from distributional shift when paired with unencountered partners, such as humans. In this paper, we propose Maximum Entropy Population-based training (MEP) to mitigate such distributional shift. In MEP, agents in the population are trained with our derived Population Entropy bonus to promote the pairwise diversity between agents and the individual diversity of agents themselves. After obtaining this diversified population, a common best agent is trained by pairing with agents in this population via prioritized sampling, where the prioritization is dynamically adjusted based on the training progress. We demonstrate the effectiveness of our method MEP, in comparison to Self-Play PPO (SP), Population-Based Training (PBT), Trajectory Diversity (TrajeDi), and Fictitious Co-Play (FCP) in both matrix game and Overcooked game environments, with partners being human proxy models and real humans. A supplementary video showing experimental results is available at https://youtu.be/Xh-FKD0AAKE.
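
The population entropy idea can be sketched as the entropy of the population's mean policy at a state; maximizing it encourages both pairwise and individual diversity. This is an illustrative form, not the paper's derived bonus:

```python
import torch

def population_entropy_bonus(action_probs):
    """Sketch: entropy of the mean policy across a population at one state.
    action_probs: (n_agents, n_actions) per-agent action distributions.
    A high value means the agents, taken together, cover diverse actions."""
    mean_policy = action_probs.mean(dim=0)   # population's average policy
    return -(mean_policy * mean_policy.clamp_min(1e-12).log()).sum()
```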

NeurIPS Conference 2023 Conference Paper

Policy Space Diversity for Non-Transitive Games

  • Jian Yao
  • Weiming Liu
  • Haobo Fu
  • Yaodong Yang
  • Stephen McAleer
  • Qiang Fu
  • Wei Yang

Policy-Space Response Oracles (PSRO) is an influential algorithm framework for approximating a Nash Equilibrium (NE) in multi-agent non-transitive games. Many previous studies have been trying to promote policy diversity in PSRO. A major weakness with existing diversity metrics is that a more diverse (according to their diversity metrics) population does not necessarily mean (as we proved in the paper) a better approximation to a NE. To alleviate this problem, we propose a new diversity metric, the improvement of which guarantees a better approximation to a NE. Meanwhile, we develop a practical and well-justified method to optimize our diversity metric using only state-action samples. By incorporating our diversity regularization into the best response solving of PSRO, we obtain a new PSRO variant, Policy Space Diversity PSRO (PSD-PSRO). We present the convergence property of PSD-PSRO. Empirically, extensive experiments on single-state games, Leduc, and Goofspiel demonstrate that PSD-PSRO is more effective in producing significantly less exploitable policies than state-of-the-art PSRO variants.

AAAI Conference 2023 Conference Paper

RLogist: Fast Observation Strategy on Whole-Slide Images with Deep Reinforcement Learning

  • Boxuan Zhao
  • Jun Zhang
  • Deheng Ye
  • Jian Cao
  • Xiao Han
  • Qiang Fu
  • Wei Yang

Whole-slide images (WSI) in computational pathology have high resolution with gigapixel size, but generally have sparse regions of interest, which leads to weak diagnostic relevance and data inefficiency for each area in the slide. Most of the existing methods rely on a multiple instance learning framework that requires densely sampling local patches at high magnification. The limitation is evident in the application stage as the heavy computation for extracting patch-level features is inevitable. In this paper, we develop RLogist, a benchmarking deep reinforcement learning (DRL) method for fast observation strategy on WSIs. Imitating the diagnostic logic of human pathologists, our RL agent learns how to find regions of observation value and obtain representative features across multiple resolution levels, without having to analyze each part of the WSI at high magnification. We benchmark our method on two whole-slide level classification tasks, including detection of metastases in WSIs of lymph node sections, and subtyping of lung cancer. Experimental results demonstrate that RLogist achieves competitive classification performance compared to typical multiple instance learning algorithms, while having a significantly shorter observation path. In addition, the observation path given by RLogist provides good decision-making interpretability, and its reading-path navigation ability can potentially be used by pathologists for educational/assistive purposes. Our code is available at: https://github.com/tencent-ailab/RLogist.

ICLR Conference 2023 Conference Paper

SYNC: Safety-Aware Neural Control for Stabilizing Stochastic Delay-Differential Equations

  • Jingdong Zhang 0001
  • Qunxi Zhu
  • Wei Yang
  • Wei Lin 0003

Stabilization of systems described by stochastic delay-differential equations (SDDEs) under preset conditions is a challenging task in the control community. Here, to achieve this task, we leverage neural networks to learn control policies using the information of the controlled systems in some prescribed regions. Specifically, two learned control policies, i.e., the neural deterministic controller (NDC) and the neural stochastic controller (NSC), work effectively in the learning procedures that rely on, respectively, the well-known LaSalle-type theorem and the newly-established theorem for guaranteeing stochastic stability in SDDEs. We theoretically investigate the performance of the proposed controllers in terms of convergence time and energy cost. More practically and significantly, we improve our learned control policies by considering the situation where the controlled trajectories only evolve in some specific safety set. The practical validity of such control policies restricted to the safety set is attributed to the theory that we further develop for safety and stability guarantees in SDDEs using the stochastic control barrier function and spatial discretization. We call this control SYNC (SafetY-aware Neural Control). The efficacy of all the articulated control policies, including SYNC, is demonstrated systematically by using representative control problems.

NeurIPS Conference 2022 Conference Paper

Honor of Kings Arena: an Environment for Generalization in Competitive Reinforcement Learning

  • Hua Wei
  • Jingxiao Chen
  • Xiyang Ji
  • Hongyang Qin
  • Minwen Deng
  • Siqin Li
  • Liang Wang
  • Weinan Zhang

This paper introduces Honor of Kings Arena, a reinforcement learning (RL) environment based on Honor of Kings, one of the world’s most popular games at present. Compared to other environments studied in most previous work, ours presents new generalization challenges for competitive reinforcement learning. It is a multi-agent problem with one agent competing against its opponent, and it requires generalization ability as it has diverse targets to control and diverse opponents to compete with. We describe the observation, action, and reward specifications for the Honor of Kings domain and provide an open-source Python-based interface for communicating with the game engine. We provide twenty target heroes with a variety of tasks in Honor of Kings Arena and present initial baseline results for RL-based methods with feasible computing resources. Finally, we showcase the generalization challenges imposed by Honor of Kings Arena and possible remedies to the challenges. All of the software, including the environment class, is publicly available.

IJCAI Conference 2022 Conference Paper

JueWu-MC: Playing Minecraft with Sample-efficient Hierarchical Reinforcement Learning

  • Zichuan Lin
  • Junyou Li
  • Jianing Shi
  • Deheng Ye
  • Qiang Fu
  • Wei Yang

Learning rational behaviors in open-world games like Minecraft remains challenging for Reinforcement Learning (RL) research due to the compound challenge of partial observability, high-dimensional visual perception, and delayed reward. To address this, we propose JueWu-MC, a sample-efficient hierarchical RL approach equipped with representation learning and imitation learning to deal with perception and exploration. Specifically, our approach includes two levels of hierarchy, where the high-level controller learns a policy over options and the low-level workers learn to solve each sub-task. To boost the learning of sub-tasks, we propose a combination of techniques including 1) action-aware representation learning which captures underlying relations between action and representation, 2) discriminator-based self-imitation learning for efficient exploration, and 3) ensemble behavior cloning with consistency filtering for policy robustness. Extensive experiments show that JueWu-MC significantly improves sample efficiency and outperforms a set of baselines by a large margin. Notably, we won the championship of the NeurIPS MineRL 2021 research competition and achieved the highest performance score ever.

IJCAI Conference 2022 Conference Paper

Learn to Reverse DNNs from AI Programs Automatically

  • Simin Chen
  • Hamed Khanpour
  • Cong Liu
  • Wei Yang

With the private deployment of DNNs on edge devices, the security of on-device DNNs has raised significant concern. To quantify the model leakage risk of on-device DNNs automatically, we propose NNReverse, the first learning-based method that can reverse DNNs from AI programs without domain knowledge. NNReverse trains a representation model to represent the semantics of binary code for DNN layers. By searching for the most similar function in our database, NNReverse infers the layer type of a given function’s binary code. To represent assembly instruction semantics precisely, NNReverse proposes a more fine-grained embedding model to represent the textual and structural semantics of assembly functions.
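
The retrieval step described above can be pictured as nearest-neighbor search over function embeddings; the database, embedding vectors, and layer names below are toy stand-ins, not NNReverse's actual model.

```python
# Label a binary function with the layer type of its nearest neighbor
# (by cosine similarity) in a database of known-layer embeddings.
import numpy as np

database = {                     # embedding -> known layer type (illustrative)
    "conv2d":  np.array([0.9, 0.1, 0.0]),
    "dense":   np.array([0.1, 0.9, 0.2]),
    "softmax": np.array([0.0, 0.2, 0.9]),
}

def infer_layer_type(query_embedding):
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(database, key=lambda name: cos(query_embedding, database[name]))

print(infer_layer_type(np.array([0.8, 0.2, 0.1])))   # -> conv2d
```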

NeurIPS Conference 2022 Conference Paper

SCL-WC: Cross-Slide Contrastive Learning for Weakly-Supervised Whole-Slide Image Classification

  • Xiyue Wang
  • Jinxi Xiang
  • Jun Zhang
  • Sen Yang
  • Zhongyi Yang
  • Ming-Hui Wang
  • Jing Zhang
  • Wei Yang

Weakly-supervised whole-slide image (WSI) classification (WSWC) is a challenging task where a large number of unlabeled patches (instances) exist within each WSI (bag) while only a slide label is given. Despite recent progress in multiple instance learning (MIL)-based WSI analysis, the major limitation is that it usually focuses on the easy-to-distinguish diagnosis-positive regions while ignoring positives that occupy a small ratio of the entire WSI. To obtain more discriminative features, we propose a novel weakly-supervised classification method based on cross-slide contrastive learning (called SCL-WC), which depends on task-agnostic self-supervised feature pre-extraction and task-specific weakly-supervised feature refinement and aggregation for WSI-level prediction. To enable both intra-WSI and inter-WSI information interaction, we propose a positive-negative-aware module (PNM) and a weakly-supervised cross-slide contrastive learning (WSCL) module, respectively. The WSCL aims to pull WSIs with the same disease types closer and push different WSIs away. The PNM aims to facilitate the separation of tumor-like patches and normal ones within each WSI. Extensive experiments demonstrate state-of-the-art performance of our method in three different classification tasks (e.g., over 2% AUC on Camelyon16, 5% F1 score on BRACS, and 3% AUC on DiagSet). Our method also shows superior flexibility and scalability in weakly-supervised localization and semi-supervised classification experiments (e.g., first place in the BRIGHT challenge). Our code will be available at https://github.com/Xiyue-Wang/SCL-WC.
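
The WSCL module's pull-together/push-apart objective resembles a supervised contrastive loss over WSI-level features; the sketch below shows that generic form. The temperature, feature sizes, and exact loss are assumptions, not the authors' formulation.

```python
# Generic supervised contrastive loss over WSI-level features: same-label
# slides are positives, different labels are negatives.
import torch
import torch.nn.functional as F

def wsi_contrastive_loss(features, labels, temperature=0.1):
    z = F.normalize(features, dim=1)                 # (B, D) WSI features
    sim = z @ z.T / temperature                      # pairwise similarities
    mask = labels.unsqueeze(0) == labels.unsqueeze(1)
    mask.fill_diagonal_(False)                       # exclude self-pairs
    logits = sim - torch.eye(len(z)) * 1e9           # mask self in softmax
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    return -(log_prob[mask]).mean()  # assumes every slide has >=1 positive

loss = wsi_contrastive_loss(torch.randn(8, 128), torch.randint(0, 2, (8,)))
```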

AAAI Conference 2022 Conference Paper

Shape Prior Guided Attack: Sparser Perturbations on 3D Point Clouds

  • Zhenbo Shi
  • Zhi Chen
  • Zhenbo Xu
  • Wei Yang
  • Zhidong Yu
  • Liusheng Huang

Deep neural networks are extremely vulnerable to malicious input data. As 3D data is increasingly used in vision tasks such as robotics, autonomous driving, and drones, the robustness of classification models for 3D point clouds has received widespread attention. In this paper, we propose a novel method named SPGA (Shape Prior Guided Attack) to generate adversarial point cloud examples. We use shape prior information to make perturbations sparser and thus achieve imperceptible attacks. In particular, we propose a Spatially Logical Block (SLB) to apply adversarial points by sliding them within the oriented bounding box. Moreover, we design an algorithm called FOFA for this task, which further refines the adversarial attack by breaking the complicated problem down into sub-problems. Compared with global-perturbation methods, our attack consumes significantly less computation, making it more efficient. Most importantly, SPGA can generate examples with a higher attack success rate (even against defenses), a smaller perturbation budget, and stronger transferability.
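
A generic skeleton of a sparse, box-constrained point-cloud perturbation, for intuition only: SPGA's shape priors, the SLB, and FOFA are not reproduced here, and the toy classifier and hyperparameters are assumptions.

```python
# Move only the k points with the largest gradients, then clamp the result
# into a small box around the original cloud (sparsity + bounded change).
import torch
import torch.nn as nn
import torch.nn.functional as F

def sparse_attack_step(points, model, label, k=10, step=0.01, box=0.05):
    pts = points.clone().requires_grad_(True)
    loss = F.cross_entropy(model(pts.unsqueeze(0)), label.unsqueeze(0))
    loss.backward()
    scores = pts.grad.norm(dim=1)                  # per-point gradient size
    idx = scores.topk(k).indices                   # perturb only k points
    delta = torch.zeros_like(points)
    delta[idx] = step * pts.grad[idx].sign()
    moved = points + delta
    return torch.min(torch.max(moved, points - box), points + box)

model = nn.Sequential(nn.Flatten(), nn.Linear(64 * 3, 4))   # toy classifier
adv = sparse_attack_step(torch.randn(64, 3), model, torch.tensor(0))
```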

IJCAI Conference 2021 Conference Paper

Boosting Offline Reinforcement Learning with Residual Generative Modeling

  • Hua Wei
  • Deheng Ye
  • Zhao Liu
  • Hao Wu
  • Bo Yuan
  • Qiang Fu
  • Wei Yang
  • Zhenhui Li

Offline reinforcement learning (RL) tries to learn a near-optimal policy from recorded offline experience without online exploration. Current offline RL research includes: 1) generative modeling, i.e., approximating a policy using fixed data; and 2) learning the state-action value function. While most research focuses on the value-function part by reducing the bootstrapping error in value function approximation induced by the distribution shift of the training data, the effects of error propagation in generative modeling have been neglected. In this paper, we analyze the error in generative modeling. We propose AQL (action-conditioned Q-learning), a residual generative model that reduces policy approximation error for offline RL. We show that our method can learn more accurate policy approximations on different benchmark datasets. In addition, we show that the proposed offline RL method can learn more competitive AI agents in complex control tasks in the multiplayer online battle arena (MOBA) game Honor of Kings.
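
One way to picture a "residual generative model" is a base policy network plus a residual head that corrects its output; the architecture below is an illustrative assumption, not the AQL implementation.

```python
# Base network imitates the dataset action; a residual head, conditioned on
# the state and the base action, corrects the approximation error.
import torch
import torch.nn as nn

class ResidualPolicy(nn.Module):
    def __init__(self, s_dim, a_dim, hidden=64):
        super().__init__()
        self.base = nn.Sequential(nn.Linear(s_dim, hidden), nn.ReLU(),
                                  nn.Linear(hidden, a_dim))   # generative model
        self.res = nn.Sequential(nn.Linear(s_dim + a_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, a_dim))    # residual correction
    def forward(self, s):
        a0 = self.base(s)
        return a0 + self.res(torch.cat([s, a0], dim=-1))

pi = ResidualPolicy(s_dim=8, a_dim=2)
print(pi(torch.randn(4, 8)).shape)        # torch.Size([4, 2])
```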

IJCAI Conference 2021 Conference Paper

Hiding Numerical Vectors in Local Private and Shuffled Messages

  • Shaowei Wang
  • Jin Li
  • Yuqiu Qian
  • Jiachun Du
  • Wenqing Lin
  • Wei Yang

Numerical vector aggregation has numerous applications in privacy-sensitive scenarios, such as distributed gradient estimation in federated learning and statistical analysis on key-value data. Within the framework of local differential privacy, this work gives tight minimax error bounds of $O(ds/(n\epsilon^2))$, where $d$ is the dimension of the numerical vector and $s$ is the number of non-zero entries. An attainable mechanism is then designed to improve on existing approaches, which suffer error rates of $O(d^2/(n\epsilon^2))$ or $O(ds^2/(n\epsilon^2))$. To break the error barrier of local privacy, this work further considers privacy amplification in the shuffle model with anonymous channels, and shows the mechanism satisfies centralized $(\sqrt{14\ln(2/\delta)\,(se^{\epsilon}+2s-1)/(n-1)},\ \delta)$-differential privacy, which is domain independent and thus scales to federated learning of large models. We experimentally validate the mechanism, compare it with existing approaches, and demonstrate its significant error reduction.
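
The amplified guarantee quoted above can be evaluated directly; here is a small helper (the parameter values in the example are arbitrary).

```python
# eps_c = sqrt(14 * ln(2/delta) * (s * e^eps + 2s - 1) / (n - 1))
import math

def central_epsilon(local_eps, s, n, delta):
    return math.sqrt(14 * math.log(2 / delta)
                     * (s * math.exp(local_eps) + 2 * s - 1) / (n - 1))

# e.g., 100k users, 10 non-zero entries, local eps = 1:
print(round(central_epsilon(1.0, s=10, n=100_000, delta=1e-6), 3))  # ~0.31 < 1
```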

NeurIPS Conference 2021 Conference Paper

Learning Diverse Policies in MOBA Games via Macro-Goals

  • Yiming Gao
  • Bei Shi
  • Xueying Du
  • Liang Wang
  • Guangwei Chen
  • Zhenjie Lian
  • Fuhao Qiu
  • Guoan Han

Recently, many researchers have made successful progress in building AI systems for MOBA game playing with deep reinforcement learning, such as on Dota 2 and Honor of Kings. Even though these AI systems have achieved or even exceeded human-level performance, they still suffer from a lack of policy diversity. In this paper, we propose a novel Macro-Goals Guided framework, called MGG, to learn diverse policies in MOBA games. MGG abstracts strategies as macro-goals from human demonstrations and trains a Meta-Controller to predict these macro-goals. To enhance policy diversity, MGG samples macro-goals from the Meta-Controller prediction and guides the training process towards these goals. Experimental results on the typical MOBA game Honor of Kings demonstrate that MGG can execute diverse policies in different matches and lineups, and also outperforms the state-of-the-art methods over 102 heroes.
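
A rough sketch of macro-goal guidance, for intuition only: sample a macro-goal from the Meta-Controller's predicted distribution and shape the reward toward attaining it. The goal set, distribution, and bonus below are assumptions, not the MGG implementation.

```python
# Sample a macro-goal, then add an intrinsic bonus when the agent attains it.
import torch

def sample_macro_goal(meta_logits):
    return torch.distributions.Categorical(logits=meta_logits).sample()

def shaped_reward(env_reward, attained_goal, macro_goal, bonus=0.1):
    return env_reward + bonus * float(attained_goal == macro_goal)

meta_logits = torch.tensor([1.2, 0.3, -0.5])   # e.g., push-lane / jungle / defend
g = sample_macro_goal(meta_logits)
print(g.item(), shaped_reward(0.0, attained_goal=g.item(), macro_goal=g.item()))
```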

IJCAI Conference 2021 Conference Paper

MapGo: Model-Assisted Policy Optimization for Goal-Oriented Tasks

  • Menghui Zhu
  • Minghuan Liu
  • Jian Shen
  • Zhicheng Zhang
  • Sheng Chen
  • Weinan Zhang
  • Deheng Ye
  • Yong Yu

In goal-oriented reinforcement learning, relabeling the raw goals in past experience to provide agents with hindsight ability is a major solution to the reward sparsity problem. In this paper, to enhance the diversity of relabeled goals, we develop FGI (Foresight Goal Inference), a new relabeling strategy that relabels goals by looking into the future with a learned dynamics model. Besides, to improve sample efficiency, we propose to use the dynamics model to generate simulated trajectories for policy training. By integrating these two improvements, we introduce the MapGo framework (Model-Assisted Policy optimization for Goal-oriented tasks). In our experiments, we first show the effectiveness of the FGI strategy compared with the hindsight one, and then show that the MapGo framework achieves higher sample efficiency when compared to model-free baselines on a set of complicated tasks.
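
The FGI idea can be sketched as rolling a learned dynamics model forward and relabeling with the predicted future achieved goal; the dynamics model, goal extractor, and horizon below are toy stand-ins.

```python
# Imagine the future with a learned model, then relabel the transition's goal
# with the goal the agent is predicted to achieve.
import numpy as np

def learned_dynamics(state, action):
    return state + 0.1 * action                  # stand-in for the model

def achieved_goal(state):
    return state                                 # toy goal extractor

def foresight_relabel(state, policy, horizon=5):
    s = state
    for _ in range(horizon):                     # roll the model forward
        s = learned_dynamics(s, policy(s))
    return achieved_goal(s)                      # relabeled goal

goal = foresight_relabel(np.zeros(2), policy=lambda s: np.ones(2))
print(goal)                                      # [0.5 0.5]
```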

JBHI Journal 2021 Journal Article

Quantifying Axial Spine Images Using Object-Specific Bi-Path Network

  • Liyan Lin
  • Xi Tao
  • Wei Yang
  • Shumao Pang
  • Zhihai Su
  • Hai Lu
  • Shuo Li
  • Qianjin Feng

Automatic estimation of indices from medical images is the main goal of computer-aided quantification (CADq), which speeds up diagnosis and lightens the workload of radiologists. Deep learning techniques are a good choice for implementing CADq. Usually, to acquire high-accuracy quantification, a specific network architecture needs to be designed for a given CADq task. In this study, considering that the target organs are the intervertebral disc and the dural sac, we propose an object-specific bi-path network (OSBP-Net) for axial spine image quantification. Each path of the OSBP-Net comprises a shallow feature extraction layer (SFE) and a deep feature extraction sub-network (DFE). The SFEs use different convolution strides because the two target organs have different anatomical sizes. The DFEs use average pooling for downsampling based on the observation that the target organs have lower intensity than the background. In addition, an inter-path dissimilarity constraint is proposed and applied to the output of the SFEs, taking into account that the activated regions in the feature maps of the two paths should theoretically be different. An inter-index correlation regularization is introduced and applied to the output of the DFEs based on the observation that the diameter and area of the same object have an approximately linear relation. The prediction results of OSBP-Net are compared to several state-of-the-art machine learning-based CADq methods. The comparison reveals that the proposed method substantially outperforms the competing methods, indicating its great potential for spine CADq.
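
A minimal sketch of the bi-path layout described above, with illustrative channel sizes and depths; which stride belongs to which organ, and the two-index output head, are assumptions.

```python
# Two paths share a layout but use different first-conv strides (SFE) to
# match organ sizes, with average pooling for downsampling (DFE).
import torch
import torch.nn as nn

def make_path(first_stride):
    return nn.Sequential(
        nn.Conv2d(1, 16, 3, stride=first_stride, padding=1),  # SFE
        nn.ReLU(),
        nn.AvgPool2d(2),                                      # DFE downsampling
        nn.Conv2d(16, 32, 3, padding=1),
        nn.ReLU(),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
        nn.Linear(32, 2),            # e.g., diameter and area indices
    )

path_a = make_path(first_stride=1)   # stride matched to one organ's size
path_b = make_path(first_stride=2)   # coarser stride for the other organ
x = torch.randn(1, 1, 128, 128)
print(path_a(x).shape, path_b(x).shape)
```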

JBHI Journal 2020 Journal Article

Flexible Prediction of CT Images From MRI Data Through Improved Neighborhood Anchored Regression for PET Attenuation Correction

  • Liming Zhong
  • Yanlin Chen
  • Xiao Zhang
  • Shupeng Liu
  • Yuankui Wu
  • Yunbi Liu
  • Liyan Lin
  • Qianjin Feng

Given the complicated relationship between magnetic resonance imaging (MRI) signals and attenuation values, attenuation correction in hybrid positron emission tomography (PET)/MRI systems remains a challenging task. Currently, existing methods are either time-consuming or require sufficient samples to train the models. In this paper, an efficient approach for predicting pseudo computed tomography (CT) images from T1- and T2-weighted MRI data with limited data is proposed. The proposed approach uses improved neighborhood anchored regression (INAR) as a baseline method to pre-calculate projection matrices to flexibly predict the pseudo CT patches. Techniques including augmentation of the MR/CT dataset, learning of nonlinear descriptors of MR images, hierarchical search for nearest neighbors, data-driven optimization, and a multi-regressor ensemble are adopted to improve the effectiveness of the proposed approach. In total, 22 healthy subjects were enrolled in the study. The pseudo CT images obtained using INAR with the multi-regressor ensemble yielded a mean absolute error (MAE) of 92.73 $\pm$ 14.86 HU, peak signal-to-noise ratio of 29.77 $\pm$ 1.63 dB, Pearson linear correlation coefficient of 0.82 $\pm$ 0.05, dice similarity coefficient of 0.81 $\pm$ 0.03, and a relative mean absolute error (rMAE) in PET attenuation correction of 1.30 $\pm$ 0.20% compared with true CT images. Moreover, our proposed INAR method, without any refinement strategies, can achieve considerable results with only seven subjects (MAE 106.89 $\pm$ 14.43 HU, rMAE 1.51 $\pm$ 0.21%). The experiments demonstrate the superior performance of the proposed method over six competing methods. Moreover, the proposed method can rapidly generate pseudo CT images that are suitable for PET attenuation correction.
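
Neighborhood anchored regression can be sketched as precomputing one ridge projection matrix per anchor offline, so that prediction reduces to an anchor lookup plus a matrix product; the data, anchor count, and regularizer below are toy assumptions, not the INAR pipeline.

```python
# Offline: per-anchor ridge solve over nearest neighbors.
# Online: find the nearest anchor, apply its precomputed projection.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 16))               # MR patch features (toy)
Y = X @ rng.normal(size=(16, 4))             # matching CT patch values (toy)
anchors = X[rng.choice(len(X), 8, replace=False)]

projections = []
for a in anchors:
    idx = np.argsort(((X - a) ** 2).sum(1))[:64]      # anchor's neighborhood
    Xa, Ya = X[idx], Y[idx]
    P = np.linalg.solve(Xa.T @ Xa + 0.1 * np.eye(16), Xa.T @ Ya)
    projections.append(P)

def predict(x):                              # cheap at test time
    k = int(np.argmin(((anchors - x) ** 2).sum(1)))
    return x @ projections[k]

print(predict(X[0]).shape)                   # (4,)
```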

AAAI Conference 2020 Conference Paper

Mastering Complex Control in MOBA Games with Deep Reinforcement Learning

  • Deheng Ye
  • Zhao Liu
  • Mingfei Sun
  • Bei Shi
  • Peilin Zhao
  • Hao Wu
  • Hongsheng Yu
  • Shaojie Yang

We study the reinforcement learning problem of complex action control in Multi-player Online Battle Arena (MOBA) 1v1 games. This problem involves far more complicated state and action spaces than those of traditional 1v1 games, such as Go and Atari, which makes it very difficult to search for policies with human-level performance. In this paper, we present a deep reinforcement learning framework to tackle this problem from the perspectives of both system and algorithm. Our system is of low coupling and high scalability, which enables efficient explorations at large scale. Our algorithm includes several novel strategies, including control dependency decoupling, action mask, target attention, and dual-clip PPO, with which our proposed actor-critic network can be effectively trained in our system. Tested on the MOBA game Honor of Kings, the trained AI agents can defeat top professional human players in full 1v1 games.
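
The dual-clip PPO objective mentioned above is well defined enough to sketch: standard PPO clipping, plus a second clip with a constant c > 1 that bounds the surrogate when the advantage is negative and the importance ratio explodes (c = 3 here is an arbitrary choice).

```python
# For A < 0: max(min(r*A, clip(r, 1-eps, 1+eps)*A), c*A); standard clip otherwise.
import torch

def dual_clip_ppo_loss(ratio, adv, eps=0.2, c=3.0):
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps)
    ppo = torch.min(ratio * adv, clipped * adv)          # standard PPO surrogate
    dual = torch.max(ppo, c * adv)                       # extra floor, A < 0 only
    return -torch.where(adv < 0, dual, ppo).mean()

ratio = torch.tensor([0.5, 5.0, 1.1])
adv = torch.tensor([1.0, -1.0, 0.3])
print(dual_clip_ppo_loss(ratio, adv))
```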

NeurIPS Conference 2020 Conference Paper

Towards Playing Full MOBA Games with Deep Reinforcement Learning

  • Deheng Ye
  • Guibin Chen
  • Wen Zhang
  • Sheng Chen
  • Bo Yuan
  • Bo Liu
  • Jia Chen
  • Zhao Liu

MOBA games, e.g., Honor of Kings, League of Legends, and Dota 2, pose grand challenges to AI systems, such as multi-agent coordination, enormous state-action spaces, and complex action control. Developing AI for playing MOBA games has accordingly raised much attention. However, existing work falls short in handling the raw game complexity caused by the explosion of agent combinations, i.e., lineups, when expanding the hero pool; OpenAI's Dota AI, for instance, limits play to a pool of only 17 heroes. As a result, full MOBA games without restrictions are far from being mastered by any existing AI system. In this paper, we propose a MOBA AI learning paradigm that methodologically enables playing full MOBA games with deep reinforcement learning. Specifically, we develop a combination of novel and existing learning techniques, including off-policy adaption, multi-head value estimation, curriculum self-play learning, policy distillation, and Monte Carlo tree search, to train and play a large pool of heroes while addressing the scalability issue. Tested on Honor of Kings, a popular MOBA game, we show how to build superhuman AI agents that can defeat top esports players. The superiority of our AI is demonstrated by the first large-scale performance test of a MOBA AI agent in the literature.
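
Among the listed techniques, multi-head value estimation lends itself to a short sketch: separate value heads for different reward components share a trunk and are combined into a single baseline. The head names and sizes here are assumptions.

```python
# One trunk, several value heads (one per reward component), combined total.
import torch
import torch.nn as nn

class MultiHeadValue(nn.Module):
    def __init__(self, obs_dim, heads=("farming", "kda", "damage", "pushing")):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU())
        self.heads = nn.ModuleDict({h: nn.Linear(128, 1) for h in heads})
    def forward(self, obs):
        z = self.trunk(obs)
        values = {h: head(z) for h, head in self.heads.items()}
        return sum(values.values()), values       # total baseline + per-head

v = MultiHeadValue(obs_dim=32)
total, per_head = v(torch.randn(4, 32))
print(total.shape, list(per_head))
```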

AAAI Conference 2020 Conference Paper

ZoomNet: Part-Aware Adaptive Zooming Neural Network for 3D Object Detection

  • Zhenbo Xu
  • Wei Zhang
  • Xiaoqing Ye
  • Xiao Tan
  • Wei Yang
  • Shilei Wen
  • Errui Ding
  • Ajin Meng

3D object detection is an essential task in autonomous driving and robotics. Though great progress has been made, challenges remain in estimating the 3D pose of distant and occluded objects. In this paper, we present a novel framework named ZoomNet for stereo-imagery-based 3D detection. The pipeline of ZoomNet begins with an ordinary 2D object detection model that is used to obtain pairs of left-right bounding boxes. To further exploit the abundant texture cues in RGB images for more accurate disparity estimation, we introduce a conceptually straightforward module, adaptive zooming, which simultaneously resizes 2D instance bounding boxes to a unified resolution and adjusts the camera intrinsic parameters accordingly. In this way, we are able to estimate higher-quality disparity maps from the resized box images and then construct dense point clouds for both nearby and distant objects. Moreover, we propose learning part locations as complementary features to improve resistance against occlusion, and we put forward a 3D fitting score to better estimate 3D detection quality. Extensive experiments on the popular KITTI 3D detection dataset indicate ZoomNet surpasses all previous state-of-the-art methods by large margins (improved by 9.4% on AP_bv (IoU=0.7) over pseudo-LiDAR). An ablation study also demonstrates that our adaptive zooming strategy brings an improvement of over 10% on AP_3d (IoU=0.7). In addition, since the official KITTI benchmark lacks fine-grained annotations like pixel-wise part locations, we also present our KFG dataset, which augments KITTI with detailed instance-wise annotations, including pixel-wise part location and pixel-wise disparity. Both the KFG dataset and our code will be publicly available at https://github.com/detectRecog/ZoomNet.
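
The adaptive-zooming bookkeeping has a simple geometric core: when a box crop is resized to a unified resolution, the focal lengths scale with the zoom factors and the principal point shifts into the crop frame. A worked sketch follows (matrix values and the output size are arbitrary).

```python
# Scale intrinsics to match a box crop resized to a unified resolution.
import numpy as np

def zoom_intrinsics(K, box, out_size=(224, 224)):
    x0, y0, x1, y1 = box
    sx = out_size[0] / (x1 - x0)                 # horizontal zoom factor
    sy = out_size[1] / (y1 - y0)                 # vertical zoom factor
    Kz = K.copy()
    Kz[0, 0] *= sx                               # focal length fx
    Kz[1, 1] *= sy                               # focal length fy
    Kz[0, 2] = (K[0, 2] - x0) * sx               # principal point in crop frame
    Kz[1, 2] = (K[1, 2] - y0) * sy
    return Kz

K = np.array([[720.0, 0, 620.0], [0, 720.0, 180.0], [0, 0, 1]])
print(zoom_intrinsics(K, box=(600, 150, 700, 250)))
```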

AAAI Conference 2019 Conference Paper

Multi-Perspective Relevance Matching with Hierarchical ConvNets for Social Media Search

  • Jinfeng Rao
  • Wei Yang
  • Yuhao Zhang
  • Ferhan Ture
  • Jimmy Lin

Despite substantial interest in applications of neural networks to information retrieval, neural ranking models have mostly been applied to “standard” ad hoc retrieval tasks over web pages and newswire articles. This paper proposes MP-HCNN (Multi-Perspective Hierarchical Convolutional Neural Network), a novel neural ranking model specifically designed for ranking short social media posts. We identify document length, informal language, and heterogeneous relevance signals as features that distinguish documents in our domain, and present a model specifically designed with these characteristics in mind. Our model uses hierarchical convolutional layers to learn latent semantic soft-match relevance signals at the character, word, and phrase levels. A pooling-based similarity measurement layer integrates evidence from multiple types of matches between the query and the social media post, as well as URLs contained in the post. Extensive experiments using Twitter data from the TREC Microblog Tracks 2011–2014 show that our model significantly outperforms prior feature-based as well as existing neural ranking models. To the best of our knowledge, this paper presents the first substantial work tackling search over social media posts using neural ranking models. Our code and data are publicly available.
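
The hierarchical matching idea can be sketched with stacked 1D convolutions that widen the receptive field from word level toward phrase level, with pooled features compared against the query; the dimensions and depth below are illustrative, not the MP-HCNN configuration.

```python
# Stacked 1D convs build word- then phrase-level features; pooled features
# at each level yield a per-level relevance signal against the query.
import torch
import torch.nn as nn
import torch.nn.functional as F

emb = nn.Embedding(1000, 32)
conv1 = nn.Conv1d(32, 32, kernel_size=2, padding=1)   # word -> bigram level
conv2 = nn.Conv1d(32, 32, kernel_size=2, padding=1)   # bigram -> phrase level

def encode(token_ids):
    x = emb(token_ids).transpose(1, 2)                # (B, 32, L)
    levels = []
    for conv in (conv1, conv2):
        x = F.relu(conv(x))
        levels.append(x.max(dim=2).values)            # pooled match signal
    return torch.stack(levels, dim=1)                 # (B, levels, 32)

q = encode(torch.randint(0, 1000, (1, 8)))            # query
d = encode(torch.randint(0, 1000, (1, 30)))           # social media post
print(F.cosine_similarity(q, d, dim=2))               # per-level relevance
```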

JBHI Journal 2018 Journal Article

Lung Field Segmentation in Chest Radiographs From Boundary Maps by a Structured Edge Detector

  • Wei Yang
  • Yunbi Liu
  • Liyan Lin
  • Zhaoqiang Yun
  • Zhentai Lu
  • Qianjin Feng
  • Wufan Chen

Lung field segmentation in chest radiographs (CXRs) is an essential preprocessing step in automatically analyzing such images. We present a method for lung field segmentation that is built on a high-quality boundary map detected by an efficient modern boundary detector, namely a structured edge detector (SED). A SED is trained beforehand to detect lung boundaries in CXRs with manually outlined lung fields. Then, an ultrametric contour map (UCM) is transformed from the masked and marked boundary map. Finally, the contours with the highest confidence level in the UCM are extracted as lung contours. Our method is evaluated using the public Japanese Society of Radiological Technology database of scanned films. The average Jaccard index of our method is 95.2%, which is comparable with those of other state-of-the-art methods (95.4%). The computation time of our method is less than 0.1 s for a 256 × 256 CXR when executed on an ordinary laptop. Our method is also validated on CXRs acquired with different digital radiography units. The results demonstrate the generalization of the trained SED model and the usefulness of our method.
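
The final extraction step, picking contours out of a boundary map, can be illustrated with scikit-image on a toy map; the real pipeline's trained SED and UCM construction are not reproduced here.

```python
# Extract iso-contours from a (toy) boundary-confidence map and keep the
# largest one as the candidate lung contour.
import numpy as np
from skimage import measure

boundary = np.zeros((64, 64))
rr, cc = np.ogrid[:64, :64]
boundary[(rr - 32) ** 2 + (cc - 32) ** 2 < 400] = 0.9   # fake lung region

contours = measure.find_contours(boundary, level=0.5)
best = max(contours, key=len)            # largest closed contour
print(len(contours), best.shape)
```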

JBHI Journal 2017 Journal Article

Classification of Multiple Finger Motions During Dynamic Upper Limb Movements

  • Dapeng Yang
  • Wei Yang
  • Qi Huang
  • Hong Liu

To better restore human hand function, advanced hand prostheses should be able to deal with a variety of daily living conditions. In this paper, we addressed myoelectric signal variations introduced by different muscle contractions, dynamic arm movements, and outer interfering forces in the practice of pattern recognition-based myoelectric control schemes. We examined four different training paradigms (data-collection protocols) and quantified their effectiveness for obtaining a robust classification. We further characterized the classification accuracy across different arm/wrist motion primitives. Our results indicate that the training paradigm that collects myoelectric signals over dynamic arm postures and varying muscular contractions (DPDE) can largely mitigate the motion misclassification rate. The misclassification rate of finger motions appears to correlate strongly with wrist pronation and supination rather than with different arm positions. Combining proprioceptive information, such as the hand's orientation, with myoelectric signals for classification only slightly alleviates the misclassification rate.
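
A minimal sketch of the pattern-recognition pipeline such studies rely on: RMS features per EMG window feeding a standard classifier. The synthetic signals, channel count, and window size are assumptions, not the paper's experimental setup.

```python
# RMS feature per channel per window, then a linear discriminant classifier.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
n_windows, n_channels, win = 200, 8, 128
emg = rng.normal(size=(n_windows, n_channels, win))   # synthetic EMG windows
labels = rng.integers(0, 4, size=n_windows)           # 4 finger motions
emg += labels[:, None, None] * 0.3                    # inject class signal

rms = np.sqrt((emg ** 2).mean(axis=2))                # (windows, channels)
clf = LinearDiscriminantAnalysis().fit(rms[:150], labels[:150])
print("held-out accuracy:", clf.score(rms[150:], labels[150:]))
```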