Author name cluster

Kai Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

112 papers

2 author rows

EAAI Journal 2026 Journal Article

A Temporal-spatial Causal Variational Network for accurate sintering temperature forecasting in rotary kilns

Kai Wang
Hua Chen
Xiaogang Zhang
Qianyu Chen
Yuqi Cai
Lei Zhang

Accurate forecasting of sintering temperatures (ST) is pivotal to the high-efficiency, low-energy operation of rotary kilns. The complexity of coupled multivariable process data in industrial environments makes it difficult to uncover patterns and structures in the data, leading to unsatisfactory predictive performance. To accurately analyze temporal–spatial relationships among thermal process variables in rotary kilns, we analyze the causal associations among variables and construct a causal graph of the sintering process according to the physicochemical mechanism of sintering. An autoregressive Causal Hidden Markov Model is introduced to model the causal relationships among variables and to propagate them forward to generate ST forecasts. In implementation, a generative recurrent neural network, the Temporal–spatial Causal Variational Network (TCVN), is designed to generate the representation of hidden variables and extract ST-related features robustly. Each time step in TCVN is composed of a Causal Variational Module (CVM) that integrates a Graph Convolutional Network (GCN) with a Variational Autoencoder (VAE) based on the constructed causal graph. Experiments on real-world data demonstrate that the proposed approach effectively improves the forecasting accuracy of ST at horizons of 1, 3, 6, and 12 steps, confirming the superiority of the proposed model.

• A TCVN is proposed for accurate sintering temperature forecasting in rotary kilns.
• A causal graph is designed according to the mechanism of sintering.
• A CVM is designed to learn the hidden variables in the causal graph.
• Detailed experiments are conducted to validate the performance of the TCVN.

YNICL Journal 2026 Journal Article

Beyond the cerebral cortex: cerebellar language-related subregions contributions to fluency in post-stroke aphasia

Yuqian Zhan
Xiaohui Xie
Qiufang Ren
Xiaomin Pan
Zhishun Gao
Jin Li
Kai Wang
Tongjian Bai

Although the classical language cortex significantly contributes to post-stroke aphasia (PSA), non-language-specific cortex, such as the cerebellum, is increasingly implicated in language. However, the specific contributions of its subregions to PSA, particularly regarding distinct language dimensions, remain unclear. Given fluency as a core dimension, we investigated the functional and structural integrity of cerebellar language-related subregions to clarify their distinct roles in fluent (FA) versus non-fluent aphasia (nonFA). We enrolled a primary cohort of 81 PSA patients (46 nonFA, 35 FA), and 77 healthy controls (HCs), alongside an independent external validation cohort (Aphasia Recovery Cohort [ARC]; 23 nonFA, 22 FA). Using individualized functional connectivity (FC) and volumetric analyses based on the Multi-Domain Task Battery (MDTB) atlas, we found that nonFA patients exhibited significantly decreased FC between the classical language network (LN) and language-related cerebellar subregions (right MDTB 8 and 9; R_MDTB8/9-LN FC), alongside reduced right Crus II volume. Correlation analysis revealed that these neuroimaging indicators were positively associated with language scores in nonFA, while no such relationships were observed in FA. Furthermore, mediation analysis indicated that right Crus II volume statistically accounted for the observed association between R_MDTB8/9-LN FC and overall Aphasia Quotient (AQ). As the key findings were replicated in the ARC, our results provide compelling evidence that the functional connectivity strength and structural integrity of specific cerebellar subregions contribute to language fluency. Our findings support expanding models of PSA beyond cortical regions and suggest that cerebellar-targeted strategies may improve language rehabilitation outcomes.

EAAI Journal 2026 Journal Article

Concurrent historical data clustering and common feature learning for new-mode zero-shot industrial anomaly detection

Kai Wang
Xinlong Yuan
Xun Lang
Xiaofeng Yuan
Jie Han
Yalin Wang

Due to insufficient understanding of the new operating conditions and lack of operational experience, the new operating conditions in industrial processes are more prone to failures. Rapid indication of anomalies in the early stage of faults is very important to production safety. However, no samples are collected for the new modes when the operation just starts. Based on the rationale that not all process correlations change from mode to mode and there exist stable similar features across different modes, we seek to mine common knowledge in multi-modal historical data and leverage it for zero-shot, zero-start industrial condition monitoring. Since no clear mode labels are available in practice, an unsupervised multi-manifold clustering and shared principal component extraction method is proposed. The class indicator matrix and the projection direction of the feature subspace are simultaneously learned. A trace-ratio iterative optimization algorithm, together with a parameter initialization strategy, is proposed to accelerate convergence. A numerical example and a real multiple effect evaporator are used to verify the advantages of the proposed method.

AAAI Conference 2026 Conference Paper

EAGLE: Episodic Appearance- and Geometry-aware Memory for Unified 2D-3D Visual Query Localization in Egocentric Vision

Yifei Cao
Yu Liu
Guolong Wang
Zhu Liu
Kai Wang
Xianjie Zhang
Jizhe Yu
Xun Tu

Egocentric visual query localization is vital for embodied AI and VR/AR, yet remains challenging due to camera motion, viewpoint changes, and appearance variations. We present EAGLE, a novel framework that leverages episodic appearance- and geometry-aware memory to achieve unified 2D-3D visual query localization in egocentric vision. Inspired by avian memory consolidation, EAGLE synergistically integrates segmentation guided by an appearance-aware meta-learning memory (AMM), with tracking driven by a geometry-aware localization memory (GLM). This memory consolidation mechanism, through structured appearance and geometry memory banks, stores high-confidence retrieval samples, effectively supporting both long- and short-term modeling of target appearance variations. This enables precise contour delineation with robust spatial discrimination, leading to significantly improved retrieval accuracy. Furthermore, by integrating the VQL-2D output with a visual geometry grounded Transformer (VGGT), we achieve an efficient unification of 2D and 3D tasks, enabling rapid and accurate back-projection into 3D space. Our method achieves state-of-the-art performance on the Ego4D-VQ benchmark.

EAAI Journal 2026 Journal Article

Enhance energy efficient ethernet with reinforcement learning based periodic strategy

Kai Wang
Wanchun Jiang
Renfu Yao
Jiawei Huang
Jianxin Wang

The strategy for Energy Efficient Ethernet determines when to enter and leave the power-saving mode, thereby directly impacting both the energy savings and the incurred latency of frames. However, due to the strong dependency of the strategy performance on the network traffic, existing strategies for Energy Efficient Ethernet need either (i) appropriate static parameter configuration under certain traffic loads, or (ii) parameter adaptation mechanisms based on traffic prediction models that assume a specific traffic distribution. Consequently, existing strategies hardly maintain consistently high performance under variable traffic in reality. To address this issue, we incorporate reinforcement learning into the design of the Energy Efficient Ethernet strategy and propose the reinforcement learning based periodic strategy (RLPS). Specifically, RLPS operates periodically with a dynamically adjusted cycle length. In each cycle, RLPS first transmits all buffered frames, then enters a selected power-saving mode for the remaining duration. Rather than directly outputting power-saving mode transition decisions, RLPS learns the time length of each cycle online to effectively capture the impacts of traffic variation. This periodic approach enables power consumption to be optimized within each cycle using learned information while also reducing the overhead of online learning. Extensive simulations driven by synthetic traffic and real traces show that RLPS outperforms existing strategies, reducing the power consumption by up to ∼60.8% while maintaining consistently high performance across diverse traffic loads and distributions.

AAAI Conference 2026 Conference Paper

HiMo-CLIP: Modeling Semantic Hierarchy and Monotonicity in Vision-Language Alignment

Ruijia Wu
Ping Chen
Fei Shen
Shaoan Zhao
Qiang Hui
Huanlin Gao
Ting Lu
Zhaoxiang Liu

Contrastive vision-language models like CLIP have achieved impressive results in image-text retrieval by aligning image and text representations in a shared embedding space. However, these models often treat text as flat sequences, limiting their ability to handle complex, compositional, and long-form descriptions. In particular, they fail to capture two essential properties of language: semantic hierarchy, which reflects the multi-level compositional structure of text, and semantic monotonicity, where richer descriptions should result in stronger alignment with visual content. To address these limitations, we propose HiMo-CLIP, a representation-level framework that enhances CLIP-style models without modifying the encoder architecture. HiMo-CLIP introduces two key components: a hierarchical decomposition (HiDe) module that extracts latent semantic components from long-form text via in-batch PCA, enabling flexible, batch-aware alignment across different semantic granularities, and a monotonicity-aware contrastive loss (MoLo) that jointly aligns global and component-level representations, encouraging the model to internalize semantic ordering and alignment strength as a function of textual completeness. These components work together to produce structured, cognitively aligned cross-modal representations. Experiments on multiple image-text retrieval benchmarks show that HiMo-CLIP consistently outperforms strong baselines, particularly under long or compositional descriptions.

AAAI Conference 2026 Conference Paper

KOALA: Knowledge of Optimization and Learning Algorithms for Healthcare

Kai Wang

The Knowledge of Optimization And Learning Algorithms (KOALA) group studies how to integrate optimization, machine learning, and generative modeling to enable data-driven decision-making under uncertainty. We study decision-focused learning, embedding optimization as a differentiable layer to train models end-to-end for decision quality. We design scalable reinforcement learning algorithms for population and personalized healthcare, and develop efficient bilevel optimization methods for nested and multi-agent decision-making. These directions form a unified framework linking optimization and learning for impactful AI in healthcare. Through collaborations with hospitals and NGOs, our group designs and deploys algorithms for pediatric, diabetes, maternal, and mental health applications. Looking ahead, we aim to unite these foundations with generative AI to build theoretically grounded and socially responsible algorithms that advance trustworthy, real-world AI for health and beyond.

AAAI Conference 2026 Conference Paper

MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models

Pengfei Zhou
Xiaopeng Peng
Fanrui Zhang
Zhaopan Xu
Jiaxin Ai
Yansheng Qiu
Wangbo Zhao
Jiajun Song

Multimodal large language models (MLLMs), which integrate language and visual cues for problem-solving, are crucial for advancing artificial general intelligence (AGI). However, current benchmarks for measuring the intelligence of MLLMs suffer from limited scale, narrow coverage, and unstructured knowledge, offering only static and undifferentiated evaluations. To bridge this gap, we introduce MDK12-Bench, a large-scale multidisciplinary benchmark built from real-world K–12 exams spanning six disciplines with 141K instances and 6,225 knowledge points organized in a six-layer taxonomy. Covering five question formats with difficulty and year annotations, it enables comprehensive evaluation to capture the extent to which MLLMs perform over four dimensions: 1) difficulty levels, 2) temporal (cross-year) shifts, 3) contextual shifts, and 4) knowledge-driven reasoning. We propose a novel dynamic evaluation framework that introduces unfamiliar visual, textual, and question form shifts to challenge model generalization while improving benchmark objectivity and longevity by mitigating data contamination. We further evaluate knowledge-point reference-augmented generation (KP-RAG) to examine the role of knowledge in reasoning. Key findings reveal limitations in current MLLMs in multiple aspects and provide guidance for enhancing model reasoning, robustness, and AI-assisted education.

AAAI Conference 2026 Conference Paper

SDNet: LiDAR Semantic Scene Completion with Sparse-Dense Fusion and Input-Aware Label Refinement

Tingming Bai
Zhiyu Xiang
Peng Xu
Tianyu Pu
Kai Wang
Eryun Liu

LiDAR Semantic Scene Completion (SSC) in autonomous driving requires predicting both dense occupancy and semantic labels from sparse input point clouds. Existing methods typically adopt a cascaded architecture for feature dilation and semantic abstraction, which blurs distinctive geometric patterns and reduces feature discriminability. Moreover, given an input, conventional processing of the ground truth labels overlooks voxel predictability in the target, resulting in ill-posed supervision and discarding informative voxels. To address these limitations, we propose Sparse-Dense Net (SDNet), a dual-branch architecture that processes the input points through parallel sparse and dense encoders. The complementary features are aligned and fused using a Sparse Dense Feature Fusion (SDFF) module and further refined by a Feature Propagation (FP) module. Additionally, we introduce an input-aware label refinement strategy, including Sparse-Guided Filtering (SGF) to filter unpredictable targets and Ignored Voxel Recycling (IVR) to leverage informative ignored voxels for auxiliary supervision. These innovations enhance both feature learning and label quality. Extensive experiments on the SemanticKITTI and nuScenes OpenOccupancy datasets validate the effectiveness of our approach, with SDNet achieving state-of-the-art performance on both datasets and ranking 1st on the official SemanticKITTI benchmark with 42.1 mIoU, outperforming the previous best by 4.2 (+11.1%).

EAAI Journal 2026 Journal Article

Two-phase strategy framework for spatial prediction of landslide hazards in wide-area power linear engineering projects: the case of the China's Renewable Energy Transmission Corridors

Bijing Jin
Kunlong Yin
Taorui Zeng
Shuhao Liu
Yang Liu
Haoran Yang
Kai Wang
Lei Gui

A critical knowledge gap persists in the development of high-precision spatial prediction frameworks for landslide susceptibility assessment along wide-area linear power infrastructure. Therefore, this study develops a novel two-phase optimization framework to address this gap, focusing on China's Renewable Energy Transmission Corridors (RETCs). Phase Ⅰ employs natural breaks (optimal at 26-level grading) to address spatial heterogeneity in conditioning factors, while in Phase Ⅱ the selection of non-landslide samples is optimized based on different geological environment zones and areas with lower susceptibility levels. Six base machine learning models were evaluated, with two ensemble models (Stacking and Blending) achieving superior performance, with Area Under the Curve (AUC) values exceeding 0.88. The Blending model demonstrated peak accuracy (AUC = 0.927), identifying 35% of transmission towers in high and very high susceptibility zones across nine provinces. The framework enables tower-specific susceptibility assessment, crucial for protecting China's 80,000 km transmission network. These findings advance RETC resilience by: (1) establishing an optimal grading strategy for continuous conditioning factors in linear infrastructure, (2) introducing a replicable non-landslide sample optimization protocol, and (3) demonstrating the superiority of ensemble models in energy corridor landslide susceptibility mapping. This framework provides robust support for securing stable clean energy delivery, with potential applications in landslide hazard management for renewable energy grids worldwide.

JBHI Journal 2025 Journal Article

A Dual-Branch Cross-Modality-Attention Network for Thyroid Nodule Diagnosis Based on Ultrasound Images and Contrast-Enhanced Ultrasound Videos

Jianning Chi
Jia-hui Chen
Bo Wu
Jin Zhao
Kai Wang
Xiaosheng Yu
Wenjun Zhang
Ying Huang

Contrast-enhanced ultrasound (CEUS) has been extensively employed as an imaging modality in thyroid nodule diagnosis due to its capacity to visualise the distribution and circulation of micro-vessels in organs and lesions in a non-invasive manner. However, current CEUS-based thyroid nodule diagnosis methods suffer from: 1) the blurred spatial boundaries between nodules and other anatomies in CEUS videos, and 2) the insufficient representation of the local structural information of nodule tissues by features extracted only from CEUS videos. In this paper, we propose a novel dual-branch network with a cross-modality-attention mechanism for thyroid nodule diagnosis by integrating the information from two related modalities, i.e., CEUS videos and ultrasound images. The mechanism has two parts: US-attention-from-CEUS transformer (UAC-T) and CEUS-attention-from-US transformer (CAU-T). As such, this network imitates the manner of human radiologists by decomposing the diagnosis into two correlated tasks: 1) the spatio-temporal features extracted from CEUS are hierarchically embedded into the spatial features extracted from US with UAC-T for the nodule segmentation; 2) the US spatial features are used to guide the extraction of the CEUS spatio-temporal features with CAU-T for the nodule classification. The two tasks are intertwined in the dual-branch end-to-end network and optimized with the multi-task learning (MTL) strategy. The proposed method is evaluated on our collected thyroid US-CEUS dataset. Experimental results show that our method achieves a classification accuracy of 86.92%, specificity of 66.41%, and sensitivity of 97.01%, outperforming the state-of-the-art methods. As a general contribution in the field of multi-modality diagnosis of diseases, the proposed method has provided an effective way to combine static information with its related dynamic information, improving the quality of deep learning based diagnosis with an additional benefit of explainability.

EAAI Journal 2025 Journal Article

A sampling interval-adaptive transformer for industrial time sequence modeling with heterogeneous sampling rates in quality prediction

Zijian Xu
Nuo Xu
Kai Wang
Xiaofeng Yuan
Yalin Wang
Chunhua Yang
Weihua Gui
Shuqiao Cheng

Industrial data sequences frequently exhibit irregular sampling frequencies, which pose a number of difficulties for data analysis and modeling. Traditional dynamic models such as the Recurrent Neural Network (RNN) and Transformer struggle to model such data sequences, mainly because they assume that the data sampling frequency is constant. To this end, a Sampling Interval-Adaptive Transformer (SIA-Trans) is proposed in this paper to adaptively model the temporal information for heterogeneous sampling sequences in industrial processes. The SIA-Trans uses the sampling interval and position embedding block to address the problem of unequal time intervals and rectify the temporal correlations in time series. Then, the interval-aware self-attention net is designed for dynamic data relationship modeling, taking the processed data through the self-attention mechanism. Finally, the predicted output is obtained after the point-wise feed-forward layer. The proposed SIA-Trans is validated on a real-world hydrocracking process to predict the content of C5 hydrocarbons (hydrocarbons with five carbon atoms) in light naphtha, as well as the final boiling point of jet fuel.

EAAI Journal 2025 Journal Article

Boosting industrial anomaly detection performance using generated artificial fault data

Kai Wang
Jiayi Zhang
Yishun Liu
Jie Han
Xiaofeng Yuan
Le Zhou

Data-driven anomaly detection aims to learn a decision boundary that envelops the normal region and separates normal data from abnormal data. However, industrial data are fairly complex due to varying feedstock and unclear transfer processes and chemical reactions. This means the decision boundary will be very complex and even intractable. In addition, process variables are high-dimensional in modern industrial processes, which increases the difficulty of boundary extraction. Generally, the boundary should exactly exceed the outermost samples for precisely drawing normal regions. However, what we have in most situations is just normal data contaminated by unknown noise. Hence, conventional solutions that use statistical analysis to define a normal region result in a not-so-accurate decision boundary where missed alarms occur frequently. In addition to the conventional solution based entirely on historical data, i.e., passive fault detection (PAD), an alternative detection method, active fault detection (AAD), can circumvent the above problem by stimulating system performance through the intervention of an auxiliary signal. While it disrupts the normal operating conditions of the process, its method of enhancing output performance through additional signals inspires us. In this paper, we resort to the ability of deep neural networks to fit nonlinear data and perform dimension reduction. A fault data generation strategy is proposed and the artificially generated fault data are used to regulate the model training. The new virtual fault data aid in pressing the decision boundary closer to the outermost periphery. We propose the principles of data generation and form a network structure, implementing information fusion of genuine normal samples and virtual fault samples. Two cases demonstrate the efficiency of the proposed method.

AAAI Conference 2025 Conference Paper

CALLIC: Content Adaptive Learning for Lossless Image Compression

Daxin Li
Yuanchao Bai
Kai Wang
Junjun Jiang
Xianming Liu
Wen Gao

Learned lossless image compression has achieved significant advancements in recent years. However, existing methods often rely on training amortized generative models on massive datasets, resulting in sub-optimal probability distribution estimation for specific testing images during the encoding process. To address this challenge, we explore the connection between the Minimum Description Length (MDL) principle and Parameter-Efficient Transfer Learning (PETL), leading to the development of a novel content-adaptive approach for learned lossless image compression, dubbed CALLIC. Specifically, we first propose a content-aware autoregressive self-attention mechanism by leveraging convolutional gating operations, termed Masked Gated ConvFormer (MGCF), and pretrain MGCF on the training dataset. Cache then Crop Inference (CCI) is proposed to accelerate the coding process. During encoding, we decompose pretrained layers, including depth-wise convolutions, using low-rank matrices and then adapt the incremental weights on the testing image by Rate-guided Progressive Fine-Tuning (RPFT). RPFT fine-tunes with gradually increasing patches that are sorted in descending order by estimated entropy, optimizing the learning process and reducing adaptation time. Extensive experiments across diverse datasets demonstrate that CALLIC sets a new state-of-the-art (SOTA) for learned lossless image compression.

YNIMG Journal 2025 Journal Article

Changes in white matter predict efficacy of repetitive transcranial magnetic stimulation in Parkinson's disease

Jinying Han
Lingling Lv
Xin Chen
Mengqi Wang
Lili Hu
Fengbo Xing
Pingping Liu
Liuzhenxiong Yu

BACKGROUND: The efficacy of continuous theta burst stimulation (cTBS) in Parkinson's disease (PD) exhibits considerable variability. Emerging evidence links changes in brain white matter (WM) activity to the onset and progression of PD, offering novel insights into its pathophysiology. OBJECTIVE: Exploring activity patterns within different WM regions to predict the therapeutic efficacy of cTBS. METHODS: This retrospective study included 68 patients with PD who underwent a 14-day cTBS targeting the supplementary motor area (25,200 pulses). Patients were classified as responders (R, n = 20) or non-responders (NR, n = 48) based on whether their UPDRS III score improved by ≥30 %. Pre-intervention differences in WM amplitude of low-frequency fluctuations (ALFF) and fractional ALFF in fMRI were analyzed, along with their correlation with motor symptom improvement. A support vector machine (SVM) model was developed to predict cTBS efficacy and validated in an independent cohort (n = 22). RESULTS: Compared to the NR group, R patients exhibited greater improvements in rigidity and axial symptoms, accompanied by lower baseline ALFF in multiple WM tracts. SVM analysis identified higher baseline UPDRS III and rigidity scores, along with reduced ALFF in the left corticospinal tract, right ILF, and left anterior thalamic radiation, as predictors of better motor outcomes. In an independent cohort, predicted and actual UPDRS III improvements showed a concordance correlation coefficient (CCC) of 0.630. A combined model incorporating rigidity scores and ILF_R ALFF achieved moderate accuracy in predicting rigidity improvement (CCC = 0.725). CONCLUSION: Baseline WM function may serve as a biomarker for predicting motor response to cTBS.
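
The concordance correlation coefficient (CCC) reported above measures how closely predicted improvements follow actual ones along the identity line, penalising both scatter and systematic offset. A minimal sketch of Lin's definition, assuming population (1/n) moments; the function name and the sample values in the usage note are illustrative and are not taken from the study:

```python
from statistics import fmean


def concordance_correlation(actual, predicted):
    """Lin's concordance correlation coefficient for two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(actual)
    mean_a, mean_p = fmean(actual), fmean(predicted)
    # Population (1/n) variances and covariance.
    var_a = sum((a - mean_a) ** 2 for a in actual) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted)) / n
    denom = var_a + var_p + (mean_a - mean_p) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined when both sequences are the same constant")
    # Unlike Pearson's r, the squared mean difference lowers the score
    # when predictions are shifted away from the actual values.
    return 2 * cov / denom
```

For example, `concordance_correlation([1, 2, 3], [2, 3, 4])` gives 4/7 (about 0.571) even though the two sequences are perfectly correlated, because every prediction is offset by one unit.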

NeurIPS Conference 2025 Conference Paper

Covariances for Free: Exploiting Mean Distributions for Training-free Federated Learning

Dipam Goswami
Simone Magistri
Kai Wang
Bartłomiej Twardowski
Andrew Bagdanov
Joost van de Weijer

Using pre-trained models has been found to reduce the effect of data heterogeneity and speed up federated learning algorithms. Recent works have explored training-free methods using first- and second-order statistics to aggregate local client data distributions at the server and achieve high performance without any training. In this work, we propose a training-free method based on an unbiased estimator of class covariance matrices which only uses first-order statistics in the form of class means communicated by clients to the server. We show how these estimated class covariances can be used to initialize the global classifier, thus exploiting the covariances without actually sharing them. We also show that using only within-class covariances results in a better classifier initialization. Our approach improves performance in the range of 4-26% with exactly the same communication cost when compared to methods sharing only class means and achieves performance competitive or superior to methods sharing second-order statistics with dramatically less communication overhead. The proposed method is much more communication-efficient than federated prompt-tuning methods and still outperforms them. Finally, using our method to initialize classifiers and then performing federated fine-tuning or linear probing again yields better performance. Code is available at https://github.com/dipamgoswami/FedCOF.
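
The idea of recovering a class covariance from class means alone can be sketched as follows. This is one estimator that fits the description, not necessarily the paper's exact formulation: it assumes each client's samples of a class are i.i.d. draws from a shared class distribution with covariance Σ, so that a client mean over n_k samples has covariance Σ/n_k and the n_k-weighted scatter of the K client means about the pooled mean has expectation (K − 1)Σ. The function and argument names are hypothetical.

```python
def covariance_from_client_means(client_means, client_counts):
    """Estimate one class's covariance from per-client class means and sample counts.

    Assumes i.i.d. sampling across clients (an assumption of this sketch).
    """
    k = len(client_means)
    if k < 2 or k != len(client_counts):
        raise ValueError("need at least two clients, with one count per mean")
    dim = len(client_means[0])
    total = sum(client_counts)
    # Pooled class mean, weighting each client mean by its sample count.
    pooled = [
        sum(n * mean[i] for mean, n in zip(client_means, client_counts)) / total
        for i in range(dim)
    ]
    scatter = [[0.0] * dim for _ in range(dim)]
    for mean, n in zip(client_means, client_counts):
        diff = [mean[i] - pooled[i] for i in range(dim)]
        for i in range(dim):
            for j in range(dim):
                # Multiplying by n undoes the 1/n shrinkage of a client mean.
                scatter[i][j] += n * diff[i] * diff[j]
    # Dividing by K - 1 makes the estimate unbiased under the i.i.d. assumption.
    return [[value / (k - 1) for value in row] for row in scatter]
```

When every client holds a single sample, the estimate reduces to the ordinary sample covariance, which is a quick sanity check on the scaling.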

IJCAI Conference 2025 Conference Paper

DcDsDiff: Dual-Conditional and Dual-Stream Diffusion Model for Generative Image Tampering Localization

Qixian Hao
Shaozhang Niu
Jiwei Zhang
Kai Wang

Generative Image Tampering (GIT), due to its high diversity and realism, poses a significant challenge to traditional image tampering localization techniques. Consequently, this paper introduces a denoising diffusion probabilistic model-based DcDsDiff, which comprises a Dual-View Conditional Network (DVCN) and a Dual-Stream Denoising Network (DSDN). DVCN provides clues about the tampered areas. It extracts tampering features in the high-frequency view and integrates them with spatial domain features using attention mechanisms. DSDN jointly generates a mask image and a detail image, enhancing the generalization capability of the model against new tampering forms through iterative denoising. A multi-stream interaction mechanism enables the two generative tasks to promote each other, prompting the model to generate localization results that are rich in detail and complete. Experiments show that DcDsDiff outperforms mainstream methods in accurate localization, generalization, extensibility, and robustness. Code page: https://github.com/QixianHao/DcDsDiff-and-GIT10K.

JBHI Journal 2025 Journal Article

Decision Tree Extraction for Clinical Decision Support System With If-Else Pseudocode and PlanSelect Strategy

Ruihui Hou
Xiaojun Wang
Weiyan Zhang
Zhexin Song
Kai Wang
Yifei Chen
Jingping Liu
Tong Ruan

Decision trees, as a structured representation of medical knowledge, are critical resources for building clinical decision support systems. Their structured decision pathways can be used for retrieval to enhance clinical decision making. Currently, mainstream methods mainly utilize large language models and in-context learning for decision tree extraction. However, these methods often face challenges in understanding the structure of decision trees and accurately extracting the complete content of tree nodes, leading to noise in the extracted trees and ultimately impacting their effectiveness in clinical decision support systems. To this end, in this paper, we propose a novel two-stage decision tree extraction framework. In the first stage, we propose to use If-Else pseudocode to represent the decision tree structure and design specific constraints on format and content to guide the LLM in generating outputs. In the second stage, we introduce a novel node-filling strategy called PlanSelect to match the extracted triplets with sub-sentences in the generated pseudocode, including four reasoning steps: observation, plan, action, and answer. To evaluate the effectiveness of our proposed method, we construct an English decision tree extraction dataset (EMDT) and conduct extensive experiments on the built and public datasets. Experiments on the Text2DT and EMDT datasets demonstrate that our method outperforms the current state-of-the-art approaches, achieving improvements of 1.37% and 1.54% on the ER metric (lower is better), respectively. Furthermore, we use the medical decision trees extracted using our framework to improve the model's performance on clinical decision making tasks, i.e., CMB-Clin and MedQA.

NeurIPS Conference 2025 Conference Paper

Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights

Zhiyuan Liang
Dongwen Tang
Yuhao Zhou
Xuanlei Zhao
Mingjia Shi
Wangbo Zhao
Zekai Li
Peihao Wang

Modern Parameter-Efficient Fine-Tuning (PEFT) methods such as low-rank adaptation (LoRA) reduce the cost of customizing large language models (LLMs), yet still require a separate optimization run for every downstream dataset. We introduce Drag-and-Drop LLMs (DnD), a prompt-conditioned parameter generator that eliminates per-task training by mapping a handful of unlabeled task prompts directly to LoRA weight updates. A lightweight text encoder distills each prompt batch into condition embeddings, which are then transformed by a cascaded hyper-convolutional decoder into the full set of LoRA matrices. Once trained in a diverse collection of prompt-checkpoint pairs, DnD produces task-specific parameters in seconds, yielding i) up to 12,000× lower overhead than full fine-tuning, ii) average gains up to 30% in performance over the strongest training LoRAs on unseen common-sense reasoning, math, coding, and multimodal benchmarks, and iii) robust cross-domain generalization improving 40% performance without access to the target data or labels. Our results demonstrate that prompt-conditioned parameter generation is a viable alternative to gradient-based adaptation for rapidly specializing LLMs. We open source our project (https://jerryliang24.github.io/DnD) in support of future research.
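
To make "LoRA matrices" concrete: a LoRA update for one weight matrix W is a pair of low-rank factors B and A, merged as W + (α/r)·BA. The sketch below shows only this standard merge step for a pair that some generator has already produced. It does not reproduce DnD's text encoder or decoder, and the function name, argument names and nested-list representation are assumptions made for illustration.

```python
def apply_lora(weight, lora_b, lora_a, alpha, rank):
    """Return weight + (alpha / rank) * (lora_b @ lora_a) for nested-list matrices.

    Shapes: weight is rows x cols, lora_b is rows x rank, lora_a is rank x cols.
    """
    rows, cols = len(weight), len(weight[0])
    if len(lora_b) != rows or any(len(r) != rank for r in lora_b):
        raise ValueError("lora_b must have shape rows x rank")
    if len(lora_a) != rank or any(len(r) != cols for r in lora_a):
        raise ValueError("lora_a must have shape rank x cols")
    scale = alpha / rank  # conventional LoRA scaling factor
    return [
        [
            weight[i][j]
            + scale * sum(lora_b[i][r] * lora_a[r][j] for r in range(rank))
            for j in range(cols)
        ]
        for i in range(rows)
    ]
```

Because only B and A need to be produced per task, a generator can emit far fewer values than the full weight matrix contains, which is what makes a prompt-to-weights mapping of this kind tractable.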

AAAI Conference 2025 Conference Paper

Drawing Informative Gradients from Sources: A One-stage Transfer Learning Framework for Cross-city Spatiotemporal Forecasting

Yudong Zhang
Xu Wang
Xuan Yu
Zhaoyang Sun
Kai Wang
Yang Wang

Spatiotemporal forecasting (STF) is pivotal in urban computing, yet data scarcity in developing cities hampers robust model training. Addressing this, recent studies leverage transfer learning to migrate knowledge from data-rich (source) to data-poor (target) cities. This strategy, while effective, faces challenges as pre-trained models risk absorbing noise and harmful information due to data distribution disparities, potentially undermining the accuracy of forecasts for target cities. To address this issue, we propose a one-stage STF framework named Target-Skewed Joint Training (TSJT). Central to TSJT is a novel Target-Skewed Backward training strategy that selectively refines gradients from source city data, preserving only the elements that positively impact the target city. To further enhance the quality of these gradients, we have designed a Node Prompting Module (NPM). TSJT is crafted for seamless integration with existing STF models, endowing them with the capability to efficiently tackle challenges stemming from data scarcity. Experimental results on several real-world datasets from multiple cities substantiate the efficacy of TSJT in the realm of cross-city transfer learning.

IJCAI Conference 2025 Conference Paper

ElaD-Net: An Elastic Semantic Decoupling Network for Lesion Segmentation in Breast Ultrasound Images

Lijuan Xu
Kai Wang
Fuqiang Yu
Fenghua Tong
Mengran Li
Dawei Zhao

Breast diseases pose a significant threat to women’s health. Automatic lesion segmentation in breast ultrasound images (BUSI) plays a crucial role in fast diagnosis. While various enhanced U-Net-based models have achieved success in multi-scale feature analysis and handling blurred boundaries, two key challenges persist that could guide the improvement of BUSI segmentation networks: 1) significant fluctuations in pixel intensity distribution similarity between the lesion and surrounding tissues, and 2) inconsistent transmission of spatial detail due to multi-scale lesion sampling. These issues highlight the necessity of semantic elasticity understanding and consistency control. To this end, we propose ElaD-Net, an Elastic Semantic Decoupling Network for lesion segmentation in BUSI. This network uses the pre-trained EfficientNet-B2 for multi-scale encoding of BUSI. The decoding stage features two key modules: Elastic Semantic Decoupling (ESD) and Spatial Semantic Reconstruction (SSR). ESD learns and decouples multi-frequency semantics in multi-scale channels with a self-calibration mechanism, enabling dynamic adjustment of receptive depth to resist similarity fluctuations. SSR further optimizes ESD outputs via feature branching, compression, and excitation to ensure spatial semantic consistency, thereby separately reconstructing edge and body.

AAAI Conference 2025 Conference Paper

FilterTS: Comprehensive Frequency Filtering for Multivariate Time Series Forecasting

Yulong Wang
Yushuo Liu
Xiaoyi Duan
Kai Wang

Multivariate time series forecasting is crucial across various industries, where accurate extraction of complex periodic and trend components can significantly enhance prediction performance. However, existing models often struggle to capture these intricate patterns. To address these challenges, we propose FilterTS, a novel forecasting model that utilizes specialized filtering techniques based on the frequency domain. FilterTS introduces a Dynamic Cross-Variable Filtering Module, a key innovation that dynamically leverages other variables as filters to extract and reinforce frequency components shared across variables in multivariate time series. Additionally, a Static Global Filtering Module captures stable frequency components identified throughout the entire training set. Moreover, the model is built in the frequency domain, converting time-domain convolutions into frequency-domain multiplicative operations to enhance computational efficiency. Extensive experimental results on eight real-world datasets have demonstrated that FilterTS significantly outperforms existing methods in terms of prediction accuracy and computational efficiency.
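
The abstract's remark about turning time-domain convolutions into frequency-domain multiplications rests on the convolution theorem. The following is a minimal sketch of that identity only, not the FilterTS implementation: the function names (`dft`, `circular_convolve`, `filter_in_frequency_domain`) are hypothetical, and a naive O(n²) DFT is used to keep the example dependency-free, so it shows the equivalence rather than the speedup that an FFT would give.

```python
import cmath


def dft(x, inverse=False):
    """Naive discrete Fourier transform (or its inverse) of a sequence."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out


def circular_convolve(x, h):
    """Circular convolution computed directly in the time domain."""
    n = len(x)
    return [sum(x[m] * h[(t - m) % n] for m in range(n)) for t in range(n)]


def filter_in_frequency_domain(x, h):
    """Same result via element-wise multiplication of the two spectra."""
    spectrum = [a * b for a, b in zip(dft(x), dft(h))]
    return [v.real for v in dft(spectrum, inverse=True)]


signal = [1.0, 2.0, 3.0, 4.0]
kernel = [0.5, 0.5, 0.0, 0.0]  # illustrative two-tap averaging filter
direct = circular_convolve(signal, kernel)
via_frequency = filter_in_frequency_domain(signal, kernel)
```

Both routes give the same filtered sequence up to floating-point error; both inputs are assumed to have the same length.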

NeurIPS Conference 2025 Conference Paper

Free-Lunch Color-Texture Disentanglement for Stylized Image Generation

Jiang Qin
Alexandra Gomez-Villa
Senmao Li
Shiqi Yang
Yaxing Wang
Kai Wang
Joost van de Weijer

Recent advances in Text-to-Image (T2I) diffusion models have transformed image generation, enabling significant progress in stylized generation using only a few style reference images. However, current diffusion-based methods struggle with \textit{fine-grained} style customization due to challenges in controlling multiple style attributes, such as color and texture. This paper introduces the first tuning-free approach to achieve free-lunch color-texture disentanglement in stylized T2I generation, addressing the need for independently controlled style elements for the Disentangled Stylized Image Generation (DisIG) problem. Our approach leverages the \textit{Image-Prompt Additivity} property in the CLIP image embedding space to develop techniques for separating and extracting Color-Texture Embeddings (CTE) from individual color and texture reference images. To ensure that the color palette of the generated image aligns closely with the color reference, we apply a whitening and coloring transformation to enhance color consistency. Additionally, to prevent texture loss due to the signal-leak bias inherent in diffusion training, we introduce a noise term that preserves textural fidelity during the Regularized Whitening and Coloring Transformation (RegWCT). Through these methods, our Style Attributes Disentanglement approach (SADis) delivers a more precise and customizable solution for stylized image generation. Experiments on images from the WikiArt and StyleDrop datasets demonstrate that, both qualitatively and quantitatively, SADis surpasses state-of-the-art stylization methods in the DisIG task.

NeurIPS Conference 2025 Conference Paper

From Cradle to Cane: A Two-Pass Framework for High-Fidelity Lifespan Face Aging

Tao Liu
Dafeng Zhang
Gengchen Li
Shizhuo Liu
yongqi song
Senmao Li
Shiqi Yang
Boqian Li

Face aging has become a crucial task in computer vision, with applications ranging from entertainment to healthcare. However, existing methods struggle with achieving a realistic and seamless transformation across the entire lifespan, especially when handling large age gaps or extreme head poses. The core challenge lies in balancing $age\ accuracy$ and $identity\ preservation$, which we refer to as the $Age\text{-}ID\ trade\text{-}off$. Most prior methods either prioritize age transformation at the expense of identity consistency or vice versa. In this work, we address this issue by proposing a $two\text{-}pass$ face aging framework, named $Cradle2Cane$, based on few-step text-to-image (T2I) diffusion models. The first pass focuses on solving $age\ accuracy$ by introducing an adaptive noise injection ($AdaNI$) mechanism. This mechanism is guided by including prompt descriptions of age and gender for the given person as the textual condition. Also, by adjusting the noise level, we can control the strength of aging while allowing more flexibility in transforming the face. However, identity preservation is weakly ensured here to facilitate stronger age transformations. In the second pass, we enhance $identity\ preservation$ while maintaining age-specific features by conditioning the model on two identity-aware embeddings ($IDEmb$): $SVR\text{-}ArcFace$ and $Rotate\text{-}CLIP$. This pass allows for denoising the transformed image from the first pass, ensuring stronger identity preservation without compromising the aging accuracy. Both passes are $jointly\ trained\ in\ an\ end\text{-}to\text{-}end\ way$. Extensive experiments on the CelebA-HQ test dataset, evaluated through Face++ and Qwen-VL protocols, show that our $Cradle2Cane$ outperforms existing face aging methods in age accuracy and identity consistency. Additionally, $Cradle2Cane$ demonstrates superior robustness when applied to in-the-wild human face images, where prior methods often fail. This significantly broadens its applicability to more diverse and unconstrained real-world scenarios. Code is available at https://github.com/byliutao/Cradle2Cane.

IROS Conference 2025 Conference Paper

GIPD: Global Intent Prediction and Decomposition of Cooperative Multi-Robot System in Non-Communication Environments

Yu Zhao
Zhe Liu
Haoyu Wei
Kai Wang
Haitao Wang
Duwen Zhai
Kefan Jin
Haibin Shao

In complex multi-robot application scenarios, particularly in dynamically adversarial, hazardous, or disaster environments, traditional cooperation paradigms face significant challenges due to unreliable or absent communication links. Achieving efficient cooperation in the absence of communication has become a key bottleneck limiting the performance of multi-robot systems. In this paper, we propose a Global Intent Prediction and Decomposition (GIPD) framework that enables robots to perform cooperative behavior without relying on communication. Each robot independently infers a globally consistent intent based solely on its local observations, ensuring implicit alignment across the system. Given the inferred global intent, robots autonomously determine their responsibilities and select the most appropriate tasks. They then base their local decision-making on the global intent, selected tasks, and individual observations, thereby facilitating effective execution and cooperation. We validate our approach using the MPE and SMAC benchmarks. Additionally, real-world experiments involving multiple ships demonstrate the effectiveness and practical applicability of the proposed GIPD method.

EAAI Journal 2025 Journal Article

Identification of Martian minerals based on multiscale spatial-spectral fusion network

Kai Wang
Xubing Zhang
Xianmin Wang
Zhouyuan Qian

Accurate identification of Martian surface minerals is crucial for analyzing the planet's geological environment and resource potential. Deep learning offers a powerful solution for automating Martian mineral identification. However, existing methods suffer from recognition inaccuracies due to the insufficient representation of the multiscale spectral and spatial features. In this article, a multiscale spatial-spectral fusion network (MSSFNet) is proposed to identify Martian minerals from hyperspectral images. In MSSFNet, the spectral multiscale feature extraction (Spe-MFE) module is proposed to extract multiscale spectral features from adjacent mineral bands in the spectral dimension. The spatial multiscale feature extraction (Spa-MFE) module extracts the spatial correlations of minerals from low-frequency and high-frequency features and retains more spatial information. At the end of the model, the spatial-spectral feature adaptive fusion (SSFAF) module utilizes attention-based feature weighting to fuse spectral and spatial features, improving its representation ability. Experimental results demonstrate that the proposed method achieved 99.51% accuracy on a constructed Martian mineral identification dataset, improving the precision of deep learning models in Martian surface mineral recognition. On the benchmark hyperspectral datasets of Indian Pines and Pavia University, MSSFNet achieved overall accuracies of 98.87% and 99.42%, respectively, outperforming state-of-the-art methods and validating its effectiveness and superiority.

JBHI Journal 2025 Journal Article

Incomplete Multi-view Data Learning via Adaptive Embedding and Partial $\ell_{2,1}$ Norm Constraints for Parkinson's Disease Diagnosis

Zhongwei Huang
Kai Wang
Chao Chen
Jianxia Chen
Jun Wan
Zhi Yang
Ran Zhou
Haitao Gan

Parkinson's disease (PD) is a progressive neurodegenerative disorder characterized by mental abnormalities and motor dysfunction. Its early classification and prediction of clinical scores have been major concerns for researchers. Currently, multi-view data learning has become an essential research area due to the capacity of multiple views to provide complementary insights from various perspectives. However, the discontinuous distribution, complex patterns of missing data, small sample size, and redundant features in multi-view datasets pose a substantial obstacle, and most existing multi-view learning methods are unable to handle these challenges effectively. In this study, we propose a novel incomplete multi-view data learning framework (IMVDL) via dynamic embedding and partial $\ell_{2,1}$ norm constraints for PD diagnosis. Specifically, multi-view dynamic embedding can adapt to any missing-view scenario, thereby linearly/nonlinearly mapping incomplete multi-view data to low-dimensional manifold spaces and generating complete multi-view data representations. The partial $\ell_{2,1}$ norm constraint can ignore larger feature weight values and apply $\ell_{2,1}$-norm sparsity to the remaining weights, thereby avoiding the sparse bias problem caused by larger weight values. An efficient iterative algorithm is derived to find the optimal solution of the IMVDL method. We conduct extensive experiments using multi-modal neuroimage data from the Parkinson's Progression Markers Initiative (PPMI) database. The results demonstrate that the IMVDL method is superior to other comparative methods. The source code for IMVDL is available at https://github.com/a610lab/IMVDL/.

AAAI Conference 2025 Conference Paper

InpDiffusion: Image Inpainting Localization via Conditional Diffusion Models

Kai Wang
Shaozhang Niu
Qixian Hao
Jiwei Zhang

As artificial intelligence advances rapidly, particularly with the advent of GANs and diffusion models, accurate Image Inpainting Localization (IIL) has become increasingly challenging. Current IIL methods face two main challenges: a tendency towards overconfidence, leading to incorrect predictions; and difficulty in detecting subtle tampering boundaries in inpainted images. In response, we propose a new paradigm that treats IIL as a conditional mask generation task utilizing diffusion models. Our method, InpDiffusion, utilizes the denoising process enhanced by the integration of image semantic conditions to progressively refine predictions. During denoising, we employ edge conditions and introduce a novel edge supervision strategy to enhance the model's perception of edge details in inpainted objects. Balancing the diffusion model's stochastic sampling with edge supervision of tampered image regions mitigates the risk of incorrect predictions from overconfidence and prevents the loss of subtle boundaries that can result from overly stochastic processes. Furthermore, we propose an innovative Dual-stream Multi-scale Feature Extractor (DMFE) for extracting multi-scale features, enhancing feature representation by considering both semantic and edge conditions of the inpainted images. Extensive experiments across challenging datasets demonstrate that InpDiffusion significantly outperforms existing state-of-the-art methods in IIL tasks, while also showcasing excellent generalization capabilities and robustness.

NeurIPS Conference 2025 Conference Paper

LeMiCa: Lexicographic Minimax Path Caching for Efficient Diffusion-Based Video Generation

Huanlin Gao
Ping Chen
Fuyuan Shi
Chao Tan
Zhaoxiang Liu
Fang Zhao
Kai Wang
Shiguo Lian

We present LeMiCa, a training-free and efficient acceleration framework for diffusion-based video generation. While existing caching strategies primarily focus on reducing local heuristic errors, they often overlook the accumulation of global errors, leading to noticeable content degradation between accelerated and original videos. To address this issue, we formulate cache scheduling as a directed graph with error-weighted edges and introduce a Lexicographic Minimax Path Optimization strategy that explicitly bounds the worst-case path error. This approach substantially improves the consistency of global content and style across generated frames. Extensive experiments on multiple text-to-video benchmarks demonstrate that LeMiCa delivers dual improvements in both inference speed and generation quality. Notably, our method achieves a 2.9× speedup on the Latte model and reaches an LPIPS score of 0.05 on Open-Sora, outperforming prior caching techniques. Importantly, these gains come with minimal perceptual quality degradation, making LeMiCa a robust and generalizable paradigm for accelerating diffusion-based video generation. We believe this approach can serve as a strong foundation for future research on efficient and reliable video synthesis.
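
To make the idea of bounding the worst-case path error concrete, here is a minimal sketch of a plain minimax (bottleneck) path search on a small directed graph with error-weighted edges. It is an illustration under assumptions, not the LeMiCa algorithm: it minimizes only the single largest edge error rather than applying the full lexicographic ordering, and the function name `minimax_path`, the edge budget `max_edges`, and the example weights are all hypothetical.

```python
def minimax_path(num_nodes, edges, max_edges):
    """Find a path 0 -> num_nodes-1 with at most max_edges edges that
    minimizes the largest edge weight along the path.

    edges maps (u, v) pairs to a non-negative error weight.
    Returns (worst_edge_error, path); path is empty if no path exists.
    """
    inf = float("inf")
    # best[k][v]: smallest worst-edge error over paths 0 -> v with exactly k edges
    best = [[inf] * num_nodes for _ in range(max_edges + 1)]
    prev = [[None] * num_nodes for _ in range(max_edges + 1)]
    best[0][0] = 0.0
    for k in range(1, max_edges + 1):
        for (u, v), weight in edges.items():
            if best[k - 1][u] == inf:
                continue
            candidate = max(best[k - 1][u], weight)
            if candidate < best[k][v]:
                best[k][v] = candidate
                prev[k][v] = u
    target = num_nodes - 1
    k_best = min(range(max_edges + 1), key=lambda k: best[k][target])
    if best[k_best][target] == inf:
        return inf, []
    # Walk predecessors back from the target, one edge per level.
    path, node = [target], target
    for k in range(k_best, 0, -1):
        node = prev[k][node]
        path.append(node)
    path.reverse()
    return best[k_best][target], path


# Illustrative graph: nodes are steps, an edge (u, v) means "jump from u to v".
example_edges = {
    (0, 1): 0.1, (1, 2): 0.1, (2, 3): 0.1,
    (0, 2): 0.5, (1, 3): 0.3, (0, 3): 0.9,
}
two_edge_result = minimax_path(4, example_edges, max_edges=2)
three_edge_result = minimax_path(4, example_edges, max_edges=3)
```

With a budget of two edges the search prefers the route through node 1, whose worst edge is smaller than that of the alternatives; relaxing the budget to three edges lets it take the all-small-steps route.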

EAAI Journal 2025 Journal Article

Multi-step difference-driven domain adversarial network for few-sample fault detection in dynamic industrial systems

Ruiyi Fang
Kai Wang
Xiaofeng Yuan
Zeyu Yang
Yalin Wang
Chunhua Yang

The escalating production demands in manufacturing result in heightened complexity in industrial processes, which leads to frequent changes in operating conditions, thus making few-sample scenarios commonplace. Although many deep learning methods achieve good performance in fault detection tasks, they mostly rely on sufficient data. Therefore, the lack of adequate data presents challenges for accurately representing the process. Moreover, the inherent interplay in processes and among equipment often manifests in data with dynamic characteristics. To address these challenges, we propose a dynamic domain adversarial network (DDAN) for dynamic few-sample fault detection. DDAN is based on knowledge transfer, aiming to facilitate modeling of the data-poor domain with cross-domain information from the data-rich domain. It consists of three main components: a feature extractor, a data reconstructor, and a domain discriminator. To effectively extract features from dynamic samples in few-sample scenarios, a multi-step difference method is introduced. Combined with self-attention, the feature extractor highlights the most significant difference block in the dynamic representations. The output of the data reconstructor is utilized for fault detection tasks, while the domain discriminator is applied for domain adaptation with a rebalancing loss. The proposed method is validated on a numerical case and a real-world alumina evaporation process. The experimental results demonstrate an average improvement in the fault detection rate of 2.9%, with an improvement exceeding 6% for latent variable faults.

PRL Workshop 2025 Workshop Paper

Networked Restless Multi-Arm Bandits with Reinforcement Learning

Hanmo Zhang
Kai Wang

Restless Multi-Armed Bandits (RMABs) are a powerful framework for sequential decision-making, widely applied in public health challenges such as resource allocation and intervention optimization. However, traditional RMABs assume independence among arms, limiting their ability to account for interactions between individuals, which can be common and significant in a real-world environment. This paper introduces Networked RMAB, a novel framework that integrates the RMAB model with the independent cascade model to capture interactions between arms in networked environments. We establish the submodularity of Bellman’s Equation for that model, which enables efficient policy design, and propose a Q-learning algorithm to account for the networked setting. Initial experimental results demonstrate that our network-aware approach outperforms a network-blind approach, highlighting the importance of capturing and leveraging network effects where they exist.
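
For readers unfamiliar with the independent cascade model mentioned above, the following minimal sketch simulates one cascade on a small directed graph. It illustrates only the standard diffusion model, not the Networked RMAB framework or its Q-learning algorithm; the function name `independent_cascade`, the single shared activation probability `prob`, and the example graph are assumptions made for illustration.

```python
import random


def independent_cascade(graph, seeds, prob, rng=None):
    """Simulate one run of the independent cascade model.

    graph maps each node to a list of its out-neighbors. Every newly
    activated node gets exactly one chance to activate each inactive
    out-neighbor, succeeding independently with probability prob.
    Returns the set of nodes active when the cascade stops.
    """
    rng = rng or random.Random()
    active = set(seeds)
    frontier = list(active)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, ()):
                if v not in active and rng.random() < prob:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


# Illustrative network: node 0 influences 1 and 2, which both influence 3.
example_graph = {0: [1, 2], 1: [3], 2: [3], 3: []}
always_spreads = independent_cascade(example_graph, [0], prob=1.0)
never_spreads = independent_cascade(example_graph, [0], prob=0.0)
sampled = independent_cascade(example_graph, [0], prob=0.5, rng=random.Random(0))
```

The two extreme probabilities give deterministic outcomes (everything reachable becomes active, or only the seed stays active); intermediate values produce a random subset that always contains the seeds.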

NeurIPS Conference 2025 Conference Paper

Neural-Driven Image Editing

Pengfei Zhou
Jie Xia
Xiaopeng Peng
Wangbo Zhao
Zilong Ye
Zekai Li
Suorong Yang
Jiadong Pan

Traditional image editing typically relies on manual prompting, making it labor-intensive and inaccessible to individuals with limited motor control or language abilities. Leveraging recent advances in brain-computer interfaces (BCIs) and generative models, we propose LoongX, a hands-free image editing approach driven by multimodal neurophysiological signals. LoongX utilizes state-of-the-art diffusion models trained on a comprehensive dataset of 23,928 image editing pairs, each paired with synchronized electroencephalography (EEG), functional near-infrared spectroscopy (fNIRS), photoplethysmography (PPG), and head motion signals that capture user intent. To effectively address the heterogeneity of these signals, LoongX integrates two key modules. The cross-scale state space (CS3) module encodes informative modality-specific features. The dynamic gated fusion (DGF) module further aggregates these features into a unified latent space, which is then aligned with edit semantics via fine-tuning on a diffusion transformer (DiT). Additionally, we pre-train the encoders using contrastive learning to align cognitive states with semantic intentions from embedded natural language. Extensive experiments demonstrate that LoongX achieves performance comparable to text-driven methods (CLIP-I: 0.6605 vs. 0.6558; DINO: 0.4812 vs. 0.4637) and outperforms them when neural signals are combined with speech (CLIP-T: 0.2588 vs. 0.2549). These results highlight the promise of neural-driven generative models in enabling accessible, intuitive image editing and open new directions for cognitive-driven creative technologies. The code and dataset are released on the project website: https://loongx1.github.io.

NeurIPS Conference 2025 Conference Paper

Pruning-Robust Mamba with Asymmetric Multi-Scale Scanning Paths

Jindi Lv
Yuhao Zhou
Mingjia Shi
Zhiyuan Liang
Panpan Zhang
Xiaojiang Peng
Wangbo Zhao
Zheng Zhu

Mamba has proven efficient for long-sequence modeling in vision tasks. However, when token reduction techniques are applied to improve efficiency, Mamba-based models exhibit drastic performance degradation compared to Vision Transformers (ViTs). This decline is potentially attributed to Mamba's chain-like scanning mechanism, which we hypothesize not only induces cascading losses in token connectivity but also limits the diversity of spatial receptive fields. In this paper, we propose Asymmetric Multi-scale Vision Mamba (AMVim), a novel architecture designed to enhance pruning robustness. AMVim employs a dual-path structure, integrating a window-aware scanning mechanism into one path while retaining sequential scanning in the other. This asymmetric design promotes token connection diversity and enables multi-scale information flow, reinforcing spatial awareness. Empirical results demonstrate that AMVim achieves state-of-the-art pruning robustness. During token reduction, AMVim-T achieves a substantial 34\% improvement in training-free accuracy with identical model sizes and FLOPs. Meanwhile, AMVim-S exhibits only a 1.5\% accuracy drop, performing comparably to ViT. Notably, AMVim also delivers superior performance in pruning-free settings, further validating its architectural advantages.

NeurIPS Conference 2025 Conference Paper

REPA Works Until It Doesn’t: Early-Stopped, Holistic Alignment Supercharges Diffusion Training

Ziqiao Wang
Wangbo Zhao
Yuhao Zhou
Zekai Li
Zhiyuan Liang
Mingjia Shi
Xuanlei Zhao
Pengfei Zhou

Diffusion Transformers (DiTs) deliver state-of-the-art image quality, yet their training remains notoriously slow. A recent remedy, representation alignment (REPA), which matches DiT hidden features to those of a non-generative teacher (e.g., DINO), dramatically accelerates the early epochs but plateaus or even degrades performance later. We trace this failure to the capacity mismatch: once the generative student begins modeling the joint data distribution, the teacher's lower-dimensional embeddings and attention patterns become a straitjacket rather than a guide. We then introduce HASTE (Holistic Alignment with Stage-wise Termination for Efficient training), a two-phase schedule that keeps the help and drops the hindrance. Phase I applies a holistic alignment loss that simultaneously distills attention maps (relational priors) and feature projections (semantic anchors) from the teacher into mid-level layers of the DiT, yielding rapid convergence. Phase II then performs one-shot termination that deactivates the alignment loss, once a simple trigger such as a fixed iteration is hit, freeing the DiT to focus on denoising and exploit its generative capacity. HASTE speeds up training of diverse DiTs without architecture changes. On ImageNet 256×256, it reaches the vanilla SiT-XL/2 baseline FID in 50 epochs and matches REPA’s best FID in 500 epochs, amounting to a 28× reduction in optimization steps. HASTE also improves text-to-image DiTs on MS-COCO, proving to be a simple yet principled recipe for efficient diffusion training across various tasks.

NeurIPS Conference 2025 Conference Paper

Scaling Up Parameter Generation: A Recurrent Diffusion Approach

Kai Wang
Dongwen Tang
Wangbo Zhao
Konstantin Schürholt
Zhangyang "Atlas" Wang
Yang You

Parameter generation has long struggled to match the scale of today's large vision and language models, curbing its broader utility. In this paper, we introduce Recurrent Diffusion for Large-Scale Parameter Generation (RPG), a novel framework that generates full neural network parameters—up to hundreds of millions—on a single GPU. Our approach first partitions a network's parameters into non-overlapping 'tokens', each corresponding to a distinct portion of the model. A recurrent mechanism then learns the inter-token relationships, producing 'prototypes' which serve as conditions for a diffusion process that ultimately synthesizes the full parameters. Across a spectrum of architectures and tasks—including ResNets, ConvNeXts and ViTs on ImageNet-1K and COCO, and even LoRA-based LLMs—RPG achieves performance on par with fully trained networks while avoiding excessive memory overhead. Notably, it generalizes beyond its training set to generate valid parameters for previously unseen tasks, highlighting its flexibility in dynamic and open-ended scenarios. By overcoming the longstanding memory and scalability barriers, RPG serves as a critical advance in 'AI generating AI', potentially enabling efficient weight generation at scales previously deemed infeasible.

AAAI Conference 2025 Conference Paper

Single-View Graph Contrastive Learning with Soft Neighborhood Awareness

Qingqiang Sun
Chaoqi Chen
Ziyue Qiao
Xubin Zheng
Kai Wang

Most graph contrastive learning (GCL) methods heavily rely on cross-view contrast, thus facing several concomitant challenges, such as the complexity of designing effective augmentations, the potential for information loss between views, and increased computational costs. To mitigate reliance on cross-view contrasts, we propose SIGNA, a novel single-view graph contrastive learning framework. Regarding the inconsistency between structural connection and semantic similarity of neighborhoods, we resort to soft neighborhood awareness for GCL. Specifically, we leverage dropout to obtain structurally-related yet randomly-noised embedding pairs for neighbors, which serve as potential positive samples. At each epoch, the role of partial neighbors is switched from positive to negative, leading to a probabilistic neighborhood contrastive learning effect. Moreover, we propose a normalized Jensen-Shannon divergence estimator for a better effect of contrastive learning. Experiments on diverse node-level tasks demonstrate that our simple single-view GCL framework consistently outperforms existing methods by margins of up to 21.74% (PPI). In particular, with soft neighborhood awareness, SIGNA can adopt MLPs instead of complicated GCNs as the encoder in transductive learning tasks, thus speeding up its inference process by 109× to 331×.

NeurIPS Conference 2025 Conference Paper

StarTrail: Concentric Ring Sequence Parallelism for Efficient Near-Infinite-Context Transformer Model Training

Ziming Liu
Shaoyu Wang
Shenggan Cheng
Zhongkai Zhao
Kai Wang
Xuanlei Zhao
James Demmel
Yang You

Training Transformer models on long sequences in a distributed setting poses significant challenges in terms of efficiency and scalability. Current methods are constrained either by the number of attention heads or by excessive communication overheads. To address this problem, we propose StarTrail, a multi-dimensional concentric distributed training system for long sequences, fostering an efficient communication paradigm and providing additional tuning flexibility for communication arrangements. Specifically, StarTrail introduces an extra parallel dimension and divides the peer-to-peer communication into sub-rings to substantially reduce communication volume and avoid bandwidth bottlenecks. Through comprehensive experiments across diverse hardware environments and on both Natural Language Processing (NLP) and Computer Vision (CV) tasks, we demonstrate that our approach significantly surpasses state-of-the-art methods that support long sequence lengths, achieving performance improvements of up to 77.12% on GPT-style models and up to 114.33% on DiT (Diffusion Transformer) models without affecting the computation results.

IJCAI Conference 2025 Conference Paper

Time-Frequency Disentanglement Boosted Pre-Training: A Universal Spatio-Temporal Modeling Framework

Yudong Zhang
Zhaoyang Sun
Xu Wang
Xuan Yu
Kai Wang
Yang Wang

Current spatio-temporal modeling techniques largely rely on abundant data and the design of task-specific models. However, many cities lack well-established digital infrastructures, making data scarcity and the high cost of model development significant barriers to application deployment. Therefore, this work aims to enable spatio-temporal learning to cope with the problems of few-shot data modeling and model generalizability. To this end, we propose a Universal Spatio-Temporal Correlationship pre-training framework (USTC), for spatio-temporal modeling across different cities and tasks. To enhance the spatio-temporal representations during pre-training, we propose to decouple the time-frequency patterns within data, and leverage contrastive learning to maintain the time-frequency consistency. To further improve the adaptability to downstream tasks, we design a prompt generation module to mine personalized spatio-temporal patterns on the target city, which can be integrated with the learned common spatio-temporal representations to collaboratively serve downstream tasks. Extensive experiments conducted on real-world datasets demonstrate that USTC significantly outperforms the advanced baselines in forecasting, imputation, and extrapolation across cities.

NeurIPS Conference 2025 Conference Paper

Towards Graph Foundation Models: Training on Knowledge Graphs Enables Transferability to General Graphs

Kai Wang
Siqiang Luo
Caihua Shan
Yifei Shen

Inspired by the success of large language models, there is a trend toward developing graph foundation models to conduct diverse downstream tasks in various domains. However, current models often require extra fine-tuning to apply their learned structural and semantic representations to new graphs, which limits their versatility. Recent breakthroughs in zero-shot inductive reasoning on knowledge graphs (KGs) offer us a new perspective on extending KG reasoning to general graph applications. In this paper, we introduce SCR, a unified graph reasoning framework designed to train on knowledge graphs and effectively generalize across a wide range of graph tasks and domains. We begin by designing task-specific KG structures to establish a unified topology for different task formats. Then we propose semantic-conditioned message passing, a novel mechanism addressing the inherent semantic isolation in traditional KG reasoning, by jointly modeling structural and semantic invariance patterns in graph representations. Evaluated on 38 diverse datasets spanning node-, link-, and graph-level tasks, SCR achieves substantial performance gains over existing foundation models and supervised baselines, demonstrating its remarkable efficacy and adaptability.

NeurIPS Conference 2025 Conference Paper

X-Field: A Physically Informed Representation for 3D X-ray Reconstruction

Feiran Wang
Jiachen Tao
Junyi Wu
Haoxuan Wang
Bin Duan
Kai Wang
Zongxin Yang
Yan Yan

X-ray imaging is indispensable in medical diagnostics, yet its use is tightly regulated due to radiation exposure. Recent research borrows representations from the 3D reconstruction area to complete two tasks with reduced radiation dose: X-ray Novel View Synthesis (NVS) and Computed Tomography (CT) reconstruction. However, these representations fail to fully capture the penetration and attenuation properties of X-ray imaging as they originate from visible light imaging. In this paper, we introduce X-Field, a 3D representation informed by the physics of X-ray imaging. First, we employ homogeneous 3D ellipsoids with distinct attenuation coefficients to accurately model diverse materials within internal structures. Second, we introduce an efficient path-partitioning algorithm that resolves the intricate intersection of ellipsoids to compute cumulative attenuation along an X-ray path. We further propose a hybrid progressive initialization to refine the geometric accuracy of X-Field and incorporate material-based optimization to enhance model fitting along material boundaries. Experiments show that X-Field achieves superior visual fidelity on both real-world human organ and synthetic object datasets, outperforming state-of-the-art methods in X-ray NVS and CT reconstruction. Our code is available on the project page: https://github.com/Brack-Wang/X-Field.

EAAI Journal 2024 Journal Article

A task-oriented deep learning framework based on target-related transformer network for industrial quality prediction applications

Yalin Wang
Rao Dai
Diju Liu
Kai Wang
Xiaofeng Yuan
Chenliang Liu

Executing various production tasks is critical to the safe operation and efficient production of industrial processes. As one of them, the detection task of key quality variables directly affects the operation optimization and decision-making of industrial processes, but it is severely limited by the harsh environment and detection instruments. Therefore, the real-time prediction task of key quality variables becomes the basis for optimal control of industrial processes. To address this issue, this paper proposes a task-oriented deep learning framework based on a target-related transformer (TR-Former) network for industrial quality prediction tasks. Specifically, a new target-related self-attention (TR-SA) mechanism is developed to guide feature learning by adding attention scores between task-related target variables and other variables. As a result, the learned features in this instance will be guaranteed to be relevant to the target variable and useful for the quality prediction task. Moreover, the long-range dynamics of industrial process data can also be captured, which can further improve the prediction performance of the model. Finally, extensive experiments were conducted on two industrial processes to validate the superiority of the proposed method in terms of quality prediction tasks. The experimental results demonstrate that the proposed TR-Former method exhibits an improvement ranging from 3% to 13% in the mean absolute error indicator compared to the traditional transformer and other state-of-the-art methods.

NeurIPS Conference 2024 Conference Paper

Aligning Large Language Models with Representation Editing: A Control Perspective

Lingkai Kong
Haorui Wang
Wenhao Mu
Yuanqi Du
Yuchen Zhuang
Yifei Zhou
Yue Song
Rongzhi Zhang

Aligning large language models (LLMs) with human objectives is crucial for real-world applications. However, fine-tuning LLMs for alignment often suffers from unstable training and requires substantial computing resources. Test-time alignment techniques, such as prompting and guided decoding, do not modify the underlying model, and their performance remains dependent on the original model's capabilities. To address these challenges, we propose aligning LLMs through representation editing. The core of our method is to view a pre-trained autoregressive LLM as a discrete-time stochastic dynamical system. To achieve alignment for specific objectives, we introduce external control signals into the state space of this language dynamical system. We train a value function directly on the hidden states according to the Bellman equation, enabling gradient-based optimization to obtain the optimal control signals at test time. Our experiments demonstrate that our method outperforms existing test-time alignment techniques while requiring significantly fewer resources compared to fine-tuning methods. Our code is available at https://github.com/Lingkai-Kong/RE-Control.

YNICL Journal 2024 Journal Article

Alterations in neural circuit dynamics between the limbic network and prefrontal/default mode network in patients with generalized anxiety disorder

Xiaonan Pang
Siyu Fan
Yulin Zhang
Ting Zhang
Qiangqiang Hou
Yue Wu
Ye Zhang
Yanghua Tian

BACKGROUND: Widespread functional alterations have been implicated in patients with generalized anxiety disorder (GAD). However, most studies have primarily focused on static brain network features in patients with GAD. The current research focused on exploring the dynamics within functional brain networks among individuals diagnosed with GAD. METHODS: Seventy-five participants were divided into patients with GAD and healthy controls (HCs), and resting-state functional magnetic resonance imaging data were collected. The severity of symptoms was measured using the Hamilton Anxiety Scale and the Patient Health Questionnaire. Co-activation pattern (CAP) analysis, centered on the bed nucleus of the stria terminalis, was applied to explore network dynamics. The capability of these dynamic characteristics to distinguish between patients with GAD and HCs was evaluated using a support vector machine. RESULTS: Patients with GAD exhibited disruptions in the limbic-prefrontal and limbic-default-mode network circuits. Particularly noteworthy was the marked reduction in dynamic indicators such as occurrence, EntriesFromBaseline, ExitsToBaseline, in-degree, out-degree, and resilience. Moreover, these decreased dynamic features effectively distinguished the GAD group from the HC in this study. CONCLUSIONS: The current findings revealed the underlying brain networks associated with compromised emotion regulation in individuals with GAD. The dynamic reduction in connectivity between the limbic-default mode network and limbic-prefrontal networks could potentially act as a biomarker and therapeutic target for GAD in the future.

EAAI Journal 2024 Journal Article

Anomaly detection using large-scale multimode industrial data: An integration method of nonstationary kernel and autoencoder

Kai Wang
Caoyin Yan
Yanfang Mo
Yalin Wang
Xiaofeng Yuan
Chenliang Liu

Kernel methods and neural networks (NNs) are two mainstream nonlinear data modeling methods and have been widely applied to industrial process monitoring. However, they both present imperfect properties, so the relevant applications are limited. On the one hand, kernels are not so reconstructable, scalable, and robust to hyperparameters that they suffer performance degradation for large-scale data modeling and monitoring. On the other hand, the high-dimensional parameter space of NNs that is sorted to parameter initialization presents severe anomaly detection performance inconsistency, which makes the industry cautious about using NNs. Motivated by these facts, we propose to integrate kernels and NNs, forming a new model structure that is scalable, reconstructable, and performance-consistent. Specifically, a novel autoencoder-based nonstationary pattern selection kernel (AE-NPSK) is proposed by (1) selecting from the training set the critical edges and interior data as the centers of the radial basis functions in the hidden layers and (2) adaptively adjusting the kernel width in the training procedure. Also, the new NN has strong performance consistency, which facilitates the search for optimal parameters. Finally, we test the performance of the proposed method on the challenging multimode processes. The results validate the efficacy of the proposed method.

TMLR Journal 2024 Journal Article

Audio-Visual Dataset Distillation

Saksham Singh Kushwaha
Siva Sai Nagender Vasireddy
Kai Wang
Yapeng Tian

In this article, we introduce audio-visual dataset distillation, a task to construct a smaller yet representative synthetic audio-visual dataset that maintains the cross-modal semantic association between audio and visual modalities. Dataset distillation techniques have primarily focused on image classification. However, with the growing capabilities of audio-visual models and the vast datasets required for their training, it is necessary to explore distillation methods beyond the visual modality. Our approach builds upon the foundation of Distribution Matching (DM), extending it to handle the unique challenges of audio-visual data. A key challenge is to jointly learn synthetic data that distills both the modality-wise information and natural alignment from real audio-visual data. We introduce a vanilla audio-visual distribution matching framework that separately trains visual-only and audio-only DM components, enabling us to investigate the effectiveness of audio-visual integration and various multimodal fusion methods. To address the limitations of unimodal distillation, we propose two novel matching losses: implicit cross-matching and cross-modal gap matching. These losses work in conjunction with the vanilla unimodal distribution matching loss to enforce cross-modal alignment and enhance the audio-visual dataset distillation process. Extensive audio-visual classification and retrieval experiments on four audio-visual datasets, AVE, MUSIC-21, VGGSound, and VGGSound-10K, demonstrate the effectiveness of our proposed matching approaches and validate the benefits of audio-visual integration with condensed data. This work establishes a new frontier in audio-visual dataset distillation, paving the way for further advancements in this exciting field. Our source code and pre-trained models will be released.
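
As a rough illustration of the vanilla unimodal distribution matching idea mentioned in this abstract, the sketch below compares the mean feature embedding of a real batch with that of a synthetic batch, separately for each modality. It assumes that features have already been extracted into NumPy arrays and that the matching loss is a squared distance between batch means; the function names `mean_embedding_loss` and `vanilla_av_dm_loss` are hypothetical, and the paper's cross-matching and gap-matching losses are not reproduced here.

```python
import numpy as np


def mean_embedding_loss(real_feats, syn_feats):
    """Squared L2 distance between the mean feature vectors of two batches.

    Both inputs are arrays of shape (num_samples, feature_dim).
    """
    real_feats = np.asarray(real_feats, dtype=float)
    syn_feats = np.asarray(syn_feats, dtype=float)
    diff = real_feats.mean(axis=0) - syn_feats.mean(axis=0)
    return float(np.dot(diff, diff))


def vanilla_av_dm_loss(real_audio, syn_audio, real_visual, syn_visual):
    """Sum of the audio-only and visual-only matching losses.

    Assumption: the two modalities are matched independently,
    with no cross-modal term.
    """
    audio_loss = mean_embedding_loss(real_audio, syn_audio)
    visual_loss = mean_embedding_loss(real_visual, syn_visual)
    return audio_loss + visual_loss


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real_a = rng.normal(size=(64, 8))   # placeholder audio features
    real_v = rng.normal(size=(64, 16))  # placeholder visual features
    syn_a = rng.normal(size=(4, 8))     # small synthetic set
    syn_v = rng.normal(size=(4, 16))
    print(vanilla_av_dm_loss(real_a, syn_a, real_v, syn_v))
```

In a distillation loop the synthetic arrays would be the trainable quantities and this loss would be minimized with respect to them; that optimization step is omitted here.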

ICRA Conference 2024 Conference Paper

Automatic Captioning based on Visible and Infrared Images

Yan Wang
Shuli Lou
Kai Wang
Yunzhe Wang
Xiaohu Yuan
Huaping Liu

In this paper, we tackle the task of image captioning with the complementarity of visible light images and infrared images. To address this problem, we propose an RGBIR image fusion captioning model, which can take full advantage of visible light images and infrared images under different conditions. Meanwhile, we develop a wearable environment-assisted system. In addition, we collect and annotate a new dataset containing 3510 pairs of RGB-IR images to support model training. Finally, we conduct extensive experiments to evaluate the model and system. Experimental results show that our new method and system significantly outperform baselines on multiple metrics and have potential practical value.

NeurIPS Conference 2024 Conference Paper

Causal Deciphering and Inpainting in Spatio-Temporal Dynamics via Diffusion Model

Yifan Duan
Jian Zhao
Junyuan Mao
Hao Wu
Jingyu Xu
Shilong Wang
Caoyuan Ma
Kai Wang

Spatio-temporal (ST) prediction has garnered de facto attention in earth sciences, such as meteorological prediction and human mobility perception. However, the scarcity of data coupled with the high expenses involved in sensor deployment results in notable data imbalances. Furthermore, models that are excessively customized and devoid of causal connections further undermine the generalizability and interpretability. To this end, we establish a causal framework for ST predictions, termed CaPaint, which targets to identify causal regions in data and endow model with causal reasoning ability in a two-stage process. Going beyond this process, we utilize the back-door adjustment to specifically address the sub-regions identified as non-causal in the upstream phase. Specifically, we employ a novel image inpainting technique. By using a fine-tuned unconditional Diffusion Probabilistic Model (DDPM) as the generative prior, we in-fill the masks defined as environmental parts, offering the possibility of reliable extrapolation for potential data distributions. CaPaint overcomes the high complexity dilemma of optimal ST causal discovery models by reducing the data generation complexity from exponential to quasi-linear levels. Extensive experiments conducted on five real-world ST benchmarks demonstrate that integrating the CaPaint concept allows models to achieve improvements ranging from 4.3% to 77.3%. Moreover, compared to traditional mainstream ST augmenters, CaPaint underscores the potential of diffusion models in ST enhancement, offering a novel paradigm for this field. Our project is available at https://anonymous.4open.science/r/12345-DFCC.

YNICL Journal 2024 Journal Article

Cortical morphological alterations in adolescents with major depression and non-suicidal self-injury

Xiaonan Pang
Dongpeng Wu
Hongping Wang
Jiahua Zhang
Yue Yu
Yue Zhao
Qianqian Li
Liangping Ni

BACKGROUND: Non-suicidal self-injury (NSSI) involves repetitive self-harm without suicidal intent and is common among adolescents, often linked to major depressive disorder (MDD). NSSI can lead to physical harm, cognitive impairments, interpersonal issues, violent behavior, and increased risks of psychological disorders and suicide attempts later in life. METHODS: Voxel-based morphometry (VBM) and surface-based morphometry (SBM) were performed on 44 NSSI patients and 44 healthy controls (HCs). Differences in GMV, CT, and cortical complexity were compared using two-sample t-tests and correlated with neuropsychological scales. RESULTS: NSSI patients exhibited significant GMV atrophy in multiple regions, including the left insula, left anterior cingulate cortex, left putamen, left middle frontal gyrus, and right superior frontal gyrus, while showing increased GMV in the cerebellum posterior lobe. NSSI patients had increased CT in multiple left hemisphere regions and decreased CT in the right middle frontal gyrus. Additionally, they exhibited reduced cortical complexity, including decreased SD in the right frontal gyrus and lower GI in the left insula. There were no significant differences between the two groups in terms of fractal dimension (FD). NSSI patients showed a negative correlation between the CT of the right middle frontal gyrus and the anger dimension of the BPAQ, as well as between the SD of the right superior frontal gyrus and the hostility dimension of the BPAQ. CONCLUSION: NSSI patients have significant structural changes in the insular cortex, prefrontal cortex, precentral and postcentral gyrus, temporal lobe, putamen, and anterior cingulate cortex, offering a morphological perspective on the pathophysiology of NSSI in MDD.

NeurIPS Conference 2024 Conference Paper

Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation

Wangbo Zhao
Jiasheng Tang
Yizeng Han
Yibing Song
Kai Wang
Gao Huang
Fan Wang
Yang You

Existing parameter-efficient fine-tuning (PEFT) methods have achieved significant success on vision transformers (ViTs) adaptation by improving parameter efficiency. However, enhancing inference efficiency during adaptation remains underexplored. This limits the broader application of pre-trained ViT models, especially when the model is computationally extensive. In this paper, we propose Dynamic Tuning (DyT), a novel approach to improve both parameter and inference efficiency for ViT adaptation. Specifically, besides using the lightweight adapter modules, we propose a token dispatcher to distinguish informative tokens from less important ones, allowing the latter to dynamically skip the original block, thereby reducing the redundant computation during inference. Additionally, we explore multiple design variants to find the best practice of DyT. Finally, inspired by the mixture-of-experts (MoE) mechanism, we introduce an enhanced adapter to further boost the adaptation performance. We validate DyT across various tasks, including image/video recognition and semantic segmentation. For instance, DyT achieves superior performance compared to existing PEFT methods while evoking only 71% of their FLOPs on the VTAB-1K benchmark.

NeurIPS Conference 2024 Conference Paper

Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning

Zebang Cheng
Zhi-Qi Cheng
Jun-Yan He
Jingdong Sun
Kai Wang
Yuxiang Lin
Zheng Lian
Xiaojiang Peng

Accurate emotion perception is crucial for various applications, including human-computer interaction, education, and counseling. However, traditional single-modality approaches often fail to capture the complexity of real-world emotional expressions, which are inherently multimodal. Moreover, existing Multimodal Large Language Models (MLLMs) face challenges in integrating audio and recognizing subtle facial micro-expressions. To address this, we introduce the MERR dataset, containing 28,618 coarse-grained and 4,487 fine-grained annotated samples across diverse emotional categories. This dataset enables models to learn from varied scenarios and generalize to real-world applications. Furthermore, we propose Emotion-LLaMA, a model that seamlessly integrates audio, visual, and textual inputs through emotion-specific encoders. By aligning features into a shared space and employing a modified LLaMA model with instruction tuning, Emotion-LLaMA significantly enhances both emotional recognition and reasoning capabilities. Extensive evaluations show Emotion-LLaMA outperforms other MLLMs, achieving top scores in Clue Overlap (7.83) and Label Overlap (6.25) on EMER, an F1 score of 0.9036 on the MER2023-SEMI challenge, and the highest UAR (45.59) and WAR (59.37) in zero-shot evaluations on the DFEW dataset.

AAAI Conference 2024 Conference Paper

EnMatch: Matchmaking for Better Player Engagement via Neural Combinatorial Optimization

Kai Wang
Haoyu Liu
Zhipeng Hu
Xiaochuan Feng
Minghao Zhao
Shiwei Zhao
Runze Wu
Xudong Shen

Matchmaking is a core task in e-sports and online games, as it contributes to player engagement and further influences the game's lifecycle. Previous methods focus on creating fair games at all times. They divide players into different tiers based on skill levels and only select players from the same tier for each game. Though this strategy can ensure fair matchmaking, it is not always good for player engagement. In this paper, we propose a novel Engagement-oriented Matchmaking (EnMatch) framework to ensure fair games and simultaneously enhance player engagement. Two main issues need to be addressed. First, it is unclear how to measure the impact of different team compositions and confrontations on player engagement during the game considering the variety of player characteristics. Second, such a detailed consideration on every single player during matchmaking will result in an NP-hard combinatorial optimization problem with non-linear objectives. In light of these challenges, we turn to real-world data analysis to reveal engagement-related factors. The resulting insights guide the development of engagement modeling, enabling the estimation of quantified engagement before a match is completed. To handle the combinatorial optimization problem, we formulate the problem into a reinforcement learning framework, in which a neural combinatorial optimization problem is built and solved. The performance of EnMatch is finally demonstrated through the comparison with other state-of-the-art methods based on several real-world datasets and online deployments on two games.

NeurIPS Conference 2024 Conference Paper

First-Order Methods for Linearly Constrained Bilevel Optimization

Guy Kornowski
Swati Padmanabhan
Kai Wang
Jimmy Zhang
Suvrit Sra

Algorithms for bilevel optimization often encounter Hessian computations, which are prohibitive in high dimensions. While recent works offer first-order methods for unconstrained bilevel problems, the constrained setting remains relatively underexplored. We present first-order linearly constrained optimization methods with finite-time hypergradient stationarity guarantees. For linear equality constraints, we attain $\epsilon$-stationarity in $\widetilde{O}(\epsilon^{-2})$ gradient oracle calls, which is nearly-optimal. For linear inequality constraints, we attain $(\delta, \epsilon)$-Goldstein stationarity in $\widetilde{O}(d{\delta^{-1} \epsilon^{-3}})$ gradient oracle calls, where $d$ is the upper-level dimension. Finally, we obtain for the linear inequality setting dimension-free rates of $\widetilde{O}({\delta^{-1} \epsilon^{-4}})$ oracle complexity under the additional assumption of oracle access to the optimal dual variable. Along the way, we develop new nonsmooth nonconvex optimization methods with inexact oracles. Our numerical experiments verify these guarantees.

NeurIPS Conference 2024 Conference Paper

GDeR: Safeguarding Efficiency, Balancing, and Robustness via Prototypical Graph Pruning

Guibin Zhang
Haonan Dong
Yuchen Zhang
Zhixun Li
Dingshuo Chen
Kai Wang
Tianlong Chen
Yuxuan Liang

Training high-quality deep models necessitates vast amounts of data, resulting in overwhelming computational and memory demands. Recently, data pruning, distillation, and coreset selection have been developed to streamline data volume by retaining, synthesizing, or selecting a small yet informative subset from the full set. Among these methods, data pruning incurs the least additional training cost and offers the most practical acceleration benefits. However, it is the most vulnerable, often suffering significant performance degradation with imbalanced or biased data schema, thus raising concerns about its accuracy and reliability in on-device deployment. Therefore, there is a looming need for a new data pruning paradigm that maintains the efficiency of previous practices while ensuring balance and robustness. Unlike the fields of computer vision and natural language processing, where mature solutions have been developed to address these issues, graph neural networks (GNNs) continue to struggle with increasingly large-scale, imbalanced, and noisy datasets, lacking a unified dataset pruning solution. To achieve this, we introduce a novel dynamic soft-pruning method, GDeR, designed to update the training "basket" during the process using trainable prototypes. GDeR first constructs a well-modeled graph embedding hypersphere and then samples representative, balanced, and unbiased subsets from this embedding space, which achieves the goal we called Graph Training Debugging. Extensive experiments on four datasets across three GNN backbones demonstrate that GDeR (I) achieves or surpasses the performance of the full dataset with 30%–50% fewer training samples, (II) attains up to a 2.81× lossless training speedup, and (III) outperforms state-of-the-art pruning methods in imbalanced training and noisy training scenarios by 0.3%–4.3% and 3.6%–7.8%, respectively.

NeurIPS Conference 2024 Conference Paper

MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark

Yubo Wang
Xueguang Ma
Ge Zhang
Yuansheng Ni
Abhranil Chandra
Shiguang Guo
Weiming Ren
Aaran Arulraj

In the age of large-scale language models, benchmarks like the Massive Multitask Language Understanding (MMLU) have been pivotal in pushing the boundaries of what AI can achieve in language comprehension and reasoning across diverse domains. However, as models continue to improve, their performance on these benchmarks has begun to plateau, making it increasingly difficult to discern differences in model capabilities. This paper introduces MMLU-Pro, an enhanced dataset designed to extend the mostly knowledge-driven MMLU benchmark by integrating more challenging, reasoning-focused questions and expanding the choice set from four to ten options. Additionally, MMLU-Pro eliminates part of the trivial and noisy questions in MMLU. Our experimental results show that MMLU-Pro not only raises the challenge, causing a significant drop in accuracy by 16% to 33% compared to MMLU, but also demonstrates greater stability under varying prompts. With 24 different prompt styles tested, the sensitivity of model scores to prompt variations decreased from 4-5% in MMLU to just 2% in MMLU-Pro. Additionally, we found that models utilizing Chain of Thought (CoT) reasoning achieved better performance on MMLU-Pro compared to direct answering, which is in stark contrast to the findings on the original MMLU, indicating that MMLU-Pro includes more complex reasoning questions. Our assessments confirm that MMLU-Pro is a more discriminative benchmark to better track progress in the field.

AAMAS Conference 2024 Conference Paper

Mutual Information as Intrinsic Reward of Reinforcement Learning Agents for On-demand Ride Pooling

Xianjie Zhang
Jiahao Sun
Chen Gong
Kai Wang
Yifei Cao
Hao Chen
Yu Liu

The emergence of on-demand ride pooling services allows each vehicle to serve multiple passengers at a time, thus increasing drivers’ income and enabling passengers to travel at lower prices than taxi/car on-demand services. Although on-demand ride pooling services can bring so many benefits, ride pooling services need a well-defined matching strategy to maximize the benefits for all parties (passengers, drivers, aggregation companies and environment), especially the regional dispatching of vehicles has a significant impact on matching and revenue. Existing algorithms often only consider revenue maximization, which makes it difficult for requests with unusual distribution to get rides. How to increase revenue while ensuring a reasonable assignment of requests brings a challenge to ride pooling service companies (aggregation companies). In this paper, we propose a framework for vehicle dispatching for ride pooling tasks, which splits the city into discrete dispatching regions and uses the reinforcement learning (RL) algorithm to dispatch vehicles in these regions. We also consider the mutual information (MI) between vehicle and request distribution as the intrinsic reward of the RL algorithm to improve the correlation between their distributions, thus ensuring the possibility of getting a ride for unusually distributed requests. In experimental results on a real-world taxi dataset, we demonstrate that our framework can significantly increase revenue up to an average of 3% over the existing best on-demand ride pooling method.

This work is licensed under a Creative Commons Attribution International 4.0 License. Proc. of the 23rd International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2024), N. Alechina, V. Dignum, M. Dastani, J. S. Sichman (eds.), May 6 – 10, 2024, Auckland, New Zealand. © 2024 International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org).
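
The use of mutual information as an intrinsic reward in this abstract can be pictured with a small sketch. It assumes a joint count table over discretized vehicle and request levels is available and that the intrinsic term is added to revenue with a weight `beta`; how the paper actually builds the joint distribution and combines the rewards is not stated in the abstract, so `mutual_information`, `shaped_reward`, and `beta` are hypothetical names and choices.

```python
import numpy as np


def mutual_information(joint):
    """Mutual information (in nats) of a 2-D joint count or probability table."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()              # normalize counts to probabilities
    px = joint.sum(axis=1, keepdims=True)    # marginal over rows, shape (n, 1)
    py = joint.sum(axis=0, keepdims=True)    # marginal over columns, shape (1, m)
    outer = px @ py                          # product of marginals, shape (n, m)
    mask = joint > 0                         # skip zero cells: 0 * log(0) is taken as 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / outer[mask])))


def shaped_reward(revenue, joint, beta=0.1):
    """Extrinsic revenue plus a weighted MI bonus (assumed additive form)."""
    return revenue + beta * mutual_information(joint)


if __name__ == "__main__":
    # Rows: binned vehicle supply level; columns: binned request level (made-up counts).
    aligned = [[8, 1], [1, 8]]
    unrelated = [[5, 5], [5, 5]]
    print(mutual_information(aligned), mutual_information(unrelated))
```

Under this assumed form, a dispatch policy whose vehicle supply tracks request demand yields a larger bonus than one whose supply is unrelated to demand.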

ICRA Conference 2024 Conference Paper

NGEL-SLAM: Neural Implicit Representation-based Global Consistent Low-Latency SLAM System

Yunxuan Mao
Xuan Yu
Zhuqing Zhang
Kai Wang
Yue Wang 0020
Rong Xiong
Yiyi Liao

Neural implicit representations have emerged as a promising solution for providing dense geometry in Simultaneous Localization and Mapping (SLAM). However, existing methods in this direction fall short in terms of global consistency and low latency. This paper presents NGEL-SLAM to tackle the above challenges. To ensure global consistency, our system leverages a traditional feature-based tracking module that incorporates loop closure. Additionally, we maintain a global consistent map by representing the scene using multiple neural implicit fields, enabling quick adjustment to the loop closure. Moreover, our system allows for fast convergence through the use of octree-based implicit representations. The combination of rapid response to loop closure and fast convergence makes our system a truly low-latency system that achieves global consistency. Our system enables rendering high-fidelity RGB-D images, along with extracting dense and complete surfaces. Experiments on both synthetic and real-world datasets suggest that our system achieves state-of-the-art tracking and mapping accuracy while maintaining low latency.

JBHI Journal 2024 Journal Article

NKUT: Dataset and Benchmark for Pediatric Mandibular Wisdom Teeth Segmentation

Zhenhuan Zhou
Yuzhu Chen
Along He
Xitao Que
Kai Wang
Rui Yao
Tao Li

Germectomy is a common surgery in pediatric dentistry to prevent the potential dangers caused by impacted mandibular wisdom teeth. Segmentation of mandibular wisdom teeth is a crucial step in surgery planning. However, manually segmenting teeth and bones from 3D volumes is time-consuming and may cause delays in treatment. Deep learning based medical image segmentation methods have demonstrated the potential to reduce the burden of manual annotations, but they still require a lot of well-annotated data for training. In this paper, we first curate a Cone Beam Computed Tomography (CBCT) dataset, NKUT, for the segmentation of pediatric mandibular wisdom teeth. This marks the first publicly available dataset in this domain. Second, we propose a semantic separation scale-specific feature fusion network named WTNet, which introduces two branches to address the teeth and bones segmentation tasks. In WTNet, we design an Input Enhancement (IE) block and a Teeth-Bones Feature Separation (TBFS) block to solve the feature confusion and semantic-blur problems in our task. Experimental results suggest that WTNet performs better on NKUT compared to previous state-of-the-art segmentation methods (such as TransUnet), with a maximum DSC lead of nearly 16%.

NeurIPS Conference 2024 Conference Paper

Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality

Tianle Zhang
Langtian Ma
Yuchen Yan
Yuchen Zhang
Kai Wang
Yue Yang
Ziyao Guo
Wenqi Shao

Recent text-to-video (T2V) technology advancements, as demonstrated by models such as Gen2, Pika, and Sora, have significantly broadened its applicability and popularity. Despite these strides, evaluating these models poses substantial challenges. Primarily, due to the limitations inherent in automatic metrics, manual evaluation is often considered a superior method for assessing T2V generation. However, existing manual evaluation protocols face reproducibility, reliability, and practicality issues. To address these challenges, this paper introduces the Text-to-Video Human Evaluation (T2VHE) protocol, a comprehensive and standardized protocol for T2V models. The T2VHE protocol includes well-defined metrics, thorough annotator training, and an effective dynamic evaluation module. Experimental results demonstrate that this protocol not only ensures high-quality annotations but can also reduce evaluation costs by nearly 50%. We will open-source the entire setup of the T2VHE protocol, including the complete protocol workflow, the dynamic evaluation component details, and the annotation interface code. This will help communities establish more sophisticated human assessment protocols.

AAAI Conference 2024 Conference Paper

Summarizing Stream Data for Memory-Constrained Online Continual Learning

Jianyang Gu
Kai Wang
Wei Jiang
Yang You

Replay-based methods have proved their effectiveness on online continual learning by rehearsing past samples from an auxiliary memory. With many efforts made on improving training schemes based on the memory, however, the information carried by each sample in the memory remains under-investigated. Under circumstances with restricted storage space, the informativeness of the memory becomes critical for effective replay. Although some works design specific strategies to select representative samples, by only employing a small number of original images, the storage space is still not well utilized. To this end, we propose to Summarize the knowledge from the Stream Data (SSD) into more informative samples by distilling the training characteristics of real images. Through maintaining the consistency of training gradients and relationship to the past tasks, the summarized samples are more representative for the stream data compared to the original images. Extensive experiments are conducted on multiple online continual learning benchmarks to support that the proposed SSD method significantly enhances the replay effects. We demonstrate that with limited extra computational overhead, SSD provides more than 3% accuracy boost for sequential CIFAR-100 under extremely restricted memory buffer. Code in https://github.com/vimar-gu/SSD.

NeurIPS Conference 2024 Conference Paper

Token Merging for Training-Free Semantic Binding in Text-to-Image Synthesis

Taihang Hu
Linxuan Li
Joost van de Weijer
Hongcheng Gao
Fahad S. Khan
Jian Yang
Ming-Ming Cheng
Kai Wang

Although text-to-image (T2I) models exhibit remarkable generation capabilities, they frequently fail to accurately bind semantically related objects or attributes in the input prompts; a challenge termed semantic binding. Previous approaches either involve intensive fine-tuning of the entire T2I model or require users or large language models to specify generation layouts, adding complexity. In this paper, we define semantic binding as the task of associating a given object with its attribute, termed attribute binding, or linking it to other related sub-objects, referred to as object binding. We introduce a novel method called Token Merging (ToMe), which enhances semantic binding by aggregating relevant tokens into a single composite token. This ensures that the object, its attributes and sub-objects all share the same cross-attention map. Additionally, to address potential confusion among main objects with complex textual prompts, we propose end token substitution as a complementary strategy. To further refine our approach in the initial stages of T2I generation, where layouts are determined, we incorporate two auxiliary losses, an entropy loss and a semantic binding loss, to iteratively update the composite token to improve the generation integrity. We conducted extensive experiments to validate the effectiveness of ToMe, comparing it against various existing methods on the T2I-CompBench and our proposed GPT-4o object binding benchmark. Our method is particularly effective in complex scenarios that involve multiple objects and attributes, which previous methods often fail to address. The code will be publicly available at https://github.com/hutaihang/ToMe

IROS Conference 2024 Conference Paper

ν-DBA: Neural Implicit Dense Bundle Adjustment Enables Image-Only Driving Scene Reconstruction

Yunxuan Mao
Bingqi Shen
Yifei Yang
Kai Wang
Rong Xiong
Yiyi Liao
Yue Wang 0020

The joint optimization of the sensor trajectory and 3D map is a crucial characteristic of bundle adjustment (BA), essential for autonomous driving. This paper presents ν-DBA, a novel framework implementing geometric dense bundle adjustment (DBA) using 3D neural implicit surfaces for map parametrization, which optimizes both the map surface and trajectory poses using geometric error guided by dense optical flow prediction. Additionally, we fine-tune the optical flow model with per-scene self-supervision to further improve the quality of the dense mapping. Our experimental results on multiple driving scene datasets demonstrate that our method achieves superior trajectory optimization and dense reconstruction accuracy. We also investigate the influences of photometric error and different neural geometric priors on the performance of surface reconstruction and novel view synthesis. Our method stands as a significant step towards leveraging neural implicit representations in dense bundle adjustment for more accurate trajectories and detailed environmental mapping.

JBHI Journal 2023 Journal Article

Development of Prognostic Biomarkers by TMB-Guided WSI Analysis: A Two-Step Approach

Xiangyu Liu
Zhenyu Liu
Ye Yan
Kai Wang
Aodi Wang
Xiongjun Ye
Liwei Wang
Wei Wei

The rapid development of computational pathology has brought new opportunities for prognosis prediction using histopathological images. However, the existing deep learning frameworks lack exploration of the relationship between images and other prognostic information, resulting in poor interpretability. Tumor mutation burden (TMB) is a promising biomarker for predicting the survival outcomes of cancer patients, but its measurement is costly. Its heterogeneity may be reflected in histopathological images. Here, we report a two-step framework for prognostic prediction using whole-slide images (WSIs). First, the framework adopts a deep residual network to encode the phenotype of WSIs and classifies patient-level TMB by the deep features after aggregation and dimensionality reduction. Then, the patients' prognosis is stratified by the TMB-related information obtained during the classification model development. Deep learning feature extraction and TMB classification model construction are performed on an in-house dataset of 295 Haematoxylin & Eosin stained WSIs of clear cell renal cell carcinoma (ccRCC). The development and evaluation of prognostic biomarkers are performed on The Cancer Genome Atlas-Kidney ccRCC (TCGA-KIRC) project with 304 WSIs. Our framework achieves good performance for TMB classification with an area under the receiver operating characteristic curve (AUC) of 0.813 on the validation set. Through survival analysis, our proposed prognostic biomarkers can achieve significant stratification of patients' overall survival (P < 0.05) and outperform the original TMB signature in risk stratification of patients with advanced disease. The results indicate the feasibility of mining TMB-related information from WSI to achieve stepwise prognosis prediction.

NeurIPS Conference 2023 Conference Paper

Does Graph Distillation See Like Vision Dataset Counterpart?

Beining Yang
Kai Wang
Qingyun Sun
Cheng Ji
Xingcheng Fu
Hao Tang
Yang You
Jianxin Li

Training on large-scale graphs has achieved remarkable results in graph representation learning, but its cost and storage have attracted increasing concerns. Existing graph condensation methods primarily focus on optimizing the feature matrices of condensed graphs while overlooking the impact of the structure information from the original graphs. To investigate the impact of the structure information, we conduct analysis from the spectral domain and empirically identify substantial Laplacian Energy Distribution (LED) shifts in previous works. Such shifts lead to poor performance in cross-architecture generalization and specific tasks, including anomaly detection and link prediction. In this paper, we propose a novel Structure-broadcasting Graph Dataset Distillation (SGDD) scheme for broadcasting the original structure information to the generation of the synthetic one, which explicitly prevents overlooking the original structure information. Theoretically, the synthetic graphs by SGDD are expected to have smaller LED shifts than previous works, leading to superior performance in both cross-architecture settings and specific tasks. We validate the proposed SGDD across 9 datasets and achieve state-of-the-art results on all of them: for example, on the YelpChi dataset, our approach maintains 98.6% test accuracy of training on the original graph dataset with 1,000 times saving on the scale of the graph. Moreover, we empirically observe 17.6% to 31.4% reductions in LED shift across the 9 datasets. Extensive experiments and analysis verify the effectiveness and necessity of the proposed designs. The code will be made public.

NeurIPS Conference 2023 Conference Paper

Dynamic Prompt Learning: Addressing Cross-Attention Leakage for Text-Based Image Editing

Kai Wang
Fei Yang
Shiqi Yang
Muhammad Atif Butt
Joost van de Weijer

Large-scale text-to-image generative models have been a ground-breaking development in generative AI, with diffusion models showing their astounding ability to synthesize convincing images following an input text prompt. The goal of image editing research is to give users control over the generated images by modifying the text prompt. Current image editing techniques are susceptible to unintended modifications of regions outside the targeted area, such as on the background or on distractor objects which have some semantic or visual relationship with the targeted object. According to our experimental findings, inaccurate cross-attention maps are at the root of this problem. Based on this observation, we propose Dynamic Prompt Learning (DPL) to force cross-attention maps to focus on correct noun words in the text prompt. By updating the dynamic tokens for nouns in the textual input with the proposed leakage repairment losses, we achieve fine-grained image editing over particular objects while preventing undesired changes to other image regions. Our method DPL, based on the publicly available Stable Diffusion, is extensively evaluated on a wide range of images, and consistently obtains superior results both quantitatively (CLIP score, Structure-Dist) and qualitatively (on user-evaluation). We show improved prompt editing results for Word-Swap, Prompt Refinement, and Attention Re-weighting, especially for complex multi-object scenes.

NeurIPS Conference 2023 Conference Paper

Expanding Small-Scale Datasets with Guided Imagination

Yifan Zhang
Daquan Zhou
Bryan Hooi
Kai Wang
Jiashi Feng

The power of DNNs relies heavily on the quantity and quality of training data. However, collecting and annotating data on a large scale is often expensive and time-consuming. To address this issue, we explore a new task, termed dataset expansion, aimed at expanding a ready-to-use small dataset by automatically creating new labeled samples. To this end, we present a Guided Imagination Framework (GIF) that leverages cutting-edge generative models like DALL-E2 and Stable Diffusion (SD) to "imagine" and create informative new data from the input seed data. Specifically, GIF conducts data imagination by optimizing the latent features of the seed data in the semantically meaningful space of the prior model, resulting in the creation of photo-realistic images with new content. To guide the imagination towards creating informative samples for model training, we introduce two key criteria, i.e., class-maintained information boosting and sample diversity promotion. These criteria are verified to be essential for effective dataset expansion: GIF-SD obtains 13.5% higher model accuracy on natural image datasets than unguided expansion with SD. With these essential criteria, GIF successfully expands small datasets in various scenarios, boosting model accuracy by 36.9% on average over six natural image datasets and by 13.5% on average over three medical datasets. The source code is available at https://github.com/Vanint/DatasetExpansion.

TIST Journal 2023 Journal Article

GNN-based Advanced Feature Integration for ICS Anomaly Detection

Shuaiyi L(y)u
Kai Wang
Yuliang Wei
Hongri Liu
Qilin Fan
Bailing Wang

Recent adversaries targeting the Industrial Control Systems (ICSs) have started exploiting their sophisticated inherent contextual semantics such as the data associativity among heterogeneous field devices. In light of the subtlety rendered in these semantics, anomalies triggered by such interactions tend to be extremely covert, hence giving rise to extensive challenges in their detection. Driven by the critical demands of securing ICS processes, a Graph-Neural-Network (GNN) based method is presented to tackle these subtle hostilities by leveraging an ICS’s advanced contextual features refined from a universal perspective, rather than exclusively following GNN’s conventional local aggregation paradigm. Specifically, we design and implement the Graph Sample-and-Integrate Network (GSIN), a general chained framework performing node-level anomaly detection via advanced feature integration, which combines a node’s local awareness with the graph’s prominent global properties extracted via process-oriented pooling. The proposed GSIN is evaluated on multiple well-known datasets with different kinds of integration configurations, and results demonstrate its superiority consistently on not only anomaly detection performance (e.g., F1 score and AUPRC) but also runtime efficiency over recent representative baselines.

AAMAS Conference 2023 Conference Paper

Modeling Robustness in Decision-Focused Learning as a Stackelberg Game

Sonja Johnson-Yu
Kai Wang
Jessie Finocchiaro
Aparna Taneja
Milind Tambe

Predict-then-optimize is a common paradigm for optimization tasks situated in incomplete informational settings, in which an agent estimates missing parameters and then optimizes over these predicted parameters. One proposed improvement to this predict-then-optimize framework is decision-focused learning, which establishes an end-to-end learning pipeline, allowing a predictive model to be tailored to the particular optimization task. The behavior of this predict-then-optimize framework in the presence of noise, however, is not well-understood. This is problematic because many data collection and annotation systems are inherently noisy, and the introduction of such noise could lead to poor downstream optimization. In this work, we aim to present results on robustness to label noise in decision-focused learning and traditional predict-then-optimize tasks using a Stackelberg game as the underlying framework of explanation. Our results suggest that playing the Stackelberg game in anticipation of label noise yields robustness in the predict-then-optimize framework at large, and that the optimal decision-focused learning Stackelberg solution continues to outperform the optimal traditional predict-then-optimize Stackelberg solution.

AAAI Conference 2023 Conference Paper

Optimistic Whittle Index Policy: Online Learning for Restless Bandits

Kai Wang
Lily Xu
Aparna Taneja
Milind Tambe

Restless multi-armed bandits (RMABs) extend multi-armed bandits to allow for stateful arms, where the state of each arm evolves restlessly with different transitions depending on whether that arm is pulled. Solving RMABs requires information on transition dynamics, which are often unknown upfront. To plan in RMAB settings with unknown transitions, we propose the first online learning algorithm based on the Whittle index policy, using an upper confidence bound (UCB) approach to learn transition dynamics. Specifically, we estimate confidence bounds of the transition probabilities and formulate a bilinear program to compute optimistic Whittle indices using these estimates. Our algorithm, UCWhittle, achieves sublinear O(H \sqrt{T log T}) frequentist regret to solve RMABs with unknown transitions in T episodes with a constant horizon H. Empirically, we demonstrate that UCWhittle leverages the structure of RMABs and the Whittle index policy solution to achieve better performance than existing online learning baselines across three domains, including one constructed from a real-world maternal and childcare dataset.
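As a rough illustration of the confidence-bound step mentioned in this abstract, the sketch below estimates a lower and upper bound on one unknown transition probability from observed counts. It assumes a Hoeffding-style radius; the abstract does not state the radius UCWhittle actually uses, and the function name `transition_confidence_bounds` and the parameter `delta` are hypothetical. The bilinear program that turns such bounds into optimistic Whittle indices is not shown.

```python
import math


def transition_confidence_bounds(successes, visits, delta=0.05):
    """Return (lower, upper) bounds on an unknown transition probability.

    Assumption: a Hoeffding-style radius sqrt(log(2/delta) / (2 * visits)).
    This is an illustrative choice, not the radius from the paper.
    """
    if visits == 0:
        # No data yet: the probability could be anywhere in [0, 1].
        return 0.0, 1.0
    p_hat = successes / visits  # empirical transition frequency
    radius = math.sqrt(math.log(2.0 / delta) / (2.0 * visits))
    # Clip so the interval stays inside the valid probability range.
    return max(0.0, p_hat - radius), min(1.0, p_hat + radius)


# Example: 30 observed transitions to the target state out of 100 visits.
lo, hi = transition_confidence_bounds(successes=30, visits=100)
```

The interval narrows as a state-action pair is visited more often, which is what lets an optimistic planner shift from exploring uncertain arms to exploiting well-estimated ones.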

NeurIPS Conference 2023 Conference Paper

PRIOR: Personalized Prior for Reactivating the Information Overlooked in Federated Learning.

Mingjia Shi
Yuhao Zhou
Kai Wang
Huaizheng Zhang
Shudong Huang
Qing Ye
Jiancheng Lv

Classical federated learning (FL) enables training machine learning models without sharing data for privacy preservation, but heterogeneous data characteristic degrades the performance of the localized model. Personalized FL (PFL) addresses this by synthesizing personalized models from a global model via training on local data. Such a global model may overlook the specific information that the clients have been sampled. In this paper, we propose a novel scheme to inject personalized prior knowledge into the global model in each client, which attempts to mitigate the introduced incomplete information problem in PFL. At the heart of our proposed approach is a framework, the PFL with Bregman Divergence (pFedBreD), decoupling the personalized prior from the local objective function regularized by Bregman divergence for greater adaptability in personalized scenarios. We also relax the mirror descent (RMD) to extract the prior explicitly to provide optional strategies. Additionally, our pFedBreD is backed up by a convergence analysis. Sufficient experiments demonstrate that our method reaches the state-of-the-art performances on 5 datasets and outperforms other methods by up to 3.5% across 8 benchmarks. Extensive analyses verify the robustness and necessity of proposed designs. The code will be made public.

AAMAS Conference 2023 Conference Paper

Restless Multi-Armed Bandits for Maternal and Child Health: Results from Decision-Focused Learning

Shresth Verma
Aditya Mate
Kai Wang
Neha Madhiwalla
Aparna Hegde
Aparna Taneja
Milind Tambe

Mobile Health Awareness programs in underserved communities often suffer from diminishing engagement over time and health workers have to make live service calls to encourage beneficiaries’ participation. Owing to health workers’ limited availability, we consider the optimization problem of scheduling live service calls in a Maternal and Child Health Awareness Program and model it using Restless Multi-Armed Bandits (RMAB). Since the parameters of the RMAB formulation are unknown, a model is learnt to first predict the parameters of the RMAB problem, which is subsequently solved using the Whittle Index algorithm. However, this Predict-then-Optimize framework maximises for the predictive accuracy rather than the quality of the final solution. Decision Focused Learning (DFL) solves this mismatch by integrating the optimization problem in the learning pipeline. Previous works have only shown the applicability of DFL in simulation setting. In collaboration with an NGO, we conduct a large-scale field study consisting of 9000 beneficiaries for 6 weeks and track key engagement metrics in a mobile health awareness program. To the best of our knowledge this is the first real-world study involving Decision Focused Learning. We demonstrate that beneficiaries in the DFL group experience statistically significant reductions in cumulative engagement drop, while those in the Predict-then-Optimize group do not. This establishes the practicality of use of decision focused learning for real world problems. We also demonstrate that DFL learns a better decision boundary between the RMAB actions, and strategically predicts parameters for arms which contribute most to the final decision outcome.

AAAI Conference 2023 Conference Paper

Scalable Decision-Focused Learning in Restless Multi-Armed Bandits with Application to Maternal and Child Health

Kai Wang
Shresth Verma
Aditya Mate
Sanket Shah
Aparna Taneja
Neha Madhiwalla
Aparna Hegde
Milind Tambe

This paper studies restless multi-armed bandit (RMAB) problems with unknown arm transition dynamics but with known correlated arm features. The goal is to learn a model to predict transition dynamics given features, where the Whittle index policy solves the RMAB problems using predicted transitions. However, prior works often learn the model by maximizing the predictive accuracy instead of final RMAB solution quality, causing a mismatch between training and evaluation objectives. To address this shortcoming, we propose a novel approach for decision-focused learning in RMAB that directly trains the predictive model to maximize the Whittle index solution quality. We present three key contributions: (i) we establish differentiability of the Whittle index policy to support decision-focused learning; (ii) we significantly improve the scalability of decision-focused learning approaches in sequential problems, specifically RMAB problems; (iii) we apply our algorithm to a previously collected dataset of maternal and child health to demonstrate its performance. Indeed, our algorithm is the first for decision-focused learning in RMAB that scales to real-world problem sizes.

NeurIPS Conference 2023 Conference Paper

Scenario Diffusion: Controllable Driving Scenario Generation With Diffusion

Ethan Pronovost
Meghana Reddy Ganesina
Noureldin Hendy
Zeyu Wang
Andres Morales
Kai Wang
Nick Roy

Automated creation of synthetic traffic scenarios is a key part of scaling the safety validation of autonomous vehicles (AVs). In this paper, we propose Scenario Diffusion, a novel diffusion-based architecture for generating traffic scenarios that enables controllable scenario generation. We combine latent diffusion, object detection and trajectory regression to generate distributions of synthetic agent poses, orientations and trajectories simultaneously. This distribution is conditioned on the map and sets of tokens describing the desired scenario to provide additional control over the generated scenario. We show that our approach has sufficient expressive capacity to model diverse traffic patterns and generalizes to different geographical regions.

EAAI Journal 2023 Journal Article

Semi-supervised LSTM with historical feature fusion attention for temporal sequence dynamic modeling in industrial processes

Yiyin Tang
Yalin Wang
Chenliang Liu
Xiaofeng Yuan
Kai Wang
Chunhua Yang

In modern industrial processes, the data-driven soft sensor technology has been widely used for the prediction of key quality variables. Due to the importance of dynamics and nonlinearity in industrial process data, deep learning models like long short-term memory (LSTM) network are well suited for temporal sequence dynamic modeling due to their excellent long-term memory function and feature extraction capability. Furthermore, industrial processes generate a large amount of process data with irregular sampling frequencies. However, traditional LSTM cannot fully utilize the process data with irregular sampling frequency and the guidance value of historical data samples for feature learning. To address these issues, a novel semi-supervised LSTM with history feature fusion attention (HFFA-SSLSTM) model is proposed in this paper. First, the semi-supervised learning strategy is implemented in LSTM to fully utilize the unlabeled data and mine the temporal sequence features of labeled samples and unlabeled samples with irregular sampling frequencies. Then, a novel historical feature fusion attention (HFFA) mechanism is developed, which utilizes historical hidden features to learn attention scores for obtaining weighted historical information-related features. Finally, the extracted features are combined to form the soft sensor model to perform time series prediction tasks for key quality variables in industrial processes. The experimental results on the actual industrial hydrocracking data set demonstrate the effectiveness of the proposed HFFA-SSLSTM model and its potential for application in real industrial processes.

AAAI Conference 2023 Conference Paper

Smoothed Online Combinatorial Optimization Using Imperfect Predictions

Kai Wang
Zhao Song
Georgios Theocharous
Sridhar Mahadevan

Smoothed online combinatorial optimization considers a learner who repeatedly chooses a combinatorial decision to minimize an unknown changing cost function with a penalty on switching decisions in consecutive rounds. We study smoothed online combinatorial optimization problems when an imperfect predictive model is available, where the model can forecast the future cost functions with uncertainty. We show that using predictions to plan for a finite time horizon leads to regret dependent on the total predictive uncertainty and an additional switching cost. This observation suggests choosing a suitable planning window to balance between uncertainty and switching cost, which leads to an online algorithm with guarantees on the upper and lower bounds of the cumulative regret. Empirically, our algorithm shows a significant improvement in cumulative regret compared to other baselines in synthetic online distributed streaming problems.
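The objective described in this abstract, a per-round cost plus a penalty on switching decisions between consecutive rounds, can be sketched as follows. This is a minimal illustration under stated assumptions: decisions are modeled as sets, the switching distance is the size of the symmetric difference, and the names `smoothed_cost`, `make_cost`, and `switch_weight` are hypothetical. The paper's actual decision space, switching metric, and planning-window algorithm are not specified here.

```python
def smoothed_cost(decisions, cost_fns, switch_weight=1.0):
    """Total per-round cost plus a penalty for changing decisions.

    Assumption: each decision is a frozenset and the switching distance
    is the size of the symmetric difference between consecutive decisions.
    """
    total = 0.0
    prev = None
    for decision, cost_fn in zip(decisions, cost_fns):
        total += cost_fn(decision)  # cost revealed for this round
        if prev is not None:
            total += switch_weight * len(decision ^ prev)  # switching penalty
        prev = decision
    return total


def make_cost(target):
    """Hypothetical cost: number of chosen items outside the round's target set."""
    return lambda decision: len(decision - target)


targets = [frozenset({0, 1}), frozenset({0, 2}), frozenset({1, 2})]
cost_fns = [make_cost(t) for t in targets]
decisions = [frozenset({0, 1}), frozenset({0, 1}), frozenset({1, 2})]
total = smoothed_cost(decisions, cost_fns)
```

In this toy run the learner keeps its decision in round two, paying a cost of 1 but no switching penalty, and then switches in round three. That is the kind of trade-off a planning window would have to balance.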

NeurIPS Conference 2022 Conference Paper

Attracting and Dispersing: A Simple Approach for Source-free Domain Adaptation

Shiqi Yang
Yaxing Wang
Kai Wang
Shangling Jui
Joost van de Weijer

We propose a simple but effective source-free domain adaptation (SFDA) method. Treating SFDA as an unsupervised clustering problem and following the intuition that local neighbors in feature space should have more similar predictions than other features, we propose to optimize an objective of prediction consistency. This objective encourages local neighborhood features in feature space to have similar predictions while features farther away in feature space have dissimilar predictions, leading to efficient feature clustering and cluster assignment simultaneously. For efficient training, we seek to optimize an upper-bound of the objective resulting in two simple terms. Furthermore, we relate popular existing methods in domain adaptation, source-free domain adaptation and contrastive learning via the perspective of discriminability and diversity. The experimental results prove the superiority of our method, and our method can be adopted as a simple but strong baseline for future research in SFDA. Our method can be also adapted to source-free open-set and partial-set DA which further shows the generalization ability of our method. Code is available in https://github.com/Albert0147/AaD_SFDA.
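The attract-and-disperse idea in this abstract can be illustrated with a small two-term loss over prediction vectors: one term rewards agreement with feature-space neighbors, the other penalizes agreement with everything else. This is only a sketch under assumptions. The dot-product similarity, the `disperse_weight` parameter, and the function names are illustrative choices, and the abstract does not give the exact upper bound the authors optimize.

```python
def dot(p, q):
    """Dot product of two equal-length prediction vectors."""
    return sum(a * b for a, b in zip(p, q))


def attract_disperse_loss(preds, neighbors, disperse_weight=1.0):
    """Two-term prediction-consistency loss (illustrative, not the paper's exact form).

    preds: list of softmax prediction vectors, one per sample.
    neighbors: list of index sets; neighbors[i] holds the feature-space
        neighbors of sample i (assumed to be precomputed elsewhere).
    """
    loss = 0.0
    for i, p_i in enumerate(preds):
        for j, p_j in enumerate(preds):
            if i == j:
                continue
            similarity = dot(p_i, p_j)
            if j in neighbors[i]:
                loss -= similarity  # attract: neighbors should agree
            else:
                loss += disperse_weight * similarity  # disperse: others should differ
    return loss / len(preds)


# Samples 0 and 1 are neighbors and predict the same class; sample 2 differs.
preds = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
neighbors = [{1}, {0}, set()]
loss = attract_disperse_loss(preds, neighbors)
```

Lower values correspond to predictions that agree within neighborhoods and disagree across them. In practice the neighbor sets would come from a nearest-neighbor search over stored features, which this sketch leaves out.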

AAAI Conference 2022 Conference Paper

Coordinating Followers to Reach Better Equilibria: End-to-End Gradient Descent for Stackelberg Games

Kai Wang
Lily Xu
Andrew Perrault
Michael K. Reiter
Milind Tambe

A growing body of work in game theory extends the traditional Stackelberg game to settings with one leader and multiple followers who play a Nash equilibrium. Standard approaches for computing equilibria in these games reformulate the followers’ best response as constraints in the leader’s optimization problem. These reformulation approaches can sometimes be effective, but make limiting assumptions on the followers’ objectives and the equilibrium reached by followers, e.g., uniqueness, optimism, or pessimism. To overcome these limitations, we run gradient descent to update the leader’s strategy by differentiating through the equilibrium reached by followers. Our approach generalizes to any stochastic equilibrium selection procedure that chooses from multiple equilibria, where we compute the stochastic gradient by back-propagating through a sampled Nash equilibrium using the solution to a partial differential equation to establish the unbiasedness of the stochastic gradient. Using the unbiased gradient estimate, we implement the gradient-based approach to solve three Stackelberg problems with multiple followers. Our approach consistently outperforms existing baselines to achieve higher utility for the leader.

YNIMG Journal 2022 Journal Article

Corrigendum to “Laminar perfusion imaging with zoomed arterial spin labeling at 7 Tesla” [NeuroImage volume 245, 2021, 118724]

Xingfeng Shao
Fanhua Guo
Qinyang Shou
Kai Wang
Kay Jann
Lirong Yan
Arthur W. Toga
Peng Zhang

NeurIPS Conference 2022 Conference Paper

Dataset Distillation via Factorization

Songhua Liu
Kai Wang
Xingyi Yang
Jingwen Ye
Xinchao Wang

In this paper, we study dataset distillation (DD) from a novel perspective and introduce a dataset factorization approach, termed HaBa, which is a plug-and-play strategy portable to any existing DD baseline. Unlike conventional DD approaches that aim to produce distilled and representative samples, HaBa explores decomposing a dataset into two components: data Hallucination networks and Bases, where the latter is fed into the former to reconstruct image samples. The flexible combinations between bases and hallucination networks, therefore, equip the distilled data with exponential informativeness gain, which largely increases the representation capability of distilled datasets. To further increase the data efficiency of compression results, we introduce a pair of adversarial contrastive constraints on the resultant hallucination networks and bases, which increase the diversity of generated images and inject more discriminant information into the factorization. Extensive comparisons and experiments demonstrate that our method can yield significant improvement on downstream classification tasks compared with previous state of the arts, while reducing the total number of compressed parameters by up to 65%. Moreover, distilled datasets by our approach also achieve ~10% higher accuracy than baseline methods in cross-architecture generalization. Our code is available at https://github.com/Huage001/DatasetFactorization.

NeurIPS Conference 2022 Conference Paper

Decision-Focused Learning without Decision-Making: Learning Locally Optimized Decision Losses

Sanket Shah
Kai Wang
Bryan Wilder
Andrew Perrault
Milind Tambe

Decision-Focused Learning (DFL) is a paradigm for tailoring a predictive model to a downstream optimization task that uses its predictions in order to perform better on that specific task. The main technical challenge associated with DFL is that it requires being able to differentiate through the optimization problem, which is difficult due to discontinuous solutions and other challenges. Past work has largely gotten around this issue by handcrafting task-specific surrogates to the original optimization problem that provide informative gradients when differentiated through. However, the need to handcraft surrogates for each new task limits the usability of DFL. In addition, there are often no guarantees about the convexity of the resulting surrogates and, as a result, training a predictive model using them can lead to inferior local optima. In this paper, we do away with surrogates altogether and instead learn loss functions that capture task-specific information. To the best of our knowledge, ours is the first approach that entirely replaces the optimization component of decision-focused learning with a loss that is automatically learned. Our approach (a) only requires access to a black-box oracle that can solve the optimization problem and is thus generalizable, and (b) can be convex by construction and so can be easily optimized over. We evaluate our approach on three resource allocation problems from the literature and find that our approach outperforms learning without taking into account task-structure in all three domains, and even hand-crafted surrogates from the literature.

NeurIPS Conference 2022 Conference Paper

Less-forgetting Multi-lingual Fine-tuning

Yuren Mao
Yaobo Liang
Nan Duan
Haobo Wang
Kai Wang
Lu Chen
Yunjun Gao

Multi-lingual fine-tuning (MLF), which fine-tunes a multi-lingual language model (MLLM) with multiple source languages, aims to gain good zero-shot performance on target languages. In MLF, the fine-tuned model tends to fit the source languages while forgetting its cross-lingual knowledge obtained from the pre-training stage. This forgetting phenomenon degenerates the zero-shot performance of MLF, which remains under-explored. To fill this gap, this paper proposes a multi-lingual fine-tuning method, dubbed Less-forgetting Multi-lingual Fine-tuning (LF-MLF). In LF-MLF, we cast multi-lingual fine-tuning as a constrained optimization problem, where the optimization objective is to minimize forgetting, and constraints are reducing the fine-tuning loss. The proposed method has superior zero-shot performance; furthermore, it can achieve the Pareto stationarity. Extensive experiments on Named Entity Recognition, Question Answering and Natural Language Inference back up our theoretical analysis and validate the superiority of our proposals.

EAAI Journal 2021 Journal Article

Common and specific deep feature representation for multimode process monitoring using a novel variable-wise weighted parallel network

Kai Wang
Zhiying Guo
Yalin Wang
Xiaofeng Yuan
Chunhua Yang

Multimodal data are common in industrial processes because of switched operating conditions, varying feedstocks, changed product designs and so on. To guarantee process safety and improve process performance, a variable-wise weighted parallel stacked auto-encoder model is proposed for nonlinear multimode process monitoring. Considering the similarity and difference between multiple operating modes with complex process nonlinearities, mode-common and mode-specific deep features are extracted in parallel with the proposed new model. Since each variable distinctly contributes to the mode-common features, variable-wise weights are designed with an optimal transport distance between modes when the mode-common features are learned. Moreover, different from designing a unified monitoring index for all modes, three asymmetric indices are designed to not only trigger an alarm for an anomaly, but also indicate whether the anomaly is caused by mode-common factors, mode-specific factors or others. Thus, the real-time monitoring results, together with some diagnosis information, are simultaneously presented. A numerical example and a real industry application are used to validate the monitoring efficacy of the proposed model.

EAAI Journal 2021 Journal Article

Deep learning with nonlocal and local structure preserving stacked autoencoder for soft sensor in industrial processes

Chenliang Liu
Yalin Wang
Kai Wang
Xiaofeng Yuan

Deep learning-based soft sensors have been widely used for quality prediction in modern industry. Traditional deep learning models like the stacked autoencoder (SAE) only capture feature representations by minimizing the global reconstruction errors, which causes a loss of the intrinsic geometric structure embedded in the raw data. To address this problem, a nonlocal and local structure preserving stacked autoencoder (NLSP-SAE) is proposed for soft sensors. Different from the original SAE, NLSP-SAE aims to extract meaningful structure-relevant features by establishing a new objective function with a regularizer of the nonlocal and local data structure information. For local structure preserving, NLSP-SAE enforces two adjacent data points to be near each other in the reconstructed space. For nonlocal structure preserving, NLSP-SAE constrains two nonadjacent data points to be far apart from each other. The application on an industrial hydrocracking process demonstrates that NLSP-SAE can improve the prediction accuracy for quality variables.

AAAI Conference 2021 Conference Paper

Dual-Mandate Patrols: Multi-Armed Bandits for Green Security

Lily Xu
Elizabeth Bondi
Fei Fang
Andrew Perrault
Kai Wang
Milind Tambe

Conservation efforts in green security domains to protect wildlife and forests are constrained by the limited availability of defenders (i.e., patrollers), who must patrol vast areas to protect from attackers (e.g., poachers or illegal loggers). Defenders must choose how much time to spend in each region of the protected area, balancing exploration of infrequently visited regions and exploitation of known hotspots. We formulate the problem as a stochastic multi-armed bandit, where each action represents a patrol strategy, enabling us to guarantee the rate of convergence of the patrolling policy. However, a naive bandit approach would compromise short-term performance for long-term optimality, resulting in animals poached and forests destroyed. To speed up performance, we leverage smoothness in the reward function and decomposability of actions. We show a synergy between Lipschitz-continuity and decomposition as each aids the convergence of the other. In doing so, we bridge the gap between combinatorial and Lipschitz bandits, presenting a no-regret approach that tightens existing guarantees while optimizing for short-term performance. We demonstrate that our algorithm, LIZARD, improves performance on real-world poaching data from Cambodia.
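
To make the exploration/exploitation trade-off in this abstract concrete, here is a minimal sketch of a generic stochastic bandit routine (UCB1 on Bernoulli arms). It is not the LIZARD algorithm and uses neither Lipschitz smoothness nor action decomposition; the function name `ucb1`, the arm means and the horizon are illustrative assumptions.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Generic UCB1 on Bernoulli arms (illustrative only, not LIZARD).

    arm_means: hypothetical success probability of each arm.
    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms    # pulls per arm
    totals = [0.0] * n_arms  # summed reward per arm

    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # pull every arm once before using the index
        else:
            # empirical mean plus an exploration bonus that shrinks with pulls
            arm = max(
                range(n_arms),
                key=lambda a: totals[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
    return counts


# Made-up arm means; the best arm should end up with most of the pulls.
counts = ucb1([0.2, 0.5, 0.8], horizon=2000)
```

Early rounds spread pulls across all arms, which is the short-term cost the abstract refers to when it describes a naive bandit approach.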

NeurIPS Conference 2021 Conference Paper

Labeling Trick: A Theory of Using Graph Neural Networks for Multi-Node Representation Learning

Muhan Zhang
Pan Li
Yinglong Xia
Kai Wang
Long Jin

In this paper, we provide a theory of using graph neural networks (GNNs) for multi-node representation learning (where we are interested in learning a representation for a set of more than one node, such as a link). We know that a GNN is designed to learn single-node representations. When we want to learn a node set representation involving multiple nodes, a common practice in previous works is to directly aggregate the single-node representations obtained by a GNN into a joint node set representation. In this paper, we show a fundamental constraint of such an approach, namely the inability to capture the dependence between nodes in the node set, and argue that directly aggregating individual node representations does not lead to an effective joint representation for multiple nodes. Then, we notice that a few previous successful works for multi-node representation learning, including SEAL, Distance Encoding, and ID-GNN, all used node labeling. These methods first label nodes in the graph according to their relationships with the target node set before applying a GNN. Then, the node representations obtained in the labeled graph are aggregated into a node set representation. By investigating their inner mechanisms, we unify these node labeling techniques into a single and most general form: the labeling trick. We prove that with the labeling trick a sufficiently expressive GNN learns the most expressive node set representations, and thus in principle solves any joint learning task over node sets. Experiments on one important two-node representation learning task, link prediction, verified our theory. Our work explains the superior performance of previous node-labeling-based methods, and establishes a theoretical foundation of using GNNs for multi-node representation learning.

YNIMG Journal 2021 Journal Article

Laminar perfusion imaging with zoomed arterial spin labeling at 7 Tesla

Xingfeng Shao
Fanhua Guo
Qinyang Shou
Kai Wang
Kay Jann
Lirong Yan
Arthur W. Toga
Peng Zhang

Laminar fMRI based on BOLD and CBV contrast at ultrahigh magnetic fields has been applied for studying the dynamics of mesoscopic brain networks. However, the quantitative interpretations of BOLD/CBV fMRI results are confounded by different baseline physiology across cortical layers. Here we introduce a novel 3D zoomed pseudo-continuous arterial spin labeling (pCASL) technique at 7T that offers the capability for quantitative measurements of laminar cerebral blood flow (CBF) both at rest and during task activation with high spatial specificity and sensitivity. We found arterial transit time in superficial layers is ∼100 ms shorter than in middle/deep layers revealing the time course of labeled blood flowing from pial arteries to downstream microvasculature. Resting state CBF peaked in the middle layers which is highly consistent with microvascular density measured from human cortex specimens. Finger tapping induced a robust two-peak laminar profile of CBF increases in the superficial (somatosensory and premotor input) and deep (spinal output) layers of M1, while finger brushing task induced a weaker CBF increase in superficial layers (somatosensory input). This observation is highly consistent with reported laminar profiles of CBV activation on M1. We further demonstrated that visuospatial attention induced a predominant CBF increase in deep layers and a smaller CBF increase on top of the lower baseline CBF in superficial layers of V1 (feedback cortical input), while stimulus driven activity peaked in the middle layers (feedforward thalamic input). With the capability for quantitative CBF measurements both at baseline and during task activation, high-resolution ASL perfusion fMRI at 7T provides an important tool for in vivo assessment of neurovascular function and metabolic activities of neural circuits across cortical layers.

NeurIPS Conference 2021 Conference Paper

Learning MDPs from Features: Predict-Then-Optimize for Sequential Decision Making by Reinforcement Learning

Kai Wang
Sanket Shah
Haipeng Chen
Andrew Perrault
Finale Doshi-Velez
Milind Tambe

In the predict-then-optimize framework, the objective is to train a predictive model, mapping from environment features to parameters of an optimization problem, which maximizes decision quality when the optimization is subsequently solved. Recent work on decision-focused learning shows that embedding the optimization problem in the training pipeline can improve decision quality and help generalize better to unseen tasks compared to relying on an intermediate loss function for evaluating prediction quality. We study the predict-then-optimize framework in the context of sequential decision problems (formulated as MDPs) that are solved via reinforcement learning. In particular, we are given environment features and a set of trajectories from training MDPs, which we use to train a predictive model that generalizes to unseen test MDPs without trajectories. Two significant computational challenges arise in applying decision-focused learning to MDPs: (i) large state and action spaces make it infeasible for existing techniques to differentiate through MDP problems, and (ii) the high-dimensional policy space, as parameterized by a neural network, makes differentiating through a policy expensive. We resolve the first challenge by sampling provably unbiased derivatives to approximate and differentiate through optimality conditions, and the second challenge by using a low-rank approximation to the high-dimensional sample-based derivatives. We implement both Bellman-based and policy gradient-based decision-focused learning on three different MDP problems with missing parameters, and show that decision-focused learning performs better in generalization to unseen tasks.

IJCAI Conference 2021 Conference Paper

Neighborhood Intervention Consistency: Measuring Confidence for Knowledge Graph Link Prediction

Kai Wang
Yu Liu
Quan Z. Sheng

Link prediction based on knowledge graph embeddings (KGE) has recently gained considerable momentum. However, existing KGE models suffer from insufficient accuracy and can hardly evaluate the confidence probability of each predicted triple. To fill this critical gap, we propose a novel confidence measurement method based on causal intervention, called Neighborhood Intervention Consistency (NIC). Unlike previous confidence measurement methods that focus on the optimal score in a prediction, NIC actively intervenes in the input entity vector to measure the robustness of the prediction result. The experimental results on ten popular KGE models show that our NIC method can effectively estimate the confidence score of each predicted triple. The top 10% of triples with high NIC confidence can achieve 30% higher accuracy in the state-of-the-art KGE models.

AAAI Conference 2021 Conference Paper

Reinforcement Learning with a Disentangled Universal Value Function for Item Recommendation

Kai Wang
Zhene Zou
Qilin Deng
Jianrong Tao
Runze Wu
Changjie Fan
Liang Chen
Peng Cui

In recent years, there has been great interest, as well as challenges, in applying reinforcement learning (RL) to recommendation systems (RS). In this paper, we summarize three key practical challenges of large-scale RL-based recommender systems: massive state and action spaces, high-variance environment, and the unspecific reward setting in recommendation. All these problems remain largely unexplored in the existing literature and make the application of RL challenging. We develop a model-based reinforcement learning framework, called GoalRec. Inspired by the ideas of world model (model-based), value function estimation (model-free), and goal-based RL, a novel disentangled universal value function designed for item recommendation is proposed. It can generalize to various goals that the recommender may have, and disentangle the stochastic environmental dynamics and high-variance reward signals accordingly. As a part of the value function, free from the sparse and high-variance reward signals, a high-capacity reward-independent world model is trained to simulate complex environmental dynamics under a certain goal. Based on the predicted environmental dynamics, the disentangled universal value function is related to the user’s future trajectory instead of a monolithic state and a scalar reward. We demonstrate the superiority of GoalRec over previous approaches in terms of the above three practical challenges in a series of simulations and a real application.

YNICL Journal 2020 Journal Article

Altered resting-state dynamic functional brain networks in major depressive disorder: Findings from the REST-meta-MDD consortium

Yicheng Long
Hengyi Cao
Chaogan Yan
Xiao Chen
Le Li
Francisco Xavier Castellanos
Tongjian Bai
Qijing Bo

BACKGROUND: Major depressive disorder (MDD) is known to be characterized by altered brain functional connectivity (FC) patterns. However, whether and how the features of dynamic FC would change in patients with MDD are unclear. In this study, we aimed to characterize dynamic FC in MDD using a large multi-site sample and a novel dynamic network-based approach. METHODS: Resting-state functional magnetic resonance imaging (fMRI) data were acquired from a total of 460 MDD patients and 473 healthy controls, as a part of the REST-meta-MDD consortium. Resting-state dynamic functional brain networks were constructed for each subject by a sliding-window approach. Multiple spatio-temporal features of dynamic brain networks, including temporal variability, temporal clustering and temporal efficiency, were then compared between patients and healthy subjects at both global and local levels. RESULTS: ). Corresponding local changes in MDD were mainly found in the default-mode, sensorimotor and subcortical areas. Measures of temporal variability and characteristic temporal path length were significantly correlated with depression severity in patients (corrected p < 0.05). Moreover, the observed between-group differences were robustly present in both first-episode, drug-naïve (FEDN) and non-FEDN patients. CONCLUSIONS: Our findings suggest that excessive temporal variations of brain FC, reflecting abnormal communications between large-scale brain networks over time, may underlie the neuropathology of MDD.
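
The sliding-window idea mentioned in the METHODS can be sketched as follows. This is a toy illustration under stated assumptions, not the consortium's pipeline: it correlates two made-up signals window by window, and the helper names `pearson` and `sliding_window_fc`, the window width and the step are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def sliding_window_fc(x, y, width, step=1):
    """Correlation between two signals in each window of `width` samples."""
    starts = range(0, len(x) - width + 1, step)
    return [pearson(x[s:s + width], y[s:s + width]) for s in starts]


# Made-up signals: they move together in the first window, oppositely in the second.
x = [0, 1, 2, 3, 4, 3, 2, 1]
y = [0, 2, 4, 6, 8, 9, 10, 11]
fc = sliding_window_fc(x, y, width=4, step=4)
```

A single correlation over the whole series would hide the sign flip between the two windows, which is the kind of temporal variation a dynamic analysis is meant to expose.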

NeurIPS Conference 2020 Conference Paper

Automatically Learning Compact Quality-aware Surrogates for Optimization Problems

Kai Wang
Bryan Wilder
Andrew Perrault
Milind Tambe

Solving optimization problems with unknown parameters often requires learning a predictive model to predict the values of the unknown parameters and then solving the problem using these values. Recent work has shown that including the optimization problem as a layer in the model training pipeline results in predictions of the unobserved parameters that lead to higher decision quality. Unfortunately, this process comes at a large computational cost because the optimization problem must be solved and differentiated through in each training iteration; furthermore, it may also sometimes fail to improve solution quality due to non-smoothness issues that arise when training through a complex optimization layer. To address these shortcomings, we learn a low-dimensional surrogate model of a large optimization problem by representing the feasible space in terms of meta-variables, each of which is a linear combination of the original variables. By training a low-dimensional surrogate model end-to-end, and jointly with the predictive model, we achieve: i) a large reduction in training and inference time; and ii) improved performance by focusing attention on the more important variables in the optimization and learning in a smoother space. Empirically, we demonstrate these improvements on a non-convex adversary modeling task, a submodular recommendation task and a convex portfolio optimization task.

YNICL Journal 2020 Journal Article

Biotypes of major depressive disorder: Neuroimaging evidence from resting-state default mode network patterns

Sugai Liang
Wei Deng
Xiaojing Li
Andrew J. Greenshaw
Qiang Wang
Mingli Li
Xiaohong Ma
Tong-Jian Bai

BACKGROUND: Major depressive disorder (MDD) is a heterogeneous disorder associated with aberrant functional connectivity within the default mode network (DMN). This study focused on data-driven identification and validation of potential DMN-pattern-based MDD subtypes to parse heterogeneity of the disorder. METHODS: The sample comprised 1397 participants including 690 patients with MDD and 707 healthy controls (HC) registered from multiple sites based on the REST-meta-MDD Project in China. Baseline resting-state functional magnetic resonance imaging (rs-fMRI) data were recorded for each participant. Discriminative features were selected from DMN between patients and HC. Patient subgroups were defined by K-means and principal component analysis in the multi-site datasets and validated in an independent single-site dataset. Statistical significance of the resultant clustering was confirmed. Demographic and clinical variables were compared between identified patient subgroups. RESULTS: Two MDD subgroups with differing functional connectivity profiles of DMN were identified in the multi-site datasets, and were relatively stable in different validation samples. The predominant dysfunctional connectivity profiles were detected among superior frontal cortex, ventral medial prefrontal cortex, posterior cingulate cortex and precuneus, whereas one subgroup exhibited increases of connectivity (hyperDMN MDD) and another subgroup showed decreases of connectivity (hypoDMN MDD). The hyperDMN subgroup in the discovery dataset had age-related severity of depressive symptoms. Patient subgroups had comparable demographic and clinical symptom variables. CONCLUSIONS: Findings suggest the existence of two neural subtypes of MDD associated with different dysfunctional DMN connectivity patterns, which may provide useful evidence for parsing heterogeneity of depression and be valuable to inform the search for personalized treatment strategies.

TCS Journal 2020 Journal Article

Calibration scheduling with time slot cost

Kai Wang

We study the scheduling problem with calibrations and time slot costs. In this problem, the machine has to be calibrated to run a job and such a calibration only remains valid for a fixed time period of length T, after which it must be recalibrated in order to execute jobs. On the other hand, a certain cost will be incurred when the machine executes a job and such a cost is determined by the time slot that is occupied by the job in the schedule. We consider jobs with release times, deadlines and identical processing times. The objective is to schedule the jobs on a single machine and minimize the total cost while calibrating the machine at most K times. We investigate the structure of the optimal schedule and based on that we propose dynamic programs for different scenarios of the problem. Finally, for another variant of the problem without the consideration of machine calibration, a greedy algorithm is proposed, which is based on matroid theory.

TCS Journal 2020 Journal Article

Facility location games with optional preference

Zhihuai Chen
Ken C.K. Fong
Minming Li
Kai Wang
Hongning Yuan
Yong Zhang

In this paper, we study the optional preference model of the facility location game problem with two heterogeneous facilities on a line. The preference of each agent is one of the two facilities or both facilities, and the cost of each agent is a function of the distances to the facilities that the agent prefers. We consider two cost functions: Minimum Distance and Maximum Distance functions. Aiming at minimizing the maximum cost or the social cost of agents, we propose different strategyproof mechanisms without monetary transfers and derive both lower and upper bounds of the approximation ratios with respect to strategyproof mechanisms. In the variant of Minimum Distance, we propose a 2-approximation deterministic strategyproof mechanism for the maximum cost objective, and prove a lower bound of 4/3, while for the social cost objective we propose a (n/2 + 1)-approximation deterministic strategyproof mechanism and prove a lower bound of 2, also a lower bound of 3/2 for randomized mechanisms. In the variant of Maximum Distance, we propose an optimal deterministic strategyproof mechanism for the maximum cost objective and a 2-approximation deterministic strategyproof mechanism for the social cost objective.

TCS Journal 2020 Journal Article

Flow shop for dual CPUs in dynamic voltage scaling

Vincent Chau
Xin Chen
Ken C.K. Fong
Minming Li
Kai Wang

We study the following flow shop scheduling problem on two processors. We are given n jobs with a common deadline D, where each job j has workload p_{i,j} on processor i and a set of processors which can vary their speed dynamically. Job j can be executed on the second processor if the execution of job j is completed on the first processor. Our objective is to find a feasible schedule such that all jobs are completed by the common deadline D with minimized energy consumption. For this model, we present a linear program for the discrete speed case, where the processor can only run at specific speeds in S = {s_1, s_2, ⋯, s_q} and the job execution order is fixed. We also provide an m^{α−1}-approximation algorithm for the arbitrary order case and for the continuous speed model, where m is the number of processors and α is a parameter of the processor. We then introduce a new variant of the flow shop scheduling problem called the sense-and-aggregate model, motivated by data aggregation in wireless sensor networks where the base station needs to receive data from sensors and then compute a single aggregate result. In this model, the first processor will receive unit size data from sensors and the second processor is responsible for calculating the aggregate result. The second processor can decide when to aggregate and the workload that needs to be done to aggregate x data will be f(x) and another unit size data will be generated as the result of the partial aggregation which will then be used in the next round aggregation. Our objective is to find a schedule such that all data are received and aggregated by the deadline with minimum energy consumption. We present an O(n^5) dynamic programming algorithm when f(x) = x and a greedy algorithm when f(x) = x − 1. Finally, we investigate the performance of the flowshop problem when the order of jobs is fixed by comparing it to the approximation algorithm with an arbitrary order. We show experimentally that the approximation ratio is close to 1 when there are few machines and when there are more jobs.

AAAI Conference 2020 Conference Paper

Interactive Dual Generative Adversarial Networks for Image Captioning

Junhao Liu
Kai Wang
Chunpu Xu
Zhou Zhao
Ruifeng Xu
Ying Shen
Min Yang

Image captioning is usually built on either generation-based or retrieval-based approaches. Both ways have certain strengths but suffer from their own limitations. In this paper, we propose an Interactive Dual Generative Adversarial Network (IDGAN) for image captioning, which mutually combines the retrieval-based and generation-based methods to learn a better image captioning ensemble. IDGAN consists of two generators and two discriminators, where the generation- and retrieval-based generators mutually benefit from each other’s complementary targets that are learned from two dual adversarial discriminators. Specifically, the generation- and retrieval-based generators provide improved synthetic and retrieved candidate captions with informative feedback signals from the two respective discriminators that are trained to distinguish the generated captions from the true captions and assign top rankings to true captions respectively, thus featuring the merits of both retrieval-based and generation-based approaches. Extensive experiments on the MSCOCO dataset demonstrate that the proposed IDGAN model significantly outperforms the compared methods for image captioning.

AAAI Conference 2020 Conference Paper

On the Generation of Medical Question-Answer Pairs

Sheng Shen
Yaliang Li
Nan Du
Xian Wu
Yusheng Xie
Shen Ge
Tao Yang
Kai Wang

Question answering (QA) has achieved promising progress recently. However, answering a question in real-world scenarios like the medical domain is still challenging, due to the requirement of external knowledge and the insufficient quantity of high-quality training data. In the light of these challenges, we study the task of generating medical QA pairs in this paper. With the insight that each medical question can be considered as a sample from the latent distribution of questions given answers, we propose an automated medical QA pair generation framework, consisting of an unsupervised key phrase detector that explores unstructured material for validity, and a generator that involves a multi-pass decoder to integrate structural knowledge for diversity. A series of experiments have been conducted on a real-world dataset collected from the National Medical Licensing Examination of China. Both automatic evaluation and human annotation demonstrate the effectiveness of the proposed method. Further investigation shows that, by incorporating the generated QA pairs for training, significant improvement in terms of accuracy can be achieved for the examination QA system.

AAAI Conference 2020 Conference Paper

PSENet: Psoriasis Severity Evaluation Network

Yi Li
Zhe Wu
Shuang Zhao
Xian Wu
Yehong Kuang
YangTian Yan
Shen Ge
Kai Wang

Psoriasis is a chronic skin disease which affects hundreds of millions of people around the world. This disease cannot be fully cured and requires lifelong care. If the deterioration of Psoriasis is not detected and properly treated in time, it could cause serious complications or even become life-threatening. Therefore, a quantitative measurement that can track the Psoriasis severity is necessary. Currently, PASI (Psoriasis Area and Severity Index) is the most frequently used measurement in clinical practices. However, PASI has the following disadvantages: (1) Time consuming: calculating PASI usually takes more than 30 minutes which poses a heavy burden on dermatologists; and (2) Inconsistency: due to the complexity of PASI calculation, different or even the same dermatologist could give different scores for the same case. To overcome these drawbacks, we propose PSENet which applies deep neural networks to estimate Psoriasis severity based on skin lesion images. Different from typical deep learning frameworks for image processing, PSENet has the following characteristics: (1) PSENet introduces a score refine module which is able to capture the visual features of skin at both coarse and fine-grained granularities; (2) PSENet uses siamese structure in training and accepts pairwise inputs, which reduces the dependency on large amount of training data; and (3) PSENet can not only estimate the severity, but also locate the skin lesion regions from the input image. To train and evaluate PSENet, we work with professional dermatologists from a top hospital and spend years in building a golden dataset. The experimental results show that PSENet can achieve the mean absolute error of 2.21 and the accuracy of 77.87% in pair comparison, outperforming baseline methods. Overall, PSENet not only relieves dermatologists from the dull PASI calculation but also enables patients to track Psoriasis severity in a much more convenient manner.
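
The abstract does not spell out how PASI is computed. As a hedged sketch, the snippet below assumes the standard clinical definition: four body regions weighted 0.1 (head), 0.2 (upper limbs), 0.3 (trunk) and 0.4 (lower limbs), each with erythema, induration and desquamation scored 0 to 4 and an area score of 0 to 6. The function name `pasi` and the input layout are illustrative and are not part of PSENet.

```python
# Region weights from the standard PASI definition (assumed here,
# not stated in the abstract).
WEIGHTS = {"head": 0.1, "upper_limbs": 0.2, "trunk": 0.3, "lower_limbs": 0.4}


def pasi(regions):
    """Compute a PASI score.

    regions maps each region name to a tuple
    (erythema, induration, desquamation, area), with the three
    severity scores in 0..4 and the area score in 0..6.
    """
    total = 0.0
    for name, weight in WEIGHTS.items():
        erythema, induration, desquamation, area = regions[name]
        total += weight * (erythema + induration + desquamation) * area
    return total


# Worst possible case in every region gives the maximum score of 72.
worst_case = {name: (4, 4, 4, 6) for name in WEIGHTS}
max_score = pasi(worst_case)
```

The arithmetic is trivial once the sixteen sub-scores exist; the time cost and the inconsistency the abstract describes come from producing those sub-scores by visual inspection.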

IJCAI Conference 2019 Conference Paper

Adversarial Machine Learning with Double Oracle

Kai Wang

We aim to improve the general adversarial machine learning solution by introducing the double oracle idea from game theory, which is commonly used to solve a sequential zero-sum game, where the adversarial machine learning problem can be formulated as a zero-sum minimax problem between learner and attacker.

AAMAS Conference 2019 Conference Paper

Deep Fictitious Play for Games with Continuous Action Spaces

Nitin Kamra
Umang Gupta
Kai Wang
Fei Fang
Yan Liu
Milind Tambe

Fictitious play has been a classic algorithm to solve two-player adversarial games with discrete action spaces. In this work we develop an approximate extension of fictitious play to two-player games with high-dimensional continuous action spaces. We use generative neural networks to approximate players’ best responses while also learning a differentiable approximate model to the players’ rewards given their actions. Both these networks are trained jointly with gradient-based optimization to emulate fictitious play. We explore our approach in zero-sum games, non zero-sum games and security game domains.
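
For context, the classic discrete-action algorithm that this work extends can be sketched in a few lines. The example below runs ordinary fictitious play on matching pennies; it involves no neural networks or continuous actions, so it is not the paper's method, and the function name `fictitious_play` and the round count are illustrative assumptions.

```python
def fictitious_play(payoff, rounds):
    """Classic fictitious play for a two-player zero-sum matrix game.

    payoff[i][j] is the row player's payoff; the column player receives
    the negative. Returns the empirical action frequencies of both players.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols
    row_counts[0] += 1  # arbitrary initial actions
    col_counts[0] += 1

    for _ in range(rounds - 1):
        # Each player best-responds to the opponent's empirical play so far.
        row_br = max(
            range(n_rows),
            key=lambda i: sum(payoff[i][j] * col_counts[j] for j in range(n_cols)),
        )
        col_br = min(
            range(n_cols),
            key=lambda j: sum(payoff[i][j] * row_counts[i] for i in range(n_rows)),
        )
        row_counts[row_br] += 1
        col_counts[col_br] += 1

    return ([c / rounds for c in row_counts], [c / rounds for c in col_counts])


# Matching pennies: the unique equilibrium mixes both actions equally.
matching_pennies = [[1, -1], [-1, 1]]
row_freq, col_freq = fictitious_play(matching_pennies, rounds=10000)
```

The best-response step enumerates every action, which is only possible when the action set is finite; that is the step the abstract replaces with learned approximations.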

YNICL Journal 2018 Journal Article

A radiomic signature as a non-invasive predictor of progression-free survival in patients with lower-grade gliomas

Xing Liu
Yiming Li
Zenghui Qian
Zhiyan Sun
Kaibin Xu
Kai Wang
Shuai Liu
Xing Fan

OBJECTIVE: The aim of this study was to develop a radiomics signature for prediction of progression-free survival (PFS) in lower-grade gliomas and to investigate the genetic background behind the radiomics signature. METHODS: In this retrospective study, training (n = 216) and validation (n = 84) cohorts were collected from the Chinese Glioma Genome Atlas and the Cancer Genome Atlas, respectively. For each patient, a total of 431 radiomics features were extracted from preoperative T2-weighted magnetic resonance images. A radiomics signature was generated in the training cohort, and its prognostic value was evaluated in both the training and validation cohorts. The genetic characteristics of the group with high-risk scores were identified by radiogenomic analysis, and a nomogram was established for prediction of PFS. RESULTS: There was a significant association between the radiomics signature (including 9 screened radiomics features) and PFS, which was independent of other clinicopathologic factors in both the training (P < 0.001, multivariable Cox regression) and validation (P = 0.045, multivariable Cox regression) cohorts. Radiogenomic analysis revealed that the radiomics signature was associated with the immune response, programmed cell death, cell proliferation, and vasculature development. A nomogram established using the radiomics signature and clinicopathologic risk factors demonstrated high accuracy and good calibration for prediction of PFS in both the training (C-index, 0.684) and validation (C-index, 0.823) cohorts. CONCLUSIONS: PFS can be predicted non-invasively in patients with LGGs by a group of radiomics features that could reflect the biological processes of these tumors.

AAAI Conference 2018 Conference Paper

Automatic Model Selection in Subspace Clustering via Triplet Relationships

Jufeng Yang
Jie Liang
Kai Wang
Yong-Liang Yang
Ming-Ming Cheng

This paper addresses both the model selection (i.e., estimating the number of clusters K) and subspace clustering problems in a unified model. The real data always distribute on a union of low-dimensional sub-manifolds which are embedded in a high-dimensional ambient space. In this regard, the state-of-the-art subspace clustering approaches firstly learn the affinity among samples, followed by a spectral clustering to generate the segmentation. However, arguably, the intrinsic geometrical structures among samples are rarely considered in the optimization process. In this paper, we propose to simultaneously estimate K and segment the samples according to the local similarity relationships derived from the affinity matrix. Given the correlations among samples, we define a novel data structure termed the Triplet, each of which reflects a high relevance and locality among three samples which are aimed to be segmented into the same subspace. While the traditional pairwise distance can be close between inter-cluster samples lying on the intersection of two subspaces, the wrong assignments can be avoided by the hyper-correlation derived from the proposed triplets due to the complementarity of multiple constraints. Sequentially, we propose to greedily optimize a new model selection reward to estimate K according to the correlations between inter-cluster triplets. We simultaneously optimize a fusion reward based on the similarities between triplets and clusters to generate the final segmentation. Extensive experiments on the benchmark datasets demonstrate the effectiveness and robustness of the proposed approach.

AAMAS Conference 2018 Conference Paper

Equilibrium Refinement in Security Games with Arbitrary Scheduling Constraints

Kai Wang
Qingyu Guo
Phebe Vayanos
Milind Tambe
Bo An

Significant research effort in security games has focused on devising strategies that perform well even when the attacker deviates from optimal (rational) behavior. In most of these frameworks, a price needs to be paid to ensure robustness against this unpredictability. However, equilibrium refinement is an attractive alternative to boost solution robustness at no cost even though it has not received as much attention in the security game literature. In this framework, resources are strategically allocated to secure an optimal outcome against a rational adversary while simultaneously protecting other targets to ensure good outcomes against boundedly rational or constrained attackers. Unfortunately, existing approaches for equilibrium refinement in security games cannot effectively address scheduling constraints that arise frequently in real-world applications. In this paper, we aim to fill this gap and make several key contributions. First, we show that existing approaches for equilibrium refinement can fail in the presence of scheduling constraints. Second, we investigate the properties of the best response of the attacker. Third, we leverage these properties to devise novel iterative algorithms to compute the optimally refined equilibrium, with polynomially many calls to an LP oracle for zero-sum games. Finally, we conduct extensive experimental evaluations that showcase i) the superior performance of our approach in the face of a boundedly rational attacker and ii) the attractive scalability properties of our algorithm that can solve realistic-sized instances.

YNICL Journal 2018 Journal Article

MRI features predict p53 status in lower-grade gliomas via a machine-learning approach

Yiming Li
Zenghui Qian
Kaibin Xu
Kai Wang
Xing Fan
Shaowu Li
Tao Jiang
Xing Liu

Background: P53 mutation status is a pivotal biomarker for gliomas. Here, we developed a machine-learning model to predict p53 status in lower-grade gliomas based on radiomic features extracted from conventional magnetic resonance (MR) images. Methods: = 92) set. A total of 431 radiomic features were extracted from each patient. The least absolute shrinkage and selection operator (LASSO) method was used for feature selection and radiomic signature construction. Subsequently, a machine-learning model to predict p53 status was established using the selected features and a Support Vector Machine classifier. The predictive performance of all individual features and the model was calculated using receiver operating characteristic curves in both the training and validation sets. Results: The p53-related radiomic signature was built using the LASSO algorithm; this procedure consisted of four first-order statistics or related wavelet features (including Maximum, Median, Minimum, and Uniformity), a shape and size-based feature (Spherical Disproportion), and ten textural features or related wavelet features (including Correlation, Run Percentage, and Sum Entropy). The prediction accuracies based on the area under the curve were 89.6% in the training set and 76.3% in the validation set, which were better than individual features. Conclusions: These results demonstrate that MR image texture features are predictive of p53 mutation status in lower-grade gliomas. Thus, our procedure can be conveniently used to facilitate presurgical molecular pathological diagnosis.

YNICL Journal 2018 Journal Article

Radiomics analysis allows for precise prediction of epilepsy in patients with low-grade gliomas

Zhenyu Liu
Yinyan Wang
Xing Liu
Yang Du
Zhenchao Tang
Kai Wang
Jingwei Wei
Di Dong

Purpose: To investigate the association between imaging features and low-grade gliomas (LGG) related epilepsy, and to propose a radiomics-based model for the prediction of LGG-associated epilepsy. Methods: This retrospective study consecutively enrolled 286 patients with LGGs (194 in the primary cohort and 92 in the validation cohort). T2-weighted MR images (T2WI) were used to characterize risk factors for LGG-related epilepsy: Tumor location features and 3-D imaging features were determined, following which the interactions between these two kinds of features were analyzed. Elastic net was applied to generate a radiomics signature combining key imaging features associated with the LGG-related epilepsy with the primary cohort, and then a nomogram incorporating radiomics signature and clinical characteristics was developed. The radiomics signature and nomogram were validated in the validation cohort. Results: A total of 475 features associated with LGG-related epilepsy were obtained for each patient. A radiomics signature with eleven selected features allowed for discriminating patients with epilepsy or not was detected, which performed better than location and 3-D imaging features. The nomogram incorporating radiomics signature and clinical characteristics achieved a high degree of discrimination with area under receiver operating characteristic (ROC) curve (AUC) at 0.8769 in the primary cohort and 0.8152 in the validation cohort. The nomogram also allowed for good calibration in the primary cohort. Conclusion: We developed and validated an effective prediction model for LGG-related epilepsy. Our results suggested that radiomics analysis may enable more precise and individualized prediction of LGG-related epilepsy.

AAAI Conference 2018 Conference Paper

Strategic Coordination of Human Patrollers and Mobile Sensors With Signaling for Security Games

Haifeng Xu
Kai Wang
Phebe Vayanos
Milind Tambe

Traditional security games concern the optimal randomized allocation of human patrollers, who can directly catch attackers or interdict attacks. Motivated by the emerging application of utilizing mobile sensors (e.g., UAVs) for patrolling, in this paper we propose the novel Sensor-Empowered security Game (SEG) model which captures the joint allocation of human patrollers and mobile sensors. Sensors differ from patrollers in that they cannot directly interdict attacks, but they can notify nearby patrollers (if any). Moreover, SEGs incorporate mobile sensors’ natural functionality of strategic signaling. On the technical side, we first prove that solving SEGs is NP-hard even in zero-sum cases. We then develop a scalable algorithm SEGer based on the branch-and-price framework with two key novelties: (1) a novel MILP formulation for the slave; (2) an efficient relaxation of the problem for pruning. To further accelerate SEGer, we design a faster combinatorial algorithm for the slave problem, which is provably a constant-approximation to the slave problem in zero-sum cases and serves as a useful heuristic for general-sum SEGs. Our experiments demonstrate the significant benefit of utilizing mobile sensors.

IJCAI Conference 2018 Conference Paper

The Price of Usability: Designing Operationalizable Strategies for Security Games

Sara Marie Mc Carthy
Corine M. Laan
Kai Wang
Phebe Vayanos
Arunesh Sinha
Milind Tambe

We consider the problem of allocating scarce security resources among heterogeneous targets to thwart a possible attack. It is well known that deterministic solutions to this problem being highly predictable are severely suboptimal. To mitigate this predictability, the game-theoretic security game model was proposed which randomizes over pure (deterministic) strategies, causing confusion in the adversary. Unfortunately, such mixed strategies typically involve randomizing over a large number of strategies, requiring security personnel to be familiar with numerous protocols, making them hard to operationalize. Motivated by these practical considerations, we propose an easy to use approach for computing strategies that are easy to operationalize and that bridge the gap between the static solution and the optimal mixed strategy. These strategies only randomize over an optimally chosen subset of pure strategies whose cardinality is selected by the defender, enabling them to conveniently tune the trade-off between ease of operationalization and efficiency using a single design parameter. We show that the problem of computing such operationalizable strategies is NP-hard, formulate it as a mixed-integer optimization problem, provide an algorithm for computing epsilon-optimal equilibria, and an efficient heuristic. We evaluate the performance of our approach on the problem of screening for threats at airport checkpoints and show that the Price of Usability, i.e., the loss in optimality to obtain a strategy that is easier to operationalize, is typically not high.

YNIMG Journal 2017 Journal Article

Dynamic aftereffects in supplementary motor network following inhibitory transcranial magnetic stimulation protocols

Gong-Jun Ji
Fengqiong Yu
Wei Liao
Kai Wang

The supplementary motor area (SMA) is a key node of the motor network. Inhibitory repetitive transcranial magnetic stimulation (rTMS) of the SMA can potentially improve movement disorders. However, the aftereffects of inhibitory rTMS on brain function remain largely unknown. Using a single-blind, crossover within-subject design, we investigated the role of aftereffects with two inhibitory rTMS protocols [1800 pulses of either 1-Hz repetitive stimulation or continuous theta burst stimulation (cTBS)] on the left SMA. A total of 19 healthy volunteers participated in the rTMS sessions on 2 separate days. Firstly, short-term aftereffects were estimated at three levels (functional connectivity, local activity, and network properties) by comparing the resting-state functional magnetic resonance imaging datasets (9 min) acquired before and after each rTMS session. Local activity and network properties were not significantly altered by either protocol. Functional connectivity within the SMA network was increased (in the left paracentral gyrus) by 1-Hz stimulation and decreased (in the left inferior frontal gyrus and SMA/middle cingulate cortex) by cTBS. The subsequent three-way analysis of variance (site×time×protocol) did not show a significant interaction effect or “protocol” main effect, suggesting that the two protocols share an underlying mechanism. Secondly, sliding-window analysis was used to evaluate the dynamic features of aftereffects in the ~29 min after the end of stimulation. Aftereffects were maintained for a maximum of 9.8 and 6.6 min after the 1-Hz and cTBS protocols, respectively. In summary, this study revealed topographical and temporal aftereffects in the SMA network following inhibitory rTMS protocols, providing valuable information for their application in future neuroscience and clinical studies.

YNIMG Journal 2014 Journal Article

Temporal and spectral profiles of stimulus–stimulus and stimulus–response conflict processing

Kai Wang
Qi Li
Ya Zheng
Hongbin Wang
Xun Liu

The ability to detect and resolve conflict is an essential function of cognitive control. Laboratory studies often use stimulus–response-compatibility (SRC) tasks to examine conflict processing in order to elucidate the mechanism and modular organization of cognitive control. Inspired by two influential theories regarding cognitive control, the conflict monitoring theory (Botvinick, Braver, Barch, Carter, & Cohen, 2001) and dimensional overlap taxonomy (Kornblum, Hasbroucq, & Osman, 1990), we explored the temporal and spectral similarities and differences between processing of stimulus–stimulus (S–S) and stimulus–response (S–R) conflicts with event related potential (ERP) and time-frequency measures. We predicted that processing of S–S conflict starts earlier than that of S–R conflict and that the two types of conflict may involve different frequency bands. Participants were asked to perform two parallel SRC tasks, both combining the Stroop task (involving S–S conflict) and Simon task (involving S–R conflict). ERP results showed pronounced SRC effects (incongruent vs. congruent) on N2 and P3 components for both S–S and S–R conflicts. In both tasks, SRC effects of S–S conflict took place earlier than those of S–R conflict. Time-frequency analysis revealed that both types of SRC effects modulated theta and alpha bands, while S–R conflict effects additionally modulated power in the beta band. These results indicated that although S–S and S–R conflict processing shared considerable ERP and time-frequency properties, they differed in temporal and spectral dynamics. We suggest that the modular organization of cognitive control should take both commonality and distinction of S–S and S–R conflict processing into consideration.

IROS Conference 2014 Conference Paper

Visual servoing based trajectory tracking of underactuated water surface robots without direct position measurement

Kai Wang
Yun-Hui Liu 0001
Luyang Li

The trajectory tracking of underactuated water surface robots (or boats, surface vessels, etc.) has been an attractive topic over the past decade, and a lot of controllers are proposed for this challenging problem. However, most of the existing trajectory tracking controllers of the underactuated water surface robots assume the global positions of the robots can be accurately measured. In the working environments of the robots, the global position measurements are sometimes unstable or even unavailable. To avoid the direct position measurement, a new controller is proposed in this paper for the trajectory tracking of underactuated water surface robots by adopting the monocular visual feedback. This controller works on the basis of a novel adaptive algorithm for estimating global position of the robot online using visual feature tracking from a monocular camera, and its orientation and velocity measured by the AHRS (Attitude and Heading Reference System) sensor and visual odometry. It is proved by Lyapunov theory that the proposed adaptive visual servo controller gives rise to the asymptotic trajectory tracking and convergence of the position estimation to the actual position. An experiment is conducted to validate the effectiveness and robust performance of the proposed controller.

ICRA Conference 2013 Conference Paper

Vision-based tracking control of nonholonomic mobile robots without position measurement

Kai Wang
Yun-Hui Liu 0001
Luyang Li

Localization is one of the most crucial and difficult problems for motion control of mobile robots despite tremendous research efforts made for years. This paper presents a new vision-based controller for controlling a nonholonomic mobile robot to track a desired trajectory without directly measuring its position. A novel adaptive estimator is embedded into this new controller to estimate global position of the mobile robot online using natural visual features measured by a vision system, and its orientation and velocity measured by odometry/inertia/magnetic sensors. It is proved by Lyapunov theory that the proposed controller gives rise to asymptotic tracking of a desired trajectory and convergence of the position estimation to the actual position. The experiment is conducted to validate the proposed controller.

IS Journal 2011 Journal Article

Artificial Societies and GPU-Based Cloud Computing for Intelligent Transportation Management

Kai Wang
Zhen Shen

It is challenging to establish accurate mathematical models for complex systems, and experiments on them are generally costly or even impossible, making it difficult to analyze, control, and manage them. The ACP approach provides a way to attack this difficulty. However, with the agent technologies for the A (artificial societies) part, the burdens from computing agent behaviors and the algorithms' evaluating process are usually very heavy. Fortunately, computing hardware is going through a revolution with the development of graphics processing units (GPUs). A single GPU can provide numerous threads running together and is suitable for parallel computing. This article focuses on the C (computational experiments) part of the ACP approach. It explains the advantages of cloud computing and GPUs and presents the architectures of the GPU-based cloud computing for the transportation systems.

IS Journal 2011 Journal Article

Cloud Computing for Agent-Based Urban Transportation Systems

ZhenJiang Li
Cheng Chen
Kai Wang

Agent-based traffic management systems can use the autonomy, mobility, and adaptability of mobile agents to deal with dynamic traffic environments. Cloud computing can help such systems cope with the large amounts of storage and computing resources required to make effective use of traffic strategy agents and mass transport data. This article reviews the history of the development of traffic control and management systems within the evolving computing paradigm and shows the state of traffic control and management systems based on mobile multiagent technology. An intelligent transportation cloud could provide services such as decision support, a standard development environment for traffic management strategy, and so on. Moreover, the cloud can generate, store, manage, test, optimize, and use mobile traffic strategy agents to maximize advantages of cloud computing and agent technology to effectively control and manage urban-traffic systems.