Arrow Research search

Author name cluster

Wei Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

221 papers
2 author rows

Possible papers (221)

JBHI Journal 2026 Journal Article

A 1D Snoring Waveform and 2D Composite Acoustic Feature Graph-Based Multi-Modal Fusion Network for Obstructive Sites Recognition

  • Xia Hu
  • Rui Fang
  • Huiping Luo
  • Jingchun Luo
  • Chen Chen
  • Wei Chen

Accurate localization of obstructive sites in the upper airway is urgently needed, as it is a critical factor in the diagnostic work-up and treatment decision-making for sleep-related breathing disorders. Snoring, as a dynamic acoustic signal, carries rich information about the sites and degree of obstruction in the upper airway, offering a non-invasive, cost-effective solution for obstructive site recognition. However, most existing snoring-based methods for recognizing obstructive sites use only limited information (concentrating mainly on traditional acoustic characteristics or spectrogram features), which may omit dynamic pathological information. Moreover, existing methods proceed from either a one-dimensional (1D) signal or a two-dimensional (2D) image perspective, overlooking complementary information from the other modality. In this paper, a multi-modal framework, which combines a 1D snoring waveform and a 2D Composite Acoustic Feature Graph (CAF-Graph), is proposed. The 1D snoring waveform perceives fine time structure and local patterns, from which neural networks learn high-level discriminative representations. The 2D CAF-Graph is dedicated to emphasizing the dynamic spatio-temporal and physiological-acoustic characteristics of snoring, concatenating acoustic features related to Prosodic, Formant, Spectral, and Cepstral characteristics. Further, a multi-modal fusion network (BMFNet) effectively integrates independent and interactive information between single-modal features, offering a more comprehensive perspective. The recognition task was formulated as a three-class classification problem, including upper (snoring caused by upper-level obstruction), lower (snoring caused by lower-level obstruction), and silence (obstruction without snoring). The proposed method was validated on a clinical dataset collected at the ENT Institute and Department of Otorhinolaryngology, Eye & ENT Hospital, Fudan University, where it reached 81.2% Accuracy, 86.8% Weighted Average Precision, 81.2% Weighted Average Recall, and 82.3% Weighted Average F1-Score. The results demonstrate the effectiveness of multi-modal feature representations for snoring, providing novel insight for obstructive site recognition tasks.

AAAI Conference 2026 Conference Paper

Attention to Threat-Relevant Objects: Reasoning Detection in Autonomous Driving via Multimodal Large Language Models

  • Yulin He
  • Wei Chen
  • Xinbiao Gan
  • Siqi Wang
  • Haotian Wang
  • Yusong Tan

Perceiving threats is an innate human instinct. During driving, humans naturally focus their attention on objects that pose real potential risks. Motivated by this observation, we shift the focus from traditional class-based detection to a novel task termed threat-oriented reasoning detection in autonomous driving. This task aims to localize threat objects and reason about their threat levels from a driver-centric perspective. To support this task, we build a benchmark comprising diverse corner-case scenarios, annotated by multiple experienced drivers to reflect human-aligned threat cognition. Given the reasoning demands of this task, we then explore the capabilities of multi-modal large language models (MLLMs) and introduce two methods based on whether the MLLM supports object detection: 1) For MLLMs lacking detection capability, we introduce ThreatCoT, a plug-and-play training-free method that combines chain-of-thought (CoT) with a visual expert toolchain to support step-by-step reasoning. 2) For MLLMs with detection support, we introduce ThreatReasoner, an end-to-end reinforcement learning (RL)-based method built on the GRPO algorithm, which enables per-object reasoning through a fully unsupervised reward strategy. Both quantitative and qualitative experiments show that our methods can effectively unlock new capabilities of MLLMs in threat-oriented reasoning detection.

AAAI Conference 2026 Conference Paper

AutoLink: Autonomous Schema Exploration and Expansion for Scalable Schema Linking in Text-to-SQL at Scale

  • Ziyang Wang
  • Yuanlei Zheng
  • Zhenbiao Cao
  • Xiaojin Zhang
  • Zhongyu Wei
  • Pei Fu
  • Zhenbo Luo
  • Wei Chen

For industrial-scale text-to-SQL, supplying the entire database schema to Large Language Models (LLMs) is impractical due to context window limits and irrelevant noise. Schema linking, which filters the schema to a relevant subset, is therefore critical. However, existing methods incur prohibitive costs, struggle to trade off recall and noise, and scale poorly to large databases. We present AutoLink, an autonomous agent framework that reformulates schema linking as an iterative, agent-driven process. Guided by an LLM, AutoLink dynamically explores and expands the linked schema subset, progressively identifying necessary schema components without inputting the full database schema. Our experiments demonstrate AutoLink's superior performance, achieving state-of-the-art strict schema linking recall of 97.4% on Bird-Dev and 91.2% on Spider 2.0-Lite, with competitive execution accuracy, i.e., 68.7% EX on Bird-Dev (better than CHESS) and 34.9% EX on Spider 2.0-Lite (ranking 2nd on the official leaderboard). Crucially, AutoLink exhibits exceptional scalability, maintaining high recall, efficient token consumption, and robust execution accuracy on large schemas (e.g., over 3,000 columns) where existing methods severely degrade—making it a highly scalable, high-recall schema-linking solution for industrial text-to-SQL systems.
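The explore-and-expand loop described in the abstract can be sketched in a few lines. This is a toy illustration only: a keyword-overlap heuristic stands in for AutoLink's LLM-guided relevance judgments, and all table names, the schema format, and the stopping rule are hypothetical.

```python
# Hypothetical sketch of iterative schema linking: start from seed tables,
# expand along foreign keys, and stop at a fixed point -- never feeding the
# full schema to the model. (Keyword overlap replaces the LLM for brevity.)

def link_schema(question, schema, foreign_keys, max_rounds=5):
    """Grow a linked-schema subset instead of passing the whole schema."""
    words = set(question.lower().split())
    # Seed: tables whose name or columns overlap with the question.
    linked = {t for t, cols in schema.items()
              if words & ({t.lower()} | {c.lower() for c in cols})}
    for _ in range(max_rounds):
        # Expand: follow foreign keys out of the current subset.
        expansion = ({b for a, b in foreign_keys if a in linked}
                     | {a for a, b in foreign_keys if b in linked})
        if expansion <= linked:      # fixed point: nothing new to add
            break
        linked |= expansion
    return {t: schema[t] for t in linked}

schema = {
    "orders":    ["id", "customer_id", "total"],
    "customers": ["id", "name", "city"],
    "audit_log": ["id", "event"],    # irrelevant noise, never linked
}
fks = [("orders", "customers")]
linked = link_schema("total orders per customer name", schema, fks)
```

The point of the sketch is the scaling behavior: token cost grows with the linked subset, not with the database, which is why the approach survives schemas with thousands of columns.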

AAAI Conference 2026 Conference Paper

Horizontal and Vertical Federated Causal Structure Learning via Higher-order Cumulants

  • Wei Chen
  • Wanyang Gu
  • Linjun Peng
  • Ting Yan
  • Ruichu Cai
  • Zhifeng Hao
  • Kun Zhang

Federated causal discovery aims to uncover causal relationships while protecting data privacy, with significant real-world applications. Existing methods focus on horizontal federated settings where clients share the same variables but have different samples. However, in practice, clients may have different variables, leading to spurious causal relationships. To address this issue, we comprehensively consider causal structure learning methods under both horizontal and vertical federated settings. Interestingly, we find that higher-order cumulants rely solely on the joint distribution of the relevant variables and are useful for solving the above problem in the linear non-Gaussian case. This motivates us to provide identification theories for determining the causal order over observed variables, leveraging the difference in the product of the (cross) cumulants of the specific variables. Based on these theories, we develop a method for learning causal order in both horizontal and vertical federated scenarios. Specifically, we first obtain local (cross) cumulant matrices of observed variables from all participating clients to construct a global cumulant matrix. This global cumulant matrix is then used for recursive source variable identification, ultimately yielding a causal strength matrix over the union of variables from all clients. Our algorithm demonstrates superior performance in experiments on both synthetic and real-world data.
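The key observation, that (cross) cumulants depend only on the joint distribution and can therefore be estimated locally and aggregated, can be illustrated with a toy horizontal-federation example. The statistic, the linear model, and the averaging below are a deliberate simplification for illustration, not the paper's identification algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def cross_cumulant3(x, y):
    """Third-order cross-cumulant cum(x, x, y); for centred (zero-mean)
    samples this equals the third cross-moment E[x^2 y]."""
    x = x - x.mean()
    y = y - y.mean()
    return float(np.mean(x * x * y))

# Linear non-Gaussian pair: x -> y with skewed (exponential) cause.
n = 200_000
x = rng.exponential(1.0, n) - 1.0        # non-Gaussian, roughly zero mean
y = 2.0 * x + rng.uniform(-1.0, 1.0, n)  # independent noise

# Horizontal federation: two clients hold disjoint halves of the samples.
halves = np.array_split(np.arange(n), 2)
local = [cross_cumulant3(x[idx], y[idx]) for idx in halves]
global_est = float(np.mean(local))       # aggregate local statistics only

pooled = cross_cumulant3(x, y)           # what a centralised run would get
```

For this model cum(x, x, y) = 2 · cum(x, x, x) ≈ 4, and the federated aggregate tracks the centralised value without any raw samples leaving a client.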

AAAI Conference 2026 Conference Paper

MHED-SLAM: Multi-Scale Hybrid Encoding-Based Decoupled SLAM

  • Dengfang Feng
  • Wenyang Qin
  • Zhongchen Shi
  • Wei Chen
  • Yanhui Duan
  • Liang Xie
  • Erwei Yin

Neural Radiance Fields (NeRF)-based Visual Simultaneous Localization and Mapping (SLAM) achieves superior scene geometric modeling and robust camera tracking by leveraging neural representations. Existing methods typically rely on multi-resolution hash encoding with truncated signed distance fields (TSDF) to achieve high frame rates. However, unavoidable hash collisions can lead to artifacts, and multi-view color inconsistencies in indoor scenes can result in shape-radiance ambiguity, adversely affecting geometric quality and tracking accuracy. To address these issues, we propose a novel Multi-scale Hybrid Encoding-based Decoupled SLAM (MHED-SLAM). First, to mitigate the adverse effects of hash collisions and reduce the number of learnable parameters, we innovatively fuse a coarse-scale hash tri-plane with a fine-scale hash grid within a single latent volume. Second, to enable precise geometric reconstruction and camera tracking, we decouple the reconstruction and rendering processes, independently learning a TSDF field for reconstruction and a density field for rendering. Third, we devise a Symmetric Kullback-Leibler (SKL) strategy based on ray termination distributions to align the probability distributions derived from the TSDF and density fields for their synchronous convergence. Extensive experimental evaluations demonstrate that our approach surpasses state-of-the-art (SOTA) methods, running at a faster frame rate of 20 Hz with fewer parameters while achieving higher tracking and reconstruction accuracy.
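The SKL alignment step can be illustrated on a single ray. The per-ray termination weights below are invented for illustration; the sketch only shows the divergence being minimized, not MHED-SLAM's rendering pipeline.

```python
import numpy as np

def symmetric_kl(p, q, eps=1e-12):
    """Symmetric KL divergence KL(p||q) + KL(q||p) between two discrete
    distributions, here standing in for the TSDF- and density-derived
    ray termination distributions aligned by the SKL strategy."""
    p = np.asarray(p, float) + eps
    q = np.asarray(q, float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

# Hypothetical termination weights along one camera ray (5 samples).
tsdf_w    = [0.05, 0.10, 0.60, 0.20, 0.05]   # from the TSDF field
density_w = [0.10, 0.15, 0.45, 0.20, 0.10]   # from the density field

loss = symmetric_kl(tsdf_w, density_w)       # driven toward 0 in training
```

Because the divergence is symmetric, neither field is treated as the reference; both are pulled toward each other, which matches the "synchronous convergence" goal stated above.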

JBHI Journal 2026 Journal Article

Multi-Task Learning for OSA Detection and Sleep Staging via Multi-Scale Modeling

  • Zhiya Wang
  • Tian Yang
  • Yunfeng Zhu
  • Jia Liu
  • Peter A. Cistulli
  • Wei Chen

Obstructive sleep apnea (OSA) and sleep fragmentation are closely linked physiological phenomena that play crucial roles in the diagnosis and management of sleep disorders. While numerous deep learning models have been developed for either OSA detection or sleep stage classification, few attempts have been made to address both tasks simultaneously. To this end, we propose MT-TASPPNet (Multi-Task Triple Atrous Spatial Pyramid Pooling Network), a unified multi-modal multi-task network that jointly performs automatic OSA event detection and sleep staging. The model integrates modality-specific feature extractors for EEG, ECG, and airflow signals, and employs Atrous Spatial Pyramid Pooling modules in both the modality-specific and shared representation pathways to capture multi-scale temporal-frequency patterns. Additionally, an EOG-guided prior mechanism is incorporated to enhance the discrimination of subtle sleep stages. We use a 3-min input window (1-min target with ±1-min context) and evaluate our method on three large-scale datasets: SHHS1, SHHS2, and Sydney Sleep Biobank. The model achieves OSA detection accuracy between 0.798 and 0.884 (MF1: 0.772 to 0.821), and sleep staging accuracy between 0.776 and 0.834 (MF1: 0.735 to 0.749, Cohen's kappa: 0.697 to 0.770). Notably, the model maintains consistent performance despite data heterogeneity and individual variability. These results validate the stability and adaptability of MT-TASPPNet in clinical settings, paving the way for efficient and scalable multi-task sleep analysis systems.
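The 3-min windowing described above (a 1-min target epoch flanked by 1 min of context on each side) can be sketched directly; the sampling rate and signal below are toy stand-ins.

```python
import numpy as np

fs = 10                               # Hz, a toy sampling rate
signal = np.arange(10 * 60 * fs)      # 10 minutes of samples

def context_window(signal, minute, fs):
    """Return the target minute plus ±1 minute of context (clipped at the
    recording start), mirroring the 3-min input window described above."""
    start = max(0, (minute - 1) * 60 * fs)
    end = (minute + 2) * 60 * fs
    return signal[start:end]

w = context_window(signal, minute=4, fs=fs)   # minutes 3..6 -> 3 min of data
```

Only the central minute carries the label; the surrounding context lets the model see transitions (e.g., an apnea event spilling across epoch boundaries) without inflating the label resolution.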

JBHI Journal 2026 Journal Article

Multidomain Selective Feature Fusion and Stacking Based Ensemble Framework for EEG-Based Neonatal Sleep Stratification

  • Muhammad Irfan
  • Laishuan Wang
  • Husnain Shahid
  • Yan Xu
  • Abdulhamit Subasi
  • Adnan Munawar
  • Noman Mustafa
  • Chen Chen

Employing a minimal array of electroencephalography (EEG) channels for neonatal sleep stage classification is essential for data acquisition in the Internet of Medical Things (IoMT), as single-channel and edge-based features can reduce data transfer and processing requirements, enhancing cost-effectiveness and practicality. In this paper, we evaluate the efficacy of a single channel and the viability of a binary classification scheme for discerning awake and sleep states and transitions to quiet sleep. For this, two datasets of EEG signals for neonate sleep analysis were recorded from Children's Hospital of Fudan University, Shanghai, comprising recordings from 64 and 19 neonates, respectively. From each epoch, a diverse ensemble of 490 features was extracted through a blend of discrete and continuous wavelet transforms (DWT, CWT), spectral statistics, and temporal features. In addition, we introduced an innovative hybrid univariate and ensemble feature selection approach with multidomain feature fusion, and a stacking-based ensemble classifier that outperforms existing work. We achieved 90.37%, 91.13%, and 94.88% accuracy for sleep/awake, quiet sleep/non-quiet sleep, and quiet sleep/awake, respectively. This was corroborated by significant Kappa values of 77.5%, 80.29%, and 89.76%. Using SelectPercentile, we devised three distinct feature selection mechanisms: one using DWT, one with CWT, and another incorporating both spectral and temporal features. Subsequently, SelectKBest was used to determine the most effective features. For our stacked model, we incorporated a trifecta of the ExtraTree model with variable estimators, a Random Forest, and an Artificial Neural Network (ANN) as base classifiers, and for the final prediction phase, ANN was implemented again. The model's performance was evaluated using K-fold and leave-one-subject cross-validation.
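A miniature of the selection-then-stacking pipeline described above can be assembled with scikit-learn, whose `SelectPercentile` and `SelectKBest` the abstract names. The data, feature counts, and hyperparameters below are synthetic stand-ins, not the paper's configuration.

```python
# Hypothetical miniature of the pipeline: univariate feature selection
# (SelectPercentile then SelectKBest) feeding a stacked ensemble with
# ExtraTrees + RandomForest + ANN bases and an ANN meta-learner.
from sklearn.datasets import make_classification
from sklearn.ensemble import (ExtraTreesClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.feature_selection import SelectKBest, SelectPercentile, f_classif
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the 490 extracted features.
X, y = make_classification(n_samples=300, n_features=60, n_informative=8,
                           random_state=0)

stack = StackingClassifier(
    estimators=[
        ("et", ExtraTreesClassifier(n_estimators=50, random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("ann", MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                              random_state=0)),
    ],
    final_estimator=MLPClassifier(hidden_layer_sizes=(16,), max_iter=500,
                                  random_state=0),
)

model = make_pipeline(
    SelectPercentile(f_classif, percentile=50),  # coarse univariate filter
    SelectKBest(f_classif, k=10),                # keep the strongest features
    stack,
)
model.fit(X, y)
acc = model.score(X, y)
```

In the paper the selection stage is run separately per feature domain (DWT, CWT, spectral/temporal) before fusion; the single pipeline here only shows the shape of one such branch.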

AAAI Conference 2026 Conference Paper

Multimodal DeepResearcher: Generating Text-Chart Interleaved Reports from Scratch with Agentic Framework

  • Zhaorui Yang
  • Bo Pan
  • Han Wang
  • Yiyao Wang
  • Xingyu Liu
  • Luoxuan Weng
  • Yingchaojie Feng
  • Haozhe Feng

Visualizations play a crucial part in effective communication of concepts and information. Recent advances in reasoning and retrieval augmented generation have enabled Large Language Models (LLMs) to perform deep research and generate comprehensive reports. Despite this progress, existing deep research frameworks primarily focus on generating text-only content, leaving the automated generation of interleaved texts and visualizations underexplored. This novel task poses key challenges in designing informative visualizations and effectively integrating them with text reports. To address these challenges, we propose Formal Description of Visualization (FDV), a structured textual representation of charts that enables LLMs to learn from and generate diverse, high-quality visualizations. Building on this representation, we introduce Multimodal DeepResearcher, an agentic framework that decomposes the task into four stages: (1) researching, (2) exemplar report textualization, (3) planning and (4) multimodal report generation. For the evaluation of the generated reports, we develop MultimodalReportBench which contains 100 diverse topics as inputs, and a set of dedicated metrics for report and chart evaluation. Extensive experiments across models and evaluation methods demonstrate the effectiveness of Multimodal DeepResearcher. Notably, utilizing the same Claude 3.7 Sonnet model, Multimodal DeepResearcher achieves an 82% overall win rate over the baseline method.

AAAI Conference 2026 Conference Paper

Relation-R1: Progressively Cognitive Chain-of-Thought Guided Reinforcement Learning for Unified Relation Comprehension

  • Lin Li
  • Wei Chen
  • Jiahui Li
  • Kwang-Ting Cheng
  • Long Chen

Recent advances in multi-modal large language models (MLLMs) have significantly improved object-level grounding and region captioning. However, they remain limited in visual relation understanding, struggling even with binary relation detection, let alone N-ary relations involving multiple semantic roles. The core reason is the lack of modeling for structural semantic dependencies among multi-entities, leading to over-reliance on language priors (e.g., defaulting to "person drinks a milk" if a person is merely holding it). To this end, we propose Relation-R1, the first unified relation comprehension framework that explicitly integrates cognitive chain-of-thought (CoT)-guided supervised fine-tuning (SFT) and group relative policy optimization (GRPO) within a reinforcement learning (RL) paradigm. Specifically, we first establish foundational reasoning capabilities via SFT, enforcing structured outputs with thinking processes. Then, GRPO is utilized to refine these outputs via multi-rewards optimization, prioritizing visual-semantic grounding over language-induced biases, thereby improving generalization capability. Furthermore, we investigate the impact of various CoT strategies within this framework, demonstrating that a specific-to-general progressive approach in CoT guidance further improves generalization, especially in capturing synonymous N-ary relations. Extensive experiments on widely-used PSG and SWiG datasets demonstrate that Relation-R1 achieves state-of-the-art performance in both binary and N-ary relation understanding.

AAAI Conference 2025 Conference Paper

A Systematic Exploration of Knowledge Graph Alignment with Large Language Models in Retrieval Augmented Generation

  • Shiyu Tian
  • Shuyue Xing
  • Xingrui Li
  • Yangyang Luo
  • Caixia Yuan
  • Wei Chen
  • Huixing Jiang
  • Xiaojie Wang

Retrieval Augmented Generation (RAG) with Knowledge Graphs (KGs) is an effective way to enhance Large Language Models (LLMs). Due to the natural discrepancy between structured KGs and sequential LLMs, KGs must be linearized to text before being inputted into LLMs, leading to the problem of KG Alignment with LLMs (KGA). However, recent KG+RAG methods only consider KGA as a simple step without comprehensive and in-depth explorations, leaving three essential problems unclear: (1) What are the factors and their effects in KGA? (2) How do LLMs understand KGs? (3) How to improve KG+RAG by KGA? To fill this gap, we conduct systematic explorations on KGA, where we first define the problem of KGA and subdivide it into the graph transformation phase (graph-to-graph) and the linearization phase (graph-to-text). In the graph transformation phase, we study graph features at the node, edge, and full graph levels from low to high granularity. In the linearization phase, we study factors on formats, orders, and templates from structural to token levels. We conduct substantial experiments on 15 typical LLMs and three common datasets. Our main findings include: (1) The centrality of the KG affects the final generation; formats have the greatest impact on KGA; orders are model-dependent, without an optimal order adapting for all models; the templates with special token separators are better. (2) LLMs understand KGs by a unique mechanism, different from processing natural sentences, and separators play an important role. (3) We achieved 7.3% average performance improvements on four common LLMs on the KGQA task by combining the optimal factors to enhance KGA.
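The linearization phase (graph-to-text) studied above is easy to make concrete: the same subgraph can be rendered under different formats, orders, and separators. The triples and templates below are invented for illustration; the `[SEP]`-style rendering echoes the paper's finding that templates with special-token separators work better.

```python
# Toy illustration of KG linearization: one subgraph, two textual formats.
triples = [
    ("Marie Curie", "awarded", "Nobel Prize in Physics"),
    ("Marie Curie", "born_in", "Warsaw"),
]

def as_sentences(ts):
    """Natural-sentence template: one clause per triple."""
    return " ".join(f"{h} {r.replace('_', ' ')} {t}." for h, r, t in ts)

def as_tuples(ts, sep=" | "):
    """Tuple template with special-token separators between triples."""
    return " [SEP] ".join(sep.join(t) for t in ts)

text_a = as_sentences(triples)
text_b = as_tuples(triples)
```

Everything downstream (which format the LLM parses best, whether triple order matters) is then an empirical question over renderings like `text_a` and `text_b`, which is exactly the factor grid the study sweeps.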

AAAI Conference 2025 Conference Paper

Achieving Speed-Accuracy Balance in Vision-based 3D Occupancy Prediction via Geometric-Semantic Disentanglement

  • Yulin He
  • Wei Chen
  • Siqi Wang
  • Tianci Xun
  • Yusong Tan

Occupancy prediction plays a pivotal role in autonomous driving (AD) due to its capabilities of fine-grained 3D perception and general object recognition. However, existing methods often incur high computational costs, which conflict with AD's real-time demand. To this end, we redirect the focus from accuracy only to both accuracy and efficiency. By conducting a head-to-head comparison of existing methods, we find it challenging to balance accuracy and efficiency. We identify a core issue for this challenge: the strong coupling between geometry and semantics. Specifically, the predicted geometric structure (e.g., depth) guides the projection of 2D image features into 3D voxel space, which significantly affects feature discriminability and subsequent semantic learning. To address this issue, we focus on two key aspects: model design and learning strategies. 1) For model design, we propose a dual-branch network that disentangles the representation of geometry and semantics. The voxel branch utilizes a novel re-parameterized large-kernel 3D convolution to refine geometric structure efficiently, while the BEV branch employs temporal fusion and BEV encoding for efficient semantic learning. 2) For learning strategies, we propose to separate geometric learning from semantic learning by the mixup of ground-truth and predicted depths. Our method achieves 39.4% mIoU at 20 FPS on Occ3D-nuScenes, showcasing a state-of-the-art balance between accuracy and efficiency.
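The depth-mixup learning strategy in point 2) can be sketched in a couple of lines. The annealing schedule below is hypothetical; the abstract only states that ground-truth and predicted depths are mixed to separate geometric from semantic learning.

```python
import numpy as np

def mixed_depth(gt, pred, step, total_steps):
    """Blend ground-truth and predicted depth; a linear anneal (assumed
    here for illustration) moves from pure GT early in training, when the
    depth head is unreliable, toward pure prediction at the end."""
    lam = 1.0 - step / total_steps
    return lam * gt + (1.0 - lam) * pred

gt = np.array([5.0, 10.0])     # metres, toy values
pred = np.array([6.0, 9.0])

early = mixed_depth(gt, pred, step=0, total_steps=100)    # pure ground truth
late = mixed_depth(gt, pred, step=100, total_steps=100)   # pure prediction
```

The effect is that early semantic learning sees correctly projected features (from GT depth) while the geometry branch still trains on its own errors, which is the decoupling the paper argues for.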

ICLR Conference 2025 Conference Paper

Beyond Content Relevance: Evaluating Instruction Following in Retrieval Models

  • Jianqun Zhou
  • Yuanlei Zheng
  • Wei Chen
  • Qianqian Zheng
  • Zeyuan Shang
  • Wei Zhang 0185
  • Rui Meng
  • Xiaoyu Shen 0001

Instruction-following capabilities in large language models (LLMs) have progressed significantly, enabling more complex user interactions through detailed prompts. However, retrieval systems have not matched these advances; most of them still rely on traditional lexical and semantic matching techniques that fail to fully capture user intent. Recent efforts have introduced instruction-aware retrieval models, but these primarily focus on intrinsic content relevance, which neglects the importance of customized preferences for broader document-level attributes. This study evaluates the instruction-following capabilities of various retrieval models beyond content relevance, including LLM-based dense retrieval and reranking models. We develop InfoSearch, a novel retrieval evaluation benchmark spanning six document-level attributes: Audience, Keyword, Format, Language, Length, and Source, and introduce novel metrics -- Strict Instruction Compliance Ratio (SICR) and Weighted Instruction Sensitivity Evaluation (WISE) -- to accurately assess the models' responsiveness to instructions. Our findings indicate that although fine-tuning models on instruction-aware retrieval datasets and increasing model size enhance performance, most models still fall short of instruction compliance. We release our dataset and code at https://github.com/EIT-NLP/InfoSearch.

IJCAI Conference 2025 Conference Paper

Causal-aware Large Language Models: Enhancing Decision-Making Through Learning, Adapting and Acting

  • Wei Chen
  • Jiahao Zhang
  • Haipeng Zhu
  • Boyan Xu
  • Zhifeng Hao
  • Keli Zhang
  • Junjian Ye
  • Ruichu Cai

Large language models (LLMs) have shown great potential in decision-making due to the vast amount of knowledge stored within the models. However, these pre-trained models often lack reasoning abilities and are difficult to adapt to new environments, further hindering their application to complex real-world tasks. To address these challenges, inspired by the human cognitive process, we propose Causal-Aware LLMs, which integrate the structural causal model (SCM) into the decision-making process to model, update, and utilize structured knowledge of the environment in a "learning-adapting-acting" paradigm. Specifically, in the learning stage, we first utilize an LLM to extract the environment-specific causal entities and their causal relations to initialize a structured causal model of the environment. Subsequently, in the adapting stage, we update the structured causal model through external feedback about the environment, via an idea of causal intervention. Finally, in the acting stage, Causal-Aware LLMs exploit structured causal knowledge for more efficient policy-making through the reinforcement learning agent. The above processes are performed iteratively to learn causal knowledge, ultimately enabling the causal-aware LLM to achieve a more accurate understanding of the environment and make more efficient decisions. Experimental results across 22 diverse tasks within the open-world game "Crafter" validate the effectiveness of our proposed method.

AAAI Conference 2025 Conference Paper

CognTKE: A Cognitive Temporal Knowledge Extrapolation Framework

  • Wei Chen
  • Yuting Wu
  • Shuhan Wu
  • Zhiyu Zhang
  • Mengqi Liao
  • Youfang Lin
  • Huaiyu Wan

Reasoning future unknowable facts on temporal knowledge graphs (TKGs) is a challenging task, holding significant academic and practical values for various fields. Existing studies exploring explainable reasoning concentrate on modeling comprehensible temporal paths relevant to the query. Yet, these path-based methods primarily focus on local temporal paths appearing in recent times, failing to capture the complex temporal paths in TKG and resulting in the loss of longer historical relations related to the query. Motivated by the Dual Process Theory in cognitive science, we propose a Cognitive Temporal Knowledge Extrapolation framework (CognTKE), which introduces a novel temporal cognitive relation directed graph (TCR-Digraph) and performs interpretable global shallow reasoning and local deep reasoning over the TCR-Digraph. Specifically, the proposed TCR-Digraph is constituted by retrieving significant local and global historical temporal relation paths associated with the query. In addition, CognTKE presents the global shallow reasoner and the local deep reasoner to perform global one-hop temporal relation reasoning (System 1) and local complex multi-hop path reasoning (System 2) over the TCR-Digraph, respectively. The experimental results on four benchmark datasets demonstrate that CognTKE achieves significant improvement in accuracy compared to the state-of-the-art baselines and delivers excellent zero-shot reasoning ability.

AAAI Conference 2025 Conference Paper

Correlation-Attention Masked Temporal Transformer for User Identity Linkage Using Heterogeneous Mobility Data

  • Ziang Yan
  • Xingyu Zhao
  • Hanqing Ma
  • Wei Chen
  • Jianpeng Qi
  • Yanwei Yu
  • Junyu Dong

With the rise of social media and Location-Based Social Networks (LBSN), check-in data across platforms has become crucial for User Identity Linkage (UIL). These data not only reveal users' spatio-temporal information but also provide insights into their behavior patterns and interests. However, cross-platform identity linkage faces challenges like poor data quality, high sparsity, and noise interference, which hinder existing methods from extracting cross-platform user information. To address these issues, we propose the Correlation-Attention Masked Transformer for User Identity Linkage Network (MT-Link), a transformer-based framework to enhance model performance by learning spatio-temporal co-occurrence patterns of cross-platform users. Our model effectively captures spatio-temporal co-occurrence in cross-platform user check-in sequences. It employs a correlation attention mechanism to detect the spatio-temporal co-occurrence between user check-in sequences. Guided by attention weight maps, the model focuses on co-occurrence points while filtering out noise, ultimately improving classification performance. Experimental results show that our model significantly outperforms state-of-the-art baselines, with improvements of 12.92%-17.76% in Macro-F1 and 5.80%-8.38% in Area Under Curve (AUC).

IJCAI Conference 2025 Conference Paper

Credit Assignment and Fine-Tuning Enhanced Reinforcement Learning for Collaborative Spatial Crowdsourcing

  • Wei Chen
  • Yafei Li
  • Baolong Mei
  • Guanglei Zhu
  • Jiaqi Wu
  • Mingliang Xu

Collaborative spatial crowdsourcing leverages distributed workers' collective intelligence to accomplish spatial tasks. A central challenge is to efficiently assign suitable workers to collaborate on these tasks. Although mainstream reinforcement learning (RL) methods have proven effective in task allocation, they face two key obstacles: delayed reward feedback and non-stationary data distributions, both hindering optimal allocation and collaborative efficiency. To address these limitations, we propose CAFE (credit assignment and fine-tuning enhanced), a novel multi-agent RL framework for spatial crowdsourcing. CAFE introduces a credit assignment mechanism that distributes rewards based on workers' contributions and spatiotemporal constraints, coupled with bi-level meta-optimization to jointly optimize credit assignment and RL policy. To handle non-stationary spatial task distributions, CAFE employs an adaptive fine-tuning procedure that efficiently adjusts credit assignment parameters while preserving collaborative knowledge. Experiments on two real-world datasets validate the effectiveness of our framework, demonstrating superior performance in terms of task completion and equitable reward redistribution.

AAAI Conference 2025 Conference Paper

Data with High and Consistent Preference Difference Are Better for Reward Model

  • Qi Lin
  • Hengtong Lu
  • Caixia Yuan
  • Xiaojie Wang
  • Huixing Jiang
  • Wei Chen

Reinforcement Learning from Human Feedback (RLHF) is a commonly used alignment method for Large Language Models (LLMs). This method relies on a reward model trained on a preference dataset to provide scalar rewards. However, human-annotated preference data is often sparse, noisy, and costly to obtain, necessitating more efficient utilization. This paper proposes a new metric for better preference data utilization from both theoretical and empirical perspectives. Starting with the Bradley-Terry model, we compute the Mean Square Error (MSE) between the expected loss and empirical loss of the reward model. Our findings reveal that data with a higher and more consistent preference difference result in lower MSE. We therefore propose the Preference Difference (PD), the reward difference between two samples, as a filter for preference data. Experimental results on three open-source models show that reward models trained on data filtered by PD achieve higher calibrated accuracy, as well as better RLHF alignment performance. The conclusion remains consistent when we extend the experiments and theoretical derivations to implicit reward alignment algorithms, such as Direct Preference Optimization (DPO).
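The PD filter itself is a one-liner once each pair is scored: keep pairs whose chosen-minus-rejected reward gap is large. The rewards and threshold below are invented for illustration; the paper's contribution is the theoretical argument for why such pairs give lower MSE, not this filtering mechanic.

```python
# Hypothetical sketch of Preference-Difference (PD) filtering: score each
# preference pair by the reward gap between the chosen and rejected sample
# and keep only high-PD pairs for reward-model training.

def pd_filter(pairs, threshold):
    """pairs: list of (reward_chosen, reward_rejected) tuples."""
    return [p for p in pairs if p[0] - p[1] >= threshold]

pairs = [
    (1.8, 0.2),   # clear preference: PD = 1.6, kept
    (0.9, 0.8),   # near-tie: PD = 0.1, dropped as uninformative/noisy
    (1.2, -0.4),  # clear preference: PD = 1.6, kept
    (0.5, 0.6),   # annotation likely noisy (PD < 0), dropped
]
kept = pd_filter(pairs, threshold=1.0)
```

Under the Bradley-Terry model the label probability is sigmoid(PD), so a large, consistent PD means near-deterministic labels, which is the intuition behind the lower-MSE result.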

IROS Conference 2025 Conference Paper

Data-driven Visual Servoing of Flexible Continuum Robots in Constrained Environments

  • Wei Chen
  • Haiwen Wu
  • Xiyue Dong
  • Bohan Yang 0005
  • Yun-Hui Liu

Flexible continuum robots operating in constrained and dynamic environments face significant challenges, especially when interacting with uncertain and potentially unknown conditions. Traditional model-based methods face significant difficulties due to the inherent nonlinearities and uncertainties in robot dynamics, as well as the complexities introduced by environmental interactions. This work presents a new data-driven, model-free control strategy for flexible continuum robots operating in constrained environments, leveraging Lie bracket approximations to achieve effective regulation. The method enables effective visual servoing without requiring explicit kinematic or dynamic models, making it highly adaptable to diverse scenarios where environmental constraints and robot deformation impact system performance. Additionally, it does not rely on initial state estimation, further enhancing its suitability for dynamic, uncertain environments. The effectiveness of the proposed method is validated through simulations and experiments, showing enhanced robustness and adaptability in real-time control scenarios.

AAAI Conference 2025 Conference Paper

Debiased Active Learning with Variational Gradient Rectifier

  • Weiguo Chen
  • Changjian Wang
  • Shijun Li
  • Kele Xu
  • Yanru Bai
  • Wei Chen
  • Shanshan Li

The strategy of selecting "most informative" hard samples in active learning has proven a boon for alleviating the challenges of few-shot learning and costly data annotation in deep learning. However, this very preference towards hard samples engenders bias issues, thereby impeding the full potential of active learning. There has been an increasing trend toward mitigating this stubborn problem, yet most approaches neglect the quantification of bias itself and the direct rectification of dynamically evolving biases. Revisiting the bias issue, this paper presents an active learning approach based on the Variational Gradient Rectifier (VaGeRy). First, we employ variational methods to quantify bias at the level of latent state representations. Then, harnessing historical training dynamics, we introduce Uncertainty Consistency Regularization and Fluctuation Restriction, which asynchronously iterate to rectify gradient backpropagation. Extensive experiments demonstrate that our proposed methodology effectively counteracts bias phenomena in the majority of active learning scenarios.

NeurIPS Conference 2025 Conference Paper

Decoupling Contrastive Decoding: Robust Hallucination Mitigation in Multimodal Large Language Models

  • Wei Chen
  • Xin Yan
  • Bin Wen
  • Fan Yang
  • Tingting Gao
  • Di Zhang
  • Long Chen

Although multimodal large language models (MLLMs) exhibit remarkable reasoning capabilities on complex multimodal understanding tasks, they still suffer from the notorious 'hallucination' issue: generating outputs misaligned with obvious visual or factual evidence. Currently, training-based solutions, like direct preference optimization (DPO), leverage paired preference data to suppress hallucinations. However, they risk sacrificing general reasoning capabilities due to likelihood displacement. Meanwhile, training-free solutions, like contrastive decoding, achieve this goal by subtracting the estimated hallucination pattern from a distorted input. Yet, these handcrafted perturbations (e.g., adding noise to images) may poorly capture authentic hallucination patterns. To avoid these weaknesses of existing methods, and realize "robust" hallucination mitigation (i.e., maintaining general reasoning performance), we propose a novel framework: Decoupling Contrastive Decoding (DCD). Specifically, DCD decouples the learning of positive and negative samples in preference datasets, and trains separate positive and negative image projections within the MLLM. The negative projection implicitly models real hallucination patterns, which enables vision-aware negative images in the contrastive decoding inference stage. Our DCD alleviates likelihood displacement by avoiding pairwise optimization and generalizes robustly without handcrafted degradation. Extensive ablations across hallucination benchmarks and general reasoning tasks demonstrate the effectiveness of DCD, i.e., it matches DPO's hallucination suppression while preserving general capabilities and outperforms handcrafted contrastive decoding methods.
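For readers unfamiliar with the contrastive-decoding baseline the abstract contrasts against, the core logit arithmetic can be sketched in a few lines. Everything below (the function name, toy logits, and the mixing weight `alpha`) is an illustrative assumption, not DCD's actual formulation:

```python
import numpy as np

def contrastive_decode(logits_pos, logits_neg, alpha=0.5):
    # Generic contrastive decoding: amplify the standard logits and subtract
    # the logits estimated under the hallucination-prone (negative) branch.
    return (1 + alpha) * logits_pos - alpha * logits_neg

# Toy 4-token vocabulary: the negative branch strongly favors token 2,
# suggesting token 2 reflects a hallucination pattern.
pos = np.array([1.8, 1.0, 2.0, 0.5])
neg = np.array([0.5, 0.8, 2.5, 0.4])
adjusted = contrastive_decode(pos, neg)
print(int(np.argmax(pos)), int(np.argmax(adjusted)))  # token 2 loses to token 0
```

Per the abstract, DCD's contribution is to replace the handcrafted `logits_neg` (e.g., computed from a noised image) with logits produced by a learned negative image projection.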

AAAI Conference 2025 Conference Paper

DiffDVC: Accurate Event Detection for Dense Video Captioning via Diffusion Models

  • Wei Chen
  • Jianwei Niu
  • Xuefeng Liu
  • Zhendong Wang
  • Shaojie Tang
  • Guogang Zhu

Dense video captioning (DVC) aims to describe multiple events within a video, and its performance is greatly affected by the accuracy of video event detection. Video event detection involves predicting the proposal boundaries (start and end times) and the classification score of each event in a video. Recently, a few methods have applied diffusion models originally designed for image object detection to detect events in DVC. These methods add noise to the ground-truth event proposal boundaries, and subsequently learn the denoising process. However, these methods often overlook the fundamental differences between videos and images. We observe that, whereas in images the important information for object classification is normally around the boundaries of the ground-truth boxes, in videos the key information for event classification is typically centered in the middle of ground-truth event proposals. As a result, the classification module in these existing diffusion models becomes insensitive to boundary changes introduced by the added noise, leading to sub-optimal performance. This paper introduces DiffDVC, an innovative diffusion model for DVC. The core of DiffDVC is a boundary-sensitive detector. The detector increases the sensitivity of the classification module to boundary changes by focusing on frames within a specific range around the start and end times of noisy event proposals. Additionally, this range is dynamically adjusted to suit different event proposals. Comprehensive experiments on ActivityNet-1.3, ActivityNet Captions, and YouCook2 datasets show DiffDVC achieving superior performance.

NeurIPS Conference 2025 Conference Paper

EVODiff: Entropy-aware Variance Optimized Diffusion Inference

  • Shigui Li
  • Wei Chen
  • Delu Zeng

Diffusion models (DMs) excel in image generation but suffer from slow inference and training-inference discrepancies. Although gradient-based solvers for DMs accelerate denoising inference, they often lack theoretical foundations in information transmission efficiency. In this work, we introduce an information-theoretic perspective on the inference processes of DMs, revealing that successful denoising fundamentally reduces conditional entropy in reverse transitions. This principle leads to our key insights into the inference processes: (1) data prediction parameterization outperforms its noise counterpart, and (2) optimizing conditional variance offers a reference-free way to minimize both transition and reconstruction errors. Based on these insights, we propose an entropy-aware variance optimized method for the generative process of DMs, called EVODiff, which systematically reduces uncertainty by optimizing conditional entropy during denoising. Extensive experiments on DMs validate our insights and demonstrate that our method significantly and consistently outperforms state-of-the-art (SOTA) gradient-based solvers. For example, compared to DPM-Solver++, EVODiff reduces the reconstruction error by up to 45.5% (FID improves from 5.10 to 2.78) at 10 function evaluations (NFE) on CIFAR-10, cuts the NFE cost by 25% (from 20 to 15 NFE) for high-quality samples on ImageNet-256, and improves text-to-image generation while reducing artifacts. Code is available at https://github.com/ShiguiLi/EVODiff.

IJCAI Conference 2025 Conference Paper

FCKT: Fine-Grained Cross-Task Knowledge Transfer with Semantic Contrastive Learning for Targeted Sentiment Analysis

  • Wei Chen
  • Zhao Zhang
  • Meng Yuan
  • Kepeng Xu
  • Fuzhen Zhuang

In this paper, we address the task of targeted sentiment analysis (TSA), which involves two sub-tasks, i.e., identifying specific aspects from reviews and determining their corresponding sentiments. Aspect extraction forms the foundation for sentiment prediction, highlighting the critical dependency between these two tasks for effective cross-task knowledge transfer. While most existing studies adopt a multi-task learning paradigm to align task-specific features in the latent space, they predominantly rely on coarse-grained knowledge transfer. Such approaches lack fine-grained control over aspect-sentiment relationships, often assuming uniform sentiment polarity within related aspects. This oversimplification neglects contextual cues that differentiate sentiments, leading to negative transfer. To overcome these limitations, we propose FCKT, a fine-grained cross-task knowledge transfer framework tailored for TSA. By explicitly incorporating aspect-level information into sentiment prediction, our framework achieves fine-grained knowledge transfer, effectively mitigating negative transfer and enhancing task performance. Extensive experiments on three real-world datasets, including comparisons with various baselines and large language models (LLMs), demonstrate the effectiveness of FCKT. The source code is available at https://github.com/cwei01/FCKT.

AAAI Conference 2025 Conference Paper

FedAA: A Reinforcement Learning Perspective on Adaptive Aggregation for Fair and Robust Federated Learning

  • Jialuo He
  • Wei Chen
  • Xiaojin Zhang

Federated Learning (FL) has emerged as a promising approach for privacy-preserving model training across decentralized devices. However, it faces challenges such as statistical heterogeneity and susceptibility to adversarial attacks, which can impact model robustness and fairness. Personalized FL attempts to provide some relief by customizing models for individual clients. However, it falls short in addressing server-side aggregation vulnerabilities. We introduce a novel method called FedAA, which optimizes client contributions via Adaptive Aggregation to enhance model robustness against malicious clients and ensure fairness across participants in non-identically distributed settings. To achieve this goal, we propose an approach involving a Deep Deterministic Policy Gradient-based algorithm for continuous control of aggregation weights, an innovative client selection method based on model parameter distances, and a reward mechanism guided by validation set performance. Empirically, extensive experiments demonstrate that, in terms of robustness, FedAA outperforms the state-of-the-art methods, while maintaining comparable levels of fairness, offering a promising solution to build resilient and fair federated systems.

AAAI Conference 2025 Conference Paper

Gradient-Guided Credit Assignment and Joint Optimization for Dependency-Aware Spatial Crowdsourcing

  • Yafei Li
  • Wei Chen
  • Jinxing Yan
  • Huiling Li
  • Lei Gao
  • Mingliang Xu

Dependency-aware spatial crowdsourcing (DASC) addresses the unique challenges posed by subtask dependencies in spatial task assignment. This paper investigates the task assignment problem in DASC and proposes a two-stage Recommend and Match Optimization (RMO) framework, leveraging multi-agent reinforcement learning for subtask recommendation and a multi-dimensional utility function for subtask matching. The RMO framework primarily addresses two key challenges: credit assignment for subtasks with interdependencies and maintaining overall coherence between subtask recommendation and matching. Specifically, we employ meta-gradients to construct auxiliary policies and establish a gradient connection between two stages, which can effectively address credit assignment and joint optimization of subtask recommendation and matching, while concurrently accelerating network training. We further establish a unified gradient descent process through gradient synchronization across recommendation networks, auxiliary policies, and the matching utility evaluation function. Experiments on two real-world datasets validate the effectiveness and feasibility of our proposed approach.

IROS Conference 2025 Conference Paper

GraphGarment: Learning Garment Dynamics for Bimanual Cloth Manipulation Tasks

  • Wei Chen
  • Kelin Li
  • Dongmyoung Lee
  • Xiaoshuai Chen
  • Rui Zong
  • Petar Kormushev

Physical manipulation of garments is often crucial when performing fabric-related tasks, such as hanging garments. However, due to the deformable nature of fabrics, these operations remain a significant challenge for robots in household, healthcare, and industrial environments. In this paper, we propose GraphGarment, a novel approach that models garment dynamics based on robot control inputs and applies the learned dynamics model to facilitate garment manipulation tasks such as hanging. Specifically, we use graphs to represent the interactions between the robot end-effector and the garment. GraphGarment uses a graph neural network (GNN) to learn a dynamics model that can predict the next garment state given the current state and input action in simulation. To address the substantial sim-to-real gap, we propose a residual model that compensates for garment state prediction errors, thereby improving real-world performance. The garment dynamics model is then applied to a model-based action sampling strategy, where it is utilized to manipulate the garment to a reference pre-hanging configuration for garment-hanging tasks. We conducted four experiments using six types of garments to validate our approach in both simulation and real-world settings. In simulation experiments, GraphGarment achieves better garment state prediction performance, with a prediction error 0.46 cm lower than the best baseline. Our approach also demonstrates improved performance in the garment-hanging simulation experiment—with enhancements of 12%, 24%, and 10%, respectively. Moreover, real-world robot experiments confirm the robustness of sim-to-real transfer, with an error increase of 0.17 cm compared to simulation results. Supplementary material is available at: https://sites.google.com/view/graphgarment.

AAAI Conference 2025 Conference Paper

GRPose: Learning Graph Relations for Human Image Generation with Pose Priors

  • Xiangchen Yin
  • Donglin Di
  • Lei Fan
  • Hao Li
  • Wei Chen
  • Gouxiaofei
  • Yang Song
  • Xiao Sun

Recent methods using diffusion models have made significant progress in human image generation with various control signals such as pose priors. However, existing efforts are still struggling to generate high-quality images with consistent pose alignment, resulting in unsatisfactory output. In this paper, we propose a framework that delves into the graph relations of pose priors to provide control information for human image generation. The main idea is to establish a graph topological structure between the pose priors and latent representation of diffusion models to capture the intrinsic associations between different pose parts. A Progressive Graph Integrator (PGI) is designed to learn the spatial relationships of the pose priors with the graph structure, adopting a hierarchical strategy within an Adapter to gradually propagate information across different pose parts. Besides, a pose perception loss is introduced based on a pretrained pose estimation network to minimize the pose differences. Extensive qualitative and quantitative experiments conducted on the Human-Art and LAION-Human datasets clearly demonstrate that our model can achieve significant performance improvement over the latest benchmark models.

IROS Conference 2025 Conference Paper

Haptic-ACT: Bridging Human Intuition with Compliant Robotic Manipulation via Immersive VR

  • Kelin Li
  • Shubham M. Wagh
  • Nitish Sharma
  • Saksham Bhadani
  • Wei Chen
  • Chang Liu
  • Petar Kormushev

Robotic manipulation is essential for the widespread adoption of robots in industrial and home settings and has long been a focus within the robotics community. Advances in artificial intelligence have introduced promising learning-based methods to address this challenge, with imitation learning emerging as particularly effective. However, efficiently acquiring high-quality demonstrations remains a challenge. In this work, we introduce an immersive VR-based teleoperation setup designed to collect demonstrations from a remote human user. We also propose an imitation learning framework called Haptic Action Chunking with Transformers (Haptic-ACT). To evaluate the platform, we conducted a pick-and-place task and collected 50 demonstration episodes. Results indicate that the immersive VR platform significantly reduces demonstrator fingertip forces compared to systems without haptic feedback, enabling more delicate manipulation. Additionally, evaluations of the Haptic-ACT framework in both the MuJoCo simulator and on a real robot demonstrate its effectiveness in teaching robots more compliant manipulation compared to the original ACT. Additional materials are available at https://sites.google.com/view/hapticact.

NeurIPS Conference 2025 Conference Paper

In-Context Compositional Learning via Sparse Coding Transformer

  • Wei Chen
  • Jingxi Yu
  • Zichen Miao
  • Qiang Qiu

Recent advances in AI, driven by Transformer architectures, have achieved remarkable success in language, vision, and multimodal reasoning, and there is growing demand for them to address in-context compositional learning tasks. In these tasks, models solve the target problems by inferring compositional rules from context examples, which are composed of basic components structured by underlying rules. However, some of these tasks remain challenging for Transformers, which are not inherently designed to handle compositional tasks and offer limited structural inductive bias. Inspired by sparse coding, we propose a reformulation of the attention to enhance its capability for compositional tasks. In sparse coding, data are represented as sparse combinations of basic elements, with the resulting coefficients capturing the underlying compositional structure of the input. Specifically, we reinterpret the standard attention block as projecting inputs into outputs through projections onto two sets of learned dictionary atoms: an encoding dictionary and a decoding dictionary. The encoding dictionary decomposes the input into a set of coefficients, which represent the compositional structure of the input. To enhance structured representations, we impose sparsity on these coefficients. The sparse coefficients are then used to linearly combine the decoding dictionary atoms to generate the output. Furthermore, to assist compositional generalization tasks, we propose estimating the coefficients of the target problem as a linear combination of the coefficients obtained from the context examples. We demonstrate the effectiveness of our approach on the S-RAVEN and RAVEN datasets. For certain compositional generalization tasks, our method maintains performance even when standard Transformers fail, owing to its ability to learn and apply compositional rules.
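The encode-sparsify-decode reading of attention described above can be illustrated with a small NumPy sketch. The dictionary sizes, the top-k sparsification rule, and all names here are assumptions for illustration, not the paper's actual architecture:

```python
import numpy as np

def sparse_coding_attention(x, enc_dict, dec_dict, k=2):
    # Project tokens onto the encoding dictionary to get coefficients.
    coeffs = x @ enc_dict.T                        # (tokens, atoms)
    # Impose sparsity: keep only the k largest-magnitude coefficients per token.
    drop = np.argsort(np.abs(coeffs), axis=1)[:, :-k]
    sparse = coeffs.copy()
    np.put_along_axis(sparse, drop, 0.0, axis=1)
    # Reconstruct the output as a linear combination of decoding-dictionary atoms.
    return sparse @ dec_dict                       # (tokens, dim)

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 8))        # 3 tokens, dimension 8
enc = rng.normal(size=(16, 8))     # 16 encoding-dictionary atoms
dec = rng.normal(size=(16, 8))     # 16 decoding-dictionary atoms
out = sparse_coding_attention(x, enc, dec)
print(out.shape)
```

With k=2, each output row is a combination of at most two decoding atoms; the sparse coefficient vector is what the abstract treats as the compositional structure of the input.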

IJCAI Conference 2025 Conference Paper

Instance Relation Learning Network with Label Knowledge Propagation for Few-shot Multi-label Intent Detection

  • Shiman Zhao
  • Shangyuan Li
  • Wei Chen
  • Tengjiao Wang
  • Jiahui Yao
  • Jiabin Zheng
  • Kam-Fai Wong

Few-shot Multi-label Intent Detection (MID) is crucial for dialogue systems, aiming to detect multiple intents of utterances in low-resource dialogue domains. Previous studies focus on a two-stage pipeline. They first learn representations of utterances with multiple labels and then use a threshold-based strategy to identify multi-label results. However, these methods rely on representation classification and ignore instance relations, leading to error propagation. To solve the above issues, we propose a multi-label joint learning method for few-shot MID in an end-to-end manner, which constructs an instance relation learning network with label knowledge propagation to eliminate error propagation. Concretely, we learn the interaction relations between instances with class information to propagate label knowledge between a few labeled (support set) and unlabeled (query set) instances. With label knowledge propagation, the relation strength between instances directly indicates whether two utterances belong to the same intent for multi-label prediction. Besides, a dual relation-enhanced loss is developed to optimize support- and query-level relation strength to improve performance. Experiments show that we outperform strong baselines by an average of 9.54% AUC and 11.19% Macro-F1 in 1-shot scenarios.

NeurIPS Conference 2025 Conference Paper

Learning with Calibration: Exploring Test-Time Computing of Spatio-Temporal Forecasting

  • Wei Chen
  • Yuxuan Liang

Spatio-temporal forecasting is crucial in many domains, such as transportation, meteorology, and energy. However, real-world scenarios frequently present challenges such as signal anomalies, noise, and distributional shifts. Existing solutions primarily enhance robustness by modifying network architectures or training procedures. Nevertheless, these approaches are computationally intensive and resource-demanding, especially for large-scale applications. In this paper, we explore a novel test-time computing paradigm, namely learning with calibration, ST-TTC, for spatio-temporal forecasting. Through learning with calibration, we aim to capture periodic structural biases arising from non-stationarity during the testing phase and perform real-time bias correction on predictions to improve accuracy. Specifically, we first introduce a spectral-domain calibrator with phase-amplitude modulation to mitigate periodic shift and then propose a flash updating mechanism with a streaming memory queue for efficient test-time computation. ST-TTC effectively bypasses complex training-stage techniques, offering an efficient and generalizable paradigm. Extensive experiments on real-world datasets demonstrate the effectiveness, universality, flexibility and efficiency of our proposed method.
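The test-time calibration loop can be caricatured as follows. This is a drastic simplification under invented names: the paper calibrates in the spectral domain with phase-amplitude modulation, whereas this sketch merely subtracts a running mean of recent errors held in a small streaming queue:

```python
from collections import deque

class TestTimeCalibrator:
    # Keep a bounded streaming memory of recent prediction errors and
    # subtract their mean from each new prediction (toy stand-in only).
    def __init__(self, window=3):
        self.errors = deque(maxlen=window)

    def calibrate(self, pred):
        bias = sum(self.errors) / len(self.errors) if self.errors else 0.0
        return pred - bias

    def update(self, pred, observed):
        self.errors.append(pred - observed)

cal = TestTimeCalibrator()
truth, preds = 1.0, []
for _ in range(5):
    raw = truth + 2.0            # a frozen model with a constant +2 bias
    preds.append(cal.calibrate(raw))
    cal.update(raw, truth)
print(preds[0], preds[-1])       # the bias is corrected after one observation
```

The point mirrored from the abstract is that no training-stage change is needed: the frozen model's predictions are corrected online as ground truth streams in.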

AAAI Conference 2025 Conference Paper

Learnware Specification via Label-Aware Neural Embedding

  • Wei Chen
  • Jun-Xiang Mao
  • Min-Ling Zhang

The learnware paradigm aims to establish a learnware dock system of numerous well-trained machine learning models, enabling users to reuse existing helpful models for their tasks instead of starting from scratch. Each learnware in the system is a well-established model submitted by its developer, associated with a specification generated by the learnware dock system. The specification characterizes the specialty of the corresponding model, enabling it to be identified accurately for new task requirements. Existing specification generation methods are mostly based on the Reduced Kernel Mean Embedding (RKME) technique, which uses the Maximum Mean Discrepancy (MMD) in the Reproducing Kernel Hilbert Space (RKHS) to seek a reduced set that characterizes the model's capabilities. However, existing RKME-based methods mainly utilize feature information to generate specifications by assuming the existence of the ground-truth labeling function, while leaving the label information, which is capable of providing rich semantic characterization, untouched. Furthermore, the quality of the generated specifications heavily relies on the choice of the kernels, which makes it prohibitive to adapt to all real-world scenarios. In this paper, to overcome the above limitations, we propose a novel specification approach named LANE, i.e., Label-Aware Neural Embedding. In LANE, the neural embedding space is utilized to replace the RKHS, effectively circumventing the step of kernel selection and thereby addressing the dependency on kernels in existing RKME-based specification methods. More importantly, LANE uses the label information as additional supervision to enhance the generation process, resulting in specifications of superior quality. Extensive experiments demonstrate the effectiveness and superiority of the proposed LANE approach in the learnware paradigm.

AAAI Conference 2025 Conference Paper

Make Domain Shift a Catastrophic Forgetting Alleviator in Class-Incremental Learning

  • Wei Chen
  • Yi Zhou

In the realm of class-incremental learning (CIL), alleviating the catastrophic forgetting problem is a pivotal challenge. This paper discovers a counter-intuitive observation: by incorporating domain shift into CIL tasks, the forgetting rate is significantly reduced. Our comprehensive studies demonstrate that incorporating domain shift leads to a clearer separation in the feature distribution across tasks and helps reduce parameter interference during the learning process. Inspired by this observation, we propose a simple yet effective method named DisCo to deal with CIL tasks. DisCo introduces a lightweight prototype pool that utilizes contrastive learning to promote distinct feature distributions for the current task relative to previous ones, effectively mitigating interference across tasks. DisCo can be easily integrated into existing state-of-the-art class-incremental learning methods. Experimental results show that incorporating our method into various CIL methods achieves substantial performance improvements, validating the benefits of our approach in enhancing class-incremental learning by separating feature representation and reducing interference. These findings illustrate that DisCo can serve as a robust fashion for future research in class-incremental learning.

NeurIPS Conference 2025 Conference Paper

Mechanism Design for LLM Fine-tuning with Multiple Reward Models

  • Haoran Sun
  • Yurong Chen
  • Siwei Wang
  • Chu Xu
  • Wei Chen
  • Xiaotie Deng

Fine-tuning large language models (LLMs) to aggregate multiple preferences has attracted considerable research attention. With aggregation algorithms advancing, a potential economic scenario arises where fine-tuning services are provided to agents with different preferences. In this context, agents may benefit from strategically misreporting their preferences, but this could harm the aggregation performance. This paper addresses such incentive issues by framing it as a mechanism design problem: an LLM provider determines the fine-tuning objective (training rule) and the pricing scheme (payment rule) for agents. We primarily focus on training rules that maximize social welfare subject to certain regularizations, referred to as SW-Max rules. First, we show that under most circumstances, truthful reporting is sub-optimal with simply a SW-Max rule, thereby highlighting the necessity of payments. Second, we extend the VCG payment to implement SW-Max rules in dominant-strategy incentive compatibility (DSIC). We characterize sufficient conditions for payment equivalence and derive the necessary conditions for a payment rule to implement a SW-Max rule in DSIC and other principles. Third, we demonstrate that our mechanism is approximately DSIC with perturbed input, showcasing its robustness against the inevitable errors in real-world applications. Experiments on real LLM training results further confirm the practical implications of our results.
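The VCG payment rule the paper extends can be shown in its textbook form: choose the welfare-maximizing outcome, then charge each agent the externality it imposes on the others. The discrete outcome set and valuations below are a made-up toy, not the paper's fine-tuning setting:

```python
def vcg(outcomes, valuations):
    # Social-welfare-maximizing choice over a discrete outcome set.
    def best(vals):
        return max(outcomes, key=lambda o: sum(v[o] for v in vals))

    chosen = best(valuations)
    payments = []
    for i in range(len(valuations)):
        others = valuations[:i] + valuations[i + 1:]
        # Externality of agent i: others' welfare at their own optimum
        # minus their welfare at the outcome actually chosen.
        payments.append(sum(v[best(others)] for v in others)
                        - sum(v[chosen] for v in others))
    return chosen, payments

# Agent 0 values outcome A at 3; agent 1 values outcome B at 2.
chosen, payments = vcg(["A", "B"], [{"A": 3, "B": 0}, {"A": 0, "B": 2}])
print(chosen, payments)  # A is chosen; agent 0 pays agent 1's forgone welfare
```

Under this payment rule truthful reporting is a dominant strategy, which is the DSIC property the abstract carries over to SW-Max training rules.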

JBHI Journal 2025 Journal Article

MtRBD: Advancing iRBD Analysis With Multi-Task Learning for Joint Sleep Staging and RSWA Detection

  • Xiaoyu Chen
  • Yiyuan Zhang
  • Zhenning Tang
  • Huijuan Wu
  • Chen Chen
  • Wei Chen

Rapid eye movement (REM) sleep without atonia (RSWA) is a critical diagnostic criterion for REM sleep behavior disorder (RBD). Current clinical practices rely on time-consuming manual annotation, increasing workload, and introducing variability. Existing automated methods, including CNN, LSTM, Transformer, and their combinations, fail to fully exploit the inherent physiological relationship between sleep staging and RSWA detection, while also facing challenges such as severe data imbalance and the complex fusion of multichannel signals. To address these limitations, we propose a multi-task learning framework that jointly optimizes both tasks. Our Multi-scale Information Attention Bottleneck Network (MIABNet) backbone improves multichannel fusion. Building upon MIABNet, the Multi-task RBD Network (MtRBD) implements dynamic feature enhancement, facilitating cross-task information flow while preserving physiological relationships. On the clinical CZ-RBD dataset collected from 30 patients over 59 nights at Shanghai Changzheng Hospital, our framework achieved the accuracy of 83.0% for sleep staging and 93.6% for RSWA detection, with an accuracy of 77.6% for critical RSWA event identification, outperforming single-task methods in cross-subject validation. Through modality selection and Grad-CAM visualization, we enhance clinical interpretability, providing reliable support for early detection of RBD and related neurodegenerative diseases.

IJCAI Conference 2025 Conference Paper

Multi-view Clustering via Multi-granularity Ensemble

  • Jie Yang
  • Wei Chen
  • Feng Liu
  • Peng Zhou
  • Zhongli Wang
  • Xinyan Liang
  • Bingbing Jiang

Multi-view clustering aims to integrate complementary information from multiple views to improve clustering performance. However, existing ensemble-based methods suffer from information loss due to their reliance on single-granularity labels, limiting the discriminative capability of learned representations. Meanwhile, representation and graph fusion-based approaches face challenges such as explicit view alignment and manual weight tuning, making them less effective for heterogeneous views with varying data distributions. To address these limitations, we propose a novel multi-view clustering framework via Multi-granularity Ensemble (MGE), fully using the multi-granularity information across diverse views for accurate and consistent clustering. Specifically, MGE first modifies the hierarchical clustering and then leverages it on each view (including the fused view) to achieve multi-granularity labels. Moreover, the cross-view and cross-granularity fusion strategy is designed to learn a robust co-association similarity matrix, which effectively preserves the fine-grained and coarse-grained structures of multi-view data and facilitates subsequent clustering. Therefore, MGE can provide a comprehensive representation of local and global patterns within data, eliminating the requirement for view alignment and weight tuning. Experiments demonstrate that MGE consistently outperforms state-of-the-art methods across multiple datasets, validating its effectiveness and superiority in handling heterogeneous views.

NeurIPS Conference 2025 Conference Paper

OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning

  • Ling Fu
  • Zhebin Kuang
  • Jiajun Song
  • Mingxin Huang
  • Biao Yang
  • Yuzhe Li
  • Linghao Zhu
  • Qidi Luo

Scoring the Optical Character Recognition (OCR) capabilities of Large Multimodal Models (LMMs) has witnessed growing interest. Existing benchmarks have highlighted the impressive performance of LMMs in text recognition; however, their abilities in certain challenging tasks, such as text localization, handwritten content extraction, and logical reasoning, remain underexplored. To bridge this gap, we introduce OCRBench v2, a large-scale bilingual text-centric benchmark with currently the most comprehensive set of tasks (4× more tasks than the previous multi-scene benchmark OCRBench), the widest coverage of scenarios (31 diverse scenarios), and thorough evaluation metrics, with 10,000 human-verified question-answering pairs and a high proportion of difficult samples. Moreover, we construct a private test set with 1,500 manually annotated images. The consistent evaluation trends observed across both public and private test sets validate OCRBench v2's reliability. After carefully benchmarking state-of-the-art LMMs, we find that most LMMs score below 50 (out of 100) and suffer from five types of limitations, including less frequently encountered text recognition, fine-grained perception, layout perception, complex element parsing, and logical reasoning. The benchmark and evaluation scripts are available at https://github.com/Yuliang-Liu/MultimodalOCR.

TAAS Journal 2025 Journal Article

Privacy-Preserving Group-by-Aggregation Queries for Data Federation under V2X environment

  • Zicheng Cao
  • Guanfeng Liu
  • Qingzhi Ma
  • Wei Chen
  • Lei Zhao
  • An Liu

Vehicle-to-everything (V2X) technology enables vehicles to communicate with each other, infrastructure, and the cloud, facilitating intelligent traffic management and vehicle interconnection. However, the data generated by vehicles raises concerns regarding personal privacy and corporate interests. With the rapid development of V2X technology, data security issues are becoming increasingly prominent. Data federation, as an emerging data-sharing model, utilizes secure multi-party computation techniques to enable collaboration among data owners without disclosing raw data, offering a new approach to addressing privacy and security concerns in the data exchange process of V2X. This paper proposes a group-by-aggregation query algorithm for data federation, aiming to protect personal privacy data while facilitating effective data sharing and analysis. The algorithm reverses the traditional group-by-aggregation queries process by not transmitting grouping results but rather passing encrypted aggregated attribute values to relevant data owners. By leveraging encryption algorithms with additive homomorphic or order-preserving properties to encrypt the aggregated attribute values, the algorithm ensures the correctness of mathematical operations performed under encryption, such as addition and comparison operations. Finally, the effectiveness and practicality of the algorithm are validated through experimental evaluations.
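The reversed group-by flow described above can be sketched with a deliberately insecure stand-in for an additively homomorphic scheme such as Paillier: a shared secret offset plays the role of encryption, and every name below is invented for the sketch.

```python
from collections import defaultdict

N, KEY = 2**61 - 1, 123456789    # toy modulus and shared secret offset

def enc(m):
    # Insecure stand-in for additively homomorphic encryption:
    # ciphertexts can be summed without decryption.
    return (m + KEY) % N

def dec(c, count):
    # Summing `count` ciphertexts accumulated the key `count` times.
    return (c - count * KEY) % N

def federated_group_sum(owner_partitions):
    # The coordinator sums ciphertexts per group key without ever seeing
    # an individual owner's plaintext aggregate value.
    totals, counts = defaultdict(int), defaultdict(int)
    for partition in owner_partitions:           # one dict per data owner
        for group, value in partition.items():
            totals[group] = (totals[group] + enc(value)) % N
            counts[group] += 1
    return {g: dec(c, counts[g]) for g, c in totals.items()}

owners = [{"sedan": 10, "suv": 4}, {"sedan": 7}, {"suv": 5, "truck": 2}]
result = federated_group_sum(owners)
print(result)
```

A real deployment would use per-message randomness and a proper scheme (e.g., Paillier for sums, an order-preserving scheme for comparisons, as the abstract notes); only the dataflow, not the cryptography, is illustrated here.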

JBHI Journal 2025 Journal Article

Residual Self-Calibrated Network With Multi-Scale Channel Attention for Accurate EOG-Based Eye Movement Classification

  • Zheng Zeng
  • Linkai Tao
  • Ruizhi Su
  • Adili Tuheti
  • Hao Huang
  • Chen Chen
  • Wei Chen

Recently, Electrooculography-based Human-Computer Interaction (EOG-HCI) technology has gained widespread attention in industrial areas, including assistive robots, augmented reality in gaming, etc. However, accurate eye movement classification (EMC), the fundamental step of EOG-HCI, still faces a significant challenge: most existing works are constrained in extracting discriminative features, which limits their performance. To address this issue, a Residual Self-Calibrated Network with Multi-Scale Channel Attention (RSCA), focusing on efficient feature extraction and enhancement, is proposed. The RSCA network first employs three self-calibrated convolution blocks within a hierarchical residual framework to fully extract the discriminative multi-scale features. Then, a multi-scale channel attention module adaptively weights the learned features to screen out the discriminative representation by aggregating the multi-scale context information along the channel dimension, thus further boosting the performance. Comprehensive experiments were performed using 5 public datasets and 7 prevailing methods for comparative validation. The results confirm that the RSCA network outperforms all other methods significantly, establishing a state-of-the-art benchmark for EOG-based EMC. Furthermore, thorough ablation analyses confirm the effectiveness of the employed modules within the RSCA network, providing valuable insights for the design of EOG-based deep models.

AAAI Conference 2025 Conference Paper

Scalable Trajectory-User Linking with Dual-Stream Representation Networks

  • Hao Zhang
  • Wei Chen
  • Xingyu Zhao
  • Jianpeng Qi
  • Guiyuan Jiang
  • Yanwei Yu

Trajectory-user linking (TUL) aims to match anonymous trajectories to the most likely users who generated them, offering benefits for a wide range of real-world spatio-temporal applications. However, existing TUL methods are limited by high model complexity and poor learning of effective trajectory representations, rendering them ineffective in handling large-scale user trajectory data. In this work, we propose a novel Scalable Trajectory-User Linking method with dual-stream representation networks for the large-scale TUL problem, named ScaleTUL. Specifically, ScaleTUL generates two views using temporal and spatial augmentations and exploits a supervised contrastive learning framework to effectively capture the irregularities of trajectories. In each view, a dual-stream trajectory encoder consisting of a long-term encoder and a short-term encoder is designed to learn unified trajectory representations that fuse different temporal-spatial dependencies. Then, a TUL layer is used to associate the trajectories with the corresponding users in the representation space using a two-stage training model. Experimental results on check-in mobility datasets from three real-world cities and the nationwide U.S. demonstrate the superiority of ScaleTUL over state-of-the-art baselines for large-scale TUL tasks.

NeurIPS Conference 2025 Conference Paper

SGN: Shifted Window-Based Hierarchical Variable Grouping for Multivariate Time Series Classification

  • Zenan Ying
  • Zhi Zheng
  • Huijun Hou
  • Tong Xu
  • Qi Liu
  • Jinke Wang
  • Wei Chen

Multivariate time series (MTS) classification has attracted increasing attention across various domains. Existing methods either decompose MTS into separate univariate series, ignoring inter-variable dependencies, or jointly model all variables, which may lead to over-smoothing and loss of semantic structure. These limitations become particularly pronounced when dealing with complex and heterogeneous variable types. To address these challenges, we propose SwinGroupNet (SGN), which explores a novel perspective for constructing variable interaction and temporal dependency. Specifically, SGN processes multi-scale time series using (1) Variable Group Embedding (VGE), which partitions variables into groups and performs independent group-wise embedding; (2) Multi-Scale Group Window Mixing (MGWM), which reconstructs variable interactions by modeling both intra-group and inter-group dependencies while extracting multi-scale temporal features; and (3) Periodic Window Shifting and Merging (PWSM), which exploits inherent periodic patterns to enable hierarchical temporal interaction and feature aggregation. Extensive experiments on diverse benchmark datasets from multiple domains demonstrate that SGN consistently achieves state-of-the-art performance, with an average improvement of 4.2% over existing methods. We release the source code at https://anonymous.4open.science/r/SGN.

AAAI Conference 2025 Conference Paper

Smoothness Really Matters: A Simple Yet Effective Approach for Unsupervised Graph Domain Adaptation

  • Wei Chen
  • Guo Ye
  • Yakun Wang
  • Zhao Zhang
  • Libang Zhang
  • Daixin Wang
  • Zhiqiang Zhang
  • Fuzhen Zhuang

Unsupervised Graph Domain Adaptation (UGDA) seeks to bridge distribution shifts between domains by transferring knowledge from labeled source graphs to given unlabeled target graphs. Existing UGDA methods primarily focus on aligning features in the latent space learned by graph neural networks (GNNs) across domains, often overlooking structural shifts, resulting in limited effectiveness when addressing structurally complex transfer scenarios. Given the sensitivity of GNNs to local structural features, even slight discrepancies between source and target graphs could lead to significant shifts in node embeddings, thereby reducing the effectiveness of knowledge transfer. To address this issue, we introduce a novel approach for UGDA called Target-Domain Structural Smoothing (TDSS). TDSS is a simple and effective method designed to perform structural smoothing directly on the target graph, thereby mitigating structural distribution shifts and ensuring the consistency of node representations. Specifically, by integrating smoothing techniques with neighborhood sampling, TDSS maintains the structural coherence of the target graph while mitigating the risk of over-smoothing. Our theoretical analysis shows that TDSS effectively reduces target risk by improving model smoothness. Empirical results on three real-world datasets demonstrate that TDSS outperforms recent state-of-the-art baselines, achieving significant improvements across six transfer scenarios.
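
A minimal reading of "structural smoothing with neighborhood sampling" can be sketched as one step of neighbor averaging (a hypothetical simplification, not the paper's actual TDSS operator): each target-graph node's feature is blended with the mean feature of its sampled neighbors, with the blend weight `alpha` kept below 1 to limit over-smoothing.

```python
def smooth_features(features, neighbors, alpha=0.5):
    """One smoothing step: blend each node's feature vector with the
    mean of its (sampled) neighbors' features; alpha controls how far
    the node moves toward its neighborhood."""
    smoothed = {}
    for node, feat in features.items():
        nbrs = neighbors.get(node, [])
        if not nbrs:
            smoothed[node] = list(feat)  # isolated nodes are unchanged
            continue
        mean = [sum(features[u][d] for u in nbrs) / len(nbrs)
                for d in range(len(feat))]
        smoothed[node] = [(1 - alpha) * f + alpha * m
                          for f, m in zip(feat, mean)]
    return smoothed

# Two mutually adjacent nodes with very different features move closer.
out = smooth_features({0: [1.0, 0.0], 1: [0.0, 1.0]}, {0: [1], 1: [0]})
assert out[0] == [0.5, 0.5] and out[1] == [0.5, 0.5]
```

Iterating this step many times would collapse all connected nodes to the same vector, which is the over-smoothing risk the abstract mentions; a single damped step avoids it.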

AAAI Conference 2025 Conference Paper

Spatial-Temporal Knowledge Distillation for Takeaway Recommendation

  • Shuyuan Zhao
  • Wei Chen
  • Boyan Shi
  • Liyong Zhou
  • Shuohao Lin
  • Huaiyu Wan

The takeaway recommendation system aims to recommend users' future takeaway purchases based on their historical purchase behaviors, thereby improving user satisfaction and boosting merchant sales. Existing methods focus on incorporating auxiliary information or leveraging knowledge graphs to alleviate the sparsity issue of user purchase sequences. However, two main challenges limit the performance of these approaches: (1) capturing dynamic user preferences on complex geospatial information and (2) efficiently integrating spatial-temporal knowledge from both graphs and sequence data with low computational costs. In this paper, we propose a novel spatial-temporal knowledge distillation model for takeaway recommendation (STKDRec) based on the two-stage training process. Specifically, during the first pre-training stage, a spatial-temporal knowledge graph (STKG) encoder is trained to extract high-order spatial-temporal dependencies and collaborative associations from the STKG. During the second spatial-temporal knowledge distillation (STKD) stage, a spatial-temporal Transformer (ST-Transformer) is employed to comprehensively model dynamic user preferences on various types of fine-grained geospatial information from a sequential perspective. Furthermore, the STKD strategy is introduced to transfer graph-based spatial-temporal knowledge to the ST-Transformer, facilitating the adaptive fusion of rich knowledge derived from both the STKG and sequence data while reducing computational overhead. Extensive experiments on three real-world datasets show that STKDRec significantly outperforms the state-of-the-art baselines.

AAAI Conference 2025 Conference Paper

Synergistic Multi-Agent Framework with Trajectory Learning for Knowledge-Intensive Tasks

  • Shengbin Yue
  • Siyuan Wang
  • Wei Chen
  • Xuanjing Huang
  • Zhongyu Wei

Recent advancements in Large Language Models (LLMs) have led to significant breakthroughs in various natural language processing tasks. However, generating factually consistent responses in knowledge-intensive scenarios remains a challenge due to issues such as hallucination, difficulty in acquiring long-tailed knowledge, and limited memory expansion. This paper introduces SMART, a novel multi-agent framework that leverages external knowledge to enhance the interpretability and factual consistency of LLM-generated responses. SMART comprises four specialized agents, each performing a specific sub-trajectory action to navigate complex knowledge-intensive tasks. We propose a multi-agent co-training paradigm, Long-Short Trajectory Learning, which ensures synergistic collaboration among agents while maintaining fine-grained execution by each agent. Extensive experiments on five knowledge-intensive tasks demonstrate SMART's superior performance compared to widely adopted knowledge internalization and knowledge enhancement methods. Our framework can extend beyond knowledge-intensive tasks to more complex scenarios.

IJCAI Conference 2025 Conference Paper

Unleashing the Potential of Transformer Flow for Photorealistic Face Restoration

  • Kepeng Xu
  • Li Xu
  • Gang He
  • Wei Chen
  • Xianyun Wu
  • WenXin Yu

Face restoration is a challenging task due to the need to remove artifacts and restore details. Traditional methods usually rely on generative-model priors for face restoration, but the restored results are still insufficient in terms of realism and detail. In this paper, we introduce OmniFace, a novel face restoration framework that leverages Transformer-based diffusion flow. By exploiting the scaling property of Transformers, OmniFace achieves high-resolution restoration with exceptional realism and detail. The framework integrates three key components: (1) a Transformer-driven vector estimation network, (2) a representation-aligned ControlNet, and (3) an adaptive training strategy for face restoration. The inherent scaling law of Transformer architectures enables the restoration of high-quality faces at high resolution. The ControlNet, combined with pre-trained diffusion representations, can be trained easily. The adaptive training strategy provides a vector field that is more suitable for face restoration. Comprehensive experiments demonstrate that OmniFace outperforms existing techniques in terms of restoration quality across multiple benchmark datasets, especially in restoring photographic-level texture details in high-resolution scenes.

AAAI Conference 2025 Conference Paper

UrbanVLP: Multi-Granularity Vision-Language Pretraining for Urban Socioeconomic Indicator Prediction

  • Xixuan Hao
  • Wei Chen
  • Yibo Yan
  • Siru Zhong
  • Kun Wang
  • Qingsong Wen
  • Yuxuan Liang

Urban socioeconomic indicator prediction aims to infer various metrics related to sustainable development in diverse urban landscapes using data-driven methods. However, prevalent pretrained models, particularly those reliant on satellite imagery, face dual challenges. Firstly, concentrating solely on macro-level patterns from satellite data may introduce bias, lacking nuanced details at micro levels, such as architectural details at a place. Secondly, the text generated by the precursor work UrbanCLIP, which fully utilizes the extensive knowledge of LLMs, frequently exhibits issues such as hallucination and homogenization, resulting in a lack of reliable quality. In response to these issues, we devise a novel framework entitled UrbanVLP based on Vision-Language Pretraining. Our UrbanVLP seamlessly integrates multi-granularity information from both macro (satellite) and micro (street-view) levels, overcoming the limitations of prior pretrained models. Moreover, it introduces automatic text generation and calibration, providing a robust guarantee for producing high-quality text descriptions of urban imagery. Rigorous experiments conducted across six socioeconomic indicator prediction tasks underscore its superior performance.

IJCAI Conference 2024 Conference Paper

Active Deep Multi-view Clustering

  • Helin Zhao
  • Wei Chen
  • Peng Zhou

Deep multi-view clustering has been widely studied. However, since it is an unsupervised task, where no labels are used to guide the training, it is still unreliable, especially when handling complicated data. Although deep semi-supervised multi-view clustering can alleviate this problem by using some supervised information, the supervised information is often pre-given or randomly selected. Unfortunately, as we know, the clustering performance highly depends on the quality of the supervised information, and most semi-supervised methods ignore the supervised information selection. To tackle this problem, in this paper, we propose a novel active deep multi-view clustering method, which can actively select important data for querying human annotations. In this method, we carefully design a fusion module, an active selection module, a supervised module, and an unsupervised module, and integrate them into a unified framework seamlessly. In this framework, we can obtain a more reliable clustering result with as few annotations as possible. The extensive experiments on benchmark data sets show that our method can outperform state-of-the-art unsupervised and semi-supervised methods, demonstrating the effectiveness and superiority of the proposed method. The code is available at https://github.com/wodedazhuozi/ADMC.

NeurIPS Conference 2024 Conference Paper

ALPINE: Unveiling The Planning Capability of Autoregressive Learning in Language Models

  • Siwei Wang
  • Yifei Shen
  • Shi Feng
  • Haoran Sun
  • Shang-Hua Teng
  • Wei Chen

Planning is a crucial element of both human intelligence and contemporary large language models (LLMs). In this paper, we initiate a theoretical investigation into the emergence of planning capabilities in Transformer-based LLMs via their next-word prediction mechanisms. We model planning as a network path-finding task, where the objective is to generate a valid path from a specified source node to a designated target node. Our mathematical characterization shows that Transformer architectures can execute path-finding by embedding the adjacency and reachability matrices within their weights. Furthermore, our theoretical analysis of gradient-based learning dynamics reveals that LLMs can learn both the adjacency and a limited form of the reachability matrices. These theoretical insights are then validated through experiments, which demonstrate that Transformer architectures indeed learn the adjacency matrix and an incomplete reachability matrix, consistent with our theoretical predictions. When applying our methodology to the real-world planning benchmark Blocksworld, our observations remain consistent. Additionally, our analyses uncover a fundamental limitation of current Transformer architectures in path-finding: these architectures cannot identify reachability relationships through transitivity, which leads to failures in generating paths when concatenation is required. These findings provide new insights into how the internal mechanisms of autoregressive learning facilitate intelligent planning and deepen our understanding of how future LLMs might achieve more advanced and general planning-and-reasoning capabilities across diverse applications.
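
The adjacency-versus-reachability distinction at the heart of this analysis is easy to make concrete (my toy example, not the paper's code): reachability is the transitive closure of adjacency, and any path longer than one hop exists only by transitivity, the relation the abstract says current Transformers fail to capture.

```python
def transitive_closure(adj):
    """Warshall's algorithm: reach[i][j] is 1 iff node j is reachable
    from node i along some directed path."""
    n = len(adj)
    reach = [row[:] for row in adj]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if reach[i][k] and reach[k][j]:
                    reach[i][j] = 1
    return reach

# Chain 0 -> 1 -> 2: node 2 is reachable from 0 only via transitivity,
# so the adjacency matrix alone never records the pair (0, 2).
adj = [[0, 1, 0],
       [0, 0, 1],
       [0, 0, 0]]
reach = transitive_closure(adj)
assert adj[0][2] == 0 and reach[0][2] == 1
```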

NeurIPS Conference 2024 Conference Paper

Attractor Memory for Long-Term Time Series Forecasting: A Chaos Perspective

  • Jiaxi Hu
  • Yuehong Hu
  • Wei Chen
  • Ming Jin
  • Shirui Pan
  • Qingsong Wen
  • Yuxuan Liang

In long-term time series forecasting (LTSF) tasks, an increasing number of works have acknowledged that discrete time series originate from continuous dynamic systems and have attempted to model their underlying dynamics. Recognizing the chaotic nature of real-world data, our model, Attraos, incorporates chaos theory into LTSF, perceiving real-world time series as low-dimensional observations from unknown high-dimensional chaotic dynamical systems. Under the concept of attractor invariance, Attraos utilizes non-parametric Phase Space Reconstruction embedding along with a novel multi-resolution dynamic memory unit to memorize historical dynamical structures, and evolves by a frequency-enhanced local evolution strategy. Detailed theoretical analysis and abundant empirical evidence consistently show that Attraos outperforms various LTSF methods on mainstream LTSF datasets and chaotic datasets with only one-twelfth of the parameters compared to PatchTST.

IJCAI Conference 2024 Conference Paper

Automatic De-Biased Temporal-Relational Modeling for Stock Investment Recommendation

  • Weijun Chen
  • Shun Li
  • Xipu Yu
  • Heyuan Wang
  • Wei Chen
  • Tengjiao Wang

Stock investment recommendation is crucial for guiding investment decisions and managing portfolios. Recent studies have demonstrated the potential of temporal-relational models (TRM) to yield excess investment returns. However, in the complicated finance ecosystem, current TRM suffer from both intrinsic temporal bias, stemming from the low signal-to-noise ratio (SNR), and relational bias, caused by utilizing inappropriate relational topologies and propagation mechanisms. Moreover, the distribution shifts behind macro-market scenarios invalidate the underlying i.i.d. assumption and limit the generalization ability of TRM. In this paper, we pioneer the study of the impact of the above issues on the effective learning of temporal-relational patterns and propose an Automatic De-Biased Temporal-Relational Model (ADB-TRM) for stock recommendation. Specifically, ADB-TRM consists of three main components: (i) a meta-learned architecture forms a dual-stage training process, with the inner part ameliorating temporal-relational bias and the outer meta-learner counteracting distribution shifts; (ii) automatic adversarial sample generation guides the model adaptively to alleviate bias and enhance its profiling ability through adversarial training; and (iii) global-local interaction helps seek relatively invariant stock embeddings from local and global distribution perspectives to mitigate distribution shifts. Experiments on three datasets from distinct stock markets show that ADB-TRM outperforms the state of the art by 28.41% and 9.53% in terms of cumulative and risk-adjusted returns.

NeurIPS Conference 2024 Conference Paper

Can Graph Learning Improve Planning in LLM-based Agents?

  • Xixi Wu
  • Yifei Shen
  • Caihua Shan
  • Kaitao Song
  • Siwei Wang
  • Bohang Zhang
  • Jiarui Feng
  • Hong Cheng

Task planning in language agents is emerging as an important research topic alongside the development of large language models (LLMs). It aims to break down complex user requests in natural language into solvable sub-tasks, thereby fulfilling the original requests. In this context, the sub-tasks can be naturally viewed as a graph, where the nodes represent the sub-tasks, and the edges denote the dependencies among them. Consequently, task planning is a decision-making problem that involves selecting a connected path or subgraph within the corresponding graph and invoking it. In this paper, we explore graph learning-based methods for task planning, a direction that is orthogonal to the prevalent focus on prompt design. Our interest in graph learning stems from a theoretical discovery: the biases of attention and auto-regressive loss impede LLMs' ability to effectively navigate decision-making on graphs, which is adeptly addressed by graph neural networks (GNNs). This theoretical insight led us to integrate GNNs with LLMs to enhance overall performance. Extensive experiments demonstrate that GNN-based methods surpass existing solutions even without training, and minimal training can further enhance their performance. The performance gain increases with a larger task graph size.

ICLR Conference 2024 Conference Paper

Cascading Reinforcement Learning

  • Yihan Du
  • R. Srikant 0001
  • Wei Chen

Cascading bandits have gained popularity in recent years due to their applicability to recommendation systems and online advertising. In the cascading bandit model, at each timestep, an agent recommends an ordered subset of items (called an item list) from a pool of items, each associated with an unknown attraction probability. Then, the user examines the list and clicks the first attractive item (if any), after which the agent receives a reward. The goal of the agent is to maximize the expected cumulative reward. However, the prior literature on cascading bandits ignores the influences of user states (e.g., historical behaviors) on recommendations and the change of states as the session proceeds. Motivated by this fact, we propose a generalized cascading RL framework, which considers the impact of user states and state transition into decisions. In cascading RL, we need to select items not only with large attraction probabilities but also leading to good successor states. This imposes a huge computational challenge due to the combinatorial action space. To tackle this challenge, we delve into the properties of value functions, and design an oracle BestPerm to efficiently find the optimal item list. Equipped with BestPerm, we develop two algorithms CascadingVI and CascadingBPI, which are both computationally-efficient and sample-efficient, and provide near-optimal regret and sample complexity guarantees. Furthermore, we present experiments to show the improved computational and sample efficiencies of our algorithms compared to straightforward adaptations of existing RL algorithms in practice.
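
The cascade feedback model described above can be simulated in a few lines (a generic sketch of the standard cascade model, not the authors' CascadingVI/CascadingBPI implementation): the agent observes only the position of the first click; items ranked after that click are never examined by the user.

```python
import random

def cascade_round(item_list, attraction_probs, rng):
    """Simulate one round of cascading feedback: the user scans the
    ordered list and clicks the first attractive item. Returns the
    click position and reward; (None, 0) means no item was clicked."""
    for position, item in enumerate(item_list):
        if rng.random() < attraction_probs[item]:
            return position, 1
    return None, 0

rng = random.Random(0)
probs = {0: 1.0, 1: 0.5, 2: 0.1}
# Item 0 is always attractive, so the click lands at position 0.
position, reward = cascade_round([0, 2, 1], probs, rng)
assert (position, reward) == (0, 1)
```

In the generalized RL setting of the paper, the user's state would additionally change after each round, so `attraction_probs` would depend on that state.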

JMLR Journal 2024 Journal Article

Causal-learn: Causal Discovery in Python

  • Yujia Zheng
  • Biwei Huang
  • Wei Chen
  • Joseph Ramsey
  • Mingming Gong
  • Ruichu Cai
  • Shohei Shimizu
  • Peter Spirtes

Causal discovery aims at revealing causal relations from observational data, which is a fundamental task in science and engineering. We describe causal-learn, an open-source Python library for causal discovery. This library focuses on bringing a comprehensive collection of causal discovery methods to both practitioners and researchers. It provides easy-to-use APIs for non-specialists, modular building blocks for developers, detailed documentation for learners, and comprehensive methods for all. Different from previous packages in R or Java, causal-learn is fully developed in Python, which could be more in tune with the recent preference shift in programming languages within related communities. The library is available at https://github.com/py-why/causal-learn.

NeurIPS Conference 2024 Conference Paper

CausalDiff: Causality-Inspired Disentanglement via Diffusion Model for Adversarial Defense

  • Mingkun Zhang
  • Keping Bi
  • Wei Chen
  • Quanrun Chen
  • Jiafeng Guo
  • Xueqi Cheng

Despite ongoing efforts to defend neural classifiers from adversarial attacks, they remain vulnerable, especially to unseen attacks. In contrast, humans are hard to fool with subtle manipulations, since we make judgments only based on essential factors. Inspired by this observation, we attempt to model label generation with essential label-causative factors and incorporate label-non-causative factors to assist data generation. For an adversarial example, we aim to discriminate the perturbations as non-causative factors and make predictions only based on the label-causative factors. Concretely, we propose a causal diffusion model (CausalDiff) that adapts diffusion models for conditional data generation and disentangles the two types of causal factors by learning towards a novel causal information bottleneck objective. Empirically, CausalDiff has significantly outperformed state-of-the-art defense methods on various unseen attacks, achieving an average robustness of 86.39% (+4.01%) on CIFAR-10, 56.25% (+3.13%) on CIFAR-100, and 82.62% (+4.93%) on GTSRB (German Traffic Sign Recognition Benchmark). The code is available at https://github.com/CAS-AISafetyBasicResearchGroup/CausalDiff.

ICLR Conference 2024 Conference Paper

Combinatorial Bandits for Maximum Value Reward Function under Value-Index Feedback

  • Yiliu Wang
  • Wei Chen
  • Milan Vojnovic

We investigate the combinatorial multi-armed bandit problem where an action is to select $k$ arms from a set of base arms, and its reward is the maximum of the sample values of these $k$ arms, under a weak feedback structure that only returns the value and index of the arm with the maximum value. This novel feedback structure is much weaker than the semi-bandit feedback previously studied and is only slightly stronger than the full-bandit feedback, and thus it presents a new challenge for the online learning task. We propose an algorithm and derive a regret bound for instances where arm outcomes follow distributions with finite supports. Our algorithm introduces a novel concept of biased arm replacement to address the weak feedback challenge, and it achieves a distribution-dependent regret bound of $O((k/\Delta)\log(T))$ and a distribution-independent regret bound of $\tilde{O}(\sqrt{T})$, where $\Delta$ is the reward gap and $T$ is the time horizon. Notably, our regret bound is comparable to the bounds obtained under the more informative semi-bandit feedback. We demonstrate the effectiveness of our algorithm through experimental results.
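
The value-index feedback structure is simple to state in code (my sketch of the feedback interface only, not the paper's learning algorithm): after the learner commits to $k$ arms, the environment reveals just the maximum sampled value and the identity of the arm achieving it, hiding the other $k-1$ samples.

```python
def max_value_index_feedback(chosen_arms, outcomes):
    """Value-index feedback for the max-value reward: of the k sampled
    arms, only the maximum value and the arm that achieved it are
    revealed; the remaining k-1 samples stay hidden from the learner."""
    winner = max(chosen_arms, key=lambda arm: outcomes[arm])
    return outcomes[winner], winner

# k = 3 arms chosen from the base set; `outcomes` holds this round's
# (normally unobserved) samples for every arm.
value, winner = max_value_index_feedback([1, 4, 7], {1: 0.2, 4: 0.9, 7: 0.5})
assert (value, winner) == (0.9, 4)
```

This also shows why the setting is harder than semi-bandit feedback, where all three sampled values would be returned.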

AAAI Conference 2024 Conference Paper

Dual-Prior Augmented Decoding Network for Long Tail Distribution in HOI Detection

  • Jiayi Gao
  • Kongming Liang
  • Tao Wei
  • Wei Chen
  • Zhanyu Ma
  • Jun Guo

Human object interaction detection aims at localizing human-object pairs and recognizing their interactions. Trapped by the long-tailed distribution of the data, existing HOI detection methods often have difficulty recognizing the tail categories. Many approaches try to improve the recognition of HOI tasks by utilizing external knowledge (e.g. pre-trained visual-language models). However, these approaches mainly utilize external knowledge at the HOI combination level and achieve limited improvement in the tail categories. In this paper, we propose a dual-prior augmented decoding network by decomposing the HOI task into two sub-tasks: human-object pair detection and interaction recognition. For each subtask, we leverage external knowledge to enhance the model's ability at a finer granularity. Specifically, we acquire the prior candidates from an external classifier and embed them to assist the subsequent decoding process. Thus, the long-tail problem is mitigated from a coarse-to-fine level with the corresponding external knowledge. Our approach outperforms existing state-of-the-art models in various settings and significantly boosts the performance on the tail HOI categories. The source code is available at https://github.com/PRIS-CV/DP-ADN.

NeurIPS Conference 2024 Conference Paper

Generative Retrieval Meets Multi-Graded Relevance

  • Yubao Tang
  • Ruqing Zhang
  • Jiafeng Guo
  • Maarten de Rijke
  • Wei Chen
  • Xueqi Cheng

Generative retrieval represents a novel approach to information retrieval, utilizing an encoder-decoder architecture to directly produce relevant document identifiers (docids) for queries. While this method offers benefits, current implementations are limited to scenarios with binary relevance data, overlooking the potential for documents to have multi-graded relevance. Extending generative retrieval to accommodate multi-graded relevance poses challenges, including the need to reconcile likelihood probabilities for docid pairs and the possibility of multiple relevant documents sharing the same identifier. To address these challenges, we introduce a new framework called GRaded Generative Retrieval (GR$^2$). Our approach focuses on two key components: ensuring relevant and distinct identifiers, and implementing multi-graded constrained contrastive training. Firstly, we aim to create identifiers that are both semantically relevant and sufficiently distinct to represent individual documents effectively. This is achieved by jointly optimizing the relevance and distinctness of docids through a combination of docid generation and autoencoder models. Secondly, we incorporate information about the relationship between relevance grades to guide the training process. Specifically, we leverage a constrained contrastive training strategy to bring the representations of queries and the identifiers of their relevant documents closer together, based on their respective relevance grades. Extensive experiments on datasets with both multi-graded and binary relevance demonstrate the effectiveness of our method.

NeurIPS Conference 2024 Conference Paper

Graph Diffusion Policy Optimization

  • Yijing Liu
  • Chao Du
  • Tianyu Pang
  • Chongxuan Li
  • Min Lin
  • Wei Chen

Recent research has made significant progress in optimizing diffusion models for downstream objectives, which is an important pursuit in fields such as graph generation for drug design. However, directly applying these models to graphs presents challenges, resulting in suboptimal performance. This paper introduces graph diffusion policy optimization (GDPO), a novel approach to optimizing graph diffusion models for arbitrary (e.g., non-differentiable) objectives using reinforcement learning. GDPO is based on an eager policy gradient tailored for graph diffusion models, developed through meticulous analysis, and promises improved performance. Experimental results show that GDPO achieves state-of-the-art performance in various graph generation tasks with complex and diverse objectives. Code is available at https://github.com/sail-sg/GDPO.

AAAI Conference 2024 Conference Paper

Identification of Causal Structure with Latent Variables Based on Higher Order Cumulants

  • Wei Chen
  • Zhiyi Huang
  • Ruichu Cai
  • Zhifeng Hao
  • Kun Zhang

Causal discovery with latent variables is a crucial but challenging task. Despite the emergence of numerous methods aimed at addressing this challenge, they cannot fully identify the structure in which two observed variables are influenced by one latent variable and may also be connected by a directed edge. Interestingly, we notice that this structure can be identified through the utilization of higher-order cumulants. By leveraging the higher-order cumulants of non-Gaussian data, we provide an analytical solution for estimating the causal coefficients or their ratios. With the estimated (ratios of) causal coefficients, we propose a novel approach to identify the existence of a causal edge between two observed variables subject to latent variable influence. When such a causal edge exists, we introduce an asymmetry criterion to determine the causal direction. The experimental results demonstrate the effectiveness of our proposed method.
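
The cumulant-based identification idea admits a small worked example (my own toy estimator, under the assumed linear model x1 = a·L + e1, x2 = b·L + e2 with a skewed zero-mean latent L; this is not the paper's actual procedure): since cum(x1, x1, x2) = a²b·κ₃(L) and cum(x1, x2, x2) = ab²·κ₃(L), the ratio of the two third-order cross-cumulants recovers the coefficient ratio a/b.

```python
import random

def third_cross_moment(xs, ys, zs):
    # For zero-mean data the third cross-cumulant equals E[x * y * z].
    return sum(x * y * z for x, y, z in zip(xs, ys, zs)) / len(xs)

rng = random.Random(0)
n = 100_000
a, b = 2.0, 0.5
# Skewed (non-Gaussian) latent confounder with zero mean, plus
# independent Gaussian noises on each observed variable.
L = [rng.expovariate(1.0) - 1.0 for _ in range(n)]
x1 = [a * l + rng.gauss(0.0, 0.3) for l in L]
x2 = [b * l + rng.gauss(0.0, 0.3) for l in L]
ratio = third_cross_moment(x1, x1, x2) / third_cross_moment(x1, x2, x2)
# ratio is a consistent estimate of a / b = 4.0
assert abs(ratio - a / b) < 1.0
```

A Gaussian latent would break this: all of its third cumulants vanish, which is why the abstract insists on non-Gaussian data.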

IJCAI Conference 2024 Conference Paper

Individual Causal Structure Learning from Population Data

  • Wei Chen
  • Xiaokai Huang
  • Zijian Li
  • Ruichu Cai
  • Zhiyi Huang
  • Zhifeng Hao

Learning the causal structure of each individual plays a crucial role in neuroscience, biology, and so on. Existing methods consider data from each individual separately, which may yield inaccurate causal structure estimations in limited samples. To leverage more samples, we consider incorporating data from all individuals as population data. We observe that the variables of all individuals are influenced by the common environment variables they share. These shared environment variables can be modeled as latent variables and serve as a bridge connecting data from different individuals. In particular, we propose an Individual Linear Acyclic Model (ILAM) for each individual from population data, which models the individual's variables as being linearly influenced by their parents, in addition to environment variables and noise terms. Theoretical analysis shows that the model is identifiable when all environment variables are non-Gaussian, or even if some are Gaussian with an adequate diversity in the variance of noises for each individual. We then develop an individual causal structures learning method based on the Share Independence Component Analysis technique. Experimental results on synthetic and real-world data demonstrate the correctness of the method even when the sample size of each individual's data is small.

NeurIPS Conference 2024 Conference Paper

LoTLIP: Improving Language-Image Pre-training for Long Text Understanding

  • Wei Wu
  • Kecheng Zheng
  • Shuailei Ma
  • Fan Lu
  • Yuxin Guo
  • Yifei Zhang
  • Wei Chen
  • Qingpei Guo

In this work, we empirically confirm that the key reason causing such an issue is that the training images are usually paired with short captions, leaving certain tokens easily overshadowed by salient tokens. To address this problem, our initial attempt is to relabel the data with long captions; however, directly learning from these may lead to performance degradation in understanding short text (e.g., in the image classification task). Then, after incorporating corner tokens to aggregate diverse textual information, we manage to help the model catch up to its original level of short text understanding yet greatly enhance its capability of long text understanding. We further look into whether the model can continuously benefit from longer captions and notice a clear trade-off between performance and efficiency. Finally, we validate the effectiveness of our approach using a self-constructed large-scale dataset, which consists of 100M long-caption-oriented text-image pairs. Our method achieves superior performance in long-text-image retrieval tasks. The project page is available at https://wuw2019.github.io/lot-lip.

AAAI Conference 2024 Conference Paper

Modeling Adaptive Inter-Task Feature Interactions via Sentiment-Aware Contrastive Learning for Joint Aspect-Sentiment Prediction

  • Wei Chen
  • Yuxuan Liu
  • Zhao Zhang
  • Fuzhen Zhuang
  • Jiang Zhong

Aspect prediction (AP) and sentiment prediction (SP) are representative applications in fine-grained sentiment analysis. They can be considered as sequential tasks, where AP identifies mentioned aspects in a sentence, and SP infers fine-grained sentiments for these aspects. Recent models perform the aspect-sentiment prediction in a joint manner, but heavily rely on the feature interactions of aspect and sentiment. One drawback is that they ignore that the correlation strength between aspect features and sentiment features varies across different sentences, and employing a fixed feature interaction strategy may limit effective knowledge transfer across tasks. To tackle this issue, in this paper, we propose an Adaptive Inter-task Feature Interaction framework, AIFI, for joint aspect-sentiment prediction. Specifically, we introduce a novel contrast-based alignment method based on contrastive learning. Our approach considers the AP-specific and SP-specific representations of a given sentence as a positive pair, while the representation of another random sentence serves as a negative example. Moreover, we propose an inter-task feature correlation network to predict the contrast strength, which is determined by the temperature coefficient in the InfoNCE loss. This dynamic correlation adjustment enhances the model's ability to capture proper feature interactions more efficiently. Experimental results on three datasets validate the effectiveness of our approach.
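A minimal sketch of the mechanism described above (names and values are illustrative, not from the paper): an InfoNCE loss whose temperature, which a correlation network would predict per sentence, controls how sharply the positive pair is contrasted against negatives.

```python
import numpy as np

def info_nce(sim_pos: float, sim_negs: np.ndarray, tau: float) -> float:
    """InfoNCE loss for one anchor: negative log-softmax of the positive
    similarity, with temperature tau scaling all similarities."""
    logits = np.concatenate(([sim_pos], sim_negs)) / tau
    logits -= logits.max()  # numerical stability
    return float(-np.log(np.exp(logits[0]) / np.exp(logits).sum()))

negs = np.array([0.1, -0.2, 0.05])
# When the positive pair already dominates the negatives, a sharper
# (smaller) temperature yields a smaller loss than a soft one.
loss_sharp = info_nce(0.9, negs, tau=0.1)
loss_soft = info_nce(0.9, negs, tau=1.0)
```

Making tau an output of a small network, as the abstract proposes, lets each sentence decide how strongly its AP and SP representations should be pulled together.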

AAAI Conference 2024 Conference Paper

Perturbation-Invariant Adversarial Training for Neural Ranking Models: Improving the Effectiveness-Robustness Trade-Off

  • Yu-An Liu
  • Ruqing Zhang
  • Mingkun Zhang
  • Wei Chen
  • Maarten de Rijke
  • Jiafeng Guo
  • Xueqi Cheng

Neural ranking models (NRMs) have shown great success in information retrieval (IR). But their predictions can easily be manipulated using adversarial examples, which are crafted by adding imperceptible perturbations to legitimate documents. This vulnerability raises significant concerns about their reliability and hinders the widespread deployment of NRMs. By incorporating adversarial examples into training data, adversarial training has become the de facto defense approach to adversarial attacks against NRMs. However, this defense mechanism is subject to a trade-off between effectiveness and adversarial robustness. In this study, we establish theoretical guarantees regarding the effectiveness-robustness trade-off in NRMs. We decompose the robust ranking error into two components, i.e., a natural ranking error for effectiveness evaluation and a boundary ranking error for assessing adversarial robustness. Then, we define the perturbation invariance of a ranking model and prove it to be a differentiable upper bound on the boundary ranking error for attainable computation. Informed by our theoretical analysis, we design a novel perturbation-invariant adversarial training (PIAT) method for ranking models to achieve a better effectiveness-robustness trade-off. We design a regularized surrogate loss, in which one term encourages the effectiveness to be maximized while the regularization term encourages the output to be smooth, so as to improve adversarial robustness. Experimental results on several ranking models demonstrate the superiority of PIAT compared to existing adversarial defenses.
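The shape of such a regularized surrogate loss can be sketched as follows (a generic illustration under assumed forms, not the paper's exact formulation): an effectiveness term on relevant/irrelevant score margins plus a smoothness penalty on how much scores shift under perturbation.

```python
import numpy as np

def invariance_regularized_loss(scores, labels, scores_perturbed, lam=1.0):
    """Effectiveness term (pairwise hinge between mean relevant and mean
    irrelevant scores) plus a perturbation-invariance regularizer on the
    score shift between clean and perturbed documents."""
    rel, irr = scores[labels == 1], scores[labels == 0]
    effectiveness = np.maximum(0.0, 1.0 - (rel.mean() - irr.mean()))
    invariance = np.mean((scores - scores_perturbed) ** 2)
    return float(effectiveness + lam * invariance)

s = np.array([2.0, 0.5])
y = np.array([1, 0])
base = invariance_regularized_loss(s, y, s)          # no perturbation shift
shifted = invariance_regularized_loss(s, y, s + 0.3)  # scores drift by 0.3
```

The weight lam sets the trade-off the abstract analyzes: lam = 0 recovers pure effectiveness training, while larger lam enforces smoother, more robust outputs.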

JBHI Journal 2024 Journal Article

PSEENet: A Pseudo-Siamese Neural Network Incorporating Electroencephalography and Electrooculography Characteristics for Heterogeneous Sleep Staging

  • Wei Zhou
  • Ning Shen
  • Ligang Zhou
  • Minghui Liu
  • Yiyuan Zhang
  • Cong Fu
  • Huan Yu
  • Feng Shu

Sleep staging plays a critical role in evaluating the quality of sleep. Currently, most studies either suffer from dramatic performance drops when coping with varying input modalities or are unable to handle heterogeneous signals. To handle heterogeneous signals and guarantee favorable sleep staging performance when a single modality is available, a pseudo-siamese neural network (PSN) incorporating electroencephalography (EEG) and electrooculography (EOG) characteristics is proposed (PSEENet). PSEENet consists of two parts, spatial mapping modules (SMMs) and a weight-shared classifier. SMMs are used to extract high-dimensional features. Meanwhile, joint linkages among multi-modalities are provided by quantifying the similarity of features. Finally, with the cooperation of heterogeneous characteristics, associations within various sleep stages can be established by the classifier. The evaluation of the model is validated on two public datasets, namely, Montreal Archive of Sleep Studies (MASS) and Sleep-EDFX, and one clinical dataset from Huashan Hospital of Fudan University (HSFU). Experimental results show that the model can handle heterogeneous signals, provide superior results under multimodal signals, and show good performance with a single modality. PSEENet obtains accuracies of 79.1% with EEG and 82.1% with EEG and EOG on Sleep-EDFX, and significantly improves the accuracy with EOG from 73.7% to 76% by introducing similarity information.

NeurIPS Conference 2024 Conference Paper

Query-Efficient Correlation Clustering with Noisy Oracle

  • Yuko Kuroki
  • Atsushi Miyauchi
  • Francesco Bonchi
  • Wei Chen

We study a general clustering setting in which we have $n$ elements to be clustered, and we aim to perform as few queries as possible to an oracle that returns a noisy sample of the weighted similarity between two elements. Our setting encompasses many application domains in which the similarity function is costly to compute and inherently noisy. We introduce two novel formulations of online learning problems rooted in the paradigm of Pure Exploration in Combinatorial Multi-Armed Bandits (PE-CMAB): fixed confidence and fixed budget settings. For both settings, we design algorithms that combine a sampling strategy with a classic approximation algorithm for correlation clustering and study their theoretical guarantees. Our results are the first examples of polynomial-time algorithms that work for the case of PE-CMAB in which the underlying offline optimization problem is NP-hard.

TIST Journal 2024 Journal Article

T-Distributed Stochastic Neighbor Embedding for Co-Representation Learning

  • Wei Chen
  • Hongjun Wang
  • Yinghui Zhang
  • Ping Deng
  • Zhipeng Luo
  • Tianrui Li

Co-clustering is the simultaneous clustering of the samples and attributes of a data matrix, which provides deeper insight into data than traditional clustering. However, there is a lack of representation learning algorithms that serve this mechanism of co-clustering, and current representation learning algorithms are limited to the sample perspective and neglect information from the attribute perspective. To solve this problem, in this article, ctSNE, a co-representation learning model based on t-distributed stochastic neighbor embedding, is proposed for unsupervised co-clustering, where ctSNE makes the output dataset representation more discriminative of row and column clusters (i.e., co-discrimination). On the basis of t-distributed stochastic neighbor embedding retaining the sample data distribution and local data structure, the philosophy of collaboration is introduced (i.e., row and column hidden relationship information) so that the ctSNE model is equipped with co-representation learning capability, which can effectively improve the performance of co-clustering. To prove the effectiveness of the ctSNE model, several classic co-clustering algorithms are used to check the co-representation performance of ctSNE, and a novel internal index based on an internal clustering index, known as total inertia, is proposed to demonstrate the effect of co-clustering. Extensive experimental results show that ctSNE has tremendous co-representation capability and can significantly improve the performance of co-clustering algorithms.

NeurIPS Conference 2024 Conference Paper

Terra: A Multimodal Spatio-Temporal Dataset Spanning the Earth

  • Wei Chen
  • Xixuan Hao
  • Yuankai Wu
  • Yuxuan Liang

Since the inception of our planet, the meteorological environment, as reflected through spatio-temporal data, has always been a fundamental factor influencing human life, socio-economic progress, and ecological conservation. A comprehensive exploration of this data is thus imperative to gain a deeper understanding and more accurate forecasting of these environmental shifts. Despite the success of deep learning techniques within the realm of spatio-temporal data and earth science, existing public datasets are beset with limitations in terms of spatial scale, temporal coverage, and reliance on limited time series data. These constraints hinder their optimal utilization in practical applications. To address these issues, we introduce Terra, a multimodal spatio-temporal dataset spanning the earth. This dataset encompasses hourly time series data from 6,480,000 grid areas worldwide over the past 45 years, while also incorporating multimodal spatial supplementary information including geo-images and explanatory text. Through a detailed data analysis and evaluation of existing deep learning models within earth sciences, utilizing our constructed dataset, we aim to provide valuable opportunities for enhancing future research in spatio-temporal data mining, thereby advancing towards more general spatio-temporal intelligence. Our source code and data can be accessed at https://github.com/CityMind-Lab/NeurIPS24-Terra.

AAAI Conference 2024 Conference Paper

TNPAR: Topological Neural Poisson Auto-Regressive Model for Learning Granger Causal Structure from Event Sequences

  • Yuequn Liu
  • Ruichu Cai
  • Wei Chen
  • Jie Qiao
  • Yuguang Yan
  • Zijian Li
  • Keli Zhang
  • Zhifeng Hao

Learning Granger causality from event sequences is a challenging but essential task across various applications. Most existing methods rely on the assumption that event sequences are independent and identically distributed (i.i.d.). However, this i.i.d. assumption is often violated due to the inherent dependencies among the event sequences. Fortunately, in practice, we find these dependencies can be modeled by a topological network, suggesting a potential solution to the non-i.i.d. problem by introducing the prior topological network into Granger causal discovery. This observation prompts us to tackle two ensuing challenges: 1) how to model the event sequences while incorporating both the prior topological network and the latent Granger causal structure, and 2) how to learn the Granger causal structure. To this end, we devise a unified topological neural Poisson auto-regressive model with two processes. In the generation process, we employ a variant of the neural Poisson process to model the event sequences, considering influences from both the topological network and the Granger causal structure. In the inference process, we formulate an amortized inference algorithm to infer the latent Granger causal structure. We encapsulate these two processes within a unified likelihood function, providing an end-to-end framework for this task. Experiments on simulated and real-world data demonstrate the effectiveness of our approach.

JBHI Journal 2024 Journal Article

Towards Real-Time Sleep Stage Prediction and Online Calibration Based on Architecturally Switchable Deep Learning Models

  • Hangyu Zhu
  • Yonglin Wu
  • Yao Guo
  • Cong Fu
  • Feng Shu
  • Huan Yu
  • Wei Chen
  • Chen Chen

Despite the recent advances in automatic sleep staging, few studies have focused on real-time sleep staging to promote the regulation of sleep or the intervention of sleep disorders. In this paper, a novel network named SwSleepNet, which can handle both precise offline sleep staging and online sleep stage prediction and calibration, is proposed. For offline analysis, the proposed network coordinates a sequence broadening module (SBM), sequential CNN (SCNN), squeeze and excitation (SE) block, and sequence consolidation module (SCM) to balance the operational efficiency of the network and comprehensive feature extraction. For online analysis, only the SCNN and SE block are involved in predicting the sleep stage within a short-time segment of the recordings. Once more than two successive segments have disparate predictions, the calibration mechanism is triggered and contextual information is involved. In addition, to investigate the segment length that is suitable for predicting a sleep stage, segments with five-second, three-second, and two-second data are analyzed. The performance of SwSleepNet is validated on two publicly available datasets, Sleep-EDF Expanded and Montreal Archive of Sleep Studies (MASS), and one clinical dataset, Huashan Hospital Fudan University (HSFU), with offline accuracies of 84.5%, 86.7%, and 81.8%, respectively, which outperforms the state-of-the-art methods. Additionally, for online sleep staging, the dedicated calibration mechanism allows SwSleepNet to achieve accuracy above 80% on three datasets with short-time segments, demonstrating the robustness and stability of SwSleepNet. This study presents a real-time sleep staging architecture, which is expected to pave the way for accurate sleep regulation and intervention.

IJCAI Conference 2024 Conference Paper

Towards Robust Trajectory Representations: Isolating Environmental Confounders with Causal Learning

  • Kang Luo
  • Yuanshao Zhu
  • Wei Chen
  • Kun Wang
  • Zhengyang Zhou
  • Sijie Ruan
  • Yuxuan Liang

Trajectory modeling refers to characterizing human movement behavior, serving as a pivotal step in understanding mobility patterns. Nevertheless, existing studies typically ignore the confounding effects of geospatial context, leading to the acquisition of spurious correlations and limited generalization capabilities. To bridge this gap, we initially formulate a Structural Causal Model (SCM) to decipher the trajectory representation learning process from a causal perspective. Building upon the SCM, we further present a Trajectory modeling framework (TrajCL) based on Causal Learning, which leverages the backdoor adjustment theory as an intervention tool to eliminate the spurious correlations between geospatial context and trajectories. Extensive experiments on two real-world datasets verify that TrajCL markedly enhances performance in trajectory classification tasks while showcasing superior generalization and interpretability.

JBHI Journal 2024 Journal Article

Unsupervised Transfer Learning Approach With Adaptive Reweighting and Resampling Strategy for Inter-Subject EOG-Based Gaze Angle Estimation

  • Zheng Zeng
  • Linkai Tao
  • Ruizhi Su
  • Yunfeng Zhu
  • Long Meng
  • Adili Tuheti
  • Hao Huang
  • Feng Shu

Gaze estimation based on electrooculograms (EOGs) has been widely explored. However, the inter-subject variability of EOGs still poses a significant challenge for practical applications, contributing to performance degradation in inter-subject settings. In this paper, an unsupervised transfer learning approach with an adaptive reweighting and resampling (ARR) strategy that fully considers individual variability is proposed for EOG-based gaze angle estimation. It quantifies domain shifts by leveraging source-target similarities, reweighting and resampling the source data to retain relevant instances and disregard irrelevant instances during adaptation. Specifically, our proposed methodology first assesses the domain shifts via decomposing transformation matrices, which are estimated between the training subjects (denoted as multi-source domains) and the test subject (denoted as the target domain). Then, the multi-domain shifts are assigned as weighted indicators to resample the multi-source domains for model training. Comparative experiments with several prevailing transfer learning methods, including CORrelation ALignment (CORAL), Geodesic Flow Kernel (GFK), Joint Distribution Adaptation (JDA), Transfer Component Analysis (TCA), and Balanced Distribution Adaptation (BDA), using two different normalization processes were conducted on a realistic scenario across 18 subjects. Experimental results demonstrate that the ARR strategy can significantly improve performance (mean absolute error (MAE) reduction: 7.0%, root mean square error (RMSE) reduction: 6.3%), outperforming the prevailing methods. Besides, the impacts of data diversity and data size on the ARR strategy are further investigated. The results indicate that data size is more important than data diversity for EOG-based gaze angle estimation, and also demonstrate the benefits of the ARR strategy in practical scenarios.

NeurIPS Conference 2023 Conference Paper

Closing the gap between the upper bound and lower bound of Adam's iteration complexity

  • Bohan Wang
  • Jingwen Fu
  • Huishuai Zhang
  • Nanning Zheng
  • Wei Chen

Recently, Arjevani et al. [1] established a lower bound of iteration complexity for first-order optimization under an $L$-smooth condition and a bounded noise variance assumption. However, a thorough review of existing literature on Adam's convergence reveals a noticeable gap: none of them meet the above lower bound. In this paper, we close the gap by deriving a new convergence guarantee of Adam, with only an $L$-smooth condition and a bounded noise variance assumption. Our results remain valid across a broad spectrum of hyperparameters. Especially with properly chosen hyperparameters, we derive an upper bound of the iteration complexity of Adam and show that it meets the lower bound for first-order optimizers. To the best of our knowledge, this is the first work to establish such a tight upper bound for Adam's convergence. Our proof utilizes novel techniques to handle the entanglement between momentum and adaptive learning rate and to convert the first-order term in the Descent Lemma to the gradient norm, which may be of independent interest.
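For reference, the update rule the analysis concerns is the standard Adam step of Kingma and Ba, whose coupled momentum and adaptive learning rate are the source of the entanglement mentioned above (a textbook sketch, not code from the paper):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam step: first-moment estimate m (momentum), second-moment
    estimate v (adaptive learning rate), both with bias correction."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad**2
    m_hat = m / (1 - b1**t)
    v_hat = v / (1 - b2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize f(x) = x^2 (gradient 2x) starting from x = 1.0.
theta, m, v = np.array([1.0]), np.zeros(1), np.zeros(1)
for t in range(1, 2001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, lr=0.01)
```

Note that the effective step is m_hat / sqrt(v_hat): numerator and denominator share the same gradient history, which is exactly the dependence a convergence proof must disentangle.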

AAAI Conference 2023 Conference Paper

Combinatorial Causal Bandits

  • Shi Feng
  • Wei Chen

In combinatorial causal bandits (CCB), the learning agent chooses at most K variables in each round to intervene, and collects feedback from the observed variables, with the goal of minimizing expected regret on the target variable Y. We study this problem in the context of binary generalized linear models (BGLMs) with a succinct parametric representation of the causal models. We present the algorithm BGLM-OFU for Markovian BGLMs (i.e., no hidden variables) based on the maximum likelihood estimation method and give regret analysis for it. For the special case of linear models with hidden variables, we apply causal inference techniques such as the do calculus to convert the original model into a Markovian model, and then show that our BGLM-OFU algorithm and another algorithm based on the linear regression both solve such linear models with hidden variables. Our novelty includes (a) considering the combinatorial intervention action space and the general causal graph structures including ones with hidden variables, (b) integrating and adapting techniques from diverse studies such as generalized linear bandits and online influence maximization, and (c) avoiding unrealistic assumptions (such as knowing the joint distribution of the parents of Y under all interventions) and regret factors exponential to causal graph size in prior studies.

AAAI Conference 2023 Conference Paper

Fourier-Net: Fast Image Registration with Band-Limited Deformation

  • Xi Jia
  • Joseph Bartlett
  • Wei Chen
  • Siyang Song
  • Tianyang Zhang
  • Xinxing Cheng
  • Wenqi Lu
  • Zhaowen Qiu

Unsupervised image registration commonly adopts U-Net style networks to predict dense displacement fields in the full-resolution spatial domain. For high-resolution volumetric image data, this process is however resource-intensive and time-consuming. To tackle this problem, we propose the Fourier-Net, replacing the expansive path in a U-Net style network with a parameter-free model-driven decoder. Specifically, instead of our Fourier-Net learning to output a full-resolution displacement field in the spatial domain, we learn its low-dimensional representation in a band-limited Fourier domain. This representation is then decoded by our devised model-driven decoder (consisting of a zero padding layer and an inverse discrete Fourier transform layer) to the dense, full-resolution displacement field in the spatial domain. These changes allow our unsupervised Fourier-Net to contain fewer parameters and computational operations, resulting in faster inference speeds. Fourier-Net is then evaluated on two public 3D brain datasets against various state-of-the-art approaches. For example, when compared to a recent transformer-based method, named TransMorph, our Fourier-Net, which only uses 2.2% of its parameters and 6.66% of the multiply-add operations, achieves a 0.5% higher Dice score and an 11.48 times faster inference speed. Code is available at https://github.com/xi-jia/Fourier-Net.

NeurIPS Conference 2023 Conference Paper

Inner Product-based Neural Network Similarity

  • Wei Chen
  • Zichen Miao
  • Qiang Qiu

Analyzing representational similarity among neural networks (NNs) is essential for interpreting or transferring deep models. In application scenarios where numerous NN models are learned, it becomes crucial to assess model similarities in computationally efficient ways. In this paper, we propose a new paradigm for reducing NN representational similarity to filter subspace distance. Specifically, when convolutional filters are decomposed as a linear combination of a set of filter subspace elements, denoted as filter atoms, and have those decomposed atom coefficients shared across networks, NN representational similarity can be significantly simplified as calculating the cosine distance among respective filter atoms, to achieve millions of times computation reduction over popular probing-based methods. We provide both theoretical and empirical evidence that such simplified filter subspace-based similarity preserves a strong linear correlation with other popular probing-based metrics, while being significantly more efficient to obtain and robust to probing data. We further validate the effectiveness of the proposed method in various application scenarios where numerous models exist, such as federated and continual learning as well as analyzing training dynamics. We hope our findings can help further explorations of real-time large-scale representational similarity analysis in neural networks.
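Once atom coefficients are shared, the core computation reduces to a few lines (illustrative shapes and names; the paper's decomposition procedure is not reproduced here): flatten each network's filter atoms and take cosine similarities between corresponding atoms.

```python
import numpy as np

def atom_cosine_similarity(atoms_a: np.ndarray, atoms_b: np.ndarray) -> np.ndarray:
    """Cosine similarity between corresponding filter atoms of two networks.
    Each input has shape (num_atoms, k, k); atoms are flattened to vectors
    and L2-normalized before taking the inner product."""
    a = atoms_a.reshape(len(atoms_a), -1)
    b = atoms_b.reshape(len(atoms_b), -1)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return np.sum(a * b, axis=1)  # one similarity score per atom pair

rng = np.random.default_rng(0)
atoms = rng.normal(size=(6, 3, 3))  # e.g., six 3x3 filter atoms
sims_self = atom_cosine_similarity(atoms, atoms)
```

This is why the comparison is cheap: the cost scales with the number of atoms rather than with forward passes over probing data.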

IJCAI Conference 2023 Conference Paper

Learning Few-shot Sample-set Operations for Noisy Multi-label Aspect Category Detection

  • Shiman Zhao
  • Wei Chen
  • Tengjiao Wang

Multi-label Aspect Category Detection (MACD) is essential for aspect-based sentiment analysis, which aims to identify multiple aspect categories in a given sentence. Few-shot MACD is critical due to the scarcity of labeled data. However, MACD is a high-noise task, and existing methods fail to address it with only two or three training samples per class, which limits its application in practice. To solve the above issues, we propose a group of Few-shot Sample-set Operations (FSO) to solve noisy MACD in fewer-sample scenarios by identifying the semantic contents of samples. Learning interactions among intersection, subtraction, and union networks, the FSO imitates arithmetic operations on samples to distinguish relevant and irrelevant aspect contents. Eliminating the negative effect caused by noise, the FSO extracts discriminative prototypes and customizes a dedicated query vector for each class. Besides, we design a multi-label architecture, which integrates a score-wise loss and a multi-label loss to optimize the FSO for multi-label prediction, avoiding complex threshold training or selection. Experiments show that our method achieves considerable performance. Significantly, it improves Macro-F by at most 11.01% and by an average of 8.59% in fewer-sample scenarios.

IROS Conference 2023 Conference Paper

Learning to Grasp Clothing Structural Regions for Garment Manipulation Tasks

  • Wei Chen
  • Dongmyoung Lee
  • Digby Chappell
  • Nicolás Rojas 0002

When performing cloth-related tasks, such as garment hanging, it is often important to identify and grasp certain structural regions—a shirt's collar as opposed to its sleeve, for instance. However, due to cloth deformability, these manipulation activities, which are essential in domestic, health care, and industrial contexts, remain challenging for robots. In this paper, we focus on how to segment and grasp structural regions of clothes to enable manipulation tasks, using hanging tasks as a case study. To this end, a neural network-based perception system is proposed to segment a shirt's collar from areas that represent the rest of the scene in a depth image. With a 10-minute video of a human manipulating shirts to train it, our perception system is capable of generalizing to other shirts regardless of texture as well as to other types of collared garments. A novel grasping strategy is then proposed based on the segmentation to determine grasping pose. Experiments demonstrate that our proposed grasping strategy achieves 92%, 80%, and 50% grasping success rates with one folded garment, one crumpled garment, and three crumpled garments, respectively. Our grasping strategy performs considerably better than tested baselines that do not take into account the structural nature of the garments. With the proposed region segmentation and grasping strategy, challenging garment hanging tasks are successfully implemented using an open-loop control policy. Supplementary material is available at https://sites.google.com/view/garment-hanging

JBHI Journal 2023 Journal Article

MaskSleepNet: A Cross-Modality Adaptation Neural Network for Heterogeneous Signals Processing in Sleep Staging

  • Hangyu Zhu
  • Wei Zhou
  • Cong Fu
  • Yonglin Wu
  • Ning Shen
  • Feng Shu
  • Huan Yu
  • Wei Chen

Deep learning methods have become an important tool for automatic sleep staging in recent years. However, most of the existing deep learning-based approaches are sharply constrained by the input modalities, where any insertion, substitution, or deletion of input modalities would directly render the model unusable or cause a deterioration in performance. To solve the modality heterogeneity problem, a novel network architecture named MaskSleepNet is proposed. It consists of a masking module, a multi-scale convolutional neural network (MSCNN), a squeezing and excitation (SE) block, and a multi-headed attention (MHA) module. The masking module consists of a modality adaptation paradigm that can cooperate with modality discrepancy. The MSCNN extracts features from multiple scales and specially designs the size of the feature concatenation layer to prevent invalid or redundant features from zero-setting channels. The SE block further optimizes the weights of the features to improve network learning efficiency. The MHA module outputs the prediction results by learning the temporal information between the sleeping features. The performance of the proposed model was validated on two publicly available datasets, Sleep-EDF Expanded (Sleep-EDFX) and Montreal Archive of Sleep Studies (MASS), and a clinical dataset, Huashan Hospital Fudan University (HSFU). The proposed MaskSleepNet achieves favorable performance under input modality discrepancy: for a single-channel EEG signal it reaches 83.8%, 83.4%, and 80.5%; for two-channel EEG+EOG signals it reaches 85.0%, 84.9%, and 81.9%; and for three-channel EEG+EOG+EMG signals it reaches 85.7%, 87.5%, and 81.1% on Sleep-EDFX, MASS, and HSFU, respectively. In contrast, the accuracy of state-of-the-art approaches fluctuated widely between 69.0% and 89.4%. The experimental results exhibit that the proposed model can maintain superior performance and robustness in handling input modality discrepancy issues.

NeurIPS Conference 2023 Conference Paper

Multi-Fidelity Multi-Armed Bandits Revisited

  • Xuchuang Wang
  • Qingyun Wu
  • Wei Chen
  • John C. S. Lui

We study the multi-fidelity multi-armed bandit ($\texttt{MF-MAB}$), an extension of the canonical multi-armed bandit (MAB) problem. $\texttt{MF-MAB}$ allows each arm to be pulled with different costs (fidelities) and observation accuracy. We study both the best arm identification with fixed confidence ($\texttt{BAI}$) and the regret minimization objectives. For $\texttt{BAI}$, we present (a) a cost complexity lower bound, (b) an algorithmic framework with two alternative fidelity selection procedures, and (c) both procedures' cost complexity upper bounds. From both cost complexity bounds of $\texttt{MF-MAB}$, one can recover the standard sample complexity bounds of the classic (single-fidelity) MAB. For regret minimization of $\texttt{MF-MAB}$, we propose a new regret definition, prove its problem-independent regret lower bound $\Omega(K^{1/3}\Lambda^{2/3})$ and problem-dependent lower bound $\Omega(K\log \Lambda)$, where $K$ is the number of arms and $\Lambda$ is the decision budget in terms of cost, and devise an elimination-based algorithm whose worst-cost regret upper bound matches its corresponding lower bound up to some logarithmic terms and, whose problem-dependent bound matches its corresponding lower bound in terms of $\Lambda$.

NeurIPS Conference 2023 Conference Paper

Scalable Fair Influence Maximization

  • Xiaobin Rui
  • Zhixiao Wang
  • Jiayu Zhao
  • Lichao Sun
  • Wei Chen

Given a graph $G$, a community structure $\mathcal{C}$, and a budget $k$, the fair influence maximization problem aims to select a seed set $S$ ($|S|\leq k$) that maximizes the influence spread while narrowing the influence gap between different communities. While various fairness notions exist, the welfare fairness notion, which balances fairness level and influence spread, has shown promising effectiveness. However, the lack of efficient algorithms for optimizing the welfare fairness objective function restricts its application to small-scale networks with only a few hundred nodes. In this paper, we adopt the objective function of welfare fairness to maximize the exponentially weighted summation over the influenced fraction of all communities. We first introduce an unbiased estimator for the fractional power of the arithmetic mean. Then, by adapting the reverse influence sampling (RIS) approach, we convert the optimization problem to a weighted maximum coverage problem. We also analyze the number of reverse reachable sets needed to approximate the fair influence with high probability. Further, we present an efficient algorithm that guarantees a $(1-1/e-\varepsilon)$-approximation.
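The weighted-max-coverage reduction mentioned above can be illustrated with a toy greedy step (the RR sets and weights below are hypothetical; the paper's sampler and weighting scheme are not reproduced):

```python
def greedy_weighted_coverage(rr_sets, weights, k):
    """Greedily pick k seed nodes, each maximizing the added weight of
    reverse-reachable (RR) sets it newly covers."""
    covered, seeds = set(), []
    nodes = {n for s in rr_sets for n in s}
    for _ in range(k):
        def gain(n):
            # Weight of not-yet-covered RR sets containing node n.
            return sum(weights[i] for i, s in enumerate(rr_sets)
                       if i not in covered and n in s)
        best = max(nodes - set(seeds), key=gain)
        covered |= {i for i, s in enumerate(rr_sets) if best in s}
        seeds.append(best)
    return seeds

# Four hypothetical RR sets with weights; node 3 covers weight 1.0 + 2.0.
rr = [{1, 2}, {2, 3}, {3}, {1}]
w = [1.0, 1.0, 2.0, 0.5]
seeds = greedy_weighted_coverage(rr, w, k=1)
```

Greedy selection on a weighted coverage objective is what yields the $(1-1/e)$-style guarantee by submodularity, with the $\varepsilon$ term coming from sampling error in the RR sets.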

JAIR Journal 2022 Journal Article

Adaptive Greedy versus Non-adaptive Greedy for Influence Maximization

  • Wei Chen
  • Binghui Peng
  • Grant Schoenebeck
  • Biaoshuai Tao

We consider the adaptive influence maximization problem: given a network and a budget k, iteratively select k seeds in the network to maximize the expected number of adopters. In the full-adoption feedback model, after selecting each seed, the seed-picker observes all the resulting adoptions. In the myopic feedback model, the seed-picker only observes whether each neighbor of the chosen seed adopts. Motivated by the extreme success of greedy-based algorithms/heuristics for influence maximization, we propose the concept of greedy adaptivity gap, which compares the performance of the adaptive greedy algorithm to its non-adaptive counterpart. Our first result shows that, for submodular influence maximization, the performance of the adaptive greedy algorithm can drop to a (1 − 1/e) fraction of that of the non-adaptive greedy algorithm, and that this ratio is tight. More specifically, on one side we provide examples where the performance of the adaptive greedy algorithm is only a (1 − 1/e) fraction of the performance of the non-adaptive greedy algorithm in four settings: for both feedback models and both the independent cascade model and the linear threshold model. On the other side, we prove that in any submodular cascade, the adaptive greedy algorithm always outputs a (1 − 1/e)-approximation to the expected number of adoptions in the optimal non-adaptive seed choice. Our second result shows that, for the general submodular diffusion model with full-adoption feedback, the adaptive greedy algorithm can outperform the non-adaptive greedy algorithm by an unbounded factor. Finally, we propose a risk-free variant of the adaptive greedy algorithm that always performs no worse than the non-adaptive greedy algorithm.
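For intuition, the non-adaptive greedy baseline under the independent cascade model can be sketched with a Monte-Carlo spread estimate. This is a toy stand-in: the graph, trigger probability `p`, and trial count are illustrative, not from the paper.

```python
import random


def ic_spread(graph, seeds, trials=200, p=0.1, rng=None):
    """Monte-Carlo estimate of the expected number of adoptions under the
    independent cascade model: each edge fires once with probability p."""
    rng = rng or random.Random(0)
    total = 0
    for _ in range(trials):
        active, frontier = set(seeds), list(seeds)
        while frontier:
            u = frontier.pop()
            for v in graph.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    frontier.append(v)
        total += len(active)
    return total / trials


def nonadaptive_greedy(graph, k, p=0.1):
    """Non-adaptive greedy: k times, pick the node with the largest estimated
    marginal spread, without observing any cascade feedback."""
    seeds = []
    nodes = sorted(set(graph) | {v for vs in graph.values() for v in vs})
    for _ in range(k):
        best = max((n for n in nodes if n not in seeds),
                   key=lambda n: ic_spread(graph, seeds + [n], p=p))
        seeds.append(best)
    return seeds
```

With p=1.0 the cascade is deterministic, which makes the greedy choices easy to check by hand.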

IJCAI Conference 2022 Conference Paper

Adaptive Long-Short Pattern Transformer for Stock Investment Selection

  • Heyuan Wang
  • Tengjiao Wang
  • Shun Li
  • Jiayi Zheng
  • Shijie Guan
  • Wei Chen

Stock investment selection is a hard issue in the Fintech field due to non-stationary dynamics and complex market interdependencies. Existing studies are mostly based on RNNs, which struggle to capture interactive information among fine-granular volatility patterns. Besides, they either treat stocks as isolated, or presuppose a fixed graph structure heavily relying on prior domain knowledge. In this paper, we propose a novel Adaptive Long-Short Pattern Transformer (ALSP-TF) for stock ranking in terms of expected returns. Specifically, we overcome the limitations of canonical self-attention, namely that it is context- and position-agnostic, with two additional capacities: (i) a fine-grained pattern distiller to contextualize queries and keys based on localized feature scales, and (ii) a time-adaptive modulator to make the dependency modeling among pattern pairs sensitive to different time intervals. Attention heads in stacked layers gradually harvest short- and long-term transition traits, spontaneously boosting the diversity of representations. Moreover, we devise a graph self-supervised regularization, which helps automatically assimilate the collective synergy of stocks and improve the generalization ability of the overall model. Experiments on three exchange market datasets show ALSP-TF’s superiority over state-of-the-art stock forecast methods.

NeurIPS Conference 2022 Conference Paper

Batch-Size Independent Regret Bounds for Combinatorial Semi-Bandits with Probabilistically Triggered Arms or Independent Arms

  • Xutong Liu
  • Jinhang Zuo
  • Siwei Wang
  • Carlee Joe-Wong
  • John C. S. Lui
  • Wei Chen

In this paper, we study combinatorial semi-bandits (CMAB) and focus on reducing the dependency on the batch size $K$ in the regret bound, where $K$ is the total number of arms that can be pulled or triggered in each round. First, for the setting of CMAB with probabilistically triggered arms (CMAB-T), we discover a novel (directional) triggering probability and variance modulated (TPVM) condition that can replace the previously-used smoothness condition for various applications, such as cascading bandits, online network exploration and online influence maximization. Under this new condition, we propose a BCUCB-T algorithm with variance-aware confidence intervals and conduct a regret analysis that reduces the $O(K)$ factor to $O(\log K)$ or $O(\log^2 K)$ in the regret bound, significantly improving the regret bounds for the above applications. Second, for the setting of non-triggering CMAB with independent arms, we propose a SESCB algorithm which leverages the non-triggering version of the TPVM condition and completely removes the dependency on $K$ in the leading regret term. As a valuable by-product, the regret analysis used in this paper can improve several existing results by a factor of $O(\log K)$. Finally, experimental evaluations show our superior performance compared with benchmark algorithms in different applications.
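A variance-aware confidence interval of the kind the abstract mentions can be sketched as an empirical-Bernstein-style index; the constants below are illustrative, not the ones used in BCUCB-T.

```python
import math


def bernstein_ucb(mean, var, n, t):
    """Variance-aware upper confidence bound: exploration bonus shrinks with
    the empirical variance, unlike the plain Hoeffding-style UCB."""
    if n == 0:
        return float("inf")  # unexplored arms get top priority
    log_t = math.log(max(t, 2))
    return mean + math.sqrt(2 * var * log_t / n) + 3 * log_t / n


def pick_arm(stats, t):
    """UCB rule: play the arm with the largest variance-aware index.
    stats[i] = (empirical mean, empirical variance, pull count)."""
    return max(range(len(stats)), key=lambda i: bernstein_ucb(*stats[i], t))
```

Low-variance arms receive a tighter bonus, which is exactly what lets a variance-aware analysis shave the batch-size factor in the regret.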

JBHI Journal 2022 Journal Article

Cancelable HD-SEMG Biometric Identification via Deep Feature Learning

  • Jiahao Fan
  • Xinyu Jiang
  • Xiangyu Liu
  • Xian Zhao
  • Xinming Ye
  • Chenyun Dai
  • Metin Akay
  • Wei Chen

Conventional biometric modalities, such as the face, fingerprint, and iris, are vulnerable against imitation and circumvention. Accordingly, secure biometric modalities with cancelable properties are needed for personal identification, especially in smart healthcare applications. Here we developed a person identification model using high-density surface electromyography (HD-sEMG) as biometric traits. In this model, the HD-sEMG biometric templates are cancelable and could be customized by the users through finger isometric contractions. A deep feature learning approach, implemented by convolutional neural networks (CNNs), is used to capture user-specific patterns from HD-sEMG signals and make identification decisions. This model has been validated on twenty-two subjects, with training and testing data acquired from two different days. The rank-1 identification accuracy and equal error rate for 44 identities (22 subjects × 2 accounts) can reach 87.23% and 4.66%, respectively. The cross-day identification accuracy of the proposed model is higher than the results of previous methods reported in the literature. The usability and efficiency of the proposed model are also investigated, indicating its potential for practical applications.

JBHI Journal 2022 Journal Article

DINs: Deep Interactive Networks for Neurofibroma Segmentation in Neurofibromatosis Type 1 on Whole-Body MRI

  • Jian-Wei Zhang
  • Wei Chen
  • K. Ina Ly
  • Xubin Zhang
  • Fan Yan
  • Justin Jordan
  • Gordon Harris
  • Scott Plotkin

Neurofibromatosis type 1 (NF1) is an autosomal dominant tumor predisposition syndrome that involves the central and peripheral nervous systems. Accurate detection and segmentation of neurofibromas are essential for assessing tumor burden and longitudinal tumor size changes. Automated convolutional neural networks (CNNs) are sensitive and vulnerable to tumors' variable anatomical location and heterogeneous appearance on MRI. In this study, we propose deep interactive networks (DINs) to address the above limitations. User interactions guide the model to recognize complicated tumors and quickly adapt to heterogeneous tumors. We introduce a simple but effective Exponential Distance Transform (ExpDT) that converts user interactions into guide maps regarded as the spatial and appearance prior. Compared with the popular Euclidean and geodesic distances, ExpDT is more robust to various image sizes, which preserves the distribution of interactive inputs. Furthermore, to enhance the tumor-related features, we design a deep interactive module to propagate the guides into deeper layers. We train and evaluate DINs on three MRI data sets from NF1 patients. The experimental results yield significant improvements of 44% and 14% in DSC compared with automated and other interactive methods, respectively. We also experimentally demonstrate the efficiency of DINs in reducing user burden compared with conventional interactive methods.

NeurIPS Conference 2022 Conference Paper

Does Momentum Change the Implicit Regularization on Separable Data?

  • Bohan Wang
  • Qi Meng
  • Huishuai Zhang
  • Ruoyu Sun
  • Wei Chen
  • Zhi-Ming Ma
  • Tie-Yan Liu

The momentum acceleration technique is widely adopted in many optimization algorithms. However, there is no theoretical answer to how momentum affects the generalization performance of these optimization algorithms. This paper studies the problem by analyzing the implicit regularization of momentum-based optimization. We prove that on the linear classification problem with separable data and exponential-tailed loss, gradient descent with momentum (GDM) converges to the $L^2$ max-margin solution, which is the same as vanilla gradient descent. That means gradient descent with momentum acceleration still converges to a low-complexity model, which guarantees its generalization. We then analyze the stochastic and adaptive variants of GDM (i.e., SGDM and deterministic Adam) and show they also converge to the $L^2$ max-margin solution. Technically, the implicit regularization of SGDM is established based on a novel convergence analysis of SGDM under a general noise condition called the affine noise variance condition. To the best of our knowledge, we are the first to derive SGDM's convergence under such an assumption. Numerical experiments are conducted to support our theoretical results.

NeurIPS Conference 2022 Conference Paper

Feature-Proxy Transformer for Few-Shot Segmentation

  • Jian-Wei Zhang
  • Yifan Sun
  • Yi Yang
  • Wei Chen

Few-shot segmentation (FSS) aims at performing semantic segmentation on novel classes given a few annotated support samples. With a rethink of recent advances, we find that the current FSS framework has deviated far from the supervised segmentation framework: given the deep features, FSS methods typically use an intricate decoder to perform sophisticated pixel-wise matching, while supervised segmentation methods use a simple linear classification head. Due to the intricacy of the decoder and its matching pipeline, it is not easy to follow such an FSS framework. This paper revives the straightforward framework of ``feature extractor $+$ linear classification head'' and proposes a novel Feature-Proxy Transformer (FPTrans) method, in which the ``proxy'' is the vector representing a semantic class in the linear classification head. FPTrans has two key points for learning discriminative features and representative proxies: 1) To better utilize the limited support samples, the feature extractor makes the query interact with the support features from bottom to top layers using a novel prompting strategy. 2) FPTrans uses multiple local background proxies (instead of a single one) because the background is not homogeneous and may contain some novel foreground regions. These two key points are easily integrated into the vision transformer backbone with the prompting mechanism in the transformer. Given the learned features and proxies, FPTrans directly compares their cosine similarity for segmentation. Although the framework is straightforward, we show that FPTrans achieves competitive FSS accuracy on par with state-of-the-art decoder-based methods.
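The final decision rule described above — compare each pixel feature with the class proxies by cosine similarity — reduces to a few lines. This toy sketch uses 2-D features and hypothetical proxy names; it only illustrates the rule, not FPTrans itself.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def segment(features, proxies):
    """Assign each pixel feature the label of the most similar proxy —
    the linear-head-style decision rule, in miniature."""
    return [max(proxies, key=lambda c: cosine(f, proxies[c])) for f in features]
```

With proxies for a "fg" and a "bg" class, a feature pointing mostly along the foreground direction is labeled "fg".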

TIST Journal 2022 Journal Article

Federated Multi-task Graph Learning

  • Yijing Liu
  • Dongming Han
  • Jianwei Zhang
  • Haiyang Zhu
  • Mingliang Xu
  • Wei Chen

Distributed processing and analysis of large-scale graph data remain challenging because of the high-level discrepancy among graphs. This study investigates a novel subproblem: distributed multi-task learning on graphs, which jointly learns multiple analysis tasks from decentralized graphs. We propose a federated multi-task graph learning (FMTGL) framework to solve the problem within a privacy-preserving and scalable scheme. Its core is an innovative data-fusion mechanism and a low-latency distributed optimization method. The former captures multi-source data relatedness and generates universal task representations for local task analysis. The latter enables the quick update of our framework with gradient sparsification and tree-based aggregation. As a theoretical result, the proposed optimization method has a convergence rate that interpolates between $\mathcal{O}(1/T)$ and $\mathcal{O}(1/\sqrt{T})$, up to logarithmic terms. Unlike previous studies, our work analyzes the convergence behavior with adaptive stepsize selection and a non-convex assumption. Experimental results on three graph datasets verify the effectiveness and scalability of FMTGL.
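One of the two ingredients named for the low-latency optimizer, gradient sparsification, can be sketched as a top-k magnitude filter. This is a generic version under our own naming, not necessarily FMTGL's exact scheme.

```python
def sparsify_topk(grad, k):
    """Keep only the k largest-magnitude gradient entries, zeroing the rest,
    so each worker transmits a sparse update instead of the dense vector."""
    top_idx = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)[:k]
    keep = set(top_idx)
    return [g if i in keep else 0.0 for i, g in enumerate(grad)]
```

For example, keeping the top 2 of 4 entries transmits only the two largest coordinates; the zeroed remainder is often accumulated locally in practice (error feedback), which this sketch omits.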

IJCAI Conference 2022 Conference Paper

Heterogeneous Interactive Snapshot Network for Review-Enhanced Stock Profiling and Recommendation

  • Heyuan Wang
  • Tengjiao Wang
  • Shun Li
  • Shijie Guan
  • Jiayi Zheng
  • Wei Chen

Stock recommendation plays a critical role in modern quantitative trading. The large volume of social media information such as investment reviews, which conveys emotion-driven factors, together with price technical indicators formulates a “snapshot” of the evolving stock market profile. However, previous studies usually model the temporal trajectories of the price and media modalities separately, losing their interrelated influences. Moreover, they mainly extract review semantics via sequential or attentive models, whereas the rich text-associated knowledge is largely neglected. In this paper, we propose a novel heterogeneous interactive snapshot network for stock profiling and recommendation. We model investment reviews in each snapshot as a heterogeneous document graph, and develop a flexible hierarchical attentive propagation framework to capture fine-grained proximity features. Further, to learn stock embeddings for ranking, we introduce a novel twins-GRU method, which tightly couples the media and price parallel sequences in a cross-interactive fashion to catch dynamic dependencies between successive snapshots. Our approach outperforms the state of the art by over 7.6% in terms of cumulative and risk-adjusted returns in trading simulations on both English and Chinese benchmarks.

NeurIPS Conference 2022 Conference Paper

Multivariate Time-Series Forecasting with Temporal Polynomial Graph Neural Networks

  • Yijing Liu
  • Qinxian Liu
  • Jian-Wei Zhang
  • Haozhe Feng
  • Zhongwei Wang
  • Zihan Zhou
  • Wei Chen

Modeling multivariate time series (MTS) is critical in modern intelligent systems. The accurate forecast of MTS data is still challenging due to the complicated latent variable correlation. Recent works apply Graph Neural Networks (GNNs) to the task, with the basic idea of representing the correlation as a static graph. However, predicting with a static graph causes significant bias because the correlation is time-varying in real-world MTS data. Besides, there is no gap analysis between the actual correlation and the learned one in their works to validate the effectiveness. This paper proposes a temporal polynomial graph neural network (TPGNN) for accurate MTS forecasting, which represents the dynamic variable correlation as a temporal matrix polynomial in two steps. First, we capture the overall correlation with a static matrix basis. Then, we use a set of time-varying coefficients and the matrix basis to construct a matrix polynomial for each time step. The constructed result empirically captures the precise dynamic correlation of six synthetic MTS datasets generated by a non-repeating random walk model. Moreover, the theoretical analysis shows that TPGNN can achieve perfect approximation under a commutative condition. We conduct extensive experiments on two traffic datasets with prior structure and four benchmark datasets. The results indicate that TPGNN achieves the state-of-the-art on both short-term and long-term MTS forecasting.
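The core construction — a matrix polynomial $\sum_k c_k B^k$ with time-varying coefficients $c_k$ over a static matrix basis $B$ (with $B^0 = I$) — can be sketched in a toy dense form. The single basis matrix and coefficients below are illustrative only.

```python
def mat_mul(A, B):
    """Dense square matrix product."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_poly(B, coeffs):
    """Evaluate sum_k coeffs[k] * B^k with B^0 = I. At each time step a model
    like the one described would supply a fresh coefficient vector."""
    n = len(B)
    power = [[float(i == j) for j in range(n)] for i in range(n)]  # B^0 = I
    out = [[0.0] * n for _ in range(n)]
    for c in coeffs:
        for i in range(n):
            for j in range(n):
                out[i][j] += c * power[i][j]
        power = mat_mul(power, B)  # advance to the next power of B
    return out
```

For instance, with basis B = [[0,1],[1,0]] and coefficients [1, 2], the polynomial is I + 2B = [[1,2],[2,1]].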

IJCAI Conference 2022 Conference Paper

Mutual Distillation Learning Network for Trajectory-User Linking

  • Wei Chen
  • ShuZhe Li
  • Chao Huang
  • Yanwei Yu
  • Yongguo Jiang
  • Junyu Dong

Trajectory-User Linking (TUL), which links trajectories to the users who generate them, has been a challenging problem due to the sparsity in check-in mobility data. Existing methods ignore the utilization of historical data or rich contextual features in check-in data, resulting in poor performance on the TUL task. In this paper, we propose a novel mutual distillation learning network, named MainTUL, to solve the TUL problem for sparse check-in mobility data. Specifically, MainTUL is composed of a Recurrent Neural Network (RNN) trajectory encoder that models sequential patterns of the input trajectory and a temporal-aware Transformer trajectory encoder that captures long-term time dependencies for the corresponding augmented historical trajectories. Then, the knowledge learned on historical trajectories is transferred between the two trajectory encoders to guide the learning of both encoders to achieve mutual distillation of information. Experimental results on two real-world check-in mobility datasets demonstrate the superiority of MainTUL against state-of-the-art baselines. The source code of our model is available at https://github.com/Onedean/MainTUL.

AAAI Conference 2022 Conference Paper

Online Influence Maximization with Node-Level Feedback Using Standard Offline Oracles

  • Zhijie Zhang
  • Wei Chen
  • Xiaoming Sun
  • Jialin Zhang

We study the online influence maximization (OIM) problem in social networks, where in multiple rounds the learner repeatedly chooses seed nodes to generate cascades, observes the cascade feedback, and gradually learns the best seeds that generate the largest cascade. We focus on two major challenges in this paper. First, we work with node-level feedback instead of edge-level feedback. The edge-level feedback reveals all edges that pass through information in a cascade, whereas the node-level feedback only reveals the activated nodes with timestamps. The node-level feedback is arguably more realistic since in practice it is relatively easy to observe who is influenced but very difficult to observe from which relationship (edge) the influence comes. Second, we use standard offline oracles instead of offline pair-oracles. To compute a good seed set for the next round, an offline pair-oracle finds the best seed set and the best parameters within the confidence region simultaneously, and such an oracle is difficult to compute due to the combinatorial core of the OIM problem. So we focus on how to use the standard offline influence maximization oracle, which finds the best seed set given the edge parameters as input. In this paper, we resolve these challenges for the famous independent cascade (IC) diffusion model. Past research only achieves this under edge-level feedback, while we present the first $\widetilde{O}(\sqrt{T})$-regret algorithm for node-level feedback. For the first challenge above, we apply a novel adaptation of the maximum likelihood estimation (MLE) approach to learn the graph parameters and its confidence region (a confidence ellipsoid). For the second challenge, we adjust the update procedure to dissect the confidence ellipsoid into confidence intervals on each parameter, so that the standard offline influence maximization oracle is enough.
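The dissection step — turning a joint confidence region into per-parameter intervals — can be illustrated with a Hoeffding-style interval around an edge-probability estimate. This is a simplified stand-in for the paper's MLE-based ellipsoid; the names and constants are ours.

```python
import math


def edge_ci(successes, trials, delta=0.05):
    """Hoeffding-style (1 - delta) confidence interval for one edge's
    activation probability; the point estimate is the sample mean."""
    if trials == 0:
        return 0.0, 1.0  # no data: the trivial interval
    p = successes / trials
    radius = math.sqrt(math.log(2 / delta) / (2 * trials))
    return max(0.0, p - radius), min(1.0, p + radius)
```

An optimistic OIM round would feed the upper endpoints of all edge intervals to the standard offline oracle.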

NeurIPS Conference 2022 Conference Paper

Tiered Reinforcement Learning: Pessimism in the Face of Uncertainty and Constant Regret

  • Jiawei Huang
  • Li Zhao
  • Tao Qin
  • Wei Chen
  • Nan Jiang
  • Tie-Yan Liu

We propose a new learning framework that captures the tiered structure of many real-world user-interaction applications, where the users can be divided into two groups based on their different tolerance on exploration risks and should be treated separately. In this setting, we simultaneously maintain two policies $\pi^{\text{O}}$ and $\pi^{\text{E}}$: $\pi^{\text{O}}$ (``O'' for ``online'') interacts with more risk-tolerant users from the first tier and minimizes regret by balancing exploration and exploitation as usual, while $\pi^{\text{E}}$ (``E'' for ``exploit'') exclusively focuses on exploitation for risk-averse users from the second tier utilizing the data collected so far. An important question is whether such a separation yields advantages over the standard online setting (i.e., $\pi^{\text{E}}=\pi^{\text{O}}$) for the risk-averse users. We individually consider the gap-independent vs. gap-dependent settings. For the former, we prove that the separation is indeed not beneficial from a minimax perspective. For the latter, we show that if choosing Pessimistic Value Iteration as the exploitation algorithm to produce $\pi^{\text{E}}$, we can achieve a constant regret for risk-averse users independent of the number of episodes $K$, which is in sharp contrast to the $\Omega(\log K)$ regret for any online RL algorithms in the same setting, while the regret of $\pi^{\text{O}}$ (almost) maintains its online regret optimality and does not need to compromise for the success of $\pi^{\text{E}}$.

JBHI Journal 2021 Journal Article

A Hybrid DCNN-SVM Model for Classifying Neonatal Sleep and Wake States Based on Facial Expressions in Video

  • Muhammad Awais
  • Xi Long
  • Bin Yin
  • Saadullah Farooq Abbasi
  • Saeed Akbarzadeh
  • Chunmei Lu
  • Xinhua Wang
  • Laishuan Wang

Sleep is a natural phenomenon controlled by the central nervous system. The sleep-wake pattern, which functions as an essential indicator of neurophysiological organization in the neonatal period, has profound meaning in the prediction of cognitive diseases and brain maturity. In recent years, unobtrusive sleep monitoring and automatic sleep staging have been intensively studied for adults, but much less for neonates. This work aims to investigate a novel video-based unobtrusive method for neonatal sleep-wake classification by analyzing the behavioral changes in the neonatal facial region. A hybrid model is proposed to monitor the sleep-wake patterns of human neonates. The model combines two algorithms: a deep convolutional neural network (DCNN) and a support vector machine (SVM), where the DCNN works as a trainable feature extractor and the SVM as a classifier. Data was collected from nineteen Chinese neonates at the Children's Hospital of Fudan University, Shanghai, China. The classification results are compared with the gold standard of video-electroencephalography scored by pediatric neurologists. Validations indicate that the proposed hybrid DCNN-SVM model achieved reliable performance in classifying neonatal sleep and wake states in RGB video frames (with the face region detected), with an accuracy of 93.8 ± 2.2% and an F1-score of 0.93 ± 0.3.

ICRA Conference 2021 Conference Paper

Autonomous Multi-View Navigation via Deep Reinforcement Learning

  • Xueqin Huang
  • Wei Chen
  • Wei Zhang 0021
  • Ran Song 0001
  • Jiyu Cheng
  • Yibin Li 0001

In this paper, we propose a novel deep reinforcement learning (DRL) system for the autonomous navigation of mobile robots that consists of three modules: map navigation, multi-view perception and multi-branch control. Our DRL system takes as input a routed map provided by a global planner and three RGB images captured by a multi-camera setup, gathering global and local information respectively. In particular, we present a multi-view perception module based on an attention mechanism to filter out redundant information caused by multi-camera sensing. We also replace raw RGB images with low-dimensional representations via a specifically designed network, which benefits more robust sim2real transfer learning. Extensive experiments in both simulated and real-world scenarios demonstrate that our system outperforms state-of-the-art approaches.

JBHI Journal 2021 Journal Article

Cancelable HD-sEMG-Based Biometrics for Cross-Application Discrepant Personal Identification

  • Xinyu Jiang
  • Ke Xu
  • Xiangyu Liu
  • Chenyun Dai
  • David A. Clifton
  • Edward A. Clancy
  • Metin Akay
  • Wei Chen

With the soaring development of body sensor network (BSN)-based health informatics, information security in such medical devices has attracted increasing attention in recent years. Employing the biosignals acquired directly by the BSN as biometrics for personal identification is an effective approach. Noncancelability and cross-application invariance are two natural flaws of most traditional biometric modalities. Once the biometric template is exposed, it is compromised forever. Even worse, because the same biometrics may be employed as tokens for different accounts in multiple applications, the exposed template can be used to compromise other accounts. In this work, we propose a cancelable and cross-application discrepant biometric approach based on high-density surface electromyogram (HD-sEMG) for personal identification. We enrolled two accounts for each user. HD-sEMG signals from the right dorsal hand under isometric contractions of different finger muscles were employed as biometric tokens. Since isometric contraction, in contrast to dynamic contraction, requires no actual movement, the users’ choice to log in to different accounts is greatly protected against impostors. We realized a promising identification accuracy of 85.8% for 44 identities (22 subjects × 2 accounts) with training and testing data acquired 9 days apart. The high identification accuracy of different accounts for the same user demonstrates the promising cancelability and cross-application discrepancy of the proposed HD-sEMG-based biometrics. To the best of our knowledge, this is the first study to employ HD-sEMG in personal identification applications, with signal variation across days considered.

NeurIPS Conference 2021 Conference Paper

Combinatorial Pure Exploration with Bottleneck Reward Function

  • Yihan Du
  • Yuko Kuroki
  • Wei Chen

In this paper, we study the Combinatorial Pure Exploration problem with the Bottleneck reward function (CPE-B) under the fixed-confidence (FC) and fixed-budget (FB) settings. In CPE-B, given a set of base arms and a collection of subsets of base arms (super arms) following a certain combinatorial constraint, a learner sequentially plays a base arm and observes its random reward, with the objective of finding the optimal super arm with the maximum bottleneck value, defined as the minimum expected reward of the base arms contained in the super arm. CPE-B captures a variety of practical scenarios such as network routing in communication networks, and its unique challenges lie in how to utilize the bottleneck property to save samples and achieve statistical optimality. None of the existing CPE studies (most of them assume linear rewards) can be adapted to solve such challenges, and thus we develop brand-new techniques to handle them. For the FC setting, we propose novel algorithms with optimal sample complexity for a broad family of instances and establish a matching lower bound to demonstrate the optimality (within a logarithmic factor). For the FB setting, we design an algorithm which achieves the state-of-the-art error probability guarantee and is the first to run efficiently on fixed-budget path instances, compared to existing CPE algorithms. Our experimental results on the top-$k$, path and matching instances validate the empirical superiority of the proposed algorithms over their baselines.
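The bottleneck objective defined above is easy to state in code: the value of a super arm is the minimum expected reward among its base arms, and the target is the super arm maximizing that value. The arm names below are hypothetical.

```python
def bottleneck_value(super_arm, means):
    """Bottleneck value of a super arm: the minimum expected reward
    among the base arms it contains."""
    return min(means[a] for a in super_arm)


def best_super_arm(super_arms, means):
    """The optimal super arm under the bottleneck reward: maximize the
    minimum expected base-arm reward over all feasible super arms."""
    return max(super_arms, key=lambda s: bottleneck_value(s, means))
```

This is only the offline objective; the paper's contribution is the sampling strategy that identifies this maximizer from noisy pulls.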

AAAI Conference 2021 Conference Paper

Combinatorial Pure Exploration with Full-Bandit or Partial Linear Feedback

  • Yihan Du
  • Yuko Kuroki
  • Wei Chen

In this paper, we first study the problem of combinatorial pure exploration with full-bandit feedback (CPE-BL), where a learner is given a combinatorial action space $\mathcal{X} \subseteq \{0,1\}^d$, and in each round the learner pulls an action $x \in \mathcal{X}$ and receives a random reward with expectation $x^\top \theta$, with $\theta \in \mathbb{R}^d$ a latent and unknown environment vector. The objective is to identify the optimal action with the highest expected reward, using as few samples as possible. For CPE-BL, we design the first polynomial-time adaptive algorithm, whose sample complexity matches the lower bound (within a logarithmic factor) for a family of instances and has a light dependence on $\Delta_{\min}$ (the smallest gap between the optimal action and sub-optimal actions). Furthermore, we propose a novel generalization of CPE-BL with flexible feedback structures, called combinatorial pure exploration with partial linear feedback (CPE-PL), which encompasses several families of sub-problems including full-bandit feedback, semi-bandit feedback, partial feedback and nonlinear reward functions. In CPE-PL, each pull of action $x$ reports a random feedback vector with expectation $M_x \theta$, where $M_x \in \mathbb{R}^{m_x \times d}$ is a transformation matrix for $x$, and gains a random (possibly nonlinear) reward related to $x$. For CPE-PL, we develop the first polynomial-time algorithm, which simultaneously addresses limited feedback, general reward function and combinatorial action space (e.g., matroids, matchings and $s$-$t$ paths), and provide its sample complexity analysis. Our empirical evaluation demonstrates that our algorithms run orders of magnitude faster than the existing ones, and our CPE-BL algorithm is robust across different $\Delta_{\min}$ settings while our CPE-PL algorithm is the first one returning correct answers for nonlinear reward functions.

AAAI Conference 2021 Conference Paper

How Does Data Augmentation Affect Privacy in Machine Learning?

  • Da Yu
  • Huishuai Zhang
  • Wei Chen
  • Jian Yin
  • Tie-Yan Liu

It is observed in the literature that data augmentation can significantly mitigate the membership inference (MI) attack. However, in this work, we challenge this observation by proposing new MI attacks that utilize the information of augmented data. The MI attack is widely used to measure a model's information leakage of the training set. We establish the optimal membership inference when the model is trained with augmented data, which inspires us to formulate the MI attack as a set classification problem, i.e., classifying a set of augmented instances instead of a single data point, and design input-permutation-invariant features. Empirically, we demonstrate that the proposed approach universally outperforms the original methods when the model is trained with data augmentation. Even further, we show that the proposed approach can achieve higher MI attack success rates on models trained with some data augmentation than the existing methods on models trained without data augmentation. Notably, we achieve a 70.1% MI attack success rate on CIFAR10 against a wide residual network while the previous best approach only attains 61.9%. This suggests the privacy risk of models trained with data augmentation could be largely underestimated.
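The set-classification view can be illustrated with a permutation-invariant summary of per-augmentation losses: sorting (or averaging) makes the feature independent of the order in which the augmented copies are evaluated. The feature choice and threshold rule below are toy stand-ins, not the paper's exact design.

```python
def set_features(losses):
    """Permutation-invariant summary of a set of per-augmentation losses:
    sorted order statistics plus the mean. Any permutation of the input
    yields the same feature vector."""
    ordered = sorted(losses)
    return ordered + [sum(losses) / len(losses)]


def is_member(losses, threshold):
    """Hypothetical membership rule: flag a sample as a training member when
    its average loss over augmentations falls below a threshold."""
    return sum(losses) / len(losses) < threshold
```

In the attack framing, `set_features` would feed a learned set classifier; the threshold rule is the simplest possible instance of that classifier.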

AAAI Conference 2021 Conference Paper

Inferring Emotion from Large-scale Internet Voice Data: A Semi-supervised Curriculum Augmentation based Deep Learning Approach

  • Suping Zhou
  • Jia Jia
  • Zhiyong Wu
  • Zhihan Yang
  • Yanfeng Wang
  • Wei Chen
  • Fanbo Meng
  • Shuo Huang

Effective emotion inference from user queries helps to give a more personified response for Voice Dialogue Applications (VDAs). The tremendous number of VDA users brings in diverse emotion expressions. How can we achieve high emotion-inference performance on large-scale internet voice data in VDAs? Traditionally, research on speech emotion recognition has been based on acted voice datasets, which have limited speakers but strong and clear emotion expressions. Inspired by this, in this paper we propose a novel approach that leverages acted voice data with strong emotion expressions to enhance large-scale unlabeled internet voice data with diverse emotion expressions for emotion inference. Specifically, we propose a novel semi-supervised multi-modal curriculum augmentation deep learning framework. First, to learn more general emotion cues, we adopt a curriculum learning based epoch-wise training strategy, which trains our model guided by strong and balanced emotion samples from acted voice data and subsequently leverages weak and unbalanced emotion samples from internet voice data. Second, to employ more diverse emotion expressions, we design a Multi-path Mixmatch Multimodal Deep Neural Network (MMMD), which effectively learns feature representations for multiple modalities and trains labeled and unlabeled data with hybrid semi-supervised methods for superior generalisation and robustness. Experiments on an internet voice dataset with 500,000 utterances show our method outperforms several alternative baselines by +10.09% in terms of F1, while an acted corpus with 2,397 utterances contributes a further 4.35%. To further compare our method with state-of-the-art techniques on traditional acted voice datasets, we also conduct experiments on the public IEMOCAP dataset. The results reveal the effectiveness of the proposed approach.

NeurIPS Conference 2021 Conference Paper

Learning Causal Semantic Representation for Out-of-Distribution Prediction

  • Chang Liu
  • Xinwei Sun
  • Jindong Wang
  • Haoyue Tang
  • Tao Li
  • Tao Qin
  • Wei Chen
  • Tie-Yan Liu

Conventional supervised learning methods, especially deep ones, are found to be sensitive to out-of-distribution (OOD) examples, largely because the learned representation mixes the semantic factor with the variation factor due to their domain-specific correlation, while only the semantic factor causes the output. To address the problem, we propose a Causal Semantic Generative model (CSG) based on causal reasoning so that the two factors are modeled separately, and develop methods for OOD prediction from a single training domain, which is common and challenging. The methods are based on the causal invariance principle, with a novel design in variational Bayes for both efficient learning and easy prediction. Theoretically, we prove that under certain conditions, CSG can identify the semantic factor by fitting training data, and this semantic identification guarantees the boundedness of the OOD generalization error and the success of adaptation. Empirical study shows improved OOD performance over prevailing baselines.

JBHI Journal 2021 Journal Article

MetaSleepLearner: A Pilot Study on Fast Adaptation of Bio-Signals-Based Sleep Stage Classifier to New Individual Subject Using Meta-Learning

  • Nannapas Banluesombatkul
  • Pichayoot Ouppaphan
  • Pitshaporn Leelaarporn
  • Payongkit Lakhan
  • Busarakum Chaitusaney
  • Nattapong Jaimchariyatam
  • Ekapol Chuangsuwanich
  • Wei Chen

Identifying sleep stages from bio-signals requires the time-consuming and tedious labor of skilled clinicians. Deep learning approaches have been introduced to tackle the automatic sleep stage classification conundrum. However, replacing clinicians with an automatic system is difficult because individual bio-signals differ in many aspects, causing inconsistent model performance across incoming individuals. Thus, we aim to explore the feasibility of a novel approach capable of assisting clinicians and lessening their workload. We propose a transfer learning framework, entitled MetaSleepLearner, based on Model Agnostic Meta-Learning (MAML), in order to transfer the acquired sleep staging knowledge from a large dataset to new individual subjects (source code is available at https://github.com/IoBT-VISTEC/MetaSleepLearner). The framework requires the clinicians to label only a few sleep epochs and allows the remainder to be handled by the system. Layer-wise Relevance Propagation (LRP) was also applied to understand the learning course of our approach. In all acquired datasets, compared to the conventional approach, MetaSleepLearner achieved a 5.4% to 17.7% improvement, with a statistically significant difference between the means of the two approaches. The model interpretation after adaptation to each subject also confirmed that the performance was directed towards reasonable learning. MetaSleepLearner outperformed the conventional approaches as a result of fine-tuning using the recordings of both healthy subjects and patients. This is the first work to investigate a non-conventional pre-training method, MAML, opening a possibility for human-machine collaboration in sleep stage classification and easing the clinicians' burden of labelling sleep stages through only several epochs rather than an entire recording.

JBHI Journal 2021 Journal Article

Noise Reduction for SD-OCT Using a Structure-Preserving Domain Transfer Approach

  • Menglin Wu
  • Wei Chen
  • Qiang Chen
  • Hyunjin Park

Spectral-domain optical coherence tomography (SD-OCT) images inevitably suffer from multiplicative speckle noise caused by random interference. This study proposes an unsupervised domain adaptation approach for noise reduction by translating SD-OCT to the corresponding high-quality enhanced depth imaging (EDI)-OCT. We propose a structure-preserving cycle-consistent generative adversarial network for unpaired image-to-image translation, which can be applied to imbalanced unpaired data, and can effectively preserve retinal details based on a structure-specific cross-domain description. It also imposes smoothness by penalizing the intensity variation of the low-reflective region between consecutive slices. Our approach was tested on a local dataset consisting of 268 SD-OCT volumes and two public independent validation datasets including 20 SD-OCT volumes and 17 B-scans, respectively. Experimental results show that our method can effectively suppress noise and maintain the retinal structure, compared with other traditional approaches and deep learning methods in terms of qualitative and quantitative assessments. Our proposed method shows good performance for speckle noise reduction and can assist downstream tasks of OCT analysis.

NeurIPS Conference 2021 Conference Paper

Optimizing Information-theoretical Generalization Bound via Anisotropic Noise of SGLD

  • Bohan Wang
  • Huishuai Zhang
  • Jieyu Zhang
  • Qi Meng
  • Wei Chen
  • Tie-Yan Liu

Recently, the information-theoretical framework has been shown to obtain non-vacuous generalization bounds for large models trained by Stochastic Gradient Langevin Dynamics (SGLD) with isotropic noise. In this paper, we optimize the information-theoretical generalization bound by manipulating the noise structure in SGLD. We prove that, under a constraint guaranteeing low empirical risk, the optimal noise covariance is the square root of the expected gradient covariance if both the prior and the posterior are jointly optimized. This validates that the optimal noise is quite close to the empirical gradient covariance. Technically, we develop a new information-theoretical bound that enables such an optimization analysis. We then apply matrix analysis to derive the form of the optimal noise covariance. The presented constraint and results are validated by empirical observations.

JBHI Journal 2021 Journal Article

Quantifying Spatial Activation Patterns of Motor Units in Finger Extensor Muscles

  • Xinyu Jiang
  • Haoran Ren
  • Ke Xu
  • Xinming Ye
  • Chenyun Dai
  • Edward A. Clancy
  • Yuan-Ting Zhang
  • Wei Chen

The ability to expertly control different fingers contributes to hand dexterity during object manipulation in daily life activities. The macroscopic spatial patterns of muscle activations during finger movements have been widely researched using global surface electromyography (sEMG). However, the spatial activation patterns of microscopic motor units (MUs) under different finger movements have not been well investigated. The present work aims to quantify MU spatial activation patterns during movement of distinct fingers (index, middle, ring and little finger). Specifically, we focused on extensor muscles during extension contractions. Motor unit action potentials (MUAPs) during movement of each finger were obtained through decomposition of high-density sEMG (HD-sEMG). First, we quantified the spatial activation patterns of MUs for each finger based on 2-dimensional (2-D) root-mean-square (RMS) maps of MUAP grids after spike-triggered averaging. We found that these activation patterns under different finger movements are distinct along the distal-proximal direction, but with partial overlap. Second, to further evaluate MU separability, we classified the spatial activation pattern of each individual MU under distinct finger movements and associated each MU with its corresponding finger using Regularized Uncorrelated Multilinear Discriminant Analysis (RUMLDA). A high MU-finger classification accuracy, with a mean of 88.98% across 12 subjects, was achieved. The quantification of MU spatial activation patterns could benefit studies of the neural mechanisms of the hand. To the best of our knowledge, this is the first work to quantify MU behaviors under different finger movements.

NeurIPS Conference 2021 Conference Paper

R-Drop: Regularized Dropout for Neural Networks

  • Xiaobo Liang
  • Lijun Wu
  • Juntao Li
  • Yue Wang
  • Qi Meng
  • Tao Qin
  • Wei Chen
  • Min Zhang

Dropout is a powerful and widely used technique to regularize the training of deep neural networks. Though effective and performing well, the randomness introduced by dropout causes non-negligible inconsistency between training and inference. In this paper, we introduce a simple consistency training strategy to regularize dropout, namely R-Drop, which forces the output distributions of different sub-models generated by dropout to be consistent with each other. Specifically, for each training sample, R-Drop minimizes the bidirectional KL-divergence between the output distributions of two sub-models sampled by dropout. Theoretical analysis reveals that R-Drop reduces the above inconsistency. Experiments on $\bf{5}$ widely used deep learning tasks ($\bf{18}$ datasets in total), including neural machine translation, abstractive summarization, language understanding, language modeling, and image classification, show that R-Drop is universally effective. In particular, it yields substantial improvements when applied to fine-tune large-scale pre-trained models, e.g., ViT, RoBERTa-large, and BART, and achieves state-of-the-art (SOTA) performance with the vanilla Transformer model on WMT14 English$\to$German translation ($\bf{30.91}$ BLEU) and WMT14 English$\to$French translation ($\bf{43.95}$ BLEU), even surpassing models trained with extra large-scale data and expert-designed advanced variants of Transformer models. Our code is available at GitHub\footnote{\url{https://github.com/dropreg/R-Drop}}.
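As an illustrative sketch (NumPy, not the paper's training code), the bidirectional KL consistency term can be computed from two stochastic forward passes of the same model on the same batch; the toy linear model, dropout rate, and tensor shapes below are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def dropout(x, rate, rng):
    # Inverted dropout: zero units with probability `rate`, rescale the rest.
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

def r_drop_kl(logits_a, logits_b):
    # Bidirectional KL between two sub-model output distributions,
    # averaged over the batch: the consistency term R-Drop minimizes.
    p, q = softmax(logits_a), softmax(logits_b)
    kl_pq = (p * (np.log(p) - np.log(q))).sum(-1)
    kl_qp = (q * (np.log(q) - np.log(p))).sum(-1)
    return float((0.5 * (kl_pq + kl_qp)).mean())

# Two dropout-randomized forward passes of a toy linear "model".
x = rng.normal(size=(4, 16))
W = rng.normal(size=(16, 3))
logits1 = dropout(x, 0.1, rng) @ W
logits2 = dropout(x, 0.1, rng) @ W
loss = r_drop_kl(logits1, logits2)
```

In training, this term would be added with a weighting coefficient to the usual task loss (e.g. cross-entropy) of both passes.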

NeurIPS Conference 2021 Conference Paper

Recovering Latent Causal Factor for Generalization to Distributional Shifts

  • Xinwei Sun
  • Botong Wu
  • Xiangyu Zheng
  • Chang Liu
  • Wei Chen
  • Tao Qin
  • Tie-Yan Liu

Distributional shifts between training and target domains may degrade the prediction accuracy of learned models, mainly because these models often learn features that possess only correlation rather than a causal relation with the output. Such a correlation, known statistically as a ``spurious correlation'', is domain-dependent and hence may fail to generalize to unseen domains. To avoid such spurious correlations, we propose \textbf{La}tent \textbf{C}ausal \textbf{I}nvariance \textbf{M}odels (LaCIM), which specify the underlying causal structure of the data and the source of distributional shifts, guiding us to pursue only the causal factor for prediction. Specifically, LaCIM introduces a pair of correlated latent factors: (a) a causal factor and (b) others, while the extent of this correlation is governed by a domain variable that characterizes the distributional shifts. On this basis, we prove that the distribution of observed variables conditioned on latent variables is shift-invariant. Equipped with such an invariance, we prove that the causal factor can be recovered without mixing in information from the others, which induces the ground-truth predicting mechanism. We propose a Variational-Bayesian-based method to learn this invariance for prediction. The utility of our approach is verified by improved generalization to distributional shifts on various real-world data. Our code is freely available at \url{https://github.com/wubotong/LaCIM}.

AAAI Conference 2021 Conference Paper

SHOT-VAE: Semi-supervised Deep Generative Models With Label-aware ELBO Approximations

  • Hao-Zhe Feng
  • Kezhi Kong
  • Minghao Chen
  • Tianye Zhang
  • Minfeng Zhu
  • Wei Chen

Semi-supervised variational autoencoders (VAEs) have obtained strong results, but have also encountered the challenge that good ELBO values do not always imply accurate inference results. In this paper, we investigate and identify two causes of this problem: (1) the ELBO objective cannot utilize the label information directly; (2) a bottleneck value exists, and continuing to optimize the ELBO beyond this value does not improve inference accuracy. On the basis of the experimental results, we propose SHOT-VAE to address these problems without introducing additional prior knowledge. SHOT-VAE offers two contributions: (1) a new ELBO approximation named smooth-ELBO that integrates the label predictive loss into the ELBO; (2) an approximation based on optimal interpolation that breaks the ELBO value bottleneck by reducing the margin between the ELBO and the data likelihood. SHOT-VAE achieves good performance, with a 25.30% error rate on CIFAR-100 with 10k labels, and reduces the error rate to 6.11% on CIFAR-10 with 4k labels.

NeurIPS Conference 2021 Conference Paper

The Hardness Analysis of Thompson Sampling for Combinatorial Semi-bandits with Greedy Oracle

  • Fang Kong
  • Yueran Yang
  • Wei Chen
  • Shuai Li

Thompson sampling (TS) has attracted a lot of interest in the bandit area. It was introduced in the 1930s but was not theoretically analyzed until recent years. All of its analyses in the combinatorial multi-armed bandit (CMAB) setting require an exact oracle that provides optimal solutions for any input. However, such an oracle is usually not feasible, since many combinatorial optimization problems are NP-hard and only approximation oracles are available. An example \cite{WangC18} has shown the failure of TS to learn with an approximation oracle. However, that oracle is uncommon and was designed only for a specific problem instance. It remains an open question whether the convergence analysis of TS can be extended beyond the exact oracle in CMAB. In this paper, we study this question under the greedy oracle, a common (approximation) oracle with theoretical guarantees for solving many (offline) combinatorial optimization problems. We provide a problem-dependent regret lower bound of order $\Omega(\log T/\Delta^2)$ to quantify the hardness of TS in solving CMAB problems with the greedy oracle, where $T$ is the time horizon and $\Delta$ is some reward gap. We also provide an almost matching regret upper bound. These are the first theoretical results for TS solving CMAB with a common approximation oracle, and they break the misconception that TS cannot work with approximation oracles.

AAAI Conference 2021 Conference Paper

Time Series Domain Adaptation via Sparse Associative Structure Alignment

  • Ruichu Cai
  • Jiawei Chen
  • Zijian Li
  • Wei Chen
  • Keli Zhang
  • Junjian Ye
  • Zhuozhang Li
  • Xiaoyan Yang

Domain adaptation on time series data is an important but challenging task. Most existing works in this area learn a domain-invariant representation of the data with the help of restrictions like MMD. However, extracting a domain-invariant representation is non-trivial for time series data due to the complex dependence among timestamps: in fully dependent time series, a small change in time lags or offsets can make domain-invariant extraction difficult. Fortunately, the stability of causality inspires us to explore the domain-invariant structure of the data. To reduce the difficulty of discovering the causal structure, we relax it to a sparse associative structure and propose a novel sparse associative structure alignment model for domain adaptation. First, we generate a segment set to remove the obstacle of offsets. Second, intra-variable and inter-variable sparse attention mechanisms are devised to extract the associative structure of time-series data while accounting for time lags. Finally, associative structure alignment is used to guide the transfer of knowledge from the source domain to the target one. Experimental studies not only verify the good performance of our method on three real-world datasets but also provide insightful discoveries about the transferred knowledge.

JBHI Journal 2020 Journal Article

A Hierarchical Neural Network for Sleep Stage Classification Based on Comprehensive Feature Learning and Multi-Flow Sequence Learning

  • Chenglu Sun
  • Chen Chen
  • Wei Li
  • Jiahao Fan
  • Wei Chen

Automatic sleep staging methods usually extract hand-crafted features or network-trained features from signals recorded by polysomnography (PSG), and then estimate the stages with various classifiers. In this study, we propose a classification approach based on a hierarchical neural network that processes multi-channel PSG signals to improve the performance of automatic five-class sleep staging. The proposed hierarchical network contains two stages: a comprehensive feature learning stage and a sequence learning stage. The first stage obtains the feature matrix by fusing the hand-crafted features and network-trained features. A multi-flow recurrent neural network (RNN), as the second stage, fully learns temporal information between sleep epochs and fine-tunes the parameters of the first stage. The proposed model was evaluated on 147 full-night recordings from a public sleep database, the Montreal Archive of Sleep Studies (MASS). The proposed approach achieves an overall accuracy of 0.878 and an F1-score of 0.818. The results show that the approach achieves better performance than state-of-the-art methods. An ablation experiment and model analysis proved the effectiveness of the different components of the proposed model. The proposed approach allows automatic sleep stage classification from multi-channel PSG signals with different criteria standards, signal characteristics, and epoch divisions, and it has the potential to exploit sleep information comprehensively.

AAAI Conference 2020 Conference Paper

Adaptive Greedy versus Non-Adaptive Greedy for Influence Maximization

  • Wei Chen
  • Binghui Peng
  • Grant Schoenebeck
  • Biaoshuai Tao

We consider the adaptive influence maximization problem: given a network and a budget k, iteratively select k seeds in the network to maximize the expected number of adopters. In the full-adoption feedback model, after selecting each seed, the seed-picker observes all the resulting adoptions. In the myopic feedback model, the seed-picker only observes whether each neighbor of the chosen seed adopts. Motivated by the extreme success of greedy-based algorithms/heuristics for influence maximization, we propose the concept of greedy adaptivity gap, which compares the performance of the adaptive greedy algorithm to its non-adaptive counterpart. Our first result shows that, for submodular influence maximization, the adaptive greedy algorithm can perform up to a (1 − 1/e)-fraction worse than the non-adaptive greedy algorithm, and that this ratio is tight. More specifically, on one side we provide examples where the performance of the adaptive greedy algorithm is only a (1 − 1/e) fraction of the performance of the non-adaptive greedy algorithm in four settings: for both feedback models and both the independent cascade model and the linear threshold model. On the other side, we prove that in any submodular cascade, the adaptive greedy algorithm always outputs a (1 − 1/e)-approximation to the expected number of adoptions in the optimal non-adaptive seed choice. Our second result shows that, for the general submodular cascade model with full-adoption feedback, the adaptive greedy algorithm can outperform the non-adaptive greedy algorithm by an unbounded factor. Finally, we propose a risk-free variant of the adaptive greedy algorithm that always performs no worse than the non-adaptive greedy algorithm.
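The non-adaptive greedy baseline can be sketched on a deterministic coverage spread, where a seed activates itself and its out-neighbors. This is a toy special case of a submodular cascade; the paper's independent cascade and linear threshold models involve random activations that this sketch omits, and the graph below is an arbitrary example:

```python
def greedy_seeds(neighbors, k):
    """Non-adaptive greedy: repeatedly pick the node with the largest
    marginal gain in covered nodes. `neighbors` maps node -> set of
    out-neighbors; returns the chosen seeds and the final spread."""
    seeds, covered = [], set()
    for _ in range(k):
        best, best_gain = None, -1
        for v in neighbors:
            if v in seeds:
                continue
            gain = len(({v} | neighbors[v]) - covered)
            if gain > best_gain:
                best, best_gain = v, gain
        seeds.append(best)
        covered |= {best} | neighbors[best]
    return seeds, len(covered)

# Example graph (hypothetical): node 0 reaches {1,2,3}, node 3 reaches {4,5}.
g = {0: {1, 2, 3}, 1: {2}, 2: {4}, 3: {4, 5}, 4: set(), 5: set()}
seeds, spread = greedy_seeds(g, 2)
```

Because coverage is monotone submodular, this greedy rule attains the classic (1 − 1/e)-approximation that the abstract's comparison is measured against.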

IROS Conference 2020 Conference Paper

Autonomous Robot Navigation Based on Multi-Camera Perception

  • Kunyan Zhu
  • Wei Chen
  • Wei Zhang 0021
  • Ran Song 0001
  • Yibin Li 0001

In this paper, we propose an autonomous method for robot navigation based on a multi-camera setup that takes advantage of a wide field of view. A new multi-task network is designed for handling the visual information supplied by the left, central and right cameras to find the passable area, detect the intersection and infer the steering. Based on the outputs of the network, three navigation indicators are generated and then combined with the high-level control commands extracted by the proposed MapNet, which are finally fed into the driving controller. The indicators are also used through the controller for adjusting the driving velocity, which assists the robot to adjust the speed for smoothly bypassing obstacles. Experiments in real-world environments demonstrate that our method performs well in both local obstacle avoidance and global goal-directed navigation tasks.

IS Journal 2020 Journal Article

Domain Adaptation Learning Based on Structural Similarity Weighted Mean Discrepancy for Credit Risk Classification

  • Wei Chen
  • Zhongfei Li
  • Jinchao Guo

Domain adaptation learning is an effective method for leveraging knowledge from a labeled source domain to a target domain without any labels. However, most prior methods neglect the contribution of each sample to the integral measure when adaptively matching the marginal distribution, the conditional distribution, or both between domains. In this article, an improved algorithm based on mean discrepancy embedding with structural similarity is proposed, which accounts for the contribution of each sample to the integral measure and its effect on the target-domain learning model, using labeled source samples and unlabeled target samples. The discrepancies between both the marginal and conditional distributions are minimized through a dimensionality reduction procedure for feature extraction, with structural similarity weights for all samples from the source and target domains. The results of empirical analysis demonstrate that the proposed method outperforms several state-of-the-art methods in credit risk classification.

AAAI Conference 2020 Conference Paper

EEMEFN: Low-Light Image Enhancement via Edge-Enhanced Multi-Exposure Fusion Network

  • Minfeng Zhu
  • Pingbo Pan
  • Wei Chen
  • Yi Yang

This work focuses on extremely low-light image enhancement, which aims to improve image brightness and reveal hidden information in darkened areas. Recently, image enhancement approaches have made impressive progress. However, existing methods still suffer from three main problems: (1) low-light images are usually high-contrast, and existing methods may fail to recover image details in extremely dark or bright areas; (2) current methods cannot precisely correct the color of low-light images; (3) when object edges are unclear, the pixel-wise loss may treat pixels of different objects equally and produce blurry images. In this paper, we propose a two-stage method called Edge-Enhanced Multi-Exposure Fusion Network (EEMEFN) to enhance extremely low-light images. In the first stage, we employ a multi-exposure fusion module to address the high-contrast and color-bias issues. We synthesize a set of images with different exposure times from a single image and construct an accurate normal-light image by combining well-exposed areas under different illumination conditions. Thus, it can produce realistic initial images with correct color from extremely noisy and low-light images. In the second stage, we introduce an edge enhancement module to refine the initial images with the help of edge information. Therefore, our method can reconstruct high-quality images with sharp edges while minimizing the pixel-wise loss. Experiments on the See-in-the-Dark dataset indicate that our EEMEFN approach achieves state-of-the-art performance.

JBHI Journal 2020 Journal Article

Epilepsy Seizure Prediction on EEG Using Common Spatial Pattern and Convolutional Neural Network

  • Yuan Zhang
  • Yao Guo
  • Po Yang
  • Wei Chen
  • Benny Lo

Epilepsy seizure prediction paves the way for timely warnings that allow patients to take more active and effective intervention measures. Compared to seizure detection, which only identifies the inter-ictal state and the ictal state, far less research has been conducted on seizure prediction, because the high similarity between the pre-ictal and inter-ictal states makes them challenging to distinguish. In this paper, a novel solution for seizure prediction is proposed using the common spatial pattern (CSP) and a convolutional neural network (CNN). Firstly, artificial pre-ictal EEG signals are generated by combining segments of the original pre-ictal signals, solving the trial imbalance problem between the two states. Secondly, a feature extractor employing wavelet packet decomposition and CSP is designed to extract distinguishing features in both the time domain and the frequency domain; it improves overall accuracy while reducing the training time. Finally, a shallow CNN is applied to discriminate between the pre-ictal state and the inter-ictal state. Our proposed solution is evaluated on 23 patients' data from the Boston Children's Hospital-MIT scalp EEG dataset using leave-one-out cross-validation, and it achieves a sensitivity of 92.2% and a false prediction rate of 0.12/h. Experimental results demonstrate that the proposed approach outperforms most state-of-the-art methods.
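As a hedged illustration of the CSP step, here is the standard whitening-plus-eigendecomposition construction of spatial filters from the two class covariance matrices. This is a textbook formulation, not necessarily the authors' exact pipeline, and the covariance matrices are assumed to be estimated elsewhere from band-filtered EEG trials:

```python
import numpy as np

def csp_filters(C1, C2):
    """Common Spatial Pattern filters from two symmetric positive-definite
    class covariance matrices. Rows of W, sorted by eigenvalue, maximize
    the variance ratio between class 1 and class 2 (and vice versa at the
    other end of the spectrum)."""
    Cc = C1 + C2
    d, U = np.linalg.eigh(Cc)              # eigendecompose the composite covariance
    P = (U / np.sqrt(d)).T                 # whitening matrix: P @ Cc @ P.T = I
    lam, B = np.linalg.eigh(P @ C1 @ P.T)  # shared eigenbasis after whitening
    order = np.argsort(lam)[::-1]          # largest class-1 variance ratio first
    return B[:, order].T @ P, lam[order]
```

The usual downstream features are the log-variances of the signals projected onto the first and last few rows of W, which is what a classifier such as the paper's CNN would consume.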

AAAI Conference 2020 Conference Paper

Gradient Method for Continuous Influence Maximization with Budget-Saving Considerations

  • Wei Chen
  • Weizhong Zhang
  • Haoyu Zhao

Continuous influence maximization (CIM) generalizes the original influence maximization by incorporating general marketing strategies: a marketing strategy mix is a vector x = (x1, ..., xd) such that for each node v in a social network, v could be activated as a seed of diffusion with probability hv(x), where hv is a strategy activation function satisfying DR-submodularity. CIM is the task of selecting a strategy mix x subject to the constraint ∑i xi ≤ k, where k is a budget constraint, such that the total number of activated nodes after the diffusion process, called the influence spread and denoted g(x), is maximized. In this paper, we extend CIM to consider budget saving: each strategy mix x has a cost c(x), where c is a convex cost function, and we want to maximize the balanced sum g(x) + λ(k − c(x)), where λ is a balance parameter, subject to the constraint c(x) ≤ k. We denote this problem as CIM-BS. The objective function of CIM-BS is neither monotone, nor DR-submodular, nor concave, and thus neither the greedy algorithm nor the standard result on the gradient method can be directly applied. Our key innovation is the combination of the gradient method with reverse influence sampling to design algorithms that solve CIM-BS: for the general case, we give an algorithm that achieves a (1/2 − ε)-approximation, and for the case of independent strategy activations, we present an algorithm that achieves a (1 − 1/e − ε)-approximation.

IJCAI Conference 2020 Conference Paper

Gradient Perturbation is Underrated for Differentially Private Convex Optimization

  • Da Yu
  • Huishuai Zhang
  • Wei Chen
  • Jian Yin
  • Tie-Yan Liu

Gradient perturbation, widely used for differentially private optimization, injects noise at every iterative update to guarantee differential privacy. Previous work first determines the noise level that can satisfy the privacy requirement and then analyzes the utility of noisy gradient updates as in the non-private case. In contrast, we explore how the privacy noise affects the optimization property. We show that for differentially private convex optimization, the utility guarantee of differentially private (stochastic) gradient descent is determined by an expected curvature rather than the minimum curvature. The expected curvature, which represents the average curvature over the optimization path, is usually much larger than the minimum curvature. By using the expected curvature, we show that gradient perturbation can achieve a significantly improved utility guarantee that can theoretically justify the advantage of gradient perturbation over other perturbation methods. Finally, our extensive experiments suggest that gradient perturbation with the advanced composition method indeed outperforms other perturbation approaches by a large margin, matching our theoretical findings.

JBHI Journal 2020 Journal Article

Guest Editorial: Blockchain and Healthcare Computing

  • Yulei Wu
  • Zheng Yan
  • F. Richard Yu
  • Robert Deng
  • Vijay Varadharajan
  • Wei Chen

The four papers in this special section focus on the use of blockchain in the healthcare field. With the development of society, health has received increasing attention. The development of science and technology has also promoted the protection of health. In recent years, the rapid development of computing and networking technologies has improved the ability to collect, measure, and analyze health-related data, and thus tremendous opportunities have opened up for healthcare computing. Meanwhile, these technologies have also brought new challenges and issues.

IJCAI Conference 2020 Conference Paper

I4R: Promoting Deep Reinforcement Learning by the Indicator for Expressive Representations

  • Xufang Luo
  • Qi Meng
  • Di He
  • Wei Chen
  • Yunhong Wang

Learning expressive representations is always crucial for well-performing policies in deep reinforcement learning (DRL). Different from supervised learning, in DRL accurate targets are not always available, and some inputs with different actions have only tiny differences, which stimulates the demand for learning expressive representations. In this paper, we first empirically compare the representations of DRL models with different performances. We observe that the representations of a better state extractor (SE) are more scattered than those of a worse one when visualized. We thus investigate the singular values of the representation matrix and find that better SEs always correspond to smaller differences among these singular values. Next, based on these observations, we define an indicator of the representations of a DRL model: the Number of Significant Singular Values (NSSV) of the representation matrix. We then propose the I4R algorithm, which improves DRL algorithms by adding a corresponding regularization term to enhance the NSSV. Finally, we apply I4R to both policy gradient and value-based algorithms on Atari games, and the results show the superiority of our proposed method.

JBHI Journal 2020 Journal Article

Measuring and Localizing Individual Bites Using a Sensor Augmented Plate During Unrestricted Eating for the Aging Population

  • Gert Mertes
  • Li Ding
  • Wei Chen
  • Hans Hallez
  • Jie Jia
  • Bart Vanrumste

Food intake monitoring can play an important role in the prevention of malnutrition in the aging population, but traditional tools may not be adequate for use in this target group. These tools typically involve the use of questionnaires or food diaries that require manual data entry. Due to their time-consuming nature, they are often incomplete, contain mistakes, or not used at all. An alternative to self-reporting tools, in the form of a plate system that automatically measures the consumed food during the meal, is presented in this paper. Furthermore, the system can estimate the location where each bite was taken on the plate. The system is compatible with an off-the-shelf plate that is mounted on top of a base station. Weight sensors are integrated in the base, allowing for easy removal and cleaning of the plate. Localization of bites is done by looking at the movement of the center of mass during eating. When used with a compartmentalized plate, the amount of consumed food per compartment can be measured. With prior knowledge of the type of food in each compartment, this can give an indication of calories and nutritional intake. We present a bite detection algorithm using a random forest decision tree classifier. Data from 24 aging adults (ages 52-95) eating a single meal with chopsticks was used to train and evaluate the model. Out of a total of 836 true annotated bites, the algorithm detected 602 with a precision and recall of 0.78 and 0.76, respectively. By summing the weights of detected bites from each compartment, the algorithm was able to estimate the amount of food taken per compartment with an average error of $(8 \pm 8)$% of the portion size.

NeurIPS Conference 2020 Conference Paper

Online Influence Maximization under Linear Threshold Model

  • Shuai Li
  • Fang Kong
  • Kejie Tang
  • Qizhi Li
  • Wei Chen

Online influence maximization (OIM) is a popular problem in social networks to learn influence propagation model parameters and maximize the influence spread at the same time. Most previous studies focus on the independent cascade (IC) model under edge-level feedback. In this paper, we address OIM in the linear threshold (LT) model. Because node activations in the LT model are due to the aggregated effect of all active neighbors, it is more natural to model OIM with node-level feedback. This brings a new challenge in online learning, since we only observe the aggregated effect from groups of nodes and the groups are also random. Based on the linear structure in node activations, we incorporate ideas from linear bandits and design an algorithm LT-LinUCB that is consistent with the observed feedback. By proving the group observation modulated (GOM) bounded smoothness property, a novel result on the influence difference in terms of the random observations, we provide a regret of order $\tilde{O}(\mathrm{poly}(m)\sqrt{T})$, where $m$ is the number of edges and $T$ is the number of rounds. This is the first theoretical result of this order for OIM under the LT model. In the end, we also provide an algorithm OIM-ETC with regret bound $O(\mathrm{poly}(m)\ T^{2/3})$, which is model-independent, simple, and has fewer requirements on online feedback and offline computation.

IJCAI Conference 2020 Conference Paper

Reinforcement Learning with Dynamic Boltzmann Softmax Updates

  • Ling Pan
  • Qingpeng Cai
  • Qi Meng
  • Wei Chen
  • Longbo Huang

Value function estimation, i.e., prediction, is an important task in reinforcement learning. The Boltzmann softmax operator is a natural value estimator and can provide several benefits. However, it does not satisfy the non-expansion property, and its direct use may fail to converge even in value iteration. In this paper, we propose to update the value function with the dynamic Boltzmann softmax (DBS) operator, which has good convergence properties in the settings of planning and learning. Experimental results on GridWorld show that the DBS operator enables better estimation of the value function, which rectifies the convergence issue of the softmax operator. Finally, we propose the DBS-DQN algorithm by applying the DBS operator, which outperforms DQN substantially in 40 out of 49 Atari games.
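For context, the Boltzmann softmax operator that this abstract builds on is an exponentially weighted average of action values; it interpolates between the mean (low temperature) and the max (high temperature). The DBS operator varies the temperature over iterations, a schedule not shown here. This sketch assumes nothing beyond the standard definition:

```python
import numpy as np

def boltzmann_softmax(q_values, beta):
    """Boltzmann softmax operator: sum_a exp(beta*Q(a))*Q(a) / sum_a exp(beta*Q(a)).

    As beta -> 0 it approaches the mean of q_values; as beta -> inf it
    approaches max(q_values), but for finite beta it is not a non-expansion,
    which is the convergence issue the DBS operator addresses.
    """
    q = np.asarray(q_values, dtype=float)
    w = np.exp(beta * (q - q.max()))  # shift by max for numerical stability
    return float(np.sum(w * q) / np.sum(w))

q = [1.0, 2.0, 3.0]
near_mean = boltzmann_softmax(q, 1e-8)   # ~2.0
near_max = boltzmann_softmax(q, 100.0)   # ~3.0
```

In a DBS-style scheme, `beta` would be increased over value-iteration steps so the operator tends toward the max operator as learning progresses.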

AAAI Conference 2020 Conference Paper

Stochastic Online Learning with Probabilistic Graph Feedback

  • Shuai Li
  • Wei Chen
  • Zheng Wen
  • Kwong-Sak Leung

We consider a problem of stochastic online learning with general probabilistic graph feedback, where each directed edge $(i, j)$ in the feedback graph has probability $p_{ij}$. Two cases are covered. (a) The one-step case: after playing arm $i$, the learner observes a sample reward feedback of arm $j$ with independent probability $p_{ij}$. (b) The cascade case: after playing arm $i$, the learner observes feedback of all arms $j$ in a probabilistic cascade starting from $i$ – for each $(i, j)$, if arm $i$ is played or observed, then a reward sample of arm $j$ is observed with independent probability $p_{ij}$. Previous works mainly focus on deterministic graphs, which correspond to the one-step case with $p_{ij} \in \{0, 1\}$, an adversarial sequence of graphs with certain topology guarantees, or a specific type of random graphs. We analyze the asymptotic lower bounds and design algorithms in both cases. The regret upper bounds of the algorithms match the lower bounds with high probability.

AAAI Conference 2019 Conference Paper

Adapting Translation Models for Transcript Disfluency Detection

  • Qianqian Dong
  • Feng Wang
  • Zhen Yang
  • Wei Chen
  • Shuang Xu
  • Bo Xu

Transcript disfluency detection (TDD) is an important component of real-time speech translation systems and has attracted growing interest in recent years. This paper presents our study on adapting neural machine translation (NMT) models for TDD. We propose a general training framework for rapidly adapting NMT models to the TDD task. In this framework, the main structure of the model is implemented similarly to the NMT model. Additionally, several extended modules and training techniques that are independent of the NMT model are proposed to improve performance, such as constrained decoding, denoising autoencoder initialization, and a TDD-specific training objective. With the proposed training framework, we achieve significant improvement. However, it is too slow in decoding to be practical. To build a feasible and production-ready solution for TDD, we propose a fast non-autoregressive TDD model following the recently emerged non-autoregressive NMT models. Although we do not assume a specific architecture of the NMT model, we build our TDD model on the basis of Transformer, the state-of-the-art NMT model. We conduct extensive experiments on the publicly available Switchboard dataset and an in-house Chinese dataset. Experimental results show that the proposed model significantly outperforms previous state-of-the-art models.

NeurIPS Conference 2019 Conference Paper

Adaptive Influence Maximization with Myopic Feedback

  • Binghui Peng
  • Wei Chen

We study the adaptive influence maximization problem with myopic feedback under the independent cascade model: one sequentially selects $k$ nodes as seeds one by one from a social network, each selected seed returns the immediate neighbors it activates as the feedback available for later selections, and the goal is to maximize the expected number of total activated nodes, referred to as the influence spread. We show that the adaptivity gap, the ratio between the optimal adaptive influence spread and the optimal non-adaptive influence spread, is at most 4 and at least $e/(e-1)$, and the approximation ratios with respect to the optimal adaptive influence spread of both the non-adaptive greedy and adaptive greedy algorithms are at least $\frac{1}{4}(1 - \frac{1}{e})$ and at most $\frac{e^2 + 1}{(e + 1)^2} < 1 - \frac{1}{e}$. Moreover, the approximation ratio of the non-adaptive greedy algorithm is no worse than that of the adaptive greedy algorithm, when considering all graphs. Our result confirms a long-standing open conjecture of Golovin and Krause (2011) on the constant approximation ratio of adaptive greedy with myopic feedback, and it also suggests that adaptive greedy may not bring much benefit under myopic feedback.
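The bounds quoted in the abstract are concrete constants; a quick numerical check (plain Python, using only the formulas stated above) confirms their values and ordering:

```python
import math

e = math.e
adaptivity_gap_lower = e / (e - 1)          # ≈ 1.582, gap is at least this
greedy_ratio_lower = 0.25 * (1 - 1 / e)     # ≈ 0.158, lower bound on greedy
greedy_ratio_upper = (e**2 + 1) / (e + 1)**2  # ≈ 0.607, upper bound on greedy
classic_bound = 1 - 1 / e                   # ≈ 0.632, classic (1 - 1/e) ratio

# The adaptivity gap lies between e/(e-1) and 4, and the greedy ratio
# bounds sit strictly below the classic 1 - 1/e guarantee.
print(adaptivity_gap_lower, greedy_ratio_lower, greedy_ratio_upper)
```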

IJCAI Conference 2019 Conference Paper

BN-invariant Sharpness Regularizes the Training Model to Better Generalization

  • Mingyang Yi
  • Huishuai Zhang
  • Wei Chen
  • Zhi-Ming Ma
  • Tie-Yan Liu

It is widely believed that flatter minima generalize better. However, it has been pointed out that the usual definitions of sharpness, which consider either the maximum or the integral of the loss over a delta ball of parameters around minima, cannot give consistent measurements for scale-invariant neural networks, e.g., networks with batch normalization layers. In this paper, we first propose a measure of sharpness, BN-Sharpness, which gives consistent values for equivalent networks under BN. It achieves scale invariance by connecting the integral diameter with the scale of the parameters. We then present a computationally efficient way to calculate BN-Sharpness approximately, i.e., a one-dimensional integral along the "sharpest" direction. Furthermore, we use BN-Sharpness to regularize the training and design an algorithm to minimize the new regularized objective. Our algorithm achieves considerably better performance than vanilla SGD over various experimental settings.

AAAI Conference 2019 Conference Paper

Capacity Control of ReLU Neural Networks by Basis-Path Norm

  • Shuxin Zheng
  • Qi Meng
  • Huishuai Zhang
  • Wei Chen
  • Nenghai Yu
  • Tie-Yan Liu

Recently, the path norm was proposed as a new capacity measure for neural networks with the Rectified Linear Unit (ReLU) activation function, which takes the rescaling-invariant property of ReLU into account. It has been shown that the generalization error bound in terms of the path norm explains the empirical generalization behavior of ReLU neural networks better than that of other capacity measures. Moreover, optimization algorithms that take the path norm as a regularization term in the loss function, like Path-SGD, have been shown to achieve better generalization performance. However, the path norm counts the values of all paths, and hence the capacity measure based on the path norm could be improperly influenced by the dependency among different paths. It is also known that each path of a ReLU network can be represented by a small group of linearly independent basis paths with multiplication and division operations, which indicates that the generalization behavior of the network depends on only a few basis paths. Motivated by this, we propose a new norm, the Basis-path Norm, based on a group of linearly independent paths to measure the capacity of neural networks more accurately. We establish a generalization error bound based on this basis-path norm, and show that it explains the generalization behavior of ReLU networks more accurately than previous capacity measures via extensive experiments. In addition, we develop optimization algorithms that minimize the empirical risk regularized by the basis-path norm. Our experiments on benchmark datasets demonstrate that the proposed regularization method achieves clearly better performance on the test set than previous regularization approaches.

IJCAI Conference 2019 Conference Paper

Improved Algorithm on Online Clustering of Bandits

  • Wei Chen
  • Shuai Li
  • Kwong-Sak Leung

We generalize the setting of online clustering of bandits by allowing non-uniform distribution over user frequencies. A more efficient algorithm is proposed with simple set structures to represent clusters. We prove a regret bound for the new algorithm which is free of the minimal frequency over users. The experiments on both synthetic and real datasets consistently show the advantage of the new algorithm over existing methods.

AAAI Conference 2019 Short Paper

Type Sequence Preserving Heterogeneous Information Network Embedding

  • Yuxin Chen
  • Tengjiao Wang
  • Wei Chen
  • Qiang Li
  • Zhen Qiu

Lacking a sequence-preserving mechanism, existing heterogeneous information network (HIN) embedding methods discard the essential type sequence information during embedding. We propose a Type Sequence Preserving HIN Embedding model (SeqHINE) which extends HIN embedding to the sequence level. SeqHINE incorporates type sequence information via a type-aware GRU and preserves representative sequence information through a decay function. Extensive experiments show that SeqHINE can outperform the state of the art even with 50% less labeled data.

IJCAI Conference 2018 Conference Paper

Combinatorial Pure Exploration with Continuous and Separable Reward Functions and Its Applications

  • Weiran Huang
  • Jungseul Ok
  • Liang Li
  • Wei Chen

We study the Combinatorial Pure Exploration problem with Continuous and Separable reward functions (CPE-CS) in the stochastic multi-armed bandit setting. In a CPE-CS instance, we are given several stochastic arms with unknown distributions, as well as a collection of possible decisions. Each decision has a reward according to the distributions of arms. The goal is to identify the decision with the maximum reward, using as few arm samples as possible. The problem generalizes the combinatorial pure exploration problem with linear rewards, which has attracted significant attention in recent years. In this paper, we propose an adaptive learning algorithm for the CPE-CS problem, and analyze its sample complexity. In particular, we introduce a new hardness measure called the consistent optimality hardness, and give both the upper and lower bounds of sample complexity. Moreover, we give examples to demonstrate that our solution has the capacity to deal with non-linear reward functions.

NeurIPS Conference 2018 Conference Paper

Community Exploration: From Offline Optimization to Online Learning

  • Xiaowei Chen
  • Weiran Huang
  • Wei Chen
  • John C. S. Lui

We introduce the community exploration problem, which has various real-world applications such as online advertising. In this problem, an explorer allocates a limited budget to explore communities so as to maximize the number of members he could meet. We provide a systematic study of the community exploration problem, from offline optimization to online learning. For the offline setting, where the sizes of communities are known, we prove that the greedy methods for both non-adaptive and adaptive exploration are optimal. For the online setting, where the sizes of communities are not known and need to be learned from multi-round explorations, we propose an ``upper confidence''-like algorithm that achieves logarithmic regret bounds. By combining the feedback from different rounds, we can achieve a constant regret bound.
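To make the offline greedy method concrete: under a simple model where each visit to a community of size d meets a uniformly random member, the expected number of distinct members met after k visits is d(1 - (1 - 1/d)^k), and greedy spends each unit of budget on the visit with the largest marginal gain. The meeting model and names below are illustrative assumptions, not the paper's exact formulation.

```python
def expected_distinct(d, k):
    """Expected distinct members met after k uniform random visits
    to a community of size d."""
    return d * (1 - (1 - 1 / d) ** k)

def greedy_explore(sizes, budget):
    """Offline non-adaptive greedy: spend each unit of budget on the
    community with the largest marginal gain in expected new members."""
    visits = [0] * len(sizes)
    for _ in range(budget):
        gains = [expected_distinct(d, k + 1) - expected_distinct(d, k)
                 for d, k in zip(sizes, visits)]
        visits[gains.index(max(gains))] += 1
    return visits

# Larger communities have slower-decaying marginal gains, so they
# attract more of the budget.
alloc = greedy_explore([10, 20, 40], budget=14)
```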

AAAI Conference 2018 Conference Paper

Dictionary Learning Inspired Deep Network for Scene Recognition

  • Yang Liu
  • Qingchao Chen
  • Wei Chen
  • Ian Wassell

Scene recognition remains one of the most challenging problems in image understanding. With the help of fully connected layers (FCL) and rectified linear units (ReLU), deep networks can extract the moderately sparse and discriminative feature representation required for scene recognition. However, few methods consider exploiting a sparsity model for learning the feature representation in order to provide enhanced discriminative capability. In this paper, we replace the conventional FCL and ReLU with a new dictionary learning layer, composed of a finite number of recurrent units, to simultaneously enhance the sparse representation and discriminative abilities of features via the determination of optimal dictionaries. In addition, with the help of the structure of the dictionary, we propose a new label-discriminative regressor to boost the discrimination ability. We also propose new constraints to prevent overfitting by incorporating the advantages of the Mahalanobis and Euclidean distances to balance recognition accuracy and generalization performance. Our proposed approach is evaluated on various scene datasets and shows superior performance to many state-of-the-art approaches.

IJCAI Conference 2018 Conference Paper

Differential Equations for Modeling Asynchronous Algorithms

  • Li He
  • Qi Meng
  • Wei Chen
  • Zhi-Ming Ma
  • Tie-Yan Liu

Asynchronous stochastic gradient descent (ASGD) is a popular parallel optimization algorithm in machine learning. Most theoretical analyses of ASGD take a discrete view and prove upper bounds for their convergence rates. However, the discrete view has its intrinsic limitations: there is no characterization of the optimization path, and the proof techniques are induction-based and thus usually complicated. Inspired by the recent successful adoption of stochastic differential equations (SDE) in the theoretical analysis of SGD, in this paper we study the continuous approximation of ASGD by using stochastic differential delay equations (SDDE). We introduce the approximation method and study the approximation error. We then conduct theoretical analysis on the convergence rate of the ASGD algorithm based on the continuous approximation. Two methods, moment estimation and energy function minimization, can be used to analyze the convergence rates. Moment estimation depends on the specific form of the loss function, while energy function minimization only leverages the convexity of the loss function and does not depend on its specific form. In addition to the convergence analysis, the continuous view also helps us derive better convergence rates. All of this clearly shows the advantage of taking the continuous view in analyzing gradient descent algorithms.

JBHI Journal 2018 Journal Article

Frequency Network Analysis of Heart Rate Variability for Obstructive Apnea Patient Detection

  • Zhao Dong
  • Xiang Li
  • Wei Chen

Obstructive sleep apnea (OSA) is a common sleep disorder. Traditional OSA diagnosis methods are cumbersome and expensive, which brings inconvenience for patients and a heavy workload for physicians. Automatically identifying OSA patients from electrocardiogram (ECG) records is important for clinical diagnosis and treatment. In this paper, a new method based on the frequency and network domains is proposed to automatically recognize OSA patients from nocturnal ECG records. First, each RR-interval (beat-to-beat heart rate) series was divided into segments. After calculating the power spectral density (PSD) of each heart rate variability segment with the Lomb-Scargle method, the dynamic time warping (DTW) distance was used to evaluate the similarity (dissimilarity) of the lower frequencies in the PSD series; the DTW distance matrix was then transformed to a binary matrix, and network metrics were calculated to discriminate OSA patients from healthy subjects. The new method was tested with data of 389 subjects collected from two public databases that consist of normal subjects without OSA (apnea-hypopnea index, AHI ≤ 5) and OSA patients (AHI > 5). Results show that a single network metric (local clustering coefficient) can recognize OSA patients with 90.1% accuracy, 88.29% sensitivity, and 90.5% specificity, and confirm the potential of using ECG records for OSA patient recognition.
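The first processing step described above, computing the PSD of an RR-interval segment with the Lomb-Scargle method, applies because RR series are unevenly sampled in time (one sample per heartbeat). A minimal NumPy sketch of the classic Lomb-Scargle periodogram, on a synthetic RR series with illustrative names, not the paper's code:

```python
import numpy as np

def lomb_scargle(t, y, freqs_hz):
    """Classic Lomb-Scargle periodogram for unevenly sampled data."""
    y = y - y.mean()
    power = np.empty(len(freqs_hz))
    for k, f in enumerate(freqs_hz):
        w = 2 * np.pi * f
        # Phase offset tau makes the sine/cosine terms orthogonal.
        tau = np.arctan2(np.sum(np.sin(2 * w * t)),
                         np.sum(np.cos(2 * w * t))) / (2 * w)
        c, s = np.cos(w * (t - tau)), np.sin(w * (t - tau))
        power[k] = 0.5 * ((y @ c) ** 2 / (c @ c) + (y @ s) ** 2 / (s @ s))
    return power

# Synthetic RR series with a 0.1 Hz heart-rate oscillation.
beats = np.arange(500)
rr = 0.8 + 0.05 * np.sin(2 * np.pi * 0.1 * 0.8 * beats)  # intervals (s)
t = np.cumsum(rr)                                        # beat times (s)
freqs = np.linspace(0.01, 0.5, 200)                      # Hz
psd = lomb_scargle(t, rr, freqs)
peak_hz = freqs[np.argmax(psd)]                          # ~0.1 Hz
```

In practice, `scipy.signal.lombscargle` provides the same periodogram (with angular frequencies) without hand-rolling the loop.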

IJCAI Conference 2018 Conference Paper

Galaxy Network Embedding: A Hierarchical Community Structure Preserving Approach

  • Lun Du
  • Zhicong Lu
  • Yun Wang
  • Guojie Song
  • Yiming Wang
  • Wei Chen

Network embedding is a method of learning a low-dimensional vector representation of network vertices under the condition of preserving different types of network properties. Previous studies mainly focus on preserving structural information of vertices at a particular scale, like neighbor information or community information, but cannot preserve the hierarchical community structure, which would enable the network to be easily analyzed at various scales. Inspired by the hierarchical structure of galaxies, we propose the Galaxy Network Embedding (GNE) model, which formulates an optimization problem with spherical constraints to describe the hierarchical community structure preserving network embedding. More specifically, we present an approach of embedding communities into a low dimensional spherical surface, the center of which represents the parent community they belong to. Our experiments reveal that the representations from GNE preserve the hierarchical community structure and show advantages in several applications such as vertex multi-class classification and network visualization. The source code of GNE is available online.

NeurIPS Conference 2018 Conference Paper

On the Local Hessian in Back-propagation

  • Huishuai Zhang
  • Wei Chen
  • Tie-Yan Liu

Back-propagation (BP) is the foundation for successfully training deep neural networks. However, BP sometimes has difficulty propagating a learning signal deeply enough, e.g., the vanishing gradient phenomenon. Meanwhile, BP often works well when combined with ``designing tricks'' like orthogonal initialization, batch normalization, and skip connections. There is no clear understanding of what is essential to the efficiency of BP. In this paper, we take one step towards clarifying this problem. We view BP as a solution of back-matching propagation, which minimizes a sequence of back-matching losses, each corresponding to one block of the network. We study the Hessian of the local back-matching loss (local Hessian) and connect it to the efficiency of BP. It turns out that those designing tricks facilitate BP by improving the spectrum of the local Hessian. In addition, we can utilize the local Hessian to balance the training pace of each block and design new training algorithms. Based on a scalar approximation of the local Hessian, we propose a scale-amended SGD algorithm. We apply it to train neural networks with batch normalization, and achieve favorable results over vanilla SGD. This corroborates the importance of the local Hessian from another side.

TIST Journal 2018 Journal Article

RelationLines

  • Wei Chen
  • Jing Xia
  • Xumeng Wang
  • Yi Wang
  • Jun Chen
  • Liang Chang

The increased accessibility of urban sensor data and the popularity of social network applications are enabling the discovery of crowd mobility and personal communication patterns. However, studying the egocentric relationships of an individual can be very challenging because available data may refer to direct contacts, such as phone calls between individuals, or indirect contacts, such as paired location presence. In this article, we develop methods to integrate three facets extracted from heterogeneous urban data (timelines, calls, and locations) through a progressive visual reasoning and inspection scheme. Our approach uses a detect-and-filter scheme such that, prior to visual refinement and analysis, a coarse detection is performed to extract the target individual and construct the timeline of the target. It then detects spatio-temporal co-occurrences or call-based contacts to develop the egocentric network of the individual. The filtering stage is enhanced with a line-based visual reasoning interface that facilitates a flexible and comprehensive investigation of egocentric relationships and connections in terms of time, space, and social networks. The integrated system, RelationLines, is demonstrated using a dataset that contains taxi GPS data, cell-based mobility data, mobile calling data, microblog data, and point-of-interest (POI) data from a city with millions of citizens. We examine the effectiveness and efficiency of our system with three case studies and a user review.

AAMAS Conference 2018 Conference Paper

Slim-DP: A Multi-Agent System for Communication-Efficient Distributed Deep Learning

  • Shizhao Sun
  • Wei Chen
  • Jiang Bian
  • Xiaoguang Liu
  • Tie-Yan Liu

To afford the huge computational cost, large-scale deep neural networks (DNN) are usually trained on a distributed system, especially the widely used parameter server architecture, consisting of a parameter server as well as multiple local workers with powerful GPU cards. During training, local workers frequently pull the global model from and push their computed gradients to the parameter server. Due to limited bandwidth, such frequent communication causes a severe bottleneck for training acceleration. As recent attempts to address this problem, quantization methods have been proposed to compress the gradients for efficient communication. However, such methods overlook the effects of compression on the model performance, so they either suffer from a low compression ratio or an accuracy drop. In this paper, to better address this problem, we investigate distributed deep learning as a multi-agent system (MAS) problem. Specifically, 1) local workers and the parameter server are separate agents in the system; 2) the objective of these agents is to maximize the efficacy of the learned model through their cooperative interactions; 3) the strategy of the agents describes how they take actions, i.e., communicate their computed gradients or the global model; 4) rational agents always select the best-response strategy with the optimal utility. Inspired by this, we design a MAS approach for distributed training of DNN. In our method, the agents first estimate the utility (i.e., the benefit to help improve the model) of each action (i.e., transferring a subset of the gradients or the global model), and then take the best-response strategy based on their estimated utilities mixed with ϵ-random exploration. We call our new method Slim-DP as, being different from standard data parallelism, it only communicates a subset of the gradients or the global model.
Our experimental results demonstrate that our proposed Slim-DP can reduce communication cost further and achieve better speedup, without loss of accuracy, than standard data parallelism and its quantization version.

AAAI Conference 2017 Conference Paper

Asynchronous Stochastic Proximal Optimization Algorithms with Variance Reduction

  • Qi Meng
  • Wei Chen
  • Jingcheng Yu
  • Taifeng Wang
  • Zhi-Ming Ma
  • Tie-Yan Liu

Regularized empirical risk minimization (R-ERM) is an important branch of machine learning, since it constrains the capacity of the hypothesis space and guarantees the generalization ability of the learning algorithm. Two classic proximal optimization algorithms, i.e., proximal stochastic gradient descent (ProxSGD) and proximal stochastic coordinate descent (ProxSCD), have been widely used to solve the R-ERM problem. Recently, the variance reduction technique was proposed to improve ProxSGD and ProxSCD, and the corresponding ProxSVRG and ProxSVRCD have better convergence rates. These proximal algorithms with variance reduction have also achieved great success in applications at small and moderate scales. However, in order to solve large-scale R-ERM problems and make more practical impact, parallel versions of these algorithms are sorely needed. In this paper, we propose asynchronous ProxSVRG (Async-ProxSVRG) and asynchronous ProxSVRCD (Async-ProxSVRCD) algorithms, and prove that Async-ProxSVRG can achieve near-linear speedup when the training data is sparse, while Async-ProxSVRCD can achieve near-linear speedup regardless of the sparsity condition, as long as the number of block partitions is appropriately set. We have conducted experiments on a regularized logistic regression task. The results verified our theoretical findings and demonstrated the practical efficiency of the asynchronous stochastic proximal algorithms with variance reduction.

IJCAI Conference 2017 Conference Paper

Efficient Inexact Proximal Gradient Algorithm for Nonconvex Problems

  • Quanming Yao
  • James T. Kwok
  • Fei Gao
  • Wei Chen
  • Tie-Yan Liu

While the proximal gradient algorithm was originally designed for convex optimization, several variants have recently been proposed for nonconvex problems. Among them, nmAPG [Li and Lin, 2015] is the state of the art. However, it is inefficient when the proximal step does not have a closed-form solution, or such a solution exists but is expensive, as it requires more than one proximal step to be exactly solved in each iteration. In this paper, we propose an efficient inexact accelerated proximal gradient (niAPG) algorithm for nonconvex problems. In each iteration, it requires only one inexact (less expensive) proximal step. Convergence to a critical point is still guaranteed, and an O(1/k) convergence rate is derived. Experiments on image inpainting and matrix completion problems demonstrate that the proposed algorithm has comparable performance to the state of the art, but is much faster.

NeurIPS Conference 2017 Conference Paper

Finite Sample Analysis of the GTD Policy Evaluation Algorithms in Markov Setting

  • Yue Wang
  • Wei Chen
  • Yuting Liu
  • Zhi-Ming Ma
  • Tie-Yan Liu

In reinforcement learning (RL), one of the key components is policy evaluation, which aims to estimate the value function (i.e., expected long-term accumulated reward) of a policy. With a good policy evaluation method, RL algorithms will estimate the value function more accurately and find a better policy. When the state space is large or continuous, \emph{Gradient-based Temporal Difference (GTD)} policy evaluation algorithms with linear function approximation are widely used. Considering that the collection of evaluation data is both time and reward consuming, a clear understanding of the finite sample performance of policy evaluation algorithms is very important for reinforcement learning. Under the assumption that data are i.i.d. generated, previous work provided finite sample analysis of the GTD algorithms with constant step size by converting them into convex-concave saddle point problems. However, it is well known that in RL problems the data are generated from Markov processes rather than i.i.d. In this paper, in the realistic Markov setting, we derive finite sample bounds for general convex-concave saddle point problems, and hence for the GTD algorithms. We have the following discussions based on our bounds. (1) With variants of the step size, GTD algorithms converge. (2) The convergence rate is determined by the step size, with the mixing time of the Markov process as the coefficient. The faster the Markov process mixes, the faster the convergence. (3) We explain that the experience replay trick is effective because it improves the mixing property of the Markov process. To the best of our knowledge, our analysis is the first to provide finite sample bounds for the GTD algorithms in the Markov setting.

AAAI Conference 2017 Conference Paper

Generalization Error Bounds for Optimization Algorithms via Stability

  • Qi Meng
  • Yue Wang
  • Wei Chen
  • Taifeng Wang
  • Zhi-Ming Ma
  • Tie-Yan Liu

Many machine learning tasks can be formulated as Regularized Empirical Risk Minimization (R-ERM) and solved by optimization algorithms such as gradient descent (GD), stochastic gradient descent (SGD), and stochastic variance reduction (SVRG). Conventional analysis of these optimization algorithms focuses on their convergence rates during the training process; however, people in the machine learning community may care more about the generalization performance of the learned model on unseen test data. In this paper, we investigate this issue using stability as a tool. In particular, we decompose the generalization error for R-ERM and derive its upper bound for both convex and nonconvex cases. In the convex case, we prove that the generalization error can be bounded by the convergence rate of the optimization algorithm and the stability of the R-ERM process, both in expectation (in the order of $O(1/n) + \mathbb{E}[\rho(T)]$, where $\rho(T)$ is the convergence error and $T$ is the number of iterations) and in high probability (in the order of $O\big(\sqrt{\log(1/\delta)}/\sqrt{n}\big) + \rho(T)$ with probability $1 - \delta$). For nonconvex cases, we can also obtain a similar expected generalization error bound. Our theorems indicate that 1) along with the training process, the generalization error decreases for all the optimization algorithms under our investigation; 2) comparatively speaking, SVRG has better generalization ability than GD and SGD. We have conducted experiments on both convex and nonconvex problems, and the experimental results verify our theoretical findings.

NeurIPS Conference 2017 Conference Paper

Improving Regret Bounds for Combinatorial Semi-Bandits with Probabilistically Triggered Arms and Its Applications

  • Qinshi Wang
  • Wei Chen

We study combinatorial multi-armed bandits with probabilistically triggered arms (CMAB-T) and semi-bandit feedback. We resolve a serious issue in prior CMAB-T studies where the regret bounds contain a possibly exponentially large factor of 1/p*, where p* is the minimum positive probability that an arm is triggered by any action. We address this issue by introducing a triggering probability modulated (TPM) bounded smoothness condition into the general CMAB-T framework, and show that both the influence maximization bandit and the combinatorial cascading bandit satisfy this TPM condition. As a result, we completely remove the factor of 1/p* from the regret bounds, achieving significantly better regret bounds for influence maximization and cascading bandits than before. Finally, we provide lower bound results showing that the factor 1/p* is unavoidable for general CMAB-T problems, suggesting that the TPM condition is crucial in removing this factor.

NeurIPS Conference 2017 Conference Paper

Influence Maximization with $\varepsilon$-Almost Submodular Threshold Functions

  • Qiang Li
  • Wei Chen
  • Xiaoming Sun
  • Jialin Zhang

Influence maximization is the problem of selecting $k$ nodes in a social network to maximize their influence spread. The problem has been extensively studied but most works focus on the submodular influence diffusion models. In this paper, motivated by empirical evidences, we explore influence maximization in the non-submodular regime. In particular, we study the general threshold model in which a fraction of nodes have non-submodular threshold functions, but their threshold functions are closely upper- and lower-bounded by some submodular functions (we call them $\varepsilon$-almost submodular). We first show a strong hardness result: there is no $1/n^{\gamma/c}$ approximation for influence maximization (unless P = NP) for all networks with up to $n^{\gamma}$ $\varepsilon$-almost submodular nodes, where $\gamma$ is in (0, 1) and $c$ is a parameter depending on $\varepsilon$. This indicates that influence maximization is still hard to approximate even though threshold functions are close to submodular. We then provide $(1-\varepsilon)^{\ell}(1-1/e)$ approximation algorithms when the number of $\varepsilon$-almost submodular nodes is $\ell$. Finally, we conduct experiments on a number of real-world datasets, and the results demonstrate that our approximation algorithms outperform other baseline algorithms.

NeurIPS Conference 2017 Conference Paper

LightGBM: A Highly Efficient Gradient Boosting Decision Tree

  • Guolin Ke
  • Qi Meng
  • Thomas Finley
  • Taifeng Wang
  • Wei Chen
  • Weidong Ma
  • Qiwei Ye
  • Tie-Yan Liu

Gradient Boosting Decision Tree (GBDT) is a popular machine learning algorithm, and has quite a few effective implementations such as XGBoost and pGBRT. Although many engineering optimizations have been adopted in these implementations, the efficiency and scalability are still unsatisfactory when the feature dimension is high and data size is large. A major reason is that for each feature, they need to scan all the data instances to estimate the information gain of all possible split points, which is very time-consuming. To tackle this problem, we propose two novel techniques: \emph{Gradient-based One-Side Sampling} (GOSS) and \emph{Exclusive Feature Bundling} (EFB). With GOSS, we exclude a significant proportion of data instances with small gradients, and only use the rest to estimate the information gain. We prove that, since the data instances with larger gradients play a more important role in the computation of information gain, GOSS can obtain quite accurate estimation of the information gain with a much smaller data size. With EFB, we bundle mutually exclusive features (i.e., they rarely take nonzero values simultaneously) to reduce the number of features. We prove that finding the optimal bundling of exclusive features is NP-hard, but a greedy algorithm can achieve quite a good approximation ratio (and thus can effectively reduce the number of features without hurting the accuracy of split point determination by much). We call our new GBDT implementation with GOSS and EFB \emph{LightGBM}. Our experiments on multiple public datasets show that LightGBM speeds up the training process of conventional GBDT by up to over 20 times while achieving almost the same accuracy.
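The GOSS step described in the abstract can be illustrated with a minimal sketch: keep the top `a` fraction of instances by absolute gradient, uniformly sample a `b` fraction of the remainder, and up-weight the sampled small-gradient instances by (1 − a)/b so the gain estimate stays approximately unbiased. The function name and parameter defaults below are illustrative, not LightGBM's API.

```python
import numpy as np

def goss_sample(gradients, a=0.2, b=0.1, seed=None):
    """Gradient-based One-Side Sampling (GOSS), minimal sketch.

    Keeps the top `a` fraction of instances by |gradient|, uniformly
    samples a `b` fraction of the rest, and up-weights the sampled
    small-gradient instances by (1 - a) / b to compensate for the
    under-sampling. Returns (selected_indices, instance_weights).
    """
    rng = np.random.default_rng(seed)
    n = len(gradients)
    order = np.argsort(-np.abs(gradients))      # sort by |gradient|, descending
    top_k = int(a * n)
    rest_k = int(b * n)
    top = order[:top_k]                         # large-gradient instances: all kept
    rest = rng.choice(order[top_k:], size=rest_k, replace=False)
    idx = np.concatenate([top, rest])
    weights = np.ones(len(idx))
    weights[top_k:] = (1.0 - a) / b             # reweight the small-gradient sample
    return idx, weights

grads = np.random.default_rng(0).normal(size=1000)
idx, w = goss_sample(grads, a=0.2, b=0.1, seed=0)
```

With a = 0.2 and b = 0.1 on 1000 instances, only 300 instances are used per split evaluation, and the 100 sampled small-gradient instances each carry weight 8.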

IJCAI Conference 2017 Conference Paper

Opinion-aware Knowledge Graph for Political Ideology Detection

  • Wei Chen
  • Xiao Zhang
  • Tengjiao Wang
  • Bishan Yang
  • Yi Li

Identifying individuals' political ideology from their speeches and written texts is important for analyzing political opinions and user behavior on social media. Traditional opinion mining methods rely on bag-of-words representations to classify texts into different ideology categories. Such methods are too coarse for understanding political ideologies. The key to identifying different ideologies is to recognize the different opinions expressed toward a specific topic. To model this insight, we classify ideologies based on the distribution of opinions expressed towards real-world entities or topics. Specifically, we propose a novel approach to political ideology detection that makes predictions based on an opinion-aware knowledge graph. We show how to construct such a graph by integrating the opinions and targeted entities extracted from text into an existing structured knowledge base, and show how to perform ideology inference by information propagation on the graph. Experimental results demonstrate that our method achieves high accuracy in detecting ideologies compared to baselines including LR, SVM and RNN.

AAAI Conference 2017 Conference Paper

Partitioned Sampling of Public Opinions Based on Their Social Dynamics

  • Weiran Huang
  • Liang Li
  • Wei Chen

Public opinion polling is usually done by random sampling from the entire population, treating individual opinions as independent. In the real world, individuals’ opinions are often correlated, e.g., among friends in a social network. In this paper, we explore the idea of partitioned sampling, which partitions individuals with high opinion similarities into groups and then samples every group separately to obtain an accurate estimate of the population opinion. We rigorously formulate the above idea as an optimization problem. We then show that the simple partitions which contain only one sample in each group are always better, and reduce finding the optimal simple partition to a well-studied Min-r-Partition problem. We adapt an approximation algorithm and a heuristic to solve the optimization problem. Moreover, to obtain opinion similarity efficiently, we adapt a well-known opinion evolution model to characterize social interactions, and provide an exact computation of opinion similarities based on the model. We use both synthetic and real-world datasets to demonstrate that the partitioned sampling method results in significant improvement in sampling quality and it is robust when some opinion similarities are inaccurate or even missing.
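The "simple partition" estimator described above can be sketched in a few lines: draw one individual per group and average the sampled opinions weighted by group size. The toy two-cluster population below is illustrative; the paper's partitioning and similarity computation are not reproduced here.

```python
import random

def partitioned_sample_estimate(groups, rng):
    """Partitioned sampling, minimal sketch.

    groups: list of lists of individual opinions (values in [0, 1]).
    Draws ONE sample per group (a "simple partition") and estimates
    the population mean as the group-size-weighted average of the
    sampled opinions.
    """
    n = sum(len(g) for g in groups)
    return sum(len(g) * rng.choice(g) for g in groups) / n

# Toy population: two tight opinion clusters. One sample per cluster
# already pins the population mean down well, because within-cluster
# opinions are highly similar.
rng = random.Random(0)
cluster_a = [0.9 + 0.01 * rng.random() for _ in range(50)]  # pro, ~0.905
cluster_b = [0.1 + 0.01 * rng.random() for _ in range(50)]  # con, ~0.105
est = partitioned_sample_estimate([cluster_a, cluster_b], rng)
true_mean = (sum(cluster_a) + sum(cluster_b)) / 100
```

With only two samples total, the estimate lands within 0.01 of the true population mean here, whereas two independent uniform samples could easily miss by 0.4.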

NeurIPS Conference 2016 Conference Paper

A Communication-Efficient Parallel Algorithm for Decision Tree

  • Qi Meng
  • Guolin Ke
  • Taifeng Wang
  • Wei Chen
  • Qiwei Ye
  • Zhi-Ming Ma
  • Tie-Yan Liu

Decision tree (and its extensions such as Gradient Boosting Decision Trees and Random Forest) is a widely used machine learning algorithm, due to its practical effectiveness and model interpretability. With the emergence of big data, there is an increasing need to parallelize the training process of decision tree. However, most existing attempts along this line suffer from high communication costs. In this paper, we propose a new algorithm, called \emph{Parallel Voting Decision Tree (PV-Tree)}, to tackle this challenge. After partitioning the training data onto a number of (e.g., $M$) machines, this algorithm performs both local voting and global voting in each iteration. For local voting, the top-$k$ attributes are selected from each machine according to its local data. Then, the indices of these top attributes are aggregated by a server, and the globally top-$2k$ attributes are determined by a majority voting among these local candidates. Finally, the full-grained histograms of the globally top-$2k$ attributes are collected from local machines in order to identify the best (most informative) attribute and its split point. PV-Tree can achieve a very low communication cost (independent of the total number of attributes) and thus can scale out very well. Furthermore, theoretical analysis shows that this algorithm can learn a near optimal decision tree, since it can find the best attribute with a large probability. Our experiments on real-world datasets show that PV-Tree significantly outperforms the existing parallel decision tree algorithms in the tradeoff between accuracy and efficiency.
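The two-stage voting described in the abstract reduces to a small aggregation step; here is a hedged sketch of just the attribute-selection part (the full-grained histogram merge that follows is omitted, and the per-machine gain values are made up for illustration).

```python
from collections import Counter

def pv_tree_select(local_gains, k=2):
    """PV-Tree attribute selection, minimal sketch.

    local_gains: list of per-machine dicts {attribute: local info gain}.
    Each machine votes for its local top-k attributes; the server keeps
    the globally top-2k attributes by vote count. The best split would
    then be chosen from full histograms of just those candidates.
    """
    votes = Counter()
    for gains in local_gains:
        local_top = sorted(gains, key=gains.get, reverse=True)[:k]
        votes.update(local_top)                  # each machine casts k votes
    return [attr for attr, _ in votes.most_common(2 * k)]

# Three machines with slightly different local views of four attributes.
machines = [
    {"age": 0.9, "income": 0.8, "zip": 0.1, "height": 0.2},
    {"age": 0.7, "income": 0.9, "zip": 0.3, "height": 0.1},
    {"age": 0.8, "zip": 0.6, "income": 0.5, "height": 0.2},
]
candidates = pv_tree_select(machines, k=2)
```

Only attribute indices cross the network in this step, which is why the communication cost is independent of the total number of attributes.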

IJCAI Conference 2016 Conference Paper

Asynchronous Accelerated Stochastic Gradient Descent

  • Qi Meng
  • Wei Chen
  • Jingcheng Yu
  • Taifeng Wang
  • Zhi-Ming Ma
  • Tie-Yan Liu

Stochastic gradient descent (SGD) is a widely used optimization algorithm in machine learning. In order to accelerate the convergence of SGD, a few advanced techniques have been developed in recent years, including variance reduction, stochastic coordinate sampling, and Nesterov's acceleration method. Furthermore, in order to improve the training speed and/or leverage larger-scale training data, asynchronous parallelization of SGD has also been studied. Then, a natural question is whether these techniques can be seamlessly integrated with each other, and whether the integration has a desirable theoretical guarantee on its convergence. In this paper, we provide our formal answer to this question. In particular, we consider the asynchronous parallelization of SGD, accelerated by leveraging variance reduction, coordinate sampling, and Nesterov's method. We call the new algorithm asynchronous accelerated SGD (AASGD). Theoretically, we prove a convergence rate for AASGD, which indicates that (i) the three acceleration methods are complementary to each other and can make their own contributions to the improvement of convergence rate; (ii) asynchronous parallelization does not hurt the convergence rate, and can achieve considerable speedup under appropriate parameter settings. Empirically, we tested AASGD on a few benchmark datasets. The experimental results verified our theoretical findings and indicated that AASGD could be a highly effective and efficient algorithm for practical use.

JMLR Journal 2016 Journal Article

Combinatorial Multi-Armed Bandit and Its Extension to Probabilistically Triggered Arms

  • Wei Chen
  • Yajun Wang
  • Yang Yuan
  • Qinshi Wang

We define a general framework for a large class of combinatorial multi-armed bandit (CMAB) problems, where subsets of base arms with unknown distributions form super arms. In each round, a super arm is played and the base arms contained in the super arm are played and their outcomes are observed. We further consider the extension in which more base arms could be probabilistically triggered based on the outcomes of already triggered arms. The reward of the super arm depends on the outcomes of all played arms, and it only needs to satisfy two mild assumptions, which allow a large class of nonlinear reward instances. We assume the availability of an offline $(\alpha,\beta)$-approximation oracle that takes the means of the outcome distributions of arms and outputs a super arm that with probability $\beta$ generates an $\alpha$ fraction of the optimal expected reward. The objective of an online learning algorithm for CMAB is to minimize $(\alpha,\beta)$-approximation regret, which is the difference in total expected reward between the $\alpha\beta$ fraction of expected reward when always playing the optimal super arm, and the expected reward of playing super arms according to the algorithm. We provide CUCB algorithm that achieves $O(\log n)$ distribution-dependent regret, where $n$ is the number of rounds played, and we further provide distribution-independent bounds for a large class of reward functions. Our regret analysis is tight in that it matches the bound of UCB1 algorithm (up to a constant factor) for the classical MAB problem, and it significantly improves the regret bound in an earlier paper on combinatorial bandits with linear rewards. We apply our CMAB framework to two new applications, probabilistic maximum coverage (PMC) for online advertising and social influence maximization for viral marketing, both having nonlinear reward structures. In particular, application to social influence maximization requires our extension on probabilistically triggered arms. 
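The CUCB loop described in the abstract can be sketched in a few lines for the basic (non-triggering) setting: adjust empirical means upward by a confidence radius, hand the adjusted means to the offline oracle, play the returned super arm, and update every observed base arm. The sketch below assumes Bernoulli arms and uses a simple top-2 selection as an exact oracle for a linear reward; the radius $\sqrt{3\ln t / (2T_i)}$ follows the usual CUCB form, but everything else (names, toy instance) is illustrative.

```python
import math, random

def cucb(n_rounds, n_arms, oracle, play):
    """CUCB, minimal sketch for the basic CMAB setting (no triggering).

    oracle(mu_bar) -> list of base-arm indices (the offline oracle);
    play(super_arm) -> dict {arm: observed 0/1 outcome} for every base
    arm in the super arm (semi-bandit feedback).
    """
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, n_rounds + 1):
        mu_bar = []
        for i in range(n_arms):
            if counts[i] == 0:
                mu_bar.append(1.0)              # optimistic init forces exploration
            else:
                bonus = math.sqrt(3 * math.log(t) / (2 * counts[i]))
                mu_bar.append(min(1.0, means[i] + bonus))
        super_arm = oracle(mu_bar)
        for i, x in play(super_arm).items():    # update every observed base arm
            counts[i] += 1
            means[i] += (x - means[i]) / counts[i]
    return means

# Toy instance: pick the best 2 of 4 Bernoulli arms. The reward is linear,
# so an exact "oracle" is just top-2 selection on the adjusted means.
true_p = [0.9, 0.8, 0.3, 0.2]
rng = random.Random(0)
oracle = lambda mu: sorted(range(4), key=lambda i: mu[i], reverse=True)[:2]
play = lambda s: {i: int(rng.random() < true_p[i]) for i in s}
means = cucb(3000, 4, oracle, play)
```

After enough rounds the empirical means order the arms correctly, so the oracle converges to playing the optimal super arm {0, 1}.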

NeurIPS Conference 2016 Conference Paper

Combinatorial Multi-Armed Bandit with General Reward Functions

  • Wei Chen
  • Wei Hu
  • Fu Li
  • Jian Li
  • Yu Liu
  • Pinyan Lu

In this paper, we study the stochastic combinatorial multi-armed bandit (CMAB) framework that allows a general nonlinear reward function, whose expected value may not depend only on the means of the input random variables but possibly on the entire distributions of these variables. Our framework enables a much larger class of reward functions such as the $\max()$ function and nonlinear utility functions. Existing techniques relying on accurate estimations of the means of random variables, such as the upper confidence bound (UCB) technique, do not work directly on these functions. We propose a new algorithm called stochastically dominant confidence bound (SDCB), which estimates the distributions of underlying random variables and their stochastically dominant confidence bounds. We prove that SDCB can achieve $O(\log T)$ distribution-dependent regret and $\tilde{O}(\sqrt{T})$ distribution-independent regret, where $T$ is the time horizon. We apply our results to the $K$-MAX problem and expected utility maximization problems. In particular, for $K$-MAX, we provide the first polynomial-time approximation scheme (PTAS) for its offline problem, and give the first $\tilde{O}(\sqrt T)$ bound on the $(1-\epsilon)$-approximation regret of its online problem, for any $\epsilon>0$.
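One way to read the SDCB construction is as an optimism step on the whole distribution rather than on the mean: lower the empirical CDF by a confidence radius (clipped at zero), which is equivalent to shifting that much probability mass to the maximum outcome, yielding a distribution that stochastically dominates the empirical one. The sketch below is this reading, with the radius form borrowed from standard UCB-style bounds; the exact construction in the paper may differ in details.

```python
import numpy as np

def dominant_cdf(samples, t, grid):
    """Stochastically dominant confidence bound on a distribution,
    minimal sketch. Assumes outcomes in [0, 1] and grid[-1] == 1.
    Lowering the empirical CDF makes the distribution stochastically
    LARGER (optimistic), since a smaller CDF means more mass on high
    values. Returns CDF values evaluated on `grid`."""
    samples = np.asarray(samples)
    radius = np.sqrt(3 * np.log(t) / (2 * len(samples)))
    f_hat = np.array([(samples <= x).mean() for x in grid])
    f_dom = np.clip(f_hat - radius, 0.0, 1.0)   # shift mass toward the top
    f_dom[-1] = 1.0                             # a CDF must reach 1 at the max
    return f_dom

rng = np.random.default_rng(0)
x = rng.random(500)                             # Uniform(0, 1) samples
grid = np.linspace(0.0, 1.0, 11)
f_dom = dominant_cdf(x, t=1000, grid=grid)
f_hat = np.array([(x <= g).mean() for g in grid])
```

Because the reward can depend on the whole distribution (e.g., through max()), the optimism has to dominate the CDF pointwise, not just inflate the mean.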

JBHI Journal 2016 Journal Article

Guest Editorial Sensor Informatics for Managing Mental Health

  • Gaetano Valenza
  • Vladimir Carli
  • Antonio Lanata
  • Wei Chen
  • Roozbeh Jafari
  • Enzo Pasquale Scilingo

The papers in this special section focus on the topic of sensor informatics for mental health applications. The papers provide novel insights on advances in detection, sensing, analysis, and modeling of central and/or autonomic correlates useful in psychophysiological states assessment.

AAAI Conference 2016 Conference Paper

Learning Market Parameters Using Aggregate Demand Queries

  • Xiaohui Bei
  • Wei Chen
  • Jugal Garg
  • Martin Hoefer
  • Xiaoming Sun

We study efficient algorithms for a natural learning problem in markets. There is one seller with m divisible goods and n buyers with unknown individual utility functions and budgets of money. The seller can repeatedly announce prices and observe aggregate demand bundles requested by the buyers. The goal of the seller is to learn the utility functions and budgets of the buyers. Our scenario falls into the classic domain of “revealed preference” analysis. Problems with revealed preference have recently started to attract increased interest in computer science due to their fundamental nature in understanding customer behavior in electronic markets. The goal of revealed preference analysis is to observe rational agent behavior, to explain it using a suitable model for the utility functions, and to predict future agent behavior. Our results are the first polynomial-time algorithms to learn utility and budget parameters via revealed preference queries in classic Fisher markets with multiple buyers. Our analysis concentrates on linear, CES, and Leontief markets, which are the most prominent classes studied in the literature. Some of our results extend to general Arrow-Debreu exchange markets.

AAAI Conference 2016 Conference Paper

On the Depth of Deep Neural Networks: A Theoretical View

  • Shizhao Sun
  • Wei Chen
  • Liwei Wang
  • Xiaoguang Liu
  • Tie-Yan Liu

People believe that depth plays an important role in the success of deep neural networks (DNN). However, this belief lacks solid theoretical justification as far as we know. We investigate the role of depth from the perspective of the margin bound, in which the expected error is upper bounded by the empirical margin error plus a Rademacher Average (RA) based capacity term. First, we derive an upper bound for the RA of DNN, and show that it increases with depth. This indicates a negative impact of depth on test performance. Second, we show that deeper networks tend to have larger representation power (measured by Betti-number-based complexity) than shallower networks in the multi-class setting, and thus can lead to smaller empirical margin error. This implies a positive impact of depth. The combination of these two results shows that for a DNN with a restricted number of hidden units, increasing depth is not always good, since there is a tradeoff between the positive and negative impacts. These results inspire us to seek alternative ways to achieve the positive impact of depth, e.g., imposing margin-based penalty terms on the cross-entropy loss so as to reduce the empirical margin error without increasing depth. Our experiments show that in this way we achieve significantly better test performance.

AAAI Conference 2015 Conference Paper

Generalization Analysis for Game-Theoretic Machine Learning

  • Haifang Li
  • Fei Tian
  • Wei Chen
  • Tao Qin
  • Zhi-Ming Ma
  • Tie-Yan Liu

For Internet applications like sponsored search, caution needs to be taken when using machine learning to optimize their mechanisms (e.g., auctions), since self-interested agents in these applications may change their behaviors (and thus the data distribution) in response to the mechanisms. To tackle this problem, a framework called game-theoretic machine learning (GTML) was recently proposed, which first learns a Markov behavior model to characterize agents' behaviors, and then learns the optimal mechanism by simulating agents' behavior changes in response to the mechanism. While GTML has demonstrated practical success, its generalization analysis is challenging because the behavior data are non-i.i.d. and dependent on the mechanism. To address this challenge, first, we decompose the generalization error for GTML into the behavior learning error and the mechanism learning error; second, for the behavior learning error, we obtain novel non-asymptotic error bounds for both parametric and non-parametric behavior learning methods; third, for the mechanism learning error, we derive a uniform convergence bound based on a new concept called the nested covering number of the mechanism space and the generalization analysis techniques developed for mixing sequences.

AAAI Conference 2015 Conference Paper

Jointly Modeling Deep Video and Compositional Text to Bridge Vision and Language in a Unified Framework

  • Ran Xu
  • Caiming Xiong
  • Wei Chen
  • Jason Corso

Recently, joint video-language modeling has been attracting more and more attention. However, most existing approaches focus on exploring the language model upon a fixed visual model. In this paper, we propose a unified framework that jointly models video and the corresponding text sentences. The framework consists of three parts: a compositional semantics language model, a deep video model and a joint embedding model. In our language model, we propose a dependency-tree structure model that embeds sentences into a continuous vector space, which preserves visually grounded meanings and word order. In the visual model, we leverage deep neural networks to capture essential semantic information from videos. In the joint embedding model, we minimize the distance between the outputs of the deep video model and the compositional language model in the joint space, and update these two models jointly. Based on these three parts, our system is able to accomplish three tasks: 1) natural language generation, 2) video retrieval, and 3) language retrieval. In the experiments, the results show that our approach outperforms SVM, CRF and CCA baselines in predicting Subject-Verb-Object triplets and natural sentence generation, and is better than CCA in video retrieval and language retrieval tasks.

AAAI Conference 2015 Conference Paper

Mechanism Learning with Mechanism Induced Data

  • Tie-Yan Liu
  • Wei Chen
  • Tao Qin

Machine learning and game theory are two important directions of AI. The former usually assumes data is independent of the models to be learned; the latter usually assumes agents are fully rational. In many modern Internet applications, like sponsored search and crowdsourcing, the two basic assumptions are violated and new challenges are posed to both machine learning and game theory. To better model and study such applications, we need to go beyond conventional machine learning and game theory (mechanism design), and adopt a new approach called mechanism learning with mechanism induced data. Specifically, we propose to learn a behavior model from data to describe how “real” agents play the complicated game, instead of making the full-rationality assumption. Then we propose to optimize the mechanism by using the learned behavior models to predict the future behaviors of agents in response to the new mechanism. Because the above process couples mechanism learning and behavior learning in a loop, new algorithms and theories are needed to perform the task and guarantee the asymptotic performance. As shown in this paper, there are many interesting research topics along this direction, many of which are still open problems, waiting for researchers in our community to deeply investigate.

JBHI Journal 2015 Journal Article

Mimo Pillow—An Intelligent Cushion Designed With Maternal Heart Beat Vibrations for Comforting Newborn Infants

  • Wei Chen
  • Sidarto Bambang Oetomo
  • Daniel Tetteroo
  • Frank Versteegh
  • Thelxi Mamagkaki
  • Mariana Serras Pereira
  • Lindy Janssen
  • Andrea van Meurs

Premature infants are subject to numerous interventions ranging from a simple diaper change to surgery while residing in neonatal intensive care units. These neonates often suffer from pain, distress, and discomfort during the first weeks of their lives. Although pharmacological pain treatment often is available, it cannot always be applied to relieve a neonate from pain or discomfort. This paper describes a nonpharmacological solution, called Mimo, which provides comfort through mediation of a parent's physiological features to the distressed neonate via an intelligent pillow system embedded with sensing and actuating functions. We present the design, the implementation, and the evaluation of the prototype. Clinical tests at Máxima Medical Center in the Netherlands show that, among the nine of ten infants who showed discomfort following a diaper change, a shorter recovery time to baseline skin-conductance analgesimeter values was measured when the maternal heartbeat vibration in the Mimo was switched on, and for seven of these ten a shorter crying time was measured.

TIST Journal 2015 Journal Article

Sponsored Search Auctions

  • Tao Qin
  • Wei Chen
  • Tie-Yan Liu

Sponsored search has been proven to be a successful business model, and sponsored search auctions have become a hot research direction. There have been many exciting advances in this field, especially in recent years, while at the same time, there are also many open problems waiting for us to resolve. In this article, we provide a comprehensive review of sponsored search auctions in hopes of helping both industry practitioners and academic researchers to become familiar with this field, to know the state of the art, and to identify future research topics. Specifically, we organize the article into two parts. In the first part, we review research works on sponsored search auctions with basic settings, where fully rational advertisers without budget constraints, preknown click-through rates (CTRs) without interdependence, and exact match between queries and keywords are assumed. Under these assumptions, we first introduce the generalized second price (GSP) auction, which is the most popularly used auction mechanism in the industry. Then we give the definitions of several well-studied equilibria and review the latest results on GSP’s efficiency and revenue in these equilibria. In the second part, we introduce some advanced topics on sponsored search auctions. In these advanced topics, one or more assumptions made in the basic settings are relaxed. For example, the CTR of an ad could be unknown and dependent on other ads; keywords could be broadly matched to queries before auctions are executed; and advertisers are not necessarily fully rational, could have budget constraints, and may prefer rich bidding languages. Given that the research on these advanced topics is still immature, in each section of the second part, we provide our opinions on how to make further advances, in addition to describing what has been done by researchers in the corresponding direction.
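The generalized second price (GSP) auction discussed in the first part of the article has a very compact core: rank advertisers by bid and charge each slot winner the next-ranked bid per click. The sketch below is the textbook rank-by-bid variant; production systems typically rank by bid times quality score, which is not modeled here, and the CTR list is used only to fix the number of slots.

```python
def gsp(bids, ctrs):
    """Generalized second price (GSP) auction, minimal rank-by-bid sketch.

    bids: {advertiser: bid per click}.
    ctrs: per-slot click-through rates (used here only to fix the number
    of slots; expected revenue per slot would be ctr * price).
    Returns [(advertiser, price_per_click)] from the top slot down.
    """
    ranked = sorted(bids, key=bids.get, reverse=True)
    slots = []
    for j in range(min(len(ctrs), len(ranked))):
        winner = ranked[j]
        # The slot-j winner pays the (j+1)-th highest bid per click.
        price = bids[ranked[j + 1]] if j + 1 < len(ranked) else 0.0
        slots.append((winner, price))
    return slots

result = gsp({"a": 4.0, "b": 2.5, "c": 1.0}, ctrs=[0.10, 0.05])
```

Note that each winner's price is independent of their own bid, which is the source of GSP's (partial) incentive properties discussed in the article.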

NeurIPS Conference 2015 Conference Paper

Stochastic Online Greedy Learning with Semi-bandit Feedbacks

  • Tian Lin
  • Jian Li
  • Wei Chen

The greedy algorithm has been extensively studied in the field of combinatorial optimization for decades. In this paper, we address the online learning problem when the input to the greedy algorithm is stochastic with unknown parameters that have to be learned over time. We first propose the greedy regret and $\epsilon$-quasi greedy regret as learning metrics comparing with the performance of the offline greedy algorithm. We then propose two online greedy learning algorithms with semi-bandit feedbacks, which use multi-armed bandit and pure exploration bandit policies at each level of greedy learning, one for each of the regret metrics respectively. Both algorithms achieve $O(\log T)$ problem-dependent regret bound ($T$ being the time horizon) for a general class of combinatorial structures and reward functions that allow greedy solutions. We further show that the bound is tight in $T$ and other problem instance parameters.

AAAI Conference 2014 Conference Paper

Agent Behavior Prediction and Its Generalization Analysis

  • Fei Tian
  • Haifang Li
  • Wei Chen
  • Tao Qin
  • Enhong Chen
  • Tie-Yan Liu

Machine learning algorithms have been applied to predict agent behaviors in real-world dynamic systems, such as advertiser behaviors in sponsored search and worker behaviors in crowdsourcing. Behavior data in these systems are generated by live agents: once systems change due to the adoption of prediction models learnt from behavior data, agents will observe and respond to these changes by changing their own behaviors accordingly. Therefore, the evolving behavior data will not be identically and independently distributed, posing great challenges to theoretical analysis. To tackle this challenge, in this paper, we propose to use Markov Chain in Random Environments (MCRE) to describe the behavior data, and perform generalization analysis of machine learning algorithms on its basis. We propose a novel technique that transforms the original time-variant MCRE into a higher-dimensional time-homogeneous Markov chain, which is easier to deal with. We prove the convergence of the new Markov chain when time approaches infinity. Then we obtain a generalization bound for the machine learning algorithms on the behavior data generated by the new Markov chain. To the best of our knowledge, this is the first work that performs the generalization analysis on data generated by complex processes in real-world dynamic systems.

NeurIPS Conference 2014 Conference Paper

Combinatorial Pure Exploration of Multi-Armed Bandits

  • Shouyuan Chen
  • Tian Lin
  • Irwin King
  • Michael Lyu
  • Wei Chen

We study the {\em combinatorial pure exploration (CPE)} problem in the stochastic multi-armed bandit setting, where a learner explores a set of arms with the objective of identifying the optimal member of a \emph{decision class}, which is a collection of subsets of arms with certain combinatorial structures such as size-$K$ subsets, matchings, spanning trees or paths, etc. The CPE problem represents a rich class of pure exploration tasks which covers not only many existing models but also novel cases where the object of interest has a non-trivial combinatorial structure. In this paper, we provide a series of results for the general CPE problem. We present general learning algorithms which work for all decision classes that admit offline maximization oracles in both fixed confidence and fixed budget settings. We prove problem-dependent upper bounds of our algorithms. Our analysis exploits the combinatorial structures of the decision classes and introduces a new analytic tool. We also establish a general problem-dependent lower bound for the CPE problem. Our results show that the proposed algorithms achieve the optimal sample complexity (within logarithmic factors) for many decision classes. In addition, applying our results back to the problems of top-$K$ arms identification and multiple bandit best arms identification, we recover the best available upper bounds up to constant factors and partially resolve a conjecture on the lower bounds.

JMLR Journal 2014 Journal Article

Detecting Click Fraud in Online Advertising: A Data Mining Approach

  • Richard Oentaryo
  • Ee-Peng Lim
  • Michael Finegold
  • David Lo
  • Feida Zhu
  • Clifton Phua
  • Eng-Yeow Cheu
  • Ghim-Eng Yap

Click fraud--the deliberate clicking on advertisements with no real interest on the product or service offered--is one of the most daunting problems in online advertising. Building an effective fraud detection method is thus pivotal for online advertising businesses. We organized a Fraud Detection in Mobile Advertising (FDMA) 2012 Competition, opening the opportunity for participants to work on real-world fraud data from BuzzCity Pte. Ltd., a global mobile advertising company based in Singapore. In particular, the task is to identify fraudulent publishers who generate illegitimate clicks, and distinguish them from normal publishers. The competition was held from September 1 to September 30, 2012, attracting 127 teams from more than 15 countries. The mobile advertising data are unique and complex, involving heterogeneous information, noisy patterns with missing values, and highly imbalanced class distribution. The competition results provide a comprehensive study on the usability of data mining-based fraud detection approaches in practical setting. Our principal findings are that features derived from fine-grained time-series analysis are crucial for accurate fraud detection, and that ensemble methods offer promising solutions to highly-imbalanced nonlinear classification tasks with mixed variable types and noisy/missing patterns. The competition data remain available for further studies at palanteer.sis.smu.edu.sg/fdma2012.

AAAI Conference 2013 Conference Paper

A Concave Conjugate Approach for Nonconvex Penalized Regression with the MCP Penalty

  • Shubao Zhang
  • Hui Qian
  • Wei Chen
  • Zhihua Zhang

The minimax concave penalty (MCP) has been demonstrated to be effective in nonconvex penalization for feature selection. In this paper we propose a novel construction approach for MCP. In particular, we show that MCP can be derived from a concave conjugate of the Euclidean distance function. This construction approach in turn leads us to an augmented Lagrange multiplier method for solving the penalized regression problem with MCP. In our method each tuning parameter corresponds to a feature, and these tuning parameters can be automatically updated. We also develop a d.c. (difference of convex functions) programming approach for the penalized regression problem. We find that the augmented Lagrange multiplier method degenerates into the d.c. programming method under specific conditions. Experimental analysis is conducted on a set of simulated data. The result is encouraging.

IJCAI Conference 2013 Conference Paper

A Game-Theoretic Machine Learning Approach for Revenue Maximization in Sponsored Search

  • Di He
  • Wei Chen
  • Liwei Wang
  • Tie-Yan Liu

Sponsored search is an important monetization channel for search engines, in which an auction mechanism is used to select the ads shown to users and determine the prices charged from advertisers. There have been several pieces of work in the literature that investigate how to design an auction mechanism in order to optimize the revenue of the search engine. However, due to some unrealistic assumptions used, the practical values of these studies are not very clear. In this paper, we propose a novel game-theoretic machine learning approach, which naturally combines machine learning and game theory, and learns the auction mechanism using a bilevel optimization framework. In particular, we first learn a Markov model from historical data to describe how advertisers change their bids in response to an auction mechanism, and then for any given auction mechanism, we use the learnt model to predict its corresponding future bid sequences. Next we learn the auction mechanism through empirical revenue maximization on the predicted bid sequences. We show that the empirical revenue will converge when the prediction period approaches infinity, and a Genetic Programming algorithm can effectively optimize this empirical revenue. Our experiments indicate that the proposed approach is able to produce a much more effective auction mechanism than several baselines.

AAAI Conference 2012 Conference Paper

Time-Critical Influence Maximization in Social Networks with Time-Delayed Diffusion Process

  • Wei Chen
  • Wei Lu
  • Ning Zhang

Influence maximization is the problem of finding a small set of highly influential users in a social network such that the spread of influence under certain propagation models is maximized. In this paper, we consider time-critical influence maximization, in which one wants to maximize influence spread within a given deadline. Since timing is considered in the optimization, we also extend the Independent Cascade (IC) model to incorporate the time-delay aspect of influence diffusion in social networks. We show that time-critical influence maximization under the time-delayed IC model maintains desired properties such as submodularity, which allows a greedy algorithm to circumvent the NP-hardness of the problem and achieve an approximation ratio of 1 − 1/e. To overcome the inefficiency of the approximation algorithm, we design two heuristic algorithms: the first is based on a dynamic programming procedure that computes exact influence in tree structures, while the second converts the problem to one in the original IC model and then applies existing fast heuristics to it. Our simulation results demonstrate that our heuristics achieve the same level of influence spread as the greedy algorithm while running a few orders of magnitude faster, and that they outperform existing algorithms that disregard the deadline constraint and delays in diffusion.
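A minimal sketch of the greedy baseline under a time-delayed IC model, with Monte-Carlo spread estimation. The geometric delay distribution and the simplified activation-time bookkeeping are assumptions for illustration, not the paper's exact model:

```python
import random

def spread_by_deadline(graph, seeds, p, delay_p, deadline, trials, seed=0):
    """Monte-Carlo spread estimate: each activated node tries each
    out-neighbour once (probability p); a success lands after a geometric
    delay with parameter delay_p, and only activations by `deadline` count.
    (Simplified sketch: first-assigned activation times are kept.)"""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        active = {s: 0 for s in seeds}          # node -> activation time
        frontier = list(active)
        while frontier:
            u = frontier.pop()
            for v in graph.get(u, []):
                if v in active or rng.random() >= p:
                    continue
                delay = 1
                while rng.random() >= delay_p:  # geometric delay
                    delay += 1
                if active[u] + delay <= deadline:
                    active[v] = active[u] + delay
                    frontier.append(v)
        total += len(active)
    return total / trials

def greedy_seeds(graph, k, p, delay_p, deadline, trials=200):
    """Standard greedy on the monotone submodular spread estimate,
    giving the (1 - 1/e) guarantee the abstract refers to."""
    seeds = []
    nodes = set(graph) | {v for vs in graph.values() for v in vs}
    for _ in range(k):
        best = max(nodes - set(seeds),
                   key=lambda u: spread_by_deadline(graph, seeds + [u],
                                                    p, delay_p, deadline, trials))
        seeds.append(best)
    return seeds
```

The deadline check is the only change relative to plain IC greedy: an edge can "succeed" yet contribute nothing because its delayed activation misses the deadline.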

IJCAI Conference 2011 Conference Paper

Community Detection in Social Networks through Community Formation Games

  • Wei Chen
  • Zhenming Liu
  • Xiaorui Sun
  • Yajun Wang

We introduce a game-theoretic framework that addresses the community detection problem based on a social network's structure. The dynamics of community formation are framed as a strategic game called the community formation game: given a social network, each node is selfish and selects communities to join or leave based on her own utility measurement. A community structure can then be interpreted as an equilibrium of this game. We formulate an agent's utility as the combination of a gain function and a loss function. Each agent may select multiple communities, which naturally captures the concept of "overlapping communities". We propose a gain function based on Newman's modularity function and a simple loss function that reflects the intrinsic costs incurred when people join communities. We conduct extensive experiments under this framework; the results show that our algorithm is effective in identifying overlapping communities and often outperforms the other algorithms we evaluated, especially when many people belong to multiple communities.
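The gain/loss structure can be sketched with a modularity-style gain and a fixed per-membership cost, with nodes toggling memberships by best response until no move helps. The specific utility and the all-in initialization are illustrative assumptions, not the paper's functions:

```python
def node_utility(u, joined, members, graph, m, cost):
    """Toy agent utility: a Newman-modularity-style gain summed over the
    communities node u has joined, minus a fixed cost per membership
    (illustrative stand-ins for the paper's gain and loss functions)."""
    deg = {w: len(graph[w]) for w in graph}
    util = 0.0
    for c in joined:
        for v in members[c]:
            if v != u:
                a_uv = 1.0 if v in graph[u] else 0.0
                util += (a_uv - deg[u] * deg[v] / (2 * m)) / (2 * m)
    return util - cost * len(joined)

def best_response_dynamics(graph, n_comms, cost, rounds=50):
    """Each node greedily toggles community memberships while that raises
    her utility; a fixed point is an equilibrium of the formation game.
    Nodes may keep several memberships: overlapping communities."""
    m = sum(len(vs) for vs in graph.values()) / 2
    members = {c: set(graph) for c in range(n_comms)}   # start everyone everywhere
    joined = {u: set(range(n_comms)) for u in graph}
    for _ in range(rounds):
        changed = False
        for u in graph:
            for c in range(n_comms):
                cur = node_utility(u, joined[u], members, graph, m, cost)
                joined[u].symmetric_difference_update({c})   # tentatively toggle
                members[c].symmetric_difference_update({u})
                if node_utility(u, joined[u], members, graph, m, cost) <= cur:
                    joined[u].symmetric_difference_update({c})   # revert
                    members[c].symmetric_difference_update({u})
                else:
                    changed = True
        if not changed:
            break
    return joined
```

`graph` here is an undirected adjacency dict (each edge listed from both endpoints); the returned `joined` map is the equilibrium membership structure.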

NeurIPS Conference 2010 Conference Paper

Two-Layer Generalization Analysis for Ranking Using Rademacher Average

  • Wei Chen
  • Tie-Yan Liu
  • Zhi-Ming Ma

This paper is concerned with generalization analysis of learning to rank for information retrieval (IR). In IR, data are hierarchically organized, i.e., consisting of queries and documents per query. Previous generalization analysis for ranking, however, has not fully considered this structure, and cannot explain how simultaneously changing the query number and the document number in the training data affects the performance of algorithms. In this paper, we propose performing generalization analysis under a two-layer sampling assumption, i.e., i.i.d. sampling of queries and conditional i.i.d. sampling of documents per query. Such sampling better describes the generation mechanism of real data, and the corresponding generalization analysis better explains the real behaviors of learning-to-rank algorithms. However, it is challenging to perform such analysis, because the documents associated with different queries are not identically distributed, and the documents associated with the same query are no longer independent once represented by features extracted from the matching between document and query. To tackle the challenge, we decompose the generalization error according to the two layers, and make use of the new concept of a two-layer Rademacher average. The generalization bounds we obtain are quite intuitive and are in accordance with previous empirical studies on the performance of ranking algorithms.
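The two-layer decomposition mentioned in the abstract can be written down directly; the following is a sketch of one natural form, with symbols $L$, $\ell$, $q_i$, $x_{ij}$ chosen here for illustration (the paper's exact definitions may differ):

```latex
% Queries q_1,\dots,q_n sampled i.i.d.; documents x_{i1},\dots,x_{im}
% sampled i.i.d. conditionally on q_i.
\[
L(f) - \hat{L}(f)
  = \underbrace{L(f) - \frac{1}{n}\sum_{i=1}^{n} L(f; q_i)}_{\text{query layer}}
  + \underbrace{\frac{1}{n}\sum_{i=1}^{n}\Bigl( L(f; q_i)
      - \frac{1}{m}\sum_{j=1}^{m} \ell(f; x_{ij}) \Bigr)}_{\text{document layer}},
\qquad L(f; q_i) = \mathbb{E}\bigl[\ell(f; x) \mid q_i\bigr].
\]
```

Each layer can then be controlled by a Rademacher average over its own sampling randomness (one sign per query for the first term, one sign per document for the second), which is the intuition behind a "two-layer" Rademacher average.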

NeurIPS Conference 2009 Conference Paper

Ranking Measures and Loss Functions in Learning to Rank

  • Wei Chen
  • Tie-Yan Liu
  • Yanyan Lan
  • Zhi-Ming Ma
  • Hang Li

Learning to rank has become an important research topic in machine learning. While most learning-to-rank methods learn the ranking function by minimizing the loss functions, it is the ranking measures (such as NDCG and MAP) that are used to evaluate the performance of the learned ranking function. In this work, we reveal the relationship between ranking measures and loss functions in learning-to-rank methods, such as Ranking SVM, RankBoost, RankNet, and ListMLE. We show that these loss functions are upper bounds of the measure-based ranking errors. As a result, the minimization of these loss functions will lead to the maximization of the ranking measures. The key to obtaining this result is to model ranking as a sequence of classification tasks, and define a so-called essential loss as the weighted sum of the classification errors of individual tasks in the sequence. We have proved that the essential loss is both an upper bound of the measure-based ranking errors, and a lower bound of the loss functions in the aforementioned methods. Our proof technique also suggests a way to modify existing loss functions to make them tighter bounds of the measure-based ranking errors. Experimental results on benchmark datasets show that the modifications can lead to better ranking performance, demonstrating the correctness of our analysis.
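The "essential loss" construction described above can be sketched in a few lines; the function below is an illustrative reading of the idea, not the paper's code, and the names `scores`, `true_order`, and `weights` are assumptions:

```python
def essential_loss(scores, true_order, weights):
    """Sketch of the 'essential loss': view ranking as a sequence of
    classification tasks. At step t the correct answer is the item the
    ground truth places next among those not yet removed; weights[t] is
    charged when the model instead scores another remaining item highest."""
    remaining = list(true_order)   # items in ground-truth rank order
    loss = 0.0
    for w in weights:
        if not remaining:
            break
        predicted = max(remaining, key=lambda i: scores[i])
        if predicted != remaining[0]:   # misclassified this step's task
            loss += w
        remaining.pop(0)                # remove the true next item
    return loss
```

With position-decaying weights, a mistake near the top of the list costs more, which is how the weighted sum relates to position-sensitive measures such as NDCG.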

IROS Conference 2006 Conference Paper

Information System Unit Model of Distributed Manufacturing Enterprises

  • Kaisheng Zhang
  • Yanming Sun
  • Shixiong Zheng
  • Wei Chen

In order to analyse the functional and control demands of a distributed manufacturing information system, and to eliminate semantic ambiguity among system units so that their semantics can be integrated, an abstract agent model and a computational structure for each unit were presented, based on a flexible-coupling automaton. On this foundation, the autonomy of each unit was investigated. Each system unit was described using an OWL ontology, and the system semantics were integrated. On this basis, communication among system units was illustrated with an example of interaction between a machine and a warehouse. This work lays the foundation for the requirements analysis, design, and development of distributed manufacturing information systems.
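The machine/warehouse interaction can be sketched as message passing between autonomous units. The `Unit` and `Message` classes, the performative names, and the stock field below are illustrative assumptions standing in for the paper's flexible-coupling-automaton units and OWL-described semantics:

```python
from dataclasses import dataclass, field

@dataclass
class Message:
    sender: str
    receiver: str
    performative: str   # e.g. "request", "confirm"
    content: dict

@dataclass
class Unit:
    """Toy system unit: an autonomous agent with local state and a mailbox
    (a stand-in for the paper's flexible-coupling-automaton units)."""
    name: str
    state: dict = field(default_factory=dict)
    inbox: list = field(default_factory=list)

    def send(self, other, performative, content):
        other.inbox.append(Message(self.name, other.name, performative, content))

machine = Unit("machine")
warehouse = Unit("warehouse", state={"blank_parts": 5})

# The machine requests raw material; the warehouse checks its local state
# autonomously, updates stock, and confirms.
machine.send(warehouse, "request", {"part": "blank", "qty": 2})
req = warehouse.inbox.pop(0)
if warehouse.state["blank_parts"] >= req.content["qty"]:
    warehouse.state["blank_parts"] -= req.content["qty"]
    warehouse.send(machine, "confirm", {"part": "blank", "qty": 2})
```

Each unit decides on its own state transitions from incoming messages, which is the autonomy property the abstract emphasizes.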