Arrow Research search

Author name cluster

Gang Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

109 papers
2 author rows

Possible papers

EAAI Journal 2026 Journal Article

An interpretable causal invariant graph neural network for unseen domain gear fault diagnosis

  • Zhenpeng Lao
  • Gang Chen
  • Yiyue Zhang
  • Penghong Lu
  • Zhenzhen Jin

In recent years, causal learning has provided application prospects for revealing the internal causal relationships of equipment and the explainability of intelligent diagnostic models. However, existing methods still struggle to eliminate spurious causal correlations in high-dimensional data and offer insufficient explainability, leading to unstable and unreliable diagnostic performance in unseen domains. To address these problems, an interpretable fault diagnosis method based on a causal invariant graph neural network (CIGNN) is proposed to enhance the model’s accuracy and interpretability for gears in the unseen domain. Firstly, a structural causal model is constructed from the cross-domain perspective and combined with GNN to clarify the internal causal mechanism of faults. Then, a causal disentanglement refining module is proposed to separate the effective causal parts from the high-dimensional and complex GNN. Furthermore, a domain causal feature consistency method is proposed to guide CIGNN in learning consistent causal feature embeddings across multi-source domains. Finally, a causal intervention risk minimization strategy is introduced to enable CIGNN to deeply mine potential features and block the interference of backdoor paths, enhancing diagnostic stability. Experimental results reveal that the proposed CIGNN model performs robustly in the unseen domain diagnosis task and provides interpretable explanations for decision-making in engineering applications.

JAAMAS Journal 2026 Journal Article

Coordinating Multiple Agents via Reinforcement Learning

  • Gang Chen
  • Zhonghua Yang
  • Kiah Mok Goh

In this paper, we attempt to use reinforcement learning techniques to solve agent coordination problems in task-oriented environments. The Fuzzy Subjective Task Structure model (FSTS) is presented to model general agent coordination. We show that an agent coordination problem modeled in FSTS is a Decision-Theoretic Planning (DTP) problem, to which reinforcement learning can be applied. Two learning algorithms, "coarse-grained" and "fine-grained", are proposed to address agent coordination behavior at two different levels. The "coarse-grained" algorithm operates at one level and tackles hard system constraints, while the "fine-grained" algorithm operates at another level and handles soft constraints. We argue that it is important to explicitly model and explore coordination-specific information (particularly system constraints), which underpins the two algorithms and contributes to their effectiveness. The algorithms are formally proved to converge and experimentally shown to be effective.

AAAI Conference 2026 Conference Paper

DeepOR: A Deep Reasoning Foundation Model for Optimization Modeling

  • Ziyang Xiao
  • Yuan Jessica Wang
  • Xiongwei Han
  • Shisi Guan
  • Jingyan Zhu
  • Jingrong Xie
  • Lilin Xu
  • Han Wu

Optimization modeling plays a critical role in supporting optimal decision-making across various domains. Previous works have demonstrated that large language models (LLMs) tailored for optimization modeling have significantly automated and simplified this process. However, these models typically employ a straightforward input-output paradigm and struggle with challenging instances. In contrast, recent advances in general-purpose reasoning LLMs (RLLMs), such as DeepSeek-R1, have shown impressive capabilities in complex domains like mathematics and coding. In this paper, we introduce DeepOR, the first RLLM specifically designed for optimization modeling. Instead of directly outputting solutions, DeepOR explicitly performs multiple intermediate reasoning steps. To adapt a base LLM into an RLLM, we begin by synthesizing long chain-of-thought (CoT) data guided by a flowchart, which is automatically generated using a self-exploration algorithm. Once the training data are prepared, we employ supervised fine-tuning on the base LLM to endow it with reasoning capabilities tailored for optimization modeling. To fully leverage the model's reasoning potential, we further apply reinforcement learning with reward-shaping derived from solver feedback. Experimental results on benchmarks confirm that DeepOR consistently and significantly outperforms existing state-of-the-art approaches.

AAAI Conference 2026 Conference Paper

DynamicRTL: RTL Representation Learning for Dynamic Circuit Behavior

  • Ruiyang Ma
  • Yunhao Zhou
  • Yipeng Wang
  • Yi Liu
  • Zhengyuan Shi
  • Ziyang Zheng
  • Kexin Chen
  • Zhiqiang He

There is a growing body of work on using Graph Neural Networks (GNNs) to learn representations of circuits, focusing primarily on their static characteristics. However, these models fail to capture circuit runtime behavior, which is crucial for tasks like circuit verification and optimization. To address this limitation, we introduce DR-GNN (DynamicRTL-GNN), a novel approach that learns RTL circuit representations by incorporating both static structures and multi-cycle execution behaviors. DR-GNN leverages an operator-level Control Data Flow Graph (CDFG) to represent Register Transfer Level (RTL) circuits, enabling the model to capture dynamic dependencies and runtime execution. To train and evaluate DR-GNN, we build the first comprehensive dynamic circuit dataset, comprising over 6,300 Verilog designs and 63,000 simulation traces. Our results demonstrate that DR-GNN outperforms existing models in branch hit prediction and toggle rate prediction. Furthermore, its learned representations transfer effectively to related dynamic circuit tasks, achieving strong performance in power estimation and assertion prediction.

AAAI Conference 2026 Conference Paper

Exploring Surround-View Fisheye Camera 3D Object Detection

  • Changcai Li
  • Wenwei Lin
  • Zuoxun Hou
  • Gang Chen
  • Wei Zhang
  • Huihui Zhou
  • Weishi Zheng

In this work, we explore the technical feasibility of implementing end-to-end 3D object detection (3DOD) with a surround-view fisheye camera system. Specifically, we first investigate the performance drop incurred when transferring classic pinhole-based 3D object detectors to fisheye imagery. To mitigate this, we then develop two methods that incorporate the unique geometry of fisheye images into mainstream detection frameworks: one based on the bird's-eye-view (BEV) paradigm, named FisheyeBEVDet, and the other on the query-based paradigm, named FisheyePETR. Both methods adopt spherical spatial representations to effectively capture fisheye geometry. In light of the lack of dedicated evaluation benchmarks, we release Fisheye3DOD, a new open dataset synthesized using CARLA and featuring both standard pinhole and fisheye camera arrays. Experiments on Fisheye3DOD demonstrate that our fisheye-compatible modeling improves accuracy by up to 6.2% compared to baseline methods.

AAAI Conference 2026 Conference Paper

Group-aware Multiscale Ensemble Learning for Test-Time Multimodal Sentiment Analysis

  • Kai Tang
  • Yixuan Tang
  • Tianyi Chen
  • Haokai Xu
  • Qiqi Luo
  • Jin Guang Zheng
  • Zhixin Zhang
  • Gang Chen

Multi-modal Sentiment Analysis (MSA) enables machines to perceive human sentiments by integrating multiple modalities such as text, video, and audio. Despite recent progress, most existing methods assume distribution consistency between training and test data—a condition rarely met in real-world scenarios. To address domain shifts without relying on source data or target labels, Test-Time Adaptation (TTA) has emerged as a promising paradigm. However, applying TTA methods to MSA faces two challenges: a representation bottleneck inherent to the regression formulation and the inconsistency in modality fusion caused by modality-specific data augmentation techniques. To overcome these issues, we propose Group-aware Multiscale Ensemble Learning (GMEL), which leverages a von Mises-Fisher (vMF) mixture distribution to model latent sentiment groups and integrates a multi-scale re-dropout strategy for modality-agnostic feature augmentation, preserving fusion consistency. Extensive experiments on three benchmark datasets using two backbone architectures show that GMEL significantly outperforms existing baselines, demonstrating strong robustness to test-time distribution shifts in multi-modal sentiment analysis.

YNIMG Journal 2026 Journal Article

Neural representation of emotional valence in human amygdala and surrounding regions

  • Ke Bo
  • Lihan Cui
  • Yujun Chen
  • Gang Chen
  • Andreas Keil
  • Mingzhou Ding

The amygdala is a core structure for encoding the affective value of external stimuli. Animal studies suggest that positive and negative emotions are separately encoded by distinct neuronal populations within the amygdala; however, this hypothesis has rarely been tested in humans. The current study examined this hypothesis by comparing the distributed emotion encoding model, as proposed in animal studies, with the univariate emotion encoding model using functional magnetic resonance imaging (fMRI). More specifically, we applied univariate regression, using average amygdala activation to represent global activation level, and multivariate regression, using distributed voxel-level patterns within the amygdala, to predict normative valence of affective images from the IAPS library. In the core amygdala, the multivariate model's prediction performance was not better than that of the univariate model, with weight map analysis revealing an overwhelming predominance of voxels selectively responsive to negative stimuli. When the region of interest was expanded to include voxels with lower anatomical probability of belonging to the amygdala as well as voxels from adjacent areas, the multivariate model significantly outperformed the univariate model, with the voxels selectively responsive to positive valence primarily located in regions surrounding the core amygdala. These findings suggest that in the human amygdala, the core region encodes emotional valence primarily through a global activation signal, rather than distributed patterns consisting of separate clusters of positive and negative voxels, and a more distributed valence representation emerges when regions surrounding the amygdala are taken into consideration.

EAAI Journal 2026 Journal Article

Posture stability control of a beaver-like bipedal robot based on the deep interactive twin delayed deep deterministic policy gradient algorithm

  • Gang Chen
  • Hanhan Xue
  • Zhihan Zhao
  • Yuwang Lu
  • Guangke Cao
  • Chenguang Yang
  • Huosheng Hu
  • Chuanyu Wu

The underwater environment presents a complex and variable dynamic that challenges the modeling process and is crucial for ensuring the swimming stability of underwater bionic robots. This paper focuses on the biological beaver as a model for developing a new configuration of an underwater bionic rigid-flexible coupled serial-parallel hybrid mechanism. A bionic beaver bipedal robot is designed following this model. To enhance training efficiency and avoid ineffective exploration, we propose a deep interactive reinforcement learning algorithm that leverages human experience, informed by the movement trajectories of the biological beaver. This algorithm facilitates pitch stability motion control in an underwater bionic beaver bipedal robot without complex modeling and reduces training duration substantially. The implementation of this deep interactive reinforcement learning method on a beaver-like bipedal robot prototype has demonstrated successful pitch stability motion control. Experimental results confirm the efficacy of this control strategy, providing novel perspectives and methodologies for stable motion control in underwater webbed-footed robots.

AAAI Conference 2026 Conference Paper

Teaching Large Language Models to Maintain Contextual Faithfulness via Synthetic Tasks and Reinforcement Learning

  • Shuzheng Si
  • Haozhe Zhao
  • Cheng Gao
  • Yuzhuo Bai
  • Zhitong Wang
  • Bofei Gao
  • Kangyang Luo
  • Wenhao Li

Teaching large language models (LLMs) to be faithful in the provided context is crucial for building reliable information-seeking systems. Therefore, we propose a systematic framework, CANOE, to reduce faithfulness hallucinations of LLMs across different downstream tasks without human annotations. Specifically, we first synthesize short-form question-answering (QA) data with four diverse tasks to construct high-quality and easily verifiable training data without human annotation. Also, we propose Dual-GRPO, a rule-based reinforcement learning method that includes three tailored rule-based rewards derived from synthesized short-form QA data, while simultaneously optimizing both short-form and long-form response generation. Notably, Dual-GRPO eliminates the need to manually label preference data to train reward models and avoids over-optimizing short-form generation when relying only on the synthesized short-form QA data. Experimental results show that CANOE greatly improves the faithfulness of LLMs across 11 different tasks, even outperforming the most advanced LLMs, e.g., GPT-4o and OpenAI o1.

EAAI Journal 2026 Journal Article

Three-dimensional-grounded vision-language framework for robotic task planning: Automated prompt synthesis and supervised reasoning

  • Guoqin Tang
  • Qingxuan Jia
  • Gang Chen
  • Zeyuan Huang
  • Zhipeng Yao
  • Ning Ji

Reliably deploying vision-language models for fine-grained robotic manipulation is a central challenge, as their powerful reasoning is often not grounded in three-dimensional (3D) physical reality. To address this, we propose a novel framework for the application of artificial intelligence in robotics that endows standard, frozen vision-language models with high-precision 3D capabilities without costly retraining. Our work makes three core contributions: (i) a plug-and-play architecture that extends off-the-shelf vision-language models to three-dimensional robotic task planning; (ii) an adaptive visual prompt synthesis method that distills noisy 3D data into high-confidence geometric anchors for the model’s visual input; and (iii) a closed-loop supervisory mechanism where a fine-tuned small language model acts as a domain expert to validate plans and mitigate hallucinations. This approach decouples general artificial intelligence knowledge from specialist robotic expertise, offering a scalable and resource-efficient pathway for real-world deployment. Experiments demonstrate state-of-the-art performance in the zero-shot robotics paradigm, achieving a 96.0% task success rate on foundational tasks and a 73.4% task success rate on an ultra-long-horizon assembly task unattainable by baseline methods. Ablation studies confirm the criticality of our modules; for instance, removing the small language model supervisor causes the task success rate on a representative task to drop by 67%. By demonstrating that frozen vision-language models can achieve high-precision 3D grounding through automated prompt synthesis and expert supervision, this work provides a practical blueprint for the next generation of intelligent, adaptable artificial intelligence-based robotic systems.

EAAI Journal 2025 Journal Article

A dual spatial temporal neural network for bottleneck prediction in manufacturing systems

  • Weihong Cen
  • Chupeng Su
  • Kainuo Cen
  • Lie Yang
  • Gang Chen
  • Longhan Xie

In manufacturing systems, bottlenecks act as constraints that limit system throughput. Extensive efforts have been made to detect and predict bottlenecks. Traditional bottleneck prediction methods predominantly utilize time-series feature analysis, which is limited in capturing the dynamic spatial dependencies introduced by production material flow. To address these limitations, we propose a dual spatial temporal neural network for dynamic bottlenecks (Dual-BDSTN) to learn the dependencies of temporal and spatial features dynamically. In the temporal module, a gated recurrent unit combined with a self-attention mechanism is employed to capture the time-evolving dynamics of temporal features related to machine status. In the spatial module, a dynamic graph neural network is employed to learn spatial information affected by dynamic production material flow, and a cross-attention mechanism captures the effect of temporal features on spatial features. Finally, gated recurrent neural networks are applied to capture the temporal trends of the temporal and spatial features to predict future starvation and blockage for identifying bottleneck locations. Experimental results demonstrate that the proposed model outperforms the best benchmark, achieving reductions of 5.95 and 2.95 in root mean square error for predicting starvation and blockage times of overall machines in the production system, respectively (over 10%), with a 2.85% improvement in bottleneck prediction accuracy.

IJCAI Conference 2025 Conference Paper

A Survey of Optimization Modeling Meets LLMs: Progress and Future Directions

  • Ziyang Xiao
  • Jingrong Xie
  • Lilin Xu
  • Shisi Guan
  • Jingyan Zhu
  • Xiongwei Han
  • Xiaojin Fu
  • WingYin Yu

By virtue of its great utility in solving real-world problems, optimization modeling has been widely employed for optimal decision-making across various sectors, but it requires substantial expertise from operations research professionals. With the advent of large language models (LLMs), new opportunities have emerged to automate the procedure of mathematical modeling. This survey presents a comprehensive and timely review of recent advancements that cover the entire technical stack, including data synthesis and fine-tuning for the base model, inference frameworks, benchmark datasets, and performance evaluation. In addition, we conducted an in-depth analysis on the quality of benchmark datasets, which was found to have a surprisingly high error rate. We cleaned the datasets and constructed a new leaderboard with fair performance evaluation in terms of base LLM model and datasets. We also build an online portal that integrates resources of cleaned datasets, code and paper repository to benefit the community. Finally, we identify limitations in current methodologies and outline future research opportunities.

IJCAI Conference 2025 Conference Paper

A Timestep-Adaptive Frequency-Enhancement Framework for Diffusion-based Image Super-Resolution

  • Yueying Li
  • Hanbin Zhao
  • Jiaqing Zhou
  • Guozhi Xu
  • Tianlei Hu
  • Gang Chen
  • Haobo Wang

Image super-resolution (ISR) is a classic and challenging problem in computer vision because of complex and unknown degradation patterns in the data collection process. Leveraging powerful generative priors, diffusion-based methods have recently established new state-of-the-art ISR performance, but their characteristics in the frequency domain are still underexplored. In this paper, we innovatively investigate their frequency-domain behaviors from a sampling timestep perspective. Experimentally, we find that current diffusion-based ISR algorithms exhibit insufficiency in different frequency components in distinct groups of timesteps during the sampling. To address this, we first propose a Timestep Division Controller that is able to adaptively divide the timesteps into groups based on the performance gradient across different components. Next, we design two dedicated modules, the Amplitude and Phase Enhancement Module (APEM) and the High- and Low-Frequency Enhancement Module (HLEM), to regulate the information flow of distinct frequency-domain features. By adaptively enhancing specific frequency components at different stages of the sampling process, the two modules effectively compensate for the insufficient frequency-domain perception of diffusion-based ISR models. Extensive experiments on three benchmark datasets verify the superior ISR performance of our method, e.g., achieving an average 5.40% improvement on CLIP-IQA compared to the best diffusion-based ISR baseline.
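
The abstract above refers to amplitude and phase components in the frequency domain without giving details. As a rough illustration only, the sketch below splits a 1D signal's discrete Fourier transform into amplitude and phase and recombines them. It is not the APEM or HLEM module; the function names and the toy signal are hypothetical, and a real ISR model would operate on 2D image features.

```python
import cmath
import math

def dft(signal):
    """Naive discrete Fourier transform of a real-valued 1D signal."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
            for f in range(n)]

def idft(spectrum):
    """Inverse transform; returns the real part of each reconstructed sample."""
    n = len(spectrum)
    return [sum(spectrum[f] * cmath.exp(2j * math.pi * f * t / n) for f in range(n)).real / n
            for t in range(n)]

def split(spectrum):
    """Separate a complex spectrum into amplitude and phase lists."""
    return [abs(c) for c in spectrum], [cmath.phase(c) for c in spectrum]

def combine(amplitude, phase):
    """Rebuild a complex spectrum from amplitude and phase lists."""
    return [cmath.rect(a, p) for a, p in zip(amplitude, phase)]

# Hypothetical toy signal standing in for one row of image features.
signal = [0.0, 1.0, 0.0, -1.0, 0.5, 0.25, -0.5, 0.0]
amplitude, phase = split(dft(signal))

# Recombining unchanged components recovers the signal.
rebuilt = idft(combine(amplitude, phase))

# Scaling every amplitude by 2 while keeping the phase doubles the signal.
boosted = idft(combine([2 * a for a in amplitude], phase))
```

Keeping the two components separate is what makes it possible to adjust one (for example, amplitude in a chosen band) while leaving the other untouched.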

AAAI Conference 2025 Conference Paper

ADBA: Approximation Decision Boundary Approach for Black-Box Adversarial Attacks

  • Feiyang Wang
  • Xingquan Zuo
  • Hai Huang
  • Gang Chen

Many machine learning models are susceptible to adversarial attacks, with decision-based black-box attacks representing the most critical threat in real-world applications. These attacks are extremely stealthy, generating adversarial examples using hard labels obtained from the target machine learning model. This is typically realized by optimizing perturbation directions, guided by decision boundaries identified through query-intensive exact search, significantly limiting the attack success rate. This paper introduces a novel approach using the Approximation Decision Boundary (ADB) to efficiently and accurately compare perturbation directions without precisely determining decision boundaries. The effectiveness of our ADB approach (ADBA) hinges on promptly identifying suitable ADB, ensuring reliable differentiation of all perturbation directions. For this purpose, we analyze the probability distribution of decision boundaries, confirming that using the distribution's median value as ADB can effectively distinguish different perturbation directions, giving rise to the development of the ADBA-md algorithm. ADBA-md only requires four queries on average to differentiate any pair of perturbation directions, which is highly query-efficient. Extensive experiments on six well-known image classifiers clearly demonstrate the superiority of ADBA and ADBA-md over multiple state-of-the-art black-box attacks.
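
To make the idea of comparing perturbation directions without locating exact decision boundaries more concrete, here is a minimal sketch under stated assumptions. It is not the ADBA or ADBA-md algorithm: the toy classifier, the function names, and the fixed radius are hypothetical, and the sketch assumes that once a point along a direction becomes adversarial it stays adversarial at larger radii.

```python
def perturb(x, direction, radius):
    """Move the input x along a direction by the given radius."""
    return [xi + radius * di for xi, di in zip(x, direction)]

def compare_directions(x, label, classify, d1, d2, radius):
    """Compare two directions with one hard-label query each.

    Returns 1 or 2 if exactly one direction is already adversarial at the
    given radius (so its decision boundary is closer), or 0 if the radius
    cannot separate them and a different radius would be needed.
    """
    adv1 = classify(perturb(x, d1, radius)) != label
    adv2 = classify(perturb(x, d2, radius)) != label
    if adv1 and not adv2:
        return 1
    if adv2 and not adv1:
        return 2
    return 0

# Hypothetical hard-label model: class 1 when the coordinates sum above 1.
def toy_classify(point):
    return 1 if sum(point) > 1.0 else 0

x0 = [0.0, 0.0]        # clean input with label 0
dir_a = [1.0, 0.0]     # reaches the boundary at radius 1.0
dir_b = [0.6, 0.8]     # reaches the boundary at radius 1/1.4

better = compare_directions(x0, 0, toy_classify, dir_a, dir_b, radius=0.8)
too_small = compare_directions(x0, 0, toy_classify, dir_a, dir_b, radius=0.5)
too_large = compare_directions(x0, 0, toy_classify, dir_a, dir_b, radius=1.5)
```

In this toy setting a well-chosen radius settles the comparison with two queries, whereas a radius that is too small or too large is inconclusive, which is why the choice of the approximation boundary matters.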

IJCAI Conference 2025 Conference Paper

Advancing Community Detection with Graph Convolutional Neural Networks: Bridging Topological and Attributive Cohesion

  • Anjali de Silva
  • Gang Chen
  • Hui Ma
  • Seyed Mohammad Nekooei
  • Xingquan Zuo

Community detection, a vital technology for real-world applications, uncovers cohesive node groups (communities) by leveraging both topological and attribute similarities in social networks. However, existing Graph Convolutional Networks (GCNs) trained to maximize modularity often converge to suboptimal solutions. Additionally, directly using human-labeled communities for training can undermine topological cohesiveness by grouping disconnected nodes based solely on node attributes. We address these issues by proposing a novel Topological and Attributive Similarity-based Community detection (TAS-Com) method. TAS-Com introduces a novel loss function that exploits the highly effective and scalable Leiden algorithm to detect community structures with global optimal modularity. Leiden is further utilized to refine human-labeled communities to ensure connectivity within each community, enabling TAS-Com to detect community structures with desirable trade-offs between modularity and compliance with human labels. Experimental results on multiple benchmark networks confirm that TAS-Com can significantly outperform several state-of-the-art algorithms.
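
For readers unfamiliar with modularity, the quantity the abstract refers to, the sketch below computes the standard Newman modularity of a partition of an undirected, unweighted graph. It is an illustration of the metric only, not the TAS-Com loss function or the Leiden algorithm; the example graph and names are hypothetical.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman modularity Q = sum_c [ L_c / m - (d_c / (2m))^2 ].

    edges: list of (u, v) pairs of an undirected graph without duplicates.
    community: dict mapping each node to its community label.
    L_c is the number of edges inside community c, d_c the total degree of
    its nodes, and m the number of edges in the graph.
    """
    m = len(edges)
    internal = defaultdict(int)
    degree = defaultdict(int)
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "a" for node in range(6)}

q_split = modularity(edges, two_groups)   # 5/14, about 0.357
q_merged = modularity(edges, one_group)   # 0.0
```

Splitting the graph at the bridge scores higher than lumping all nodes together, which matches the intuition that modularity rewards densely connected groups.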

IROS Conference 2025 Conference Paper

Bridging Text and Vision: A Multi-View Text-Vision Registration Approach for Cross-Modal Place Recognition

  • Tianyi Shang
  • Zhenyu Li
  • Pengjie Xu
  • Jinwei Qiao
  • Gang Chen
  • Zihan Ruan
  • Weijun Hu

Mobile robots necessitate advanced natural language understanding capabilities to accurately identify locations and perform tasks such as package delivery. However, traditional visual place recognition (VPR) methods rely solely on single-view visual information and cannot interpret human language descriptions. To overcome this challenge, we bridge text and vision by proposing a multiview (360° views of the surroundings) text-vision registration approach called Text4VPR for the place recognition task, which is the first method that exclusively utilizes textual descriptions to match a database of images. Text4VPR employs the frozen T5 language model to extract global textual embeddings. Additionally, it utilizes the Sinkhorn algorithm with a temperature coefficient to assign local tokens to their respective clusters, thereby aggregating visual descriptors from images. During the training stage, Text4VPR emphasizes the alignment between individual text-image pairs for precise textual description. In the inference stage, Text4VPR uses the Cascaded Cross-Attention Cosine Alignment (CCCA) to address the internal mismatch between text and image groups. Subsequently, Text4VPR performs precise place matching based on the descriptions of text-image groups. On Street360Loc, the first text-to-image VPR dataset we created, Text4VPR builds a robust baseline, achieving a leading top-1 accuracy of 56% and a leading top-10 accuracy of 91% within a 5-meter radius on the test set, which indicates that localization from textual descriptions to images is not only feasible but also holds significant potential for further advancement, as shown in Figure 1. Our code is available at https://github.com/nuozimiaowu/Text4VPR.

AAAI Conference 2025 Conference Paper

CogSQL: A Cognitive Framework for Enhancing Large Language Models in Text-to-SQL Translation

  • Hongwei Yuan
  • Xiu Tang
  • Ke Chen
  • Lidan Shou
  • Gang Chen
  • Huan Li

Large language models (LLMs) have significantly advanced the performance of various natural language processing tasks, including text-to-SQL. Current LLM-based text-to-SQL schemes mainly focus on improving the understanding of natural language questions (NLQs) or refining the quality of generated SQLs. While these strategies are effective, they often address specific, nuanced aspects. In contrast, humans approach text-to-SQL with a holistic view, applying transitional logical reasoning across multiple steps to arrive at the final answer. We believe LLMs can leverage human cognitive processes to achieve greater accuracy in text-to-SQL. In this paper, we present COGSQL, a framework featuring a suite of tailored models and strategies aimed at replicating human cognitive processes for enhanced LLM-based text-to-SQL. COGSQL consists of three key modules: (1) SQL preparation: we employ coarse-to-fine schema linking and syntax keyword prediction, akin to how humans recall and align key concepts for better understanding. (2) SQL generation: we introduce concept-enhanced chain-of-thought prompting, enhancing NLQ interpretation and SQL composition of LLMs, similar to humans drafting a SQL query. (3) SQL correction: we develop NLQ consistency and result consistency techniques to correct various errors, mirroring how humans evaluate and refine reasoning. We conduct extensive experiments using diverse benchmarks and LLMs. The results and analysis verify the effectiveness and generalizability of COGSQL.

AAAI Conference 2025 Conference Paper

Exploit Gradient Skewness to Circumvent Byzantine Defenses for Federated Learning

  • Yuchen Liu
  • Chen Chen
  • Lingjuan Lyu
  • Yaochu Jin
  • Gang Chen

Federated Learning (FL) is notorious for its vulnerability to Byzantine attacks. Most current Byzantine defenses share a common inductive bias: among all the gradients, the densely distributed ones are more likely to be honest. However, such a bias is a poison to Byzantine robustness due to a newly discovered phenomenon in this paper -- gradient skew. We discover that a group of densely distributed honest gradients skew away from the optimal gradient (the average of honest gradients) due to heterogeneous data. This gradient skew phenomenon allows Byzantine gradients to hide within the densely distributed skewed gradients. As a result, Byzantine defenses are confused into believing that Byzantine gradients are honest. Motivated by this observation, we propose a novel skew-aware attack called STRIKE: first, we search for the skewed gradients; then, we construct Byzantine gradients within the skewed gradients. Experiments on three benchmark datasets validate the effectiveness of our attack.

IJCAI Conference 2025 Conference Paper

GATES: Cost-aware Dynamic Workflow Scheduling via Graph Attention Networks and Evolution Strategy

  • Ya Shen
  • Gang Chen
  • Hui Ma
  • Mengjie Zhang

Cost-aware Dynamic Workflow Scheduling (CADWS) is a key challenge in cloud computing, focusing on devising an effective scheduling policy to efficiently schedule dynamically arriving workflow tasks, represented as Directed Acyclic Graphs (DAG), to suitable virtual machines (VMs). Deep reinforcement learning (DRL) has been widely employed for automated scheduling policy design. However, the performance of DRL is heavily influenced by the design of the problem-tailored policy network and is highly sensitive to hyperparameters and the design of reward feedback. Considering the above-mentioned issues, this study proposes a novel DRL method combining Graph Attention Networks-based policy network and Evolution Strategy, referred to as GATES. The contributions of GATES are summarized as follows: (1) GATES can capture the impact of current task scheduling on subsequent tasks by learning the topological relationships between tasks in a DAG. (2) GATES can assess the importance of each VM to the ready task, enabling it to adapt to dynamically changing VM resources. (3) Utilizing Evolution Strategy's robustness, exploratory nature, and tolerance for delayed rewards, GATES achieves stable policy learning in CADWS. Extensive experimental results demonstrate the superiority of the proposed GATES in CADWS, outperforming several state-of-the-art algorithms. The source code is available at: https://github.com/YaShen998/GATES.

ICLR Conference 2025 Conference Paper

Graph Assisted Offline-Online Deep Reinforcement Learning for Dynamic Workflow Scheduling

  • Yifan Yang
  • Gang Chen
  • Hui Ma
  • Cong Zhang
  • Zhiguang Cao
  • Mengjie Zhang

Dynamic workflow scheduling (DWS) in cloud computing presents substantial challenges due to heterogeneous machine configurations, unpredictable workflow arrivals/patterns, and constantly evolving environments. However, existing research often assumes homogeneous setups and static conditions, limiting flexibility and adaptability in real-world scenarios. In this paper, we propose a novel Graph assisted Offline-Online Deep Reinforcement Learning (GOODRL) approach to building an effective and efficient scheduling agent for DWS. Our approach features three key innovations: (1) a task-specific graph representation and a Graph Attention Actor Network that enable the agent to dynamically assign focused tasks to heterogeneous machines while explicitly considering the future impact of each machine on these tasks; (2) a system-oriented graph representation and a Graph Attention Critic Network that facilitate efficient processing of new information and understanding its impact on the current state, crucial for managing unpredictable workflow arrivals/patterns in real-time; and (3) an offline-online method that utilizes imitation learning for effective offline training and applies gradient control and decoupled high-frequency critic training techniques during online learning to sustain the agent’s robust performance in rapidly changing environments. Experimental results demonstrate that GOODRL significantly outperforms several state-of-the-art algorithms, achieving substantially lower mean flowtime and high adaptability in various online and offline scenarios.

NeurIPS Conference 2025 Conference Paper

IMPACT: Irregular Multi-Patch Adversarial Composition Based on Two‑Phase Optimization

  • Zenghui Yang
  • Xingquan Zuo
  • Hai Huang
  • Gang Chen
  • Xinchao Zhao
  • Tianle Zhang

Deep neural networks have become foundational in various applications but remain vulnerable to adversarial patch attacks. Crafting effective adversarial patches is inherently challenging due to the combinatorial complexity involved in jointly optimizing critical factors such as patch shape, location, number, and content. Existing approaches often simplify this optimization by addressing each factor independently, which limits their effectiveness. To tackle this significant challenge, we introduce a novel and flexible adversarial attack framework termed IMPACT (Irregular Multi-Patch Adversarial Composition based on Two-phase optimization). IMPACT uniquely enables comprehensive optimization of all essential patch factors using gradient-free methods. Specifically, we propose a novel dimensionality reduction encoding scheme that substantially lowers computational complexity while preserving expressive power. Leveraging this encoding, we further develop a two-phase optimization framework: phase 1 employs differential evolution for joint optimization of patch mask and content, while phase 2 refines patch content using an evolutionary strategy for enhanced precision. Additionally, we introduce a new aggregation algorithm explicitly designed to produce contiguous, irregular patches by merging localized regions, ensuring physical applicability. Extensive experiments demonstrate that our method significantly outperforms several state-of-the-art approaches, highlighting the critical benefit of jointly optimizing all patch factors in adversarial patch attacks.

AAAI Conference 2025 Conference Paper

M2Flow: A Motion Information Fusion Framework for Enhanced Unsupervised Optical Flow Estimation in Autonomous Driving

  • Xunpei Sun
  • Gang Chen
  • Zuoxun Hou

Estimating optical flow in occluded regions is a crucial challenge in unsupervised settings. In this work, we introduce M2Flow, a novel framework for unsupervised optical flow estimation that integrates motion information from multiple frames to address occlusions. By modeling inter-frame motion information and employing a Motion Information Propagation (MIP) module, M2Flow effectively propagates and integrates motion information across frames, while concurrently estimating bidirectional optical flows for multiple frames. In addition, to handle occlusions across multiple frames, we provide two augmentation modules specifically designed for our multi-frame model to further refine optical flow. The experiments on the KITTI and Sintel datasets demonstrate that M2Flow outperforms other state-of-the-art unsupervised approaches, especially in handling occlusions.

NeurIPS Conference 2025 Conference Paper

Not All Data are Good Labels: On the Self-supervised Labeling for Time Series Forecasting

  • Yuxuan Yang
  • Dalin Zhang
  • Yuxuan Liang
  • Hua Lu
  • Gang Chen
  • Huan Li

Time Series Forecasting (TSF) is a crucial task in various domains, yet existing TSF models rely heavily on high-quality data and insufficiently exploit all available data. This paper explores a novel self-supervised approach to re-label time series datasets by inherently constructing candidate datasets. During the optimization of a simple reconstruction network, intermediates are used as pseudo labels in a self-supervised paradigm, improving generalization for any predictor. We introduce the Self-Correction with Adaptive Mask (SCAM), which discards overfitted components and selectively replaces them with pseudo labels generated from reconstructions. Additionally, we incorporate Spectral Norm Regularization (SNR) to further suppress overfitting from a loss landscape perspective. Our experiments on eleven real-world datasets demonstrate that SCAM consistently improves the performance of various backbone models. This work offers a new perspective on constructing datasets and enhancing the generalization of TSF models through self-supervised learning. The code is available at https://github.com/SuDIS-ZJU/SCAM.

TCS Journal 2025 Journal Article

On the Weisfeiler algorithm of depth-1 stabilization

  • Gang Chen
  • Qing Ren
  • Ilia Ponomarenko

An origin of the multidimensional Weisfeiler-Leman algorithm goes back to a refinement procedure of deep stabilization, introduced by B. Weisfeiler in a paper included in the collective monograph “On construction and identification of graphs” (1976). This procedure is recursive and the recursion starts from an algorithm of depth-1 stabilization, which has never been discussed in the literature. A goal of the present paper is to show that a simplified algorithm of the depth-1 stabilization has the same power as the 3-dimensional Weisfeiler-Leman algorithm. It is proved that the class of coherent configurations obtained as the output of this simplified algorithm coincides with the class introduced earlier by the third author. As an application we also prove that if there exist at least two nonisomorphic projective planes of order q, then the Weisfeiler-Leman dimension of the incidence graph of any projective plane of order q is at least 4.

IJCAI Conference 2025 Conference Paper

POLO: An LLM-Powered Project-Level Code Performance Optimization Framework

  • Jiameng Bai
  • Ruoyi Xu
  • Sai Wu
  • Dingyu Yang
  • Junbo Zhao
  • Gang Chen

Program performance optimization is essential for achieving high execution efficiency, yet it remains a challenging task that requires expertise in both software and hardware. Large Language Models (LLMs), trained on high-quality code from platforms like GitHub and other open-source sources, have shown promise in generating optimized code for simple snippets. However, current LLM-based solutions often fall short when tackling project-level programs due to the complexity of call graphs and the intricate interactions among functions. In this paper, we emulate the process a human expert might follow when optimizing project-level programs and introduce a three-phase framework POLO (PrOject-Level Optimizer) to address this limitation. First, we profile the program to identify performance bottlenecks using an iterative weighting algorithm. Next, we conduct structural analysis by scanning the project and generating a graph that represents the program's structure. Finally, two LLM agents collaborate in iterative cycles to rewrite and optimize the code at these hotspots, gradually improving performance. We conduct experiments on open-source and proprietary projects. The results demonstrate that POLO accurately identifies performance bottlenecks and successfully applies optimizations. Under the O3 compilation flag, the optimized programs achieved speedups ranging from 1.34x to 21.5x.

NeurIPS Conference 2025 Conference Paper

Table as a Modality for Large Language Models

  • Liyao Li
  • Chao Ye
  • Wentao Ye
  • Yifei Sun
  • Zhe Jiang
  • Haobo Wang
  • Jiaming Tian
  • Yiming Zhang

To migrate the remarkable successes of Large Language Models (LLMs), the community has made numerous efforts to generalize them to the table reasoning tasks for the widely deployed tabular data. Despite that, in this work, by showing a probing experiment on our proposed StructQA benchmark, we postulate that even the most advanced LLMs (such as GPTs) may still fall short of coping with tabular data. More specifically, the current scheme often simply relies on serializing the tabular data, together with the meta information, then inputting them through the LLMs. We argue that the loss of structural information is the root of this shortcoming. In this work, we further propose TAMO, which bears an ideology to treat the tables as an independent modality integrated with the text tokens. The resulting model in TAMO is a multimodal framework consisting of a hypergraph neural network as the global table encoder seamlessly integrated with the mainstream LLM. Empirical results on various benchmarking datasets, including HiTab, WikiTQ, WikiSQL, FeTaQA, and StructQA, have demonstrated significant improvements on generalization with an average relative gain of 42.65%.

EAAI Journal 2025 Journal Article

The lead-bismuth eutectic corrosion rate prediction and composition optimization of ferritic/martensitic steels by physics-guided neural network

  • Shaowu Feng
  • Xingyue Sun
  • Gang Chen
  • Xu Chen

In developing safer lead-bismuth cooled nuclear power plants, accurately predicting corrosion rates of ferritic/martensitic (F/M) steel, used as structural materials, is crucial. Incorporated with prior physics knowledge, a physics-guided neural network (PGNN) model was developed for this purpose. By integrating physical knowledge about oxide layer development over time, the proposed PGNN model outperforms conventional models like Artificial Neural Network, Random Forest, eXtreme Gradient Boosting, and Support Vector Machine. The PGNN model demonstrated excellent physical consistency in terms of input feature interpretability and output result trends. Furthermore, with the particle swarm optimization (PSO) algorithm, an application on F/M steel composition design was conducted to enhance the lead-bismuth corrosion resistance with the proposed model. The short-term static lead-bismuth corrosion tests were conducted to assess the anti-corrosion performance of the optimized F/M steel, which showed consistency with the PGNN model's predictions.

IJCAI Conference 2025 Conference Paper

Towards Robust Incremental Learning Under Ambiguous Supervision

  • Rui Wang
  • Mingxuan Xia
  • Haobo Wang
  • Lei Feng
  • Junbo Zhao
  • Gang Chen
  • Chang Yao

Traditional Incremental Learning (IL) targets to handle sequential fully-supervised learning problems where novel classes emerge from time to time. However, due to inherent annotation uncertainty and ambiguity, collecting high-quality annotated data in a dynamic learning system can be extremely expensive. To mitigate this problem, we propose a novel weakly-supervised learning paradigm called Incremental Partial Label Learning (IPLL), where the sequentially arrived data relate to a set of candidate labels rather than the ground truth. Technically, we develop the Prototype-Guided Disambiguation and Replay Algorithm (PGDR) which leverages the class prototypes as a proxy to mitigate two intertwined challenges in IPLL, i.e., label ambiguity and catastrophic forgetting. To handle the former, PGDR encapsulates a momentum-based pseudo-labeling algorithm along with prototype-guided initialization, resulting in a balanced perception of classes. To alleviate forgetting, we develop a memory replay technique that collects well-disambiguated samples while maintaining representativeness and diversity. By jointly distilling knowledge from curated memory data, our framework exhibits a great disambiguation ability for samples of new tasks and achieves less forgetting of knowledge. Extensive experiments demonstrate that PGDR achieves superior performance over the baselines in the IPLL task.

AAAI Conference 2024 Conference Paper

A Separation and Alignment Framework for Black-Box Domain Adaptation

  • Mingxuan Xia
  • Junbo Zhao
  • Gengyu Lyu
  • Zenan Huang
  • Tianlei Hu
  • Gang Chen
  • Haobo Wang

Black-box domain adaptation (BDA) targets to learn a classifier on an unsupervised target domain while assuming only access to black-box predictors trained from unseen source data. Although a few BDA approaches have demonstrated promise by manipulating the transferred labels, they largely overlook the rich underlying structure in the target domain. To address this problem, we introduce a novel separation and alignment framework for BDA. Firstly, we locate those well-adapted samples via loss ranking and a flexible confidence-thresholding procedure. Then, we introduce a novel graph contrastive learning objective that aligns under-adapted samples to their local neighbors and well-adapted samples. Lastly, the adaptation is finally achieved by a nearest-centroid-augmented objective that exploits the clustering effect in the feature space. Extensive experiments demonstrate that our proposed method outperforms best baselines on benchmark datasets, e.g. improving the averaged per-class accuracy by 4.1% on the VisDA dataset. The source code is available at: https://github.com/MingxuanXia/SEAL.

AAAI Conference 2024 Conference Paper

CARAT: Contrastive Feature Reconstruction and Aggregation for Multi-Modal Multi-Label Emotion Recognition

  • Cheng Peng
  • Ke Chen
  • Lidan Shou
  • Gang Chen

Multi-modal multi-label emotion recognition (MMER) aims to identify relevant emotions from multiple modalities. The challenge of MMER is how to effectively capture discriminative features for multiple labels from heterogeneous data. Recent studies are mainly devoted to exploring various fusion strategies to integrate multi-modal information into a unified representation for all labels. However, such a learning scheme not only overlooks the specificity of each modality but also fails to capture individual discriminative features for different labels. Moreover, dependencies of labels and modalities cannot be effectively modeled. To address these issues, this paper presents ContrAstive feature Reconstruction and AggregaTion (CARAT) for the MMER task. Specifically, we devise a reconstruction-based fusion mechanism to better model fine-grained modality-to-label dependencies by contrastively learning modal-separated and label-specific features. To further exploit the modality complementarity, we introduce a shuffle-based aggregation strategy to enrich co-occurrence collaboration among labels. Experiments on two benchmark datasets CMU-MOSEI and M3ED demonstrate the effectiveness of CARAT over state-of-the-art methods. Code is available at https://github.com/chengzju/CARAT.

NeurIPS Conference 2024 Conference Paper

Enhancing LLM Reasoning via Vision-Augmented Prompting

  • Ziyang Xiao
  • Dongxiang Zhang
  • Xiongwei Han
  • Xiaojin Fu
  • Yin Yu
  • Tao Zhong
  • Sai Wu
  • Yuan Wang

Verbal and visual-spatial information processing are two critical subsystems that activate different brain regions and often collaborate together for cognitive reasoning. Despite the rapid advancement of LLM-based reasoning, the mainstream frameworks, such as Chain-of-Thought (CoT) and its variants, primarily focus on the verbal dimension, resulting in limitations in tackling reasoning problems with visual and spatial clues. To bridge the gap, we propose a novel dual-modality reasoning framework called Vision-Augmented Prompting (VAP). Upon receiving a textual problem description, VAP automatically synthesizes an image from the visual and spatial clues by utilizing external drawing tools. Subsequently, VAP formulates a chain of thought in both modalities and iteratively refines the synthesized image. Finally, a conclusive reasoning scheme based on self-alignment is proposed for final result generation. Extensive experiments are conducted across four versatile tasks, including solving geometry problems, Sudoku, time series prediction, and travelling salesman problem. The results validated the superiority of VAP over existing LLMs-based reasoning frameworks.

NeurIPS Conference 2024 Conference Paper

Locating What You Need: Towards Adapting Diffusion Models to OOD Concepts In-the-Wild

  • Jianan Yang
  • Chenchao Gao
  • Zhiqing Xiao
  • Junbo Zhao
  • Sai Wu
  • Gang Chen
  • Haobo Wang

The recent large-scale text-to-image generative models have attained unprecedented performance, while people established adaptor modules like LoRA and DreamBooth to extend this performance to even more unseen concept tokens. However, we empirically find that this workflow often fails to accurately depict the out-of-distribution concepts. This failure is highly related to the low quality of training data. To resolve this, we present a framework called Controllable Adaptor Towards Out-of-Distribution Concepts (CATOD). Our framework follows the active learning paradigm which includes high-quality data accumulation and adaptor training, enabling a finer-grained enhancement of generative results. The aesthetics score and concept-matching score are two major factors that impact the quality of synthetic results. One key component of CATOD is the weighted scoring system that automatically balances between these two scores and we also offer comprehensive theoretical analysis for this point. Then, it determines how to select data and schedule the adaptor training based on this scoring system. The extensive results show that CATOD significantly outperforms the prior approaches with an 11.10 boost on the CLIP score and a 33.08% decrease on the CMMD metric.

NeurIPS Conference 2024 Conference Paper

Real-time Stereo-based 3D Object Detection for Streaming Perception

  • Changcai Li
  • Zonghua Gu
  • Gang Chen
  • Libo Huang
  • Wei Zhang
  • Huihui Zhou

The ability to promptly respond to environmental changes is crucial for the perception system of autonomous driving. Recently, a new task called streaming perception was proposed. It jointly evaluates latency and accuracy in a single metric for online video perception. In this work, we introduce StreamDSGN, the first real-time stereo-based 3D object detection framework designed for streaming perception. StreamDSGN is an end-to-end framework that directly predicts the 3D properties of objects in the next moment by leveraging historical information, thereby alleviating the accuracy degradation of streaming perception. Further, StreamDSGN applies three strategies to enhance the perception accuracy: (1) A feature-flow-based fusion method, which generates a pseudo-next feature at the current moment to address the misalignment issue between feature and ground truth. (2) An extra regression loss for explicit supervision of object motion consistency in consecutive frames. (3) A large kernel backbone with a large receptive field for effectively capturing long-range spatial contextual features caused by changes in object positions. Experiments on the KITTI Tracking dataset show that, compared with the strong baseline, StreamDSGN significantly improves the streaming average precision by up to 4.33%. Our code is available at https://github.com/weiyangdaren/streamDSGN-pytorch.

AAAI Conference 2024 Conference Paper

Sampling-Resilient Multi-Object Tracking

  • Zepeng Li
  • Dongxiang Zhang
  • Sai Wu
  • Mingli Song
  • Gang Chen

Multi-Object Tracking (MOT) is a cornerstone operator for video surveillance applications. To enable real-time processing of large-scale live video streams, we study an interesting scenario called down-sampled MOT, which performs object tracking only on a small subset of video frames. The problem is challenging for state-of-the-art MOT methods, which exhibit significant performance degradation under high frame reduction ratios. In this paper, we devise a sampling-resilient tracker with a novel sparse-observation Kalman filter (SOKF). It integrates an LSTM network to capture non-linear and dynamic motion patterns caused by sparse observations. Since the LSTM-based state transition is not compatible with the original noise estimation mechanism, we propose new estimation strategies based on Bayesian neural networks and derive the optimal Kalman gain for SOKF. To associate the detected bounding boxes robustly, we also propose a comprehensive similarity metric that systematically integrates multiple spatial matching signals. Experiments on three benchmark datasets show that our proposed tracker achieves the best trade-off between efficiency and accuracy. With the same tracking accuracy, we reduce the total processing time of ByteTrack by 2× in MOT17 and 3× in DanceTrack.

AAAI Conference 2024 Conference Paper

Uncovering and Mitigating the Hidden Chasm: A Study on the Text-Text Domain Gap in Euphemism Identification

  • Yuxue Hu
  • Junsong Li
  • Mingmin Wu
  • Zhongqiang Huang
  • Gang Chen
  • Ying Sha

Euphemisms are commonly used on social media and darknet marketplaces to evade platform regulations by masking their true meanings with innocent ones. For instance, “weed” is used instead of “marijuana” for illicit transactions. Thus, euphemism identification, i.e., mapping a given euphemism (“weed”) to its specific target word (“marijuana”), is essential for improving content moderation and combating underground markets. Existing methods employ self-supervised schemes to automatically construct labeled training datasets for euphemism identification. However, they overlook the text-text domain gap caused by the discrepancy between the constructed training data and the test data, leading to performance deterioration. In this paper, we present the text-text domain gap and explain how it forms in terms of the data distribution and the cone effect. Moreover, to bridge this gap, we introduce a feature alignment network (FA-Net), which can both align the in-domain and cross-domain features, thus mitigating the domain gap from training data to test data and improving the performance of the base models for euphemism identification. We apply this FA-Net to the base models, obtaining markedly better results, and creating a state-of-the-art model which beats the large language models.

AAAI Conference 2024 Conference Paper

Variational Hybrid-Attention Framework for Multi-Label Few-Shot Aspect Category Detection

  • Cheng Peng
  • Ke Chen
  • Lidan Shou
  • Gang Chen

Multi-label few-shot aspect category detection (FS-ACD) is a challenging sentiment analysis task, which aims to learn a multi-label learning paradigm with limited training data. The difficulty of this task is how to use limited data to generalize effective discriminative representations for different categories. Nowadays, all advanced FS-ACD works utilize the prototypical network to learn label prototypes to represent different aspects. However, such point-based estimation methods are inherently noise-susceptible and bias-vulnerable. To this end, this paper proposes a novel Variational Hybrid-Attention Framework (VHAF) for the FS-ACD task. Specifically, to alleviate the data noise, we adopt a hybrid-attention mechanism to generate more discriminative aspect-specific embeddings. Then, based on these embeddings, we introduce the variational distribution inference to obtain the aspect-specific distribution as a more robust aspect representation, which can eliminate the scarce data bias for better inference. Moreover, we further leverage an adaptive threshold estimation to help VHAF better identify multiple relevant aspects. Extensive experiments on three datasets demonstrate the effectiveness of our VHAF over other state-of-the-art methods. Code is available at https://github.com/chengzju/VHAF.

JBHI Journal 2024 Journal Article

Weakly Supervised Classification for Nasopharyngeal Carcinoma With Transformer in Whole Slide Images

  • Ziwei Hu
  • Jianchao Wang
  • Qinquan Gao
  • Zhida Wu
  • Hanchuan Xu
  • Zhechen Guo
  • Jiawei Quan
  • Lihua Zhong

Pathological examination of nasopharyngeal carcinoma (NPC) is an indispensable factor for diagnosis, guiding clinical treatment and judging prognosis. Traditional and fully supervised NPC diagnosis algorithms require manual delineation of regions of interest on gigapixel whole slide images (WSIs), which is laborious and often biased. In this paper, we propose a weakly supervised framework based on Tokens-to-Token Vision Transformer (WS-T2T-ViT) for accurate NPC classification with only a slide-level label. The label of tile images is inherited from their slide-level label. Specifically, WS-T2T-ViT is composed of the multi-resolution pyramid, T2T-ViT and multi-scale attention module. The multi-resolution pyramid is designed for imitating the coarse-to-fine process of manual pathological analysis to learn features from different magnification levels. The T2T module captures the local and global features to overcome the lack of global information. The multi-scale attention module improves classification performance by weighting the contributions of different granularity levels. Extensive experiments are performed on the 802-patient NPC dataset and the CAMELYON16 dataset. WS-T2T-ViT achieves an area under the receiver operating characteristic curve (AUC) of 0.989 for NPC classification on the NPC dataset. The experimental results on the CAMELYON16 dataset demonstrate the robustness and generalizability of WS-T2T-ViT in WSI-level classification.

YNIMG Journal 2023 Journal Article

BOLD Response is more than just magnitude: Improving detection sensitivity through capturing hemodynamic profiles

  • Gang Chen
  • Paul A. Taylor
  • Richard C. Reynolds
  • Ellen Leibenluft
  • Daniel S. Pine
  • Melissa A. Brotman
  • David Pagliaccio
  • Simone P. Haller

Typical fMRI analyses often assume a canonical hemodynamic response function (HRF) that primarily focuses on the peak height of the overshoot, neglecting other morphological aspects. Consequently, reported analyses often reduce the overall response curve to a single scalar value. In this study, we take a data-driven approach to HRF estimation at the whole-brain voxel level, without assuming a response profile at the individual level. We then employ a roughness penalty at the population level to estimate the response curve, aiming to enhance predictive accuracy, inferential efficiency, and cross-study reproducibility. By examining a fast event-related FMRI dataset, we demonstrate the shortcomings and information loss associated with adopting the canonical approach. Furthermore, we address the following key questions: 1) To what extent does the HRF shape vary across different regions, conditions, and participant groups? 2) Does the data-driven approach improve detection sensitivity compared to the canonical approach? 3) Can analyzing the HRF shape help validate the presence of an effect in conjunction with statistical evidence? 4) Does analyzing the HRF shape offer evidence for whole-brain response during a simple task?

JBHI Journal 2023 Journal Article

CUSS-Net: A Cascaded Unsupervised-Based Strategy and Supervised Network for Biomedical Image Diagnosis and Segmentation

  • Xiaogen Zhou
  • Zhiqiang Li
  • Yuyang Xue
  • Shun Chen
  • Meijuan Zheng
  • Cong Chen
  • Yue Yu
  • Xingqing Nie

Biomedical image segmentation and classification are critical components in a computer-aided diagnosis system. However, various deep convolutional neural networks are trained by a single task, ignoring the potential contribution of mutually performing multiple tasks. In this paper, we propose a cascaded unsupervised-based strategy to boost the supervised CNN framework for automated white blood cell (WBC) and skin lesion segmentation and classification, called CUSS-Net. Our proposed CUSS-Net consists of an unsupervised-based strategy (US) module, an enhanced segmentation network named E-SegNet, and a mask-guided classification network called MG-ClsNet. On the one hand, the proposed US module produces coarse masks that provide a prior localization map for the proposed E-SegNet to enhance it in locating and segmenting a target object accurately. On the other hand, the enhanced coarse masks predicted by the proposed E-SegNet are then fed into the proposed MG-ClsNet for accurate classification. Moreover, a novel cascaded dense inception module is presented to capture more high-level information. Meanwhile, we adopt a hybrid loss by combining a dice loss and a cross-entropy loss to alleviate the imbalance training problem. We evaluate our proposed CUSS-Net on three public medical image datasets. Experiments show that our proposed CUSS-Net outperforms representative state-of-the-art approaches.
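The hybrid loss mentioned in the abstract above combines a Dice term with a cross-entropy term. The sketch below illustrates one common way to form such a combination for binary masks. It is an assumption-laden illustration rather than the CUSS-Net code: the abstract does not state how the two terms are weighted, so the `dice_weight` parameter, the smoothing constants, and the function names are all hypothetical.

```python
import math


def dice_loss(probs, targets, eps=1e-6):
    """Soft Dice loss over flattened per-pixel foreground probabilities."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    denominator = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


def cross_entropy_loss(probs, targets, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(probs)


def hybrid_loss(probs, targets, dice_weight=0.5):
    """Weighted sum of the two terms; the 0.5 weight is an assumed default."""
    return (dice_weight * dice_loss(probs, targets)
            + (1.0 - dice_weight) * cross_entropy_loss(probs, targets))


# A mask with few foreground pixels: the Dice term is driven by overlap on the
# foreground class, so it does not shrink as the background grows.
mask = [1, 0, 0, 0, 0, 0, 0, 0]
good_prediction = [0.9, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
poor_prediction = [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
print(hybrid_loss(good_prediction, mask), hybrid_loss(poor_prediction, mask))
```

The cross-entropy term gives a per-pixel signal, while the Dice term scores overlap on the foreground region, which is why the pairing is often used when foreground pixels are rare.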

NeurIPS Conference 2023 Conference Paper

Debiased and Denoised Entity Recognition from Distant Supervision

  • Haobo Wang
  • Yiwen Dong
  • Ruixuan Xiao
  • Fei Huang
  • Gang Chen
  • Junbo Zhao

While distant supervision has been extensively explored and exploited in NLP tasks like named entity recognition, a major obstacle stems from the inevitable noisy distant labels tagged unsupervisedly. A few past works approach this problem by adopting a self-training framework with a sample-selection mechanism. In this work, we innovatively identify two types of biases that were omitted by prior work, and these biases lead to inferior performance of the distant-supervised NER setup. First, we characterize the noise concealed in the distant labels as highly structural rather than fully randomized. Second, the self-training framework would ubiquitously introduce an inherent bias that causes erroneous behavior in both sample selection and eventually prediction. To cope with these problems, we propose a novel self-training framework, dubbed DesERT. This framework augments the conventional NER predicative pathway to a dual form that effectively adapts the sample-selection process to conform to its innate distributional-bias structure. The other crucial component of DesERT composes a debiased module aiming to enhance the token representations, hence the quality of the pseudo-labels. Extensive experiments are conducted to validate DesERT. The results show that our framework establishes a new state-of-the-art performance, achieving a +2.22% average F1 score improvement on five standardized benchmarking datasets. Lastly, DesERT demonstrates its effectiveness under a new DSNER benchmark where additional distant supervision comes from the ChatGPT model.

IJCAI Conference 2023 Conference Paper

Deep Partial Multi-Label Learning with Graph Disambiguation

  • Haobo Wang
  • Shisong Yang
  • Gengyu Lyu
  • Weiwei Liu
  • Tianlei Hu
  • Ke Chen
  • Songhe Feng
  • Gang Chen

In partial multi-label learning (PML), each data example is equipped with a candidate label set, which consists of multiple ground-truth labels and other false-positive labels. Recently, graph-based methods, which demonstrate a good ability to estimate accurate confidence scores from candidate labels, have been prevalent to deal with PML problems. However, we observe that existing graph-based PML methods typically adopt linear multi-label classifiers and thus fail to achieve superior performance. In this work, we attempt to remove several obstacles for extending them to deep models and propose a novel deep Partial multi-Label model with grAph-disambIguatioN (PLAIN). Specifically, we introduce the instance-level and label-level similarities to recover label confidences as well as exploit label dependencies. At each training epoch, labels are propagated on the instance and label graphs to produce relatively accurate pseudo-labels; then, we train the deep model to fit the numerical labels. Moreover, we provide a careful analysis of the risk functions to guarantee the robustness of the proposed model. Extensive experiments on various synthetic datasets and three real-world PML datasets demonstrate that PLAIN achieves significantly superior results to state-of-the-art methods.

AAAI Conference 2023 Conference Paper

Effective Continual Learning for Text Classification with Lightweight Snapshots

  • Jue Wang
  • Dajie Dong
  • Lidan Shou
  • Ke Chen
  • Gang Chen

Continual learning is known for suffering from catastrophic forgetting, a phenomenon where previously learned concepts are forgotten upon learning new tasks. A natural remedy is to use trained models for old tasks as ‘teachers’ to regularize the update of the current model to prevent such forgetting. However, this requires storing all past models, which is very space-consuming for large models, e.g. BERT, thus impractical in real-world applications. To tackle this issue, we propose to construct snapshots of seen tasks whose key knowledge is captured in lightweight adapters. During continual learning, we transfer knowledge from past snapshots to the current model through knowledge distillation, allowing the current model to review previously learned knowledge while learning new tasks. We also design representation recalibration to better handle the class-incremental setting. Experiments over various task sequences show that our approach effectively mitigates catastrophic forgetting and outperforms all baselines.

IJCAI Conference 2023 Conference Paper

Ensemble Reinforcement Learning in Continuous Spaces -- A Hierarchical Multi-Step Approach for Policy Training

  • Gang Chen
  • Victoria Huang

Actor-critic deep reinforcement learning (DRL) algorithms have recently achieved prominent success in tackling various challenging reinforcement learning (RL) problems, particularly complex control tasks with high-dimensional continuous state and action spaces. Nevertheless, existing research showed that actor-critic DRL algorithms often failed to explore their learning environments effectively, resulting in limited learning stability and performance. To address this limitation, several ensemble DRL algorithms have been proposed lately to boost exploration and stabilize the learning process. However, most existing ensemble algorithms do not explicitly train all base learners towards jointly optimizing the performance of the ensemble. In this paper, we propose a new technique to train an ensemble of base learners based on an innovative multi-step integration method. This training technique enables us to develop a new hierarchical learning algorithm for ensemble DRL that effectively promotes inter-learner collaboration through stable inter-learner parameter sharing. The design of our new algorithm is verified theoretically. The algorithm is also shown empirically to outperform several state-of-the-art DRL algorithms on multiple benchmark RL problems.

YNIMG Journal 2023 Journal Article

Highlight results, don't hide them: Enhance interpretation, reduce biases and improve reproducibility

  • Paul A. Taylor
  • Richard C. Reynolds
  • Vince Calhoun
  • Javier Gonzalez-Castillo
  • Daniel A. Handwerker
  • Peter A. Bandettini
  • Amanda F. Mejia
  • Gang Chen

Most neuroimaging studies display results that represent only a tiny fraction of the collected data. While it is conventional to present "only the significant results" to the reader, here we suggest that this practice has several negative consequences for both reproducibility and understanding. This practice hides away most of the results of the dataset and leads to problems of selection bias and irreproducibility, both of which have been recognized as major issues in neuroimaging studies recently. Opaque, all-or-nothing thresholding, even if well-intentioned, places undue influence on arbitrary filter values, hinders clear communication of scientific results, wastes data, is antithetical to good scientific practice, and leads to conceptual inconsistencies. It is also inconsistent with the properties of the acquired data and the underlying biology being studied. Instead of presenting only a few statistically significant locations and hiding away the remaining results, studies should "highlight" the former while also showing as much as possible of the rest. This is distinct from but complementary to utilizing data sharing repositories: the initial presentation of results has an enormous impact on the interpretation of a study. We present practical examples and extensions of this approach for voxelwise, regionwise and cross-study analyses using publicly available data that was analyzed previously by 70 teams (NARPS; Botvinik-Nezer, et al., 2020), showing that it is possible to balance the goals of displaying a full set of results with providing the reader reasonably concise and "digestible" findings. In particular, the highlighting approach sheds useful light on the kind of variability present among the NARPS teams' results, which is primarily a varied strength of agreement rather than disagreement. Using a meta-analysis built on the informative "highlighting" approach shows this relative agreement, while one using the standard "hiding" approach does not. 
We describe how this simple but powerful change in practice (focusing on highlighting results, rather than hiding all but the strongest ones) can help address many large concerns within the field, or at least provide more complete information about them. We include a list of practical suggestions for results reporting to improve reproducibility, cross-study comparisons and meta-analyses.

AAAI Conference 2023 Conference Paper

Human-in-the-Loop Vehicle ReID

  • Zepeng Li
  • Dongxiang Zhang
  • Yanyan Shen
  • Gang Chen

Vehicle ReID has been an active topic in computer vision, with a substantial number of deep neural models proposed as end-to-end solutions. In this paper, we solve the problem from a new perspective and present an interesting variant called human-in-the-loop vehicle ReID to leverage interactive (and possibly wrong) human feedback signal for performance enhancement. Such human-machine cooperation mode is orthogonal to existing ReID models. To avoid incremental training overhead, we propose an Interaction ReID Network (IRIN) that can directly accept the feedback signal as an input and adjust the embedding of query image in an online fashion. IRIN is offline trained by simulating the human interaction process, with multiple optimization strategies to fully exploit the feedback signal. Experimental results show that even by interacting with flawed feedback generated by non-experts, IRIN still outperforms state-of-the-art ReID models by a considerable margin. If the feedback contains no false positive, IRIN boosts the mAP in Veri776 from 81.6% to 95.2% with only 5 rounds of interaction per query image.

YNIMG Journal 2023 Journal Article

Inter-subject correlation during long narratives reveals widespread neural correlates of reading ability

  • David C. Jangraw
  • Emily S. Finn
  • Peter A. Bandettini
  • Nicole Landi
  • Haorui Sun
  • Fumiko Hoeft
  • Gang Chen
  • Kenneth R. Pugh

Recent work using fMRI inter-subject correlation analysis has provided new information about the brain's response to video and audio narratives, particularly in frontal regions not typically activated by single words. This approach is very well suited to the study of reading, where narrative is central to natural experience. But since past reading paradigms have primarily presented single words or phrases, the influence of narrative on semantic processing in the brain - and how that influence might change with reading ability - remains largely unexplored. In this study, we presented coherent stories to adolescents and young adults with a wide range of reading abilities. The stories were presented in alternating visual and auditory blocks. We used a dimensional inter-subject correlation analysis to identify regions in which better and worse readers had varying levels of consistency with other readers. This analysis identified a widespread set of brain regions in which activity timecourses were more similar among better readers than among worse readers. These differences were not detected with standard block activation analyses. Worse readers had higher correlation with better readers than with other worse readers, suggesting that the worse readers had "idiosyncratic" responses rather than using a single compensatory mechanism. Close inspection confirmed that these differences were not explained by differences in IQ or motion. These results suggest an expansion of the current view of where and how reading ability is reflected in the brain, and in doing so, they establish inter-subject correlation as a sensitive tool for future studies of reading disorders.

AAAI Conference 2023 Conference Paper

Neural TSP Solver with Progressive Distillation

  • Dongxiang Zhang
  • Ziyang Xiao
  • Yuan Wang
  • Mingli Song
  • Gang Chen

The travelling salesman problem (TSP) is NP-hard with an exponential search space. Recently, the adoption of encoder-decoder models as neural TSP solvers has emerged as an attractive topic because they can instantly obtain near-optimal results for small-scale instances. Nevertheless, their training efficiency and solution quality degrade dramatically when dealing with large-scale problems. To address the issue, we propose a novel progressive distillation framework, by adopting curriculum learning to train TSP samples in increasing order of their problem size and progressively distilling high-level knowledge from small models to large models via a distillation loss. In other words, the trained small models are used as the teacher network to guide action selection when training large models. To accelerate training, we also propose a Delaunay-graph based action mask and a new attention-based decoder to reduce decoding cost. Experimental results show that our approach establishes clear advantages over existing encoder-decoder models in terms of training effectiveness and solution quality. In addition, we validate its usefulness as an initial solution generator for the state-of-the-art TSP solvers, whose probability of obtaining the optimal solution can be further improved in such a hybrid manner.

IJCAI Conference 2023 Conference Paper

ProMix: Combating Label Noise via Maximizing Clean Sample Utility

  • Ruixuan Xiao
  • Yiwen Dong
  • Haobo Wang
  • Lei Feng
  • Runze Wu
  • Gang Chen
  • Junbo Zhao

Learning with Noisy Labels (LNL) has become an appealing topic, as imperfectly annotated data are relatively cheaper to obtain. Recent state-of-the-art approaches employ specific selection mechanisms to separate clean and noisy samples and then apply Semi-Supervised Learning (SSL) techniques for improved performance. However, the selection step mostly provides a medium-sized and decent-enough clean subset, which overlooks a rich set of clean samples. To address this, we propose a novel LNL framework ProMix that attempts to maximize the utility of clean samples for boosted performance. Key to our method, we propose a matched high confidence selection technique that selects those examples with high confidence scores and matched predictions with given labels to dynamically expand a base clean sample set. To overcome the potential side effect of excessive clean set selection procedure, we further devise a novel SSL framework that is able to train balanced and unbiased classifiers on the separated clean and noisy samples. Extensive experiments demonstrate that ProMix significantly advances the current state-of-the-art results on multiple benchmarks with different types and levels of noise. It achieves an average improvement of 2.48% on the CIFAR-N dataset.
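The matched high confidence selection rule described above can be sketched as a simple boolean filter. This is a minimal sketch under stated assumptions: the function name and the threshold of 0.9 are hypothetical, and the abstract does not give the actual threshold or how it is scheduled during training.

```python
import numpy as np

def matched_high_confidence_mask(probs, given_labels, threshold=0.9):
    """Flag samples whose prediction matches the given label with high confidence.

    probs:        (n_samples, n_classes) predicted class probabilities
    given_labels: (n_samples,) possibly noisy integer labels
    threshold:    assumed confidence cut-off (hypothetical value)
    """
    probs = np.asarray(probs, dtype=float)
    given_labels = np.asarray(given_labels)
    predicted = probs.argmax(axis=1)   # model's predicted class per sample
    confidence = probs.max(axis=1)     # probability assigned to that class
    return (predicted == given_labels) & (confidence >= threshold)
```

A base clean set could then be expanded with something like `clean_mask = base_mask | matched_high_confidence_mask(probs, labels)`, where `base_mask` is whatever initial selection the training pipeline already produced.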

NeurIPS Conference 2023 Conference Paper

SPA: A Graph Spectral Alignment Perspective for Domain Adaptation

  • Zhiqing Xiao
  • Haobo Wang
  • Ying Jin
  • Lei Feng
  • Gang Chen
  • Fei Huang
  • Junbo Zhao

Unsupervised domain adaptation (UDA) is a pivotal form in machine learning to extend the in-domain model to the distinctive target domains where the data distributions differ. Most prior works focus on capturing the inter-domain transferability but largely overlook rich intra-domain structures, which empirically results in even worse discriminability. In this work, we introduce a novel graph SPectral Alignment (SPA) framework to tackle the tradeoff. The core of our method is briefly condensed as follows: (i) by casting the DA problem to graph primitives, SPA composes a coarse graph alignment mechanism with a novel spectral regularizer towards aligning the domain graphs in eigenspaces; (ii) we further develop a fine-grained message propagation module, built upon a novel neighbor-aware self-training mechanism, for enhanced discriminability in the target domain. On standardized benchmarks, the extensive experiments of SPA demonstrate that its performance has surpassed the existing cutting-edge DA methods. Coupled with dense model analysis, we conclude that our approach indeed possesses superior efficacy, robustness, discriminability, and transferability. Code and data are available at: https://github.com/CrownX/SPA.

AAAI Conference 2023 Conference Paper

Unsupervised Hierarchical Domain Adaptation for Adverse Weather Optical Flow

  • Hanyu Zhou
  • Yi Chang
  • Gang Chen
  • Luxin Yan

Optical flow estimation has made great progress, but usually suffers from degradation under adverse weather. Although semi/full-supervised methods have made good attempts, the domain shift between the synthetic and real adverse weather images would deteriorate their performance. To alleviate this issue, our start point is to unsupervisedly transfer the knowledge from source clean domain to target degraded domain. Our key insight is that adverse weather does not change the intrinsic optical flow of the scene, but causes a significant difference for the warp error between clean and degraded images. In this work, we propose the first unsupervised framework for adverse weather optical flow via hierarchical motion-boundary adaptation. Specifically, we first employ image translation to construct the transformation relationship between clean and degraded domains. In motion adaptation, we utilize the flow consistency knowledge to align the cross-domain optical flows into a motion-invariance common space, where the optical flow from clean weather is used as the guidance-knowledge to obtain a preliminary optical flow for adverse weather. Furthermore, we leverage the warp error inconsistency which measures the motion misalignment of the boundary between the clean and degraded domains, and propose a joint intra- and inter-scene boundary contrastive adaptation to refine the motion boundary. The hierarchical motion and boundary adaptation jointly promotes optical flow in a unified framework. Extensive quantitative and qualitative experiments have been performed to verify the superiority of the proposed method.

YNIMG Journal 2022 Journal Article

Hyperbolic trade-off: The importance of balancing trial and subject sample sizes in neuroimaging

  • Gang Chen
  • Daniel S. Pine
  • Melissa A. Brotman
  • Ashley R. Smith
  • Robert W. Cox
  • Paul A. Taylor
  • Simone P. Haller

Here we investigate the crucial role of trials in task-based neuroimaging from the perspectives of statistical efficiency and condition-level generalizability. Big data initiatives have gained popularity for leveraging a large sample of subjects to study a wide range of effect magnitudes in the brain. On the other hand, most task-based FMRI designs feature a relatively small number of subjects, so that resulting parameter estimates may be associated with compromised precision. Nevertheless, little attention has been given to another important dimension of experimental design, which can equally boost a study's statistical efficiency: the trial sample size. The common practice of condition-level modeling implicitly assumes no cross-trial variability. Here, we systematically explore the different factors that impact effect uncertainty, drawing on evidence from hierarchical modeling, simulations and an FMRI dataset of 42 subjects who completed a large number of trials of a cognitive control task. We find that, due to an approximately symmetric hyperbola-relationship between trial and subject sample sizes in the presence of relatively large cross-trial variability, 1) trial sample size has nearly the same impact as subject sample size on statistical efficiency; 2) increasing both the number of trials and subjects improves statistical efficiency more effectively than focusing on subjects alone; 3) trial sample size can be leveraged alongside subject sample size to improve the cost-effectiveness of an experimental design; 4) for small trial sample sizes, trial-level modeling, rather than condition-level modeling through summary statistics, may be necessary to accurately assess the standard error of an effect estimate. We close by making practical suggestions for improving experimental designs across neuroimaging and behavioral studies.
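One way to see why trial and subject counts can trade off hyperbolically is a simplified two-level variance decomposition. The sketch below is an assumption made for illustration, not an expression quoted from the paper: `sigma_pi` (cross-subject standard deviation) and `sigma_tau` (cross-trial standard deviation) are hypothetical parameter names, and the formula is the textbook standard error for a mean of subject-level means.

```python
import math

def effect_standard_error(n_subjects, n_trials, sigma_pi, sigma_tau):
    """Standard error of a population-level effect under a simple two-level model.

    Assumed model (illustrative only): each subject's effect varies with
    standard deviation sigma_pi, and each trial varies around the subject's
    effect with standard deviation sigma_tau.
    """
    between = sigma_pi ** 2 / n_subjects
    within = sigma_tau ** 2 / (n_subjects * n_trials)
    return math.sqrt(between + within)

# When cross-trial variability dominates, the error depends mostly on the
# product n_subjects * n_trials, so curves of equal precision are roughly
# hyperbolas in the (subjects, trials) plane.
se_many_subjects = effect_standard_error(100, 10, sigma_pi=0.0, sigma_tau=1.0)
se_many_trials = effect_standard_error(10, 100, sigma_pi=0.0, sigma_tau=1.0)
```

Under this assumed model the symmetry is exact only when `sigma_pi` is zero; as `sigma_pi` grows, adding subjects helps more than adding trials, which matches the abstract's qualifier that the relationship is approximately symmetric when cross-trial variability is relatively large.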

YNIMG Journal 2022 Journal Article

Layer-specific activation in human primary somatosensory cortex during tactile temporal prediction error processing

  • Yinghua Yu
  • Laurentius Huber
  • Jiajia Yang
  • Masaki Fukunaga
  • Yuhui Chai
  • David C. Jangraw
  • Gang Chen
  • Daniel A. Handwerker

The human brain continuously generates predictions of incoming sensory input and calculates corresponding prediction errors from the perceived inputs to update internal predictions. In human primary somatosensory cortex (area 3b), different cortical layers are involved in receiving the sensory input and generation of error signals. It remains unknown, however, how the layers in the human area 3b contribute to the temporal prediction error processing. To investigate prediction error representation in the area 3b across layers, we acquired layer-specific functional magnetic resonance imaging (fMRI) data at 7T from human area 3b during a task of index finger poking with no-delay, short-delay and long-delay touching sequences. We demonstrate that all three tasks increased activity in both superficial and deep layers of area 3b compared to the random sensory input. The fMRI signal was differentially modulated solely in the deep layers rather than the superficial layers of area 3b by the delay time. Compared with the no-delay stimuli, activity was greater in the deep layers of area 3b during the short-delay stimuli but lower during the long-delay stimuli. These different activity features in the superficial and deep layers suggest distinct functional contributions of area 3b layers to tactile temporal prediction error processing. The functional segregation in area 3b across layers may reflect that the excitatory and inhibitory interplay in the sensory cortex contributes to flexible communication between cortical layers or between cortical areas.

YNICL Journal 2022 Journal Article

Neural correlates of working memory and compensation at different stages of cognitive impairment in Parkinson’s disease

  • Takaaki Hattori
  • Richard Reynolds
  • Edythe Wiggs
  • Silvina G. Horovitz
  • Codrin Lungu
  • Gang Chen
  • Eiji Yasuda
  • Mark Hallett

Working memory (WM) impairment is one of the most frequent cognitive deficits in Parkinson's disease (PD). However, it is not known how neural activity is altered and compensatory responses eventually fail during progression. We aimed to elucidate neural correlates of WM and compensatory mechanisms in PD. Eighteen cognitively normal PD patients (PD-CogNL), 16 with PD with mild cognitive impairment (PD-MCI), 11 with PD with dementia (PDD), and 17 healthy controls (HCs) were evaluated. Subjects performed an n-back task. Functional MRI data were analyzed by event-related analysis for correct responses. Brain activations were evaluated by comparing them to fixation cross or 0-back task, and correlated with n-back task performance. When compared to fixation cross, PD-CogNL patients had more activation in WM areas than HCs for both the 2- and 3-back tasks. PD-MCI and PDD patients had more activation in WM areas than HCs for the 0- and 1-back task. 2-back task performance was correlated with brain activations (vs. 0-back task) in the bilateral dorsolateral prefrontal cortex and frontal eye field (FEF) and left rostral prefrontal cortex, caudate nucleus, inferior/superior parietal lobule (IPL/SPL), and anterior insular cortex as well as anterior cingulate cortex. 3-back task performance was correlated with brain activations (vs. 0-back task) in the left FEF, right caudate nucleus, and bilateral IPL/SPL. Additional activations on top of the 0-back task, rather than fixation cross, are the neural correlates of WM. Our results suggest PD patients have two types of compensatory mechanisms: (1) Hyperactivation for different WM load tasks depending on their cognitive status. PD-CogNL have hyperactivation for moderate and heavy working memory load tasks while maintaining normal working memory performance. 
In contrast, PD-MCI and PDD have hyperactivation for control task and light working memory load task, leaving less neural resources to further activate for more demanding tasks and resulting in impaired working memory performance. (2) Bilateral recruitment of WM-related areas, in particular the DLPFC, FEF, IPL/SPL and caudate nucleus, to improve WM performance.

NeurIPS Conference 2022 Conference Paper

SoLar: Sinkhorn Label Refinery for Imbalanced Partial-Label Learning

  • Haobo Wang
  • Mingxuan Xia
  • Yixuan Li
  • Yuren Mao
  • Lei Feng
  • Gang Chen
  • Junbo Zhao

Partial-label learning (PLL) is a peculiar weakly-supervised learning task where the training samples are generally associated with a set of candidate labels instead of single ground truth. While a variety of label disambiguation methods have been proposed in this domain, they normally assume a class-balanced scenario that may not hold in many real-world applications. Empirically, we observe degenerated performance of the prior methods when facing the combinatorial challenge from the long-tailed distribution and partial-labeling. In this work, we first identify the major reasons that the prior work failed. We subsequently propose SoLar, a novel Optimal Transport-based framework that refines the disambiguated labels towards matching the marginal class prior distribution. SoLar additionally incorporates a new and systematic mechanism for estimating the long-tailed class prior distribution under the PLL setup. Through extensive experiments, SoLar exhibits substantially superior results on standardized benchmarks compared to the previous state-of-the-art PLL methods. Code and data are available at: https://github.com/hbzju/SoLar.
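The idea of refining labels so that they match a class prior can be illustrated with plain Sinkhorn-Knopp scaling. This is a minimal sketch, not SoLar's actual formulation: the function name, the iteration count, and the omission of candidate-set masking and of any entropic regularisation weight are all simplifying assumptions.

```python
import numpy as np

def sinkhorn_refine(probs, class_prior, n_iters=200):
    """Rescale predictions so the average label mass per class approaches class_prior.

    probs:       (n_samples, n_classes) strictly positive predicted probabilities
    class_prior: (n_classes,) target marginal class distribution, summing to 1
    Returns refined pseudo-label distributions whose rows each sum to 1.
    """
    q = np.asarray(probs, dtype=float).copy()
    n = q.shape[0]
    r = np.asarray(class_prior, dtype=float)
    for _ in range(n_iters):
        q *= (r / q.sum(axis=0))[None, :]          # column masses -> class prior
        q *= ((1.0 / n) / q.sum(axis=1))[:, None]  # each sample carries mass 1/n
    return q * n                                   # rescale so every row sums to 1

probs = np.array([[0.7, 0.2, 0.1],
                  [0.5, 0.3, 0.2],
                  [0.4, 0.4, 0.2],
                  [0.3, 0.3, 0.4]])
refined = sinkhorn_refine(probs, class_prior=[0.5, 0.3, 0.2])
```

In a partial-label setting, entries for non-candidate labels would presumably be excluded before scaling; that step is left out here because a feasible solution then depends on the candidate sets.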

YNIMG Journal 2021 Journal Article

Beyond linearity in neuroimaging: Capturing nonlinear relationships with application to longitudinal studies

  • Gang Chen
  • Tiffany A. Nash
  • Katherine M. Cole
  • Philip D. Kohn
  • Shau-Ming Wei
  • Michael D. Gregory
  • Daniel P. Eisenberg
  • Robert W. Cox

The ubiquitous adoption of linearity for quantitative predictors in statistical modeling is likely attributable to its advantages of straightforward interpretation and computational feasibility. The linearity assumption may be a reasonable approximation especially when the variable is confined within a narrow range, but it can be problematic when the variable's effect is non-monotonic or complex. Furthermore, visualization and model assessment of a linear fit are usually omitted because of challenges at the whole brain level in neuroimaging. By adopting a principle of learning from the data in the presence of uncertainty to resolve the problematic aspects of conventional polynomial fitting, we introduce a flexible and adaptive approach of multilevel smoothing splines (MSS) to capture any nonlinearity of a quantitative predictor for population-level neuroimaging data analysis. With no prior knowledge regarding the underlying relationship other than a parsimonious assumption about the extent of smoothness (e.g., no sharp corners), we express the unknown relationship with a sufficient number of smoothing splines and use the data to adaptively determine the specifics of the nonlinearity. 
In addition to introducing the theoretical framework of MSS as an efficient approach with a counterbalance between flexibility and stability, we strive to (a) lay out the specific schemes for population-level nonlinear analyses that may involve task (e.g., contrasting conditions) and subject-grouping (e.g., patients vs controls) factors; (b) provide modeling accommodations to adaptively reveal, estimate and compare any nonlinear effects of a predictor across the brain, or to more accurately account for the effects (including nonlinear effects) of a quantitative confound; (c) offer the associated program 3dMSS to the neuroimaging community for whole-brain voxel-wise analysis as part of the AFNI suite; and (d) demonstrate the modeling approach and visualization processes with a longitudinal dataset of structural MRI scans.

YNIMG Journal 2021 Journal Article

Different activation signatures in the primary sensorimotor and higher-level regions for haptic three-dimensional curved surface exploration

  • Jiajia Yang
  • Peter J. Molfese
  • Yinghua Yu
  • Daniel A. Handwerker
  • Gang Chen
  • Paul A. Taylor
  • Yoshimichi Ejima
  • Jinglong Wu

Haptic object perception begins with continuous exploratory contact, and the human brain needs to accumulate sensory information continuously over time. However, it is still unclear how the primary sensorimotor cortex (PSC) interacts with these higher-level regions during haptic exploration over time. This functional magnetic resonance imaging (fMRI) study investigates time-dependent haptic object processing by examining brain activity during haptic 3D curve and roughness estimations. For this experiment, we designed sixteen haptic stimuli (4 kinds of curves × 4 varieties of roughness) for the haptic curve and roughness estimation tasks. Twenty participants were asked to move their right index and middle fingers along the surface twice and to estimate one of the two features (roughness or curvature) depending on the task instruction. We found that the brain activity in several higher-level regions (e.g., the bilateral posterior parietal cortex) linearly increased as the number of curves increased during the haptic exploration phase. Surprisingly, we found that the contralateral PSC was parametrically modulated by the number of curves only during the late exploration phase but not during the early exploration phase. In contrast, we found no similar parametric modulation activity patterns during the haptic roughness estimation task in either the contralateral PSC or in higher-level regions. Thus, our findings suggest that haptic 3D object perception is processed across the cortical hierarchy, whereas the contralateral PSC interacts with other higher-level regions across time in a manner that is dependent upon the features of the object.

AAAI Conference 2021 Conference Paper

Effective Slot Filling via Weakly-Supervised Dual-Model Learning

  • Jue Wang
  • Ke Chen
  • Lidan Shou
  • Sai Wu
  • Gang Chen

Slot filling is a challenging task in Spoken Language Understanding (SLU). Supervised methods usually require large amounts of annotation to maintain desirable performance. A solution to relieve the heavy dependency on labeled data is to employ bootstrapping, which leverages unlabeled data. However, bootstrapping is known to suffer from semantic drift. We argue that semantic drift can be tackled by exploiting the correlation between slot values (phrases) and their respective types. By using some particular weakly-labeled data, namely the plain phrases included in sentences, we propose a weakly-supervised slot filling approach. Our approach trains two models, namely a classifier and a tagger, which can effectively learn from each other on the weakly-labeled data. The experimental results demonstrate that our approach achieves better results than standard baselines on multiple datasets, especially in the low-resource setting.

YNIMG Journal 2021 Journal Article

ICA-based denoising strategies in breath-hold induced cerebrovascular reactivity mapping with multi echo BOLD fMRI

  • Stefano Moia
  • Maite Termenon
  • Eneko Uruñuela
  • Gang Chen
  • Rachael C. Stickland
  • Molly G. Bright
  • César Caballero-Gaudes

Performing a BOLD functional MRI (fMRI) acquisition during breath-hold (BH) tasks is a non-invasive, robust method to estimate cerebrovascular reactivity (CVR). However, movement and breathing-related artefacts caused by the BH can substantially hinder CVR estimates due to their high temporal collinearity with the effect of interest, and attention has to be paid when choosing which analysis model should be applied to the data. In this study, we evaluate the performance of multiple analysis strategies based on lagged general linear models applied on multi-echo BOLD fMRI data, acquired in ten subjects performing a BH task during ten sessions, to obtain subject-specific CVR and haemodynamic lag estimates. The evaluated approaches range from conventional regression models, i.e. including drifts and motion timecourses as nuisance regressors, applied on single-echo or optimally-combined data, to more complex models including regressors obtained from multi-echo independent component analysis with different grades of orthogonalization in order to preserve the effect of interest, i.e. the CVR. We compare these models in terms of their ability to make signal intensity changes independent from motion, as well as the reliability as measured by voxelwise intraclass correlation coefficients of both CVR and lag maps over time. Our results reveal that a conservative independent component analysis model applied on the optimally-combined multi-echo fMRI signal offers the largest reduction of motion-related effects in the signal, while yielding reliable CVR amplitude and lag estimates, although a conventional regression model applied on the optimally-combined data results in similar estimates. This work demonstrates the usefulness of multi-echo based fMRI acquisitions and independent component analysis denoising for precision mapping of CVR in single subjects based on BH paradigms, fostering its potential as a clinically-viable neuroimaging tool for individual patients. 
It also proves that the way in which data-driven regressors should be incorporated in the analysis model is not straight-forward due to their complex interaction with the BH-induced BOLD response.

TIST Journal 2021 Journal Article

PARP: A Parallel Traffic Condition Driven Route Planning Model on Dynamic Road Networks

  • Tianlun Dai
  • Bohan Li
  • Ziqiang Yu
  • Xiangrong Tong
  • Meng Chen
  • Gang Chen

The problem of route planning on road networks is essential to many Location-Based Services (LBSs). Road networks are dynamic in the sense that the weights of the edges in the corresponding graph constantly change over time, representing evolving traffic conditions. Thus, a practical route planning strategy is required to supply continuous route optimization that considers the historic, current, and future traffic conditions. However, few existing works comprehensively take into account these various traffic conditions during route planning. Moreover, LBSs usually suffer from extensive concurrent route planning requests in rush hours, which imposes a pressing need to handle numerous queries in parallel to reduce the response time of each query. However, this issue is also not addressed by most existing solutions. We therefore investigate a parallel traffic condition driven route planning model on a cluster of processors. To embed the future traffic condition into the route planning, we employ a GCN model to periodically predict the travel costs of roads within a specified time period, which improves the robustness of the route planning model against varying traffic conditions. To reduce the response time, a Dual-Level Path (DLP) index is proposed to support a parallel route planning algorithm with the filter-and-refine principle. The bottom level of DLP partitions the entire graph into different subgraphs, and the top level is a skeleton graph that consists of all border vertices in all subgraphs. The filter step identifies a global directional path for a given query based on the skeleton graph. In the refine step, the overall route planning for this query is decomposed into multiple sub-optimizations in the subgraphs passed through by the directional path. Since the subgraphs are independently maintained by different processors, the sub-optimizations of extensive queries can be operated in parallel.
Finally, extensive evaluations are conducted to confirm the effectiveness and superiority of the proposal.

YNIMG Journal 2021 Journal Article

To pool or not to pool: Can we ignore cross-trial variability in FMRI?

  • Gang Chen
  • Srikanth Padmala
  • Yi Chen
  • Paul A. Taylor
  • Robert W. Cox
  • Luiz Pessoa

In this work, we investigate the importance of explicitly accounting for cross-trial variability in neuroimaging data analysis. To attempt to obtain reliable estimates in a task-based experiment, each condition is usually repeated across many trials. The investigator may be interested in (a) condition-level effects, (b) trial-level effects, or (c) the association of trial-level effects with the corresponding behavior data. The typical strategy for condition-level modeling is to create one regressor per condition at the subject level with the underlying assumption that responses do not change across trials. In this methodology of complete pooling, all cross-trial variability is ignored and dismissed as random noise that is swept under the rug of model residuals. Unfortunately, this framework invalidates the generalizability from the confine of specific trials (e.g., particular faces) to the associated stimulus category ("face"), and may inflate the statistical evidence when the trial sample size is not large enough. Here we propose an adaptive and computationally tractable framework that meshes well with the current two-level pipeline and explicitly accounts for trial-by-trial variability. The trial-level effects are first estimated per subject through no pooling. To allow generalizing beyond the particular stimulus set employed, the cross-trial variability is modeled at the population level through partial pooling in a multilevel model, which permits accurate effect estimation and characterization. Alternatively, trial-level estimates can be used to investigate, for example, brain-behavior associations or correlations between brain regions. Furthermore, our approach allows appropriate accounting for serial correlation, handling outliers, adapting to data skew, and capturing nonlinear brain-behavior relationships. 
By applying a Bayesian multilevel model framework at the level of regions of interest to an experimental dataset, we show how multiple testing can be addressed and full results reported without arbitrary dichotomization. Our approach revealed important differences compared to the conventional method at the condition level, including how the latter can distort effect magnitude and precision. Notably, in some cases our approach led to increased statistical sensitivity. In summary, our proposed framework provides an effective strategy to capture trial-by-trial responses that should be of interest to a wide community of experimentalists.
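
As a rough illustration of the three pooling strategies named in this abstract, the sketch below contrasts complete pooling, no pooling, and partial pooling for one subject's trial-level estimates. It assumes a simple normal-normal model in which the standard error `se` and the cross-trial standard deviation `tau` are known, and it plugs in the grand mean as the population effect. The function names and numbers are hypothetical; the paper's actual framework is a Bayesian multilevel model, which this does not reproduce.

```python
from statistics import mean

def complete_pooling(trial_estimates):
    """One condition-level effect: every trial is assumed to share it."""
    return mean(trial_estimates)

def no_pooling(trial_estimates):
    """Each trial keeps its own estimate, untouched by the others."""
    return list(trial_estimates)

def partial_pooling(trial_estimates, se, tau):
    """Shrink each trial estimate toward the grand mean.

    se  : standard error of a single trial estimate (assumed known)
    tau : cross-trial standard deviation (assumed known)
    """
    grand_mean = mean(trial_estimates)
    # Weight on the trial's own data; 0 gives complete pooling, 1 gives no pooling.
    weight = tau**2 / (tau**2 + se**2)
    return [grand_mean + weight * (y - grand_mean) for y in trial_estimates]

trials = [0.2, 1.4, 0.8, -0.4, 1.0]   # hypothetical trial-level effects
pooled = complete_pooling(trials)
shrunk = partial_pooling(trials, se=1.0, tau=1.0)
```

With `se` equal to `tau`, each trial estimate moves halfway toward the grand mean; a larger `tau` relative to `se` leaves the trial estimates closer to their unpooled values.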

YNIMG Journal 2021 Journal Article

Trial and error: A hierarchical modeling approach to test-retest reliability

  • Gang Chen
  • Daniel S. Pine
  • Melissa A. Brotman
  • Ashley R. Smith
  • Robert W. Cox
  • Simone P. Haller

The concept of test-retest reliability indexes the consistency of a measurement across time. High reliability is critical for any scientific study, but specifically for the study of individual differences. Evidence of poor reliability of commonly used behavioral and functional neuroimaging tasks is mounting. Reports on low reliability of task-based fMRI have called into question the adequacy of using even the most common, well-characterized cognitive tasks with robust population-level effects, to measure individual differences. Here, we lay out a hierarchical framework that estimates reliability as a correlation divorced from trial-level variability, and show that reliability tends to be underestimated under the conventional intraclass correlation framework through summary statistics based on condition-level modeling. In addition, we examine how reliability estimation between the two statistical frameworks diverges and assess how different factors (e.g., trial and subject sample sizes, relative magnitude of cross-trial variability) impact reliability estimates. As empirical data indicate that cross-trial variability is large in most tasks, this work highlights that a large number of trials (e.g., greater than 100) may be required to achieve precise reliability estimates. We reference the tools TRR and 3dLMEr for the community to apply trial-level models to behavior and neuroimaging data and discuss how to make these new measurements most useful for future studies.

IJCAI Conference 2020 Conference Paper

Collaboration Based Multi-Label Propagation for Fraud Detection

  • Haobo Wang
  • Zhao Li
  • Jiaming Huang
  • Pengrui Hui
  • Weiwei Liu
  • Tianlei Hu
  • Gang Chen

Detecting fraud users, who fraudulently promote certain target items, is a challenging issue faced by e-commerce platforms. Generally, many fraud users have different spam behaviors simultaneously, e.g., spam transactions, clicks, reviews and so on. Existing solutions have two main limitations: 1) the correlations among multiple spam behaviors are neglected; 2) large-scale computations are intractable when dealing with an enormous user set. To remedy these problems, this work proposes a collaboration based multi-label propagation (CMLP) algorithm. We first introduce a general-purpose version that uses a collaboration technique to exploit label correlations. Specifically, it breaks the final prediction into two parts: 1) its own prediction part; 2) the prediction of others, i.e., the collaborative part. Then, to accelerate it on large-scale e-commerce data, we propose a heterogeneous graph based variant that detects communities on the user-item graph directly. Both theoretical analysis and empirical results clearly validate the effectiveness and scalability of our proposals.

YNIMG Journal 2020 Journal Article

Fighting or embracing multiplicity in neuroimaging? neighborhood leverage versus global calibration

  • Gang Chen
  • Paul A. Taylor
  • Robert W. Cox
  • Luiz Pessoa

Neuroimaging faces the daunting challenge of multiple testing – an instance of multiplicity – that is associated with two other issues to some extent: low inference efficiency and poor reproducibility. Typically, the same statistical model is applied to each spatial unit independently in the approach of massively univariate modeling. In dealing with multiplicity, the general strategy employed in the field is the same regardless of the specifics: trust the local “unbiased” effect estimates while adjusting the extent of statistical evidence at the global level. However, in this approach, modeling efficiency is compromised because each spatial unit (e.g., voxel, region, matrix element) is treated as an isolated and independent entity during massively univariate modeling. In addition, the required step of multiple testing “correction” by taking into consideration spatial relatedness, or neighborhood leverage, can only partly recoup statistical efficiency, resulting in potentially excessive penalization as well as arbitrariness due to thresholding procedures. Moreover, the assigned statistical evidence at the global level heavily relies on the data space (whole brain or a small volume). The present paper reviews how Stein’s paradox (1956) motivates a Bayesian multilevel (BML) approach that, rather than fighting multiplicity, embraces it to our advantage through a global calibration process among spatial units. Global calibration is accomplished via a Gaussian distribution for the cross-region effects whose properties are not a priori specified, but a posteriori determined by the data at hand through the BML model. Our framework therefore incorporates multiplicity as integral to the modeling structure, not a separate correction step.
By turning multiplicity into a strength, we aim to achieve five goals: 1) improve the model efficiency with a higher predictive accuracy, 2) control the errors of incorrect magnitude and incorrect sign, 3) validate each model relative to competing candidates, 4) reduce the reliance and sensitivity on the choice of data space, and 5) encourage full results reporting. Our modeling proposal reverberates with recent proposals to eliminate the dichotomization of statistical evidence (“significant” vs. “non-significant”), to improve the interpretability of study findings, as well as to promote reporting the full gamut of results (not only “significant” ones), thereby enhancing research transparency and reproducibility.

AAAI Conference 2020 Conference Paper

Incorporating Label Embedding and Feature Augmentation for Multi-Dimensional Classification

  • Haobo Wang
  • Chen Chen
  • Weiwei Liu
  • Ke Chen
  • Tianlei Hu
  • Gang Chen

Feature augmentation, which manipulates the feature space by integrating the label information, is one of the most popular strategies for solving Multi-Dimensional Classification (MDC) problems. However, the vanilla feature augmentation approaches fail to consider the intra-class exclusiveness, and may achieve degenerated performance. To fill this gap, a novel neural network based model is proposed which seamlessly integrates the Label Embedding and Feature Augmentation (LEFA) techniques to learn label correlations. Specifically, based on attentional factorization machine, a cross correlation aware network is introduced to learn a low-dimensional label representation that simultaneously depicts the inter-class correlations and the intra-class exclusiveness. Then the learned latent label vector can be used to augment the original feature space. Extensive experiments on seven real-world datasets demonstrate the superiority of LEFA over state-of-the-art MDC approaches.

YNIMG Journal 2020 Journal Article

Interactions between emotion and action in the brain

  • Liana Catarina Lima Portugal
  • Rita de Cássia Soares Alves
  • Orlando Fernandes Junior
  • Tiago Arruda Sanchez
  • Izabela Mocaiber
  • Eliane Volchan
  • Fátima Smith Erthal
  • Isabel Antunes David

A growing literature supports the existence of interactions between emotion and action in the brain, and the central participation of the anterior midcingulate cortex (aMCC) in this regard. In the present functional magnetic resonance imaging study, we sought to investigate the role of self-relevance during such interactions by varying the context in which threatening pictures were presented (with guns pointed towards or away from the observer). Participants performed a simple visual detection task following exposure to such stimuli. Except for voxelwise tests, we adopted a Bayesian analysis framework which evaluated evidence for the hypotheses of interest, given the data, in a continuous fashion. Behaviorally, our results demonstrated a valence by context interaction such that there was a tendency of speeding up responses to targets after viewing threat pictures directed towards the participant. In the brain, interaction patterns that paralleled those observed behaviorally were observed most notably in the middle temporal gyrus, supplementary motor area, precentral gyrus, and anterior insula. In these regions, activity was overall greater during threat conditions relative to neutral ones, and this effect was enhanced in the directed towards context. A valence by context interaction was observed in the aMCC too, where we also observed a correlation (across participants) of evoked responses and reaction time data. Taken together, our study revealed the context-sensitive engagement of motor-related areas during emotional perception, thus supporting the idea that emotion and action interact in important ways in the brain.

IJCAI Conference 2020 Conference Paper

Learning From Multi-Dimensional Partial Labels

  • Haobo Wang
  • Weiwei Liu
  • Yang Zhao
  • Tianlei Hu
  • Ke Chen
  • Gang Chen

Multi-dimensional classification (MDC) has attracted huge attention from the community. Though most studies consider fully annotated data, in real practice obtaining fully labeled data in MDC tasks is usually intractable. In this paper, we propose a novel learning paradigm: Multi-Dimensional Partial Label Learning (MDPL) where the ground-truth labels of each instance are concealed in multiple candidate label sets. We first introduce the partial Hamming loss for MDPL that incurs a large loss if the predicted labels are not in candidate label sets, and provide an empirical risk minimization (ERM) framework. Theoretically, we rigorously prove the conditions for ERM learnability of MDPL in both independent and dependent cases. Furthermore, we present two MDPL algorithms under our proposed ERM framework. Comprehensive experiments on both synthetic and real-world datasets validate the effectiveness of our proposals.
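
The loss described in this abstract could be sketched as follows. This is one plausible reading of the abstract's wording, not the paper's exact definition: per dimension, a prediction counts as an error when it falls outside that dimension's candidate label set, and the errors are averaged over dimensions. The function name and the toy labels are hypothetical.

```python
def partial_hamming_loss(predicted, candidate_sets):
    """Fraction of dimensions whose predicted label is outside the candidate set."""
    if not predicted:
        raise ValueError("at least one dimension is required")
    if len(predicted) != len(candidate_sets):
        raise ValueError("one candidate set is needed per dimension")
    # A dimension is a miss when its prediction is not among the candidates.
    misses = sum(
        1 for label, candidates in zip(predicted, candidate_sets)
        if label not in candidates
    )
    return misses / len(predicted)

# Hypothetical instance with three label dimensions.
candidates = [{"cat", "dog"}, {"indoor"}, {"day", "night"}]
loss = partial_hamming_loss(["cat", "outdoor", "night"], candidates)
```

Here only the second dimension's prediction lies outside its candidate set, so one of three dimensions contributes to the loss.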

YNIMG Journal 2020 Journal Article

Reliability map of individual differences reflected in inter-subject correlation in naturalistic imaging

  • Jiaqi Gao
  • Gang Chen
  • Jinfeng Wu
  • YinShan Wang
  • Yang Hu
  • Ting Xu
  • Xi-Nian Zuo
  • Zhi Yang

Understanding individual differences in brain function is an essential aim of neuroscience. Naturalistic imaging links neural activity to real-life contexts and reflects individual differences in brain response. These unique features make it a promising tool for individualized psychiatry. An essential prerequisite for the extensive use of this paradigm is the reliable representation of inter-individual relationships. We used a test–retest approach to examine whether the naturalistic paradigm reliably represents inter-individual differences, which brain regions have the superior capability, and whether the ability alters with the contents of the stimuli. We quantified the reliability of the inter-subject relationships in repeated scans of two movie clips: a natural sight view and an emotion-evoking story. Besides statistical inference, we included resting-state scans, behavioral tests, and questionnaires as references for the comparison. The results showed that over one-third area of the brain could reliably characterize the inter-individual relationship, and the superior temporal lobe demonstrated comparable reliability representation with the State and Trait Anxiety Inventory. Furthermore, the temporal lobe regions could retain this capability across emotional movies with different contents. This study provides a base for pushing the naturalistic imaging paradigm towards clinical applications and proposes reliable target brain regions for future studies.

YNIMG Journal 2020 Journal Article

Untangling the relatedness among correlations, part III: Inter-subject correlation analysis through Bayesian multilevel modeling for naturalistic scanning

  • Gang Chen
  • Paul A. Taylor
  • Xianggui Qu
  • Peter J. Molfese
  • Peter A. Bandettini
  • Robert W. Cox
  • Emily S. Finn

While inter-subject correlation (ISC) analysis is a powerful tool for naturalistic scanning data, drawing appropriate statistical inferences is difficult due to the daunting task of accounting for the intricate relatedness in data structure as well as handling the multiple testing issue. Although the linear mixed-effects (LME) modeling approach (Chen et al., 2017a) is capable of capturing the relatedness in the data and incorporating explanatory variables, there are a few challenging issues: 1) it is difficult to assign accurate degrees of freedom for each testing statistic, 2) multiple testing correction is potentially over-penalizing due to model inefficiency, and 3) thresholding necessitates arbitrary dichotomous decisions. Here we propose a Bayesian multilevel (BML) framework for ISC data analysis that integrates all regions of interest into one model. By loosely constraining the regions through a weakly informative prior, BML dissolves multiplicity through conservatively pooling the effect of each region toward the center and improves collective fitting and overall model performance. In addition to potentially achieving a higher inference efficiency, BML improves spatial specificity and easily allows the investigator to adopt a philosophy of full results reporting. A dataset of naturalistic scanning is utilized to illustrate the modeling approach with 268 parcels and to showcase the modeling capability, flexibility and advantages in results reporting. The associated program will be available as part of the AFNI suite for general use.

AAAI Conference 2019 Conference Paper

CAMO: A Collaborative Ranking Method for Content Based Recommendation

  • Chengwei Wang
  • Tengfei Zhou
  • Chen Chen
  • Tianlei Hu
  • Gang Chen

In real-world recommendation tasks, feedback data are usually sparse. Therefore, a recommender’s performance is often determined by how much information it can extract from textual contents. However, current methods do not make full use of the semantic information. They encode the textual contents either by the “bag-of-words” technique or by a Recurrent Neural Network (RNN). The former neglects the order of words while the latter ignores the fact that textual contents can contain multiple topics. Besides, there exists a dilemma in designing a recommender. On the one hand, we should use a sophisticated model to exploit every drop of information in item contents; on the other hand, we should adopt a simple model to prevent over-fitting when facing sparse feedback. To fill the gaps, we propose a recommender named CAMO. CAMO employs a multi-layer content encoder for simultaneously capturing the semantic information of multiple topics and word order. Moreover, CAMO makes use of adversarial training to prevent the complex encoder from overfitting. Extensive empirical studies show that CAMO outperforms state-of-the-art methods in predicting users’ preferences.

IJCAI Conference 2019 Conference Paper

Discriminative and Correlative Partial Multi-Label Learning

  • Haobo Wang
  • Weiwei Liu
  • Yang Zhao
  • Chen Zhang
  • Tianlei Hu
  • Gang Chen

In partial multi-label learning (PML), each instance is associated with a candidate label set that contains multiple relevant labels and other false positive labels. The most challenging issue for PML is that the training procedure is prone to be affected by the labeling noise. We observe that state-of-the-art PML methods are either powerless to disambiguate the correct labels from the candidate labels or incapable of extracting the label correlations sufficiently. To fill this gap, a two-stage DiscRiminative and correlAtive partial Multi-label leArning (DRAMA) algorithm is presented in this work. In the first stage, a confidence value is learned for each label by utilizing the feature manifold, which indicates how likely a label is correct. In the second stage, a gradient boosting model is induced to fit the label confidences. Specifically, to explore the label correlations, we augment the feature space by the previously elicited labels on each boosting round. Extensive experiments on various real-world datasets clearly validate the superiority of our proposed method.

YNIMG Journal 2019 Journal Article

Finding the baby in the bath water – evidence for task-specific changes in resting state functional connectivity evoked by training

  • Adam Steel
  • Cibu Thomas
  • Aaron Trefler
  • Gang Chen
  • Chris I. Baker

Resting-state functional connectivity (rsFC) between brain regions has been used for studying training-related changes in brain function during the offline period of skill learning. However, it is difficult to infer whether the observed training-related changes in rsFC measured between two scans occur as a consequence of task performance, whether they are specific to a given task, or whether they reflect confounding factors such as diurnal fluctuations in brain physiology that impact the MRI signal. Here, we sought to elucidate whether task-specific changes in rsFC are dissociable from time-of-day related changes by evaluating rsFC changes after participants were provided training in either a visuospatial task or a motor sequence task compared to a non-training condition. Given the nature of the tasks, we focused on changes in rsFC of the hippocampal and sensorimotor cortices after short-term training, while controlling for the effect of time-of-day. We also related the change in rsFC of task-relevant brain regions to performance improvement in each task. Our results demonstrate that, even in the absence of any experimental manipulation, significant changes in rsFC can be detected between two resting state functional MRI scans performed just a few hours apart, suggesting time-of-day has a significant impact on rsFC. However, by estimating the magnitude of the time-of-day effect, our findings also suggest that task-specific changes in rsFC can be dissociated from the changes attributed to time-of-day. Taken together, our results show that rsFC can provide insights about training-related changes in brain function during the offline period of skill learning. However, demonstrating the specificity of the changes in rsFC to a given task requires a rigorous experimental design that includes multiple active and passive control conditions, and robust behavioral measures.

IS Journal 2019 Journal Article

Identifying Adverse Drug Events From Social Media Using An Improved Semisupervised Method

  • Jing Liu
  • Gang Wang
  • Gang Chen

Adverse drug event (ADE) is a serious health concern. Social media has provided patients a broad platform to share their ADE experiences, impelling the development of social media-based pharmacovigilance. However, social media analysis of ADEs presents several important challenges that need to be addressed for high-performing ADE identification. To address these challenges, a feature weighted-based improved disagreement-based semisupervised learning method, named WIDSSL, is proposed for effectively identifying ADEs from non-ADEs. Empirical results demonstrate the effectiveness of WIDSSL. Our proposed WIDSSL method can reduce the reliance on a large number of labeled instances for high-performing ADE identification, and hence enhance the feasibility of conducting social media-based pharmacovigilance.

YNIMG Journal 2019 Journal Article

Neural correlates of developing theory of mind competence in early childhood

  • Yaqiong Xiao
  • Fengji Geng
  • Tracy Riggins
  • Gang Chen
  • Elizabeth Redcay

Theory of mind (ToM) encompasses a range of abilities that show different developmental time courses. However, relatively little work has examined the neural correlates of ToM during early childhood. In this study, we investigated the neural correlates of ToM in typically developing children aged 4–8 years using resting-state functional magnetic resonance imaging. We calculated whole-brain functional connectivity with the right temporo-parietal junction (RTPJ), a core region involved in ToM, and examined its relation to children's early, basic, and advanced components of ToM competence assessed by a parent-report measure. Total ToM and both basic and advanced ToM components, but not early, consistently showed a positive correlation with connectivity between RTPJ and posterior cingulate cortex/precuneus; advanced ToM was also correlated with RTPJ to left TPJ connectivity. However, early and advanced ToM components showed negative correlation with the right inferior/superior parietal lobe, suggesting that RTPJ network differentiation is also related to ToM abilities. We confirmed and extended these results using a Bayesian modeling approach demonstrating significant relations between multiple nodes of the mentalizing network and ToM abilities, with no evidence for differences in relations between ToM components. Our data provide new insights into the neural correlates of multiple aspects of ToM in early childhood and may have implications for both typical and atypical development of ToM.

AAAI Conference 2019 Conference Paper

Two-Stage Label Embedding via Neural Factorization Machine for Multi-Label Classification

  • Chen Chen
  • Haobo Wang
  • Weiwei Liu
  • Xingyuan Zhao
  • Tianlei Hu
  • Gang Chen

Label embedding has been widely used as a method to exploit label dependency with dimension reduction in multi-label classification tasks. However, existing embedding methods intend to extract label correlations directly, and thus they might be easily trapped by complex label hierarchies. To tackle this issue, we propose a novel Two-Stage Label Embedding (TSLE) paradigm that involves Neural Factorization Machine (NFM) to jointly project features and labels into a latent space. In the encoding phase, we introduce a Twin Encoding Network (TEN) that digs out pairwise feature and label interactions in the first stage and then efficiently learns higher-order correlations with deep neural networks (DNNs) in the second stage. After the codewords are obtained, a set of hidden layers is applied to recover the output labels in the decoding phase. Moreover, we develop a novel learning model by leveraging a max margin encoding loss and a label-correlation aware decoding loss, and we adopt the mini-batch Adam to optimize our learning model. Lastly, we also provide a kernel insight to better understand our proposed TSLE. Extensive experiments on various real-world datasets demonstrate that our proposed model significantly outperforms other state-of-the-art approaches.

YNIMG Journal 2018 Journal Article

Is the encoding of Reward Prediction Error reliable during development?

  • Hanna Keren
  • Gang Chen
  • Brenda Benson
  • Monique Ernst
  • Ellen Leibenluft
  • Nathan A. Fox
  • Daniel S. Pine
  • Argyris Stringaris

Reward Prediction Errors (RPEs), defined as the difference between the expected and received outcomes, are integral to reinforcement learning models and play an important role in development and psychopathology. In humans, RPE encoding can be estimated using fMRI recordings; however, a basic measurement property of RPE signals, their test-retest reliability across different time scales, remains an open question. In this paper, we examine the 3-month and 3-year reliability of RPE encoding in youth (mean age at baseline = 10.6 ± 0.3 years), a period of developmental transitions in reward processing. We show that RPE encoding is differentially distributed between the positive values being encoded predominantly in the striatum and negative RPEs primarily encoded in the insula. The encoding of negative RPE values is highly reliable in the right insula, across both the long and the short time intervals. Insula reliability for RPE encoding is the most robust finding, while other regions, such as the striatum, are less consistent. Striatal reliability appeared significant as well once covarying for factors, which were possibly confounding the signal to noise ratio. By contrast, task activation during feedback in the striatum is highly reliable across both time intervals. These results demonstrate the valence-dependent differential encoding of RPE signals between the insula and striatum, and the consistency of RPE signals or lack thereof, during childhood and into adolescence. Characterizing the regions where the RPE signal in BOLD fMRI is a reliable marker is key for estimating reward-processing alterations in longitudinal designs, such as developmental or treatment studies.

YNIMG Journal 2018 Journal Article

Statistical power comparisons at 3T and 7T with a GO / NOGO task

  • Salvatore Torrisi
  • Gang Chen
  • Daniel Glen
  • Peter A. Bandettini
  • Chris I. Baker
  • Richard Reynolds
  • Jeffrey Yen-Ting Liu
  • Joseph Leshin

The field of cognitive neuroscience is weighing evidence about whether to move from standard field strength to ultra-high field (UHF). The present study contributes to the evidence by comparing a cognitive neuroscience paradigm at 3 Tesla (3T) and 7 Tesla (7T). The goal was to test and demonstrate the practical effects of field strength on a standard GO/NOGO task using accessible preprocessing and analysis tools. Two independent matched healthy samples (N = 31 each) were analyzed at 3T and 7T. Results show gains at 7T in statistical strength, the detection of smaller effects and group-level power. With an increased availability of UHF scanners, these gains may be exploited by cognitive neuroscientists and other neuroimaging researchers to develop more efficient or comprehensive experimental designs and, given the same sample size, achieve greater statistical power at 7T.

IJCAI Conference 2017 Conference Paper

Deep Supervised Hashing with Nonlinear Projections

  • Sen Su
  • Gang Chen
  • Xiang Cheng
  • Rong Bi

Hashing has attracted broad research interests in large scale image retrieval due to its high search speed and efficient storage. Recently, many deep hashing methods have been proposed to perform simultaneous nonlinear feature learning and hash projection learning, which have shown superior performance compared to hand-crafted feature based hashing methods. Nonlinear projection functions have shown their advantages over the linear ones due to their powerful generalization capabilities. To improve the performance of deep hashing methods by generalizing projection functions, we propose the idea of implementing a pure nonlinear deep hashing network architecture. By consolidating the above idea, this paper presents a Deep Supervised Hashing architecture with Nonlinear Projections (DSHNP). In particular, soft decision trees are adopted as the nonlinear projection functions, since they can generate differentiable nonlinear outputs and can be trained with deep neural networks in an end-to-end way. Moreover, to make the hash codes as independent as possible, we design two regularizers imposed on the parameter matrices of the leaves in the soft decision trees. Extensive evaluations on two benchmark image datasets show that the proposed DSHNP outperforms several state-of-the-art hashing methods.

YNIMG Journal 2017 Journal Article

Intrinsic signal optical imaging of visual brain activity: Tracking of fast cortical dynamics

  • Haidong D. Lu
  • Gang Chen
  • Junjie Cai
  • Anna W. Roe

Hemodynamic-based brain imaging techniques are typically incapable of monitoring brain activity with both high spatial and high temporal resolutions. In this study, we have used intrinsic signal optical imaging (ISOI), a relatively high spatial resolution imaging technique, to examine the temporal resolution of the hemodynamic signal. We imaged V1 responses in anesthetized monkey to a moving light spot. Movies of cortical responses clearly revealed a focus of hemodynamic response traveling across the cortical surface. Importantly, at different locations along the cortical trajectory, response timecourses maintained a similar tri-phasic shape and shifted sequentially across cortex with a predictable delay. We calculated the time between distinguishable timecourses and found that the temporal resolution of the signal at which two events can be reliably distinguished is about 80 milliseconds. These results suggest that hemodynamic-based imaging is suitable for detecting ongoing cortical events at high spatial resolution and with temporal resolution relevant for behavioral studies.

YNIMG Journal 2017 Journal Article

Is the statistic value all we should care about in neuroimaging?

  • Gang Chen
  • Paul A. Taylor
  • Robert W. Cox

Here we address an important issue that has been embedded within the neuroimaging community for a long time: the absence of effect estimates in results reporting in the literature. The statistic value itself, as a dimensionless measure, does not provide information on the biophysical interpretation of a study, and it certainly does not represent the whole picture of a study. Unfortunately, in contrast to standard practice in most scientific fields, effect (or amplitude) estimates are usually not provided in most results reporting in the current neuroimaging publications and presentations. Possible reasons underlying this general trend include (1) lack of general awareness, (2) software limitations, (3) inaccurate estimation of the BOLD response, and (4) poor modeling due to our relatively limited understanding of FMRI signal components. However, as we discuss here, such reporting damages the reliability and interpretability of the scientific findings themselves, and there is in fact no overwhelming reason for such a practice to persist. In order to promote meaningful interpretation, cross validation, reproducibility, meta and power analyses in neuroimaging, we strongly suggest that, as part of good scientific practice, effect estimates should be reported together with their corresponding statistic values. We provide several easily adaptable recommendations for facilitating this process.

AAAI Conference 2017 Conference Paper

Latent Discriminant Analysis with Representative Feature Discovery

  • Gang Chen

Linear Discriminant Analysis (LDA) is a well-known method for dimension reduction and classification with focus on discriminative feature selection. However, how to discover discriminative as well as representative features in the LDA model has not been explored. In this paper, we propose a latent Fisher discriminant model with representative feature discovery in a semi-supervised manner. Specifically, our model leverages advantages of both discriminative and generative models by generalizing LDA with a data-driven prior over the latent variables. Thus, our method combines multi-class, latent variables and dimension reduction in a unified Bayesian framework. We test our method on MUSK and Corel datasets and yield competitive results compared to baselines. We also demonstrate its capacity on the challenging TRECVID MED11 dataset for semantic keyframe extraction and conduct a human-factors ranking-based experimental evaluation, which clearly demonstrates our proposed method consistently extracts more semantically meaningful keyframes than challenging baselines.

YNIMG Journal 2017 Journal Article

Untangling the relatedness among correlations, Part II: Inter-subject correlation group analysis through linear mixed-effects modeling

  • Gang Chen
  • Paul A. Taylor
  • Yong-Wook Shin
  • Richard C. Reynolds
  • Robert W. Cox

It has been argued that naturalistic conditions in FMRI studies provide a useful paradigm for investigating perception and cognition through a synchronization measure, inter-subject correlation (ISC). However, one analytical stumbling block has been the fact that the ISC values associated with each single subject are not independent, and our previous paper (Chen et al., 2016) used simulations and analyses of real data to show that the methodologies adopted in the literature do not have the proper control for false positives. In the same paper, we proposed nonparametric subject-wise bootstrapping and permutation testing techniques for one and two groups, respectively, which account for the correlation structure, and these greatly outperformed the prior methods in controlling the false positive rate (FPR); that is, subject-wise bootstrapping (SWB) worked relatively well for both cases with one and two groups, and subject-wise permutation (SWP) testing was virtually ideal for group comparisons. Here we seek to explicate and adopt a parametric approach through linear mixed-effects (LME) modeling for studying the ISC values, building on the previous correlation framework, with the benefit that the LME platform offers wider adaptability, more powerful interpretations, and quality control checking capability than nonparametric methods. We describe both theoretical and practical issues involved in the modeling and the manner in which LME with crossed random effects (CRE) modeling is applied. A data-doubling step further allows us to conveniently track the subject index, and achieve easy implementations. We pit the LME approach against the best nonparametric methods, and find that the LME framework achieves proper control for false positives. The new LME methodologies are shown to be both efficient and robust, and they will be publicly available in AFNI (http://afni.nimh.nih.gov).

YNIMG Journal 2017 Journal Article

Variance decomposition for single-subject task-based fMRI activity estimates across many sessions

  • Javier Gonzalez-Castillo
  • Gang Chen
  • Thomas E. Nichols
  • Peter A. Bandettini

Here we report an exploratory within-subject variance decomposition analysis conducted on a task-based fMRI dataset with an unusually large number of repeated measures (i.e., 500 trials in each of three different subjects) distributed across 100 functional scans and 9 to 10 different sessions. Within-subject variance was segregated into four primary components: variance across-sessions, variance across-runs within a session, variance across-blocks within a run, and residual measurement/modeling error. Our results reveal inhomogeneous and distinct spatial distributions of these variance components across significantly active voxels in grey matter. Measurement error is dominant across the whole brain. Detailed evaluation of the remaining three components shows that across-session variance is the second largest contributor to total variance in occipital cortex, while across-runs variance is the second dominant source for the rest of the brain. Network-specific analysis revealed that across-block variance contributes more to total variance in higher-order cognitive networks than in somatosensory cortex. Moreover, in some higher-order cognitive networks across-block variance can exceed across-session variance. These results help us better understand the temporal (i.e., across blocks, runs and sessions) and spatial distributions (i.e., across different networks) of within-subject natural variability in estimates of task responses in fMRI. They also suggest that different brain regions will show different natural levels of test-retest reliability even in the absence of residual artifacts and sufficiently high contrast-to-noise measurements. Further confirmation with a larger sample of subjects and other tasks is necessary to ensure generality of these results.

YNIMG Journal 2016 Journal Article

Behavioral and neural stability of attention bias to threat in healthy adolescents

  • Lauren K. White
  • Jennifer C. Britton
  • Stefanie Sequeira
  • Emily G. Ronkin
  • Gang Chen
  • Yair Bar-Haim
  • Tomer Shechner
  • Monique Ernst

Considerable translational research on anxiety examines attention bias to threat and the efficacy of attention training in reducing symptoms. Imaging research on the stability of brain functions engaged by attention bias tasks could inform such research. Perturbed fronto-amygdala function consistently arises in attention bias research on adolescent anxiety. The current report examines the stability of the activation and functional connectivity of these regions on the dot-probe task. Functional magnetic resonance imaging (fMRI) activation and connectivity data were acquired with the dot-probe task in 39 healthy youth (f=18, Mean Age=13.71 years, SD=2.31) at two time points, separated by approximately nine weeks. Intraclass-correlations demonstrate good reliability in both neural activation for the ventrolateral PFC and task-specific connectivity for fronto-amygdala circuitry. Behavioral measures showed generally poor test–retest reliability. These findings suggest potential avenues for future brain imaging work by highlighting brain circuitry manifesting stable functioning on the dot-probe attention bias task.

AAAI Conference 2016 Conference Paper

Maximum Margin Dirichlet Process Mixtures for Clustering

  • Gang Chen
  • Haiying Zhang
  • Caiming Xiong

Dirichlet process mixtures (DPM) can automatically infer the model complexity from data. Hence they have attracted significant attention recently, and are widely used for model selection and clustering. As a generative model, a DPM generally requires a prior base distribution to learn component parameters by maximizing posterior probability. In contrast, discriminative classifiers model the conditional probability directly, and have yielded better results than generative classifiers. In this paper, we propose a maximum margin Dirichlet process mixture for clustering, which is different from the traditional DPM for parameter modeling. Our model takes a discriminative clustering approach, by maximizing a conditional likelihood to estimate parameters. In particular, we adopt an EM-like algorithm that leverages Gibbs sampling for inference, which in turn can be perfectly embedded in the online maximum margin learning procedure to update model parameters. We test our model and show comparative results over the traditional DPM and other nonparametric clustering approaches.

YNIMG Journal 2016 Journal Article

Segregating attention from response control when performing a motor inhibition task

  • Harma Meffert
  • Soonjo Hwang
  • Zachary T. Nolan
  • Gang Chen
  • James R. Blair

Considerable work has demonstrated that inferior frontal gyrus (IFG), anterior insula cortex (AIC) and the supplementary motor area (SMA) are responsive during inhibitory control tasks. However, there is disagreement as to whether this relates to response selection/inhibition or attentional processing. The current study investigates this by using a Go/No-go task with a factorial design. We observed that both left IFG and dorsal pre-SMA were responsive to no-go cues irrespective of cue frequency. This suggests a role for both in the inhibition of motor responses. Generalized psychophysiological interaction (gPPI) analyses suggest that inferior frontal gyrus may implement this function through interaction with basal ganglia and by suppressing the visual representation of cues associated with no-go responses. Anterior insula cortex and a more ventral portion of pre-SMA showed greater responsiveness to low frequency relative to higher frequency stimuli, irrespective of response type. This may reflect the hypothesized role of anterior insula cortex in marking low frequency items for additional processing (cf. Menon and Uddin, 2010). Consistent with this, the gPPI analysis revealed significantly greater anterior insula cortex connectivity with visual cortex in response to low relative to high frequency cues.

YNIMG Journal 2016 Journal Article

Untangling the relatedness among correlations, part I: Nonparametric approaches to inter-subject correlation analysis at the group level

  • Gang Chen
  • Yong-Wook Shin
  • Paul A. Taylor
  • Daniel R. Glen
  • Richard C. Reynolds
  • Robert B. Israel
  • Robert W. Cox

FMRI data acquisition under naturalistic and continuous stimuli (e.g., watching a video or listening to music) has become popular recently due to the fact that it entails less manipulation and more realistic/complex contexts involved in the task, compared to the conventional task-based experimental designs. The synchronization or response similarities among subjects are typically measured through inter-subject correlation (ISC) between any pair of subjects. At the group level, summarizing the collection of ISC values is complicated by their intercorrelations, which necessarily lead to the violation of independence assumed in typical parametric approaches such as Student's t-test. Nonparametric methods, such as bootstrapping and permutation testing, have previously been adopted for testing purposes by resampling the time series of each subject, but the quantitative validity of these specific approaches in terms of controllability of false positive rate (FPR) has never been explored before. Here we survey the methods of ISC group analysis that have been employed in the literature, and discuss the issues involved in those methods. We then propose less computationally intensive nonparametric methods that can be performed at the group level (for both one- and two-sample analyses), as compared to the popular method of circularly shifting the EPI time series at the individual level. As part of the new approaches, subject-wise (SW) resampling is adopted instead of element-wise (EW) resampling, so that exchangeability and independence assumptions are satisfied, and the patterned correlation structure among the ISC values can be more accurately captured. We examine the FPR controllability and power achievement of all the methods through simulations, as well as their performance when applied to a real experimental dataset.

YNIMG Journal 2015 Journal Article

Nature of functional links in valuation networks differentiates impulsive behaviors between abstinent heroin-dependent subjects and nondrug-using subjects

  • Tianye Zhai
  • Yongcong Shao
  • Gang Chen
  • Enmao Ye
  • Lin Ma
  • Lubin Wang
  • Yu Lei
  • Guangyu Chen

Advanced neuroimaging studies have identified brain correlates of pathological impulsivity in a variety of neuropsychiatric disorders. However, whether and how these spatially separate and functionally integrated neural correlates collectively contribute to aberrant impulsive behaviors remains unclear. Building on recent progress in neuroeconomics toward determining a biological account of human behaviors, we employed resting-state functional MRI to characterize the nature of the links between these neural correlates and to investigate their impact on impulsivity. We demonstrated that through functional connectivity with the ventral medial prefrontal cortex, the δ-network (regions of the executive control system, such as the dorsolateral prefrontal cortex) and the β-network (regions of the reward system involved in the mesocorticolimbic pathway), jointly influence impulsivity measured by the Barratt impulsiveness scale scores. In control nondrug-using subjects, the functional link between the β- and δ-networks is balanced, and the δ-network competitively controls impulsivity. However, in abstinent heroin-dependent subjects, the link is imbalanced, with stronger β-network connectivity and weaker δ-network connectivity. The imbalanced link is associated with impulsivity, indicating that the β- and δ-networks may mutually reinforce each other in abstinent heroin-dependent subjects. These findings of an aberrant link between the β- and δ-networks in abstinent heroin-dependent subjects may shed light on the mechanism of aberrant behaviors of drug addiction and may serve as an endophenotype to mark individual subjects' self-control capacity.

YNICL Journal 2013 Journal Article

Late-life depression, mild cognitive impairment and hippocampal functional network architecture

  • Chunming Xie
  • Wenjun Li
  • Gang Chen
  • B. Douglas Ward
  • Malgorzata B. Franczak
  • Jennifer L. Jones
  • Piero G. Antuono
  • Shi-Jiang Li

Late-life depression (LLD) and amnestic mild cognitive impairment (aMCI) are associated with medial temporal lobe structural abnormalities. However, the hippocampal functional connectivity (HFC) similarities and differences related to these syndromes when they occur alone or coexist are unclear. The resting-state functional connectivity MRI (R-fMRI) technique was used to measure left and right HFC in 72 elderly participants (LLD [n = 18], aMCI [n = 17], LLD with comorbid aMCI [n = 12], and healthy controls [n = 25]). The main and interactive relationships of LLD and aMCI on the HFC networks were determined, after controlling for age, gender, education and gray matter volumes. The effects of depressive symptoms and episodic memory deficits on the hippocampal functional connections also were assessed. While increased and decreased left and right HFC with several cortical and subcortical structures involved in mood regulation were related to LLD, aMCI was associated with globally diminished connectivity. Significant LLD-aMCI interactions on the right HFC networks were seen in the brain regions critical for emotion processing and higher-order cognitive functions. In the interactive brain regions, LLD and aMCI were associated with diminished hippocampal functional connections, whereas the comorbid group demonstrated enhanced connectivity. Main and interactive effects of depressive symptoms and episodic memory performance were also associated with bilateral HFC network abnormalities. In conclusion, these findings indicate that discrete hippocampal functional network abnormalities are associated with LLD and aMCI when they occur alone. However, when these conditions coexist, more pronounced vulnerabilities of the hippocampal networks occur, which may be a marker of disease severity and impending cognitive decline. By utilizing the R-fMRI technique, this study provides novel insights into the neural mechanisms underlying LLD and aMCI at the functional network level.

TAAS Journal 2010 Journal Article

A self-organization mechanism based on cross-entropy method for P2P-like applications

  • Gang Chen
  • Abdolhossein Sarrafzadeh
  • Chor Ping Low
  • Liang Zhang

P2P-like applications are quickly gaining popularity on the Internet. Such applications are commonly modeled as graphs with nodes and edges. Usually nodes represent running processes that exchange information with each other through communication channels as represented by the edges. They often need to autonomously determine their suitable working mode or local status for the purpose of improving performance, reducing operation cost, or achieving system-level design goals. In order to achieve this objective, the concept of status configuration is introduced in this article and a mathematical correspondence is further established between status configuration and an optimization index (OI), which serves as a unified abstraction of any system design goals. Guided by this correspondence and inspired by the cross-entropy algorithm, a cross-entropy-driven self-organization mechanism (CESM) is proposed in this article. CESM exhibits the self-organization property since desirable status configurations that lead to high OI values will quickly emerge from purely localized interactions. Both theoretical and experimental analyses have been performed. The results strongly indicate that CESM is a simple yet effective technique which is potentially suitable for many P2P-like applications.

IJCAI Conference 2009 Conference Paper

  • Xinxin Bai
  • Gang Chen
  • Qiming Tian
  • Wenjun Yin
  • Jin Dong

Location plays a very important role in the retail business due to its huge and long-term investment. In this paper, we propose a novel semi-supervised regression model for evaluating convenience store location based on spatial data analysis. First, the input features for each convenience store can be extracted by analyzing the elements around it based on a geographic information system, and the turnover is used to evaluate its performance. Second, considering the practical application scenario, a manifold regularization model with one semi-supervised performance information constraint is provided. The promising experimental results on a real-world dataset demonstrate the effectiveness of the proposed approach in performance prediction of certain candidate locations for new convenience store opening.

IJCAI Conference 2009 Conference Paper

  • Jianwen Zhang
  • Yangqiu Song
  • Gang Chen
  • Changshui Zhang

This paper deals with evolutionary clustering, which refers to the problem of clustering data with distribution drifting along time. Starting from a density estimation view to clustering problems, we propose two general on-line frameworks. In the first framework, i.e., historical data dependent (HDD), current model distribution is designed to approximate both current and historical data distributions. In the second framework, i.e., historical model dependent (HMD), current model distribution is designed to approximate both current data distribution and historical model distribution. Both frameworks are based on the general exponential family mixture (EFM) model. As a result, all conventional clustering algorithms based on EFMs can be extended to evolutionary setting under the two frameworks. Empirical results validate the two frameworks.

MFCS Conference 1997 Conference Paper

Subtyping Calculus of Construction (Extended Abstract)

  • Gang Chen

We present in this paper a subtyping extension of Calculus of Construction. We prove that this system has good meta-theoretical properties: transitivity elimination, subject reduction, strong normalization and decidability of subtyping. This work provides a theoretical foundation for adding subtyping to proof checkers like Coq, LEGO, etc.