Author name cluster

Jian Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

62 papers
2 author rows

Possible papers

AAAI Conference 2026 Conference Paper

A Better Start: Sensitivity-Aware Warm-Up for Robust and Efficient Fine-Tuning

  • Yile Chen
  • Zeyi Wen
  • Jian Chen
  • Jin Huang

As an essential component of fine-tuning, warm-up plays a crucial role in promoting stability and generalization. Many studies have examined its underlying mechanisms from different angles. However, most of them focus on incorporating these insights into optimizers to reduce the reliance on warm-up, and little attention has been paid to the inherent limitations of warm-up itself, which restrict its effectiveness. In this work, we revisit warm-up from a loss landscape perspective and identify several limitations of existing warm-up, including: (1) susceptibility to nearby suboptimal traps, (2) sensitivity to hyperparameters and random seeds, and (3) inefficiency during the early stages of training. To overcome these limitations, we propose Sensitivity-Aware Warm-Up (SAWU), a lightweight and adaptive strategy that dynamically leverages learning sensitivity during warm-up to guide updates toward better and more stable basins. SAWU also introduces an adaptive scheduling mechanism and a phase transition strategy across the warm-up, stable, and decay phases to further enhance robustness and efficiency. Extensive experiments on various downstream tasks show that SAWU significantly outperforms the vanilla method (e.g., an average improvement of 3.43% on RoBERTa). Moreover, SAWU can be easily combined with various optimizers and remains effective even when warm-up-based methods fail (e.g., it lifts RAdam from 49.46% to 91.78% on QNLI). Thanks to its lightweight nature, SAWU introduces minimal overhead and even reduces training time by over 5% compared to other methods.
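
The abstract refers to warm-up, stable, and decay phases. As a point of reference only, a generic piecewise schedule of that shape (linear warm-up, constant plateau, linear decay) can be sketched as follows. This is an assumed baseline, not SAWU: the sensitivity-aware guidance and adaptive phase transitions are not modeled, and the function name `wsd_lr` and its parameters are hypothetical.

```python
def wsd_lr(step, peak_lr, warmup_steps, stable_steps, decay_steps, min_lr=0.0):
    """Hypothetical baseline schedule: linear warm-up, constant plateau, linear decay.

    Not SAWU; this is only the fixed three-phase shape that adaptive methods modify.
    """
    # Phase 1: ramp linearly from peak_lr / warmup_steps up to peak_lr.
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    step -= warmup_steps
    # Phase 2: hold the peak learning rate.
    if step < stable_steps:
        return peak_lr
    step -= stable_steps
    # Phase 3: interpolate linearly from peak_lr down to min_lr.
    if step < decay_steps:
        frac = step / decay_steps
        return peak_lr + (min_lr - peak_lr) * frac
    # After the schedule ends, stay at the floor.
    return min_lr
```

For example, `wsd_lr(step, 2e-5, 100, 800, 100)` ramps to the peak rate over 100 steps, holds it for 800, then decays linearly to zero.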

EAAI Journal 2026 Journal Article

A deep learning framework for on-street parking demand prediction: Integrating spatio-temporal dynamics and policy impacts

  • Keliang Liu
  • Jian Chen

Urban on-street short-term parking demand prediction is fundamental for smart parking guidance systems. However, current prediction methods often rely on a single data source and fail to account for the dynamic impacts of environmental factors outside the parking system. This limitation constrains prediction accuracy and prevents assessment of how parking demand responds to policy changes. To address this issue, this study proposes a comprehensive deep learning forecasting framework, using more than 6.48 million parking orders collected from 123 on-street parking facilities over seven months. Recognizing the varied spatial and temporal influences that the built environment and parking regulations exert on demand patterns, we used the Multi-scale Geographically and Temporally Weighted Regression model (MGTWR) to quantify these relationships. We then incorporated the spatio-temporal coefficients derived from the MGTWR model, alongside additional influential variables, as inputs for a novel deep learning architecture that combines MGTWR, Graph Attention Networks (GAT), and Attention-based Long Short-Term Memory (ALSTM), which we designate as “MGTWR-GAT-ALSTM.” Our model was benchmarked against traditional baseline methods, and the results indicate that MGTWR-GAT-ALSTM yields superior predictive performance, with Mean Absolute Error, Root Mean Squared Error, and Coefficient of Determination values of 0.01, 0.04, and 0.92, respectively. Additionally, we performed ablation experiments to confirm that the model design does not introduce redundancy. The proposed prediction model aims to support the construction of smart parking systems, providing a dynamic assessment tool for parking policies.
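
The three metrics reported above have standard definitions. A minimal sketch of how they are computed from paired observations and predictions follows; the helper name `regression_metrics` is hypothetical, and any normalization the authors applied to the demand values is not specified in the abstract and is not modeled here.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for paired observations and predictions."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(r) for r in residuals) / n
    ss_res = sum(r * r for r in residuals)           # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                        # coefficient of determination
    return mae, rmse, r2
```

Lower MAE and RMSE are better, while R² approaches 1 as the model explains more of the variance around the mean.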

AAAI Conference 2026 Conference Paper

Emotion and Intention Guided Multi-Modal Learning for Sticker Response Selection

  • Yuxuan Hu
  • Jian Chen
  • Yuhao Wang
  • Zixuan Li
  • Jing Xiong
  • Pengyue Jia
  • Wei Wang
  • Chengming Li

Stickers are widely used in online communication to convey emotions and implicit intentions. The Sticker Response Selection (SRS) task aims to select the most contextually appropriate sticker based on the dialogue. However, existing methods typically rely on semantic matching and model emotional and intentional cues separately, which can lead to mismatches when emotions and intentions are misaligned. To address this issue, we propose Emotion and Intention Guided Multi-Modal Learning (EIGML). This framework is the first to jointly model emotion and intention, effectively reducing the bias caused by isolated modeling and significantly improving selection accuracy. Specifically, we introduce a Dual-Level Contrastive Framework to perform both intra-modality and inter-modality alignment, ensuring consistent representation of emotional and intentional features within and across modalities. In addition, we design an Intention-Emotion Guided Multi-Modal Fusion module that integrates emotional and intentional information progressively through three components: Emotion-Guided Intention Knowledge Selection, Intention-Emotion Guided Attention Fusion, and Similarity-Adjusted Matching Mechanism. This design injects rich, effective information into the model and enables a deeper understanding of the dialogue, ultimately enhancing sticker selection performance. Experimental results on two public datasets show that EIGML outperforms state-of-the-art baselines, achieving higher accuracy and a better understanding of emotional and intentional features.

AAAI Conference 2026 Conference Paper

FastFLUX: Pruning FLUX with Block-wise Replacement and Sandwich Training

  • Fuhan Cai
  • Yong Guo
  • Jie Li
  • Wenbo Li
  • Jian Chen
  • Xiangzhong Fang

Recent advancements in text-to-image (T2I) generation have led to the emergence of highly expressive models such as diffusion transformers (DiTs), exemplified by FLUX. However, their massive parameter sizes lead to slow inference, high memory usage, and poor deployability. Existing acceleration methods (e.g., single-step distillation and attention pruning) often suffer from significant performance degradation and incur substantial training costs. To address these limitations, we propose FastFLUX, an architecture-level pruning framework designed to enhance the inference efficiency of FLUX. At its core is the Block-wise Replacement with Linear Layers (BRLL) method, which replaces structurally complex residual branches in ResBlocks with lightweight linear layers while preserving the original shortcut connections for stability. Furthermore, we introduce Sandwich Training (ST), a localized fine-tuning strategy that leverages LoRA to supervise neighboring blocks, mitigating performance drops caused by structural replacement. Experiments show that our FastFLUX maintains high image quality under both qualitative and quantitative evaluations, while significantly improving inference speed, even with 20% of the hierarchy pruned.

EAAI Journal 2026 Journal Article

Laplacian-guided contextual instance learning for whole slide image classification

  • Jian Chen
  • Ziyuan Chen
  • Geng Chen
  • Mengyu Liu
  • Sohaib Asif
  • He Zhang
  • Jun Jin

Classification plays an important role in the diagnosis and prognosis of cancers such as endometrial and breast cancer. Achieving satisfactory performance in classifying cancer molecular subtypes from whole slide images presents a substantial challenge. This difficulty arises from diverse and complex inter-instance relationships and feature homogeneity among different molecular subtypes. To address these issues, this paper presents a novel Laplacian-guided contextual instance learning (LapCIL) framework, which focuses on learning inter-instance relationships to effectively identify molecular subtypes. The LapCIL framework consists of a dynamic contiguous masking strategy, a contextual instance learning block, and a Laplacian channel classification head. In the LapCIL framework, a dynamic contiguous masking strategy is proposed to generate more inter-instance relationships from finite data. Considering the diversity and complexity of inter-instance relationships, a contextual instance learning block is introduced, which leverages a contextual self-attention mechanism to capture the relationships between different instances. To enhance the distinguishing capability between different instances even further, especially in scenarios where feature homogeneity renders it challenging to differentiate morphologically similar cell types, the LapCIL framework incorporates a Laplacian channel classification head. The Laplacian channel classification head focuses on structured local features and dynamically attends to discriminative channel groups. Extensive experiments are conducted on the CAncer MEtastases in LYmphnOdes challeNge (CAMELYON16) breast cancer dataset, the BReAst Carcinoma Subtyping dataset, and a clinical endometrial cancer dataset to evaluate the proposed LapCIL framework. Our framework achieves significant advantages over state-of-the-art methods, both on the clinical dataset and the CAMELYON16 breast cancer dataset.

AAAI Conference 2026 Conference Paper

NaVLA$^2$: A Vision-Language-Audio-Action Model for Multimodal Instruction Navigation

  • Jugang Fan
  • Peihao Chen
  • ChangHao Li
  • Qing Du
  • Jian Chen
  • Mingkui Tan

Embodied navigation is a fundamental capability for intelligent agents, yet remains challenging in partially observable environments where navigation instructions can be difficult to interpret. However, existing tasks only provide unimodal instructions, which are ambiguous in complex multimodal environments with multiple similar objects, and may result in misinterpretation and navigation failure. To overcome these limitations, we introduce MINav, a novel task where the navigation path is precisely described by a multimodal instruction. The instruction provides multimodal cues, including object categories, RGB images, language descriptions, and auditory descriptions, which help the agent to disambiguate and ground objects in the environment and navigate effectively. We further construct a large-scale dataset of 43.9K navigation episodes using a two-stage pipeline that first annotates multimodal references of objects and then synthesizes diverse multimodal instructions. We find that existing methods struggle on the MINav task, indicating substantial room for improvement in agents' multimodal grounding. To address this, we propose NaVLA^2, a vision-language-audio-action model that additionally integrates spatial audio and employs a CoThinkAct module to jointly generate high-level reasoning and consistent low-level actions. Experimental results demonstrate that NaVLA^2 significantly outperforms competitive baselines on the MINav benchmark. We hope that our proposed MINav and NaVLA^2 will facilitate future research toward agents with stronger multimodal understanding and grounding capabilities for navigation.

AAAI Conference 2026 Conference Paper

Q Cache: Visual Attention Is Valuable in Less than Half of Decode Layers for Multimodal Large Language Model

  • Jiedong Zhuang
  • Lu Lu
  • Ming Dai
  • Rui Hu
  • Jian Chen
  • Qiang Liu
  • Haoji Hu

Multimodal large language models (MLLMs) are plagued by exorbitant inference costs attributable to the profusion of visual tokens produced by the vision encoder. The redundant visual tokens create a substantial computational load and a key-value (KV) cache footprint bottleneck. Existing approaches focus on token-wise optimization, leveraging diverse intricate token pruning techniques to eliminate non-crucial visual tokens. Nevertheless, these methods often unavoidably undermine the integrity of the KV cache, resulting in failures in long-text generation tasks. To this end, we conduct an in-depth investigation of the model's attention mechanism from a new perspective and find that attention in more than half of all decode layers is semantically similar. Based on this finding, we contend that the attention in certain layers can be streamlined by inheriting the attention from their preceding layers. Consequently, we propose Lazy Attention, an efficient attention mechanism that enables cross-layer sharing of similar attention patterns, reducing redundant layer-wise computation in attention. In Lazy Attention, we develop a novel layer-shared cache, Q Cache, tailored for MLLMs, which facilitates the reuse of queries across adjacent layers. In particular, Q Cache is lightweight and fully compatible with existing inference frameworks, including Flash Attention and KV cache. Additionally, our method is highly flexible: it is orthogonal to existing token-wise techniques and can be deployed independently or combined with token pruning approaches. Empirical evaluations on multiple benchmarks demonstrate that our method can reduce KV cache usage by over 35% and achieve a 1.5x throughput improvement while sacrificing only approximately 1% of performance on various MLLMs. Compared with SOTA token-wise methods, our technique achieves superior accuracy preservation.

TMLR Journal 2026 Journal Article

Revisit, Extend, and Enhance Hessian-Free Influence Functions

  • Ziao Yang
  • Han Yue
  • Jian Chen
  • Hongfu Liu

Influence functions serve as crucial tools for assessing sample influence. By employing the first-order Taylor expansion, sample influence can be estimated without the need for expensive model retraining. However, applying influence functions directly to deep models presents challenges, primarily due to the non-convex nature of the loss function and the large size of model parameters. This difficulty not only makes computing the inverse of the Hessian matrix costly but also renders it non-existent in some cases. In this paper, we revisit a Hessian-free method, which substitutes the inverse of the Hessian matrix with an identity matrix, and offer deeper insights into why this straightforward approximation method is effective. Furthermore, we extend its applications beyond measuring model utility to include considerations of fairness and robustness. Finally, we enhance this method through an ensemble strategy. To validate its effectiveness, we conduct experiments on synthetic data and extensive evaluations on noisy label detection, sample selection for large language model fine-tuning, and defense against adversarial attacks.
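
The Hessian-free approximation described above replaces H⁻¹ in the classic influence estimate -∇L(z_test)ᵀ H⁻¹ ∇L(z) with the identity, leaving a plain gradient dot product. A minimal sketch for a binary logistic-regression model, assuming NumPy; the model choice and the function names `logistic_grad` and `hessian_free_influence` are illustrative assumptions, and the paper's ensemble enhancement is not included.

```python
import numpy as np

def logistic_grad(w, x, y):
    """Gradient of the binary cross-entropy loss for one example, label y in {0, 1}."""
    p = 1.0 / (1.0 + np.exp(-x @ w))  # predicted probability of class 1
    return (p - y) * x

def hessian_free_influence(w, train_X, train_y, x_test, y_test):
    """Influence of each training point on the test loss with H^{-1} replaced by I:
        score_i = -grad L(z_test) . grad L(z_i)
    Negative scores: upweighting z_i would (to first order) lower the test loss."""
    g_test = logistic_grad(w, x_test, y_test)
    grads = np.stack([logistic_grad(w, x, y) for x, y in zip(train_X, train_y)])
    return -grads @ g_test
```

Sorting training points by score gives a crude helpful/harmful ranking, which is the kind of signal used for tasks such as flagging candidate noisy labels.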

AAAI Conference 2026 Conference Paper

Revisiting Cross-Architecture Distillation: Adaptive Dual-Teacher Transfer for Lightweight Video Models

  • Ying Peng
  • Hongsen Ye
  • Changxin Huang
  • Xiping Hu
  • Jian Chen
  • Runhao Zeng

Vision Transformers (ViTs) have achieved strong performance in video action recognition, but their high computational cost limits their practicality. Lightweight CNNs are more efficient but suffer from accuracy gaps. Cross-Architecture Knowledge Distillation (CAKD) addresses this by transferring knowledge from ViTs to CNNs, yet existing methods often struggle with architectural mismatch and overlook the value of stronger homogeneous CNN teachers. To tackle these challenges, we propose a Dual-Teacher Knowledge Distillation framework that leverages both a heterogeneous ViT teacher and a homogeneous CNN teacher to collaboratively guide a lightweight CNN student. We introduce two key components: (1) Discrepancy-Aware Teacher Weighting, which dynamically fuses the predictions from ViT and CNN teachers by assigning adaptive weights based on teacher confidence and prediction discrepancy with the student, enabling more informative and effective supervision; and (2) a Structure Discrepancy-Aware Distillation strategy, where the student learns the residual features between ViT and CNN teachers via a lightweight auxiliary branch, focusing on transferable architectural differences without mimicking all of ViT’s high-dimensional patterns. Extensive experiments on benchmarks including HMDB51, EPIC-KITCHENS-100, and Kinetics-400 demonstrate that our method consistently outperforms state-of-the-art distillation approaches, achieving notable performance improvements with a maximum accuracy gain of 5.95% on HMDB51.

JBHI Journal 2026 Journal Article

TinnitusLLM: A Multimodal Large Language Model Framework for Tinnitus Diagnosis Through EEG-fMRI Fusion Learning

  • Yipeng Du
  • Xiaohui Chen
  • Zewei Liu
  • Zhengwu Liu
  • Ngai Wong
  • Chi Zhang
  • Jian Chen
  • Zhiwei Ding

Accurate tinnitus diagnosis is crucial for enabling timely therapeutic intervention and longitudinal treatment monitoring. While non-invasive neuroimaging modalities, particularly electroencephalography (EEG) with millisecond temporal resolution and functional magnetic resonance imaging (fMRI) with millimeter spatial resolution, provide complementary neural features, existing diagnostic approaches remain constrained to unimodal analysis of EEG or fMRI data, inherently limiting diagnostic precision and clinical generalizability. This paper introduces TinnitusLLM, the first multimodal large language model (LLM) framework that synergistically integrates EEG and fMRI features for tinnitus diagnosis. To enable LLM-based interpretation of neural signals, this framework integrates three key components: (1) a neuroinspired positional encoding mechanism that injects neurophysiological priors into the embedding space, enabling neurologically grounded, dynamic positional mapping of EEG and fMRI tokens; (2) multimodal autoregressive pretraining on more than 500 hours of EEG and 250 hours of fMRI data to learn causally informed predictive representations; and (3) fine-tuning with a cross-modal, subject-invariant adversarial learning strategy that enforces subject-independent constraints in the shared cross-modal feature space, thereby substantially improving diagnostic robustness across subjects. We validate TinnitusLLM through comprehensive experiments on a rigorously collected multimodal dataset containing 20 participants. Quantitative evaluations demonstrate that TinnitusLLM achieves superior cross-subject diagnostic accuracy compared to the state-of-the-art baseline methods. These results underscore TinnitusLLM's potential as a clinically viable framework for objective tinnitus assessment through multimodal neural decoding.

IJCAI Conference 2025 Conference Paper

DERI: Cross-Modal ECG Representation Learning with Deep ECG-Report Interaction

  • Jian Chen
  • Xiaoru Dong
  • Wei Wang
  • Shaorui Zhou
  • Lequan Yu
  • Xiping Hu

Electrocardiogram (ECG) is widely used to diagnose cardiac conditions via deep learning methods. Although existing self-supervised learning (SSL) methods have achieved strong performance in learning representations for ECG-based cardiac condition classification, they cannot effectively capture clinical semantics. To overcome this limitation, we propose to learn cross-modal ECG representations that contain more clinical semantics via a novel framework with Deep ECG-Report Interaction (DERI). Specifically, we design a novel framework combining multiple alignments and mutual feature reconstructions to learn effective representations of the ECG with the clinical report, which fuses the clinical semantics of the report. An RME-Module inspired by masked modeling is proposed to improve ECG representation learning. Furthermore, we extend ECG representation learning to report generation with a language model, which is significant for evaluating clinical semantics in the learned representations and even for clinical applications. Comprehensive experiments with various settings are conducted on various datasets to show the superior performance of our DERI. Our code is released on https://github.com/cccccj-03/DERI.

ICML Conference 2025 Conference Paper

DexScale: Automating Data Scaling for Sim2Real Generalizable Robot Control

  • Guiliang Liu
  • Yueci Deng
  • Runyi Zhao
  • Huayi Zhou 0001
  • Jian Chen
  • Jietao Chen
  • Ruiyan Xu
  • Yunxin Tai

A critical prerequisite for achieving generalizable robot control is the availability of a large-scale robot training dataset. Due to the expense of collecting realistic robotic data, recent studies explored simulating and recording robot skills in virtual environments. While simulated data can be generated at higher speeds, lower costs, and larger scales, the applicability of such simulated data remains questionable due to the gap between simulated and realistic environments. To advance the Sim2Real generalization, in this study, we present DexScale, a data engine designed to perform automatic skills simulation and scaling for learning deployable robot manipulation policies. Specifically, DexScale ensures the usability of simulated skills by integrating diverse forms of realistic data into the simulated environment, preserving semantic alignment with the target tasks. For each simulated skill in the environment, DexScale facilitates effective Sim2Real data scaling by automating the process of domain randomization and adaptation. Tuned by the scaled dataset, the control policy achieves zero-shot Sim2Real generalization across diverse tasks, multiple robot embodiments, and widely studied policy model architectures, highlighting its importance in advancing Sim2Real embodied intelligence.

IJCAI Conference 2025 Conference Paper

ECG2TOK: ECG Pre-Training with Self-Distillation Semantic Tokenizers

  • Xiaoyan Yuan
  • Wei Wang
  • Han Liu
  • Jian Chen
  • Xiping Hu

Self-supervised learning (SSL) has garnered increasing attention in electrocardiogram (ECG) analysis for its effectiveness in resource-limited settings. Existing state-of-the-art SSL methods rely on time-frequency detail reconstruction, but due to the inherent redundancy of ECG signals and individual variability, these approaches often yield suboptimal performance. In contrast, discrete label prediction serves as a superior pre-training objective by encouraging models to efficiently abstract ECG high-level semantics. However, the continuity and significant variability of ECG signals pose a challenge in generating semantically discrete labels. To address this issue, we propose an ECG pretraining framework with a self-distillation semantic tokenizer (ECG2TOK), which maps continuous ECG signals into discrete labels for self-supervised training. Specifically, the tokenizer extracts semantically aware embeddings of ECG by self-distillation and performs online clustering to generate semantically rich discrete labels. Subsequently, the SSL model is trained in conjunction with masking strategies and discrete label prediction to facilitate the abstraction of high-level semantic representations. We evaluate ECG2TOK in six downstream tasks, demonstrating that ECG2TOK efficiently achieves state-of-the-art performance and up to a 30.73% AUC increase in low-resource scenarios. Moreover, visualization experiments demonstrate that the discrete labels generated by ECG2TOK exhibit consistent semantics closely associated with clinical features. Our code is available on https://github.com/YXYanova/ECG2TOK.

AAAI Conference 2025 Conference Paper

FedSum: Data-Efficient Federated Learning Under Data Scarcity Scenario for Text Summarization

  • Zhiyong Ma
  • Zhengping Li
  • Yuanjie Shi
  • Jian Chen

The text summarization task extracts salient information from a large amount of text to enhance productivity. However, most existing methods rely heavily on training models with ample, centrally stored data, which is infeasible to collect in practice due to privacy concerns and data scarcity in several settings (e.g., edge computing or cold starting). The main challenge lies in constructing a privacy-preserving and well-behaved summarization model under the data scarcity scenario, where data scarcity leads to a knowledge shortage in the model while magnifying the impact of data bias, causing performance degradation. To tackle this challenge, previous studies attempt to complement samples or improve the efficiency of data. The former is usually associated with high computing costs or depends heavily on empirical settings, while the latter might not be effective because it does not consider data bias. In this work, we propose FedSum, which extends the standard FL framework in depth and breadth to further extract prime and diversified knowledge from limited resources for text summarization. For depth extension, we introduce a Data Partition method to cooperatively recognize the prime samples that are more significant and unbiased, and a Data Skip mechanism is introduced to help the model further focus on those prime samples during local training. For breadth extension, FedSum extends the source of knowledge and develops the summarization model by extracting knowledge from the data samples, hidden spaces, and globally received parameters. Extensive experiments on four benchmark datasets verify the promising improvement of FedSum compared to baselines, and show its generalizability, scalability, and robustness.

EAAI Journal 2025 Journal Article

Hybrid and multiple ensemble metamodel-based evaluation for operating tunnel performance in three-dimensional spatially variable soils

  • Ning Tian
  • Jinsong Huang
  • Jian Chen
  • Kaiwei Tian
  • Peng Wu

In recent years, the Random Finite Element Method (RFEM) has gained prominence in geotechnical engineering for assessing the inherent spatial variability in the mechanical properties of both natural and processed soils. Nevertheless, RFEM often demands more extensive computational resources than deterministic finite element analysis, as it is coupled with Monte-Carlo simulations (MCS). To mitigate this computational burden, metamodeling techniques have emerged as a popular approach. This paper proposes a novel hybrid Support Vector Regression (SVR) metamodel fused with RFEM analysis. The metamodel can efficiently reproduce the quantities predicted by the original finite element method with limited training by utilizing input random field features, which encapsulate high-dimensional information pertaining to spatially variable soil stiffness parameters. Furthermore, based on ensemble learning, the Bagging and Adaboost algorithms were used to develop a multiple SVR (M-SVR) ensemble learning metamodel to enhance prediction reliability. Simultaneously, considering the limitation that machine learning prediction can only provide a single value, prediction results with confidence intervals based on Bagging ensemble algorithms were also developed to quantify the uncertainty of machine learning predictions in regression analysis. The consistency between SVR and M-SVR predictions and RFEM calculations is demonstrated through a problem involving the failure probability evaluation of tunnel longitudinal performance induced by ground surface surcharge in three-dimensional spatially variable soils. The substantial improvement in efficiency with the adoption of SVR and M-SVR, as compared to RFEM, underscores the immense potential of machine learning algorithms in conducting geotechnical reliability analyses involving spatial variability.

EAAI Journal 2025 Journal Article

Identification of zinc stripping defects from cathode plate based on deep learning

  • Tao Liu
  • Yibin Liu
  • Jian Chen
  • Jin Gong

During hydro-zinc smelting, cathode plates may retain residual zinc or be discarded because of damaged insulation strips and edging strips. Such defects limit the recycling of cathode plates. Current manual observation yields low recognition accuracy and speed owing to perception biases. Therefore, this work applied computer vision and deep learning semantic segmentation technology to realize defect recognition on cathode plates. First, a semantic segmentation dataset of cathode plates was constructed for training and testing the model. Then a network with an attention mechanism and multiscale feature fusion (AMNet) was proposed to detect the defects. In AMNet, an encoder-decoder architecture with skip connections was designed to fuse low-level and high-level features. A channel attention module was incorporated to focus on the channels carrying important information, and a newly proposed multiscale feature extraction module was used to capture targets at multiple scales. After parameter selection experiments, the final AMNet achieved 95.12% Mean Intersection over Union (MIoU) and 97.73% mean pixel accuracy (MPA). These values are 3.24 and 1.74 percentage points higher than those of DeepLabv3+.
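
MIoU and MPA are standard segmentation metrics computed from a per-class confusion matrix. A minimal NumPy sketch follows; the function name `segmentation_metrics` is hypothetical, and classes whose ratio is undefined (zero denominator) are skipped via `nanmean`, which is one common convention among several.

```python
import numpy as np

def segmentation_metrics(y_true, y_pred, num_classes):
    """Return (MIoU, MPA) for integer label maps of identical shape."""
    t = np.asarray(y_true, dtype=np.int64).ravel()
    p = np.asarray(y_pred, dtype=np.int64).ravel()
    # Confusion matrix: rows index the ground-truth class, columns the predicted class.
    cm = np.bincount(num_classes * t + p, minlength=num_classes ** 2)
    cm = cm.reshape(num_classes, num_classes).astype(float)
    tp = np.diag(cm)
    with np.errstate(divide="ignore", invalid="ignore"):
        iou = tp / (cm.sum(axis=0) + cm.sum(axis=1) - tp)  # TP / (TP + FP + FN)
        pixel_acc = tp / cm.sum(axis=1)                     # TP / (TP + FN), per class
    return float(np.nanmean(iou)), float(np.nanmean(pixel_acc))
```

Because MIoU also penalizes false positives while MPA does not, MIoU is typically the lower of the two, as in the figures reported above.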

AAAI Conference 2025 Conference Paper

Learning to Generate Gradients for Test-Time Adaptation via Test-Time Training Layers

  • Qi Deng
  • Shuaicheng Niu
  • Ronghao Zhang
  • Yaofo Chen
  • Runhao Zeng
  • Jian Chen
  • Xiping Hu

Test-time adaptation (TTA) aims to fine-tune a trained model online using unlabeled testing data to adapt to new environments or out-of-distribution data, demonstrating broad application potential in real-world scenarios. However, in this optimization process, unsupervised learning objectives like entropy minimization frequently encounter noisy learning signals. These signals produce unreliable gradients, which hinder the model’s ability to converge to an optimal solution quickly and introduce significant instability into the optimization process. In this paper, we seek to resolve these issues from the perspective of optimizer design. Unlike prior TTA methods that use manually designed optimizers such as SGD, we employ a learning-to-optimize approach to automatically learn an optimizer, called Meta Gradient Generator (MGG). Specifically, we aim for MGG to effectively utilize historical gradient information during the online optimization process to optimize the current model. To this end, in MGG, we design a lightweight and efficient sequence modeling layer, the gradient memory layer. It exploits a self-supervised reconstruction loss to compress historical gradient information into network parameters, thereby enabling better memorization ability over a long-term adaptation process. We only need a small number of unlabeled samples to pre-train MGG, and then the trained MGG can be deployed to process unseen samples. Promising results on ImageNet-C/R/Sketch/A indicate that our method surpasses current state-of-the-art methods with fewer updates, less data, and significantly shorter adaptation times. Compared with SAR, a previous SOTA method, we achieve a 7.4% accuracy improvement and 4.2x faster adaptation on ImageNet-C.

ICML Conference 2025 Conference Paper

Learning with Selectively Labeled Data from Multiple Decision-makers

  • Jian Chen
  • Zhehao Li
  • Xiaojie Mao

We study the problem of classification with selectively labeled data, whose distribution may differ from the full population due to historical decision-making. We exploit the fact that in many applications historical decisions were made by multiple decision-makers, each with different decision rules. We analyze this setup under a principled instrumental variable (IV) framework and rigorously study the identification of classification risk. We establish conditions for the exact identification of classification risk and derive tight partial identification bounds when exact identification fails. We further propose a unified cost-sensitive learning (UCL) approach to learn classifiers robust to selection bias in both identification settings. Finally, we theoretically and numerically validate the efficacy of our proposed method.

ICLR Conference 2025 Conference Paper

MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding

  • Ranajoy Sadhukhan
  • Jian Chen
  • Zhuoming Chen
  • Vashisth Tiwari
  • Ruihang Lai
  • Jinyuan Shi
  • Ian En-Hsu Yen
  • Avner May

Large Language Models (LLMs) have become more prevalent in long-context applications such as interactive chatbots, document analysis, and agent workflows, but it is challenging to serve long-context requests with both low latency and high throughput. Speculative decoding (SD) is a widely used technique to reduce latency losslessly, but conventional wisdom suggests that its efficacy is limited to small batch sizes. In MagicDec, we show that, surprisingly, SD can achieve a speedup even in a high-throughput inference regime for moderate to long sequences. More interestingly, our rigorous analysis shows that an intelligent drafting strategy can achieve a better speedup as the batch size increases. MagicDec first identifies how the bottleneck shifts with increasing batch size and sequence length, and uses these insights to deploy SD more effectively for high-throughput inference. We leverage a draft model with a sparse KV cache to address the KV bottleneck, which scales with both sequence length and batch size. Additionally, we propose a theoretical model to select the optimal drafting strategy for maximum speedup. Our work highlights the broad applicability of speculative decoding in long-context serving, as it can enhance throughput and reduce latency without compromising accuracy. For moderate to long sequences, we demonstrate up to 2.51x speedup for LLaMA-3.1-8B when serving batch sizes ranging from 32 to 256 on various types of hardware and tasks.
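The original speculative decoding analysis provides useful background for reasoning about SD speedups. It assumes that each drafted token is accepted independently with probability α. With γ drafted tokens per step, the expected number of tokens produced per target-model verification is then (1 − α^(γ+1)) / (1 − α). The sketch below evaluates only that textbook formula, and the function name is hypothetical. It is not MagicDec's theoretical model, which also accounts for how KV-cache and compute bottlenecks shift with batch size and sequence length.

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens generated per verification step, assuming each of the
    gamma drafted tokens is accepted i.i.d. with probability alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if alpha == 1.0:
        # Every draft is accepted, plus one token from the target model.
        return float(gamma + 1)
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)

# Diminishing returns as the draft length grows, for a fixed acceptance rate.
for gamma in (1, 2, 4, 8):
    print(gamma, round(expected_tokens_per_step(0.8, gamma), 3))
```

Longer drafts give diminishing returns while still costing draft compute. This is one reason the drafting strategy has to be chosen with the serving regime in mind.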

JBHI Journal 2025 Journal Article

Multi-source Signal Fusion with Contrastive AutoEncoder for Emotion Classification

  • Shen Zhao
  • Yuzhu Hu
  • Jian Chen
  • Wei Wang
  • Xiping Hu

Emotion recognition is of great importance for human-computer interaction. Emotion recognition technology based on physiological signals has shown great potential because of its strong objectivity and real-time capability. One of the most challenging tasks in this field is how to better fuse multi-source signals to extract information as comprehensively as possible. We propose a new framework for multi-source signal fusion and emotion recognition to address key challenges in feature alignment and representation learning. First, to reduce the distance between multi-source homogeneous signals in the feature space, we design a novel Contrastive Pairs AutoEncoder (CPAE), which aligns features before the signals obtained from the Dual-LSTM are aggregated. We also design a cross-modal frequency module (CMF-Module) that uses a multi-layer perceptron (MLP) to learn the real and imaginary components of the signal's frequency representation and integrates a Resblock to achieve dual-channel time-domain and frequency-domain feature extraction. Furthermore, we incorporate the hidden ordinal relationships among emotional categories into the feature space through a regression loss, and constrain the feature distribution using the Wasserstein distance. Experiments on public datasets show that our proposed method achieves the best performance compared with the baselines. We also conduct ablation studies to further verify the effectiveness of the proposed method.
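The Wasserstein distance mentioned above has a simple closed form in one dimension. For two empirical samples of equal size, the Wasserstein-1 distance is the mean absolute difference between their sorted values. The sketch below covers only this 1-D, equal-size special case, and the function name is hypothetical. Constraining multi-dimensional feature distributions, as in the paper, would need an optimal-transport solver or an approximation instead.

```python
def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two equal-size 1-D empirical distributions."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    a, b = sorted(xs), sorted(ys)
    # Optimal transport in 1-D matches the i-th smallest value to the i-th smallest value.
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

print(wasserstein_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]))  # 1.0
```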

AAAI Conference 2025 Conference Paper

Restabilizing Diffusion Models with Predictive Noise Fusion Strategy for Image Super-Resolution

  • Luoqian Jiang
  • Yong Guo
  • Bingna Xu
  • Haolin Pan
  • Jiezhang Cao
  • Wenbo Li
  • Jian Chen

Diffusion models are prominent in image generation for producing detailed and realistic images from Gaussian noise. However, they often encounter instability issues in image restoration tasks, e.g., super-resolution. Existing methods typically rely on multiple runs to find an initial noise that produces a reasonably restored image. Unfortunately, these methods are computationally expensive and time-consuming, and they do not guarantee stable and consistent performance. To address these challenges, we propose a novel Predictive Noise Fusion Strategy (PNFS) that predicts pixel-wise errors in the restored image and combines different noises to generate a more effective noise. Extensive experiments show that PNFS significantly improves the stability and performance of diffusion models in super-resolution, both quantitatively and qualitatively. Furthermore, PNFS can be flexibly integrated into various diffusion models to enhance their stability.

AAAI Conference 2025 Conference Paper

ST3: Accelerating Multimodal Large Language Model by Spatial-Temporal Visual Token Trimming

  • Jiedong Zhuang
  • Lu Lu
  • Ming Dai
  • Rui Hu
  • Jian Chen
  • Qiang Liu
  • Haoji Hu

Multimodal large language models (MLLMs) enhance their perceptual capabilities by integrating visual and textual information. However, processing the massive number of visual tokens incurs a significant computational cost. Existing analyses of MLLM attention mechanisms remain shallow, leading to coarse-grained token pruning strategies that fail to effectively balance speed and accuracy. In this paper, we conduct a comprehensive investigation of MLLM attention mechanisms with LLaVA. We find that numerous visual tokens and partial attention computations are redundant during the decoding process. Based on this insight, we propose Spatial-Temporal Visual Token Trimming (ST3), a framework designed to accelerate MLLM inference without retraining. ST3 consists of two primary components: 1) Progressive Visual Token Pruning (PVTP), which eliminates inattentive visual tokens across layers, and 2) Visual Token Annealing (VTA), which dynamically reduces the number of visual tokens in each layer as the number of generated tokens grows. Together, these techniques deliver around 2x faster inference with only about 30% of the KV cache memory of the original LLaVA, while maintaining consistent performance across various datasets. Crucially, ST3 can be seamlessly integrated into existing pre-trained MLLMs, providing a plug-and-play solution for efficient inference.
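Attention-based visual token pruning can be sketched as follows. At each layer, keep the visual tokens that receive the most attention, using a keep ratio that shrinks with depth. Several parts of the sketch are illustrative assumptions and not the exact PVTP or VTA rules:

- the function names;
- the linear depth schedule;
- the final ratio of 0.3;
- the use of a single precomputed attention score per token.

```python
def keep_ratio(layer: int, num_layers: int, final_ratio: float = 0.3) -> float:
    """Hypothetical linear schedule: keep all tokens at layer 0, final_ratio at the last layer."""
    if num_layers <= 1:
        return 1.0
    t = layer / (num_layers - 1)
    return 1.0 - t * (1.0 - final_ratio)

def prune_tokens(token_ids, attn_scores, ratio):
    """Keep the top `ratio` fraction of tokens by attention score, preserving original order."""
    k = max(1, round(len(token_ids) * ratio))
    ranked = sorted(range(len(token_ids)), key=lambda i: attn_scores[i], reverse=True)
    kept = sorted(ranked[:k])  # restore positional order for the surviving tokens
    return [token_ids[i] for i in kept]

tokens = ["v0", "v1", "v2", "v3", "v4"]
scores = [0.05, 0.40, 0.10, 0.30, 0.15]  # in practice, taken from the model's attention maps
print(prune_tokens(tokens, scores, 0.4))  # ['v1', 'v3']
```

Preserving the original order of the kept tokens keeps their positional relationships intact for the remaining layers.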

TAAS Journal 2025 Journal Article

Vehicle Dynamics and Interaction for Trajectory Prediction and Traffic Control

  • Jian Chen
  • Shaorui Zhou
  • Wei Wang
  • Yuzhu Hu
  • Jianqing Li
  • Ben-guo He
  • Junxin Chen
  • Marwan Omar

Trajectory prediction is a crucial challenge in autonomous vehicle motion planning and decision-making. However, existing methods are limited in their ability to accurately capture vehicle dynamics and interactions. To address this issue, this article proposes a novel approach to extracting vehicle velocity and acceleration, enabling the model to learn vehicle dynamics and encode them as auxiliary information. The proposed VDI-LSTM model incorporates graph convolution and attention mechanisms to capture vehicle interactions from trajectory data and dynamic information. Specifically, a dynamics encoder captures the dynamic information, a dynamic graph represents vehicle interactions, and an attention mechanism is introduced to enhance the performance of the LSTM and graph convolution. To demonstrate the effectiveness of our model, extensive experiments are conducted on real-world highway datasets, including comparisons with several baselines and ablation studies. Experimental results show that VDI-LSTM outperforms the compared baselines, obtaining a 3% improvement in average RMSE over the five prediction steps.
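Velocity and acceleration can be derived from sampled positions by finite differences. The sketch below assumes evenly spaced 1-D positions with time step `dt` and uses forward differences between consecutive samples. The function name is hypothetical, and this is a generic illustration rather than the paper's exact extraction procedure. For 2-D trajectories, apply it to each coordinate separately.

```python
def finite_differences(positions, dt):
    """Return (velocities, accelerations) from evenly spaced positions via first differences."""
    if dt <= 0:
        raise ValueError("dt must be positive")
    velocities = [(b - a) / dt for a, b in zip(positions, positions[1:])]
    accelerations = [(b - a) / dt for a, b in zip(velocities, velocities[1:])]
    return velocities, accelerations

v, a = finite_differences([0.0, 1.0, 4.0, 9.0], dt=1.0)
print(v, a)  # [1.0, 3.0, 5.0] [2.0, 2.0]
```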

NeurIPS Conference 2024 Conference Paper

A probability contrastive learning framework for 3D molecular representation learning

  • Jiayu Qin
  • Jian Chen
  • Rohan Sharma
  • Jingchen Sun
  • Changyou Chen

Contrastive Learning (CL) plays a crucial role in molecular representation learning, enabling unsupervised learning from large-scale unlabeled molecule datasets. It has inspired various applications in molecular property prediction and drug design. However, existing molecular representation learning methods often introduce potential false positive and false negative pairs through conventional graph augmentations such as node masking and subgraph removal. This issue can lead to suboptimal performance when standard contrastive learning techniques are applied to molecular datasets. To address the issue of false positive and negative pairs in molecular representation learning, we propose a novel probability-based CL framework. Unlike conventional methods, our approach introduces a learnable weight distribution via Bayesian modeling to automatically identify and mitigate false positive and negative pairs. This method is particularly effective because it dynamically adjusts to the data, improving the accuracy of the learned representations. Our model is learned by a stochastic expectation-maximization process, which optimizes the model by iteratively refining the probability estimates of sample weights and updating the model parameters. Experimental results indicate that our method outperforms existing approaches in 13 out of 15 molecular property prediction benchmarks in the MoleculeNet dataset and 8 out of 12 benchmarks in the QM9 benchmark, achieving new state-of-the-art results on average.
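A weighted variant of the InfoNCE contrastive loss shows how per-pair weights can suppress suspected false negatives. In this sketch the weights are fixed inputs, and the function name and default temperature are hypothetical. The paper instead learns a weight distribution via Bayesian modeling and stochastic EM, and it also handles false positives, which this example does not.

```python
import math

def weighted_info_nce(pos_sim, neg_sims, neg_weights, temperature=0.1):
    """InfoNCE loss for one anchor, with a weight in [0, 1] on each negative pair."""
    if len(neg_sims) != len(neg_weights):
        raise ValueError("one weight per negative is required")
    logits = [pos_sim / temperature] + [s / temperature for s in neg_sims]
    m = max(logits)  # subtract the max for numerical stability
    pos_term = math.exp(logits[0] - m)
    neg_term = sum(w * math.exp(z - m) for w, z in zip(neg_weights, logits[1:]))
    return -math.log(pos_term / (pos_term + neg_term))

# Down-weighting a likely false negative (weight 0.2) reduces its pull on the loss.
print(weighted_info_nce(0.5, [0.9], [0.2]) < weighted_info_nce(0.5, [0.9], [1.0]))  # True
```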

EAAI Journal 2024 Journal Article

Adversarial robust decision-making under uncertainty learning and dynamic ensemble selection

  • Ruoxi Qin
  • Linyuan Wang
  • Xuehui Du
  • Jian Chen
  • Xingyuan Chen
  • Bin Yan

As adversarial robustness research on deep neural networks has struggled in the attack-and-defense game with static defense methodologies, researchers have introduced the dynamic idea of systems control to move away from a passive defense position through adaptive decision-making. According to the level at which dynamism acts on neural networks, dynamic defense methods can be divided into two main categories: dynamic feedback control at the input level and uncertainty-estimation detection at the decision level. Although both methods aim to hinder the attacker's success, they cannot fully disrupt the conditions for constructing black-box attacks because they ignore the positive role of dynamics in defense at the model level. Inspired by conventional ensemble selection techniques in machine learning, which treat different models as mutable objects to improve accuracy on uncertain data, this work investigates the robustness issue from a new dynamic aspect: model-level dynamic defense, whether the dynamic attributes depend on the input or on the decision. Specifically, a Dirichlet prior combined with a diversity constraint is imposed on the ensemble parameters in the training phase to construct the selection criterion and candidate sub-models. The final prediction of the ensemble can then be strategically selected through the ranking of the sub-models’ uncertainty values for robust decision-making in the test phase. The experimental results indicate a comprehensive improvement in robustness (at least 4.17% under black-box attack conditions and at least 1.78% under high-disturbance white-box attack budgets) of the proposed method compared with common dynamic and static defense methods.
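Evidential deep learning offers one common way to score uncertainty from Dirichlet parameters: u = K / S, where K is the number of classes and S is the sum of the Dirichlet concentration parameters. The sketch below assumes that each sub-model outputs such parameters, and its data layout and names are hypothetical. It selects the prediction of the least uncertain sub-model. The paper's actual uncertainty measure and selection criterion may differ.

```python
def dirichlet_uncertainty(alphas):
    """Evidential uncertainty u = K / S for Dirichlet parameters (all > 0)."""
    return len(alphas) / sum(alphas)

def select_prediction(submodel_alphas):
    """Return (index of least-uncertain sub-model, its predicted class)."""
    best = min(range(len(submodel_alphas)),
               key=lambda i: dirichlet_uncertainty(submodel_alphas[i]))
    alphas = submodel_alphas[best]
    predicted = max(range(len(alphas)), key=lambda c: alphas[c])
    return best, predicted

# Three hypothetical sub-models; the second has the most evidence, hence lowest u.
outputs = [[2.0, 2.0, 2.0], [9.0, 1.0, 1.0], [1.0, 4.0, 1.0]]
print(select_prediction(outputs))  # (1, 0)
```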

JBHI Journal 2024 Journal Article

An Ensemble Classification Model for Depression Based on Wearable Device Sleep Data

  • Yuzhu Hu
  • Jian Chen
  • Junxin Chen
  • Wei Wang
  • Shen Zhao
  • Xiping Hu

Depression is one of the most common mental disorders, with sleep disturbances as typical symptoms. With the increasing popularity of wearable devices in recent years, more and more people wear portable devices to track their sleep quality. Based on this, we believe that depression detection through wearable sleep data is more intelligent and economical. However, most wearable devices face the problem of missing data during data collection. Moreover, most existing studies of depression identification focus on the use of complex data, making them difficult to generalize and susceptible to noise interference. To address these issues, we propose a systematic ensemble classification model for depression (ECD). For the missing data problem of wearable devices, we design an improved GAIN method to further control the generation range of interpolated values, which achieves a more reasonable treatment of missing values. Compared with the original GAIN approach, the improved method shows a 28.56% improvement when using MAE as the metric. For depression recognition, we use ensemble learning to construct a depression classification model that combines five classification models: SVM, KNN, LR, CBR, and DT. Ensemble learning can improve the model's robustness and generalization. A voting mechanism is used in several places to improve noise immunity. The final classification model performed well on the dataset, with a precision of 92.55% and a recall of 91.89%. These results illustrate how efficient this method is at automatically detecting depression.
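The voting step can be illustrated with hard majority voting over the labels predicted by the five base classifiers. This sketch uses hypothetical names and example votes. It shows only plain majority voting and not the paper's full pipeline, which applies voting in several places. Ties go to the label encountered first, following `Counter.most_common` ordering.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most common label; ties go to the label encountered first."""
    if not predictions:
        raise ValueError("need at least one prediction")
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical per-classifier outputs for one subject (1 = depressed, 0 = not).
votes = {"SVM": 1, "KNN": 0, "LR": 1, "CBR": 1, "DT": 0}
print(majority_vote(list(votes.values())))  # 1
```

An odd number of binary voters, as with five classifiers here, avoids ties entirely.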

NeurIPS Conference 2024 Conference Paper

BECAUSE: Bilinear Causal Representation for Generalizable Offline Model-based Reinforcement Learning

  • Haohong Lin
  • Wenhao Ding
  • Jian Chen
  • Laixi Shi
  • Jiacheng Zhu
  • Bo Li
  • Ding Zhao

Offline model-based reinforcement learning (MBRL) enhances data efficiency by utilizing pre-collected datasets to learn models and policies, especially in scenarios where exploration is costly or infeasible. Nevertheless, its performance often suffers from the objective mismatch between model and policy learning, resulting in inferior performance despite accurate model predictions. This paper first identifies that the primary source of this mismatch is the underlying confounders present in offline data for MBRL. Subsequently, we introduce BilinEar CAUSal rEpresentation (BECAUSE), an algorithm that captures causal representations of both states and actions to reduce the influence of the distribution shift, thus mitigating the objective mismatch problem. Comprehensive evaluations on 18 tasks that vary in data quality and environment context demonstrate the superior performance of BECAUSE over existing offline RL algorithms. We show the generalizability and robustness of BECAUSE with fewer samples or larger numbers of confounders. Additionally, we offer a theoretical analysis of BECAUSE to prove its error bound and sample efficiency when integrating causal representation into offline MBRL. See more details on our project page: https://sites.google.com/view/be-cause.

JBHI Journal 2024 Journal Article

Image Recovery Matters: A Recovery-Extraction Framework for Robust Fetal Brain Extraction From MR Images

  • Jian Chen
  • Ranlin Lu
  • Shilin Ye
  • Mengting Guang
  • Tewodros Megabiaw Tassew
  • Bin Jing
  • Guofu Zhang
  • Geng Chen

The extraction of the fetal brain from magnetic resonance (MR) images is a challenging task. In particular, fetal MR images suffer from different kinds of artifacts introduced during image acquisition. Among those artifacts, intensity inhomogeneity is a common one affecting brain extraction. In this work, we propose a deep learning-based recovery-extraction framework for fetal brain extraction, which is particularly effective in handling fetal MR images with intensity inhomogeneity. Our framework involves two stages. First, the artifact-corrupted images are recovered by the proposed generative adversarial learning-based image recovery network, whose novel region-of-darkness discriminator forces the network to focus on the artifacts in the images. Second, we propose a brain extraction network for more effective fetal brain segmentation that strengthens the association between lower- and higher-level features and suppresses task-irrelevant features. Thanks to the proposed recovery-extraction strategy, our framework is able to accurately segment fetal brains from artifact-corrupted MR images. The experiments show that our framework achieves promising performance in both quantitative and qualitative evaluations, and outperforms state-of-the-art methods in both image recovery and fetal brain extraction.

YNIMG Journal 2024 Journal Article

Multimodal investigation of dynamic brain network alterations in autism spectrum disorder: Linking connectivity dynamics to symptoms and developmental trajectories

  • Lin Wan
  • Yuhang Li
  • Gang Zhu
  • Dalin Yang
  • Fali Li
  • Wen Wang
  • Jian Chen
  • Guang Yang

BACKGROUND: Autism spectrum disorder (ASD) has been associated with disrupted brain connectivity, yet a comprehensive understanding of the dynamic neural underpinnings remains lacking. This study employed concurrent electroencephalography (EEG) and functional near-infrared spectroscopy (fNIRS) techniques to investigate dynamic functional connectivity (dFC) patterns and neurovascular characteristics in children with ASD. We also explored associations between neurovascular characteristics and the developmental trajectory of adaptive behavior in individuals with ASD. METHODS: Resting-state EEG and fNIRS data were simultaneously recorded from 58 children with ASD and 63 typically developing (TD) children. We implemented a k-means clustering approach to extract the dFC states for each modality. In addition, a multimodal covariance network (MCN) was constructed from the EEG and fNIRS dFC features to capture the neurovascular characteristics linked to ASD. RESULTS: EEG analyses revealed atypical properties of dFC states in the beta and gamma bands in children with ASD compared to TD children. For fNIRS, the ASD group exhibited atypical properties of dFC states such as duration and transitions relative to the TD group. The MCN analysis revealed significantly suppressed functional covariance between right superior temporal and left Broca's areas, alongside enhanced right dorsolateral prefrontal-left Broca covariance in ASD. Notably, we found that early neurovascular characteristics can predict the developmental progress of adaptive functioning in ASD. CONCLUSION: The multimodal investigation revealed distinct dFC patterns and neurovascular characteristics associated with ASD, elucidating potential neural mechanisms underlying core symptoms and their developmental trajectories. Our study highlights that integrating complementary neuroimaging modalities may aid in unraveling the complex neurobiology of ASD.

JAIR Journal 2023 Journal Article

Decentralized Gradient-Quantization Based Matrix Factorization for Fast Privacy-Preserving Point-of-Interest Recommendation

  • Xuebin Zhou
  • Zhibin Hu
  • Jin Huang
  • Jian Chen

With the rapid growth of location-based social networks, point-of-interest (POI) recommendation has been attracting tremendous attention. Previous works on POI recommendation usually use matrix factorization (MF)-based methods, which achieve promising performance. However, existing MF-based methods suffer from two critical limitations: (1) Privacy issues: all users’ sensitive data are collected on a centralized server, where they may leak either on the server side or during transmission. (2) Poor resource utilization and training efficiency: training potentially huge low-rank matrices on a centralized server is computationally inefficient. In this paper, we propose a novel decentralized gradient-quantization based matrix factorization (DGMF) framework to address the above limitations in POI recommendation. Unlike centralized MF methods, which store all sensitive data and low-rank matrices during model training, DGMF treats each user’s device (e.g., phone) as an independent learner and keeps the sensitive data on the user’s end. Furthermore, a privacy-preserving and communication-efficient mechanism based on gradient quantization is presented to train the proposed model, which aims to handle the privacy problem and reduce the communication cost in the decentralized setting. Theoretical guarantees for the proposed algorithm and experimental studies on real-world datasets demonstrate its effectiveness.
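A minimal sketch can show why gradient quantization cuts communication. It assumes a simple deterministic uniform scheme. Each component is divided by the vector's maximum magnitude and rounded to one of 2·levels + 1 integer levels, and the vector is sent as small integers plus one float scale. The names and the scheme are illustrative assumptions. DGMF's actual mechanism, including any stochastic rounding or privacy-related steps, may differ.

```python
def quantize(grad, levels):
    """Map each component to an integer in [-levels, levels]; return (ints, scale)."""
    scale = max((abs(g) for g in grad), default=0.0)
    if scale == 0.0:
        return [0] * len(grad), 0.0
    return [round(g / scale * levels) for g in grad], scale

def dequantize(q, scale, levels):
    """Reconstruct an approximate gradient from the quantized integers."""
    return [v * scale / levels for v in q]

g = [0.5, -1.0, 0.25, 0.1]
q, s = quantize(g, levels=4)
print(q)  # [2, -4, 1, 0]
print(dequantize(q, s, levels=4))  # [0.5, -1.0, 0.25, 0.0]
```

Each reconstructed component lies within scale / (2 · levels) of the original. Raising `levels` trades bandwidth for accuracy.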

EAAI Journal 2023 Journal Article

Discrete limited attentional collaborative filtering for fast social recommendation

  • Zhibin Hu
  • Xuebin Zhou
  • Zhiwei He
  • Zehang Yang
  • Jian Chen
  • Jin Huang

Over the last few years, social recommendation has attracted tremendous attention due to ever-growing online social platforms such as Twitter and Facebook. However, as the number of users increases rapidly, recommendation efficiency has become the bottleneck of many existing social recommender systems due to the computation and storage of real-valued models. To address this efficiency problem, recent studies introduce hashing techniques into social recommender systems. By mapping real values to discrete values, they speed up computation and reduce storage cost. Nevertheless, these methods suffer from two critical limitations: (1) The inevitable quantization loss introduced by the hash function decreases recommendation accuracy to a certain extent. (2) The original social relations contain massive noise that may lead to sub-optimal recommendation accuracy, because these methods do not consider that people can only pay attention to a small number of their friends. To tackle these limitations and achieve a better tradeoff between accuracy and efficiency, we propose a novel social recommendation method called Discrete Limited Attentional Collaborative Filtering (DLACF), which models the recommendation objective with limited attention as a constrained mixed-integer optimization problem. Since the original problem is NP-hard, we further devise a computationally efficient optimization algorithm to learn the binary codes and to estimate the most influential friends. Experimental results on two real-world datasets demonstrate the effectiveness of our proposed model, which achieves average improvements of 118.7% and 54.7% compared with state-of-the-art discrete methods.
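Hashing-based recommenders score user-item pairs with binary codes. For codes in {−1, +1}^r, the inner product equals r − 2·(Hamming distance). Ranking by inner product is therefore the same as ranking by Hamming distance, which XOR and a popcount compute cheaply. The sketch below illustrates this identity with sign-quantized vectors and hypothetical names. It is not DLACF's learned codes or its optimization algorithm.

```python
def to_binary_code(vec):
    """Sign quantization: map a real vector to a {-1, +1} code (0 maps to +1)."""
    return [1 if x >= 0 else -1 for x in vec]

def pack(code):
    """Pack a {-1, +1} code into an int: bit i is set when code[i] == +1."""
    return sum(1 << i for i, c in enumerate(code) if c == 1)

def hamming_packed(a_bits, b_bits):
    """Hamming distance between packed codes via XOR and a popcount."""
    return bin(a_bits ^ b_bits).count("1")

def score(user_code, item_code):
    """Inner product of two equal-length {-1, +1} codes, computed from the Hamming distance."""
    r = len(user_code)
    return r - 2 * hamming_packed(pack(user_code), pack(item_code))

user = to_binary_code([0.3, -1.2, 0.8, -0.1])
item = to_binary_code([0.5, 0.4, 0.9, -2.0])
print(score(user, item))  # 2, equal to the explicit inner product
```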

YNICL Journal 2023 Journal Article

Examining post-concussion white matter change in a pediatric sample

  • Michael Takagi
  • Gareth Ball
  • Franz E. Babl
  • Nicholas Anderson
  • Jian Chen
  • Cathriona Clarke
  • Gavin A. Davis
  • Stephen J.C. Hearps

Diffusion-Weighted Imaging (DWI) is increasingly used to explore a range of outcomes in pediatric concussion, particularly the neurobiological underpinnings of symptom recovery. However, DWI findings within the broader pediatric concussion literature are mixed, which can largely be explained by methodological heterogeneity. To address some of these limitations, the present study aimed to use internationally recognized criteria for concussion and a consistent imaging timepoint to conduct a comprehensive, multi-parametric survey of white matter microstructure after concussion. Forty-three children presenting with concussion to the emergency department of a tertiary-level pediatric hospital underwent neuroimaging and were classified as either normally recovering (n = 27) or delayed recovering (n = 14) based on their post-concussion symptoms at 2 weeks post-injury. We combined multiple DWI metrics across four modeling approaches using Linked Independent Component Analysis (LICA) to extract several independent patterns of covariation in tissue microstructure present in the study cohort. Our analysis did not identify significant differences between the symptomatic and asymptomatic groups, and no component significantly predicted delayed recovery. If white matter microstructure changes are implicated in delayed recovery from concussion, these findings, alongside previous work, suggest that current diffusion techniques are insufficient to detect those changes at this time.

NeurIPS Conference 2023 Conference Paper

FGPrompt: Fine-grained Goal Prompting for Image-goal Navigation

  • Xinyu Sun
  • Peihao Chen
  • Jugang Fan
  • Jian Chen
  • Thomas Li
  • Mingkui Tan

Learning to navigate to an image-specified goal is an important but challenging task for autonomous systems such as household robots. The agent is required to understand and reason about the location of the navigation goal from a picture taken at the goal position. Existing methods try to solve this problem by learning a navigation policy that captures semantic features of the goal image and the observation image independently and then fuses them to predict a sequence of navigation actions. However, these methods suffer from two major limitations. 1) They may miss detailed information in the goal image and thus fail to reason about the goal location. 2) More critically, it is hard for them to focus on the goal-relevant regions in the observation image, because they attempt to understand the observation without goal conditioning. In this paper, we aim to overcome these limitations by designing a Fine-grained Goal Prompting (FGPrompt) method for image-goal navigation. In particular, we leverage fine-grained and high-resolution feature maps in the goal image as prompts to perform conditioned embedding, which preserves detailed information in the goal image and guides the observation encoder to pay attention to goal-relevant regions. Compared with existing methods on the image-goal navigation benchmark, our method brings significant performance improvements on 3 benchmark datasets (i.e., Gibson, MP3D, and HM3D). Especially on Gibson, we surpass the state-of-the-art success rate by 8% with only 1/50 of the model size.

IROS Conference 2023 Conference Paper

Hierarchical Attention Network for Planning-Informed Multi-Agent Trajectory Prediction

  • Wenyi Xiong
  • Jian Chen
  • Xinfang Zhang
  • Qi Wang
  • Ziheng Qi

Accurate prediction of neighboring vehicles' trajectories affects the safety of autonomous vehicles. However, it is challenging for existing methods to anticipate the trajectories of nearby vehicles due to the uncertainty of driving behaviors and the complex interaction patterns of traffic flows. In this study, we incorporate the planning information of the ego vehicle and propose a novel trajectory prediction approach based on a hierarchical attention mechanism. First, a spatio-temporal attention module is presented to extract the social interactions of surrounding vehicles and capture the temporal dependence of consecutive frames of historical information and planning information. Then, a hard-soft attention module is designed to perform two tasks: weighing the importance of both historical and future information, and learning different location information about the target vehicles. Our method is evaluated on two national highway datasets. The experimental results show that our algorithm achieves state-of-the-art performance.

NeurIPS Conference 2023 Conference Paper

Label-Retrieval-Augmented Diffusion Models for Learning from Noisy Labels

  • Jian Chen
  • Ruiyi Zhang
  • Tong Yu
  • Rohan Sharma
  • Zhiqiang Xu
  • Tong Sun
  • Changyou Chen

Learning from noisy labels is an important and long-standing problem in machine learning for real applications. One of the main research lines focuses on learning a label corrector to purify potentially noisy labels. However, these methods typically rely on strict assumptions and are limited to certain types of label noise. In this paper, we reformulate the label-noise problem from a generative-model perspective, i.e., labels are generated by gradually refining an initial random guess. This new perspective immediately enables existing powerful diffusion models to seamlessly learn the stochastic generative process. Once the generative uncertainty is modeled, we can perform classification inference using maximum likelihood estimation of labels. To mitigate the impact of noisy labels, we propose the Label-Retrieval-Augmented (LRA) diffusion model, which leverages neighbor consistency to effectively construct pseudo-clean labels for diffusion training. Our model is flexible and general, allowing easy incorporation of different types of conditional information, e.g., the use of pre-trained models, to further boost model performance. Extensive experiments are conducted for evaluation. Our model achieves new state-of-the-art (SOTA) results on all the standard real-world benchmark datasets. Remarkably, by incorporating conditional information from the powerful CLIP model, our method can boost the current SOTA accuracy by 10-20 absolute points in many cases. Code is available: https://anonymous.4open.science/r/LRA-diffusion-5F2F
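The neighbor-consistency idea behind label retrieval can be illustrated with a k-nearest-neighbor vote. For each sample, retrieve the k closest other samples in a feature space and take the majority of their (possibly noisy) labels as a pseudo-label. The sketch assumes plain Euclidean features and excludes the query itself, and its names and toy data are hypothetical. In the paper, features would come from a pre-trained encoder, and how the retrieved labels feed into diffusion training may differ from this direct vote.

```python
import math
from collections import Counter

def knn_pseudo_labels(features, noisy_labels, k=3):
    """Assign each sample the majority noisy label among its k nearest other samples."""
    pseudo = []
    for i, fi in enumerate(features):
        dists = sorted(
            (math.dist(fi, fj), j) for j, fj in enumerate(features) if j != i
        )
        neighbor_labels = [noisy_labels[j] for _, j in dists[:k]]
        pseudo.append(Counter(neighbor_labels).most_common(1)[0][0])
    return pseudo

feats = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
labels = [0, 0, 1, 0, 1, 1, 1]  # sample 2 carries a noisy label
print(knn_pseudo_labels(feats, labels, k=3))  # [0, 0, 0, 0, 1, 1, 1]
```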

YNICL Journal 2022 Journal Article

A diffusion MRI study of brain white matter microstructure in adolescents and adults with a Fontan circulation: Investigating associations with resting and peak exercise oxygen saturations and cognition

  • Charlotte E Verrall
  • Jian Chen
  • Chun-Hung Yeh
  • Mark T Mackay
  • Yves d'Udekem
  • David S Winlaw
  • Ajay Iyengar
  • Julian Ayer

INTRODUCTION: ), and attention and processing speed. METHODS: were measured during cardiopulmonary exercise testing (CPET; N = 81). Attention and processing speed were assessed using Cogstate (N = 67 and 70, respectively). Linear regression analyses adjusted for age, sex, and intracranial volume were performed to investigate associations between i) tract-specific DTI metrics and CPET variables, and ii) tract-specific DTI metrics and attention and processing speed z-scores. RESULTS: was positively associated with FA of the left uncinate fasciculus (p < 0.01). Negative associations were identified between mean FA of the right arcuate fasciculus, right SLF-II and right SLF-III and processing speed (p ≤ 0.01). No significant associations were identified between DTI-based metrics and attention. CONCLUSION: Chronic hypoxemia may have long-term detrimental impact on white matter microstructure in people living with a Fontan circulation. Paradoxical associations between processing speed and tract-specific DTI metrics could be suggestive of compensatory white matter remodeling. Longitudinal investigations focused on the mechanisms and trajectory of altered white matter microstructure and associated cognitive dysfunction in people with a Fontan circulation are required to better understand causal associations.

EAAI Journal 2022 Journal Article

A spatial temporal graph neural network model for predicting flashover in arbitrary building floorplans

  • Wai Cheong Tam
  • Eugene Yujun Fu
  • Jiajia Li
  • Xinyan Huang
  • Jian Chen
  • Michael Xuelin Huang

Rapid fire progression, such as flashover, has been one of the leading causes of firefighter deaths and injuries in residential building environments. Because of their long computational times and the prior knowledge they require about the fire scene, existing models cannot be used to predict the potential occurrence of flashover in practical firefighting applications. In this paper, a scene-agnostic model (FlashNet) is proposed to predict flashover based on limited heat detector temperature information up to 150 °C. FlashNet uses spatial temporal graph convolutional neural networks to effectively learn features from the limited temperature information and to handle variations in building structure. The proposed model is benchmarked against five different state-of-the-art flashover prediction models. Results show that FlashNet outperforms the existing flashover prediction models and can reliably predict flashover 30 s before its occurrence with an overall accuracy of about 92.1%. An ablation study is carried out to examine the effectiveness of the key model components and the geometric average adjacency matrix. The outcomes of this study are expected to enhance firefighters’ situational awareness at the fire scene, protect them from hazardous fire environments, and pave the way for the development of data-driven prediction systems.

YNICL Journal 2022 Journal Article

Assessment of intraoperative diffusion EPI distortion and its impact on estimation of supratentorial white matter tract positions in pediatric epilepsy surgery

  • Joseph Yuan-Mou Yang
  • Jian Chen
  • Bonnie Alexander
  • Kurt Schilling
  • Michael Kean
  • Alison Wray
  • Marc Seal
  • Wirginia Maixner

The effectiveness of correcting diffusion Echo Planar Imaging (EPI) distortion and its impact on tractography reconstruction have not been adequately investigated in the intraoperative MRI setting, particularly for High Angular Resolution Diffusion Imaging (HARDI) acquisition. In this study, we evaluated the effectiveness of EPI distortion correction using 27 legacy intraoperative HARDI datasets over two consecutive surgical time points, acquired without reverse phase-encoded data, from 17 children who underwent epilepsy surgery at our institution. The data were processed both with EPI distortion correction using the Synb0-Disco technique (Schilling et al., 2019) and without distortion correction. The corrected and uncorrected b0 diffusion-weighted images (DWI) were first compared visually. The mutual information indices between the original T1-weighted images and the fractional anisotropy (FA) images derived from corrected and uncorrected DWI were used to quantify the effect of distortion correction. Sixty-four white matter tracts were segmented from each dataset using a deep-learning-based automated tractography algorithm, for a standardized and unbiased evaluation. Displacement was calculated between tracts generated before and after distortion correction. The tracts were grouped by their principal morphological orientations to investigate whether the effects of EPI distortion vary with tract orientation. Group differences in tract distortion were investigated both globally and regionally, with respect to proximity to the resected lesion in the operative hemisphere. Qualitatively, we observed notable improvement in the corrected diffusion images over the brain regions typically affected near the skull-base air sinuses, as well as correction of additional distortion unique to intraoperative open-cranium images, particularly over the resection site. This improvement was supported quantitatively: mutual information indices between the FA and T1-weighted images were significantly greater after correction than before. Maximum tract displacement between the corrected and uncorrected data was in the range of 7.5 to 10.0 mm, a magnitude that would challenge the safety resection margin typically tolerated for tractography-informed surgical guidance. This was particularly relevant for tracts oriented partially or fully in line with the acquired diffusion phase-encoding direction. Portions of these tracts passing close to the resection site demonstrated significantly greater displacement than portions remote from the resection site in the operative hemisphere. Our findings have direct clinical implications for the accuracy of intraoperative tractography-informed image guidance and emphasize the need to develop a distortion correction technique with a feasible intraoperative processing time.
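The abstract does not state how the mutual information index between FA and T1-weighted images was computed. A common approach is a joint-histogram estimate; the sketch below assumes flattened, co-registered voxel intensity lists and a hypothetical bin count of 16, and is an illustration rather than the authors' pipeline.

```python
import math
from collections import Counter


def mutual_information(a, b, bins=16):
    """Joint-histogram estimate of mutual information (in nats) between two intensity lists.

    Assumes a and b are co-registered images flattened to equal-length lists;
    the bin count is a hypothetical choice, not taken from the study.
    """
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and the same length")

    def to_bins(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # a constant image falls into a single bin
        # Clamp the maximum intensity into the last bin.
        return [min(int((v - lo) / width), bins - 1) for v in values]

    ba, bb = to_bins(a), to_bins(b)
    n = len(a)
    joint = Counter(zip(ba, bb))
    pa, pb = Counter(ba), Counter(bb)
    mi = 0.0
    for (i, j), count in joint.items():
        p_ij = count / n
        mi += p_ij * math.log(p_ij / ((pa[i] / n) * (pb[j] / n)))
    return mi
```

Higher values indicate that one image's intensities are more predictable from the other's, which is why better anatomical alignment tends to raise the index.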

IS Journal 2022 Journal Article

Recognition Model of Sideslip of Surrounding Vehicles Based on Perception Information of Driverless Vehicle

  • Yunfeng Xiang
  • Yansong He
  • Yugong Luo
  • Dexu Bu
  • Weiwei Kong
  • Jian Chen

At present, many approaches for estimating a vehicle's sideslip driving status from the internal information of the sideslipping vehicle itself have been studied. However, methods for identifying sideslip in surrounding vehicles have rarely been developed. A surrounding vehicle in severe sideslip threatens the safety of a driverless vehicle if the driverless vehicle cannot detect it. Therefore, this study proposes a sideslip recognition model that uses the perception information of driverless vehicles to assess the sideslip driving status of surrounding vehicles. First, severe sideslip, which may affect the safety of driverless vehicles, is defined, and the sideslip recognition problem is described. Second, severe sideslip is divided into two categories according to the sideslip trajectory. The progression of the two types is analyzed, and the features of each type of severe sideslip are extracted. A sideslip recognition model is then established using a logical rule method based on these features. Finally, simulation experiments are designed to verify the proposed model. The simulation results show that the true-positive rate and the false-positive rate are 100% and 6.8%, respectively, which demonstrates that the proposed sideslip recognition model performs well.

YNIMG Journal 2021 Journal Article

Tractography dissection variability: What happens when 42 groups dissect 14 white matter bundles on the same dataset?

  • Kurt G. Schilling
  • François Rheault
  • Laurent Petit
  • Colin B. Hansen
  • Vishwesh Nath
  • Fang-Cheng Yeh
  • Gabriel Girard
  • Muhamed Barakovic

White matter bundle segmentation using diffusion MRI fiber tractography has become the method of choice to identify white matter fiber pathways in vivo in human brains. However, like other analyses of complex data, there is considerable variability in segmentation protocols and techniques. This can result in different reconstructions of the same intended white matter pathways, which directly affects tractography results, quantification, and interpretation. In this study, we aim to evaluate and quantify the variability that arises from different protocols for bundle segmentation. Through an open call to users of fiber tractography, including anatomists, clinicians, and algorithm developers, 42 independent teams were given processed sets of human whole-brain streamlines and asked to segment 14 white matter fascicles on six subjects. In total, we received 57 different bundle segmentation protocols, which enabled detailed volume-based and streamline-based analyses of agreement and disagreement among protocols for each fiber pathway. Results show that even when given the exact same sets of underlying streamlines, the variability across protocols for bundle segmentation is greater than all other sources of variability in the virtual dissection process, including variability within protocols and variability across subjects. In order to foster the use of tractography bundle dissection in routine clinical settings, and as a fundamental analytical tool, future endeavors must aim to resolve and reduce this heterogeneity. Although external validation is needed to verify the anatomical accuracy of bundle dissections, reducing heterogeneity is a step towards reproducible research and may be achieved through the use of standard nomenclature and definitions of white matter bundles and well-chosen constraints and decisions in the dissection process.
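The abstract mentions volume-based analyses of agreement without naming the measure. One common choice is the Dice overlap coefficient; the sketch below represents each bundle segmentation as a set of voxel coordinates (an assumption for illustration) and computes Dice for one pair and its mean over all pairs of protocols.

```python
import itertools


def dice(voxels_a, voxels_b):
    """Dice overlap between two voxel-coordinate sets (1.0 = identical, 0.0 = disjoint)."""
    a, b = set(voxels_a), set(voxels_b)
    if not a and not b:
        return 1.0  # convention: two empty masks agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))


def mean_pairwise_dice(segmentations):
    """Average Dice over all unordered pairs of protocols' segmentations of one bundle."""
    if len(segmentations) < 2:
        raise ValueError("need at least two segmentations to compare")
    pairs = list(itertools.combinations(segmentations, 2))
    return sum(dice(x, y) for x, y in pairs) / len(pairs)
```

A lower mean pairwise Dice for a bundle would indicate greater disagreement across protocols for that pathway.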

NeurIPS Conference 2020 Conference Paper

Grasp Proposal Networks: An End-to-End Solution for Visual Learning of Robotic Grasps

  • Chaozheng Wu
  • Jian Chen
  • Qiaoyu Cao
  • Jianchi Zhang
  • Yunxin Tai
  • Lin Sun
  • Kui Jia

Learning robotic grasps from visual observations is a promising yet challenging task. Recent research shows its great potential by preparing and learning from large-scale synthetic datasets. For the popular 6-degree-of-freedom (6-DOF) grasp setting with a parallel-jaw gripper, most existing methods heuristically sample grasp candidates and then evaluate them using learned scoring functions. This strategy is limited by the conflict between sampling efficiency and coverage of optimal grasps. To this end, we propose a novel, end-to-end Grasp Proposal Network (GPNet) to predict a diverse set of 6-DOF grasps for an unseen object observed from a single and unknown camera view. GPNet builds on a key design of a grasp proposal module that defines anchors of grasp centers at discrete but regular 3D grid corners, which is flexible enough to support either more precise or more diverse grasp predictions. To test GPNet, we contribute a synthetic dataset of 6-DOF object grasps; evaluation is conducted using rule-based criteria, simulation tests, and real tests. Comparative results show the advantage of our method over existing ones. Notably, GPNet achieves better simulation results via the specified coverage, which helps achieve a ready translation to real tests. Our code and dataset are available at https://github.com/CZ-Wu/GPNet.
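The idea of anchoring grasp centers at regular 3D grid corners can be illustrated by enumerating those corners. This sketch assumes a cube centred at the origin; `half_extent` and `cells_per_axis` are hypothetical parameters, since the abstract does not give GPNet's grid size or resolution.

```python
import itertools


def grid_anchors(half_extent, cells_per_axis):
    """Return the corners of a regular 3D grid spanning [-half_extent, half_extent]^3.

    half_extent and cells_per_axis are hypothetical parameters; GPNet's actual
    grid size and resolution are not stated in the abstract.
    """
    if cells_per_axis < 1:
        raise ValueError("cells_per_axis must be >= 1")
    step = 2 * half_extent / cells_per_axis
    ticks = [-half_extent + i * step for i in range(cells_per_axis + 1)]
    # Every combination of x, y, z ticks is one candidate grasp-center anchor.
    return list(itertools.product(ticks, repeat=3))
```

Increasing `cells_per_axis` yields more, more closely spaced anchors, at the cost of more candidates to score.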

YNIMG Journal 2020 Journal Article

Long-term development of white matter fibre density and morphology up to 13 years after preterm birth: A fixel-based analysis

  • Claire E. Kelly
  • Deanne K. Thompson
  • Sila Genc
  • Jian Chen
  • Joseph YM. Yang
  • Chris Adamson
  • Richard Beare
  • Marc L. Seal

Background It is well documented that infants born very preterm (VP) are at risk of brain injury and altered brain development in the neonatal period; however, there is a lack of long-term, longitudinal studies on the effects of VP birth on white matter development over childhood. Most previous studies were based on voxel-averaged, non-fibre-specific diffusion magnetic resonance imaging (MRI) measures, such as fractional anisotropy. In contrast, the novel diffusion MRI analysis framework, fixel-based analysis (FBA), enables whole-brain analysis of microstructural and macrostructural properties of individual fibre populations at a sub-voxel level. We applied FBA to investigate the long-term implications of VP birth and associated perinatal risk factors on fibre development in childhood and adolescence. Methods Diffusion images were acquired for a cohort of VP (born <30 weeks’ gestation) and full-term (FT, ≥37 weeks’ gestation) children at two timepoints: mean (SD) 7.6 (0.2) years (n = 138 VP and 32 FT children) and 13.3 (0.4) years (n = 130 VP and 45 FT children). In total, 103 VP and 21 FT children had images at both ages for longitudinal analysis. At every fixel (individual fibre population within an image voxel) across the white matter, we compared FBA metrics (fibre density (FD), cross-section (FC) and a combination of these properties (FDC)) between VP and FT groups cross-sectionally at each timepoint, and longitudinally between timepoints. We also examined associations between known perinatal risk factors and FBA metrics in the VP group. Results Compared with FT children, VP children had lower FD, FC and FDC throughout the white matter, particularly in the corpus callosum, tapetum, inferior fronto-occipital fasciculus, fornix and cingulum at ages 7 and 13 years, as well as the corticospinal tract and anterior limb of the internal capsule at age 13 years. VP children also had slower FDC development in the corpus callosum and corticospinal tract between ages 7 and 13 years compared with FT children. Within VP children, earlier gestational age at birth, lower birth weight z-score, and neonatal brain abnormalities were associated with lower FD, FC and FDC throughout the white matter at both ages. Conclusions VP birth and concomitant perinatal risk factors are associated with fibre tract-specific alterations to axonal development in childhood and adolescence.

JMLR Journal 2020 Journal Article

ThunderGBM: Fast GBDTs and Random Forests on GPUs

  • Zeyi Wen
  • Hanfeng Liu
  • Jiashuai Shi
  • Qinbin Li
  • Bingsheng He
  • Jian Chen

Gradient Boosting Decision Trees (GBDTs) and Random Forests (RFs) have been used in many real-world applications. They are often a standard recipe for building state-of-the-art solutions to machine learning and data mining problems. However, training and prediction are computationally very expensive for large and high-dimensional problems. This article presents an efficient, open-source software toolkit called ThunderGBM, which exploits high-performance Graphics Processing Units (GPUs) for GBDTs and RFs. ThunderGBM supports classification, regression and ranking, and can run on single or multiple GPUs of a machine. Our experimental results show that ThunderGBM outperforms the existing libraries while producing similar models, and can handle high-dimensional problems where existing GPU-based libraries fail. Documentation, examples, and more details about ThunderGBM are available at https://github.com/xtra-computing/thundergbm.

YNIMG Journal 2019 Journal Article

Changes in neonatal regional brain volume associated with preterm birth and perinatal factors

  • Bonnie Alexander
  • Claire E. Kelly
  • Chris Adamson
  • Richard Beare
  • Diana Zannino
  • Jian Chen
  • Andrea L. Murray
  • Wai Yen Loh

Background Preterm birth is associated with altered brain development, with younger gestational age (GA) at birth often associated with greater brain volume reduction. Such volume alterations at term equivalent age (TEA) have been found with differing magnitude across different brain regions, although this has mostly been investigated with regard to whole tissue volumes and large-scale subdivisions. In addition to degree of prematurity, many other perinatal factors have been found to influence brain structure and development in infants born preterm. We aimed to clarify, in finer spatial detail, the relationships between degree of prematurity and regional brain volumes at TEA, and between perinatal factors and regional brain volumes at TEA. Methods A total of 285 preterm and term-born infants (GA at birth 24.6–42.1 weeks; 145 female; 59 born at term) were scanned at TEA. Data on perinatal factors were obtained by chart review, including sex, multiple birth, birthweight standard deviation (SD) score, postnatal growth and social risk. The Melbourne Children's Regional Infant Brain (M-CRIB) atlas was registered to the current sample, then 100 brain regions were labelled for volumetric analyses. Linear regressions with generalised estimating equations and likelihood ratio tests were performed to investigate whether GA at birth or perinatal factors were associated with regional volumes at TEA. Results Younger GA at birth was associated with smaller volumes at TEA in some regions, including bilateral cerebral white matter, middle temporal gyri, amygdalae, pallidum and brainstem. In other regions, younger GA at birth was associated with larger volumes, including primary visual, motor and somatosensory cortices. Positive associations between perinatal factors and regional volumes at TEA were found in many brain regions for birthweight SD score and male sex, independent of GA at birth. These associations were seen in both univariable analyses and multivariable analyses controlling for other perinatal factors. Social risk and multiple birth were generally not associated with regional brain volumes, and postnatal growth was associated with volume in many regions only after adjusting for other perinatal factors. Conclusions These results elucidate regional brain volume differences associated with preterm birth and perinatal factors at a more detailed parcellated level than previously reported, and contribute to understanding of the complex array of correlates of preterm birth.

YNICL Journal 2019 Journal Article

Characterisation of brain volume and microstructure at term-equivalent age in infants born across the gestational age spectrum

  • Deanne K. Thompson
  • Claire E. Kelly
  • Jian Chen
  • Richard Beare
  • Bonnie Alexander
  • Marc L. Seal
  • Katherine J. Lee
  • Lillian G. Matthews

BACKGROUND: Risk of morbidity differs between very preterm (VP; <32 weeks' gestational age (GA)), moderate preterm (MP; 32-33 weeks' GA), late preterm (LP; 34-36 weeks' GA), and full-term (FT; ≥37 weeks' GA) infants. However, brain structure at term-equivalent age (TEA; 38-44 weeks) remains to be characterised in all clinically important GA groups. We aimed to compare global and regional brain volumes, and regional white matter microstructure, between VP, MP, LP and FT groups at TEA, in order to establish the magnitude and anatomical locations of between-group differences. METHODS: Structural images from 328 infants (91 VP, 63 MP, 104 LP and 70 FT) were segmented into white matter, cortical grey matter, cerebrospinal fluid (CSF), subcortical grey matter, brainstem and cerebellum. Global tissue volumes were analysed, and additionally, cortical grey matter and white matter volumes were analysed at the regional level using voxel-based morphometry. Fractional anisotropy (FA), mean diffusivity (MD), axial diffusivity (AD) and radial diffusivity (RD) images from 361 infants (92 VP, 69 MP, 120 LP and 80 FT) were analysed using Tract-Based Spatial Statistics. Statistical analyses involved examining the overall effect of GA group on global volumes (using linear regressions) and on regional volumes and microstructure (using non-parametric permutation testing), as well as performing post-hoc comparisons between the GA sub-groups. RESULTS: On global analysis, CSF volume was larger in all preterm sub-groups compared with the FT group. On regional analysis, volume was smaller in parts of the temporal cortical grey matter, and parts of the temporal white matter and corpus callosum, in all preterm sub-groups compared with the FT group. FA was lower, and RD and MD were higher, in voxels located in much of the white matter in all preterm sub-groups compared with the FT group. The anatomical locations of group differences were similar for each preterm vs. FT comparison, but the magnitude and spatial extent of group differences were largest for the VP comparison, followed by the MP, and then the LP comparison. Comparing within the preterm groups, the VP sub-group had smaller frontal and temporal grey and white matter volume, and lower FA and higher MD and RD within voxels in the approximate location of the corpus callosum, compared with the MP sub-group. There were few volume and microstructural differences between the MP and LP sub-groups. CONCLUSION: All preterm sub-groups had atypical brain volume and microstructure at TEA when compared with a FT group, particularly for the CSF, temporal grey and white matter, and corpus callosum. In general, the groups followed a gradient, where the differences were most pronounced for the VP group, less pronounced for the MP group, and least pronounced for the LP group. The VP sub-group was particularly vulnerable compared with the MP and LP sub-groups.

YNIMG Journal 2019 Journal Article

Early life predictors of brain development at term-equivalent age in infants born across the gestational age spectrum

  • Deanne K. Thompson
  • Claire E. Kelly
  • Jian Chen
  • Richard Beare
  • Bonnie Alexander
  • Marc L. Seal
  • Katherine Lee
  • Lillian G. Matthews

Background It is well established that preterm infants have altered brain development compared with full-term (FT; ≥37 weeks' gestational age [GA]) infants; however, the perinatal factors associated with brain development in preterm infants have not been fully elucidated. In particular, perinatal predictors of brain development may differ between very preterm infants (VP; <32 weeks' GA) and infants born moderate (MP; 32–33 weeks' GA) and late (LP; 34–36 weeks' GA) preterm, but this has not been studied. This study aimed to investigate the effects of early life predictors on brain volume and microstructure at term-equivalent age (TEA; 38–44 weeks), and whether these effects differ between GA groups (VP, MP, LP or FT). Methods Structural images from 328 infants (91 VP, 63 MP, 104 LP and 70 FT) were segmented into white matter, cortical grey matter, cerebrospinal fluid, subcortical grey matter, brainstem and cerebellum. Cortical grey matter and white matter images were analysed using voxel-based morphometry. Fractional anisotropy (FA), mean diffusivity (MD), axial diffusivity (AD) and radial diffusivity (RD) images from 361 infants (92 VP, 69 MP, 120 LP and 80 FT) were analysed using Tract-Based Spatial Statistics. Relationships between early life predictors (birthweight standard deviation score [BWSDS], multiple birth, sex, postnatal growth and social risk) and global brain volumes were analysed using linear regressions. Relationships between early life predictors and regional brain volumes and diffusion measures were analysed using voxelwise non-parametric permutation testing. Results Male sex was associated with higher global volumes of all tissues and higher regional volumes throughout much of the cortical grey matter and white matter, particularly in the FT group. Male sex was also associated with lower FA and higher AD, RD and MD in the optic radiation, external and internal capsules and corona radiata, and these associations were generally similar between GA groups. Higher BWSDS was associated with higher global volumes of all tissues and higher regional volumes in much of the cortical grey matter and white matter in all GA groups, as well as higher FA and lower RD and MD in many major tracts (corpus callosum, optic radiation, internal and external capsules and corona radiata), particularly in the MP and LP groups. Multiple birth and social risk also showed associations with global and regional volumes and regional diffusion values that varied by GA group, but these associations were not independent of the other early life predictors. Postnatal growth was not associated with brain volumes or diffusion values. Conclusion Early life predictors of brain volumes and microstructure at TEA include sex, BWSDS, multiple birth and social risk, which have different effects depending on GA group at birth. This study improves knowledge of the perinatal factors associated with brain abnormalities in infants born across the prematurity spectrum.

IJCAI Conference 2019 Conference Paper

Multi-Level Visual-Semantic Alignments with Relation-Wise Dual Attention Network for Image and Text Matching

  • Zhibin Hu
  • Yongsheng Luo
  • Jiong Lin
  • Yan Yan
  • Jian Chen

Image-text matching is central to visual-semantic cross-modal retrieval and has attracted extensive attention recently. Previous studies have been devoted to finding the latent correspondence between image regions and words, e.g., connecting key words to specific regions of salient objects. However, existing methods usually focus on concrete objects rather than abstract ones, e.g., a description of some action, which are in fact also ubiquitous in real-world description texts. The main challenge in dealing with abstract objects is that, unlike their concrete counterparts, there are no explicit connections between them. One therefore has to find the implicit and intrinsic connections between them instead. In this paper, we propose a relation-wise dual attention network (RDAN) for image-text matching. Specifically, we maintain an over-complete set that contains pairs of regions and words. Built upon this set, we encode the local correlations and the global dependencies between regions and words by training a visual-semantic network. A dual-pathway attention network is then presented to infer the visual-semantic alignments and image-text similarity. Extensive experiments validate the efficacy of our method, which achieves state-of-the-art performance on several public benchmark datasets.

NeurIPS Conference 2019 Conference Paper

NAT: Neural Architecture Transformer for Accurate and Compact Architectures

  • Yong Guo
  • Yin Zheng
  • Mingkui Tan
  • Qi Chen
  • Jian Chen
  • Peilin Zhao
  • Junzhou Huang

Designing effective architectures is one of the key factors behind the success of deep neural networks. Existing deep architectures are either manually designed or automatically searched by Neural Architecture Search (NAS) methods. However, even a well-searched architecture may still contain many non-significant or redundant modules or operations (e.g., convolution or pooling), which may not only incur substantial memory consumption and computation cost but also deteriorate performance. Thus, it is necessary to optimize the operations inside an architecture to improve performance without introducing extra computation cost. Unfortunately, such a constrained optimization problem is NP-hard. To make the problem feasible, we cast it into a Markov decision process (MDP) and seek to learn a Neural Architecture Transformer (NAT) that replaces redundant operations with more computationally efficient ones (e.g., a skip connection, or removing the connection directly). Based on the MDP, we learn NAT by exploiting reinforcement learning to obtain optimization policies w.r.t. different architectures. To verify the effectiveness of the proposed strategies, we apply NAT to both hand-crafted architectures and NAS-based architectures. Extensive experiments on two benchmark datasets, i.e., CIFAR-10 and ImageNet, demonstrate that the architectures transformed by NAT significantly outperform both their original forms and architectures optimized by existing methods.

AAAI Conference 2018 Conference Paper

Double Forward Propagation for Memorized Batch Normalization

  • Yong Guo
  • Qingyao Wu
  • Chaorui Deng
  • Jian Chen
  • Mingkui Tan

Batch Normalization (BN) has been a standard component in designing deep neural networks (DNNs). Although standard BN can significantly accelerate the training of DNNs and improve generalization performance, it has several underlying limitations that may hamper performance in both training and inference. In the training stage, BN relies on estimating the mean and variance of the data using a single minibatch. Consequently, BN can be unstable when the batch size is very small or the data are poorly sampled. In the inference stage, BN often uses the so-called moving mean and moving variance instead of batch statistics, i.e., the training and inference rules in BN are not consistent. To address these issues, we propose memorized batch normalization (MBN), which considers multiple recent batches to obtain more accurate and robust statistics. Note that after the SGD update for each batch, the model parameters change, and the features change accordingly, leading to a distribution shift before and after the update for the considered batch. To alleviate this issue, we present a simple Double-Forward scheme in MBN that can further improve performance. Compared to related methods, the proposed MBN exhibits consistent behavior in both training and inference. Empirical results show that MBN-based models trained with the Double-Forward scheme greatly reduce sensitivity to the data and significantly improve generalization performance.
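The core idea of pooling normalization statistics over several recent batches can be sketched for a single feature channel. This minimal illustration assumes equal batch sizes and uniform weights over a hypothetical memory of `memory` batches; it does not reproduce MBN's actual weighting or the Double-Forward scheme.

```python
from collections import deque


class MemorizedBatchStats:
    """Pool per-batch mean/variance over the last `memory` batches (one feature channel).

    Assumes equal batch sizes and uniform weights; the MBN paper's own weighting
    and its Double-Forward scheme are not reproduced here.
    """

    def __init__(self, memory=3):
        self.history = deque(maxlen=memory)  # (mean, variance) of recent batches

    def update(self, batch):
        if not batch:
            raise ValueError("batch must be non-empty")
        n = len(batch)
        mean = sum(batch) / n
        var = sum((x - mean) ** 2 for x in batch) / n  # biased variance, as in BN
        self.history.append((mean, var))

    def pooled(self):
        if not self.history:
            raise ValueError("no batches recorded yet")
        k = len(self.history)
        mean = sum(m for m, _ in self.history) / k
        # Law of total variance: average within-batch variance plus variance of batch means.
        second_moment = sum(v + m * m for m, v in self.history) / k
        return mean, second_moment - mean * mean

    def normalize(self, batch, eps=1e-5):
        mean, var = self.pooled()
        return [(x - mean) / (var + eps) ** 0.5 for x in batch]
```

With equal batch sizes, the pooled statistics equal those of the concatenated remembered batches, so small or poorly sampled batches have less influence than they would under single-batch BN.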

AAAI Conference 2018 Short Paper

Efficient Support Vector Machine Training Algorithm on GPUs

  • Jiashuai Shi
  • Zeyi Wen
  • Bingsheng He
  • Jian Chen

Support Vector Machines (SVMs) are popular for many machine learning tasks. With the rapid growth of dataset sizes, the high cost of training limits the wide use of SVMs. Several SVM implementations on GPUs have been proposed to accelerate SVMs. However, they support only classification (SVC) or only regression (SVR). In this work, we propose a simple and effective SVM training algorithm on GPUs that can be used for SVC, SVR and one-class SVM. Initial experiments show that our implementation outperforms existing ones. We are in the process of encapsulating our algorithm into an easy-to-use library with Python, R and MATLAB interfaces.

AAAI Conference 2018 Short Paper

Selecting Proper Multi-Class SVM Training Methods

  • Yawen Chen
  • Zeyi Wen
  • Jian Chen
  • Jin Huang

Support Vector Machines (SVMs) are excellent candidates for solving multi-class problems, and multi-class SVMs can be trained by several different methods. Different training methods commonly produce SVMs with different effectiveness, and no multi-class SVM training method always outperforms the others on all problems. This makes it difficult for practitioners to choose the best training method for a given problem. In this work, we propose a Multi-class Method Selection (MMS) approach to help users select the most appropriate method among one-versus-one (OVO), one-versus-all (OVA) and structural SVMs (SSVMs) for a given problem. Our key idea is to select the training method based on the distribution of the training data and the similarity between different classes. Using the distribution and class similarity, we estimate the unclassifiable rate of each multi-class SVM training method and select the training method with the minimum unclassifiable rate. Our initial findings show: (i) SSVMs with a linear kernel perform worse than OVO and OVA; (ii) MMS often produces SVM classifiers that can confidently classify unseen instances.
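In one-versus-one training, a k-class problem is split into k(k-1)/2 binary SVMs whose outputs are combined by max-wins voting, and an instance is commonly called unclassifiable when two or more classes tie for the most votes. The sketch below illustrates only that textbook notion, not MMS's estimator of the unclassifiable rate, which the abstract does not detail; `pairwise_winner` is a hypothetical callback standing in for the trained binary SVMs.

```python
import itertools
from collections import Counter


def ovo_predict(classes, pairwise_winner):
    """Max-wins voting over all k(k-1)/2 one-versus-one classifiers.

    pairwise_winner(a, b) is a hypothetical callback returning the class (a or b)
    preferred by the binary SVM trained on classes a and b for the instance.
    Returns the winning class, or None when the top vote count is tied
    (the instance falls in an unclassifiable region).
    """
    if len(classes) < 2:
        raise ValueError("need at least two classes")
    votes = Counter(pairwise_winner(a, b) for a, b in itertools.combinations(classes, 2))
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]
```

A cyclic set of pairwise decisions (A beats B, B beats C, C beats A) gives every class one vote, which is exactly the kind of tie that leaves an instance unclassifiable under OVO.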

JMLR Journal 2018 Journal Article

ThunderSVM: A Fast SVM Library on GPUs and CPUs

  • Zeyi Wen
  • Jiashuai Shi
  • Qinbin Li
  • Bingsheng He
  • Jian Chen

Support Vector Machines (SVMs) are classic supervised learning models for classification, regression and distribution estimation. A survey conducted by Kaggle in 2017 shows that 26% of data mining and machine learning practitioners are users of SVMs. However, SVM training and prediction are computationally very expensive for large and complex problems. This paper presents an efficient, open-source SVM software toolkit called ThunderSVM, which exploits the high performance of Graphics Processing Units (GPUs) and multi-core CPUs. ThunderSVM supports all the functionalities of LibSVM, including classification (SVC), regression (SVR) and one-class SVMs, and uses identical command line options, so that existing LibSVM users can easily apply our toolkit. ThunderSVM can be used through multiple language interfaces, including C/C++, Python, R and MATLAB. Our experimental results show that ThunderSVM is generally an order of magnitude faster than LibSVM while producing identical SVMs. In addition to its high efficiency, we design our convex optimization solver in a general way so that SVC, SVR, and one-class SVMs share the same solver for ease of maintenance. Documentation, examples, and more about ThunderSVM are available at https://github.com/zeyiwen/thundersvm.

YNIMG Journal 2017 Journal Article

A new neonatal cortical and subcortical brain atlas: the Melbourne Children's Regional Infant Brain (M-CRIB) atlas

  • Bonnie Alexander
  • Andrea L. Murray
  • Wai Yen Loh
  • Lillian G. Matthews
  • Chris Adamson
  • Richard Beare
  • Jian Chen
  • Claire E. Kelly

Investigating neonatal brain structure and function can offer valuable insights into behaviour and cognition in healthy and clinical populations, both at term age and longitudinally in comparison with later time points. Parcellated brain atlases for adult populations are readily available; however, warping infant data to adult template space is not ideal due to morphological and tissue differences between these groups. Several parcellated neonatal atlases have been developed, although there remains strong demand for manually parcellated ground truth data with detailed cortical definition. Additionally, compatibility with existing adult atlases is favourable for use in longitudinal investigations. We aimed to address these needs by replicating the widely used Desikan-Killiany (2006) adult cortical atlas in neonates. We also aimed to extend brain coverage by complementing this cortical scheme with basal ganglia, thalamus, cerebellum and other subcortical segmentations. Thus, we have manually parcellated these areas volumetrically using high-resolution neonatal T2-weighted MRI scans, and initial automated and manually edited tissue classification, providing 100 regions in all. Linear and nonlinear T2-weighted structural templates were also generated. In this paper we provide manual parcellation protocols, and present the parcellated probability maps and structural templates together as the Melbourne Children's Regional Infant Brain (M-CRIB) atlas.

AAAI Conference 2017 Conference Paper

Improving Efficiency of SVM k-Fold Cross-Validation by Alpha Seeding

  • Zeyi Wen
  • Bin Li
  • Ramamohanarao Kotagiri
  • Jian Chen
  • Yawen Chen
  • Rui Zhang

k-fold cross-validation is commonly used to evaluate the effectiveness of SVMs with the selected hyper-parameters. It is known that SVM k-fold cross-validation is expensive, since it requires training k SVMs. However, little work has explored reusing the h-th SVM to train the (h+1)-th SVM and thereby improve the efficiency of k-fold cross-validation. In this paper, we propose three algorithms that reuse the h-th SVM to improve the efficiency of training the (h+1)-th SVM. Our key idea is to use the previous SVM to efficiently identify the support vectors of the next SVM and to accurately estimate their associated weights (also called alpha values). Our experimental results show that our algorithms are several times faster than k-fold cross-validation that does not make use of the previously trained SVM. Moreover, our algorithms produce the same results (hence the same accuracy) as k-fold cross-validation that does not make use of the previously trained SVM.
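Reusing one round's SVM for the next is attractive because consecutive rounds share most of their training data. The sketch below, assuming plain contiguous folds (no shuffling or stratification) and 0-based round indices, computes which instances round h+1 keeps, drops and adds relative to round h. It only illustrates that overlap; it is not one of the paper's three alpha-seeding algorithms, and the helper names are hypothetical.

```python
def kfold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # first `extra` folds get one more item
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def training_set(folds, h):
    """Indices used to train the h-th SVM: every fold except fold h."""
    return {i for j, fold in enumerate(folds) if j != h for i in fold}


def round_diff(folds, h):
    """Compare the training sets of rounds h and h+1.

    Returns (shared, removed, added): instances kept, instances that leave
    training (fold h+1 becomes the test fold), and instances that join it
    (fold h returns to training).
    """
    prev, nxt = training_set(folds, h), training_set(folds, h + 1)
    return prev & nxt, prev - nxt, nxt - prev


folds = kfold_indices(10, 5)
shared, removed, added = round_diff(folds, 0)
print(len(shared), sorted(removed), sorted(added))
```

With 10 instances and 5 folds, rounds 0 and 1 share 6 of their 8 training instances; in general only two folds' worth of data changes between rounds, which is the redundancy the previous SVM's alpha values can exploit.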

YNIMG Journal 2016 Journal Article

Structural connectivity relates to perinatal factors and functional impairment at 7 years in children born very preterm

  • Deanne K. Thompson
  • Jian Chen
  • Richard Beare
  • Christopher L. Adamson
  • Rachel Ellis
  • Zohra M. Ahmadzai
  • Claire E. Kelly
  • Katherine J. Lee

Objective: To use structural connectivity to (1) compare brain networks between typically and atypically developing (very preterm) children, (2) explore associations between potential perinatal developmental disturbances and brain networks, and (3) describe associations between brain networks and functional impairments in very preterm children. Methods: 26 full-term and 107 very preterm 7-year-old children (born <30 weeks' gestational age and/or <1250 g) underwent T1- and diffusion-weighted imaging. Global white matter fibre networks were produced using 80 cortical and subcortical nodes, and edges were created using constrained spherical deconvolution-based tractography. Global graph theory metrics were analysed, and regional networks were identified using network-based statistics. Cognitive and motor function were assessed at 7 years of age. Results: Compared with full-term children, very preterm children had reduced density, lower global efficiency and higher local efficiency. Those with lower gestational age at birth, infection or a higher neonatal brain abnormality score had reduced connectivity. Reduced connectivity within a widespread network was predictive of impaired IQ, while reduced connectivity within the right parietal and temporal lobes was associated with motor impairment in very preterm children. Conclusions: This study used an innovative structural connectivity pipeline to reveal that children born very preterm have less connected and less complex brain networks than typically developing term-born children. Adverse perinatal factors led to disturbances in white matter connectivity, which in turn were associated with impaired functional outcomes, highlighting novel structure–function relationships.
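Density and global efficiency, two of the global metrics reported above, have standard graph-theoretic definitions. The sketch below computes both for a small unweighted, undirected graph stored as an adjacency dict: density is the fraction of possible edges present, and global efficiency is the mean inverse shortest-path length over ordered node pairs, with unreachable pairs contributing zero. The study's networks had 80 nodes built from tractography; its edge weighting and software are not described here, so this toy version is illustrative only.

```python
from collections import deque


def shortest_path_lengths(adj, src):
    """Breadth-first search hop distances from src (unweighted graph)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist  # unreachable nodes are simply absent


def density(adj):
    """Fraction of the n(n-1)/2 possible undirected edges that exist."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2  # each edge counted twice
    return 2 * m / (n * (n - 1))


def global_efficiency(adj):
    """Mean of 1/d(i, j) over ordered pairs i != j; 1/inf is taken as 0."""
    n = len(adj)
    total = 0.0
    for src in adj:
        for dst, d in shortest_path_lengths(adj, src).items():
            if dst != src:
                total += 1.0 / d
    return total / (n * (n - 1))


# A 4-node path graph 0-1-2-3.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(density(path), global_efficiency(path))
```

For the path graph, 3 of 6 possible edges exist (density 0.5) and the inverse distances average to 13/18; a fully connected graph scores 1.0 on both, so sparser, longer-path networks show up as lower values of each.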

JBHI Journal 2013 Journal Article

A Hybrid Low Power Biopatch for Body Surface Potential Measurement

  • Geng Yang
  • Jian Chen
  • Li Xie
  • Jia Mao
  • Hannu Tenhunen
  • Li-Rong Zheng

This paper presents a wearable biopatch prototype for body surface potential measurement. It combines three key technologies: mixed-signal system-on-chip (SoC) technology, inkjet printing technology, and anisotropic conductive adhesive (ACA) bonding technology. An integral part of the biopatch is a low-power, low-noise SoC. The SoC contains a tunable analog front end, a successive approximation register analog-to-digital converter, and a reconfigurable digital controller. The electrodes, interconnections, and interposer are implemented by precisely inkjet-printing silver ink onto a flexible substrate. The reliability of the printed traces is evaluated by static bending tests. ACA is used to attach the SoC to the printed structures and form the flexible hybrid system. The biopatch prototype is light and thin, with a physical size of 16 cm × 16 cm. Measurement results show that low-noise concurrent electrocardiogram signals from eight chest points have been successfully recorded using the implemented biopatch.

YNIMG Journal 2012 Journal Article

Application of principal component analysis to study topography of hypoxic–ischemic brain injury

  • Shaloo Singhal
  • Jian Chen
  • Richard Beare
  • Henry Ma
  • John Ly
  • Thanh G. Phan

The regions at risk of ischemia following cardio-respiratory arrest have not been systematically analysed. This knowledge may be of use in determining the mechanism of ischemic injury at vulnerable sites. The aim of this study is to evaluate the use of principal component analysis (PCA) to analyse the covariance patterns of hypoxic–ischemic injury. The inclusion criteria were age ≥17 years, cardio-respiratory arrest and coma on admission (2003–2011). Regions of ischemic injury were manually segmented on fluid-attenuated inversion recovery (FLAIR) and diffusion-weighted (DWI) sequences and linearly registered into a common stereotaxic coordinate space. Topography of ischemic injury was assessed using PCA (covariance data) and compared qualitatively against the current method of topography analysis, the probabilistic method (frequency data). For the probabilistic data, subgroup analyses were performed using t-statistics, while for the covariance data, subgroup analyses were performed by calculating the angle between the principal components. To account for bias due to a higher frequency of coma survivors in the studied group, we performed a sensitivity analysis by sequentially removing coma survivors so that the final data set contained a higher rate of death. Quantitative comparison between these methods could not be performed as they have different units of measurement. Forty-one patients were included in this series (mean age ± SD = 51.5 ± 18.9 years). In our probabilistic map, the highest frequencies of ischemic injury on the DWI and FLAIR sequences were in the putamen (0.250), caudate (0.225), temporal lobes (0.175), occipital (0.150) and hippocampus (0.125). The first 6 principal components contained 77.7% of the variance of the data. The first component showed covariance between the deep grey matter nuclei and posterior cortical structures (50.2% of the variance of the data). Subgroup analyses by downtime gave similar findings whether assessed by t-statistics for the probabilistic data or by the angle between the principal components for the covariance data. The sensitivity analysis showed that the pattern of ischemic injury did not change when the analysis was restricted to patients who died. In conclusion, the PCA method has many advantages over the probabilistic method. In the context of this dataset, PCA showed covariance between deep grey matter nuclei and the posterior cortical structures, whereas the probabilistic map provided complementary information on the frequency of occurrence at these locations.
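The subgroup comparison above rests on measuring the angle between principal components. A minimal sketch, assuming each component is given as a loading vector over the same regions: the angle follows from the normalized dot product. Because eigenvectors are defined only up to sign, the sketch by default uses the absolute cosine (angle in [0, 90] degrees); whether the study used that convention is an assumption, and the function name is hypothetical.

```python
import math


def pc_angle_degrees(u, v, ignore_sign=True):
    """Angle in degrees between two principal-component loading vectors.

    With ignore_sign=True the absolute cosine is used, so a component and
    its sign-flipped copy are treated as identical (angle 0).
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("vectors must be non-zero")
    cos = sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)
    if ignore_sign:
        cos = abs(cos)
    cos = max(-1.0, min(1.0, cos))  # guard against rounding just outside [-1, 1]
    return math.degrees(math.acos(cos))


# Hypothetical loadings of the first component in two subgroups.
print(pc_angle_degrees([0.6, 0.5, 0.4], [0.55, 0.55, 0.35]))
```

A small angle means the two subgroups share essentially the same covariance pattern; an angle near 90 degrees means the patterns are unrelated.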

YNIMG Journal 2010 Journal Article

Development of a new tool to correlate stroke outcome with infarct topography: A proof-of-concept study

  • Thanh G. Phan
  • Jian Chen
  • Geoffrey Donnan
  • Velandai Srikanth
  • Amanda Wood
  • David C. Reutens

Improving the ability to assess potential stroke deficit may aid the selection of patients most likely to benefit from acute stroke therapies. Methods based only on ‘at risk’ volumes or initial neurological condition do predict eventual outcome, but not perfectly. Given the close relationship between anatomy and function in the brain, we performed a proof-of-concept study to examine how well stroke outcome correlated with infarct location and extent. A prospective study of 60 patients with ischemic stroke (38 in the training set and 22 in the validation set) was performed using an implementation of partial least squares with penalized logistic regression (PLS-PLR). The method yielded a model relating location of infarction (on a voxel-by-voxel basis) to neurological deficits. The area under the receiver operating characteristic curve (AUC) was used to assess the accuracy of the method for predicting outcome. In the validation phase, this model indicated the presence of neglect (AUC 0.89), aphasia (AUC 0.79), right-arm motor deficit (AUC 0.94) and right-leg motor deficit (AUC 0.94), but less accurately indicated left-arm motor deficit (AUC 0.52) and left-leg motor deficit (AUC 0.69). The model distinguished no to mild disability (Rankin ≤2) from moderate to severe disability (Rankin >2) with AUC 0.78. In this proof-of-concept study, we have demonstrated that stroke outcome correlates well with infarct location, raising the possibility of accurate prediction of neurological deficit in the individual stroke patient using only information on infarct location and multivariate regression methods.
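The AUC values above have a convenient equivalent reading: the probability that a randomly chosen patient with the deficit receives a higher predicted score than a randomly chosen patient without it, with ties counting one half (the Mann–Whitney form). The sketch below computes it that way from predicted scores and binary labels; it assumes scores from some already-fitted model and does not reproduce the study's PLS-PLR model.

```python
def auc(scores, labels):
    """ROC AUC via pairwise comparison of positive and negative cases.

    labels are 1 (deficit present) or 0 (absent); ties score 0.5.
    O(P * N), which is fine for cohort-sized data.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities for four validation patients.
print(auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 3 of 4 pairs ranked correctly
```

On this reading, an AUC of 0.5 (close to the 0.52 reported for left-arm deficit) means the model ranks affected and unaffected patients no better than chance.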

YNIMG Journal 2009 Journal Article

Development and validation of morphological segmentation of age-related cerebral white matter hyperintensities

  • Richard Beare
  • Velandai Srikanth
  • Jian Chen
  • Thanh G. Phan
  • Jennifer Stapleton
  • Rebecca Lipshut
  • David Reutens

Accurate automated segmentation of age-related white matter hyperintensity (WMH) is desirable for topological studies and those involving large samples. We assessed the accuracy of a novel automated method for segmentation of WMH on magnetic resonance imaging (MRI) in a randomly selected population-based sample of older people aged >60 years. The method combined morphological segmentation and statistical classifiers. Validation of this method was performed against expert manual segmentation in a sample of 30 scans, and against semi-automated segmentation in 202 scans. Its performance was also compared with that of other known methods derived from simple thresholding or Gaussian mixture modelling. Automated morphological segmentation combined with an adaptive boosting statistical classifier showed substantial agreement with manual segmentation, with an intraclass correlation coefficient (ICC) of 0.90 (95% confidence interval [CI], 0.80–0.95) for WMH volume and a median similarity index (SI) of 0.58 (interquartile range [IQR] 0.50–0.65). The method also showed similarly high levels of agreement with semi-automated segmentation, with ICC 0.92 (95% CI 0.89–0.93) and median SI 0.56 (IQR 0.49–0.66). Its best performance was observed for the highest tertile of WMH volume. Threshold-based and Gaussian mixture model-driven automated segmentation generally did not perform well in this study.
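The similarity index reported above measures voxel-level overlap between two segmentations. A minimal sketch, assuming the SI follows the common Dice-style definition SI = 2|A ∩ B| / (|A| + |B|), where A and B are the sets of voxels labelled as WMH by the automated and reference methods; voxels are represented here as hypothetical (x, y, z) tuples, and treating two empty segmentations as perfect agreement is an assumption.

```python
def similarity_index(auto_voxels, manual_voxels):
    """Dice-style overlap 2|A & B| / (|A| + |B|) between two voxel sets."""
    a, b = set(auto_voxels), set(manual_voxels)
    if not a and not b:
        return 1.0  # assumption: two empty segmentations agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))


auto = {(0, 0, 0), (0, 0, 1), (0, 1, 0)}
manual = {(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1)}
print(similarity_index(auto, manual))  # 2 shared voxels out of 3 + 4
```

Unlike the ICC on total WMH volume, this index penalizes lesions that are the right size but in the wrong place, which is one reason the two agreement figures can differ.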

IS Journal 2008 Journal Article

Intelligent-Commerce Research in China

  • Daniel Zeng
  • Fei-Yue Wang
  • Xiaolong Zheng
  • Yong Yuan
  • Guoqing Chen
  • Jian Chen

Recent years have witnessed the increased application of AI technologies to real-world e-commerce challenges. This article presents a brief overview of representative work by Chinese researchers, covering topics such as multiagent decision making, keyword advertising, social networks, recommender systems, information retrieval and the semantic Web, and computational experiments. This article is part of a special issue on AI in China.

ICRA Conference 2007 Conference Paper

Optimal Admission Policies for a Retailer of Seasonal Products with Drop-Shipping

  • Frank Y. Chen
  • Jian Chen
  • Yongbo Xiao

This paper studies optimal inventory rationing policies for a retailer of perishable products who sells through its own stores and through third-party websites via an affiliate program. By posting on partners' webpages, an affiliate program allows the retailer to attract customers who would otherwise be missed. However, the retailer must pay a commission for each sale originating from a website operator that participates in the affiliate program. Thus, selling one unit of product to an online "referral" (online customer) yields less net revenue than selling it to a customer from a physical store. When store inventory is running low, the retailer may refer an online request to someone else for fulfillment, which is equivalent to rejecting the request. Therefore, upon the arrival of any demand through the affiliate program, the retailer must decide whether to accept it and, if so, which of its multiple outlets should fulfill it. Based on a discrete-time dynamic programming model, this paper analyzes the retailer's optimal admission policy and shows it to be a two-dimensional threshold policy. The structural properties of the revenue function are analyzed, and numerical examples show the revenue impact of optimal admission control.

EAAI Journal 2001 Journal Article

A predictive system for blast furnaces by integrating a neural network with qualitative analysis

  • Jian Chen

Silicon content in pig iron has long been used as one of the most important indices of the thermal state of a blast furnace. In this paper, a predictive system for blast furnaces that integrates a neural network with qualitative analysis is presented. The qualitative trend of the blast furnace process is predicted through causal analysis and qualitative reasoning, and the relevant variables to serve as inputs to a neural network model are determined. A neural network model is then constructed and trained with appropriate data. The system is evaluated by comparing predicted values with observed data (610 heats in total), and its performance is excellent.