Arrow Research search

Author name cluster

Jie Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

121 papers
2 author rows

Possible papers

EAAI Journal 2026 Journal Article

An efficient and accurate network for gardenia fruit detection

  • Xunkuai Zhou
  • Yanni Wang
  • Jie Chen
  • Ben M. Chen

Automatic detection of gardenia fruits is crucial for mechanized harvesting and accurate yield estimation, yet this topic has received comparatively limited attention in recent years. Existing approaches often incur substantial memory footprints and computational burdens, precluding deployment on resource-constrained robotic platforms. Moreover, methods that perform well on one task frequently degrade on another due to cross-task discrepancies in data distributions and objectives, thereby constraining their generalization and practical applicability. To address these challenges, we propose Gardenia Fruit Detection Network (GFNet), a lightweight detector with strong cross-task generalization that enables accurate, real-time inference under resource-constrained conditions (i.e., low parameter count and computational cost). A lightweight downsampling feature extraction module reduces computation and memory while enhancing representation capacity, followed by three downsampling stages that combine a lightweight adaptive extraction module and a multi-path extractor to enrich features while suppressing redundant ones. Next, a context-aware multi-scale fusion network adaptively aggregates representations from different feature extraction stages, and the fused features are decoded by a lightweight detection head to produce final predictions. In addition, we design a flexible activation function to strengthen nonlinear representation and facilitate adaptation across heterogeneous detection tasks, thereby improving the model’s generalization and practical deployability. GFNet achieves state-of-the-art performance with only 1.9 million parameters and 7.4 Billion Floating-Point Operations (BFLOPs), enabling real-time inference at 17.0 Frames Per Second (FPS) on an edge-computing platform. The extended applications to Unmanned Aerial Vehicle (UAV) detection and defect detection tasks further confirm the superiority and practical engineering applicability of the proposed activation function and GFNet.

AAAI Conference 2026 Conference Paper

BiHiTo: Biomolecular Hierarchy-inspired Tokenization

  • Ruochong Zheng
  • Yutian Liu
  • Yian Zhao
  • Zhiwei Nie
  • Xuehan Hou
  • Chang Liu
  • Siwei Ma
  • Youdong Mao

Three-dimensional atomic arrangements of biomolecules are key to demystifying biological functions. The rapid expansion of accessible structural data, driven by advances in AI for science, highlights the critical challenge of efficiently modeling large-scale biomolecular structures, which are high-dimensional systems shaped by biological assembly principles. To address this, we introduce BiHiTo, a multi-level Biomolecular Hierarchy-inspired Tokenizer that intrinsically mimics natural biological assembly hierarchies. Specifically, we design a multi-codebook quantizer that mirrors the natural hierarchy of biomolecular structure, enabling simultaneous capture of representations spanning atomic motifs to global conformational variations. This hierarchical alignment markedly improves the biological interpretability and reconstruction fidelity of biomolecular structure. Extensive experiments demonstrate that BiHiTo delivers state-of-the-art performance and robust generalization across molecular dynamics trajectories and macromolecular complexes, facilitating advances in structure generation and dynamic conformation exploration. In the reconstruction of the CASP14 and OOD test set FastFolding protein multi-conformation data, our method achieves a 17% and 51% reduction in RMSD compared to Bio2Token, respectively.

EAAI Journal 2026 Journal Article

Class-aware contrastive learning for radio signal generalized category discovery

  • Jie Chen
  • Shilian Zheng
  • Luxin Zhang
  • Keqiang Yue
  • Zhijin Zhao

Generalized Category Discovery (GCD) aims to classify known classes and discover novel classes within unlabeled data containing both known and novel classes, using only a limited subset of annotated known-class samples, providing a promising paradigm for handling real-world scenarios. To tackle the open-world recognition challenge of radio signals in wireless communications, this paper proposes a Class-Aware Contrastive Learning (CACL) method for GCD tasks on radio signals. Specifically, CACL adopts a class-aware contrastive strategy that explicitly leverages the known–novel data partition during training to construct contrastive pairs, enabling separation between known and novel representations without requiring any novel-class labels, thereby mitigating boundary ambiguity arising from intertwined class distributions. Additionally, CACL introduces a class-semantic self-distillation strategy that leverages the known-class semantic space to impose soft semantic regularization on novel-class samples, improving semantic consistency and novel-class clustering quality. Extensive experiments on multiple public modulation recognition datasets demonstrate that CACL achieves superior effectiveness and robustness in modulation signal GCD tasks, outperforming existing GCD baselines.

AAAI Conference 2026 Conference Paper

CoGenSAM: Codebook-Interactive Generative Labeling for Adapting SAM to Crack Segmentation

  • Zhuangzhuang Chen
  • Nuo Chen
  • Dachong Li
  • Zhiliang Lin
  • Xingyu Feng
  • Yifan Zhang
  • Jie Chen
  • Jianqiang Li

The goal of this work is to adapt Segment Anything Models (SAM) into crack segmentation tasks via automatic label generation, thus eliminating manual annotation cost. In this regard, an intuitive approach is to extract edges of crack samples and generate labels via the dilation and erosion processes for fine-tuning SAM. However, this simple solution cannot guarantee the quality of generated labels, as crack regions will be corrupted due to the imperfect edge detection. To this end, this paper proposes CoGenSAM, a novel Codebook-interactive Generative Labeling framework that enables an annotation-free SAM fine-tuning. To achieve this, in the first stage, we pre-train a vector-quantized variational auto-encoder (VQVAE) by reconstructing the synthesized crack-like structures for learning crack-aware priors within the codebook. In the second stage, these priors help another VQVAE serve as the restoration model to restore the randomly corrupted structures into uncorrupted ones. Specifically, we propose the crack-aware contrastive-interaction to maximize the mutual information with the above priors via codebook interaction. Then, high-quality labels can be generated by restoring corrupted labels from edge detection, contributing to an annotation-free SAM fine-tuning. We collect a new dataset, Bridge2025, to address the limited availability of related bridge-oriented benchmarks. Experiments show that our performance is close to fully-supervised methods.

AAAI Conference 2026 Conference Paper

Conditional Distribution Learning for Graph Classification

  • Jie Chen
  • Hua Mao
  • Chuanbin Liu
  • Zhu Wang
  • Xi Peng

Leveraging the diversity and quantity of data provided by various graph-structured data augmentations while preserving intrinsic semantic information is challenging. Additionally, successive layers in a graph neural network (GNN) tend to produce more similar node embeddings, while graph contrastive learning aims to increase the dissimilarity between negative pairs of node embeddings. This inevitably results in a conflict between the message-passing mechanism (MPM) of GNNs and the contrastive learning (CL) of negative pairs via intraviews. In this paper, we propose a conditional distribution learning (CDL) method that learns graph representations from graph-structured data for semisupervised graph classification. Specifically, we present an end-to-end graph representation learning model to align the conditional distributions of weakly and strongly augmented features over the original features. This alignment enables the CDL model to effectively preserve intrinsic semantic information when both weak and strong augmentations are applied to graph-structured data. To avoid the conflict between the MPM and the CL of negative pairs, positive pairs of node representations are retained for measuring the similarity between the original features and the corresponding weakly augmented features. Extensive experiments with several benchmark graph datasets demonstrate the effectiveness of the proposed CDL method.

AAAI Conference 2026 Conference Paper

Deep Inverse Shading: Consistent Albedo and Surface Detail Recovery via Generative Refinement

  • Jiacheng Wu
  • Ruiqi Zhang
  • Jie Chen

Reconstructing human avatars using generative priors is essential for achieving versatile and realistic avatar models. Traditional approaches often rely on volumetric representations guided by generative models, but these methods require extensive volumetric rendering queries, leading to slow training. Alternatively, surface-based representations offer faster optimization through differentiable rasterization, yet they are typically limited by vertex count, restricting mesh resolution and scalability when combined with generative priors. Moreover, integrating generative priors into physically based human avatar modeling remains largely unexplored. To address these challenges, we introduce DIS (Deep Inverse Shading), a unified framework for high-fidelity, relightable avatar reconstruction that incorporates generative priors into a coherent surface representation. DIS centers on a mesh-based model that serves as the target for optimizing both surface and material details. The framework fuses multi-view 2D generative surface normal predictions, rich in detail but often inconsistent, into the central mesh using a normal conversion module. This module converts generative normal outputs into per-triangle surface offsets via differentiable rasterization, enabling the capture of fine geometric details beyond sparse vertex limitations. Additionally, DIS integrates a de-shading module, informed by generative priors, to recover accurate material properties such as albedo. This module refines albedo predictions by removing baked-in shading and back-propagates reconstruction errors to further optimize the mesh geometry. Through this joint optimization of geometry and material appearance, DIS achieves physically consistent, high-quality reconstructions suitable for accurate relighting. Our experiments show that DIS delivers SOTA relighting quality, enhanced rendering efficiency, lower memory consumption, and detailed surface reconstruction.

YNIMG Journal 2026 Journal Article

Detecting early brain susceptibility changes before demyelination in cuprizone mouse model using quantitative susceptibility mapping (QSM)

  • Xinyue Han
  • Jie Chen
  • Zhuoheng Liu
  • Juan Liu
  • Mingquan Lin
  • Nian Wang

Multiple sclerosis (MS) is a neurological disease that affects the central nervous system through demyelination and inflammation. Animal models, including the cuprizone (CPZ) model, provide a robust platform for studying demyelination and remyelination in MS. While conventional MRI techniques are sensitive to myelin changes, quantitative susceptibility mapping (QSM) offers additional advantages by capturing both myelin- and iron-related pathology. In this study, we performed longitudinal whole-brain multimodal magnetic resonance imaging (MRI), including T2-weighted imaging, magnetization transfer imaging, and QSM, in CPZ-treated mice across multiple stages covering pre-demyelination, acute demyelination, chronic demyelination, and remyelination. Regional analyses focused on the corpus callosum (CC) and anterior commissure (AC), complemented by histological validation. All three MRI modalities detected demyelination, characterized by increased T2 signal, decreased magnetization transfer ratio (MTR), and increased susceptibility, with partial recovery during remyelination. QSM demonstrated unique sensitivity by identifying susceptibility decreases at week 2, before apparent demyelination, corresponding to early oligodendrocyte dysfunction. Regional heterogeneity was observed, with the CC showing rapid alterations during acute demyelination and the AC exhibiting steadier changes across acute and chronic phases. These results establish QSM as a sensitive imaging biomarker capable of detecting early MS pathology and tracking dynamic changes in oligodendrocytes. By complementing conventional MRI techniques, QSM enhances the characterization of white matter injury in the CPZ model and holds translational potential for monitoring disease progression and therapeutic response in MS.

TMLR Journal 2026 Journal Article

LoDAdaC: a unified local training-based decentralized framework with adaptive gradients and compressed communication

  • Wei Liu
  • Anweshit Panda
  • Ujwal Pandey
  • Haven Cook
  • George Slota
  • Naigang Wang
  • Jie Chen
  • Yangyang Xu

In decentralized distributed learning, achieving fast convergence and low communication cost is essential for scalability and high efficiency. Adaptive gradient methods, such as Adam, have demonstrated strong practical performance in deep learning and centralized distributed settings. However, their convergence properties remain largely unexplored in decentralized settings involving multiple local training steps, such as federated learning. To address this limitation, we propose LoDAdaC, a unified multiple Local Training (MLT) Decentralized framework with Adam-type updates and Compressed communication (CC). LoDAdaC accommodates a broad class of optimizers for its local adaptive updates, including AMSGrad, Adam, and AdaGrad; it is compatible with standard (possibly biased) compressors such as low-bit quantization and sparsification. MLT and CC enable LoDAdaC to achieve a multiplicative reduction in communication cost, while the technique of adaptive updates enables fast convergence. We rigorously prove the combined advantage through complexity analysis. In addition, experiments on image classification and GPT-style language model training validate our theoretical findings and show that LoDAdaC significantly outperforms existing decentralized algorithms in terms of convergence speed and communication efficiency.
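The abstract mentions sparsification as one of the standard (possibly biased) compressors. As a rough illustration only, the sketch below shows generic top-k sparsification, which keeps the k largest-magnitude entries of an update and zeroes the rest. The function name `top_k_sparsify` and the sample vector are hypothetical; this is not taken from the LoDAdaC paper or its code.

```python
from typing import List


def top_k_sparsify(vector: List[float], k: int) -> List[float]:
    """Keep the k largest-magnitude entries of `vector`; zero out the rest.

    Generic top-k sparsification (a biased compressor). Hypothetical helper,
    not the implementation used by LoDAdaC.
    """
    if k <= 0:
        return [0.0] * len(vector)
    if k >= len(vector):
        return list(vector)
    # Indices ordered by descending magnitude; ties are broken by position.
    order = sorted(range(len(vector)), key=lambda i: (-abs(vector[i]), i))
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(vector)]


# Example: only the two largest-magnitude coordinates would be communicated.
update = [0.1, -2.0, 0.05, 1.5, -0.3]
compressed = top_k_sparsify(update, k=2)
print(compressed)
```

Only the surviving entries (and their indices) would need to be sent to neighbors, which is where the communication saving comes from in this kind of compressor.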

TCS Journal 2026 Journal Article

Matchmaking encryption for NC1 circuits without obfuscation

  • Ying Gao
  • Xinrui Yang
  • Jie Chen
  • Yijian Zhang
  • Yu Li

Matchmaking encryption (ME) is a new form of encryption proposed by Ateniese et al. (CRYPTO, 2019). Constructing an ME scheme that supports complex functions without relying on obfuscation is an important area of research, but it has seen limited success despite significant effort. Existing ME schemes either focus on very restricted policies (i.e., for identity matching), or require obfuscation techniques. In this paper, we propose the first ME construction that supports NC1 circuits without using obfuscation. Our results can be summarized as follows. (1) We propose an ME scheme for NC1 circuits from LWE and pairings, with provable security in the generic group model (GGM). (2) We further propose an ME scheme for NC1 circuits in the standard model, by leveraging inner product functional encryption and using the KOALA knowledge assumption. Technically, we follow the blueprint of Francati et al. (Eurocrypt, 2023) but start from the two-input attribute-based encryption by Agrawal et al. (CRYPTO, 2022), which allows for a form of “linking” between two independently generated ciphertexts. In terms of security, our schemes protect the sender’s privacy, prove the authenticity of sender data, and ensure that receivers without access privileges remain uninformed about any information.

AAAI Conference 2026 Conference Paper

Patho-AgenticRAG: Towards Multimodal Agentic Retrieval-Augmented Generation for Pathology VLMs via Reinforcement Learning

  • Wenchuan Zhang
  • Jingru Guo
  • Hengzhe Zhang
  • Penghao Zhang
  • Jie Chen
  • Shuwan Zhang
  • Zhang Zhang
  • Yuhao Yi

Although Vision Language Models (VLMs) have shown generalization in medical imaging, pathology presents unique challenges due to ultra-high resolution, complex tissue structures, and nuanced semantics. These factors make pathology VLMs prone to hallucinations, i.e., generating outputs inconsistent with visual evidence, which undermines clinical trust. Existing RAG approaches in this domain largely depend on text-based knowledge bases, limiting their ability to leverage diagnostic visual cues. To address this, we propose Patho-AgenticRAG, a multimodal RAG framework with a database built on page-level embeddings from authoritative pathology textbooks. Unlike traditional text-only retrieval systems, it supports joint text–image search, enabling retrieval of textbook pages that contain both the queried text and relevant visual cues, thus avoiding the loss of critical image-based information. Patho-AgenticRAG also supports reasoning, task decomposition, and multi-turn search interactions, improving accuracy in complex diagnostic scenarios. Experiments show that Patho-AgenticRAG significantly outperforms existing multimodal models in complex pathology tasks like multiple-choice diagnosis and visual question answering.

AAAI Conference 2026 Conference Paper

Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner

  • Wenchuan Zhang
  • Penghao Zhang
  • Jingru Guo
  • Tao Cheng
  • Jie Chen
  • Shuwan Zhang
  • Zhang Zhang
  • Yuhao Yi

Recent advances in vision-language models (VLMs) have enabled broad progress in the general medical field. However, pathology still remains a more challenging sub-domain, with current pathology-specific VLMs exhibiting limitations in both diagnostic accuracy and reasoning plausibility. Such shortcomings are largely attributable to the nature of current pathology datasets, which are primarily composed of image–description pairs that lack the depth and structured diagnostic paradigms employed by real-world pathologists. In this study, we leverage pathology textbooks and real-world pathology experts to construct high-quality, reasoning-oriented datasets. Building on this, we introduce Patho-R1, a multimodal RL-based pathology Reasoner, trained through a three-stage pipeline: (1) continued pretraining on 3.5 million image-text pairs for knowledge infusion; (2) supervised fine-tuning on 500k high-quality Chain-of-Thought samples for reasoning incentivizing; (3) reinforcement learning using Group Relative Policy Optimization and Decoupled Clip and Dynamic sAmpling Policy Optimization strategies for multimodal reasoning quality refinement. To further assess the alignment quality of our dataset, we propose Patho-CLIP, trained on the same figure-caption corpus used for continued pretraining. Comprehensive experimental results demonstrate that both Patho-CLIP and Patho-R1 achieve robust performance across a wide range of pathology-related tasks, including zero-shot classification, cross-modal retrieval, Visual Question Answering, and Multiple Choice Question.
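The third training stage names Group Relative Policy Optimization (GRPO). As a hedged sketch of the general idea only, GRPO scores each sampled response relative to the other responses drawn for the same prompt by normalizing rewards within the group. The function name, the use of the population standard deviation, and the sample rewards are assumptions for illustration; none of this is taken from the Patho-R1 training code.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize rewards within one group of responses to the same prompt.

    Hypothetical helper: advantage_i = (r_i - group mean) / (group std + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumed choice
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: four sampled answers, two judged correct (reward 1) and two not.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Responses that beat the group average receive positive advantages and the rest negative ones, so no separate value network is needed to form the baseline.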

AAAI Conference 2026 Conference Paper

ProAR: Probabilistic Autoregressive Modeling for Molecular Dynamics

  • Kaiwen Cheng
  • Yutian Liu
  • Zhiwei Nie
  • Mujie Lin
  • Yanzhen Hou
  • Yiheng Tao
  • Chang Liu
  • Jie Chen

Understanding the structural dynamics of biomolecules is crucial for uncovering biological functions. As molecular dynamics (MD) simulation data becomes more available, deep generative models have been developed to synthesize realistic MD trajectories. However, existing methods produce fixed-length trajectories by jointly denoising high-dimensional spatiotemporal representations, which conflicts with MD’s frame-by-frame integration process and fails to capture time-dependent conformational diversity. Inspired by MD’s sequential nature, we introduce a new probabilistic autoregressive (ProAR) framework for trajectory generation. ProAR uses a dual-network system that models each frame as a multivariate Gaussian distribution and employs an anti-drifting sampling strategy to reduce cumulative errors. This approach captures conformational uncertainty and time-coupled structural changes while allowing flexible generation of trajectories of arbitrary length. Experiments on ATLAS, a large-scale protein MD dataset, demonstrate that for long trajectory generation, our model achieves a 7.5% reduction in reconstruction RMSE and an average 25.8% improvement in conformation change accuracy compared to previous state-of-the-art methods. For the conformation sampling task, it performs comparably to specialized time-independent models, providing a flexible and dependable alternative to standard MD simulations.

AAAI Conference 2026 Conference Paper

SAOT: An Enhanced Locality-Aware Spectral Transformer for Solving PDEs

  • Chenhong Zhou
  • Jie Chen
  • Zaifeng Yang

Neural operators have shown great potential in solving a family of Partial Differential Equations (PDEs) by modeling the mappings between input and output functions. Fourier Neural Operator (FNO) implements global convolutions via parameterizing the integral operators in Fourier space. However, it often results in over-smoothing solutions and fails to capture local details and high-frequency components. To address these limitations, we investigate incorporating the spatial-frequency localization property of Wavelet transforms into the Transformer architecture. We propose a novel Wavelet Attention (WA) module with linear computational complexity to efficiently learn locality-aware features. Building upon WA, we further develop the Spectral Attention Operator Transformer (SAOT), a hybrid spectral Transformer framework that integrates WA’s localized focus with the global receptive field of Fourier-based Attention (FA) through a gated fusion block. Experimental results demonstrate that WA significantly mitigates the limitations of FA and outperforms existing Wavelet-based neural operators by a large margin. By integrating the locality-aware and global spectral representations, SAOT achieves state-of-the-art performance on six operator learning benchmarks and exhibits strong discretization-invariant ability.

AAAI Conference 2026 Conference Paper

SOSControl: Enhancing Human Motion Generation Through Saliency-Aware Symbolic Orientation and Timing Control

  • Ho Yin Au
  • Junkun Jiang
  • Jie Chen

Traditional text-to-motion frameworks often lack precise control, and existing approaches based on joint keyframe locations provide only positional guidance, making it challenging and unintuitive to specify body part orientations and motion timing. To address these limitations, we introduce the Salient Orientation Symbolic (SOS) script, a programmable symbolic framework for specifying body part orientations and motion timing at keyframes. We further propose an automatic SOS extraction pipeline that employs temporally-constrained agglomerative clustering for frame saliency detection and a Saliency-based Masking Scheme (SMS) to generate sparse, interpretable SOS scripts directly from motion data. Moreover, we present the SOSControl framework, which treats the available orientation symbols in the sparse SOS script as salient and prioritizes satisfying these constraints during motion generation. By incorporating SMS-based data augmentation and gradient-based iterative optimization, the framework enhances alignment with user-specified constraints. Additionally, it employs a ControlNet-based ACTOR-PAE Decoder to ensure smooth and natural motion outputs. Extensive experiments demonstrate that the SOS extraction pipeline generates human-interpretable scripts with symbolic annotations at salient keyframes, while the SOSControl framework outperforms existing baselines in motion quality, controllability, and generalizability with respect to motion timing and body part orientation control.

EAAI Journal 2026 Journal Article

Uncertainty- and hardness-weighted loss functions for medical image segmentation

  • Yanyan Zheng
  • Yabo Wu
  • Jie Chen
  • Xiaoguo Yang
  • Hao Zhang
  • Quanyong Yi
  • Jiantao Pu
  • Lei Wang

Accurate segmentation of medical images is essential for various image processing tasks and is now predominantly achieved using deep learning techniques. However, existing approaches often employ loss functions that fail to account for pixel-level differences in prediction uncertainty or hardness. This limitation frequently results in relatively large segmentation errors, particularly in object boundary regions. To address the limitation, we developed a novel class of uncertainty-/hardness-weighted loss functions by introducing two distinct pixel-wise weighting schemes: probability-guided uncertainty (PGU) and region-enhanced hardness (REH) weights. These weights, derived from the differences between network predictions and their corresponding ground truths, were designed to emphasize challenging pixels while reducing segmentation uncertainties. We validated these loss functions by integrating them with two classical neural networks, i.e., Swin Transformer based U-shape network (Swin-Unet) and V-shape network (V-Net), to segment two- and three-dimensional target objects across four different image datasets, including the Retinal Fundus Glaucoma Challenge (REFUGE) dataset, the Retinal Vascular Tree Analysis (RETA) dataset, an optical coherence tomography (OCT) dataset, and the Atria Segmentation Challenge (ASC) dataset. Extensive experiments demonstrated that our developed loss functions outperformed classical losses, such as cross-entropy (CE) and Dice losses, along with their variants, highlighting the effectiveness and generalization of the introduced weighting schemes. The source code is available at https://github.com/wmuLei/uhLoss.
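To make the idea of pixel-wise weighting concrete, here is a minimal sketch of a binary cross-entropy in which each pixel's term is scaled by a weight that grows with the gap between prediction and ground truth. The weight `1 + |p - y|` is a placeholder chosen for illustration; it is not the PGU or REH formula from the paper, and `weighted_bce` is a hypothetical name.

```python
import math
from typing import Sequence


def weighted_bce(probs: Sequence[float], targets: Sequence[int], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy with a per-pixel weight.

    The weight 1 + |p - y| is an assumed stand-in that emphasizes pixels
    whose prediction is far from the label; it is not the paper's scheme.
    """
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        bce = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        weight = 1.0 + abs(p - y)
        total += weight * bce
    return total / len(probs)


# Confident, correct predictions versus uncertain ones on the same labels.
easy = weighted_bce([0.9, 0.1], [1, 0])
hard = weighted_bce([0.6, 0.4], [1, 0])
print(easy, hard)
```

The uncertain pixels contribute a larger loss both through the cross-entropy itself and through the extra weight, which is the qualitative behavior the abstract describes.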

AAAI Conference 2026 Conference Paper

UniAPO: Unified Multimodal Automated Prompt Optimization

  • Qipeng Zhu
  • Yanzhe Chen
  • Huasong Zhong
  • Jie Chen
  • Yan Li
  • Zhixin Zhang
  • Junping Zhang
  • Zhenheng Yang

Prompting is fundamental to unlocking the full potential of large language models. To automate and enhance this process, automatic prompt optimization (APO) has been developed, demonstrating effectiveness primarily in text-only input scenarios. However, extending existing APO methods to multimodal tasks—such as video-language generation—introduces two core challenges: (i) visual token inflation, where long visual-token sequences restrict context capacity and result in insufficient feedback signals; (ii) a lack of process-level supervision, as existing methods focus on outcome-level supervision and overlook intermediate supervision, limiting prompt optimization. We present UniAPO: Unified Multimodal Automated Prompt Optimization, the first framework tailored for multimodal APO. UniAPO adopts an EM-inspired optimization process that decouples feedback modeling and prompt refinement, making the optimization more stable and goal-driven. To further address the aforementioned challenges, we introduce a short-long term memory mechanism: historical feedback mitigates context limitations, while historical prompts provide directional guidance for effective prompt optimization. UniAPO achieves consistent gains across text, image, and video benchmarks, establishing a unified framework for efficient and transferable prompt optimization.

AAAI Conference 2026 Conference Paper

WaveFormer: Frequency-Time Decoupled Vision Modeling with Wave Equation

  • Zishan Shu
  • Juntong Wu
  • Wei Yan
  • Xudong Liu
  • Hongyu Zhang
  • Chang Liu
  • Youdong Mao
  • Jie Chen

Vision modeling has advanced rapidly with Transformers, whose attention mechanisms capture visual dependencies but lack a principled account of how semantic information propagates spatially. We revisit this problem from a wave-based perspective: feature maps are treated as spatial signals whose evolution over an internal propagation time (aligned with network depth) is governed by an underdamped wave equation. In this formulation, spatial frequency—from low-frequency global layout to high-frequency edges and textures—is modeled explicitly, and its interaction with propagation time is controlled rather than implicitly fixed. We derive a closed-form, frequency–time decoupled solution and implement it as the Wave Propagation Operator (WPO), a lightweight module that models global interactions in O(NlogN) time—far lower than attention. Building on WPO, we propose a family of WaveFormer models as drop-in replacements for standard ViTs and CNNs, achieving competitive accuracy across image classification, object detection, and semantic segmentation, while delivering up to 1.6× higher throughput and 30% fewer FLOPs than attention-based alternatives. Furthermore, our results demonstrate that wave propagation introduces a complementary modeling bias to heat-based methods, effectively capturing both global coherence and high-frequency details essential for rich visual semantics.
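The abstract does not state the equation itself. As a sketch under assumptions, a standard damped wave equation with damping coefficient $\gamma$ and wave speed $c$ (both symbols are assumed here, not taken from the paper) has the following per-frequency closed form, which shows how frequency and propagation time separate:

```latex
% Assumed form of the damped wave equation for a feature map u(x, t)
\frac{\partial^2 u}{\partial t^2} + 2\gamma \frac{\partial u}{\partial t} = c^2 \nabla^2 u

% Spatial Fourier transform: each frequency k evolves independently
\frac{d^2 \hat{u}_k}{dt^2} + 2\gamma \frac{d \hat{u}_k}{dt} + c^2 \lVert k \rVert^2 \hat{u}_k = 0

% Underdamped case (c \lVert k \rVert > \gamma), with
% \omega_k = \sqrt{c^2 \lVert k \rVert^2 - \gamma^2},
% initial value \hat{u}_k(0) and initial velocity \hat{v}_k(0):
\hat{u}_k(t) = e^{-\gamma t} \left[ \hat{u}_k(0) \cos(\omega_k t)
  + \frac{\hat{v}_k(0) + \gamma \hat{u}_k(0)}{\omega_k} \sin(\omega_k t) \right]
```

Because the solution is a pointwise multiplication in the frequency domain, it can be applied with a forward and an inverse FFT, which is consistent with the O(NlogN) cost quoted in the abstract.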

EAAI Journal 2025 Journal Article

A hybrid architecture based on structured state space sequence model and convolutional neural network for real-time object detection

  • Jie Chen
  • Meng Joo Er

Real-time performance is essential for practical deployment of object detection on edge devices, where high processing speed and low latency are paramount. This paper introduces a novel approach aimed at boosting real-time object detection while strictly adhering to computational constraints. A structured state space sequence model, Mamba, is strategically embedded in the early stages of the backbone network to capture long-range dependencies, thereby enhancing the model’s representation capability. Given the limitations of Mamba in directional perception, a lightweight spatial attention mechanism is introduced to integrate global context into each spatial location. Additionally, a computationally efficient module inspired by the Ghost module is developed to reduce resource demands. This dual-strategy approach optimizes both performance and efficiency in real-time object detection. Extensive experiments demonstrate the superiority of this proposed approach; on the Microsoft Common Objects in Context (MS COCO) dataset, it achieves a +1.6 AP (Average Precision) improvement over state-of-the-art methods, reaching 41.1 AP with minimal added model complexity on the nano scale. The effectiveness and efficiency of each component are further substantiated through ablation studies on the Pascal Visual Object Classes (Pascal VOC) dataset. To verify the universality of the proposed method, this study selects underwater object detection, characterized by an extremely complex background environment, as a second validation scenario. Through the application of this proposed approach to underwater object detection, a state-of-the-art result of 69.5 AP was obtained on the Detecting Underwater Objects (DUO) dataset, exceeding that of You Only Look Once Detector version 11 (YOLO11) by +0.3 AP. Code: https://github.com/chenjie04/Hybrid-YOLO.

IROS Conference 2025 Conference Paper

Achieving Lift-to-Weight Ratio >3.5 in Piezoelectric Direct-Driven Insect-Scale Flapping-Wing MAVs

  • Xiang Lu
  • Jie Chen
  • Yang Chen
  • Zixin Deng
  • Yulie Wu
  • Xuezhong Wu
  • Dingbang Xiao

Insect-scale flapping-wing micro aerial vehicles (FWMAVs) employing piezoelectric direct-drive configurations eliminate traditional kinematic chains through direct coupling of the wing and actuator. While this design approach significantly reduces structural complexity and manufacturing costs compared to transmission-dependent systems, it inherently limits wing stroke amplitude and consequent lift generation. This paper presents a novel lift-enhancement strategy for piezoelectric direct-drive FWMAVs, effectively improving payload capacity through optimized aerodynamic performance. The redesigned X-configuration prototype demonstrates outstanding metrics: with a 68 mm wingspan and 212 mg total mass, it achieves 7.47 mN maximum lift (a lift-to-weight ratio exceeding 3.5:1) and a 1.25 m/s takeoff speed. Experimental validation confirms a 39% payload capacity improvement and a 34% lift-to-weight ratio enhancement compared to baseline designs. This enhancement establishes our robot as the current state-of-the-art in piezoelectric direct-drive FWMAVs regarding lift-to-weight ratio.
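The lift-to-weight figure can be checked from the two numbers quoted in the abstract. The short calculation below assumes standard gravity of 9.81 m/s² (the abstract does not state which value the authors used) and uses illustrative variable names.

```python
# Values quoted in the abstract.
mass_kg = 212e-6       # 212 mg total mass
max_lift_n = 7.47e-3   # 7.47 mN maximum lift

# Assumed value of gravitational acceleration, in m/s^2.
G = 9.81

weight_n = mass_kg * G
lift_to_weight = max_lift_n / weight_n
print(f"lift-to-weight ratio: {lift_to_weight:.2f}")
```

Under that assumption the ratio comes out at roughly 3.59, consistent with the abstract's claim of exceeding 3.5:1.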

JBHI Journal 2025 Journal Article

Active-Supervised Model for Intestinal Ulcers Segmentation Using Fuzzy Labeling

  • Jie Chen
  • Yanning Lin
  • Faisal Saeed
  • Ziqian Ding
  • Muhammad Diyan
  • Jianqiang Li
  • Zhaoxia Wang

Inflammatory bowel disease (IBD) is a chronic inflammatory condition of the intestines with a rising global incidence. Colonoscopy remains the gold standard for IBD diagnosis, but traditional image-scoring methods are subjective and complex, impacting diagnostic accuracy and efficiency. To address these limitations, this paper investigates machine learning techniques for intestinal ulcer segmentation, focusing on multi-category ulcer segmentation to enhance IBD diagnosis. We identified two primary challenges in intestinal ulcer segmentation: 1) labeling noise, where inaccuracies in medical image annotation introduce ambiguity, hindering model training, and 2) performance variability across datasets, where models struggle to maintain high accuracy due to medical image diversity. To address these challenges, we propose an active ulcer segmentation algorithm based on fuzzy labeling. A collaborative training segmentation model is designed to utilize pixel-wise confidence extracted from fuzzy labels, distinguishing high- and low-confidence regions, and enhancing robustness to noisy labels through network cooperation. To mitigate performance disparities, we introduce a data adaptation strategy leveraging active learning. By selecting high-information samples based on uncertainty and diversity, the strategy enables incremental model training, improving adaptability. Extensive experiments on public and hospital datasets validate the proposed methods. Our collaborative training model and active learning strategy show significant advantages in handling noisy labels and enhancing model performance across datasets, paving the way for more precise and efficient IBD diagnosis.

AAAI Conference 2025 Conference Paper

Adversarial Learning Under Hybrid Perturbations for Robust Acute Lymphoblastic Leukemia Classification

  • Jie Chen
  • Xinyuan Liu
  • Xintong Liu
  • Jianqiang Li

Acute lymphoblastic leukemia is a childhood cancer prevalent worldwide, which can prove fatal within weeks or months. However, current diagnosis models based on machine learning and deep learning methods fail to consider device noise (pixel-level perturbations) and rotation/translation (spatial-transformed perturbations), which can undermine the model's robustness. Adversarial training is a potential solution to this issue. This paper presents a hybrid perturbation adversarial training (HPAT) strategy that leverages two types of adversarial samples: pixel-level adversarial samples and spatial adversarial samples. These hybrid adversarial samples are generated through Projected Gradient Descent (PGD) and a spatial transformation based on the Bayesian optimization (STBO) algorithm, respectively. A Mixed Batch Normalization (MixBN) module is introduced to handle both adversarial and clean samples, alleviating the clean-accuracy degradation caused by adversarial training. The proposed hybrid adversarial training strategy is tested on a public acute lymphoblastic leukemia dataset, where it outperforms existing acute lymphoblastic cell classification models.
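
For readers unfamiliar with the pixel-level half of this scheme, the following is a minimal sketch of standard L-infinity PGD on a toy linear logistic classifier. It is not the HPAT pipeline: the model, the weights `w` and `b`, the input `x`, and the step sizes are all hypothetical values chosen for illustration, and the spatial (STBO) and MixBN parts are not shown.

```python
import math

def logistic_loss_and_grad(x, y, w, b):
    """Binary logistic loss and its gradient with respect to the input x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    grad = [(p - y) * wi for wi in w]
    return loss, grad

def pgd_attack(x, y, w, b, eps=0.1, alpha=0.02, steps=10):
    """L-infinity PGD: signed-gradient ascent on the loss, then projection."""
    x_adv = list(x)
    for _ in range(steps):
        _, grad = logistic_loss_and_grad(x_adv, y, w, b)
        for i, g in enumerate(grad):
            step = alpha * ((g > 0) - (g < 0))  # alpha * sign(g)
            # Project back into the box [x_i - eps, x_i + eps].
            x_adv[i] = min(max(x_adv[i] + step, x[i] - eps), x[i] + eps)
    return x_adv

# Hypothetical toy classifier and clean sample.
w, b = [2.0, -1.0], 0.5
x, y = [0.3, 0.8], 1

x_adv = pgd_attack(x, y, w, b)
clean_loss, _ = logistic_loss_and_grad(x, y, w, b)
adv_loss, _ = logistic_loss_and_grad(x_adv, y, w, b)
print(f"clean loss = {clean_loss:.3f}, adversarial loss = {adv_loss:.3f}")
```

The perturbed input stays within `eps` of the clean input in every coordinate while the loss increases, which is the property adversarial training then exploits.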

AAAI Conference 2025 Conference Paper

Aligning Instance Brownian Bridge with Texts for Open-Vocabulary Video Instance Segmentation

  • Zesen Cheng
  • Kehan Li
  • Li Hao
  • Peng Jin
  • Xiawu Zheng
  • Chang Liu
  • Jie Chen

Temporally locating objects with arbitrary class texts is the primary pursuit of open-vocabulary Video Instance Segmentation (VIS). Because of the insufficient vocabulary of video data, previous methods leverage the image-text pretraining model for recognizing object instances by separately aligning each frame with class texts. As a result, the separation breaks the instance movement context of videos and incurs substantial inference overhead. To tackle these issues, we propose BridgeText Alignment (BTA) to link frame-level instance representations as a Brownian bridge. On one hand, we can calculate the global descriptor of a Brownian bridge for capturing instance dynamics, which lets the alignment with texts take temporal information into account rather than only the static information of each frame. On the other hand, according to the goal-conditioned property of the Brownian bridge, we can estimate the middle frame features via the start and the end frame features, so the global feature calculation of a Brownian bridge only needs to infer a few frames, which largely reduces inference overhead. We term our overall pipeline BriVIS. Following the training settings of previous works, BriVIS surpasses the SOTA (OV2Seg) by a clear margin. For example, on the challenging large-vocabulary datasets (BURST, LVVIS), BriVIS achieves 5.7 and 20.9 mAP, an improvement of +2.2 to +6.7 mAP over OV2Seg. Furthermore, after training via BTA, using only the head and the tail frames for alignment improves the speed by 32% (2.77 → 1.88 s/iter) while decreasing the performance by only 0.2 mAP (21.1 → 20.9 mAP).
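
The goal-conditioned property mentioned in this abstract can be illustrated with the textbook Brownian bridge: pinned at a start and an end point, its conditional mean at an intermediate time is the linear interpolation of the two endpoints, and its variance vanishes at both ends. The sketch below shows only that property; the feature vectors are hypothetical, and this is not the BTA or BriVIS implementation.

```python
def bridge_mean(start, end, t, T=1.0):
    """Conditional mean of a Brownian bridge at time t, pinned at start (t=0) and end (t=T)."""
    a = t / T
    return [(1.0 - a) * s + a * e for s, e in zip(start, end)]

def bridge_variance(t, T=1.0, sigma2=1.0):
    """Per-coordinate variance of the bridge at time t; zero at both endpoints."""
    return sigma2 * t * (T - t) / T

head = [0.0, 2.0, -1.0]  # hypothetical first-frame instance feature
tail = [4.0, 0.0, 1.0]   # hypothetical last-frame instance feature

middle = bridge_mean(head, tail, t=0.5)
print(middle, bridge_variance(0.5))
```

The quoted speed-up can be checked the same way: (2.77 − 1.88) / 2.77 is roughly 0.32, matching the 32% figure.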

EAAI Journal 2025 Journal Article

An objective-guided multi-strategy evolutionary algorithm for multi-objective coalition formation

  • Miao Guo
  • Bin Xin
  • Jie Chen
  • Shuxin Ding

The coalition formation (CF) problem is crucial for reasonably organizing agents with diverse and complementary capabilities to address complex scenarios in collaborative environments. While CF has received some research attention, the multi-objective coalition formation (MOCF) problem remains relatively unexplored and presents significant challenges. In the context of disaster relief and emergency response, this paper delves into the MOCF problem and constructs the mathematical model, which minimizes both the latest arrival time and the total cost of coalition members under mission-specific capability constraints. To tackle this, this paper proposes an innovative objective-guided multi-strategy evolutionary algorithm (OGMSEA) for effective capability aggregation regarding mission requirements and objective trade-offs, which leverages the problem characteristics of multiple objectives and lower-bound constraints. The initialization strategies leverage various objective weights to generate a uniformly distributed and extensive set of initial solutions. The repair strategies restore unsatisfied coalitions by evaluating the alignment of idle agents with the remaining capability requirements and optimization objectives. The restart strategies reconstruct repetitive solutions to maintain the population diversity. Comprehensive experiments demonstrate OGMSEA’s superior performance in terms of applicability and adaptability, better achieving inverted generational distance and hypervolume metrics across 135 various cases compared with advanced algorithms. In large-scale complex scenarios (e.g., more than 100 agents, 10 missions, and a demand-supply ratio on capabilities of 0.5), OGMSEA consistently achieves a high-quality Pareto front due to its well-designed strategies. Additionally, a forest fire scenario is constructed and addressed by forming firefighting coalitions, demonstrating the practical applicability of this study.
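
Since this abstract evaluates solutions by their Pareto front over two minimized objectives, a minimal sketch of Pareto dominance may help. The candidate values are hypothetical (latest arrival time, total cost) pairs, and the filter below is the generic non-dominated test, not OGMSEA itself.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Return the non-dominated points, preserving input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (latest arrival time, total cost) pairs for candidate coalitions.
candidates = [(10, 50), (12, 40), (11, 55), (15, 30), (12, 45)]
front = pareto_front(candidates)
print(front)
```

Here `(11, 55)` is dominated by `(10, 50)` and `(12, 45)` by `(12, 40)`, so three candidates remain on the front.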

AAAI Conference 2025 Conference Paper

Attack-inspired Calibration Loss for Calibrating Crack Recognition

  • Zhuangzhuang Chen
  • Qiangyu Chen
  • Jiahao Zhang
  • Zhiliang Lin
  • Xingyu Feng
  • Jie Chen
  • Jianqiang Li

Deep neural networks (DNNs) have achieved high predictive accuracy in many vision tasks. However, we find that they are poorly calibrated for crack recognition tasks, as these DNNs tend to produce both under-confident and over-confident predictions in such safety-critical applications, thereby limiting their practical use in real-world scenarios. To address this issue, we propose a novel attack-inspired calibration loss (AICL) that explicitly regularizes class probabilities to yield better confidence estimates. Specifically, we first propose the attack-inspired correctness estimation method (ACE) that aims to estimate the correctness degree of each sample via adversarial attacks. Then, we propose Correctness-aware Distribution Guidance, which starts from a distribution perspective and enforces the ordinal ranking of the predicted confidence with respect to the estimated correctness degree. The proposed method can be conveniently implemented on top of any DNN-based crack recognition model by serving as a plug-and-play loss function. To address the limited availability of related benchmarks, we collect a fully annotated dataset, namely, Bridge2024, which involves inconsistent cracks and noisy backgrounds in real-world bridges. Our AICL outperforms the state-of-the-art calibration methods on various benchmark datasets including CRACK2019, SDNET2018, and our BRIDGE2024.

NeurIPS Conference 2025 Conference Paper

Causality Meets the Table: Debiasing LLMs for Faithful TableQA via Front-Door Intervention

  • Zhen Yang
  • Ziwei Du
  • Minghan Zhang
  • Wei Du
  • Jie Chen
  • Fulan Qian
  • Shu Zhao

Table Question Answering (TableQA) combines natural language understanding and structured data reasoning, posing challenges in semantic interpretation and logical inference. Recent advances in Large Language Models (LLMs) have improved TableQA performance through Direct Prompting and Agent paradigms. However, these models often rely on spurious correlations, as they tend to overfit to token co-occurrence patterns in pretraining corpora, rather than perform genuine reasoning. To address this issue, we propose Causal Intervention TableQA (CIT), which is based on a structural causal graph and applies front-door adjustment to eliminate bias caused by token co-occurrence. CIT formalizes TableQA as a causal graph and identifies token co-occurrence patterns as confounders. By applying front-door adjustment, CIT guides question variant generation and reasoning to reduce confounding effects. Experiments on multiple benchmarks show that CIT achieves state-of-the-art performance, demonstrating its effectiveness in mitigating bias. Consistent gains across various LLMs further confirm its generalizability.

AAAI Conference 2025 Conference Paper

CLEP: A Novel Contrastive Learning Method for Evolutionary Reentrancy Vulnerability Detection

  • Jie Chen
  • Liangmin Wang
  • Huijuan Zhu
  • Victor S. Sheng

Reentrancy vulnerabilities in smart contracts have been exploited to steal enormous amounts of money, thus detecting reentrancy vulnerabilities is a hotspot issue in security research. However, a new attack is emerging in which attackers continuously release new reentrancy patterns to exploit fresh vulnerabilities and obfuscate existing ones. Existing detection methods neglect the time-series evolution of vulnerabilities across different smart contract versions, leading to a gradual decline in their effectiveness over time. We investigate the time-series correlations among vulnerabilities in various versions and refer to these as Evolutionary Reentrancy Vulnerabilities (ERVs). We summarize that ERVs detection faces two key challenges: (i) capturing the evolving pattern of ERVs along a complete evolutionary chain and (ii) detecting fresh reentrancy vulnerabilities in new versions. To address these challenges, we propose CLEP, a novel Contrastive Learning with Evolving Pairs detection method. It can effectively capture the evolving patterns by discerning similarities and differences across versions. Specifically, we first modified the sample distribution by incorporating version declarations as time-series evolution information. Then, leveraging the hierarchical similarity, we design an evolving pairs scheme to form negative and positive contract pairs across versions. Finally, we build a complete evolutionary chain by proposing a version-aware contrastive sampler. Our experimental results show that CLEP not only outperforms state-of-the-art baselines in version-specific scenarios but also shows promising performance in cross-version evolution scenarios.

TCS Journal 2025 Journal Article

Clustering under a knapsack constraint: Parameterized approximation for the knapsack median problem

  • Zhen Zhang
  • Zhuohang Gao
  • Limei Liu
  • Yao Liu
  • Jie Chen
  • Qilong Feng

The Knapsack Median problem, defined over a set of clients and facilities in a metric space, seeks to open a subset of facilities and connect each client to an opened facility, with the goal of minimizing the sum of client-connection costs while keeping the sum of facility-opening costs within a specified budget. Solving this problem exactly in FPT time, parameterized by the maximum number of opened facilities (denoted by k), is unlikely due to its W[2]-hardness. Thus, we focus on parameterized approximation algorithms for the problem. We give a sampling-based method that reduces the solution search space, which yields a $(3+\varepsilon)$-approximation algorithm running in $(k\varepsilon^{-1})^{O(k)} n^{O(1)}$ time in general metric spaces and a $(1+\varepsilon)$-approximation algorithm with similar running time in d-dimensional Euclidean space.
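
The problem definition in this abstract can be made concrete with a small evaluator for a candidate solution. The sketch below only checks the budget and sums the connection costs; it is not the paper's approximation algorithm, and the instance (points, opening costs, budget, and the L1 metric) is hypothetical.

```python
def knapsack_median_cost(clients, facilities, open_ids, opening_cost, budget, dist):
    """Return the total connection cost of a solution, or None if it is infeasible."""
    if not open_ids or sum(opening_cost[f] for f in open_ids) > budget:
        return None
    # Each client connects to its nearest opened facility.
    return sum(min(dist(c, facilities[f]) for f in open_ids) for c in clients)

def l1(p, q):
    """Manhattan distance, used here as an example metric."""
    return sum(abs(a - b) for a, b in zip(p, q))

# Hypothetical instance.
clients = [(0, 0), (1, 0), (5, 5)]
facilities = {"a": (0, 0), "b": (5, 4), "c": (2, 2)}
opening_cost = {"a": 3, "b": 4, "c": 2}

cost_ab = knapsack_median_cost(clients, facilities, ["a", "b"], opening_cost, budget=7, dist=l1)
cost_abc = knapsack_median_cost(clients, facilities, ["a", "b", "c"], opening_cost, budget=7, dist=l1)
print(cost_ab, cost_abc)
```

Opening `a` and `b` uses the full budget of 7 and is feasible, whereas opening all three facilities costs 9 and is rejected.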

TMLR Journal 2025 Journal Article

Compressed Decentralized Momentum Stochastic Gradient Methods for Nonconvex Optimization

  • Wei Liu
  • Anweshit Panda
  • Ujwal Pandey
  • Christopher Brissette
  • Yikang Shen
  • George Slota
  • Naigang Wang
  • Jie Chen

In this paper, we design two compressed decentralized algorithms for solving nonconvex stochastic optimization under two different scenarios. Both algorithms adopt a momentum technique to achieve fast convergence and a message-compression technique to save communication costs. Though momentum acceleration and compressed communication have been used in literature, it is highly nontrivial to theoretically prove the effectiveness of their composition in a decentralized algorithm that can maintain the benefits of both sides, because of the need to simultaneously control the consensus error, the compression error, and the bias from the momentum gradient. For the scenario where gradients are bounded, our proposal is a compressed decentralized adaptive method. To the best of our knowledge, this is the first decentralized adaptive stochastic gradient method with compressed communication. For the scenario of data heterogeneity without bounded gradients, our proposal is a compressed decentralized heavy-ball method, which applies a gradient tracking technique to address the challenge of data heterogeneity. Notably, both methods achieve an optimal convergence rate, and they can achieve linear speed up and adopt topology-independent algorithmic parameters within a certain regime of the user-specified error tolerance. Superior empirical performance is observed over state-of-the-art methods on training deep neural networks (DNNs) and Transformers.

AAAI Conference 2025 Conference Paper

Cross-View Graph Consistency Learning for Invariant Graph Representations

  • Jie Chen
  • Hua Mao
  • Wai Lok Woo
  • Chuanbin Liu
  • Xi Peng

Graph representation learning is fundamental for analyzing graph-structured data. Exploring invariant graph representations remains a challenge for most existing graph representation learning methods. In this paper, we propose a cross-view graph consistency learning (CGCL) method that learns invariant graph representations for link prediction. First, two complementary augmented views are derived from an incomplete graph structure through a coupled graph structure augmentation scheme. This augmentation scheme mitigates the potential information loss that is commonly associated with various data augmentation techniques involving raw graph data, such as edge perturbation, node removal, and attribute masking. Second, we propose a CGCL model that can learn invariant graph representations. A cross-view training scheme is proposed to train the proposed CGCL model. This scheme attempts to maximize the consistency information between one augmented view and the graph structure reconstructed from the other augmented view. Furthermore, we offer a comprehensive theoretical CGCL analysis. This paper empirically and experimentally demonstrates the effectiveness of the proposed CGCL method, achieving competitive results on graph datasets in comparisons with several state-of-the-art algorithms.

NeurIPS Conference 2025 Conference Paper

Deep Compositional Phase Diffusion for Long Motion Sequence Generation

  • Ho Yin Au
  • Jie Chen
  • Junkun Jiang
  • Jingyu Xiang

Recent research on motion generation has shown significant progress in generating semantically aligned motion with singular semantics. However, when employing these models to create composite sequences containing multiple semantically generated motion clips, they often struggle to preserve the continuity of motion dynamics at the transition boundaries between clips, resulting in awkward transitions and abrupt artifacts. To address these challenges, we present Compositional Phase Diffusion, which leverages the Semantic Phase Diffusion Module (SPDM) and Transitional Phase Diffusion Module (TPDM) to progressively incorporate semantic guidance and phase details from adjacent motion clips into the diffusion process. Specifically, SPDM and TPDM operate within the latent motion frequency domain established by the pre-trained Action-Centric Motion Phase Autoencoder (ACT-PAE). This allows them to learn semantically important and transition-aware phase information from variable-length motion clips during training. Experimental results demonstrate the competitive performance of our proposed framework in generating compositional motion sequences that align semantically with the input conditions, while preserving phase transitional continuity between preceding and succeeding motion clips. Additionally, the motion in-betweening task is made possible by keeping the phase parameter of the input motion sequences fixed throughout the diffusion process, showcasing the potential for extending the proposed framework to accommodate various application scenarios. Code is available at https://github.com/asdryau/TransPhase.

AAAI Conference 2025 Conference Paper

Defense Against Model Stealing Based on Account-Aware Distribution Discrepancy

  • Jian-Ping Mei
  • Weibin Zhang
  • Jie Chen
  • Xuyun Zhang
  • Tiantian Zhu

Malicious users attempt to replicate commercial models functionally at low cost by training a clone model with query responses. Preventing such model-stealing attacks in a timely manner, while achieving strong protection and maintaining utility, is challenging. In this paper, we propose a novel non-parametric detector called Account-aware Distribution Discrepancy (ADD) to recognize queries from malicious users by leveraging account-wise local dependency. We formulate each class as a Multivariate Normal distribution (MVN) in the feature space and measure the malicious score as the sum of weighted class-wise distribution discrepancy. The ADD detector is combined with random-based prediction poisoning to yield a plug-and-play defense module named D-ADD for image classification models. Results of extensive experimental studies show that D-ADD achieves strong defense against different types of attacks with little interference in serving benign users for both soft and hard-label settings.

AAAI Conference 2025 Conference Paper

DigitalLLaVA: Incorporating Digital Cognition Capability for Physical World Comprehension in Multimodal LLMs

  • Shiyu Li
  • Pengxu Wei
  • Pengchong Qiao
  • Chang Liu
  • Jie Chen

Multimodal Large Language Models (MLLMs) have shown remarkable cognitive capabilities in various cross-modal tasks. However, existing MLLMs struggle with tasks that require physical digital cognition, such as accurately reading an electric meter or pressure gauge. This limitation significantly reduces their effectiveness in practical applications like industrial monitoring and home energy management, where digital sensors are not feasible. For humans, physical digits are artificially defined quantities presented on specific carriers, which require training to recognize. As existing MLLMs are only pre-trained in the manner of object recognition, they fail to comprehend the relationship between digital carriers and their readings. To this end, referring to human behavior, we propose a novel DigitalLLaVA method to explicitly inject digital cognitive abilities into MLLMs in a two-step manner. In the first step, to improve the MLLM's understanding of physical digit carriers, we propose a digit carrier mapping method. This step utilizes object-level text-image pairs to enhance the model's comprehension of objects containing physical digits. For the second step, unlike previous methods that rely on sequential digital prediction or digit regression, we propose a 32-bit floating-point simulation approach that treats digit prediction as a whole. Using digit-level text-image pairs, we train three float heads to predict 32-bit floating-point numbers using 0/1 binary classification. This step significantly reduces the search space, making the prediction process more robust and straightforward. Being simple but effective, our method can identify very precise metrics (i.e., accurate to ±0.001) and provide floating-point results, showing its applicability in digital carrier domains.
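
To make the idea of predicting a number as 32 binary decisions concrete, the sketch below splits a value into the sign, exponent, and mantissa bits of the IEEE 754 single-precision layout and reassembles it. The abstract does not say how the three float heads divide the 32 bits; mapping them to these three fields is an assumption made here for illustration, and no model is involved.

```python
import struct

def float_to_fields(x):
    """Split x, rounded to IEEE 754 binary32, into (sign, exponent, mantissa) bit strings."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    bits = f"{raw:032b}"
    return bits[0], bits[1:9], bits[9:]  # 1 + 8 + 23 bits

def fields_to_float(sign, exponent, mantissa):
    """Reassemble the three bit strings into a Python float."""
    raw = int(sign + exponent + mantissa, 2)
    (value,) = struct.unpack(">f", struct.pack(">I", raw))
    return value

sign, exponent, mantissa = float_to_fields(1.0)
roundtrip = fields_to_float(*float_to_fields(23.456))  # hypothetical meter reading
print(sign, exponent, mantissa, roundtrip)
```

Each of the 32 positions is a 0/1 target, so a reading becomes a set of binary classification problems rather than a regression or a digit-by-digit sequence.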

NeurIPS Conference 2025 Conference Paper

Domain-RAG: Retrieval-Guided Compositional Image Generation for Cross-Domain Few-Shot Object Detection

  • Yu Li
  • Xingyu Qiu
  • Yuqian Fu
  • Jie Chen
  • Tianwen Qian
  • Xu Zheng
  • Danda Pani Paudel
  • Yanwei Fu

Cross-Domain Few-Shot Object Detection (CD-FSOD) aims to detect novel objects with only a handful of labeled samples from previously unseen domains. While data augmentation and generative methods have shown promise in few-shot learning, their effectiveness for CD-FSOD remains unclear due to the need for both visual realism and domain alignment. Existing strategies, such as copy-paste augmentation and text-to-image generation, often fail to preserve the correct object category or produce backgrounds coherent with the target domain, making them non-trivial to apply directly to CD-FSOD. To address these challenges, we propose Domain-RAG, a training-free, retrieval-guided compositional image generation framework tailored for CD-FSOD. Domain-RAG consists of three stages: domain-aware background retrieval, domain-guided background generation, and foreground-background composition. Specifically, the input image is first decomposed into foreground and background regions. We then retrieve semantically and stylistically similar images to guide a generative model in synthesizing a new background, conditioned on both the original and retrieved contexts. Finally, the preserved foreground is composed with the newly generated domain-aligned background to form the generated image. Without requiring any additional supervision or training, Domain-RAG produces high-quality, domain-consistent samples across diverse tasks, including CD-FSOD, remote sensing FSOD, and camouflaged FSOD. Extensive experiments show consistent improvements over strong baselines and establish new state-of-the-art results. Codes will be released upon acceptance. The source code and instructions are available at https://github.com/LiYu0524/Domain-RAG.

IJCAI Conference 2025 Conference Paper

Dual-Balancing for Physics-Informed Neural Networks

  • Chenhong Zhou
  • Jie Chen
  • Zaifeng Yang
  • Ching Eng Png

Physics-informed neural networks (PINNs) have emerged as a new learning paradigm for solving partial differential equations (PDEs) by enforcing the constraints of physical equations, boundary conditions (BCs), and initial conditions (ICs) into the loss function. Despite their successes, vanilla PINNs still suffer from poor accuracy and slow convergence due to the intractable multi-objective optimization issue. In this paper, we propose a novel Dual-Balanced PINN (DB-PINN), which dynamically adjusts loss weights by integrating inter-balancing and intra-balancing to alleviate two imbalance issues in PINNs. Inter-balancing aims to mitigate the gradient imbalance between PDE residual loss and condition-fitting losses by determining an aggregated weight that offsets their gradient distribution discrepancies. Intra-balancing acts on condition-fitting losses to tackle the imbalance in fitting difficulty across diverse conditions. By evaluating the fitting difficulty based on the loss records, intra-balancing can allocate the aggregated weight proportionally to each condition loss according to its fitting difficulty level. We further introduce a robust weight update strategy to prevent abrupt spikes and arithmetic overflow in instantaneous weight values caused by large loss variances, enabling smooth weight updating and stable training. Extensive experiments demonstrate that DB-PINN achieves significantly better performance than popular gradient-based weighting methods in terms of convergence speed and prediction accuracy. Our code and supplementary material are available at https://github.com/chenhong-zhou/DualBalanced-PINNs.
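
The proportional allocation step described in this abstract can be sketched in a few lines. The abstract does not define how fitting difficulty is measured from the loss records, so the `difficulties` values below are hypothetical inputs, and the equal-split fallback for an all-zero case is an assumption added here; this is not the DB-PINN code.

```python
def allocate_weights(aggregated_weight, difficulties):
    """Split one aggregated weight across condition losses in proportion to difficulty."""
    total = sum(difficulties)
    if total == 0:
        # Assumed fallback: share the weight equally when no difficulty signal exists.
        return [aggregated_weight / len(difficulties)] * len(difficulties)
    return [aggregated_weight * d / total for d in difficulties]

# Hypothetical difficulty scores for a boundary-condition loss and an initial-condition loss.
weights = allocate_weights(10.0, [1.0, 3.0])
print(weights)
```

The harder condition receives the larger share, and the shares always sum to the aggregated weight.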

NeurIPS Conference 2025 Conference Paper

DyMoDreamer: World Modeling with Dynamic Modulation

  • Boxuan Zhang
  • Runqing Wang
  • Wei Xiao
  • Weipu Zhang
  • Jian Sun
  • Gao Huang
  • Jie Chen
  • Gang Wang

A critical bottleneck in deep reinforcement learning (DRL) is sample inefficiency, as training high-performance agents often demands extensive environmental interactions. Model-based reinforcement learning (MBRL) mitigates this by building world models that simulate environmental dynamics and generate synthetic experience, improving sample efficiency. However, conventional world models process observations holistically, failing to decouple dynamic objects and temporal features from static backgrounds. This approach is computationally inefficient, especially for visual tasks where dynamic objects significantly influence rewards and decision-making performance. To address this, we introduce DyMoDreamer, a novel MBRL algorithm that incorporates a dynamic modulation mechanism to improve the extraction of dynamic features and enrich the temporal information. DyMoDreamer employs differential observations derived from a novel inter-frame differencing mask, explicitly encoding object-level motion cues and temporal dynamics. Dynamic modulation is modeled as stochastic categorical distributions and integrated into a recurrent state-space model (RSSM), enhancing the model's focus on reward-relevant dynamics. Experiments demonstrate that DyMoDreamer sets a new state-of-the-art on the Atari $100$k benchmark with a $156.6$\% mean human-normalized score, establishes a new record of $832$ on the DeepMind Visual Control Suite, and gains a $9.5$\% performance improvement after $1$M steps on the Crafter benchmark.

NeurIPS Conference 2025 Conference Paper

GMV: A Unified and Efficient Graph Multi-View Learning Framework

  • Qipeng Zhu
  • Jie Chen
  • Jian Pu
  • Junping Zhang

Graph Neural Networks (GNNs) are pivotal in graph classification but often struggle with generalization and overfitting. We introduce a unified and efficient Graph Multi-View (GMV) learning framework that integrates multi-view learning into GNNs to enhance robustness and efficiency. Leveraging the lottery ticket hypothesis, GMV activates diverse sub-networks within a single GNN through a novel training pipeline, which includes mixed-view generation, and multi-view decomposition and learning. This approach simultaneously broadens "views" from the data, model, and optimization perspectives during training to enhance the generalization capabilities of GNNs. During inference, GMV only incorporates additional prediction heads into standard GNNs, thereby achieving multi-view learning at minimal cost. Our experiments demonstrate that GMV surpasses other augmentation and ensemble techniques for GNNs and Graph Transformers across various graph classification scenarios.

ICML Conference 2025 Conference Paper

GPEN: Global Position Encoding Network for Enhanced Subgraph Representation Learning

  • Nannan Wu
  • Yuming Huang
  • Yiming Zhao
  • Jie Chen
  • Wenjun Wang 0002

Subgraph representation learning has attracted growing interest due to its wide applications in various domains. However, existing methods primarily focus on local neighborhood structures while overlooking the significant impact of global structural information, in particular the influence of multi-hop neighbors beyond immediate neighborhoods. This presents two key challenges: how to effectively capture the structural relationships between distant nodes, and how to prevent excessive aggregation of global structural information from weakening the discriminative ability of subgraph representations. To address these challenges, we propose GPEN (Global Position Encoding Network). GPEN leverages a hierarchical tree structure to encode each node’s global position based on its path distance to the root node, enabling a systematic way to capture relationships between distant nodes. Furthermore, we introduce a boundary-aware convolution module that selectively integrates global structural information while maintaining the unique structural patterns of each subgraph. Extensive experiments on eight public datasets show that GPEN significantly outperforms state-of-the-art methods in subgraph representation learning.

ICLR Conference 2025 Conference Paper

Graph Neural Preconditioners for Iterative Solutions of Sparse Linear Systems

  • Jie Chen

Preconditioning is at the heart of iterative solutions of large, sparse linear systems of equations in scientific disciplines. Several algebraic approaches, which access no information beyond the matrix itself, are widely studied and used, but ill-conditioned matrices remain very challenging. We take a machine learning approach and propose using graph neural networks as a general-purpose preconditioner. They show attractive performance for many problems and can be used when the mainstream preconditioners perform poorly. Empirical evaluation on over 800 matrices suggests that the construction time of these graph neural preconditioners (GNPs) is more predictable and can be much shorter than that of other widely used ones, such as ILU and AMG, while the execution time is faster than using a Krylov method as the preconditioner, such as in inner-outer GMRES. GNPs have a strong potential for solving large-scale, challenging algebraic problems arising from not only partial differential equations, but also economics, statistics, graph, and optimization, to name a few.

YNIMG Journal 2025 Journal Article

Increased spindle-related brain activation in right middle temporal gyrus during N2 than N3 among healthy sleepers: Initial discovery and independent sample replication

  • Yan Shao
  • Yupeng Guo
  • Yun Chen
  • Guangyuan Zou
  • Jie Chen
  • Xuejiao Gao
  • Panpan Lu
  • Yujie Tong

The association between spindle metrics and sleep architecture differs between N2 and N3 sleep, but the underlying neural mechanism has not been clearly established. Here, we tested the discrepancy in spindle-related brain activation between N2 and N3 within healthy college students (dataset 1: n = 27, 59% female, median age 23 years), using simultaneous electroencephalography-functional magnetic resonance imaging (EEG-fMRI). To assess the replicability of the finding, we repeated the analysis among normal adults (independent dataset 2: n = 30, 50% female, median age 32 years). The finding from dataset 1 indicated a significantly increased blood-oxygen level-dependent signal in the right middle temporal gyrus during N2 compared with N3, which was well replicated in dataset 2. Furthermore, correlation analysis was performed to explore the association between this spindle-related brain activation and N2 and N3 sleep duration during EEG-fMRI. We conducted the correlation analysis in N2 and N3 separately. The negative association between spindle-related brain activation in the right middle temporal gyrus and sleep duration was only observed in N2. Our findings emphasize the unique role of spindle-related brain activation in the right middle temporal gyrus during N2 in shortening N2 sleep duration.

AIIM Journal 2025 Journal Article

Interactive prototype learning and self-learning for few-shot medical image segmentation

  • Yuhui Song
  • Chenchu Xu
  • Boyan Wang
  • Xiuquan Du
  • Jie Chen
  • Yanping Zhang
  • Shuo Li

Few-shot learning alleviates the heavy dependence of medical image segmentation on large-scale labeled data, but it shows strong performance gaps when dealing with new tasks compared with traditional deep learning. Existing methods mainly learn the class knowledge of a few known (support) samples and extend it to unknown (query) samples. However, the large distribution differences between the support image and the query image lead to serious deviations in the transfer of class knowledge, which can be specifically summarized as two segmentation challenges: Intra-class inconsistency and Inter-class similarity, blurred and confused boundaries. In this paper, we propose a new interactive prototype learning and self-learning network to solve the above challenges. First, we propose a deep encoding-decoding module to learn the high-level features of the support and query images to build peak prototypes with the greatest semantic information and provide semantic guidance for segmentation. Then, we propose an interactive prototype learning module to improve intra-class feature consistency and reduce inter-class feature similarity by conducting mid-level features-based mean prototype interaction and high-level features-based peak prototype interaction. Last, we propose a query features-guided self-learning module to separate foreground and background at the feature level and combine low-level feature maps to complement boundary information. Our model achieves competitive segmentation performance on benchmark datasets and shows substantial improvement in generalization ability.

JBHI Journal 2025 Journal Article

Interpretable Dynamic Directed Graph Convolutional Network for Multi-Relational Prediction of Missense Mutation and Drug Response

  • Qian Gao
  • Tao Xu
  • Xiaodi Li
  • Wanling Gao
  • Haoyuan Shi
  • Youhua Zhang
  • Jie Chen
  • Zhenyu Yue

Tumor heterogeneity presents a significant challenge in predicting drug responses, especially as missense mutations within the same gene can lead to varied outcomes such as drug resistance, enhanced sensitivity, or therapeutic ineffectiveness. These complex relationships highlight the need for advanced analytical approaches in oncology. Due to their powerful ability to handle heterogeneous data, graph convolutional networks (GCNs) represent a promising approach for predicting drug responses. However, simple bipartite graphs cannot accurately capture the complex relationships involved in missense mutation and drug response. Furthermore, deep learning models for drug response are often considered “black boxes”, and their interpretability remains a widely discussed issue. To address these challenges, we propose an Interpretable Dynamic Directed Graph Convolutional Network (IDDGCN) framework, which incorporates four key features: 1) the use of directed graphs to differentiate between sensitivity and resistance relationships, 2) the dynamic updating of node weights based on node-specific interactions, 3) the exploration of associations between different mutations within the same gene and drug response, and 4) the enhancement of model interpretability through the integration of a weighted mechanism that accounts for biological significance, alongside a ground truth construction method to evaluate prediction transparency. The experimental results demonstrate that IDDGCN outperforms existing state-of-the-art models, exhibiting excellent predictive power. Both qualitative and quantitative evaluations of its interpretability further highlight its ability to explain predictions, offering a fresh perspective for precision oncology and targeted drug development.

NeurIPS Conference 2025 Conference Paper

Lessons Learned: A Multi-Agent Framework for Code LLMs to Learn and Improve

  • Yuanzhe Liu
  • Ryan Deng
  • Tim Kaler
  • Xuhao Chen
  • Charles Leiserson
  • Yao Ma
  • Jie Chen

Recent studies show that LLMs possess different skills and specialize in different tasks. In fact, we observe that their performance varies at several levels of granularity. For example, in the code optimization task, code LLMs excel at different optimization categories and no one dominates the others. This observation prompts the question of how one leverages multiple LLM agents to solve a coding problem without knowing their complementary strengths a priori. We argue that a team of agents can learn from each other's successes and failures so as to improve their own performance. Thus, a lesson is the knowledge produced by an agent and passed on to other agents in the collective solution process. We propose a lesson-based collaboration framework, design the lesson solicitation–banking–selection mechanism, and demonstrate that a team of small LLMs with lessons learned can outperform a much larger LLM and other multi-LLM collaboration methods.

IJCAI Conference 2025 Conference Paper

MTPNet: Multi-Grained Target Perception for Unified Activity Cliff Prediction

  • Zishan Shu
  • Yufan Deng
  • Hongyu Zhang
  • Zhiwei Nie
  • Jie Chen

Activity cliff prediction is a critical task in drug discovery and material design. Existing computational methods are limited to handling single binding targets, which restricts the applicability of these prediction models. In this paper, we present the Multi-Grained Target Perception network (MTPNet) to incorporate the prior knowledge of interactions between the molecules and their target proteins. Specifically, MTPNet is a unified framework for activity cliff prediction, which consists of two components: Macro-level Target Semantic (MTS) guidance and Micro-level Pocket Semantic (MPS) guidance. In this way, MTPNet dynamically optimizes molecular representations through multi-grained protein semantic conditions. To our knowledge, this is the first work to employ the receptor proteins as guiding information to effectively capture critical interaction details. Extensive experiments on 30 representative activity cliff datasets demonstrate that MTPNet significantly outperforms previous approaches, achieving an average RMSE improvement of 18.95% on top of several mainstream GNN architectures. Overall, MTPNet internalizes interaction patterns through conditional deep learning to achieve unified predictions of activity cliffs, helping to accelerate compound optimization and design. Codes are available at: https://github.com/ZishanShu/MTPNet.

NeurIPS Conference 2025 Conference Paper

SE-GUI: Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning

  • Xinbin Yuan
  • Jian Zhang
  • Kaixin Li
  • Zhuoxuan Cai
  • Lujian Yao
  • Jie Chen
  • Enguang Wang
  • Qibin Hou

Graphical User Interface (GUI) agents have made substantial strides in understanding and executing user instructions across diverse platforms. Yet, grounding these instructions to precise interface elements remains challenging, especially in complex, high-resolution, professional environments. Traditional supervised fine-tuning (SFT) methods often require large volumes of diverse data and exhibit weak generalization. To overcome these limitations, we introduce a reinforcement learning (RL)-based framework that incorporates three core strategies: (1) seed data curation to ensure high-quality training samples, (2) a dense policy gradient that provides continuous feedback based on prediction accuracy, and (3) a self-evolutionary reinforcement finetuning mechanism that iteratively refines the model using attention maps. With only 3k training samples, our 7B-parameter model achieves state-of-the-art results among similarly sized models on three grounding benchmarks. Notably, it attains 47.3% accuracy on the ScreenSpot-Pro dataset, outperforming much larger models, such as UI-TARS-72B, by a margin of 24.2%. These findings underscore the effectiveness of RL-based approaches in enhancing GUI agent performance, particularly in high-resolution, complex environments.

YNIMG Journal 2025 Journal Article

Tasting emotions: An in-depth fMRI study exploring gustatory and visual cross-modal associations across various spatio-temporal regions of the human brain

  • Jie Chen
  • Yuansheng Liu
  • Lina Huang
  • Luming Hu
  • Xueying Li
  • Liuqing Wei
  • Weiping Yang
  • Simin Zhao

This study investigates how taste influences emotional face recognition, focusing on the cross-modal interaction between gustatory and visual stimuli. While prior research has primarily examined how visual cues modulate taste perception, the reverse direction, how taste shapes visual processing in emotional contexts, remains underexplored. Using a combination of task-based functional MRI (task-fMRI) and resting-state fMRI (rs-fMRI), we examined the neural mechanisms by which taste modulates the perception of emotional faces. Behaviorally, sour tastes facilitated faster recognition of disgusted faces, while sweet tastes enhanced the detection of pleasant expressions. Neuroimaging results revealed that these emotionally congruent taste-face pairings elicited distinct activation patterns in the early visual cortex, including a significant interaction effect in the right calcarine gyrus (primary visual cortex, V1). Task-fMRI also showed modulation in the medial cingulate gyrus, fusiform gyrus, and superior frontal regions depending on emotional congruency. Resting-state fMRI revealed sustained alterations in intrinsic connectivity within the medial cingulate and paracingulate cortex following cross-modal dissonance, suggesting lasting neural effects beyond stimulus presentation. Together, these findings demonstrate the dynamic and enduring influence of taste on emotional face processing and offer novel insights into the neural basis of multisensory affective integration. By integrating task-based and resting-state fMRI, this study provides a comprehensive framework for understanding how affectively salient gustatory inputs shape social perception through both early perceptual and sustained neural mechanisms.

ICML Conference 2025 Conference Paper

Teaching Language Models to Critique via Reinforcement Learning

  • Zhihui Xie
  • Jie Chen
  • Liyu Chen
  • Weichao Mao
  • Jingjing Xu
  • Lingpeng Kong

Teaching large language models (LLMs) to critique and refine their outputs is crucial for building systems that can iteratively improve, yet it is fundamentally limited by the ability to provide accurate judgments and actionable suggestions. In this work, we study LLM critics for code generation and propose $\texttt{CTRL}$, a framework for $\texttt{C}$ritic $\texttt{T}$raining via $\texttt{R}$einforcement $\texttt{L}$earning, which trains a critic model to generate feedback that maximizes correction performance for a fixed generator model without human supervision. Our results demonstrate that critics trained with $\texttt{CTRL}$ significantly enhance pass rates and mitigate compounding errors across both base and stronger generator models. Furthermore, we show that these critic models act as accurate generative reward models and enable test-time scaling through iterative critique-revision, achieving up to 106.1% relative improvements across challenging code generation benchmarks.

IROS Conference 2025 Conference Paper

Three-DOF controlled flight in palm-scale micro robotic blimp driven by flapping wings

  • Jie Chen
  • Xiang Lu
  • Yulie Wu
  • Yang Chen
  • Dingbang Xiao
  • Xuezhong Wu

Micro blimps exhibit significant potential for applications in environmental monitoring and disaster rescue. Nonetheless, traditional propulsion methods for micro blimps encounter challenges such as complex mechanical structures, intricate attitude control, and large volumes. This paper presents a novel compact and lightweight bio-inspired flapping-wing-driven micro robotic blimp with piezoelectric (PZT) actuation, featuring a simplified structure and achieving three-degree-of-freedom (DOF) motion control with only two flapping-wing thruster units. We present a high-voltage drive-sense-control circuit and an adaptive control strategy, enabling wireless remote control, onboard attitude sensing, and closed-loop yaw control. The proposed micro robotic blimp, powered by an onboard battery, measures 15 cm along its major axis and weighs 1.53 g. It achieves a maneuvering speed of 17 cm/s and an angular velocity of 12°/s, with a yaw angle control accuracy of 0.5°. As the smallest and lightest known self-powered micro blimp capable of stable yaw control, the platform demonstrates excellent endurance and environmental stealth characteristics and advances the design of micro aerial vehicles by offering a novel and efficient approach.

NeurIPS Conference 2025 Conference Paper

Unraveling Metameric Dilemma for Spectral Reconstruction: A High-Fidelity Approach via Semi-Supervised Learning

  • Xingxing Yang
  • Jie Chen
  • Zaifeng Yang

Spectral reconstruction from RGB images often suffers from a metameric dilemma, where distinct spectral distributions map to nearly identical RGB values, making them indistinguishable to current models and leading to unreliable reconstructions. In this paper, we present Diff-Spectra, which integrates supervised physics-aware spectral estimation and unsupervised high-fidelity spectral regularization for HSI reconstruction. We first introduce an Adaptive illumiChroma Decoupling (AICD) module to decouple illumination and chrominance information, which learns intrinsic and distinctive feature distributions, thereby mitigating the metameric issue. Then, we incorporate the AICD into a learnable spectral response function (SRF) guided hyperspectral initial estimation mechanism to mimic the physical image formation and thus inject physics-aware reasoning into neural networks, turning an ill-posed problem into a constrained, interpretable task. We also introduce a metameric spectra augmentation method to synthesize comprehensive hyperspectral data to pre-train a Spectral Diffusion Module (SDM), which internalizes the statistical properties of real-world HSI data, enforcing unsupervised high-fidelity regularization on the spectral transitions via inner-loop optimization during inference. Extensive experimental evaluations demonstrate that our Diff-Spectra achieves SOTA performance on both spectral reconstruction and downstream HSI classification.

JBHI Journal 2025 Journal Article

Valence-Arousal Disentangled Representation Learning for Emotion Recognition in SSVEP-Based BCIs

  • Yipeng Du
  • Jie Chen
  • Zhengwu Liu
  • Ngai Wong
  • Chi Zhang
  • Zhiwei Ding
  • Jian Liu
  • Edith C.H. Ngai

Steady state visually evoked potential (SSVEP)-based brain-computer interfaces (BCIs), which are widely used in rehabilitation and disability assistance, can benefit from real-time emotion recognition to enhance human–machine interaction. However, the learned discriminative latent representations in SSVEP-BCIs may generalize in an unintended direction, which can lead to reduced accuracy in detecting emotional states. In this paper, we introduce a Valence-Arousal Disentangled Representation Learning (VADL) method, drawing inspiration from the classical two-dimensional emotional model, to enhance the performance and generalization of emotion recognition within SSVEP-BCIs. VADL distinctly disentangles the latent variables of valence and arousal information to improve accuracy. It utilizes the structured state space duality model to thoroughly extract global emotional features. Additionally, we propose a Multisubject Gradient Blending training strategy that individually tailors the learning pace of reconstruction and discrimination tasks within VADL on-the-fly. To verify the feasibility of our method, we have developed a comprehensive database comprising 23 subjects, in which both the emotional states and SSVEPs were effectively elicited. Experimental results indicate that VADL surpasses existing state-of-the-art benchmark algorithms.

IROS Conference 2024 Conference Paper

A Point-Line Features Fusion Method for Fast and Robust Monocular Visual-Inertial Initialization

  • Guoqiang Xie
  • Jie Chen
  • Tianhang Tang
  • Zeyu Chen
  • Ling Lei
  • Yiguang Liu

Fast and robust initialization is essential for highly accurate monocular visual-inertial odometry (VIO), but at present the majority of initialization methods rely only on point features, which are unstable in low-texture and blurry situations. Therefore, we propose a novel point-line features fusion method for monocular visual-inertial initialization, as line features are more stable and provide richer geometric information than point features: 1) a closed-form line features initialization method is presented and combined with point features to obtain a more integrated and robust linear system; 2) a monocular depth network is adopted to provide a learned affine-invariant depth map, requiring only one prior depth map for the first frame, which can improve performance under low-parallax scenarios; 3) we can easily use RANSAC to reject outliers in solving the linear system based on our formulation. Moreover, a line feature re-projection residual is added to visual-inertial bundle adjustment (VI-BA) to obtain more accurate initial parameters. The proposed method is more accurate and robust than state-of-the-art methods due to the line features, especially under extreme low-parallax scenarios. Extensive experiments on popular datasets confirm this, with a 0.5 s initialization window on EuRoC MAV and a 0.3 s initialization window on TUM-VI, while the standard method normally waits for a window of 2 s.

EAAI Journal 2024 Journal Article

An explainable neural network integrating Jiles-Atherton and nonlinear auto-regressive exogenous models for modeling universal hysteresis

  • Lei Ni
  • Jie Chen
  • Guoqiang Chen
  • Dongmei Zhao
  • Geng Wang
  • Sumeet S. Aphale

The inherent nonlinear and memory-dependent input-output characteristics of piezoelectric actuators pose challenges to the precision of piezoelectric positioning systems. In order to solve this problem, this paper first transforms the Jiles-Atherton (JA) model into a neural network structure, designs the Jiles-Atherton neural network (JANN), and combines JANN with a nonlinear autoregressive exogenous input (NARX) neural network. A hybrid JA-NARX neural network model is proposed for the first time. This model has the advantages of simple structure, high modeling accuracy, and good interpretability. The effectiveness of the proposed JA-NARX neural network model is validated through a series of experiments, specifically assessing its capacity to accurately capture rate-dependent and asymmetric hysteresis characteristics. The results show that although the proposed neural network model has fewer layers and a relatively simple structure, it can realize high-precision modeling of piezoelectric hysteresis dynamics at a lower computational cost. The experimental data show that, under the excitation of a 60 Hz input signal, the model's PV error only accounts for 0.82% of the full scale range, and the modeling performance is far superior to other models.

NeurIPS Conference 2024 Conference Paper

Automated Label Unification for Multi-Dataset Semantic Segmentation with GNNs

  • Rong Ma
  • Jie Chen
  • Xiangyang Xue
  • Jian Pu

Deep supervised models possess significant capability to assimilate extensive training data, thereby presenting an opportunity to enhance model performance through training on multiple datasets. However, conflicts arising from different label spaces among datasets may adversely affect model performance. In this paper, we propose a novel approach to automatically construct a unified label space across multiple datasets using graph neural networks. This enables semantic segmentation models to be trained simultaneously on multiple datasets, resulting in performance improvements. Unlike existing methods, our approach facilitates seamless training without the need for additional manual reannotation or taxonomy reconciliation. This significantly enhances the efficiency and effectiveness of multi-dataset segmentation model training. The results demonstrate that our method significantly outperforms other multi-dataset training methods when trained on seven datasets simultaneously, and achieves state-of-the-art performance on the WildDash 2 benchmark. Our code can be found at https://github.com/Mrhonor/AutoUniSeg.

JBHI Journal 2024 Journal Article

CALLM: Enhancing Clinical Interview Analysis Through Data Augmentation With Large Language Models

  • Yuqi Wu
  • Kaining Mao
  • Yanbo Zhang
  • Jie Chen

The global prevalence of mental health disorders is increasing, leading to a significant economic burden estimated in trillions of dollars. In automated mental health diagnosis, the scarcity and imbalance of clinical data pose considerable challenges for researchers, limiting the effectiveness of machine learning algorithms. To cope with this issue, this paper aims to introduce a novel clinical transcript data augmentation framework by leveraging large language models (CALLM). The framework follows a “patient-doctor role-playing” intuition to generate realistic synthetic data. In addition, our study introduces a unique “Textbook-Assignment-Application” (T-A-A) partitioning approach to offer a systematic means of crafting synthetic clinical interview datasets. Concurrently, we have also developed a “Response-Reason” prompt engineering paradigm to generate highly authentic and diagnostically valuable transcripts. By leveraging a fine-tuned DistilBERT model on the E-DAIC PTSD dataset, we achieved a balanced accuracy of 0.77, an F1-score of 0.70, and an AUC of 0.78 during test set evaluations, which showcase robust adaptability in both Zero-Shot Learning (ZSL) and Few-Shot Learning (FSL) scenarios. We further compare the CALLM framework with other data augmentation methods and PTSD diagnostic works and demonstrate consistent improvements. Compared to conventional data collection methods, our synthetic dataset not only demonstrates superior performance but also incurs less than 1% of the associated costs.

AAAI Conference 2024 Conference Paper

CF-NeRF: Camera Parameter Free Neural Radiance Fields with Incremental Learning

  • Qingsong Yan
  • Qiang Wang
  • Kaiyong Zhao
  • Jie Chen
  • Bo Li
  • Xiaowen Chu
  • Fei Deng

Neural Radiance Fields have demonstrated impressive performance in novel view synthesis. However, NeRF and most of its variants still rely on traditional complex pipelines to provide extrinsic and intrinsic camera parameters, such as COLMAP. Recent works, like NeRFmm, BARF, and L2G-NeRF, directly treat camera parameters as learnable and estimate them through differential volume rendering. However, these methods work for forward-looking scenes with slight motions and fail to tackle the rotation scenario in practice. To overcome this limitation, we propose a novel camera parameter free neural radiance field (CF-NeRF), which incrementally reconstructs 3D representations and recovers the camera parameters inspired by incremental structure from motion. Given a sequence of images, CF-NeRF estimates camera parameters of images one by one and reconstructs the scene through initialization, implicit localization, and implicit optimization. To evaluate our method, we use a challenging real-world dataset, NeRFBuster, which provides 12 scenes under complex trajectories. Results demonstrate that CF-NeRF is robust to rotation and achieves state-of-the-art results without providing prior information and constraints.

JBHI Journal 2024 Journal Article

Difference-Deformable Convolution With Pseudo Scale Instance Map for Cell Localization

  • Chengyang Zhang
  • Jie Chen
  • Bo Li
  • Min Feng
  • Yongquan Yang
  • Qikui Zhu
  • Hong Bu

Cell localization still faces two unresolved challenges: 1) the dramatic variations in cell morphology, coupled with the heterogeneous intensity distribution of lightly stained cells; 2) existing cell location maps lack scale information, resulting in insufficient supervision for point maps and inaccurate supervision for density maps. To address the first challenge, we introduce a novel gradient-aware and shape-adaptive Difference-Deformable Convolution (DDConv), which enhances the model's robustness to color by leveraging gradient information while adaptively adjusting the shape of the convolutional kernel to tackle the substantial variability in cell morphology. To overcome the second challenge of unreasonable location maps, we propose the Pseudo-Scale Instance (PSI) map, which can adaptively provide the corresponding scale information for each cell to realize accurate supervision. We analyze and evaluate DDConv and the PSI map in three challenging cell localization tasks. In comparison to existing methods, our proposed approach significantly enhances localization performance, setting a new benchmark for the cell localization task.

TCS Journal 2024 Journal Article

Efficient code-based fully dynamic group signature scheme

  • Luping Wang
  • Jie Chen
  • Huan Dai
  • Chongben Tao

Code-based group signatures have been an important research topic in recent years. Since the pioneering work by Alamélou et al. (WCC 2015), several other schemes have been proposed to provide improvements in security, efficiency and functionality. However, most existing constructions work only in the static setting where the group population is fixed at the setup phase. Only a few schemes are partially dynamic, supporting only one of user enrollment or revocation. In this work, we provide an efficient code-based fully dynamic group signature (FDGS) scheme, i.e., users have flexibility when joining and leaving the group. Specifically, to upgrade the scheme into a fully dynamic group signature, we first add a dynamic ingredient into the static 2-RNSD Merkle-tree accumulator (ASIACRYPT 2019), then create a simple rule and utilize the Stern-like zero-knowledge protocol to handle user enrollment and revocation efficiently (i.e., without resetting the whole tree). Moreover, our solution is the first exploration of code-based FDGS with constant signature size.

EAAI Journal 2024 Journal Article

Exponential distance transform maps for cell localization

  • Bo Li
  • Jie Chen
  • Hang Yi
  • Min Feng
  • Yongquan Yang
  • Qikui Zhu
  • Hong Bu

Cell localization in medical image analysis aims for precise identification of cell positions. Existing methods involve predicting density maps from images, followed by post-processing to extract cell location and number details. The quality of generated density maps significantly impacts the model’s localization and counting performance. However, density maps produced with Gaussian kernels exhibit stacking in dense regions, resulting in inaccurate cell location information and suboptimal localization performance. In this study, we propose an exponential distance transform map that ensures accurate location information and provides well-defined gradient details for effective model learning, setting a new benchmark for high performance. Additionally, to address the challenge of substantial variations in cell color within images, we introduce a multi-scale gradient aggregation module that enhances the model’s color recognition robustness through gradient information utilization. Experimental results across diverse datasets showcase notable improvements, establishing a novel benchmark for cell localization.

TMLR Journal 2024 Journal Article

GLASU: A Communication-Efficient Algorithm for Federated Learning with Vertically Distributed Graph Data

  • Xinwei Zhang
  • Mingyi Hong
  • Jie Chen

Vertical federated learning (VFL) is a distributed learning paradigm, where computing clients collectively train a model based on the partial features of the same set of samples they possess. Current research on VFL focuses on the case when samples are independent, but it rarely addresses an emerging scenario when samples are interrelated through a graph. In this work, we train a graph neural network (GNN) through VFL, where each client owns a part of the node features and a different edge set. This data scenario incurs a significant communication overhead, not only because of the handling of distributed features but also due to neighborhood aggregation in a GNN. Moreover, the training analysis is faced with a challenge caused by the biased stochastic gradients. We propose a model-splitting method that splits a backbone GNN across the clients and the server and a communication-efficient algorithm, GLASU, to train such a model. GLASU adopts lazy aggregation and stale updates to skip communication in neighborhood aggregation and in model updates, respectively, greatly reducing communication while enjoying convergence guarantees. We conduct extensive numerical experiments on real-world datasets, showing that GLASU effectively trains a GNN that matches the accuracy of centralized training, while using only a fraction of the time due to communication saving.

NeurIPS Conference 2024 Conference Paper

Graph Neural Flows for Unveiling Systemic Interactions Among Irregularly Sampled Time Series

  • Giangiacomo Mercatali
  • Andre Freitas
  • Jie Chen

Interacting systems are prevalent in nature. It is challenging to accurately predict the dynamics of the system if its constituent components are analyzed independently. We develop a graph-based model that unveils the systemic interactions of time series observed at irregular time points, by using a directed acyclic graph to model the conditional dependencies (a form of causal notation) of the system components and learning this graph in tandem with a continuous-time model that parameterizes the solution curves of ordinary differential equations (ODEs). Our technique, a graph neural flow, leads to substantial enhancements over non-graph-based methods, as well as graph-based methods without the modeling of conditional dependencies. We validate our approach on several tasks, including time series classification and forecasting, to demonstrate its efficacy.

NeurIPS Conference 2024 Conference Paper

HiCoM: Hierarchical Coherent Motion for Dynamic Streamable Scenes with 3D Gaussian Splatting

  • Qiankun Gao
  • Jiarui Meng
  • Chengxiang Wen
  • Jie Chen
  • Jian Zhang

The online reconstruction of dynamic scenes from multi-view streaming videos faces significant challenges in training, rendering and storage efficiency. Harnessing superior learning speed and real-time rendering capabilities, 3D Gaussian Splatting (3DGS) has recently demonstrated considerable potential in this field. However, 3DGS can be inefficient in terms of storage and prone to overfitting by excessively growing Gaussians, particularly with limited views. This paper proposes an efficient framework, dubbed HiCoM, with three key components. First, we construct a compact and robust initial 3DGS representation using a perturbation smoothing strategy. Next, we introduce a Hierarchical Coherent Motion mechanism that leverages the inherent non-uniform distribution and local consistency of 3D Gaussians to swiftly and accurately learn motions across frames. Finally, we continually refine the 3DGS with additional Gaussians, which are later merged into the initial 3DGS to maintain consistency with the evolving scene. To preserve a compact representation, an equivalent number of low-opacity Gaussians that minimally impact the representation are removed before processing subsequent frames. Extensive experiments conducted on two widely used datasets show that our framework improves learning efficiency of the state-of-the-art methods by about 20% and reduces the data storage by 85%, achieving competitive free-viewpoint video synthesis quality but with higher robustness and stability. Moreover, by parallel learning multiple frames simultaneously, our HiCoM decreases the average training wall time to <2 seconds per frame with negligible performance degradation, substantially boosting real-world applicability and responsiveness.

YNIMG Journal 2024 Journal Article

High-resolution diffusion magnetic resonance imaging and spatial-transcriptomic in developing mouse brain

  • Xinyue Han
  • Surendra Maharjan
  • Jie Chen
  • Yi Zhao
  • Yi Qi
  • Leonard E. White
  • G. Allan Johnson
  • Nian Wang

Brain development is a highly complex process regulated by numerous genes at the molecular and cellular levels. Brain tissue exhibits serial microstructural changes during the development process. High-resolution diffusion magnetic resonance imaging (dMRI) affords a unique opportunity to probe these changes in the developing brain non-destructively. In this study, we acquired multi-shell dMRI datasets at 32 µm isotropic resolution to investigate the tissue microstructure alterations, which we believe to be the highest spatial resolution dMRI datasets obtained for postnatal mouse brains. We adapted the Allen Developing Mouse Brain Atlas (ADMBA) to integrate quantitative MRI metrics and spatial transcriptomics. Diffusion tensor imaging (DTI), diffusion kurtosis imaging (DKI), and neurite orientation dispersion and density imaging (NODDI) metrics were used to quantify brain development at different postnatal days. We demonstrated that the differential evolutions of fiber orientation distributions contribute to the distinct development patterns in white matter (WM) and gray matter (GM). Furthermore, the genes enriched in the nervous system that regulate brain structure and function were expressed in spatial correlation with age-matched dMRI. This study is the first one providing high-resolution dMRI, including DTI, DKI, and NODDI models, to trace mouse brain microstructural changes in WM and GM during postnatal development. This study also highlighted the genotype-phenotype correlation of spatial transcriptomics and dMRI, which may improve our understanding of brain microstructure changes at the molecular level.

JBHI Journal 2024 Journal Article

Hybrid Bayesian Optimization-Based Graphical Discovery for Methylation Sites Prediction

  • Lingyan Gu
  • Tingbo Chen
  • Jianqiang Li
  • Yu-An Huang
  • Zhihua Du
  • Victor C.M. Leung
  • Jie Chen

Protein methylation is one of the most important reversible post-translational modifications (PTMs), playing a vital role in the regulation of gene expression. Protein methylation sites serve as biomarkers in cardiovascular and pulmonary diseases, influencing various aspects of normal cell biology and pathogenesis. Nonetheless, the majority of existing computational methods for predicting protein methylation sites (PMSP) have been constructed based on protein sequences, with few methods leveraging the topological information of proteins. To address this issue, we propose an innovative framework for predicting Methylation Sites using Graphs (GraphMethySite) that employs graph convolution network in conjunction with Bayesian Optimization (BO) to automatically discover the graphical structure surrounding a candidate site and improve the predictive accuracy. In order to extract the most optimal subgraphs associated with methylation sites, we extend GraphMethySite by coupling it with a hybrid Bayesian optimization (together named GraphMethySite$^+$) to determine and visualize the topological relevance among amino-acid residues. We evaluated our framework on two extended protein methylation datasets, and empirical results demonstrate that it outperforms existing state-of-the-art methylation prediction methods.

AAAI Conference 2024 Conference Paper

Hyperspectral Image Reconstruction via Combinatorial Embedding of Cross-Channel Spatio-Spectral Clues

  • Xingxing Yang
  • Jie Chen
  • Zaifeng Yang

Existing learning-based hyperspectral reconstruction methods show limitations in fully exploiting the information among the hyperspectral bands. As such, we propose to investigate the chromatic inter-dependencies in their respective hyperspectral embedding space. These embedded features can be fully exploited by querying the inter-channel correlations in a combinatorial manner, with the unique and complementary information efficiently fused into the final prediction. We found such independent modeling and combinatorial excavation mechanisms are extremely beneficial to uncover marginal spectral features, especially in the long wavelength bands. In addition, we have proposed a spatio-spectral attention block and a spectrum-fusion attention module, which greatly facilitates the excavation and fusion of information at both semantically long-range levels and fine-grained pixel levels across all dimensions. Extensive quantitative and qualitative experiments show that our method (dubbed CESST) achieves SOTA performance. Code for this project is at: https://github.com/AlexYangxx/CESST.

AAAI Conference 2024 Conference Paper

Parallel Vertex Diffusion for Unified Visual Grounding

  • Zesen Cheng
  • Kehan Li
  • Peng Jin
  • Siheng Li
  • Xiangyang Ji
  • Li Yuan
  • Chang Liu
  • Jie Chen

Unified visual grounding (UVG) capitalizes on a wealth of task-related knowledge across various grounding tasks via one-shot training, which curtails retraining costs and task-specific architecture design efforts. Vertex generation-based UVG methods achieve this versatility by unified modeling object box and contour prediction and provide a text-powered interface to vast related multi-modal tasks, e.g., visual question answering and captioning. However, these methods typically generate vertexes sequentially through autoregression, which is prone to be trapped in error accumulation and heavy computation, especially for high-dimension sequence generation in complex scenarios. In this paper, we develop Parallel Vertex Diffusion (PVD) based on the parallelizability of diffusion models to accurately and efficiently generate vertexes in a parallel and scalable manner. Since the coordinates fluctuate greatly, it typically encounters slow convergence when training diffusion models without geometry constraints. Therefore, we consummate our PVD by two critical components, i.e., center anchor mechanism and angle summation loss, which serve to normalize coordinates and adopt a differentiable geometry descriptor from the point-in-polygon problem of computational geometry to constrain the overall difference of prediction and label vertexes. These innovative designs empower our PVD to demonstrate its superiority with state-of-the-art performance across various grounding tasks.

NeurIPS Conference 2024 Conference Paper

Parameterized Approximation Schemes for Fair-Range Clustering

  • Zhen Zhang
  • Xiaohong Chen
  • Limei Liu
  • Jie Chen
  • Junyu Huang
  • Qilong Feng

Fair-range clustering extends classical clustering formulations by associating each data point with one or more demographic labels. It imposes lower and upper bound constraints on the number of facilities opened for each label, ensuring fair representation of all demographic groups by the selected facilities. In this paper we focus on the fair-range $k$-median and $k$-means problems in Euclidean spaces. We give $(1+\varepsilon)$-approximation algorithms with fixed-parameter tractable running times for both problems, parameterized by the numbers of opened facilities and demographic labels. For Euclidean metrics, these are the first parameterized approximation schemes for the problems, improving upon the previously known $O(1)$-approximation ratios given by Thejaswi et al. (KDD 2022).
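The fair-range constraint described in this abstract can be illustrated with a small feasibility check. This is only a sketch under stated assumptions: the function name `satisfies_fair_range` and the dictionary layout (`facility_labels`, `lower`, `upper`) are hypothetical and not taken from the paper, and the check covers only the constraint itself, not the approximation algorithm.

```python
from collections import Counter


def satisfies_fair_range(opened, facility_labels, lower, upper):
    """Return True if the opened facilities meet every label's [lower, upper] bound.

    opened          -- iterable of facility identifiers (hypothetical layout)
    facility_labels -- dict: facility -> set of demographic labels it carries
    lower, upper    -- dicts: label -> minimum / maximum number of opened facilities
    """
    counts = Counter()
    for facility in opened:
        # A facility may carry several labels; it counts once toward each of them.
        for label in facility_labels[facility]:
            counts[label] += 1
    # Labels with no opened facility have an implicit count of zero.
    return all(lower[label] <= counts[label] <= upper[label] for label in lower)
```

A candidate solution to a fair-range $k$-median or $k$-means instance would have to pass a check of this kind before its clustering cost is even considered.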

AAAI Conference 2024 Conference Paper

Practical Privacy-Preserving MLaaS: When Compressive Sensing Meets Generative Networks

  • Jia Wang
  • Wuqiang Su
  • Zushu Huang
  • Jie Chen
  • Chengwen Luo
  • Jianqiang Li

The Machine-Learning-as-a-Service (MLaaS) framework allows one to grab low-hanging fruit of machine learning techniques and data science, without either much expertise for this sophisticated sphere or provision of specific infrastructures. However, the requirement of revealing all training data to the service provider raises new concerns in terms of privacy leakage, storage consumption, efficiency, bandwidth, etc. In this paper, we propose a lightweight privacy-preserving MLaaS framework by combining Compressive Sensing (CS) and Generative Networks. It’s constructed on the favorable facts observed in recent works that general inference tasks could be fulfilled with generative networks and classifier trained on compressed measurements, since the generator could model the data distribution and capture discriminative information which are useful for classification. To improve the performance of the MLaaS framework, the supervised generative models of the server are trained and optimized with prior knowledge provided by the client. In order to prevent the service provider from recovering the original data as well as identifying the queried results, a noise-addition mechanism is designed and adopted into the compressed data domain. Empirical results confirmed its performance superiority in accuracy and resource consumption against the state-of-the-art privacy preserving MLaaS frameworks.

TMLR Journal 2024 Journal Article

SA-MLP: Distilling Graph Knowledge from GNNs into Structure-Aware MLP

  • Jie Chen
  • Mingyuan Bai
  • Shouzhen Chen
  • Junbin Gao
  • Junping Zhang
  • Jian Pu

The recursive node fetching and aggregation in message-passing cause inference latency when deploying Graph Neural Networks (GNNs) to large-scale graphs. One promising inference acceleration direction is to distill GNNs into message-passing-free student Multi-Layer Perceptrons (MLPs). However, the MLP student without graph dependency cannot fully learn the structure knowledge from GNNs, which causes inferior performance in heterophilic and online scenarios. To address this problem, we first design a simple yet effective Structure-Aware MLP (SA-MLP) as a student model. It utilizes linear layers as encoders and decoders to capture features and graph structures without message-passing among nodes. Furthermore, we introduce a novel structure-mixing knowledge distillation technique. It generates virtual samples imbued with a hybrid of structure knowledge from teacher GNNs, thereby enhancing the learning ability of MLPs for structure information. Extensive experiments on eight benchmark datasets under both transductive and online settings show that our SA-MLP can consistently achieve similar or even better results than teacher GNNs while maintaining as fast inference speed as MLPs. Our findings reveal that SA-MLP efficiently assimilates graph knowledge through distillation from GNNs in an end-to-end manner, eliminating the need for complex model architectures and preprocessing of features/structures. Our code is available at https://github.com/JC-202/SA-MLP.

AAAI Conference 2024 Conference Paper

Secure Distributed Sparse Gaussian Process Models Using Multi-Key Homomorphic Encryption

  • Adil Nawaz
  • Guopeng Chen
  • Muhammad Umair Raza
  • Zahid Iqbal
  • Jianqiang Li
  • Victor C.M. Leung
  • Jie Chen

Distributed sparse Gaussian process (dGP) models provide an ability to achieve accurate predictive performance using data from multiple devices in a time-efficient and scalable manner. The distributed computation of the model, however, risks exposure of privately owned data to public manipulation. In this paper we propose a secure solution for dGP regression models using multi-key homomorphic encryption. Experimental results show that with a little sacrifice in terms of time complexity, we achieve a secure dGP model without deteriorating the predictive performance compared to traditional non-secure dGP models. We also present a practical implementation of the proposed model using several Nvidia Jetson Nano Developer Kit modules to simulate a real-world scenario. Thus, the secure dGP model plugs the data security issues of dGP and provides a secure and trustworthy solution for multiple devices to use privately owned data for model computation in a distributed environment, availing the speed, scalability and robustness of dGP.

EAAI Journal 2024 Journal Article

The extended weighted t-norms-based linear hybrid aggregation function and its application for aggregating improved basic uncertain linguistic information

  • Yi Yang
  • Mengqi Jie
  • Yuhan Zhao
  • Limei Liu
  • Junfeng Yang
  • Jie Chen

In recent decades, the Archimedean triangular norm (t-norm) and Archimedean triangular conorm (t-conorm) have been fundamental theories in the design and construction of information aggregation functions. The application of weighted Archimedean t-norm/t-conorm-based aggregation functions has been widely extended to uncertain information environments, such as fuzzy sets and linguistic term sets. However, these functions are susceptible to aggregation failure when dealing with information groups that contain extreme values, leading to unreasonable aggregation results. This paper aims to address the issue of aggregation failure by developing an extended Archimedean t-norm/t-conorm-based linear hybrid aggregation framework. Firstly, the extended weighted Archimedean t-norms and t-conorms that are suitable for processing linguistic term sets are proposed. Furthermore, an aggregation contribution function is introduced to evaluate the impact of both extreme and normal values on aggregated outcomes. This function also facilitates the identification of deficiencies within existing weighted Archimedean t-norm/t-conorm-based aggregation functions. Secondly, building upon the expanded Archimedean t-norm/t-conorm as the foundational framework, a linear weighted hybrid operator is developed by employing an extreme value identification function as guidance. The rationality of this operator is validated using the previously defined aggregation contribution function. Subsequently, to consolidate improved basic uncertain linguistic information (IBULI) pairs, the proposed hybrid operator is employed for constructing an IBULI-aggregation function. Finally, a product ranking method is developed by integrating the proposed operator and incorporating a user credibility calculation-based approach for converting ratings to IBULIs. The efficacy and rationality of the proposed approach are substantiated through a case study and comparative analysis of a car ranking application.

TCS Journal 2024 Journal Article

Tightly secure (H)IBE in the random oracle model

  • Qiaohan Chu
  • Jie Chen

We present an adaptively secure identity-based encryption (IBE) scheme with constant sized parameters and constant security loss in multi-instance multi-ciphertext (MIMC) setting in prime-order groups, in the random oracle model. We then further construct an adaptively secure hierarchical identity-based encryption (HIBE) scheme with shorter sized parameters and constant security loss in the MIMC setting in prime-order groups, in the random oracle model. At the core of our technique, we observe that the pseudorandomness property of the random oracle matches the conditions of Left Subgroup Indistinguishability (LS) and Right Subgroup Indistinguishability (RS) in dual system groups (DSG) well, so that we can transform the queried secret keys and the challenge ciphertexts for once. We construct our IBE scheme from Agrawal and Chase's work (CCS 2017) and construct our HIBE scheme from Boneh et al.'s work (Eurocrypt 2005). As a byproduct of our IBE scheme, we construct a new DSG instantiation from Chen et al.'s work (Eurocrypt 2015) under bi-MDDH assumption.

ICML Conference 2023 Conference Paper

A Gromov-Wasserstein Geometric View of Spectrum-Preserving Graph Coarsening

  • Yifan Chen 0004
  • Rentian Yao
  • Yun Yang
  • Jie Chen

Graph coarsening is a technique for solving large-scale graph problems by working on a smaller version of the original graph, and possibly interpolating the results back to the original graph. It has a long history in scientific computing and has recently gained popularity in machine learning, particularly in methods that preserve the graph spectrum. This work studies graph coarsening from a different perspective, developing a theory for preserving graph distances and proposing a method to achieve this. The geometric approach is useful when working with a collection of graphs, such as in graph classification and regression. In this study, we consider a graph as an element on a metric space equipped with the Gromov–Wasserstein (GW) distance, and bound the difference between the distance of two graphs and their coarsened versions. Minimizing this difference can be done using the popular weighted kernel $K$-means method, which improves existing spectrum-preserving methods with the proper choice of the kernel. The study includes a set of experiments to support the theory and method, including approximating the GW distance, preserving the graph spectrum, classifying graphs using spectral information, and performing regression using graph convolutional networks. Code is available at https://github.com/ychen-stat-ml/GW-Graph-Coarsening.

EAAI Journal 2023 Journal Article

Bi-level optimization of charging scheduling of a battery swap station based on deep reinforcement learning

  • Mao Tan
  • Zhuocen Dai
  • Yongxin Su
  • Caixue Chen
  • Ling Wang
  • Jie Chen

With the rapid increase in the number of electric vehicles (EVs), battery swapping is becoming a promising idea because of its short service waiting time. However, in the face of the uncertainty of the power grid and EV behavior, it is difficult to achieve forward-looking and fast-response scheduling in a large-scale battery swap station (BSS). A new bi-level scheduling model is proposed to solve this problem, in which the upper level is built on a deep reinforcement learning (DRL) framework to optimally allocate power among the chargers, and the lower level is modeled as a series of MILP subproblems for dispatching power among the batteries in a charger. A prediction module is included in the DRL framework to improve the foresight of the algorithm, and a safety module is designed to avoid unsafe actions. Experimental results indicate that the proposed approach has excellent performance in large-scale problem solving. It reduces the operating costs of the BSS significantly while satisfying the maximum power demand constraint. This can provide more economic benefits for the BSS and help peak shaving and valley filling for the power grid.

JBHI Journal 2023 Journal Article

BMAnet: Boundary Mining With Adversarial Learning for Semi-Supervised 2D Myocardial Infarction Segmentation

  • Chenchu Xu
  • Yifei Wang
  • Dong Zhang
  • Longfei Han
  • Yanping Zhang
  • Jie Chen
  • Shuo Li

Automatic segmentation of myocardial infarction (MI) regions in late gadolinium-enhanced cardiac magnetic resonance images is an essential step in the computed diagnosis of myocardial infarction. Most of the current myocardial infarction region segmentation methods are based on fully supervised deep learning. However, cardiologists' annotation of myocardial infarction regions in cardiac magnetic resonance images during the diagnosis process is time-consuming and expensive. This paper proposes a semi-supervised myocardial infarction segmentation. It consists of two models: 1) a boundary mining model and 2) an adversarial learning model. The boundary mining model can solve the boundary ambiguity problem by enlarging the gap between the foreground and background features, thus segmenting the myocardial infarction region accurately. The adversarial learning model can make the boundary mining model learn from additional unlabeled data by evaluating the segmentation performance and providing pseudo supervision, which significantly increases the robustness of the boundary mining model. We conduct extensive experiments on an in-house myocardial magnetic resonance dataset. The experimental results on six evaluation metrics demonstrate that our method achieves excellent results in myocardial infarction segmentation and outperforms the state-of-the-art semi-supervised methods.

ICML Conference 2023 Conference Paper

Compressed Decentralized Proximal Stochastic Gradient Method for Nonconvex Composite Problems with Heterogeneous Data

  • Yonggui Yan
  • Jie Chen
  • Pin-Yu Chen
  • Xiaodong Cui
  • Songtao Lu
  • Yangyang Xu

We first propose a decentralized proximal stochastic gradient tracking method (DProxSGT) for nonconvex stochastic composite problems, with data heterogeneously distributed on multiple workers in a decentralized connected network. To save communication cost, we then extend DProxSGT to a compressed method by compressing the communicated information. Both methods need only $\mathcal{O}(1)$ samples per worker for each proximal update, which is important to achieve good generalization performance on training deep neural networks. With a smoothness condition on the expected loss function (but not on each sample function), the proposed methods can achieve an optimal sample complexity result to produce a near-stationary point. Numerical experiments on training neural networks demonstrate the significantly better generalization performance of our methods over large-batch training methods and momentum variance-reduction methods and also, the ability of handling heterogeneous data by the gradient tracking scheme.

EAAI Journal 2023 Journal Article

Deep ensemble learning for high-dimensional subsurface fluid flow modeling

  • Abouzar Choubineh
  • Jie Chen
  • David A. Wood
  • Frans Coenen
  • Fei Ma

The accuracy of Deep Learning (DL) algorithms can be improved by combining several deep learners into an ensemble. This avoids the continuous endeavor required to adjust the architecture of individual networks or the nature of the propagation. This study investigates prediction improvements possible using Deep Ensemble Learning (DEL) to determine four distinct multiscale basis functions in the mixed Generalized Multiscale Finite Element Method (GMsFEM), involving the permeability field as the only input. 376,250 samples were initially generated, filtered down to 367,811 after data pre-processing. A standard Convolutional Neural Network (CNN) named SkiplessCNN and three skip connection-based CNNs named FirstSkipCNN, MidSkipCNN, and DualSkipCNN were developed for the base learners. For each basis function, these four CNNs were combined into an ensemble model using linear regression and ridge regression, separately, as part of the stacking technique. A comparison of the coefficient of determination ($R^2$) and Mean Squared Error (MSE) confirms the effectiveness of all three skip connections in enhancing the performance of the standard CNN, with DualSkip being the most effective among them. Additionally, as evaluated on the testing subset, the combined models meaningfully outperform the individual models for all basis functions. The case that applies linear regression delivers $R^2$ ranging from 0.8456 to 0.9191 and MSE ranging from 0.0092 to 0.0369. The ridge regression case achieves marginally better predictions with $R^2$ ranging from 0.8539 to 0.922, and MSE ranging from 0.009 to 0.0349 because its solution involves more evenly distributed weights.
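The stacking step mentioned in this abstract, where base-learner predictions are combined by linear or ridge regression, can be sketched generically as follows. This is a minimal illustration and not the authors' implementation: the names `fit_stacker` and `stack_predict` are hypothetical, no intercept term is fitted, and the base predictions would in practice come from the four CNNs rather than from an arbitrary matrix.

```python
import numpy as np


def fit_stacker(base_preds, target, ridge_lambda=0.0):
    """Fit meta-learner weights w minimizing ||P w - y||^2 + ridge_lambda * ||w||^2.

    base_preds   -- array of shape (n_samples, n_models), one column per base learner
    target       -- array of shape (n_samples,)
    ridge_lambda -- 0.0 gives ordinary least squares; > 0 gives ridge regression
    """
    P = np.asarray(base_preds, dtype=float)
    y = np.asarray(target, dtype=float)
    n_models = P.shape[1]
    # Closed-form normal equations; assumes P has full column rank when ridge_lambda == 0.
    A = P.T @ P + ridge_lambda * np.eye(n_models)
    return np.linalg.solve(A, P.T @ y)


def stack_predict(base_preds, weights):
    """Combine base-learner predictions with the fitted meta-learner weights."""
    return np.asarray(base_preds, dtype=float) @ weights
```

With `ridge_lambda > 0` the penalty shrinks the weight vector, which is one way a ridge meta-learner avoids leaning heavily on a single base model.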

NeurIPS Conference 2023 Conference Paper

Discover and Align Taxonomic Context Priors for Open-world Semi-Supervised Learning

  • Yu Wang
  • Zhun Zhong
  • Pengchong Qiao
  • Xuxin Cheng
  • Xiawu Zheng
  • Chang Liu
  • Nicu Sebe
  • Rongrong Ji

Open-world Semi-Supervised Learning (OSSL) is a realistic and challenging task, aiming to classify unlabeled samples from both seen and novel classes using partially labeled samples from the seen classes. Previous works typically explore the relationship of samples as priors on the pre-defined single-granularity labels to help novel class recognition. In fact, classes follow a taxonomy and samples can be classified at multiple levels of granularity, which contains more underlying relationships for supervision. We thus argue that learning with single-granularity labels results in sub-optimal representation learning and inaccurate pseudo labels, especially with unknown classes. In this paper, we take the initiative to explore and propose a unified framework, called Taxonomic context prIors Discovering and Aligning (TIDA), which exploits the relationship of samples under various granularity. It allows us to discover multi-granularity semantic concepts as taxonomic context priors (i.e., sub-class, target-class, and super-class), and then collaboratively leverage them to enhance representation learning and improve the quality of pseudo labels. Specifically, TIDA comprises two components: i) A taxonomic context discovery module that constructs a set of hierarchical prototypes in the latent space to discover the underlying taxonomic context priors; ii) A taxonomic context-based prediction alignment module that enforces consistency across hierarchical predictions to build a reliable relationship between classes among various granularity and provide additional supervision. We demonstrate that these two components are mutually beneficial for an effective OSSL framework, which is theoretically explained from the perspective of the EM algorithm. Extensive experiments on seven commonly used datasets show that TIDA can significantly improve the performance and achieve a new state of the art. The source codes are publicly available at https://github.com/rain305f/TIDA.

YNIMG Journal 2023 Journal Article

Distinct brain state dynamics of native and second language processing during narrative listening in late bilinguals

  • Xiangrong Tang
  • Juan Zhang
  • Lanfang Liu
  • Menghan Yang
  • Shijie Li
  • Jie Chen
  • Yumeng Ma
  • Jia Zhang

The process of complex cognition, which includes language processing, is dynamic in nature and involves various network modes or cognitive modes. This dynamic process can be manifested by a set of brain states and transitions between them. Previous neuroimaging studies have shed light on how bilingual brains support native language (L1) and second language (L2) through a shared network. However, the mechanism through which this shared brain network enables L1 and L2 processing remains unknown. This study examined this issue by testing the hypothesis that L1 and L2 processing is associated with distinct brain state dynamics in terms of brain state integration and transition flexibility. A group of late Chinese-English bilinguals was scanned using functional magnetic resonance imaging (fMRI) while listening to eight short narratives in Chinese (L1) and English (L2). Brain state dynamics were modeled using the leading eigenvector dynamic analysis framework. The results show that L1 processing involves more integrated states and frequent transitions between integrated and segregated states, while L2 processing involves more segregated states and fewer transitions. Our work provides insight into the dynamic process of narrative listening comprehension in late bilinguals and sheds new light on the neural representation of language processing and related disorders.

IROS Conference 2023 Conference Paper

FISS+: Efficient and Focused Trajectory Generation and Refinement Using Fast Iterative Search and Sampling Strategy

  • Shuo Sun
  • Jie Chen
  • Jiawei Sun 0006
  • Chengran Yuan
  • Yuanchen Li
  • Tangyike Zhang
  • Marcelo H. Ang

Trajectory planning plays a crucial role in autonomous driving systems, as it is tasked to generate feasible trajectories under highly dynamic scenarios within the time constraint. This paper proposes a novel two-stage coarse-to-fine framework for efficient sampling-based trajectory planning. The proposed method is designed to iteratively generate new trajectory samples focused on the low-cost regions in the sampling space. Two trajectory exploration algorithms are well-designed for efficient search in discretized coarse global space and continuous fine local space, respectively. Experimental results on the first-of-its-kind planning benchmark tool CommonRoad show that our method significantly outperforms the baseline methods both in optimality and computational efficiency. Overall, our approach offers a promising solution for efficient and effective trajectory planning in more autonomous vehicle applications.

ICML Conference 2023 Conference Paper

GC-Flow: A Graph-Based Flow Network for Effective Clustering

  • Tianchun Wang
  • Farzaneh Mirzazadeh
  • Xiang Zhang 0001
  • Jie Chen

Graph convolutional networks (GCNs) are discriminative models that directly model the class posterior $p(y|\mathbf{x})$ for semi-supervised classification of graph data. While being effective, as a representation learning approach, the node representations extracted from a GCN often miss useful information for effective clustering, because the objectives are different. In this work, we design normalizing flows that replace GCN layers, leading to a generative model that models both the class conditional likelihood $p(\mathbf{x}|y)$ and the class prior $p(y)$. The resulting neural network, GC-Flow, retains the graph convolution operations while being equipped with a Gaussian mixture representation space. It enjoys two benefits: it not only maintains the predictive power of GCN, but also produces well-separated clusters, due to the structuring of the representation space. We demonstrate these benefits on a variety of benchmark data sets. Moreover, we show that additional parameterization, such as that on the adjacency matrix used for graph convolutions, yields additional improvement in clustering.
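For readers comparing the two modeling styles in this abstract, the standard Bayes' rule identity links the generative quantities to the class posterior that a discriminative model targets directly. This is the textbook relation, not a formula quoted from the paper, and the summation index $y'$ is introduced here only for illustration:

$$p(y \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid y)\, p(y)}{\sum_{y'} p(\mathbf{x} \mid y')\, p(y')}$$

A model that learns $p(\mathbf{x}|y)$ and $p(y)$ can therefore still produce class predictions, while its per-class likelihoods additionally give the representation space a mixture structure.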

AAAI Conference 2023 Conference Paper

Learnable Blur Kernel for Single-Image Defocus Deblurring in the Wild

  • Jucai Zhai
  • Pengcheng Zeng
  • Chihao Ma
  • Jie Chen
  • Yong Zhao

Recent research showed that the dual-pixel sensor has made great progress in defocus map estimation and image defocus deblurring. However, extracting real-time dual-pixel views is troublesome and complex in algorithm deployment. Moreover, the deblurred image generated by the defocus deblurring network lacks high-frequency details, which is unsatisfactory in human perception. To overcome this issue, we propose a novel defocus deblurring method that uses the guidance of the defocus map to implement image deblurring. The proposed method consists of a learnable blur kernel to estimate the defocus map, which is an unsupervised method, and a single-image defocus deblurring generative adversarial network (DefocusGAN) for the first time. The proposed network can learn the deblurring of different regions and recover realistic details. We propose a defocus adversarial loss to guide this training process. Competitive experimental results confirm that with a learnable blur kernel, the generated defocus map can achieve results comparable to supervised methods. In the single-image defocus deblurring task, the proposed method achieves state-of-the-art results, especially significant improvements in perceptual quality, where PSNR reaches 25.56 dB and LPIPS reaches 0.111.

AAAI Conference 2023 Conference Paper

Proximal Stochastic Recursive Momentum Methods for Nonconvex Composite Decentralized Optimization

  • Gabriel Mancino-Ball
  • Shengnan Miao
  • Yangyang Xu
  • Jie Chen

Consider a network of N decentralized computing agents collaboratively solving a nonconvex stochastic composite problem. In this work, we propose a single-loop algorithm, called DEEPSTORM, that achieves optimal sample complexity for this setting. Unlike double-loop algorithms that require a large batch size to compute the (stochastic) gradient once in a while, DEEPSTORM uses a small batch size, creating advantages in occasions such as streaming data and online learning. This is the first method achieving optimal sample complexity for decentralized nonconvex stochastic composite problems, requiring O(1) batch size. We conduct convergence analysis for DEEPSTORM with both constant and diminishing step sizes. Additionally, under proper initialization and a small enough desired solution error, we show that DEEPSTORM with a constant step size achieves a network-independent sample complexity, with an additional linear speed-up with respect to N over centralized methods. All codes are made available at https://github.com/gmancino/DEEPSTORM.

IJCAI Conference 2023 Conference Paper

Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment

  • Peng Jin
  • Hao Li
  • Zesen Cheng
  • Jinfa Huang
  • Zhennan Wang
  • Li Yuan
  • Chang Liu
  • Jie Chen

Text-video retrieval is a challenging cross-modal task, which aims to align visual entities with natural language descriptions. Current methods either fail to leverage the local details or are computationally expensive. What's worse, they fail to leverage the heterogeneous concepts in data. In this paper, we propose the Disentangled Conceptualization and Set-to-set Alignment (DiCoSA) to simulate the conceptualizing and reasoning process of human beings. For disentangled conceptualization, we divide the coarse feature into multiple latent factors related to semantic concepts. For set-to-set alignment, where a set of visual concepts correspond to a set of textual concepts, we propose an adaptive pooling method to aggregate semantic concepts to address the partial matching. In particular, since we encode concepts independently in only a few dimensions, DiCoSA is superior at efficiency and granularity, ensuring fine-grained interactions using a similar computational complexity as coarse-grained alignment. Extensive experiments on five datasets, including MSR-VTT, LSMDC, MSVD, ActivityNet, and DiDeMo, demonstrate that our method outperforms the existing state-of-the-art methods.

IJCAI Conference 2023 Conference Paper

TG-VQA: Ternary Game of Video Question Answering

  • Hao Li
  • Peng Jin
  • Zesen Cheng
  • Songyang Zhang
  • Kai Chen
  • Zhennan Wang
  • Chang Liu
  • Jie Chen

Video question answering aims to answer a question about video content by reasoning about the alignment semantics between them. However, because they rely heavily on human instructions, i.e., annotations or priors, current contrastive learning-based VideoQA methods still struggle to perform fine-grained visual-linguistic alignments. In this work, we resort to game theory, which can simulate complicated relationships among multiple players with specific interaction strategies, e.g., video, question, and answer as ternary players, to achieve fine-grained alignment for the VideoQA task. Specifically, we carefully design a VideoQA-specific interaction strategy tailored to the characteristics of VideoQA, which can mathematically generate the fine-grained visual-linguistic alignment label without label-intensive efforts. Our TG-VQA outperforms existing state-of-the-art by a large margin (more than 5%) on long-term and short-term VideoQA datasets, verifying its effectiveness and generalization ability. Thanks to the guidance of game-theoretic interaction, our model converges well on limited data (10^4 videos), surpassing most of those pre-trained on large-scale data (10^7 videos).

IJCAI Conference 2023 Conference Paper

WiCo: Win-win Cooperation of Bottom-up and Top-down Referring Image Segmentation

  • Zesen Cheng
  • Peng Jin
  • Hao Li
  • Kehan Li
  • Siheng Li
  • Xiangyang Ji
  • Chang Liu
  • Jie Chen

Top-down and bottom-up methods are the two mainstream approaches to referring segmentation, and each has its own intrinsic weaknesses. Top-down methods are chiefly disturbed by Polar Negative (PN) errors owing to the lack of fine-grained cross-modal alignment. Bottom-up methods are mainly perturbed by Inferior Positive (IP) errors due to the lack of prior object information. Nevertheless, we discover that the two types of methods are highly complementary in restraining each other's weaknesses, but a direct average combination leads to harmful interference. In this context, we build Win-win Cooperation (WiCo) to exploit the complementary nature of the two types of methods in both interaction and integration aspects, achieving a win-win improvement. For the interaction aspect, Complementary Feature Interaction (CFI) introduces prior object information to the bottom-up branch and provides fine-grained information to the top-down branch for complementary feature enhancement. For the integration aspect, Gaussian Scoring Integration (GSI) models the Gaussian performance distributions of the two branches and integrates their results with weights obtained by sampling confidence scores from the distributions. With our WiCo, several prominent bottom-up and top-down combinations achieve remarkable improvements on three common datasets with reasonable extra costs, which demonstrates the effectiveness and generality of our method.

NeurIPS Conference 2022 Conference Paper

Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations

  • Peng Jin
  • Jinfa Huang
  • Fenglin Liu
  • Xian Wu
  • Shen Ge
  • Guoli Song
  • David Clifton
  • Jie Chen

Most video-and-language representation learning approaches employ contrastive learning, e.g., CLIP, to project the video and text features into a common latent space according to the semantic similarities of text-video pairs. However, such learned shared latent spaces are often not optimal, and the modality gap between visual and textual representation cannot be fully eliminated. In this paper, we propose Expectation-Maximization Contrastive Learning (EMCL) to learn compact video-and-language representations. Specifically, we use the Expectation-Maximization algorithm to find a compact set of bases for the latent space, where the features could be concisely represented as the linear combinations of these bases. Such feature decomposition of video-and-language representations reduces the rank of the latent space, resulting in increased representing power for the semantics. Extensive experiments on three benchmark text-video retrieval datasets prove that our EMCL can learn more discriminative video-and-language representations than previous methods, and significantly outperform previous state-of-the-art methods across all metrics. More encouragingly, the proposed method can be applied to boost the performance of existing approaches either as a jointly training layer or an out-of-the-box inference module with no extra training, making it easy to incorporate into any existing method.

ICLR Conference 2022 Conference Paper

Graph-Augmented Normalizing Flows for Anomaly Detection of Multiple Time Series

  • Enyan Dai
  • Jie Chen

Anomaly detection is a widely studied task for a broad variety of data types; among them, multiple time series appear frequently in applications, including for example, power grids and traffic networks. Detecting anomalies for multiple time series, however, is a challenging subject, owing to the intricate interdependencies among the constituent series. We hypothesize that anomalies occur in low density regions of a distribution and explore the use of normalizing flows for unsupervised anomaly detection, because of their superior quality in density estimation. Moreover, we propose a novel flow model by imposing a Bayesian network among constituent series. A Bayesian network is a directed acyclic graph (DAG) that models causal relationships; it factorizes the joint probability of the series into the product of easy-to-evaluate conditional probabilities. We call such a graph-augmented normalizing flow approach GANF and propose joint estimation of the DAG with flow parameters. We conduct extensive experiments on real-world datasets and demonstrate the effectiveness of GANF for density estimation, anomaly detection, and identification of time series distribution drift.

JBHI Journal 2022 Journal Article

MDAN: Mirror Difference Aware Network for Brain Stroke Lesion Segmentation

  • Qiqi Bao
  • Shiyu Mi
  • Bowen Gang
  • Wenming Yang
  • Jie Chen
  • Qingmin Liao

Brain stroke lesion segmentation is of great importance for stroke rehabilitation neuroimaging analysis. Due to the large variance of stroke lesion shapes and similarities of tissue intensity distribution, it remains a challenging task. To help detect abnormalities, the anatomical symmetries of brain magnetic resonance (MR) images have been widely used as visual cues for clinical practices. However, most methods for brain images segmentation do not fully utilize structural symmetry information. This paper presents a novel mirror difference aware network (MDAN) for stroke lesion segmentation. The network uses an encoder-decoder architecture, aiming at holistically exploiting the symmetries of image features. Specifically, a differential feature augmentation (DFA) module is developed in the encoding path to highlight the semantically pathological asymmetries of features in abnormalities. In the DFA module, a Siamese contrastive supervised loss is designed to enhance discriminative features, and a mirror position-based difference augmentation (MDA) module is used to further magnify the discrepancy. Moreover, mirror feature fusion (MFF) modules are applied to efficiently fuse and transfer the information both of the original input and the horizontally flipped features to the decoding path. Extensive experiments on the Anatomical Tracings of Lesions After Stroke (ATLAS) dataset show the proposed MDAN outperforms the state-of-the-art methods.

EAAI Journal 2022 Journal Article

Multi-node load forecasting based on multi-task learning with modal feature extraction

  • Mao Tan
  • Chenglin Hu
  • Jie Chen
  • Ling Wang
  • Zhengmao Li

Accurate multi-node load forecasting is the key to the safe, reliable, and economical operation of the power system. However, the dynamic nature of load and the coupling nature of networks are difficult to extract, making consistent and accurate forecasting of node load rather difficult. In this regard, this paper proposes a soft sharing multi-task deep learning method for multi-node load forecasting in the power system. It has the following aspects: (1) Considering the coupling characteristics of the node network, a multi-modal feature module, based on the inception strategy and gated temporal convolutional network (GTCN), is first designed to explore the coupling features implied in the node load data. (2) A novel multi-objective neural network model is proposed to achieve simultaneous prediction of multi-node load by integrating the multi-modal feature module and gated recurrent unit (GRU). For sharing the learning information of sub-networks, this paper uses the soft sharing mechanism to capture load features, which can better optimize the prediction task for each node load simultaneously. Load data from the New Zealand distribution network and AEMO are used to compare the proposed model’s performance in various scenarios using regression metrics such as mean absolute percentage error (MAPE), Weighted Mean Accuracy (WMA), root mean squared logarithmic error (RMSLE), and Diebold–Mariano (DM). The simulation results show that the proposed method can explore the spatial–temporal coupling characteristics in multi-node load data. Compared with existing state-of-the-art multi-node load prediction methods, our proposed method’s MAPE decreases by 17.04% and 3.92% in the Non-aggregation and Aggregation situations, respectively.

JBHI Journal 2022 Journal Article

Stroke Risk Prediction With Hybrid Deep Transfer Learning Framework

  • Jie Chen
  • Yingru Chen
  • Jianqiang Li
  • Jia Wang
  • Zijie Lin
  • Asoke K. Nandi

Stroke has become a leading cause of death and long-term disability in the world with no effective treatment. Deep learning-based approaches have the potential to outperform existing stroke risk prediction models, but they rely on large well-labeled data. Due to the strict privacy protection policy in health-care systems, stroke data is usually distributed among different hospitals in small pieces. In addition, the positive and negative instances of such data are extremely imbalanced. Transfer learning can solve the small data issue by exploiting the knowledge of a correlated domain, especially when multiple sources of data are available. In this work, we propose a novel Hybrid Deep Transfer Learning-based Stroke Risk Prediction (HDTL-SRP) scheme to exploit the knowledge structure from multiple correlated sources (i.e., external stroke data, chronic diseases data, such as hypertension and diabetes). The proposed framework has been extensively tested in synthetic and real-world scenarios, and it outperforms the state-of-the-art stroke risk prediction models. It also shows the potential of real-world deployment among multiple hospitals aided with 5G/B5G infrastructures.

YNICL Journal 2022 Journal Article

Thrombus magnetic susceptibility is associated with recanalization and clinical outcome in patients with ischemic stroke

  • Jie Chen
  • Zhe Zhang
  • Ximing Nie
  • Yuyuan Xu
  • Chunlei Liu
  • Xingquan Zhao
  • Zhongrong Miao
  • Yongjun Wang

In acute ischemic stroke patients with large vessel occlusion, the characteristics of the occluding thrombus on neuroimaging may be associated with recanalization after endovascular thrombectomy (EVT); however, the relationship between magnetic susceptibility of thrombus and clinical outcome remains unclear. We utilized quantitative susceptibility mapping (QSM) MRI to assess the magnetic susceptibility of thrombus in acute ischemic stroke patients undergoing EVT, and to evaluate its relationship with recanalization and functional outcomes. Patients with documented intracranial artery occlusion were consecutively recruited from one research center of the RESCUE-RE study (a registration study for Critical Care of Acute Ischemic Stroke After Recanalization). All the recruited patients underwent a 3D multi-echo MRI scan on a 3.0 T scanner for both susceptibility-weighted imaging (SWI) and QSM quantification of the thrombus. Among 61 patients included in the analyses, 51 (83.6 %) patients achieved thrombolysis in cerebral infarction (TICI) 2b/3 and 22 (36.1 %) patients had favorable functional outcomes. Successful recanalization was significantly associated with a higher thrombus magnetic susceptibility mean value (0.27 ± 0.09 vs 0.20 ± 0.09 ppm, p = 0.020) and lower coefficient of variation (0.42 ± 0.12 vs 0.52 ± 0.19, p = 0.024). ROC curve analysis showed the optimal cutoff value for thrombus susceptibility for predicting good clinical outcomes was 0.25 ppm (sensitivity 86.4 %, specificity 69.2 %). In multivariable logistic regression analyses, increased thrombus magnetic susceptibility was independently and significantly associated with good functional outcomes (adjusted odds ratio 15.11 [95 % confidence interval 2.64-86.46], p = 0.002). This study demonstrated that the increased thrombus magnetic susceptibility is associated with successful recanalization and favorable functional outcomes for intracranial artery occluded stroke patients.

NeurIPS Conference 2021 Conference Paper

CentripetalText: An Efficient Text Instance Representation for Scene Text Detection

  • Tao Sheng
  • Jie Chen
  • Zhouhui Lian

Scene text detection remains a grand challenge due to the variation in text curvatures, orientations, and aspect ratios. One of the hardest problems in this task is how to represent text instances of arbitrary shapes. Although many methods have been proposed to model irregular texts in a flexible manner, most of them lose simplicity and robustness. Their complicated post-processings and the regression under Dirac delta distribution undermine the detection performance and the generalization ability. In this paper, we propose an efficient text instance representation named CentripetalText (CT), which decomposes text instances into the combination of text kernels and centripetal shifts. Specifically, we utilize the centripetal shifts to implement pixel aggregation, guiding the external text pixels to the internal text kernels. The relaxation operation is integrated into the dense regression for centripetal shifts, allowing the correct prediction in a range instead of a specific value. The convenient reconstruction of text contours and the tolerance of prediction errors in our method guarantee the high detection accuracy and the fast inference speed, respectively. Besides, we shrink our text detector into a proposal generation module, namely CentripetalText Proposal Network (CPN), replacing Segmentation Proposal Network (SPN) in Mask TextSpotter v3 and producing more accurate proposals. To validate the effectiveness of our method, we conduct experiments on several commonly used scene text benchmarks, including both curved and multi-oriented text datasets. For the task of scene text detection, our approach achieves superior or competitive performance compared to other existing methods, e.g., F-measure of 86.3% at 40.0 FPS on Total-Text, F-measure of 86.1% at 34.8 FPS on MSRA-TD500, etc. For the task of end-to-end scene text recognition, our method outperforms Mask TextSpotter v3 by 1.1% in F-measure on Total-Text.

NeurIPS Conference 2021 Conference Paper

CodeNet: A Large-Scale AI for Code Dataset for Learning a Diversity of Coding Tasks

  • Ruchir Puri
  • David Kung
  • Geert Janssen
  • Wei Zhang
  • Giacomo Domeniconi
  • Vladimir Zolotov
  • Julian T Dolby
  • Jie Chen

Over the last several decades, software has been woven into the fabric of every aspect of our society. As software development surges and code infrastructure of enterprise applications ages, it is now more critical than ever to increase software development productivity and modernize legacy applications. Advances in deep learning and machine learning algorithms have enabled breakthroughs in computer vision, speech recognition, natural language processing and beyond, motivating researchers to leverage AI techniques to improve software development efficiency. Thus, the fast-emerging research area of “AI for Code” has garnered new interest and gathered momentum. In this paper, we present a large-scale dataset CodeNet, consisting of over 14 million code samples and about 500 million lines of code in 55 different programming languages, which is aimed at teaching AI to code. In addition to its large scale, CodeNet has a rich set of high-quality annotations to benchmark and help accelerate research in AI techniques for a variety of critical coding tasks, including code similarity and classification, code translation between a large variety of programming languages, and code performance (runtime and memory) improvement techniques. Additionally, CodeNet provides sample input and output test sets for 98.5% of the code samples, which can be used as an oracle for determining code correctness and potentially guide reinforcement learning for code quality improvements. As a usability feature, we provide several pre-processing tools in CodeNet to transform source code into representations that can be readily used as inputs into machine learning models. Results of code classification and code similarity experiments using the CodeNet dataset are provided as a reference. We hope that the scale, diversity and rich, high-quality annotations of CodeNet will offer unprecedented research opportunities at the intersection of AI and Software Engineering.

UAI Conference 2021 Conference Paper

Dynamic visualization for L1 fusion convex clustering in near-linear time

  • Bingyuan Zhang
  • Jie Chen
  • Yoshikazu Terada

Convex clustering has drawn recent attention because of its competitive performance and its guarantee of global optimality. However, convex clustering is infeasible for large-scale data sets due to its high computational cost. We propose a novel method to solve the L1 fusion convex clustering problem by dynamic programming. We develop the Convex clustering Path Algorithm In Near-linear Time (C-PAINT) to construct the solution path efficiently. The proposed C-PAINT yields the exact solution, whereas other general solvers for convex problems applied to convex clustering depend on tuning parameters such as step size and threshold, and usually take many iterations to converge. Apart from a sorting process that takes almost no time in practice, the main part of the algorithm takes only linear time. Thus, C-PAINT has superior scalability compared to other state-of-the-art algorithms. Moreover, C-PAINT enables the path visualization of clustering solutions for large data. In particular, experiments show our proposed method can solve the convex clustering with 10^7 data points in two minutes. We demonstrate the proposed method using both synthetic data and real data. Our algorithms are implemented in the dpcc R package.

IJCAI Conference 2021 Conference Paper

Graph Universal Adversarial Attacks: A Few Bad Actors Ruin Graph Learning Models

  • Xiao Zang
  • Yi Xie
  • Jie Chen
  • Bo Yuan

Deep neural networks, while generalizing well, are known to be sensitive to small adversarial perturbations. This phenomenon poses a severe security threat and calls for in-depth investigation of the robustness of deep learning models. With the emergence of neural networks for graph structured data, similar investigations are urged to understand their robustness. It has been found that adversarially perturbing the graph structure and/or node features may result in a significant degradation of the model performance. In this work, we show from a different angle that such fragility similarly occurs if the graph contains a few bad-actor nodes, which compromise a trained graph neural network through flipping the connections to any targeted victim. Worse, the bad actors found for one graph model severely compromise other models as well. We call the bad actors "anchor nodes" and propose an algorithm, named GUA, to identify them. Thorough empirical investigations suggest an interesting finding that the anchor nodes often belong to the same class; and they also corroborate the intuitive trade-off between the number of anchor nodes and the attack success rate. For the dataset Cora which contains 2708 nodes, as few as six anchor nodes will result in an attack success rate higher than 80% for GCN and three other models.

TCS Journal 2021 Journal Article

On parameterized algorithms for fixed-order book thickness with respect to the pathwidth of the vertex ordering

  • Yunlong Liu
  • Jie Chen
  • Jingui Huang
  • Jianxin Wang

Given a graph $G = (V, E)$ with a vertex ordering $\prec$, the fixed-order book thickness problem asks whether there is a page assignment $\sigma$ such that $\langle \prec, \sigma \rangle$ is a $k$-page book embedding of $G$. This problem is NP-complete even for any fixed $k$ greater than 3. Recently, Bhore et al. (2019, 2020) [1, 2] presented a parameterized algorithm with respect to the pathwidth $\kappa$ of the vertex ordering. In this paper, we first re-analyze the running time for Bhore et al.'s algorithm, and prove a bound of $2^{O(\kappa^2)} \cdot |V|$, improving Bhore et al.'s bound of $\kappa^{O(\kappa^2)} \cdot |V|$. Then, we show that fixed-order book thickness parameterized by the pathwidth of the vertex ordering does not admit a polynomial kernel unless $\mathrm{NP} \subseteq \mathrm{coNP}/\mathrm{poly}$. Finally, we show that a generalized fixed-order book thickness problem, in which a budget of at most $c$ crossings over all pages is given, admits a parameterized algorithm running in time $(c + 2)^{O(\kappa^2)} \cdot |V|$.

IJCAI Conference 2021 Conference Paper

RR-Net: Injecting Interactive Semantics in Human-Object Interaction Detection

  • Dongming Yang
  • Yuexian Zou
  • Can Zhang
  • Meng Cao
  • Jie Chen

Human-Object Interaction (HOI) detection is devoted to learning how humans interact with surrounding objects. The latest end-to-end HOI detectors lack relation reasoning, which leads to an inability to learn HOI-specific interactive semantics for predictions. In this paper, we therefore propose novel relation reasoning for HOI detection. We first present a progressive Relation-aware Frame, which brings a new structure and parameter sharing pattern for interaction inference. Upon the frame, an Interaction Intensifier Module and a Correlation Parsing Module are carefully designed, where: a) interactive semantics from humans can be exploited and passed to objects to intensify interactions, b) interactive correlations among humans, objects and interactions are integrated to promote predictions. Based on the modules above, we construct an end-to-end trainable framework named Relation Reasoning Network (abbr. RR-Net). Extensive experiments show that our proposed RR-Net sets a new state-of-the-art on both V-COCO and HICO-DET benchmarks and improves the baseline by about 5.5% and 9.8% relatively, validating that this first effort in exploring relation reasoning and integrating interactive semantics has brought obvious improvement for end-to-end HOI detection.

AAAI Conference 2021 Conference Paper

Unsupervised Learning of Graph Hierarchical Abstractions with Differentiable Coarsening and Optimal Transport

  • Tengfei Ma
  • Jie Chen

Hierarchical abstractions are a methodology for solving large-scale graph problems in various disciplines. Coarsening is one such approach: it generates a pyramid of graphs whereby the one in the next level is a structural summary of the prior one. With a long history in scientific computing, many coarsening strategies were developed based on mathematically driven heuristics. Recently, resurgent interests exist in deep learning to design hierarchical methods learnable through differentiable parameterization. These approaches are paired with downstream tasks for supervised learning. In practice, however, supervised signals (e.g., labels) are scarce and are often laborious to obtain. In this work, we propose an unsupervised approach, coined OTCOARSENING, with the use of optimal transport. Both the coarsening matrix and the transport cost matrix are parameterized, so that an optimal coarsening strategy can be learned and tailored for a given set of graphs without use of labels. We demonstrate that the proposed approach produces meaningful coarse graphs and yields competitive performance compared with supervised methods for graph classification and regression.

TCS Journal 2020 Journal Article

A post-quantum hybrid encryption based on QC-LDPC codes in the multi-user setting

  • Luping Wang
  • Jie Chen
  • Kai Zhang
  • Haifeng Qian

Encryption schemes based on coding theory are among the most accredited choices in the post-quantum scenario, where QC-LDPC codes are usually employed to construct concrete schemes due to their good security and efficiency. In this work, we introduce a new IND-CCA secure multi-instance framework for the code-based hybrid encryption primitive in the random oracle model, which is derived from our new multi-instance KEM and DEM building modules. We note that previous multi-instance KEMs and DEMs are usually derived from single-instance KEMs and DEMs, and hence suffer from large parameter sizes and security loss. In contrast, our multi-instance KEM is a direct construction based on a key generation function and a one-way trapdoor function, and our multi-instance DEM is constructed from a standard DEM and MAC with a tag in the input to achieve a tighter security loss. Finally, we present an IND-CCA secure multi-instance hybrid encryption scheme based on QC-LDPC codes in the random oracle model, where the scheme achieves small private key size and only consumes addition and multiplication operations over $\mathbb{F}_2[x]$.

JBHI Journal 2020 Journal Article

Automatic Medical Code Assignment via Deep Learning Approach for Intelligent Healthcare

  • Fei Teng
  • Zheng Ma
  • Jie Chen
  • Ming Xiao
  • Lufei Huang

With the development of healthcare 4.0, there has been an explosion in the amount of data such as image, medical text, physiological signals, lab tests, etc. Among them, medical records provide a complete picture of the associated clinical events. However, the processing of medical texts is difficult because they are structurally free, diverse in style, and have subjective factors. Assigning metadata codes from the International Classification of Diseases (ICD) presents a standardized way of indicating diagnoses and procedures, so it becomes a mandatory process for understanding medical records to make better clinical and financial decisions. Such a manual encoding task is time-consuming, error-prone and expensive. In this paper, we propose a deep learning approach and a medical topic mining method to automatically predict ICD codes from free-text medical records. The F1 score on the Medical Information Mart for Intensive Care (MIMIC-III) dataset increases by 5% over the state of the art. The approach is also suitable for multiple ICD versions and languages. For the specific disease, atrial fibrillation, the F1 score is up to 96% and 93.3% using in-house ICD-10 datasets and MIMIC-III datasets, respectively. We developed an Artificial Intelligence based coding system, which can greatly improve the efficiency and accuracy of human coders, and meanwhile accelerate the secondary use for clinical informatics.

AAAI Conference 2020 Conference Paper

CAG: A Real-Time Low-Cost Enhanced-Robustness High-Transferability Content-Aware Adversarial Attack Generator

  • Huy Phan
  • Yi Xie
  • Siyu Liao
  • Jie Chen
  • Bo Yuan

Deep neural networks (DNNs) are vulnerable to adversarial attack despite their tremendous success in many artificial intelligence fields. Adversarial attack is a method that causes the intended misclassification by adding imperceptible perturbations to legitimate inputs. To date, researchers have developed numerous types of adversarial attack methods. However, from the perspective of practical deployment, these methods suffer from several drawbacks such as long attack generating time, high memory cost, insufficient robustness and low transferability. To address the drawbacks, we propose a Content-aware Adversarial Attack Generator (CAG) to achieve real-time, low-cost, enhanced-robustness and high-transferability adversarial attack. First, as a type of generative model-based attack, CAG shows significant speedup (at least 500 times) in generating adversarial examples compared to the state-of-the-art attacks such as PGD and C&W. Furthermore, CAG only needs a single generative model to perform targeted attack to any targeted class. Because CAG encodes the label information into a trainable embedding layer, it differs from prior generative model-based adversarial attacks that use n different copies of generative models for n different targeted classes. As a result, CAG significantly reduces the required memory cost for generating adversarial examples. Moreover, CAG can generate adversarial perturbations that focus on the critical areas of input by integrating the class activation maps information in the training process, and hence improve the robustness of CAG attack against the state-of-the-art adversarial defenses. In addition, CAG exhibits high transferability across different DNN classifier models in black-box attack scenario by introducing random dropout in the process of generating perturbations. Extensive experiments on different datasets and DNN models have verified the real-time, low-cost, enhanced-robustness, and high-transferability benefits of CAG.

AAAI Conference 2020 Conference Paper

Embedding Compression with Isotropic Iterative Quantization

  • Siyu Liao
  • Jie Chen
  • Yanzhi Wang
  • Qinru Qiu
  • Bo Yuan

Continuous representation of words is a standard component in deep learning-based NLP models. However, representing a large vocabulary requires significant memory, which can cause problems, particularly on resource-constrained platforms. Therefore, in this paper we propose an isotropic iterative quantization (IIQ) approach for compressing embedding vectors into binary ones, leveraging the iterative quantization technique well established for image retrieval, while satisfying the desired isotropic property of PMI based models. Experiments with pre-trained embeddings (i.e., GloVe and HDC) demonstrate a more than thirty-fold compression ratio with comparable and sometimes even improved performance over the original real-valued embedding vectors.

AAAI Conference 2020 Conference Paper

EvolveGCN: Evolving Graph Convolutional Networks for Dynamic Graphs

  • Aldo Pareja
  • Giacomo Domeniconi
  • Jie Chen
  • Tengfei Ma
  • Toyotaro Suzumura
  • Hiroki Kanezashi
  • Tim Kaler
  • Tao Schardl

Graph representation learning resurges as a trending research subject owing to the widespread use of deep learning for Euclidean data, which inspires various creative designs of neural networks in the non-Euclidean domain, particularly graphs. With the success of these graph neural networks (GNN) in the static setting, we approach further practical scenarios where the graph dynamically evolves. Existing approaches typically resort to node embeddings and use a recurrent neural network (RNN, broadly speaking) to regulate the embeddings and learn the temporal dynamics. These methods require the knowledge of a node in the full time span (including both training and testing) and are less applicable to the frequent change of the node set. In some extreme scenarios, the node sets at different time steps may completely differ. To resolve this challenge, we propose EvolveGCN, which adapts the graph convolutional network (GCN) model along the temporal dimension without resorting to node embeddings. The proposed approach captures the dynamism of the graph sequence by using an RNN to evolve the GCN parameters. Two architectures are considered for the parameter evolution. We evaluate the proposed approach on tasks including link prediction, edge classification, and node classification. The experimental results indicate a generally higher performance of EvolveGCN compared with related approaches. The code is available at https://github.com/IBM/EvolveGCN.

NeurIPS Conference 2020 Conference Paper

Online Convex Optimization Over Erdos-Renyi Random Networks

  • Jinlong Lei
  • Peng Yi
  • Yiguang Hong
  • Jie Chen
  • Guodong Shi

This work studies how node-to-node communications over an Erdős-Rényi random network influence distributed online convex optimization, which is vital in solving large-scale machine learning in antagonistic or changing environments. At each step, each node (computing unit) makes a local decision, experiences a loss evaluated with a convex function, and communicates the decision with other nodes over a network. The node-to-node communications are described by the Erdős-Rényi rule, where independently each link takes place with a probability $p$ over a prescribed connected graph. The objective is to minimize the system-wide loss accumulated over a finite time horizon. We consider standard distributed gradient descents with full gradients, one-point bandits and two-points bandits for convex and strongly convex losses, respectively. We establish how the regret bounds scale with respect to time horizon $T$, network size $N$, decision dimension $d$, and an algebraic network connectivity. The regret bounds scaling with respect to $T$ match those obtained by state-of-the-art algorithms and fundamental limits in the corresponding centralized online optimization problems, e.g., $\mathcal{O}(\sqrt{T})$ and $\mathcal{O}(\ln(T))$ regrets are established for convex and strongly convex losses with full gradient feedback and two-points information, respectively. For classical Erdős-Rényi networks over all-to-all possible node communications, the regret scalings with respect to the probability $p$ are analytically established, based on which the tradeoff between the communication overhead and computation accuracy is clearly demonstrated. Numerical studies have validated the theoretical findings.

AAAI Conference 2020 Conference Paper

Online Planner Selection with Graph Neural Networks and Adaptive Scheduling

  • Tengfei Ma
  • Patrick Ferber
  • Siyu Huo
  • Jie Chen
  • Michael Katz

Automated planning is one of the foundational areas of AI. Since no single planner can work well for all tasks and domains, portfolio-based techniques have become increasingly popular in recent years. In particular, deep learning emerges as a promising methodology for online planner selection. Owing to the recent development of structural graph representations of planning tasks, we propose a graph neural network (GNN) approach to selecting candidate planners. GNNs are advantageous over a straightforward alternative, the convolutional neural networks, in that they are invariant to node permutations and that they incorporate node labels for better inference. Additionally, for cost-optimal planning, we propose a two-stage adaptive scheduling method to further improve the likelihood that a given task is solved in time. The scheduler may switch at halftime to a different planner, conditioned on the observed performance of the first one. Experimental results validate the effectiveness of the proposed method against strong baselines, both deep learning and non-deep learning based. The code is available at https://github.com/matenure/GNN_planner.

AAAI Conference 2020 Conference Paper

Scalable Variational Bayesian Kernel Selection for Sparse Gaussian Process Regression

  • Tong Teng
  • Jie Chen
  • Yehong Zhang
  • Bryan Kian Hsiang Low

This paper presents a variational Bayesian kernel selection (VBKS) algorithm for sparse Gaussian process regression (SGPR) models. In contrast to existing GP kernel selection algorithms that aim to select only one kernel with the highest model evidence, our VBKS algorithm considers the kernel as a random variable and learns its belief from data such that the uncertainty of the kernel can be interpreted and exploited to avoid overconfident GP predictions. To achieve this, we represent the probabilistic kernel as an additional variational variable in a variational inference (VI) framework for SGPR models where its posterior belief is learned together with that of the other variational variables (i.e., inducing variables and kernel hyperparameters). In particular, we transform the discrete kernel belief into a continuous parametric distribution via reparameterization in order to apply VI. Though it is computationally challenging to jointly optimize a large number of hyperparameters due to many kernels being evaluated simultaneously by our VBKS algorithm, we show that the variational lower bound of the log-marginal likelihood can be decomposed into an additive form such that each additive term depends only on a disjoint subset of the variational variables and can thus be optimized independently. Stochastic optimization is then used to maximize the variational lower bound by iteratively improving the variational approximation of the exact posterior belief via stochastic gradient ascent, which incurs constant time per iteration and hence scales to big data. We empirically evaluate the performance of our VBKS algorithm on synthetic and massive real-world datasets.

AAAI Conference 2019 Conference Paper

A Sequential Set Generation Method for Predicting Set-Valued Outputs

  • Tian Gao
  • Jie Chen
  • Vijil Chenthamarakshan
  • Michael Witbrock

Consider a general machine learning setting where the output is a set of labels or sequences. This output set is unordered and its size varies with the input. Whereas multi-label classification methods seem a natural first resort, they are not readily applicable to set-valued outputs because of the growth rate of the output space; and because conventional sequence generation doesn’t reflect sets’ order-free nature. In this paper, we propose a unified framework—sequential set generation (SSG)—that can handle output sets of labels and sequences. SSG is a meta-algorithm that leverages any probabilistic learning method for label or sequence prediction, but employs a proper regularization such that a new label or sequence is generated repeatedly until the full set is produced. Though SSG is sequential in nature, it does not penalize the ordering of the appearance of the set elements and can be applied to a variety of set output problems, such as a set of classification labels or sequences. We perform experiments with both benchmark and synthetic data sets and demonstrate SSG’s strong performance over baseline methods.

NeurIPS Conference 2019 Conference Paper

Adaptively Aligned Image Captioning via Adaptive Attention Time

  • Lun Huang
  • Wenmin Wang
  • Yaxian Xia
  • Jie Chen

Recent neural models for image captioning usually employ an encoder-decoder framework with an attention mechanism. However, the attention mechanism in such a framework aligns one single (attended) image feature vector to one caption word, assuming a one-to-one mapping between source image regions and target caption words, which is never possible. In this paper, we propose a novel attention model, namely Adaptive Attention Time (AAT), to align the source and the target adaptively for image captioning. AAT allows the framework to learn how many attention steps to take to output a caption word at each decoding step. With AAT, an image region can be mapped to an arbitrary number of caption words while a caption word can also attend to an arbitrary number of image regions. AAT is deterministic and differentiable, and doesn't introduce any noise to the parameter gradients. In this paper, we empirically show that AAT improves over state-of-the-art methods on the task of image captioning. Code is available at https://github.com/husthuaan/AAT.

TCS Journal 2019 Journal Article

Efficient public key encryption with equality test in the standard model

  • Kai Zhang
  • Jie Chen
  • Hyung Tae Lee
  • Haifeng Qian
  • Huaxiong Wang

Public key encryption with equality test (PKEET) is a special kind of public key encryption (PKE) scheme that allows a tester to perform equality tests on ciphertexts generated by different public keys as well as the same public key. This feature enables us to apply PKEET to various scenarios in practice, such as efficient data management on encrypted databases and spam filtering in encrypted email systems. For these reasons, since Yang et al. [1] first proposed the concept of PKEET, many PKEET schemes have been proposed to improve efficiency or to enhance functionalities. However, to the best of our knowledge, almost all existing schemes assume the existence of random oracles, except for the generic construction proposed by Lee et al. On the other hand, their generic approach for PKEET employs a 2-level hierarchical identity-based encryption and a strongly unforgeable one-time signature, which suffers from low efficiency. In this paper, we propose an efficient PKEET scheme under a specific cryptographic assumption in the standard model. To this end, we first encrypt a message and its hash value in a parallel way by following the recently proposed strategy. Then, to prevent adaptive chosen ciphertext attacks (CCA2), we give a link between them by adapting the technique which was originally proposed for identity-based encryption and previously exploited to design efficient CCA2-secure PKE schemes. We show that our proposed construction satisfies formal security requirements for PKEET under the decisional bilinear Diffie–Hellman (DBDH) assumption in the standard model. As a result, we obtain a new PKEET scheme which has shorter ciphertext and trapdoor sizes, and improves computational costs for encryption, decryption, and test algorithms, by about 60%, 77%, and 66%, respectively, compared to a PKEET instantiation obtained by the prior generic framework.

YNIMG Journal 2019 Journal Article

Genetic contribution to the phenotypic correlation between trait impulsivity and resting-state functional connectivity of the amygdala and its subregions

  • Dang Zheng
  • Jie Chen
  • Xiaoming Wang
  • Yuan Zhou

Trait impulsivity, a predisposition to respond to stimuli without regard for the potentially negative consequences, contributes to many maladaptive behaviors. Studies have shown that both genetic factors and interregional functional interactions underlie trait impulsivity. However, whether common genes contribute to both trait impulsivity and its neural basis is still unknown. This study investigated the phenotypic correlations between trait impulsivity and the resting-state functional connectivity (rsFC) of the amygdala as well as its subregions and the genetic contribution to the phenotypic correlations. By recruiting a sample of 292 twins in late adolescence and young adulthood, we found that trait impulsivity was positively correlated with the rsFC between the left full amygdala and the right dorsolateral prefrontal cortex (DLPFC). Further analyses on the subregions of the amygdala showed that trait impulsivity was positively correlated with the rsFCs between the left basolateral (BL) amygdala and both the right DLPFC and the right inferior frontal gyrus and with the rsFCs between the right superficial (SF) amygdala and both the dorsal anterior cingulate cortex and right anterior insula. Bivariate genetic modelling analyses found genetic overlaps between trait impulsivity and the rsFC of the left full amygdala or the left BL amygdala with the right DLPFC. The proportions of phenotypic associations accounted for by overlapping genes were 82% and 60%, respectively. These results provide evidence for the genetic overlap between trait impulsivity and the intrinsic brain functional connectivity centered at the amygdala and especially at its BL subregion.

TCS Journal 2019 Journal Article

Public key encryption with equality test via hash proof system

  • Ming Zeng
  • Jie Chen
  • Kai Zhang
  • Haifeng Qian

Public key encryption with equality test (PKEET) allows a tester to know whether ciphertexts are encryptions of the same message or not by using the trapdoors issued from their owners. It is a useful cryptographic primitive that can be deployed in many applications, such as in the mechanism of searching over encrypted data. Based on the Hash Proof System (HPS) introduced by Cramer and Shoup, this paper presents an oversimplified paradigm for constructing PKEET in the standard model. Compared with previous works that use identity-based encryption, strongly unforgeable one-time signatures or other strong cryptographic primitives, our paradigm requires only the universal$_2$ property of HPS and provides an efficient way to obtain concrete PKEET schemes based on different assumptions in the standard model, since HPS has been shown to be easily realizable from a broad range of NP languages (e.g., DLIN-based, DCR-based, lattice-based and so on). Moreover, to demonstrate the practicality of the proposed paradigm, we instantiate it based on two kinds of NP languages, one based on the decisional Diffie-Hellman (DDH) assumption and the other based on the decisional composite residuosity (DCR) assumption. This results in the first concrete PKEET schemes in the standard model without using pairing operations, and the security of the schemes is based on the standard DDH assumption and the standard DCR assumption, respectively.

AAAI Conference 2018 Conference Paper

A Cascaded Inception of Inception Network With Attention Modulated Feature Fusion for Human Pose Estimation

  • Wentao Liu
  • Jie Chen
  • Cheng Li
  • Chen Qian
  • Xiao Chu
  • Xiaolin Hu

Accurate keypoint localization of human pose needs diversified features: the high level for contextual dependencies and the low level for detailed refinement of joints. The importance of the two factors varies from case to case, and how to efficiently use the features is still an open problem. Existing methods have limitations in preserving low-level features, adaptively adjusting the importance of different levels of features, and modeling the human perception process. This paper presents three novel techniques step by step to efficiently utilize different levels of features for human pose estimation. Firstly, an inception of inception (IOI) block is designed to emphasize the low-level features. Secondly, an attention mechanism is proposed to adjust the importance of individual levels according to the context. Thirdly, a cascaded network is proposed to sequentially localize the joints to enforce message passing from joints of stand-alone parts like head and torso to remote joints like wrist or ankle. Experimental results demonstrate that the proposed method achieves state-of-the-art performance on both MPII and LSP benchmarks.

NeurIPS Conference 2018 Conference Paper

Constrained Generation of Semantically Valid Graphs via Regularizing Variational Autoencoders

  • Tengfei Ma
  • Jie Chen
  • Cao Xiao

Deep generative models have achieved remarkable success in various data domains, including images, time series, and natural languages. There remain, however, substantial challenges for combinatorial structures, including graphs. One of the key challenges lies in the difficulty of ensuring semantic validity in context. For example, in molecular graphs, the number of bonding-electron pairs must not exceed the valence of an atom; whereas in protein interaction networks, two proteins may be connected only when they belong to the same or correlated gene ontology terms. These constraints are not easy to incorporate into a generative model. In this work, we propose a regularization framework for variational autoencoders as a step toward semantic validity. We focus on the matrix representation of graphs and formulate penalty terms that regularize the output distribution of the decoder to encourage the satisfaction of validity constraints. Experimental results confirm a much higher likelihood of sampling valid graphs in our approach, compared with others reported in the literature.

EAAI Journal 2017 Journal Article

A new Self-Organizing Extreme Learning Machine soft sensor model and its applications in complicated chemical processes

  • Zhiqiang Geng
  • Jungen Dong
  • Jie Chen
  • Yongming Han

The control of product quality in complex chemical processes depends strictly on the measurement of the key process variables. However, online measurement devices are extremely expensive and hard to protect. Meanwhile, these online measurement devices introduce a delay. Therefore, soft sensor technology plays a vital role in measuring the key process variables. The Extreme Learning Machine (ELM) is an efficient and simple single layer feed-forward neural network (SLFN) for building an exact soft sensor model. However, unsuitably selected hidden nodes and random parameters will greatly affect the performance of the ELM. Therefore, this paper proposes a novel Self-Organizing Extreme Learning Machine (SOELM) algorithm constructed by the biological neuron-glia interaction principle to solve this issue of the ELM. Firstly, the weights between input layer nodes and the CNS are tuned iteratively by the Hebbian learning rule. Then the network structure is adjusted in a self-organizing manner by Mutual Information (MI) among different structures of networks. Secondly, the weights between the CNS and output layer nodes are obtained by the ELM. The experimental results based on different UCI data sets prove that the SOELM has better generalization capability and stability than the ELM. Moreover, our proposed method is developed as a soft sensor model for accurately predicting the key variables of the Purified Terephthalic Acid (PTA) process.

JMLR Journal 2017 Journal Article

Hierarchically Compositional Kernels for Scalable Nonparametric Learning

  • Jie Chen
  • Haim Avron
  • Vikas Sindhwani

We propose a novel class of kernels to alleviate the high computational cost of large-scale nonparametric learning with kernel methods. The proposed kernel is defined based on a hierarchical partitioning of the underlying data domain, where the Nyström method (a globally low-rank approximation) is married with a locally lossless approximation in a hierarchical fashion. The kernel maintains (strict) positive-definiteness. The corresponding kernel matrix admits a recursively off-diagonal low-rank structure, which allows for fast linear algebra computations. Suppressing the factor of data dimension, the memory and arithmetic complexities for training a regression or a classifier are reduced from $O(n^2)$ and $O(n^3)$ to $O(nr)$ and $O(nr^2)$, respectively, where $n$ is the number of training examples and $r$ is the rank on each level of the hierarchy. Although other randomized approximate kernels entail a similar complexity, empirical results show that the proposed kernel achieves a matching performance with a smaller $r$. We demonstrate comprehensive experiments to show the effective use of the proposed kernel on data sizes up to the order of millions.

NeurIPS Conference 2017 Conference Paper

Solving Most Systems of Random Quadratic Equations

  • Gang Wang
  • Georgios Giannakis
  • Yousef Saad
  • Jie Chen

This paper deals with finding an $n$-dimensional solution $\bm{x}$ to a system of quadratic equations $y_i=|\langle\bm{a}_i, \bm{x}\rangle|^2$, $1\le i \le m$, which in general is known to be NP-hard. We put forth a novel procedure that starts with a weighted maximal correlation initialization obtainable with a few power iterations, followed by successive refinements based on iteratively reweighted gradient-type iterations. The novel techniques distinguish themselves from prior works by the inclusion of a fresh (re)weighting regularization. For certain random measurement models, the proposed procedure returns the true solution $\bm{x}$ with high probability in time proportional to reading the data $\{(\bm{a}_i; y_i)\}_{1\le i \le m}$, provided that the number $m$ of equations is some constant $c>0$ times the number $n$ of unknowns, that is, $m\ge cn$. Empirically, the upshots of this contribution are: i) perfect signal recovery in the high-dimensional regime given only an information-theoretic limit number of equations; and, ii) (near-)optimal statistical accuracy in the presence of additive noise. Extensive numerical tests using both synthetic data and real images corroborate its improved signal recovery performance and computational efficiency relative to state-of-the-art approaches.
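To make the measurement model $y_i=|\langle\bm{a}_i, \bm{x}\rangle|^2$ concrete, the sketch below generates such data for a real-valued signal and shows that $\bm{x}$ and $-\bm{x}$ produce identical measurements, so in the real case a solution can only be identified up to a global sign. The real Gaussian sampling, the sizes, and the function name are assumptions for illustration; this is not the paper's recovery procedure.

```python
import random


def quadratic_measurements(A, x):
    """Return y_i = |<a_i, x>|^2 for each row a_i of A (real-valued case)."""
    return [sum(a_j * x_j for a_j, x_j in zip(a, x)) ** 2 for a in A]


random.seed(0)
n, m = 4, 12  # assumed sizes: n unknowns, m equations

# Assumed measurement model: i.i.d. standard Gaussian vectors a_i.
A = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
x = [random.gauss(0.0, 1.0) for _ in range(n)]
neg_x = [-v for v in x]

y_pos = quadratic_measurements(A, x)
y_neg = quadratic_measurements(A, neg_x)  # same data as for x
```

Because squaring discards the sign of each inner product, recovery error for this kind of problem is usually measured up to that sign flip.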

AAAI Conference 2015 Conference Paper

Parallel Gaussian Process Regression for Big Data: Low-Rank Representation Meets Markov Approximation

  • Kian Hsiang Low
  • Jiangbo Yu
  • Jie Chen
  • Patrick Jaillet

The expressive power of a Gaussian process (GP) model comes at a cost of poor scalability in the data size. To improve its scalability, this paper presents a low-rank-cum-Markov approximation (LMA) of the GP model that is novel in leveraging the dual computational advantages stemming from complementing a low-rank approximate representation of the full-rank GP based on a support set of inputs with a Markov approximation of the resulting residual process; the latter approximation is guaranteed to be closest in the Kullback-Leibler distance criterion subject to some constraint and is considerably more refined than that of existing sparse GP models utilizing low-rank representations due to its more relaxed conditional independence assumption (especially with larger data). As a result, our LMA method can trade off between the size of the support set and the order of the Markov property to (a) incur lower computational cost than such sparse GP models while achieving predictive performance comparable to them and (b) accurately represent features/patterns of any scale. Interestingly, varying the Markov order produces a spectrum of LMAs with PIC approximation and full-rank GP at the two extremes. An advantage of our LMA method is that it is amenable to parallelization on multiple machines/cores, thereby gaining greater scalability. Empirical evaluation on three real-world datasets in clusters of up to 32 computing nodes shows that our centralized and parallel LMA methods are significantly more time-efficient and scalable than state-of-the-art sparse and full-rank GP regression methods while achieving comparable predictive performances.

TCS Journal 2014 Journal Article

Doubly spatial encryption from DBDH

  • Jie Chen
  • Hoeteck Wee

Functional encryption is an emerging paradigm for public-key encryption which enables fine-grained control of access to encrypted data. Doubly-spatial encryption (DSE) captures all functionalities that we know how to realize via pairings-based assumptions, including (H)IBE, IPE, NIPE, CP-ABE and KP-ABE. In this paper, we propose a construction of DSE from the decisional bilinear Diffie–Hellman (DBDH) assumption. This also yields the first non-zero inner product encryption (NIPE) scheme based on DBDH. Quite surprisingly, we know how to realize NIPE and DSE from stronger assumptions in bilinear groups but not from the basic DBDH assumption. Along the way, we present a novel algebraic characterization of no instances for the DSE functionality, which we use crucially in the proof of security.

AAAI Conference 2014 Conference Paper

GP-Localize: Persistent Mobile Robot Localization Using Online Sparse Gaussian Process Observation Model

  • Nuo Xu
  • Kian Hsiang Low
  • Jie Chen
  • Keng Kiat Lim
  • Etkin Ozgul

Central to robot exploration and mapping is the task of persistent localization in environmental fields characterized by spatially correlated measurements. This paper presents a Gaussian process localization (GP-Localize) algorithm that, in contrast to existing works, can exploit the spatially correlated field measurements taken during a robot’s exploration (instead of relying on prior training data) for efficiently and scalably learning the GP observation model online through our proposed novel online sparse GP. As a result, GP-Localize is capable of achieving constant time and memory (i.e., independent of the size of the data) per filtering step, which demonstrates the practical feasibility of using GPs for persistent robot localization and autonomy. Empirical evaluation via simulated experiments with real-world datasets and a real robot experiment shows that GP-Localize outperforms existing GP localization algorithms.

AAMAS Conference 2012 Conference Paper

Decentralized Active Robotic Exploration and Mapping for Probabilistic Field Classification in Environmental Sensing

  • Kian Hsiang Low
  • Jie Chen
  • John Dolan
  • Steve Chien
  • David Thompson

A central problem in environmental sensing and monitoring is to classify/label the hotspots in a large-scale environmental field. This paper presents a novel decentralized active robotic exploration (DARE) strategy for probabilistic classification/labeling of hotspots in a Gaussian process (GP)-based field. In contrast to existing state-of-the-art exploration strategies for learning environmental field maps, the time needed to solve the DARE strategy is independent of the map resolution and the number of robots, thus making it practical for in situ, real-time active sampling. Its exploration behavior exhibits an interesting formal trade-off between that of boundary tracking until the hotspot region boundary can be accurately predicted and wide-area coverage to find new boundaries in sparsely sampled areas to be tracked. We provide a theoretical guarantee on the active exploration performance of the DARE strategy: under a reasonable conditional independence assumption, we prove that it can optimally achieve two formal cost-minimizing exploration objectives based on the misclassification and entropy criteria. Importantly, this result implies that the uncertainty of labeling the hotspots in a GP-based field is greatest at or close to the hotspot region boundaries. Empirical evaluation on real-world plankton density and temperature field data shows that, subject to limited observations, the DARE strategy can achieve superior classification of hotspots and time efficiency compared with state-of-the-art active exploration strategies.

IJCAI Conference 2011 Conference Paper

Learning Compact Visual Descriptor for Low Bit Rate Mobile Landmark Search

  • Rongrong Ji
  • Ling-Yu Duan
  • Jie Chen
  • Hongxun Yao
  • Tiejun Huang
  • Wen Gao

In this paper, we propose to extract a compact yet discriminative visual descriptor directly on the mobile device, which tackles the wireless query transmission latency in mobile landmark search. This descriptor is offline learnt from the location contexts of geo-tagged Web photos from both Flickr and Panoramio in two phases: First, we segment the landmark photo collections into discrete geographical regions using a Gaussian Mixture Model [Stauffer et al., 2000]. Second, a ranking sensitive vocabulary boosting is introduced to learn a compact codebook within each region. To tackle the locally optimal descriptor learning caused by imprecise geographical segmentation, we further iterate the above phases by feeding back an "entropy" based descriptor compactness into a prior distribution to constrain the Gaussian mixture modeling. Consequently, when entering a specific geographical region, the codebook in the mobile device is downstream adapted, which ensures efficient extraction of the compact descriptor, its low bit rate transmission, as well as promising discrimination ability. We deploy our descriptor within both HTC and iPhone mobile phones, testing landmark search in typical areas including Beijing, New York, and Barcelona containing one million images. Our learning descriptor outperforms alternative compact descriptors [Chen et al., 2009][Chen et al., 2010][Chandrasekhar et al., 2009a][Chandrasekhar et al., 2009b] by a large margin.

JMLR Journal 2009 Journal Article

Fast Approximate kNN Graph Construction for High Dimensional Data via Recursive Lanczos Bisection

  • Jie Chen
  • Haw-ren Fang
  • Yousef Saad

Nearest neighbor graphs are widely used in data mining and machine learning. A brute-force method to compute the exact kNN graph takes $\Theta(dn^2)$ time for $n$ data points in the $d$ dimensional Euclidean space. We propose two divide and conquer methods for computing an approximate kNN graph in $\Theta(dn^t)$ time for high dimensional data (large $d$). The exponent $t \in (1,2)$ is an increasing function of an internal parameter $\alpha$ which governs the size of the common region in the divide step. Experiments show that a high quality graph can usually be obtained with small overlaps, that is, for small values of $t$. A few of the practical details of the algorithms are as follows. First, the divide step uses an inexpensive Lanczos procedure to perform recursive spectral bisection. After each conquer step, an additional refinement step is performed to improve the accuracy of the graph. Finally, a hash table is used to avoid repeating distance calculations during the divide and conquer process. The combination of these techniques is shown to yield quite effective algorithms for building kNN graphs.
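For reference, the brute-force baseline mentioned in the abstract can be sketched as follows: every pair of points is compared, which costs on the order of $dn^2$ distance work. This is only the exact baseline, not the divide and conquer method with Lanczos bisection; the function name is hypothetical, and sorting each candidate list adds a logarithmic factor that a selection routine would avoid.

```python
def knn_graph_brute_force(points, k):
    """Exact kNN graph: for each point, indices of its k nearest other points.

    Computes all pairwise squared Euclidean distances (about d * n^2 work),
    then sorts each candidate list, which adds an O(n log n) term per point.
    """
    n = len(points)
    graph = []
    for i in range(n):
        dists = []
        for j in range(n):
            if i == j:
                continue  # a point is not its own neighbor
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            dists.append((d2, j))
        dists.sort()  # ties are broken by the smaller index
        graph.append([j for _, j in dists[:k]])
    return graph
```

For example, `knn_graph_brute_force([(0, 0), (1, 0), (0, 3), (10, 10)], 2)` lists, for each of the four points, the indices of its two closest neighbors.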