Arrow Research search

Author name cluster

Chen Jiang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

17 papers
2 author rows

Possible papers

17

AAAI Conference 2026 Conference Paper

Tracing the Heart’s Pathways: ECG Representation Learning from a Cardiac Conduction Perspective

  • Tan Pan
  • Yixuan Sun
  • Chen Jiang
  • Qiong Gao
  • Rui Sun
  • Xingmeng Zhang
  • Zhenqi Yang
  • Limei Han

The multi-lead electrocardiogram (ECG) stands as a cornerstone of cardiac diagnosis. Recent strides in electrocardiogram self-supervised learning (eSSL) have brightened prospects for enhancing representation learning without relying on high-quality annotations. Yet earlier eSSL methods suffer from a key limitation: they focus on consistent patterns across leads and beats, overlooking the inherent differences in heartbeats rooted in cardiac conduction processes, while subtle but significant variations carry unique physiological signatures. Moreover, representation learning for ECG analysis should align with ECG diagnostic guidelines, which progress from individual heartbeats to single leads and ultimately to lead combinations. This sequential logic, however, is often neglected when applying pre-trained models to downstream tasks. To address these gaps, we propose CLEAR-HUG, a two-stage framework designed to capture subtle variations in cardiac conduction across leads while adhering to ECG diagnostic guidelines. In the first stage, we introduce an eSSL model termed Conduction-LEAd Reconstructor (CLEAR), which captures both specific variations and general commonalities across heartbeats. Treating each heartbeat as a distinct entity, CLEAR employs a simple yet effective sparse attention mechanism to reconstruct signals without interference from other heartbeats. In the second stage, we implement a Hierarchical lead-Unified Group head (HUG) for disease diagnosis, mirroring clinical workflow. Experimental results across six tasks show a 6.84% improvement, validating the effectiveness of CLEAR-HUG. This highlights its ability to enhance representations of cardiac conduction and align patterns with expert diagnostic guidelines.

ICLR Conference 2025 Conference Paper

Brain Bandit: A Biologically Grounded Neural Network for Efficient Control of Exploration

  • Chen Jiang
  • Jiahui An
  • Yating Liu
  • Ni Ji

How to balance exploration and exploitation in an uncertain environment is a central challenge in reinforcement learning. In contrast to artificial agents, humans and animals have demonstrated superior exploration efficiency in novel environments. To understand how the brain’s neural network controls exploration under uncertainty, we analyzed the dynamical systems model of a biological neural network that controls explore-exploit decisions during foraging. Mathematically, this model (named the Brain Bandit Net, or BBN) is a special type of stochastic continuous Hopfield network. We show through theory and simulation that BBN can perform posterior sampling of action values with a tunable bias towards or against uncertain options. We then demonstrate that, in multi-armed bandit (MAB) tasks, BBN can generate probabilistic choice behavior with a flexible uncertainty bias resembling human and animal choice patterns. In addition to its high efficiency in MAB tasks, BBN can also be embedded with reinforcement learning algorithms to accelerate learning in MDP tasks. Altogether, our findings reveal the theoretical foundation for efficient exploration in biological neural networks and propose a general, brain-inspired algorithm for enhancing exploration in RL.

NeurIPS Conference 2025 Conference Paper

ChromFound: Towards A Universal Foundation Model for Single-Cell Chromatin Accessibility Data

  • Yifeng Jiao
  • Yuchen Liu
  • Yu Zhang
  • Xin Guo
  • Yushuai Wu
  • Chen Jiang
  • Jiyang Li
  • Hongwei Zhang

The advent of single-cell Assay for Transposase-Accessible Chromatin using sequencing (scATAC-seq) offers an innovative perspective for deciphering regulatory mechanisms by assembling a vast repository of single-cell chromatin accessibility data. While foundation models have achieved significant success in single-cell transcriptomics, there is currently no foundation model for scATAC-seq that supports zero-shot high-quality cell identification and comprehensive multi-omics analysis simultaneously. Key challenges lie in the high dimensionality and sparsity of scATAC-seq data, as well as the lack of a standardized schema for representing open chromatin regions (OCRs). Here, we present ChromFound, a foundation model tailored for scATAC-seq. ChromFound utilizes a hybrid architecture and genome-aware tokenization to effectively capture genome-wide long contexts and regulatory signals from dynamic chromatin landscapes. Pretrained on 1.97 million cells from 30 tissues and 6 disease conditions, ChromFound demonstrates broad applicability across 6 diverse tasks. Notably, it achieves robust zero-shot performance in generating universal cell representations and exhibits excellent transferability in cell type annotation and cross-omics prediction. By uncovering enhancer-gene links undetected by existing computational methods, ChromFound offers a promising framework for understanding disease risk variants in the noncoding genome. The implementation of ChromFound is available via https://github.com/JohnsonKlose/ChromFound.

IROS Conference 2025 Conference Paper

Interpreting Behaviors and Geometric Constraints as Knowledge Graphs for Robot Manipulation Control

  • Chen Jiang
  • Allie Wang
  • Martin Jägersand

In this paper, we investigate the feasibility of using knowledge graphs to interpret actions and behaviors for robot manipulation control. Equipped with an uncalibrated visual servoing controller, we propose to use robot knowledge graphs to unify behavior trees and geometric constraints, conceptualizing robot manipulation control as semantic events. The robot knowledge graphs not only preserve the advantages of behavior trees in scripting actions and behaviors, but also offer additional benefits of mapping natural interactions between concepts and events, which enable knowledgeable explanations of the manipulation contexts. Through real-world evaluations, we demonstrate the flexibility of the robot knowledge graphs to support explainable robot manipulation control.

NeurIPS Conference 2025 Conference Paper

Minimal Semantic Sufficiency Meets Unsupervised Domain Generalization

  • Tan Pan
  • Kaiyu Guo
  • Dongli Xu
  • Zhaorui Tan
  • Chen Jiang
  • Deshu Chen
  • Xin Guo
  • Brian Lovell

The generalization ability of deep learning has been extensively studied in supervised settings, yet it remains less explored in unsupervised scenarios. Recently, the Unsupervised Domain Generalization (UDG) task has been proposed to enhance the generalization of models trained with prevalent unsupervised learning techniques, such as Self-Supervised Learning (SSL). UDG confronts the challenge of distinguishing semantics from variations without category labels. Although some recent methods have employed domain labels to tackle this issue, such domain labels are often unavailable in real-world contexts. In this paper, we address these limitations by formalizing UDG as the task of learning a Minimal Sufficient Semantic Representation: a representation that (i) preserves all semantic information shared across augmented views (sufficiency), and (ii) maximally removes information irrelevant to semantics (minimality). We theoretically ground these objectives from the perspective of information theory, demonstrating that optimizing representations to achieve sufficiency and minimality directly reduces out-of-distribution risk. Practically, we implement this optimization through Minimal-Sufficient UDG (MS-UDG), a learnable model that integrates (a) an InfoNCE-based objective to achieve sufficiency, and (b) two complementary components to promote minimality: a novel semantic-variation disentanglement loss and a reconstruction-based mechanism for capturing adequate variation. Empirically, MS-UDG sets a new state-of-the-art on popular unsupervised domain-generalization benchmarks, consistently outperforming existing SSL and UDG methods, without category or domain labels during representation learning.

ICRA Conference 2025 Conference Paper

Point and Go: Intuitive Reference Frame Reallocation in Mode Switching for Assistive Robotics

  • Allie Wang
  • Chen Jiang
  • Michael Przystupa
  • Justin Valentine
  • Martin Jägersand

Operating high-degree-of-freedom robots can be difficult for users of wheelchair-mounted robotic manipulators. Mode switching in Cartesian space has several drawbacks, such as unintuitive control reference frames, separate translation and orientation control, and limited movement capabilities that hinder performance. We propose Point and Go mode switching, which reallocates the Cartesian mode switching reference frames into a more intuitive action space comprised of new translation and rotation modes. We use a novel sweeping motion to point the gripper, which defines the new translation axis along the robot base frame's horizontal plane. This creates an intuitive ‘point and go’ translation mode that allows the user to easily perform complex, human-like movements without switching control modes. The system's rotation mode combines position control with a refined end-effector-oriented frame that provides precise and consistent robot actions in various end-effector poses. We verified its effectiveness through initial experiments, followed by a three-task user study that compared our method to Cartesian mode switching and a state-of-the-art learning method. Results show that Point and Go mode switching reduced completion times by 31%, pauses by 41%, and mode switches by 33%, while receiving significantly favorable responses in user surveys.

ICRA Conference 2025 Conference Paper

Robot Manipulation in Salient Vision Through Referring Image Segmentation and Geometric Constraints

  • Chen Jiang
  • Allie Wang
  • Martin Jägersand

In this paper, we perform robot manipulation activities in real-world environments with language contexts by integrating a compact referring image segmentation model into the robot's perception module. First, we propose CLIPU²Net, a lightweight referring image segmentation model designed for fine-grain boundary and structure segmentation from language expressions. Then, we deploy the model in an eye-in-hand visual servoing system to enact robot control in the real world. The key to our system is the representation of salient visual information as geometric constraints, linking the robot's visual perception to actionable commands. Experimental results on 46 real-world robot manipulation tasks demonstrate that our method outperforms traditional visual servoing methods relying on labor-intensive feature annotations, excels in fine-grain referring image segmentation with a compact decoder size of 6.6 MB, and supports robot control across diverse contexts.

AAAI Conference 2024 Conference Paper

BBScore: A Brownian Bridge Based Metric for Assessing Text Coherence

  • Zhecheng Sheng
  • Tianhao Zhang
  • Chen Jiang
  • Dongyeop Kang

Measuring the coherence of text is a vital aspect of evaluating the quality of written content. Recent advancements in neural coherence modeling have demonstrated their efficacy in capturing entity coreference and discourse relations, thereby enhancing coherence evaluation. However, many existing methods heavily depend on static embeddings or focus narrowly on nearby context, constraining their capacity to measure the overarching coherence of long texts. In this paper, we posit that coherent texts inherently manifest a sequential and cohesive interplay among sentences, effectively conveying the central theme, purpose, or standpoint. To explore this abstract relationship, we introduce the "BB Score," a novel reference-free metric grounded in Brownian bridge theory for assessing text coherence. Our findings showcase that when synergized with a simple additional classification component, this metric attains a performance level comparable to state-of-the-art techniques on standard artificial discrimination tasks. We also establish in downstream tasks that this metric effectively differentiates between human-written documents and text generated by large language models within specific domains. Furthermore, we illustrate the efficacy of this approach in detecting written styles attributed to various large language models, underscoring its potential for generalizability. In summary, we present a novel Brownian bridge coherence metric capable of measuring both local and global text coherence, while circumventing the need for end-to-end model training. This flexibility allows for its application in various downstream tasks.

ICRA Conference 2024 Conference Paper

CLIPUNetr: Assisting Human-robot Interface for Uncalibrated Visual Servoing Control with CLIP-driven Referring Expression Segmentation

  • Chen Jiang
  • Yuchen Yang
  • Martin Jägersand

The classical human-robot interface in uncalibrated image-based visual servoing (UIBVS) relies on either human annotations or semantic segmentation with categorical labels. Both methods fail to match natural human communication and convey rich semantics in manipulation tasks as effectively as natural language expressions. In this paper, we tackle this problem by using referring expression segmentation, which is a prompt-based approach, to provide more in-depth information for robot perception. To generate high-quality segmentation predictions from referring expressions, we propose CLIPUNetr, a new CLIP-driven referring expression segmentation network. CLIPUNetr leverages CLIP’s strong vision-language representations to segment regions from referring expressions, while utilizing its "U-shaped" encoder-decoder architecture to generate predictions with sharper boundaries and finer structures. Furthermore, we propose a new pipeline to integrate CLIPUNetr into UIBVS and apply it to control robots in real-world environments. In experiments, our method improves boundary and structure measurements by an average of 120% and can successfully assist real-world UIBVS control in an unstructured manipulation environment.

IROS Conference 2024 Conference Paper

EVSMap: An Efficient Volumetric-Semantic Mapping Approach for Embedded Systems

  • Jiyuan Qiu
  • Chen Jiang
  • Pengfei Zhang
  • Haowen Wang

Despite significant progress in perception tasks such as 3D scene mapping and semantic information extraction using SLAM and deep learning, applying these techniques within computationally constrained embedded systems remains a challenge. In this work, we introduce a novel end-to-end framework for efficient and real-time volumetric-semantic mapping. We have developed a lightweight and robust RGB-D segmentation network for extracting semantic information. Through the introduction of three distinct modules (CFIM, DAPPF, and LAD), our network significantly enhances real-time performance while achieving Mean Intersection over Union (MIoU) scores comparable to state-of-the-art (SOTA) models. Our model reduces the parameters by 8 to 26 times compared to similar networks and improves inference speed by 2 to 3 times. Additionally, we improved a multi-class Bayesian updating strategy by refining the penalty function to reduce the memory size of the semantic map and enhance the mapping speed. Compared with other volumetric-semantic mapping approaches, our work maintains the same level of detail in semantic information representation, while increasing mapping speed by 1.3 to 9.6 times and reducing the memory size of the map by up to 2.6 times. Finally, we applied our work to real-world mobile robot exploration scenarios, demonstrating the efficiency of the proposed framework.

AAAI Conference 2023 Conference Paper

TransVCL: Attention-Enhanced Video Copy Localization Network with Flexible Supervision

  • Sifeng He
  • Yue He
  • Minlong Lu
  • Chen Jiang
  • Xudong Yang
  • Feng Qian
  • Xiaobo Zhang
  • Lei Yang

Video copy localization aims to precisely localize all the copied segments within a pair of untrimmed videos in video retrieval applications. Previous methods typically start from a frame-to-frame similarity matrix generated by cosine similarity between frame-level features of the input video pair, and then detect and refine the boundaries of copied segments on the similarity matrix under temporal constraints. In this paper, we propose TransVCL: an attention-enhanced video copy localization network, which is optimized directly from initial frame-level features and trained end-to-end with three main components: a customized Transformer for feature enhancement, a correlation and softmax layer for similarity matrix generation, and a temporal alignment module for copied segment localization. In contrast to previous methods demanding the handcrafted similarity matrix, TransVCL incorporates long-range temporal information between the feature sequence pair using self- and cross-attention layers. With the joint design and optimization of the three components, the similarity matrix can be learned to present more discriminative copied patterns, leading to significant improvements over previous methods on segment-level labeled datasets (VCSL and VCDB). Besides the state-of-the-art performance in the fully supervised setting, the attention architecture enables TransVCL to further exploit unlabeled or simply video-level labeled data. Additional experiments supplementing video-level labeled datasets, including SVD and FIVR, reveal the high flexibility of TransVCL from full supervision to semi-supervision (with or without video-level annotation). Code is publicly available at https://github.com/transvcl/TransVCL.

ICRA Conference 2022 Conference Paper

Unsupervised Depth Completion and Denoising for RGB-D Sensors

  • Lei Fan 0005
  • Yunxuan Li
  • Chen Jiang
  • Ying Wu 0001

Depth information is considered valuable as it describes geometric structures, which benefits various robotic tasks. However, the depth acquired by RGB-D sensors still suffers from two deficiencies, i.e., incompletion and noise. Previous methods complete depth by exploring hand-tuned models or making surface assumptions, while nowadays, deep approaches intend to solve this problem with rendered image pairs. For depth denoising, as a consequence of different sensor mechanisms, most methods can only work with specific devices. With existing methods, three challenges emerge: the onerous training set collecting process, the mismatch between existing models and present RGB-D sensors, and the non-real-time computation. In this paper, we first argue that depth completion and denoising are inherently different tasks that can be solved without collecting or rendering complete and noiseless ground truths. We address all the mentioned challenges with two separate unsupervised learning procedures. The completion network takes color and incomplete depth as input and predicts values for the unobserved area, combining prior knowledge and color-depth correlations. The denoising step exploits image sequences to construct noise models in a self-supervised manner, with the ability to cater to different sensors. Experimental comparisons and ablation studies demonstrate that even without human-labeled ground truths, the proposed method can produce better completion results and also reduce noise in real time.

AAAI Conference 2020 System Paper

Automatic Car Damage Assessment System: Reading and Understanding Videos as Professional Insurance Inspectors

  • Wei Zhang
  • Yuan Cheng
  • Xin Guo
  • Qingpei Guo
  • Jian Wang
  • Qing Wang
  • Chen Jiang
  • Meng Wang

We demonstrate a car damage assessment system in the car insurance field based on artificial intelligence techniques, which can exempt insurance inspectors from checking cars on site and help people without professional knowledge to evaluate car damage when accidents happen. Unlike existing approaches, we utilize videos instead of photos to interact with users to make the whole procedure as simple as possible. We adopt object and video detection and segmentation techniques in computer vision, and take advantage of multiple frames extracted from videos to achieve high damage recognition accuracy. The system uploads video streams captured by mobile devices, recognizes car damage on the cloud asynchronously, and then returns damaged components and repair costs to users. The system evaluates car damage and returns results automatically and effectively in seconds, which reduces labor costs and decreases insurance claim time significantly.

IROS Conference 2020 Conference Paper

Understanding Contexts Inside Robot and Human Manipulation Tasks through Vision-Language Model and Ontology System in Video Streams

  • Chen Jiang
  • Masood Dehghan
  • Martin Jägersand

Manipulation tasks in daily life, such as pouring water, unfold through human intentions. Being able to process contextual knowledge from these Activities of Daily Living (ADLs) over time can help us understand manipulation intentions, which are essential for an intelligent robot to transition smoothly between various manipulation actions. In this paper, to model the intended concepts of manipulation, we present a vision dataset under a strictly constrained knowledge domain for both robot and human manipulations, where manipulation concepts and relations are stored by an ontology system in a taxonomic manner. Furthermore, we propose a scheme to generate a combination of visual attentions and an evolving knowledge graph filled with commonsense knowledge. Our scheme works with real-world camera streams and fuses an attention-based Vision-Language model with the ontology system. The experimental results demonstrate that the proposed scheme can successfully represent the evolution of an intended object manipulation procedure for both robots and humans. The proposed scheme allows the robot to mimic human-like intentional behaviors by watching real-time videos. We aim to develop this scheme further for real-world robot intelligence in Human-Robot Interaction.

ICRA Conference 2019 Conference Paper

Video Object Segmentation using Teacher-Student Adaptation in a Human Robot Interaction (HRI) Setting

  • Mennatullah Siam
  • Chen Jiang
  • Steven Weikai Lu
  • Laura Petrich
  • Mahmoud Gamal
  • Mohamed Elhoseiny
  • Martin Jägersand

Video object segmentation is an essential task in robot manipulation to facilitate grasping and learning affordances. Incremental learning is important for robotics in unstructured environments. Inspired by how children learn, human robot interaction (HRI) can be utilized to teach robots about the world guided by humans, similar to how children learn from a parent or a teacher. A human teacher can show potential objects of interest to the robot, which is able to self-adapt to the teaching signal without manual segmentation labels. We propose a novel teacher-student learning paradigm to teach robots about their surrounding environment. A two-stream motion and appearance “teacher” network provides pseudo-labels to adapt an appearance “student” network. The student network is able to segment the newly learned objects in other scenes, whether they are static or in motion. We also introduce a carefully designed dataset that serves the proposed HRI setup, denoted as (I)nteractive (V)ideo (O)bject (S)egmentation. Our IVOS dataset contains teaching videos of different objects and manipulation tasks. Our proposed adaptation method outperforms the state-of-the-art on DAVIS and FBMS by 6.8% and 1.2% in F-measure, respectively. It improves over the baseline on the IVOS dataset by 46.1% and 25.9% in mIoU.