Arrow Research

Author name cluster

Xuejin Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

11 papers
2 author rows

Possible papers (11)

JBHI Journal 2025 Journal Article

BioSAM: Generating SAM Prompts From Superpixel Graph for Biological Instance Segmentation

  • Miaomiao Cai
  • Xiaoyu Liu
  • Zhiwei Xiong
  • Xuejin Chen

Proposal-free instance segmentation methods have significantly advanced the field of biological image analysis. Recently, the Segment Anything Model (SAM) has shown an extraordinary ability to handle challenging instance boundaries. However, directly applying SAM to biological images that contain instances with complex morphologies and dense distributions fails to yield satisfactory results. In this work, we propose BioSAM, a new biological instance segmentation framework that generates SAM prompts from a superpixel graph. Specifically, to avoid over-merging, we first generate sufficient superpixels as graph nodes and construct an initial graph. We then generate initial prompts from each superpixel and aggregate them through a graph neural network (GNN) that predicts the relationships between superpixels, thereby avoiding over-segmentation. We employ the SAM encoder embeddings and SAM-assisted superpixel similarity as new graph features to enhance its discrimination capability. With the graph-based prompt aggregation, we feed the aggregated prompts to SAM to refine the segmentation and generate more accurate instance boundaries. Comprehensive experiments on four representative biological datasets demonstrate that our proposed method outperforms state-of-the-art methods.
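
To illustrate the prompt-aggregation idea, the sketch below scores superpixel pairs with a small network and merges high-affinity superpixels into a single point prompt. The MLP standing in for the paper's GNN, the union-find merge, and all names and shapes are illustrative assumptions, not the authors' implementation.

    import torch
    import torch.nn as nn

    class EdgeScorer(nn.Module):
        """Predicts whether two adjacent superpixels belong to the same instance."""
        def __init__(self, feat_dim):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(2 * feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))

        def forward(self, feats, edges):
            # feats: (N, D) superpixel embeddings (e.g., pooled SAM encoder features)
            # edges: (E, 2) index pairs of adjacent superpixels
            pair = torch.cat([feats[edges[:, 0]], feats[edges[:, 1]]], dim=-1)
            return torch.sigmoid(self.mlp(pair)).squeeze(-1)  # merge probability

    def merge_prompts(centroids, edges, scores, thr=0.5):
        """Union-find merge: one aggregated point prompt per predicted instance."""
        parent = list(range(len(centroids)))
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        for (a, b), s in zip(edges.tolist(), scores.tolist()):
            if s > thr:
                parent[find(a)] = find(b)
        groups = {}
        for i in range(len(centroids)):
            groups.setdefault(find(i), []).append(centroids[i])
        # the mean centroid of each merged group becomes one SAM point prompt
        return [torch.stack(g).mean(dim=0) for g in groups.values()]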

ICML Conference 2024 Conference Paper

GaussianPro: 3D Gaussian Splatting with Progressive Propagation

  • Kai Cheng
  • Xiaoxiao Long
  • Kaizhi Yang
  • Yao Yao 0008
  • Wei Yin 0006
  • Yuexin Ma
  • Wenping Wang 0001
  • Xuejin Chen

3D Gaussian Splatting (3DGS) has recently revolutionized the field of neural rendering with its high fidelity and efficiency. However, 3DGS heavily depends on the initialized point cloud produced by Structure-from-Motion (SfM) techniques. When tackling large-scale scenes that unavoidably contain texture-less surfaces, SfM techniques fail to produce enough points on these surfaces and cannot provide a good initialization for 3DGS. As a result, 3DGS suffers from difficult optimization and low-quality renderings. In this paper, inspired by classic multi-view stereo (MVS) techniques, we propose GaussianPro, a novel method that applies a progressive propagation strategy to guide the densification of the 3D Gaussians. Compared to the simple split and clone strategies used in 3DGS, our method leverages the priors of the existing reconstructed geometries of the scene and utilizes patch matching to produce new Gaussians with accurate positions and orientations. Experiments on both large-scale and small-scale scenes validate the effectiveness of our method. Our method significantly surpasses 3DGS on the Waymo dataset, exhibiting an improvement of 1.15 dB in PSNR. Codes and data are available at https://github.com/kcheng1021/GaussianPro.
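
The propagation step borrowed from MVS can be pictured with a toy loop: each pixel carries a depth/normal hypothesis, and a neighbor's hypothesis replaces it whenever a patch-matching cost prefers the neighbor. This is a minimal sketch under assumed interfaces (the cost_fn callback is hypothetical); roughly, pixels where the propagated geometry disagrees with the rendered 3DGS depth would then seed new Gaussians.

    def propagate(depth, normal, cost_fn, iters=2):
        """depth: (H, W) array; normal: (H, W, 3) array;
        cost_fn(y, x, d, n) -> patch-matching cost of hypothesis (d, n) at pixel (y, x)."""
        H, W = depth.shape
        for _ in range(iters):
            for y in range(H):
                for x in range(W):
                    best = cost_fn(y, x, depth[y, x], normal[y, x])
                    for dy, dx in ((-1, 0), (0, -1), (1, 0), (0, 1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < H and 0 <= nx < W:
                            c = cost_fn(y, x, depth[ny, nx], normal[ny, nx])
                            if c < best:  # the neighbor's plane explains this pixel better
                                best = c
                                depth[y, x] = depth[ny, nx]
                                normal[y, x] = normal[ny, nx]
        return depth, normal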

AAAI Conference 2024 Conference Paper

Learning Multimodal Volumetric Features for Large-Scale Neuron Tracing

  • Qihua Chen
  • Xuejin Chen
  • Chenxuan Wang
  • Yixiong Liu
  • Zhiwei Xiong
  • Feng Wu

The current neuron reconstruction pipeline for electron microscopy (EM) data usually includes automatic image segmentation followed by extensive human expert proofreading. In this work, we aim to reduce human workload by predicting connectivity between over-segmented neuron pieces, taking both microscopy image and 3D morphology features into account, similar to the human proofreading workflow. To this end, we first construct a dataset, named FlyTracing, that contains millions of pairwise connections of segments spanning the whole fly brain, which is three orders of magnitude larger than existing datasets for neuron segment connection. To learn sophisticated biological imaging features from the connectivity annotations, we propose a novel connectivity-aware contrastive learning method to generate dense volumetric EM image embeddings. The learned embeddings can be easily incorporated with any point- or voxel-based morphological representations for automatic neuron tracing. Extensive comparisons of different combinations of image and morphological representations in identifying split errors across the whole fly brain demonstrate the superiority of the proposed approach, especially for locations that contain severe imaging artifacts, such as missing sections and misalignment. The dataset and code are available at https://github.com/Levishery/Flywire-Neuron-Tracing.
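
A connectivity-aware contrastive objective can be sketched as a siamese loss over segment pairs: embeddings of pairs annotated as belonging to the same neuron are pulled together, and the rest are pushed apart. The margin-based form below is an assumption for illustration, not the paper's exact loss.

    import torch
    import torch.nn.functional as F

    def connectivity_contrastive_loss(emb_a, emb_b, connected, margin=1.0):
        """emb_a, emb_b: (B, D) embeddings of paired segments;
        connected: (B,) float mask, 1.0 if the pair lies on the same neuron."""
        d = F.pairwise_distance(emb_a, emb_b)                # Euclidean distances
        pos = connected * d.pow(2)                           # pull connected pairs together
        neg = (1.0 - connected) * F.relu(margin - d).pow(2)  # push unconnected pairs apart
        return (pos + neg).mean()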

ICLR Conference 2024 Conference Paper

MovingParts: Motion-based 3D Part Discovery in Dynamic Radiance Field

  • Kaizhi Yang
  • Xiaoshuai Zhang
  • Zhiao Huang
  • Xuejin Chen
  • Zexiang Xu
  • Hao Su 0001

We present MovingParts, a NeRF-based method for dynamic scene reconstruction and part discovery. We consider motion an important cue for identifying parts, since all particles on the same part share a common motion pattern. From the perspective of fluid simulation, existing deformation-based methods for dynamic NeRF can be seen as parameterizing the scene motion under the Eulerian view, i.e., focusing on specific locations in space through which the fluid flows as time passes. However, it is intractable to extract the motion of constituent objects or parts using the Eulerian view representation. In this work, we introduce the dual Lagrangian view and enforce representations under the Eulerian/Lagrangian views to be cycle-consistent. Under the Lagrangian view, we parameterize the scene motion by tracking the trajectories of particles on objects. The Lagrangian view makes it convenient to discover parts by factorizing the scene motion as a composition of part-level rigid motions. Experimentally, our method can achieve fast and high-quality dynamic scene reconstruction from even a single moving camera, and the induced part-based representation allows direct applications of part tracking, animation, 3D scene editing, etc.
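
The Eulerian/Lagrangian cycle consistency can be illustrated with two coordinate MLPs: a Lagrangian map that carries a canonical particle to its world position at time t, and an Eulerian map that sends a world position back to its canonical particle. The tiny MLPs and the loss below are a hedged sketch, not the paper's network.

    import torch
    import torch.nn as nn

    def mlp(din, dout):
        return nn.Sequential(nn.Linear(din, 128), nn.ReLU(), nn.Linear(128, dout))

    lagrangian = mlp(4, 3)  # (canonical xyz, t) -> world xyz: follow a particle
    eulerian = mlp(4, 3)    # (world xyz, t) -> canonical xyz: look up a location

    def cycle_loss(x_canon, t):
        """x_canon: (N, 3) canonical particle positions; t: (N, 1) time values."""
        x_world = lagrangian(torch.cat([x_canon, t], dim=-1))  # Lagrangian view
        x_back = eulerian(torch.cat([x_world, t], dim=-1))     # Eulerian view, inverted
        return (x_back - x_canon).pow(2).mean()                # enforce cycle consistency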

NeurIPS Conference 2024 Conference Paper

Slot-VLM: Object-Event Slots for Video-Language Modeling

  • Jiaqi Xu
  • Cuiling Lan
  • Wenxuan Xie
  • Xuejin Chen
  • Yan Lu

Video-Language Models (VLMs), powered by the advancements in Large Language Models (LLMs), are charting new frontiers in video understanding. A pivotal challenge is the development of an effective method to encapsulate video content into a set of representative tokens that align with LLMs. In this work, we introduce Slot-VLM, a new framework designed to generate semantically decomposed video tokens, in terms of object-wise and event-wise visual representations, to facilitate LLM inference. Particularly, we design an Object-Event Slots module, i.e., OE-Slots, that adaptively aggregates the dense video tokens from the vision encoder into a set of representative slots. To take into account both spatial object details and varied temporal dynamics, we build OE-Slots with two branches: the Object-Slots branch and the Event-Slots branch. The Object-Slots branch extracts object-centric slots from features of high spatial resolution but low frame sample rate, emphasizing detailed object information. The Event-Slots branch learns event-centric slots from features of high temporal sample rate but low spatial resolution. These complementary slots are combined to form the vision context, serving as input to the LLM for effective video reasoning. Our experimental results demonstrate the effectiveness of Slot-VLM, which achieves state-of-the-art performance on video question answering.
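
The slot aggregation at the heart of OE-Slots can be sketched as one slot-attention step in which learned slots compete for dense video tokens (softmax over slots) and pool the tokens they win. Dimensions, the single-iteration update, and the module name are assumptions.

    import torch
    import torch.nn as nn

    class SlotPool(nn.Module):
        def __init__(self, dim, num_slots):
            super().__init__()
            self.slots = nn.Parameter(torch.randn(num_slots, dim))
            self.q = nn.Linear(dim, dim)
            self.k = nn.Linear(dim, dim)
            self.v = nn.Linear(dim, dim)

        def forward(self, tokens):                    # tokens: (B, N, D) dense features
            B = tokens.size(0)
            slots = self.slots.expand(B, -1, -1)      # (B, K, D)
            logits = self.q(slots) @ self.k(tokens).transpose(1, 2)
            attn = torch.softmax(logits / tokens.size(-1) ** 0.5, dim=1)  # slots compete per token
            attn = attn / attn.sum(dim=-1, keepdim=True)  # weighted mean per slot
            return attn @ self.v(tokens)              # (B, K, D) aggregated slot tokens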

ICLR Conference 2024 Conference Paper

UC-NeRF: Neural Radiance Field for Under-Calibrated Multi-View Cameras in Autonomous Driving

  • Kai Cheng
  • Xiaoxiao Long
  • Wei Yin 0006
  • Jin Wang 0001
  • Zhiqiang Wu 0001
  • Yuexin Ma
  • Kaixuan Wang
  • Xiaozhi Chen

Multi-camera setups find widespread use across various applications, such as autonomous driving, as they greatly expand sensing capabilities. Despite the fast development of neural radiance field (NeRF) techniques and their wide applications in both indoor and outdoor scenes, applying NeRF to multi-camera systems remains very challenging. This is primarily due to the inherent under-calibration issues in multi-camera setups, including inconsistent imaging effects stemming from separately calibrated image signal processing units in diverse cameras, and system errors arising from mechanical vibrations during driving that affect relative camera poses. In this paper, we present UC-NeRF, a novel method tailored for novel view synthesis in under-calibrated multi-view camera systems. First, we propose a layer-based color correction to rectify the color inconsistency in different image regions. Second, we propose virtual warping to generate more viewpoint-diverse but color-consistent virtual views for color correction and 3D recovery. Finally, a spatiotemporally constrained pose refinement is designed for more robust and accurate pose calibration in multi-camera systems. Our method not only achieves state-of-the-art novel view synthesis in multi-camera setups, but also effectively facilitates depth estimation in large-scale outdoor scenes with the synthesized novel views.
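
One simple way to picture color correction across under-calibrated cameras is a learned per-camera affine transform on rendered colors, initialized to the identity. This is a minimal sketch; the paper's layer-based correction additionally varies across image regions, which this stand-in omits.

    import torch
    import torch.nn as nn

    class ColorCorrection(nn.Module):
        def __init__(self, num_cameras):
            super().__init__()
            # one 3x3 matrix and bias per camera, initialized to the identity map
            self.A = nn.Parameter(torch.eye(3).repeat(num_cameras, 1, 1))
            self.b = nn.Parameter(torch.zeros(num_cameras, 3))

        def forward(self, rgb, cam_id):
            # rgb: (N, 3) rendered colors; cam_id: (N,) camera index per ray
            return torch.einsum('nij,nj->ni', self.A[cam_id], rgb) + self.b[cam_id]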

EAAI Journal 2023 Journal Article

Neural message-passing for objective-based uncertainty quantification and optimal experimental design

  • Qihua Chen
  • Xuejin Chen
  • Hyun-Myung Woo
  • Byung-Jun Yoon

Various real-world scientific applications involve the mathematical modeling of complex uncertain systems with numerous unknown parameters. Accurate parameter estimation is often practically infeasible in such systems, as the available training data may be insufficient and the cost of acquiring additional data may be high. In such cases, based on a Bayesian paradigm, we can design robust operators that retain the best overall performance across all possible models, and design optimal experiments that effectively reduce uncertainty to maximally enhance the performance of such operators. While objective-based uncertainty quantification (objective-UQ) based on MOCU (mean objective cost of uncertainty) provides an effective means for quantifying uncertainty in complex systems, the high computational cost of estimating MOCU has been a challenge in applying it to real-world scientific and engineering problems. In this work, we propose a novel data-driven scheme to reduce the computational cost of objective-UQ via MOCU. We adopt a neural message-passing model for surrogate modeling, incorporating a novel axiomatic constraint loss that penalizes an increase in the estimated system uncertainty. As an illustrative example, we consider the optimal experimental design (OED) problem for uncertain Kuramoto models, where the goal is to predict the experiments that can most effectively enhance robust synchronization performance through uncertainty reduction. We show that our proposed approach can accelerate MOCU-based OED by four to five orders of magnitude, without any visible performance loss compared to the state-of-the-art. The proposed approach applies to general OED tasks beyond the Kuramoto model.
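
Conceptually, MOCU-based OED scores each candidate experiment by the expected MOCU remaining after observing its outcome and picks the minimizer; the paper's contribution is a message-passing surrogate that makes the mocu evaluation cheap. The callback interfaces below are hypothetical, for illustration only.

    def select_experiment(experiments, models, mocu, predict_outcome, update):
        """experiments: candidate experiments; models: current weighted model set;
        mocu(models) -> float (possibly a learned surrogate);
        predict_outcome(e, models) -> [(prob, outcome), ...];
        update(models, e, outcome) -> model set after observing the outcome."""
        def expected_remaining(e):
            # expected MOCU left after running experiment e
            return sum(p * mocu(update(models, e, o))
                       for p, o in predict_outcome(e, models))
        return min(experiments, key=expected_remaining)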

NeurIPS Conference 2021 Conference Paper

Dual Progressive Prototype Network for Generalized Zero-Shot Learning

  • Chaoqun Wang
  • Shaobo Min
  • Xuejin Chen
  • Xiaoyan Sun
  • Houqiang Li

Generalized Zero-Shot Learning (GZSL) aims to recognize new categories with auxiliary semantic information, e.g., category attributes. In this paper, we address the critical domain shift problem, i.e., confusion between seen and unseen categories, by progressively improving the cross-domain transferability and category discriminability of visual representations. Our approach, named Dual Progressive Prototype Network (DPPN), constructs two types of prototypes that record prototypical visual patterns for attributes and categories, respectively. With attribute prototypes, DPPN alternately searches attribute-related local regions and updates the corresponding attribute prototypes to progressively explore accurate attribute-region correspondence. This enables DPPN to produce visual representations with accurate attribute localization ability, which benefits semantic-visual alignment and representation transferability. Besides, along with progressive attribute localization, DPPN further projects category prototypes into multiple spaces to progressively repel visual representations of different categories, which boosts category discriminability. Both attribute and category prototypes are collaboratively learned in a unified framework, which makes the visual representations of DPPN transferable and distinctive. Experiments on four benchmarks prove that DPPN effectively alleviates the domain shift problem in GZSL.
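
The attribute-prototype update can be pictured as attention of each prototype over spatial features, followed by a momentum update toward the attended region feature. Shapes, the softmax attention, and the momentum rule are illustrative assumptions rather than the paper's exact procedure.

    import torch

    def attend_and_update(feat_map, prototypes, momentum=0.9):
        """feat_map: (HW, D) local visual features; prototypes: (A, D) attribute prototypes."""
        attn = torch.softmax(prototypes @ feat_map.t(), dim=-1)   # (A, HW): regions per attribute
        attended = attn @ feat_map                                # (A, D) attribute-related region features
        return momentum * prototypes + (1 - momentum) * attended  # progressive prototype update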

AAAI Conference 2021 Conference Paper

Task-Independent Knowledge Makes for Transferable Representations for Generalized Zero-Shot Learning

  • Chaoqun Wang
  • Xuejin Chen
  • Shaobo Min
  • Xiaoyan Sun
  • Houqiang Li

Generalized Zero-Shot Learning (GZSL) targets recognizing new categories by learning transferable image representations. Existing methods find that, by aligning image representations with the corresponding semantic labels, the semantic-aligned representations can be transferred to unseen categories. However, supervised by only seen category labels, the learned semantic knowledge is highly task-specific, which makes image representations biased towards seen categories. In this paper, we propose a novel Dual-Contrastive Embedding Network (DCEN) that simultaneously learns task-specific and task-independent knowledge via semantic alignment and instance discrimination. First, DCEN leverages task labels to cluster representations of the same semantic category by cross-modal contrastive learning and exploring semantic-visual complementarity. Besides task-specific knowledge, DCEN then introduces task-independent knowledge by attracting representations of different views of the same image and repelling representations of different images. Compared to high-level seen category supervision, this instance discrimination supervision encourages DCEN to capture low-level visual knowledge, which is less biased toward seen categories and alleviates the representation bias. Consequently, the task-specific and task-independent knowledge jointly make for transferable representations in DCEN, which obtains an average 4.1% improvement on four public benchmarks.
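
The dual objective can be sketched as two InfoNCE terms: a cross-modal term aligning image embeddings with their class-attribute embeddings (task-specific), and an instance-discrimination term attracting two augmented views of the same image (task-independent). The loss form and weighting below are assumptions.

    import torch
    import torch.nn.functional as F

    def info_nce(anchor, positives, temperature=0.1):
        logits = anchor @ positives.t() / temperature                # (B, B) similarities
        labels = torch.arange(anchor.size(0), device=anchor.device)  # diagonal = positives
        return F.cross_entropy(logits, labels)

    def dcen_loss(img_emb, attr_emb, view1, view2, lam=1.0):
        # task-specific: align each image with its category-attribute embedding
        task_specific = info_nce(F.normalize(img_emb, dim=-1),
                                 F.normalize(attr_emb, dim=-1))
        # task-independent: attract two augmented views of the same image
        task_independent = info_nce(F.normalize(view1, dim=-1),
                                    F.normalize(view2, dim=-1))
        return task_specific + lam * task_independent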

AAAI Conference 2019 Conference Paper

A Two-Stream Mutual Attention Network for Semi-Supervised Biomedical Segmentation with Noisy Labels

  • Shaobo Min
  • Xuejin Chen
  • Zheng-Jun Zha
  • Feng Wu
  • Yongdong Zhang

Learning-based methods suffer from a deficiency of clean annotations, especially in biomedical segmentation. Although many semi-supervised methods have been proposed to provide extra training data, automatically generated labels are usually too noisy to retrain models effectively. In this paper, we propose a Two-Stream Mutual Attention Network (TSMAN) that weakens the influence of back-propagated gradients caused by incorrect labels, thereby rendering the network robust to unclean data. The proposed TSMAN consists of two sub-networks connected by three types of attention models at different layers. The target of each attention model is to indicate potentially incorrect gradients in a certain layer for both sub-networks by analyzing the features they infer from the same input. To achieve this, the attention models are designed based on the propagation analysis of noisy gradients at different layers. This allows the attention models to effectively discover incorrect labels and weaken their influence during the parameter-updating process. By exchanging multi-level features within the two-stream architecture, the effects of noisy labels in each sub-network are reduced by decreasing the noisy gradients. Furthermore, a hierarchical distillation is developed to provide reliable pseudo labels for unlabeled data, which further boosts the performance of TSMAN. Experiments on the HVSMR 2016 and BRATS 2015 benchmarks demonstrate that our semi-supervised learning framework surpasses state-of-the-art fully-supervised results.
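
One way to realize "weakening back-propagated gradients from incorrect labels" is to down-weight the per-pixel loss wherever the two streams' features disagree on the same input; the cosine-agreement gate below is a hedged stand-in for the paper's attention models, not their actual form.

    import torch

    def mutual_attention_weights(feat_a, feat_b, eps=1e-6):
        """feat_a, feat_b: (B, C, H, W) features from the two sub-networks."""
        a = feat_a / (feat_a.norm(dim=1, keepdim=True) + eps)
        b = feat_b / (feat_b.norm(dim=1, keepdim=True) + eps)
        agreement = (a * b).sum(dim=1, keepdim=True)  # per-pixel cosine similarity
        return agreement.clamp(min=0)                 # low weight where the streams disagree

    # usage sketch: loss = (mutual_attention_weights(fa, fb) * per_pixel_loss).mean()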

IJCAI Conference 2019 Conference Paper

Structure-Aware Residual Pyramid Network for Monocular Depth Estimation

  • Xiaotian Chen
  • Xuejin Chen
  • Zheng-Jun Zha

Monocular depth estimation is an essential task for scene understanding. The underlying structure of objects and stuff in a complex scene is critical to recovering accurate and visually pleasing depth maps. Global structure conveys scene layouts, while local structure reflects shape details. Recently developed approaches based on convolutional neural networks (CNNs) significantly improve the performance of depth estimation. However, few of them take into account the multi-scale structures in complex scenes. In this paper, we propose a Structure-Aware Residual Pyramid Network (SARPN) to exploit multi-scale structures for accurate depth prediction. We propose a Residual Pyramid Decoder (RPD) that expresses global scene structure in upper levels to represent layouts, and local structure in lower levels to represent shape details. At each level, we propose a Residual Refinement Module (RRM) that predicts residual maps to progressively add finer structures to the coarser structure predicted at the upper level. To fully exploit multi-scale image features, we introduce an Adaptive Dense Feature Fusion (ADFF) module that adaptively fuses effective features from all scales for inferring the structure of each scale. Experimental results on the challenging NYU-Depth v2 dataset demonstrate that our proposed approach achieves state-of-the-art performance in both qualitative and quantitative evaluation. The code is available at https://github.com/Xt-Chen/SARPN.
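
The residual pyramid decoding can be sketched as a coarse-to-fine loop: upsample the current depth to the next level's resolution, then add a residual predicted from that level's fused features. Module interfaces and shapes below are assumptions, not the paper's exact architecture.

    import torch
    import torch.nn.functional as F

    def residual_pyramid_decode(coarse_depth, refine_modules, fused_feats):
        """coarse_depth: (B, 1, h, w); fused_feats: per-level features, coarse to fine;
        refine_modules[i] maps (B, Ci + 1, Hi, Wi) to a (B, 1, Hi, Wi) residual."""
        depth = coarse_depth
        for refine, feat in zip(refine_modules, fused_feats):
            depth = F.interpolate(depth, size=feat.shape[-2:],
                                  mode='bilinear', align_corners=False)   # upsample to this level
            depth = depth + refine(torch.cat([depth, feat], dim=1))       # add finer structure
        return depth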