Author name cluster

Jiaxin Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

18 papers

2 author rows

EAAI Journal 2026 Journal Article

A label-free physics informed neural network with hard constraints and Fourier features spectrally-enhanced for multi-frequency seismic structural dynamic response

Ke Du
Zehua Huang
Jiaxin Li
Dongwang Tao
Zhuoshi Chen

Details DOI

AAAI Conference 2026 Conference Paper

DensiCrafter: Physically-Constrained Generation and Fabrication of Self-Supporting Hollow Structures

Shengqi Dang
Fu Chai
Jiaxin Li
Chao Yuan
Wei Ye
Nan Cao

The rise of 3D generative models has enabled automatic 3D geometry and texture synthesis from multimodal inputs (e.g., text or images). However, these methods often ignore physical constraints and manufacturability considerations. In this work, we address the challenge of producing 3D designs that are both lightweight and self-supporting. We present DensiCrafter, a framework for generating lightweight, self-supporting 3D hollow structures by optimizing the density field. Starting from coarse voxel grids produced by Trellis, we interpret them as continuous density fields to be optimized and introduce three differentiable, physically constrained, and simulation-free loss terms. Additionally, a mass regularization term penalizes unnecessary material, while a restricted optimization domain preserves the outer surface. Our method integrates seamlessly with pretrained Trellis-based models (e.g., Trellis, DSO) without any architectural changes. In extensive evaluations, we achieve up to a 43% reduction in material mass on the text-to-3D task. Compared to state-of-the-art baselines, our method can improve stability while maintaining high geometric fidelity. Real-world 3D-printing experiments confirm that our hollow designs can be reliably fabricated and can be self-supporting.

PDF Details DOI

AAAI Conference 2026 Conference Paper

Pansharpening for Thin-Cloud Contaminated Remote Sensing Images: A Unified Framework and Benchmark Dataset

Songcheng Du
Yang Zou
Jiaxin Li
Mingxuan Liu
Ying Li
Changjing Shang
Qiang Shen

Pansharpening under thin cloudy conditions is a practically significant yet rarely addressed task, challenged by simultaneous spatial resolution degradation and cloud-induced spectral distortions. Existing methods often address cloud removal and pansharpening sequentially, leading to cumulative errors and suboptimal performance due to the lack of joint degradation modeling. To address these challenges, we propose a Unified Pansharpening Model with Thin Cloud Removal (Pan-TCR), an end-to-end framework that integrates physical priors. Motivated by theoretical analysis in the frequency domain, we design a frequency-decoupled restoration (FDR) block that disentangles the restoration of multispectral image (MSI) features into amplitude and phase components, each guided by complementary degradation-robust prompts: the near-infrared (NIR) band amplitude for cloud-resilient restoration, and the panchromatic (PAN) phase for high-resolution structural enhancement. To ensure coherence between the two components, we further introduce an interactive inter-frequency consistency (IFC) module, enabling cross-modal refinement that enforces consistency and robustness across frequency cues. Furthermore, we introduce the first real-world thin-cloud contaminated pansharpening dataset (PanTCR-GF2), comprising paired clean and cloudy PAN-MSI images, to enable robust benchmarking under realistic conditions. Extensive experiments on real-world and synthetic datasets demonstrate the superiority and robustness of Pan-TCR, establishing a new benchmark for pansharpening under realistic atmospheric degradations.

PDF Details DOI
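Pan-TCR's FDR block builds on the standard polar split of a Fourier spectrum into amplitude and phase. As background only (this is not the authors' code), the stdlib-only 1-D sketch below splits signals into amplitude and phase and recombines the amplitude of one signal with the phase of another, the basic operation behind separately guided amplitude and phase restoration. The function names and toy signals are hypothetical, and an image version would use a 2-D FFT.

```python
import cmath
import math


def dft(signal):
    """Naive O(n^2) discrete Fourier transform of a real 1-D sequence."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT; keeps the real part, which is exact when the spectrum is conjugate-symmetric."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def amplitude_phase(signal):
    """Split a spectrum into per-frequency amplitude |X_k| and phase arg(X_k)."""
    spectrum = dft(signal)
    return [abs(c) for c in spectrum], [cmath.phase(c) for c in spectrum]


def recombine(amplitude, phase):
    """Rebuild a signal from one amplitude spectrum and a (possibly different) phase spectrum."""
    return idft([a * cmath.exp(1j * p) for a, p in zip(amplitude, phase)])


# Hypothetical toy signals standing in for an amplitude source and a phase source.
amp_source = [1.0, 2.0, 0.5, -1.0]
phase_source = [0.0, 1.0, 3.0, 1.5]
amp, _ = amplitude_phase(amp_source)
_, phase = amplitude_phase(phase_source)
mixed = recombine(amp, phase)  # carries amp_source's amplitude and phase_source's phase
```

Because both inputs are real, their spectra are conjugate-symmetric, so the mixed signal is real and keeps the amplitude spectrum of the first input, as long as the phase source has no (near-)zero frequency bins whose phase would be ill-defined.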

EAAI Journal 2025 Journal Article

Autonomous dynamic formation for maritime target tracking using multi-agent reinforcement learning

Hua Wang
Jiaxin Li
Hao Tao
Junnan Liu
Chaochao Li
Ke Wang
Mingliang Xu

Details DOI

AAAI Conference 2025 Conference Paper

FloNa: Floor Plan Guided Embodied Visual Navigation

Jiaxin Li
Weiqi Huang
Zan Wang
Wei Liang
Huijun Di
Feng Liu

Humans naturally rely on floor plans to navigate in unfamiliar environments, as they are readily available, reliable, and provide rich geometrical guidance. However, existing visual navigation settings overlook this valuable prior knowledge, leading to limited efficiency and accuracy. To bridge this gap, we introduce a novel navigation task: Floor Plan Visual Navigation (FloNa), the first attempt to incorporate floor plans into embodied visual navigation. While the floor plan offers significant advantages, two key challenges emerge: (1) handling the spatial inconsistency between the floor plan and the actual scene layout for collision-free navigation, and (2) aligning observed images with the floor plan sketch despite their distinct modalities. To address these challenges, we propose FloDiff, a novel diffusion policy framework incorporating a localization module to facilitate alignment between the current observation and the floor plan. We further collect 20k navigation episodes across 117 scenes in the iGibson simulator to support training and evaluation. Extensive experiments demonstrate the effectiveness and efficiency of our framework in unfamiliar scenes using floor plan knowledge.

PDF Details DOI

ICML Conference 2025 Conference Paper

HetSSNet: Spatial-Spectral Heterogeneous Graph Learning Network for Panchromatic and Multispectral Images Fusion

Mengting Ma
Yizhen Jiang
Mengjiao Zhao
Jiaxin Li
Wei Zhang 0243

Remote sensing pansharpening aims to reconstruct spatial-spectral properties during the fusion of panchromatic (PAN) images and low-resolution multi-spectral (LR-MS) images, ultimately generating high-resolution multi-spectral (HRMS) images. In the mainstream modeling strategies, i.e., CNNs and Transformers, the input images are treated as equal-sized grids of pixels in Euclidean space, which limits their ability to handle remote sensing images with irregular ground objects. Graphs are a more flexible structure; however, there are two major challenges when modeling spatial-spectral properties with graphs: 1) constructing a customized graph structure for spatial-spectral relationship priors; 2) learning a unified spatial-spectral representation through the graph. To address these challenges, we propose the spatial-spectral heterogeneous graph learning network, named HetSSNet. Specifically, HetSSNet first constructs a heterogeneous graph structure for pansharpening, which explicitly describes pansharpening-specific relationships. Subsequently, a basic relationship pattern generation module is designed to extract multiple relationship patterns from the heterogeneous graph. Finally, a relationship pattern aggregation module collaboratively learns a unified spatial-spectral representation across the different relationships among nodes, with adaptive importance learning from local and global perspectives. Extensive experiments demonstrate the significant superiority and generalization of HetSSNet.

Details

AAAI Conference 2025 Conference Paper

Modality-Aware Shot Relating and Comparing for Video Scene Detection

Jiawei Tan
Hongxing Wang
Kang Dang
Jiaxin Li
Zhilong Ou

Video scene detection involves assessing whether each shot and its surroundings belong to the same scene. Achieving this requires meticulously correlating multi-modal cues, e.g., visual entity and place modalities, among shots and comparing semantic changes around each shot. However, most methods treat multi-modal semantics equally and do not examine contextual differences between the two sides of a shot, leading to sub-optimal detection performance. In this paper, we propose the Modality-Aware Shot Relating and Comparing approach (MASRC), which relates shots according to the distinct characteristics of the visual entity and place modalities and compares multi-shot similarities so that scene changes are explicitly encoded. Specifically, to fully harness the potential of visual entity and place modalities in modeling shot relations, we mine long-term shot correlations from entity semantics while simultaneously revealing short-term shot correlations from place semantics. In this way, we can learn distinctive shot features that consolidate coherence within scenes and amplify distinguishability across scenes. Once equipped with distinctive shot features, we further encode the relations between the preceding and succeeding shots of each target shot by similarity convolution, aiding in the identification of scene-ending shots. We validate the broad applicability of the proposed components in MASRC. Extensive experimental results on public benchmark datasets demonstrate that the proposed MASRC significantly advances video scene detection.

PDF Details DOI
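MASRC encodes relations between the shots before and after each target shot via similarity convolution. A minimal sketch of the raw input to that step, assuming cosine similarity between hypothetical per-shot feature vectors: it builds the k×k similarity map between the k shots ending at a target shot and the k shots after it. The learned convolution is replaced here by a hypothetical heuristic (one minus the mean similarity) so the example stays self-contained; it is not the paper's scoring function.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 for zero-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cross_similarity(features, i, k):
    """k x k map: rows are the k shots ending at shot i, columns the k shots after it."""
    before = features[max(0, i - k + 1): i + 1]
    after = features[i + 1: i + 1 + k]
    return [[cosine(b, a) for a in after] for b in before]


def scene_end_score(features, i, k=2):
    """Hypothetical heuristic: high when the two sides of shot i look different."""
    values = [s for row in cross_similarity(features, i, k) for s in row]
    return 1.0 - sum(values) / len(values) if values else 0.0


# Toy shot features: shots 0-2 form one scene, shots 3-5 another.
shots = [[1.0, 0.0], [1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 1.0], [0.0, 0.9]]
scores = [scene_end_score(shots, i) for i in range(len(shots))]
```

In this toy input the highest score falls on shot 2, the last shot of the first scene; the final shot scores 0.0 because it has no succeeding context.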

YNIMG Journal 2025 Journal Article

Morphological changes of the choroid plexus in the lateral ventricle across the lifespan: 5551 subjects from fetus to elderly

Jiaxin Li
Yuxuan Gao
Yunzhi Xu
Weiying Dai
Yueqin Hu
Xue Feng
Dan Wu
Li Zhao

Details DOI

NeurIPS Conference 2025 Conference Paper

Open-Vocabulary Part Segmentation via Progressive and Boundary-Aware Strategy

Xinlong Li
Di Lin
Shaoyiyi Gao
Jiaxin Li
Ruonan Liu
Qing Guo

Open-vocabulary part segmentation (OVPS) struggles with structurally connected boundaries due to the inherent conflict between continuous image features and discrete classification mechanisms. To address this, we propose PBAPS, a novel training-free framework specifically designed for OVPS. PBAPS leverages structural knowledge of object-part relationships to guide a progressive segmentation from objects to fine-grained parts. To further improve accuracy at challenging boundaries, we introduce a Boundary-Aware Refinement (BAR) module that identifies ambiguous boundary regions by quantifying classification uncertainty, enhances the discriminative features of these ambiguous regions using high-confidence context, and adaptively refines part prototypes to better align with the specific image. Experiments on Pascal-Part-116, ADE20K-Part-234, and PartImageNet demonstrate that PBAPS significantly outperforms state-of-the-art methods, achieving 46.35% mIoU and 34.46% bIoU on Pascal-Part-116. Our code is available at https://github.com/TJU-IDVLab/PBAPS.

PDF Details
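The BAR module flags ambiguous boundary regions by quantifying classification uncertainty. The abstract does not say which uncertainty measure is used; the sketch below assumes the top-2 softmax margin with a hypothetical threshold, and it only illustrates the identification step, not the feature enhancement or prototype refinement.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one pixel's class logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def margin_uncertainty(probs):
    """1 - (p_top1 - p_top2): near 0 for confident pixels, 1 when two classes tie.

    Assumes at least two classes.
    """
    top1, top2 = sorted(probs, reverse=True)[:2]
    return 1.0 - (top1 - top2)


def ambiguous_mask(logit_map, threshold=0.8):
    """Mark pixels whose margin uncertainty exceeds a (hypothetical) threshold."""
    return [[margin_uncertainty(softmax(pixel)) > threshold for pixel in row]
            for row in logit_map]


# One row, two pixels, two part classes: a confident pixel and an ambiguous one.
logits = [[[5.0, 0.0], [0.1, 0.0]]]
mask = ambiguous_mask(logits)
```

Entropy over the full class distribution is a common alternative to the margin; the margin is used here only because it directly measures how close the two leading parts are.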

NeurIPS Conference 2025 Conference Paper

StreamForest: Efficient Online Video Understanding with Persistent Event Memory

Xiangyu Zeng
Kefan Qiu
Qingyu Zhang
Xinhao Li
Jing Wang
Jiaxin Li
Ziang Yan
Kun Tian

Multimodal Large Language Models (MLLMs) have recently achieved remarkable progress in video understanding. However, their effectiveness in real-time streaming scenarios remains limited due to storage constraints of historical visual features and insufficient real-time spatiotemporal reasoning. To address these challenges, we propose StreamForest, a novel architecture specifically designed for streaming video understanding. Central to StreamForest is the Persistent Event Memory Forest, a memory mechanism that adaptively organizes video frames into multiple event-level tree structures. This process is guided by penalty functions based on temporal distance, content similarity, and merge frequency, enabling efficient long-term memory retention under limited computational resources. To enhance real-time perception, we introduce a Fine-grained Spatiotemporal Window, which captures detailed short-term visual cues to improve current scene perception. Additionally, we present OnlineIT, an instruction-tuning dataset tailored for streaming video tasks. OnlineIT significantly boosts MLLM performance in both real-time perception and future prediction. To evaluate generalization in practical applications, we introduce ODV-Bench, a new benchmark focused on real-time streaming video understanding in autonomous driving scenarios. Experimental results demonstrate that StreamForest achieves state-of-the-art performance, with accuracies of 77.3% on StreamingBench, 60.5% on OVBench, and 55.6% on OVO-Bench. In particular, even under extreme visual token compression (limited to 1024 tokens), the model retains 96.8% of its average accuracy across eight benchmarks relative to the default setting. These results underscore the robustness, efficiency, and generalizability of StreamForest for streaming video understanding.

PDF Details
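StreamForest organizes frames into event-level trees guided by penalties on temporal distance, content similarity, and merge frequency. The sketch below is a deliberately simplified, hypothetical flat version of that idea, not the paper's algorithm: adjacent events are merged greedily by a weighted penalty until a memory budget is met. The `Event` fields, the weights, and the linear form of the penalty are all assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Event:
    start: float        # timestamp of the event's first frame
    end: float          # timestamp of the event's last frame
    feature: list       # mean visual feature of the event's frames
    frames: int = 1     # number of frames merged into this event
    merges: int = 0     # how many merges produced this event


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def merge_penalty(a, b, w_time=1.0, w_sim=1.0, w_freq=0.1):
    """Hypothetical weighted sum: temporal gap + dissimilarity + merge count."""
    gap = b.start - a.end
    dissimilarity = 1.0 - cosine(a.feature, b.feature)
    frequency = a.merges + b.merges
    return w_time * gap + w_sim * dissimilarity + w_freq * frequency


def merge(a, b):
    """Fuse two adjacent events; the feature is a frame-weighted mean."""
    total = a.frames + b.frames
    feature = [(x * a.frames + y * b.frames) / total for x, y in zip(a.feature, b.feature)]
    return Event(a.start, b.end, feature, total, a.merges + b.merges + 1)


def compress(events, budget):
    """Greedily merge the cheapest adjacent pair until at most `budget` events remain."""
    events = list(events)
    budget = max(1, budget)
    while len(events) > budget:
        i = min(range(len(events) - 1),
                key=lambda j: merge_penalty(events[j], events[j + 1]))
        events[i:i + 2] = [merge(events[i], events[i + 1])]
    return events


# Six frames from two visually distinct scenes; keep a memory of two events.
frames = [Event(t, t, [1.0, 0.0]) for t in range(3)] + [Event(t, t, [0.0, 1.0]) for t in range(3, 6)]
memory = compress(frames, budget=2)
```

The merge-count term discourages repeatedly collapsing the same event, which is one plausible reading of a "merge frequency" penalty; a tree-structured memory would additionally keep the merged children instead of discarding them.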

IJCAI Conference 2025 Conference Paper

TSTAI: A Time-varying Brain Effective Connectivity Network Construction Method Combining with Brain Active Information

Qi Chen
Zhiqiong Wang
Jiaxin Li
Jinying Tao
Junchang Xin

Constructing brain effective connectivity networks more accurately, which is essential for accurate auxiliary diagnosis of brain diseases and in-depth exploration of brain function, remains a great challenge. However, existing methods consider either higher-order or non-stationary assumptions, rather than constructing networks that are simultaneously higher-order and non-stationary. Among existing methods, Bayesian network methods demonstrate superior network structure learning ability. In this work, the forward-backward search (FBS) method is optimized using brain active information and extended into a higher-order network structure learning method called TSTAI. First, during non-stationary network structure learning, a two-stage strategy is used to search for change points. Then, during higher-order network structure learning, the FBS method is combined with two kinds of brain active information to improve the condition-set filtering process and the scoring function, respectively. Finally, a pruning strategy is used to reduce the search space. Extensive experiments on simulated and real data demonstrate the effectiveness of TSTAI. Compared with state-of-the-art higher-order network construction methods, TSTAI improves network construction accuracy by 3.6% and 17.4%, respectively.

PDF Details DOI

ICLR Conference 2024 Conference Paper

BENO: Boundary-embedded Neural Operators for Elliptic PDEs

Haixin Wang 0003
Jiaxin Li
Anubhav Dwivedi
Kentaro Hara
Tailin Wu

Elliptic partial differential equations (PDEs) are a major class of time-independent PDEs that play a key role in many scientific and engineering domains such as fluid dynamics, plasma physics, and solid mechanics. Recently, neural operators have emerged as a promising technique to solve elliptic PDEs more efficiently by directly mapping the input to solutions. However, existing networks typically neglect complex geometries and inhomogeneous boundary values present in the real world. Here we introduce Boundary-Embedded Neural Operators (BENO), a novel neural operator architecture that embeds the complex geometries and inhomogeneous boundary values into the solving of elliptic PDEs. Inspired by the classical Green's function, BENO consists of two Graph Neural Networks (GNNs) for the interior source term and boundary values, respectively. Furthermore, a Transformer encoder maps the global boundary geometry into a latent vector which influences each message passing layer of the GNNs. We test our model and strong baselines extensively on elliptic PDEs with complex boundary conditions. We show that all existing baseline methods fail to learn the solution operator. In contrast, our model, endowed with a boundary-embedded architecture, outperforms state-of-the-art neural operators and strong baselines by an average of 60.96%.

Details
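BENO's two-branch design is motivated by the classical Green's function. For reference, the textbook representation formula for the Dirichlet Poisson problem $-\Delta u = f$ in $\Omega$, $u = g$ on $\partial\Omega$ (standard background, not BENO's own equations) shows why an interior-source branch and a boundary-value branch arise naturally. With $G$ the Dirichlet Green's function and $n_y$ the outward unit normal:

```latex
u(x) = \int_{\Omega} G(x,y)\, f(y)\, \mathrm{d}y
       \;-\; \int_{\partial\Omega} g(y)\, \frac{\partial G}{\partial n_y}(x,y)\, \mathrm{d}S_y,
\qquad
-\Delta_y G(x,y) = \delta(x-y) \ \text{in } \Omega,
\qquad
G(x,\cdot) = 0 \ \text{on } \partial\Omega .
```

The first integral depends only on the interior source $f$ and the second only on the boundary data $g$, while $G$ itself depends on the domain geometry; this loosely parallels the split into a source-term GNN, a boundary-value GNN, and a geometry encoding.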

YNIMG Journal 2024 Journal Article

Brain fingerprinting and cognitive behavior predicting using functional connectome of high inter-subject variability

Jiayu Lu
Tianyi Yan
Lan Yang
Xi Zhang
Jiaxin Li
Dandan Li
Jie Xiang
Bin Wang

Details DOI

AAMAS Conference 2024 Conference Paper

JDRec: Practical Actor-Critic Framework for Online Combinatorial Recommender System

Xin Zhao
Jiaxin Li
Zhiwei Fang
Yuchen Guo
Jinyuan Zhao
Jie He
Wenlong Chen
Changping Peng

In the realm of online recommendation systems, the Combinatorial Recommender (CR) system stands out for its unique approach. It presents users with a list of items on a result page, where user behavior is simultaneously influenced by contextual information and the items listed. Formulated as a combinatorial optimization problem, the objective of the CR system is to maximize the recommendation reward across the entire list of items. Despite the significant potential of CR systems, developing a practical and efficient model poses substantial challenges. These challenges stem from the dynamic nature of online environments and the pressing need for personalized recommendations. To tackle these challenges, we decompose the overarching problem into two sub-problems: list generation and list evaluation. We propose novel and pragmatic model architectures for each sub-problem, aiming to enhance both effectiveness and efficiency concurrently. To further adapt the CR system to online scenarios, we integrate a bootstrap algorithm into an actor-critic reinforcement learning framework. This innovative approach, called JD Recommender System (JDRec), is designed to continuously refine the recommendation model through sustained user interaction, ensuring the system’s adaptability and relevance. The proposed JDRec framework, tested through rigorous offline and online experiments, has shown promising results. It has been successfully deployed in online JD recommendation systems, improving click-through rate by 2.6% and the total value of the platform by 5.03%. In addition, we release the large-scale dataset used in our work to facilitate further research.

This work is licensed under a Creative Commons Attribution 4.0 International License. *Equal contribution. Proc. of the 23rd International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2024), N. Alechina, V. Dignum, M. Dastani, J. S. Sichman (eds.), May 6–10, 2024, Auckland, New Zealand. © 2024 International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org).

PDF

IROS Conference 2024 Conference Paper

Visual Loop Closure Detection with Thorough Temporal and Spatial Context Exploitation

Jiaxin Li
Zan Wang
Huijun Di
Jian Li
Wei Liang 0008

Despite advancements in visual Simultaneous Localization and Mapping (SLAM), prevailing visual Loop Closure Detection (LCD) methods primarily rely on computationally intensive image similarity comparisons, neglecting temporal-spatial context during long-term exploration. To address this issue, we propose TOSA, a novel visual LCD algorithm harnessing TempOral and SpAtial context for efficient LCD. Specifically, as the agent explores through time, our approach recurrently updates a latent feature incorporating historical information via a Long Short-Term Memory (LSTM) module. Upon receiving a query frame, TOSA seamlessly fuses the latent feature with the query feature to predict the candidates’ distribution, thus averting intensive similarity computation. Additionally, TOSA integrates a temporal-spatial convolution for candidate refinement by thoroughly exploiting the temporal consistency and spatial correlation to enhance selected candidates, further boosting the performance. Extensive experiments across four standard datasets showcase the superiority of our method over existing state-of-the-art techniques, demonstrating the effectiveness of utilizing rich temporal-spatial contexts.

Details

IROS Conference 2022 Conference Paper

A Deep-Learning-based System for Indoor Active Cleaning

Yike Yun
Linjie Hou
Zijian Feng
Wei Jin
Yang Liu
Heng Wang
Ruonan He
Weitao Guo

Cleaning public areas such as commercial complexes is challenging due to their complex surroundings and the wide variety of real-life dirt. Robots are required to distinguish types of dirt and apply corresponding cleaning strategies. In this work, we propose an active-cleaning framework that uses deep-learning methods for both solid waste detection and liquid stain segmentation. Our system consists of four components: a Perception module integrated with deep-learning models, a Post-processing module for projection, a Tracking module for map localization, and a Planning and Control module for cleaning strategies. Compared with classic approaches, our vision-based system significantly improves cleaning efficiency. In addition, we release the largest real-world indoor hybrid dirt cleaning dataset (HD10K), containing 10K labeled images, together with a track-level evaluation metric for better measurement of cleaning performance. The proposed deep-learning-based system is verified with extensive experiments on our dataset and deployed on Gaussian Robotics' robots operating globally. The dataset is available at: https://gaussianopensource.github.io/projects/active_cleaning.

Details

ICRA Conference 2019 Conference Paper

Discrete Rotation Equivariance for Point Cloud Recognition

Jiaxin Li
Yingcai Bi
Gim Hee Lee

Despite the recent active research on processing point clouds with deep networks, little attention has been paid to the sensitivity of these networks to rotations. In this paper, we propose a deep learning architecture that achieves discrete SO(2)/SO(3) rotation equivariance for point cloud recognition. Specifically, rotating an input point cloud by elements of a rotation group is similar to shuffling the feature vectors generated by our approach. The equivariance is easily reduced to invariance by eliminating the permutation with operations such as maximum or average pooling. Our method can be directly applied to any existing point-cloud-based network, resulting in significant performance improvements for rotated inputs. We show state-of-the-art results on classification tasks across various datasets under both SO(2) and SO(3) rotations. In addition, we analyze the necessary conditions for applying our approach to PointNet [1] based networks.

Details
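One standard way to obtain the behaviour described above is to evaluate a shared, point-order-invariant encoder on copies of the cloud rotated by each element of a discrete group $C_n$; rotating the input by a group element then cyclically shifts the stack of feature vectors, and a max over the group axis yields invariance. The pure-Python 2-D sketch below illustrates this principle with a hypothetical random PointNet-style encoder (`make_point_encoder` is an illustrative name); it is not the paper's architecture.

```python
import math
import random


def rotate(points, angle):
    """Rotate 2-D points about the origin by `angle` radians (an SO(2) element)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def make_point_encoder(seed=0, width=4):
    """Hypothetical PointNet-style encoder: fixed random per-point layer, then max over points."""
    rng = random.Random(seed)
    weights = [(rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(width)]

    def encode(points):
        # The max over points makes the encoder invariant to point order.
        return [max(math.tanh(a * x + b * y + c) for x, y in points) for a, b, c in weights]

    return encode


def equivariant_features(points, encode, n=4):
    """Row k is the encoder applied to the cloud rotated by 2*pi*k/n."""
    return [encode(rotate(points, 2 * math.pi * k / n)) for k in range(n)]


def invariant_descriptor(points, encode, n=4):
    """Max over the rotation axis removes the cyclic shift of rows, giving C_n invariance."""
    rows = equivariant_features(points, encode, n)
    return [max(column) for column in zip(*rows)]


cloud = [(1.0, 0.0), (0.5, 2.0), (-1.5, 0.3)]
encode = make_point_encoder(seed=0)
quarter_turn = rotate(cloud, 2 * math.pi / 4)  # a C_4 group element applied to the input
```

Rotating `cloud` by a quarter turn shifts row k of its feature stack to row k-1 (mod 4), so `invariant_descriptor` returns the same vector for both inputs up to floating-point error. Rotations that are not in the discrete group are not covered by this guarantee.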

IROS Conference 2017 Conference Paper

Deep learning for 2D scan matching and loop closure

Jiaxin Li
Huangying Zhan
Ben M. Chen
Ian D. Reid 0001
Gim Hee Lee

Although 2D LiDAR-based Simultaneous Localization and Mapping (SLAM) is a relatively mature topic nowadays, the loop closure problem remains challenging due to the lack of distinctive features in 2D LiDAR range scans. Existing research can be roughly divided into correlation-based approaches, e.g., scan-to-submap matching, and feature-based methods, e.g., bag-of-words (BoW). In this paper, we solve loop closure detection and relative pose transformation using 2D LiDAR within an end-to-end deep learning framework. The algorithm is verified with simulation data and on an Unmanned Aerial Vehicle (UAV) flying in an indoor environment. The loop detection ConvNet alone achieves an accuracy of 98.2% in loop closure detection. With a verification step using the scan matching ConvNet, the false positive rate drops to around 0.001%. The proposed approach processes 6000 pairs of raw LiDAR scans per second on an Nvidia GTX1080 GPU.

Details