Author name cluster

Bolei Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

6 papers

2 author rows

AAAI Conference 2026 Conference Paper

Perspective from a Broader Context: Can Room Style Knowledge Help Visual Floorplan Localization?

Bolei Chen
Shengsheng Yan
Yongzheng Cui
Jiaxu Kang
Ping Zhong
Jianxin Wang

Since a building's floorplan remains consistent over time and is inherently robust to changes in visual appearance, visual Floorplan Localization (FLoc) has received increasing attention from researchers. However, as a compact and minimalist representation of the building's layout, floorplans contain many repetitive structures (e.g., hallways and corners) and thus easily result in ambiguous localization. Existing methods either pin their hopes on matching 2D structural cues in floorplans or rely on 3D geometry-constrained visual pre-training, ignoring the richer contextual information provided by visual images. In this paper, we suggest using broader visual scene context to empower FLoc algorithms with scene layout priors to eliminate localization uncertainty. In particular, we propose an unsupervised learning technique with clustering constraints to pre-train a room discriminator on self-collected unlabeled room images. Such a discriminator can empirically extract the hidden room type of the observed image and distinguish it from other room types. By injecting the scene context information summarized by the discriminator into an FLoc algorithm, the room style knowledge is effectively exploited to guide definite visual FLoc. We conducted sufficient comparative studies on two standard visual FLoc benchmarks. Our experiments show that our approach outperforms state-of-the-art methods and achieves significant improvements in robustness and accuracy.

AAAI Conference 2026 Conference Paper

Towards Ultrasound-based Reliable Disease Diagnosis Using Causal Inference

Bolei Chen
Jiaxu Kang
Haonan Yang
Ping Zhong
Yixiong Liang
Rui Fan
Jianxin Wang

Aligning the decision-making process of deep learning models with that of experienced sonographers is essential for ultrasound-based reliable disease diagnosis. Although existing methods have made significant progress in this aspect, their alignments are primarily associational rather than causal, leading to pseudo-correlations between features and diagnostic results. Such a biased diagnosis blindly models the sonographer's diagnostic skills and attention to specific patterns, which we argue hardly produces an AI diagnoser that is comparable to human experts. To address this issue, we propose a causality-based diagnostic framework to align the model's diagnostic behaviors with those of experts. Specifically, by delving into both conspicuous and inconspicuous confounders within the ultrasound images, the back-door and front-door adjustment causal learning modules are proposed to promote unbiased learning by mitigating potential pseudo-correlations. In addition, we integrate causal inference into a well-designed dual-branch model with feature interaction bridges for compatibility with multimodal ultrasound inputs. To fully evaluate our method, we conduct comparative studies on different diseases and ultrasound modalities. In particular, we publish a carefully constructed multimodal ultrasound dataset for breast lesion diagnosis and segmentation. Sufficient comparative and ablation studies on this dataset emphasize that our method outperforms state-of-the-art methods.

ICRA Conference 2024 Conference Paper

HSPNav: Hierarchical Scene Prior Learning for Visual Semantic Navigation Towards Real Settings

Jiaxu Kang
Bolei Chen
Ping Zhong 0002
Haonan Yang 0001
Yu Sheng
Jianxin Wang 0001

Visual Semantic Navigation (VSN) aims at navigating a robot to a given target object in a previously unseen scene. To tackle this task, the robot must learn a nimble navigation policy by utilizing spatial patterns and semantic co-occurrence relations among objects in the scene. Prevailing approaches extract scene priors from the instant visual observations and solidify them in neural episodic memory to achieve flexible navigation. However, due to the oblivion and underuse of the scene priors, these methods are plagued by repeated exploration, effective-knowledge sparsity, and wrong decisions. To alleviate these issues, we propose a novel VSN policy, HSPNav, based on Hierarchical Scene Priors (HSP) and Deep Reinforcement Learning (DRL). The HSP contains two components, i.e., the egocentric semantic map-based Local Scene Priors (LSP) and the commonsense relational graph-based Global Scene Priors (GSP). Then, efficient semantic navigation is achieved by employing an immediate LSP to retrieve conducive contextual memories from the GSP. By utilizing the MP3D dataset, the experimental results in the Habitat simulator demonstrate that our HSP brings a significant boost over the baselines. Furthermore, we take an essential step from simulation to reality by bridging the gap from Habitat to ROS. The migration evaluations show that HSPNav can generalize to realistic settings well and achieve promising performance.

EAAI Journal 2024 Journal Article

SemNav-HRO: A target-driven semantic navigation strategy with human–robot–object ternary fusion

Bolei Chen
Siyi Lu
Ping Zhong
Yongzheng Cui
Yixiong Liang
Jianxin Wang

Target-Driven Semantic Navigation (TDSN) shows great potential to be applied in intelligent domestic assistants supporting humans with daily activities. Although numerous methods have been explored to achieve efficient static TDSN, socially aware TDSN in dynamic and crowded scenarios remains challenging and has not been adequately investigated. The main challenges come from the complex human–robot interaction mechanisms and the semantic relation exploitation, which requires the robot to understand the surroundings and perform foresighted behaviors. In this paper, a TDSN strategy named SemNav-HRO is proposed by considering Human–Robot–Object (HRO) ternary feature fusion. Specifically, a Deep Reinforcement Learning (DRL) based Dual-Channel Value Estimation Network (DCVEN) is first proposed by integrating multi-granularity map features and social awareness to learn crowded TDSN strategies. Meanwhile, the tricky and socially aware TDSN problem is slackened by eliminating the dependence on costly features (e.g., pedestrian speed) and introducing a pedestrian trajectory predictor. For the learning and evaluation of crowded TDSN strategies, a novel and semantic-rich simulator with complex layouts is constructed based on realistic domestic scenes, instead of employing the previous naive simulation settings. Experimental results show that our method relatively improves the navigation success rates by 12.8% to 25.5% and 14.6% to 19.2% compared to the baselines on the MP3D and Gibson datasets, respectively. Furthermore, we experimentally verify the promising generalization and interpretability of our method.

IROS Conference 2024 Conference Paper

SocialNav-FTI: Field-Theory-Inspired Social-aware Navigation Framework based on Human Behavior and Social Norms

Siyi Lu
Ping Zhong 0002
Shuqi Ye
Bolei Chen
Yu Sheng
Run Liu 0001

Social navigation is a key consideration for integrating robots into human environments. Concurrently, it imposes heightened requisites: tasks must not only be executed successfully without collisions, but also adhere to principles encompassing comprehensibility, courtesy, social compliance, comprehension, foresight, and scenario compliance. In this paper, we present the incorporation of social norms as a guiding framework for robot navigation within social contexts. We adopt field theory to provide a formal elucidation of the social norms, using a Physics-Informed Neural Network (PINN) to predict pedestrian movement under the influence of social norms and Reinforcement Learning (RL) for navigation. We use supervised learning to train the pedestrian velocity field prediction model and reinforcement learning to train the navigation policy. We conduct three parts of experiments: (1) analyzing the spatiotemporal characteristics of the velocity field in the walking pedestrians dataset; (2) evaluating the accuracy of the vector field prediction in the pedestrian dataset; (3) using Gazebo simulation and the PEDSIM library to evaluate the improvement of navigation performance under constraints of social norms. Experiments have confirmed that the pedestrian motion dataset indeed satisfies the Gauss divergence theorem and can be described by the concept of a field. The performance of navigation strategies incorporating social rules has been improved to a certain extent.

TIST Journal 2023 Journal Article

STExplorer: A Hierarchical Autonomous Exploration Strategy with Spatio-temporal Awareness for Aerial Robots

Bolei Chen
Yongzheng Cui
Ping Zhong
Wang Yang
Yixiong Liang
Jianxin Wang

The autonomous exploration task we consider requires Unmanned Aerial Vehicles (UAVs) to actively navigate through unknown environments with the goal of fully perceiving and mapping the environments. Some existing exploration strategies suffer from rough cost budgets, ambiguous Information Gain (IG), and unnecessary backtracking exploration caused by Fragmented Regions (FRs). In our work, a hierarchical spatio-temporal-aware exploration framework is proposed to alleviate these problems. At the local exploration level, the Asymmetrical Traveling Salesman Problem (ATSP) is solved by comprehensively considering exploration time, IG, and heading consistency to avoid blindly exploring. Specifically, the exploration time is reasonably budgeted by fast marching in an artificial potential field. Meanwhile, a transformer-based map occupancy predictor is designed to assist in IG calculation by imagining spatial clues out of the Field of View (FoV), facilitating the prescient exploration. We verify that our local exploration is effective in alleviating the unnecessary back-and-forth movements caused by FRs and the interference of potential obstacle occlusion on the IG calculation. At the global exploration level, the classical Next Best View Points (NBVP) are generalized to Next Best Sub-Regions (NBSR) to choose informative sub-regions for further forward-looking exploration based on a well-designed utility function. Safe flight paths and dynamically feasible trajectories are reasonably generated throughout the exploration process by fast marching and B-spline curve optimization. Comparative simulations and benchmark tests demonstrate that our proposed exploration strategy is quite competitive in terms of exploration path length, total exploration time, and exploration ratio.
