Author name cluster

Weihao Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

7 papers

2 author rows

EAAI Journal 2026 Journal Article

An advanced detector and dual-shortest distance intersection algorithm for navigation path extraction in complex orchards

Pengfei Lv
Jinlin Xue
Wenbo Wei
Shaohua Liu
Weiwei Gao
Han Sun
Hanzhao Miao
Weihao Wang

Accurate navigation path extraction is crucial for the autonomous operation of intelligent agricultural machinery in orchards. However, the limited accuracy and deployability of existing detection algorithms, combined with the complexity of orchard environments, hinder accurate path extraction. This study proposes a navigation path extraction method using an advanced detector and the dual-shortest distance intersection (DSDI) algorithm. First, an advanced detector was developed to accurately identify and extract trunk localization feature points. Specifically, a systematic analysis of the sample dataset revealed a high proportion of small targets. In response, a new feature fusion architecture was designed, on which the detector was built to enhance small-target detection. Furthermore, the detector was optimized to balance detection accuracy and computational efficiency by pruning redundant network weights and neurons. Second, a novel DSDI algorithm was proposed for accurate navigation path extraction based on the extracted localization feature points. It leverages geometric constraints and dual-shortest distance principles to generate paths through intersection and angle bisector calculations. Experimental results demonstrate that the proposed detector outperforms the baseline You Only Look Once 11 small (YOLO11s), achieving a 5.9% improvement in mean average precision, an 88.04% reduction in model size, a 64.32% decrease in floating-point operations, and a 64.71% increase in frames per second. Moreover, its generalization capability is further validated on two public benchmark datasets. Compared with eight mainstream detectors, the proposed detector exhibits superior overall performance. Under both weed-free and weed-interfered conditions, the average navigation path extraction accuracy is 89%, with an average heading angle deviation of 2.48°. This study provides theoretical and technical support for advancing autonomous navigation in orchard robots.
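
The abstract mentions intersection and angle bisector calculations on trunk feature points. The sketch below is not the paper's DSDI algorithm (whose "dual-shortest distance" step is not described here); it is a generic, hypothetical illustration that fits one straight line per tree row and takes the bisector of the two row lines as the path. It assumes image coordinates in which rows run roughly along the y-axis; `fit_row_line` and `bisector_path` are invented names.

```python
import math

def fit_row_line(points):
    """Least-squares fit of x = a*y + b to (x, y) trunk points.

    Parameterizing x as a function of y suits rows that run roughly along
    the image's vertical axis (an assumption of this sketch). Requires at
    least two distinct y values.
    """
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    syy = sum(y * y for _, y in points)
    sxy = sum(x * y for x, y in points)
    a = (n * sxy - sx * sy) / (n * syy - sy * sy)
    b = (sx - a * sy) / n
    return a, b

def bisector_path(left_points, right_points):
    """Return (a, b, heading_deg) of the centerline x = a*y + b."""
    a1, b1 = fit_row_line(left_points)
    a2, b2 = fit_row_line(right_points)
    if abs(a1 - a2) < 1e-12:
        # Parallel rows: the path is the midline between them.
        a, b = a1, (b1 + b2) / 2
    else:
        # Intersection of the two row lines: a1*y + b1 = a2*y + b2.
        y0 = (b2 - b1) / (a1 - a2)
        x0 = a1 * y0 + b1
        # Summing unit direction vectors (both oriented toward +y) bisects the angle.
        n1, n2 = math.hypot(a1, 1.0), math.hypot(a2, 1.0)
        dx, dy = a1 / n1 + a2 / n2, 1.0 / n1 + 1.0 / n2
        a = dx / dy
        b = x0 - a * y0
    heading_deg = math.degrees(math.atan(a))  # 0 deg = straight along the y-axis
    return a, b, heading_deg
```

For two rows that converge symmetrically toward the vanishing point, the bisector is the vertical line midway between them, so the heading deviation is zero.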

AAAI Conference 2026 Conference Paper

EMAformer: Enhancing Transformer Through Embedding Armor for Time Series Forecasting

Zhiwei Zhang
Xinyi Du
Xuanchi Guo
Weihao Wang
Wenjuan Han

Multivariate time series forecasting is crucial across a wide range of domains. Although iTransformer represents notable progress for the Transformer architecture, it still lags behind the latest MLP-based models. We attribute this performance gap to unstable inter-channel relationships. To bridge this gap, we propose EMAformer, a simple yet effective model that enhances the Transformer with an auxiliary embedding suite, akin to armor that reinforces its capabilities. By introducing three key inductive biases, namely global stability, phase sensitivity, and cross-axis specificity, EMAformer unlocks further potential of the Transformer architecture, achieving state-of-the-art performance on 12 real-world benchmarks and reducing forecasting errors by an average of 2.73% in MSE and 5.15% in MAE. This significantly advances the practical applicability of Transformer-based approaches to multivariate time series forecasting.
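
The reported gains are average relative reductions in MSE and MAE. The abstract does not say how the average is taken; the hypothetical sketch below assumes a per-benchmark relative reduction against a baseline, then a plain mean over benchmarks. The function names are illustrative only.

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def relative_reduction(baseline_error, new_error):
    """Percentage by which new_error improves on baseline_error."""
    return 100.0 * (baseline_error - new_error) / baseline_error

def average_reduction(pairs):
    """Mean of per-benchmark reductions; pairs = [(baseline_error, new_error), ...]."""
    return sum(relative_reduction(b, n) for b, n in pairs) / len(pairs)
```

Averaging per-benchmark percentages weights every benchmark equally, regardless of its error scale; averaging raw errors first would instead let high-error datasets dominate.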

AAAI Conference 2025 Conference Paper

Imagine: Image-Guided 3D Part Assembly with Structure Knowledge Graph

Weihao Wang
Yu Lan
Mingyu You
Bin He

3D part assembly is a promising task in 3D computer vision and robotics that assembles 3D parts by predicting their 6-DoF poses. Like most 3D shape understanding tasks, existing methods primarily address this task by memorizing part poses during training, which leads to inaccuracies in complex assemblies and poor generalization to novel categories. To fundamentally improve performance, structure knowledge of the target assembly is indispensable before assembling; it abstracts the potential part composition and the structural relationships among parts. An image of the target assembly can serve as a common source for constructing this structure knowledge. Nevertheless, the image alone is far from sufficient, as its knowledge can be incomplete and ambiguous due to part occlusion and varying views. To tackle these issues, we propose Imagine, a novel Image-guided 3D part assembly framework with a structure knowledge graph. As a novel assembly prior, the structure knowledge graph originates from the image and is refined as the 3D parts are understood. It encodes robust part-aware structural and semantic information about the assembly, guides the 3D parts from a coarse super-structure to a fine assembly, and co-evolves progressively throughout the assembly process. Extensive experiments demonstrate the state-of-the-art performance of our framework, along with strong generalization to novel images and categories.
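
A 6-DoF pose combines a 3D rotation and a 3D translation. The abstract does not state how poses are parameterized; the sketch below assumes the common unit-quaternion (w, x, y, z) plus translation form and shows how a predicted pose would place a part's points in the assembly frame. `quat_to_matrix` and `apply_pose` are hypothetical helpers, not the framework's code.

```python
import math

def quat_to_matrix(q):
    """Rotation matrix from a unit quaternion q = (w, x, y, z) (Hamilton convention)."""
    w, x, y, z = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

def apply_pose(points, q, t):
    """Rotate each 3D point by q, then translate by t (a rigid 6-DoF transform)."""
    norm = math.sqrt(sum(c * c for c in q))
    q = tuple(c / norm for c in q)  # normalize, since a network's raw output may not be unit length
    R = quat_to_matrix(q)
    return [
        tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
        for p in points
    ]
```

For example, a 90° rotation about z, q = (cos 45°, 0, 0, sin 45°), followed by a translation of (1, 2, 3) moves the point (1, 0, 0) to (1, 3, 3).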

JBHI Journal 2025 Journal Article

MuFuBP-Net: A Multimodal Fusion Network for Cuffless Blood Pressure Estimation Using Dual-Feature Pipeline With Probabilistic Feature Encoder

Farhad Hassan
Mubashir Ali
Zubair Akbar
Jingzhen Li
Yuhang Liu
Weihao Wang
Lixin Guo
Zedong Nie

Cuffless blood pressure (BP) estimation is critical for managing growing concerns about hypertension and cardiovascular diseases. Despite recent advances in multimodal (ECG and PPG) BP estimation methods, which have achieved varying degrees of success, several challenges remain: capturing the full spectrum of BP-relevant information, managing redundant feature spaces, and handling multigrade classification. To address these issues, we propose a Multimodal Fusion BP Network (MuFuBP-Net), featuring a novel dual-feature pipeline architecture designed to extract hierarchical and modality-specific features from both ECG and PPG signals. Additionally, the Cascading Cross-Feature Enhancer (CCFE) module integrates multiple fusion strategies with a squeeze-and-excitation mechanism that applies channel-wise attention to spatial features, enabling dynamic re-weighting. We also employ a Sequence Context Network (SCN) module to capture global sequential features. Subsequently, a Probabilistic Feature Encoder (PFE) encodes the multilevel features from both pipelines into a compact latent space while preserving their discriminative characteristics. Our approach achieved MAE ± SDE of 2.99 ± 4.37 mmHg (SBP) and 2.63 ± 4.19 mmHg (DBP) on the MIMIC-II dataset, and 2.27 ± 4.15 mmHg (SBP) and 1.63 ± 2.96 mmHg (DBP) on the MIMIC-III dataset, meeting AAMI, BHS, and IEEE grade A standards. The proposed approach demonstrated competitive results compared with existing techniques, underscoring its value as a reliable solution for cuffless BP monitoring.
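
The results are reported as MAE ± SDE and checked against AAMI and BHS criteria. A minimal sketch of those computations follows, assuming SDE is the sample standard deviation of the signed errors. The AAMI check covers only the error limits (|mean error| ≤ 5 mmHg, SD ≤ 8 mmHg), not the subject-count requirement, and the IEEE grading is not covered. Function names are hypothetical.

```python
import statistics

def error_stats(reference, estimate):
    """Return mean error (ME), mean absolute error (MAE) and SD of errors, in mmHg."""
    errors = [e - r for r, e in zip(reference, estimate)]
    me = statistics.fmean(errors)
    mae = statistics.fmean(abs(e) for e in errors)
    sde = statistics.stdev(errors)  # sample SD; needs at least two readings
    return me, mae, sde

def meets_aami(me, sde):
    """AAMI error limits: |mean error| <= 5 mmHg and SD <= 8 mmHg."""
    return abs(me) <= 5 and sde <= 8

def bhs_grade(reference, estimate):
    """BHS grade from cumulative % of absolute errors within 5, 10 and 15 mmHg."""
    abs_err = [abs(e - r) for r, e in zip(reference, estimate)]
    pct = [100 * sum(a <= lim for a in abs_err) / len(abs_err) for lim in (5, 10, 15)]
    for grade, limits in (("A", (60, 85, 95)), ("B", (50, 75, 90)), ("C", (40, 65, 85))):
        if all(p >= th for p, th in zip(pct, limits)):
            return grade
    return "D"
```

Note that AAMI is stated in terms of the signed mean error, whereas the abstract reports MAE; the two coincide only when all errors share one sign.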

ICLR Conference 2025 Conference Paper

Show-o: One Single Transformer to Unify Multimodal Understanding and Generation

Jinheng Xie
Weijia Mao
Zechen Bai
David Junhao Zhang
Weihao Wang
Kevin Qinghong Lin
Yuchao Gu
Zhijie Chen

We present a unified transformer, i.e., Show-o, that unifies multimodal understanding and generation. Unlike fully autoregressive models, Show-o unifies autoregressive and (discrete) diffusion modeling to adaptively handle inputs and outputs of various and mixed modalities. The unified model flexibly supports a wide range of vision-language tasks including visual question-answering, text-to-image generation, text-guided inpainting/extrapolation, and mixed-modality generation. Across various benchmarks, it demonstrates comparable or superior performance to existing individual models with an equivalent or larger number of parameters tailored for understanding or generation. This significantly highlights its potential as a next-generation foundation model.
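
One way a single transformer can mix autoregressive and discrete-diffusion modeling is through its attention mask: text tokens attend causally, while image tokens that are denoised jointly need to see each other bidirectionally. The abstract does not describe Show-o's masking, so the sketch below is an assumption-labeled illustration; `hybrid_attention_mask` and the segment format are hypothetical.

```python
def hybrid_attention_mask(segments):
    """Boolean attention mask for a mixed text/image token sequence.

    segments: list of (kind, length) with kind in {"text", "image"}.
    mask[i][j] is True when query i may attend to key j:
      - every token attends causally to all earlier positions (j <= i);
      - tokens inside the same image segment also attend to each other
        bidirectionally, as a mask-predict (discrete diffusion) denoiser needs.
    """
    spans, start = [], 0
    for kind, length in segments:
        spans.append((kind, start, start + length))
        start += length
    total = start
    mask = [[j <= i for j in range(total)] for i in range(total)]
    for kind, lo, hi in spans:
        if kind == "image":
            for i in range(lo, hi):
                for j in range(lo, hi):
                    mask[i][j] = True
    return mask
```

With segments `[("text", 2), ("image", 2), ("text", 1)]`, the two image tokens see each other in both directions, while the trailing text token still sees everything before it and nothing after.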

AAAI Conference 2023 Conference Paper

3D Assembly Completion

Weihao Wang
Rufeng Zhang
Mingyu You
Hongjun Zhou
Bin He

Automatic assembly is a promising research topic in 3D computer vision and robotics. Existing works focus on generating an assembly (e.g., IKEA furniture) from scratch from a set of parts, namely 3D part assembly. In practice, there is a greater need for robots to take over and finish an incomplete assembly (e.g., a half-assembled piece of IKEA furniture) with an off-the-shelf toolkit, especially in human-robot and multi-agent collaboration. Compared with 3D part assembly, this is inherently more complicated and remains unexplored. The robot must understand the incomplete structure, infer which parts are missing, single out the correct parts from the toolkit, and finally assemble them with appropriate poses to finish the incomplete assembly. Geometrically similar parts in the toolkit can interfere, and this problem is exacerbated as more parts are missing. To tackle this issue, we propose a novel task called 3D assembly completion. Given an incomplete assembly, it aims to find the missing parts in a toolkit and predict their 6-DoF poses to complete the assembly. To this end, we propose FiT, a framework for Finishing the incomplete 3D assembly with Transformer. We employ the encoder to encode the incomplete assembly into memories. Candidate parts interact with these memories in a memory-query paradigm for final candidate classification and pose prediction. Bipartite part matching and symmetric transformation consistency are embedded to refine the completion. For reasonable evaluation and further reference, we design two standard toolkits of different difficulty levels, containing different compositions of candidate parts. We conduct extensive comparisons with several baseline methods, as well as ablation studies, demonstrating the effectiveness of the proposed method.
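
Bipartite part matching pairs each prediction with exactly one target so that the total cost is minimal. The cost definition used by FiT is not given here; the sketch below assumes a precomputed square cost matrix and solves the assignment by exhaustive search, which is only practical for a handful of missing parts. `match_parts` is a hypothetical name.

```python
from itertools import permutations

def match_parts(cost):
    """Minimum-cost one-to-one matching between predictions (rows) and targets (columns).

    cost[i][j] is an assumed distance (e.g. pose or shape error) between
    prediction i and target j; the matrix must be square.
    Exhaustive search over all permutations: O(n!), fine for small n.
    """
    n = len(cost)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_cost:
            best_perm, best_cost = perm, total
    return list(best_perm), best_cost
```

For larger sets, the Hungarian algorithm solves the same problem in polynomial time; SciPy exposes it as `scipy.optimize.linear_sum_assignment`.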

ICRA Conference 2023 Conference Paper

Optimized Design and Analysis of Active Propeller-driven Capsule Endoscopic Robot for Gastric Examination

Yi Zhang
Weihao Wang
Wende Ke
Chengzhi Hu

Capsule endoscopic robots hold great promise for the early diagnosis of gastrointestinal diseases without causing discomfort to patients. However, currently available active capsule endoscopic robots suffer from issues such as complex structure, poor mobility, large size, and high cost, which have hindered their widespread adoption and resulted in a lower screening rate for gastrointestinal diseases. To address these challenges, this paper proposes a highly integrated propeller-driven capsule endoscopic robot (PCER) system that integrates an STM32 processor, a magnetic sensor, an IMU, an RF communication unit, and a motor drive. The PCER's micro propeller was analyzed through finite element simulation to ensure its efficiency. FLUENT software was used to simulate the fluid force acting on the PCER as it moves through a liquid medium, and the simulation results were then used to determine the optimal pitch angle for the robot's movement. The thrust generated by the capsule robot's propellers was measured using a lever mechanism to investigate the relationship between thrust and the voltage applied to the motors. The experiments confirmed that the PCER is capable of flexible motion in fluid environments, such as changing pitch angle during movement, passing circular obstacles, horizontal motion, and spiral ascent. These findings demonstrate the feasibility of the proposed PCER as an effective tool for non-invasive early screening of gastrointestinal diseases.
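
Thrust was measured with a lever mechanism and related to motor voltage. The abstract gives neither the lever geometry nor the fitted model, so the sketch below makes two labeled assumptions: a moment balance about the pivot converts the sensor reading into thrust, and thrust is modeled as T = k·V², fitted by least squares without an intercept (minimizing Σ(T − kV²)² gives k = ΣTV² / ΣV⁴). Function names are hypothetical.

```python
def thrust_from_lever(sensor_force, sensor_arm, thrust_arm):
    """Moment balance about the pivot: thrust * thrust_arm = sensor_force * sensor_arm."""
    return sensor_force * sensor_arm / thrust_arm

def fit_quadratic_thrust(voltages, thrusts):
    """Least-squares k for the assumed model T = k * V**2 (no intercept)."""
    num = sum(t * v * v for v, t in zip(voltages, thrusts))
    den = sum(v ** 4 for v in voltages)
    return num / den
```

The quadratic form reflects the rough rule that propeller thrust scales with the square of rotational speed; whether speed is proportional to voltage for this micro motor is itself an assumption that measured data would need to confirm.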
