Arrow Research search

Author name cluster

Han Sun

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

11 papers
2 author rows

Possible papers (11)

EAAI Journal 2026 Journal Article

An advanced detector and dual-shortest distance intersection algorithm for navigation path extraction in complex orchards

  • Pengfei Lv
  • Jinlin Xue
  • Wenbo Wei
  • Shaohua Liu
  • Weiwei Gao
  • Han Sun
  • Hanzhao Miao
  • Weihao Wang

Accurate navigation path extraction is crucial for autonomous operation of intelligent agricultural machinery in orchards. However, the limited accuracy and deployability of existing detection algorithms, combined with the complexity of orchard environments, hinder accurate path extraction. This study proposes a navigation path extraction method using an advanced detector and the dual-shortest distance intersection (DSDI) algorithm. First, an advanced detector was developed for accurate identification and extraction of trunk localization feature points. Specifically, a systematic analysis of the sample dataset revealed a high proportion of small targets. In response, a new feature fusion architecture was designed, upon which an advanced detector was developed for enhancing small target detection. Furthermore, the detector was optimized to balance detection accuracy and computational efficiency by pruning redundant network weights and neurons. Second, a novel DSDI algorithm was proposed for accurate navigation path extraction, based on the extracted localization feature points. It leverages geometric constraints and dual-shortest distance principles to generate paths through intersection and angle bisector calculations. Experimental results demonstrate that the proposed detector outperforms the baseline You Only Look Once 11 small (YOLO11s), achieving a 5.9% improvement in mean average precision, an 88.04% reduction in model size, a 64.32% decrease in floating-point operations, and a 64.71% increase in frames per second. Moreover, its generalization capability is further validated through evaluations on two public benchmark datasets. Compared with eight mainstream detectors, the proposed detector exhibits superior overall performance. Under both weed-free and weed-interfered conditions, the average navigation path extraction accuracy is 89%, with an average heading angle deviation of 2.48°. This study delivers theoretical and technical support for advancing autonomous navigation in orchard robots.
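No code accompanies the abstract, but the geometric heart of DSDI, fitting lines through the trunks nearest the robot in each row, intersecting them, and taking the angle bisector as the navigation path, is easy to sketch. The following is a minimal illustration under assumptions, not the authors' implementation: trunk feature points are assumed already projected to 2D ground-plane coordinates, and the names fit_line and dsdi_path are hypothetical.

```python
import numpy as np

def fit_line(points):
    """Least-squares line fit; returns (slope, intercept) for y = m*x + b."""
    x, y = points[:, 0], points[:, 1]
    m, b = np.polyfit(x, y, 1)
    return m, b

def dsdi_path(left_pts, right_pts, robot_xy=(0.0, 0.0)):
    """Sketch of a dual-shortest-distance path: fit a line through the two
    trunks closest to the robot in each row, intersect the two lines, and
    take the angle bisector through the intersection as the path.
    Assumes the rows are neither parallel nor vertical in this frame."""
    robot = np.asarray(robot_xy)

    def nearest_two(pts):
        pts = np.asarray(pts, dtype=float)
        order = np.argsort(np.linalg.norm(pts - robot, axis=1))
        return pts[order[:2]]  # the "dual shortest distance" pair

    mL, bL = fit_line(nearest_two(left_pts))
    mR, bR = fit_line(nearest_two(right_pts))
    # Intersection of the two row lines.
    xi = (bR - bL) / (mL - mR)
    yi = mL * xi + bL
    # Angle bisector direction: sum of the rows' unit direction vectors.
    dL = np.array([1.0, mL]) / np.hypot(1.0, mL)
    dR = np.array([1.0, mR]) / np.hypot(1.0, mR)
    bisector = (dL + dR) / np.linalg.norm(dL + dR)
    return (xi, yi), bisector  # navigation path: a point and a direction
```

The full algorithm also copes with weed interference and degenerate geometry; this sketch only shows the intersection and bisector step.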

EAAI Journal 2026 Journal Article

Lightweight multi-classification pear fruit high-precision detection model in complex orchard scenes

  • Shaohua Liu
  • Jinlin Xue
  • Tianyu Zhang
  • Pengfei Lv
  • Tianxing Zhao
  • Han Sun
  • Ruikai Liu
  • Yihang Chen

Obstacle occlusion in modern orchards significantly reduces the operational efficiency of pear-picking robots. To address this issue, a lightweight multi-classification pear fruit high-precision detection model, named MultiPL-YOLO (You Only Look Once), is proposed, which employs an anchor-based detection algorithm. Firstly, the network structure is adjusted to enhance feature extraction while reducing parameters and complexity. Additionally, the target anchor boxes are redesigned using the k-means algorithm. These modifications optimize the model for better feature extraction of multi-classification pear targets. Secondly, a redesigned C3 (CSP Bottleneck with 3 convolutions) module, incorporating Coordinate Attention and Efficient Multi-Head Convolution (C3-CAEMHC), is introduced to capture multi-scale features and expand the receptive field. Finally, the Efficient Intersection over Union (EIoU) loss function is used in the head network to accelerate convergence and improve detection accuracy. Test results show that our model achieves a streamlined design with only 2,499,072 parameters, a 64.4% reduction compared to the original model, and a model size of 5.9 MB (megabytes), representing a 59.0% decrease. Its feature extraction ability under complex environments is significantly improved, with precision of 96.1%, recall of 92.6%, and mAP@50 (mean Average Precision at IoU threshold of 50%) of 97.0%, marking improvements of 0.8, 2.1, and 2.0 percentage points over the original model, respectively. Furthermore, our model demonstrates superior detection performance on embedded industrial computers with limited computational resources. This study highlights the role of artificial intelligence in strengthening the perception and autonomous decision-making of agricultural robots, thereby facilitating intelligent fruit harvesting under challenging orchard conditions.
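The anchor redesign mentioned in the abstract is a standard recipe that can be sketched directly. A minimal version is below, assuming ground-truth box sizes are available as (width, height) pairs; note that YOLO-family pipelines often cluster with a 1 - IoU distance rather than the plain Euclidean k-means shown here, and redesign_anchors is a hypothetical name.

```python
import numpy as np
from sklearn.cluster import KMeans

def redesign_anchors(boxes_wh, n_anchors=9):
    """Cluster ground-truth box (width, height) pairs so the anchors match
    the dataset's box-shape distribution. boxes_wh: (N, 2) array in pixels."""
    km = KMeans(n_clusters=n_anchors, n_init=10, random_state=0)
    km.fit(np.asarray(boxes_wh, dtype=float))
    anchors = km.cluster_centers_
    # Sort by area so anchors map to detection scales from small to large.
    return anchors[np.argsort(anchors.prod(axis=1))]
```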

AAAI Conference 2026 Conference Paper

RSPlace: Rotation Sensing Macro Placement via Bidirectional Tree Expansion

  • Tianyi Liu
  • Yaxin Xu
  • Lin Geng
  • Ningzhong Liu
  • Han Sun
  • Yu Wang

Macro placement is a crucial subproblem of chip design, focusing on determining the locations of numerous macros while minimizing multiple metrics. In recent years, reinforcement learning (RL) has gained traction as a favorable technique for improving placement performance. However, existing RL-based placers ignore the orientation of macros, constraining the state space to two-dimensional discrete coordinates and greatly restricting exploration opportunities. To address this issue, we propose a novel macro placement method, RSPlace, which guides the bidirectional expansion of the global search tree to offer the RL agent more exploration opportunities, incorporating rotation into the RL-based macro placement solution for the first time. RSPlace intelligently determines the optimal rotation angle to maximize placement benefits by leveraging rotation sensing and placement perturbations. Extensive experiments demonstrate that taking macro orientation into account substantially broadens the feasible locations and effectively reduces the half-perimeter wirelength (HPWL), ensuring that our approach significantly improves the optimization effect compared to the state-of-the-art method.
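HPWL, the wirelength metric optimized here, has a standard definition worth spelling out, since it explains why orientation matters: rotating a macro moves its pin locations, which changes each net's bounding box and hence the wirelength. A minimal sketch under assumed data structures (flat pin lists, axis-aligned rotations; all names hypothetical):

```python
import math

def rotate_pin(offset, angle_deg):
    """Pin offset relative to the macro center after an axis-aligned
    rotation (0, 90, 180, or 270 degrees)."""
    a = math.radians(angle_deg)
    x, y = offset
    return (x * math.cos(a) - y * math.sin(a),
            x * math.sin(a) + y * math.cos(a))

def hpwl(nets, pin_xy):
    """Half-perimeter wirelength: for each net, the half-perimeter of the
    bounding box of its pins. nets: list of pin-id lists; pin_xy: dict
    pin id -> absolute (x, y), already including any macro rotation."""
    total = 0.0
    for net in nets:
        xs = [pin_xy[p][0] for p in net]
        ys = [pin_xy[p][1] for p in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total
```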

JBHI Journal 2025 Journal Article

A Novel Framework for Predicting Phage-Host Interactions via Host Specificity-Aware Graph Autoencoder

  • Zhen Xiao
  • Han Sun
  • Ankang Wei
  • Weizhong Zhao
  • Xingpeng Jiang

Due to the abuse of antibiotics, some pathogenic bacteria have developed resistance to most antibiotics, leading to the emergence of antibiotic-resistant superbugs. Therefore, researchers resort to phage therapy for bacterial infections. For phage therapy, the fundamental step is to accurately identify phage-host interactions. Although various methods have been proposed, the existing methods suffer from the following two shortcomings: 1) they fail to make full use of genetic information, including both the genome and protein sequences of phages; 2) the host specificity of phages is not explicitly utilized when learning representations of phages and bacteria. In this paper, we present an efficient computational method called PHISGAE for predicting phage-host interactions, in which host specificity is explicitly employed. Firstly, initial phage-phage connections are efficiently constructed by utilizing phage genome and protein sequences. Then, the refined heterogeneous network is derived by applying a K-nearest neighbor strategy, keeping relatively more meaningful local semantics among phages and bacteria. Finally, a host specificity-aware graph autoencoder is proposed to learn high-quality representations of phages and bacteria for predicting phage-host interactions. Experimental results show that PHISGAE outperforms the state-of-the-art methods in predicting phage-host interactions at both the species level and the genus level (AUC values of 94.73% and 96.32%, respectively). Moreover, the case study results demonstrate that PHISGAE is able to identify candidate hosts with high probability for previously unseen phages identified from metagenomics, effectively predicting potential phage-host interactions in real-world applications.
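The K-nearest neighbor refinement is a generic graph operation and simple to illustrate. The sketch below (knn_refine is a hypothetical name) keeps only each node's top-k strongest similarities and symmetrizes the result; the paper's actual construction over a heterogeneous phage-bacteria network is necessarily more involved.

```python
import numpy as np

def knn_refine(sim, k=10):
    """Sparsify a dense similarity matrix by keeping each node's k strongest
    connections, preserving the most meaningful local semantics.
    sim: symmetric (N, N) similarity matrix, zero diagonal assumed."""
    n = sim.shape[0]
    refined = np.zeros_like(sim)
    for i in range(n):
        order = np.argsort(sim[i])[::-1]   # neighbors, strongest first
        keep = order[order != i][:k]       # top-k, excluding self
        refined[i, keep] = sim[i, keep]
    return np.maximum(refined, refined.T)  # keep an edge if either end kept it
```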

IROS Conference 2025 Conference Paper

DCT-Diffusion: Depth Completion for Transparent Objects with Diffusion Denoising Approach

  • Zhenning Zhou
  • Weiqing Shen
  • Han Sun
  • Yizhao Wang
  • Qixin Cao

Transparent objects are common in industrial automation and daily life. However, accurate visual perception of these objects remains challenging due to their reflective and refractive properties. Most previous studies fail to capture contextual information or typically rely on regression-based methods at the decoder stage, suffering from overfitting and unsatisfactory object details. To overcome these limitations, we present a novel depth completion framework for transparent objects with a diffusion denoising approach (DCT-Diffusion). First, we adopt a transformer-based encoder to globally learn the depth relationships from different parts of the input by modeling long-distance dependencies. Then, we propose to introduce the diffusion model to generate refined depth maps from a random depth distribution. Through iterative refinement, our model can progressively enhance depth map details and achieve fine-grained performance. Lastly, a conditioned fusion module is developed, which utilizes encoder features as visual conditions and fuses them with the denoising block at each step using augmented attention. Extensive comparative studies and cross-domain experiments prove that DCT-Diffusion outperforms previous methods and significantly improves robustness and generalization ability. Moreover, visualization results further illustrate that our method can generate depth maps with more complete geometry and clearer boundaries, achieving satisfactory results.
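The iterative refinement the abstract describes follows the usual diffusion recipe: start from random noise and repeatedly denoise, conditioning each step on the encoder's visual features. The sketch below is a generic DDIM-style reverse loop, not the DCT-Diffusion code: the model signature, the conditioning interface, and the schedule are all assumptions.

```python
import torch

@torch.no_grad()
def refine_depth(model, cond_feats, alphas_cumprod, shape):
    """Generic diffusion reverse process for depth: begin with a random
    depth map and iteratively denoise it. `model(x, t, cond)` is assumed
    to predict the noise; DCT-Diffusion instead fuses the condition via
    augmented attention inside each denoising block."""
    depth = torch.randn(shape)                   # random depth distribution
    for t in reversed(range(len(alphas_cumprod))):
        a_t = alphas_cumprod[t]
        a_prev = alphas_cumprod[t - 1] if t > 0 else torch.tensor(1.0)
        eps = model(depth, t, cond_feats)        # predicted noise at step t
        # Deterministic DDIM update (eta = 0): estimate x0, then step back.
        x0 = (depth - (1 - a_t).sqrt() * eps) / a_t.sqrt()
        depth = a_prev.sqrt() * x0 + (1 - a_prev).sqrt() * eps
    return depth
```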

ICLR Conference 2025 Conference Paper

DynAlign: Unsupervised Dynamic Taxonomy Alignment for Cross-Domain Segmentation

  • Han Sun
  • Rui Gong
  • Ismail Nejjar
  • Olga Fink

Current unsupervised domain adaptation (UDA) methods for semantic segmentation typically assume identical class labels between the source and target domains. This assumption ignores the label-level domain gap, which is common in real-world scenarios, and limits their ability to identify finer-grained or novel categories without requiring extensive manual annotation. A promising direction to address this limitation lies in recent advancements in foundation models, which exhibit strong generalization abilities due to their rich prior knowledge. However, these models often struggle with domain-specific nuances and underrepresented fine-grained categories. To address these challenges, we introduce DynAlign, a two-stage framework that integrates UDA with foundation models to bridge both the image-level and label-level domain gaps. Our approach leverages prior semantic knowledge to align source categories with target categories that can be novel, more fine-grained, or named differently (e.g., vehicle to car, truck, bus). Foundation models are then employed for precise segmentation and category reassignment. To further enhance accuracy, we propose a knowledge fusion approach that dynamically adapts to varying scene contexts. DynAlign generates accurate predictions in a new target label space without requiring any manual annotations, allowing seamless adaptation to new taxonomies through either model retraining or direct inference. Experiments on the GTA$\rightarrow$IDD and GTA$\rightarrow$Mapillary benchmarks validate the effectiveness of our approach, achieving a significant improvement over existing methods. Our code is publicly available at https://github.com/hansunhayden/DynAlign.
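The label-level alignment can be pictured with a small data structure: each coarse source class maps to a set of candidate target classes, and the foundation model then reassigns pixels among those candidates. A minimal sketch with entirely hypothetical names and class tables (the paper's taxonomies come from the benchmarks themselves):

```python
import numpy as np

# Hypothetical taxonomy map: one coarse source class may correspond to
# several finer or renamed target classes (e.g., vehicle -> car, truck, bus).
SOURCE_TO_TARGET = {
    "vehicle": ["car", "truck", "bus"],
    "person": ["person"],
}

def candidate_regions(seg, source_ids, target_ids, mapping=SOURCE_TO_TARGET):
    """For each coarse source class, return its predicted pixel mask and the
    target class ids it may be reassigned to; a foundation model performs
    the actual per-region reassignment described above."""
    out = {}
    for src, tgts in mapping.items():
        mask = seg == source_ids[src]            # pixels predicted as src
        out[src] = (mask, [target_ids[t] for t in tgts])
    return out
```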

EAAI Journal 2024 Journal Article

Detection of fruit tree diseases in natural environments: A novel approach based on stereo camera and deep learning

  • Han Sun
  • Jinlin Xue
  • Yue Song
  • Peixiao Wang
  • Yu Wen
  • Tianyu Zhang

The occurrence of diseases in orchards has a significant impact on fruit yield and quality. Inspection devices equipped with cameras can effectively replace manual intervention in orchard management by swiftly detecting diseases. However, the images captured by such devices often exhibit a wide field of view and contain a significant amount of extraneous information. This paper presents a method for detecting diseases in natural environments based on binocular cameras and deep learning techniques applied to fruit tree leaf images with a wide visual field. Firstly, the ZED2i binocular camera was utilized to capture image pairs from a long distance, simulating the visual field of an inspection device. These image pairs were then processed using the Unimatch stereo matching algorithm to obtain a disparity map and calculate the corresponding depth map. The depth information was used to create a mask, eliminating irrelevant background information from the images. Secondly, a lightweight disease detection (LDD) model was proposed based on the YOLOv5 framework for detecting pear rust and plum perforation diseases. The backbone network consisted of a shuffle channel block, an inverted shuffle channel block, and a convolutional block attention module, with only one detection head used in the classifier part. The final experiments evaluated the segmentation, model improvement, and disease spot detection performance. The results showed that the depth map obtained using Unimatch for stereo matching was more accurate than that obtained using the ZED software development kit. In ablation experiments, LDD achieved a mean average precision of 93.0%, with a model size of only 3.9 MB, outperforming the original YOLOv5-s model. Preprocessed images with depth information exhibited improved detection performance, achieving an F1 score of 93.62%, a 10.92% improvement over direct detection of the original images. Overall, the presented method successfully addresses the issue of background interference when detecting fruit tree leaf diseases with a wide visual field, providing a technical basis for automated orchard inspection operations.
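The masking step rests on the standard stereo relation depth = f * B / d (focal length times baseline over disparity), followed by a simple working-distance threshold. A minimal sketch, with the threshold value and function names as assumptions rather than the paper's choices:

```python
import numpy as np

def depth_from_disparity(disparity, focal_px, baseline_m):
    """Standard stereo geometry: depth = f * B / d. Non-positive
    disparities (no match) become NaN."""
    d = np.where(disparity > 0, disparity.astype(float), np.nan)
    return focal_px * baseline_m / d

def mask_background(image, depth, max_depth_m=5.0):
    """Zero out pixels beyond the working distance so the detector sees
    only the nearby tree row, suppressing wide-field background clutter."""
    keep = np.isfinite(depth) & (depth < max_depth_m)
    return image * keep[..., None]
```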

JBHI Journal 2023 Journal Article

An Effective Model for Predicting Phage-Host Interactions Via Graph Embedding Representation Learning With Multi-Head Attention Mechanism

  • Yue Wang
  • Han Sun
  • Haodong Wang
  • Dandan Li
  • Weizhong Zhao
  • Xingpeng Jiang
  • Xianjun Shen

In the treatment of bacterial infectious diseases, overuse of antibiotics may lead not only to bacterial resistance to antibiotics but also to dysbiosis of the beneficial bacteria that are essential for maintaining normal human life activities. Instead, phage therapy, which invades and lyses specific pathogenic bacteria without affecting beneficial bacteria, has become increasingly popular for treating bacterial infectious diseases. Effective phage therapy requires accurately predicting potential phage-host interactions from a heterogeneous information network consisting of bacteria and phages. Although many models have been proposed for predicting phage-host interactions, most methods fail to fully consider the sparsity and unconnectedness of the phage-host heterogeneous information network, resulting in undesirable performance on phage-host interaction prediction. To address this challenge, we propose an effective model called GERMAN-PHI for predicting Phage-Host Interactions via Graph Embedding Representation learning with Multi-head Attention mechaNism. In GERMAN-PHI, the multi-head attention mechanism is utilized to learn representations of phages and hosts from multiple perspectives of phage-host associations, addressing the sparsity and unconnectedness of the phage-host heterogeneous information network. More specifically, a GAT module with talking heads is employed to learn representations of phages and bacteria, on which neural inductive matrix completion is conducted to reconstruct the phage-host association matrix. Results of comprehensive experiments demonstrate that GERMAN-PHI performs better than the state-of-the-art methods on phage-host interaction prediction. In addition, case study results for two high-risk human pathogens show that GERMAN-PHI can predict validated phages with high accuracy, and some potential or new associated phages are provided as well.
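The reconstruction step, matrix completion over the learned embeddings, reduces to scoring every phage-host pair through a learned bilinear form. A minimal sketch (the encoder that produces z_phage and z_host, i.e. the talking-heads GAT, is omitted, and the class name is hypothetical):

```python
import torch
import torch.nn as nn

class BilinearCompletion(nn.Module):
    """Score every phage-host pair from learned embeddings: a minimal
    stand-in for the neural inductive matrix completion step."""
    def __init__(self, dim):
        super().__init__()
        self.W = nn.Parameter(torch.randn(dim, dim) * 0.01)

    def forward(self, z_phage, z_host):
        # (n_phage, d) @ (d, d) @ (d, n_host) -> predicted association matrix
        return torch.sigmoid(z_phage @ self.W @ z_host.T)
```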

IROS Conference 2023 Conference Paper

PanelPose: A 6D Pose Estimation of Highly-Variable Panel Object for Robotic Robust Cockpit Panel Inspection

  • Han Sun
  • Peiyuan Ni
  • Zhiqi Li
  • Yizhao Wang
  • Xiaoxiao Zhu
  • Qixin Cao

In robotic cockpit inspection scenarios, the 6D pose of highly variable panel objects is necessary. However, buttons in different states on the panel produce variable texture and point clouds, which confuse traditional pose estimation methods that assume an invariant object; this variability is the core bottleneck. To address this issue, we propose a simple yet effective method, denoted PanelPose, that leverages synthetic data and edge-line features. Specifically, we extract edge and line features from RGB images and fuse these feature maps into a multi-feature fusion map (MFF Map) to focus on the shape features of panel objects. Moreover, we design an effective keypoint selection algorithm that considers the shape information of panel objects, simplifying keypoint localization for precise pose estimation. Finally, the panel object pose is estimated via PnP/RANSAC and refined by the multi-state template (MST) and multi-scale ICP. We experimentally show that state-of-the-art 6D pose estimation methods alone are not sufficient to solve the cockpit panel inspection task, but that our method significantly improves performance. In cockpit inspection scenarios, the panel localization error is less than 3 mm using our method. Code and data are available at https://github.com/sunhan1997/PaneIPose.
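The final PnP/RANSAC stage maps directly onto OpenCV. The sketch below shows only that stage, with assumed parameter values; the keypoint selection, MST refinement, and multi-scale ICP described above are not reproduced.

```python
import cv2
import numpy as np

def estimate_pose_pnp(obj_pts, img_pts, K, dist=None):
    """Recover a 6D pose from 2D-3D keypoint correspondences with RANSAC.
    obj_pts: (N, 3) model keypoints; img_pts: (N, 2) detected keypoints;
    K: 3x3 camera intrinsics. Thresholds here are illustrative."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.asarray(obj_pts, np.float32), np.asarray(img_pts, np.float32),
        K, dist, iterationsCount=100, reprojectionError=3.0)
    if not ok:
        raise RuntimeError("PnP/RANSAC found no consistent pose")
    R, _ = cv2.Rodrigues(rvec)   # rotation vector -> 3x3 rotation matrix
    return R, tvec, inliers
```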

NeurIPS Conference 2023 Conference Paper

SimMMDG: A Simple and Effective Framework for Multi-modal Domain Generalization

  • Hao Dong
  • Ismail Nejjar
  • Han Sun
  • Eleni Chatzi
  • Olga Fink

In real-world scenarios, achieving domain generalization (DG) presents significant challenges as models are required to generalize to unknown target distributions. Generalizing to unseen multi-modal distributions poses even greater difficulties due to the distinct properties exhibited by different modalities. To overcome the challenges of achieving domain generalization in multi-modal scenarios, we propose SimMMDG, a simple yet effective multi-modal DG framework. We argue that mapping features from different modalities into the same embedding space impedes model generalization. To address this, we propose splitting the features within each modality into modality-specific and modality-shared components. We employ supervised contrastive learning on the modality-shared features to ensure they possess joint properties and impose distance constraints on modality-specific features to promote diversity. In addition, we introduce a cross-modal translation module to regularize the learned features, which can also be used for missing-modality generalization. We demonstrate that our framework is theoretically well-supported and achieves strong performance in multi-modal DG on the EPIC-Kitchens dataset and the novel Human-Animal-Cartoon (HAC) dataset introduced in this paper. Our source code and HAC dataset are available at https://github.com/donghao51/SimMMDG.
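The core idea, splitting each modality's feature into shared and specific parts and constraining them differently, can be sketched in a few lines. This is an illustrative simplification, not the SimMMDG code: the paper applies supervised contrastive learning to the shared part, which is replaced here by a plain cosine-alignment term, and the margin is an assumed hyperparameter.

```python
import torch
import torch.nn.functional as F

def split_feature_losses(feat_a, feat_b, n_shared, margin=1.0):
    """feat_a, feat_b: (batch, dim) features from two modalities of the
    same samples. The first n_shared dims are treated as modality-shared,
    the rest as modality-specific."""
    shared_a, spec_a = feat_a[:, :n_shared], feat_a[:, n_shared:]
    shared_b, spec_b = feat_b[:, :n_shared], feat_b[:, n_shared:]
    # Shared parts should agree across modalities (stand-in for the
    # paper's supervised contrastive objective).
    loss_shared = 1 - F.cosine_similarity(shared_a, shared_b).mean()
    # Specific parts should stay at least `margin` apart to preserve
    # each modality's distinct properties.
    gap = (spec_a - spec_b).norm(dim=1)
    loss_specific = F.relu(margin - gap).mean()
    return loss_shared + loss_specific
```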

EAAI Journal 2019 Journal Article

Web image annotation based on Tri-relational Graph and semantic context analysis

  • Jing Zhang
  • Ti Tao
  • Yakun Mu
  • Han Sun
  • Dongdong Li
  • Zhe Wang

Web image annotation has become a hot research topic owing to massive image data and abundant semantic context. In this paper, we propose a Tri-relational Graph (TG) model for web image annotation, which comprises the image data graph, the region data graph, and the label graph as subgraphs, and connects them by an additional tripartite graph induced from image segmentation results and label assignments. By analyzing the global visual similarity between images, the visual similarity between regions, the semantic correlations between labels, and the relationships between the three subgraphs under the TG model, we perform a multilevel Random Walk with Restart (RWR) algorithm on TG to produce vertex-to-vertex relevance, including image-to-region, region-to-label, and image-to-label relevances. Then semi-supervised learning is used to predict labels for unannotated image regions by inserting unlabeled images and their regions into TG. In addition, we analyze the text context of web images and extract semantic and proper nouns for further label expansion through WordNet. Experiments on public web image datasets demonstrate that our proposed TG model and multilevel RWR algorithm achieve good performance on image region annotation and outperform similar image annotation methods. Moreover, label expansion by web semantic context analysis can achieve more accurate and abundant annotation results.
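Random Walk with Restart, the relevance engine of the TG model, has a compact standard form: iterate r = (1 - c) * P * r + c * e until convergence, where P is the column-normalized adjacency matrix, e is the indicator vector of the seed vertex, and c is the restart probability. A single-graph sketch (the paper runs a multilevel variant across the three subgraphs):

```python
import numpy as np

def random_walk_with_restart(W, seed, c=0.15, tol=1e-8, max_iter=1000):
    """Relevance of every vertex to `seed` on a weighted graph.
    W: (N, N) nonnegative adjacency matrix; c: restart probability."""
    P = W / np.maximum(W.sum(axis=0, keepdims=True), 1e-12)  # column-stochastic
    e = np.zeros(W.shape[0]); e[seed] = 1.0
    r = e.copy()
    for _ in range(max_iter):
        r_next = (1 - c) * P @ r + c * e
        if np.abs(r_next - r).sum() < tol:
            break
        r = r_next
    return r
```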