Arrow Research search

Author name cluster

Yu Ren

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

7 papers
2 author rows

Possible papers

EAAI Journal 2025 Journal Article

A lightweight model based on multi-scale feature fusion for ultrasonic welding surface defect detection

  • Rui Liu
  • Lun Zhao
  • Yu Ren
  • Zhonghua Shen
  • Liya Li
  • Jianfeng Luo
  • Zeshan Abbas

Ultrasonic welding technology is crucial in industrial and medical fields, relying on precise surface defect detection for quality assurance. Traditional methods suffer from low accuracy, low efficiency, high costs, and complex implementation. Additionally, current neural networks for ultrasonic surface defect detection struggle to balance parameter optimization with detection accuracy. To solve this problem, we propose a lightweight model based on multi-scale feature fusion, the Ultrasonic Weld Surface Defect Detection Network (UWSDNet). First, a feature extraction module with reparameterization technology (FRT) and efficient multi-scale attention (EMA) are proposed to reduce the redundant network parameters and computational overhead caused by the welding background. Second, the multi-core feature enhancement module (MCM) is introduced; it enhances multi-scale object detection with fewer parameters to suit the practical edge deployment of ultrasonic welding. Finally, the lightweight asymmetric detection head (LADH) and contextual and spatial feature calibration network (CSFCN) are introduced into the network to improve its multi-core dimensional feature capture capability and to address the large size span of ultrasonic welding surface defects. Experimental evaluations on a self-built ultrasonic welding wire harness defect dataset show that UWSDNet achieves a mean average precision (mAP) of 88.9% and a precision of 95.6% with 12.7M parameters. In addition, UWSDNet achieves excellent performance on the publicly available NEU-DET dataset, demonstrating strong generalization and application potential in industrial defect detection.

EAAI Journal 2025 Journal Article

A lightweight vision transformer with embedded hybrid attention for quick response code defect classification

  • Dianlu Hu
  • Lun Zhao
  • Yu Ren
  • Sen Wang
  • Xuanlin Ye
  • Haohan Zhang
  • Changqing Peng

Quick Response (QR) code label printing quality is crucial to product control. Automated visual inspection is challenging because defect samples are limited, defect features are unclear, and a large number of labels must be inspected in real time. For efficient and accurate automated visual defect recognition in printed QR code production, we propose a lightweight Vision Transformer network, Vision Transformer with Embedded Hybrid Attention (ViT-EHA). First, the Mixed Depthwise Convolution Block (MDConvBlock) is introduced to capture QR code defect details and feature information while also reducing the number of model parameters and the computational cost. Furthermore, the LeAttention-Local Convolution-Multilayer Perceptron (LeALCM) module is proposed to enhance the model's ability to capture global information and to improve the recognition of minor defects. Finally, a hybrid attention (HA) module is integrated to enhance the processing of low-level image features and to strengthen the interplay between shallow and deep features. Experiments verifying the validity and generalization of the model show that the proposed ViT-EHA achieves an accuracy of 99.00% with 4.198 million (M) parameters on the self-constructed dataset Code-10 (QR Code Dataset with 10 Classes), and accuracies of 98.33% and 97.73% on the public datasets NEU-CLS (Northeastern University Classification Dataset) and NEU-CLS-64 (Northeastern University Classification Dataset with 64 × 64 images), respectively.

IROS Conference 2025 Conference Paper

Learning Generalizable 3D Manipulation With 10 Demonstrations

  • Yu Ren
  • Yang Cong
  • Bohao Huang
  • Jiahao Long
  • Ronghan Chen
  • Hongbo Li
  • Huijie Fan

Learning robust and generalizable manipulation skills from few demonstrations remains a key challenge in robotics, with broad applications in industrial automation and service robotics. Although recent imitation learning methods have achieved impressive results, they often require a large amount of demonstration data and struggle to generalize across different spatial variants. In this work, we propose a framework that learns 3D manipulation policies from only 10 demonstrations while achieving robust generalization to unseen spatial configurations through semantic-guided perception and spatial-equivariant policy learning. Our framework consists of two key modules: a Semantic Guided Perception module that extracts task-aware 3D representations from RGB-D inputs using semantic priors and a Spatial Generalized Decision module implementing a diffusion-based policy that preserves spatial equivariance through denoising. Central to our framework is a spatially equivariant training strategy, which adapts 2D data augmentation principles to 3D manipulation by maintaining gripper-object spatial relationships during trajectory augmentation. We validate our framework through extensive experiments on both simulation benchmarks and real-world robotic systems. Our method demonstrates a significant improvement in success rates over state-of-the-art approaches on a series of challenging tasks, particularly under significant object pose variations. This work shows significant potential to advance efficient and generalizable manipulation skill learning in real-world applications.
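The idea of maintaining gripper-object spatial relationships during trajectory augmentation can be illustrated with a minimal sketch. It assumes (this is not stated in the abstract) that the augmentation is a single rigid motion, here a rotation about the vertical axis plus a translation, applied identically to the object points and the gripper waypoints. The function names are hypothetical and gripper orientation is omitted for brevity.

```python
import math


def rotate_z_translate(point, yaw, shift):
    """Rotate a 3D point about the z axis by `yaw` radians, then translate it."""
    x, y, z = point
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])


def augment_demo(object_points, gripper_positions, yaw, shift):
    """Apply the SAME rigid motion to the object and to the gripper trajectory.

    Because both are moved together, every gripper-to-object distance in the
    augmented demonstration equals the one in the original demonstration.
    """
    new_object = [rotate_z_translate(p, yaw, shift) for p in object_points]
    new_gripper = [rotate_z_translate(p, yaw, shift) for p in gripper_positions]
    return new_object, new_gripper
```

Transforming only the object or only the trajectory would break the demonstrated grasp geometry, which is why the sketch moves both with one transform.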

EAAI Journal 2024 Journal Article

A conditional generative model for end-to-end stress field prediction of composite bolted joints

  • Yong Zhao
  • Yuming Liu
  • Qingyuan Lin
  • Wei Pan
  • Wencai Yu
  • Yu Ren
  • Sheng Liu

Carbon Fiber Reinforced Polymer (CFRP) laminates, prized for their light weight and high stiffness, are extensively used in aerospace and maritime applications. Bolted joints play a crucial role in connecting these laminates. However, manufacturing variations arise during the assembly process, impacting performance due to material-related factors. Predicting the assembly stress fields of CFRP bolted joints is of great significance in design optimization, manufacturing process control, and structural health monitoring. The currently prevalent finite element analysis methods incur extremely high computational costs, failing to meet the requirements for real-time prediction of the assembly and multiparametric design of composite bolted joints. A methodological framework for rapidly predicting the assembly physical field is therefore necessary. This paper introduces a stress prediction framework to enhance analysis and aid material parameter design. The framework is inspired by image processing and AI-based image generation, analogizing the computed physical field results to generated images. Accordingly, the Bolted Tightening Generative Adversarial Network (BT-GAN), a cascaded generative model, is proposed in this paper to predict stress fields of composite bolted joints during assembly. The model starts with data augmentation of the stress field results from the finite element analysis in a super-resolution network, which realizes an integral interpolation mapping from coarse-grid to fine-grid results. Then, the results of the data enhancement are fed into the subsequent conditional generative adversarial network for learning. Similar to the text-guided image generation approach, the network learns to understand the physical mapping relationships between different parameters and assembly stress fields. Moreover, the network achieves higher accuracy in stress field prediction by extracting multi-scale features through the skip connection and the attention mechanism. This method effectively learns the physical mapping relationship between multiple parameters and the stress field, applying a graph generation approach to end-to-end predictions of the field. Compared to the coarse-grid finite element analysis results, the cascaded generative network proposed in this paper improves the Structural Similarity Index Measure (SSIM) from 0.584 to 0.962 and the Peak Signal-to-Noise Ratio (PSNR) from 17.3 dB to 58.2 dB. In addition, the mean relative error on the maximum values of the stress field is 6.9%. The trained model takes only 6.1 s to complete a single prediction, significantly improving the prediction efficiency compared with finite element analysis. It is compared with other networks commonly used for physical field prediction and shows improvement in the metrics proposed in the article. By constructing such an end-to-end stress field prediction framework during assembly, efficient forecasting for the assembly of composite bolted joints can be achieved. This is advantageous for the digital twin modeling of assembly lines and the effective control of assembly quality, providing a powerful tool for assembly design and analysis.
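For readers unfamiliar with the PSNR metric quoted above, a minimal sketch of the standard definition, PSNR = 10 · log10(R² / MSE), follows. The function name and the `data_range` parameter (the value range R of the field) are illustrative assumptions; the abstract does not specify how the authors normalize their stress fields.

```python
import math


def psnr(reference, prediction, data_range):
    """Peak Signal-to-Noise Ratio in dB between two equally sized 2D fields.

    `reference` and `prediction` are lists of rows of numbers;
    `data_range` is the assumed span of valid values (for example 1.0 for
    fields normalized to [0, 1]).
    """
    flat_ref = [v for row in reference for v in row]
    flat_pred = [v for row in prediction for v in row]
    if len(flat_ref) != len(flat_pred):
        raise ValueError("fields must have the same number of samples")
    # Mean squared error over all grid points.
    mse = sum((r - p) ** 2 for r, p in zip(flat_ref, flat_pred)) / len(flat_ref)
    if mse == 0:
        return math.inf  # identical fields
    return 10.0 * math.log10(data_range ** 2 / mse)
```

Because the scale is logarithmic, each 10 dB gain corresponds to a tenfold reduction in mean squared error for a fixed data range.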

ICRA Conference 2024 Conference Paper

Marrying NeRF with Feature Matching for One-step Pose Estimation

  • Ronghan Chen
  • Yang Cong
  • Yu Ren

Given the image collection of an object, we aim at building a real-time image-based pose estimation method, which requires neither its CAD model nor hours of object-specific training. Recent NeRF-based methods provide a promising solution by directly optimizing the pose from pixel loss between rendered and target images. However, during inference, they require a long convergence time and suffer from local minima, making them impractical for real-time robot applications. We aim at solving this problem by marrying image matching with NeRF. With 2D matches and depth rendered by NeRF, we directly solve the pose in one step by building 2D-3D correspondences between target and initial view, thus allowing for real-time prediction. Moreover, to improve the accuracy of 2D-3D correspondences, we propose a 3D consistent point mining strategy, which effectively discards unfaithful points reconstructed by NeRF. In addition, current NeRF-based methods that naively optimize pixel loss fail on occluded images. Thus, we further propose a 2D-match-based sampling strategy to preclude the occluded area. Experimental results on representative datasets show that our method outperforms state-of-the-art methods and improves inference efficiency by 90×, achieving real-time prediction at 6 FPS.
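The step of building 2D-3D correspondences from 2D matches and rendered depth can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: it assumes a pinhole camera with intrinsics `fx, fy, cx, cy`, integer pixel coordinates, and a depth map indexed as `depth_map[v][u]`; all function names are hypothetical. The resulting pairs would then be passed to a Perspective-n-Point solver to recover the pose.

```python
def backproject(pixel, depth, fx, fy, cx, cy):
    """Lift a pixel (u, v) with known depth to a 3D point in the camera frame."""
    u, v = pixel
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def build_2d3d_correspondences(matches, depth_map, fx, fy, cx, cy):
    """Pair 3D points from the initial view with 2D pixels in the target image.

    `matches` is a list of ((u_init, v_init), (u_target, v_target)) pixel pairs
    produced by any 2D image matcher; `depth_map[v][u]` is the depth rendered
    for the initial view. Matches without valid depth are skipped.
    """
    pairs = []
    for (u0, v0), target_pixel in matches:
        depth = depth_map[v0][u0]
        if depth <= 0:
            continue  # no reliable geometry at this pixel
        point_3d = backproject((u0, v0), depth, fx, fy, cx, cy)
        pairs.append((point_3d, target_pixel))
    return pairs
```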

EAAI Journal 2024 Journal Article

Transformer-based dual-view X-ray security inspection image analysis

  • Xianglong Meng
  • Hao Feng
  • Yu Ren
  • Haigang Zhang
  • Weidong Zou
  • Xinyu Ouyang

Artificial intelligence technology is rapidly advancing and has been widely applied in the field of intelligent security inspection. Utilizing computer vision technology to detect prohibited items in X-ray images has drawn much attention. Due to the transmission effect of X-rays, single-view security inspection images are prone to object occlusion and poor imaging angles, which seriously affects the performance of object detection models. Dual-view security inspection equipment can simultaneously capture X-ray transmission images of the item under inspection from both horizontal and vertical angles, which can effectively address issues of poor imaging angles and object occlusions that single-view imaging cannot resolve. In this paper, we introduce artificial intelligence technology into dual-view security inspection image analysis and propose a dual-view feature fusion and prohibited item detection model for X-ray security inspection images based on the Vision Transformer framework. The detection model contains two input channels: a main channel and a secondary channel. The main channel detects prohibited items in security inspection images, while the secondary channel is dedicated to providing effective feature information about prohibited items to the main channel. Two feature interaction modules are applied in the proposed model to realize dual-channel information exchange and supplementation from local and global perspectives, respectively. Simulation results based on the public Dualray dataset have demonstrated the state-of-the-art performance of the proposed dual-view X-ray image detection model. Code is available at https://github.com/zhg-SZPT/Trans2Ray.

EAAI Journal 2013 Journal Article

Superpixel-wise semi-supervised structural sparse coding classifier for image segmentation

  • Shuyuan Yang
  • Yuan Lv
  • Yu Ren
  • Licheng Jiao

The sparse coding based classifier (SCC) has been shown to achieve state-of-the-art results in pattern recognition. Compared with traditional generative models and discriminative models, it neither makes assumptions about the distribution of data nor learns a hyperplane to separate samples. However, SCC is characterized by slow prediction because an l0-norm minimization needs to be solved to assign the label for each sample. In this paper, we propose a Superpixel-wise Structural Sparse Coding based Classifier (S3CC) for image segmentation. An unsupervised superpixel segmentation is first used to derive the initial labeled samples, and SCC is extended to the semi-supervised setting, where unlabeled samples are incrementally labeled and added to the dictionary to improve the classification accuracy. Moreover, a neighborhood spatial constraint is imposed on the prediction of pixel labels to avoid speckle-like mis-segmentation of images. Experiments are conducted on artificial texture images to investigate the segmentation results of the proposed S3CC. Several aspects are tested, including (1) comparison of S3CC with SCC, (2) comparison of S3CC with and without the spatial constraint, and (3) comparison of S3CC with semi-supervised S3CC, and the results demonstrate the efficiency and superiority of S3CC over its counterparts.