Arrow Research

Author name cluster

Shengyong Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

25 papers
2 author rows

Possible papers (25)

AAAI Conference 2025 · Conference Paper

Can Students Beyond the Teacher? Distilling Knowledge from Teacher’s Bias

  • Jianhua Zhang
  • Yi Gao
  • Ruyu Liu
  • Xu Cheng
  • Houxiang Zhang
  • Shengyong Chen

Knowledge distillation (KD) is a model compression technique that transfers knowledge from a large teacher model to a smaller student model to enhance its performance. Existing methods often assume that the student model is inherently inferior to the teacher model. However, we identify that the fundamental issue affecting student performance is the bias transferred by the teacher. Current KD frameworks transmit both right and wrong knowledge, introducing bias that misleads the student model. To address this issue, we propose a novel strategy to rectify bias and greatly improve the student model's performance. Our strategy involves three steps: First, we differentiate knowledge and design a bias elimination method to filter out biases, retaining only the right knowledge for the student model to learn. Next, we propose a bias rectification method to rectify the teacher model's wrong predictions, fundamentally addressing bias interference. The student model learns from both the right knowledge and the rectified biases, greatly improving its prediction accuracy. Additionally, we introduce a dynamic learning approach with a loss function that updates weights dynamically, allowing the student model to first learn the easy tasks grounded in right knowledge and later tackle the hard tasks corresponding to biases, greatly enhancing the student model's learning efficiency. To the best of our knowledge, this is the first strategy enabling the student model to surpass the teacher model. Experiments demonstrate that our strategy, as a plug-and-play module, is versatile across various mainstream KD frameworks.
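
The rectification step can be made concrete with a short sketch. The following PyTorch fragment is an illustration of the idea only, not the authors' code: soft targets are distilled as usual where the teacher is correct, and where it is wrong its top-ranked logit is swapped with the true-class logit before distillation, so the student does not imitate the bias.

```python
import torch
import torch.nn.functional as F

def debiased_kd_loss(student_logits, teacher_logits, labels, T=4.0):
    """Illustrative distillation loss that filters teacher bias.

    Where the teacher is right, the student distills the soft teacher
    distribution; where the teacher is wrong, the teacher distribution is
    'rectified' by forcing its argmax onto the true label. A sketch of the
    idea only, not the paper's exact formulation.
    """
    with torch.no_grad():
        teacher_pred = teacher_logits.argmax(dim=1)
        wrong = teacher_pred.ne(labels)                 # biased predictions
        rectified = teacher_logits.clone()
        idx = torch.arange(teacher_logits.size(0), device=labels.device)
        top_vals = teacher_logits[idx, teacher_pred]
        true_vals = teacher_logits[idx, labels]
        # crude rectification: swap the wrongly top-ranked logit with the
        # true-class logit so the rectified argmax equals the label
        rectified[idx[wrong], labels[wrong]] = top_vals[wrong]
        rectified[idx[wrong], teacher_pred[wrong]] = true_vals[wrong]
    p_teacher = F.softmax(rectified / T, dim=1)
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T
```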

AAAI Conference 2025 · Conference Paper

Dust-Mamba: An Efficient Dust Storm Detection Network with Multiple Data Sources

  • Cong Bai
  • Zhonghao Lin
  • Jinglin Zhang
  • Shengyong Chen

Accurate detection of dust storms is challenging due to complex meteorological interactions. With the development of deep learning, deep neural networks have been increasingly applied to dust storm detection, offering better learning and generalization capabilities compared to traditional physical modeling. However, existing methods face some limitations, leading to performance bottlenecks in dust storm detection. From the task perspective, existing research focuses on occurrence detection while neglecting intensity detection. From the data perspective, existing research fails to explore the utilization of multi-source data. From the model perspective, most models are built on convolutional neural networks, which have an inherent limitation in capturing long-range dependencies. To address these challenges, this study proposes Dust-Mamba. To the best of our knowledge, this study is the first attempt to accomplish both the occurrence and intensity detection of dust storms with advanced deep learning technology. In Dust-Mamba, multi-source data is introduced to provide a comprehensive perspective, and Mamba and attention are applied to boost feature selection while maintaining long-range modeling capability. Additionally, this study proposes Structure Sharing Transfer Learning Strategies for intensity detection, which further enhance the performance of Dust-Mamba with minimal time cost. As shown by experiments, Dust-Mamba achieves Dice scores of 0.963 for occurrence detection and 0.560 for intensity detection, surpassing several baseline models. In conclusion, this study offers valuable baselines for dust storm detection, with significant reference value and promising application potential.
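
The structure-sharing transfer step lends itself to a small sketch. Below is a hedged PyTorch illustration of the general idea, assuming the occurrence-detection checkpoint stores a plain state_dict; the function name and loading convention are hypothetical, not the paper's code.

```python
import torch

def structure_sharing_init(intensity_model, occurrence_ckpt_path):
    """Initialize the intensity-detection model from occurrence-detection
    weights wherever layer names and shapes match (a sketch of the
    'structure sharing' idea; assumes the checkpoint is a plain state_dict)."""
    src = torch.load(occurrence_ckpt_path, map_location="cpu")
    dst = intensity_model.state_dict()
    shared = {k: v for k, v in src.items()
              if k in dst and v.shape == dst[k].shape}
    dst.update(shared)                       # copy only the shared structure
    intensity_model.load_state_dict(dst)
    return intensity_model
```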

ICRA Conference 2025 · Conference Paper

Multi-Scale Convolutional Networks with Class-Normalized Logit Clipping for Robust Sea State Estimation from Noisy Ship Motion Data

  • Xin Qin
  • Mengna Liu
  • Xu Cheng 0003
  • Xiufeng Liu 0001
  • Fan Shi 0001
  • Jianhua Zhang 0002
  • Shengyong Chen

Autonomous ships utilize automation systems to achieve unmanned navigation, driving innovation in maritime transportation. However, sea conditions are influenced by dynamic factors such as wave height, wind speed, and ocean currents, which makes them difficult to assess accurately. Traditional classification models often assume accurate labels, but noisy labels are prevalent in real-world applications. Existing methods, such as noise sample filtering or loss function adjustment, have limited applicability and poor generalization when dealing with complex sea condition data. To address this issue, this study proposes an end-to-end neural network model. The model's feature extraction module uses deep representation learning to capture latent patterns in the data, and a loss function is designed to mitigate the impact of outliers. The integration of these components allows the model to perform accurate classification even in the presence of noisy labels. Extensive experiments on public and sea condition datasets validate the effectiveness of this approach, demonstrating that the model exhibits strong generalization capabilities and holds great promise for practical applications.
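
The "class-normalized logit clipping" in the title suggests bounding logits before the loss so that a single mislabeled sample cannot dominate training. The following is a minimal generic sketch of logit-norm clipping; the mechanism choice is an assumption, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def clipped_ce_loss(logits, labels, tau=2.0):
    """Cross-entropy on norm-clipped logits: bounding the logit norm limits
    the gradient a single (possibly mislabeled) sample can induce. A generic
    logit-clipping sketch, not the paper's published loss."""
    norms = logits.norm(dim=1, keepdim=True).clamp(min=1e-12)
    scale = torch.clamp(tau / norms, max=1.0)   # shrink only logits with norm > tau
    return F.cross_entropy(logits * scale, labels)
```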

AAAI Conference 2024 · Conference Paper

Intentional Evolutionary Learning for Untrimmed Videos with Long Tail Distribution

  • Yuxi Zhou
  • Xiujie Wang
  • Jianhua Zhang
  • Jiajia Wang
  • Jie Yu
  • Hao Zhou
  • Yi Gao
  • Shengyong Chen

Human intention understanding in untrimmed videos aims to watch a natural video and predict what the person's intention is. Currently, the prediction of human intentions in untrimmed videos remains underexplored. On the one hand, untrimmed videos with mixed actions and backgrounds have a significant long-tail distribution with concept drift characteristics. On the other hand, most methods can only perceive instantaneous intentions, but cannot determine the evolution of intentions. To solve the above challenges, we propose a loss based on Instance Confidence and Class Accuracy (ICCA), which aims to alleviate the prediction bias caused by the long-tail distribution with concept drift characteristics in video streams. In addition, we propose an intention-oriented evolutionary learning method to determine the intention evolution pattern (from what action to what action) and the time of evolution (when the action evolves). We conducted extensive experiments on two untrimmed video datasets (THUMOS14 and ActivityNet v1.3), and our method achieves excellent results compared to SOTA methods. The code and supplementary materials are available at https://github.com/Jennifer123www/UntrimmedVideo.
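
The ICCA loss can be pictured as a cross-entropy reweighted by instance confidence and running per-class accuracy, so that hard instances of poorly predicted (tail) classes dominate. The sketch below is an illustrative guess at such a weighting, not the paper's published formula.

```python
import torch
import torch.nn.functional as F

def icca_loss(logits, labels, class_acc):
    """Instance-Confidence / Class-Accuracy weighted cross-entropy sketch.

    Up-weights instances of classes the model currently predicts poorly
    (tail classes under concept drift) and down-weights already-confident
    instances. `class_acc` is a [num_classes] tensor of running per-class
    accuracy maintained by the training loop. Illustrative only.
    """
    ce = F.cross_entropy(logits, labels, reduction="none")
    with torch.no_grad():
        probs = F.softmax(logits, dim=1)
        conf = probs.gather(1, labels.unsqueeze(1)).squeeze(1)  # instance confidence
        weight = (1.0 - conf) * (1.0 - class_acc[labels])       # hard + tail focus
    return (weight * ce).mean()
```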

JBHI Journal 2024 · Journal Article

Prediction of LncRNA-Protein Interactions Based on Kernel Combinations and Graph Convolutional Networks

  • Cong Shen
  • Dongdong Mao
  • Jijun Tang
  • Zhijun Liao
  • Shengyong Chen

The complexes of long non-coding RNAs bound to proteins can be involved in regulating life activities at various stages of organisms. However, in the face of the growing number of lncRNAs and proteins, verifying LncRNA-Protein Interactions (LPI) based on traditional biological experiments is time-consuming and laborious. Therefore, with the improvement of computing power, predicting LPI has encountered new development opportunities. Building on state-of-the-art works, a framework called LncRNA-Protein Interactions based on Kernel Combinations and Graph Convolutional Networks (LPI-KCGCN) is proposed in this article. We first construct kernel matrices from both lncRNA and protein sequence features, sequence similarity features, expression features, and gene ontology. We then reconstruct the kernel matrices as input to the next step. Combined with known LPI interactions, the reconstructed similarity matrices, which can be used as features of the topology map of the LPI network, are exploited to extract potential representations in the lncRNA and protein space using a two-layer Graph Convolutional Network. The predicted matrix can be finally obtained by training the network to produce scoring matrices w.r.t. lncRNAs and proteins. Different LPI-KCGCN variants are ensembled to derive the final prediction results, which are tested on balanced and unbalanced datasets. The 5-fold cross-validation shows that the optimal feature information combination on a dataset with 15.5% positive samples has an AUC value of 0.9714 and an AUPR value of 0.9216. On another highly unbalanced dataset with only 5% positive samples, LPI-KCGCN also outperformed the state-of-the-art works, achieving an AUC value of 0.9907 and an AUPR value of 0.9267.
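
The propagation at the heart of such a model is the standard two-layer GCN rule. Below is a minimal sketch over a similarity-matrix graph, assuming the usual normalized update H' = D^{-1/2}(A+I)D^{-1/2}HW; this is generic GCN code, not the LPI-KCGCN release.

```python
import torch
import torch.nn as nn

class TwoLayerGCN(nn.Module):
    """Two-layer GCN over a dense similarity-matrix graph (generic sketch
    of the propagation rule used by LPI-KCGCN-style models)."""
    def __init__(self, in_dim, hid_dim, out_dim):
        super().__init__()
        self.w1 = nn.Linear(in_dim, hid_dim, bias=False)
        self.w2 = nn.Linear(hid_dim, out_dim, bias=False)

    @staticmethod
    def normalize(adj):
        adj = adj + torch.eye(adj.size(0), device=adj.device)  # add self-loops
        d = adj.sum(dim=1).pow(-0.5)
        return d.unsqueeze(1) * adj * d.unsqueeze(0)           # D^-1/2 A D^-1/2

    def forward(self, adj, features):
        a = self.normalize(adj)
        h = torch.relu(a @ self.w1(features))
        return a @ self.w2(h)          # node embeddings for scoring LPI pairs
```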

ICRA Conference 2022 · Conference Paper

HMD-former: a Transformer-based Human Mesh Deformer with Inter-layer Semantic Consistency

  • Siyu Zou
  • Sheng Liu 0002
  • Chaonan Li
  • Lu Yao
  • Shengyong Chen

We present a transformer-based network, Human Mesh Deformer (HMD-former), to tackle the problem of 3D human mesh reconstruction from a single RGB image. HMD-former applies a pre-trained CNN to extract image grid features and a transformer decoder to gradually warp the template 3D mesh to the deformed mesh. On each decoder layer, the fine-grained local information of grid features is well utilized through cross-attention by softly and content-dependently transforming the grid features into vertex embeddings. Auxiliary losses and the proposed bi-directional mapping layers inherently ensure semantic consistency throughout the whole decoder, which frees the network from learning unnecessary embedding transformations between layers. This further induces each layer of the decoder to focus on refining vertex embeddings and makes the whole network work in a progressively refining manner. Experiments on the public datasets Human3.6M and 3DPW show better reconstruction accuracy and faster inference speed than previous state-of-the-art methods, demonstrating the effectiveness and generalizability of HMD-former. Code is publicly available at https://github.com/siyuzou/HMD-former.
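
One decoder layer of this design can be sketched directly with standard attention primitives: vertex embeddings self-attend, then query the flattened CNN grid features through cross-attention. This is a structural sketch under those assumptions, not the code at the linked repository.

```python
import torch
import torch.nn as nn

class VertexDecoderLayer(nn.Module):
    """Decoder layer in the spirit of HMD-former: vertex embeddings query
    image grid features via cross-attention, then are refined by an FFN.
    Dimensions and layout are illustrative."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(),
                                 nn.Linear(4 * dim, dim))
        self.n1, self.n2, self.n3 = (nn.LayerNorm(dim) for _ in range(3))

    def forward(self, vertices, grid_feats):
        # vertices:   [B, V, dim] template-mesh vertex embeddings
        # grid_feats: [B, HW, dim] CNN grid features flattened over space
        v = self.n1(vertices + self.self_attn(vertices, vertices, vertices)[0])
        v = self.n2(v + self.cross_attn(v, grid_feats, grid_feats)[0])
        return self.n3(v + self.ffn(v))
```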

ICRA Conference 2022 · Conference Paper

PA-AWCNN: Two-stream Parallel Attention Adaptive Weight Network for RGB-D Action Recognition

  • Lu Yao
  • Sheng Liu 0002
  • Chaonan Li
  • Siyu Zou
  • Shengyong Chen
  • Diyi Guan

Due to overly relying on appearance information or adopting direct static feature fusion, most existing multi-modality action recognition methods have poor robustness and insufficiently consider modality differences. To address these problems, we propose a two-stream adaptive weight integration network with a three-dimensional parallel attention module, PA-AWCNN. Firstly, a three-dimensional Parallel Attention (PA) module is proposed to effectively extract features along the spatial, temporal, and channel dimensions and reduce cross-dimensional interference, achieving better robustness. Secondly, a Common Feature-driven (CFD) feature integration module is proposed to dynamically integrate appearance and depth features with adaptive weights, utilizing modality differences to compensate for the deficiencies of each feature and thereby balancing the influence of both. The proposed PA-AWCNN uses the representative integrated feature generated by attention enhancement and feature integration for action recognition; it not only attains higher recognition accuracy but also improves the ability to distinguish similar actions. Experiments illustrate that the proposed method achieves comparable performance to state-of-the-art methods, obtaining accuracies of 92.76% and 95.65% on the NTU RGB+D Dataset and the SBU Kinect Interaction Dataset, respectively. The code is publicly available at: https://github.com/Luu-Yao/PA-AWCNN.
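
The adaptive-weight integration can be illustrated with a tiny gating module that predicts per-sample weights for the two streams from their concatenation. This is a simplified stand-in for the CFD module, with hypothetical dimensions.

```python
import torch
import torch.nn as nn

class AdaptiveWeightFusion(nn.Module):
    """Sketch of common-feature-driven fusion: predict per-sample weights
    for the RGB and depth streams, then blend. Illustrative only; the
    paper's CFD module is more elaborate."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, 2), nn.Softmax(dim=1))

    def forward(self, rgb_feat, depth_feat):
        # rgb_feat, depth_feat: [B, dim] pooled stream features
        w = self.gate(torch.cat([rgb_feat, depth_feat], dim=1))  # [B, 2]
        return w[:, :1] * rgb_feat + w[:, 1:] * depth_feat
```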

TIST Journal 2021 · Journal Article

Parallel Connected LSTM for Matrix Sequence Prediction with Elusive Correlations

  • Qi Zhao
  • Chuqiao Chen
  • Guangcan Liu
  • Qingshan Liu
  • Shengyong Chen

This article addresses a challenging problem called matrix sequence prediction, motivated by the application of taxi order prediction. Remarkably, the problem differs greatly from previous sequence prediction tasks in that the time-wise correlations are quite elusive; namely, distant entries can be strongly correlated while nearby entries are not necessarily related. Such distinct specifics make prevalent convolution-recurrence-based methods inadequate. To remedy this, we propose a novel architecture called Parallel Connected LSTM (PcLSTM), which integrates two new mechanisms, Multi-channel Linearized Connection (McLC) and Adaptive Parallel Unit (APU), into the framework of LSTM. Benefiting from the strengths of McLC and APU, our PcLSTM handles well both the elusive correlations within each timestamp and the temporal dependencies across different timestamps, achieving state-of-the-art performance in a set of experiments on synthetic and real-world datasets.
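
Although McLC and APU are not specified in detail here, the motivating idea (any entry may correlate with any other within a timestamp) can be sketched with a fully connected mixing layer feeding an LSTM. Layer names and shapes are hypothetical; this is not the paper's architecture.

```python
import torch
import torch.nn as nn

class MatrixSeqPredictor(nn.Module):
    """Sketch of the PcLSTM intuition: an all-pairs linear 'mixing' layer
    lets any matrix entry influence any other within a timestamp (since
    nearby entries need not be correlated), while an LSTM models temporal
    dependencies across timestamps. Not the paper's McLC/APU design."""
    def __init__(self, rows, cols, hidden=128):
        super().__init__()
        n = rows * cols
        self.mix = nn.Linear(n, n)           # all-pairs mixing per timestamp
        self.lstm = nn.LSTM(n, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n)

    def forward(self, x):                    # x: [B, T, rows*cols]
        h, _ = self.lstm(torch.relu(self.mix(x)))
        return self.head(h[:, -1])           # next matrix, flattened
```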

IROS Conference 2020 · Conference Paper

CalibRCNN: Calibrating Camera and LiDAR by Recurrent Convolutional Neural Network and Geometric Constraints

  • Jieying Shi
  • Ziheng Zhu
  • Jianhua Zhang 0002
  • Ruyu Liu
  • Zhenhua Wang 0003
  • Shengyong Chen
  • Honghai Liu 0001

In this paper, we present the Calibration Recurrent Convolutional Neural Network (CalibRCNN) to infer a 6-degrees-of-freedom (DOF) rigid body transformation between a 3D LiDAR and a 2D camera. Different from existing methods, our 3D-2D CalibRCNN not only uses an LSTM network to extract temporal features between the 3D point clouds and RGB images of consecutive frames, but also uses the geometric loss and photometric loss obtained from the inter-frame constraint to refine the calibration accuracy of the predicted transformation parameters. CalibRCNN aims at inferring the correspondence between the projected depth image and the RGB image to learn the underlying geometry of 2D-3D calibration. Thus, the proposed calibration model achieves good generalization, adapting to unknown initial calibration error ranges and to other 3D LiDAR and 2D camera pairs with intrinsic parameters different from those of the training dataset. Extensive experiments have demonstrated that our CalibRCNN achieves state-of-the-art accuracy in comparison with other CNN-based methods.

ICRA Conference 2019 · Conference Paper

Modeling and Analysis of Motion Data from Dynamically Positioned Vessels for Sea State Estimation

  • Xu Cheng 0003
  • Guoyuan Li
  • Robert Skulstad
  • Shengyong Chen
  • Hans Petter Hildre
  • Houxiang Zhang

Developing a reliable model to identify the sea state is significant for autonomous ships. This paper introduces a novel deep neural network model (SeaStateNet) to estimate the sea state from the ship motion data of dynamically positioned vessels. SeaStateNet mainly consists of three components: a Long Short-Term Memory (LSTM) recurrent neural network to capture long-term dependencies in the ship motion data; a convolutional neural network (CNN) to extract time-invariant features; and a Fast Fourier Transform (FFT) block to extract frequency features. A feature fusion layer is designed to learn the degree to which each component contributes. The proposed model is applied directly to the raw time series data, without the need for any hand-engineered features. A sensitivity analysis (SA) method is applied to assess the influence of data preprocessing. Through benchmark tests and experiments on a ship motion dataset, SeaStateNet is verified to be effective for sea state estimation. A real-time test further shows the practicality of the proposed model.
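
The three-branch layout is straightforward to sketch. Below is an illustrative PyTorch version with made-up dimensions: an LSTM branch, a 1-D CNN branch, and an FFT-magnitude branch, concatenated and passed through a linear layer standing in for the learned fusion.

```python
import torch
import torch.nn as nn

class SeaStateNetSketch(nn.Module):
    """Three-branch sketch of SeaStateNet: LSTM for long-term dependencies,
    1-D CNN for time-invariant features, FFT magnitudes for frequency
    features. Dimensions are illustrative, not the paper's."""
    def __init__(self, channels, seq_len, n_states, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(channels, hidden, batch_first=True)
        self.cnn = nn.Sequential(nn.Conv1d(channels, hidden, 5, padding=2),
                                 nn.ReLU(), nn.AdaptiveAvgPool1d(1))
        self.fft_fc = nn.Linear(channels * (seq_len // 2 + 1), hidden)
        self.fuse = nn.Linear(3 * hidden, n_states)

    def forward(self, x):                        # x: [B, T, C] raw motion data
        h_lstm, _ = self.lstm(x)
        f_lstm = h_lstm[:, -1]                   # last hidden state
        f_cnn = self.cnn(x.transpose(1, 2)).squeeze(-1)
        spec = torch.fft.rfft(x, dim=1).abs()    # [B, T//2+1, C] magnitudes
        f_fft = torch.relu(self.fft_fc(spec.flatten(1)))
        return self.fuse(torch.cat([f_lstm, f_cnn, f_fft], dim=1))
```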

IROS Conference 2019 · Conference Paper

Robust High Accuracy Visual-Inertial-Laser SLAM System

  • Zengyuan Wang
  • Jianhua Zhang 0002
  • Shengyong Chen
  • Conger Yuan
  • Jingqian Zhang
  • Jianwei Zhang 0001

In recent years, many excellent works on visual-inertial SLAM and laser-based SLAM have been proposed. Although an inertial measurement unit (IMU) significantly improves motion estimation performance by reducing the impact of illumination variation or texture-less regions on visual tracking, tracking failures still occur when the system stays in such an environment for a long time. Similarly, in structure-less environments, the laser module will fail owing to the lack of sufficient geometric features. Besides, motion estimation by a moving lidar suffers from distortion, since range measurements are received continuously. To solve these problems, we propose a robust and high-accuracy visual-inertial-laser SLAM system. The system starts with a visual-inertial tightly-coupled method for motion estimation, followed by scan matching to further optimize the estimation and register the point cloud on the map. Furthermore, we enable the modules to be adjusted automatically and flexibly: when one of these modules fails, the remaining modules undertake the motion-tracking task. To further improve accuracy, loop closure and proximity detection are implemented to eliminate drift accumulation. When a loop or proximity is detected, we perform six degree-of-freedom (6-DOF) pose graph optimization to achieve global consistency. The performance of our system is verified on public datasets, and the experimental results show that the proposed method achieves superior accuracy against other state-of-the-art algorithms.

ICRA Conference 2017 · Conference Paper

Compressive tracking with locality sensitive histograms features

  • Sixian Chan 0001
  • Xiaolong Zhou 0001
  • Zhuo Zhang 0012
  • Shengyong Chen

Currently, the Compressive Tracking (CT) method has drawn great attention because of its high efficiency. However, it cannot deal well with some appearance variations due to the limitations of its feature expression, and it uses only a fixed parameter to update the appearance model. To handle these issues, we propose an adaptive CT method that combines the predicted target position with CT based on Locality Sensitive Histograms (LSH) features. Our method significantly improves CT in four aspects. First, efficient illumination-invariant features extracted based on LSH are used to represent an effective appearance model that is robust to illumination changes. Second, the color attributes tracker is adopted to predict the target position for re-building the new weighted discriminant function, which brings in color information to make up for the inadequacy of Haar-like features. Third, a new model update mechanism is proposed to preserve stable features while avoiding noisy appearance variations during tracking. Fourth, a trajectory rectification method is employed to refine the tracking location when possibly inaccurate tracking occurs. Finally, we show that our tracker achieves state-of-the-art performance in a comprehensive evaluation over 47 challenging color sequences.
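
Locality sensitive histograms have a compact recursive form: per-pixel histograms whose contributions decay exponentially with distance, computed with one causal and one anti-causal sweep (He et al.). Below is a 1-D NumPy sketch of that computation, assuming intensities in [0, 1]; the tracker uses the 2-D version.

```python
import numpy as np

def locality_sensitive_histograms(intensity, n_bins=8, alpha=0.9):
    """1-D sketch of locality sensitive histograms: at each pixel p,
    H_p(b) = sum_q alpha^|p-q| * [pixel q falls in bin b], computed in
    O(n * n_bins) with two recursive sweeps. Illustrative only."""
    n = len(intensity)
    bins = np.clip((intensity * n_bins).astype(int), 0, n_bins - 1)
    onehot = np.zeros((n, n_bins))
    onehot[np.arange(n), bins] = 1.0
    left = np.zeros_like(onehot)
    right = np.zeros_like(onehot)
    left[0] = onehot[0]
    for p in range(1, n):                      # causal sweep
        left[p] = onehot[p] + alpha * left[p - 1]
    right[-1] = onehot[-1]
    for p in range(n - 2, -1, -1):             # anti-causal sweep
        right[p] = onehot[p] + alpha * right[p + 1]
    return left + right - onehot               # pixel p counted exactly once
```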

IROS Conference 2012 · Conference Paper

Constructing dynamic category hierarchies for novel visual category discovery

  • Jianhua Zhang 0002
  • Jianwei Zhang 0001
  • Shengyong Chen
  • Ying Hu 0001
  • Haojun Guan

Category hierarchies are commonly used to compactly represent large numbers of categories and reduce the complexity of the classification problem. In this paper we introduce a novel and extended application of category hierarchies: a powerful framework for constructing dynamic category hierarchies and automatically discovering novel visual categories. This dynamic quality of category hierarchies facilitates an important cognitive ability, the discovery of novel categories. We develop a constrained hierarchical latent Dirichlet allocation to build accurate category hierarchies. We employ object attributes as features to describe objects, which can transfer knowledge across categories and can efficiently describe novel categories. By combining them in the novel framework, novel visual object categories can be efficiently discovered and described. Extensive experiments based on PASCAL VOC 2008 and the LabelMe image database show the satisfactory performance of the proposed framework.

IROS Conference 2011 · Conference Paper

Integrate multi-modal cues for category-independent object detection and localization

  • Jianhua Zhang 0002
  • Junhao Xiao 0001
  • Jianwei Zhang 0001
  • Houxiang Zhang
  • Shengyong Chen

Detecting and localizing objects is an indispensable step for many computer vision tasks. Most state-of-the-art methods of object detection and localization are category-dependent. These methods can achieve significant performance. However, they are useless for detecting and localizing objects belonging to an unknown category when applied to an unknown environment. In this paper, a method is proposed for detecting and localizing generic objects without specifying their categories. The proposed method combines diverse cues, including multi-scale saliency, superpixel straddling, intensity, depth, and global information, in a uniform Bayesian framework to obtain accurate detection and localization. In comparison with state-of-the-art methods, our experiments show the promising performance of the proposed method on the PASCAL VOC 08 dataset and our indoor scene dataset.
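
A naive-Bayes-flavored reading of "uniform Bayesian framework" is to treat each cue score as an independent likelihood and combine them in log-odds space. The sketch below illustrates that generic style of multi-cue fusion; it is an assumption about the style of combination, not the paper's model.

```python
import numpy as np

def fuse_cues(cue_scores, weights=None):
    """Naive-Bayes-style fusion sketch: treat each cue score in (0,1) as an
    independent likelihood that a window contains an object and combine them
    in log-odds space. Generic illustration only."""
    s = np.clip(np.asarray(cue_scores, dtype=float), 1e-6, 1 - 1e-6)
    w = np.ones_like(s) if weights is None else np.asarray(weights, float)
    log_odds = (w * (np.log(s) - np.log1p(-s))).sum()
    return 1.0 / (1.0 + np.exp(-log_odds))     # fused posterior

# e.g. saliency, superpixel straddling, intensity, depth, global cues:
print(fuse_cues([0.8, 0.7, 0.55, 0.6, 0.65]))
```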

ICRA Conference 2007 · Conference Paper

Active Illumination for Robot Vision

  • Shengyong Chen
  • Jianwei Zhang 0001
  • Houxiang Zhang
  • Wanliang Wang
  • You-Fu Li 0001

A vision sensor is the robot's eye to perceive its environment, but the perception performance can be significantly affected by illumination conditions. This paper presents strategies of adaptive illumination control for robot vision to achieve the best scene interpretation. It investigates how to obtain the most comfortable illumination conditions for a vision sensor. In a "comfort" condition the image reflects the natural properties of the concerned object. "Discomfort" may occur if some scene information is lost. Strategies are proposed to optimize the pose and optical parameters of the luminaire and the sensor, with emphasis on controlling the intensity and avoiding glare.

ICRA Conference 2007 · Conference Paper

Realtime Structured Light Vision with the Principle of Unique Color Codes

  • Shengyong Chen
  • You-Fu Li 0001
  • Jianwei Zhang 0001

To date, several successful structured light vision systems for accurate 3D measurement in machine vision have been set up. However, these are usually limited to scanning stationary objects or static environments, since tens of images have to be captured to recover one 3D scene, which has led industry largely to avoid this technology. This paper presents a method of grid-pattern design based on the principles of uniquely color-encoded structured light, to improve reconstruction efficiency for real-time processing. For a live scene, the 3D measurement should ideally require capturing only a single image. To realize this, an important problem for the color-encoded projection is the unique indexing of the color codes in the image. It is essential that each light grid be uniquely identified by incorporating the local neighborhoods in the pattern, so that 3D reconstruction can be performed with only local analysis of a single image. This paper describes such a method in the design of the special grid patterns and its corresponding 3D reconstruction method for fast vision perception.
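
The key property of such patterns is that every local neighborhood of color codes is unique, so one local observation fixes its grid position. A small sketch of the uniqueness test follows; the constructive design step that produces such patterns is not shown, and the toy random pattern is only for illustration.

```python
import random

def has_unique_windows(grid, k=3):
    """Check the property behind uniquely color-encoded patterns: every
    k-by-k window of color codes appears at most once, so a single local
    observation identifies its position in the projected grid."""
    seen = set()
    rows, cols = len(grid), len(grid[0])
    for r in range(rows - k + 1):
        for c in range(cols - k + 1):
            window = tuple(tuple(grid[r + i][c + j] for j in range(k))
                           for i in range(k))
            if window in seen:
                return False
            seen.add(window)
    return True

# toy pattern with 7 color codes; real designs construct uniqueness
# deliberately rather than relying on randomness:
random.seed(0)
pattern = [[random.randrange(7) for _ in range(20)] for _ in range(15)]
print(has_unique_windows(pattern))
```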

IROS Conference 2007 · Conference Paper

Runtime reconfiguration of a modular mobile robot with serial and parallel mechanisms

  • Houxiang Zhang
  • Shengyong Chen
  • Wanliang Wang
  • Jianwei Zhang 0001
  • Guanghua Zong

This paper presents a novel field robot, JL-I, based on a reconfigurable concept for urban search and rescue applications. The robot consists of three identical modules; each module is an entire robotic system that can perform distributed activities. It features three-degrees-of-freedom (DOF) active joints actuated by serial and parallel mechanisms for changing shape, as well as a flexible docking mechanism. The docking mechanism enables adjacent modules to connect or disconnect flexibly and automatically. DOF analysis, workspace analysis, and the kinematics of the 3D active joint between connected modules are studied thoroughly. Finally, a series of successful tests confirms the principles and the robot's capabilities.

IROS Conference 2006 · Conference Paper

A Focal Cue for Metric Measurement of 3D Surfaces

  • Shengyong Chen
  • You-Fu Li 0001
  • Jianwei Zhang 0001

This paper presents a method for computing the best-focused location in an image and using it as a dimensional cue for acquiring a 3D scene surface. In some situations in 3D vision, an object cannot be reconstructed into a 3D model with metric dimensions. Rather, it can only be reconstructed into a 3D structure up to a similarity transformation. To upgrade the 3D model from a similarity transformation to a Euclidean transformation, we propose a method based on the best-focused locations. By analyzing the blur distribution in an image, this method finds the best-focused locations, which provide an additional cue for upgrading the reconstructed 3D structure. Hence, we can obtain not only the object's shape, but also the dimensions and sizes of surface features.
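
A standard way to score focus, which can stand in for the blur analysis described above, is the local variance of a Laplacian response. The NumPy sketch below uses that conventional measure, not the paper's specific operator.

```python
import numpy as np

def focus_map(image, patch=9):
    """Per-patch focus measure sketch (variance of a Laplacian response):
    the highest-scoring patch approximates the best-focused location used
    as a metric cue. A standard measure, illustrative only."""
    lap = (-4 * image
           + np.roll(image, 1, 0) + np.roll(image, -1, 0)
           + np.roll(image, 1, 1) + np.roll(image, -1, 1))
    h, w = image.shape
    ph, pw = h // patch, w // patch
    blocks = lap[:ph * patch, :pw * patch].reshape(ph, patch, pw, patch)
    return blocks.var(axis=(1, 3))             # [ph, pw] score per patch

img = np.random.rand(90, 90)                   # stand-in for a real image
fm = focus_map(img)
print("best-focused patch:", np.unravel_index(np.argmax(fm), fm.shape))
```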

ICRA Conference 2004 · Conference Paper

Active Viewpoint Planning for Model Construction

  • Shengyong Chen
  • You-Fu Li 0001

This paper presents a novel method of viewpoint planning for incrementally building models of unknown objects or environments with an active vision system. The proposed method is based on the model of a trend surface, the regional feature of a surface that describes its global tendency of change. A new mathematical model is developed for predicting the unknown area of the object surface. A unique surface model is established by analyzing the surface curvature. Furthermore, a criterion is defined to determine the exploration direction. An algorithm is developed for determining the next view pose, which satisfies placement constraints such as resolution, focus, and field of view. Finally, the method is implemented to verify the proposed approach.
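
Trend-surface prediction has a classical least-squares form: fit a low-degree polynomial z = f(x, y) to the scanned region and extrapolate it over the unseen area. The following is a generic sketch of that step with illustrative data; the paper's surface model differs in detail.

```python
import numpy as np

def fit_trend_surface(x, y, z, degree=2):
    """Least-squares polynomial trend surface sketch: fit z = f(x, y) to the
    scanned region, then extrapolate to predict the unseen surface when
    choosing the next viewing direction. Generic trend-surface fitting."""
    terms = [(i, j) for i in range(degree + 1)
             for j in range(degree + 1 - i)]          # total degree <= degree
    A = np.stack([x**i * y**j for i, j in terms], axis=1)
    coef, *_ = np.linalg.lstsq(A, z, rcond=None)

    def predict(xq, yq):
        Aq = np.stack([xq**i * yq**j for i, j in terms], axis=1)
        return Aq @ coef
    return predict

# fit on a scanned patch, extrapolate just beyond its border:
x, y = np.random.rand(200) * 2, np.random.rand(200) * 2
z = 0.5 * x**2 - 0.3 * x * y + y + np.random.randn(200) * 0.01
predict = fit_trend_surface(x, y, z)
print(predict(np.array([2.2]), np.array([1.0])))
```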

ICRA Conference 2003 · Conference Paper

Dynamically reconfigurable visual sensing for 3D perception

  • Shengyong Chen
  • You-Fu Li 0001

In many applications, a vision sensor often needs to move from one place to another and change its configuration to perceive different object features. A dynamically reconfigurable vision sensor is useful in such cases to gaze at the features. This paper introduces this concept and investigates the issues in self-recalibrating a 6-DOF structured light system under changing sensing configurations. The relative pose between the projector and camera of the system is calibrated by taking a single view of the scene, so that 3D measurement and reconstruction can be performed immediately when and if the configuration of the system is changed. Experiments were carried out to demonstrate the implementation of the proposed method.

ICRA Conference 2002 · Conference Paper

A Method of Automatic Sensor Placement for Robot Vision in Inspection Tasks

  • Shengyong Chen
  • You-Fu Li 0001

This paper presents an automatic sensor placement technique for robot vision in inspection tasks. In such vision systems, a sensor often needs to be moved from one pose to another around the object to sample all features of interest, and multiple 3D images are taken from different vantage points. The technique involves deciding the optimal sensor placements and a shortest path through these viewpoints for automatic generation of an inspection plan. A viewpoint is expressed by N parameters, and a topology of viewpoints is achieved by a genetic algorithm. The inspection plan is evaluated using a min-max criterion, and the shortest path is determined by the Christofides algorithm. In addition, a computational example is presented to illustrate the techniques and algorithms.
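
The route through the chosen viewpoints is a travelling-salesman-style subproblem. The paper determines it with the Christofides algorithm; the sketch below substitutes a simple nearest-neighbor heuristic just to show the shape of the planning step, without its approximation guarantee.

```python
import numpy as np

def viewpoint_tour(points):
    """Order viewpoints into a short inspection route. Nearest-neighbor
    heuristic sketch standing in for the Christofides algorithm the paper
    uses; illustrative only."""
    pts = np.asarray(points, float)
    unvisited = set(range(1, len(pts)))
    tour = [0]                                 # start at the first viewpoint
    while unvisited:
        last = pts[tour[-1]]
        nxt = min(unvisited, key=lambda i: np.linalg.norm(pts[i] - last))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour

print(viewpoint_tour([(0, 0, 1), (2, 1, 1), (1, 3, 2), (4, 0, 1)]))
```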

ICRA Conference 2002 · Conference Paper

Self Recalibration of a Structured Light Vision System from a Single View

  • Shengyong Chen
  • You-Fu Li 0001

Structured-light systems are widely used for reconstructing 3D objects in machine vision. One of the major tasks in establishing such a system is the laborious and tedious calibration of the sensors. This paper presents a new method that dynamically recalibrates the system automatically, if and when the relative pose between the camera and the projector is changed. A distinct advantage of this method is that neither the design of a calibration pattern/device nor prior knowledge of the movement of the camera or scene is required. Several important cues for self-recalibration, including a geometrical cue and a focus cue, are explored in this paper. Finally, some experimental observations are presented to illustrate the implementation of this new method.