Author name cluster

Chao Sun

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

16 papers

2 author rows

EAAI Journal 2026 Journal Article

Asynchronous multithreading reinforcement learning with attention-based significance measurement for collision-free robot navigation

Chao Sun
Jiang Wang
Xing Wu
Chaoxu Mu
Changyin Sun

Collision avoidance is one crucial technique to achieve safe and efficient robotic vehicle navigation in unknown environments. However, unpredictable moving obstacles in dynamic scenarios usually increase the difficulty and complexity of collision avoidance for robotic vehicles. To enhance the stability of collision avoidance and boost its adaptability to uncertain dynamic scenes, a new attention-based significance measurement actor–critic (ASMAC) architecture is proposed. It is an end-to-end robot navigation model that uses imperfect local observation to directly plan precise collision-free motion commands. Firstly, a significance-measured rollout replay buffer (SMRR) is presented to categorize the experiences into different pools. It can prevent any overfitting or bias that may result from repeatedly sampling experience of a certain type during policy learning. Then, we enhance the traditional actor–critic network by integrating a multi-head local attention module to extract the local information at entity level. This way, the collision avoidance system can focus on key environmental features to reduce computation and respond more swiftly to dynamic changes in the environment. Besides, a multi-step lookahead prediction (MLP) reward function is designed in the ASMAC-based reinforcement learning (RL) framework to prevent the generation of unnatural, intrusive, and short-sighted motion decisions. Finally, the asynchronous multithreading (AM) mechanism and proximal policy optimization (PPO) algorithm are extended to the ASMAC model to offload the expensive online computation to an offline training process, enhancing the exploration efficiency in navigation policy learning of robotic vehicles. Extensive simulation and real-world physical experiments show that our method can generate time-efficient and collision-free guide paths in complex dynamic scenes, successfully dodging collisions while moving towards the goal.

EAAI Journal 2026 Journal Article

Dynamic estimation of mean skin temperature during physical exercises using an explainable attention-based neural network: A data-driven alternative to segmental weighting methods

Qing Zhang
Hetian Feng
Li Ding
Tian Liu
Chao Sun
Jing Zhang
Jiachen Nie

Background: Mean skin temperature (MST) is a crucial physiological parameter of the human body. Traditionally, MST has been estimated using fixed-weight formulas based on body surface area proportions. However, these static methods neglect physiological changes during exercise and may not generalize well across activity types. Methods: We developed ThermoAttention Network (TANet), a deep learning model combining long short-term memory networks with an attention mechanism. The model processes local skin temperature and dynamically assigns weights to eight body segments across resting, weighted hiking, and heavy lifting conditions via a proxy classification task. The attention outputs indicate each segment's contribution to MST estimation, enhancing transparency and user trust. Results: TANet achieved 94.2% classification accuracy on the proxy task. Different exercise conditions had significant effects on local skin temperature. The attention weights (mean values across windows and subjects) revealed physiologically consistent patterns: the hand dominated MST estimation at rest (21.1%); the upper arm (17.0%) and forearm (11.2%) gained importance during hiking; and the chest (15.3%) and upper arm (15.6%) during lifting. Conclusions: TANet overcomes the limitations of traditional MST formulas by adaptively assigning segment weights, enabling accurate and interpretable assessment of MST during physical activity. This framework advances explainable artificial intelligence (AI) for thermal health monitoring and supports human-centered design of wearable systems.

JBHI Journal 2026 Journal Article

MsGA: Gestational Age Estimation with Multi-plane Unified Measurements Driven by Anatomic Segmentation

Mingjun Huang
Junbo Zhang
Wei Hu
Chao Sun
Xiantao Cai
Bo Du

An accurate estimation of gestational age is critical for prenatal care and clinical decision-making. Existing ultrasound-based gestational age estimation methods are limited by the insufficient information representation capacity of conventional medical segmentation models, noise interference in ultrasound images, and inter-observer variability in traditional geometry-based measurement methods. To address these challenges, we propose the MsGA model to estimate gestational age with multi-plane unified measurements driven by anatomic segmentation. In the anatomic segmentation stage, a lightweight and high-performance LGF-UNet module is proposed, which utilizes the Deep Patch Embedding module to expand the receptive field, the Local-Global Fusion Transformer block to enhance local-global feature fusion, and the Focusing Attention Bottleneck module to suppress ultrasound noise via an adaptive threshold. In the measurement stage, a Point Regression module is introduced to refine biometric landmark localization. Furthermore, we create a fully annotated ultrasound plane dataset for the estimation of gestational age across various gestational stages. Extensive experiments on the dataset have demonstrated the effectiveness of the whole model and each module. Our MsGA model is superior to existing models with fewer parameters and achieves state-of-the-art performance on the Gestational Age Estimation task.

AAAI Conference 2026 Conference Paper

OmniDPO: A Preference Optimization Framework to Address Omni-Modal Hallucination

Junzhe Chen
Tianshu Zhang
Shiyu Huang
Yuwei Niu
Chao Sun
Rongzhou Zhang
Guanyu Zhou
Lijie Wen

Recently, Omni-modal large language models (OLLMs) have sparked a new wave of research, achieving impressive results in tasks such as audio-video understanding and real-time environment perception. However, hallucination issues still persist. Similar to the bimodal setting, the priors from the text modality tend to dominate, leading OLLMs to rely more heavily on textual cues while neglecting visual and audio information. In addition, fully multimodal scenarios introduce new challenges. Most existing models align visual or auditory modalities with text independently during training, while ignoring the intrinsic correlations between video and its corresponding audio. This oversight results in hallucinations when reasoning requires interpreting hidden audio cues embedded in video content. To address these challenges, we propose OmniDPO, a preference-alignment framework designed to mitigate hallucinations in OLLMs. Specifically, OmniDPO incorporates two strategies: (1) constructing text-preference sample pairs to enhance the model’s understanding of audio-video interactions; and (2) constructing multimodal-preference sample pairs to strengthen the model’s attention to visual and auditory information. By tackling both challenges, OmniDPO effectively improves multimodal grounding and reduces hallucination. Experiments conducted on two OLLMs demonstrate that OmniDPO not only effectively mitigates multimodal hallucinations but also significantly enhances the models' reasoning capabilities across modalities.

EAAI Journal 2025 Journal Article

A lightweight method for identifying safflower picking points based on the integration of object detection and instance segmentation networks

He Zhang
Chao Sun
Yun Ge
Baojian Ma
Hao Xia

Challenged by environmental complexities such as branch obstructions, target density variations, and hardware limitations, safflower detection and localization algorithms require innovative solutions. This study proposes a novel selective harvesting framework that integrates model structure optimization, channel pruning, and knowledge distillation, along with a picking point recognition strategy based on the fusion of detection and segmentation information, achieving lightweight and high-precision picking point recognition. The methodology features three key innovations: (1) An improved feature extractor that enhances sensitivity to safflower morphology in cluttered environments, and when integrated with a multi-scale fusion network, it further improves small-target detection; (2) Channel pruning combined with knowledge distillation achieves a 94.3% reduction in parameters while recovering accuracy losses; (3) Cross-modal fusion of detection and segmentation outputs enables accurate picking point localization. Experiments demonstrate significant improvements over the original You Only Look Once (YOLO) model: the mean average precision (mAP) in detection and segmentation tasks reach 93.9% and 92.9%, respectively, while retaining merely 5.6% of original parameters and 7.0% of baseline model size. The picking point identification method achieves an accuracy of 96.7% in identifying flowering and decaying safflowers. Ultimately, the model is successfully deployed on embedded computing devices, providing a reliable theoretical basis and technical support for automated harvesting of safflower.

EAAI Journal 2025 Journal Article

A self-attention enhanced generative adversarial autoencoder for anomaly detection of self-discharge of lithium-ion batteries

Zhengyu Liu
Chuanqing Wu
Tong Wu
Chao Sun
Yining Liu

Self-discharge is a crucial parameter affecting the reliability and lifespan of lithium-ion batteries (LIBs). However, traditional methods for detecting self-discharge rely heavily on time-consuming experimental testing. To address this limitation, we propose a novel model, self-attention enhanced generative adversarial autoencoder (SAE-GAAE), for rapid and accurate LIB self-discharge anomaly detection. SAE-GAAE integrates recent advances in artificial intelligence into battery production scenarios by combining a dot-product self-attention mechanism within the encoder (which captures inter-feature dependencies and highlights key indicators) with a generative adversarial component in the latent space, which enhances generalization and robustness by regularizing feature representations. This end-to-end deep learning framework enables automatic extraction of informative representations from raw input data without relying on manual feature engineering. Moreover, a tree-structured Parzen estimator (TPE)-based Bayesian optimization algorithm is employed to efficiently fine-tune model hyperparameters, improving detection performance. Applied to the capacity grading stage in LIB production, the model uses 40 features – including voltage, current, capacity, and temperature – extracted from charge–discharge curves. Experimental evaluation on real-world production data demonstrates that SAE-GAAE achieves a 26.27% improvement in the average area under the receiver operating characteristic curve (AUC-ROC) in four models based on the autoencoder, with a detection accuracy of 99.05% and a recall rate of 100%. These results highlight the model’s practical value in enhancing battery screening efficiency while reducing reliance on long-duration standing tests.

EAAI Journal 2025 Journal Article

Complex system modeling using deviation-smoothing belief rule base with training and optimization

Chao Sun
Jiahao Mai
Wei He
Hailong Zhu
Qi Liu

In the operation of complex systems, damage caused by aging and other factors can significantly impact their safety. Consequently, rational modeling approaches are essential for ensuring the stable and reliable operation of such systems. Due to their intricate internal structures and the influence of external environments, models of complex systems confront numerous uncertainties. The belief rule base (BRB), as an effective artificial intelligence (AI) tool for knowledge representation and reasoning, is capable of handling uncertain information through data-driven and knowledge-driven approaches, making it suitable for complex system modeling. However, existing BRB modeling approaches consider only a single interval when calculating rule matching degrees, which can lead to the issue of zero activation of rules; additionally, some rules may lack relevance and rationality in actual systems, thereby increasing model complexity and affecting performance due to redundancy. Therefore, a complex system modeling approach based on a deviation-smoothing belief rule base (DS-BRB) is proposed. Firstly, this approach employs a matching degree calculation that integrates bias computation with smoothing techniques, enabling effective rule activation over a broader range of intervals. Subsequently, a specialized rule reduction approach designed for rule-level analysis is utilized, iteratively reducing rules based on their weights. Finally, the proposed methodology is evaluated using pipeline leakage detection and various public benchmark datasets. The experiments demonstrate that DS-BRB resolves zero activation, achieves higher modeling accuracy with a reduced rule base, and highlights BRB's potential as an interpretable AI tool for complex system modeling.

NeurIPS Conference 2025 Conference Paper

Conditional Representation Learning for Customized Tasks

Honglin Liu
Chao Sun
Peng Hu
Yunfan Li
Xi Peng

Conventional representation learning methods learn a universal representation that primarily captures dominant semantics, which may not always align with customized downstream tasks. For instance, in animal habitat analysis, researchers prioritize scene-related features, whereas universal embeddings emphasize categorical semantics, leading to suboptimal results. As a solution, existing approaches resort to supervised fine-tuning, which however incurs high computational and annotation costs. In this paper, we propose Conditional Representation Learning (CRL), aiming to extract representations tailored to arbitrary user-specified criteria. Specifically, we reveal that the semantics of a space are determined by its basis, thereby enabling a set of descriptive words to approximate the basis for a customized feature space. Building upon this insight, given a user-specified criterion, CRL first employs a large language model (LLM) to generate descriptive texts to construct the semantic basis, then projects the image representation into this conditional feature space leveraging a vision-language model (VLM). The conditional representation better captures semantics for the specific criterion, which could be utilized for multiple customized tasks. Extensive experiments on classification and retrieval tasks demonstrate the superiority and generality of the proposed CRL. The code is available at https://github.com/XLearning-SCU/2025-NeurIPS-CRL.

EAAI Journal 2025 Journal Article

Dynamic neighborhood point cloud registration and optimized path planning for manipulator grasping in complex environments

Chao Sun
Zhi‐Ming Hu
Lin‐Yi Peng
Cheng‐Ji Qin
Chao Zhang
Jia‐Hang Liang
Jian‐Jun Ding

In industrial automation, real-time object recognition and grasping by manipulators are critical. Although deep learning has excelled in image-based tasks, it often demands significant computational resources, large annotated datasets, and suffers from limited interpretability. Conversely, traditional methods can outperform under constrained data or hardware conditions. Conventional point cloud template registration is computationally intensive and vulnerable to noise and occlusion. To address these issues, we introduce a refined Intrinsic Shape Signatures (ISS) feature extraction scheme that employs a multi-scale dynamic neighborhood search to adaptively adjust local geometric constraints around keypoints. This mechanism analyzes point density in real time across varying radii and iteratively updates neighborhood size to sustain robustness in complex scenes. Experiments show that, compared to the standard Iterative Closest Point (ICP) and an ISS + ICP baseline, our method increases registration accuracy by 6.25% and 3.34%, while accelerating processing speed by 21.35% and 10.18%, respectively. Furthermore, to overcome the poor convergence and limited obstacle avoidance of Rapidly-Exploring Random Tree Connect (RRT-Connect) in high-dimensional spaces, we propose a composite-optimized RRT-Connect approach. It first generates multiple sampling trees via node resampling, then biases sampling toward the goal to improve path guidance, and finally applies gradient-based pruning to remove redundant nodes and smooth trajectories. Simulation results demonstrate a 28.90% improvement in planning efficiency and a 29.16% reduction in path length. Tests on a pick-and-place platform confirm that our optimized algorithms consistently outperform existing methods in both simple and complex scenarios, validating their effectiveness and robustness.

EAAI Journal 2025 Journal Article

Nonlinear time-varying system response modeling via a real-time updated Runge-Kutta physics-informed neural network

Huaguan Li
Chao Sun

Accurate structural response modeling advances the understanding of complex dynamic systems and enables effective structural design, control and monitoring. Due to damage, engineering structures will exhibit nonlinear and time-varying characteristics, which makes the structural response highly complex and challenging to be accurately modeled by traditional methods. This study proposes a Runge-Kutta-based real-time updated physics-informed neural network (RTU-PINN) to model structural responses of complex dynamic systems. The Recurrent Neural Network (RNN) is enhanced with Runge-Kutta method and neural Ordinary Differential Equations (neural ODEs) to model the system responses. The unknown structural parameters and nonlinear restoring force can be identified from the neural network. To capture the nonlinear and time-varying characteristics caused by structural damage or hysteretic behaviors, a Real-Time Updating (RTU) strategy is proposed to update the time-varying parameters and nonlinear restoring force with a sliding time window to minimize the discrepancy between predicted result and testing data. In addition, the proposed method can be applied to high-dimensional time-varying structures. Performance of the proposed method is examined via numerical and laboratory case studies. It is found that the RTU-PINN models the dynamic response with Relative Root Mean Square Error (RRMSE) values less than 0.001 in the target nonlinear time-varying dynamic system. Research results show that the proposed RTU-PINN method can accurately model the complex dynamic responses of the numerical and experimental structures with time-varying and nonlinear characteristics. The proposed method has the potential to address modeling uncertainties/errors and is applicable for system identification of complex systems with nonlinear and time-varying features.

EAAI Journal 2025 Journal Article

The multi-module joint modeling approach: Predicting urban crowd flow by integrating spatial–temporal patterns and dynamic periodic relationship

Zain Ul Abideen
Xiaodong Sun
Chao Sun

Accurately predicting traffic flow is crucial for forecasting the movement of people in specific regions, a core task for smart cities. Traffic flow data exhibit cyclic periodicity, with recurring patterns at regular intervals, particularly weekly. This study addresses deficiencies in periodicity modeling within crowd-flow data by proposing the Multi-Module Spatial–Temporal based Nested Encoder (MMSTNE). Unlike current methodologies, MMSTNE employs parallel periodic learning to predict traffic flow, simultaneously modeling cyclic variations between historical and future time frames. Direct prediction of dynamic traffic flow is challenging; thus, our approach focuses on learning stationary deviations to enhance model robustness. Using a highway approach within a nested framework, we establish parallel connections, incorporating Nested Long–Short-term Memory (NLSTM). These connections, linked with corresponding weekly observations, improve multi-step ahead predictions’ accuracy. Comprehensive experiments on two real-world datasets demonstrate our model’s superior accuracy compared to state-of-the-art approaches. On average, MMSTNE improved prediction accuracy by 20% over the best baseline model. This significant improvement highlights MMSTNE’s potential for dynamic urban traffic flow prediction, offering a robust solution for addressing the inherent challenges in periodicity modeling and forecasting in smart city applications.

EAAI Journal 2024 Journal Article

A data-driven approach to full-field stress reconstruction of ship hull structure using deep learning

Chao Sun
Zhen Chen
Junan Yi
Dongyang Li

Reconstructing full-field stress distribution has extensive engineering applications in design optimization and structural health monitoring. This article develops a data-driven approach for efficient and accurate reconstruction of stress fields in ship hull structures. The method integrates numerical simulation with a conditional generative adversarial network (cGAN) to infer full-field responses based on stress values at a limited number of monitoring points. The network architecture, which consists of a generator and a discriminator, is optimized through adversarial training. Based on the training database derived from finite element analysis (FEA), the full-field von Mises stress distribution of the inner bottom plate of an oil tanker is reconstructed. The discrete stresses obtained from FEA are utilized as input to simulate the on-board sensor data used in the cGAN-based model. A comparison of stresses between the cGAN-based model and the finite element model shows that a sparse arrangement of monitoring points enables accurate reconstruction of the full-field stresses. It is hence shown that this method provides a potential alternative for field monitoring.

ICML Conference 2023 Conference Paper

MABe22: A Multi-Species Multi-Task Benchmark for Learned Representations of Behavior

Jennifer J. Sun
Markus Marks
Andrew Wesley Ulmer
Dipam Chakraborty
Brian Geuther
Edward Hayes
Heng Jia
Vivek Kumar

We introduce MABe22, a large-scale, multi-agent video and trajectory benchmark to assess the quality of learned behavior representations. This dataset is collected from a variety of biology experiments, and includes triplets of interacting mice (4.7 million frames video+pose tracking data, 10 million frames pose only), symbiotic beetle-ant interactions (10 million frames video data), and groups of interacting flies (4.4 million frames of pose tracking data). Accompanying these data, we introduce a panel of real-life downstream analysis tasks to assess the quality of learned representations by evaluating how well they preserve information about the experimental conditions (e.g., strain, time of day, optogenetic stimulation) and animal behavior. We test multiple state-of-the-art self-supervised video and trajectory representation learning methods to demonstrate the use of our benchmark, revealing that methods developed using human action datasets do not fully translate to animal datasets. We hope that our benchmark and dataset encourage a broader exploration of behavior representation learning methods across species and settings.

TIST Journal 2015 Journal Article

Latent Support Vector Machine Modeling for Sign Language Recognition with Kinect

Chao Sun
Tianzhu Zhang
Changsheng Xu

Vision-based sign language recognition has attracted more and more interest from researchers in the computer vision field. In this article, we propose a novel algorithm to model and recognize sign language performed in front of a Microsoft Kinect sensor. Under the assumption that some frames are expected to be both discriminative and representative in a sign language video, we first assign a binary latent variable to each frame in training videos for indicating its discriminative capability, then develop a latent support vector machine model to classify the signs, as well as localize the discriminative and representative frames in each video. In addition, we utilize the depth map together with the color image captured by the Kinect sensor to obtain a more effective and accurate feature to enhance the recognition accuracy. To evaluate our approach, we conducted experiments on both word-level sign language and sentence-level sign language. An American Sign Language dataset including approximately 2,000 word-level sign language phrases and 2,000 sentence-level sign language phrases was collected using the Kinect sensor, and each phrase contains color, depth, and skeleton information. Experiments on our dataset demonstrate the effectiveness of the proposed method for sign language recognition.

YNIMG Journal 2008 Journal Article

Global evaluation of contributions of GABA A, AMPA and NMDA receptors to orientation maps in cat's visual cortex

Hongbo Yu
Xin Chen
Chao Sun
Tiande Shou

Orientation selectivity is a fundamental property of neurons in the visual cortex for form perception. Cortical cells with similar preferred orientations are organized in a columnar manner to form a two-dimensional orientation map in the primary visual cortex (area 17). There are several mechanisms underlying the generation of orientation selectivity at the single neuron level; however, their relative contributions to the overall orientation maps are unclear. Using optical imaging combined with in vivo application of AMPA, NMDA and GABAA receptor antagonists, we observed that CNQX or AP-5 weakened the orientation map of area 17, whereas simultaneous application of both antagonists abolished the map completely. Furthermore, removal of GABAergic inhibition by application of bicuculline and/or picrotoxin, which are GABAA receptor antagonists, led to cortical epilepsy and wiped out the orientation map completely, although low doses of bicuculline enhanced the orientation map. The orientation map reappeared after the bicuculline-induced epilepsy was prevented by applying CNQX to partially block the excitatory inputs. During those drug application experiments that did not abolish orientation selectivity, the remaining map pattern was unchanged. Bicuculline combined with CNQX could only reduce the amplitude of orientation mapping signals but could not alter the preferred orientation maps. These results indicate that the excitatory inputs to cortical neurons are essential and sufficient for generating orientation maps. However, intracortical GABAergic inhibition is necessary to sustain a normal excitation–inhibition balance in area 17. Overall, both excitatory and inhibitory inputs have a spatially homogenous impact on orientation maps.

YNIMG Journal 2006 Journal Article

Slab-like functional architecture of higher order cortical area 21a showing oblique effect of orientation preference in the cat

Luoxiu Huang
Tiande Shou
Xin Chen
Hongbo Yu
Chao Sun
Zhiyin Liang

Optical imaging based on intrinsic signals is a powerful tool for studying the functional organization of various cortices in vivo. Here, the functional architecture of orientation-sensitive neurons in higher order extrastriate cortical area 21a was investigated in cats using optical imaging combined with electrophysiological methods. It is found that neurons in area 21a with similar preferred orientations were functionally organized into a slab-like columnar structure orthogonal to the cortical surface, and the orientation columns were distributed more densely than those in area 17. The responsiveness and activated areas of optical maps visually elicited by the horizontal and vertical gratings were always larger than those by oblique gratings in areas 21a and 17. This neural oblique effect shown in orientation maps was more significant in area 21a than that in area 17. The findings suggest a neuronal mechanism in the higher order extrastriate cortex involving the visual perceptive process of the superiority of cardinal contours.