Arrow Research search

Author name cluster

Kun Hu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

22 papers
2 author rows

Possible papers

22

AAAI Conference 2026 Conference Paper

CABTO: Context-Aware Behavior Tree Grounding for Robot Manipulation

  • Yishuai Cai
  • Xinglin Chen
  • Yunxin Mao
  • Kun Hu
  • Minglong Li

Behavior Trees (BTs) offer a powerful paradigm for designing modular and reactive robot controllers. BT planning, an emerging field, provides theoretical guarantees for the automated generation of reliable BTs. However, BT planning typically assumes that a well-designed BT system is already grounded—comprising high-level action models and low-level control policies—which often requires extensive expert knowledge and manual effort. In this paper, we formalize the BT Grounding problem: the automated construction of a complete and consistent BT system. We analyze its complexity and introduce CABTO (Context-Aware Behavior Tree grOunding), the first framework to efficiently solve this challenge. CABTO leverages pre-trained Large Models (LMs) to heuristically search the space of action models and control policies, guided by contextual feedback from BT planners and environmental observations. Experiments spanning seven task sets across three distinct robotic manipulation scenarios demonstrate CABTO’s effectiveness and efficiency in generating complete and consistent behavior tree systems.
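For readers new to the formalism, below is a minimal Python sketch of behavior-tree tick semantics, the substrate that BT planning and grounding operate on. The Sequence/Fallback/Condition/Action classes and the pick-up-the-cube toy task are illustrative only and are not CABTO's API.

```python
# Minimal behavior-tree tick semantics (illustrative, not CABTO's API).
# A Sequence succeeds only if all children succeed; a Fallback succeeds
# as soon as one child does -- this is the reactivity BT planning relies on.
SUCCESS, FAILURE, RUNNING = "SUCCESS", "FAILURE", "RUNNING"

class Sequence:
    def __init__(self, children): self.children = children
    def tick(self, state):
        for child in self.children:
            status = child.tick(state)
            if status != SUCCESS:        # stop at first non-success child
                return status
        return SUCCESS

class Fallback:
    def __init__(self, children): self.children = children
    def tick(self, state):
        for child in self.children:
            status = child.tick(state)
            if status != FAILURE:        # stop at first non-failure child
                return status
        return FAILURE

class Condition:
    def __init__(self, pred): self.pred = pred
    def tick(self, state): return SUCCESS if self.pred(state) else FAILURE

class Action:
    def __init__(self, effect): self.effect = effect
    def tick(self, state):
        self.effect(state)               # a grounded low-level control policy
        return SUCCESS

# Toy task "hold the cube": check the goal condition first, act only if it fails.
tree = Fallback([
    Condition(lambda s: s.get("holding_cube", False)),
    Sequence([
        Condition(lambda s: s.get("cube_visible", False)),
        Action(lambda s: s.update(holding_cube=True)),
    ]),
])
state = {"cube_visible": True}
print(tree.tick(state), state)  # SUCCESS {'cube_visible': True, 'holding_cube': True}
```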

AAAI Conference 2026 Conference Paper

DuoCast: Duo-Probabilistic Diffusion for Precipitation Nowcasting

  • Penghui Wen
  • Mengwei He
  • Patrick Filippi
  • Na Zhao
  • Feng Zhang
  • Thomas Francis Bishop
  • Zhiyong Wang
  • Kun Hu

Accurate short-term precipitation forecasting is critical for weather-sensitive decision-making in agriculture, transportation, and disaster response. Existing deep learning approaches often struggle to balance global structural consistency with local detail preservation, especially under complex meteorological conditions. We propose DuoCast, a dual-diffusion framework that decomposes precipitation forecasting into low- and high-frequency components modeled in orthogonal latent subspaces. We theoretically prove that this frequency decomposition reduces prediction error compared to conventional single-branch U-Net diffusion models. In DuoCast, the low-frequency model captures large-scale trends via convolutional encoders conditioned on weather front dynamics, while the high-frequency model refines fine-scale variability using a self-attention-based architecture. Experiments on four benchmark radar datasets show that DuoCast consistently outperforms state-of-the-art baselines, achieving superior accuracy in both spatial detail and temporal evolution.
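DuoCast performs its decomposition in learned latent subspaces; as a rough spatial-domain analogue, the sketch below splits a 2-D field into low- and high-frequency parts with an FFT low-pass mask. The cutoff radius is an arbitrary choice, not a parameter from the paper.

```python
import numpy as np

def frequency_split(frame: np.ndarray, cutoff: int = 8):
    """Split a 2-D field into low-/high-frequency parts via an FFT mask.
    `cutoff` (in frequency bins) is an arbitrary illustrative choice."""
    spec = np.fft.fftshift(np.fft.fft2(frame))
    h, w = frame.shape
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= cutoff ** 2
    low = np.fft.ifft2(np.fft.ifftshift(spec * mask)).real   # large-scale trend
    high = frame - low                                       # fine-scale residual
    return low, high

frame = np.random.rand(64, 64)         # stand-in for a radar reflectivity frame
low, high = frequency_split(frame)
assert np.allclose(low + high, frame)  # exact decomposition
```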

AAAI Conference 2026 Conference Paper

ECD: Evidence-guided Contrastive Decoding in Retrieval-Augmented Generation with Accurate Knowledge Reference Adjustment

  • Yize Sui
  • Yan Xu
  • Kun Hu
  • Jing Ren
  • Wenjing Yang

Retrieval-Augmented Generation (RAG) enhances the quality of question answering by integrating external knowledge with internal knowledge. A robust RAG system needs to precisely regulate how much the response depends on each of the two knowledge types. The recently proposed context-aware contrastive decoding (CCD) method attempts to achieve this goal by adjusting the knowledge reference weights, comparing the output distributions of LLMs when they rely on different knowledge sources. However, these methods rest on probabilistic knowledge-reference adjustment strategies (such as the highest probability or entropy) that only consider the relative confidence of the output at each decoding step, not its absolute confidence, which may lead to misjudging how strongly external versus internal knowledge should be referenced during decoding. To this end, we propose a novel decoding method, Evidence-guided Contrastive Decoding (ECD), which models evidence by constructing a Dirichlet distribution and treating logits as evidence vectors, so as to regulate the reference degree of internal and external knowledge more accurately and ultimately improve the quality of generated responses. Extensive evaluations across four public benchmark datasets on three mainstream LLMs demonstrate the effectiveness and advantages of ECD.
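The Dirichlet bookkeeping here is standard evidential-deep-learning machinery; the sketch below shows how non-negative evidence yields both expected token probabilities and an absolute-confidence signal. The final mixing rule for the two knowledge sources is our illustrative guess, not the paper's exact adjustment.

```python
import numpy as np

def dirichlet_uncertainty(logits: np.ndarray):
    """Treat non-negative evidence as Dirichlet parameters alpha = e + 1.
    Returns the Dirichlet-mean probabilities and a scalar vacuity u = K / S,
    which is high when total evidence is low (subjective-logic convention)."""
    evidence = np.log1p(np.exp(logits))   # softplus keeps evidence >= 0
    alpha = evidence + 1.0
    strength = alpha.sum()
    probs = alpha / strength              # Dirichlet mean
    u = len(alpha) / strength             # absolute (not just relative) confidence
    return probs, u

logits_ctx = np.array([2.0, 6.0, 1.0])   # decoding step with retrieved context
logits_par = np.array([1.5, 1.0, 1.2])   # same step, parametric knowledge only
p_ctx, u_ctx = dirichlet_uncertainty(logits_ctx)
p_par, u_par = dirichlet_uncertainty(logits_par)
# Illustrative mixing rule: lean on the source whose evidence is less vacuous.
w = (1 - u_ctx) / ((1 - u_ctx) + (1 - u_par))
p_mix = w * p_ctx + (1 - w) * p_par
print(round(u_ctx, 3), round(u_par, 3), p_mix.round(3))
```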

AAAI Conference 2026 Conference Paper

Pb4U-GNet: Resolution-Adaptive Garment Simulation via Propagation-before-Update Graph Network

  • Aoran Liu
  • Kun Hu
  • Clinton Ansun Mo
  • Qiuxia Wu
  • Wenxiong Kang
  • Zhiyong Wang

Garment simulation is fundamental to various applications in computer vision and graphics, from virtual try-on to digital human modelling. However, conventional physics-based methods remain computationally expensive, hindering their application in time-sensitive scenarios. While graph neural networks (GNNs) offer promising acceleration, existing approaches exhibit poor cross-resolution generalisation, demonstrating significant performance degradation on higher-resolution meshes beyond the training distribution. This stems from two key factors: (1) existing GNNs employ fixed message-passing depth that fails to adapt information aggregation to mesh density variation, and (2) vertex-wise displacement magnitudes are inherently resolution-dependent in garment simulation. To address these issues, we introduce Propagation-before-Update Graph Network (Pb4U-GNet), a resolution-adaptive framework that decouples message propagation from feature updates. Pb4U-GNet incorporates two key mechanisms: (1) dynamic propagation depth control, adjusting message-passing iterations based on mesh resolution, and (2) geometry-aware update scaling, which scales predictions according to local mesh characteristics. Extensive experiments show that even trained solely on low-resolution meshes, Pb4U-GNet exhibits strong generalisability across diverse mesh resolutions, addressing a fundamental challenge in neural garment simulation.
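As a toy illustration of decoupling propagation from updates, the sketch below scales message-passing depth with vertex count and then runs plain neighbourhood averaging for that many rounds. The square-root scaling law and all constants are assumptions for illustration, not the paper's scheme.

```python
import numpy as np

def propagation_depth(num_vertices: int, base_vertices: int = 1_000,
                      base_depth: int = 8) -> int:
    """Scale message-passing iterations with mesh resolution. Under roughly
    uniform refinement of a 2-D surface mesh, edge-hop distances grow like
    sqrt(V); the law and constants here are illustrative assumptions."""
    scale = np.sqrt(num_vertices / base_vertices)
    return max(base_depth, int(round(base_depth * scale)))

def propagate(features: np.ndarray, neighbors: list, depth: int) -> np.ndarray:
    """Plain neighbourhood averaging for `depth` rounds (no learned update),
    mimicking a 'propagation before update' split."""
    x = features.copy()
    for _ in range(depth):
        x = np.stack([x[nbrs].mean(axis=0) for nbrs in neighbors])
    return x

# 4-vertex toy mesh: a path graph 0-1-2-3.
neighbors = [[1], [0, 2], [1, 3], [2]]
feats = np.eye(4)
depth = propagation_depth(4, base_vertices=4, base_depth=2)
print(propagate(feats, neighbors, depth).round(2))
```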

AAAI Conference 2026 Conference Paper

TWINFUZZ: Dual-Model Fuzzing for Robustness Generalization in Deep Learning

  • Enze Dai
  • Wentao Mo
  • Kun Hu
  • Xiaogang Zhu
  • Xi Xiao
  • Sheng Wen
  • Shaohua Wang
  • Yang Xiang

Deep learning (DL) models are increasingly deployed in safety-critical applications such as face recognition, autonomous driving, and medical diagnosis. Despite their impressive accuracy, they remain vulnerable to adversarial examples: subtle perturbations that can cause incorrect predictions, i.e., robustness issues. While adversarial training improves robustness against known attacks, it often fails to generalize to unseen or stronger threats, revealing a critical gap in robustness generalization. In this work, we propose a dual-model fuzzing framework to enhance generalized robustness in DL models. Central to our method is a lightweight metric, the Lagrangian Information Bottleneck (LIB), which guides entropy-based mutation toward semantically meaningful and high-risk regions of the input space. The executor uses a resistant model and a more error-prone vulnerable model; their prediction consistency forms the basis of agreement mining, a label-free oracle for isolating decision-boundary samples. To ensure fuzzing effectiveness, we further introduce a task-driven seed selection strategy (e.g., SSIM for vision) that filters out low-quality inputs. We implement a prototype, TWINFUZZ, and evaluate it on six benchmark datasets and nine DL models. Compared with state-of-the-art testing approaches, TWINFUZZ achieves superior improvements in both training-specific and generalized robustness.
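A minimal sketch of the agreement-mining idea follows, with fixed linear scorers standing in for the resistant and vulnerable models and additive noise standing in for the LIB-guided mutation; everything here is illustrative rather than TWINFUZZ's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the two executors: a resistant scorer and a perturbed,
# more error-prone vulnerable copy. Real TWINFUZZ uses trained DL models.
W = rng.normal(size=(10, 3))
W_vuln = W + 0.5 * rng.normal(size=W.shape)
resistant = lambda x: (x @ W).argmax(axis=1)
vulnerable = lambda x: (x @ W_vuln).argmax(axis=1)

def mutate(seeds: np.ndarray, strength: float = 0.1) -> np.ndarray:
    """Toy mutation: additive noise (the paper guides this with LIB)."""
    return seeds + strength * rng.normal(size=seeds.shape)

def agreement_mining(seeds: np.ndarray, rounds: int = 5) -> np.ndarray:
    """Keep mutants on which the two models disagree: a label-free
    oracle that isolates likely decision-boundary samples."""
    found = []
    for _ in range(rounds):
        mutants = mutate(seeds)
        disagree = resistant(mutants) != vulnerable(mutants)
        found.append(mutants[disagree])
    return np.concatenate(found)

seeds = rng.normal(size=(32, 10))
print(f"{len(agreement_mining(seeds))} boundary candidates mined")
```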

IROS Conference 2025 Conference Paper

CM-LIUW-Odometry: Robust and High-Precision LiDAR-Inertial-UWB-Wheel Odometry for Extreme Degradation Coal Mine Tunnels

  • Kun Hu
  • Menggang Li
  • Zhiwen Jin
  • Chaoquan Tang
  • Eryi Hu
  • Gongbo Zhou

Simultaneous Localization and Mapping (SLAM) in large-scale, complex, and GPS-denied underground coal mine environments presents significant challenges. Sensors must contend with abnormal operating conditions: GPS unavailability impedes scene reconstruction and absolute geographic referencing, uneven or slippery terrain degrades wheel odometer accuracy, and long, feature-poor tunnels reduce LiDAR effectiveness. To address these issues, we propose CoalMine-LiDAR-IMU-UWB-Wheel-Odometry (CM-LIUW-Odometry), a multi-modal SLAM framework based on the Iterated Error-State Kalman Filter (IESKF). First, LiDAR-inertial odometry is tightly fused with UWB absolute positioning constraints to align the SLAM system with a global coordinate frame. Next, the wheel odometer is integrated through tight coupling, enhanced by nonholonomic constraints (NHC) and vehicle lever arm compensation, to address performance degradation in areas beyond UWB measurement range. Finally, an adaptive motion mode switching mechanism dynamically adjusts the robot’s motion mode based on UWB measurement range and environmental degradation levels. Experimental results validate that our method achieves superior accuracy and robustness in real-world underground coal mine scenarios, outperforming state-of-the-art approaches. We open-source the code of this work on GitHub to benefit the robotics community.
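The paper's IESKF is far richer (full 3-D pose, IMU biases, tight coupling); as bare intuition for why absolute UWB fixes help, the 1-D Kalman sketch below shows a noisy absolute position measurement correcting biased wheel-odometry drift. All noise parameters and the bias are made up for illustration.

```python
import numpy as np

# 1-D toy: odometry integrates velocity (and drifts); UWB supplies a noisy
# absolute position. A plain Kalman filter stands in for the paper's IESKF.
rng = np.random.default_rng(1)
dt, q_odo, r_uwb = 0.1, 0.02, 0.25       # illustrative noise parameters

x_true = x_est = 0.0
P = 1.0
for step in range(100):
    v = 1.0                               # commanded velocity
    x_true += v * dt
    # Predict: propagate state with (slightly biased) wheel odometry.
    x_est += (v + 0.1) * dt               # 10% odometry bias -> drift
    P += q_odo
    # Update: every 10 steps a UWB-derived absolute position arrives.
    if step % 10 == 9:
        z = x_true + rng.normal(0, np.sqrt(r_uwb))
        K = P / (P + r_uwb)               # Kalman gain
        x_est += K * (z - x_est)          # absolute correction bounds the drift
        P *= (1 - K)
print(f"final error with UWB fusion: {abs(x_est - x_true):.3f} m")
```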

AAAI Conference 2025 Conference Paper

DC-PCN: Point Cloud Completion Network with Dual-Codebook Guided Quantization

  • Qiuxia Wu
  • Haiyang Huang
  • Kunming Su
  • Zhiyong Wang
  • Kun Hu

Point cloud completion aims to reconstruct complete 3D shapes from partial 3D point clouds. With advancements in deep learning techniques, various methods for point cloud completion have been developed. Despite achieving encouraging results, a significant issue remains: these methods often overlook the variability in point clouds sampled from a single 3D object surface. This variability can lead to ambiguity and hinder the achievement of more precise completion results. Therefore, in this study, we introduce a novel point cloud completion network, namely Dual-Codebook Point Completion Network (DC-PCN), following an encoder-decoder pipeline. The primary objective of DC-PCN is to formulate a singular representation of sampled point clouds originating from the same 3D surface. DC-PCN introduces a dual-codebook design to quantize point-cloud representations from a multilevel perspective. It consists of an encoder-codebook and a decoder-codebook, designed to capture distinct point cloud patterns at shallow and deep levels. Additionally, to enhance the information flow between these two codebooks, we devise an information exchange mechanism. This approach ensures that crucial features and patterns from both shallow and deep levels are effectively utilized for completion. Extensive experiments on the PCN, ShapeNet_Part, and ShapeNet34 datasets demonstrate the state-of-the-art performance of our method.
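The quantization primitive behind a codebook design can be sketched in a few lines: nearest-neighbour lookup plus the usual straight-through gradient. Codebook size and feature dimensions below are arbitrary; DC-PCN applies such quantization at two levels with an information-exchange mechanism that is not modelled here.

```python
import torch

def quantize(z: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """Map each feature to its nearest codebook entry (vector quantization).
    The straight-through trick copies gradients past the non-differentiable
    argmin back to the encoder features, as in standard VQ-VAEs."""
    d = torch.cdist(z, codebook)      # (N, K) pairwise distances
    idx = d.argmin(dim=1)             # nearest code per feature
    z_q = codebook[idx]
    return z + (z_q - z).detach()     # straight-through estimator

torch.manual_seed(0)
codebook = torch.randn(128, 64)                 # K=128 codes of dimension 64
z = torch.randn(1024, 64, requires_grad=True)   # encoder features
z_q = quantize(z, codebook)
z_q.sum().backward()                  # gradients flow back to z
print(z.grad.abs().sum() > 0)         # tensor(True)
```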

IROS Conference 2025 Conference Paper

LiDAR-IMU Fusion System with Adaptive Scanning for High-Resolution Deformation Monitoring of Underground Infrastructures

  • Menggang Li
  • Zhuoqi Li
  • Kun Hu
  • Eryi Hu
  • Chaoquan Tang
  • Gongbo Zhou

A LiDAR-IMU fusion system utilizing adaptive scanning is developed for high-resolution deformation monitoring of underground coal mine infrastructure, such as sealed walls. The system integrates data from a LiDAR scanner and an IMU, employing a penalty function-based scanning strategy to optimize point cloud quality. Following feature extraction and state estimation, a 3D point cloud model of the sealed wall is constructed. Deformation monitoring is achieved through point cloud segmentation, registration, and error analysis across multiple time intervals. A methodology for optimizing equipment placement on walls of varying dimensions is proposed to efficiently capture deformation details. Two metrics, PATD and PARE, are introduced to evaluate system performance. Calibration experiments using standardized boards and blocks are designed to determine optimal monitoring parameters, including distance, height, and sampling frequency. Simulated deformation experiments under real-world conditions validate the system’s soundness and accuracy.
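Once two scans are registered, a crude per-point deformation estimate is the cloud-to-cloud nearest-neighbour distance, sketched below; the paper's pipeline adds adaptive scanning, segmentation, registration, and its PATD/PARE metrics, none of which are modelled here.

```python
import numpy as np
from scipy.spatial import cKDTree

def cloud_to_cloud_distance(ref: np.ndarray, scan: np.ndarray) -> np.ndarray:
    """Nearest-neighbour distance from each new-scan point to the reference
    scan -- a crude per-point deformation estimate for registered scans."""
    tree = cKDTree(ref)
    dist, _ = tree.query(scan)
    return dist

rng = np.random.default_rng(0)
# Toy "sealed wall" patch: 2 m x 1 m x 10 mm slab of 5000 points.
wall_t0 = rng.uniform(0, 1, size=(5000, 3)) * [2.0, 1.0, 0.01]
wall_t1 = wall_t0 + [0, 0, 0.003]        # uniform 3 mm out-of-plane bulge
d = cloud_to_cloud_distance(wall_t0, wall_t1)
print(f"mean apparent deformation: {d.mean() * 1000:.2f} mm")
```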

AAMAS Conference 2025 Conference Paper

PMAT: Optimizing Action Generation Order in Multi-Agent Reinforcement Learning

  • Kun Hu
  • Muning Wen
  • Xihuai Wang
  • Shao Zhang
  • Yiwei Shi
  • Minne Li
  • Minglong Li
  • Ying Wen

Multi-Agent Reinforcement Learning (MARL) faces challenges in coordinating agents due to complex interdependencies within multi-agent systems. Most MARL algorithms use the simultaneous decision-making paradigm but ignore the action-level dependencies among agents, which reduces coordination efficiency. In contrast, the sequential decision-making paradigm provides finer-grained supervision for agent decision order, presenting the potential for handling dependencies via better decision order management. However, determining the optimal decision order remains a challenge. In this paper, we introduce Action Generation with Plackett-Luce Sampling (AGPS), a novel mechanism for agent decision order optimization. We model the order determination task as a Plackett-Luce sampling process to address issues such as ranking instability and vanishing gradient during the network training process. AGPS realizes credit-based decision order determination by establishing a bridge between the significance of agents’ local observations and their decision credits, thus facilitating order optimization and dependency management. Integrating AGPS with the Multi-Agent Transformer, we propose the Prioritized Multi-Agent Transformer (PMAT), a sequential decision-making MARL algorithm with decision order optimization. Experiments on benchmarks including StarCraft Multi-Agent Challenge, Google Research Football, and Multi-Agent MuJoCo show that PMAT outperforms state-of-the-art algorithms, greatly enhancing coordination efficiency.
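Sampling a full decision order from per-agent scores under a Plackett-Luce model can be done with the Gumbel trick, as in the sketch below: perturbing log-scores with Gumbel noise and sorting is distributionally equivalent to sequential sampling without replacement. The credits are random stand-ins for AGPS's learned decision credits.

```python
import numpy as np

def plackett_luce_sample(log_scores: np.ndarray, rng) -> np.ndarray:
    """Sample a ranking from a Plackett-Luce model via the Gumbel trick.
    Sorting Gumbel-perturbed log-scores is equivalent to sequentially
    drawing items without replacement proportional to exp(score)."""
    gumbel = -np.log(-np.log(rng.uniform(size=log_scores.shape)))
    return np.argsort(-(log_scores + gumbel))  # descending perturbed scores

rng = np.random.default_rng(0)
credits = np.array([1.2, 0.1, 2.3, 0.7])       # stand-in decision credits
order = plackett_luce_sample(credits, rng)
print("agent decision order:", order)           # higher credit -> earlier, on average
```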

ICML Conference 2025 Conference Paper

Reidentify: Context-Aware Identity Generation for Contextual Multi-Agent Reinforcement Learning

  • Zhiwei Xu 0005
  • Kun Hu
  • Xin Xin 0003
  • Weiliang Meng
  • Yiwei Shi
  • Hangyu Mao
  • Bin Zhang 0052
  • Dapeng Li 0001

Generalizing multi-agent reinforcement learning (MARL) to accommodate variations in problem configurations remains a critical challenge in real-world applications, where even subtle differences in task setups can cause pre-trained policies to fail. To address this, we propose Context-Aware Identity Generation (CAID), a novel framework to enhance MARL performance under the Contextual MARL (CMARL) setting. CAID dynamically generates unique agent identities through the agent identity decoder built on a causal Transformer architecture. These identities provide contextualized representations that align corresponding agents across similar problem variants, facilitating policy reuse and improving sample efficiency. Furthermore, the action regulator in CAID incorporates these agent identities into the action-value space, enabling seamless adaptation to varying contexts. Extensive experiments on CMARL benchmarks demonstrate that CAID significantly outperforms existing approaches by enhancing both sample efficiency and generalization across diverse context variants.

AAAI Conference 2025 Conference Paper

RI-MAE: Rotation-Invariant Masked AutoEncoders for Self-Supervised Point Cloud Representation Learning

  • Kunming Su
  • Qiuxia Wu
  • Panpan Cai
  • Xiaogang Zhu
  • Xuequan Lu
  • Zhiyong Wang
  • Kun Hu

Masked point modeling methods have recently achieved great success in self-supervised learning for point cloud data. However, these methods are sensitive to rotations and often exhibit sharp performance drops when encountering rotational variations. In this paper, we propose a novel Rotation-Invariant Masked AutoEncoders (RI-MAE) to address two major challenges: 1) achieving rotation-invariant latent representations, and 2) facilitating self-supervised reconstruction in a rotation-invariant manner. For the first challenge, we introduce RI-Transformer, which features disentangled geometry content, rotation-invariant relative orientation and position embedding mechanisms for constructing rotation-invariant point cloud latent space. For the second challenge, a novel dual-branch student-teacher architecture is devised. It enables the self-supervised learning via the reconstruction of masked patches within the learned rotation-invariant latent space. Each branch is based on an RI-Transformer, and they are connected with an additional RI-Transformer predictor. The teacher encodes all point patches, while the student solely encodes unmasked ones. Finally, the predictor predicts the latent features of the masked patches using the output latent embeddings from the student, supervised by the outputs from the teacher. Extensive experiments demonstrate that our method is robust to rotations, achieving state-of-the-art performance on various downstream tasks.
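RI-MAE builds invariance into the architecture itself; as a quick sanity check on the underlying geometry, the sketch below verifies that pairwise distances, one family of rotation-invariant features, are unchanged under a random rotation drawn via QR decomposition.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_rotation(dim: int = 3) -> np.ndarray:
    """Random rotation via QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(dim, dim)))
    q *= np.sign(np.diag(r))      # fix column signs for a proper orthogonal Q
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1             # ensure det(+1): a rotation, not a reflection
    return q

def pairwise_distances(points: np.ndarray) -> np.ndarray:
    diff = points[:, None, :] - points[None, :, :]
    return np.linalg.norm(diff, axis=-1)

cloud = rng.normal(size=(128, 3))            # toy point cloud
rotated = cloud @ random_rotation().T
# Distances are unchanged by rotation -- the kind of geometric quantity
# rotation-invariant encoders are built from.
print(np.allclose(pairwise_distances(cloud), pairwise_distances(rotated)))  # True
```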

IJCAI Conference 2025 Conference Paper

Wave-wise Discriminative Tracking by Phase-Amplitude Separation, Augmentation and Mixture

  • Huibin Tan
  • Mingyu Cao
  • Kun Hu
  • Xihuai He
  • Zhe Wang
  • Hao Li
  • Long Lan
  • Mengzhu Wang

Distinguishing key features in complex visual tasks is challenging. A novel approach treats image patches (tokens) as waves. By using both phase and amplitude, it captures richer semantics and specific invariances compared to pixel-based methods, and allows for feature fusion across regions for a holistic image representation. Based on this, we propose the Wave-wise Discriminative Transformer Tracker (WDT). During tracking, WDT represents features via phase-amplitude separation, enhancement, and mixture. First, we design a Mutual Exclusive Phase-Amplitude Extractor (MEPAE) to separate phase and amplitude features with distinct semantics, representing spatial target information and background brightness, respectively. Then, Wave-wise Feature Augmentation is carried out with two submodules: Phase-Amplitude Feature Augmentation and Mixture. The augmentation module disrupts the separated features in the same batch, and the mixture module recombines them to generate positive and negative waves. The original features are aggregated into the original wave. Positive waves have the same phase but different amplitudes, and negative waves have different phase components. Finally, self-supervised and tracking-supervised losses guide the global and local representation learning for original, positive, and negative waves, enhancing wave-level discrimination. Experiments on five benchmarks prove the effectiveness of our method.
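WDT separates the phase and amplitude of learned token features; the split is easiest to see with a 2-D Fourier transform, where recombining one signal's phase with another's amplitude mirrors the positive-wave construction (same phase, different amplitude). The toy maps below are random arrays, not tracker features.

```python
import numpy as np

rng = np.random.default_rng(0)

def split_phase_amplitude(x: np.ndarray):
    """Separate a 2-D signal into Fourier phase and amplitude components."""
    spec = np.fft.fft2(x)
    return np.angle(spec), np.abs(spec)

def recombine(phase: np.ndarray, amplitude: np.ndarray) -> np.ndarray:
    """Rebuild a signal from (possibly mixed) phase and amplitude parts."""
    return np.fft.ifft2(amplitude * np.exp(1j * phase)).real

a, b = rng.normal(size=(2, 32, 32))        # two toy 'token maps'
phase_a, amp_a = split_phase_amplitude(a)
_, amp_b = split_phase_amplitude(b)
positive = recombine(phase_a, amp_b)       # same phase as a, amplitude from b
assert np.allclose(recombine(phase_a, amp_a), a)  # lossless round trip
```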

AAAI Conference 2024 Conference Paper

Autoregressive Omni-Aware Outpainting for Open-Vocabulary 360-Degree Image Generation

  • Zhuqiang Lu
  • Kun Hu
  • Chaoyue Wang
  • Lei Bai
  • Zhiyong Wang

A 360-degree (omni-directional) image provides an all-encompassing spherical view of a scene. Recently, there has been an increasing interest in synthesising 360-degree images from conventional narrow field of view (NFoV) images captured by digital cameras and smartphones, for providing immersive experiences in various scenarios such as virtual reality. Yet, existing methods typically fall short in synthesizing intricate visual details or ensuring that the generated images align consistently with user-provided prompts. In this study, an autoregressive omni-aware generative network (AOG-Net) is proposed for 360-degree image generation by outpainting an incomplete 360-degree image progressively with NFoV and text guidance, jointly or individually. This autoregressive scheme not only allows for deriving finer-grained and text-consistent patterns by dynamically generating and adjusting the process but also offers users greater flexibility to edit their conditions throughout the generation process. A global-local conditioning mechanism is devised to comprehensively formulate the outpainting guidance in each autoregressive step. Text guidance, omni-visual cues, NFoV inputs and omni-geometry are encoded and further formulated with cross-attention based transformers into a global stream and a local stream, which condition a generative backbone model. As AOG-Net is compatible with leveraging large-scale models for the conditional encoder and the generative prior, it enables the generation to use extensive open-vocabulary text guidance. Comprehensive experiments on two commonly used 360-degree image datasets for both indoor and outdoor settings demonstrate the state-of-the-art performance of our proposed method. Our code is available at https://github.com/zhuqiangLu/AOG-NET-360.

AAAI Conference 2024 Conference Paper

Sequential Fusion Based Multi-Granularity Consistency for Space-Time Transformer Tracking

  • Kun Hu
  • Wenjing Yang
  • Wanrong Huang
  • Xianchen Zhou
  • Mingyu Cao
  • Jing Ren
  • Huibin Tan

Regarded as a template-matching task for a long time, visual object tracking has witnessed significant progress in space-wise exploration. However, since tracking is performed on videos with substantial time-wise information, it is important to simultaneously mine the temporal contexts which have not yet been deeply explored. Previous supervised works mostly consider template reform as the breakthrough point, but they are often limited by additional computational burdens or the quality of chosen templates. To address this issue, we propose a Space-Time Consistent Transformer Tracker (STCFormer), which uses a sequential fusion framework with multi-granularity consistency constraints to learn spatiotemporal context information. We design a sequential fusion framework that recombines template and search images based on tracking results from chronological frames, fusing updated tracking states in training. To further overcome the over-reliance on the fixed template without increasing computational complexity, we design three space-time consistent constraints: Label Consistency Loss (LCL) for label-level consistency, Attention Consistency Loss (ACL) for patch-level ROI consistency, and Semantic Consistency Loss (SCL) for feature-level semantic consistency. Specifically, in ACL and SCL, the label information is used to constrain the attention and feature consistency of the target and the background, respectively, to avoid mutual interference. Extensive experiments have shown that our STCFormer outperforms many of the best-performing trackers on several popular benchmarks.

ECAI Conference 2024 Conference Paper

SFDFusion: An Efficient Spatial-Frequency Domain Fusion Network for Infrared and Visible Image Fusion

  • Kun Hu
  • Qingle Zhang
  • Maoxun Yuan
  • Yitian Zhang

Infrared and visible image fusion aims to utilize the complementary information from two modalities to generate fused images with prominent targets and rich texture details. Most existing algorithms only perform pixel-level or feature-level fusion from different modalities in the spatial domain. They usually overlook the information in the frequency domain, and some of them suffer from inefficiency due to excessively complex structures. To tackle these challenges, this paper proposes an efficient Spatial-Frequency Domain Fusion (SFDFusion) network for infrared and visible image fusion. First, we propose a Dual-Modality Refinement Module (DMRM) to extract complementary information. This module extracts useful information from both the infrared and visible modalities in the spatial domain and enhances fine-grained spatial details. Next, to introduce frequency domain information, we construct a Frequency Domain Fusion Module (FDFM) that transforms the spatial domain to the frequency domain through Fast Fourier Transform (FFT) and then integrates frequency domain information. Additionally, we design a frequency domain fusion loss to provide guidance for the fusion process. Extensive experiments on public datasets demonstrate that our method produces fused images with significant advantages in various fusion metrics and visual effects. Furthermore, our method demonstrates high efficiency in image fusion and good performance on downstream detection tasks, thereby satisfying the real-time demands of advanced visual tasks. The code is available at https://github.com/lqz2/SFDFusion.
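As a baseline flavour of frequency-domain fusion, the sketch below transforms two registered images with the FFT and keeps, per frequency, the spectrum of larger magnitude before inverting. This max rule is a common hand-crafted heuristic, not FDFM's learned integration.

```python
import numpy as np

def fft_max_fusion(ir: np.ndarray, vis: np.ndarray) -> np.ndarray:
    """Fuse two aligned images in the frequency domain: keep, per frequency,
    the spectrum with the larger magnitude, then invert. A common baseline
    rule; the paper's FDFM learns its integration instead."""
    spec_ir, spec_vis = np.fft.fft2(ir), np.fft.fft2(vis)
    fused = np.where(np.abs(spec_ir) >= np.abs(spec_vis), spec_ir, spec_vis)
    return np.fft.ifft2(fused).real

rng = np.random.default_rng(0)
ir, vis = rng.random((2, 64, 64))   # stand-ins for registered IR/visible frames
print(fft_max_fusion(ir, vis).shape)  # (64, 64)
```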

AAAI Conference 2024 Conference Paper

SurgicalSAM: Efficient Class Promptable Surgical Instrument Segmentation

  • Wenxi Yue
  • Jing Zhang
  • Kun Hu
  • Yong Xia
  • Jiebo Luo
  • Zhiyong Wang

The Segment Anything Model (SAM) is a powerful foundation model that has revolutionised image segmentation. To apply SAM to surgical instrument segmentation, a common approach is to locate precise points or boxes of instruments and then use them as prompts for SAM in a zero-shot manner. However, we observe two problems with this naive pipeline: (1) the domain gap between natural objects and surgical instruments leads to inferior generalisation of SAM; and (2) SAM relies on precise point or box locations for accurate segmentation, requiring either extensive manual guidance or a well-performing specialist detector for prompt preparation, which leads to a complex multi-stage pipeline. To address these problems, we introduce SurgicalSAM, a novel end-to-end efficient-tuning approach for SAM to effectively integrate surgical-specific information with SAM’s pre-trained knowledge for improved generalisation. Specifically, we propose a lightweight prototype-based class prompt encoder for tuning, which directly generates prompt embeddings from class prototypes and eliminates the use of explicit prompts for improved robustness and a simpler pipeline. In addition, to address the low inter-class variance among surgical instrument categories, we propose contrastive prototype learning, further enhancing the discrimination of the class prototypes for more accurate class prompting. The results of extensive experiments on both EndoVis2018 and EndoVis2017 datasets demonstrate that SurgicalSAM achieves state-of-the-art performance while only requiring a small number of tunable parameters. The source code is available at https://github.com/wenxi-yue/SurgicalSAM.
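A prototype-based class prompt encoder can be sketched as a learned prototype table plus a projection that emits prompt tokens for a class id, as below. Dimensions, token count, and the single linear layer are illustrative assumptions, not SurgicalSAM's exact design, and the contrastive prototype learning is not modelled here.

```python
import torch
import torch.nn as nn

class ClassPromptEncoder(nn.Module):
    """Turn a class id into prompt embeddings via learned class prototypes,
    so no explicit point/box prompt is needed. Sizes are illustrative."""
    def __init__(self, num_classes: int = 7, proto_dim: int = 256,
                 num_tokens: int = 2, prompt_dim: int = 256):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(num_classes, proto_dim))
        self.to_prompts = nn.Linear(proto_dim, num_tokens * prompt_dim)
        self.num_tokens, self.prompt_dim = num_tokens, prompt_dim

    def forward(self, class_ids: torch.Tensor) -> torch.Tensor:
        protos = self.prototypes[class_ids]            # (B, proto_dim)
        return self.to_prompts(protos).view(-1, self.num_tokens, self.prompt_dim)

enc = ClassPromptEncoder()
prompts = enc(torch.tensor([3, 5]))   # prompt tokens for two instrument classes
print(prompts.shape)                  # torch.Size([2, 2, 256])
```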

AAAI Conference 2024 Conference Paper

Terrain Diffusion Network: Climatic-Aware Terrain Generation with Geological Sketch Guidance

  • Zexin Hu
  • Kun Hu
  • Clinton Mo
  • Lei Pan
  • Zhiyong Wang

Sketch-based terrain generation seeks to create realistic landscapes for virtual environments in various applications such as computer games, animation and virtual reality. Recently, deep learning based terrain generation has emerged, notably the ones based on generative adversarial networks (GAN). However, these methods often struggle to fulfill the requirements of flexible user control and maintain generative diversity for realistic terrain. Therefore, we propose a novel diffusion-based method, namely terrain diffusion network (TDN), which actively incorporates user guidance for enhanced controllability, taking into account terrain features like rivers, ridges, basins, and peaks. Instead of adhering to a conventional monolithic denoising process, which often compromises the fidelity of terrain details or the alignment with user control, a multi-level denoising scheme is proposed to generate more realistic terrains by taking into account fine-grained details, particularly those related to climatic patterns influenced by erosion and tectonic activities. Specifically, three terrain synthesisers are designed for structural, intermediate, and fine-grained level denoising purposes, which allow each synthesiser to concentrate on a distinct terrain aspect. Moreover, to maximise the efficiency of our TDN, we further introduce terrain and sketch latent spaces for the synthesisers with pre-trained terrain autoencoders. Comprehensive experiments on a new dataset constructed from NASA Topology Images clearly demonstrate the effectiveness of our proposed method, achieving the state-of-the-art performance. Our code is available at https://github.com/TDNResearch/TDN.

JBHI Journal 2023 Journal Article

Multi-Level Adversarial Spatio-Temporal Learning for Footstep Pressure Based FoG Detection

  • Kun Hu
  • Shaohui Mei
  • Wei Wang
  • Kaylena A. Ehgoetz Martens
  • Liang Wang
  • Simon J. G. Lewis
  • David D. Feng
  • Zhiyong Wang

Freezing of gait (FoG) is one of the most common symptoms of Parkinson's disease, which is a neurodegenerative disorder of the central nervous system impacting millions of people around the world. To address the pressing need to improve the quality of treatment for FoG, devising a computer-aided detection and quantification tool for FoG has been increasingly important. As a non-invasive technique for collecting motion patterns, the footstep pressure sequences obtained from pressure sensitive gait mats provide a great opportunity for evaluating FoG in the clinic and potentially in the home environment. In this study, FoG detection is formulated as a sequential modelling task and a novel deep learning architecture, namely Adversarial Spatio-temporal Network (ASTN), is proposed to learn FoG patterns across multiple levels. ASTN introduces a novel adversarial training scheme with a multi-level subject discriminator to obtain subject-independent FoG representations, which helps to reduce the over-fitting risk due to the high inter-subject variance. As a result, robust FoG detection can be achieved for unseen subjects. The proposed scheme also sheds light on improving subject-level clinical studies from other scenarios as it can be integrated with many existing deep architectures. To the best of our knowledge, this is one of the first studies of footstep pressure-based FoG detection and the approach of utilizing ASTN is the first deep neural network architecture in pursuit of subject-independent representations. In our experiments on 393 trials collected from 21 subjects, the proposed ASTN achieved an AUC of 0.85, clearly outperforming conventional learning methods.
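Adversarial subject-invariance of this kind is typically implemented with a gradient-reversal layer: features pass through unchanged, while the subject discriminator's gradient is flipped before reaching the encoder. The sketch below shows that mechanism in isolation; it is a generic building block, not ASTN's multi-level discriminator.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; negated (scaled) gradient on the
    backward pass. Training a subject discriminator through this layer
    pushes the upstream encoder toward subject-independent features."""
    @staticmethod
    def forward(ctx, x, lam: float = 1.0):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None

features = torch.randn(8, 16, requires_grad=True)  # stand-in encoder output
disc = torch.nn.Linear(16, 21)                     # 21 subjects, as in the study
subject_logits = disc(GradReverse.apply(features, 1.0))
subject_logits.sum().backward()
# The encoder receives the *reversed* discriminator gradient:
print(features.grad.shape)                          # torch.Size([8, 16])
```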

AAAI Conference 2023 Conference Paper

Multi-Scale Control Signal-Aware Transformer for Motion Synthesis without Phase

  • Lintao Wang
  • Kun Hu
  • Lei Bai
  • Yu Ding
  • Wanli Ouyang
  • Zhiyong Wang

Synthesizing controllable motion for a character using deep learning has been a promising approach due to its potential to learn a compact model without laborious feature engineering. To produce dynamic motion from weak control signals such as desired paths, existing methods often require auxiliary information such as phases for alleviating motion ambiguity, which limits their generalisation capability. As past poses often contain useful auxiliary hints, in this paper, we propose a task-agnostic deep learning method, namely Multi-scale Control Signal-aware Transformer (MCS-T), with an attention based encoder-decoder architecture to discover the auxiliary information implicitly for synthesizing controllable motion without explicitly requiring auxiliary information such as phase. Specifically, an encoder is devised to adaptively formulate the motion patterns of a character's past poses with multi-scale skeletons, and a decoder, driven by control signals, further synthesizes and predicts the character's state by paying context-specialised attention to the encoded past motion patterns. As a result, it helps alleviate the issues of low responsiveness and slow transition which often happen in conventional methods not using auxiliary information. Both qualitative and quantitative experimental results on an existing biped locomotion dataset, which involves diverse types of motion transitions, demonstrate the effectiveness of our method. In particular, MCS-T is able to successfully generate motions comparable to those generated by the methods using auxiliary information.

IJCAI Conference 2023 Conference Paper

Stochastic Feature Averaging for Learning with Long-Tailed Noisy Labels

  • Hao-Tian Li
  • Tong Wei
  • Hao Yang
  • Kun Hu
  • Chong Peng
  • Li-Bo Sun
  • Xun-Liang Cai
  • Min-Ling Zhang

Deep neural networks have shown promising results on a wide variety of tasks using large-scale and well-annotated training datasets. However, data collected from real-world applications can suffer from two prevalent biases, i.e., long-tailed class distribution and label noise. Previous efforts on long-tailed learning and label-noise learning can only address a single type of data bias, leading to a severe deterioration of their performance. In this paper, we propose a distance-based sample selection algorithm called Stochastic Feature Averaging (SFA), which fits a Gaussian using the exponential running average of class centroids to capture uncertainty in representation space due to label noise and data scarcity. With SFA, we detect noisy samples based on their distances to class centroids sampled from this Gaussian distribution. Based on the identified clean samples, we then propose to train an auxiliary balanced classifier to improve the generalization for the minority class and facilitate the update of Gaussian parameters. Extensive experimental results show that SFA can enhance the performance of existing methods on both simulated and real-world datasets. Further, we propose to combine SFA with the sample-selection approach, distribution-robust, and noise-robust loss functions, resulting in significant improvement in performance over the baselines. Our code is available at https://github.com/HotanLee/SFA.
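The centroid bookkeeping at SFA's core can be sketched as below: exponential running averages of per-class feature centroids, with distance-to-centroid flagging likely-clean samples. SFA additionally samples centroids from a fitted Gaussian to model uncertainty; the fixed distance quantile used here is our simplification.

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, dim, momentum = 3, 16, 0.9
centroids = np.zeros((num_classes, dim))

def update_centroids(feats: np.ndarray, labels: np.ndarray) -> None:
    """Exponential running average of per-class feature centroids."""
    for c in range(num_classes):
        batch = feats[labels == c]
        if len(batch):
            centroids[c] = momentum * centroids[c] + (1 - momentum) * batch.mean(0)

def select_clean(feats: np.ndarray, labels: np.ndarray, quantile: float = 0.7):
    """Flag samples close to their (noisy-)label centroid as clean. SFA
    instead samples centroids from a fitted Gaussian; a fixed distance
    quantile is a simplification."""
    d = np.linalg.norm(feats - centroids[labels], axis=1)
    return d <= np.quantile(d, quantile)

labels = rng.integers(0, num_classes, 64)
feats = rng.normal(size=(64, dim)) + labels[:, None]  # class-shifted toy features
update_centroids(feats, labels)
print(select_clean(feats, labels).sum(), "of 64 flagged clean")
```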

JBHI Journal 2020 Journal Article

Vision-Based Freezing of Gait Detection With Anatomic Directed Graph Representation

  • Kun Hu
  • Zhiyong Wang
  • Shaohui Mei
  • Kaylena A. Ehgoetz Martens
  • Tingting Yao
  • Simon J. G. Lewis
  • David Dagan Feng

Parkinson's disease significantly impacts the quality of life of millions of people around the world. While freezing of gait (FoG) is one of the most common symptoms of the disease, assessing FoG is time-consuming and subjective even for well-trained experts. Therefore, it is highly desirable to devise computer-aided FoG detection methods for the purpose of objective and time-efficient assessment. In this paper, in line with the gold standard of FoG clinical assessment, which requires video or direct observation, we propose one of the first vision-based methods for automatic FoG detection. To better characterize FoG patterns, instead of learning an overall representation of a video, we propose a novel architecture of graph convolution neural network and represent each video as a directed graph where FoG related candidate regions are the vertices. A weakly-supervised learning strategy and a weighted adjacency matrix estimation layer are proposed to eliminate the resource expensive data annotation required for fully supervised learning. As a result, the interference of visual information irrelevant to FoG, such as gait motion of supporting staff involved in clinical assessments, has been reduced to improve FoG detection performance by identifying the vertices contributing to FoG events. To further improve the performance, the global context of a clinical video is also considered and several fusion strategies with graph predictions are investigated. Experimental results on more than 100 videos collected from 45 patients during a clinical assessment demonstrated promising performance of our proposed method with an AUC of 0.887.
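One directed graph-convolution step over candidate-region vertices looks like the sketch below; in the paper the weighted adjacency is estimated by a dedicated layer and the weights are learned, whereas here both are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

def directed_gcn_layer(X: np.ndarray, A: np.ndarray, W: np.ndarray) -> np.ndarray:
    """One graph-convolution step on a directed graph: row-normalise the
    weighted adjacency, aggregate neighbour features, project, apply ReLU.
    In the paper A is *estimated* by a dedicated layer; here it is given."""
    A_hat = A / np.clip(A.sum(axis=1, keepdims=True), 1e-8, None)
    return np.maximum(A_hat @ X @ W, 0.0)

num_regions, feat_dim, out_dim = 6, 32, 16        # candidate regions as vertices
X = rng.normal(size=(num_regions, feat_dim))      # per-region visual features
A = rng.uniform(size=(num_regions, num_regions))  # weighted directed edges
W = rng.normal(size=(feat_dim, out_dim)) * 0.1    # random stand-in projection
print(directed_gcn_layer(X, A, W).shape)          # (6, 16)
```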