Arrow Research search

Author name cluster

Yi Wu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

52 papers
2 author rows

Possible papers (52)

AAAI Conference 2026 Conference Paper

Learning Knowledge from Textual Descriptions for 3D Human Pose Estimation

  • Yi Wu
  • Jingtian Li
  • Shangfei Wang
  • Guoming Li
  • Meng Mao
  • Linxiang Tan

Mainstream 3D human pose estimation methods directly predict the 3D coordinates of joints from 2D keypoints and therefore suffer from severe depth ambiguity. Pose textual descriptions contain abundant semantic information, which helps the model learn the spatial relationships among different body parts and partially alleviates this issue. Leveraging this insight, we propose a 3D human pose estimation method assisted by textual descriptions. Specifically, we utilize an automatic captioning pipeline to generate textual descriptions of 3D poses based on spatial relations among joints. These descriptions include details regarding angles, distances, relative positions, pitch & roll, and ground contacts. Subsequently, text features are extracted from these descriptions using a language model, while a 3D human pose estimation model extracts pose features. Aligning the pose features with the text features allows for a more targeted optimization of the estimation model. To this end, we systematically introduce three alignment approaches to effectively align features extracted by two models operating in entirely different domains. Our method incorporates prior knowledge derived from the textual descriptions into the estimation model and can be seamlessly applied to various existing frameworks. Experimental results on the Human3.6M and MPI-INF-3DHP datasets demonstrate that our method surpasses state-of-the-art methods.
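
The three alignment approaches are not spelled out in the abstract, so as a purely hypothetical illustration, the sketch below aligns pose and text embeddings with a symmetric InfoNCE-style contrastive loss; `pose_feats`, `text_feats`, and the temperature value are assumptions, not the paper's method.

```python
# Hypothetical sketch: contrastive alignment between pose and text features.
# `pose_feats` / `text_feats` stand in for the two encoders' outputs; the
# paper's three specific alignment approaches are not reproduced here.
import torch
import torch.nn.functional as F

def alignment_loss(pose_feats: torch.Tensor, text_feats: torch.Tensor,
                   temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE-style loss pulling each pose embedding toward the text
    embedding of the same sample and away from other samples in the batch."""
    pose = F.normalize(pose_feats, dim=-1)   # (B, D)
    text = F.normalize(text_feats, dim=-1)   # (B, D)
    logits = pose @ text.t() / temperature   # (B, B) cosine similarities
    targets = torch.arange(pose.size(0), device=pose.device)
    # Symmetric cross-entropy over both matching directions.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

loss = alignment_loss(torch.randn(8, 256), torch.randn(8, 256))
```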

JBHI Journal 2025 Journal Article

A Multi-Modality Attention Network for Driver Fatigue Detection Based on Frontal EEG, EDA and PPG Signals

  • Yuanru Guo
  • Kunping Yang
  • Yi Wu

Fatigue driving is a common issue that often leads to traffic accidents, which has motivated numerous automatic driving fatigue detection methods based on various sources, especially reliable physiological signals. However, these methods still face challenges in accuracy, robustness, and practicality, especially for cross-subject detection. Fusing multi-modality data can improve the estimation of driving fatigue. In this work, we take advantage of user-friendly, multi-modality signals to build a Multi-Modality Attention Network (MMA-Net) for driver fatigue detection based on a hybrid of frontal electroencephalography (EEG), electrodermal activity (EDA), and photoplethysmography (PPG) signals. Specifically, a signal adaptive coding module (SAC-M) is constructed to fully exploit the spatial-temporal information of the signals, combined with an attention-based feature dissimilation module (AFD-M) to further obtain key comprehensive features. In addition, the performances of baseline models and state-of-the-art methods on signal sources with different window lengths are compared. The cross-subject experiment is performed on two groups of 14 participants in a driving simulation experiment, and the experimental results demonstrate the superiority of our proposed method. MMA-Net thus makes driver fatigue detection with user-friendly multi-modality signals, such as our selected frontal EEG, EDA, and PPG, feasible in real-world applications.

NeurIPS Conference 2025 Conference Paper

AREAL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning

  • Wei Fu
  • Jiaxuan Gao
  • Xujie Shen
  • Chen Zhu
  • Zhiyu Mei
  • Chuyi He
  • Shusheng Xu
  • Guo Wei

Reinforcement learning (RL) has become a trending paradigm for training large language models (LLMs), particularly for reasoning tasks. Effective RL for LLMs requires massive parallelization and poses an urgent need for efficient training systems. Most existing large-scale RL systems for LLMs are synchronous, alternating generation and training in a batch setting where the rollouts in each training batch are generated by the same (or latest) model. This stabilizes RL training but suffers from severe system-level inefficiency: generation must wait until the longest output in the batch is completed before the model update, resulting in GPU underutilization. We present AReaL, a fully asynchronous RL system that completely decouples generation from training. Rollout workers in AReaL continuously generate new outputs without waiting, while training workers update the model whenever a batch of data is collected. AReaL also incorporates a collection of system-level optimizations, leading to substantially higher GPU utilization. To stabilize RL training, AReaL balances the workload of rollout and training workers to control data staleness, and adopts a staleness-enhanced PPO variant to better handle outdated training samples. Extensive experiments on math and code reasoning benchmarks show that AReaL achieves up to 2.77x training speedup compared to synchronous systems with the same number of GPUs, with matched or even improved final performance. The code of AReaL is available at https://github.com/inclusionAI/AReaL/.
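
As a rough illustration of the decoupling the abstract describes, here is a toy producer/consumer sketch in which a rollout worker generates continuously while a trainer enforces a bound on policy-version staleness. All names, the staleness bound, and the drop-stale-samples rule are illustrative assumptions; AReaL's actual mechanisms (including its staleness-enhanced PPO) are more involved.

```python
# Toy sketch of the asynchronous pattern the abstract describes: rollout
# workers generate continuously while a trainer consumes batches, with a
# staleness bound on the policy-version gap. All names are illustrative.
import queue, threading

MAX_STALENESS = 2          # max allowed policy-version gap for a sample
BATCH_SIZE = 4
buffer = queue.Queue(maxsize=64)
policy_version = 0
stop = threading.Event()

def rollout_worker():
    while not stop.is_set():
        sample = {"version": policy_version, "tokens": "..."}  # fake rollout
        buffer.put(sample)

def trainer(num_steps: int = 10):
    global policy_version
    for _ in range(num_steps):
        batch = []
        while len(batch) < BATCH_SIZE:
            s = buffer.get()
            if policy_version - s["version"] <= MAX_STALENESS:
                batch.append(s)      # fresh enough: train on it
            # else: drop the overly stale sample (a staleness-aware PPO
            # variant would reweight it instead of discarding it)
        policy_version += 1          # one optimizer step = new policy version
    stop.set()

threading.Thread(target=rollout_worker, daemon=True).start()
trainer()
```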

JMLR Journal 2025 Journal Article

BitNet: 1-bit Pre-training for Large Language Models

  • Hongyu Wang
  • Shuming Ma
  • Lingxiao Ma
  • Lei Wang
  • Wenhui Wang
  • Li Dong
  • Shaohan Huang
  • Huaijie Wang

The increasing size of large language models (LLMs) has posed challenges for deployment and raised concerns about environmental impact due to high energy consumption. Previous research typically applies quantization after pre-training. While these methods avoid the need for model retraining, they often cause notable accuracy loss at extremely low bit-widths. In this work, we explore the feasibility and scalability of 1-bit pre-training. We introduce BitNet b1 and BitNet b1.58, scalable and stable 1-bit Transformer architectures designed for LLMs. Specifically, we introduce BitLinear as a drop-in replacement for the nn.Linear layer in order to train 1-bit weights from scratch. Experimental results show that BitNet b1 achieves competitive performance compared to state-of-the-art 8-bit quantization methods and FP16 Transformer baselines. With ternary weights, BitNet b1.58 matches the half-precision Transformer LLM with the same model size and training tokens in terms of both perplexity and end-task performance, while being significantly more cost-effective in terms of latency, memory, throughput, and energy consumption. More profoundly, BitNet defines a new scaling law and recipe for training new generations of LLMs that are both high-performance and cost-effective. It enables a new computation paradigm and opens the door for designing specific hardware optimized for 1-bit LLMs.
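
A minimal sketch of what a BitLinear-style layer could look like, assuming the b1.58 recipe of absmean scaling to ternary {-1, 0, 1} weights with a straight-through estimator; activation quantization and the paper's other details are omitted, so treat this as illustrative rather than the official implementation.

```python
# Minimal sketch of a BitLinear-style layer (ternary weights with a
# straight-through estimator), assuming absmean scaling as in b1.58.
import torch
import torch.nn as nn

class BitLinearSketch(nn.Linear):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Scale weights by their mean absolute value, then round to {-1, 0, 1}.
        scale = self.weight.abs().mean().clamp(min=1e-5)
        w_q = (self.weight / scale).round().clamp(-1, 1) * scale
        # Straight-through estimator: quantized values in the forward pass,
        # full-precision gradients to the latent weights in the backward pass.
        w = self.weight + (w_q - self.weight).detach()
        return nn.functional.linear(x, w, self.bias)

layer = BitLinearSketch(16, 8)
out = layer(torch.randn(2, 16))
out.sum().backward()   # gradients flow to the full-precision latent weights
```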

ICRA Conference 2025 Conference Paper

Fuel-Optimal Operational Speed Planning for Autonomous Trucking on Highways

  • Wei Li 0111
  • Bin Wu
  • Jiahao Xiang
  • Jiaping Ren
  • Yi Wu
  • Ruigang Yang

The rapid advancement of autonomous driving technology, particularly in autonomous trucking on highways, shows great value for enhancing efficiency and reducing costs in the logistics industry. In this work, we define the full-trip speed planning problem for autonomous trucks under delivery time and fuel consumption constraints, referred to as the Operational Speed Planning (OSP) problem. To support and accelerate research on the OSP problem, we have developed a comprehensive dataset using a fleet of over 400 trucks. The dataset contains rich, diverse information covering more than 22 million kilometers of real-world highway driving data. In addition to this static dataset, we have developed a closed-loop simulator that allows for the interactive evaluation of OSP solutions, enabling researchers to test speed planning strategies in a realistic environment. Furthermore, we provide an OSP baseline method based on dynamic programming to optimize speed planning, balancing the delivery time requirements and fuel consumption. Our extensive experiments demonstrate both the accuracy of the simulation and the effectiveness of the OSP baseline in planning optimal speeds, proving its capability to meet time constraints while improving fuel efficiency. The dataset, simulator, and baseline will be made publicly available to foster further research and innovation in this area.

NeurIPS Conference 2025 Conference Paper

How Far Are We from Optimal Reasoning Efficiency?

  • Jiaxuan Gao
  • Shu Yan
  • Qixin Tan
  • Lu Yang
  • Shusheng Xu
  • Wei Fu
  • Zhiyu Mei
  • Kaifeng Lyu

Large Reasoning Models (LRMs) demonstrate remarkable problem-solving capabilities through extended Chain-of-Thought (CoT) reasoning but often produce excessively verbose and redundant reasoning traces. This inefficiency incurs high inference costs and limits practical deployment. While existing fine-tuning methods aim to improve reasoning efficiency, assessing their efficiency gains remains challenging due to inconsistent evaluations. In this work, we introduce the reasoning efficiency frontiers, empirical upper bounds derived from fine-tuning a base LRM (DeepSeek-R1-Distill-Qwen-1.5B/7B) across diverse approaches and training configurations. Based on these frontiers, we propose the Reasoning Efficiency Gap (REG), a unified metric quantifying deviations of any fine-tuned LRMs from these frontiers. Systematic evaluation on challenging mathematical benchmarks, AMC23, AIME24, and AIME25, reveals significant gaps in current methods: they either sacrifice accuracy for short length or use excessive tokens to achieve sub-optimal accuracies despite high overall accuracy. To reduce the efficiency gap, we propose REO-RL, a Reinforcement Learning algorithm that optimizes reasoning efficiency by targeting a sparse set of token budgets. Leveraging numerical integration over strategically selected budgets, REO-RL approximates the full efficiency objective with low error using a small set of token budgets. Experiments show that, compared to vanilla RL with outcome reward, REO-RL reduces the reasoning efficiency gap by 74.5% and 64.2% in the 1.5B and 7B settings. The 7B LRM fine-tuned with REO-RL achieves reasoning conciseness surpassing frontier LRMs like Qwen3 and Claude Sonnet 3.7. Ablation studies confirm the efficacy of our token budget strategy and highlight REO-RL’s flexibility across design choices. This work establishes a systematic framework for evaluating and optimizing reasoning efficiency in LRMs. We will release the related code, data, and models to support future research on efficient reasoning in LRMs.
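
The abstract's "numerical integration over strategically selected budgets" suggests weighting per-budget rewards by quadrature weights. Below is a hypothetical sketch using trapezoidal-rule weights over a sparse budget set; the budget values and the `acc` numbers are made up for illustration.

```python
# Hypothetical sketch of "numerical integration over token budgets": the
# efficiency objective, an integral over budgets, is approximated with
# trapezoidal weights at a small set of budgets. acc is a stand-in for the
# per-budget reward signal.
import numpy as np

def budget_weights(budgets: np.ndarray) -> np.ndarray:
    """Trapezoidal-rule weights for integrating a function sampled at an
    increasing set of budget points."""
    w = np.zeros_like(budgets, dtype=float)
    w[1:] += np.diff(budgets) / 2.0
    w[:-1] += np.diff(budgets) / 2.0
    return w

budgets = np.array([512, 1024, 2048, 4096, 8192], dtype=float)
acc = np.array([0.30, 0.45, 0.55, 0.60, 0.62])   # fake accuracy per budget
approx_objective = float(budget_weights(budgets) @ acc)
```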

JMLR Journal 2025 Journal Article

Learning Global Nash Equilibrium in Team Competitive Games with Generalized Fictitious Cross-Play

  • Zelai Xu
  • Chao Yu
  • Yancheng Liang
  • Yi Wu
  • Yu Wang

Self-play (SP) is a popular multi-agent reinforcement learning framework for competitive games. Despite the empirical success, the theoretical properties of SP are limited to two-player settings. For team competitive games where two teams of cooperative agents compete with each other, we show a counter-example where SP cannot converge to a global Nash equilibrium (NE) with high probability. Policy-Space Response Oracles (PSRO) is an alternative framework that finds NEs by iteratively learning the best response (BR) to previous policies. PSRO can be directly extended to team competitive games with unchanged convergence properties by learning team BRs, but its repeated training from scratch makes it hard to scale to complex games. In this work, we propose Generalized Fictitious Cross-Play (GFXP), a novel algorithm that inherits benefits from both frameworks. GFXP simultaneously trains an SP-based main policy and a counter population. The main policy is trained by fictitious self-play and cross-play against the counter population, while the counter policies are trained as the BRs to the main policy's checkpoints. We evaluate GFXP in matrix games and gridworld domains where GFXP achieves the lowest exploitabilities. We further conduct experiments in a challenging football game where GFXP defeats SOTA models with over 94% win rate.

NeurIPS Conference 2025 Conference Paper

Reasoning Is Not a Race: When Stopping Early Beats Going Deeper

  • Mohan Zhang
  • Jiaxuan Gao
  • Shusheng Xu
  • Yi Wu

We study the use of Process Reward Models (PRMs) for guiding Long Chain-of-Thought (CoT) reasoning in large language models. Although PRMs deliver fine-grained feedback in standard tasks, PRM-guided beam search does not consistently outperform PRM-free approaches in long CoT reasoning. We trace this shortfall to a "step quality degradation": the expected step quality shows concave behavior, yielding unimodal or monotonically declining trends. To counteract this, we propose Z-Score Guided Early Stopping (ZGES), which halts search at the detected quality peak using local PRM-reward z-scores. Across multiple math benchmarks and model scales, ZGES outperforms both standard PRM-guided beam search and PRM-free methods. Ablation studies further highlight the advantages and robustness of ZGES’s adaptive stopping mechanism.
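
A toy sketch of what z-score-guided stopping might look like: compute the z-score of the newest step's PRM reward against a local window of recent steps and halt once it falls well below the recent mean. The window size and threshold here are invented for illustration, not taken from the paper.

```python
# Toy sketch of z-score-guided early stopping: halt the search when the local
# z-score of the current step's PRM reward drops below a threshold, signalling
# that step quality has passed its peak. Thresholds are made up.
import numpy as np

def should_stop(prm_rewards: list[float], window: int = 5,
                z_threshold: float = -1.0) -> bool:
    if len(prm_rewards) < window + 1:
        return False
    recent = np.array(prm_rewards[-(window + 1):-1])
    mu, sigma = recent.mean(), recent.std() + 1e-8
    z = (prm_rewards[-1] - mu) / sigma   # z-score of the newest step
    return z < z_threshold               # far below recent quality: stop

rewards = [0.5, 0.6, 0.7, 0.72, 0.71, 0.70, 0.40]
print(should_stop(rewards))   # True: the last step fell off the peak
```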

NeurIPS Conference 2025 Conference Paper

What Can RL Bring to VLA Generalization? An Empirical Study

  • Jijia Liu
  • Feng Gao
  • Bingwen Wei
  • Xinlei Chen
  • Qingmin Liao
  • Yi Wu
  • Chao Yu
  • Yu Wang

Large Vision-Language Action (VLA) models have shown significant potential for embodied AI. However, their predominant training via supervised fine-tuning (SFT) limits generalization due to susceptibility to compounding errors under distribution shifts. Reinforcement learning (RL) offers a path to overcome these limitations by optimizing for task objectives via trial-and-error, yet a systematic understanding of its specific generalization benefits for VLAs compared to SFT is lacking. To address this, our study introduces a comprehensive benchmark for evaluating VLA generalization and systematically investigates the impact of RL fine-tuning across diverse visual, semantic, and execution dimensions. Our extensive experiments reveal that RL fine-tuning, particularly with PPO, significantly enhances generalization in semantic understanding and execution robustness over SFT, while maintaining comparable visual robustness. We identify PPO as a more effective RL algorithm for VLAs than LLM-derived methods like DPO and GRPO. We also develop a simple recipe for efficient PPO training on VLAs, and demonstrate its practical utility for improving VLA generalization. The project page is at https://rlvla.github.io.

AAAI Conference 2024 Conference Paper

Accelerate Multi-Agent Reinforcement Learning in Zero-Sum Games with Subgame Curriculum Learning

  • Jiayu Chen
  • Zelai Xu
  • Yunfei Li
  • Chao Yu
  • Jiaming Song
  • Huazhong Yang
  • Fei Fang
  • Yu Wang

Learning Nash equilibrium (NE) in complex zero-sum games with multi-agent reinforcement learning (MARL) can be extremely computationally expensive. Curriculum learning is an effective way to accelerate learning, but an under-explored dimension for generating a curriculum is the difficulty-to-learn of the subgames, i.e., games induced by starting from a specific state. In this work, we present a novel subgame curriculum learning framework for zero-sum games. It adopts an adaptive initial state distribution by resetting agents to some previously visited states where they can quickly learn to improve performance. Building upon this framework, we derive a subgame selection metric that approximates the squared distance to NE values and further adopt a particle-based state sampler for subgame generation. Integrating these techniques leads to our new algorithm, Subgame Automatic Curriculum Learning (SACL), which is a realization of the subgame curriculum learning framework. SACL can be combined with any MARL algorithm such as MAPPO. Experiments in the particle-world environment and Google Research Football environment show SACL produces much stronger policies than baselines. In the challenging hide-and-seek quadrant environment, SACL produces all four emergent stages and uses only half the samples of MAPPO with self-play. The project website is at https://sites.google.com/view/sacl-neurips.

ICRA Conference 2024 Conference Paper

ESP: Extro-Spective Prediction for Long-term Behavior Reasoning in Emergency Scenarios

  • Dingrui Wang
  • Zheyuan Lai
  • Yuda Li
  • Yi Wu
  • Yuexin Ma
  • Johannes Betz
  • Ruigang Yang
  • Wei Li 0111

Emergent-scene safety is the key milestone for fully autonomous driving, and reliable on-time prediction is essential to maintain safety in emergency scenarios. However, these emergency scenarios are long-tailed and hard to collect, which restricts the system from obtaining reliable predictions. In this paper, we build a new dataset aimed at long-term prediction from inconspicuous state variations in history for emergency events, named the Extro-Spective Prediction (ESP) problem. Based on the proposed dataset, a flexible feature encoder for ESP is introduced to various prediction methods as a seamless plug-in, and its consistent performance improvement underscores its efficacy. Furthermore, a new metric named clamped temporal error (CTE) is proposed to give a more comprehensive evaluation of prediction performance, especially in time-sensitive, sub-second emergency events. Interestingly, since our ESP features can naturally be described in human-readable language, integrating them into ChatGPT also shows huge potential. The ESP dataset and all benchmarks are released at https://dingrui-wang.github.io/ESP-Dataset/.

AAMAS Conference 2024 Conference Paper

LLM-Powered Hierarchical Language Agent for Real-time Human-AI Coordination

  • Jijia Liu
  • Chao Yu
  • Jiaxuan Gao
  • Yuqing Xie
  • Qingmin Liao
  • Yi Wu
  • Yu Wang

AI agents powered by Large Language Models (LLMs) have made significant advances, enabling them to assist humans in diverse complex tasks and leading to a revolution in human-AI coordination. LLM-powered agents typically require invoking LLM APIs and employing artificially designed complex prompts, which results in high inference latency. While this paradigm works well in scenarios with minimal interactive demands, such as code generation, it is unsuitable for highly interactive and real-time applications, such as gaming. Traditional gaming AI often employs small models or reactive policies, enabling fast inference but offering limited task completion and interaction abilities. In this work, we consider Overcooked as our testbed, where players can communicate with natural language and cooperate to serve orders. We propose a Hierarchical Language Agent (HLA) for human-AI coordination that provides strong reasoning abilities while maintaining real-time execution. In particular, HLA adopts a hierarchical framework and comprises three modules: a proficient LLM, referred to as Slow Mind, for intention reasoning and language interaction; a lightweight LLM, referred to as Fast Mind, for generating macro actions; and a reactive policy, referred to as Executor, for transforming macro actions into atomic actions. Human studies show that HLA outperforms other baseline agents, including slow-mind-only agents and fast-mind-only agents, with stronger cooperation abilities, faster responses, and more consistent language communications.

AAMAS Conference 2024 Conference Paper

MESA: Cooperative Meta-Exploration in Multi-Agent Learning through Exploiting State-Action Space Structure

  • Zhicheng Zhang
  • Yancheng Liang
  • Yi Wu
  • Fei Fang

Multi-agent reinforcement learning (MARL) algorithms often struggle to find strategies close to a Pareto-optimal Nash equilibrium, owing largely to the lack of efficient exploration. The problem is exacerbated in sparse-reward settings by the larger variance exhibited in policy learning. This paper introduces MESA, a novel meta-exploration method for cooperative multi-agent learning. It learns to explore by first identifying the agents’ high-rewarding joint state-action subspace from training tasks and then learning a set of diverse exploration policies to “cover” the subspace. These trained exploration policies can be integrated with any off-policy MARL algorithm for test-time tasks. We first showcase MESA’s advantage in a multi-step matrix game. Furthermore, experiments show that with learned exploration policies, MESA achieves significantly better performance in sparse-reward tasks in several multi-agent particle environments and multi-agent MuJoCo environments, and exhibits the ability to generalize to more challenging tasks at test time.

ICRA Conference 2024 Conference Paper

NPC: Neural Predictive Control for Fuel-Efficient Autonomous Trucks

  • Jiaping Ren
  • Jiahao Xiang
  • Hongfei Gao
  • Jinchuan Zhang
  • Yiming Ren
  • Yuexin Ma
  • Yi Wu
  • Ruigang Yang

Fuel efficiency is a crucial aspect of long-distance cargo transportation by oil-powered trucks, as it economizes on costs and decreases carbon emissions. Current predictive control methods depend on an accurate model of vehicle dynamics and the engine, including weight, drag coefficient, and the Brake-specific Fuel Consumption (BSFC) map of the engine. We propose a purely data-driven method, Neural Predictive Control (NPC), which does not use any physical model of the vehicle. After training on over 20,000 km of historical data, the newly proposed NVFormer implicitly models the relationship between vehicle dynamics, road slope, fuel consumption, and control commands using the attention mechanism. Based on primitives sampled online from the past of the current freight trip and anchor-based future data synthesis, the NVFormer can infer optimal control commands for reasonable fuel consumption. The physical-model-free NPC outperforms the base PCC method with 2.41% and 3.45% greater fuel savings in simulation and open-road highway testing, respectively.

IROS Conference 2024 Conference Paper

Robot Generating Data for Learning Generalizable Visual Robotic Manipulation

  • Yunfei Li
  • Ying Yuan
  • Jingzhi Cui
  • Haoran Huan
  • Wei Fu
  • Jiaxuan Gao
  • Zekai Xu
  • Yi Wu

It has been a popular trend in AI to pretrain foundation models on massive data. However, collecting sufficient offline training trajectories for robot learning is particularly expensive since valid control actions are required. Therefore, most existing robotic datasets are collected from human experts. We tackle such a data collection issue with a new framework called "robot self-teaching", which asks the robot to self-generate effective training data instead of relying on human demonstrators. Our key idea is to train a separate data-generation policy operating on the state space to automatically generate meaningful actions and trajectories with ever-growing complexities. Then, these generated data can be further used to train a visual policy with strong compositional generalization capabilities. We validate our framework in two visual manipulation testbeds, including a multi-object stacking domain and a popular RL benchmark "Franka kitchen". Experiments show that the final visual policy trained on self-generated data can accomplish novel testing goals that require long-horizon robot executions. Project website https://sites.google.com/view/robot-self-teaching.

AAAI Conference 2023 Short Paper

AlphaSnake: Policy Iteration on a Nondeterministic NP-Hard Markov Decision Process (Student Abstract)

  • Kevin Du
  • Ian Gemp
  • Yi Wu
  • Yingying Wu

Reinforcement learning has been used to approach well-known NP-hard combinatorial problems in graph theory. Among these, Hamiltonian cycle problems are exceptionally difficult to analyze, even when restricted to individual instances of structurally complex graphs. In this paper, we use Monte Carlo Tree Search (MCTS), the search algorithm behind many state-of-the-art reinforcement learning algorithms such as AlphaZero, to create autonomous agents that learn to play the game of Snake, a game centered on properties of Hamiltonian cycles on grid graphs. The game of Snake can be formulated as a single-player discounted Markov Decision Process (MDP), where the agent must behave optimally in a stochastic environment. Determining the optimal policy for Snake, defined as the policy that maximizes the probability of winning -- or win rate -- with higher priority and minimizes the expected number of time steps to win with lower priority, is conjectured to be NP-hard. Performance-wise, compared to prior work in the Snake game, our algorithm is the first to achieve a win rate over 0.5 (a uniform random policy achieves a win rate < 2.57 x 10^{-15}), demonstrating the versatility of AlphaZero in tackling NP-hard problems.

AAMAS Conference 2023 Conference Paper

Asynchronous Multi-Agent Reinforcement Learning for Efficient Real-Time Multi-Robot Cooperative Exploration

  • Chao Yu
  • Xinyi Yang
  • Jiaxuan Gao
  • Jiayu Chen
  • Yunfei Li
  • Jijia Liu
  • Yunfei Xiang
  • Ruixin Huang

We consider the problem of cooperative exploration where multiple robots need to cooperatively explore an unknown region as fast as possible. Multi-agent reinforcement learning (MARL) has recently become a trending paradigm for solving this challenge. However, existing MARL-based methods adopt action-making steps as the metric for exploration efficiency by assuming all the agents act in a fully synchronous manner: i.e., every agent produces an action simultaneously and every action is executed instantaneously at each time step. Despite its mathematical simplicity, such a synchronous MARL formulation can be problematic for real-world robotic applications. It is typical that different robots may take slightly different wall-clock times to accomplish an atomic action, or even periodically get lost due to hardware issues. Simply waiting for every robot to be ready for the next action can be particularly time-inefficient. Therefore, we propose an asynchronous MARL solution, Asynchronous Coordination Explorer (ACE), to tackle this real-world challenge. We first extend a classical MARL algorithm, multi-agent PPO (MAPPO), to the asynchronous setting and additionally apply action-delay randomization to enforce the learned policy to generalize better to varying action delays in the real world. Moreover, each navigation agent is represented as a team-size-invariant CNN-based policy, which greatly benefits real-robot deployment by handling possible robot loss and allows bandwidth-efficient intra-agent communication through low-dimensional CNN features. We first validate our approach in a grid-based scenario. Both simulation and real-robot results show that ACE reduces actual exploration time by over 10% compared with classical approaches. We also apply our framework to a high-fidelity visual-based environment, Habitat, achieving a 28% improvement in exploration efficiency.

IJCAI Conference 2023 Conference Paper

Automatic Truss Design with Reinforcement Learning

  • Weihua Du
  • Jinglun Zhao
  • Chao Yu
  • Xingcheng Yao
  • Zimeng Song
  • Siyang Wu
  • Ruifeng Luo
  • Zhiyuan Liu

Truss layout design, namely finding a lightweight truss layout satisfying all the physical constraints, is a fundamental problem in the building industry. Generating the optimal layout is a challenging combinatorial optimization problem, which can be extremely expensive to solve by exhaustive search. Directly applying end-to-end reinforcement learning (RL) methods to truss layout design is also infeasible, since only a tiny portion of the entire layout space is valid under the physical constraints, leading to particularly sparse rewards for RL training. In this paper, we develop AutoTruss, a two-stage framework to efficiently generate both lightweight and valid truss layouts. AutoTruss first adopts Monte Carlo tree search to discover a diverse collection of valid layouts. Then RL is applied to iteratively refine the valid solutions. We conduct experiments and ablation studies in popular truss layout design test cases in both 2D and 3D settings. AutoTruss outperforms the best-reported layouts by 25.1% in the most challenging 3D test cases, resulting in the first effective deep-RL-based approach in the truss layout design literature.

TMLR Journal 2023 Journal Article

Beyond Information Gain: An Empirical Benchmark for Low-Switching-Cost Reinforcement Learning

  • Shusheng Xu
  • Yancheng Liang
  • Yunfei Li
  • Simon Shaolei Du
  • Yi Wu

A ubiquitous requirement in many practical reinforcement learning (RL) applications is that the deployed policy that actually interacts with the environment cannot change frequently. Such an RL setting is called low-switching-cost RL, i.e., achieving the highest reward while reducing the number of policy switches during training. It has been a recent trend in theoretical RL research to develop provably efficient RL algorithms with low switching cost. The core idea in these theoretical works is to measure the information gain and switch the policy when the information gain is doubled. Despite the theoretical advances, none of the existing approaches has been validated empirically. We conduct the first empirical evaluation of different policy switching criteria on popular RL testbeds, including a medical treatment environment, the Atari games, and robotic control tasks. Surprisingly, although information-gain-based methods do recover the optimal rewards, they often lead to a substantially higher switching cost. By contrast, we find that a feature-based criterion, which has been largely ignored in the theoretical research, consistently produces the best performance over all the domains. We hope our benchmark can bring insights to the community and inspire future research. Our code and complete results can be found at https://sites.google.com/view/low-switching-cost-rl
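
For intuition, here is an illustrative rendering of the "switch when information gain doubles" rule from the theoretical works the abstract discusses, tracking the log-determinant of a feature information matrix; the feature map and the exact criterion used in the benchmark are assumptions on my part.

```python
# Illustrative sketch of an information-gain switching rule: accumulate a
# feature information matrix and switch the deployed policy whenever its
# determinant has doubled since the last switch. phi() is a placeholder.
import numpy as np

d = 4
info = np.eye(d)                       # running information matrix
last_switch_logdet = np.linalg.slogdet(info)[1]
switches = 0

def phi(obs: np.ndarray) -> np.ndarray:
    return obs                         # stand-in feature map

rng = np.random.default_rng(0)
for step in range(1000):
    x = phi(rng.normal(size=d))
    info += np.outer(x, x)             # accumulate information
    logdet = np.linalg.slogdet(info)[1]
    if logdet >= last_switch_logdet + np.log(2):   # determinant doubled
        switches += 1                  # deploy the newly trained policy
        last_switch_logdet = logdet
print("policy switches:", switches)
```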

AAMAS Conference 2023 Conference Paper

Differentiable Arbitrating in Zero-sum Markov Games

  • Jing Wang
  • Meichen Song
  • Feng Gao
  • Boyi Liu
  • Zhaoran Wang
  • Yi Wu

We initiate the study of how to perturb the reward in a zero-sum Markov game with two players to induce a desirable Nash equilibrium, namely arbitrating. Such a problem admits a bi-level optimization formulation. The lower level requires solving the Nash equilibrium under a given reward function, which makes the overall problem challenging to optimize in an end-to-end way. We propose a backpropagation scheme that differentiates through the Nash equilibrium, which provides the gradient feedback for the upper level. In particular, our method only requires a black-box solver for the (regularized) Nash equilibrium (NE). We develop the convergence analysis for the proposed framework with proper black-box NE solvers and demonstrate the empirical successes in two multi-agent reinforcement learning (MARL) environments. Supplementary material with all the proofs in this paper can be found at https://arxiv.org/abs/2302.10058.

NeurIPS Conference 2023 Conference Paper

Domain Re-Modulation for Few-Shot Generative Domain Adaptation

  • Yi Wu
  • Ziqiang Li
  • Chaoyue Wang
  • Heliang Zheng
  • Shanshan Zhao
  • Bin Li
  • Dacheng Tao

In this study, we delve into the task of few-shot Generative Domain Adaptation (GDA), which involves transferring a pre-trained generator from one domain to a new domain using only a few reference images. Inspired by the way human brains acquire knowledge in new domains, we present an innovative generator structure called Domain Re-Modulation (DoRM). DoRM not only meets the criteria of high quality, large synthesis diversity, and cross-domain consistency, which were achieved by previous research in GDA, but also incorporates memory and domain association, akin to how human brains operate. Specifically, DoRM freezes the source generator and introduces new mapping and affine modules (M&A modules) to capture the attributes of the target domain during GDA. This process resembles the formation of new synapses in human brains. Consequently, a linearly combinable domain shift occurs in the style space. By incorporating multiple new M&A modules, the generator gains the capability to perform high-fidelity multi-domain and hybrid-domain generation. Moreover, to maintain cross-domain consistency more effectively, we introduce a similarity-based structure loss. This loss aligns the auto-correlation map of the target image with its corresponding auto-correlation map of the source image during training. Through extensive experiments, we demonstrate the superior performance of our DoRM and similarity-based structure loss in few-shot GDA, both quantitatively and qualitatively. Code will be available at https://github.com/wuyi2020/DoRM.

AAMAS Conference 2023 Conference Paper

Fictitious Cross-Play: Learning Global Nash Equilibrium in Mixed Cooperative-Competitive Games

  • Zelai Xu
  • Yancheng Liang
  • Chao Yu
  • Yu Wang
  • Yi Wu

Self-play (SP) is a popular multi-agent reinforcement learning (MARL) framework for solving competitive games, where each agent optimizes policy by treating others as part of the environment. Despite the empirical successes, the theoretical properties of SP-based methods are limited to two-player zero-sum games. However, for mixed cooperative-competitive games where agents on the same team need to cooperate with each other, we can show a simple counterexample where SP-based methods cannot converge to a global Nash equilibrium (NE) with high probability. Alternatively, Policy-Space Response Oracles (PSRO) is an iterative framework for learning NE, where the best responses w.r.t. previous policies are learned in each iteration. PSRO can be directly extended to mixed cooperative-competitive settings by jointly learning team best responses with all convergence properties unchanged. However, PSRO requires repeatedly training joint policies from scratch till convergence, which makes it hard to scale to complex games. In this work, we develop a novel algorithm, Fictitious Cross-Play (FXP), which inherits the benefits from both frameworks. FXP simultaneously trains an SP-based main policy and a counter population of best response policies. The main policy is trained by fictitious self-play and cross-play against the counter population, while the counter policies are trained as the best responses to the main policy’s past versions. We validate our method in matrix games and show that FXP converges to global NEs while SP methods fail. We also conduct experiments in a gridworld domain, where FXP achieves higher Elo ratings and lower exploitabilities than baselines, and a more challenging football game, where FXP defeats SOTA models with over 94% win rate.

NeurIPS Conference 2023 Conference Paper

Iteratively Learn Diverse Strategies with State Distance Information

  • Wei Fu
  • Weihua Du
  • Jingwei Li
  • Sunli Chen
  • Jingzhao Zhang
  • Yi Wu

In complex reinforcement learning (RL) problems, policies with similar rewards may have substantially different behaviors. It remains a fundamental challenge to optimize rewards while also discovering as many diverse strategies as possible, which can be crucial in many practical applications. Our study examines two design choices for tackling this challenge, i.e., diversity measure and computation framework. First, we find that with existing diversity measures, visually indistinguishable policies can still yield high diversity scores. To accurately capture the behavioral difference, we propose to incorporate the state-space distance information into the diversity measure. In addition, we examine two common computation frameworks for this problem, i.e., population-based training (PBT) and iterative learning (ITR). We show that although PBT is the precise problem formulation, ITR can achieve comparable diversity scores with higher computation efficiency, leading to improved solution quality in practice. Based on our analysis, we further combine ITR with two tractable realizations of the state-distance-based diversity measures and develop a novel diversity-driven RL algorithm, State-based Intrinsic-reward Policy Optimization (SIPO), with provable convergence properties. We empirically examine SIPO across three domains from robot locomotion to multi-agent games. In all of our testing environments, SIPO consistently produces strategically diverse and human-interpretable policies that cannot be discovered by existing baselines.

IJCAI Conference 2023 Conference Paper

KDLGT: A Linear Graph Transformer Framework via Kernel Decomposition Approach

  • Yi Wu
  • Yanyang Xu
  • Wenhao Zhu
  • Guojie Song
  • Zhouchen Lin
  • Liang Wang
  • Shaoguo Liu

In recent years, graph Transformers (GTs) have been demonstrated as a robust architecture for a wide range of graph learning tasks. However, the quadratic complexity of GTs limits their scalability on large-scale data, in comparison to Graph Neural Networks (GNNs). In this work, we propose the Kernel Decomposition Linear Graph Transformer (KDLGT), an accelerating framework for building scalable and powerful GTs. KDLGT employs the kernel decomposition approach to rearrange the order of matrix multiplication, thereby reducing complexity to linear. Additionally, it categorizes GTs into three distinct types and provides tailored accelerating methods for each category to encompass all types of GTs. Furthermore, we provide a theoretical analysis of the performance gap between KDLGT and self-attention to ensure its effectiveness. Under this framework, we select two representative GTs to design our models. Experiments on both real-world and synthetic datasets indicate that KDLGT not only achieves state-of-the-art performance on various datasets but also reaches an acceleration ratio of approximately 10 on graphs of certain sizes.
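
The core reordering behind kernel decomposition can be shown numerically: with a decomposable feature map phi, (phi(Q) phi(K)^T) V equals phi(Q) (phi(K)^T V), turning O(n^2 d) work into O(n d^2). The sketch below uses a simple ReLU feature map and omits the normalization a practical linear attention would apply; it illustrates the reordering only, not KDLGT itself.

```python
# Sketch of the kernel-decomposition reordering behind linear attention:
# associativity lets phi(Q) (phi(K)^T V) replace (phi(Q) phi(K)^T) V,
# avoiding the n-by-n attention matrix entirely.
import numpy as np

n, d = 1024, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
phi = lambda x: np.maximum(x, 0.0) + 1e-6    # simple positive feature map

quad = (phi(Q) @ phi(K).T) @ V               # quadratic in n
lin = phi(Q) @ (phi(K).T @ V)                # linear in n, same result
print(np.allclose(quad, lin))                # True up to float error
```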

AAAI Conference 2023 Conference Paper

Maximum Entropy Population-Based Training for Zero-Shot Human-AI Coordination

  • Rui Zhao
  • Jinming Song
  • Yufeng Yuan
  • Haifeng Hu
  • Yang Gao
  • Yi Wu
  • Zhongqian Sun
  • Wei Yang

We study the problem of training a Reinforcement Learning (RL) agent that is collaborative with humans without using human data. Although such agents can be obtained through self-play training, they can suffer significantly from the distributional shift when paired with unencountered partners, such as humans. In this paper, we propose Maximum Entropy Population-based training (MEP) to mitigate such distributional shift. In MEP, agents in the population are trained with our derived Population Entropy bonus to promote the pairwise diversity between agents and the individual diversity of agents themselves. After obtaining this diversified population, a common best agent is trained by pairing with agents in this population via prioritized sampling, where the prioritization is dynamically adjusted based on the training progress. We demonstrate the effectiveness of our method MEP, with comparison to Self-Play PPO (SP), Population-Based Training (PBT), Trajectory Diversity (TrajeDi), and Fictitious Co-Play (FCP) in both matrix game and Overcooked game environments, with partners being human proxy models and real humans. A supplementary video showing experimental results is available at https://youtu.be/Xh-FKD0AAKE.
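
As a sketch of the intuition behind a population entropy bonus, the snippet below scores the entropy of the mean action distribution across a population, which is larger when the agents behave differently at the same state; the exact form of MEP's derived bonus is not reproduced here.

```python
# Illustrative sketch of a population entropy score: the entropy of the mean
# action distribution across the population rewards populations whose members
# behave differently from one another.
import numpy as np

def population_entropy(policies: np.ndarray) -> float:
    """policies: (num_agents, num_actions) action probabilities at a state."""
    mean_policy = policies.mean(axis=0)
    return float(-(mean_policy * np.log(mean_policy + 1e-12)).sum())

identical = np.tile([0.9, 0.1], (3, 1))
diverse = np.array([[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]])
print(population_entropy(identical) < population_entropy(diverse))  # True
```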

TMLR Journal 2023 Journal Article

Understanding Curriculum Learning in Policy Optimization for Online Combinatorial Optimization

  • Runlong Zhou
  • Zelin He
  • Yuandong Tian
  • Yi Wu
  • Simon Shaolei Du

In recent years, reinforcement learning (RL) has started to show promising results in tackling combinatorial optimization (CO) problems, in particular when coupled with curriculum learning to facilitate training. Despite emerging empirical evidence, theoretical study of why RL helps is still at an early stage. This paper presents the first systematic study on policy optimization methods for online CO problems. We show that online CO problems can be naturally formulated as latent Markov Decision Processes (LMDPs), and prove convergence bounds on natural policy gradient (NPG) for solving LMDPs. Furthermore, our theory explains the benefit of curriculum learning: it can find a strong sampling policy and reduce the distribution shift, a critical quantity that governs the convergence rate in our theorem. For a canonical online CO problem, the Best Choice Problem (BCP), we formally prove that distribution shift is reduced exponentially with curriculum learning even if the curriculum is a randomly generated BCP on a smaller scale. Our theory also shows we can simplify the curriculum learning scheme used in prior work from multi-step to single-step. Lastly, we provide extensive experiments on the Best Choice Problem, Online Knapsack, and AdWords to verify our findings.

NeurIPS Conference 2022 Conference Paper

Grounded Reinforcement Learning: Learning to Win the Game under Human Commands

  • Shusheng Xu
  • Huaijie Wang
  • Yi Wu

We consider the problem of building a reinforcement learning (RL) agent that can both accomplish non-trivial tasks, like winning a real-time strategy game, and strictly follow high-level language commands from humans, like “attack”, even if a command is sub-optimal. We call this novel yet important problem Grounded Reinforcement Learning (GRL). Compared with other language grounding tasks, GRL is particularly non-trivial and cannot be simply solved by pure RL or behavior cloning (BC). From the RL perspective, it is extremely challenging to derive a precise reward function for human preferences since the commands are abstract and the valid behaviors are highly complicated and multi-modal. From the BC perspective, it is impossible to obtain perfect demonstrations since human strategies in complex games are typically sub-optimal. We tackle GRL via a simple, tractable, and practical constrained RL objective and develop an iterative RL algorithm, REinforced demonstration Distillation (RED), to obtain a strong GRL policy. We evaluate the policies derived by RED, BC and pure RL methods on a simplified real-time strategy game, MiniRTS. Experimental results and human studies show that the RED policy is able to consistently follow human commands and achieve a higher win rate than the baselines. We release our code and present more examples at https://sites.google.com/view/grounded-rl.

NeurIPS Conference 2022 Conference Paper

Pre-Trained Image Encoder for Generalizable Visual Reinforcement Learning

  • Zhecheng Yuan
  • Zhengrong Xue
  • Bo Yuan
  • Xueqian Wang
  • Yi Wu
  • Yang Gao
  • Huazhe Xu

Learning generalizable policies that can adapt to unseen environments remains challenging in visual Reinforcement Learning (RL). Existing approaches try to acquire a robust representation via diversifying the appearances of in-domain observations for better generalization. Limited by the specific observations of the environment, these methods ignore the possibility of exploring diverse real-world image datasets. In this paper, we investigate how a visual RL agent would benefit from off-the-shelf visual representations. Surprisingly, we find that the early layers in an ImageNet pre-trained ResNet model could provide rather generalizable representations for visual RL. Hence, we propose Pre-trained Image Encoder for Generalizable visual reinforcement learning (PIE-G), a simple yet effective framework that can generalize to unseen visual scenarios in a zero-shot manner. Extensive experiments are conducted on DMControl Generalization Benchmark, DMControl Manipulation Tasks, Drawer World, and CARLA to verify the effectiveness of PIE-G. Empirical evidence suggests PIE-G improves sample efficiency and significantly outperforms previous state-of-the-art methods in terms of generalization performance. In particular, PIE-G boasts a 55% generalization performance gain on average in the challenging video background setting. Project Page: https://sites.google.com/view/pie-g/home.

AAAI Conference 2022 Conference Paper

Sequence Level Contrastive Learning for Text Summarization

  • Shusheng Xu
  • Xingxing Zhang
  • Yi Wu
  • Furu Wei

Contrastive learning models have achieved great success in unsupervised visual representation learning, maximizing the similarities between feature representations of different views of the same image while minimizing the similarities between feature representations of views of different images. In text summarization, the output summary is a shorter form of the input document and the two have similar meanings. In this paper, we propose a contrastive learning model for supervised abstractive text summarization, where we view a document, its gold summary, and its model-generated summaries as different views of the same mean representation and maximize the similarities between them during training. We improve over a strong sequence-to-sequence text generation model (i.e., BART) on three different summarization datasets. Human evaluation also shows that our model achieves better faithfulness ratings compared to its counterpart without contrastive objectives. We release our code at https://github.com/xssstory/SeqCo.

NeurIPS Conference 2022 Conference Paper

The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games

  • Chao Yu
  • Akash Velu
  • Eugene Vinitsky
  • Jiaxuan Gao
  • Yu Wang
  • Alexandre Bayen
  • Yi Wu

Proximal Policy Optimization (PPO) is a ubiquitous on-policy reinforcement learning algorithm but is significantly less utilized than off-policy learning algorithms in multi-agent settings. This is often due to the belief that PPO is significantly less sample efficient than off-policy methods in multi-agent systems. In this work, we carefully study the performance of PPO in cooperative multi-agent settings. We show that PPO-based multi-agent algorithms achieve surprisingly strong performance in four popular multi-agent testbeds: the particle-world environments, the StarCraft multi-agent challenge, the Hanabi challenge, and Google Research Football, with minimal hyperparameter tuning and without any domain-specific algorithmic modifications or architectures. Importantly, compared to competitive off-policy methods, PPO often achieves competitive or superior results in both final returns and sample efficiency. Finally, through ablation studies, we analyze implementation and hyperparameter factors that are critical to PPO's empirical performance, and give concrete practical suggestions regarding these factors. Our results show that when using these practices, simple PPO-based methods are a strong baseline in cooperative multi-agent reinforcement learning. Source code is released at https://github.com/marlbenchmark/on-policy.

NeurIPS Conference 2021 Conference Paper

Native Chinese Reader: A Dataset Towards Native-Level Chinese Machine Reading Comprehension

  • Shusheng Xu
  • Yichen Liu
  • Xiaoyu Yi
  • Siyuan Zhou
  • Huizi Li
  • Yi Wu

We present Native Chinese Reader (NCR), a new machine reading comprehension (MRC) dataset with particularly long articles in both modern and classical Chinese. NCR is collected from the exam questions for the Chinese course in China’s high schools, which are designed to evaluate the language proficiency of native Chinese youth. Existing Chinese MRC datasets are either domain-specific or focus on short contexts of a few hundred characters in modern Chinese only. By contrast, NCR contains 8390 documents with an average length of 1024 characters, covering a wide range of Chinese writing styles, including modern articles, classical literature and classical poetry. A total of 20477 questions on these documents also require strong reasoning abilities and common sense to figure out the correct answers. We implemented multiple baseline models using popular Chinese pre-trained models and additionally launched an online competition using our dataset to examine the limit of current methods. The best model achieves 59% test accuracy, while human evaluation shows an average accuracy of 79%, indicating a significant performance gap between current MRC models and native Chinese speakers.

NeurIPS Conference 2021 Conference Paper

NovelD: A Simple yet Effective Exploration Criterion

  • Tianjun Zhang
  • Huazhe Xu
  • Xiaolong Wang
  • Yi Wu
  • Kurt Keutzer
  • Joseph E. Gonzalez
  • Yuandong Tian

Efficient exploration under sparse rewards remains a key challenge in deep reinforcement learning. Previous exploration methods (e.g., RND) have achieved strong results in multiple hard tasks. However, if there are multiple novel areas to explore, these methods often focus quickly on one without sufficiently trying others (in a depth-first-search manner). In some scenarios (e.g., the four-corridor environment in Sec. 4.2), we observe they explore one corridor for a long time and fail to cover all the states. On the other hand, in theoretical RL, with optimistic initialization and the inverse square root of the visitation count as a bonus, an agent does not suffer from this and explores different novel regions alternately (in a breadth-first-search manner). Inspired by this, we propose a simple but effective criterion called NovelD that weights every novel area approximately equally. Our algorithm is very simple yet shows comparable performance to, or even outperforms, multiple SOTA exploration methods in many hard exploration tasks. Specifically, NovelD solves all the static procedurally-generated tasks in MiniGrid with just 120M environment steps, without any curriculum learning. In comparison, the previous SOTA only solves 50% of them. NovelD also achieves SOTA on multiple tasks in NetHack, a rogue-like game that contains more challenging procedurally-generated environments. In multiple Atari games (e.g., Montezuma's Revenge, Venture, Gravitar), NovelD outperforms RND. We analyze NovelD thoroughly in MiniGrid and find that, empirically, it helps the agent explore the environment more uniformly with a focus on exploring beyond the boundary.
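
A toy sketch of a NovelD-style criterion as the abstract describes it: reward the clipped increase in novelty between consecutive states, and only on the first visit within an episode. Count-based novelty stands in for RND here, and the coefficient is an illustrative assumption.

```python
# Sketch of a NovelD-style intrinsic reward, using count-based novelty as a
# stand-in for RND: reward the clipped novelty *increase* between consecutive
# states, gated to the first visit within the current episode.
from collections import defaultdict

visit_counts = defaultdict(int)     # lifelong counts (novelty proxy)
episodic_seen = set()               # reset at the start of each episode
ALPHA = 0.5                         # illustrative scaling coefficient

def novelty(state) -> float:
    return 1.0 / (1.0 + visit_counts[state])

def intrinsic_reward(state, next_state) -> float:
    bonus = 0.0
    if next_state not in episodic_seen:          # first visit this episode
        bonus = max(novelty(next_state) - ALPHA * novelty(state), 0.0)
        episodic_seen.add(next_state)
    visit_counts[next_state] += 1
    return bonus

print(intrinsic_reward((0, 0), (0, 1)))  # novel transition: positive bonus
print(intrinsic_reward((0, 0), (0, 1)))  # revisit within episode: 0.0
```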

IJCAI Conference 2021 Conference Paper

Temporal Induced Self-Play for Stochastic Bayesian Games

  • Weizhe Chen
  • Zihan Zhou
  • Yi Wu
  • Fei Fang

One practical requirement in solving dynamic games is to ensure that the players play well from any decision point onward. To satisfy this requirement, existing efforts focus on equilibrium refinement, but the scalability and applicability of existing techniques are limited. In this paper, we propose Temporal-Induced Self-Play (TISP), a novel reinforcement learning-based framework to find strategies with decent performance from any decision point onward. TISP uses belief-space representation, backward induction, policy learning, and non-parametric approximation. Building upon TISP, we design a policy-gradient-based algorithm, TISP-PG. We prove that TISP-based algorithms can find approximate Perfect Bayesian Equilibrium in zero-sum one-sided stochastic Bayesian games with finite horizon. We test TISP-based algorithms in various games, including finitely repeated security games and a grid-world game. The results show that TISP-PG is more scalable than existing mathematical programming-based methods and significantly outperforms other learning-based methods.

NeurIPS Conference 2021 Conference Paper

Variational Automatic Curriculum Learning for Sparse-Reward Cooperative Multi-Agent Problems

  • Jiayu Chen
  • Yuanxin Zhang
  • Yuanfan Xu
  • Huimin Ma
  • Huazhong Yang
  • Jiaming Song
  • Yu Wang
  • Yi Wu

We introduce an automatic curriculum algorithm, Variational Automatic Curriculum Learning (VACL), for solving challenging goal-conditioned cooperative multi-agent reinforcement learning problems. We motivate our curriculum learning paradigm through a variational perspective, where the learning objective can be decomposed into two terms: task learning on the current curriculum, and curriculum update to a new task distribution. Local optimization over the second term suggests that the curriculum should gradually expand the training tasks from easy to hard. Our VACL algorithm implements this variational paradigm with two practical components, task expansion and entity curriculum, which produces a series of training tasks over both the task configurations as well as the number of entities in the task. Experiment results show that VACL solves a collection of sparse-reward problems with a large number of agents. Particularly, using a single desktop machine, VACL achieves 98% coverage rate with 100 agents in the simple-spread benchmark and reproduces the ramp-use behavior originally shown in OpenAI’s hide-and-seek project.

NeurIPS Conference 2020 Conference Paper

Multi-Task Reinforcement Learning with Soft Modularization

  • Ruihan Yang
  • Huazhe Xu
  • Yi Wu
  • Xiaolong Wang

Multi-task learning is a very challenging problem in reinforcement learning. While training multiple tasks jointly allows the policies to share parameters across different tasks, the optimization problem becomes non-trivial: it remains unclear what parameters in the network should be reused across tasks and how the gradients from different tasks may interfere with each other. Thus, instead of naively sharing parameters across tasks, we introduce an explicit modularization technique on policy representation to alleviate this optimization issue. Given a base policy network, we design a routing network which estimates different routing strategies to reconfigure the base network for each task. Instead of directly selecting routes for each task, our task-specific policy uses a method called soft modularization to softly combine all the possible routes, which makes it suitable for sequential tasks. We experiment with various robotics manipulation tasks in simulation and show our method improves both sample efficiency and performance over strong baselines by a large margin.

AAAI Conference 2019 Conference Paper

Deep Reinforcement Learning for Green Security Games with Real-Time Information

  • Yufei Wang
  • Zheyuan Ryan Shi
  • Lantao Yu
  • Yi Wu
  • Rohit Singh
  • Lucas Joppa
  • Fei Fang

Green Security Games (GSGs) have been proposed and applied to optimize patrols conducted by law enforcement agencies in green security domains such as combating poaching, illegal logging and overfishing. However, real-time information such as footprints and agents’ subsequent actions upon receiving the information, e.g., rangers following the footprints to chase the poacher, have been neglected in previous work. To fill the gap, we first propose a new game model GSG-I which augments GSGs with sequential movement and the vital element of real-time information. Second, we design a novel deep reinforcement learning-based algorithm, DeDOL, to compute a patrolling strategy that adapts to the real-time information against a best-responding attacker. DeDOL is built upon the double oracle framework and the policy-space response oracle, solving a restricted game and iteratively adding best response strategies to it through training deep Q-networks. Exploring the game structure, DeDOL uses domain-specific heuristic strategies as initial strategies and constructs several local modes for efficient and parallelized training. To our knowledge, this is the first attempt to use Deep Q-Learning for security games.

AAAI Conference 2019 Conference Paper

Robust Multi-Agent Reinforcement Learning via Minimax Deep Deterministic Policy Gradient

  • Shihui Li
  • Yi Wu
  • Xinyue Cui
  • Honghua Dong
  • Fei Fang
  • Stuart Russell

Despite the recent advances of deep reinforcement learning (DRL), agents trained by DRL tend to be brittle and sensitive to the training environment, especially in multi-agent scenarios. In the multi-agent setting, a DRL agent’s policy can easily get stuck in a poor local optimum w.r.t. its training partners: the learned policy may be only locally optimal to other agents’ current policies. In this paper, we focus on the problem of training robust DRL agents with continuous actions in the multi-agent learning setting, so that the trained agents can still generalize when their opponents’ policies alter. To tackle this problem, we propose a new algorithm, MiniMax Multi-agent Deep Deterministic Policy Gradient (M3DDPG), with the following contributions: (1) we introduce a minimax extension of the popular multi-agent deep deterministic policy gradient algorithm (MADDPG) for robust policy learning; (2) since the continuous action space leads to computational intractability in our minimax learning objective, we propose Multi-Agent Adversarial Learning (MAAL) to efficiently solve our proposed formulation. We empirically evaluate our M3DDPG algorithm in four mixed cooperative and competitive multi-agent environments, and the agents trained by our method significantly outperform existing baselines.

NeurIPS Conference 2019 Conference Paper

Stochastic Runge-Kutta Accelerates Langevin Monte Carlo and Beyond

  • Xuechen Li
  • Yi Wu
  • Lester Mackey
  • Murat Erdogdu

Sampling with Markov chain Monte Carlo methods typically amounts to discretizing some continuous-time dynamics with numerical integration. In this paper, we establish the convergence rate of sampling algorithms obtained by discretizing smooth Itô diffusions exhibiting fast $2$-Wasserstein contraction, based on local deviation properties of the integration scheme. In particular, we study a sampling algorithm constructed by discretizing the overdamped Langevin diffusion with a stochastic Runge-Kutta method. For strongly convex potentials that are smooth up to a certain order, its iterates converge to the target distribution in $2$-Wasserstein distance in $\tilde{\mathcal{O}}(d\epsilon^{-2/3})$ iterations. This improves upon the best-known rate for strongly log-concave sampling based on the overdamped Langevin equation using only the gradient oracle without adjustment. Additionally, we extend our analysis of stochastic Runge-Kutta methods to uniformly dissipative diffusions with possibly non-convex potentials and show that they achieve better rates than the Euler-Maruyama scheme in the dependence on the tolerance $\epsilon$. Numerical studies show that these algorithms lead to better stability and lower asymptotic errors.
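
For context, the baseline the paper improves on is the Euler-Maruyama discretization of the overdamped Langevin diffusion $dX_t = -\nabla f(X_t)\,dt + \sqrt{2}\,dB_t$; stochastic Runge-Kutta schemes add extra gradient and noise stages per step to reduce the discretization error. A minimal NumPy sketch of the baseline (function names are illustrative):

```python
import numpy as np

def euler_maruyama_langevin(grad_f, x0, step, n_iters, rng=None):
    """Euler-Maruyama discretization of overdamped Langevin dynamics:
    x_{k+1} = x_k - h * grad f(x_k) + sqrt(2h) * N(0, I).
    The paper's stochastic Runge-Kutta scheme replaces this single-stage
    update with a higher-order multi-stage one (not shown)."""
    rng = rng or np.random.default_rng()
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        noise = rng.standard_normal(x.shape)
        x = x - step * grad_f(x) + np.sqrt(2.0 * step) * noise
    return x
```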

NeurIPS Conference 2018 Conference Paper

Meta-Learning MCMC Proposals

  • Tongzhou Wang
  • Yi Wu
  • Dave Moore
  • Stuart Russell

Effective implementations of sampling-based probabilistic inference often require manually constructed, model-specific proposals. Inspired by recent progress in meta-learning for training learning agents that can generalize to unseen environments, we propose a meta-learning approach to building effective and generalizable MCMC proposals. We parametrize the proposal as a neural network to provide fast approximations to block Gibbs conditionals. The learned neural proposals generalize to occurrences of common structural motifs across different models, allowing for the construction of a library of learned inference primitives that can accelerate inference on unseen models with no model-specific training required. We explore several applications, including open-universe Gaussian mixture models, in which our learned proposals outperform a hand-tuned sampler, and a real-world named entity recognition task, in which our sampler yields higher final F1 scores than classical single-site Gibbs sampling.
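
The inference-time picture can be sketched as follows: the learned network proposes a new value for a block from its Markov blanket, and a standard Metropolis-Hastings correction keeps the chain exact even where the proposal is imperfect. In the sketch below, `proposal_net` returning Gaussian parameters and the `log_joint` interface are illustrative assumptions, not the paper's API.

```python
import torch

def mh_step_with_learned_proposal(x_block, blanket, log_joint, proposal_net):
    """One Metropolis-Hastings step using a learned approximation to the
    block-Gibbs conditional p(x_block | Markov blanket) as the proposal."""
    mu, log_sigma = proposal_net(blanket)
    dist = torch.distributions.Normal(mu, log_sigma.exp())
    x_new = dist.sample()
    # MH correction: accept with prob min(1, ratio); keeps the chain exact
    # even when the neural proposal only approximates the conditional.
    log_alpha = (log_joint(x_new, blanket) - log_joint(x_block, blanket)
                 + dist.log_prob(x_block).sum() - dist.log_prob(x_new).sum())
    if torch.log(torch.rand(())) < log_alpha:
        return x_new
    return x_block
```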

AAAI Conference 2017 Conference Paper

A Nearly-Black-Box Online Algorithm for Joint Parameter and State Estimation in Temporal Models

  • Yusuf Erol
  • Yi Wu
  • Lei Li
  • Stuart Russell

Online joint parameter and state estimation is a core problem for temporal models. Most existing methods are either restricted to a particular class of models (e.g., the Storvik filter) or computationally expensive (e.g., particle MCMC). We propose a novel nearly-black-box algorithm, the Assumed Parameter Filter (APF), a hybrid of particle filtering for state variables and assumed density filtering for parameter variables. It has the following advantages: (a) it is online and computationally efficient; (b) it is applicable to both discrete and continuous parameter spaces with arbitrary transition dynamics. On a variety of toy and real models, APF generates more accurate results within a fixed computation budget compared to several standard algorithms from the literature.
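
A minimal sketch of one APF step, under the assumption that each particle pairs a state with an assumed-density posterior over the parameters: states are propagated and resampled as in a standard particle filter, while each particle's parameter posterior is refreshed by a moment-matching (ADF) update. All callables below are model-specific placeholders, not the paper's interface.

```python
import numpy as np

def apf_step(particles, observation, transition, likelihood, adf_update,
             rng=None):
    """One step of an assumed-parameter-filter-style update (sketch).
    particles: list of (state, parameter_posterior) pairs."""
    rng = rng or np.random.default_rng()
    states, param_posts, weights = [], [], []
    for state, post in particles:
        theta = post.sample(rng)                   # draw params from ADF posterior
        new_state = transition(state, theta, rng)  # propagate the state
        weights.append(likelihood(observation, new_state, theta))
        states.append(new_state)
        # Moment-matching update of the parameter posterior for this particle.
        param_posts.append(adf_update(post, new_state, observation))
    weights = np.asarray(weights)
    weights /= weights.sum()
    idx = rng.choice(len(states), size=len(states), p=weights)  # resample
    return [(states[i], param_posts[i]) for i in idx]
```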

NeurIPS Conference 2017 Conference Paper

Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments

  • Ryan Lowe
  • Yi Wu
  • Aviv Tamar
  • Jean Harb
  • Pieter Abbeel
  • Igor Mordatch

We explore deep reinforcement learning methods for multi-agent domains. We begin by analyzing the difficulty of traditional algorithms in the multi-agent case: Q-learning is challenged by an inherent non-stationarity of the environment, while policy gradient suffers from a variance that increases as the number of agents grows. We then present an adaptation of actor-critic methods that considers action policies of other agents and is able to successfully learn policies that require complex multi-agent coordination. Additionally, we introduce a training regimen utilizing an ensemble of policies for each agent that leads to more robust multi-agent policies. We show the strength of our approach compared to existing methods in cooperative as well as competitive scenarios, where agent populations are able to discover various physical and informational coordination strategies.
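
The key architectural idea is a centralized critic with decentralized actors: during training, each agent's critic conditions on the observations and actions of all agents, which sidesteps the non-stationarity that plain Q-learning faces, while execution still uses only local observations. A minimal PyTorch sketch; the dimensions and network shape are illustrative.

```python
import torch
import torch.nn as nn

class CentralizedCritic(nn.Module):
    """Centralized critic (sketch): Q_i(o_1..o_N, a_1..a_N) sees every
    agent's observation and action during training, while each actor
    remains a function of its own observation at execution time."""

    def __init__(self, obs_dims, act_dims, hidden=128):
        super().__init__()
        in_dim = sum(obs_dims) + sum(act_dims)
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, all_obs, all_actions):
        # all_obs / all_actions: lists of (batch, dim) tensors, one per agent.
        return self.net(torch.cat(all_obs + all_actions, dim=-1))
```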

IJCAI Conference 2017 Conference Paper

Value Iteration Networks

  • Aviv Tamar
  • Yi Wu
  • Garrett Thomas
  • Sergey Levine
  • Pieter Abbeel

We introduce the value iteration network (VIN): a fully differentiable neural network with a 'planning module' embedded within. VINs can learn to plan, and are suitable for predicting outcomes that involve planning-based reasoning, such as policies for reinforcement learning. Key to our approach is a novel differentiable approximation of the value-iteration algorithm, which can be represented as a convolutional neural network and trained end-to-end using standard backpropagation. We evaluate VIN-based policies on discrete and continuous path-planning domains, and on a natural-language-based search task. We show that by learning an explicit planning computation, VIN policies generalize better to new, unseen domains. This paper is a significantly abridged, IJCAI-audience-targeted version of the original NIPS 2016 paper of the same title, available at https://arxiv.org/abs/1602.02867
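
The planning module itself is compact enough to sketch: stacking the reward map with the current value map, convolving to produce per-action Q maps, and maxing over the action channel is one Bellman backup; applying it K times performs K steps of value iteration, all differentiably. A simplified PyTorch sketch (kernel size and channel layout are illustrative, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

class VIModule(nn.Module):
    """Differentiable value-iteration block in the spirit of VIN."""

    def __init__(self, n_actions, k_iters):
        super().__init__()
        # Q(s, a) from stacked [reward; value] maps; 3x3 kernel = local transitions.
        self.q_conv = nn.Conv2d(2, n_actions, kernel_size=3, padding=1, bias=False)
        self.k = k_iters

    def forward(self, reward_map):
        # reward_map: (batch, 1, H, W)
        value = torch.zeros_like(reward_map)
        for _ in range(self.k):
            q = self.q_conv(torch.cat([reward_map, value], dim=1))
            value, _ = q.max(dim=1, keepdim=True)  # Bellman backup: max over actions
        return value
```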

AAAI Conference 2016 Conference Paper

MC-HOG Correlation Tracking with Saliency Proposal

  • Guibo Zhu
  • Jinqiao Wang
  • Yi Wu
  • Xiaoyu Zhang
  • Hanqing Lu

Designing effective features and handling the model drift problem are two important aspects of online visual tracking. For feature representation, gradient and color features are the most widely used, but how to effectively combine them for visual tracking is still an open problem. In this paper, we propose a rich feature descriptor, MC-HOG, by leveraging rich gradient information across multiple color channels or spaces. MC-HOG features are then embedded into the correlation tracking framework to estimate the state of the target. For handling the model drift problem caused by occlusion or distractors, we propose saliency proposals as prior information to provide candidates and reduce background interference. In addition to saliency proposals, a ranking strategy is proposed to determine the importance of these proposals by exploiting the learnt appearance filter, historically preserved object samples and the distracting proposals. In this way, the proposed approach can effectively explore the color-gradient characteristics and alleviate the model drift problem. Extensive evaluations performed on the benchmark dataset show the superiority of the proposed method.
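
The correlation-tracking core into which MC-HOG is embedded can be sketched in a few lines: per-channel feature maps are correlated with the learned filter in the Fourier domain, the per-channel responses are summed, and the peak of the response map locates the target. The sketch below is generic; filter learning, cosine windowing, and MC-HOG extraction itself are omitted.

```python
import numpy as np

def correlation_response(filter_fft, feature_channels):
    """Generic correlation-filter localization step (sketch).
    filter_fft: list of per-channel filters in the Fourier domain.
    feature_channels: list of per-channel feature maps (e.g., MC-HOG)."""
    resp = np.zeros(feature_channels[0].shape)
    for h_hat, z in zip(filter_fft, feature_channels):
        # Correlation = conjugate multiplication in the Fourier domain.
        resp += np.real(np.fft.ifft2(np.conj(h_hat) * np.fft.fft2(z)))
    peak = np.unravel_index(np.argmax(resp), resp.shape)  # target location
    return resp, peak
```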

IJCAI Conference 2016 Conference Paper

Swift: Compiled Inference for Probabilistic Programming Languages

  • Yi Wu
  • Lei Li
  • Stuart Russell
  • Rastislav Bodik

A probabilistic program defines a probability measure over its semantic structures. One common goal of probabilistic programming languages (PPLs) is to compute posterior probabilities for arbitrary models and queries, given observed evidence, using a generic inference engine. Most PPL inference engines - even the compiled ones - incur significant runtime interpretation overhead, especially for contingent and open-universe models. This paper describes Swift, a compiler for the BLOG PPL. Swift-generated code incorporates optimizations that eliminate interpretation overhead, maintain dynamic dependencies efficiently, and handle memory management for possible worlds of varying sizes. Experiments comparing Swift with other PPL engines on a variety of inference problems demonstrate speedups ranging from 12x to 326x.

NeurIPS Conference 2016 Conference Paper

Value Iteration Networks

  • Aviv Tamar
  • Yi Wu
  • Garrett Thomas
  • Sergey Levine
  • Pieter Abbeel

We introduce the value iteration network (VIN): a fully differentiable neural network with a 'planning module' embedded within. VINs can learn to plan, and are suitable for predicting outcomes that involve planning-based reasoning, such as policies for reinforcement learning. Key to our approach is a novel differentiable approximation of the value-iteration algorithm, which can be represented as a convolutional neural network and trained end-to-end using standard backpropagation. We evaluate VIN-based policies on discrete and continuous path-planning domains, and on a natural-language-based search task. We show that by learning an explicit planning computation, VIN policies generalize better to new, unseen domains.

NeurIPS Conference 2012 Conference Paper

Dual-Space Analysis of the Sparse Linear Model

  • Yi Wu
  • David Wipf

Sparse linear (or generalized linear) models combine a standard likelihood function with a sparse prior on the unknown coefficients. These priors can conveniently be expressed as a maximization over zero-mean Gaussians with different variance hyperparameters. Standard MAP estimation (Type I) involves maximizing over both the hyperparameters and coefficients, while an empirical Bayesian alternative (Type II) first marginalizes the coefficients and then maximizes over the hyperparameters, leading to a tractable posterior approximation. The underlying cost functions can be related via a dual-space framework from Wipf et al. (2011), which allows both the Type I and Type II objectives to be expressed in either coefficient or hyperparameter space. This perspective is useful because some analyses or extensions are more conducive to development in one space or the other. Herein we consider the estimation of a trade-off parameter balancing sparsity and data fit. As this parameter is effectively a variance, natural estimators exist by assessing the problem in hyperparameter (variance) space, transferring natural ideas from Type II to solve what is much less intuitive for Type I. In contrast, for analyses of update rules and sparsity properties of local and global solutions, as well as extensions to more general likelihood models, we can leverage coefficient-space techniques developed for Type I and apply them to Type II. For example, this allows us to prove that Type II-inspired techniques can successfully recover sparse coefficients when unfavorable restricted isometry properties (RIP) lead to failure of popular L1 reconstructions. It also facilitates the analysis of Type II when non-Gaussian likelihood models lead to intractable integrations.
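
In the standard notation of this line of work, the two objectives the dual-space framework connects can be written as follows, with $\Phi$ the design matrix, $\lambda$ the trade-off parameter the abstract discusses, and $w_i \sim \mathcal{N}(0, \gamma_i)$ the Gaussian scale-mixture prior. This is a sketch of the usual forms, not quoted from the paper.

```latex
% Type I (MAP) vs. Type II (evidence maximization) for y = \Phi w + noise.
\text{Type I:}\quad
  \min_{w}\ \tfrac{1}{\lambda}\,\lVert y - \Phi w \rVert_2^2 + \sum_i g(w_i)
\qquad
\text{Type II:}\quad
  \min_{\gamma \ge 0}\ y^{\top} \Sigma_y^{-1} y + \log\det \Sigma_y,
\quad \Sigma_y = \lambda I + \Phi\,\mathrm{diag}(\gamma)\,\Phi^{\top}
```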

AAAI Conference 2012 Conference Paper

Polynomially Decomposable Global Cost Functions in Weighted Constraint Satisfaction

  • Jimmy Lee
  • Ka Lun Leung
  • Yi Wu

In maintaining consistencies, such as GAC*, FDGAC* and weak EDGAC*, for global cost functions, Weighted CSP (WCSP) solvers rely on the projection and extension operations, which entail the computation of the cost functions' minima. Tractability of this minimum computation is essential for efficient execution. Since projections/extensions modify the cost functions, an important issue is tractable projection-safety, concerning whether minimum cost computation remains tractable after projections/extensions. In this paper, we prove that tractable projection-safety is always possible for projections/extensions to/from the nullary cost function (W∅), and always impossible for projections/extensions to/from n-ary cost functions for n ≥ 2. When n = 1, the answer is indefinite. We give a simple negative example, while Lee and Leung's flow-based projection-safe cost functions are also tractable projection-safe. We propose polynomially decomposable cost functions, which are amenable to tractable minimum computation. We further prove that the polynomial decomposability property is unaffected by projections/extensions to/from unary cost functions. Thus, polynomially decomposable cost functions are tractable projection-safe. We show that SOFT AMONG, SOFT REGULAR, SOFT GRAMMAR and MAX WEIGHT/MIN WEIGHT are polynomially decomposable. They are embedded in a WCSP solver for extensive experiments to confirm the feasibility and efficiency of our proposal.
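
For intuition, projection to the nullary cost function W∅ moves the minimum cost of a cost function into W∅ and subtracts it from every tuple, preserving the total cost of every complete assignment. The sketch below enumerates tuples explicitly purely for illustration: for a global cost function over many variables this enumeration is exponential, which is exactly why the minimum must be computable in polynomial time, as tractable projection-safety demands.

```python
def project_to_nullary(w_nullary, cost_fn, tuples):
    """Projection to W0 (sketch): shift the minimum cost of a cost function
    into the nullary cost function and subtract it from every tuple.

    NOTE: enumerating `tuples` is exponential for global cost functions;
    real solvers require a polynomial-time minimum computation instead."""
    alpha = min(cost_fn(t) for t in tuples)            # the minimum computation
    shifted = {t: cost_fn(t) - alpha for t in tuples}  # residual cost function
    return w_nullary + alpha, (lambda t: shifted[t])
```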

IJCAI Conference 2011 Conference Paper

Feature Selection via Joint Embedding Learning and Sparse Regression

  • Chenping Hou
  • Feiping Nie
  • Dongyun Yi
  • Yi Wu

The problem of feature selection has aroused considerable research interest in the past few years. Traditional learning-based feature selection methods separate embedding learning and feature ranking. In this paper, we introduce a novel unsupervised feature selection approach via Joint Embedding Learning and Sparse Regression (JELSR). Instead of simply employing the graph Laplacian for embedding learning and then regression, we use weights obtained via locally linear approximation to construct the graph, and unify embedding learning and sparse regression to perform feature selection. By adding the l2,1-norm regularization, we can learn a sparse matrix for feature ranking. We also provide an effective method to solve the proposed problem. Compared with traditional unsupervised feature selection methods, our approach integrates the merits of embedding learning and sparse regression simultaneously. Extensive experimental results are provided to show the validity of the approach.
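
The final ranking step implied by the l2,1-norm penalty is simple to sketch: the penalty drives entire rows of the learned projection matrix toward zero, so features are scored by the l2 norm of their row. A minimal NumPy sketch; learning W itself (the joint embedding/regression optimization) is omitted.

```python
import numpy as np

def rank_features_by_row_norm(W):
    """Rank features from an l2,1-regularized projection matrix (sketch).
    W: (n_features, embedding_dim) matrix; rows of unimportant features
    are driven toward zero by the l2,1 penalty."""
    scores = np.linalg.norm(W, axis=1)  # ||w_i||_2 per feature
    return np.argsort(-scores)          # most important features first
```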

IJCAI Conference 2011 Conference Paper

Local and Structural Consistency for Multi-Manifold Clustering

  • Yong Wang
  • Yuan Jiang
  • Yi Wu
  • Zhi-Hua Zhou

Data sets containing multi-manifold structures are ubiquitous in real-world tasks, and effective grouping of such data is an important yet challenging problem. Though there have been many studies on this problem, it is not clear how to design principled methods for grouping multiple hybrid manifolds. In this paper, we show that spectral methods are potentially helpful for hybrid-manifold clustering when the neighborhood graph is constructed to connect neighboring samples from the same manifold. However, traditional algorithms which identify neighbors according to Euclidean distance will easily connect samples belonging to different manifolds. To handle this drawback, we propose a new criterion, i.e., the local and structural consistency criterion, which considers the neighboring information as well as the structural information implied by the samples. Based on this criterion, we develop a simple yet effective algorithm, named Local and Structural Consistency (LSC), for clustering with multiple hybrid manifolds. Experiments show that LSC achieves promising performance.

AAAI Conference 2011 Conference Paper

Localized K-Flats

  • Yong Wang
  • Yuan Jiang
  • Yi Wu
  • Zhi-Hua Zhou

K-flats is a model-based linear manifold clustering algorithm which has been successfully applied in many real-world scenarios. Though some previous works have shown that K-flats does not always provide good performance, little effort has been devoted to analyzing its inherent deficiency. In this paper, we address this challenge by showing that the deteriorating performance of K-flats can be attributed to the usual reconstruction error measure and the infinitely extending representations of linear models. We then propose the Localized K-flats algorithm (LKF), which introduces localized representations of linear models and a new distortion measure, to remove confusion among different clusters. Experiments on both synthetic and real-world data sets demonstrate the efficiency of the proposed algorithm. Moreover, preliminary experiments show that LKF has the potential to group manifolds with nonlinear structure.
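
For reference, plain K-flats, the algorithm LKF modifies, alternates between assigning each point to its nearest low-dimensional affine flat and refitting each flat by PCA on its assigned points. A minimal NumPy sketch of this baseline; LKF's localized representations and new distortion measure are not shown.

```python
import numpy as np

def k_flats(X, k, dim, n_iters=20, rng=None):
    """Plain K-flats (sketch): alternate assignment to the nearest
    d-dimensional affine flat and PCA refitting of each flat."""
    rng = rng or np.random.default_rng()
    n = len(X)
    labels = rng.integers(0, k, size=n)  # random initial assignment
    for _ in range(n_iters):
        flats = []
        for j in range(k):
            P = X[labels == j]
            if len(P) <= dim:  # guard: re-seed tiny/empty clusters
                P = X[rng.choice(n, size=dim + 1, replace=False)]
            mu = P.mean(axis=0)
            # Basis of the flat: top principal directions of the cluster.
            _, _, Vt = np.linalg.svd(P - mu, full_matrices=False)
            flats.append((mu, Vt[:dim]))
        # Reassign each point to the flat with the smallest residual.
        dists = np.stack([
            np.linalg.norm((X - mu) - (X - mu) @ B.T @ B, axis=1)
            for mu, B in flats])
        labels = dists.argmin(axis=0)
    return labels, flats
```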