Arrow Research search

Author name cluster

Yuxiang Yang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

15 papers
2 author rows

Possible papers

AAAI Conference 2026 Conference Paper

PromptEmo: Learning Emotion with Bilateral Textual Prompts in Multi-Domain Open-set Scenarios

  • Xinyi Zeng
  • Yuxiang Yang
  • Pinxian Zeng
  • Wenxia Yin
  • Bo Liu
  • Xi Wu
  • Yan Wang

Facial Expression Recognition (FER) is crucial to human-computer interaction. Existing cross-domain FER (CD-FER) methods mainly focus on single-source closed-set scenarios, transferring knowledge from a single source domain to a target domain with identical class sets. However, CD-FER faces two real-world challenges: 1) the need to leverage information from multiple sources, leading to multi-domain shift, and 2) the necessity to recognize unseen target classes, resulting in class shift. These issues give rise to a novel and challenging task, which we define as Multi-domain Open-set FER (MO-FER). In this paper, we propose PromptEmo, a novel CLIP-based framework that leverages bilateral textual prompts to address both shifts in the MO-FER task. Leveraging the generalizability of LLM, PromptEmo constructs trainable positive prompts with LLM-generated emotion descriptions for seen classes, as well as template-derived negative prompts to enhance the reasoning for unseen classes. Then, we introduce a modal-task optimization paradigm organized from two perspectives: textual semantics and visual domains, yielding Intra-modal Space-specific Optimization (ISO) and Cross-modal Emotion-aware Interaction (CEI) strategies. ISO refines the CLIP-based textual space to ensure semantic separation between bilateral prompts and improves the latent visual space by promoting inter-domain alignment. Founded on ISO, CEI facilitates effective vision-language interactions, resulting in four joint loss terms that improve emotion recognition by shaping a domain-invariant, discriminative feature space. PromptEmo surpasses the current SOTA method by 7.7% AUC on unseen classes across four FER datasets, serving as a strong baseline for the MO-FER task.

EAAI Journal 2025 Journal Article

A multiple aging factor interactive learning framework for lithium-ion battery state-of-health estimation

  • Zhengyi Bao
  • Tingting Luo
  • Mingyu Gao
  • Zhiwei He
  • Yuxiang Yang
  • Jiahao Nie

As lithium-ion batteries are widely used in electric vehicles, it has become critical to accurately estimate the state-of-health of the battery. While neural networks have been proven to be effective for state-of-health estimation, such networks primarily focus on feature modeling of raw data without exploiting the inherent correlations among multi-dimensional information in the raw data, limiting the estimation accuracy. We therefore propose an interactive learning network for state-of-health estimation. The novel network simultaneously models features and learns correlations among multi-dimensional information using multilayer perceptrons in an interactive manner. We then extract multiple aging factors from the raw voltage and current data as network inputs, which enables knowledge associated with the state-of-health of the battery to be encoded in the proposed network. In addition, because the aging factors have lower dimensionality than the raw data, the computational overhead of the network is significantly reduced. Comprehensive experiments are conducted on two widely adopted datasets. The experimental results confirm that our proposed network performs accurate state-of-health estimation with a mean absolute error of less than 3% on both datasets, outperforming previous recurrent neural network and Transformer-based methods. Moreover, a computational load comparison further demonstrates the potential of the proposed framework in battery management systems.

IJCAI Conference 2025 Conference Paper

BEVTrack: A Simple and Strong Baseline for 3D Single Object Tracking in Bird's-Eye View

  • Yuxiang Yang
  • Yingqi Deng
  • Mian Pan
  • Zheng-Jun Zha
  • Jing Zhang

3D Single Object Tracking (SOT) is a fundamental task in computer vision and plays a critical role in applications like autonomous driving. However, existing algorithms often involve complex designs and multiple loss functions, making model training and deployment challenging. Furthermore, their reliance on fixed probability distribution assumptions (e.g., Laplacian or Gaussian) hinders their ability to adapt to diverse target characteristics such as varying sizes and motion patterns, ultimately affecting tracking precision and robustness. To address these issues, we propose BEVTrack, a simple yet effective motion-based tracking method. BEVTrack directly estimates object motion in Bird's-Eye View (BEV) using a single regression loss. To enhance accuracy for targets with diverse attributes, it learns adaptive likelihood functions tailored to individual targets, avoiding the limitations of fixed distribution assumptions in previous methods. This approach provides valuable priors for tracking and significantly boosts performance. Comprehensive experiments on KITTI, NuScenes, and Waymo Open Dataset demonstrate that BEVTrack achieves state-of-the-art results while operating at 200 FPS, enabling real-time applicability. The code will be released at https://github.com/xmm-prio/BEVTrack.
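The "fixed probability distribution assumptions" mentioned in the abstract above can be made concrete with a small sketch. It shows only the textbook negative log-likelihoods of a Gaussian and a Laplacian with a fixed scale, which reduce to L2 and L1 regression losses up to a constant. It is not BEVTrack's adaptive likelihood, which the abstract does not specify; the function names and default scales are hypothetical.

```python
import math

def gaussian_nll(residual, sigma=1.0):
    """Negative log-likelihood of a residual under a fixed-scale Gaussian.

    Up to an additive constant this is a scaled squared (L2) error.
    """
    return 0.5 * (residual / sigma) ** 2 + math.log(sigma * math.sqrt(2.0 * math.pi))

def laplace_nll(residual, b=1.0):
    """Negative log-likelihood of a residual under a fixed-scale Laplacian.

    Up to an additive constant this is a scaled absolute (L1) error.
    """
    return abs(residual) / b + math.log(2.0 * b)

# Penalty added by a residual of 2.0 relative to a perfect prediction.
gaussian_penalty = gaussian_nll(2.0) - gaussian_nll(0.0)
laplace_penalty = laplace_nll(2.0) - laplace_nll(0.0)
```

With the scale fixed in advance, every target is penalised by the same curve regardless of its size or motion pattern, which is the limitation the abstract attributes to earlier methods.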

IJCAI Conference 2025 Conference Paper

DDPA-3DVG: Vision-Language Dual-Decoupling and Progressive Alignment for 3D Visual Grounding

  • Hongjie Gu
  • Jinlong Fan
  • Liang Zheng
  • Jing Zhang
  • Yuxiang Yang

3D visual grounding aims to localize target objects in point clouds based on free-form natural language, which often describes both target and reference objects. Effective alignment between visual and text features is crucial for this task. However, existing two-stage methods that rely solely on object-level features can yield suboptimal accuracy, while one-stage methods that align only point-level features can be prone to noise. In this paper, we propose DDPA-3DVG, a novel framework that progressively aligns visual locations and language descriptions at multiple granularities. Specifically, we decouple natural language descriptions into distinct representations of target objects, reference objects, and their mutual relationships, while disentangling 3D scenes into object-level, voxel-level, and point-level features. By progressively fusing these dual-decoupled features from coarse to fine, our method enhances cross-modal alignment and achieves state-of-the-art performance on three challenging benchmarks: ScanRefer, Nr3D, and Sr3D. The code will be released at https://github.com/HDU-VRLab/DDPA-3DVG.

ICML Conference 2025 Conference Paper

Reinforced Learning Explicit Circuit Representations for Quantum State Characterization from Local Measurements

  • Manwen Liao
  • Yan Zhu
  • Weitian Zhang
  • Yuxiang Yang

Characterizing quantum states is essential for advancing many quantum technologies. Recently, deep neural networks have been applied to learn quantum states by generating compressed implicit representations. Despite their success in predicting properties of the states, these representations remain a black box, lacking insights into strategies for experimental reconstruction. In this work, we aim to open this black box by developing explicit representations through generating surrogate state preparation circuits for property estimation. We design a reinforcement learning agent equipped with a Transformer-based architecture and a local fidelity reward function. Relying solely on measurement data from a few neighboring qubits, our agent accurately recovers properties of target states. We also theoretically analyze the global fidelity the agent can achieve when it learns a good local approximation. Extensive experiments demonstrate the effectiveness of our framework in learning various states of up to 100 qubits, including those generated by shallow Instantaneous Quantum Polynomial circuits, evolved by Ising Hamiltonians, and many-body ground states. Furthermore, the learned circuit representations can be applied to Hamiltonian learning as a downstream task utilizing a simple linear model.

EAAI Journal 2025 Journal Article

Susceptibility risk assessment of oil and gas pipeline geological hazards in mountainous areas based on data-driven model

  • Yuxiang Yang
  • Benji Wang
  • Xiao Cen
  • Bowen Shao
  • Baikang Zhu
  • Jin Yang
  • Bingyuan Hong

Geological hazards are recognized as causing significant damage to oil and gas pipelines, often resulting in catastrophic loss of life and property and hindering societal progress. In this study, a data-driven evaluation model is developed by integrating the Information Value Method (IVM) with a Back Propagation Neural Network (BPNN) to assess the susceptibility of oil and gas pipelines in mountainous areas to geological hazards. The IVM is used to identify non-hazardous areas, optimizing sample selection and reducing training errors, while the BPNN is employed to determine the weights of evaluation indicators, enhancing accuracy. First, an evaluation index system is proposed that comprehensively considers the natural geographical conditions and main disaster types. Next, non-disaster areas are located using the IVM and combined with disaster-prone areas to form the sample data. The sample data are then input into a BPNN for training, and the weights of each evaluation index are obtained from the trained network. Finally, a susceptibility risk assessment model is developed based on the derived weights and information values to accurately evaluate the susceptibility of pipeline geological hazards. A pipeline in the mountainous region of Zhejiang Province, China, is used as an illustration. Compared to the single IVM model and the single BPNN model, the receiver operating characteristic curve shows that the proposed method improves the area under the curve by 9.8% and 11.2%, respectively, indicating a high level of evaluation accuracy. This study provides a reliable approach for assessing geological hazard susceptibility, offering scientific support for pipeline planning and hazard mitigation in oil and gas operations.
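The information value step in the abstract above can be illustrated with a minimal sketch. It assumes the commonly used form of the Information Value Method, I = ln((N_i / N) / (S_i / S)), where N_i of the N recorded hazard points fall in a factor class covering area S_i out of a total area S. The abstract does not state the exact formula used in the paper, and the function name and example counts below are hypothetical.

```python
import math

def information_value(hazards_in_class, total_hazards, class_area, total_area):
    """Information value of one factor class (assumed standard IVM form).

    Positive values mean hazards are denser in this class than in the
    study area as a whole; negative values mean they are sparser.
    """
    if hazards_in_class == 0:
        # ln(0) is undefined; treat an empty class as maximally non-hazardous.
        return float("-inf")
    class_density = hazards_in_class / class_area
    overall_density = total_hazards / total_area
    return math.log(class_density / overall_density)

# Hypothetical counts, for illustration only.
steep = information_value(30, 100, 10.0, 100.0)  # 30% of hazards on 10% of the area
flat = information_value(5, 100, 50.0, 100.0)    # 5% of hazards on 50% of the area
```

Under this assumed form, classes with strongly negative values are natural candidates for the non-hazardous samples that the abstract says are combined with disaster-prone areas before training the BPNN.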

AAAI Conference 2025 Conference Paper

UAWTrack: Universal 3D Single Object Tracking in Adverse Weather

  • Yuxiang Yang
  • Hongjie Gu
  • Yingqi Deng
  • Zhekang Dong
  • Zhiwei He
  • Jing Zhang

3D single object tracking (3D SOT) in LiDAR point clouds is essential for autonomous driving. Most existing 3D SOT methods focus on clear weather, where point clouds are more defined. However, adverse weather conditions lead to sparser and noisier point clouds, significantly degrading tracking performance and posing safety risks. In this study, we introduce UAWTrack, a universal 3D SOT model designed to perform effectively across diverse real-world weather conditions. UAWTrack comprises three key modules: 1) Voxel Feature Extraction, which mitigates the perturbations in point clouds caused by adverse weather; 2) Motion-centric Spatial-temporal Aggregation and Motion-guided Feature Fusion, capturing motion clues and sampling dense BEV motion features to address the issue of sparsity; and 3) Weather-Specific Tracker, which efficiently handles tracking in various weather conditions. To fill the gap of lacking benchmarks for 3D SOT in adverse weather, we simulate physically valid adverse weather conditions on the KITTI and NuScenes datasets, creating two benchmarks: KITTI-A and NuScenes-A. Extensive experiments demonstrate that UAWTrack achieves state-of-the-art performance under all weather conditions.

IROS Conference 2023 Conference Paper

DMCL: Robot Autonomous Navigation via Depth Image Masked Contrastive Learning

  • Jiahao Jiang
  • Ping Li
  • Xudong Lv
  • Yuxiang Yang

Achieving high performance in deep reinforcement learning relies heavily on the ability to obtain good state representations from pixel inputs. However, learning an observation-space-to-action-space mapping from high-dimensional inputs is challenging in reinforcement learning, particularly when dealing with consecutive depth images as input states. In addition, we observe that the consecutive inputs of depth images are highly correlated for the autonomous navigation of a mobile robot, which inspires us to capture temporal correlations between consecutive inputs and infer scene change relationships. To this end, we propose a novel end-to-end robot vision navigation method dubbed DMCL, which obtains good spatial-temporal state representation via Depth image Masked Contrastive Learning. It reconstructs the latent representation from consecutive depth images masked in both spatial and temporal dimensions, resulting in a complete environment state representation. To obtain the optimal navigation policy, we leverage the Soft Actor-Critic reinforcement learning in conjunction with the above representation learning. Extensive experiments demonstrate that the proposed DMCL outperforms representative state-of-the-art methods. The source code will be made publicly available.

AAAI Conference 2023 Conference Paper

GLT-T: Global-Local Transformer Voting for 3D Single Object Tracking in Point Clouds

  • Jiahao Nie
  • Zhiwei He
  • Yuxiang Yang
  • Mingyu Gao
  • Jing Zhang

Current 3D single object tracking methods are typically based on VoteNet, a 3D region proposal network. Despite the success, using a single seed point feature as the cue for offset learning in VoteNet prevents high-quality 3D proposals from being generated. Moreover, seed points with different importance are treated equally in the voting process, aggravating this defect. To address these issues, we propose a novel global-local transformer voting scheme to provide more informative cues and guide the model to pay more attention to potential seed points, promoting the generation of high-quality 3D proposals. Technically, a global-local transformer (GLT) module is employed to integrate object- and patch-aware priors into seed point features to effectively form a strong feature representation for the geometric positions of the seed points, thus providing more robust and accurate cues for offset learning. Subsequently, a simple yet effective training strategy is designed to train the GLT module. We develop an importance prediction branch to learn the potential importance of the seed points and treat the output weight vector as a training constraint term. By incorporating the above components together, we present a superior tracking method, GLT-T. Extensive experiments on challenging KITTI and NuScenes benchmarks demonstrate that GLT-T achieves state-of-the-art performance in the 3D single object tracking task. Besides, further ablation studies show the advantages of the proposed global-local transformer voting scheme over the original VoteNet. Code and models will be available at https://github.com/haooozi/GLT-T.

AAMAS Conference 2023 Conference Paper

Learning Graph-Enhanced Commander-Executor for Multi-Agent Navigation

  • Xinyi Yang
  • Shiyu Huang
  • Yiwen Sun
  • Yuxiang Yang
  • Chao Yu
  • Wei-Wei Tu
  • Huazhong Yang
  • Yu Wang

This paper investigates the multi-agent navigation problem, which requires multiple agents to reach the target goals in a limited time. Multi-agent reinforcement learning (MARL) has shown promising results for solving this issue. However, it is inefficient for MARL to directly explore the (nearly) optimal policy in the large search space, which is exacerbated as the agent number increases (e.g., 10+ agents) or the environment is more complex (e.g., a 3D simulator). Goal-conditioned hierarchical reinforcement learning (HRL) provides a promising direction to tackle this challenge by introducing a hierarchical structure to decompose the search space, where the low-level policy predicts primitive actions under the guidance of the goals derived from the high-level policy. In this paper, we propose Multi-Agent Graph-Enhanced Commander-EXecutor (MAGE-X), a graph-based goal-conditioned hierarchical method for multi-agent navigation tasks. MAGE-X comprises a high-level Goal Commander and a low-level Action Executor. The Goal Commander predicts the probability distribution of the goals and leverages them to assign the most appropriate final target to each agent. The Action Executor utilizes graph neural networks (GNN) to construct a subgraph for each agent that only contains its crucial partners to improve cooperation. Additionally, the Goal Encoder in the Action Executor captures the relationship between the agent and the designated goal to encourage the agent to reach the final target. The results show that MAGE-X outperforms the state-of-the-art MARL baselines with a 100% success rate with only 3 million training steps in multi-agent particle environments (MPE) with 50 agents, and at least a 12% higher success rate and 2× higher data efficiency in a more complicated quadrotor 3D navigation task.

IJCAI Conference 2023 Conference Paper

OSP2B: One-Stage Point-to-Box Network for 3D Siamese Tracking

  • Jiahao Nie
  • Zhiwei He
  • Yuxiang Yang
  • Zhengyi Bao
  • Mingyu Gao
  • Jing Zhang

The two-stage point-to-box network plays a critical role in the recently popular 3D Siamese tracking paradigm; it first generates proposals and then predicts the corresponding proposal-wise scores. However, such a network suffers from tedious hyper-parameter tuning and task misalignment, limiting the tracking performance. To address these concerns, we propose a simple yet effective one-stage point-to-box network for point cloud-based 3D single object tracking. It synchronizes 3D proposal generation and center-ness score prediction by a parallel predictor without tedious hyper-parameters. To guide a task-aligned score ranking of proposals, a center-aware focal loss is proposed to supervise the training of the center-ness branch, which enhances the network's discriminative ability to distinguish proposals of different quality. Besides, we design a binary target classifier to identify target-relevant points. By integrating the derived classification scores with the center-ness scores, the resulting network can effectively suppress interference proposals and further mitigate task misalignment. Finally, we present a novel one-stage Siamese tracker OSP2B equipped with the designed network. Extensive experiments on challenging benchmarks including KITTI and Waymo SOT Dataset show that our OSP2B achieves leading performance with a considerable real-time speed.

NeurIPS Conference 2022 Conference Paper

APT-36K: A Large-scale Benchmark for Animal Pose Estimation and Tracking

  • Yuxiang Yang
  • Junjie Yang
  • Yufei Xu
  • Jing Zhang
  • Long Lan
  • Dacheng Tao

Animal pose estimation and tracking (APT) is a fundamental task for detecting and tracking animal keypoints from a sequence of video frames. Previous animal-related datasets focus either on animal tracking or single-frame animal pose estimation, and never on both aspects. The lack of APT datasets hinders the development and evaluation of video-based animal pose estimation and tracking methods, limiting the applications in the real world, e.g., understanding animal behavior in wildlife conservation. To fill this gap, we make the first step and propose APT-36K, i.e., the first large-scale benchmark for animal pose estimation and tracking. Specifically, APT-36K consists of 2,400 video clips collected and filtered from 30 animal species with 15 frames for each video, resulting in 36,000 frames in total. After manual annotation and careful double-check, high-quality keypoint and tracking annotations are provided for all the animal instances. Based on APT-36K, we benchmark several representative models on the following three tracks: (1) supervised animal pose estimation on a single frame under intra- and inter-domain transfer learning settings, (2) inter-species domain generalization test for unseen animals, and (3) animal pose estimation with animal tracking. Based on the experimental results, we gain some empirical insights and show that APT-36K provides a useful animal pose estimation and tracking benchmark, offering new challenges and opportunities for future research. The code and dataset will be made publicly available at https://github.com/pandorgan/APT-36K.

IJCAI Conference 2022 Conference Paper

SAR-to-Optical Image Translation via Neural Partial Differential Equations

  • Mingjin Zhang
  • Chengyu He
  • Jing Zhang
  • Yuxiang Yang
  • Xiaoqi Peng
  • Jie Guo

Synthetic Aperture Radar (SAR) has become prevalent in remote sensing, yet SAR images are challenging to interpret by human visual perception due to the active imaging mechanism and speckle noise. Recent research on SAR-to-optical image translation provides a promising solution and has attracted increasing attention, though it still suffers from low optical image quality with geometric distortion due to the large domain gap. In this paper, we mitigate this issue from a novel perspective, i.e., neural partial differential equations (PDE). First, based on the efficient numerical scheme for solving PDE, i.e., Taylor Central Difference (TCD), we devise a basic TCD residual block to build the backbone network, which promotes the extraction of useful information in SAR images by aggregating and enhancing features from different levels. Furthermore, inspired by the Perona-Malik Diffusion (PMD), we devise a PMD neural module to implement feature diffusion through layers, aiming at removing the noise in smooth regions while preserving the geometric structures. Assembling them together, we propose a novel SAR-to-Optical image translation network named S2O-NPDE, which delivers optical images with finer structures and less noise while enjoying an explainability advantage from explicit mathematical derivation. Experiments on the popular SEN1-2 dataset show that our model outperforms state-of-the-art methods in terms of both objective metrics and visual quality.
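For readers unfamiliar with the Perona-Malik diffusion that inspires the PMD module in the abstract above, here is a minimal sketch of the classical scheme on a 1D signal. It is the textbook explicit update with the conductance g(d) = 1 / (1 + (d / kappa)^2), not the paper's neural module, whose design the abstract does not describe. The function name, parameters, and sample signal are hypothetical.

```python
def perona_malik_1d(signal, steps=20, kappa=1.0, dt=0.2):
    """Classical explicit Perona-Malik diffusion on a 1D signal.

    Small differences (noise) diffuse almost freely, while large
    differences (edges) get a small conductance and are preserved.
    Zero-flux boundaries keep the sum of the signal unchanged.
    """
    def conductance(d):
        return 1.0 / (1.0 + (d / kappa) ** 2)

    u = [float(v) for v in signal]
    for _ in range(steps):
        # Flux across each interface between neighbouring samples.
        flux = [conductance(u[i + 1] - u[i]) * (u[i + 1] - u[i])
                for i in range(len(u) - 1)]
        new_u = []
        for i in range(len(u)):
            right = flux[i] if i < len(u) - 1 else 0.0  # interface to the right
            left = flux[i - 1] if i > 0 else 0.0        # interface to the left
            new_u.append(u[i] + dt * (right - left))
        u = new_u
    return u

# A noisy step edge: small ripples on both sides of a jump of about 10.
noisy_step = [0.0, 0.1, 0.0, 0.1, 10.0, 10.1, 10.0, 10.1]
smoothed = perona_malik_1d(noisy_step)
```

The time step is kept below 0.5 so the explicit 1D update stays stable, since the conductance never exceeds 1.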

AAMAS Conference 2019 Conference Paper

NoRML: No-reward Meta Learning

  • Yuxiang Yang
  • Ken Caluwaerts
  • Atil Iscen
  • Jie Tan
  • Chelsea Finn

Efficiently adapting to new environments and changes in dynamics is critical for agents to successfully operate in the real world. Reinforcement learning (RL) based approaches typically rely on external reward feedback for adaptation. However, in many scenarios this reward signal might not be readily available for the target task, or the difference between the environments can be implicit and only observable from the dynamics. To this end, we introduce a method that allows for self-adaptation of learned policies: No-Reward Meta Learning (NoRML). NoRML extends Model Agnostic Meta Learning (MAML) for RL and uses observable dynamics of the environment instead of an explicit reward function in MAML’s finetune step. Our method has a more expressive update step than MAML, while maintaining MAML’s gradient based foundation. Additionally, in order to allow more targeted exploration, we implement an extension to MAML that effectively disconnects the meta-policy parameters from the fine-tuned policies’ parameters. We first study our method on a number of synthetic control problems and then validate our method on common benchmark environments, showing that NoRML outperforms MAML when the dynamics change between tasks. Videos and source-code are available at https://sites.google.com/view/noreward-meta-rl/.

ICRA Conference 2019 Conference Paper

OpenRoACH: A Durable Open-Source Hexapedal Platform with Onboard Robot Operating System (ROS)

  • Liyu Wang
  • Yuxiang Yang
  • Gustavo J. Correa
  • Konstantinos Karydis
  • Ronald S. Fearing

OpenRoACH is a 15-cm 200-gram self-contained hexapedal robot with an onboard single-board computer. To our knowledge, it is the smallest legged robot with the capability of running the Robot Operating System (ROS) onboard. The robot is fully open sourced, uses accessible materials and off-the-shelf electronic components, can be fabricated with benchtop fast-prototyping machines such as a laser cutter and a 3D printer, and can be assembled by one person within two hours. Its sensory capacity has been tested with gyroscopes, accelerometers, Beacon sensors, color vision sensors, linescan sensors and cameras. It is low-cost within $150 including structure materials, motors, electronics, and a battery. The capabilities of OpenRoACH are demonstrated with multi-surface walking and running, 24-hour continuous walking burn-ins, carrying 200-gram dynamic payloads and 800-gram static payloads, and ROS control of steering based on camera feedback. Information and files related to mechanical design, fabrication, assembly, electronics, and control algorithms are all publicly available on https://wiki.eecs.berkeley.edu/biomimetics/Main/OpenRoACH.