Arrow Research search

Author name cluster

Yuhan Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

12 papers
2 author rows

Possible papers

AAAI Conference 2026 Conference Paper

CrossCut: Cross-Patch Aware Interactive Segmentation for Remote Sensing Images

  • Zheng Lin
  • Nan Zhou
  • Yuhan Wang
  • Bojian Zhang

Interactive segmentation aims to delineate a user-specified target in an image by leveraging positive and negative clicks. While effective on natural images, existing methods often fail in remote sensing scenarios, where satellite imagery is characterized by ultra-high resolution, sparse object distribution, and significant scale variation. These factors hinder accurate segmentation of fine-grained targets like roads, buildings, and aircraft. To overcome these problems, we propose CrossCut, a novel interactive segmentation framework tailored for remote sensing imagery. Unlike previous approaches that either process the entire image or treat each patch independently, CrossCut enables simultaneous segmentation across multiple patches by propagating user click information to all patches. This design allows the model to fully utilize click guidance regardless of object location, effectively resolving the challenge of inter-patch information isolation. Furthermore, CrossCut supports flexible inference by allowing segmentation results from different patch configurations to be fused, enhancing both accuracy and robustness. Extensive evaluations across multiple remote sensing datasets demonstrate that CrossCut achieves state-of-the-art performance. Quantitative results and visualizations show that CrossCut significantly advances the field of interactive segmentation for remote sensing imagery.
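As a rough illustration of what propagating click information to all patches could look like, the Python sketch below re-expresses each user click in every patch's local coordinate frame, so a patch still "sees" clicks that fall outside its borders. The regular grid tiling, the `Click` tuple, and the functions `split_into_patches` and `clicks_for_patch` are hypothetical names invented for this example. The abstract does not describe CrossCut's actual mechanism, so this should not be read as the authors' implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical click record: (x, y, is_positive) in full-image pixel coordinates.
Click = Tuple[int, int, bool]


def split_into_patches(width: int, height: int, patch: int) -> List[Tuple[int, int]]:
    """Return the top-left corner of each non-overlapping patch on a regular grid."""
    return [(x0, y0) for y0 in range(0, height, patch) for x0 in range(0, width, patch)]


def clicks_for_patch(clicks: List[Click], origin: Tuple[int, int]) -> List[Click]:
    """Re-express every click relative to one patch's origin.

    Clicks outside the patch are kept on purpose: their local coordinates
    simply fall outside [0, patch), so the patch still knows where the
    user clicked relative to itself.
    """
    x0, y0 = origin
    return [(x - x0, y - y0, positive) for x, y, positive in clicks]


# One positive and one negative click on a 1024 x 1024 image split into 512 x 512 patches.
clicks: List[Click] = [(700, 300, True), (100, 900, False)]
origins = split_into_patches(width=1024, height=1024, patch=512)
per_patch: Dict[Tuple[int, int], List[Click]] = {
    origin: clicks_for_patch(clicks, origin) for origin in origins
}
```

Every patch receives the full click list, which is the property the abstract contrasts with treating each patch independently. How a real model encodes those clicks is left open here.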

EAAI Journal 2026 Journal Article

From human and object interaction to threat detection: An interpretable threat detection method for human violence scenarios

  • Yuhan Wang
  • Cheng Liu
  • Daou Zhang
  • Zihan Zhao
  • Jinyang Chen
  • Purui Dong
  • Zuyuan Yu
  • Ziru Wang

Growing public-security demands make automated threat detection in high-risk scenarios increasingly pressing. However, existing methods generally suffer from uninterpretable inference and biased semantic understanding, which severely limit their reliability in practical deployment. To address these challenges, this article proposes a threat detection method based on human and object interaction (HOI) tags. The method builds on a fine-grained multimodal dataset called threat detection by HOI (TD-Hoi) and uses structured HOI tags to guide language generation, enhancing the model’s semantic modeling of key entities and their behavioral interactions. Furthermore, a set of metrics is designed to evaluate text response quality, systematically measuring the model’s representation accuracy and comprehensibility during threat interpretation. Experimental results show that the proposed method, Hoi2Threat, achieves substantial improvements in several threat detection tasks, particularly in the core metrics of Correctness of Information, Behavioral Mapping Accuracy, and Threat Detailed Orientation, which reach 5.08, 5.04, and 4.76, gains of 7.10%, 6.80%, and 2.63%, respectively, over the state-of-the-art method. These results validate the merits of the approach in semantic understanding, entity behavior mapping, and interpretability. Ultimately, our work paves the way for more reliable and transparent automated threat detection in real-world security operations.

AAAI Conference 2026 Conference Paper

STAR-1: Safer Alignment of Reasoning LLMs with 1K Data

  • Zijun Wang
  • Haoqin Tu
  • Yuhan Wang
  • Juncheng Wu
  • Yanqing Liu
  • Jieru Mei
  • Brian R. Bartoldson
  • Bhavya Kailkhura

This paper introduces STAR-1, a high-quality, just-1k-scale safety dataset specifically designed for large reasoning models (LRMs) like DeepSeek-R1. Built on three core principles (diversity, deliberative reasoning, and rigorous filtering), STAR-1 aims to address the critical needs for safety alignment in LRMs. Specifically, we begin by integrating existing open-source safety datasets from diverse sources. Then, we curate safety policies to generate policy-grounded deliberative reasoning samples. Lastly, we apply a GPT-4o-based safety scoring system to select training examples aligned with best practices. Experimental results show that fine-tuning LRMs with STAR-1 leads to an average 40% improvement in safety performance across four benchmarks, while only incurring a marginal decrease (e.g., an average of 1.1%) in reasoning ability measured across five reasoning tasks. Extensive ablation studies further validate the importance of our design principles in constructing STAR-1 and analyze its efficacy across both LRMs and traditional LLMs.

YNIMG Journal 2025 Journal Article

Developmental patterns of white matter functional networks in neonates

  • Yuhan Wang
  • Ningning Pan
  • Zhuoshuo Li
  • Yating Wang
  • Ruoqing Chen
  • Zhicong Fang
  • Minmin Pan
  • Hongzhuang Li

In recent years, the development of neonatal brain networks has become a research focus, with traditional studies primarily emphasizing gray matter (GM) functional networks. This study systematically explores the developmental characteristics of white matter (WM) functional networks in neonates. Utilizing data from the third release of the Developing Human Connectome Project (dHCP), we analyzed resting-state functional magnetic resonance imaging (rs-fMRI) data from 730 full-term and 157 preterm neonates. We successfully identified ten large-scale WM functional networks and validated their correspondence with established WM fiber tracts using diffusion tensor imaging (DTI). We examined WM functional networks from two dimensions: network functional connectivity and spontaneous activity, incorporating four factors: preterm birth status, age, sex, and hemispheric differences. The results indicate that WM network functional connectivity significantly increases with age, with preterm infants exhibiting lower connectivity than full-term infants, whereas no significant differences were observed between sexes or hemispheres. Regarding spontaneous activity, preterm infants showed lower amplitude in the low-frequency range, whereas in the high-frequency range, their amplitude distribution was more unstable and dispersed. Additionally, certain differences in spontaneous activity were observed between hemispheres and sexes. These findings provide novel insights into the early development of neonatal brain networks and hold significant implications for clinical interventions and treatment strategies for preterm infants.

IROS Conference 2025 Conference Paper

IMM-MOT: A Novel 3D Multi-object Tracking Framework with Interacting Multiple Model Filter

  • Xiaohong Liu
  • Xulong Zhao
  • Gang Liu
  • Zili Wu
  • Tao Wang
  • Lei Meng
  • Yuhan Wang

3D Multi-Object Tracking (MOT) provides the trajectories of surrounding objects, assisting robots or vehicles in smarter path planning and obstacle avoidance. Existing 3D MOT methods based on the Tracking-by-Detection framework typically use a single motion model to track an object throughout its entire tracking process. However, objects may change their motion patterns due to variations in the surrounding environment. In this paper, we introduce the Interacting Multiple Model filter in IMM-MOT, which accurately fits the complex motion patterns of individual objects, overcoming the limitation of single-model tracking in existing approaches. In addition, we incorporate a Damping Window mechanism into the trajectory lifecycle management, leveraging the continuous association status of trajectories to control their creation and termination, reducing the occurrence of overlooked low-confidence true targets. Furthermore, we propose the Distance-Based Score Enhancement module, which enhances the differentiation between false positives and true positives by adjusting detection scores, thereby improving the effectiveness of the Score Filter. On the NuScenes Val dataset, IMM-MOT outperforms most other single-modal models using 3D point clouds, achieving an AMOTA of 73.8%. Our project is available at https://github.com/Ap01lo/IMM-MOT.
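For readers unfamiliar with the Interacting Multiple Model (IMM) filter mentioned above, the Python sketch below shows only the textbook model-probability update: each motion model's prior weight is propagated through a Markov transition matrix and then reweighted by how well that model explained the latest measurement. The function name `imm_update_model_probs` and the example numbers are assumptions made for illustration. This is the generic IMM step, not code from the IMM-MOT repository, and it omits the per-model Kalman filters and state mixing.

```python
from typing import List


def imm_update_model_probs(
    mu: List[float],
    transition: List[List[float]],
    likelihoods: List[float],
) -> List[float]:
    """Return updated model probabilities for one IMM cycle.

    mu[i]            prior probability that model i is active
    transition[i][j] probability of switching from model i to model j
    likelihoods[j]   measurement likelihood under model j's filter
    """
    n = len(mu)
    # Predicted probability of each model before seeing the measurement.
    predicted = [sum(transition[i][j] * mu[i] for i in range(n)) for j in range(n)]
    # Reweight by the measurement likelihood, then normalize.
    unnormalized = [likelihoods[j] * predicted[j] for j in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("all model likelihoods are zero")
    return [u / total for u in unnormalized]


# Two hypothetical motion models (e.g. constant velocity and constant turn rate).
prior = [0.5, 0.5]
switching = [[0.9, 0.1], [0.1, 0.9]]
updated = imm_update_model_probs(prior, switching, likelihoods=[0.2, 0.8])
```

In this example the second model explains the measurement four times better than the first, so its weight rises; the fused state estimate would then lean on that model's filter output.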

IROS Conference 2025 Conference Paper

LLM-Driven Hierarchical Planning: Long-horizon Task Allocation for Multi-Robot Systems in Cross-Regional Environments

  • Yachao Wang
  • Yangshuo Dong
  • Yunting Yang
  • Xiang Zhang
  • Yinchuan Wang
  • Yuhan Wang
  • Chaoqun Wang
  • Max Q.-H. Meng

Long-horizon composite task planning for multi-robot systems in cross-regional complex scenarios faces dual challenges: spatial-semantic comprehension of tasks described in natural language and collaborative optimization of subtask allocation. To address these challenges, this paper proposes a progressive three-stage task planning framework. First, an augmented scene graph is constructed to enable large language models (LLMs) to comprehend environmental structures, thereby generating simplified Linear Temporal Logic (LTL) task sequences. Subsequently, a novel heuristic function is employed to select optimal task allocation plans. Finally, LLMs are used to generate low-level executable robot instructions based on robotic system instruction templates. We establish a long-horizon composite task dataset for experimental validation on real-world quadrupedal multi-robot systems. Experimental results demonstrate the effectiveness of our approach in resolving cross-regional composite tasks.

NeurIPS Conference 2025 Conference Paper

MRSAudio: A Large-Scale Multimodal Recorded Spatial Audio Dataset with Refined Annotations

  • Wenxiang Guo
  • Changhao Pan
  • Zhiyuan Zhu
  • Xintong Hu
  • Yu Zhang
  • Li Tang
  • Rui Yang
  • Han Wang

Humans rely on multisensory integration to perceive spatial environments, where auditory cues enable sound source localization in three-dimensional space. Despite the critical role of spatial audio in immersive technologies such as VR/AR, most existing multimodal datasets provide only monaural audio, which limits the development of spatial audio generation and understanding. To address these challenges, we introduce MRSAudio, a large-scale multimodal spatial audio dataset designed to advance research in spatial audio understanding and generation. MRSAudio spans four distinct components: MRSLife, MRSSpeech, MRSMusic, and MRSSing, covering diverse real-world scenarios. The dataset includes synchronized binaural and ambisonic audio, exocentric and egocentric video, motion trajectories, and fine-grained annotations such as transcripts, phoneme boundaries, lyrics, scores, and prompts. To demonstrate the utility and versatility of MRSAudio, we establish five foundational tasks: audio spatialization, spatial text-to-speech, spatial singing voice synthesis, spatial music generation, and sound event localization and detection. Results show that MRSAudio enables high-quality spatial modeling and supports a broad range of spatial audio research. Demos and dataset access are available at https://mrsaudio.github.io.

AAAI Conference 2025 Conference Paper

On Designing the Optimal Integrated Ad Auction in E-commerce Platforms

  • Yuchao Ma
  • Weian Li
  • Yuhan Wang
  • Zitian Guo
  • Yuejia Dou
  • Qi Qi
  • Changyuan Yu

Currently, e-commerce platforms integrate ads and organic content into a mixed list for users. While platforms seek to maximize profit from advertisers, organic items enhance user experience. To ensure long-term development, platforms aim to design mechanisms that optimize both revenue and user satisfaction. Current methods rank ads and organic items separately before integrating them. Even if each part is locally optimal, the combined result may not be globally optimal. In this paper, we propose the Joint Integrated Regret Network (JINTER Net). Unlike traditional methods, which pre-order ads and organic items separately, JINTER Net directly selects from the combined set of candidate ads and organic items to generate an optimal list. This approach aims to optimally balance platform revenue and user experience while satisfying approximate dominant strategy incentive compatibility and individual rationality. We validate the effectiveness of JINTER Net on both synthetic and real data, and our experimental results show that it significantly outperforms baseline models across multiple metrics.

IJCAI Conference 2025 Conference Paper

Tensor Network: from the Perspective of AI4Science and Science4AI

  • Junchi Yan
  • Yehui Tang
  • Xinyu Ye
  • Hao Xiong
  • Xiaoqiu Zhong
  • Yuhan Wang
  • Yuan Qi

Tensor networks have been a promising numerical tool for computational problems across science and AI. Given their rapid development, especially at the intersection of AI and science, this paper presents a compact review of both their applications and their recent technical development, including open-source tools. Specifically, we observe that tensor networks play a functional role in matrix compression and representation, information fusion, and quantum-inspired algorithms, which can generally be regarded as Science4AI in our survey. On the other hand, there is an emerging line of research on tensor networks in AI4Science, such as learning quantum many-body physics using, e.g., neural network quantum states. Importantly, we unify tensorization methodologies across classical and modern architectures, show in particular how tensorization bridges low-order parameter spaces to high-dimensional representations without exponential parameter growth, and point out their potential use in scientific computing. We conclude the paper with an outlook on future trends.

ICLR Conference 2024 Conference Paper

Symbol as Points: Panoptic Symbol Spotting via Point-based Representation

  • Wenlong Liu
  • Tianyu Yang
  • Yuhan Wang
  • Qizhi Yu
  • Lei Zhang

This work studies the problem of panoptic symbol spotting, which is to spot and parse both countable object instances (windows, doors, tables, etc.) and uncountable stuff (wall, railing, etc.) from computer-aided design (CAD) drawings. Existing methods typically involve either rasterizing the vector graphics into images and using image-based methods for symbol spotting, or directly building graphs and using graph neural networks for symbol recognition. In this paper, we take a different approach, which treats graphic primitives as a set of 2D points that are locally connected and uses point cloud segmentation methods to tackle the problem. Specifically, we utilize a point transformer to extract the primitive features and append a mask2former-like spotting head to predict the final output. To better use the local connection information of primitives and enhance their discriminability, we further propose the attention with connection module (ACM) and contrastive connection learning scheme (CCL). Finally, we propose a KNN interpolation mechanism for the mask attention module of the spotting head to better handle primitive mask downsampling, which is primitive-level in contrast to pixel-level for the image. Our approach, named SymPoint, is simple yet effective, outperforming the recent state-of-the-art method GAT-CADNet by an absolute increase of 9.6% PQ and 10.4% RQ on the FloorPlanCAD dataset. The source code and models will be available at https://github.com/nicehuster/SymPoint.
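To make the idea of KNN interpolation for primitive-level mask downsampling more concrete, the Python sketch below resamples per-point mask values onto a smaller set of query points by averaging the values of each query's k nearest source points. The function `knn_interpolate`, the plain averaging rule, and the toy coordinates are all assumptions chosen for illustration. The abstract does not give the exact weighting SymPoint uses, so this is a generic nearest-neighbour scheme rather than the authors' module.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def knn_interpolate(
    src_points: Sequence[Point],
    src_values: Sequence[float],
    query_points: Sequence[Point],
    k: int = 3,
) -> List[float]:
    """Average the values of the k nearest source points for each query point."""
    out: List[float] = []
    for q in query_points:
        # Indices of the source points sorted by Euclidean distance to q.
        order = sorted(range(len(src_points)), key=lambda i: math.dist(q, src_points[i]))
        nearest = order[:k]
        out.append(sum(src_values[i] for i in nearest) / len(nearest))
    return out


# Four hypothetical primitives with a binary mask, downsampled to two points.
primitives = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
mask = [1.0, 1.0, 0.0, 0.0]
coarse = knn_interpolate(primitives, mask, [(0.5, 0.0), (10.5, 0.0)], k=2)
```

Unlike pixel masks, points have no regular grid to pool over, which is why a neighbour-based rule of this kind is a natural fit for downsampling at the primitive level.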

AIIM Journal 2023 Journal Article

Multi-task learning framework to predict the status of central venous catheter based on radiographs

  • Yuhan Wang
  • Hak Keung Lam
  • Yujia Xu
  • Faliang Yin
  • Kun Qian

Hospital patients can have catheters and lines inserted during their admission to deliver medicines, especially the central venous catheter (CVC). However, malposition of a CVC can lead to many complications, even death. Clinicians routinely check the status of the catheter via X-ray images to avoid these issues. To reduce clinicians' workload and improve the efficiency of CVC status detection, a multi-task learning framework for catheter status classification based on the convolutional neural network (CNN) is proposed. The framework contains three components: a modified HRNet, multi-task supervision (segmentation supervision and heatmap regression supervision), and a classification branch. The modified HRNet maintains high-resolution features from start to end, which ensures the generation of high-quality auxiliary information for classification. The multi-task supervision helps mitigate interference from other line-like structures, such as other tubes and anatomical structures, in the X-ray image. Furthermore, during inference, this module also serves as an interpretation interface that shows where the framework pays attention. Finally, the classification branch predicts the class of the catheter status. A public CVC dataset is used to evaluate the proposed method, which achieves 0.823 AUC (area under the ROC curve) and 82.6% accuracy on the test dataset. Compared with two state-of-the-art methods (the ATCM method and the EDMC method), the proposed method performs best.

IJCAI Conference 2021 Conference Paper

HifiFace: 3D Shape and Semantic Prior Guided High Fidelity Face Swapping

  • Yuhan Wang
  • Xu Chen
  • Junwei Zhu
  • Wenqing Chu
  • Ying Tai
  • Chengjie Wang
  • Jilin Li
  • Yongjian Wu

In this work, we propose a high fidelity face swapping method, called HifiFace, which preserves the face shape of the source face well and generates photo-realistic results. Unlike existing face swapping works that only use a face recognition model to keep the identity similarity, we propose a 3D shape-aware identity to control the face shape with geometric supervision from 3DMM and a 3D face reconstruction method. Meanwhile, we introduce the Semantic Facial Fusion module to optimize the combination of encoder and decoder features and perform adaptive blending, which makes the results more photo-realistic. Extensive experiments on faces in the wild demonstrate that our method preserves identity better, especially the face shape, and generates more photo-realistic results than previous state-of-the-art methods. Code is available at: https://johann.wang/HifiFace