Arrow Research search

Author name cluster

Yihao Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

11 papers
2 author rows

Possible papers (11)

AAAI Conference 2026 Conference Paper

Jupiter: Enhancing LLM Data Analysis Capabilities via Notebook and Inference-Time Value-Guided Search

  • Shuocheng Li
  • Yihao Liu
  • Silin Du
  • Wenxuan Zeng
  • Zhe Xu
  • Mengyu Zhou
  • Yeye He
  • Haoyu Dong

Large language models (LLMs) have shown great promise in automating data science workflows. However, existing models still struggle with multi-step reasoning and tool use, limiting their effectiveness on complex data analysis tasks. To address this limitation, we propose a scalable pipeline that extracts high-quality, tool-based data analysis tasks and their executable multi-step solutions from real-world Jupyter notebooks and associated data files. Using this pipeline, we introduce NbQA, a large-scale dataset of standardized task–solution pairs that reflect authentic tool-use patterns in practical data science scenarios. To further enhance multi-step reasoning capabilities, we present Jupiter, a framework that formulates data analysis as a search problem and applies Monte Carlo Tree Search (MCTS) to generate diverse solution trajectories for value model learning. During inference, Jupiter combines the value model and node visit counts to efficiently collect executable multi-step plans with minimal search steps. Experimental results show that Qwen2.5-7B and 14B-Instruct models fine-tuned on NbQA solve 77.82% and 86.38% of tasks on InfiAgent-DABench, respectively—matching or surpassing GPT-4o and advanced agent frameworks. Further evaluations demonstrate improved generalization and stronger tool-use reasoning across diverse multi-step reasoning tasks.
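The value-guided node selection the abstract describes can be sketched roughly as below. The class, the UCT-style scoring rule, and the exploration constant are illustrative assumptions for intuition, not the paper's exact formulation.

```python
import math

# Illustrative sketch: pick the next search node by combining a value-model
# estimate with an exploration bonus derived from MCTS visit counts.
class Node:
    def __init__(self, state, value=0.0):
        self.state = state      # partial multi-step analysis plan
        self.value = value      # value-model estimate for this plan
        self.visits = 0         # MCTS visit count
        self.children = []

def select_child(node, c=1.4):
    """Choose the child balancing estimated value against exploration."""
    total = sum(ch.visits for ch in node.children) + 1
    def score(ch):
        explore = c * math.sqrt(math.log(total) / (ch.visits + 1))
        return ch.value + explore
    return max(node.children, key=score)
```

With equal visit counts the exploration terms cancel and the higher-value plan is expanded first, which matches the intuition of spending few search steps on unpromising branches.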

AAAI Conference 2026 Conference Paper

SynWeather: Weather Observation Data Synthesis Across Multiple Regions and Variables via a General Diffusion Transformer

  • Kaiyi Xu
  • Junchao Gong
  • Zhiwang Zhou
  • Zhangrui Li
  • Yuandong Pu
  • Yihao Liu
  • Ben Fei
  • Fenghua Ling

With the advancement of meteorological instruments, abundant data has become available. However, due to instruments’ intrinsic limitations such as environmental sensitivity and orbital constraints, raw data often suffer from temporal or spatial gaps, making it urgent to leverage data synthesis techniques to fill in missing information. Current approaches typically focus on single-variable, single-region tasks and primarily rely on deterministic modeling. This limits unified synthesis across variables and regions, overlooks cross-variable complementarity, and often leads to over-smoothed results. To address the above challenges, we introduce SynWeather, the first dataset designed for Unified Multi-region and Multi-variable Weather Observation Data Synthesis. SynWeather covers four representative regions: the Continental United States, Europe, East Asia, and Tropical Cyclone regions, and provides high-resolution observations of key weather variables, including Composite Radar Reflectivity, Hourly Precipitation, Visible Light, and Microwave Brightness Temperature. In addition, we introduce SynWeatherDiff, a general and probabilistic weather synthesis model built upon the Diffusion Transformer framework to address the over-smoothing problem. Experiments on the SynWeather dataset demonstrate the effectiveness of our network compared with both task-specific and general models. Moreover, SynWeatherDiff is able to generate results that are both fine-grained and accurate in high-value regions. Through the dataset and baseline model, we aim to advance meteorological downstream tasks and promote the development of general models for weather variable synthesis.

IROS Conference 2025 Conference Paper

dARt Vinci: Egocentric Data Collection for Surgical Robot Learning at Scale

  • Yihao Liu
  • Yu-Chun Ku
  • Jiaming Zhang
  • Hao Ding 0021
  • Peter Kazanzides
  • Mehran Armand

Data scarcity has long been an issue in the robot learning community. Particularly, in safety-critical domains like surgical applications, obtaining high-quality data can be especially difficult. This scarcity poses challenges to researchers seeking to exploit recent advancements in reinforcement learning and imitation learning, which have greatly improved generalizability and enabled robots to conduct tasks autonomously. We introduce dARt Vinci, a scalable data collection platform for robot learning in surgical settings. The system uses Augmented Reality (AR) hand tracking and a high-fidelity physics engine to capture subtle maneuvers in primitive surgical tasks: by eliminating the need for a physical robot setup and providing flexibility in terms of time, space, and hardware resources, such as multiview sensors and actuators, specialized simulation is a viable alternative. At the same time, AR allows the robot data collection to be more egocentric, supported by its body tracking and content overlaying capabilities. Our user study confirms the proposed system’s efficiency and usability, where we use widely used primitive tasks for training teleoperation with da Vinci surgical robots. Data throughput improves by 41% on average across all tasks compared to real robot settings. The total experiment time is reduced by an average of 10%. The temporal demand in the task load survey is improved. These gains are statistically significant. Additionally, the collected data is over 400 times smaller in size, requiring far less storage while achieving double the frequency. The source code for this project can be accessed at https://dartvinci.finite-state.com/.

NeurIPS Conference 2025 Conference Paper

Learning Differential Pyramid Representation for Tone Mapping

  • Qirui Yang
  • Yinbo Li
  • Yihao Liu
  • Peng-tao Jiang
  • Fangpu Zhang
  • cheng qihua
  • Huanjing Yue
  • Jingyu Yang

Existing tone mapping methods operate on downsampled inputs and rely on handcrafted pyramids to recover high-frequency details. These designs typically fail to preserve fine textures and structural fidelity in complex HDR scenes. Furthermore, most methods lack an effective mechanism to jointly model global tone consistency and local contrast enhancement, leading to globally flat or locally inconsistent outputs such as halo artifacts. We present the Differential Pyramid Representation Network (DPRNet), an end-to-end framework for high-fidelity tone mapping. At its core is a learnable differential pyramid that generalizes traditional Laplacian and Difference-of-Gaussian pyramids through content-aware differencing operations across scales. This allows DPRNet to adaptively capture high-frequency variations under diverse luminance and contrast conditions. To enforce perceptual consistency, DPRNet incorporates global tone perception and local tone tuning modules operating on downsampled inputs, enabling efficient yet expressive tone adaptation. Finally, an iterative detail enhancement module progressively restores the full-resolution output in a coarse-to-fine manner, reinforcing structure and sharpness. Experiments show that DPRNet achieves state-of-the-art results, improving PSNR by 2.39 dB on the 4K HDR+ dataset and 3.01 dB on the 4K HDRI Haven dataset, while producing perceptually coherent, detail-preserving results. Demo available at DPRNet.
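For intuition, the classical Laplacian-pyramid detail band that DPRNet's learnable differential pyramid generalizes can be sketched as below. The 2x average pooling and nearest-neighbour upsampling stand in for the learned, content-aware operators; this is an illustrative assumption, not the paper's architecture.

```python
# Classical pyramid detail band: subtract a coarse reconstruction
# (downsample then upsample) from the image. Pure-Python sketch on
# lists of lists; even-sized images assumed for simplicity.
def down2(img):
    """2x downsampling by average pooling."""
    return [[(img[2*i][2*j] + img[2*i][2*j+1] +
              img[2*i+1][2*j] + img[2*i+1][2*j+1]) / 4.0
             for j in range(len(img[0]) // 2)]
            for i in range(len(img) // 2)]

def up2(img):
    """2x nearest-neighbour upsampling."""
    return [[img[i // 2][j // 2]
             for j in range(2 * len(img[0]))]
            for i in range(2 * len(img))]

def laplacian_level(img):
    """Detail band: image minus its coarse reconstruction."""
    coarse = up2(down2(img))
    return [[img[i][j] - coarse[i][j]
             for j in range(len(img[0]))]
            for i in range(len(img))]
```

A flat region yields an all-zero detail band, so only high-frequency structure survives in the pyramid level; DPRNet replaces the fixed differencing with learned, scale-aware operators.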

NeurIPS Conference 2025 Conference Paper

LOMIA: Label-Only Membership Inference Attacks against Pre-trained Large Vision-Language Models

  • Yihao Liu
  • Xinqi Lyu
  • Dong Wang
  • Yanjie Li
  • Bin Xiao

Large vision-language models (VLLMs) have driven significant progress in multi-modal systems, enabling a wide range of applications across domains such as healthcare, education, and content generation. Despite the success, the large-scale datasets used to train these models often contain sensitive or personally identifiable information, raising serious privacy concerns. To audit and better understand such risks, membership inference attacks (MIAs) have become a key tool. However, existing MIAs against VLLMs predominantly assume access to full-model logits, which are typically unavailable in many practical deployments. To facilitate MIAs in a more realistic and restrictive setting, we propose a novel framework: label-only membership inference attacks (LOMIA) targeting pre-trained VLLMs where only the model’s top-1 prediction is available. Within this framework, we propose three effective attack methods, all of which exploit the intuition that training samples are more likely to be memorized by the VLLMs, resulting in outputs that exhibit higher semantic alignment and lower perplexity. Our experiments show that our framework surpasses existing label-only attack adaptations for different VLLMs and competes with state-of-the-art logits-based attacks across all metrics on three widely used open-source VLLMs and GPT-4o.
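The memorization intuition behind the label-only setting can be illustrated as below. The token-overlap similarity and the threshold are toy assumptions standing in for the paper's semantic-alignment and perplexity signals; this is not one of the three proposed attack methods.

```python
# Toy sketch of label-only membership inference: given only the model's
# top-1 text output, score a candidate by how closely that output aligns
# with the candidate's reference text. Members tend to align more closely.
def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of token sets, a crude semantic-alignment proxy."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def infer_membership(top1_output: str, reference: str,
                     threshold: float = 0.6) -> bool:
    """Predict 'member' when the top-1 output aligns closely with the reference."""
    return token_overlap(top1_output, reference) >= threshold
```

The point is that no logits are needed: the attack decision is made entirely from the returned text, matching the restrictive deployment setting the abstract targets.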

IROS Conference 2025 Conference Paper

Look Before You Leap: Using Serialized State Machine for Language Conditioned Robotic Manipulation

  • Tong Mu
  • Yihao Liu
  • Mehran Armand

Imitation learning frameworks for robotic manipulation have drawn attention in the recent development of language model grounded robotics. However, the success of these frameworks largely depends on the coverage of the demonstration cases: when the demonstration set does not include examples of how to act in all possible situations, actions may fail and result in cascading errors. To solve this problem, we propose a framework that uses a serialized Finite State Machine (FSM) to generate demonstrations and improve the success rate in manipulation tasks requiring a long sequence of precise interactions. To validate its effectiveness, we use environmentally evolving, long-horizon puzzles that require long sequential actions. Experimental results show that our approach achieves a success rate of up to 98% on these tasks, compared to a controlled condition using existing approaches, which reached at most 60% and, on some tasks, failed almost completely. The source code for this project can be accessed at https://imitate.finite-state.com/.
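The idea of generating demonstrations by walking a serialized state machine can be sketched as below. The state and action names are invented for illustration; the paper's FSM serialization is richer than this dictionary encoding.

```python
# Minimal sketch: each FSM state maps an observation to an action and a
# successor state, so a complete demonstration is produced by walking the
# machine to its terminal state. The 'obs' argument is unused here but
# marks where environment feedback would condition the transition.
FSM = {
    "approach": lambda obs: ("move_to_object", "grasp"),
    "grasp":    lambda obs: ("close_gripper", "place"),
    "place":    lambda obs: ("open_gripper", "done"),
}

def run_demonstration(start="approach"):
    """Walk the FSM to completion, recording the action sequence."""
    state, actions = start, []
    while state != "done":
        action, state = FSM[state](None)
        actions.append(action)
    return actions
```

Because every reachable state has an explicit transition, the generated demonstrations cover situations a fixed demonstration set might miss, which is the coverage gap the abstract attributes to cascading errors.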

NeurIPS Conference 2025 Conference Paper

StyleGuard: Preventing Text-to-Image-Model-based Style Mimicry Attacks by Style Perturbations

  • Yanjie Li
  • Wenxuan Zhang
  • Xinqi Lyu
  • Yihao Liu
  • Bin Xiao

Recently, text-to-image diffusion models have been widely used for style mimicry and personalized customization through methods such as DreamBooth and Textual Inversion. This has raised concerns about intellectual property protection and the generation of deceptive content. Recent studies, such as Glaze and Anti-DreamBooth, have proposed using adversarial noise to protect images from these attacks. However, recent purification-based methods, such as DiffPure and Noise Upscaling, have successfully attacked these latest defenses, showing the vulnerabilities of these methods. Moreover, existing methods show limited transferability across models, making them less effective against unknown text-to-image models. To address these issues, we propose a novel anti-mimicry method, StyleGuard. We propose a novel style loss that optimizes the style-related features in the latent space to make them deviate from those of the original image, which improves model-agnostic transferability. Additionally, to enhance the perturbation's ability to bypass diffusion-based purification, we designed a novel upscale loss that involves ensemble purifiers and upscalers during training. Extensive experiments on the WikiArt and CelebA datasets demonstrate that StyleGuard outperforms existing methods in robustness against various transformations and purifications, effectively countering style mimicry in various models. Moreover, StyleGuard is effective on different style mimicry methods, including DreamBooth and Textual Inversion. The code is available at https://github.com/PolyLiYJ/StyleGuard.

ICLR Conference 2025 Conference Paper

WeatherGFM: Learning a Weather Generalist Foundation Model via In-context Learning

  • Xiangyu Zhao
  • Zhiwang Zhou
  • Wenlong Zhang
  • Yihao Liu
  • Xiangyu Chen
  • Junchao Gong
  • Hao Chen 0045
  • Ben Fei

The Earth's weather system involves intricate weather data modalities and diverse weather understanding tasks, which hold significant value to human life. Existing data-driven models focus on single weather understanding tasks (e.g., weather forecasting). While these models have achieved promising results, they fail to tackle various complex tasks within a single and unified model. Moreover, the paradigm that relies on limited real observations for a single scenario hinders the model's performance upper bound. Inspired by the in-context learning paradigm from visual foundation models and large language models, in this paper, we introduce the first weather generalist foundation model (WeatherGFM) to address weather understanding tasks in a unified manner. Specifically, we first unify the representation and definition for diverse weather understanding tasks. Subsequently, we design weather prompt formats to handle different weather data modalities, including single, multiple, and temporal modalities. Finally, we adopt a visual prompting question-answering paradigm for the training of unified weather understanding tasks. Extensive experiments indicate that our WeatherGFM can effectively handle up to 12 weather understanding tasks, including weather forecasting, super-resolution, weather image translation, and post-processing. Our method also showcases generalization ability on unseen tasks. The source code is available at https://github.com/xiangyu-mm/WeatherGFM.

AAMAS Conference 2023 Conference Paper

Minimising Task Tardiness for Multi-Agent Pickup and Delivery

  • Saravanan Ramanathan
  • Yihao Liu
  • Xueyan Tang
  • Wentong Cai
  • Jingning Li

Multi-agent pickup and delivery, a variant of the multi-agent path finding problem, aims to find collision-free paths for a set of agents performing a continuous stream of pickup and delivery tasks. Owing to the service guarantee nature of applications, these agents often need to execute the tasks within their stipulated deadlines. When failure to meet task deadlines is unavoidable, there is a need to minimise the tardiness experienced by the tasks. To address this problem, we propose a cost-based integrated task assignment and path planning algorithm to assign tasks to the agents.
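The tardiness objective the abstract minimises is commonly defined as the amount by which a task finishes past its deadline, clipped at zero. A minimal sketch, assuming tasks are given as (completion time, deadline) pairs, a representation invented here for illustration:

```python
# Tardiness of one task: how far past its deadline it completed,
# or zero if it finished on time.
def tardiness(completion: float, deadline: float) -> float:
    return max(0.0, completion - deadline)

def total_tardiness(tasks) -> float:
    """Sum tardiness over (completion_time, deadline) pairs."""
    return sum(tardiness(c, d) for c, d in tasks)
```

An integrated assignment-and-planning algorithm such as the one proposed would evaluate candidate task assignments against an objective of this shape, trading off which agent serves which task against the resulting completion times.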

AAAI Conference 2020 Conference Paper

FD-GAN: Generative Adversarial Networks with Fusion-Discriminator for Single Image Dehazing

  • Yu Dong
  • Yihao Liu
  • He Zhang
  • Shifeng Chen
  • Yu Qiao

Recently, convolutional neural networks (CNNs) have achieved great improvements in single image dehazing and attracted much attention in research. Most existing learning-based dehazing methods are not fully end-to-end, and still follow the traditional dehazing procedure: first estimate the medium transmission and the atmospheric light, then recover the haze-free image based on the atmospheric scattering model. However, in practice, due to lack of priors and constraints, it is hard to precisely estimate these intermediate parameters. Inaccurate estimation further degrades the performance of dehazing, resulting in artifacts, color distortion and insufficient haze removal. To address this, we propose a fully end-to-end Generative Adversarial Network with Fusion-discriminator (FD-GAN) for image dehazing. With the proposed Fusion-discriminator, which takes frequency information as additional priors, our model can generate more natural and realistic dehazed images with less color distortion and fewer artifacts. Moreover, we synthesize a large-scale training dataset including various indoor and outdoor hazy images to boost the performance, and we reveal that for learning-based dehazing methods, the performance is strongly influenced by the training data. Experiments have shown that our method reaches state-of-the-art performance on both public synthetic datasets and real-world images with more visually pleasing dehazed results.