Arrow Research search

Author name cluster

Yi Dong

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

13 papers
2 author rows

Possible papers

13

AAAI Conference 2026 Conference Paper

Tapas Are Free! Training-Free Adaptation of Programmatic Agents via LLM-Guided Program Synthesis in Dynamic Environments

  • Jinwei Hu
  • Yi Dong
  • Youcheng Sun
  • Xiaowei Huang

Autonomous agents in safety-critical applications must continuously adapt to dynamic conditions without compromising performance and reliability. This work introduces TAPA (Training-free Adaptation of Programmatic Agents), a novel framework that positions large language models (LLMs) as intelligent moderators of the symbolic action space. Unlike prior programmatic agents, which typically generate a monolithic policy program or rely on fixed symbolic action sets, TAPA synthesizes and adapts modular programs for individual high-level actions, referred to as logical primitives. By decoupling strategic intent from execution, TAPA enables meta-agents to operate over an abstract, interpretable action space while the LLM dynamically generates, composes, and refines symbolic programs tailored to each primitive. Extensive experiments across cybersecurity and swarm intelligence domains validate TAPA's effectiveness. In autonomous DDoS defense scenarios, TAPA achieves 77.7% network uptime while maintaining near-perfect detection accuracy in unknown dynamic environments. In swarm intelligence formation control under environmental and adversarial disturbances, TAPA consistently preserves consensus at runtime where baseline methods fail. This work promotes a paradigm shift for autonomous system design in evolving environments, from policy adaptation to dynamic action adaptation.
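The abstract's core idea, a per-primitive registry of synthesized programs that can be re-generated without retraining the meta-agent, can be caricatured in a few lines. Everything below (the primitive names, the `synthesize_program` stub standing in for the LLM-guided step, and the state fields) is a hypothetical illustration, not the paper's API.

```python
from typing import Callable, Dict

# Symbolic action space: each logical primitive owns its own modular program.
primitives: Dict[str, Callable[[dict], dict]] = {}

def synthesize_program(primitive: str) -> Callable[[dict], dict]:
    """Stand-in for LLM-guided synthesis of one primitive's program."""
    def program(state: dict) -> dict:
        # A trivially "synthesized" program: record the executed primitive.
        return {**state, "last_action": primitive}
    return program

def adapt(primitive: str) -> None:
    """Training-free adaptation: re-synthesize only the affected primitive,
    leaving the meta-agent's abstract policy untouched."""
    primitives[primitive] = synthesize_program(primitive)

# The meta-agent plans over abstract primitives, not concrete programs.
for p in ("monitor", "filter_traffic", "reroute"):
    adapt(p)

state = primitives["filter_traffic"]({"step": 0})
```

The point of the decoupling is visible in `adapt`: when the environment shifts, only the affected primitive's program is replaced, while the meta-agent's action space stays fixed.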

IROS Conference 2025 Conference Paper

Affine-SLAM: A closed-form solution to the landmark-SLAM via affine relaxation

  • Shaoran Yang
  • Yi Dong

We propose a closed-form solution to the landmark simultaneous localization and mapping (landmark-SLAM) problem. The core idea is to extend recent advances in generalized Procrustes analysis (GPA) research by incorporating an affine-relaxed odometry term. We show that the resulting affine-relaxed landmark-SLAM formulation, termed affine-SLAM, can be solved globally in closed form. Through numerical experiments, we demonstrate that the affine-SLAM solution is rather close to the optimal solution of the standard nonlinear least squares (NLS) optimization, and thus can be used either as a stand-alone approximate solution or as a high-quality initialization for NLS solvers.
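To see why relaxation buys a closed-form solve, here is a toy position-only analogue: once rotations are out of the picture, anchor, odometry, and landmark constraints are all linear in the unknowns, so the whole problem collapses to one linear least-squares solve. This is only a sketch of the "single closed-form solve" idea; the paper's affine relaxation additionally handles orientations.

```python
import numpy as np

# Unknowns: poses p0, p1, p2 and landmark l0, each in R^2, stacked into x.
# Constraints: anchor p0 = 0, odometry p_{t+1} - p_t = u_t,
# and landmark observations l0 - p_t = z_t (all hypothetical values).
u = [np.array([1.0, 0.0]), np.array([1.0, 0.0])]                          # odometry
z = [np.array([1.0, 1.0]), np.array([0.0, 1.0]), np.array([-1.0, 1.0])]  # obs of l0

n = 4 * 2  # 3 poses + 1 landmark, 2D each

def block(idx_pos, idx_neg=None):
    """Constraint row pair selecting x[idx_pos] (minus x[idx_neg])."""
    A = np.zeros((2, n))
    A[:, 2 * idx_pos:2 * idx_pos + 2] = np.eye(2)
    if idx_neg is not None:
        A[:, 2 * idx_neg:2 * idx_neg + 2] = -np.eye(2)
    return A

rows, rhs = [block(0)], [np.zeros(2)]            # anchor p0 = 0
for t, ut in enumerate(u):                       # odometry terms
    rows.append(block(t + 1, t)); rhs.append(ut)
for t, zt in enumerate(z):                       # landmark terms (index 3)
    rows.append(block(3, t)); rhs.append(zt)

A, b = np.vstack(rows), np.concatenate(rhs)
x, *_ = np.linalg.lstsq(A, b, rcond=None)        # one closed-form solve
poses, landmark = x[:6].reshape(3, 2), x[6:]
```

With noise-free measurements the solve recovers the ground truth exactly; with noise it returns the linear least-squares estimate, usable directly or as an initialization for an NLS solver.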

NeurIPS Conference 2025 Conference Paper

FALCON: Fine-grained Activation Manipulation by Contrastive Orthogonal Unalignment for Large Language Model

  • Jinwei Hu
  • Zhenglin Huang
  • Xiangyu Yin
  • Wenjie Ruan
  • Guangliang Cheng
  • Yi Dong
  • Xiaowei Huang

Large language models have been widely applied, but can inadvertently encode sensitive or harmful information, raising significant safety concerns. Machine unlearning has emerged to alleviate this concern; however, existing training-time unlearning approaches, relying on coarse-grained loss combinations, have limitations in precisely separating knowledge and balancing removal effectiveness with model utility. In contrast, we propose $\textbf{F}$ine-grained $\textbf{A}$ctivation manipu$\textbf{L}$ation by $\textbf{C}$ontrastive $\textbf{O}$rthogonal u$\textbf{N}$alignment (FALCON), a novel representation-guided unlearning approach that leverages information-theoretic guidance for efficient parameter selection, employs contrastive mechanisms to enhance representation separation, and projects conflicting gradients onto orthogonal subspaces to resolve interference between forgetting and retention objectives. Extensive experiments demonstrate that FALCON achieves superior unlearning effectiveness while maintaining model utility, exhibiting robust resistance against knowledge recovery attempts.
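One common way to realize the orthogonal projection the abstract describes is to strip from the forgetting gradient its component along the retention gradient, so the unlearning update cannot move the model in a direction that hurts retained knowledge. This is a generic sketch of that projection, not necessarily FALCON's exact construction.

```python
import numpy as np

def project_out(g_forget: np.ndarray, g_retain: np.ndarray) -> np.ndarray:
    """Project the forgetting gradient onto the orthogonal complement of
    the retention gradient: g' = g_f - (g_f . r_hat) r_hat."""
    r_hat = g_retain / np.linalg.norm(g_retain)
    return g_forget - np.dot(g_forget, r_hat) * r_hat

g_f = np.array([1.0, 2.0, 3.0])   # toy forgetting gradient
g_r = np.array([0.0, 1.0, 0.0])   # toy retention gradient
g = project_out(g_f, g_r)         # update now orthogonal to g_r
```

After projection the update has zero inner product with the retention direction, which is the mechanism for resolving conflicts between the two objectives.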

NeurIPS Conference 2025 Conference Paper

GenColor: Generative and Expressive Color Enhancement with Pixel-Perfect Texture Preservation

  • Yi Dong
  • Yuxi Wang
  • Xianhui Lin
  • Wenqi Ouyang
  • Zhiqi Shen
  • Peiran Ren
  • Ruoxi Fan
  • Rynson Lau

Color enhancement is a crucial yet challenging task in digital photography. It demands methods that are (i) expressive enough for fine-grained adjustments, (ii) adaptable to diverse inputs, and (iii) able to preserve texture. Existing approaches typically fall short in at least one of these aspects, yielding unsatisfactory results. We propose GenColor, a novel diffusion-based framework for sophisticated, texture-preserving color enhancement. GenColor reframes the task as conditional image generation. Leveraging ControlNet and a tailored training scheme, it learns advanced color transformations that adapt to diverse lighting and content. We train GenColor on ARTISAN, our newly collected large-scale dataset of 1.2M high-quality photographs specifically curated for enhancement tasks. To overcome texture preservation limitations inherent in diffusion models, we introduce a color-transfer network with a novel degradation scheme that simulates texture–color relationships. This network achieves pixel-perfect texture preservation while enabling fine-grained color matching with the diffusion-generated reference images. Extensive experiments show that GenColor produces visually compelling results comparable to those of expert colorists and surpasses state-of-the-art methods in both subjective and objective evaluations. We have released the code and dataset.

NeurIPS Conference 2025 Conference Paper

HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages

  • Zhilin Wang
  • Jiaqi Zeng
  • Olivier Delalleau
  • Hoo-Chang Shin
  • Felipe Soares
  • Alexander Bukharin
  • Ellie Evans
  • Yi Dong

Preference datasets are essential for training general-domain, instruction-following language models with Reinforcement Learning from Human Feedback (RLHF). Each subsequent data release raises expectations for future data collection, meaning there is a constant need to advance the quality and diversity of openly available preference data. To address this need, we introduce HelpSteer3-Preference, a permissively licensed (CC-BY-4.0), high-quality, human-annotated preference dataset comprising over 40,000 samples. These samples span diverse real-world applications of large language models (LLMs), including tasks relating to STEM, coding and multilingual scenarios. Using HelpSteer3-Preference, we train Reward Models (RMs) that achieve top performance on RM-Bench (82.4%) and JudgeBench (73.7%). This represents a substantial improvement (~10% absolute) over the previously best-reported results from existing RMs. We demonstrate that HelpSteer3-Preference can also be applied to train Generative RMs, and show how policy models can be aligned with RLHF using our RMs.
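Reward models are typically trained on such pairwise annotations with a Bradley-Terry style loss that pushes the chosen response's scalar reward above the rejected one's. The sample fields below are illustrative, not the dataset's actual schema.

```python
import math

def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry pairwise loss: -log sigmoid(r_chosen - r_rejected)."""
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

# Hypothetical shape of one human-annotated preference sample.
sample = {
    "prompt": "Explain big-O notation.",
    "response_a": "...a careful, correct explanation...",
    "response_b": "...a terse, partly wrong reply...",
    "preferred": "response_a",
}

loss_correct = preference_loss(2.0, -1.0)  # RM already ranks the pair correctly
loss_wrong = preference_loss(-1.0, 2.0)    # RM ranks it the wrong way round
```

The loss is small when the RM's margin agrees with the annotation and grows as the ranking inverts, which is what drives the reward model toward the human preference ordering.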

NeurIPS Conference 2025 Conference Paper

ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

  • Mingjie Liu
  • Shizhe Diao
  • Ximing Lu
  • Jian Hu
  • Xin Dong
  • Yejin Choi
  • Jan Kautz
  • Yi Dong

Recent advances in reasoning-centric language models have highlighted reinforcement learning (RL) as a promising method for aligning models with verifiable rewards. However, it remains contentious whether RL truly expands a model’s reasoning capabilities or merely amplifies high-reward outputs already latent in the base model’s distribution, and whether continually scaling up RL compute reliably leads to improved reasoning performance. In this work, we challenge prevailing assumptions by demonstrating that prolonged RL (ProRL) training can uncover novel reasoning strategies that are inaccessible to base models, even under extensive sampling. We introduce ProRL, a novel training methodology that incorporates KL divergence control, reference policy resetting, and a diverse suite of tasks. Our empirical analysis reveals that RL-trained models consistently outperform base models across a wide range of pass@$k$ evaluations, including scenarios where base models fail entirely regardless of the number of attempts. We further show that reasoning boundary improvements correlate strongly with the base model's task competence and training duration, suggesting that RL can explore and populate new regions of solution space over time. These findings offer new insights into the conditions under which RL meaningfully expands reasoning boundaries in language models and establish a foundation for future work on long-horizon RL for reasoning. We will release model weights and data to support further research.
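The "KL divergence control" ingredient is commonly implemented by subtracting a KL-style penalty from the verifiable task reward, so the policy is discouraged from drifting far from a reference model; periodically swapping in a fresh reference corresponds to the abstract's "reference policy resetting". The function below is a generic per-token sketch of that shaping, with made-up numbers, not ProRL's exact objective.

```python
import math

def kl_shaped_reward(task_reward: float, logp_policy: float,
                     logp_ref: float, beta: float = 0.05) -> float:
    """Verifiable task reward minus a KL-style penalty (per-token estimate
    log pi(a|s) - log pi_ref(a|s)) that keeps the policy near the reference."""
    return task_reward - beta * (logp_policy - logp_ref)

# Same correct answer, but drifting far from the reference is penalised.
r_near = kl_shaped_reward(1.0, math.log(0.5), math.log(0.5))  # no drift
r_far = kl_shaped_reward(1.0, math.log(0.9), math.log(0.1))   # large drift
```

Tuning `beta` (and resetting the reference once the policy has legitimately improved) is what allows training to continue for far longer without reward hacking or collapse.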

AAAI Conference 2024 Conference Paper

ChromaFusionNet (CFNet): Natural Fusion of Fine-Grained Color Editing

  • Yi Dong
  • Yuxi Wang
  • Ruoxi Fan
  • Wenqi Ouyang
  • Zhiqi Shen
  • Peiran Ren
  • Xuansong Xie

Digital image enhancement aims to deliver visually striking, pleasing images that align with human perception. While global techniques can elevate the image's overall aesthetics, fine-grained color enhancement can further boost visual appeal and expressiveness. However, colorists frequently face challenges in achieving accurate, localized color adjustments. Direct composition of these local edits can result in spatial color inconsistencies. Existing methods, including color style transfer and image harmonization, exhibit inconsistencies, especially at boundary regions. Addressing this, we present ChromaFusionNet (CFNet), a novel approach that views the color fusion problem through the lens of image color inpainting. Built on the Vision Transformer architecture, CFNet captures global context and delivers high-fidelity outputs, seamlessly blending colors while preserving boundary integrity. Empirical studies on ImageNet and COCO datasets demonstrate CFNet's superiority over existing methods in maintaining color harmony and color fidelity. Robustness evaluations and user studies have further validated the effectiveness of CFNet. In conclusion, CFNet introduces an innovative approach to seamless, fine-grained color fusion, paving the way for advancements in the domain of fine-grained color editing. Code and pretrained models are available at our project page: https://yidong.pro/projects/cfnet.

NeurIPS Conference 2024 Conference Paper

HelpSteer2: Open-source dataset for training top-performing reward models

  • Zhilin Wang
  • Yi Dong
  • Olivier Delalleau
  • Jiaqi Zeng
  • Gerald Shen
  • Daniel Egert
  • Jimmy J. Zhang
  • Makesh N. Sreedhar

High-quality preference datasets are essential for training reward models that can effectively guide large language models (LLMs) in generating high-quality responses aligned with human preferences. As LLMs become stronger and better aligned, permissively licensed preference datasets, such as Open Assistant, HH-RLHF, and HelpSteer, need to be updated to remain effective for reward modeling. Methods that distil preference data from proprietary LLMs such as GPT-4 have restrictions on commercial usage imposed by model providers. To improve upon both generated responses and attribute labeling quality, we release HelpSteer2, a permissively licensed preference dataset (CC-BY-4.0). Using a powerful Nemotron-4-340B base model trained on HelpSteer2, we are able to achieve the SOTA score (92.0%) on Reward-Bench's primary dataset, outperforming currently listed open and proprietary models, as of June 12th, 2024. Notably, HelpSteer2 consists of only ten thousand response pairs, an order of magnitude fewer than existing preference datasets (e.g., HH-RLHF), which makes it highly efficient for training reward models. Our extensive experiments demonstrate that reward models trained with HelpSteer2 are effective in aligning LLMs. Additionally, we propose SteerLM 2.0, a model alignment approach that can effectively make use of the rich multi-attribute scores predicted by our reward models. HelpSteer2 is available at https://huggingface.co/datasets/nvidia/HelpSteer2 and code is available at https://github.com/NVIDIA/NeMo-Aligner.

IJCAI Conference 2024 Conference Paper

SpecAR-Net: Spectrogram Analysis and Representation Network for Time Series

  • Yi Dong
  • Liwen Zhang
  • Youcheng Zhang
  • Shi Peng
  • Wen Chen
  • Zhe Ma

Representing temporal-structured samples is essential for effective time series analysis tasks. So far, recurrent networks, convolution networks and transformer-style models have been successively applied to temporal data representation, yielding notable results. However, most existing methods primarily focus on modeling and representing the variation patterns within time series in the time domain. As a highly abstracted information entity, a 1D time series couples various patterns such as trends, seasonality, and dramatic changes (instantaneous high dynamics), making it difficult to exploit these highly coupled properties with analysis tools confined to the time domain. To this end, we present the Spectrogram Analysis and Representation Network (SpecAR-Net). SpecAR-Net aims at learning more comprehensive representations by modeling raw time series in both the time and frequency domains, where an efficient joint extraction of time-frequency features is achieved through a group of learnable 2D multi-scale parallel complex convolution blocks. Experimental results show that SpecAR-Net achieves excellent performance on 5 major downstream tasks, i.e., classification, anomaly detection, imputation, and long- and short-term forecasting. Code and appendix are available at https://github.com/Dongyi2go/SpecAR_Net.
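The 2D time-frequency input that such a network consumes is just a spectrogram of the raw 1D series. Below is a minimal sliding-window DFT sketch of that transformation (window and hop sizes are arbitrary choices for illustration, not the paper's settings).

```python
import numpy as np

def spectrogram(x: np.ndarray, win: int = 64, hop: int = 32) -> np.ndarray:
    """Magnitude spectrogram via a Hann-windowed sliding DFT.
    Returns a (frequency, time) array: the 2D representation on which
    2D convolution blocks can then operate."""
    window = np.hanning(win)
    frames = [x[i:i + win] * window for i in range(0, len(x) - win + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1)).T

# A toy series coupling two seasonal components at 0.1 and 0.3 cycles/sample.
t = np.arange(1024)
x = np.sin(2 * np.pi * 0.1 * t) + 0.5 * np.sin(2 * np.pi * 0.3 * t)
S = spectrogram(x)   # shape: (win // 2 + 1 frequency bins, n frames)
```

In the spectrogram the two coupled oscillations separate into distinct frequency rows, which is exactly the decoupling that is hard to obtain in the time domain alone.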

AAMAS Conference 2023 Conference Paper

Decentralised and Cooperative Control of Multi-Robot Systems through Distributed Optimisation

  • Yi Dong
  • Zhongguo Li
  • Xingyu Zhao
  • Zhengtao Ding
  • Xiaowei Huang

Multi-robot cooperative control has gained extensive research interest due to its wide applications in civil, security, and military domains. This paper proposes a cooperative control algorithm for multi-robot systems with general linear dynamics. The algorithm is based on distributed cooperative optimisation and output regulation, and it achieves global optimum by utilising only information shared among neighbouring robots. Technically, a high-level distributed optimisation algorithm for multi-robot systems is presented, which will serve as an optimal reference generator for each individual agent. Then, based on the distributed optimisation algorithm, an output regulation method is utilised to solve the optimal coordination problem for general linear dynamic systems. The convergence of the proposed algorithm is theoretically proved. Both numerical simulations and real-time physical robot experiments are conducted to validate the effectiveness of the proposed cooperative control algorithms.
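The high-level idea of reaching a global optimum using only neighbour-shared information can be sketched with gradient tracking, a standard scheme in the distributed optimisation literature (not necessarily the paper's exact algorithm). The graph, weights, and local costs below are hypothetical.

```python
import numpy as np

# Robots 1-2-3 on a line graph; Metropolis weights give a symmetric,
# doubly stochastic mixing matrix W. Each robot i holds a PRIVATE cost
# f_i(x) = (x - c_i)^2 and only exchanges values with its neighbours.
c = np.array([0.0, 2.0, 4.0])              # private targets
W = np.array([[2/3, 1/3, 0.0],
              [1/3, 1/3, 1/3],
              [0.0, 1/3, 2/3]])
alpha = 0.05                               # step size
x = c.copy()                               # each robot starts at its own target
g = 2 * (x - c)                            # local gradients
y = g.copy()                               # gradient-tracking variables
for _ in range(2000):
    x_new = W @ x - alpha * y              # neighbour averaging + descent
    g_new = 2 * (x_new - c)
    y = W @ y + g_new - g                  # track the network-average gradient
    x, g = x_new, g_new
# All local copies converge to the global optimum argmin sum_i f_i = mean(c).
```

No robot ever sees another robot's cost function, yet every local copy of the decision variable converges to the minimiser of the network-wide sum, which is the "global optimum from neighbour information only" property the abstract claims.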

AAAI Conference 2022 Conference Paper

Imbalance-Aware Uplift Modeling for Observational Data

  • Xuanying Chen
  • Zhining Liu
  • Li Yu
  • Liuyi Yao
  • Wenpeng Zhang
  • Yi Dong
  • Lihong Gu
  • Xiaodong Zeng

Uplift modeling aims to model the incremental impact of a treatment on an individual outcome, and has attracted great interest from researchers and practitioners across different communities. Existing uplift modeling methods rely either on data collected from randomized controlled trials (RCTs) or on observational data, which is more realistic. However, we notice that with observational data it is often the case that only a small number of subjects receive treatment, while the uplift must ultimately be inferred for a much larger group of subjects. Such highly imbalanced data is common in fields such as marketing and medical treatment, but is rarely handled by existing works. In this paper, we theoretically and quantitatively prove that the existing representative methods, transformed outcome (TOM) and doubly robust (DR), suffer from large bias and deviation on highly imbalanced datasets with skewed propensity scores, mainly because their estimators are proportional to the reciprocal of the propensity score. To reduce the bias and deviation of uplift modeling on an imbalanced dataset, we propose an imbalance-aware uplift modeling (IAUM) method that constructs a robust proxy outcome, adaptively combining the doubly robust estimator and the imputed treatment effects based on the propensity score. We theoretically prove that IAUM achieves a better bias-variance trade-off than existing methods on a highly imbalanced dataset. We conduct extensive experiments on a synthetic dataset and two real-world datasets, and the results clearly demonstrate the superiority of our method over the state-of-the-art.
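The reciprocal-of-propensity issue the abstract analyses is visible directly in the textbook doubly robust pseudo-outcome, sketched below with illustrative numbers (the argument names are mine, not the paper's notation).

```python
def dr_pseudo_outcome(y: float, t: int, e: float,
                      mu1: float, mu0: float) -> float:
    """Doubly robust uplift pseudo-outcome for outcome y, treatment
    t in {0, 1}, propensity e(x), and outcome-model predictions
    mu1(x) (treated) and mu0(x) (control). Note the 1/e and 1/(1-e)
    factors that blow up when propensities are skewed."""
    return (mu1 - mu0
            + t * (y - mu1) / e
            - (1 - t) * (y - mu0) / (1 - e))

# With a well-calibrated propensity the correction term is modest...
mild = dr_pseudo_outcome(y=1.0, t=1, e=0.5, mu1=0.8, mu0=0.3)
# ...but a skewed propensity (rare treatment) inflates the same residual.
skewed = dr_pseudo_outcome(y=1.0, t=1, e=0.01, mu1=0.8, mu0=0.3)
```

The same 0.2 residual contributes 0.4 to the pseudo-outcome at e = 0.5 but 20 at e = 0.01, which is the bias/deviation blow-up on imbalanced data that motivates IAUM's adaptive combination of the DR estimator with imputed treatment effects.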

AAAI Conference 2020 Short Paper

Generating Engaging Promotional Videos for E-commerce Platforms (Student Abstract)

  • Chang Liu
  • Han Yu
  • Yi Dong
  • Zhiqi Shen
  • Yingxue Yu
  • Ian Dixon
  • Zhanning Gao
  • Pan Wang

There is an emerging trend for sellers to use videos to promote their products on e-commerce platforms such as Taobao.com. The current video production workflow relies on human directors to craft the visual storyline. We propose a system that automatically generates a visual storyline from an input set of visual materials (e.g., video clips or still images) and then produces a promotional video. In particular, we propose an algorithm called Shot Composition, Selection and Plotting (ShotCSP), which generates visual storylines leveraging film-making principles to improve viewing experience and perceived persuasiveness.