Arrow Research search

Author name cluster

Yi Dong

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

13 papers
2 author rows

Possible papers

13

AAAI Conference 2026 Conference Paper

Tapas Are Free! Training-Free Adaptation of Programmatic Agents via LLM-Guided Program Synthesis in Dynamic Environments

  • Jinwei Hu
  • Yi Dong
  • Youcheng Sun
  • Xiaowei Huang

Autonomous agents in safety-critical applications must continuously adapt to dynamic conditions without compromising performance and reliability. This work introduces TAPA (Training-free Adaptation of Programmatic Agents), a novel framework that positions large language models (LLMs) as intelligent moderators of the symbolic action space. Unlike prior programmatic agents, which typically generate a monolithic policy program or rely on fixed symbolic action sets, TAPA synthesizes and adapts modular programs for individual high-level actions, referred to as logical primitives. By decoupling strategic intent from execution, TAPA enables meta-agents to operate over an abstract, interpretable action space while the LLM dynamically generates, composes, and refines symbolic programs tailored to each primitive. Extensive experiments across cybersecurity and swarm intelligence domains validate TAPA's effectiveness. In autonomous DDoS defense scenarios, TAPA achieves 77.7% network uptime while maintaining near-perfect detection accuracy in unknown dynamic environments. In swarm intelligence formation control under environmental and adversarial disturbances, TAPA consistently preserves consensus at runtime where baseline methods fail. This work promotes a paradigm shift for autonomous system design in evolving environments, from policy adaptation to dynamic action adaptation.
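The abstract's core idea, a per-primitive registry of synthesized programs that can be re-generated without retraining the meta-agent, can be caricatured in a few lines. Everything below (the primitive names, the `synthesize_program` stub standing in for the LLM-guided step, and the state fields) is a hypothetical illustration, not the paper's API.

```python
from typing import Callable, Dict

# Symbolic action space: each logical primitive owns its own modular program.
primitives: Dict[str, Callable[[dict], dict]] = {}

def synthesize_program(primitive: str) -> Callable[[dict], dict]:
    """Stand-in for LLM-guided synthesis of one primitive's program."""
    def program(state: dict) -> dict:
        # A trivially "synthesized" program: record the executed primitive.
        return {**state, "last_action": primitive}
    return program

def adapt(primitive: str) -> None:
    """Training-free adaptation: re-synthesize only the affected primitive,
    leaving the meta-agent's abstract policy untouched."""
    primitives[primitive] = synthesize_program(primitive)

# The meta-agent plans over abstract primitives, not concrete programs.
for p in ("monitor", "filter_traffic", "reroute"):
    adapt(p)

state = primitives["filter_traffic"]({"step": 0})
```

The point of the decoupling is visible in `adapt`: when the environment shifts, only the affected primitive's program is replaced, while the meta-agent's action space stays fixed.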

IROS Conference 2025 Conference Paper

Affine-SLAM: A closed-form solution to the landmark-SLAM via affine relaxation

  • Shaoran Yang
  • Yi Dong

We propose a closed-form solution to the landmark simultaneous localization and mapping (landmark-SLAM) problem. The core idea is to extend recent advances in generalized Procrustes analysis (GPA) research by incorporating an affine-relaxed odometry term. We show that the resulting affine-relaxed landmark-SLAM formulation, termed affine-SLAM, can be solved globally in closed form. Through numerical experiments, we demonstrate that the affine-SLAM solution is rather close to the optimal solution of the standard nonlinear least squares (NLS) optimization, and thus can be used either as a stand-alone approximate solution or as a high-quality initialization for NLS solvers.
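To see why relaxation buys a closed-form solve, here is a toy position-only analogue: once rotations are out of the picture, anchor, odometry, and landmark constraints are all linear in the unknowns, so the whole problem collapses to one linear least-squares solve. This is only a sketch of the "single closed-form solve" idea; the paper's affine relaxation additionally handles orientations.

```python
import numpy as np

# Unknowns: poses p0, p1, p2 and landmark l0, each in R^2, stacked into x.
# Constraints: anchor p0 = 0, odometry p_{t+1} - p_t = u_t,
# and landmark observations l0 - p_t = z_t (all hypothetical values).
u = [np.array([1.0, 0.0]), np.array([1.0, 0.0])]                          # odometry
z = [np.array([1.0, 1.0]), np.array([0.0, 1.0]), np.array([-1.0, 1.0])]  # obs of l0

n = 4 * 2  # 3 poses + 1 landmark, 2D each

def block(idx_pos, idx_neg=None):
    """Constraint row pair selecting x[idx_pos] (minus x[idx_neg])."""
    A = np.zeros((2, n))
    A[:, 2 * idx_pos:2 * idx_pos + 2] = np.eye(2)
    if idx_neg is not None:
        A[:, 2 * idx_neg:2 * idx_neg + 2] = -np.eye(2)
    return A

rows, rhs = [block(0)], [np.zeros(2)]            # anchor p0 = 0
for t, ut in enumerate(u):                       # odometry terms
    rows.append(block(t + 1, t)); rhs.append(ut)
for t, zt in enumerate(z):                       # landmark terms (index 3)
    rows.append(block(3, t)); rhs.append(zt)

A, b = np.vstack(rows), np.concatenate(rhs)
x, *_ = np.linalg.lstsq(A, b, rcond=None)        # one closed-form solve
poses, landmark = x[:6].reshape(3, 2), x[6:]
```

With noise-free measurements the solve recovers the ground truth exactly; with noise it returns the linear least-squares estimate, usable directly or as an initialization for an NLS solver.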

NeurIPS Conference 2025 Conference Paper

FALCON: Fine-grained Activation Manipulation by Contrastive Orthogonal Unalignment for Large Language Model

  • Jinwei Hu
  • Zhenglin Huang
  • Xiangyu Yin
  • Wenjie Ruan
  • Guangliang Cheng
  • Yi Dong
  • Xiaowei Huang

Large language models have been widely applied, but can inadvertently encode sensitive or harmful information, raising significant safety concerns. Machine unlearning has emerged to alleviate this concern; however, existing training-time unlearning approaches, relying on coarse-grained loss combinations, have limitations in precisely separating knowledge and balancing removal effectiveness with model utility. In contrast, we propose $\textbf{F}$ine-grained $\textbf{A}$ctivation manipu$\textbf{L}$ation by $\textbf{C}$ontrastive $\textbf{O}$rthogonal u$\textbf{N}$alignment (FALCON), a novel representation-guided unlearning approach that leverages information-theoretic guidance for efficient parameter selection, employs contrastive mechanisms to enhance representation separation, and projects conflicting gradients onto orthogonal subspaces to resolve interference between forgetting and retention objectives. Extensive experiments demonstrate that FALCON achieves superior unlearning effectiveness while maintaining model utility, exhibiting robust resistance against knowledge recovery attempts.
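One common way to realize the orthogonal projection the abstract describes is to strip from the forgetting gradient its component along the retention gradient, so the unlearning update cannot move the model in a direction that hurts retained knowledge. This is a generic sketch of that projection, not necessarily FALCON's exact construction.

```python
import numpy as np

def project_out(g_forget: np.ndarray, g_retain: np.ndarray) -> np.ndarray:
    """Project the forgetting gradient onto the orthogonal complement of
    the retention gradient: g' = g_f - (g_f . r_hat) r_hat."""
    r_hat = g_retain / np.linalg.norm(g_retain)
    return g_forget - np.dot(g_forget, r_hat) * r_hat

g_f = np.array([1.0, 2.0, 3.0])   # toy forgetting gradient
g_r = np.array([0.0, 1.0, 0.0])   # toy retention gradient
g = project_out(g_f, g_r)         # update now orthogonal to g_r
```

After projection the update has zero inner product with the retention direction, which is the mechanism for resolving conflicts between the two objectives.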

NeurIPS Conference 2025 Conference Paper

GenColor: Generative and Expressive Color Enhancement with Pixel-Perfect Texture Preservation

  • Yi Dong
  • Yuxi Wang
  • Xianhui Lin
  • Wenqi Ouyang
  • Zhiqi Shen
  • Peiran Ren
  • Ruoxi Fan
  • Rynson Lau

Color enhancement is a crucial yet challenging task in digital photography. It demands methods that are (i) expressive enough for fine-grained adjustments, (ii) adaptable to diverse inputs, and (iii) able to preserve texture. Existing approaches typically fall short in at least one of these aspects, yielding unsatisfactory results. We propose GenColor, a novel diffusion-based framework for sophisticated, texture-preserving color enhancement. GenColor reframes the task as conditional image generation. Leveraging ControlNet and a tailored training scheme, it learns advanced color transformations that adapt to diverse lighting and content. We train GenColor on ARTISAN, our newly collected large-scale dataset of 1.2M high-quality photographs specifically curated for enhancement tasks. To overcome texture preservation limitations inherent in diffusion models, we introduce a color-transfer network with a novel degradation scheme that simulates texture–color relationships. This network achieves pixel-perfect texture preservation while enabling fine-grained color matching with the diffusion-generated reference images. Extensive experiments show that GenColor produces visually compelling results comparable to those of expert colorists and surpasses state-of-the-art methods in both subjective and objective evaluations. We have released the code and dataset.

NeurIPS Conference 2025 Conference Paper

HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages

  • Zhilin Wang
  • Jiaqi Zeng
  • Olivier Delalleau
  • Hoo-Chang Shin
  • Felipe Soares
  • Alexander Bukharin
  • Ellie Evans
  • Yi Dong

Preference datasets are essential for training general-domain, instruction-following language models with Reinforcement Learning from Human Feedback (RLHF). Each subsequent data release raises expectations for future data collection, meaning there is a constant need to advance the quality and diversity of openly available preference data. To address this need, we introduce HelpSteer3-Preference, a permissively licensed (CC-BY-4.0), high-quality, human-annotated preference dataset comprising over 40,000 samples. These samples span diverse real-world applications of large language models (LLMs), including tasks relating to STEM, coding and multilingual scenarios. Using HelpSteer3-Preference, we train Reward Models (RMs) that achieve top performance on RM-Bench (82.4%) and JudgeBench (73.7%). This represents a substantial improvement (~10% absolute) over the previously best-reported results from existing RMs. We demonstrate that HelpSteer3-Preference can also be applied to train Generative RMs, and show how policy models can be aligned with RLHF using our RMs.
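Reward models are typically trained on such pairwise annotations with a Bradley-Terry style loss that pushes the chosen response's scalar reward above the rejected one's. The sample fields below are illustrative, not the dataset's actual schema.

```python
import math

def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry pairwise loss: -log sigmoid(r_chosen - r_rejected)."""
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

# Hypothetical shape of one human-annotated preference sample.
sample = {
    "prompt": "Explain big-O notation.",
    "response_a": "...a careful, correct explanation...",
    "response_b": "...a terse, partly wrong reply...",
    "preferred": "response_a",
}

loss_correct = preference_loss(2.0, -1.0)  # RM already ranks the pair correctly
loss_wrong = preference_loss(-1.0, 2.0)    # RM ranks it the wrong way round
```

The loss is small when the RM's margin agrees with the annotation and grows as the ranking inverts, which is what drives the reward model toward the human preference ordering.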

NeurIPS Conference 2025 Conference Paper

ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

  • Mingjie Liu
  • Shizhe Diao
  • Ximing Lu
  • Jian Hu
  • Xin Dong
  • Yejin Choi
  • Jan Kautz
  • Yi Dong

Recent advances in reasoning-centric language models have highlighted reinforcement learning (RL) as a promising method for aligning models with verifiable rewards. However, it remains contentious whether RL truly expands a model’s reasoning capabilities or merely amplifies high-reward outputs already latent in the base model’s distribution, and whether continually scaling up RL compute reliably leads to improved reasoning performance. In this work, we challenge prevailing assumptions by demonstrating that prolonged RL (ProRL) training can uncover novel reasoning strategies that are inaccessible to base models, even under extensive sampling. We introduce ProRL, a novel training methodology that incorporates KL divergence control, reference policy resetting, and a diverse suite of tasks. Our empirical analysis reveals that RL-trained models consistently outperform base models across a wide range of pass@$k$ evaluations, including scenarios where base models fail entirely regardless of the number of attempts. We further show that reasoning boundary improvements correlate strongly with the base model's task competence and training duration, suggesting that RL can explore and populate new regions of solution space over time. These findings offer new insights into the conditions under which RL meaningfully expands reasoning boundaries in language models and establish a foundation for future work on long-horizon RL for reasoning. We will release model weights and data to support further research.
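The "KL divergence control" ingredient is commonly implemented by subtracting a KL-style penalty from the verifiable task reward, so the policy is discouraged from drifting far from a reference model; periodically swapping in a fresh reference corresponds to the abstract's "reference policy resetting". The function below is a generic per-token sketch of that shaping, with made-up numbers, not ProRL's exact objective.

```python
import math

def kl_shaped_reward(task_reward: float, logp_policy: float,
                     logp_ref: float, beta: float = 0.05) -> float:
    """Verifiable task reward minus a KL-style penalty (per-token estimate
    log pi(a|s) - log pi_ref(a|s)) that keeps the policy near the reference."""
    return task_reward - beta * (logp_policy - logp_ref)

# Same correct answer, but drifting far from the reference is penalised.
r_near = kl_shaped_reward(1.0, math.log(0.5), math.log(0.5))  # no drift
r_far = kl_shaped_reward(1.0, math.log(0.9), math.log(0.1))   # large drift
```

Tuning `beta` (and resetting the reference once the policy has legitimately improved) is what allows training to continue for far longer without reward hacking or collapse.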

AAAI Conference 2024 Conference Paper

ChromaFusionNet (CFNet): Natural Fusion of Fine-Grained Color Editing

  • Yi Dong
  • Yuxi Wang
  • Ruoxi Fan
  • Wenqi Ouyang
  • Zhiqi Shen
  • Peiran Ren
  • Xuansong Xie

Digital image enhancement aims to deliver visually striking, pleasing images that align with human perception. While global techniques can elevate the image's overall aesthetics, fine-grained color enhancement can further boost visual appeal and expressiveness. However, colorists frequently face challenges in achieving accurate, localized color adjustments. Direct composition of these local edits can result in spatial color inconsistencies. Existing methods, including color style transfer and image harmonization, exhibit inconsistencies, especially at boundary regions. Addressing this, we present ChromaFusionNet (CFNet), a novel approach that views the color fusion problem through the lens of image color inpainting. Built on the Vision Transformer architecture, CFNet captures global context and delivers high-fidelity outputs, seamlessly blending colors while preserving boundary integrity. Empirical studies on ImageNet and COCO datasets demonstrate CFNet's superiority over existing methods in maintaining color harmony and color fidelity. Robustness evaluations and user studies have further validated the effectiveness of CFNet. In conclusion, CFNet introduces an innovative approach to seamless, fine-grained color fusion, paving the way for advancements in the domain of fine-grained color editing. Code and pretrained models are available at our project page: https://yidong.pro/projects/cfnet.

NeurIPS Conference 2024 Conference Paper

HelpSteer2: Open-source dataset for training top-performing reward models

  • Zhilin Wang
  • Yi Dong
  • Olivier Delalleau
  • Jiaqi Zeng
  • Gerald Shen
  • Daniel Egert
  • Jimmy J. Zhang
  • Makesh N. Sreedhar

High-quality preference datasets are essential for training reward models that can effectively guide large language models (LLMs) in generating high-quality responses aligned with human preferences. As LLMs become stronger and better aligned, permissively licensed preference datasets, such as Open Assistant, HH-RLHF, and HelpSteer, need to be updated to remain effective for reward modeling. Methods that distil preference data from proprietary LLMs such as GPT-4 have restrictions on commercial usage imposed by model providers. To improve upon both generated responses and attribute labeling quality, we release HelpSteer2, a permissively licensed preference dataset (CC-BY-4.0). Using a powerful Nemotron-4-340B base model trained on HelpSteer2, we are able to achieve the SOTA score (92.0%) on Reward-Bench's primary dataset, outperforming currently listed open and proprietary models, as of June 12th, 2024. Notably, HelpSteer2 consists of only ten thousand response pairs, an order of magnitude fewer than existing preference datasets (e.g., HH-RLHF), which makes it highly efficient for training reward models. Our extensive experiments demonstrate that reward models trained with HelpSteer2 are effective in aligning LLMs. Additionally, we propose SteerLM 2.0, a model alignment approach that can effectively make use of the rich multi-attribute scores predicted by our reward models. HelpSteer2 is available at https://huggingface.co/datasets/nvidia/HelpSteer2 and code is available at https://github.com/NVIDIA/NeMo-Aligner.

IJCAI Conference 2024 Conference Paper

SpecAR-Net: Spectrogram Analysis and Representation Network for Time Series

  • Yi Dong
  • Liwen Zhang
  • Youcheng Zhang
  • Shi Peng
  • Wen Chen
  • Zhe Ma

Representing temporal-structured samples is essential for effective time series analysis tasks. So far, recurrent networks, convolution networks and transformer-style models have been successively applied to temporal data representation, yielding notable results. However, most existing methods primarily focus on modeling and representing the variation patterns within time series in the time domain. As a highly abstracted information entity, a 1D time series couples various patterns such as trends, seasonality, and dramatic changes (instantaneous high dynamics), making it difficult to exploit these highly coupled properties with analysis tools confined to the time domain. To this end, we present the Spectrogram Analysis and Representation Network (SpecAR-Net). SpecAR-Net aims at learning more comprehensive representations by modeling raw time series in both the time and frequency domains, where an efficient joint extraction of time-frequency features is achieved through a group of learnable 2D multi-scale parallel complex convolution blocks. Experimental results show that SpecAR-Net achieves excellent performance on 5 major downstream tasks, i.e., classification, anomaly detection, imputation, and long- and short-term forecasting. Code and appendix are available at https://github.com/Dongyi2go/SpecAR_Net.
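The 2D time-frequency input that such a network consumes is just a spectrogram of the raw 1D series. Below is a minimal sliding-window DFT sketch of that transformation (window and hop sizes are arbitrary choices for illustration, not the paper's settings).

```python
import numpy as np

def spectrogram(x: np.ndarray, win: int = 64, hop: int = 32) -> np.ndarray:
    """Magnitude spectrogram via a Hann-windowed sliding DFT.
    Returns a (frequency, time) array: the 2D representation on which
    2D convolution blocks can then operate."""
    window = np.hanning(win)
    frames = [x[i:i + win] * window for i in range(0, len(x) - win + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1)).T

# A toy series coupling two seasonal components at 0.1 and 0.3 cycles/sample.
t = np.arange(1024)
x = np.sin(2 * np.pi * 0.1 * t) + 0.5 * np.sin(2 * np.pi * 0.3 * t)
S = spectrogram(x)   # shape: (win // 2 + 1 frequency bins, n frames)
```

In the spectrogram the two coupled oscillations separate into distinct frequency rows, which is exactly the decoupling that is hard to obtain in the time domain alone.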

AAMAS Conference 2023 Conference Paper

Decentralised and Cooperative Control of Multi-Robot Systems through Distributed Optimisation

  • Yi Dong
  • Zhongguo Li
  • Xingyu Zhao
  • Zhengtao Ding
  • Xiaowei Huang

Multi-robot cooperative control has gained extensive research interest due to its wide applications in civil, security, and military domains. This paper proposes a cooperative control algorithm for multi-robot systems with general linear dynamics. The algorithm is based on distributed cooperative optimisation and output regulation, and it achieves global optimum by utilising only information shared among neighbouring robots. Technically, a high-level distributed optimisation algorithm for multi-robot systems is presented, which will serve as an optimal reference generator for each individual agent. Then, based on the distributed optimisation algorithm, an output regulation method is utilised to solve the optimal coordination problem for general linear dynamic systems. The convergence of the proposed algorithm is theoretically proved. Both numerical simulations and real-time physical robot experiments are conducted to validate the effectiveness of the proposed cooperative control algorithms.
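The high-level idea of reaching a global optimum using only neighbour-shared information can be sketched with gradient tracking, a standard scheme in the distributed optimisation literature (not necessarily the paper's exact algorithm). The graph, weights, and local costs below are hypothetical.

```python
import numpy as np

# Robots 1-2-3 on a line graph; Metropolis weights give a symmetric,
# doubly stochastic mixing matrix W. Each robot i holds a PRIVATE cost
# f_i(x) = (x - c_i)^2 and only exchanges values with its neighbours.
c = np.array([0.0, 2.0, 4.0])              # private targets
W = np.array([[2/3, 1/3, 0.0],
              [1/3, 1/3, 1/3],
              [0.0, 1/3, 2/3]])
alpha = 0.05                               # step size
x = c.copy()                               # each robot starts at its own target
g = 2 * (x - c)                            # local gradients
y = g.copy()                               # gradient-tracking variables
for _ in range(2000):
    x_new = W @ x - alpha * y              # neighbour averaging + descent
    g_new = 2 * (x_new - c)
    y = W @ y + g_new - g                  # track the network-average gradient
    x, g = x_new, g_new
# All local copies converge to the global optimum argmin sum_i f_i = mean(c).
```

No robot ever sees another robot's cost function, yet every local copy of the decision variable converges to the minimiser of the network-wide sum, which is the "global optimum from neighbour information only" property the abstract claims.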

AAAI Conference 2022 Conference Paper

Imbalance-Aware Uplift Modeling for Observational Data

  • Xuanying Chen
  • Zhining Liu
  • Li Yu
  • Liuyi Yao
  • Wenpeng Zhang
  • Yi Dong
  • Lihong Gu
  • Xiaodong Zeng

Uplift modeling aims to model the incremental impact of a treatment on an individual outcome, and has attracted great interest from researchers and practitioners across different communities. Existing uplift modeling methods rely either on data collected from randomized controlled trials (RCTs) or on observational data, which is more realistic. However, we notice that with observational data it is often the case that only a small number of subjects receive treatment, while the uplift must ultimately be inferred for a much larger group of subjects. Such highly imbalanced data is common in fields such as marketing and medical treatment, but is rarely handled by existing works. In this paper, we theoretically and quantitatively prove that the existing representative methods, transformed outcome (TOM) and doubly robust (DR), suffer from large bias and deviation on highly imbalanced datasets with skewed propensity scores, mainly because their estimators are proportional to the reciprocal of the propensity score. To reduce the bias and deviation of uplift modeling on an imbalanced dataset, we propose an imbalance-aware uplift modeling (IAUM) method that constructs a robust proxy outcome, adaptively combining the doubly robust estimator and the imputed treatment effects based on the propensity score. We theoretically prove that IAUM achieves a better bias-variance trade-off than existing methods on a highly imbalanced dataset. We conduct extensive experiments on a synthetic dataset and two real-world datasets, and the results clearly demonstrate the superiority of our method over the state-of-the-art.
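The reciprocal-of-propensity issue the abstract analyses is visible directly in the textbook doubly robust pseudo-outcome, sketched below with illustrative numbers (the argument names are mine, not the paper's notation).

```python
def dr_pseudo_outcome(y: float, t: int, e: float,
                      mu1: float, mu0: float) -> float:
    """Doubly robust uplift pseudo-outcome for outcome y, treatment
    t in {0, 1}, propensity e(x), and outcome-model predictions
    mu1(x) (treated) and mu0(x) (control). Note the 1/e and 1/(1-e)
    factors that blow up when propensities are skewed."""
    return (mu1 - mu0
            + t * (y - mu1) / e
            - (1 - t) * (y - mu0) / (1 - e))

# With a well-calibrated propensity the correction term is modest...
mild = dr_pseudo_outcome(y=1.0, t=1, e=0.5, mu1=0.8, mu0=0.3)
# ...but a skewed propensity (rare treatment) inflates the same residual.
skewed = dr_pseudo_outcome(y=1.0, t=1, e=0.01, mu1=0.8, mu0=0.3)
```

The same 0.2 residual contributes 0.4 to the pseudo-outcome at e = 0.5 but 20 at e = 0.01, which is the bias/deviation blow-up on imbalanced data that motivates IAUM's adaptive combination of the DR estimator with imputed treatment effects.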

AAAI Conference 2020 Short Paper

Generating Engaging Promotional Videos for E-commerce Platforms (Student Abstract)

  • Chang Liu
  • Han Yu
  • Yi Dong
  • Zhiqi Shen
  • Yingxue Yu
  • Ian Dixon
  • Zhanning Gao
  • Pan Wang

There is an emerging trend for sellers to use videos to promote their products on e-commerce platforms such as Taobao.com. The current video production workflow relies on human directors to craft the visual storyline. We propose a system that automatically generates a visual storyline from an input set of visual materials (e.g., video clips or still images) and then produces a promotional video. In particular, we propose an algorithm called Shot Composition, Selection and Plotting (ShotCSP), which generates visual storylines leveraging film-making principles to improve viewing experience and perceived persuasiveness.