
Author name cluster

Kanghoon Lee

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

12 papers
2 author rows

Possible papers (12)

AAAI Conference 2026 System Paper

RAPID: A Rapid Prototyping Platform for Industrial Automation

  • Sunghoon Hong
  • Junseok Park
  • Whiyoung Jung
  • Deunsol Yoon
  • Woohyung Lim
  • Soonyoung Lee
  • Kanghoon Lee

Industrial automation in smart logistics and factories requires simulation platforms that support rapid environment building before costly physical deployment. Yet existing tools often require substantial expertise, complex setup, and long configuration times, hindering agile prototyping. We present RAPID, a simulation platform with two components: layout design, which enables intuitive visual configuration of factory layouts, and behavior simulation and validation, which allows users to attach behavior models and evaluate system performance. RAPID lowers the entry barrier to industrial simulation, letting users apply existing behavior models or trained reinforcement learning (RL) agents to new layouts with minimal effort. Practitioners can thus prototype facilities in minutes rather than weeks, while researchers gain a standardized environment for benchmarking multi-agent RL and coordination algorithms. By combining rapid design with simulation-based validation, RAPID accelerates automation development from concept to implementation.
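
To make the two-component workflow concrete, here is a minimal, self-contained Python sketch of the loop the abstract describes (declare a layout, attach a behavior model, roll it out to validate performance). Every name below is illustrative; none of it is RAPID's actual API.

```python
# Illustrative sketch only: declare a layout, attach a behavior model,
# and validate throughput by simulation. Not RAPID's real interface.
import random

def random_policy(state: dict) -> str:
    """Stand-in for an existing behavior model or trained RL agent."""
    return random.choice(["move", "pick", "drop", "wait"])

def simulate(layout: dict, policy, steps: int = 100) -> float:
    """Roll the policy out on the layout; report items delivered per step."""
    delivered, state = 0, {"layout": layout, "carrying": False}
    for _ in range(steps):
        action = policy(state)
        if action == "pick" and not state["carrying"]:
            state["carrying"] = True
        elif action == "drop" and state["carrying"]:
            state["carrying"] = False
            delivered += 1
    return delivered / steps

layout = {"stations": ["inbound", "assembly", "outbound"],
          "conveyors": [("inbound", "assembly"), ("assembly", "outbound")]}
print(f"throughput: {simulate(layout, random_policy):.2f} items/step")
```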

AAAI Conference 2026 System Paper

RL-Studio: A System for Multi-Phase Reinforcement Learning Experimentation

  • Whiyoung Jung
  • Sunghoon Hong
  • Deunsol Yoon
  • Jeonghye Kim
  • Yongjae Shin
  • Suhyun Jung
  • Hyundam Yoo
  • Youngjin Kim

Reinforcement learning (RL) has evolved beyond monolithic training, yet existing frameworks remain limited to single algorithms or simple offline-to-online transitions. We present multi-phase RL, a framework that orchestrates multiple learning phases for continual policy improvement. It enables efficient fine-tuning of pretrained policies with new data and smooth adaptation from simulation to real-world environments. To support this paradigm, we introduce RL-Studio, a platform that addresses key implementation barriers, including neural architecture mismatches, parameter transfer complexities, and experiment management overhead. It provides phase orchestration, transition-point monitoring, and full experiment lineage tracking. We demonstrate the effectiveness of multi-phase RL through representative scenarios and highlight RL-Studio’s capabilities.
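
As a rough illustration of the phase-orchestration and lineage-tracking ideas the abstract names (hypothetical names throughout, not RL-Studio's interface), each phase can be modeled as a function that receives the previous phase's parameters and appends to a lineage log:

```python
# Hedged sketch of multi-phase orchestration: parameters transfer across
# phases and every phase is recorded for experiment lineage tracking.
from typing import Callable

def offline_pretrain(params: dict) -> dict:
    return {**params, "phase": "offline", "steps": params.get("steps", 0) + 1000}

def online_finetune(params: dict) -> dict:
    return {**params, "phase": "online", "steps": params["steps"] + 500}

def run_phases(initial: dict, phases: list[Callable[[dict], dict]]) -> dict:
    params, lineage = initial, []
    for phase in phases:
        params = phase(params)                              # parameter transfer
        lineage.append((phase.__name__, params["steps"]))   # lineage tracking
    print("lineage:", lineage)
    return params

run_phases({"steps": 0}, [offline_pretrain, online_finetune])
```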

AAAI Conference 2026 Conference Paper

TrajEvo: Trajectory Prediction Heuristics Design via LLM-driven Evolution

  • Zhikai Zhao
  • Chuanbo Hua
  • Federico Berto
  • Kanghoon Lee
  • Zihan Ma
  • Jiachen Li
  • Jinkyoo Park

Trajectory prediction is a crucial task in modeling human behavior, especially in safety-critical fields such as social robotics and autonomous vehicle navigation. Traditional heuristics based on handcrafted rules often lack accuracy, while recently proposed deep learning approaches suffer from computational cost, slow inference, limited explainability, and generalization issues that restrict their practical adoption in such settings. In this paper, we introduce TrajEvo, a framework that leverages Large Language Models (LLMs) to automatically design trajectory prediction heuristics. TrajEvo employs an evolutionary algorithm to generate and refine prediction heuristics from past trajectory data. We introduce a Cross-Generation Elite Sampling strategy to promote population diversity and a Statistics Feedback Loop that allows the LLM to analyze alternative predictions. Our evaluations show that TrajEvo outperforms previous heuristic methods on various real-world datasets, and markedly outperforms both heuristics and deep learning methods when generalizing to an unseen real-world dataset. TrajEvo represents a first step toward the automated design of fast, explainable, and generalizable trajectory prediction heuristics. We make our source code publicly available to foster future research.
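
A toy sketch of the evolutionary loop the abstract outlines follows. The LLM call is stubbed with a random perturbation, and "elite sampling" here draws parents from elites of all past generations, which is one reading of Cross-Generation Elite Sampling, not the paper's exact procedure.

```python
# Toy LLM-driven evolution over a one-parameter "heuristic".
import random

def fitness(heuristic: float) -> float:
    return -abs(heuristic - 0.7)          # toy objective: best value is 0.7

def llm_mutate(parent: float) -> float:
    return parent + random.gauss(0, 0.1)  # stand-in for an LLM-proposed edit

population = [random.random() for _ in range(8)]
elites_all_generations: list[float] = []
for gen in range(20):
    elites_all_generations += sorted(population, key=fitness)[-2:]
    parents = random.sample(elites_all_generations,
                            k=min(4, len(elites_all_generations)))
    population = [llm_mutate(p) for p in parents for _ in range(2)]
best = max(elites_all_generations, key=fitness)
print(f"best heuristic parameter: {best:.3f}")
```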

ICML Conference 2025 Conference Paper

Agent-Centric Actor-Critic for Asynchronous Multi-Agent Reinforcement Learning

  • Whiyoung Jung
  • Sunghoon Hong
  • Deunsol Yoon
  • Kanghoon Lee
  • Woohyung Lim

Multi-Agent Reinforcement Learning (MARL) struggles with coordination in sparse-reward environments. Macro-actions (sequences of actions executed as single decisions) facilitate long-term planning but introduce asynchrony, complicating Centralized Training with Decentralized Execution (CTDE). Existing CTDE methods use padding to handle asynchrony, risking misaligned asynchronous experiences and spurious correlations. We propose the Agent-Centric Actor-Critic (ACAC) algorithm to manage asynchrony without padding. ACAC uses agent-centric encoders for independent trajectory processing, with an attention-based aggregation module integrating these histories into a centralized critic for improved temporal abstractions. The proposed structure is trained via a PPO-based algorithm with a modified Generalized Advantage Estimation for asynchronous environments. Experiments show that ACAC accelerates convergence and enhances performance over baselines in complex MARL tasks.
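
A rough PyTorch sketch of the architecture the abstract describes: one encoder per agent processes that agent's own (possibly different-length) trajectory, and attention aggregates the resulting summaries for a centralized critic. Dimensions and module choices are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class AgentCentricCritic(nn.Module):
    def __init__(self, n_agents: int, obs_dim: int, hidden: int = 64):
        super().__init__()
        # independent per-agent trajectory encoders (no cross-agent padding)
        self.encoders = nn.ModuleList(
            [nn.GRU(obs_dim, hidden, batch_first=True) for _ in range(n_agents)])
        self.attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.value = nn.Linear(hidden, 1)

    def forward(self, trajs: list[torch.Tensor]) -> torch.Tensor:
        # trajs[i]: (batch, T_i, obs_dim); each agent may have a different T_i
        summaries = [enc(t)[1][-1] for enc, t in zip(self.encoders, trajs)]
        h = torch.stack(summaries, dim=1)      # (batch, n_agents, hidden)
        agg, _ = self.attn(h, h, h)            # attention-based aggregation
        return self.value(agg.mean(dim=1))     # centralized value estimate

critic = AgentCentricCritic(n_agents=3, obs_dim=8)
trajs = [torch.randn(2, t, 8) for t in (5, 3, 7)]   # asynchronous lengths
print(critic(trajs).shape)                          # -> torch.Size([2, 1])
```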

IROS Conference 2025 Conference Paper

Human Implicit Preference-Based Policy Fine-tuning for Multi-Agent Reinforcement Learning in USV Swarm

  • Hyeonjun Kim
  • Kanghoon Lee
  • Junho Park
  • Jiachen Li 0001
  • Jinkyoo Park

Multi-Agent Reinforcement Learning (MARL) has shown promise in solving complex problems involving cooperation and competition among agents, such as an Unmanned Surface Vehicle (USV) swarm used in search and rescue, surveillance, and vessel protection. However, aligning system behavior with user preferences is challenging due to the difficulty of encoding expert intuition into reward functions. To address this issue, we propose a Reinforcement Learning from Human Feedback (RLHF) approach for MARL that resolves credit-assignment challenges through an Agent-Level Feedback system categorizing feedback into intra-agent, inter-agent, and intra-team types. To overcome the challenges of direct human feedback, we employ a Large Language Model (LLM) evaluator to validate our approach using feedback scenarios such as region constraints, collision avoidance, and task allocation. Our method effectively refines USV swarm policies, addressing key challenges in multi-agent systems while maintaining fairness and performance consistency.
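
A minimal sketch of the Agent-Level Feedback idea: feedback is tagged as intra-agent, inter-agent, or intra-team, and each tag routes credit to the responsible agent(s). The three tags come from the abstract; the routing rule below is our illustrative assumption, not the paper's method.

```python
# Hypothetical feedback routing: each feedback item becomes a per-agent
# reward bonus according to its type.
def route_feedback(n_agents: int, feedback: list[dict]) -> list[float]:
    bonus = [0.0] * n_agents
    for fb in feedback:
        if fb["type"] == "intra-agent":          # one agent's own behavior
            bonus[fb["agent"]] += fb["score"]
        elif fb["type"] == "inter-agent":        # a pairwise interaction
            for i in fb["agents"]:
                bonus[i] += fb["score"] / 2
        elif fb["type"] == "intra-team":         # a team-level outcome
            for i in range(n_agents):
                bonus[i] += fb["score"] / n_agents
    return bonus

feedback = [{"type": "intra-agent", "agent": 0, "score": 1.0},
            {"type": "inter-agent", "agents": [0, 1], "score": -0.5},
            {"type": "intra-team", "score": 0.9}]
print(route_feedback(3, feedback))  # -> [1.05, 0.05, 0.3]
```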

AAAI Conference 2025 Conference Paper

Learning Strategy Representation for Imitation Learning in Multi-Agent Games

  • Shiqi Lei
  • Kanghoon Lee
  • Linjing Li
  • Jinkyoo Park

The offline datasets for imitation learning (IL) in multi-agent games typically contain player trajectories exhibiting diverse strategies, which necessitates measures to prevent learning algorithms from acquiring undesirable behaviors. Learning representations for these trajectories is an effective approach to characterizing the strategies employed by each demonstrator. However, existing representation learning methods often require player identification or rely on strong assumptions that are not appropriate for multi-agent games. Therefore, in this paper, we introduce the Strategy Representation for Imitation Learning (STRIL) framework, which (1) effectively learns strategy representations in multi-agent games, (2) estimates proposed indicators based on these representations, and (3) filters out sub-optimal data using the indicators. STRIL is a plug-in method that can be integrated into existing IL algorithms. We demonstrate the effectiveness of STRIL across competitive multi-agent scenarios, including Two-player Pong, Limit Texas Hold'em, and Connect Four. Our approach successfully acquires strategy representations and indicators, thereby identifying dominant trajectories and significantly enhancing existing IL performance across these environments.
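
A schematic of the three steps as we read them from the abstract: embed trajectories, score them with an indicator, and keep only the top fraction before handing the data to any existing IL algorithm. The embedding and indicator below are placeholders, not the paper's models.

```python
# Placeholder pipeline: represent -> score -> filter, then run IL as usual.
import statistics

def embed(trajectory: list[float]) -> float:
    return statistics.mean(trajectory)               # placeholder representation

def indicator(z: float, population: list[float]) -> float:
    return -abs(z - statistics.median(population))   # closeness to the dominant mode

dataset = [[0.1, 0.2], [0.8, 0.9], [0.5, 0.5], [0.45, 0.55], [0.9, 0.1]]
zs = [embed(t) for t in dataset]
scored = sorted(zip(dataset, (indicator(z, zs) for z in zs)),
                key=lambda pair: pair[1], reverse=True)
filtered = [t for t, _ in scored[: len(scored) // 2]]  # drop sub-optimal data
print(f"kept {len(filtered)} of {len(dataset)} trajectories for IL")
```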

ICML Conference 2025 Conference Paper

Online Pre-Training for Offline-to-Online Reinforcement Learning

  • Yongjae Shin
  • Jeonghye Kim
  • Whiyoung Jung
  • Sunghoon Hong
  • Deunsol Yoon
  • Youngsoo Jang
  • Geon-Hyeong Kim
  • Jongseong Chae

Offline-to-online reinforcement learning (RL) aims to integrate the complementary strengths of offline and online RL by pre-training an agent offline and subsequently fine-tuning it through online interactions. However, recent studies reveal that offline pre-trained agents often underperform during online fine-tuning due to inaccurate value estimation caused by distribution shift, with random initialization proving more effective in certain cases. In this work, we propose a novel method, Online Pre-Training for Offline-to-Online RL (OPT), explicitly designed to address the issue of inaccurate value estimation in offline pre-trained agents. OPT introduces a new learning phase, Online Pre-Training, which allows the training of a new value function tailored specifically for effective online fine-tuning. Implementation of OPT on TD3 and SPOT demonstrates an average 30% improvement in performance across a wide range of D4RL environments, including MuJoCo, Antmaze, and Adroit.
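
A high-level sketch of the phase structure OPT adds, per the abstract: between offline pre-training and online fine-tuning, a fresh value function is trained on early online interactions so fine-tuning does not inherit the offline critic's value-estimation errors. All updates below are placeholders.

```python
# Placeholder three-phase schedule; the real method builds on TD3/SPOT.
def offline_pretrain() -> dict:
    return {"policy": "pretrained", "value": "offline-critic"}

def online_pretrain(agent: dict, warmup_steps: int = 5000) -> dict:
    # new learning phase: fit a *new* value function on online data only
    agent["value"] = f"fresh-critic@{warmup_steps}-online-steps"
    return agent

def online_finetune(agent: dict) -> dict:
    agent["policy"] = "fine-tuned"
    return agent

agent = online_finetune(online_pretrain(offline_pretrain()))
print(agent)  # policy fine-tuned against the freshly trained value function
```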

ICML Conference 2025 Conference Paper

Penalizing Infeasible Actions and Reward Scaling in Reinforcement Learning with Offline Data

  • Jeonghye Kim
  • Yongjae Shin
  • Whiyoung Jung
  • Sunghoon Hong
  • Deunsol Yoon
  • Youngchul Sung
  • Kanghoon Lee
  • Woohyung Lim

Reinforcement learning with offline data suffers from Q-value extrapolation errors. To address this issue, we first demonstrate that linear extrapolation of the Q-function beyond the data range is particularly problematic. To mitigate this, we propose guiding the gradual decrease of Q-values outside the data range, which is achieved through reward scaling with layer normalization (RS-LN) and a penalization mechanism for infeasible actions (PA). By combining RS-LN and PA, we develop a new algorithm called PARS. We evaluate PARS across a range of tasks, demonstrating superior performance compared to state-of-the-art algorithms in both offline training and online fine-tuning on the D4RL benchmark, with notable success in the challenging AntMaze Ultra task.
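
A condensed PyTorch sketch of the two ingredients named in the abstract: a critic with layer normalization trained on scaled rewards (RS-LN), and a penalty target for infeasible actions (PA). The constants and the feasibility rule are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn

class LNQNetwork(nn.Module):
    """Q-network with layer normalization between hidden layers."""
    def __init__(self, sa_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(sa_dim, hidden), nn.LayerNorm(hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.LayerNorm(hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, s: torch.Tensor, a: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([s, a], dim=-1))

REWARD_SCALE = 10.0        # RS: train on scaled rewards
PENALTY_Q = -100.0         # PA: low Q-target for infeasible actions

def q_target(q_next: torch.Tensor, reward: torch.Tensor,
             action: torch.Tensor, gamma: float = 0.99) -> torch.Tensor:
    infeasible = (action.abs() > 1.0).any(dim=-1, keepdim=True)
    target = REWARD_SCALE * reward + gamma * q_next
    return torch.where(infeasible, torch.full_like(target, PENALTY_Q), target)

q = LNQNetwork(sa_dim=6)
s, a = torch.randn(4, 4), torch.randn(4, 2) * 2
print(q(s, a).shape, q_target(torch.randn(4, 1), torch.randn(4, 1), a).shape)
```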

AAMAS Conference 2024 Conference Paper

ELA: Exploited Level Augmentation for Offline Learning in Zero-Sum Games

  • Shiqi Lei
  • Kanghoon Lee
  • Linjing Li
  • Jinkyoo Park
  • Jiachen Li

Offline learning derives effective policies from expert demonstrators' datasets without direct interaction. While recent research considers dataset characteristics such as expertise level or the presence of multiple demonstrators, a distinct approach is necessary in zero-sum games, where outcomes depend significantly on the opponent's strategy. In this study, we introduce a novel approach that uses unsupervised learning techniques to estimate the exploited level (EL) of each trajectory in an offline dataset of zero-sum games produced by diverse demonstrators. The estimated EL is then integrated into offline learning to maximize the influence of the dominant strategy. Our method enables interpretable EL estimation in multiple zero-sum games, effectively identifying dominant strategies. Moreover, EL-augmented offline learning significantly enhances imitation learning and offline reinforcement learning algorithms in zero-sum games.
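
A toy illustration of using an estimated exploited level to weight offline data: trajectories judged harder to exploit get more weight in a behavior-cloning-style loss. The EL estimator is a stub; the paper derives it with unsupervised learning, which is not reproduced here.

```python
# Stub EL estimate plus an EL-weighted loss over trajectories.
import math

def estimated_el(trajectory_return: float) -> float:
    return max(0.0, 1.0 - trajectory_return)      # stub: low return ~ exploitable

def weighted_bc_loss(trajectories: list[dict], temperature: float = 0.5) -> float:
    weights = [math.exp(-estimated_el(t["return"]) / temperature)
               for t in trajectories]             # down-weight exploitable play
    total = sum(weights)
    return sum(w / total * t["bc_loss"] for w, t in zip(weights, trajectories))

data = [{"return": 0.9, "bc_loss": 0.30},   # near-dominant strategy
        {"return": 0.2, "bc_loss": 0.10}]   # easily exploited strategy
print(f"EL-weighted loss: {weighted_bc_loss(data):.3f}")
```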

AAMAS Conference 2024 Conference Paper

Naphtha Cracking Center Scheduling Optimization using Multi-Agent Reinforcement Learning

  • Sunghoon Hong
  • Deunsol Yoon
  • Whiyoung Jung
  • Jinsang Lee
  • Hyundam Yoo
  • Jiwon Ham
  • Suhyun Jung
  • Chanwoo Moon

The Naphtha Cracking Center (NCC) is central to petrochemical feedstock production through an intricate multi-stage process. It consists of a receipt stage for unloading naphtha, a blending stage for mixing naphtha, and a furnace stage for producing marketable products. Optimal NCC scheduling is crucial for profitability and efficiency. Traditionally managed by human experts, the process poses challenges in predicting complex chemical reactions and navigating real-world complexities. To address these issues, this paper develops autonomous NCC operation using multi-agent reinforcement learning, where each agent is responsible for one stage and collaborates with the others to achieve common objectives while adhering to real-world constraints. We developed an online web service that allows the staff at the LG Chem Daesan NCC facility to obtain an NCC schedule in real time, and the staff now operate the facility based on the schedules it generates.

AAAI Conference 2015 Conference Paper

Reward Shaping for Model-Based Bayesian Reinforcement Learning

  • Hyeoneun Kim
  • Woosang Lim
  • Kanghoon Lee
  • Yung-Kyun Noh
  • Kee-Eung Kim

Bayesian reinforcement learning (BRL) provides a formal framework for optimal exploration-exploitation tradeoff in reinforcement learning. Unfortunately, it is generally intractable to find the Bayes-optimal behavior except for restricted cases. As a consequence, many BRL algorithms, model-based approaches in particular, rely on approximated models or real-time search methods. In this paper, we present potential-based shaping for improving the learning performance in model-based BRL. We propose a number of potential functions that are particularly well suited for BRL, and are domain-independent in the sense that they do not require any prior knowledge about the actual environment. By incorporating the potential function into real-time heuristic search, we show that we can significantly improve the learning performance in standard benchmark domains.
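
For context, potential-based shaping in its classical form (Ng, Harada, and Russell, 1999) augments the reward with a potential difference, a transformation known to leave the optimal policy unchanged:

```latex
R'(s, a, s') = R(s, a, s') + \gamma \, \Phi(s') - \Phi(s)
```

Per the abstract, the paper's contribution lies in choosing potential functions \(\Phi\) suited to the Bayesian setting without prior knowledge of the environment.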

AAAI Conference 2015 Conference Paper

Tighter Value Function Bounds for Bayesian Reinforcement Learning

  • Kanghoon Lee
  • Kee-Eung Kim

Bayesian reinforcement learning (BRL) provides a principled framework for optimal exploration-exploitation tradeoff in reinforcement learning. We focus on model-based BRL, which involves a compact formulation of the optimal tradeoff from the Bayesian perspective. However, it still remains a computational challenge to compute the Bayes-optimal policy. In this paper, we propose a novel approach to compute tighter bounds on the Bayes-optimal value function, which is crucial for improving the performance of many model-based BRL algorithms. We then present how our bounds can be integrated into real-time AO* heuristic search, and provide a theoretical analysis of the impact of improved bounds on search efficiency. We also provide empirical results on standard BRL domains that demonstrate the effectiveness of our approach.
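
As a reminder of why tighter bounds matter in heuristic search (this is the standard pruning argument, not necessarily the paper's exact rule): if \(L\) and \(U\) sandwich the Bayes-optimal value function, an action can be discarded whenever its upper bound falls below another action's lower bound:

```latex
L(b) \le V^{*}(b) \le U(b),
\qquad
\text{prune } a \text{ at } b \quad \text{if} \quad U(b, a) < \max_{a'} L(b, a').
```

Tightening either bound can only enlarge the set of prunable branches, which is consistent with the paper's claim that improved bounds speed up AO* search.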