Author name cluster

Jeonghye Kim

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

7 papers
2 author rows

Possible papers

AAAI Conference 2026 System Paper

RL-Studio: A System for Multi-Phase Reinforcement Learning Experimentation

  • Whiyoung Jung
  • Sunghoon Hong
  • Deunsol Yoon
  • Jeonghye Kim
  • Yongjae Shin
  • Suhyun Jung
  • Hyundam Yoo
  • Youngjin Kim

Reinforcement learning (RL) has evolved beyond monolithic training, yet existing frameworks remain limited to single algorithms or simple offline-to-online transitions. We present multi-phase RL, a framework that orchestrates multiple learning phases for continual policy improvement. It enables efficient fine-tuning of pretrained policies with new data and smooth adaptation from simulation to real-world environments. To support this paradigm, we introduce RL-Studio, a platform that addresses key implementation barriers, including neural architecture mismatches, parameter transfer complexities, and experiment management overhead. It provides phase orchestration, transition-point monitoring, and full experiment lineage tracking. We demonstrate the effectiveness of multi-phase RL through representative scenarios and highlight RL-Studio’s capabilities.

ICML Conference 2025 Conference Paper

ARS: Adaptive Reward Scaling for Multi-Task Reinforcement Learning

  • Myungsik Cho
  • Jongeui Park
  • Jeonghye Kim
  • Youngchul Sung

Multi-task reinforcement learning (RL) encounters significant challenges due to varying task complexities and their reward distributions from the environment. To address these issues, in this paper, we propose Adaptive Reward Scaling (ARS), a novel framework that dynamically adjusts reward magnitudes and leverages a periodic network reset mechanism. ARS introduces a history-based reward scaling strategy that ensures balanced reward distributions across tasks, enabling stable and efficient training. The reset mechanism complements this approach by mitigating overfitting and ensuring robust convergence. Empirical evaluations on the Meta-World benchmark demonstrate that ARS significantly outperforms baseline methods, achieving superior performance on challenging tasks while maintaining overall learning efficiency. These results validate ARS’s effectiveness in tackling diverse multi-task RL problems, paving the way for scalable solutions in complex real-world applications.
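The idea of history-based reward scaling can be illustrated with a minimal sketch. The abstract does not give the scaling rule, so everything here is an assumption: the class name `HistoryRewardScaler`, the sliding window, and the choice to normalize by the mean absolute reward per task are all hypothetical and are not taken from the ARS paper. The periodic network reset is not shown.

```python
from collections import defaultdict, deque


class HistoryRewardScaler:
    """Hypothetical per-task reward scaler (illustrative, not the ARS rule)."""

    def __init__(self, window=1000, target=1.0, eps=1e-8):
        # One bounded history of reward magnitudes per task.
        self.history = defaultdict(lambda: deque(maxlen=window))
        self.target = target  # assumed common reward magnitude across tasks
        self.eps = eps        # avoids division by zero for all-zero histories

    def scale(self, task_id, reward):
        """Record the reward, then rescale it by the task's mean magnitude."""
        hist = self.history[task_id]
        hist.append(abs(reward))
        mean_magnitude = sum(hist) / len(hist)
        return reward * self.target / (mean_magnitude + self.eps)


scaler = HistoryRewardScaler()
# Two tasks with very different raw reward magnitudes end up comparable.
big = scaler.scale("task_a", 100.0)
small = scaler.scale("task_b", 0.01)
```

Under these assumptions, both `big` and `small` come out close to 1, which is the balancing effect the abstract describes across tasks.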

ICML Conference 2025 Conference Paper

Online Pre-Training for Offline-to-Online Reinforcement Learning

  • Yongjae Shin
  • Jeonghye Kim
  • Whiyoung Jung
  • Sunghoon Hong
  • Deunsol Yoon
  • Youngsoo Jang
  • Geon-Hyeong Kim
  • Jongseong Chae

Offline-to-online reinforcement learning (RL) aims to integrate the complementary strengths of offline and online RL by pre-training an agent offline and subsequently fine-tuning it through online interactions. However, recent studies reveal that offline pre-trained agents often underperform during online fine-tuning due to inaccurate value estimation caused by distribution shift, with random initialization proving more effective in certain cases. In this work, we propose a novel method, Online Pre-Training for Offline-to-Online RL (OPT), explicitly designed to address the issue of inaccurate value estimation in offline pre-trained agents. OPT introduces a new learning phase, Online Pre-Training, which allows the training of a new value function tailored specifically for effective online fine-tuning. Implementing OPT on TD3 and SPOT yields an average 30% improvement in performance across a wide range of D4RL environments, including MuJoCo, AntMaze, and Adroit.

ICML Conference 2025 Conference Paper

Penalizing Infeasible Actions and Reward Scaling in Reinforcement Learning with Offline Data

  • Jeonghye Kim
  • Yongjae Shin
  • Whiyoung Jung
  • Sunghoon Hong
  • Deunsol Yoon
  • Youngchul Sung
  • Kanghoon Lee
  • Woohyung Lim

Reinforcement learning with offline data suffers from Q-value extrapolation errors. To address this issue, we first demonstrate that linear extrapolation of the Q-function beyond the data range is particularly problematic. To mitigate this, we propose guiding the gradual decrease of Q-values outside the data range, which is achieved through reward scaling with layer normalization (RS-LN) and a penalization mechanism for infeasible actions (PA). By combining RS-LN and PA, we develop a new algorithm called PARS. We evaluate PARS across a range of tasks, demonstrating superior performance compared to state-of-the-art algorithms in both offline training and online fine-tuning on the D4RL benchmark, with notable success in the challenging AntMaze Ultra task.
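The two ingredients named in the abstract, reward scaling and a penalty for infeasible actions, can be sketched roughly as follows. This is only an illustration under stated assumptions: it treats "infeasible" as "outside the action bounds", and the helper names, the `margin` parameter, and the penalty target `q_min` are hypothetical choices rather than details from the PARS paper. Layer normalization of the Q-network is not shown.

```python
import random


def scale_rewards(rewards, factor):
    """Multiply every reward by a constant factor (assumed form of scaling)."""
    return [factor * r for r in rewards]


def infeasible_action_targets(states, low, high, margin, q_min, rng):
    """Pair each state with one out-of-bounds action and a low Q target.

    Every coordinate of the sampled action lies between `margin` and
    `2 * margin` beyond the lower or upper bound (hypothetical scheme).
    """
    targets = []
    for state in states:
        action = []
        for lower, upper in zip(low, high):
            gap = rng.uniform(margin, 2.0 * margin)
            # Pick a side at random and step outside the feasible range.
            action.append(upper + gap if rng.random() < 0.5 else lower - gap)
        targets.append((state, action, q_min))
    return targets


rng = random.Random(0)
penalty_batch = infeasible_action_targets(
    states=[[0.0], [1.0]], low=[-1.0, -1.0], high=[1.0, 1.0],
    margin=0.5, q_min=-10.0, rng=rng,
)
```

In a training loop, tuples like those in `penalty_batch` would serve as extra regression targets that pull Q-values down outside the data range, alongside the usual temporal-difference loss on the scaled rewards.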

NeurIPS Conference 2024 Conference Paper

Adaptive $Q$-Aid for Conditional Supervised Learning in Offline Reinforcement Learning

  • Jeonghye Kim
  • Suyoung Lee
  • Woojun Kim
  • Youngchul Sung

Offline reinforcement learning (RL) has progressed with return-conditioned supervised learning (RCSL), but its lack of stitching ability remains a limitation. We introduce $Q$-Aided Conditional Supervised Learning (QCS), which effectively combines the stability of RCSL with the stitching capability of $Q$-functions. By analyzing $Q$-function over-generalization, which impairs stable stitching, QCS adaptively integrates $Q$-aid into RCSL's loss function based on trajectory return. Empirical results show that QCS significantly outperforms RCSL and value-based methods, consistently achieving or exceeding the highest trajectory returns across diverse offline RL benchmarks. QCS represents a breakthrough in offline RL, pushing the limits of what can be achieved and fostering further innovations.
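One way to picture "adaptively integrating $Q$-aid based on trajectory return" is a loss whose $Q$-term weight depends on how far a trajectory's return falls short of the best return in the dataset. The abstract does not specify the weighting, so the linear form below, the direction (more $Q$-aid for lower-return trajectories), and the names `q_aid_weight`, `combined_loss`, and `scale` are all assumptions made for illustration.

```python
def q_aid_weight(traj_return, best_return, scale=1.0):
    """Hypothetical weight: grows as the trajectory return drops below the best."""
    return scale * max(best_return - traj_return, 0.0)


def combined_loss(rcsl_loss, q_value, traj_return, best_return, scale=1.0):
    """Supervised loss minus a weighted Q-value term (assumed form).

    Subtracting the Q-value means that minimizing the loss also pushes the
    predicted action toward higher estimated value.
    """
    weight = q_aid_weight(traj_return, best_return, scale)
    return rcsl_loss - weight * q_value


# A trajectory at the best return gets no Q-aid; a worse one gets some.
loss_optimal = combined_loss(1.0, 2.0, traj_return=100.0, best_return=100.0)
loss_suboptimal = combined_loss(1.0, 2.0, traj_return=90.0, best_return=100.0, scale=0.1)
```

With this sketch, near-optimal trajectories are learned almost purely by supervised imitation, while suboptimal ones lean more on the $Q$-function.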

ICLR Conference 2024 Conference Paper

Decision ConvFormer: Local Filtering in MetaFormer is Sufficient for Decision Making

  • Jeonghye Kim
  • Suyoung Lee
  • Woojun Kim
  • Youngchul Sung

The recent success of Transformer in natural language processing has sparked its use in various domains. In offline reinforcement learning (RL), Decision Transformer (DT) is emerging as a promising model based on Transformer. However, we discovered that the attention module of DT is not appropriate to capture the inherent local dependence pattern in trajectories of RL modeled as a Markov decision process. To overcome the limitations of DT, we propose a novel action sequence predictor, named Decision ConvFormer (DC), based on the architecture of MetaFormer, which is a general structure to process multiple entities in parallel and understand the interrelationship among the multiple entities. DC employs local convolution filtering as the token mixer and can effectively capture the inherent local associations of the RL dataset. In extensive experiments, DC achieved state-of-the-art performance across various standard RL benchmarks while requiring fewer resources. Furthermore, we show that DC better understands the underlying meaning in data and exhibits enhanced generalization capability.
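The contrast between attention and local convolution filtering as a token mixer can be made concrete with a small sketch. The code below is a plain-Python causal depthwise 1D convolution over a token sequence; the function name, the causal padding, and the per-channel filter layout are assumptions for illustration and are not taken from the DC implementation.

```python
def causal_depthwise_conv(tokens, filters):
    """Mix each token with its recent predecessors, one filter per channel.

    tokens:  list of T tokens, each a list of D floats.
    filters: list of D filters; filters[d][k] weights the token k steps back.
    Positions before the start of the sequence contribute nothing.
    """
    seq_len = len(tokens)
    dim = len(tokens[0]) if seq_len else 0
    mixed = []
    for t in range(seq_len):
        row = []
        for d in range(dim):
            acc = 0.0
            for k, weight in enumerate(filters[d]):
                if t - k >= 0:  # causal: only look at current and past tokens
                    acc += weight * tokens[t - k][d]
            row.append(acc)
        mixed.append(row)
    return mixed


# One channel, window of two tokens, equal weights.
out = causal_depthwise_conv([[1.0], [2.0], [3.0]], [[0.5, 0.5]])
# out == [[0.5], [1.5], [2.5]]
```

Unlike attention, each output here depends only on a fixed, short window of preceding tokens, which matches the local dependence the abstract attributes to Markov decision process trajectories.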

ICML Conference 2023 Conference Paper

LESSON: Learning to Integrate Exploration Strategies for Reinforcement Learning via an Option Framework

  • Woojun Kim
  • Jeonghye Kim
  • Youngchul Sung

In this paper, a unified framework for exploration in reinforcement learning (RL) is proposed based on an option-critic architecture. The proposed framework learns to integrate a set of diverse exploration strategies so that the agent can adaptively select the most effective exploration strategy to realize an effective exploration-exploitation trade-off for each given task. The effectiveness of the proposed exploration framework is demonstrated by various experiments in the MiniGrid and Atari environments.