Arrow Research search

Author name cluster

Wonil Song

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

2 papers
1 author row

Possible papers (2)

NeurIPS 2024 · Conference Paper

A Simple Framework for Generalization in Visual RL under Dynamic Scene Perturbations

  • Wonil Song
  • Hyesong Choi
  • Kwanghoon Sohn
  • Dongbo Min

In the rapidly evolving domain of vision-based deep reinforcement learning (RL), a pivotal challenge is to achieve generalization capability to dynamic environmental changes reflected in visual observations. Our work delves into the intricacies of this problem, identifying two key issues that appear in previous approaches for visual RL generalization: (i) imbalanced saliency and (ii) observational overfitting. Imbalanced saliency is a phenomenon where an RL agent disproportionately identifies salient features across consecutive frames in a frame stack. Observational overfitting occurs when the agent focuses on certain background regions rather than task-relevant objects. To address these challenges, we present a simple yet effective framework for generalization in visual RL (SimGRL) under dynamic scene perturbations. First, to mitigate the imbalanced saliency problem, we introduce an architectural modification to the image encoder to stack frames at the feature level rather than the image level. Simultaneously, to alleviate the observational overfitting problem, we propose a novel technique called shifted random overlay augmentation, which is specifically designed to learn robust representations capable of effectively handling dynamic visual scenes. Extensive experiments demonstrate the superior generalization capability of SimGRL, achieving state-of-the-art performance in benchmarks including the DeepMind Control Suite.
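The abstract's shifted random overlay augmentation can be illustrated with a small sketch: each frame in the stack is blended with a distractor image that is given its own random spatial shift, so the agent cannot latch onto a fixed background. This is a minimal illustration only; the function name, blend weight `alpha`, and shift range are assumptions, not the paper's exact formulation.

```python
import numpy as np

def shifted_random_overlay(frames, overlay, alpha=0.5, max_shift=4, rng=None):
    """Hypothetical sketch of a shifted random overlay augmentation.

    frames:  (k, H, W, C) uint8 frame stack from the RL environment
    overlay: (H, W, C)    uint8 distractor image to blend in
    Each frame gets a copy of the overlay shifted by a different
    random offset, then the two are alpha-blended.
    """
    rng = rng or np.random.default_rng()
    out = np.empty(frames.shape, dtype=np.float32)
    for i, frame in enumerate(frames):
        # per-frame random shift of the distractor along both spatial axes
        dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
        shifted = np.roll(overlay, shift=(int(dy), int(dx)), axis=(0, 1))
        out[i] = alpha * frame + (1.0 - alpha) * shifted
    return out.astype(np.uint8)
```

Because the shift differs across consecutive frames, the overlaid background moves between frames, mimicking the dynamic scene perturbations the method is designed to withstand.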

AAAI 2020 · Conference Paper

Stereoscopic Image Super-Resolution with Stereo Consistent Feature

  • Wonil Song
  • Sungil Choi
  • Somi Jeong
  • Kwanghoon Sohn

We present the first attempt at stereoscopic image super-resolution (SR) that recovers high-resolution details while preserving stereo-consistency between the stereoscopic image pair. The most challenging issue in stereoscopic SR is that texture details should be consistent for corresponding pixels across the SR image pair. Existing stereo SR methods cannot maintain this stereo-consistency, causing 3D fatigue for viewers. To address this issue, we propose a self and parallax attention mechanism (SPAM) that aggregates information from an image and its counterpart in the stereo pair simultaneously, reconstructing high-quality stereoscopic SR image pairs. Moreover, we design an efficient network architecture and effective loss functions to enforce the stereo-consistency constraint. Finally, experimental results demonstrate the superiority of our method over state-of-the-art SR methods in both quantitative metrics and qualitative visual quality while maintaining stereo-consistency between the stereoscopic image pair.
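The parallax-attention idea in the abstract can be sketched as attention restricted to horizontal epipolar lines: for each row, every left-view position attends over all right-view positions in the same row and aggregates their features. This is a toy simplification of the paper's SPAM module; the shapes, scaling, and softmax formulation here are assumptions for illustration.

```python
import numpy as np

def parallax_attention(feat_left, feat_right):
    """Toy parallax attention along horizontal epipolar lines.

    feat_left, feat_right: (H, W, C) float feature maps from a
    rectified stereo pair. Returns left-view features aggregated
    from the right view via per-row attention over disparity.
    """
    H, W, C = feat_left.shape
    out = np.empty_like(feat_left)
    for y in range(H):
        q = feat_left[y]                          # (W, C) queries, left row
        k = feat_right[y]                         # (W, C) keys/values, right row
        scores = q @ k.T / np.sqrt(C)             # (W, W) similarity per row
        scores -= scores.max(axis=1, keepdims=True)
        attn = np.exp(scores)
        attn /= attn.sum(axis=1, keepdims=True)   # softmax over right positions
        out[y] = attn @ k                         # weighted right-view features
    return out
```

Restricting attention to a single row reflects the rectified-stereo assumption that corresponding pixels share the same scanline, which keeps the attention map at W×W per row rather than full-image size.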