EAAI Journal 2026 Journal Article
A hybrid deep reinforcement learning approach for target allocation and routing of multiple nonholonomic vehicles
- Minjae Jung
- Donghun Lee
- Hyondong Oh
- Jung Woo An
- Ji Won Woo
- Gyeong Rae Nam
Author name cluster
Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.
AAAI Conference 2025 Conference Paper
In causal inference, randomized experiments are the de facto method for overcoming various theoretical issues in observational studies. However, experiments are costly to run, so efficient experimental design is essential. We propose ABC3, a Bayesian active learning policy for causal inference. We show that a policy minimizing the estimation error on the conditional average treatment effect is equivalent to one minimizing an integrated posterior variance, similar to the Cohn criterion. We theoretically prove that ABC3 also minimizes the imbalance between the treatment and control groups as well as the Type I error probability. The imbalance-minimizing property is especially notable, as several works have emphasized the importance of achieving balance. In extensive experiments on real-world data sets, ABC3 achieves the highest efficiency while empirically confirming that the theoretical results hold.
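The integrated-posterior-variance (Cohn-style) criterion the abstract refers to can be sketched in a generic active-learning setting: greedily query the candidate point whose inclusion most reduces the Gaussian-process posterior variance averaged over a test pool. This is an illustrative sketch only, not the authors' ABC3 implementation; the kernel, length scale, and all variable names are assumptions.

```python
import numpy as np

def rbf(A, B, ls=0.5):
    # Squared-exponential kernel between row-vector sets A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ls**2))

def posterior_var(X_train, X_test, noise=1e-2):
    # GP posterior variance at X_test; note it depends only on input
    # locations, never on observed targets, so the greedy search is cheap.
    K = rbf(X_train, X_train) + noise * np.eye(len(X_train))
    Ks = rbf(X_test, X_train)
    return np.diag(rbf(X_test, X_test) - Ks @ np.linalg.solve(K, Ks.T))

rng = np.random.default_rng(0)
pool = rng.uniform(0, 1, size=(50, 1))   # candidate query points
grid = np.linspace(0, 1, 101)[:, None]   # points over which variance is integrated

chosen = [pool[0]]
for _ in range(5):
    X = np.array(chosen).reshape(-1, 1)
    # Cohn criterion: pick the candidate whose addition minimizes the
    # integrated (here, averaged) posterior variance over the grid.
    scores = [posterior_var(np.vstack([X, x[None]]), grid).mean() for x in pool]
    chosen.append(pool[int(np.argmin(scores))])
```

Because the GP posterior variance is target-independent, the whole query sequence can be planned before any outcomes are observed; ABC3 exploits an analogous property for treatment assignment.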
NeurIPS Conference 2024 Conference Paper
Event-driven spiking neural networks (SNNs) are promising neural networks that reduce the energy consumption of continuously growing AI models. Recently, keeping pace with the development of transformers, transformer-based SNNs have been presented. Due to the incompatibility of self-attention with spikes, however, existing transformer-based SNNs limit themselves by either restructuring the self-attention architecture or conforming to non-spike computations. In this work, we propose a novel transformer-to-SNN conversion method that outputs an end-to-end spike-based transformer, named SpikedAttention. Our method directly converts a well-trained transformer without modifying its attention architecture. For the vision task, the proposed method converts the Swin Transformer into an SNN without post-training or conversion-aware training, achieving state-of-the-art SNN accuracy on the ImageNet dataset, i.e., 80.0% with 28.7M parameters. Considering weight accumulation, neuron potential updates, and on-chip data movement, SpikedAttention reduces energy consumption by 42% compared to the baseline ANN, i.e., Swin-T. Furthermore, for the first time, we demonstrate that SpikedAttention successfully converts a BERT model to an SNN with only 0.3% accuracy loss on average while consuming 58% less energy on the GLUE benchmark. Our code is available at https://github.com/sangwoohwang/SpikedAttention.
AAAI Conference 2024 System Paper
A significant upsurge in the fashion e-commerce industry in recent years has brought considerable attention to image-based virtual fitting. This image-based technology allows users to try on clothes virtually without physically touching them. However, current techniques have notable limitations in terms of real-world scenarios, noisy results, partial clothing-category coverage, and computational cost, which restrict real-world deployment. To address these critical limitations, we propose a hybrid interactive network that allows actual users to interact with the system to try on clothes virtually. The network is composed of state-of-the-art keypoint extraction, appearance flow alteration, and warping modules. The proposed network supports real-time application with high-quality, noise-free results, a variety of clothing categories, and efficient computational cost.
AAMAS Conference 2019 Conference Paper
We take a practical approach to learning how to bid in sponsored search auctions, modeling the problem of improving the real-world profit of advertisers as a meta-learning problem of configuring adaptive bidding agents. We construct a fully agent-based sponsored search auction simulator that 1) captures the dynamic nature of sponsored search auctions, 2) emulates the interface of the Google AdWords platform, and 3) can be customized and extended by modules. We then present Meta-LQKG, an agent-based meta-learning algorithm using the knowledge gradient, and show the effect of meta-learning with Meta-LQKG on the performance of adaptive bidding agents.
IJCAI Conference 2019 Conference Paper
We introduce a novel apprenticeship learning algorithm to learn an expert's underlying reward structure in off-policy model-free batch settings. Unlike existing methods that require hand-crafted features, on-policy evaluation, further data acquisition for evaluation policies or the knowledge of model dynamics, our algorithm requires only batch data (demonstrations) of the observed expert behavior. Such settings are common in many real-world tasks---health care, finance, or industrial process control---where accurate simulators do not exist and additional data acquisition is costly. We develop a transition-regularized imitation learning model to learn a rich feature representation and a near-expert initial policy that makes the subsequent batch inverse reinforcement learning process viable. We also introduce deep successor feature networks that perform off-policy evaluation to estimate feature expectations of candidate policies. Under the batch setting, our method achieves superior results on control benchmarks as well as a real clinical task of sepsis management in the Intensive Care Unit.
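The feature expectations that the successor feature networks estimate admit a closed form in the tabular case, which makes the idea concrete: the successor features of a fixed policy are the discounted expected sum of state features, and any linear-in-features reward then yields the policy's value for free. The tiny Markov reward process below is invented purely for illustration; the paper's deep networks approximate this quantity from batch data.

```python
import numpy as np

# A tiny 3-state Markov reward process induced by a fixed policy pi.
P = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.5, 0.5],
              [0.1, 0.0, 0.9]])   # P[s, s'] under pi
Phi = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [0.5, 0.5]])      # state features phi(s)
gamma = 0.9

# Successor features: mu(s) = E[sum_t gamma^t phi(s_t) | s_0 = s]
# solve (I - gamma * P) mu = Phi, i.e. mu = (I - gamma P)^{-1} Phi.
mu = np.linalg.solve(np.eye(3) - gamma * P, Phi)

# For any reward linear in the features, r = Phi @ w, the value function
# follows immediately from the feature expectations: v = mu @ w.
w = np.array([1.0, -1.0])
v = mu @ w
r = Phi @ w
# Sanity check against the Bellman equation v = r + gamma * P v.
assert np.allclose(v, r + gamma * P @ v)
```

Decoupling "feature expectations of the policy" from "reward weights" is what lets inverse-RL methods compare candidate policies against the expert without re-running policy evaluation for every candidate reward.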
AAAI Conference 2012 Conference Paper
The transition to renewables requires storage to help smooth short-term variations in energy from wind and solar sources, as well as to respond to spikes in electricity spot prices, which can easily exceed 20 times their average. Efficient operation of an energy storage device is a fundamental problem, yet classical algorithms such as Q-learning can diverge for millions of iterations, limiting practical applications. We have traced this behavior to the max-operator bias, which is exacerbated by high volatility in the reward function, and high discount factors due to the small time steps. We propose an elegant bias correction procedure and demonstrate its effectiveness.
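The max-operator bias described above is easy to reproduce in isolation: given noisy estimates of several equally good actions, taking the max of a single set of estimates is biased upward (E[max Q̂] > max E[Q̂]), while a double-estimator split, where one estimate selects the action and an independent one evaluates it, removes most of the bias. The setup below is a generic illustration of the phenomenon, not the paper's specific correction procedure.

```python
import numpy as np

rng = np.random.default_rng(1)
n_actions, n_trials = 10, 20_000
true_q = np.zeros(n_actions)   # every action equally good: max of true values is 0

single, double = [], []
for _ in range(n_trials):
    # Two independent noisy estimates of the same action values.
    qa = true_q + rng.normal(0, 1, n_actions)
    qb = true_q + rng.normal(0, 1, n_actions)
    single.append(qa.max())           # single estimator: biased upward
    double.append(qb[qa.argmax()])    # select with qa, evaluate with qb

print(np.mean(single))   # noticeably positive (overestimation bias)
print(np.mean(double))   # close to zero
```

Higher reward volatility widens the spread of the noisy estimates and hence the single-estimator bias, and a discount factor near 1 compounds that bias through bootstrapping, matching the divergence mechanism the abstract identifies.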