Arrow Research search

Author name cluster

Youhei Akimoto

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

13 papers
2 author rows

Possible papers

13

AAAI Conference 2026 Conference Paper

Cost-Minimized Label-Flipping Poisoning Attack to LLM Alignment

  • Shigeki Kusaka
  • Keita Saito
  • Mikoto Kudo
  • Takumi Tanabe
  • Akifumi Wachi
  • Youhei Akimoto

Large language models (LLMs) are increasingly deployed in real-world systems, making it critical to understand their vulnerabilities. While data poisoning attacks during RLHF/DPO alignment have been studied empirically, their theoretical foundations remain unclear. We investigate the minimum-cost poisoning attack required to steer an LLM’s policy toward an attacker’s target by flipping preference labels during RLHF/DPO, without altering the compared outputs. We formulate this as a convex optimization problem with linear constraints, deriving lower and upper bounds on the minimum attack cost. As a byproduct of this theoretical analysis, we show that any existing label-flipping attack can be post-processed via our proposed method to reduce the number of label flips required while preserving the intended poisoning effect. Empirical results demonstrate that this cost-minimization post-processing can significantly reduce poisoning costs over baselines, particularly when the reward model’s feature dimension is small relative to the dataset size. These findings highlight fundamental vulnerabilities in RLHF/DPO pipelines and provide tools to evaluate their robustness against low-cost poisoning attacks.

NeurIPS Conference 2025 Conference Paper

A Provable Approach for End-to-End Safe Reinforcement Learning

  • Akifumi Wachi
  • Kohei Miyaguchi
  • Takumi Tanabe
  • Rei Sato
  • Youhei Akimoto

A longstanding goal in safe reinforcement learning (RL) is a method to ensure the safety of a policy throughout the entire process, from learning to operation. However, existing safe RL paradigms inherently struggle to achieve this objective. We propose a method, called Provably Lifetime Safe RL (PLS), that integrates offline safe RL with safe policy deployment to address this challenge. Our proposed method learns a policy offline using return-conditioned supervised learning and then deploys the resulting policy while cautiously optimizing a limited set of parameters, known as target returns, using Gaussian processes (GPs). Theoretically, we justify the use of GPs by analyzing the mathematical relationship between target and actual returns. We then prove that PLS finds near-optimal target returns while guaranteeing safety with high probability. Empirically, we demonstrate that PLS outperforms baselines both in safety and reward performance, thereby achieving the longstanding goal to obtain high rewards while ensuring the safety of a policy throughout the lifetime from learning to operation.

NeurIPS Conference 2024 Conference Paper

Stepwise Alignment for Constrained Language Model Policy Optimization

  • Akifumi Wachi
  • Thien Q. Tran
  • Rei Sato
  • Takumi Tanabe
  • Youhei Akimoto

Safety and trustworthiness are indispensable requirements for real-world applications of AI systems using large language models (LLMs). This paper formulates human value alignment as an optimization problem of the language model policy to maximize reward under a safety constraint, and then proposes an algorithm, Stepwise Alignment for Constrained Policy Optimization (SACPO). One key idea behind SACPO, supported by theory, is that the optimal policy incorporating reward and safety can be directly obtained from a reward-aligned policy. Building on this key idea, SACPO aligns LLMs step-wise with each metric while leveraging simple yet powerful alignment algorithms such as direct preference optimization (DPO). SACPO offers several advantages, including simplicity, stability, computational efficiency, and flexibility of algorithms and datasets. Under mild assumptions, our theoretical analysis provides the upper bounds on optimality and safety constraint violation. Our experimental results show that SACPO can fine-tune Alpaca-7B better than the state-of-the-art method in terms of both helpfulness and harmlessness.

IJCAI Conference 2023 Conference Paper

Statistically Significant Concept-based Explanation of Image Classifiers via Model Knockoffs

  • Kaiwen Xu
  • Kazuto Fukuchi
  • Youhei Akimoto
  • Jun Sakuma

A concept-based classifier can explain the decision process of a deep learning model by human understandable concepts in image classification problems. However, sometimes concept-based explanations may cause false positives, which misregards unrelated concepts as important for the prediction task. Our goal is to find the statistically significant concept for classification to prevent misinterpretation. In this study, we propose a method using a deep learning model to learn the image concept and then using the knockoff sample to select the important concepts for prediction by controlling the False Discovery Rate (FDR) under a certain value. We evaluate the proposed method in our experiments on both synthetic and real data. Also, it shows that our method can control the FDR properly while selecting highly interpretable concepts to improve the trustworthiness of the model.

NeurIPS Conference 2022 Conference Paper

Max-Min Off-Policy Actor-Critic Method Focusing on Worst-Case Robustness to Model Misspecification

  • Takumi Tanabe
  • Rei Sato
  • Kazuto Fukuchi
  • Jun Sakuma
  • Youhei Akimoto

In the field of reinforcement learning, because of the high cost and risk of policy training in the real world, policies are trained in a simulation environment and transferred to the corresponding real-world environment. However, the simulation environment does not perfectly mimic the real-world environment, lead to model misspecification. Multiple studies report significant deterioration of policy performance in a real-world environment. In this study, we focus on scenarios involving a simulation environment with uncertainty parameters and the set of their possible values, called the uncertainty parameter set. The aim is to optimize the worst-case performance on the uncertainty parameter set to guarantee the performance in the corresponding real-world environment. To obtain a policy for the optimization, we propose an off-policy actor-critic approach called the Max-Min Twin Delayed Deep Deterministic Policy Gradient algorithm (M2TD3), which solves a max-min optimization problem using a simultaneous gradient ascent descent approach. Experiments in multi-joint dynamics with contact (MuJoCo) environments show that the proposed method exhibited a worst-case performance superior to several baseline approaches.

AAAI Conference 2022 Conference Paper

Unsupervised Causal Binary Concepts Discovery with VAE for Black-Box Model Explanation

  • Thien Q Tran
  • Kazuto Fukuchi
  • Youhei Akimoto
  • Jun Sakuma

We aim to explain a black-box classifier with the form: ‘data X is classified as class Y because X has A, B and does not have C’ in which A, B, and C are high-level concepts. The challenge is that we have to discover in an unsupervised manner a set of concepts, i. e. , A, B and C, that is useful for explaining the classifier. We first introduce a structural generative model that is suitable to express and discover such concepts. We then propose a learning process that simultaneously learns the data distribution and encourages certain concepts to have a large causal influence on the classifier output. Our method also allows easy integration of user’s prior knowledge to induce high interpretability of concepts. Finally, using multiple datasets, we demonstrate that the proposed method can discover useful concepts for explanation in this form.

AAAI Conference 2021 Conference Paper

AdvantageNAS: Efficient Neural Architecture Search with Credit Assignment

  • Rei Sato
  • Jun Sakuma
  • Youhei Akimoto

Neural architecture search (NAS) is an approach for automatically designing a neural network architecture without human effort or expert knowledge. However, the high computational cost of NAS limits its use in commercial applications. Two recent NAS paradigms, namely one-shot and sparse propagation, which reduce the time and space complexities, respectively, provide clues for solving this problem. In this paper, we propose a novel search strategy for one-shot and sparse propagation NAS, namely AdvantageNAS, which further reduces the time complexity of NAS by reducing the number of search iterations. AdvantageNAS is a gradientbased approach that improves the search efficiency by introducing credit assignment in gradient estimation for architecture updates. Experiments on the NAS-Bench-201 and PTB dataset show that AdvantageNAS discovers an architecture with higher performance under a limited time budget compared to existing sparse propagation NAS. To further reveal the reliabilities of AdvantageNAS, we investigate it theoretically and find that it monotonically improves the expected loss and thus converges.

AAAI Conference 2021 Conference Paper

Warm Starting CMA-ES for Hyperparameter Optimization

  • Masahiro Nomura
  • Shuhei Watanabe
  • Youhei Akimoto
  • Yoshihiko Ozaki
  • Masaki Onishi

Hyperparameter optimization (HPO), formulated as blackbox optimization (BBO), is recognized as essential for automation and high performance of machine learning approaches. The CMA-ES is a promising BBO approach with a high degree of parallelism, and has been applied to HPO tasks, often under parallel implementation, and shown superior performance to other approaches including Bayesian optimization (BO). However, if the budget of hyperparameter evaluations is severely limited, which is often the case for end users who do not deserve parallel computing, the CMA- ES exhausts the budget without improving the performance due to its long adaptation phase, resulting in being outperformed by BO approaches. To address this issue, we propose to transfer prior knowledge on similar HPO tasks through the initialization of the CMA-ES, leading to significantly shortening the adaptation time. The knowledge transfer is designed based on the novel definition of task similarity, with which the correlation of the performance of the proposed approach is confirmed on synthetic problems. The proposed warm starting CMA-ES, called WS-CMA-ES, is applied to different HPO tasks where some prior knowledge is available, showing its superior performance over the original CMA-ES as well as BO approaches with or without using the prior knowledge.

AAAI Conference 2020 Conference Paper

Generate (Non-Software) Bugs to Fool Classifiers

  • Hiromu Yakura
  • Youhei Akimoto
  • Jun Sakuma

In adversarial attacks intended to confound deep learning models, most studies have focused on limiting the magnitude of the modification so that humans do not notice the attack. On the other hand, during an attack against autonomous cars, for example, most drivers would not find it strange if a small insect image were placed on a stop sign, or they may overlook it. In this paper, we present a systematic approach to generate natural adversarial examples against classification models by employing such natural-appearing perturbations that imitate a certain object or signal. We first show the feasibility of this approach in an attack against an image classifier by employing generative adversarial networks that produce image patches that have the appearance of a natural object to fool the target model. We also introduce an algorithm to optimize placement of the perturbation in accordance with the input image, which makes the generation of adversarial examples fast and likely to succeed. Moreover, we experimentally show that the proposed approach can be extended to the audio domain, for example, to generate perturbations that sound like the chirping of birds to fool a speech classifier.

ICML Conference 2019 Conference Paper

Adaptive Stochastic Natural Gradient Method for One-Shot Neural Architecture Search

  • Youhei Akimoto
  • Shinichi Shirakawa
  • Nozomu Yoshinari
  • Kento Uchida
  • Shota Saito
  • Kouhei Nishida

High sensitivity of neural architecture search (NAS) methods against their input such as step-size (i. e. , learning rate) and search space prevents practitioners from applying them out-of-the-box to their own problems, albeit its purpose is to automate a part of tuning process. Aiming at a fast, robust, and widely-applicable NAS, we develop a generic optimization framework for NAS. We turn a coupled optimization of connection weights and neural architecture into a differentiable optimization by means of stochastic relaxation. It accepts arbitrary search space (widely-applicable) and enables to employ a gradient-based simultaneous optimization of weights and architecture (fast). We propose a stochastic natural gradient method with an adaptive step-size mechanism built upon our theoretical investigation (robust). Despite its simplicity and no problem-dependent parameter tuning, our method exhibited near state-of-the-art performances with low computational budgets both on image classification and inpainting tasks.

AAAI Conference 2018 Conference Paper

Dynamic Optimization of Neural Network Structures Using Probabilistic Modeling

  • Shinichi Shirakawa
  • Yasushi Iwata
  • Youhei Akimoto

Deep neural networks (DNNs) are powerful machine learning models and have succeeded in various artificial intelligence tasks. Although various architectures and modules for the DNNs have been proposed, selecting and designing the appropriate network structure for a target problem is a challenging task. In this paper, we propose a method to simultaneously optimize the network structure and weight parameters during neural network training. We consider a probability distribution that generates network structures, and optimize the parameters of the distribution instead of directly optimizing the network structure. The proposed method can apply to the various network structure optimization problems under the same framework. We apply the proposed method to several structure optimization problems such as selection of layers, selection of unit types, and selection of connections using the MNIST, CIFAR-10, and CIFAR-100 datasets. The experimental results show that the proposed method can find the appropriate and competitive network structures.