Arrow Research

Author name cluster

Susumu Takeuchi

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact matches of the name; it is not a full identity-disambiguation profile.

5 papers
2 author rows

Possible papers (5)

AAAI Conference 2026 Conference Paper

The Strong Lottery Ticket Hypothesis for Multi-Head Attention Mechanisms

  • Hikari Otsuka
  • Daiki Chijiwa
  • Yasuyuki Okoshi
  • Daichi Fujiki
  • Susumu Takeuchi
  • Masato Motomura

The strong lottery ticket hypothesis (SLTH) conjectures that high-performing subnetworks, called strong lottery tickets (SLTs), are hidden in randomly initialized neural networks. Although recent theoretical studies have established the SLTH across various neural architectures, the SLTH for transformer architectures still lacks theoretical understanding. In particular, the current theory of the SLTH does not yet account for the multi-head attention (MHA) mechanism, a core component of transformers. To address this gap, we present a theoretical analysis of the existence of SLTs within MHAs. We prove that, if a randomly initialized MHA with H heads and input dimension d has hidden dimension O(d log(Hd^(3/2))) for the key and value, then with high probability it contains an SLT that approximates an arbitrary MHA with the same input dimension. Furthermore, by leveraging this theory for MHAs, we extend the SLTH to transformers without normalization layers. We empirically validate our theoretical findings, demonstrating that the approximation error between the SLT within a source model (MHA or transformer) and the target model it approximates decreases exponentially as the hidden dimension of the source model increases.
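
As a back-of-the-envelope illustration (not from the paper), the sketch below evaluates the O(d log(Hd^(3/2))) hidden-dimension bound for a few typical transformer widths; the constant factor c hidden by the big-O is an assumption.

```python
import math

def slth_hidden_dim(d: int, n_heads: int, c: float = 1.0) -> int:
    """Hidden dimension for the key/value projections suggested by the
    paper's O(d * log(H * d^(3/2))) bound. The constant c is hidden by
    the big-O notation and is assumed here, not given in the abstract."""
    return math.ceil(c * d * math.log(n_heads * d ** 1.5))

# A few typical (input dimension, head count) combinations.
for d, h in [(128, 4), (512, 8), (1024, 16)]:
    print(f"d={d:5d}  H={h:3d}  ->  hidden dim ~ {slth_hidden_dim(d, h)}")
```

The logarithmic factor grows slowly with both H and d, which is why the required overparameterization of the random source model stays modest even at large widths.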

TMLR Journal 2025 Journal Article

Partially Frozen Random Networks Contain Compact Strong Lottery Tickets

  • Hikari Otsuka
  • Daiki Chijiwa
  • Ángel López García-Arias
  • Yasuyuki Okoshi
  • Kazushi Kawamura
  • Thiem Van Chu
  • Daichi Fujiki
  • Susumu Takeuchi

Randomly initialized dense networks contain subnetworks that achieve high accuracy without weight learning, called strong lottery tickets (SLTs). Recently, Gadhikar et al. (2023) demonstrated that SLTs can also be found within a randomly pruned source network. This phenomenon can be exploited to further reduce the already small memory size required to store SLTs. However, their method is limited to SLTs that are even sparser than the source, leading to worse accuracy due to the unintentionally high sparsity. This paper proposes a method for reducing the SLT memory size without restricting the sparsity of the SLTs that can be found. A random subset of the initial weights is frozen, by either permanently pruning them or locking them as a fixed part of the SLT, resulting in a smaller model size. Experimental results show that Edge-Popup (Ramanujan et al., 2020; Sreenivasan et al., 2022) finds SLTs with a better accuracy-to-model-size trade-off within frozen networks than within dense or randomly pruned source networks. In particular, freezing 70% of a ResNet on ImageNet provides 3.3x compression compared to the SLT found within a dense counterpart, raises accuracy by up to 14.12 points compared to the SLT found within a randomly pruned counterpart, and offers a better accuracy-to-model-size trade-off than both.
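
A minimal sketch of the freezing idea, under stated assumptions: the fractions, function names, and the random stand-in for Edge-Popup's learned scores are illustrative, not the authors' settings or code.

```python
import numpy as np

rng = np.random.default_rng(0)

def frozen_supermask_layer(shape, freeze_frac=0.7, lock_frac=0.5, keep_frac=0.3):
    """Sketch of the partially-frozen idea (fractions are illustrative):
    a random subset of the fixed random weights is frozen, either locked
    into the ticket (always on) or permanently pruned (always off); only
    the remainder carries a score for Edge-Popup-style top-k mask search."""
    w = rng.standard_normal(shape)                    # fixed random weights, never trained
    frozen = rng.random(shape) < freeze_frac
    locked = frozen & (rng.random(shape) < lock_frac) # frozen-on; frozen-off = frozen & ~locked
    searchable = ~frozen
    scores = rng.standard_normal(shape)               # learned in real Edge-Popup; random here

    def effective_weights():
        k = int(keep_frac * searchable.sum())
        thresh = np.sort(np.abs(scores[searchable]))[-k]  # k-th largest searchable score
        mask = (searchable & (np.abs(scores) >= thresh)) | locked
        return w * mask                                   # frozen-off weights stay zero

    return effective_weights

layer = frozen_supermask_layer((64, 64))
print(np.count_nonzero(layer()), "active weights out of", 64 * 64)
```

The memory saving follows the usual SLT argument: the random weights are reproducible from a seed, frozen positions need no stored mask bit, and only the searchable subset requires one.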

ICML Conference 2025 Conference Paper

Portable Reward Tuning: Towards Reusable Fine-Tuning across Different Pretrained Models

  • Daiki Chijiwa
  • Taku Hasegawa
  • Kyosuke Nishida
  • Kuniko Saito
  • Susumu Takeuchi

While foundation models have been exploited for various expert tasks through their fine-tuned parameters, any foundation model eventually becomes outdated due to stale knowledge or limited capability, and thus must be replaced by a new foundation model. Subsequently, to benefit from the latest knowledge or improved capability, the new foundation model must be fine-tuned on each task again, which incurs not only additional training cost but also the maintenance cost of the task-specific data. Existing work addresses this problem with inference-time tuning, i.e., modifying the output probability of the new foundation model using the outputs of the old foundation model and its fine-tuned model, which adds the inference cost of those two extra models. In this paper, we explore a new fine-tuning principle (which we call portable reward tuning, PRT) that by its nature reduces this inference cost, based on reformulating fine-tuning as reward maximization with Kullback-Leibler regularization. Specifically, instead of fine-tuning the parameters of the foundation model, PRT trains a reward model explicitly through the same loss as in fine-tuning. During inference, the reward model can be combined with any foundation model (with the same vocabulary or label set) through the reward-maximization formulation. Experimental results on both vision and language models show that a PRT-trained model achieves accuracy comparable to existing inference-time tuning methods at a lower inference cost.
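
As a rough sketch of the mechanism (assumed details, not the authors' code): KL-regularized reward maximization has the closed form p(y|x) proportional to p_base(y|x) * exp(r(x, y) / beta), so at inference the trained reward model simply shifts the logits of any base model that shares its vocabulary.

```python
import numpy as np

def reward_tuned_logits(base_logits, reward, beta=1.0):
    """KL-regularized reward maximization over a shared vocabulary has the
    closed form p(y|x) proportional to p_base(y|x) * exp(r(x, y) / beta),
    which is an additive shift in logit space. `reward` stands for the
    output of the separately trained reward model; beta sets the KL
    regularization strength. A sketch of the inference rule only."""
    return base_logits + reward / beta

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Toy example over a 5-token vocabulary: the same reward model is reused
# unchanged when the base foundation model is swapped for a newer one.
rng = np.random.default_rng(0)
old_base, new_base = rng.standard_normal(5), rng.standard_normal(5)
reward = rng.standard_normal(5)
print(softmax(reward_tuned_logits(old_base, reward)))
print(softmax(reward_tuned_logits(new_base, reward)))
```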

AAAI Conference 2023 Conference Paper

Beam Search Optimized Batch Bayesian Active Learning

  • Jingyu Sun
  • Hongjie Zhai
  • Osamu Saisho
  • Susumu Takeuchi

Active learning is an essential method for label-efficient deep learning. As a Bayesian active learning method, Bayesian Active Learning by Disagreement (BALD) successfully selects the most representative samples by maximizing the mutual information between the model prediction and the model parameters. However, when applied in a batch acquisition mode, such as batch construction with greedy search, BALD suffers from poor performance, especially in the presence of noisy, near-duplicate data. To address this shortcoming, we propose a diverse-beam-search-optimized batch active learning method, which explores a graph for every batch construction by expanding a predetermined number of the highest-scoring samples. To avoid near-duplicate beam branches (very similar beams generated from the same root and similar samples), which lack diverse representations in the feature space, we design a self-adapting constraint on candidate beams. The proposed method acquires data that better represent the distribution of the unlabeled pool while remaining significantly different from existing beams. We observe that the proposed method achieves higher batch performance than the baseline methods on three benchmark datasets.
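
For orientation, a minimal sketch of the BALD acquisition score the beam search builds on, together with the greedy top-B baseline the paper improves upon; the diverse beam construction itself is omitted, and all shapes and names here are assumptions.

```python
import numpy as np

def bald_scores(probs):
    """BALD mutual information between predictions and model parameters:
    H[E_theta p] - E_theta H[p]. `probs` has shape
    (n_mc_samples, n_points, n_classes), e.g. stacked MC-dropout passes."""
    eps = 1e-12
    mean_p = probs.mean(axis=0)                                # (n_points, n_classes)
    h_of_mean = -(mean_p * np.log(mean_p + eps)).sum(axis=-1)  # predictive entropy
    mean_of_h = -(probs * np.log(probs + eps)).sum(axis=-1).mean(axis=0)
    return h_of_mean - mean_of_h

# Greedy top-B batch selection: the baseline mode in which BALD degrades
# on near-duplicate data, and which the diverse beam search replaces.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(10), size=(20, 100))  # 20 MC samples, 100 points, 10 classes
batch = np.argsort(bald_scores(probs))[-8:]         # pick the 8 highest-scoring points
print(batch)
```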

IROS Conference 1997 Conference Paper

Adaptive visual servoing for legged robots: vision-cued swaying of legged robots in unknown environments

  • Koh Hosoda
  • Takahiro Miyashita
  • Susumu Takeuchi
  • Minoru Asada

This paper describes a method to achieve a vision-cued swaying task in unknown environments using adaptive visual servoing. The proposed method has a hybrid structure consisting of a controller that keeps the distances between the feet constant (a stance servoing controller) and an adaptive visual servoing controller. With this method, the motion of each joint need not be pre-programmed; it is generated according to the motion of the visual cues. An experimental result demonstrates how the proposed method realizes a vision-cued swaying behavior of a legged robot.
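
For context, a minimal sketch of one standard adaptive visual servoing scheme (resolved-rate control with an on-line Broyden estimate of the image Jacobian); the paper's own estimator and the stance-servoing half of the hybrid controller are not reproduced here.

```python
import numpy as np

def broyden_update(J_hat, dq, ds, rho=1.0):
    """Rank-one (Broyden) correction pulling the Jacobian estimate toward
    explaining the observed feature change ds caused by joint motion dq.
    A standard adaptive-visual-servoing estimator; the paper's on-line
    estimator may differ in detail."""
    denom = dq @ dq
    if denom < 1e-9:
        return J_hat
    return J_hat + rho * np.outer(ds - J_hat @ dq, dq) / denom

def servo_step(J_hat, s, s_star, gain=0.5):
    """Resolved-rate visual servo: joint motion that drives the image
    features s toward the reference s_star using the current estimate."""
    return -gain * np.linalg.pinv(J_hat) @ (s - s_star)

# Toy check against an unknown "true" feature-to-joint mapping (standing
# in for the unknown environment); the controller never sees J_true.
rng = np.random.default_rng(0)
J_true = np.eye(3) + 0.3 * rng.standard_normal((3, 3))
J_hat = np.eye(3)
s, s_star = rng.standard_normal(3), np.zeros(3)
for _ in range(40):
    dq = servo_step(J_hat, s, s_star)
    ds = J_true @ dq
    J_hat = broyden_update(J_hat, dq, ds)
    s = s + ds
print(np.round(s, 6))  # features converge toward the reference
```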