Author name cluster

Ichiro Takeuchi

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

39 papers

2 author rows

AAAI Conference 2026 Conference Paper

Statistically Robust Sparse High-order Interaction Model

Diptesh Das
Ichiro Takeuchi
Koji Tsuda

Deep learning models often achieve high accuracy but lack interpretability, making them unsuitable for critical applications such as medical diagnosis, biomolecule design, criminal justice, etc. The Sparse High-order Interaction Model (SHIM) addresses this limitation by providing both transparency and predictive reliability. However, real-world data often contain outliers, which can distort model performance. To overcome this, we propose Huberized-SHIM, an extension of SHIM that integrates Huber loss-based robust regression to mitigate the impact of outliers. We introduce a homotopy-based exact regularization path algorithm and a novel tree-pruning criterion to efficiently manage interaction complexity. Additionally, we incorporate the conformal prediction framework to enhance statistical reliability. Empirical evaluations on synthetic and real-world datasets demonstrate the superior robustness and accuracy of Huberized-SHIM in high-stakes decision-making contexts.

PDF Details DOI

TMLR Journal 2025 Journal Article

Change Point Detection in the Frequency Domain with Statistical Reliability

Akifumi Yamada
Tomohiro Shiraishi
Shuichi Nishino
Teruyuki Katsuoka
Kouichi Taji
Ichiro Takeuchi

Effective condition monitoring in complex systems requires identifying change points (CPs) in the frequency domain, as the structural changes often arise across multiple frequencies. This paper extends recent advancements in statistically significant CP detection, based on Selective Inference (SI), to the frequency domain. The proposed SI method quantifies the statistical significance of detected CPs in the frequency domain using $p$-values, ensuring that the detected changes reflect genuine structural shifts in the target system. We address two major technical challenges to achieve this. First, we extend the existing SI framework to the frequency domain by appropriately utilizing the properties of discrete Fourier transform (DFT). Second, we develop an SI method that provides valid $p$-values for CPs where changes occur across multiple frequencies. Experimental results demonstrate that the proposed method reliably identifies genuine CPs with strong statistical guarantees, enabling more accurate root-cause analysis in the frequency domain of complex systems.

TMLR Journal 2025 Journal Article

Conditional Latent Space Molecular Scaffold Optimization for Accelerated Molecular Design

Onur Boyar
Hiroyuki Hanada
Ichiro Takeuchi

The rapid discovery of new chemical compounds is essential for advancing global health and developing treatments. While generative models show promise in creating novel molecules, challenges remain in ensuring the real-world applicability of these molecules and finding such molecules efficiently. To address this challenge, we introduce Conditional Latent Space Molecular Scaffold Optimization (CLaSMO), which integrates a Conditional Variational Autoencoder (CVAE) with Latent Space Bayesian Optimization (LSBO) to strategically modify molecules while preserving similarity to the original input, effectively framing the task as constrained optimization. Our LSBO setting improves the sample-efficiency of the molecular optimization, and our modification approach helps us to obtain molecules with higher chances of real-world applicability. CLaSMO explores substructures of molecules in a sample-efficient manner by performing BO in the latent space of a CVAE conditioned on the atomic environment of the molecule to be optimized. Our extensive evaluations across diverse optimization tasks—including rediscovery, docking score, and multi‑property optimization—show that CLaSMO efficiently enhances target properties, delivers remarkable sample-efficiency crucial for resource‑limited applications while considering molecular similarity constraints, achieves state of the art performance, and maintains practical synthetic accessibility. We also provide an open-source web application that enables chemical experts to apply CLaSMO in a Human-in-the-Loop setting.

ICML Conference 2025 Conference Paper

Distributionally Robust Active Learning for Gaussian Process Regression

Shion Takeno
Yoshito Okura
Yu Inatsu
Tatsuya Aoyama
Tomonari Tanaka
Satoshi Akahane
Hiroyuki Hanada
Noriaki Hashimoto

Gaussian process regression (GPR) or kernel ridge regression is a widely used and powerful tool for nonlinear prediction. Therefore, active learning (AL) for GPR, which actively collects data labels to achieve an accurate prediction with fewer data labels, is an important problem. However, existing AL methods do not theoretically guarantee prediction accuracy for target distribution. Furthermore, as discussed in the distributionally robust learning literature, specifying the target distribution is often difficult. Thus, this paper proposes two AL methods that effectively reduce the worst-case expected error for GPR, which is the worst-case expectation in target distribution candidates. We show an upper bound of the worst-case expected squared error, which suggests that the error will be arbitrarily small by a finite number of data labels under mild conditions. Finally, we demonstrate the effectiveness of the proposed methods through synthetic and real-world datasets.

TMLR Journal 2025 Journal Article

Distributionally Robust Coreset Selection under Covariate Shift

Tomonari Tanaka
Hiroyuki Hanada
Hanting Yang
Aoyama Tatsuya
Yu Inatsu
Akahane Satoshi
Yoshito Okura
Noriaki Hashimoto

Coreset selection, which involves selecting a small subset from an existing training dataset, is an approach to reducing training data, and various approaches have been proposed for this method. In practical situations where these methods are employed, it is often the case that the data distributions differ between the development phase and the deployment phase, with the latter being unknown. Thus, it is challenging to select an effective subset of training data that performs well across all deployment scenarios. We therefore propose Distributionally Robust Coreset Selection (DRCS). DRCS theoretically derives an estimate of the upper bound for the worst-case test error, assuming that the future covariate distribution may deviate within a defined range from the training distribution. Furthermore, by selecting instances in a way that suppresses the estimate of the upper bound for the worst-case test error, DRCS achieves distributionally robust training instance selection. This study is primarily applicable to convex training computation, but we demonstrate that it can also be applied to deep learning under appropriate approximations. In this paper, we focus on covariate shift, a type of data distribution shift, and demonstrate the effectiveness of DRCS through experiments.

NeurIPS Conference 2025 Conference Paper

Quantifying Statistical Significance of Deep Nearest Neighbor Anomaly Detection via Selective Inference

Mizuki Niihori
Shuichi Nishino
Teruyuki Katsuoka
Tomohiro Shiraishi
Kouichi Taji
Ichiro Takeuchi

In real-world applications, anomaly detection (AD) often operates without access to anomalous data, necessitating semi-supervised methods that rely solely on normal data. Among these methods, deep $k$-nearest neighbor (deep $k$NN) AD stands out for its interpretability and flexibility, leveraging distance-based scoring in deep latent spaces. Despite its strong performance, deep $k$NN lacks a mechanism to quantify uncertainty—an essential feature for critical applications such as industrial inspection. To address this limitation, we propose a statistical framework that quantifies the significance of detected anomalies in the form of $p$-values, thereby enabling control over false positive rates at a user-specified significance level (e. g. ,0. 05). A central challenge lies in managing selection bias, which we tackle using Selective Inference—a principled method for conducting inference conditioned on data-driven selections. We evaluate our method on diverse datasets and demonstrate that it provides reliable AD well-suited for industrial use cases.

TMLR Journal 2025 Journal Article

Regret Analysis of Posterior Sampling-Based Expected Improvement for Bayesian Optimization

Shion Takeno
Yu Inatsu
Masayuki Karasuyama
Ichiro Takeuchi

Bayesian optimization is a powerful tool for optimizing an expensive-to-evaluate black-box function. In particular, the effectiveness of expected improvement (EI) has been demonstrated in a wide range of applications. However, theoretical analyses of EI are limited compared with other theoretically established algorithms. This paper analyzes a randomized variant of EI, which evaluates the EI from the maximum of the posterior sample path. We show that this posterior sampling-based random EI achieves the sublinear Bayesian cumulative regret bounds under the assumption that the black-box function follows a Gaussian process. Finally, we demonstrate the effectiveness of the proposed method through numerical experiments.

ICML Conference 2025 Conference Paper

Statistical Test for Feature Selection Pipelines by Selective Inference

Tomohiro Shiraishi
Tatsuya Matsukawa
Shuichi Nishino
Ichiro Takeuchi

A data analysis pipeline is a structured sequence of steps that transforms raw data into meaningful insights by integrating various analysis algorithms. In this paper, we propose a novel statistical test to assess the significance of data analysis pipelines. Our approach enables the systematic development of valid statistical tests applicable to any feature selection pipeline composed of predefined components. We develop this framework based on selective inference, a statistical technique that has recently gained attention for data-driven hypotheses. As a proof of concept, we focus on feature selection pipelines for linear models, composed of three missing value imputation algorithms, three outlier detection algorithms, and three feature selection algorithms. We theoretically prove that our statistical test can control the probability of false positive feature selection at any desired level, and demonstrate its validity and effectiveness through experiments on synthetic and real data. Additionally, we present an implementation framework that facilitates testing across any configuration of these feature selection pipelines without extra implementation costs.

TMLR Journal 2025 Journal Article

Statistical Test for Saliency Maps of Graph Neural Networks via Selective Inference

Shuichi Nishino
Tomohiro Shiraishi
Teruyuki Katsuoka
Ichiro Takeuchi

Graph Neural Networks (GNNs) have gained prominence for their ability to process graph-structured data across various domains. However, interpreting GNN decisions remains a significant challenge, leading to the adoption of saliency maps for identifying salient subgraphs composed of influential nodes and edges. Despite their utility, the reliability of GNN saliency maps has been questioned, particularly in terms of their robustness to input noise. In this study, we propose a statistical testing framework to rigorously evaluate the significance of saliency maps. Our main contribution lies in addressing the inflation of the Type I error rate caused by double-dipping of data, leveraging the framework of Selective Inference. Our method provides statistically valid $p$-values while controlling the Type I error rate, ensuring that identified salient subgraphs contain meaningful information rather than random artifacts. The method is applicable to a variety of saliency methods with piecewise linearity (e.g., Class Activation Mapping). We validate our method on synthetic and real-world datasets, demonstrating its capability in assessing the reliability of GNN interpretations.

TMLR Journal 2024 Journal Article

Active Learning for Level Set Estimation Using Randomized Straddle Algorithms

Yu Inatsu
Shion Takeno
Kentaro KUTSUKAKE
Ichiro Takeuchi

Level set estimation (LSE) the problem of identifying the set of input points where a function takes a value above (or below) a given threshold is important in practical applications. When the function is expensive to evaluate and black-box, the straddle algorithm, a representative heuristic for LSE based on Gaussian process models, and its extensions with theoretical guarantees have been developed. However, many existing methods include a confidence parameter, $\beta^{1/2}_t$, that must be specified by the user. Methods that choose $\beta^{1/2}_t$ heuristically do not provide theoretical guarantees. In contrast, theoretically guaranteed values of $\beta^{1/2}_t$ need to be increased depending on the number of iterations and candidate points; they are conservative and do not perform well in practice. In this study, we propose a novel method, the randomized straddle algorithm, in which $\beta_t$ in the straddle algorithm is replaced by a random sample from the chi-squared distribution with two degrees of freedom. The confidence parameter in the proposed method does not require adjustment, does not depend on the number of iterations and candidate points, and is not conservative. Furthermore, we show that the proposed method has theoretical guarantees that depend on the sample complexity and the number of iterations. Finally, we validate the applicability of the proposed method through numerical experiments using synthetic and real data.

AAAI Conference 2024 Conference Paper

Multi-Objective Bayesian Optimization with Active Preference Learning

Ryota Ozaki
Kazuki Ishikawa
Youhei Kanzaki
Shion Takeno
Ichiro Takeuchi
Masayuki Karasuyama

There are a lot of real-world black-box optimization problems that need to optimize multiple criteria simultaneously. However, in a multi-objective optimization (MOO) problem, identifying the whole Pareto front requires the prohibitive search cost, while in many practical scenarios, the decision maker (DM) only needs a specific solution among the set of the Pareto optimal solutions. We propose a Bayesian optimization (BO) approach to identifying the most preferred solution in the MOO with expensive objective functions, in which a Bayesian preference model of the DM is adaptively estimated by an interactive manner based on the two types of supervisions called the pairwise preference and improvement request. To explore the most preferred solution, we define an acquisition function in which the uncertainty both in the objective function and the DM preference is incorporated. Further, to minimize the interaction cost with the DM, we also propose an active learning strategy for the preference estimation. We empirically demonstrate the effectiveness of our proposed method through the benchmark function optimization and the hyper-parameter optimization problems for machine learning models.

PDF Details DOI

ICML Conference 2024 Conference Paper

Posterior Sampling-Based Bayesian Optimization with Tighter Bayesian Regret Bounds

Shion Takeno
Yu Inatsu
Masayuki Karasuyama
Ichiro Takeuchi

Among various acquisition functions (AFs) in Bayesian optimization (BO), Gaussian process upper confidence bound (GP-UCB) and Thompson sampling (TS) are well-known options with established theoretical properties regarding Bayesian cumulative regret (BCR). Recently, it has been shown that a randomized variant of GP-UCB achieves a tighter BCR bound compared with GP-UCB, which we call the tighter BCR bound for brevity. Inspired by this study, this paper first shows that TS achieves the tighter BCR bound. On the other hand, GP-UCB and TS often practically suffer from manual hyperparameter tuning and over-exploration issues, respectively. Therefore, we analyze yet another AF called a probability of improvement from the maximum of a sample path (PIMS). We show that PIMS achieves the tighter BCR bound and avoids the hyperparameter tuning, unlike GP-UCB. Furthermore, we demonstrate a wide range of experiments, focusing on the effectiveness of PIMS that mitigates the practical issues of GP-UCB and TS.

ICML Conference 2024 Conference Paper

Statistical Test for Attention Maps in Vision Transformers

Tomohiro Shiraishi
Daiki Miwa
Teruyuki Katsuoka
Vo Nguyen Le Duy
Kouichi Taji
Ichiro Takeuchi

The Vision Transformer (ViT) demonstrates exceptional performance in various computer vision tasks. Attention is crucial for ViT to capture complex wide-ranging relationships among image patches, allowing the model to weigh the importance of image patches and aiding our understanding of the decision-making process. However, when utilizing the attention of ViT as evidence in high-stakes decision-making tasks such as medical diagnostics, a challenge arises due to the potential of attention mechanisms erroneously focusing on irrelevant regions. In this study, we propose a statistical test for ViT’s attentions, enabling us to use the attentions as reliable quantitative evidence indicators for ViT’s decision-making with a rigorously controlled error rate. Using the framework called selective inference, we quantify the statistical significance of attentions in the form of p-values, which enables the theoretically grounded quantification of the false positive detection probability of attentions. We demonstrate the validity and the effectiveness of the proposed method through numerical experiments and applications to brain image diagnoses.

ICLR Conference 2023 Conference Paper

Valid P-Value for Deep Learning-driven Salient Region

Daiki Miwa
Vo Nguyen Le Duy
Ichiro Takeuchi

Various saliency map methods have been proposed to interpret and explain predictions of deep learning models. Saliency maps allow us to interpret which parts of the input signals have a strong influence on the prediction results. However, since a saliency map is obtained by complex computations in deep learning models, it is often difficult to know how reliable the saliency map itself is. In this study, we propose a method to quantify the reliability of a saliency region in the form of p-values. Our idea is to consider a saliency map as a selected hypothesis by the trained deep learning model and employ the selective inference framework. The proposed method provably provides a valid p-value for the detected salient region, i.e., we can provably control the false positive rate of the detected salient region. We demonstrate the validity of the proposed method through numerical examples in synthetic and real datasets. Furthermore, we develop a Keras-based framework for conducting the proposed selective inference for a wide class of CNNs without additional implementation cost.

ICML Conference 2022 Conference Paper

Bayesian Optimization for Distributionally Robust Chance-constrained Problem

Yu Inatsu
Shion Takeno
Masayuki Karasuyama
Ichiro Takeuchi

In black-box function optimization, we need to consider not only controllable design variables but also uncontrollable stochastic environment variables. In such cases, it is necessary to solve the optimization problem by taking into account the uncertainty of the environmental variables. Chance-constrained (CC) problem, the problem of maximizing the expected value under a certain level of constraint satisfaction probability, is one of the practically important problems in the presence of environmental variables. In this study, we consider distributionally robust CC (DRCC) problem and propose a novel DRCC Bayesian optimization method for the case where the distribution of the environmental variables cannot be precisely specified. We show that the proposed method can find an arbitrary accurate solution with high probability in a finite number of trials, and confirm the usefulness of the proposed method through numerical experiments.

AAAI Conference 2022 Conference Paper

Fast and More Powerful Selective Inference for Sparse High-Order Interaction Model

Diptesh Das
Vo Nguyen Le Duy
Hiroyuki Hanada
Koji Tsuda
Ichiro Takeuchi

Automated high-stake decision-making, such as medical diagnosis, requires models with high interpretability and reliability. We consider the sparse high-order interaction model as an interpretable and reliable model with a good prediction ability. However, finding statistically significant high-order interactions is challenging because of the intrinsically high dimensionality of the combinatorial effects. Another problem in data-driven modeling is the effect of “cherry-picking” (i. e. , selection bias). Our main contribution is extending the recently developed parametric programming approach for selective inference to high-order interaction models. An exhaustive search over the cherry tree (all possible interactions) can be daunting and impractical, even for small-sized problems. We introduced an efficient pruning strategy and demonstrated the computational efficiency and statistical power of the proposed method using both synthetic and real data.

JMLR Journal 2022 Journal Article

More Powerful Conditional Selective Inference for Generalized Lasso by Parametric Programming

Vo Nguyen Le Duy
Ichiro Takeuchi

Conditional selective inference (SI) has been studied intensively as a new statistical inference framework for data-driven hypotheses. The basic concept of conditional SI is to make the inference conditional on the selection event, which enables an exact and valid statistical inference to be conducted even when the hypothesis is selected based on the data. Conditional SI has mainly been studied in the context of model selection, such as vanilla lasso or generalized lasso. The main limitation of existing approaches is the low statistical power owing to over-conditioning, which is required for computational tractability. In this study, we propose a more powerful and general conditional SI method for a class of problems that can be converted into quadratic parametric programming, which includes generalized lasso. The key concept is to compute the continuum path of the optimal solution in the direction of the selected test statistic and to identify the subset of the data space that corresponds to the model selection event by following the solution path. The proposed parametric programming-based method not only avoids the aforementioned major drawback of over-conditioning, but also improves the performance and practicality of SI in various respects. We conducted several experiments to demonstrate the effectiveness and efficiency of our proposed method. [abs] [ pdf ][ bib ] [ code ] &copy JMLR 2022. ( edit, beta )

NeurIPS Conference 2022 Conference Paper

Quantifying Statistical Significance of Neural Network-based Image Segmentation by Selective Inference

Vo Nguyen Le Duy
Shogo Iwazaki
Ichiro Takeuchi

Although a vast body of literature relates to image segmentation methods that use deep neural networks (DNNs), less attention has been paid to assessing the statistical reliability of segmentation results. In this study, we interpret the segmentation results as hypotheses driven by DNN (called DNN-driven hypotheses) and propose a method to quantify the reliability of these hypotheses within a statistical hypothesis testing framework. To this end, we introduce a conditional selective inference (SI) framework---a new statistical inference framework for data-driven hypotheses that has recently received considerable attention---to compute exact (non-asymptotic) valid p-values for the segmentation results. To use the conditional SI framework for DNN-based segmentation, we develop a new SI algorithm based on the homotopy method, which enables us to derive the exact (non-asymptotic) sampling distribution of DNN-driven hypothesis. We conduct several experiments to demonstrate the performance of the proposed method.

ICML Conference 2021 Conference Paper

Active Learning for Distributionally Robust Level-Set Estimation

Yu Inatsu
Shogo Iwazaki
Ichiro Takeuchi

Many cases exist in which a black-box function $f$ with high evaluation cost depends on two types of variables $\bm x$ and $\bm w$, where $\bm x$ is a controllable \emph{design} variable and $\bm w$ are uncontrollable \emph{environmental} variables that have random variation following a certain distribution $P$. In such cases, an important task is to find the range of design variables $\bm x$ such that the function $f(\bm x, \bm w)$ has the desired properties by incorporating the random variation of the environmental variables $\bm w$. A natural measure of robustness is the probability that $f(\bm x, \bm w)$ exceeds a given threshold $h$, which is known as the \emph{probability threshold robustness} (PTR) measure in the literature on robust optimization. However, this robustness measure cannot be correctly evaluated when the distribution $P$ is unknown. In this study, we addressed this problem by considering the \textit{distributionally robust PTR} (DRPTR) measure, which considers the worst-case PTR within given candidate distributions. Specifically, we studied the problem of efficiently identifying a reliable set $H$, which is defined as a region in which the DRPTR measure exceeds a certain desired probability $\alpha$, which can be interpreted as a level set estimation (LSE) problem for DRPTR. We propose a theoretically grounded and computationally efficient active learning method for this problem. We show that the proposed method has theoretical guarantees on convergence and accuracy, and confirmed through numerical experiments that the proposed method outperforms existing methods.

ICML Conference 2021 Conference Paper

More Powerful and General Selective Inference for Stepwise Feature Selection using Homotopy Method

Kazuya Sugiyama
Vo Nguyen Le Duy
Ichiro Takeuchi

Conditional selective inference (SI) has been actively studied as a new statistical inference framework for data-driven hypotheses. The basic idea of conditional SI is to make inferences conditional on the selection event characterized by a set of linear and/or quadratic inequalities. Conditional SI has been mainly studied in the context of feature selection such as stepwise feature selection (SFS). The main limitation of the existing conditional SI methods is the loss of power due to over-conditioning, which is required for computational tractability. In this study, we develop a more powerful and general conditional SI method for SFS using the homotopy method which enables us to overcome this limitation. The homotopy-based SI is especially effective for more complicated feature selection algorithms. As an example, we develop a conditional SI method for forward-backward SFS with AIC-based stopping criteria and show that it is not adversely affected by the increased complexity of the algorithm. We conduct several experiments to demonstrate the effectiveness and efficiency of the proposed method.

NeurIPS Conference 2020 Conference Paper

Computing Valid p-value for Optimal Changepoint by Selective Inference using Dynamic Programming

Vo Nguyen Le Duy
Hiroki Toda
Ryota Sugiyama
Ichiro Takeuchi

Although there is a vast body of literature related to methods for detecting change-points (CPs), less attention has been paid to assessing the statistical reliability of the detected CPs. In this paper, we introduce a novel method to perform statistical inference on the significance of the CPs, estimated by a Dynamic Programming (DP)-based optimal CP detection algorithm. Our main idea is to employ a Selective Inference (SI) approach---a new statistical inference framework that has recently received a lot of attention---to compute exact (non-asymptotic) valid p-values for the detected optimal CPs. Although it is well-known that SI has low statistical power because of over-conditioning, we address this drawback by introducing a novel method called parametric DP, which enables SI to be conducted with the minimum amount of conditioning, leading to high statistical power. We conduct experiments on both synthetic and real-world datasets, through which we offer evidence that our proposed method is more powerful than existing methods, has decent performance in terms of computational efficiency, and provides good results in many practical applications.

ICML Conference 2020 Conference Paper

Multi-fidelity Bayesian Optimization with Max-value Entropy Search and its Parallelization

Shion Takeno
Hitoshi Fukuoka
Yuhki Tsukada
Toshiyuki Koyama
Motoki Shiga
Ichiro Takeuchi
Masayuki Karasuyama

In a standard setting of Bayesian optimization (BO), the objective function evaluation is assumed to be highly expensive. Multi-fidelity Bayesian optimization (MFBO) accelerates BO by incorporating lower fidelity observations available with a lower sampling cost. We propose a novel information-theoretic approach to MFBO, called multi-fidelity max-value entropy search (MF-MES), that enables us to obtain a more reliable evaluation of the information gain compared with existing information-based methods for MFBO. Further, we also propose a parallelization of MF-MES mainly for the asynchronous setting because queries typically occur asynchronously in MFBO due to a variety of sampling costs. We show that most of computations in our acquisition functions can be derived analytically, except for at most only two dimensional numerical integration that can be performed efficiently by simple approximations. We demonstrate effectiveness of our approach by using benchmark datasets and a real-world application to materials science data.

NeurIPS Conference 2019 Conference Paper

Computing Full Conformal Prediction Set with Approximate Homotopy

Eugene Ndiaye
Ichiro Takeuchi

If you are predicting the label $y$ of a new object with $\hat y$, how confident are you that $y = \hat y$? Conformal prediction methods provide an elegant framework for answering such question by building a $100 (1 - \alpha)\%$ confidence region without assumptions on the distribution of the data. It is based on a refitting procedure that parses all the possibilities for $y$ to select the most likely ones. Although providing strong coverage guarantees, conformal set is impractical to compute exactly for many regression problems. We propose efficient algorithms to compute conformal prediction set using approximated solution of (convex) regularized empirical risk minimization. Our approaches rely on a new homotopy continuation technique for tracking the solution path with respect to sequential changes of the observations. We also provide a detailed analysis quantifying its complexity.

ICML Conference 2019 Conference Paper

Safe Grid Search with Optimal Complexity

Eugène Ndiaye
Tam Le
Olivier Fercoq
Joseph Salmon
Ichiro Takeuchi

Popular machine learning estimators involve regularization parameters that can be challenging to tune, and standard strategies rely on grid search for this task. In this paper, we revisit the techniques of approximating the regularization path up to predefined tolerance $\epsilon$ in a unified framework and show that its complexity is $O(1/\sqrt[d]{\epsilon})$ for uniformly convex loss of order $d \geq 2$ and $O(1/\sqrt{\epsilon})$ for Generalized Self-Concordant functions. This framework encompasses least-squares but also logistic regression, a case that as far as we know was not handled as precisely in previous works. We leverage our technique to provide refined bounds on the validation error as well as a practical algorithm for hyperparameter tuning. The latter has global convergence guarantee when targeting a prescribed accuracy on the validation set. Last but not least, our approach helps relieving the practitioner from the (often neglected) task of selecting a stopping criterion when optimizing over the training set: our method automatically calibrates this criterion based on the targeted accuracy on the validation set.

AAAI Conference 2018 Conference Paper

Efficiently Monitoring Small Data Modification Effect for Large-Scale Learning in Changing Environment

Hiroyuki Hanada
Atsushi Shibagaki
Jun Sakuma
Ichiro Takeuchi

We study large-scale machine learning problems in changing environments where a small part of the dataset is modiﬁed, and the effect of the data modiﬁcation must be monitored in order to know how much the modiﬁcation changes the optimal model. When the entire dataset is large, even if the amount of the data modiﬁcation is fairly small, the computational cost for re-training the model would be prohibitively large. In this paper, we propose a novel method, called the optimal solution bounding (OSB), for monitoring such a data modiﬁcation effect on the optimal model by efﬁciently evaluating (without actually re-training) it. The proposed method provides bounds on the unknown optimal model with the cost proportional only to the size of the data modiﬁcation.

ICML Conference 2017 Conference Paper

Selective Inference for Sparse High-Order Interaction Models

Shinya Suzumura
Kazuya Nakagawa
Yuta Umezu
Koji Tsuda
Ichiro Takeuchi

Finding statistically significant high-order interactions in predictive modeling is important but challenging task because the possible number of high-order interactions is extremely large (e. g. , $> 10^{17}$). In this paper we study feature selection and statistical inference for sparse high-order interaction models. Our main contribution is to extend recently developed selective inference framework for linear models to high-order interaction models by developing a novel algorithm for efficiently characterizing the selection event for the selective inference of high-order interactions. We demonstrate the effectiveness of the proposed algorithm by applying it to an HIV drug response prediction problem.

ICML Conference 2016 Conference Paper

Simultaneous Safe Screening of Features and Samples in Doubly Sparse Modeling

Atsushi Shibagaki
Masayuki Karasuyama
Kohei Hatano
Ichiro Takeuchi

The problem of learning a sparse model is conceptually interpreted as the process of identifying active features/samples and then optimizing the model over them. Recently introduced safe screening allows us to identify a part of non-active features/samples. So far, safe screening has been individually studied either for feature screening or for sample screening. In this paper, we introduce a new approach for safely screening features and samples simultaneously by alternatively iterating feature and sample screening steps. A significant advantage of considering them simultaneously rather than individually is that they have a synergy effect in the sense that the results of the previous safe feature screening can be exploited for improving the next safe sample screening performances, and vice-versa. We first theoretically investigate the synergy effect, and then illustrate the practical advantage through intensive numerical experiments for problems with large numbers of features and samples.

NeurIPS Conference 2015 Conference Paper

Regularization Path of Cross-Validation Error Lower Bounds

Atsushi Shibagaki
Yoshiki Suzuki
Masayuki Karasuyama
Ichiro Takeuchi

Careful tuning of a regularization parameter is indispensable in many machine learning tasks because it has a significant impact on generalization performances. Nevertheless, current practice of regularization parameter tuning is more of an art than a science, e. g. , it is hard to tell how many grid-points would be needed in cross-validation (CV) for obtaining a solution with sufficiently small CV error. In this paper we propose a novel framework for computing a lower bound of the CV errors as a function of the regularization parameter, which we call regularization path of CV error lower bounds. The proposed framework can be used for providing a theoretical approximation guarantee on a set of solutions in the sense that how far the CV error of the current best solution could be away from best possible CV error in the entire range of the regularization parameters. We demonstrate through numerical experiments that a theoretically guaranteed a choice of regularization parameter in the above sense is possible with reasonable computational costs.

ICML Conference 2014 Conference Paper

Outlier Path: A Homotopy Algorithm for Robust SVM

Shinya Suzumura
Kohei Ogawa
Masashi Sugiyama
Ichiro Takeuchi

In recent applications with massive but less reliable data (e. g. , labels obtained by a semi-supervised learning method or crowdsourcing), non-robustness of the support vector machine (SVM) often causes considerable performance deterioration. Although improving the robustness of SVM has been investigated for long time, robust SVM (RSVM) learning still poses two major challenges: obtaining a good (local) solution from a non-convex optimization problem and optimally controlling the robustness-efficiency trade-off. In this paper, we address these two issues simultaneously in an integrated way by introducing a novel homotopy approach to RSVM learning. Based on theoretical investigation of the geometry of RSVM solutions, we show that a path of local RSVM solutions can be computed efficiently when the influence of outliers is gradually suppressed as simulated annealing. We experimentally demonstrate that our algorithm tends to produce better local solutions than the alternative approach based on the concave-convex procedure, with the ability of stable and efficient model selection for controlling the influence of outliers.

NeurIPS Conference 2013 Conference Paper

Global Solver and Its Efficient Approximation for Variational Bayesian Low-rank Subspace Clustering

Shinichi Nakajima
Akiko Takeda
S. Derin Babacan
Masashi Sugiyama
Ichiro Takeuchi

When a probabilistic model and its prior are given, Bayesian learning offers inference with automatic parameter tuning. However, Bayesian learning is often obstructed by computational difficulty: the rigorous Bayesian learning is intractable in many models, and its variational Bayesian (VB) approximation is prone to suffer from local minima. In this paper, we overcome this difficulty for low-rank subspace clustering (LRSC) by providing an exact global solver and its efficient approximation. LRSC extracts a low-dimensional structure of data by embedding samples into the union of low-dimensional subspaces, and its variational Bayesian variant has shown good performance. We first prove a key property that the VB-LRSC model is highly redundant. Thanks to this property, the optimization problem of VB-LRSC can be separated into small subproblems, each of which has only a small number of unknown variables. Our exact global solver relies on another key property that the stationary condition of each subproblem is written as a set of polynomial equations, which is solvable with the homotopy method. For further computational efficiency, we also propose an efficient approximate variant, of which the stationary condition can be written as a polynomial equation with a single variable. Experimental results show the usefulness of our approach.

ICML Conference 2013 Conference Paper

Infinitesimal Annealing for Training Semi-Supervised Support Vector Machines

Kohei Ogawa
Motoki Imamura
Ichiro Takeuchi
Masashi Sugiyama

The semi-supervised support vector machine (S3VM) is a maximum-margin classification algorithm based on both labeled and unlabeled data. Training S3VM involves either a combinatorial or non-convex optimization problem and thus finding the global optimal solution is intractable in practice. It has been demonstrated that a key to successfully find a good (local) solution of S3VM is to gradually increase the effect of unlabeled data, a la annealing. However, existing algorithms suffer from the trade-off between the resolution of annealing steps and the computation cost. In this paper, we go beyond this trade-off by proposing a novel training algorithm that efficiently performs annealing with an infinitesimal resolution. Through experiments, we demonstrate that the proposed infinitesimal annealing algorithm tends to produce better solutions with less computation time than existing approaches.

NeurIPS Conference 2013 Conference Paper

Parametric Task Learning

Ichiro Takeuchi
Tatsuya Hongo
Masashi Sugiyama
Shinichi Nakajima

We introduce a novel formulation of multi-task learning (MTL) called parametric task learning (PTL) that can systematically handle infinitely many tasks parameterized by a continuous parameter. Our key finding is that, for a certain class of PTL problems, the path of optimal task-wise solutions can be represented as piecewise-linear functions of the continuous task parameter. Based on this fact, we employ a parametric programming technique to obtain the common shared representation across all the continuously parameterized tasks efficiently. We show that our PTL formulation is useful in various scenarios such as learning under non-stationarity, cost-sensitive learning, and quantile regression, and demonstrate the usefulness of the proposed method experimentally in these scenarios.

ICML Conference 2013 Conference Paper

Safe Screening of Non-Support Vectors in Pathwise SVM Computation

Kohei Ogawa
Yoshiki Suzuki
Ichiro Takeuchi

In this paper, we claim that some of the non-support vectors (non-SVs) that have no influence on the classifier can be screened out prior to the training phase in pathwise SVM computation scenario, in which one is asked to train a sequence (or path) of SVM classifiers for different regularization parameters. Based on a recently proposed framework so-called safe screening rule, we derive a rule for screening out non-SVs in advance, and discuss how we can exploit the advantage of the rule in pathwise SVM computation scenario. Experiments indicate that our approach often substantially reduce the total pathwise computation cost.

NeurIPS Conference 2012 Conference Paper

Density-Difference Estimation

Masashi Sugiyama
Takafumi Kanamori
Taiji Suzuki
Marthinus Plessis
Song Liu
Ichiro Takeuchi

We address the problem of estimating the difference between two probability densities. A naive approach is a two-step procedure of first estimating two densities separately and then computing their difference. However, such a two-step procedure does not necessarily work well because the first step is performed without regard to the second step and thus a small estimation error incurred in the first stage can cause a big error in the second stage. In this paper, we propose a single-shot procedure for directly estimating the density difference without separately estimating two densities. We derive a non-parametric finite-sample error bound for the proposed single-shot density-difference estimator and show that it achieves the optimal convergence rate. We then show how the proposed density-difference estimator can be utilized in L2-distance approximation. Finally, we experimentally demonstrate the usefulness of the proposed method in robust distribution comparison such as class-prior estimation and change-point detection.

ICML Conference 2011 Conference Paper

Suboptimal Solution Path Algorithm for Support Vector Machine

Masayuki Karasuyama
Ichiro Takeuchi

NeurIPS Conference 2011 Conference Paper

Target Neighbor Consistent Feature Weighting for Nearest Neighbor Classification

Ichiro Takeuchi
Masashi Sugiyama

We consider feature selection and weighting for nearest neighbor classifiers. A technical challenge in this scenario is how to cope with the discrete update of nearest neighbors when the feature space metric is changed during the learning process. This issue, called the target neighbor change, was not properly addressed in the existing feature weighting and metric learning literature. In this paper, we propose a novel feature weighting algorithm that can exactly and efficiently keep track of the correct target neighbors via sequential quadratic programming. To the best of our knowledge, this is the first algorithm that guarantees the consistency between target neighbors and the feature space metric. We further show that the proposed algorithm can be naturally combined with regularization path tracking, allowing computationally efficient selection of the regularization parameter. We demonstrate the effectiveness of the proposed algorithm through experiments.

NeurIPS Conference 2009 Conference Paper

Multiple Incremental Decremental Learning of Support Vector Machines

Masayuki Karasuyama
Ichiro Takeuchi

We propose a multiple incremental decremental algorithm of Support Vector Machine (SVM). Conventional single cremental decremental SVM can update the trained model efficiently when single data point is added to or removed from the training set. When we add and/or remove multiple data points, this algorithm is time-consuming because we need to repeatedly apply it to each data point. The roposed algorithm is computationally more efficient when multiple data points are added and/or removed simultaneously. The single incremental decremental algorithm is built on an optimization technique called parametric programming. We extend the idea and introduce multi-parametric programming for developing the proposed algorithm. Experimental results on synthetic and real data sets indicate that the proposed algorithm can significantly reduce the computational cost of multiple incremental decremental operation. Our approach is especially useful for online SVM learning in which we need to remove old data points and add new data points in a short amount of time.

JMLR Journal 2006 Journal Article

Nonparametric Quantile Estimation

Ichiro Takeuchi
Quoc V. Le
Timothy D. Sears
Alexander J. Smola

In regression, the desired estimate of y | x is not always given by a conditional mean, although this is most common. Sometimes one wants to obtain a good estimate that satisfies the property that a proportion, τ, of y | x, will be below the estimate. For τ = 0.5 this is an estimate of the median. What might be called median regression, is subsumed under the term quantile regression. We present a nonparametric version of a quantile estimator, which can be obtained by solving a simple quadratic programming problem and provide uniform convergence statements and bounds on the quantile property of our estimator. Experimental results show the feasibility of the approach and competitiveness of our method with existing ones. We discuss several types of extensions including an approach to solve the quantile crossing problems, as well as a method to incorporate prior qualitative knowledge such as monotonicity constraints. [abs] [ pdf ][ bib ] &copy JMLR 2006. ( edit, beta )

NeurIPS Conference 2001 Conference Paper

Estimating Car Insurance Premia: a Case Study in High-Dimensional Data Inference

Nicolas Chapados
Yoshua Bengio
Pascal Vincent
Joumana Ghosn
Charles Dugas
Ichiro Takeuchi
Linyan Meng

Estimating insurance premia from data is a difficult regression problem for several reasons: the large number of variables, many of which are. discrete, and the very peculiar shape of the noise distri(cid: 173) bution, asymmetric with fat tails, with a large majority zeros and a few unreliable and very large values. We compare several machine learning methods for estimating insurance premia, and test them on a large data base of car insurance policies. We find that func(cid: 173) tion approximation methods that do not optimize a squared loss, like Support Vector Machines regression, do not work well in this context. Compared methods include decision trees and generalized linear models. The best results are obtained with a mixture of experts, which better identifies the least and most risky contracts, and allows to reduce the median premium by charging more to the most risky customers.