Arrow Research search

Author name cluster

Gavin Brown

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

9 papers
1 author row

Possible papers

9

TMLR Journal 2024 Journal Article

Bias/Variance is not the same as Approximation/Estimation

  • Gavin Brown
  • Riccardo Ali

We study the relation between two classical results: the bias-variance decomposition, and the approximation-estimation decomposition. Both are important conceptual tools in Machine Learning, helping us describe the nature of model fitting. It is commonly stated that they are “closely related”, or “similar in spirit”. However, sometimes it is said they are equivalent. In fact they are different, but have subtle connections cutting across learning theory, classical statistics, and information geometry, that (very surprisingly) have not been previously observed. We present several results for losses expressible as Bregman divergences: a broad family with a known bias-variance decomposition. Discussion and future directions are presented for more general losses, including the 0/1 classification loss.

JMLR Journal 2023 Journal Article

A Unified Theory of Diversity in Ensemble Learning

  • Danny Wood
  • Tingting Mu
  • Andrew M. Webb
  • Henry W. J. Reeve
  • Mikel Luján
  • Gavin Brown

We present a theory of ensemble diversity, explaining the nature of diversity for a wide range of supervised learning scenarios. This challenge has been referred to as the “holy grail” of ensemble learning, an open research issue for over 30 years. Our framework reveals that diversity is in fact a hidden dimension in the bias-variance decomposition of the ensemble loss. We prove a family of exact bias-variance-diversity decompositions, for a wide range of losses in both regression and classification, e.g., squared, cross-entropy, and Poisson losses. For losses where an additive bias-variance decomposition is not available (e.g., 0/1 loss) we present an alternative approach: quantifying the effects of diversity, which turn out to be dependent on the label distribution. Overall, we argue that diversity is a measure of model fit, in precisely the same sense as bias and variance, but accounting for statistical dependencies between ensemble members. Thus, we should not be ‘maximising diversity’ as so many works aim to do---instead, we have a bias/variance/diversity trade-off to manage. [abs] [ pdf ][ bib ] [ code ] &copy JMLR 2023. ( edit, beta )

NeurIPS Conference 2021 Conference Paper

Covariance-Aware Private Mean Estimation Without Private Covariance Estimation

  • Gavin Brown
  • Marco Gaboardi
  • Adam Smith
  • Jonathan Ullman
  • Lydia Zakynthinou

We present two sample-efficient differentially private mean estimators for $d$-dimensional (sub)Gaussian distributions with unknown covariance. Informally, given $n \gtrsim d/\alpha^2$ samples from such a distribution with mean $\mu$ and covariance $\Sigma$, our estimators output $\tilde\mu$ such that $\| \tilde\mu - \mu \|_{\Sigma} \leq \alpha$, where $\| \cdot \|_{\Sigma}$ is the \emph{Mahalanobis distance}. All previous estimators with the same guarantee either require strong a priori bounds on the covariance matrix or require $\Omega(d^{3/2})$ samples. Each of our estimators is based on a simple, general approach to designing differentially private mechanisms, but with novel technical steps to make the estimator private and sample-efficient. Our first estimator samples a point with approximately maximum Tukey depth using the exponential mechanism, but restricted to the set of points of large Tukey depth. Proving that this mechanism is private requires a novel analysis. Our second estimator perturbs the empirical mean of the data set with noise calibrated to the empirical covariance. Only the mean is released, however; the covariance is only used internally. Its sample complexity guarantees hold more generally for subgaussian distributions, albeit with a slightly worse dependence on the privacy parameter. For both estimators, careful preprocessing of the data is required to satisfy differential privacy.

JMLR Journal 2018 Journal Article

On the Stability of Feature Selection Algorithms

  • Sarah Nogueira
  • Konstantinos Sechidis
  • Gavin Brown

Feature Selection is central to modern data science, from exploratory data analysis to predictive model-building. The “stability” of a feature selection algorithm refers to the robustness of its feature preferences, with respect to data sampling and to its stochastic nature. An algorithm is `unstable' if a small change in data leads to large changes in the chosen feature subset. Whilst the idea is simple, quantifying this has proven more challenging---we note numerous proposals in the literature, each with different motivation and justification. We present a rigorous statistical treatment for this issue. In particular, with this work we consolidate the literature and provide (1) a deeper understanding of existing work based on a small set of properties, and (2) a clearly justified statistical approach with several novel benefits. This approach serves to identify a stability measure obeying all desirable properties, and (for the first time in the literature) allowing confidence intervals and hypothesis tests on the stability, enabling rigorous experimental comparison of feature selection algorithms. [abs] [ pdf ][ bib ] &copy JMLR 2018. ( edit, beta )

KR Conference 2014 Short Paper

Predicting Performance of OWL Reasoners: Locally or Globally?

  • Viachaslau Sazonau
  • Uli Sattler
  • Gavin Brown

ontologies which are performance homogeneous and others are performance heterogeneous for a reasoner (Gonçalves, Parsia, and Sattler 2012). An ontology O is performance heterogeneous if classification time of O0 ⊂ O is not linearly correlated with its relative size. The latter suggests that interaction effects come into play when certain axioms are part of the ontology and, consequently, loose their impact on reasoner’s performance when those axioms are removed. These observations make performance prediction of a reasoner on a given input challenging and often impossible even for experienced and skilled ontology engineers. However, the ability to predict performance is attractive. Firstly, reasoner developers equipped with a predictor would be able to find out which ontologies are harder for a reasoner. This can speed up tests of reasoners and facilitate new optimizations. Secondly, an estimate of classification time might allow an ontology engineer to decide whether the reasoning task is worth waiting for or not. As a practical example, the progress bar of an ontology editor, such as Protégé 4, 3 could be made more reliable with a good performance predictor. It could also supply auto-picking the fastest reasoner for a given ontology from the list of available reasoners. The contributions of this work are three-fold. Firstly, we suggest an approach to analyse intercorrelation of ontology features using PCA, given a corpus of ontologies, and observe that ontology features are highly intercorrelated so that 57 features can be “faithfully” replaced by one or two features. Secondly, we introduce a new method for performance prediction which is competitive with existing performance predictors. Thirdly, we discuss different prediction accuracy measures. All details can be found in the technical report (Sazonau, Sattler, and Brown 2013). We propose a novel approach for performance prediction of OWL reasoners that selects suitable, small ontology subsets, and then extrapolates reasoner’s performance on them to the whole ontology. We investigate intercorrelation of ontology features using PCA and discuss various error measures for performance prediction.

JMLR Journal 2013 Journal Article

Beyond Fano's Inequality: Bounds on the Optimal F-Score, BER, and Cost-Sensitive Risk and Their Implications

  • Ming-Jie Zhao
  • Narayanan Edakunni
  • Adam Pocock
  • Gavin Brown

Fano's inequality lower bounds the probability of transmission error through a communication channel. Applied to classification problems, it provides a lower bound on the Bayes error rate and motivates the widely used Infomax principle. In modern machine learning, we are often interested in more than just the error rate. In medical diagnosis, different errors incur different cost; hence, the overall risk is cost-sensitive. Two other popular criteria are balanced error rate (BER) and F-score. In this work, we focus on the two-class problem and use a general definition of conditional entropy (including Shannon's as a special case) to derive upper/lower bounds on the optimal F-score, BER and cost-sensitive risk, extending Fano's result. As a consequence, we show that Infomax is not suitable for optimizing F-score or cost-sensitive risk, in that it can potentially lead to low F-score and high risk. For cost-sensitive risk, we propose a new conditional entropy formulation which avoids this inconsistency. In addition, we consider the common practice of using a threshold on the posterior probability to tune performance of a classifier. As is widely known, a threshold of 0.5, where the posteriors cross, minimizes error rate---we derive similar optimal thresholds for F-score and BER. [abs] [ pdf ][ bib ] &copy JMLR 2013. ( edit, beta )

JMLR Journal 2012 Journal Article

Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection

  • Gavin Brown
  • Adam Pocock
  • Ming-Jie Zhao
  • Mikel Luján

We present a unifying framework for information theoretic feature selection, bringing almost two decades of research on heuristic filter criteria under a single theoretical interpretation. This is in response to the question: "what are the implicit statistical assumptions of feature selection criteria based on mutual information?". To answer this, we adopt a different strategy than is usual in the feature selection literature-instead of trying to define a criterion, we derive one, directly from a clearly specified objective function: the conditional likelihood of the training labels. While many hand-designed heuristic criteria try to optimize a definition of feature 'relevancy' and 'redundancy', our approach leads to a probabilistic framework which naturally incorporates these concepts. As a result we can unify the numerous criteria published over the last two decades, and show them to be low-order approximations to the exact (but intractable) optimisation problem. The primary contribution is to show that common heuristics for information based feature selection (including Markov Blanket algorithms as a special case) are approximate iterative maximisers of the conditional likelihood. A large empirical study provides strong evidence to favour certain classes of criteria, in particular those that balance the relative size of the relevancy/redundancy terms. Overall we conclude that the JMI criterion (Yang and Moody, 1999; Meyer et al., 2008) provides the best tradeoff in terms of accuracy, stability, and flexibility with small data samples. [abs] [ pdf ][ bib ] &copy JMLR 2012. ( edit, beta )

JMLR Journal 2005 Journal Article

Managing Diversity in Regression Ensembles

  • Gavin Brown
  • Jeremy L. Wyatt
  • Peter Tiňo

Ensembles are a widely used and effective technique in machine learning---their success is commonly attributed to the degree of disagreement, or 'diversity', within the ensemble. For ensembles where the individual estimators output crisp class labels, this 'diversity' is not well understood and remains an open research issue. For ensembles of regression estimators, the diversity can be exactly formulated in terms of the covariance between individual estimator outputs, and the optimum level is expressed in terms of a bias-variance-covariance trade-off. Despite this, most approaches to learning ensembles use heuristics to encourage the right degree of diversity. In this work we show how to explicitly control diversity through the error function. The first contribution of this paper is to show that by taking the combination mechanism for the ensemble into account we can derive an error function for each individual that balances ensemble diversity with individual accuracy. We show the relationship between this error function and an existing algorithm called negative correlation learning, which uses a heuristic penalty term added to the mean squared error function. It is demonstrated that these methods control the bias-variance-covariance trade-off systematically, and can be utilised with any estimator capable of minimising a quadratic error function, for example MLPs, or RBF networks. As a second contribution, we derive a strict upper bound on the coefficient of the penalty term, which holds for any estimator that can be cast in a generalised linear regression framework, with mild assumptions on the basis functions. Finally we present the results of an empirical study, showing significant improvements over simple ensemble learning, and finding that this technique is competitive with a variety of methods, including boosting, bagging, mixtures of experts, and Gaussian processes, on a number of tasks. [abs] [ pdf ][ bib ] &copy JMLR 2005. ( edit, beta )