Arrow Research search

Author name cluster

Pierre Baldi

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

41 papers
2 author rows

Possible papers

41

AIJ Journal 2025 Journal Article

A theory of synaptic neural balance: From local to global order

  • Pierre Baldi
  • Antonios Alexos
  • Ian Domingo
  • Alireza Rahmansetayesh

We develop a general theory of synaptic neural balance and how it can emerge or be enforced in neural networks. For a given regularizer, a neuron is said to be in balance if the total cost of its input weights is equal to the total cost of its output weights. The basic example is provided by feedforward networks of ReLU units trained with $L_2$ regularizers, which exhibit balance after proper training. The theory explains this phenomenon and extends it in several directions. The first direction is the extension to bilinear and other activation functions. The second direction is the extension to more general regularizers, including all $L_p$ regularizers. The third direction is the extension to non-layered architectures, recurrent architectures, convolutional architectures, as well as architectures with mixed activation functions. Gradient descent on the error function alone does not converge in general to a balanced state, where every neuron is in balance, even when starting from a balanced state. However, gradient descent on the regularized error function ought to converge to a balanced state, and thus network balance can be used to assess learning progress. The theory is based on two local neuronal operations: scaling which is commutative, and balancing which is not commutative. Given any initial set of weights, when local balancing operations are applied to each neuron in a stochastic manner, global order always emerges through the convergence of the stochastic balancing algorithm to the same unique set of balanced weights. The reason for this is the existence of an underlying strictly convex optimization problem where the relevant variables are constrained to a linear, only architecture-dependent, manifold. Simulations show that balancing neurons prior to learning, or during learning in alternation with gradient descent steps, can improve learning speed and final performance.
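To make the balancing operation concrete, here is a minimal numpy sketch (variable names are illustrative, not from the paper) of $L_2$-balancing a single bias-free ReLU neuron: scaling the incoming weights by a factor lam and the outgoing weights by 1/lam leaves the neuron's function unchanged, and choosing lam as the square root of the ratio of the output to input weight norms equalizes the two $L_2$ costs:

```python
import numpy as np

def balance_relu_neuron(w_in, w_out):
    """L2-balance a bias-free ReLU neuron. Scaling incoming weights by
    lam and outgoing weights by 1/lam preserves the function (ReLU is
    positively homogeneous), and this choice of lam equalizes the two
    L2 costs."""
    lam = np.sqrt(np.linalg.norm(w_out) / np.linalg.norm(w_in))
    return lam * w_in, w_out / lam

rng = np.random.default_rng(0)
w_in, w_out = rng.normal(size=4), rng.normal(size=3)
b_in, b_out = balance_relu_neuron(w_in, w_out)

x = rng.normal(size=4)
before = w_out * max(0.0, w_in @ x)    # neuron output, original weights
after = b_out * max(0.0, b_in @ x)     # neuron output, balanced weights
```

Applying this local operation to every neuron in a stochastic order is the balancing algorithm whose convergence to a unique balanced state the paper analyzes.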

JMLR Journal 2025 Journal Article

ClimSim-Online: A Large Multi-Scale Dataset and Framework for Hybrid Physics-ML Climate Emulation

  • Sungduk Yu
  • Zeyuan Hu
  • Akshay Subramaniam
  • Walter Hannah
  • Liran Peng
  • Jerry Lin
  • Mohamed Aziz Bhouri
  • Ritwik Gupta

Modern climate projections lack adequate spatial and temporal resolution due to computational constraints, leading to inaccuracies in representing critical processes like thunderstorms that occur on the sub-resolution scale. Hybrid methods combining physics with machine learning (ML) offer faster, higher fidelity climate simulations by outsourcing compute-hungry, high-resolution simulations to ML emulators. However, these hybrid physics-ML simulations require domain-specific data and workflows that have been inaccessible to many ML experts. This paper is an extended version of our NeurIPS award-winning ClimSim dataset paper. The ClimSim dataset includes 5.7 billion pairs of multivariate input/output vectors spanning ten years at high temporal resolution, capturing the influence of high-resolution, high-fidelity physics on a host climate simulator's macro-scale state. In this extended version, we introduce a significant new contribution in Section 5, which provides a cross-platform, containerized pipeline to integrate ML models into operational climate simulators for hybrid testing. We also implement various baselines of ML models and hybrid simulators to highlight the ML challenges of building stable, skillful emulators. The data (https://huggingface.co/datasets/LEAP/ClimSim_high-res, also in a low-resolution version at https://huggingface.co/datasets/LEAP/ClimSim_low-res and an aquaplanet version at https://huggingface.co/datasets/LEAP/ClimSim_low-res_aqua-planet) and code (https://leap-stc.github.io/ClimSim and https://github.com/leap-stc/climsim-online) are publicly released to support the development of hybrid physics-ML and high-fidelity climate simulations.

AAAI Conference 2025 Conference Paper

Improving Deep Learning Speed and Performance Through Synaptic Neural Balance

  • Antonios Alexos
  • Ian Domingo
  • Pierre Baldi

We present a theory of synaptic neural balance and show experimentally that synaptic neural balance can improve deep learning speed and accuracy, even in data-scarce environments. Given an additive cost function (regularizer) of the synaptic weights, a neuron is said to be in balance if the total cost of its incoming weights is equal to the total cost of its outgoing weights. For large classes of networks, activation functions, and regularizers, neurons can be balanced fully or partially using scaling operations that do not change their functionality. Furthermore, these balancing operations are associated with a strictly convex optimization problem with a single optimum and can be carried out in any order. In our simulations, we systematically observe that: (1) fully balancing before training results in better performance as compared to several other training approaches; (2) interleaving partial (layer-wise) balancing and stochastic gradient descent steps during training results in faster learning convergence and better overall accuracy (with L1 balancing converging faster than L2 balancing); and (3) when given limited training data, neurally balanced models outperform plain or regularized models, in both feedforward and recurrent networks. In short, the evidence supports that neural balancing operations could be added to the arsenal of methods used to regularize and train neural networks. Furthermore, balancing operations are entirely local and can be carried out asynchronously, making them plausible for biological or neuromorphic systems.
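Finding (2) can be sketched as a training loop that alternates plain SGD steps with function-preserving layer-wise balancing. Everything below (toy data, network size, learning rate, balancing schedule) is an illustrative assumption, not the paper's experimental setup:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(64, 5))                    # toy regression data
y = np.sin(X @ rng.normal(size=5))
W1 = rng.normal(size=(8, 5)) * 0.5              # hidden layer weights
W2 = rng.normal(size=(1, 8)) * 0.5              # output layer weights

def forward(W1, W2, X):
    Z = np.maximum(0.0, X @ W1.T)               # hidden ReLU activations
    return Z, (Z @ W2.T).ravel()

def balance(W1, W2):
    """L2-balance every hidden neuron; function-preserving for ReLU."""
    for j in range(W1.shape[0]):
        lam = np.sqrt(np.linalg.norm(W2[:, j]) / np.linalg.norm(W1[j]))
        W1[j] *= lam
        W2[:, j] /= lam

lr, losses = 0.01, []
for step in range(200):
    Z, pred = forward(W1, W2, X)
    err = pred - y
    losses.append(np.mean(err**2))
    # Gradient steps for the two-layer net (MSE loss, constants folded into lr)
    gW2 = (err[:, None] * Z).mean(axis=0, keepdims=True)
    dZ = err[:, None] * W2 * (Z > 0)
    gW1 = dZ.T @ X / len(X)
    W1 -= lr * gW1
    W2 -= lr * gW2
    if step % 10 == 0:                          # interleave balancing steps
        balance(W1, W2)
```

Because balancing is function-preserving, the loss curve is unaffected at the balancing steps themselves; only the subsequent gradient trajectory changes.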

PRL Workshop 2024 Workshop Paper

Finding Reaction Mechanism Pathways with Deep Reinforcement Learning and Heuristic Search

  • Rojina Panta
  • Mohammadamin Tavakoli
  • Christian Geils
  • Pierre Baldi
  • Forest Agostinelli

Artificial intelligence (AI) has been used to predict the outcomes of chemical reactions. However, most of these reaction predictors are designed to predict the major outcome of overall transformations, skipping the underlying mechanistic steps, so intermediates and byproducts of the reaction cannot be identified. Information about reaction mechanisms enables practitioners to validate the feasibility of a reaction, identify intermediate molecules, improve reaction efficiency, and anticipate the results of similar reactions under various conditions. Despite recent efforts in developing mechanistic reaction predictors, predicting the sequence of mechanistic reactions given the reactants and products remains an open area of research in chemistry. To address this issue, we pose finding a sequence of reaction mechanisms as a pathfinding problem in which the start states are reactants and the goal states are products. We build on the DeepCubeA algorithm (Agostinelli et al. 2019), combined with Hindsight Experience Replay, to learn a heuristic function, represented as a deep neural network (DNN) (Schmidhuber 2015), that generalizes over start and goal states, and we use the learned heuristic function with A* search (Hart, Nilsson, and Raphael 1968) to find the sequence of mechanistic reactions of an overall chemical transformation, from reactants to products.
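The pathfinding formulation can be illustrated with a generic A* routine whose heuristic is conditioned on both the current state and the goal state, as the learned DNN heuristic is. The toy integer domain below merely stands in for molecular states; everything here is an illustrative sketch, not the paper's system:

```python
import heapq

def astar(start, goal, successors, heuristic):
    """Plain A* over unit-cost edges. `heuristic(state, goal)` is
    goal-conditioned, as a learned heuristic that generalizes over
    start and goal states would be."""
    frontier = [(heuristic(start, goal), 0, start, [start])]
    best_g = {}
    while frontier:
        f, g, state, path = heapq.heappop(frontier)
        if state == goal:
            return path
        if state in best_g and best_g[state] <= g:
            continue
        best_g[state] = g
        for nxt in successors(state):
            heapq.heappush(frontier,
                           (g + 1 + heuristic(nxt, goal), g + 1, nxt, path + [nxt]))
    return None

# Toy stand-in for a mechanism space: states are integers, "elementary
# steps" are +1 and *2, and we search for a path from reactant 2 to
# product 11.
successors = lambda s: [s + 1, s * 2]
h = lambda s, goal: 0 if s == goal else 1    # trivially admissible heuristic
path = astar(2, 11, successors, h)
print(path)   # [2, 4, 5, 10, 11]
```

Swapping the trivial heuristic for a trained DNN that estimates the number of remaining elementary steps is the substance of the approach described in the abstract.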

NeurIPS Conference 2024 Conference Paper

Nuclear Fusion Diamond Polishing Dataset

  • Antonios Alexos
  • Junze Liu
  • Shashank Galla
  • Sean Hayes
  • Kshitij Bhardwaj
  • Alexander Schwartz
  • Monika Biener
  • Pierre Baldi

In the Inertial Confinement Fusion (ICF) process, a roughly 2 mm spherical shell made of high-density carbon is used as a target for laser beams, which compress and heat it to the energy levels needed for high fusion yield. These shells are polished meticulously to meet the standards for a fusion shot. However, the polishing involves multiple stages, each taking several hours. To verify that the polishing process is advancing in the right direction, the shell surface roughness can be measured; this measurement, however, is very labor-intensive, time-consuming, and requires a human operator. To help improve the polishing process, we release the first public dataset of raw vibration signals with the corresponding changes in polishing surface roughness. We show that this dataset can be used with a variety of neural-network-based methods to predict the change in polishing surface roughness, eliminating the need for the time-consuming manual process. This is the first dataset of its kind to be released publicly, and its use will allow the operator to make any necessary changes to the ICF polishing process for optimal results. The dataset contains the raw vibration data of multiple polishing runs with their extracted statistical features and the corresponding surface roughness values. Additionally, to generalize the prediction models to different polishing conditions, we apply domain adaptation techniques to improve prediction accuracy for conditions unseen by the trained model. The dataset is available at https://junzeliu.github.io/Diamond-Polishing-Dataset/.

PRL Workshop 2024 Workshop Paper

Q* Search: Heuristic Search with Deep Q-Networks

  • Forest Agostinelli
  • Shahaf S. Shperberg
  • Alexander Shmakov
  • Stephen Marcus McAleer
  • Roy Fox
  • Pierre Baldi

Efficiently solving problems with large action spaces using A* search has been of importance to the artificial intelligence community for decades, because the computation and memory requirements of A* search grow linearly with the size of the action space. This burden becomes even more apparent when A* search uses a heuristic function learned by computationally expensive function approximators, such as deep neural networks. To address this problem, we introduce Q* search, a search algorithm that uses deep Q-networks to guide search, exploiting the fact that the sum of the transition costs and heuristic values of the children of a node can be computed with a single forward pass through a deep Q-network, without explicitly generating those children. This significantly reduces computation time and requires only one node to be generated per iteration. We use Q* search on different domains and action spaces, showing that Q* search suffers from only a small runtime overhead as the size of the action space increases. In addition, our empirical results show Q* search is up to 129 times faster and generates up to 1288 times fewer nodes than A* search. Finally, although obtaining admissible heuristic functions from deep neural networks is an ongoing area of research, we prove that Q* search is guaranteed to find a shortest path provided the heuristic function does not overestimate the sum of the transition cost and the cost-to-go of the state.
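The core trick, scoring all children with one forward pass and generating a child only when its frontier entry is popped, can be sketched on a toy domain. The Q-function below is a constant stand-in, not a trained network, and the whole sketch is illustrative rather than the paper's implementation:

```python
import heapq

def qstar(start, goal, apply_action, qvalues):
    """Q* search sketch: `qvalues(state)` returns, in ONE call, an
    estimated (transition cost + cost-to-go) for EVERY action, so all
    children of a node are scored without being generated. A child
    state is materialized only when its (parent, action) entry is
    popped from the frontier."""
    tick = 0                         # tie-breaker so the heap never compares states
    frontier = [(0.0, 0.0, tick, start, None, [])]
    best_g = {}
    while frontier:
        f, g, _, parent, action, path = heapq.heappop(frontier)
        state = parent if action is None else apply_action(parent, action)
        if state == goal:
            return path + [state]
        if state in best_g and best_g[state] <= g:
            continue
        best_g[state] = g
        q = qvalues(state)           # single "forward pass" scores all actions
        for a, qa in enumerate(q):
            tick += 1
            heapq.heappush(frontier, (g + qa, g + 1.0, tick, state, a, path + [state]))
    return None

# Toy domain: states are integers; action 0 is +1, action 1 is *2.
apply_action = lambda s, a: s + 1 if a == 0 else s * 2
# Constant stand-in for a learned Q-network (unit step cost, flat cost-to-go).
qvalues = lambda s: [1.0, 1.0]
path = qstar(2, 11, apply_action, qvalues)
print(path)   # [2, 4, 5, 10, 11]
```

Contrast with plain A*: the frontier holds (parent, action) pairs scored by Q-values, so only one child is generated per pop regardless of the number of actions.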

ICLR Conference 2024 Conference Paper

Toward Optimal Policy Population Growth in Two-Player Zero-Sum Games

  • Stephen Marcus McAleer
  • JB Lanier
  • Kevin A. Wang
  • Pierre Baldi
  • Tuomas Sandholm
  • Roy Fox

In competitive two-agent environments, deep reinforcement learning (RL) methods like Policy Space Response Oracles (PSRO) often increase exploitability between iterations, which is problematic when training in large games. To address this issue, we introduce anytime double oracle (ADO), an algorithm that ensures exploitability does not increase between iterations, and its approximate extensive-form version, anytime PSRO (APSRO). ADO converges to a Nash equilibrium while iteratively reducing exploitability. However, convergence in these algorithms may require adding all of a game's deterministic policies. To improve this, we propose Self-Play PSRO (SP-PSRO), which incorporates an approximately optimal stochastic policy into the population in each iteration. APSRO and SP-PSRO demonstrate lower exploitability and near-monotonic exploitability reduction in games like Leduc poker and Liar's Dice. Empirically, SP-PSRO often converges much faster than APSRO and PSRO, requiring only a few iterations in many games.

NeurIPS Conference 2023 Conference Paper

AI for Interpretable Chemistry: Predicting Radical Mechanistic Pathways via Contrastive Learning

  • Mohammadamin Tavakoli
  • Pierre Baldi
  • Ann Marie Carlton
  • Yin Ting Chiu
  • Alexander Shmakov
  • David Van Vranken

Deep learning-based reaction predictors have undergone significant architectural evolution. However, their reliance on reactions from the US Patent Office results in a lack of interpretable predictions and limited generalizability to other chemistry domains, such as radical and atmospheric chemistry. To address these challenges, we introduce a new reaction predictor system, RMechRP, that leverages contrastive learning in conjunction with mechanistic pathways, the most interpretable representation of chemical reactions. Specifically designed for radical reactions, RMechRP provides different levels of interpretation of chemical reactions. We develop and train multiple deep-learning models using RMechDB, a public database of radical reactions, to establish the first benchmark for predicting radical reactions. Our results demonstrate the effectiveness of RMechRP in providing accurate and interpretable predictions of radical reactions, and its potential for various applications in atmospheric chemistry.

NeurIPS Conference 2023 Conference Paper

ClimSim: A large multi-scale dataset for hybrid physics-ML climate emulation

  • Sungduk Yu
  • Walter Hannah
  • Liran Peng
  • Jerry Lin
  • Mohamed Aziz Bhouri
  • Ritwik Gupta
  • Björn Lütjens
  • Justus C. Will

Modern climate projections lack adequate spatial and temporal resolution due to computational constraints. A consequence is inaccurate and imprecise predictions of critical processes such as storms. Hybrid methods that combine physics with machine learning (ML) have introduced a new generation of higher fidelity climate simulators that can sidestep Moore's Law by outsourcing compute-hungry, short, high-resolution simulations to ML emulators. However, this hybrid ML-physics simulation approach requires domain-specific treatment and has been inaccessible to ML experts because of a lack of training data and relevant, easy-to-use workflows. We present ClimSim, the largest-ever dataset designed for hybrid ML-physics research. It comprises multi-scale climate simulations, developed by a consortium of climate scientists and ML researchers. It consists of 5.7 billion pairs of multivariate input and output vectors that isolate the influence of locally-nested, high-resolution, high-fidelity physics on a host climate simulator's macro-scale physical state. The dataset is global in coverage, spans multiple years at high sampling frequency, and is designed such that resulting emulators are compatible with downstream coupling into operational climate simulators. We implement a range of deterministic and stochastic regression baselines to highlight the ML challenges and their scoring. The data (https://huggingface.co/datasets/LEAP/ClimSim_high-res) and code (https://leap-stc.github.io/ClimSim) are released openly to support the development of hybrid ML-physics and high-fidelity climate simulations for the benefit of science and society.

NeurIPS Conference 2023 Conference Paper

End-To-End Latent Variational Diffusion Models for Inverse Problems in High Energy Physics

  • Alexander Shmakov
  • Kevin Greif
  • Michael Fenton
  • Aishik Ghosh
  • Pierre Baldi
  • Daniel Whiteson

High-energy collisions at the Large Hadron Collider (LHC) provide valuable insights into open questions in particle physics. However, detector effects must be corrected before measurements can be compared to certain theoretical predictions or measurements from other detectors. Methods to solve this inverse problem of mapping detector observations to theoretical quantities of the underlying collision are essential parts of many physics analyses at the LHC. We investigate and compare various generative deep learning methods to approximate this inverse mapping. We introduce a novel unified architecture, termed latent variational diffusion models, which combines the latent learning of cutting-edge generative art approaches with an end-to-end variational framework. We demonstrate the effectiveness of this approach for reconstructing global distributions of theoretical kinematic quantities, as well as for ensuring the adherence of the learned posterior distributions to known physics constraints. Our unified approach achieves a distribution-free distance to the truth more than 20 times smaller than the non-latent state-of-the-art baseline and 3 times smaller than traditional latent diffusion models.

NeurIPS Conference 2023 Conference Paper

Language Models can Solve Computer Tasks

  • Geunwoo Kim
  • Pierre Baldi
  • Stephen McAleer

Agents capable of carrying out general tasks on a computer can improve efficiency and productivity by automating repetitive tasks and assisting in complex problem-solving. Ideally, such agents should be able to solve new computer tasks presented to them through natural language commands. However, previous approaches to this problem require large amounts of expert demonstrations and task-specific reward functions, both of which are impractical for new tasks. In this work, we show that a pre-trained large language model (LLM) agent can execute computer tasks guided by natural language using a simple prompting scheme where the agent \textbf{R}ecursively \textbf{C}riticizes and \textbf{I}mproves its output (RCI). The RCI approach significantly outperforms existing LLM methods for automating computer tasks and surpasses supervised learning (SL) and reinforcement learning (RL) approaches on the MiniWoB++ benchmark. We compare multiple LLMs and find that RCI with the InstructGPT-3+RLHF LLM is state-of-the-art on MiniWoB++, using only a handful of demonstrations per task rather than tens of thousands, and without a task-specific reward function. Furthermore, we demonstrate RCI prompting's effectiveness in enhancing LLMs' reasoning abilities on a suite of natural language reasoning tasks, outperforming chain of thought (CoT) prompting with external feedback. We find that RCI combined with CoT performs better than either separately. Our code can be found here: https://github.com/posgnu/rci-agent.
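The RCI loop itself is a simple prompting scheme: generate, criticize, improve, repeat. A sketch with a stubbed prompt-to-text callable (the stub and the prompt wording are mine, not the paper's) looks like:

```python
def rci(llm, task, max_rounds=3):
    """Recursively Criticize and Improve (RCI) prompting sketch:
    generate an answer, ask the model to critique it, then ask for an
    improved answer conditioned on the critique. `llm` is any
    prompt -> text callable (a stub here, not a real API)."""
    answer = llm(f"Task: {task}\nAnswer:")
    for _ in range(max_rounds):
        critique = llm(f"Task: {task}\nAnswer: {answer}\n"
                       "Review your previous answer and find problems with it.")
        improved = llm(f"Task: {task}\nAnswer: {answer}\nCritique: {critique}\n"
                       "Based on the problems you found, improve your answer.")
        if improved == answer:      # fixed point: stop early
            break
        answer = improved
    return answer

# Stub "LLM" that improves its draft exactly once, for demonstration only.
def stub_llm(prompt):
    if "improve your answer" in prompt:
        return "better answer"
    if "find problems" in prompt:
        return "the answer is too vague"
    return "draft answer"

result = rci(stub_llm, "demo")
print(result)   # better answer
```

In the paper's computer-agent setting, the same loop is additionally grounded against the task description and the current screen state before an action is executed.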

AIJ Journal 2023 Journal Article

The quarks of attention: Structure and capacity of neural attention building blocks

  • Pierre Baldi
  • Roman Vershynin

Attention plays a fundamental role in both natural and artificial intelligence systems. In deep learning, attention-based neural architectures, such as transformer architectures, are widely used to tackle problems in natural language processing and beyond. Here we investigate the most fundamental building blocks of attention and their computational properties within the standard model of deep learning. We first derive a systematic taxonomy of all possible attention mechanisms within, or as extensions of, the standard model, comprising 18 classes depending on the origin of the attention signal, the target of the attention signal, and whether the interaction is additive or multiplicative. Second, using this taxonomy, we identify three key attention mechanisms: additive activation attention (multiplexing), multiplicative output attention (output gating), and multiplicative synaptic attention (synaptic gating). Output gating and synaptic gating are proper extensions of the standard model, and all current attention-based architectures, including transformers, use either output gating or synaptic gating, or a combination of both. Third, we develop a theory of attention capacity and derive mathematical results about the capacity of basic attention networks comprising linear or polynomial threshold gates. For example, the output gating of a linear threshold gate of n variables by another linear threshold gate of the same n variables has capacity $2n^2(1+o(1))$, achieving the maximal doubling of the capacity for a doubling of the number of parameters. Perhaps surprisingly, multiplexing attention is used in the proofs of these results. Synaptic and output gating provide computationally efficient extensions of the standard model enabling sparse quadratic activation functions. They can also be viewed as primitives for collapsing several layers of processing in the standard model into shallow compact representations.
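The three key mechanisms can be written down directly. A small numpy sketch (notation mine) of output gating, synaptic gating, and additive multiplexing for a pair of logistic units:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=6)                 # shared input
w, v = rng.normal(size=6), rng.normal(size=6)

def sigma(t):
    return 1.0 / (1.0 + np.exp(-t))    # logistic activation

# Output gating: the attention unit multiplies the OUTPUT of the gated unit.
output_gated = sigma(w @ x) * sigma(v @ x)

# Synaptic gating: the attention unit multiplies the SYNAPTIC WEIGHTS of the
# gated unit before its activation is computed.
gate = sigma(v @ x)
synaptic_gated = sigma((gate * w) @ x)

# Additive activation attention (multiplexing): the attention signal is added
# to the gated unit's pre-activation instead of multiplying.
multiplexed = sigma(w @ x + v @ x)
```

In this taxonomy, transformer self-attention multiplies softmax scores into the outputs of the value units, i.e. it is an instance of output gating.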

IJCAI Conference 2021 Conference Paper

Deep Bucket Elimination

  • Yasaman Razeghi
  • Kalev Kask
  • Yadong Lu
  • Pierre Baldi
  • Sakshi Agarwal
  • Rina Dechter

Bucket Elimination (BE) is a universal inference scheme that can solve most tasks over probabilistic and deterministic graphical models exactly. However, it often requires exponentially high levels of memory (in the induced-width) preventing its execution. In the spirit of exploiting Deep Learning for inference tasks, in this paper, we will use neural networks to approximate BE. The resulting Deep Bucket Elimination (DBE) algorithm is developed for computing the partition function. We provide a proof-of-concept empirically using instances from several different benchmarks, showing that DBE can be a more accurate approximation than current state-of-the-art approaches for approximating BE (e.g., the mini-bucket schemes), especially when problems are sufficiently hard.
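Bucket elimination itself is easy to state on a small example. The sketch below computes the partition function of a tiny binary chain by summing out one variable (bucket) at a time and checks it against brute force; the model and factor values are illustrative, not from the paper:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
# Chain model over binary X1 - X2 - X3 with pairwise factors f12, f23.
f12, f23 = rng.random((2, 2)), rng.random((2, 2))

# Bucket elimination: eliminate X3, then X2, summing out one bucket at a time.
m3 = f23.sum(axis=1)               # message to X2: sum_x3 f23(x2, x3)
m2 = f12 @ m3                      # message to X1: sum_x2 f12(x1, x2) * m3(x2)
Z_be = m2.sum()                    # final sum over x1

# Brute-force partition function for comparison.
Z_bf = sum(f12[x1, x2] * f23[x2, x3]
           for x1, x2, x3 in itertools.product([0, 1], repeat=3))
```

On a chain the messages stay tiny; on high-induced-width graphs they blow up exponentially, which is exactly the table that DBE replaces with a trained neural network.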

PRL Workshop 2021 Workshop Paper

Obtaining Approximately Admissible Heuristic Functions through Deep Reinforcement Learning and A* Search

  • Forest Agostinelli
  • Stephen McAleer
  • Alexander Shmakov
  • Roy Fox
  • Marco Valtorta
  • Biplav Srivastava
  • Pierre Baldi

Deep reinforcement learning has been shown to be able to train deep neural networks to implement effective heuristic functions that can be used with A* search to solve problems with large state spaces. However, these learned heuristic functions are not guaranteed to be admissible. We introduce approximately admissible conversion, an algorithm that can convert any inadmissible heuristic function into a heuristic function that is admissible in the vast majority of cases, with no domain-specific heuristic information. We apply approximately admissible conversion to heuristic functions parameterized by deep neural networks and show that these heuristic functions can be used to find optimal solutions, or bounded suboptimal solutions, even when doing a batched version of A* search. We test our method on the 15-puzzle and 24-puzzle and obtain a heuristic function that is empirically admissible over 99.99% of the time and that finds optimal solutions for 100% of all test configurations. To the best of our knowledge, this is the first demonstration that approximately admissible heuristics can be obtained using deep neural networks in a domain-independent fashion.

Obtaining an admissible heuristic function often requires domain-specific knowledge. For example, pattern databases (PDBs) (Culberson and Schaeffer 1998) have been successful at finding optimal solutions to puzzles such as the Rubik's cube (Korf 1997), 15-puzzle, and 24-puzzle (Korf and Felner 2002; Felner, Korf, and Hanan 2004). However, ensuring that these PDBs produce admissible heuristics requires knowledge about how the puzzle pieces interact. There has been previous research on using deep neural networks to learn heuristic functions (Chen and Wei 2011; Wang et al. 2019; Ferber, Helmert, and Hoffmann 2020), including the DeepCubeA algorithm (McAleer et al. 2019; Agostinelli et al. 2019), which used deep reinforcement learning and weighted A* search (Pohl 1970) to solve the aforementioned puzzles. However, the heuristic functions produced by DeepCubeA are not admissible.

NeurIPS Conference 2021 Conference Paper

XDO: A Double Oracle Algorithm for Extensive-Form Games

  • Stephen McAleer
  • JB Lanier
  • Kevin A Wang
  • Pierre Baldi
  • Roy Fox

Policy Space Response Oracles (PSRO) is a reinforcement learning (RL) algorithm for two-player zero-sum games that has been empirically shown to find approximate Nash equilibria in large games. Although PSRO is guaranteed to converge to an approximate Nash equilibrium and can handle continuous actions, it may take an exponential number of iterations as the number of information states (infostates) grows. We propose Extensive-Form Double Oracle (XDO), an extensive-form double oracle algorithm for two-player zero-sum games that is guaranteed to converge to an approximate Nash equilibrium linearly in the number of infostates. Unlike PSRO, which mixes best responses at the root of the game, XDO mixes best responses at every infostate. We also introduce Neural XDO (NXDO), where the best response is learned through deep RL. In tabular experiments on Leduc poker, we find that XDO achieves an approximate Nash equilibrium in a number of iterations an order of magnitude smaller than PSRO. Experiments on a modified Leduc poker game and Oshi-Zumo show that tabular XDO achieves a lower exploitability than CFR with the same amount of computation. We also find that NXDO outperforms PSRO and NFSP on a sequential multidimensional continuous-action game. NXDO is the first deep RL method that can find an approximate Nash equilibrium in high-dimensional continuous-action sequential games.

NeurIPS Conference 2020 Conference Paper

Pipeline PSRO: A Scalable Approach for Finding Approximate Nash Equilibria in Large Games

  • Stephen McAleer
  • JB Lanier
  • Roy Fox
  • Pierre Baldi

Finding approximate Nash equilibria in zero-sum imperfect-information games is challenging when the number of information states is large. Policy Space Response Oracles (PSRO) is a deep reinforcement learning algorithm grounded in game theory that is guaranteed to converge to an approximate Nash equilibrium. However, PSRO requires training a reinforcement learning policy at each iteration, making it too slow for large games. We show through counterexamples and experiments that DCH and Rectified PSRO, two existing approaches to scaling up PSRO, fail to converge even in small games. We introduce Pipeline PSRO (P2SRO), the first scalable PSRO-based method for finding approximate Nash equilibria in large zero-sum imperfect-information games. P2SRO is able to parallelize PSRO with convergence guarantees by maintaining a hierarchical pipeline of reinforcement learning workers, each training against the policies generated by lower levels in the hierarchy. We show that unlike existing methods, P2SRO converges to an approximate Nash equilibrium, and does so faster as the number of parallel workers increases, across a variety of imperfect information games. We also introduce an open-source environment for Barrage Stratego, a variant of Stratego with an approximate game tree complexity of 10^50. P2SRO is able to achieve state-of-the-art performance on Barrage Stratego and beats all existing bots. Experiment code is available at https://github.com/JBLanier/pipeline-psro.
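The double-oracle loop at the heart of PSRO can be sketched in tabular form on rock-paper-scissors. Here fictitious play stands in for the meta-solver and exact best responses stand in for the RL oracle; all names and hyperparameters are illustrative simplifications, not the paper's algorithm:

```python
import numpy as np

# Row player's payoffs for rock-paper-scissors (zero-sum).
A = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]], dtype=float)

def fictitious_play(M, iters=5000):
    """Approximate Nash of a zero-sum matrix game; a cheap stand-in
    for the meta-solver used inside PSRO-style algorithms."""
    rc, cc = np.zeros(M.shape[0]), np.zeros(M.shape[1])
    rc[0] += 1
    cc[0] += 1
    for _ in range(iters):
        rc[np.argmax(M @ (cc / cc.sum()))] += 1   # row best response
        cc[np.argmin((rc / rc.sum()) @ M)] += 1   # column best response
    return rc / rc.sum(), cc / cc.sum()

rows, cols = [0], [0]                             # restricted populations
for _ in range(5):
    x, y = fictitious_play(A[np.ix_(rows, cols)])
    xf, yf = np.zeros(3), np.zeros(3)             # lift to the full game
    xf[rows], yf[cols] = x, y
    br_row = int(np.argmax(A @ yf))               # oracle best responses
    br_col = int(np.argmin(xf @ A))
    if br_row in rows and br_col in cols:
        break                                     # no improving response: done
    if br_row not in rows:
        rows.append(br_row)
    if br_col not in cols:
        cols.append(br_col)

value = float(xf @ A @ yf)                        # approx. game value (0 for RPS)
```

PSRO replaces the exact best-response step with a reinforcement learning policy trained against the current meta-mixture; P2SRO's contribution is running many such best-response workers in a parallel pipeline without losing convergence.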

AAAI Conference 2019 Short Paper

Efficient Neutrino Oscillation Parameter Inference with Gaussian Process

  • Lingge Li
  • Nitish Nayak
  • Jianming Bian
  • Pierre Baldi

Many experiments have been set up to measure the parameters governing the neutrino oscillation probabilities accurately, with implications for the fundamental structure of the universe. Very often, this involves inferences from tiny samples of data which have complicated dependencies on multiple oscillation parameters simultaneously. This is typically carried out using the unified approach of Feldman and Cousins, which is very computationally expensive, on the order of tens of millions of CPU hours. In this work, we propose an iterative method using Gaussian processes to efficiently find a confidence contour for the oscillation parameters and show that it produces the same results at a fraction of the computational cost.

IJCAI Conference 2019 Conference Paper

Learning in the Machine: Random Backpropagation and the Deep Learning Channel (Extended Abstract)

  • Pierre Baldi
  • Peter Sadowski
  • Zhiqin Lu

Random backpropagation (RBP) is a variant of the backpropagation algorithm for training neural networks, where the transposes of the forward matrices are replaced by fixed random matrices in the calculation of the weight updates. It is remarkable both because of its effectiveness, in spite of using random matrices to communicate error information, and because it completely removes the requirement of maintaining symmetric weights in a physical neural system. To better understand RBP, we compare different algorithms in terms of the information available locally to each neuron. In the process, we derive several alternatives to RBP, including skipped RBP (SRBP), adaptive RBP (ARBP), and sparse RBP, and study their behavior through simulations. These simulations show that many variants are also robust deep learning algorithms, but that the derivative of the transfer function is important in the learning rule. Finally, we prove several mathematical results including the convergence to fixed points of linear chains of arbitrary length, the convergence to fixed points of linear autoencoders with decorrelated data, the long-term existence of solutions for linear systems with a single hidden layer and convergence in special cases, and the convergence to fixed points of non-linear chains, when the derivative of the activation functions is included.

NeurIPS Conference 2019 Conference Paper

Modeling Dynamic Functional Connectivity with Latent Factor Gaussian Processes

  • Lingge Li
  • Dustin Pluta
  • Babak Shahbaba
  • Norbert Fortin
  • Hernando Ombao
  • Pierre Baldi

Dynamic functional connectivity, as measured by the time-varying covariance of neurological signals, is believed to play an important role in many aspects of cognition. While many methods have been proposed, reliably establishing the presence and characteristics of brain connectivity is challenging due to the high dimensionality and noisiness of neuroimaging data. We present a latent factor Gaussian process model which addresses these challenges by learning a parsimonious representation of connectivity dynamics. The proposed model naturally allows for inference and visualization of the time-varying connectivity. As an illustration of the scientific utility of the model, application to a data set of rat local field potential activity recorded during a complex non-spatial memory task provides evidence of stimuli differentiation.

AIJ Journal 2018 Journal Article

Learning in the machine: Random backpropagation and the deep learning channel

  • Pierre Baldi
  • Peter Sadowski
  • Zhiqin Lu

Random backpropagation (RBP) is a variant of the backpropagation algorithm for training neural networks, where the transposes of the forward matrices are replaced by fixed random matrices in the calculation of the weight updates. It is remarkable both because of its effectiveness, in spite of using random matrices to communicate error information, and because it completely removes the taxing requirement of maintaining symmetric weights in a physical neural system. To better understand random backpropagation, we first connect it to the notions of local learning and learning channels. Through this connection, we derive several alternatives to RBP, including skipped RBP (SRBP), adaptive RBP (ARBP), sparse RBP, and their combinations (e.g., ASRBP), and analyze their computational complexity. We then study their behavior through simulations using the MNIST and CIFAR-10 benchmark datasets. These simulations show that most of these variants work robustly, almost as well as backpropagation, and that multiplication by the derivatives of the activation functions is important. As a follow-up, we also study the low end of the number of bits required to communicate error information over the learning channel. We then provide partial intuitive explanations for some of the remarkable properties of RBP and its variations. Finally, we prove several mathematical results for RBP and its variants including: (1) the convergence to optimal fixed points for linear chains of arbitrary length; (2) convergence to fixed points for linear autoencoders with decorrelated data; (3) long-term existence of solutions for linear systems with a single hidden layer, and their convergence in special cases; and (4) convergence to fixed points of non-linear chains, when the derivative of the activation functions is included.
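The core RBP update is a one-line change to backpropagation: route the error backward through a fixed random matrix instead of the transpose of the forward weights. A minimal sketch on a linear network (variable names and the toy teacher task are mine; the paper's experiments use MNIST/CIFAR-scale networks):

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 8, 6, 4
A_true = rng.normal(size=(n_out, n_in))          # linear teacher to regress
X = rng.normal(size=(256, n_in))
Y = X @ A_true.T

W1 = rng.normal(size=(n_hid, n_in)) * 0.1
W2 = rng.normal(size=(n_out, n_hid)) * 0.1
R = rng.normal(size=(n_hid, n_out)) * 0.1        # fixed random feedback matrix

lr, losses = 0.02, []
for _ in range(300):
    H = X @ W1.T                                  # linear hidden layer
    P = H @ W2.T
    E = P - Y
    losses.append(np.mean(E**2))
    # Backprop would route the error through W2's transpose; RBP uses
    # the fixed random matrix R instead:
    dH = E @ R.T                                  # not E @ W2
    W2 -= lr * E.T @ H / len(X)
    W1 -= lr * dH.T @ X / len(X)
```

Despite the random feedback path, the training error decreases, which is the empirical puzzle that the paper's fixed-point analysis addresses for linear chains.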

NeurIPS Conference 2018 Conference Paper

On Neuronal Capacity

  • Pierre Baldi
  • Roman Vershynin

We define the capacity of a learning machine to be the logarithm of the number (or volume) of the functions it can implement. We review known results, and derive new results, estimating the capacity of several neuronal models: linear and polynomial threshold gates, linear and polynomial threshold gates with constrained weights (binary weights, positive weights), and ReLU neurons. We also derive capacity estimates and bounds for fully recurrent networks and layered feedforward networks.
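The capacity definition can be checked by brute force in a tiny case (a sketch, not from the paper): a single linear threshold gate on n = 2 Boolean inputs implements 14 of the 16 possible functions (all but XOR and XNOR), so its capacity is log2(14) ≈ 3.8 bits.

```python
import itertools, math, random

# Brute-force illustration of the capacity definition: count the distinct
# Boolean functions of n = 2 inputs that a single linear threshold gate
# sign(w1*x1 + w2*x2 + b) can implement, by sampling random weights.
# The capacity is the log of that count.

random.seed(1)
inputs = list(itertools.product([0, 1], repeat=2))   # the 4 input points
functions = set()
for _ in range(100_000):
    w1, w2, b = (random.gauss(0.0, 1.0) for _ in range(3))
    functions.add(tuple(int(w1 * x1 + w2 * x2 + b > 0) for x1, x2 in inputs))

# 14 of the 16 Boolean functions are realizable (all but XOR and XNOR)
print(len(functions), math.log2(len(functions)))
```

Random sampling is a simple (not exhaustive) enumeration; each realizable function occupies a positive-volume region of weight space, so all 14 appear with enough samples.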

NeurIPS Conference 2014 Conference Paper

Searching for Higgs Boson Decay Modes with Deep Learning

  • Peter Sadowski
  • Daniel Whiteson
  • Pierre Baldi

Particle colliders enable us to probe the fundamental nature of matter by observing exotic particles produced by high-energy collisions. Because the experimental measurements from these collisions are necessarily incomplete and imprecise, machine learning algorithms play a major role in the analysis of experimental data. The high-energy physics community typically relies on standardized machine learning software packages for this analysis, and devotes substantial effort towards improving statistical power by hand crafting high-level features derived from the raw collider measurements. In this paper, we train artificial neural networks to detect the decay of the Higgs boson to tau leptons on a dataset of 82 million simulated collision events. We demonstrate that deep neural network architectures are particularly well-suited for this task with the ability to automatically discover high-level features from the data and increase discovery significance.

NeurIPS Conference 2013 Conference Paper

Understanding Dropout

  • Pierre Baldi
  • Peter Sadowski

Dropout is a relatively new algorithm for training neural networks which relies on stochastically "dropping out" neurons during training in order to avoid the co-adaptation of feature detectors. We introduce a general formalism for studying dropout on either units or connections, with arbitrary probability values, and use it to analyze the averaging and regularizing properties of dropout in both linear and non-linear networks. For deep neural networks, the averaging properties of dropout are characterized by three recursive equations, including the approximation of expectations by normalized weighted geometric means. We provide estimates and bounds for these approximations and corroborate the results with simulations. We also show in simple cases how dropout performs stochastic gradient descent on a regularized error function.
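The normalized-weighted-geometric-mean (NWGM) property mentioned in the abstract can be checked exactly in a toy case (weights, inputs, and dropout probability below are arbitrary): for a single logistic unit, the NWGM of the outputs over all dropout sub-networks equals the logistic of the expected input.

```python
import itertools, math

# Verify the dropout-averaging identity for a logistic unit:
# NWGM(sigmoid(S)) over all dropout masks == sigmoid(E[S]),
# where NWGM = G / (G + G'), G the weighted geometric mean of the
# outputs and G' that of their complements.

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

w = [0.5, -1.2, 2.0]   # arbitrary toy weights
x = [1.0, 0.8, -0.3]   # arbitrary toy inputs
p = 0.5                # probability of keeping each input

log_g, log_g1, mean_s = 0.0, 0.0, 0.0
for mask in itertools.product([0, 1], repeat=3):
    prob = p ** sum(mask) * (1 - p) ** (3 - sum(mask))
    s = sum(m * wi * xi for m, wi, xi in zip(mask, w, x))
    log_g  += prob * math.log(sigmoid(s))
    log_g1 += prob * math.log(1 - sigmoid(s))
    mean_s += prob * s

g, g1 = math.exp(log_g), math.exp(log_g1)
nwgm = g / (g + g1)
print(nwgm, sigmoid(mean_s))   # the two values coincide
```

The identity is exact for the logistic function because log σ(s) - log(1 - σ(s)) = s, which is why the geometric mean, rather than the arithmetic mean, appears in the recursive equations.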

NeurIPS Conference 2012 Conference Paper

Deep Spatio-Temporal Architectures and Learning for Protein Structure Prediction

  • Pietro Lena
  • Ken Nagata
  • Pierre Baldi

Residue-residue contact prediction is a fundamental problem in protein structure prediction. However, despite considerable research efforts, contact prediction methods are still largely unreliable. Here we introduce a novel deep machine-learning architecture which consists of a multidimensional stack of learning modules. For contact prediction, the idea is implemented as a three-dimensional stack of Neural Networks NN^k_{ij}, where i and j index the spatial coordinates of the contact map and k indexes "time". The temporal dimension is introduced to capture the fact that protein folding is not an instantaneous process, but rather a progressive refinement. Networks at level k in the stack can be trained in supervised fashion to refine the predictions produced by the previous level, hence addressing the problem of vanishing gradients, typical of deep architectures. Increased accuracy and generalization capabilities of this approach are established by rigorous comparison with other classical machine learning approaches for contact prediction. The deep approach leads to an accuracy for difficult long-range contacts of about 30%, roughly 10% above the state-of-the-art. Many variations in the architectures and the training algorithms are possible, leaving room for further improvements. Furthermore, the approach is applicable to other problems with strong underlying spatial and temporal components.

NeurIPS Conference 2011 Conference Paper

A Machine Learning Approach to Predict Chemical Reactions

  • Matthew Kayala
  • Pierre Baldi

Being able to predict the course of arbitrary chemical reactions is essential to the theory and applications of organic chemistry. Previous approaches are not high-throughput, are not generalizable or scalable, or lack sufficient data to be effective. We describe single mechanistic reactions as concerted electron movements from an electron orbital source to an electron orbital sink. We use an existing rule-based expert system to derive a dataset consisting of 2,989 productive mechanistic steps and 6.14 million non-productive mechanistic steps. We then pose identifying productive mechanistic steps as a ranking problem: rank potential orbital interactions such that the top ranked interactions yield the major products. The machine learning implementation follows a two-stage approach, in which we first train atom level reactivity filters to prune 94.0% of non-productive reactions with less than a 0.1% false negative rate. Then, we train an ensemble of ranking models on pairs of interacting orbitals to learn a relative productivity function over single mechanistic reactions in a given system. Without the use of explicit transformation patterns, the ensemble perfectly ranks the productive mechanisms at the top 89.1% of the time, rising to 99.9% of the time when top ranked lists with at most four non-productive reactions are considered. The final system allows multi-step reaction prediction. Furthermore, it is generalizable, making reasonable predictions over reactants and conditions which the rule-based expert system does not handle.

NeurIPS Conference 2007 Conference Paper

Mining Internet-Scale Software Repositories

  • Erik Linstead
  • Paul Rigor
  • Sushil Bajracharya
  • Cristina Lopes
  • Pierre Baldi

Large repositories of source code create new challenges and opportunities for statistical machine learning. Here we first develop an infrastructure for the automated crawling, parsing, and database storage of open source software. The infrastructure allows us to gather Internet-scale source code. For instance, in one experiment, we gather 4,632 Java projects from SourceForge and Apache totaling over 38 million lines of code from 9,250 developers. Simple statistical analyses of the data first reveal robust power-law behavior for package, SLOC, and method call distributions. We then develop and apply unsupervised author-topic, probabilistic models to automatically discover the topics embedded in the code and extract topic-word and author-topic distributions. In addition to serving as a convenient summary for program function and developer activities, these and other related distributions provide a statistical and information-theoretic basis for quantifying and analyzing developer similarity and competence, topic scattering, and document tangling, with direct applications to software engineering. Finally, by combining software textual content with structural information captured by our CodeRank approach, we are able to significantly improve software retrieval performance, increasing the AUC metric to 0.86, roughly 10-30% better than previous approaches based on text alone.

NeurIPS Conference 2006 Conference Paper

A Scalable Machine Learning Approach to Go

  • Lin Wu
  • Pierre Baldi

Go is an ancient board game that poses unique opportunities and challenges for AI and machine learning. Here we develop a machine learning approach to Go, and related board games, focusing primarily on the problem of learning a good evaluation function in a scalable way. Scalability is essential at multiple levels, from the library of local tactical patterns, to the integration of patterns across the board, to the size of the board itself. The system we propose is capable of automatically learning the propensity of local patterns from a library of games. Propensity and other local tactical information are fed into a recursive neural network, derived from a Bayesian network architecture. The network integrates local information across the board and produces local outputs that represent local territory ownership probabilities. The aggregation of these probabilities provides an effective strategic evaluation function that is an estimate of the expected area at the end (or at other stages) of the game. Local area targets for training can be derived from datasets of human games. A system trained using only 9 × 9 amateur game data performs surprisingly well on a test set derived from 19 × 19 professional game data. Possible directions for further improvements are briefly discussed.

NeurIPS Conference 2005 Conference Paper

Bayesian Surprise Attracts Human Attention

  • Laurent Itti
  • Pierre Baldi

The concept of surprise is central to sensory processing, adaptation, learning, and attention. Yet, no widely-accepted mathematical theory currently exists to quantitatively characterize surprise elicited by a stimulus or event, for observers that range from single neurons to complex natural or engineered systems. We describe a formal Bayesian definition of surprise that is the only consistent formulation under minimal axiomatic assumptions. Surprise quantifies how data affects a natural or artificial observer, by measuring the difference between posterior and prior beliefs of the observer. Using this framework we measure the extent to which humans direct their gaze towards surprising items while watching television and video games. We find that subjects are strongly attracted towards surprising locations, with 72% of all human gaze shifts directed towards locations more surprising than the average, a figure which rises to 84% when considering only gaze targets simultaneously selected by all subjects. The resulting theory of surprise is applicable across different spatio-temporal scales, modalities, and levels of abstraction. Life is full of surprises, ranging from a great Christmas gift or a new magic trick, to wardrobe malfunctions, reckless drivers, terrorist attacks, and tsunami waves. Key to survival is our ability to rapidly attend to, identify, and learn from surprising events, to decide on present and future courses of action [1]. Yet, little theoretical and computational understanding exists of the very essence of surprise, as evidenced by the absence from our everyday vocabulary of a quantitative unit of surprise: Qualities such as the "wow factor" have remained vague and elusive to mathematical analysis. Informal correlates of surprise exist at nearly all stages of neural processing. In sensory neuroscience, it has been suggested that only the unexpected at one stage is transmitted to the next stage [2]. 
Hence, sensory cortex may have evolved to adapt to, to predict, and to quiet down the expected statistical regularities of the world [3, 4, 5, 6], focusing instead on events that are unpredictable or surprising. Electrophysiological evidence for this early sensory emphasis onto surprising stimuli exists from studies of adaptation in visual [7, 8, 4, 9], olfactory [10, 11], and auditory cortices [12], subcortical structures like the LGN [13], and even retinal ganglion cells [14, 15] and cochlear hair cells [16]: neural response greatly attenuates with repeated or prolonged exposure to an initially novel stimulus. Surprise and novelty are also central to learning and memory formation [1], to the point that surprise is believed to be a necessary trigger for associative learning [17, 18], as supported by mounting evidence for a role of the hippocampus as a novelty detector [19, 20, 21]. Finally, seeking novelty is a well-identified human character trait, with possible association with the dopamine D4 receptor gene [22, 23, 24]. In the Bayesian framework, we develop the only consistent theory of surprise, in terms of the difference between the posterior and prior distributions of beliefs of an observer over the available class of models or hypotheses about the world. We show that this definition derived from first principles presents key advantages over more ad-hoc formulations, typically relying on detecting outlier stimuli. Armed with this new framework, we provide direct experimental evidence that surprise best characterizes what attracts human gaze in large amounts of natural video stimuli. We here extend a recent pilot study [25], adding more comprehensive theory, large-scale human data collection, and additional analysis.
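The definition of surprise as the difference between posterior and prior beliefs can be made concrete in a toy setting (the model class and numbers below are illustrative, not from the paper): with beliefs over a small set of models, surprise after an observation is the KL divergence from prior to posterior.

```python
import math

# Bayesian surprise sketch: an observer holds beliefs over two competing
# models of a coin; each observation triggers a Bayes update, and the
# surprise of that observation is KL(posterior || prior) in bits.

models = {"biased_heads": 0.8, "biased_tails": 0.2}  # P(heads | model)
prior = {"biased_heads": 0.5, "biased_tails": 0.5}

def update(beliefs, outcome):
    post = {m: beliefs[m] * (p if outcome == "H" else 1 - p)
            for m, p in models.items()}
    z = sum(post.values())
    return {m: v / z for m, v in post.items()}

def surprise(post, pre):
    return sum(post[m] * math.log2(post[m] / pre[m]) for m in post)

post = update(prior, "H")
s1 = surprise(post, prior)     # first head shifts beliefs: surprise > 0
post2 = update(post, "H")
s2 = surprise(post2, post)     # a confirming second head surprises less
print(s1, s2)
```

Note how the same physical event carries different surprise depending on the observer's current beliefs, which is exactly what distinguishes this definition from outlier-based formulations.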

NeurIPS Conference 2004 Conference Paper

Large-Scale Prediction of Disulphide Bond Connectivity

  • Jianlin Cheng
  • Alessandro Vullo
  • Pierre Baldi

The formation of disulphide bridges among cysteines is an important feature of protein structures. Here we develop new methods for the prediction of disulphide bond connectivity. We first build a large curated data set of proteins containing disulphide bridges and then use 2-Dimensional Recursive Neural Networks to predict bonding probabilities between cysteine pairs. These probabilities in turn lead to a weighted graph matching problem that can be addressed efficiently. We show how the method consistently achieves better results than previous approaches on the same validation data. In addition, the method can easily cope with chains with arbitrary numbers of bonded cysteines. Therefore, it overcomes one of the major limitations of previous approaches restricting predictions to chains containing no more than 10 oxidized cysteines. The method can be applied both to situations where the bonded state of each cysteine is known or unknown; in the latter case, the bonded state can be predicted with 85% precision and 90% recall. The method also yields an estimate for the total number of disulphide bridges in each chain.
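The final step described in the abstract, turning pairwise bonding probabilities into a connectivity pattern via maximum-weight matching, can be sketched by brute force (the probabilities below are made up for illustration; real chains have few enough bonded cysteines that exhaustive enumeration is feasible):

```python
import itertools

# Pick the disulphide connectivity pattern as the maximum-weight perfect
# matching over predicted pairwise bonding probabilities.

def pairings(items):
    """Yield all perfect matchings of an even-sized list."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        for sub in pairings(rest[:i] + rest[i + 1:]):
            yield [(first, partner)] + sub

# Hypothetical bonding probabilities for 4 bonded cysteines (0..3)
prob = {(0, 1): 0.2, (0, 2): 0.7, (0, 3): 0.1,
        (1, 2): 0.1, (1, 3): 0.8, (2, 3): 0.3}

best = max(pairings([0, 1, 2, 3]),
           key=lambda m: sum(prob[tuple(sorted(e))] for e in m))
print(best)   # [(0, 2), (1, 3)]: total weight 0.7 + 0.8 = 1.5
```

For larger numbers of cysteines a polynomial-time maximum-weight matching algorithm would replace the brute-force enumeration, which is presumably what "addressed efficiently" refers to.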

JMLR Journal 2003 Journal Article

The Principled Design of Large-Scale Recursive Neural Network Architectures--DAG-RNNs and the Protein Structure Prediction Problem

  • Pierre Baldi
  • Gianluca Pollastri

We describe a general methodology for the design of large-scale recursive neural network architectures (DAG-RNNs) which comprises three fundamental steps: (1) representation of a given domain using suitable directed acyclic graphs (DAGs) to connect visible and hidden node variables; (2) parameterization of the relationship between each variable and its parent variables by feedforward neural networks; and (3) application of weight-sharing within appropriate subsets of DAG connections to capture stationarity and control model complexity. Here we use these principles to derive several specific classes of DAG-RNN architectures based on lattices, trees, and other structured graphs. These architectures can process a wide range of data structures with variable sizes and dimensions. While the overall resulting models remain probabilistic, the internal deterministic dynamics allows efficient propagation of information, as well as training by gradient descent, in order to tackle large-scale problems. These methods are used here to derive state-of-the-art predictors for protein structural features such as secondary structure (1D) and both fine- and coarse-grained contact maps (2D). Extensions, relationships to graphical models, and implications for the design of neural architectures are briefly discussed. The protein prediction servers are available over the Web at: www.igb.uci.edu/tools.htm.

NeurIPS Conference 2002 Conference Paper

Prediction of Protein Topologies Using Generalized IOHMMs and RNNs

  • Gianluca Pollastri
  • Pierre Baldi
  • Alessandro Vullo
  • Paolo Frasconi

We develop and test new machine learning methods for the prediction of topological representations of protein structures in the form of coarse- or fine-grained contact or distance maps that are translation and rotation invariant. The methods are based on generalized input-output hidden Markov models (GIOHMMs) and generalized recursive neural networks (GRNNs). The methods are used to predict topology directly in the fine-grained case and, in the coarse-grained case, indirectly by first learning how to score candidate graphs and then using the scoring function to search the space of possible configurations. Computer simulations show that the predictors achieve state-of-the-art performance. 1 Introduction: Protein Topology Prediction Predicting the 3D structure of protein chains from the linear sequence of amino acids is a fundamental open problem in computational molecular biology [1]. Any approach to the problem must deal with the basic fact that protein structures are translation and rotation invariant. To address this invariance, we have proposed a machine learning approach to protein structure prediction [4] based on the prediction of topological representations of proteins, in the form of contact or distance maps. The contact or distance map is a 2D representation of neighborhood relationships consisting of an adjacency matrix at some distance cutoff (typically in the range of 6 to 12 Å), or a matrix of pairwise Euclidean distances. Fine-grained maps are derived at the amino acid or even atomic level. Coarse maps are obtained by looking at secondary structure elements, such as helices, and the distance between their centers of gravity or, as in the simulations below, the minimal distances between their Cα atoms. Reasonable methods for reconstructing 3D coordinates from contact/distance maps have been developed in the NMR literature and elsewhere

NeurIPS Conference 1995 Conference Paper

Universal Approximation and Learning of Trajectories Using Oscillators

  • Pierre Baldi
  • Kurt Hornik

Natural and artificial neural circuits must be capable of traversing specific state space trajectories. A natural approach to this problem is to learn the relevant trajectories from examples. Unfortunately, gradient descent learning of complex trajectories in amorphous networks is unsuccessful. We suggest a possible approach where trajectories are realized by combining simple oscillators, in various modular ways. We contrast two regimes of fast and slow oscillations. In all cases, we show that banks of oscillators with bounded frequencies have universal approximation properties. Open questions are also discussed briefly. 1 INTRODUCTION: TRAJECTORY LEARNING The design of artificial neural systems, in robotics applications and others, often leads to the problem of constructing a recurrent neural network capable of producing a particular trajectory, in the state space of its visible units. Throughout evolution, biological neural systems, such as central pattern generators, have also been faced with similar challenges. A natural approach to tackle this problem is to try to "learn" the desired trajectory, for instance through a process of trial and error and subsequent optimization. Unfortunately, gradient descent learning of complex trajectories in amorphous networks is unsuccessful. Here, we suggest a possible approach where trajectories are realized, in a modular and hierarchical fashion, by combining simple oscillators. In particular, we show that banks of oscillators have universal approximation properties. Also with the Jet Propulsion Laboratory, California Institute of Technology. To begin with, we can restrict ourselves to the simple case of a network with one visible linear unit and consider the problem of adjusting the network parameters in a way that the output unit activity u(t) is equal to a target function I(t), over an interval of time [0, T]. 
The hidden units of the network may be non-linear and satisfy, for instance, one of the usual neural network charging equations such as $du_i/dt = -u_i/\tau_i + \sum_j w_{ij} f_j(u_j(t - \tau_{ij}))$.
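The universal-approximation claim for banks of oscillators can be illustrated numerically (a sketch, not the paper's construction): approximating a smooth periodic target trajectory as a weighted sum of K sinusoidal oscillators with bounded frequencies, i.e. a truncated Fourier series fitted on a grid.

```python
import math

# Fit a bank of K sinusoidal oscillators to a periodic target trajectory
# by computing Fourier coefficients with numerical inner products.

T, N, K = 2 * math.pi, 400, 8
ts = [T * i / N for i in range(N)]
f = lambda t: math.exp(math.sin(t))   # toy target trajectory

a0 = sum(f(t) for t in ts) / N
coef = [(2 / N * sum(f(t) * math.cos(k * t) for t in ts),
         2 / N * sum(f(t) * math.sin(k * t) for t in ts))
        for k in range(1, K + 1)]

def approx(t):
    # output of the oscillator bank: constant plus K weighted oscillators
    return a0 + sum(a * math.cos(k * t) + b * math.sin(k * t)
                    for k, (a, b) in enumerate(coef, 1))

err = max(abs(f(t) - approx(t)) for t in ts)
print(err)   # tiny: 8 oscillators already reproduce the trajectory
```

For a smooth periodic target the coefficients decay rapidly, so a small bank of bounded-frequency oscillators suffices, in the spirit of the approximation results the abstract announces.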

NeurIPS Conference 1994 Conference Paper

Inferring Ground Truth from Subjective Labelling of Venus Images

  • Padhraic Smyth
  • Usama Fayyad
  • Michael Burl
  • Pietro Perona
  • Pierre Baldi

In remote sensing applications "ground-truth" data is often used as the basis for training pattern recognition algorithms to generate thematic maps or to detect objects of interest. In practical situations, experts may visually examine the images and provide a subjective noisy estimate of the truth. Calibrating the reliability and bias of expert labellers is a non-trivial problem. In this paper we discuss some of our recent work on this topic in the context of detecting small volcanoes in Magellan SAR images of Venus. Empirical results (using the Expectation-Maximization procedure) suggest that accounting for subjective noise can be quite significant in terms of quantifying both human and algorithm detection performance.

NeurIPS Conference 1993 Conference Paper

Hidden Markov Models for Human Genes

  • Pierre Baldi
  • Søren Brunak
  • Yves Chauvin
  • Jacob Engelbrecht
  • Anders Krogh

Human genes are not continuous but rather consist of short coding regions (exons) interspersed with highly variable non-coding regions (introns). We apply HMMs to the problem of modeling exons, introns and detecting splice sites in the human genome. Our most interesting result so far is the detection of particular oscillatory patterns, with a minimal period of roughly 10 nucleotides, that seem to be characteristic of exon regions and may have significant biological implications. * Also with the Division of Biology, California Institute of Technology. † Also with the Department of Psychology, Stanford University.

NeurIPS Conference 1992 Conference Paper

Hidden Markov Models in Molecular Biology: New Algorithms and Applications

  • Pierre Baldi
  • Yves Chauvin
  • Tim Hunkapiller
  • Marcella McClure

Hidden Markov Models (HMMs) can be applied to several important problems in molecular biology. We introduce a new convergent learning algorithm for HMMs that, unlike the classical Baum-Welch algorithm, is smooth and can be applied on-line or in batch mode, with or without the usual Viterbi most likely path approximation. Left-right HMMs with insertion and deletion states are then trained to represent several protein families including immunoglobulins and kinases. In all cases, the models derived capture all the important statistical properties of the families and can be used efficiently in a number of important tasks such as multiple alignment, motif detection, and classification. * Also with the Division of Biology, California Institute of Technology. † Also with the Department of Psychology, Stanford University.
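As a minimal sketch of the HMM machinery the two abstracts above rely on, here is the forward algorithm computing the likelihood of an observation sequence under a 2-state HMM (toy parameters, not from either paper):

```python
import math

# Forward algorithm: sum over all state paths, computed in O(T * S^2).

states = [0, 1]
start = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]            # trans[i][j] = P(next=j | now=i)
emit  = [{"A": 0.7, "B": 0.3},              # emission distributions
         {"A": 0.1, "B": 0.9}]

def forward(obs):
    # alpha[s] = P(observations so far, current state = s)
    alpha = [start[s] * emit[s][obs[0]] for s in states]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in states) * emit[j][o]
                 for j in states]
    return sum(alpha)

print(forward("AAB"))   # ≈ 0.08775
```

Sequence likelihoods like this are the quantity a learning algorithm (Baum-Welch or the smooth alternative the abstract introduces) increases during training; real applications work in log space to avoid underflow on long sequences.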

NeurIPS Conference 1990 Conference Paper

Computing with Arrays of Bell-Shaped and Sigmoid Functions

  • Pierre Baldi

We consider feed-forward neural networks with one non-linear hidden layer and linear output units. The transfer functions in the hidden layer are either bell-shaped or sigmoid. In the bell-shaped case, we show how Bernstein polynomials on one hand and the theory of the heat equation on the other are relevant for understanding the properties of the corresponding networks. In particular, these techniques yield simple proofs of universal approximation properties, i.e. of the fact that any reasonable function can be approximated to any degree of precision by a linear combination of bell-shaped functions. In addition, in this framework the problem of learning is equivalent to the problem of reversing the time course of a diffusion process. The results obtained in the bell-shaped case can then be applied to the case of sigmoid transfer functions in the hidden layer, yielding similar universality results. A conjecture related to the problem of generalization is briefly examined.
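The Bernstein-polynomial connection can be made concrete (a sketch in the spirit of the abstract; the target function is arbitrary): B_n(f)(x) = sum_k f(k/n) C(n,k) x^k (1-x)^(n-k) is a linear combination of bell-shaped basis functions that converges uniformly to f on [0, 1].

```python
import math

# Bernstein approximation: each basis function C(n,k) x^k (1-x)^(n-k)
# is bell-shaped, peaked near x = k/n; their f(k/n)-weighted sum
# approximates f, illustrating universal approximation by bell-shaped units.

def bernstein(f, n, x):
    return sum(f(k / n) * math.comb(n, k) * x ** k * (1 - x) ** (n - k)
               for k in range(n + 1))

f = lambda x: math.sin(math.pi * x)   # arbitrary smooth target
err = max(abs(f(x) - bernstein(f, 200, x))
          for x in [i / 100 for i in range(101)])
print(err)   # small; shrinks roughly like 1/n
```

The slow 1/n convergence rate is characteristic of Bernstein operators; the point here is existence of the approximation, not efficiency.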

NeurIPS Conference 1987 Conference Paper

On Properties of Networks of Neuron-Like Elements

  • Pierre Baldi
  • Santosh Venkatesh

The complexity and computational capacity of multi-layered, feedforward neural networks is examined. Neural networks for special purpose (structured) functions are examined from the perspective of circuit complexity. Known results in complexity theory are applied to the special instance of neural network circuits and, in particular, classes of functions that can be implemented in shallow circuits are characterised. Some conclusions are also drawn about learning complexity, and some open problems raised. The dual problem of determining the computational capacity of a class of multi-layered networks with dynamics regulated by an algebraic Hamiltonian is considered. Formal results are presented on the storage capacities of programmed higher-order structures, and a tradeoff between ease of programming and capacity is shown. A precise determination is made of the static fixed point structure of random higher-order constructs, and phase-transitions (0-1 laws) are shown.