Arrow Research

Author name cluster

Daniel Lee

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

37 papers
1 author row

Possible papers

37

TMLR Journal 2026 Journal Article

AC$\oplus$DC search: behind the winning solution to the FlyWire graph-matching challenge

  • Daniel Lee
  • Arie Matsliah
  • Lawrence K. Saul

This paper describes the Alternating Continuous and Discrete Combinatorial (AC$\oplus$DC) optimizations behind the winning solution to the FlyWire Ventral Nerve Cord Matching Challenge. The challenge was organized by the Princeton Neuroscience Institute and held over three months, ending on January 31, 2025. During this period, the challenge attracted teams of researchers with expertise in machine learning, high-performance computing, graph data mining, biological network analysis, and quadratic assignment problems. The goal of the challenge was to align the connectomes of a male and female fruit fly, and more specifically, to determine a one-to-one correspondence between the neurons in their ventral nerve cords. The connectomes were represented as sparse weighted graphs with thousands of nodes and millions of edges, and the challenge was to find the permutation that best maps the nodes and edges of one graph onto those of the other. The winning solution to the challenge alternated between two complementary approaches to graph matching---the first, a combinatorial optimization over the symmetric group of permutations, and the second, a continuous relaxation of this problem to the space of doubly stochastic matrices. For the latter, the doubly stochastic matrices were optimized by combining Frank-Wolfe methods with a fast preconditioner to solve the linear assignment problem at each iteration. We provide a complete implementation of these methods with a few hundred lines of code in MATLAB. Notably, this implementation obtains a winning score on the challenge in less than 10 minutes on a laptop computer.
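
As a rough illustration of the continuous half of this alternation, the sketch below runs plain Frank-Wolfe over doubly stochastic matrices, calling SciPy's linear-assignment solver at each iteration. The matrices, step schedule, and final rounding are generic textbook choices, not the authors' MATLAB implementation or its preconditioner.

```python
# Minimal Frank-Wolfe sketch for the continuous relaxation of graph matching.
import numpy as np
from scipy.optimize import linear_sum_assignment

def frank_wolfe_match(A, B, iters=100):
    """Heuristically maximize <A, P B P^T> over doubly stochastic P, then round."""
    n = A.shape[0]
    P = np.full((n, n), 1.0 / n)          # barycenter of the Birkhoff polytope
    for t in range(iters):
        G = A @ P @ B.T + A.T @ P @ B     # gradient of the quadratic objective
        r, c = linear_sum_assignment(-G)  # LAP: best permutation w.r.t. the gradient
        S = np.zeros((n, n)); S[r, c] = 1.0
        P += (2.0 / (t + 2.0)) * (S - P)  # standard Frank-Wolfe step size
    _, c = linear_sum_assignment(-P)      # round back to the symmetric group
    return c

# Example: matching a small graph to a relabeled copy of itself.
rng = np.random.default_rng(0)
A = rng.random((8, 8))
perm = rng.permutation(8)
B = A[np.ix_(perm, perm)]                 # B is A with rows/columns permuted
print(frank_wolfe_match(B, A))            # should recover `perm` on easy instances
```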

AAAI Conference 2025 System Paper

Rewind and Render: Towards Factually Accurate Text-to-Video Generation with Distilled Knowledge Retrieval

  • Daniel Lee
  • Arjun Chandra
  • Yang Zhou
  • Yunyao Li
  • Simone Conia

Text-to-Video (T2V) models, despite recent advancements, struggle with factual accuracy, especially for knowledge-dense content. We introduce FACT-V (Factual Accuracy in Content Translation to Video), a system integrating multi-source knowledge retrieval into T2V pipelines. FACT-V offers two key benefits: i) improved factual accuracy of generated videos through dynamically retrieved information, and ii) increased interpretability by providing users with the augmented prompt information. A preliminary evaluation demonstrates the potential of knowledge-augmented approaches in improving the accuracy and reliability of T2V systems, particularly for entity-specific or time-sensitive prompts.
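
A hypothetical sketch of the augmentation step such a system performs: retrieve facts for entities mentioned in the prompt, prepend them before generation, and surface them to the user. The retriever stub, knowledge entries, and prompt format below are illustrative assumptions, not FACT-V's actual components.

```python
# Hypothetical knowledge-augmentation step for a T2V prompt (illustrative only).
def retrieve_facts(prompt):
    """Stand-in for multi-source retrieval (knowledge graphs, news feeds, etc.)."""
    knowledge = {"Artemis II": "Artemis II is a crewed lunar flyby mission."}
    return [fact for entity, fact in knowledge.items() if entity in prompt]

def augment_prompt(prompt):
    facts = retrieve_facts(prompt)
    augmented = " ".join(facts + [prompt]) if facts else prompt
    return augmented, facts  # the facts double as provenance shown to the user

augmented, provenance = augment_prompt("A video of the Artemis II launch.")
print(augmented)  # the downstream T2V model receives the fact-enriched prompt
```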

AAAI Conference 2024 System Paper

Enhancing Machine Translation Experiences with Multilingual Knowledge Graphs

  • Simone Conia
  • Daniel Lee
  • Min Li
  • Umar Farooq Minhas
  • Yunyao Li

Translating entity names, especially when a literal translation is not correct, poses a significant challenge. Although Machine Translation (MT) systems have achieved impressive results, they still struggle to translate cultural nuances and language-specific context. In this work, we show that the integration of multilingual knowledge graphs into MT systems can address this problem and bring two significant benefits: i) improving the translation of utterances that contain entities by leveraging their human-curated aliases from a multilingual knowledge graph, and ii) increasing the interpretability of the translation process by providing the user with information from the knowledge graph.
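
The first benefit can be sketched as a simple pre-translation substitution: mentions of known entities are swapped for their human-curated aliases in the target language. The miniature knowledge graph and function below are illustrative assumptions, not the paper's system.

```python
# Hypothetical entity-alias substitution before machine translation.
ALIASES = {  # entity id -> {language: human-curated alias}
    "Q2337": {"en": "The Great Gatsby", "it": "Il grande Gatsby"},
}

def augment_for_translation(text, entity_spans, target_lang):
    """Replace recognized entity mentions with target-language aliases."""
    for surface, entity_id in entity_spans:
        alias = ALIASES.get(entity_id, {}).get(target_lang)
        if alias is not None:
            text = text.replace(surface, alias)
    return text

prepared = augment_for_translation(
    "She is reading The Great Gatsby.",
    entity_spans=[("The Great Gatsby", "Q2337")],
    target_lang="it",
)
print(prepared)  # the MT system no longer has to guess the translated title
```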

NeurIPS Conference 2022 Conference Paper

A theory of weight distribution-constrained learning

  • Weishun Zhong
  • Ben Sorscher
  • Daniel Lee
  • Haim Sompolinsky

A central question in computational neuroscience is how structure determines function in neural networks. Recent large-scale connectomic studies have started to provide a wealth of structural information such as the distribution of excitatory/inhibitory cell and synapse types as well as the distribution of synaptic weights in the brains of different species. The emerging high-quality large structural datasets raise the question of what general functional principles can be gleaned from them. Motivated by this question, we developed a statistical mechanical theory of learning in neural networks that incorporates structural information as constraints. We derived an analytical solution for the memory capacity of the perceptron, a basic feedforward model of supervised learning, with a constraint on the distribution of its weights. Interestingly, the theory predicts that the reduction in capacity due to the constrained weight-distribution is related to the Wasserstein distance between the cumulative distribution function of the constrained weights and that of the standard normal distribution. To test the theoretical predictions, we use optimal transport theory and information geometry to develop an SGD-based algorithm to find weights that simultaneously learn the input-output task and satisfy the distribution constraint. We show that training in our algorithm can be interpreted as geodesic flows in the Wasserstein space of probability distributions. Given a parameterized family of weight distributions, our theory predicts the shape of the distribution with optimal parameters. We apply our theory to map out the experimental parameter landscape for the estimated distribution of synaptic weights in mammalian cortex and show that our theory’s prediction for the optimal distribution is close to the experimentally measured value. We further developed a statistical mechanical theory for teacher-student perceptron rule learning and asked for the best way for the student to incorporate prior knowledge of the rule (i.e., the teacher). Our theory shows that it is beneficial for the learner to adopt different prior weight distributions during learning, and shows that distribution-constrained learning outperforms unconstrained and sign-constrained learning. Our theory and algorithm provide novel strategies for incorporating prior knowledge about weights into learning, and reveal a powerful connection between structure and function in neural networks.
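
The distribution constraint can be made concrete with the one-dimensional optimal transport map, which just matches sorted weights to target quantiles. The sketch below shows only that projection step, under an assumed standard-normal target; the paper's actual algorithm interleaves such geometry with SGD on the task loss.

```python
# Sketch: project a weight vector onto a target distribution via the 1-D
# optimal transport map (sort and match quantiles). Illustrative only.
import numpy as np
from scipy.stats import norm

def project_to_distribution(w, target_quantile_fn):
    """Closest vector (in the 1-D Wasserstein sense) with the target empirical law."""
    n = len(w)
    order = np.argsort(w)
    targets = target_quantile_fn((np.arange(n) + 0.5) / n)  # target order statistics
    projected = np.empty_like(w)
    projected[order] = targets        # i-th smallest weight -> i-th target quantile
    return projected

rng = np.random.default_rng(0)
w = rng.normal(2.0, 5.0, size=1000)   # unconstrained weights
w_constrained = project_to_distribution(w, norm(loc=0.0, scale=1.0).ppf)
print(w_constrained.mean(), w_constrained.std())  # approximately 0 and 1
```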

NeurIPS Conference 2022 Conference Paper

Online Minimax Multiobjective Optimization: Multicalibeating and Other Applications

  • Daniel Lee
  • Georgy Noarov
  • Mallesh Pai
  • Aaron Roth

We introduce a simple but general online learning framework in which a learner plays against an adversary in a vector-valued game that changes every round. Even though the learner's objective is not convex-concave (and so the minimax theorem does not apply), we give a simple algorithm that can compete with the setting in which the adversary must announce their action first, with optimally diminishing regret. We demonstrate the power of our framework by using it to (re)derive optimal bounds and efficient algorithms across a variety of domains, ranging from multicalibration to a large set of no-regret algorithms, to a variant of Blackwell's approachability theorem for polytopes with fast convergence rates. As a new application, we show how to ``(multi)calibeat'' an arbitrary collection of forecasters --- achieving an exponentially improved dependence on the number of models we are competing against, compared to prior work.

NeurIPS Conference 2021 Conference Paper

Local Disentanglement in Variational Auto-Encoders Using Jacobian $L_1$ Regularization

  • Travers Rhodes
  • Daniel Lee

There have been many recent advances in representation learning; however, unsupervised representation learning can still struggle with model identification issues related to rotations of the latent space. Variational Auto-Encoders (VAEs) and their extensions such as $\beta$-VAEs have been shown to improve local alignment of latent variables with PCA directions, which can help to improve model disentanglement under some conditions. Borrowing inspiration from Independent Component Analysis (ICA) and sparse coding, we propose applying an $L_1$ loss to the VAE's generative Jacobian during training to encourage local latent variable alignment with independent factors of variation in images of multiple objects or images with multiple parts. We demonstrate our results on a variety of datasets, giving qualitative and quantitative results using information theoretic and modularity measures that show our added $L_1$ cost encourages local axis alignment of the latent representation with individual factors of variation.
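
A minimal PyTorch sketch of the added cost, assuming a hypothetical decoder module: the L1 norm of the generative Jacobian at a latent point is obtained by automatic differentiation and added to the training loss. Computing the full Jacobian per example is expensive, and the penalty weight below is an arbitrary placeholder.

```python
# Sketch: L1 penalty on the decoder Jacobian during VAE training (illustrative).
import torch

decoder = torch.nn.Sequential(  # stand-in for a VAE decoder, z -> x_hat
    torch.nn.Linear(4, 32), torch.nn.ReLU(), torch.nn.Linear(32, 64)
)

def jacobian_l1(z):
    """L1 norm of d(decoder)/dz at a single latent point z of shape [latent_dim]."""
    J = torch.autograd.functional.jacobian(decoder, z, create_graph=True)
    return J.abs().sum()

z = torch.randn(4)
recon_loss = decoder(z).pow(2).mean()      # placeholder reconstruction term
loss = recon_loss + 1e-3 * jacobian_l1(z)  # the weight 1e-3 is an arbitrary choice
loss.backward()                            # the penalty is differentiable end-to-end
```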

IJCAI Conference 2020 Conference Paper

Reward Prediction Error as an Exploration Objective in Deep RL

  • Riley Simmons-Edler
  • Ben Eisner
  • Daniel Yang
  • Anthony Bisulco
  • Eric Mitchell
  • Sebastian Seung
  • Daniel Lee

A major challenge in reinforcement learning is exploration, when local dithering methods such as epsilon-greedy sampling are insufficient to solve a given task. Many recent methods have proposed to intrinsically motivate an agent to seek novel states, driving the agent to discover improved reward. However, while state-novelty exploration methods are suitable for tasks where novel observations correlate well with improved reward, they may not explore more efficiently than epsilon-greedy approaches in environments where the two are not well-correlated. In this paper, we distinguish between exploration tasks in which seeking novel states aids in finding new reward, and those where it does not, such as goal-conditioned tasks and escaping local reward maxima. We propose a new exploration objective, maximizing the reward prediction error (RPE) of a value function trained to predict extrinsic reward. We then propose a deep reinforcement learning method, QXplore, which exploits the temporal difference error of a Q-function to solve hard exploration tasks in high-dimensional MDPs. We demonstrate the exploration behavior of QXplore on several OpenAI Gym MuJoCo tasks and Atari games and observe that QXplore is comparable to or better than a baseline state-novelty method in all cases, outperforming the baseline on tasks where state novelty is not well-correlated with improved reward.
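
In its simplest tabular form, the proposed objective reduces to the sketch below: the exploration signal is the absolute TD error of a value function fit to extrinsic reward. QXplore's actual setup (a Q-function plus a separate exploration policy in high-dimensional MDPs) is not reproduced here.

```python
# Schematic reward-prediction-error bonus: TD error magnitude of a value
# function trained on extrinsic reward (tabular toy, not the QXplore agent).
import numpy as np

n_states, gamma, lr = 10, 0.99, 0.1
V = np.zeros(n_states)  # value function predicting extrinsic reward

def rpe_bonus(s, r, s_next):
    """High where the reward model is surprised; shrinks as the model fits."""
    td_error = r + gamma * V[s_next] - V[s]
    V[s] += lr * td_error       # keep fitting extrinsic reward
    return abs(td_error)        # exploration objective for a second policy

print(rpe_bonus(s=3, r=1.0, s_next=4))  # 1.0: unexpected reward, strong bonus
print(rpe_bonus(s=3, r=1.0, s_next=4))  # 0.9: bonus decays as V catches up
```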

AAAI Conference 2018 Conference Paper

Maximizing Activity in Ising Networks via the TAP Approximation

  • Christopher Lynn
  • Daniel Lee

A wide array of complex biological, social, and physical systems have recently been shown to be quantitatively described by Ising models, which lie at the intersection of statistical physics and machine learning. Here, we study the fundamental question of how to optimize the state of a networked Ising system given a budget of external influence. In the continuous setting where one can tune the influence applied to each node, we propose a series of approximate gradient ascent algorithms based on the Plefka expansion, which generalizes the naïve mean field and TAP approximations. In the discrete setting where one chooses a small set of influential nodes, the problem is equivalent to the famous influence maximization problem in social networks with an additional stochastic noise term. In this case, we provide sufficient conditions for when the objective is submodular, allowing a greedy algorithm to achieve an approximation ratio of 1−1/e. Additionally, we compare the Ising-based algorithms with traditional influence maximization algorithms, demonstrating the practical importance of accurately modeling stochastic fluctuations in the system.
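
For the discrete setting, the greedy routine that attains the 1−1/e guarantee when the objective is submodular is short enough to sketch; the coverage-style objective below is a toy stand-in for the (expensive) expected-activity estimate.

```python
# Generic greedy selection for a submodular objective (1 - 1/e guarantee).
def greedy_influence(nodes, k, objective):
    chosen = set()
    for _ in range(k):
        best = max(
            (n for n in nodes if n not in chosen),
            key=lambda n: objective(chosen | {n}) - objective(chosen),
        )
        chosen.add(best)
    return chosen

# Toy submodular objective: neighborhood coverage in a small graph.
neighbors = {0: {0, 1, 2}, 1: {1, 3}, 2: {2, 4, 5}, 3: {3, 5}}
def cover(S):
    return len(set().union(*(neighbors[n] for n in S))) if S else 0

print(greedy_influence(list(neighbors), k=2, objective=cover))  # {0, 2}
```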

NeurIPS Conference 2017 Conference Paper

Generative Local Metric Learning for Kernel Regression

  • Yung-Kyun Noh
  • Masashi Sugiyama
  • Kee-Eung Kim
  • Frank Park
  • Daniel Lee

This paper shows how metric learning can be used with Nadaraya-Watson (NW) kernel regression. Compared with standard approaches, such as bandwidth selection, we show how metric learning can significantly reduce the mean square error (MSE) in kernel regression, particularly for high-dimensional data. We propose a method for efficiently learning a good metric function based upon analyzing the performance of the NW estimator for Gaussian-distributed data. A key feature of our approach is that the NW estimator with a learned metric uses information from both the global and local structure of the training data. Theoretical and empirical results confirm that the learned metric can considerably reduce the bias and MSE for kernel regression even when the data are not confined to Gaussian.
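
A sketch of where a learned metric enters the estimator: the NW kernel is evaluated under a Mahalanobis-type distance with matrix M. Here M is simply handed in; the paper derives it from a generative Gaussian analysis rather than fixing it by hand.

```python
# Nadaraya-Watson regression with a (given) Mahalanobis metric M.
import numpy as np

def nw_predict(x, X, y, M, bandwidth=1.0):
    """NW estimate at x with kernel exp(-d_M(x, x_i)^2 / 2h^2)."""
    diffs = X - x
    d2 = np.einsum("ij,jk,ik->i", diffs, M, diffs)  # squared Mahalanobis distances
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))
    return w @ y / w.sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X[:, 0] ** 2 + 0.1 * rng.normal(size=200)       # only the first coordinate matters
M = np.diag([1.0, 0.1])                             # down-weight the nuisance direction
print(nw_predict(np.array([0.5, 0.0]), X, y, M))    # close to 0.25
```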

RLDM Conference 2017 Conference Abstract

Neural Network Memory Architectures for Autonomous Robot Navigation

  • Steven Chen
  • Nikolay Atanasov
  • Arbaaz Khan
  • Konstantinos Karydis
  • Daniel Lee
  • Vijay Kumar

This paper highlights the significance of including memory structures in neural networks when the latter are used to learn perception-action loops for autonomous robot navigation. Traditional navigation approaches rely on global maps of the environment to overcome cul-de-sacs and plan feasible motions. Yet, maintaining an accurate global map may be challenging in real-world settings. A possible way to mitigate this limitation is to use learning techniques that forgo hand-engineered map representations and infer appropriate control responses directly from sensed information. An important but unexplored aspect of such approaches is the effect of memory on their performance. This work is a study of memory structures for deep-neural-network-based robot navigation, and offers novel tools to train such networks from supervision and quantify their ability to generalize to unseen scenarios. We analyze the separation and generalization abilities of feedforward, long short-term memory, and differentiable neural computer networks by estimating the Vapnik-Chervonenkis (VC) dimension of maximum-margin hyperplanes trained in the feature space learned by the networks’ upstream layers. We validate that these VC-dimension measures are good predictors of actual test performance. The reported method can be applied to deep learning problems beyond robotics.

IJCAI Conference 2016 Conference Paper

Bayesian Reinforcement Learning with Behavioral Feedback

  • Teakgyu Hong
  • JongMin Lee
  • Kee-Eung Kim
  • Pedro A. Ortega
  • Daniel Lee

In the standard reinforcement learning setting, the agent learns an optimal policy solely from state transitions and rewards from the environment. We consider an extended setting where a trainer additionally provides feedback on the actions executed by the agent. This requires appropriately incorporating the feedback, even when the feedback is not necessarily accurate. In this paper, we present a Bayesian approach to this extended reinforcement learning setting. Specifically, we extend Kalman Temporal Difference learning to compute the posterior distribution over Q-values given the state transitions and rewards from the environment as well as the feedback from the trainer. Through experiments on standard reinforcement learning tasks, we show that learning performance can be significantly improved even with inaccurate feedback.

NeurIPS Conference 2016 Conference Paper

Efficient Neural Codes under Metabolic Constraints

  • Zhuo Wang
  • Xue-Xin Wei
  • Alan Stocker
  • Daniel Lee

Neural codes are inevitably shaped by various kinds of biological constraints, \emph{e.g.}, noise and metabolic cost. Here we formulate a coding framework which explicitly deals with noise and the metabolic costs associated with the neural representation of information, and analytically derive the optimal neural code for monotonic response functions and arbitrary stimulus distributions. For a single neuron, the theory predicts a family of optimal response functions depending on the metabolic budget and noise characteristics. Interestingly, the well-known histogram equalization solution can be viewed as a special case when metabolic resources are unlimited. For a pair of neurons, our theory suggests that under more severe metabolic constraints, ON-OFF coding is an increasingly more efficient coding scheme compared to ON-ON or OFF-OFF. The advantage could be as large as one-fold, substantially larger than previously estimated. Some of these predictions could be generalized to the case of large neural populations. In particular, these analytical results may provide a theoretical basis for the predominant segregation into ON- and OFF-cells in early visual processing areas. Overall, we provide a unified framework for optimal neural codes with monotonic tuning curves in the brain, and make predictions that can be directly tested with physiology experiments.
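
The unlimited-budget special case is easy to make concrete: histogram equalization takes the response function to be the (scaled) CDF of the stimulus prior, so every response level is used equally often. The exponential prior below is an arbitrary example.

```python
# Histogram equalization as the zero-constraint special case: the response
# function is the empirical CDF of the stimulus prior, scaled to [0, r_max].
import numpy as np

stimuli = np.sort(np.random.default_rng(0).exponential(scale=1.0, size=10000))
r_max = 1.0

def response(s):
    return r_max * np.searchsorted(stimuli, s) / len(stimuli)

# Responses to samples from the prior are uniform on [0, r_max]:
print(np.histogram(response(stimuli), bins=4, range=(0, 1))[0])  # ~2500 per bin
```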

AAAI Conference 2016 Conference Paper

Learning Complex Stand-Up Motion for Humanoid Robots

  • Heejin Jeong
  • Daniel Lee

In order for humanoid robots to complete various assigned tasks without any human assistance, they must have the ability to stand up on their own. In this abstract, we introduce a complex stand-up motion for humanoid robots learned using reinforcement learning.

NeurIPS Conference 2016 Conference Paper

Maximizing Influence in an Ising Network: A Mean-Field Optimal Solution

  • Christopher Lynn
  • Daniel Lee

Influence maximization in social networks has typically been studied in the context of contagion models and irreversible processes. In this paper, we consider an alternate model that treats individual opinions as spins in an Ising system at dynamic equilibrium. We formalize the \textit{Ising influence maximization} problem, which has a natural physical interpretation as maximizing the magnetization given a budget of external magnetic field. Under the mean-field (MF) approximation, we present a gradient ascent algorithm that uses the susceptibility to efficiently calculate local maxima of the magnetization, and we develop a number of sufficient conditions for when the MF magnetization is concave and our algorithm converges to a global optimum. We apply our algorithm on random and real-world networks, demonstrating, remarkably, that the MF optimal external fields (i.e., the external fields which maximize the MF magnetization) exhibit a phase transition from focusing on high-degree individuals at high temperatures to focusing on low-degree individuals at low temperatures. We also establish a number of novel results about the structure of steady-states in the ferromagnetic MF Ising model on general graphs, which are of independent interest.
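
A minimal sketch of this pipeline: iterate the mean-field fixed point $m_i = \tanh(\beta(\sum_j J_{ij} m_j + h_i))$, then nudge the external field uphill and project back onto the budget. The exact method uses the full susceptibility matrix; for brevity the sketch below uses only a crude diagonal sensitivity, and the graph and parameters are arbitrary.

```python
# Mean-field magnetization and a simple projected gradient ascent on the field.
import numpy as np

def mf_magnetization(J, h, beta, iters=500):
    m = np.zeros(len(h))
    for _ in range(iters):
        m = np.tanh(beta * (J @ m + h))   # MF fixed-point iteration (no damping)
    return m

def ascend_field(J, beta, budget, steps=50, lr=0.1):
    h = np.full(len(J), budget / len(J))
    for _ in range(steps):
        m = mf_magnetization(J, h, beta)
        grad = beta * (1.0 - m ** 2)      # diagonal stand-in for the susceptibility
        h = np.maximum(h + lr * grad, 0.0)
        h *= budget / h.sum()             # project back onto the field budget
    return h, mf_magnetization(J, h, beta).sum()

J = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # a 3-node path
print(ascend_field(J, beta=0.5, budget=1.0))
```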

AAAI Conference 2014 Conference Paper

An Adversarial Interpretation of Information-Theoretic Bounded Rationality

  • Pedro Ortega
  • Daniel Lee

Recently, there has been a growing interest in modeling planning with information constraints. Accordingly, an agent maximizes a regularized expected utility known as the free energy, where the regularizer is given by the information divergence from a prior to a posterior policy. While this approach can be justified in various ways, including from statistical mechanics and information theory, it is still unclear how it relates to decision-making against adversarial environments. This connection has previously been suggested in work relating the free energy to risk-sensitive control and to extensive form games. Here, we show that a single-agent free energy optimization is equivalent to a game between the agent and an imaginary adversary. The adversary can, by paying an exponential penalty, generate costs that diminish the decision maker’s payoffs. It turns out that the optimal strategy of the adversary consists in choosing costs so as to render the decision maker indifferent among its choices, which is a defining property of a Nash equilibrium, thus tightening the connection between free energy optimization and game theory.
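
In the notation commonly used for this setting (the abstract fixes none, so take the symbols as assumptions): with utility $U$, prior policy $q$, posterior $p$, and inverse temperature $\beta$, the free energy objective and its maximizer are

```latex
\max_{p}\; \mathbb{E}_{p}[U(x)] \;-\; \frac{1}{\beta}\,\mathrm{KL}(p \,\|\, q),
\qquad
\mathrm{KL}(p \,\|\, q) \;=\; \sum_x p(x)\,\log\frac{p(x)}{q(x)},
\qquad
p^{*}(x) \;\propto\; q(x)\, e^{\beta U(x)} .
```

The equivalence described above expresses this same optimum as the value of a game against a cost-generating adversary.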

NeurIPS Conference 2013 Conference Paper

Optimal Neural Population Codes for High-dimensional Stimulus Variables

  • Zhuo Wang
  • Alan Stocker
  • Daniel Lee

How does a neural population process sensory information? Optimal coding theories assume that neural tuning curves are adapted to the prior distribution of the stimulus variable. Most of the previous work has discussed optimal solutions for only one-dimensional stimulus variables. Here, we expand some of these ideas and present new solutions that define optimal tuning curves for high-dimensional stimulus variables. We consider solutions for a minimal case where the number of neurons in the population is equal to the number of stimulus dimensions (diffeomorphic). In the case of two-dimensional stimulus variables, we analytically derive optimal solutions for different optimality criteria such as minimal L2 reconstruction error or maximal mutual information. For the higher-dimensional case, a learning rule to improve the population code is provided.

NeurIPS Conference 2012 Conference Paper

Diffusion Decision Making for Adaptive k-Nearest Neighbor Classification

  • Yung-Kyun Noh
  • Frank Park
  • Daniel Lee

This paper sheds light on some fundamental connections of the diffusion decision making model of neuroscience and cognitive psychology with k-nearest neighbor classification. We show that conventional k-nearest neighbor classification can be viewed as a special problem of the diffusion decision model in the asymptotic situation. Applying the optimal strategy associated with the diffusion decision model, an adaptive rule is developed for determining appropriate values of k in k-nearest neighbor classification. Making use of the sequential probability ratio test (SPRT) and Bayesian analysis, we propose five different criteria for adaptively acquiring nearest neighbors. Experiments with both synthetic and real datasets demonstrate the effectiveness of our classification criteria.
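
The SPRT flavor of these criteria can be sketched as follows: examine neighbors in order of distance, accumulate a log-likelihood ratio, and stop as soon as it crosses a confidence threshold. The per-neighbor Bernoulli evidence model below is a simplifying assumption for illustration, not one of the paper's five criteria verbatim.

```python
# SPRT-style adaptive k: stop collecting neighbors once the evidence is strong.
import numpy as np

def adaptive_knn(x, X, y, p=0.7, threshold=2.0, k_max=50):
    order = np.argsort(np.linalg.norm(X - x, axis=1))
    llr = 0.0
    for k, idx in enumerate(order[:k_max], start=1):
        # Each label shifts the evidence by log(p/(1-p)) toward its class.
        llr += np.log(p / (1 - p)) if y[idx] == 1 else np.log((1 - p) / p)
        if abs(llr) >= threshold:          # confident enough: stop early
            return int(llr > 0), k
    return int(llr > 0), k_max             # fall back to a fixed-size vote

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (100, 2)), rng.normal(1, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
print(adaptive_knn(np.array([1.2, 0.9]), X, y))  # e.g. (1, 3): class 1, few neighbors
```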

NeurIPS Conference 2012 Conference Paper

Optimal Neural Tuning Curves for Arbitrary Stimulus Distributions: Discrimax, Infomax and Minimum $L_p$ Loss

  • Zhuo Wang
  • Alan Stocker
  • Daniel Lee

In this work we study how the stimulus distribution influences the optimal coding of an individual neuron. Closed-form solutions to the optimal sigmoidal tuning curve are provided for a neuron obeying Poisson statistics under a given stimulus distribution. We consider a variety of optimality criteria, including maximizing discriminability, maximizing mutual information and minimizing estimation error under a general $L_p$ norm. We generalize the Cramér-Rao lower bound and show how the $L_p$ loss can be written as a functional of the Fisher Information in the asymptotic limit, by proving the moment convergence of certain functions of Poisson random variables. In this manner, we show how the optimal tuning curve depends upon the loss function, and the equivalence of maximizing mutual information with minimizing $L_p$ loss in the limit as $p$ goes to zero.

AAAI Conference 2011 Conference Paper

Learning Dimensional Descent for Optimal Motion Planning in High-dimensional Spaces

  • Paul Vernaza
  • Daniel Lee

We present a novel learning-based method for generating optimal motion plans for high-dimensional motion planning problems. In order to cope with the curse of dimensionality, our method proceeds in a fashion similar to block coordinate descent in finite-dimensional optimization: at each iteration, the motion is optimized over a lower dimensional subspace while leaving the path fixed along the other dimensions. Naive implementations of such an idea can produce vastly suboptimal results. In this work, we show how a profitable set of directions in which to perform this dimensional descent procedure can be learned efficiently. We provide sufficient conditions for global optimality of dimensional descent in this learned basis, based upon the low-dimensional structure of the planning cost function. We also show how this dimensional descent procedure can easily be used, with monotonic convergence, for problems that do not exhibit such structure. We illustrate the application of our method to high dimensional shape planning and arm trajectory planning problems.
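
Stripped of the learned basis, the underlying scheme is ordinary block coordinate descent over directions, as in the generic sketch below; the quadratic cost and axis-aligned directions are placeholders for the planning cost and the learned basis.

```python
# Generic "dimensional descent": optimize over one direction at a time.
import numpy as np

def dimensional_descent(cost, x0, directions, sweeps=20,
                        grid=np.linspace(-2, 2, 41)):
    x = x0.copy()
    for _ in range(sweeps):
        for d in directions:               # one 1-D subproblem per direction
            candidates = x[None, :] + grid[:, None] * d[None, :]
            x = candidates[np.argmin([cost(c) for c in candidates])]
    return x

Q = np.diag([1.0, 10.0, 100.0])
b = np.array([1.0, 2.0, 3.0])
cost = lambda x: x @ Q @ x - b @ x         # toy planning cost
print(dimensional_descent(cost, np.zeros(3), np.eye(3)))  # ~[0.5, 0.1, 0.0]
```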

NeurIPS Conference 2010 Conference Paper

Generative Local Metric Learning for Nearest Neighbor Classification

  • Yung-Kyun Noh
  • Byoung-Tak Zhang
  • Daniel Lee

We consider the problem of learning a local metric to enhance the performance of nearest neighbor classification. Conventional metric learning methods attempt to separate data distributions in a purely discriminative manner; here we show how to take advantage of information from parametric generative models. We focus on the bias in the information-theoretic error arising from finite sampling effects, and find an appropriate local metric that maximally reduces the bias based upon knowledge from generative models. As a byproduct, the asymptotic theoretical analysis in this work relates metric learning with dimensionality reduction, which was not understood from previous discriminative approaches. Empirical experiments show that this learned local metric enhances the discriminative nearest neighbor performance on various datasets using simple class conditional generative models.

NeurIPS Conference 2010 Conference Paper

Learning via Gaussian Herding

  • Koby Crammer
  • Daniel Lee

We introduce a new family of online learning algorithms based upon constraining the velocity flow over a distribution of weight vectors. In particular, we show how to effectively herd a Gaussian weight vector distribution by trading off velocity constraints with a loss function. By uniformly bounding this loss function, we demonstrate how to solve the resulting optimization analytically. We compare the resulting algorithms on a variety of real world datasets, and demonstrate how these algorithms achieve state-of-the-art robust performance, especially with high label noise in the training data.

AAAI Conference 2010 Conference Paper

Online Learning of Uneven Terrain for Humanoid Bipedal Walking

  • Seung Joon Yi
  • Byoung Tak Zhang
  • Daniel Lee

We present a novel method to control a biped humanoid robot to walk on unknown inclined terrains, using an online learning algorithm to estimate in real-time the local terrain from proprioceptive and inertial sensors. Compliant controllers for the ankle joints are used to actively probe the surrounding surface, and the measured sensor data are combined to explicitly learn the global inclination and local disturbances of the terrain. These estimates are then used to adaptively modify the robot locomotion and control parameters. Results from both a physically-realistic computer simulation and experiments on a commercially available small humanoid robot show that our method can rapidly adapt to changing surface conditions to ensure stable walking on uneven surfaces.

NeurIPS Conference 2008 Conference Paper

Extended Grassmann Kernels for Subspace-Based Learning

  • Jihun Hamm
  • Daniel Lee

Subspace-based learning problems involve data whose elements are linear subspaces of a vector space. To handle such data structures, Grassmann kernels have been proposed and used previously. In this paper, we analyze the relationship between Grassmann kernels and probabilistic similarity measures. Firstly, we show that the KL distance in the limit yields the Projection kernel on the Grassmann manifold, whereas the Bhattacharyya kernel becomes trivial in the limit and is suboptimal for subspace-based problems. Secondly, based on our analysis of the KL distance, we propose extensions of the Projection kernel to the set of affine as well as scaled subspaces. We demonstrate the advantages of these extended kernels for classification and recognition tasks with Support Vector Machines and Kernel Discriminant Analysis using synthetic and real image databases.
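
The base Projection kernel itself is one line: for orthonormal bases $Y_1, Y_2$ of two subspaces, $k(Y_1, Y_2) = \|Y_1^\top Y_2\|_F^2$, which is invariant to the choice of basis. (The paper's affine and scaled extensions are not sketched here.)

```python
# Grassmann Projection kernel for D x m orthonormal basis matrices.
import numpy as np

def projection_kernel(Y1, Y2):
    return np.linalg.norm(Y1.T @ Y2, ord="fro") ** 2

def orthonormal_basis(A):
    Q, _ = np.linalg.qr(A)                # columns span the same subspace as A
    return Q

rng = np.random.default_rng(0)
Y1 = orthonormal_basis(rng.normal(size=(10, 2)))
Y2 = orthonormal_basis(rng.normal(size=(10, 2)))
print(projection_kernel(Y1, Y1))          # equals m = 2 for identical subspaces
print(projection_kernel(Y1, Y2))          # lies in [0, 2]
```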

NeurIPS Conference 2007 Conference Paper

Blind channel identification for speech dereverberation using l1-norm sparse learning

  • Yuanqing Lin
  • Jingdong Chen
  • Youngmoo Kim
  • Daniel Lee

Speech dereverberation remains an open problem after more than three decades of research. The most challenging step in speech dereverberation is blind channel identification (BCI). Although many BCI approaches have been developed, their performance is still far from satisfactory for practical applications. The main difficulty in BCI lies in finding an appropriate acoustic model, which not only can effectively resolve solution degeneracies due to the lack of knowledge of the source, but also robustly models real acoustic environments. This paper proposes a sparse acoustic room impulse response (RIR) model for BCI, that is, an acoustic RIR can be modeled by a sparse FIR filter. Under this model, we show how to formulate the BCI of a single-input multiple-output (SIMO) system into an l1-norm regularized least squares (LS) problem, which is convex and can be solved efficiently with guaranteed global convergence. The sparseness of solutions is controlled by l1-norm regularization parameters. We propose a sparse learning scheme that infers the optimal l1-norm regularization parameters directly from microphone observations under a Bayesian framework. Our results show that the proposed approach is effective and robust, and it yields source estimates in real acoustic environments with high fidelity to anechoic chamber measurements.
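
The core optimization is an l1-regularized least squares fit of a sparse filter. The single-channel deconvolution below is a simplification of the paper's SIMO formulation and Bayesian parameter selection, using scikit-learn's Lasso solver with a hand-picked regularization weight.

```python
# Sparse FIR filter recovery by l1-regularized least squares (simplified).
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
L, n = 64, 400
h_true = np.zeros(L); h_true[[0, 7, 20]] = [1.0, 0.5, 0.25]   # sparse "RIR"
s = rng.normal(size=n)                                        # source signal
x = np.convolve(s, h_true)[:n] + 0.01 * rng.normal(size=n)    # observed signal

# Convolution matrix S with x ~= S @ h: column k is the source delayed by k.
S = np.column_stack([np.concatenate([np.zeros(k), s[:n - k]]) for k in range(L)])
h_hat = Lasso(alpha=1e-3, max_iter=50000).fit(S, x).coef_     # alpha sets sparsity
print(np.flatnonzero(np.abs(h_hat) > 0.05))                   # ~ [0, 7, 20]
```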

NeurIPS Conference 2005 Conference Paper

Beyond Gaussian Processes: On the Distributions of Infinite Networks

  • Ricky Der
  • Daniel Lee

A general analysis of the limiting distribution of neural network functions is performed, with emphasis on non-Gaussian limits. We show that, with i.i.d. symmetric stable output weights, and more generally with weights distributed from the normal domain of attraction of a stable variable, the neural functions converge in distribution to stable processes. Conditions are also investigated under which Gaussian limits do occur when the weights are independent but not identically distributed. Some particularly tractable classes of stable distributions are examined, and the possibility of learning with such processes is discussed.

NeurIPS Conference 2004 Conference Paper

Bayesian Regularization and Nonnegative Deconvolution for Time Delay Estimation

  • Yuanqing Lin
  • Daniel Lee

Bayesian Regularization and Nonnegative Deconvolution (BRAND) is proposed for estimating time delays of acoustic signals in reverberant environments. Sparsity of the nonnegative filter coefficients is enforced using an L1-norm regularization. A probabilistic generative model is used to simultaneously estimate the regularization parameters and filter coefficients from the signal data. Iterative update rules are derived under a Bayesian framework using the Expectation-Maximization procedure. The resulting time delay estimation algorithm is demonstrated on noisy acoustic data.

NeurIPS Conference 2002 Conference Paper

Multiplicative Updates for Nonnegative Quadratic Programming in Support Vector Machines

  • Fei Sha
  • Lawrence Saul
  • Daniel Lee

We derive multiplicative updates for solving the nonnegative quadratic programming problem in support vector machines (SVMs). The updates have a simple closed form, and we prove that they converge monotonically to the solution of the maximum margin hyperplane. The updates optimize the traditionally proposed objective function for SVMs. They do not involve any heuristics such as choosing a learning rate or deciding which variables to update at each iteration. They can be used to adjust all the quadratic programming variables in parallel with a guarantee of improvement at each iteration. We analyze the asymptotic convergence of the updates and show that the coefficients of non-support vectors decay geometrically to zero at a rate that depends on their margins. In practice, the updates converge very rapidly to good classifiers.
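
For a generic nonnegative quadratic program $\min_{v \ge 0} \tfrac{1}{2} v^\top A v + b^\top v$ with $A = A^{+} - A^{-}$ split into positive and negative parts, updates of this multiplicative family take the shape sketched below; treat the exact expression as a sketch from this line of work rather than a verified transcription of the paper's update.

```python
# Multiplicative-update sketch for min_{v >= 0} 0.5 v^T A v + b^T v.
import numpy as np

def multiplicative_nqp(A, b, iters=500):
    Ap, Am = np.maximum(A, 0), np.maximum(-A, 0)   # A = Ap - Am, elementwise
    v = np.ones(len(b))
    for _ in range(iters):
        a, c = Ap @ v, Am @ v
        v *= (-b + np.sqrt(b ** 2 + 4 * a * c)) / (2 * a)  # factor >= 0 keeps v feasible
    return v

A = np.array([[2.0, -1.0], [-1.0, 2.0]])
b = np.array([-1.0, 1.0])
print(multiplicative_nqp(A, b))  # ~[0.5, 0]: the clamped coordinate decays to zero
```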

NeurIPS Conference 2002 Conference Paper

Real Time Voice Processing with Audiovisual Feedback: Toward Autonomous Agents with Perfect Pitch

  • Lawrence Saul
  • Daniel Lee
  • Charles Isbell
  • Yann LeCun

We have implemented a real time front end for detecting voiced speech and estimating its fundamental frequency. The front end performs the signal processing for voice-driven agents that attend to the pitch contours of human speech and provide continuous audiovisual feedback. The algorithm we use for pitch tracking has several distinguishing features: it makes no use of FFTs or autocorrelation at the pitch period; it updates the pitch incrementally on a sample-by-sample basis; it avoids peak picking and does not require interpolation in time or frequency to obtain high resolution estimates; and it works reliably over a four octave range, in real time, without the need for postprocessing to produce smooth contours. The algorithm is based on two simple ideas in neural computation: the introduction of a purposeful nonlinearity, and the error signal of a least squares fit. The pitch tracker is used in two real time multimedia applications: a voice-to-MIDI player that synthesizes electronic music from vocalized melodies, and an audiovisual Karaoke machine with multimodal feedback. Both applications run on a laptop and display the user’s pitch scrolling across the screen as he or she sings into the computer.

NeurIPS Conference 2001 Conference Paper

Multiplicative Updates for Classification by Mixture Models

  • Lawrence Saul
  • Daniel Lee

We investigate a learning algorithm for the classification of nonnegative data by mixture models. Multiplicative update rules are derived that directly optimize the performance of these models as classifiers. The update rules have a simple closed form and an intuitive appeal. Our algorithm retains the main virtues of the Expectation-Maximization (EM) algorithm—its guarantee of monotonic improvement, and its absence of tuning parameters—with the added advantage of optimizing a discriminative objective function. The algorithm reduces as a special case to the method of generalized iterative scaling for log-linear models. The learning rate of the algorithm is controlled by the sparseness of the training data. We use the method of nonnegative matrix factorization (NMF) to discover sparse distributed representations of the data. This form of feature selection greatly accelerates learning and makes the algorithm practical on large problems. Experiments show that discriminatively trained mixture models lead to much better classification than comparably sized models trained by EM.

NeurIPS Conference 2000 Conference Paper

Algorithms for Non-negative Matrix Factorization

  • Daniel Lee
  • H. Sebastian Seung

Non-negative matrix factorization (NMF) has previously been shown to be a useful decomposition for multivariate data. Two different multiplicative algorithms for NMF are analyzed. They differ only slightly in the multiplicative factor used in the update rules. One algorithm can be shown to minimize the conventional least squares error while the other minimizes the generalized Kullback-Leibler divergence. The monotonic convergence of both algorithms can be proven using an auxiliary function analogous to that used for proving convergence of the Expectation-Maximization algorithm. The algorithms can also be interpreted as diagonally rescaled gradient descent, where the rescaling factor is optimally chosen to ensure convergence.
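
The least-squares member of this pair of update rules fits in a few lines of NumPy (a small epsilon guards the divisions; random nonnegative initialization is a standard choice):

```python
# Multiplicative NMF updates for the least squares objective ||V - WH||_F.
import numpy as np

def nmf(V, r, iters=200, eps=1e-9):
    rng = np.random.default_rng(0)
    n, m = V.shape
    W, H = rng.random((n, r)), rng.random((r, m))
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # each step does not increase the error
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

V = np.random.default_rng(1).random((20, 30))   # nonnegative data matrix
W, H = nmf(V, r=5)
print(np.linalg.norm(V - W @ H))                # reconstruction error after 200 updates
```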

NeurIPS Conference 2000 Conference Paper

An Information Maximization Approach to Overcomplete and Recurrent Representations

  • Oren Shriki
  • Haim Sompolinsky
  • Daniel Lee

The principle of maximizing mutual information is applied to learning overcomplete and recurrent representations. The underlying model consists of a network of input units driving a larger number of output units with recurrent interactions. In the limit of zero noise, the network is deterministic and the mutual information can be related to the entropy of the output units. Maximizing this entropy with respect to both the feedforward connections as well as the recurrent interactions results in simple learning rules for both sets of parameters. The conventional independent components (ICA) learning algorithm can be recovered as a special case where there is an equal number of output units and no recurrent connections. The application of these new learning rules is illustrated on a simple two-dimensional input example.

NeurIPS Conference 1999 Conference Paper

Algorithms for Independent Components Analysis and Higher Order Statistics

  • Daniel Lee
  • Uri Rokni
  • Haim Sompolinsky

A latent variable generative model with finite noise is used to describe several different algorithms for Independent Components Analysis (ICA). In particular, the Fixed Point ICA algorithm is shown to be equivalent to the Expectation-Maximization algorithm for maximum likelihood under certain constraints, allowing the conditions for global convergence to be elucidated. The algorithms can also be explained by their generic behavior near a singular point where the size of the optimal generative bases vanishes. An expansion of the likelihood about this singular point indicates the role of higher order correlations in determining the features discovered by ICA. The application and convergence of these algorithms are demonstrated on a simple illustrative example.

NeurIPS Conference 1999 Conference Paper

The Nonnegative Boltzmann Machine

  • Oliver Downs
  • David MacKay
  • Daniel Lee

The nonnegative Boltzmann machine (NNBM) is a recurrent neural network model that can describe multimodal nonnegative data. Application of maximum likelihood estimation to this model gives a learning rule that is analogous to that of the binary Boltzmann machine. We examine the utility of the mean field approximation for the NNBM, and describe how Monte Carlo sampling techniques can be used to learn its parameters. Reflective slice sampling is particularly well-suited for this distribution, and can efficiently be implemented to sample the distribution. We illustrate learning of the NNBM on a translationally invariant distribution, as well as on a generative model for images of human faces.

NeurIPS Conference 1998 Conference Paper

Learning a Continuous Hidden Variable Model for Binary Data

  • Daniel Lee
  • Haim Sompolinsky

A directed generative model for binary data using a small number of hidden continuous units is investigated. A clipping nonlinearity distinguishes the model from conventional principal components analysis. The relationships between the correlations of the underlying continuous Gaussian variables and the binary output variables are utilized to learn the appropriate weights of the network. The advantages of this approach are illustrated on a translationally invariant binary distribution and on handwritten digit images.

NeurIPS Conference 1997 Conference Paper

A Neural Network Based Head Tracking System

  • Daniel Lee
  • H. Seung

We have constructed an inexpensive video based motorized tracking system that learns to track a head. It uses real time graphical user inputs or an auxiliary infrared detector as supervisory signals to train a convolutional neural network. The inputs to the neural network consist of normalized luminance and chrominance images and motion information from frame differences. Subsampled images are also used to provide scale invariance. During the online training phases the neural network rapidly adjusts the input weights depending upon the reliability of the different channels in the surrounding environment. This quick adaptation allows the system to robustly track a head even when other objects are moving within a cluttered background.

NeurIPS Conference 1997 Conference Paper

The Rectified Gaussian Distribution

  • Nicholas Socci
  • Daniel Lee
  • H. Sebastian Seung

A simple but powerful modification of the standard Gaussian distribution is studied. The variables of the rectified Gaussian are constrained to be nonnegative, enabling the use of nonconvex energy functions. Two multimodal examples, the competitive and cooperative distributions, illustrate the representational power of the rectified Gaussian. Since the cooperative distribution can represent the translations of a pattern, it demonstrates the potential of the rectified Gaussian for modeling pattern manifolds.

NeurIPS Conference 1996 Conference Paper

Unsupervised Learning by Convex and Conic Coding

  • Daniel Lee
  • H. Sebastian Seung

Unsupervised learning algorithms based on convex and conic encoders are proposed. The encoders find the closest convex or conic combination of basis vectors to the input. The learning algorithms produce basis vectors that minimize the reconstruction error of the encoders. The convex algorithm develops locally linear models of the input, while the conic algorithm discovers features. Both algorithms are used to model handwritten digits and compared with vector quantization and principal component analysis. The neural network implementations involve feedback connections that project a reconstruction back to the input layer.
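
The conic encoder amounts to a nonnegative least squares problem, as the sketch below shows; the convex encoder would additionally constrain the coefficients to sum to one.

```python
# Conic coding: nonnegative combination of basis vectors closest to the input.
import numpy as np
from scipy.optimize import nnls

def conic_encode(x, B):
    """Return nonnegative coefficients c minimizing ||B c - x||_2."""
    c, residual = nnls(B, x)
    return c, residual

rng = np.random.default_rng(0)
B = np.abs(rng.normal(size=(16, 4)))        # columns play the role of learned features
x = B @ np.array([0.0, 2.0, 0.5, 0.0])      # input lies in the cone of the basis
c, res = conic_encode(x, B)
print(np.round(c, 3), res)                  # recovers the generating coefficients
```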