Arrow Research search

Author name cluster

Nir Shavit

Papers possibly associated with this exact author name in Arrow. This page groups case-insensitive exact-name matches; it is not a full identity-disambiguation profile.

19 papers
2 author rows

Possible papers (19)

ICLR 2025 Conference Paper

Wasserstein Distances, Neuronal Entanglement, and Sparsity

  • Shashata Sawmya
  • Linghao Kong
  • Ilia Markov
  • Dan Alistarh
  • Nir Shavit

Disentangling polysemantic neurons is at the core of many current approaches to interpretability of large language models. Here we study how disentanglement can be used to understand performance, particularly under weight sparsity, a leading post-training optimization technique. We suggest a novel measure for estimating neuronal entanglement: the Wasserstein distance of a neuron's output distribution to a Gaussian. Moreover, we show the existence of a small number of highly entangled "Wasserstein neurons" in each linear layer of an LLM, characterized by their highly non-Gaussian output distributions, their role in mapping similar inputs to dissimilar outputs, and their significant impact on model accuracy. To study these phenomena, we propose a new experimental framework for disentangling polysemantic neurons. Our framework separates each layer's inputs to create a mixture of experts in which each neuron's output is computed by a mixture of neurons of lower Wasserstein distance, each of which better maintains accuracy when sparsified without retraining. We provide strong evidence that this is because the mixture of sparse experts effectively disentangles the input-output relationship of individual neurons, in particular the difficult Wasserstein neurons.
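
To make the proposed measure concrete, here is a minimal sketch (not the paper's code) of scoring a neuron's entanglement as the 1-Wasserstein distance between its standardized output distribution and a standard Gaussian; the standardization step and the use of scipy's empirical estimator are assumptions for illustration.

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)

def gaussianity_gap(outputs, n_ref=100_000):
    """Illustrative entanglement score: 1-Wasserstein distance between a
    neuron's standardized outputs and a standard Gaussian reference."""
    z = (outputs - outputs.mean()) / (outputs.std() + 1e-12)
    return wasserstein_distance(z, rng.standard_normal(n_ref))

# A near-Gaussian neuron vs. a multimodal, heavy-tailed one: the latter
# would be flagged as a highly entangled "Wasserstein neuron".
plain = rng.standard_normal(50_000)
entangled = np.concatenate([rng.standard_normal(25_000) - 3,
                            rng.standard_t(df=2, size=25_000) + 3])
print(f"near-Gaussian neuron: {gaussianity_gap(plain):.3f}")
print(f"entangled neuron:     {gaussianity_gap(entangled):.3f}")
```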

ICLR 2022 Conference Paper

Connectome-constrained Latent Variable Model of Whole-Brain Neural Activity

  • Lu Mi
  • Richard Y. D. Xu
  • Sridhama Prakhya
  • Albert Lin
  • Nir Shavit
  • Aravinthan D. T. Samuel
  • Srinivas C. Turaga

The availability of both anatomical connectivity and brain-wide neural activity measurements in C. elegans makes the worm a promising system for learning detailed, mechanistic models of an entire nervous system in a data-driven way. However, one faces several challenges when constructing such a model. We often do not have direct experimental access to important modeling details such as single-neuron dynamics and the signs and strengths of the synaptic connectivity. Further, neural activity can only be measured in a subset of neurons, often indirectly via calcium imaging, and significant trial-to-trial variability has been observed. To address these challenges, we introduce a connectome-constrained latent variable model (CC-LVM) of the unobserved voltage dynamics of the entire C. elegans nervous system and the observed calcium signals. We used the framework of variational autoencoders to fit parameters of the mechanistic simulation constituting the generative model of the LVM to calcium imaging observations. A variational approximate posterior distribution over latent voltage traces for all neurons is efficiently inferred using an inference network, and constrained by a prior distribution given by the biophysical simulation of neural dynamics. We applied this model to an experimental whole-brain dataset, and found that connectomic constraints enable our LVM to predict the activity of withheld neurons significantly better than models unconstrained by a connectome. We explored models with different degrees of biophysical detail, and found that models with realistic conductance-based synapses provide markedly better predictions than current-based synapses for this system.
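
The generative half of such a model can be sketched in a few lines: a toy connectome-constrained voltage simulation driving a calcium observation model. All dynamics, constants, and the current-based synapse choice below are illustrative assumptions, not the paper's fitted model.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, dt = 5, 200, 0.1

# Binary anatomical connectome (assumed given). Synaptic signs and
# strengths are unknown parameters, but they live only on edges the
# connectome permits.
C = (rng.random((N, N)) < 0.3).astype(float)
np.fill_diagonal(C, 0.0)
W = C * rng.normal(0.0, 1.0, (N, N))         # masked synaptic weights

v = np.zeros(N)                               # latent voltages (unobserved)
ca = np.zeros(N)                              # slow calcium proxy
ca_trace = []
for _ in range(T):
    syn = W @ np.tanh(v)                      # current-based synapses (simplest variant)
    v = v + dt * (-v + syn) + np.sqrt(dt) * 0.1 * rng.standard_normal(N)
    ca = ca + dt * (-0.2 * ca + np.maximum(v, 0.0))   # calcium integrates activity
    ca_trace.append(ca.copy())

# Noisy fluorescence observations; in the actual CC-LVM, a variational
# inference network would invert these back to posterior voltage traces.
F = np.array(ca_trace) + 0.05 * rng.standard_normal((T, N))
print(F.shape)   # (200, 5) observed calcium traces
```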

ICML 2021 Conference Paper

On the Predictability of Pruning Across Scales

  • Jonathan S. Rosenfeld
  • Jonathan Frankle
  • Michael Carbin
  • Nir Shavit

We show that the error of iteratively magnitude-pruned networks empirically follows a scaling law with interpretable coefficients that depend on the architecture and task. We functionally approximate the error of the pruned networks, showing it is predictable in terms of an invariant tying width, depth, and pruning level, such that networks of vastly different pruned densities are interchangeable. We demonstrate the accuracy of this approximation over orders of magnitude in depth, width, dataset size, and density. We show that the functional form holds (generalizes) for large-scale data (e.g., ImageNet) and architectures (e.g., ResNets). As neural networks become ever larger and costlier to train, our findings suggest a framework for reasoning conceptually and analytically about a standard method for unstructured pruning.
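
As an illustration of what a predictable error gives you, the sketch below fits a simple saturating power law of error versus density on synthetic points and extrapolates to an unseen density. The functional form and the data are assumptions for exposition, not the paper's joint form tying width, depth, and pruning level.

```python
import numpy as np
from scipy.optimize import curve_fit

def err_form(density, a, gamma, eps_inf):
    """Illustrative saturating power law: error grows as density shrinks,
    approaching eps_inf from above at full density (assumed form)."""
    return a * density ** (-gamma) + eps_inf

# Synthetic example points (not measurements from the paper).
density = np.array([1.0, 0.5, 0.25, 0.125, 0.0625, 0.03125])
error   = np.array([0.240, 0.242, 0.248, 0.262, 0.295, 0.360])

(a, gamma, eps_inf), _ = curve_fit(err_form, density, error, p0=[0.01, 1.0, 0.24])
print(f"err(d) ~= {a:.4f} * d^-{gamma:.2f} + {eps_inf:.3f}")
print(f"extrapolated error at density 1/64: {err_form(1/64, a, gamma, eps_inf):.3f}")
```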

ICLR 2020 Conference Paper

A Constructive Prediction of the Generalization Error Across Scales

  • Jonathan S. Rosenfeld
  • Amir Rosenfeld
  • Yonatan Belinkov
  • Nir Shavit

The dependency of the generalization error of neural networks on model and dataset size is of critical importance both in practice and for understanding the theory of neural networks. Nevertheless, the functional form of this dependency remains elusive. In this work, we present a functional form which approximates well the generalization error in practice. Capitalizing on the successful concept of model scaling (e.g., width, depth), we are able to simultaneously construct such a form and specify the exact models which can attain it across model/data scales. Our construction follows insights obtained from observations conducted over a range of model/data scales, in various model types and datasets, in vision and language tasks. We show that the form both fits the observations well across scales, and provides accurate predictions from small- to large-scale models and data.
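
One simple candidate for such a joint dependency, written here purely for illustration (the paper constructs and validates its own specific form), is an additive power law with an irreducible error floor:

```latex
% Illustrative candidate only; the paper derives and validates its own
% construction. n = dataset size, m = model size (e.g., width).
\[
  \hat{\epsilon}(m, n) \;\approx\; a\, n^{-\alpha} \;+\; b\, m^{-\beta} \;+\; c_{\infty}
\]
% a, b, alpha, beta are task- and architecture-dependent constants, and
% c_infty is the irreducible error remaining at unbounded scale.
```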

ICML 2020 Conference Paper

Inducing and Exploiting Activation Sparsity for Fast Inference on Deep Neural Networks

  • Mark Kurtz
  • Justin Kopinsky
  • Rati Gelashvili
  • Alexander Matveev
  • John Carr
  • Michael Goin
  • William M. Leiserson
  • Sage Moore
  • Nir Shavit
  • Dan Alistarh

Optimizing convolutional neural networks for fast inference has recently become an extremely active area of research. One of the go-to solutions in this context is weight pruning, which aims to reduce computational and memory footprint by removing large subsets of the connections in a neural network. Surprisingly, much less attention has been given to exploiting sparsity in the activation maps, which tend to be naturally sparse in many settings thanks to the structure of rectified linear (ReLU) activation functions. In this paper, we present an in-depth analysis of methods for maximizing the sparsity of the activations in a trained neural network, and show that, when coupled with an efficient sparse-input convolution algorithm, we can leverage this sparsity for significant performance gains. To induce highly sparse activation maps without accuracy loss, we introduce a new regularization technique, coupled with a new threshold-based sparsification method based on a parameterized activation function called Forced-Activation-Threshold Rectified Linear Unit (FATReLU). We examine the impact of our methods on popular image classification models, showing that most architectures can adapt to significantly sparser activation maps without any accuracy loss. Our second contribution is showing that these compression gains can be translated into inference speedups: we provide a new algorithm to enable fast convolution operations over networks with sparse activations, and show that it can enable significant speedups for end-to-end inference on a range of popular models on the large-scale ImageNet image classification task on modern Intel CPUs, with little or no retraining cost.
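
A minimal sketch of the thresholding idea behind FATReLU (the exact parameterization and threshold-selection procedure in the paper may differ): activations at or below a tunable threshold are forced to exactly zero, increasing the activation-map sparsity that a sparse-input convolution can then exploit.

```python
import numpy as np

def fat_relu(x, threshold=0.0):
    """Sketch of a forced-activation-threshold ReLU: values at or below
    `threshold` are zeroed, so the activation map gets sparser as the
    threshold grows (illustrative form, not the paper's exact one)."""
    return np.where(x > threshold, x, 0.0)

x = np.array([-1.2, 0.05, 0.3, 0.8, 1.5])
print(fat_relu(x))        # plain ReLU:     [0.   0.05 0.3  0.8  1.5 ]
print(fat_relu(x, 0.5))   # sparser output: [0.   0.   0.   0.8  1.5 ]
```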

ICML 2017 Conference Paper

Deep Tensor Convolution on Multicores

  • David M. Budden
  • Alexander Matveev
  • Shibani Santurkar
  • Shraman Ray Chaudhuri
  • Nir Shavit

Deep convolutional neural networks (ConvNets) of 3-dimensional kernels allow joint modeling of spatiotemporal features. These networks have improved performance of video and volumetric image analysis, but have been limited in size due to the low memory ceiling of GPU hardware. Existing CPU implementations overcome this constraint but are impractically slow. Here we extend and optimize the faster Winograd-class of convolutional algorithms to the $N$-dimensional case and specifically for CPU hardware. First, we remove the need to hand-craft algorithms by exploiting the relaxed constraints and cheap sparse access of CPU memory. Second, we maximize CPU utilization and multicore scalability by transforming data matrices to be cache-aware, integer multiples of AVX vector widths. Treating 2-dimensional ConvNets as a special (and the least beneficial) case of our approach, we demonstrate a 5- to 25-fold improvement in throughput compared to previous state-of-the-art.
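
For reference, the 1-D base case of the Winograd family the paper generalizes, F(2,3), computes two outputs of a 3-tap filter with four multiplies instead of six using the standard textbook transform matrices below; the paper's contribution is the $N$-dimensional, CPU-tuned generalization, not this base case.

```python
import numpy as np

# Winograd minimal filtering F(2,3): transform, multiply elementwise,
# then inverse-transform. Four multiplies replace the direct six.
Bt = np.array([[1, 0, -1,  0],
               [0, 1,  1,  0],
               [0,-1,  1,  0],
               [0, 1,  0, -1]], dtype=float)
G  = np.array([[1.0,  0.0, 0.0],
               [0.5,  0.5, 0.5],
               [0.5, -0.5, 0.5],
               [0.0,  0.0, 1.0]])
At = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)

d = np.array([1.0, 2.0, 3.0, 4.0])   # 4 input samples
g = np.array([0.5, 1.0, -1.0])       # 3-tap filter

y_winograd = At @ ((G @ g) * (Bt @ d))         # 4 elementwise multiplies
y_direct   = np.correlate(d, g, mode="valid")  # direct sliding dot product
print(y_winograd, y_direct)  # identical up to floating point
```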

STOC 2014 Conference Paper

Are lock-free concurrent algorithms practically wait-free?

  • Dan Alistarh
  • Keren Censor-Hillel
  • Nir Shavit

Lock-free concurrent algorithms guarantee that some concurrent operation will always make progress in a finite number of steps. Yet programmers prefer to treat concurrent code as if it were wait-free, guaranteeing that all operations always make progress. Unfortunately, designing wait-free algorithms is generally a very complex task, and the resulting algorithms are not always efficient. While obtaining efficient wait-free algorithms has been a long-time goal for the theory community, most non-blocking commercial code is only lock-free. This paper suggests a simple solution to this problem. We show that, for a large class of lock-free algorithms, under scheduling conditions which approximate those found in commercial hardware architectures, lock-free algorithms behave as if they are wait-free. In other words, programmers can keep on designing simple lock-free algorithms instead of complex wait-free ones, and in practice, they will get wait-free progress. Our main contribution is a new way of analyzing a general class of lock-free algorithms under a stochastic scheduler. Our analysis relates the individual performance of processes with the global performance of the system using Markov chain lifting between a complex per-process chain and a simpler system progress chain. We show that lock-free algorithms are not only wait-free with probability 1, but that in fact a general subset of lock-free algorithms can be closely bounded in terms of the average number of steps required until an operation completes. To the best of our knowledge, this is the first attempt to analyze progress conditions, typically stated in relation to a worst case adversary, in a stochastic model capturing their expected asymptotic behavior.
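
The flavor of the result can be reproduced with a toy simulation: a lock-free compare-and-swap counter driven by a uniformly random scheduler completes each operation in a small expected number of steps, even though a worst-case adversary could starve any individual process. Everything below is an illustrative model, not the paper's analysis.

```python
import random

random.seed(0)

def simulate(n_procs=8, ops_per_proc=100):
    """Lock-free fetch-and-increment via read + CAS under a uniformly
    random (stochastic) scheduler: one process takes one step per tick,
    and a stale read forces a CAS failure and a retry."""
    shared = 0
    snapshot = [None] * n_procs            # value read before the CAS
    remaining = [ops_per_proc] * n_procs
    steps = 0
    while any(remaining):
        p = random.randrange(n_procs)      # stochastic scheduler
        if remaining[p] == 0:
            continue
        steps += 1
        if snapshot[p] is None:
            snapshot[p] = shared           # read step
        elif snapshot[p] == shared:
            shared += 1                    # CAS succeeds: op completes
            snapshot[p] = None
            remaining[p] -= 1
        else:
            snapshot[p] = None             # CAS fails: retry

    print(f"avg steps per op: {steps / (n_procs * ops_per_proc):.2f}")

simulate()  # small constant: wait-free-like behavior in practice
```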

FOCS 2005 Conference Paper

Linear Lower Bounds on Real-World Implementations of Concurrent Objects

  • Faith Ellen
  • Danny Hendler
  • Nir Shavit

This paper proves Ω(n) lower bounds on the time to perform a single instance of an operation in any implementation of a large class of data structures shared by n processes. For standard data structures such as counters, stacks, and queues, the bound is tight. The implementations considered may apply any deterministic primitives to a base object. No bounds are assumed on either the number of base objects or their size. Time is measured as the number of steps a process performs on base objects and the number of stalls it incurs as a result of contention with other processes.
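
The cost measure itself is easy to state in code: time counts both steps on base objects and stalls incurred while other processes' primitives are serviced, so n processes converging on one base object already suggest the Ω(n) behavior. This is a deliberately trivial illustration of the measure, not the proof.

```python
# Toy version of the paper's cost measure: time = steps a process takes
# on base objects + stalls it incurs while other processes' primitives
# on the same object are serviced. If n processes converge on a single
# base object, the last one serviced pays n - 1 stalls.
def time_for_last_process(n_processes):
    steps = 1                     # its own primitive on the base object
    stalls = n_processes - 1      # waiting out everyone else's primitives
    return steps + stalls

for n in (2, 8, 64):
    print(f"n = {n:2d}: time >= {time_for_last_process(n)}")
```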

FOCS 1991 Conference Paper

Low Contention Linearizable Counting

  • Maurice Herlihy
  • Nir Shavit
  • Orli Waarts

The linearizable counting problem requires asynchronous concurrent processes to assign themselves successive values so that the order of the values assigned reflects the real-time order in which they were requested. It is shown that the problem can be solved without funneling all processes through a common memory location. Two new constructions for linearizable counting networks, data structures that solve the linearizable counting problem, are given. The first construction is nonblocking: some process takes a value after O(n) network gates have been traversed. The second construction is wait-free: it guarantees that each process takes a value after it traverses O(wn) gates, where w is a parameter affecting contention. It is shown that in any nonblocking or wait-free linearizable counting network, processes must traverse an average of Ω(n) gates, and so the constructions are close to optimal. A simpler and more efficient network is constructed by giving up the robustness requirements and allowing processes to wait for one another.
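
The "gate" in these networks is a balancer: tokens leave alternately on the top and bottom output wires, so the outputs satisfy the step property. The sketch below simulates a single balancer sequentially; real counting networks compose many balancers and handle concurrency, which this illustration does not attempt.

```python
class Balancer:
    """One counting-network gate: each arriving token exits on the wire
    the toggle points at, and the toggle flips, so the two output wires
    receive token counts that differ by at most one (top wire first)."""
    def __init__(self):
        self.toggle = 0          # which output wire the next token takes

    def traverse(self):
        out = self.toggle
        self.toggle ^= 1         # alternate between wire 0 (top) and 1
        return out

b = Balancer()
outs = [b.traverse() for _ in range(7)]
top, bottom = outs.count(0), outs.count(1)
print(top, bottom)               # 4 3: the step property holds
```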

FOCS 1990 Conference Paper

Are Wait-Free Algorithms Fast? (Extended Abstract)

  • Hagit Attiya
  • Nancy A. Lynch
  • Nir Shavit

The time complexity of wait-free algorithms in so-called normal executions, where no failures occur and processes operate at approximately the same speed, is considered. A lower bound of log n on the time complexity of any wait-free algorithm that achieves approximate agreement among n processes is proved. In contrast, there exists a non-wait-free algorithm that solves this problem in constant time. This implies an Ω(log n)-time separation between the wait-free and non-wait-free computation models. An O(log n)-time wait-free approximate agreement algorithm is presented. Its complexity is within a small constant of the lower bound.

FOCS 1989 Conference Paper

Polynomial End-To-End Communication (Extended Abstract)

  • Baruch Awerbuch
  • Yishay Mansour
  • Nir Shavit

A dynamic communication network is one in which links may repeatedly fail and recover. In such a network, although it is impossible to establish a path of unfailed links, reliable communication is possible if there is no cut of permanently failed links between a sender and receiver. The authors consider for such a network the basic task of end-to-end communication, that is, delivery in finite time of data items generated online at the sender, to the receiver, in order and without duplication or omission. The best known previous solutions to this problem had exponential complexity. Moreover, it had been conjectured that a polynomial solution is impossible. The authors disprove this conjecture, presenting the first polynomial end-to-end protocol. The protocol uses methods adapted from shared-memory algorithms and introduces novel techniques for fast load balancing in communication networks.