Arrow Research search

Author name cluster

Charles E. Leiserson

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

11 papers
1 author row

Possible papers

FOCS Conference 1999 Conference Paper

Cache-Oblivious Algorithms

  • Matteo Frigo
  • Charles E. Leiserson
  • Harald Prokop
  • Sridhar Ramachandran

This paper presents asymptotically optimal algorithms for rectangular matrix transpose, FFT, and sorting on computers with multiple levels of caching. Unlike previous optimal algorithms, these algorithms are cache oblivious: no variables dependent on hardware parameters, such as cache size and cache-line length, need to be tuned to achieve optimality. Nevertheless, these algorithms use an optimal amount of work and move data optimally among multiple levels of cache. For a cache with size Z and cache-line length L, where Z = Ω(L²), the number of cache misses for an m × n matrix transpose is Θ(1 + mn/L). The number of cache misses for either an n-point FFT or the sorting of n numbers is Θ(1 + (n/L)(1 + log_Z n)). We also give a Θ(mnp)-work algorithm to multiply an m × n matrix by an n × p matrix that incurs Θ(1 + (mn + np + mp)/L + mnp/(L√Z)) cache faults. We introduce an "ideal-cache" model to analyze our algorithms. We prove that an optimal cache-oblivious algorithm designed for two levels of memory is also optimal for multiple levels and that the assumption of optimal replacement in the ideal-cache model can be simulated efficiently by LRU replacement. We also provide preliminary empirical results on the effectiveness of cache-oblivious algorithms in practice.
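The recursive divide-and-conquer idea behind cache-oblivious transpose and multiplication can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes row-major nested Python lists, and `co_transpose`, `co_matmul`, `transpose`, `matmul`, and the `base` cutoff are hypothetical names. The cutoff only limits recursion overhead and is not tuned to any cache parameter; Python lists will not exhibit the cache behavior itself, so the sketch shows only the recursion structure (always halve the largest dimension).

```python
from typing import List

Matrix = List[List[int]]


def co_transpose(A: Matrix, B: Matrix, i0: int, i1: int, j0: int, j1: int, base: int = 8) -> None:
    """Write B[j][i] = A[i][j] for i in [i0, i1), j in [j0, j1), halving the longer side."""
    rows, cols = i1 - i0, j1 - j0
    if rows <= base and cols <= base:
        for i in range(i0, i1):
            for j in range(j0, j1):
                B[j][i] = A[i][j]
    elif rows >= cols:
        mid = (i0 + i1) // 2
        co_transpose(A, B, i0, mid, j0, j1, base)
        co_transpose(A, B, mid, i1, j0, j1, base)
    else:
        mid = (j0 + j1) // 2
        co_transpose(A, B, i0, i1, j0, mid, base)
        co_transpose(A, B, i0, i1, mid, j1, base)


def transpose(A: Matrix) -> Matrix:
    m = len(A)
    n = len(A[0]) if A else 0
    B = [[0] * m for _ in range(n)]
    co_transpose(A, B, 0, m, 0, n)
    return B


def co_matmul(A: Matrix, B: Matrix, C: Matrix, i0: int, i1: int, k0: int, k1: int,
              j0: int, j1: int, base: int = 8) -> None:
    """C[i][j] += sum over k of A[i][k] * B[k][j], splitting the largest of m, n, p."""
    m, n, p = i1 - i0, k1 - k0, j1 - j0
    if max(m, n, p) <= base:
        for i in range(i0, i1):
            for k in range(k0, k1):
                a = A[i][k]
                for j in range(j0, j1):
                    C[i][j] += a * B[k][j]
    elif m >= n and m >= p:  # split the rows of A and C
        mid = (i0 + i1) // 2
        co_matmul(A, B, C, i0, mid, k0, k1, j0, j1, base)
        co_matmul(A, B, C, mid, i1, k0, k1, j0, j1, base)
    elif n >= p:  # split the shared dimension: both halves accumulate into C
        mid = (k0 + k1) // 2
        co_matmul(A, B, C, i0, i1, k0, mid, j0, j1, base)
        co_matmul(A, B, C, i0, i1, mid, k1, j0, j1, base)
    else:  # split the columns of B and C
        mid = (j0 + j1) // 2
        co_matmul(A, B, C, i0, i1, k0, k1, j0, mid, base)
        co_matmul(A, B, C, i0, i1, k0, k1, mid, j1, base)


def matmul(A: Matrix, B: Matrix) -> Matrix:
    m, n, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(m)]
    co_matmul(A, B, C, 0, m, 0, n, 0, p)
    return C
```

Because every split halves the largest dimension, subproblems eventually fit in any cache level without the code knowing Z or L, which is the point the abstract makes about avoiding tuned parameters.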

FOCS Conference 1993 Conference Paper

Efficient Out-of-Core Algorithms for Linear Relaxation Using Blocking Covers (Extended Abstract)

  • Charles E. Leiserson
  • Satish Rao
  • Sivan Toledo

When a numerical computation fails to fit in the primary memory of a serial or parallel computer, a so-called "out-of-core" algorithm, which moves data between primary and secondary memories, must be used. In this paper, we study out-of-core algorithms for sparse linear relaxation problems in which each iteration of the algorithm updates the state of every vertex in a graph with a linear combination of the states of its neighbors. We give a general method that can save substantially on the I/O traffic for many problems. For example, our technique allows a computer with M words of primary memory to perform T = Ω(M^(1/5)) cycles of a multigrid algorithm for a two-dimensional elliptic solver over an n-point domain using only Θ(nT/M^(1/5)) I/O transfers, as compared with the naive algorithm, which requires Ω(nT) I/Os.
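The relaxation step being made out-of-core can be sketched in memory as below. This shows only the computation the abstract describes (every vertex's new state is a linear combination of its neighbors' previous states), not the blocking-cover method; `relax` and the adjacency format (a dict from each vertex to `(neighbor, weight)` pairs) are hypothetical. A naive out-of-core version would stream all n states through primary memory once per iteration, which is where the Ω(nT) I/O count comes from.

```python
from typing import Dict, Hashable, List, Tuple


def relax(states: Dict[Hashable, float],
          neighbors: Dict[Hashable, List[Tuple[Hashable, float]]],
          T: int) -> Dict[Hashable, float]:
    """Run T synchronous relaxation iterations entirely in primary memory.

    Each iteration replaces every vertex's state with a weighted sum of its
    neighbors' states from the previous iteration (a Jacobi-style update).
    """
    for _ in range(T):
        # Build the new states from the old ones; the old dict is not mutated.
        states = {v: sum(w * states[u] for u, w in neighbors[v]) for v in states}
    return states
```

For instance, on a three-vertex path where each endpoint copies its neighbor and the middle vertex averages its two neighbors, starting states 0, 3, 6 all become 3 after one iteration.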

FOCS Conference 1987 Conference Paper

The Organization of Permutation Architectures with Bussed Interconnections (Extended Abstract)

  • Joe Kilian
  • Shlomo Kipnis
  • Charles E. Leiserson

This paper explores the problem of efficiently permuting data stored in VLSI chips in accordance with a predetermined set of permutations. By connecting chips with shared bus interconnections, as opposed to point-to-point interconnections, we show that the number of pins per chip can often be reduced. For example, for infinitely many n, we exhibit permutation architectures with ⌈√n⌉ pins per chip that can realize any of the n cyclic shifts on n chips in one clock tick. When the set of permutations forms a group with p elements, any permutation in the group can be realized in one clock tick by an architecture with O(√p lg p) pins per chip. When the permutation group is abelian, O(√p) pins suffice. These results are all derived from a mathematical characterization of uniform permutation architectures based on the combinatorial notion of a difference cover.
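The combinatorial notion the abstract cites, a difference cover, is easy to check directly: a set D ⊆ Z_n is a difference cover if every residue mod n is a difference of two elements of D. The sketch below, with hypothetical helper names, checks that property and gives a simple cover of size at most 2⌈√n⌉. It does not reproduce the paper's pin assignment; it only illustrates the object. For example, {0, 1, 3} covers Z_7 with ⌈√7⌉ = 3 elements.

```python
import math
from typing import Set


def is_difference_cover(D: Set[int], n: int) -> bool:
    """True if every residue mod n equals (a - b) mod n for some a, b in D."""
    return len({(a - b) % n for a in D for b in D}) == n


def simple_difference_cover(n: int) -> Set[int]:
    """A difference cover of Z_n with at most 2 * ceil(sqrt(n)) elements.

    With k = ceil(sqrt(n)), any d in [1, n) equals j*k - i where
    j = ceil(d / k) and 0 <= i < k, so {0, ..., k-1} together with the
    multiples of k up to ceil((n - 1) / k) * k (taken mod n) suffice.
    """
    k = math.isqrt(n - 1) + 1  # ceil(sqrt(n)) for n >= 1
    small = set(range(k))
    big = {(j * k) % n for j in range(1, -(-(n - 1) // k) + 1)}
    return small | big
```

The simple construction is within a factor of about two of the ⌈√n⌉ size the abstract reports for infinitely many n.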

FOCS Conference 1985 Conference Paper

Randomized Routing on Fat-Trees (Preliminary Version)

  • Ronald I. Greenberg
  • Charles E. Leiserson

Fat-trees are a class of routing networks for hardware-efficient parallel computation. This paper presents a randomized algorithm for routing messages on a fat-tree. The quality of the algorithm is measured in terms of the load factor of a set of messages to be routed, which is a lower bound on the time required to deliver the messages. We show that if a set of messages has load factor λ = Ω(lg n lg lg n) on a fat-tree with n processors, the number of delivery cycles (routing attempts) that the algorithm requires is O(λ) with probability 1 − O(1/n). The best previous bound was O(λ lg n) for the off-line problem where switch settings can be determined in advance. In a VLSI-like model where hardware cost is equated with physical volume, we use the routing algorithm to demonstrate that fat-trees are universal routing networks in the sense that any routing network can be efficiently simulated by a fat-tree of comparable hardware cost.
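The load factor that the routing bound is stated in terms of can be computed directly. The sketch below assumes the usual definition (the maximum over channels of the number of messages crossing a channel divided by its capacity) and models the fat-tree as a complete binary tree with n leaf processors, n a power of two, where each tree edge carries separate up and down channels. The capacity function `capacity(depth)` and the name `load_factor` are hypothetical. This computes the quantity λ only; it does not implement the randomized routing algorithm.

```python
from collections import Counter
from typing import Callable, Iterable, Tuple


def load_factor(n: int, messages: Iterable[Tuple[int, int]],
                capacity: Callable[[int], float]) -> float:
    """Max over channels of (messages crossing the channel) / (channel capacity).

    Tree nodes use heap numbering: the root is 1, the children of u are 2u and
    2u + 1, and processor i is leaf n + i. The channel above node u has
    capacity capacity(depth(u)) in each direction, where the root has depth 0.
    """
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    up: Counter = Counter()
    down: Counter = Counter()
    for src, dst in messages:
        a, b = n + src, n + dst
        while a != b:  # climb both leaves in lockstep until they meet at the LCA
            up[a] += 1    # the message leaves the subtree rooted at a
            down[b] += 1  # the message enters the subtree rooted at b
            a //= 2
            b //= 2
    loads = [count / capacity(u.bit_length() - 1)
             for channels in (up, down) for u, count in channels.items()]
    return max(loads, default=0.0)
```

With unit capacities on a 4-leaf tree, the messages (0, 2) and (1, 3) both climb through the same channel out of the left subtree, so λ = 2; doubling the capacity of the channels just below the root brings λ back to 1, which is the kind of trade-off fat-tree channel capacities are meant to capture.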

FOCS Conference 1982 Conference Paper

An Application of Number Theory to the Organization of Raster-Graphics Memory (Extended Abstract)

  • Benny Chor
  • Charles E. Leiserson
  • Ronald L. Rivest

A high-resolution raster-graphics display is usually combined with processing power and a memory organization that facilitates basic graphics operations. For many applications, including interactive text processing, the ability to quickly move or copy small rectangles of pixels is essential. This paper proposes a novel organization of raster-graphics memory that permits all small rectangles to be moved efficiently. The memory organization is based on a doubly periodic assignment of pixels to M memory chips according to a "Fibonacci" lattice. The memory organization guarantees that if a rectilinearly oriented rectangle contains fewer than M/√5 pixels, then all pixels will reside in different memory chips and thus can be accessed simultaneously. We also define a continuous analogue of the problem, which can be posed as: "What is the maximum density of a set of points in the plane such that no two points are contained in the interior of a rectilinearly oriented rectangle of area N?" We give a lower bound of 1/2N on the density of such a set, and show that 1/√5N can be achieved.
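A brute-force check of the rectangle guarantee for small M can be sketched as below. It assumes M = F_k (a Fibonacci number) and one common form of a Fibonacci lattice, assigning pixel (x, y) to chip (x + F_{k−1}·y) mod F_k; the paper's exact assignment may differ, and `fib_pair`, `chip`, `rectangle_conflict_free`, and `guarantee_holds` are hypothetical names. Because the assignment is linear mod M, checking each rectangle shape anchored at the origin covers all of its translates.

```python
import math
from typing import Tuple


def fib_pair(k: int) -> Tuple[int, int]:
    """Return (F_{k-1}, F_k) with F_0 = 0 and F_1 = 1."""
    a, b = 0, 1
    for _ in range(k - 1):
        a, b = b, a + b
    return a, b


def chip(x: int, y: int, k: int) -> int:
    """Chip holding pixel (x, y) when M = F_k chips are used (assumed lattice)."""
    f_prev, m = fib_pair(k)
    return (x + f_prev * y) % m


def rectangle_conflict_free(w: int, h: int, k: int) -> bool:
    """True if all w*h pixels of a w-by-h rectangle lie on distinct chips."""
    chips = {chip(x, y, k) for x in range(w) for y in range(h)}
    return len(chips) == w * h


def guarantee_holds(k: int) -> bool:
    """Check that every rectangle with fewer than M/sqrt(5) pixels is conflict free."""
    _, m = fib_pair(k)
    bound = m / math.sqrt(5)
    return all(rectangle_conflict_free(w, h, k)
               for w in range(1, m + 1) for h in range(1, m + 1)
               if w * h < bound)
```

For M = 13 (k = 7), a 3 × 4 rectangle already contains two pixels on the same chip, (0, 0) and (2, 3), but it has 12 pixels, well above 13/√5 ≈ 5.8.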

FOCS Conference 1982 Conference Paper

Wafer-Scale Integration of Systolic Arrays (Extended Abstract)

  • Frank Thomson Leighton
  • Charles E. Leiserson

This paper describes and analyzes several algorithms for constructing systolic array networks from cells on a silicon wafer. Some of the cells may be defective, and thus the networks must be configured to avoid them. We adopt a probabilistic model of cell failure, and attempt to construct networks whose maximum wire length is minimal. Although the algorithms presented are designed principally for application to the wafer-scale integration of one- and two-dimensional systolic arrays, they can also be used to construct networks in well-studied models of geometric complexity. Some of the algorithms are of considerable practical interest.
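The abstract does not spell out the algorithms, so the following is only a naive baseline that makes the setting concrete: cells fail independently with a hypothetical probability `p_fail`, the good cells are linked into a one-dimensional array in row-by-row alternating (boustrophedon) order, and the quantity the paper seeks to minimize, the maximum wire length, is reported as Manhattan distance. The function names are hypothetical, and this baseline is not claimed to be any of the paper's constructions.

```python
import random
from typing import List, Tuple

Cell = Tuple[int, int]


def random_wafer(rows: int, cols: int, p_fail: float, seed: int = 0) -> List[List[bool]]:
    """Each cell is independently good with probability 1 - p_fail."""
    rng = random.Random(seed)
    return [[rng.random() >= p_fail for _ in range(cols)] for _ in range(rows)]


def snake_chain(good: List[List[bool]]) -> List[Cell]:
    """Link good cells row by row, alternating direction on each row."""
    chain: List[Cell] = []
    for r, row in enumerate(good):
        cols = range(len(row)) if r % 2 == 0 else range(len(row) - 1, -1, -1)
        chain.extend((r, c) for c in cols if row[c])
    return chain


def max_wire_length(chain: List[Cell]) -> int:
    """Longest Manhattan distance between consecutive cells in the chain."""
    return max((abs(r1 - r2) + abs(c1 - c2)
                for (r1, c1), (r2, c2) in zip(chain, chain[1:])), default=0)
```

A typical experiment would call `max_wire_length(snake_chain(random_wafer(32, 32, 0.1)))` for several seeds; with this baseline a run of defective cells in one row stretches a single wire, which is exactly the worst case a smarter configuration algorithm tries to avoid.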

FOCS Conference 1981 Conference Paper

Optimizing Synchronous Systems

  • Charles E. Leiserson
  • James B. Saxe

The complexity of integrated-circuit chips produced today makes it feasible to build inexpensive, special-purpose subsystems that rapidly solve sophisticated problems on behalf of a general-purpose host computer. This paper contributes to the design methodology of efficient VLSI algorithms. We present a transformation that converts synchronous systems into more time-efficient, systolic implementations by removing combinational rippling. The problem of determining the optimized system can be reduced to the graph-theoretic single-destination shortest-paths problem. More importantly from an engineering standpoint, however, the kinds of rippling that can be removed from a circuit at essentially no cost can be easily characterized. For example, if the only global communication in a system is broadcasting from the host computer, the broadcast can always be replaced by local communication.
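The reduction to single-destination shortest paths can be sketched as follows, under stated assumptions. A system is modeled as a directed graph whose edge u → v carries w registers; a retiming assigns each vertex a lag r(v) and changes the register count to w + r(v) − r(u) (a convention assumed here). Taking r(v) as the shortest-path distance from v to the host when every edge weight is reduced by one guarantees at least one register on every edge, provided no negative cycle exists. Bellman-Ford computes those distances; `systolic_lags` and `retime` are hypothetical names, and this is a sketch in the spirit of the abstract rather than the paper's exact formulation.

```python
from typing import Dict, Hashable, List, Optional, Tuple

Edge = Tuple[Hashable, Hashable, int]  # (u, v, w): w registers on the wire u -> v


def systolic_lags(vertices: List[Hashable], edges: List[Edge],
                  host: Hashable) -> Optional[Dict[Hashable, int]]:
    """Lag r(v) = shortest distance from v to the host with each weight reduced by 1.

    Uses Bellman-Ford toward a single destination. Returns None if a negative
    cycle exists, in which case this construction yields no systolic retiming.
    """
    INF = float("inf")
    dist = {v: INF for v in vertices}
    dist[host] = 0
    for _ in range(len(vertices) - 1):
        changed = False
        for u, v, w in edges:
            if dist[v] + (w - 1) < dist[u]:  # relax edge u -> v toward the host
                dist[u] = dist[v] + (w - 1)
                changed = True
        if not changed:
            break
    if any(dist[v] + (w - 1) < dist[u] for u, v, w in edges):
        return None
    if any(d == INF for d in dist.values()):
        raise ValueError("every vertex must have a path to the host")
    return {v: int(d) for v, d in dist.items()}


def retime(edges: List[Edge], r: Dict[Hashable, int]) -> List[Edge]:
    """Retimed register count w + r(v) - r(u) on each edge u -> v."""
    return [(u, v, w + r[v] - r[u]) for u, v, w in edges]
```

In a toy system where the host H broadcasts combinationally (zero registers) to cells A and B, with A → B holding one register and B → H holding two, the lags come out as r(H) = 0, r(A) = r(B) = 1, and every retimed edge, including both broadcast wires, carries exactly one register. If the cycle H → A → B → H held only two registers across its three edges, the reduced weights would sum to −1 and the function would report that no systolic retiming exists.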

FOCS Conference 1980 Conference Paper

Area-Efficient Graph Layouts (for VLSI)

  • Charles E. Leiserson

Minimizing the area of a circuit is an important problem in the domain of Very Large Scale Integration. We use a theoretical VLSI model to reduce this problem to one of laying out a graph, where the transistors and wires of the circuit are identified with the vertices and edges of the graph. We give an algorithm that produces VLSI layouts for classes of graphs that have good separator theorems. We show in particular that any planar graph of n vertices has an O(n lg² n)-area layout and that any tree of n vertices can be laid out in linear area. The algorithm maintains a sparse representation for layouts that is based on the well-known UNION-FIND data structure, and as a result, the running time devoted to bookkeeping is nearly linear.
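The abstract attributes the nearly linear bookkeeping time to UNION-FIND. How the layout representation maps onto disjoint sets is described in the paper, not here, so the sketch below shows only the data structure itself, using the standard union-by-rank and path-compression heuristics, which together give nearly linear total time over a sequence of operations. The class and method names are illustrative.

```python
from typing import Dict, Hashable


class UnionFind:
    """Disjoint sets with union by rank and path compression."""

    def __init__(self) -> None:
        self.parent: Dict[Hashable, Hashable] = {}
        self.rank: Dict[Hashable, int] = {}

    def make_set(self, x: Hashable) -> None:
        if x not in self.parent:
            self.parent[x] = x
            self.rank[x] = 0

    def find(self, x: Hashable) -> Hashable:
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while x != root:  # path compression: point every node on the path at the root
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a: Hashable, b: Hashable) -> Hashable:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return ra
        if self.rank[ra] < self.rank[rb]:  # attach the shallower tree under the deeper one
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        return ra
```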