Author name cluster

Meng He

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

7 papers

1 author row

TCS Journal 2022 Journal Article

Data structures for categorical path counting queries

Meng He
Serikzhan Kazi

Consider an ordinal tree T on n nodes, each of which is assigned a category from an alphabet [ σ ] = { 1, 2, …, σ }. We preprocess the tree T in order to support categorical path counting queries, which ask for the number of distinct categories occurring on the path in T between two query nodes x and y. For this problem, we propose a linear-space data structure with query time O ( n lg ⁡ lg ⁡ σ lg ⁡ w ), where w = Ω ( lg ⁡ n ) is the word size in the word-RAM. As shown in our proof, from the assumption that matrix multiplication cannot be solved in time polynomially faster than cubic (with only combinatorial methods), our result is optimal, save for polylogarithmic speed-ups. For a trade-off parameter 1 ≤ t ≤ n, we propose an O ( n + n 2 t 2 ) -word, O ( t lg ⁡ lg ⁡ σ lg ⁡ w ) query time data structure. We also consider c-approximate categorical path counting queries, which must return an approximation to the number of distinct categories occurring on the query path, by counting each such category at least once and at most c times. We describe a linear-space data structure that supports 2-approximate categorical path counting queries in O ( lg ⁡ n / lg ⁡ lg ⁡ n ) time. Next, we generalize the categorical path counting queries to weighted trees. Here, a query specifies two nodes x, y and an orthogonal range Q. The answer to thus formed categorical path range counting query is the number of distinct categories occurring on the path from x to y, if only the nodes with weights falling inside Q are considered. We propose an O ( n lg ⁡ lg ⁡ n + ( n / t ) 4 ) -word data structure with O ( t lg ⁡ lg ⁡ n ) query time, or an O ( n + ( n / t ) 4 ) -word data structure with O ( t lg ϵ ⁡ n ) query time. For an appropriate choice of the trade-off parameter t, this implies a linear-space data structure with O ( n 3 / 4 lg ϵ ⁡ n ) query time. We then extend the approach to the trees weighted with vectors from [ n ] d, where d is a constant integer greater than or equal to 2. We present a data structure with O ( n lg d − 1 + ϵ ⁡ n + ( n / t ) 2 d + 2 ) words of space and O ( t lg d − 1 ⁡ n ( lg ⁡ lg ⁡ n ) d − 2 ) query time. For an O ( n ⋅ polylog n ) -space solution, one thus has O ( n 2 d + 1 2 d + 2 ⋅ polylog n ) query time. The inherent difficulty revealed by the lower bound we proved motivated us to consider data structures based on sketching. In unweighted trees, we propose a sketching data structure to solve the approximate categorical path counting problem which asks for a ( 1 ± ϵ ) -approximation (i. e. within 1 ± ϵ of the true answer) of the number of distinct categories on the given path, with probability 1 − δ, where 0 < ϵ, δ < 1 are constants. The data structure occupies O ( n + n t lg ⁡ n ) words of space, for the query time of O ( t lg ⁡ n ). For trees weighted with d-dimensional weight vectors ( d ≥ 1 ), we propose a data structure with O ( ( n + n t lg ⁡ n ) lg d ⁡ n ) words of space and O ( t lg d + 1 ⁡ n ) query time. All these problems generalize the corresponding categorical range counting problems in Euclidean space R d + 1, for respective d, by replacing one of the dimensions with a tree topology.

Details DOI

TCS Journal 2020 Journal Article

Tree path majority data structures

Travis Gagie
Meng He
Gonzalo Navarro
Carlos Ochoa

We present the first solution to finding τ-majorities on tree paths. Given a tree of n nodes, each with a label from [ 1. . σ ], and a fixed threshold 0 < τ < 1, such a query gives two nodes u and v and asks for all the labels that appear more than τ ⋅ | P u v | times in the path P u v from u to v, where | P u v | denotes the number of nodes in P u v. Note that the answer to any query is of size up to 1 / τ. On a w-bit RAM, we obtain a linear-space data structure with O ( ( 1 / τ ) lg ⁡ lg w ⁡ σ ) query time, which is worst-case optimal for polylogarithmic-sized alphabets. We also describe two succinct-space solutions with query time O ( ( 1 / τ ) lg ⁎ ⁡ n lg ⁡ lg w ⁡ σ ). One uses 2 n H + 4 n + o ( n ) ( H + 1 ) bits, where H ≤ lg ⁡ σ is the entropy of the label distribution; the other uses n H + O ( n ) + o ( n H ) bits. By using just o ( n lg ⁡ σ ) extra bits, our succinct structures allow τ to be specified at query time. We obtain analogous results to find a τ-minority, that is, an element that appears between 1 and τ ⋅ | P u v | times in P u v.

Details DOI

TCS Journal 2019 Journal Article

Path queries on functions

Travis Gagie
Meng He
Gonzalo Navarro

Let f: [ 1. . n ] → [ 1. . n ] be a function, and ℓ: [ 1. . n ] → [ 1. . σ ] indicate a label assigned to each element of the domain. We design several compact data structures that answer various kinds of summary queries on the labels of paths in f. For example, we can find either the minimum label in f k ( i ) for a given i and any k ≥ 0 in a given range [ k 1. . k 2 ], or the minimum label in f − k ( i ) for a given i and k > 0, using n lg ⁡ n + n lg ⁡ σ + o ( n lg ⁡ n ) bits and time O ( α ( n ) ), the inverse Ackermann function. Within similar space we can count, in time O ( lg ⁡ n / lg ⁡ lg ⁡ n ), the number of labels within a range, and report each element with such labels in O ( lg ⁡ n / lg ⁡ lg ⁡ n ) additional time. Several other tradeoffs and possible queries are considered, such as selection, top-r queries and τ-majorities. Finally, we consider queries that allow us navigate on the graph of the function, such as the nearest common successor of two elements, or the nearest successor or predecessor of an element within a range of labels.

Details DOI

TCS Journal 2017 Journal Article

Parallel construction of succinct trees

José Fuentes-Sepúlveda
Leo Ferres
Meng He
Norbert Zeh

Succinct representations of trees are an elegant solution to make large trees fit in main memory while still supporting navigational operations in constant time. However, their construction time remains a bottleneck. We introduce two parallel algorithms that improve the state of the art in succinct tree construction. Our results are presented in terms of work, the time needed to execute a parallel computation using one thread, and span, the minimum amount of time needed to execute a parallel computation, for any amount of threads. Given a tree on n nodes stored as a sequence of balanced parentheses, our first algorithm builds a succinct tree representation with O ( n ) work, O ( lg ⁡ n ) span and supports a rich set of operations in O ( lg ⁡ n ) time. Our second algorithm improves the query support. It constructs a succinct representation that supports queries in O ( c ) time, taking O ( n + n lg c ⁡ n lg ⁡ ( n lg c ⁡ n ) + c c ) work and O ( c + lg ⁡ ( n c c lg c ⁡ n ) ) span, for any positive constant c. Both algorithms use O ( n lg ⁡ n ) bits of working space. In experiments using up to 64 cores on inputs of different sizes, our first algorithm achieved good parallel speed-up. We also present an algorithm that takes O ( n ) work and O ( lg ⁡ n ) span to construct the balanced parenthesis sequence of the input tree required by our succinct tree construction algorithm.

Details DOI

TCS Journal 2016 Journal Article

Dynamic range majority data structures

Amr Elmasry
Meng He
J. Ian Munro
Patrick K. Nicholson

Given a set P of n coloured points on the real line, we study the problem of answering range α-majority (or “heavy hitter”) queries on P. More specifically, for a query range Q, we want to return each colour that is assigned to more than an α-fraction of the points contained in Q. We present a new data structure for answering range α-majority queries on a dynamic set of points, where α ∈ ( 0, 1 ). Our data structure uses O ( n ) space, supports queries in O ( ( lg ⁡ n ) / α ) time, and updates in O ( ( lg ⁡ n ) / α ) amortized time. If the coordinates of the points are integers, then the query time can be improved to O ( lg ⁡ n / ( α lg ⁡ lg ⁡ n ) ). For constant values of α, this improved query time matches an existing lower bound, for any data structure with polylogarithmic update time. We also generalize our data structure to handle sets of points in d dimensions, for d ≥ 2, as well as dynamic arrays, in which each entry is a colour.

Details DOI

I&C Journal 2013 Journal Article

Range majority in constant time and linear space

Stephane Durocher
Meng He
J. Ian Munro
Patrick K. Nicholson
Matthew Skala

Details DOI

TCS Journal 2011 Journal Article

Untangled monotonic chains and adaptive range search

Diego Arroyuelo
Francisco Claude
Reza Dorrigiv
Stephane Durocher
Meng He
Alejandro López-Ortiz
J. Ian Munro
Patrick K. Nicholson

We present the first adaptive data structure for two-dimensional orthogonal range search. Our data structure is adaptive in the sense that it gives improved search performance for data that is better than the worst case (Demaine et al. , 2000) [8]; in this case, data with more inherent sortedness. Given n points on the plane, the linear space data structure can answer range queries in O ( log n + k + m ) time, where m is the number of points in the output and k is the minimum number of monotonic chains into which the point set can be decomposed, which is O ( n ) in the worst case. Our result matches the worst-case performance of other optimal-time linear space data structures, or surpasses them when k = o ( n ). Our data structure can be made implicit, requiring no extra space beyond that of the data points themselves (Munro and Suwanda, 1980) [16], in which case the query time becomes O ( k log n + m ). We also present a novel algorithm of independent interest to decompose a point set into a minimum number of untangled, similarly directed monotonic chains in O ( k 2 n + n log n ) time.

Details DOI