Arrow Research search

Author name cluster

Siu On Chan

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

12 papers
2 author rows

Possible papers


ICLR Conference 2020 Conference Paper

The Gambler's Problem and Beyond

  • Baoxiang Wang 0001
  • Shuai Li 0010
  • Jiajin Li
  • Siu On Chan

We analyze the Gambler's problem, a simple reinforcement learning problem in which the gambler may repeatedly bet, doubling or losing the stake, until the target is reached. It is an early example from the reinforcement learning textbook of Sutton and Barto (2018), who mention an interesting pattern in the optimal value function, with high-frequency components and repeating non-smooth points, but do not investigate it further. We provide the exact formula for the optimal value function in both the discrete and the continuous case. Simple as the problem might seem, the value function is pathological: it is fractal and self-similar, its derivative is either zero or infinite, it is not smooth on any interval, and it cannot be written in terms of elementary functions. It is in fact one of the generalized Cantor functions, with a complexity that has been uncharted thus far. Our analysis could yield insights for value function approximation, gradient-based algorithms, and Q-learning in real applications and implementations.
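The optimal value function discussed in this abstract can be computed numerically by value iteration; below is a minimal sketch assuming the textbook's illustrative parameters (target capital 100, head probability 0.4), which are not specified in the abstract itself.

```python
import numpy as np

# Value iteration for the discrete Gambler's problem (Sutton & Barto,
# Example 4.3). Target N = 100 and head probability p_h = 0.4 are
# illustrative choices, not taken from the abstract.
N = 100     # target capital; reaching it yields reward 1
p_h = 0.4   # probability of winning a bet

V = np.zeros(N + 1)
V[N] = 1.0  # terminal state: target reached

# In-place sweeps until the value function stops changing.
while True:
    delta = 0.0
    for s in range(1, N):
        # Legal bets: 1 .. min(s, N - s).
        best = max(
            p_h * V[s + b] + (1 - p_h) * V[s - b]
            for b in range(1, min(s, N - s) + 1)
        )
        delta = max(delta, abs(best - V[s]))
        V[s] = best
    if delta < 1e-12:
        break

# V now exhibits the non-smooth, self-similar pattern analyzed in the
# paper; for a subfair coin, bold play is optimal at s = 50, so V[50] = p_h.
```

Plotting V over 1..99 reproduces the high-frequency, repeating non-smooth pattern the abstract refers to.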

NeurIPS Conference 2014 Conference Paper

Near-Optimal Density Estimation in Near-Linear Time Using Variable-Width Histograms

  • Siu On Chan
  • Ilias Diakonikolas
  • Rocco Servedio
  • Xiaorui Sun

Let $p$ be an unknown and arbitrary probability distribution over $[0, 1)$. We consider the problem of \emph{density estimation}, in which a learning algorithm is given i.i.d. draws from $p$ and must (with high probability) output a hypothesis distribution that is close to $p$. The main contribution of this paper is a highly efficient density estimation algorithm for learning using a variable-width histogram, i.e., a hypothesis distribution with a piecewise constant probability density function. In more detail, for any $k$ and $\epsilon$, we give an algorithm that makes $\tilde{O}(k/\epsilon^2)$ draws from $p$, runs in $\tilde{O}(k/\epsilon^2)$ time, and outputs a hypothesis distribution $h$ that is piecewise constant with $O(k \log^2(1/\epsilon))$ pieces. With high probability the hypothesis $h$ satisfies $d_{\mathrm{TV}}(p, h) \leq C \cdot \mathrm{opt}_k(p) + \epsilon$, where $d_{\mathrm{TV}}$ denotes the total variation distance (statistical distance), $C$ is a universal constant, and $\mathrm{opt}_k(p)$ is the smallest total variation distance between $p$ and any $k$-piecewise constant distribution. The sample size and running time of our algorithm are both optimal up to logarithmic factors. The ``approximation factor'' $C$ present in our result is inherent in the problem, as we prove that no algorithm with sample size bounded in terms of $k$ and $\epsilon$ can achieve $C < 2$, regardless of what kind of hypothesis distribution it uses.
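As an illustration of the hypothesis class only (not of the paper's algorithm), a variable-width histogram can be built from samples so that every bin carries roughly equal mass; the bin count and the Beta-distributed source below are illustrative assumptions.

```python
import numpy as np

# Sketch of the hypothesis class: a variable-width (equal-mass)
# histogram built from i.i.d. samples. This is NOT the paper's
# learning algorithm, just the kind of piecewise-constant density
# it outputs. The source distribution and k are illustrative.
rng = np.random.default_rng(0)
samples = rng.beta(2.0, 5.0, size=20000)   # unknown p on [0, 1)

k = 10                                      # number of pieces
edges = np.quantile(samples, np.linspace(0.0, 1.0, k + 1))
edges[0], edges[-1] = 0.0, 1.0              # cover all of [0, 1)

widths = np.diff(edges)
# Each bin holds ~1/k of the sample mass, so its density is (1/k)/width.
density = (1.0 / k) / widths

# The hypothesis integrates to 1 by construction (up to rounding).
print(float(np.sum(density * widths)))
```

Note how narrow bins form where the samples concentrate and wide bins where they are sparse, which is exactly what lets few pieces track a non-uniform density.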

SODA Conference 2014 Conference Paper

Optimal Algorithms for Testing Closeness of Discrete Distributions

  • Siu On Chan
  • Ilias Diakonikolas
  • Paul Valiant
  • Gregory Valiant

We study the question of closeness testing for two discrete distributions. More precisely, given samples from two distributions p and q over an n-element set, we wish to distinguish whether p = q versus p being at least ε-far from q, in either ℓ₁ or ℓ₂ distance. Batu et al. [BFR+00, BFR+13] gave the first sub-linear time algorithms for these problems, which matched the lower bounds of [Val11] up to a logarithmic factor in n and a polynomial factor of ε. In this work, we present simple testers for both the ℓ₁ and ℓ₂ settings, with sample complexity that is information-theoretically optimal up to constant factors, both in the dependence on n and in the dependence on ε; for the ℓ₁ testing problem we establish that the sample complexity is Θ(max{n^(2/3)/ε^(4/3), n^(1/2)/ε^2}).
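A minimal sketch of the kind of unbiased statistic such testers are built on, under the standard Poissonized sampling model; the paper's actual ℓ₁ tester additionally normalizes each term by X_i + Y_i, and the distributions and sample sizes below are illustrative.

```python
import numpy as np

# Under Poissonized sampling, if X_i ~ Poi(m*p_i) and Y_i ~ Poi(m*q_i)
# independently, then E[(X_i - Y_i)^2 - X_i - Y_i] = m^2 (p_i - q_i)^2,
# so the summed statistic concentrates near 0 when p = q and grows
# large when p and q are far apart.
rng = np.random.default_rng(1)

def closeness_statistic(p, q, m, rng):
    x = rng.poisson(m * np.asarray(p))
    y = rng.poisson(m * np.asarray(q))
    return float(np.sum((x - y) ** 2 - x - y))

n = 1000
p = np.full(n, 1.0 / n)
q = p.copy()
q[: n // 2] *= 1.5     # perturb half the domain...
q[n // 2:] *= 0.5      # ...and rebalance so q still sums to 1

same = closeness_statistic(p, p, 20000, rng)
diff = closeness_statistic(p, q, 20000, rng)
print(same, diff)      # small in magnitude vs. large
```

Thresholding this statistic (with the paper's normalization and the stated sample size) is what yields the optimal tester.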

FOCS Conference 2013 Conference Paper

Approximate Constraint Satisfaction Requires Large LP Relaxations

  • Siu On Chan
  • James R. Lee
  • Prasad Raghavendra
  • David Steurer

We prove super-polynomial lower bounds on the size of linear programming relaxations for approximation versions of constraint satisfaction problems. We show that for these problems, polynomial-sized linear programs are exactly as powerful as programs arising from a constant number of rounds of the Sherali-Adams hierarchy. In particular, any polynomial-sized linear program for MAX CUT has an integrality gap of 1/2 and any such linear program for MAX 3-SAT has an integrality gap of 7/8.

SODA Conference 2013 Conference Paper

Learning mixtures of structured distributions over discrete domains

  • Siu On Chan
  • Ilias Diakonikolas
  • Rocco A. Servedio
  • Xiaorui Sun

Let 𝒞 be a class of probability distributions over the discrete domain [n] = {1, …, n}. We show that if 𝒞 satisfies a rather general condition – essentially, that each distribution in 𝒞 can be well-approximated by a variable-width histogram with few bins – then there is a highly efficient (in terms of both running time and sample complexity) algorithm that can learn any mixture of k unknown distributions from 𝒞. We analyze several natural types of distributions over [n], including log-concave, monotone hazard rate, and unimodal distributions, and show that each has the required structural property of being well-approximated by a histogram with few bins. Applying our general algorithm, we obtain near-optimally efficient algorithms for all these mixture learning problems, as follows.

Log-concave distributions: We learn any mixture of k log-concave distributions over [n] using k · Õ(1/ε⁴) samples (independent of n), running in Õ(k log(n)/ε⁴) bit operations (note that reading a single sample from [n] takes Θ(log n) bit operations). For the special case k = 1 we give an efficient algorithm using Õ(1/ε³) samples; this generalizes the main result of [DDS12b] from the class of Poisson binomial distributions to the much broader class of all log-concave distributions. Our upper bounds are not far from optimal, since any algorithm for this learning problem requires Ω(k/ε^(5/2)) samples.

Monotone hazard rate (MHR) distributions: We learn any mixture of k MHR distributions over [n] using O(k log(n/ε)/ε⁴) samples, running in Õ(k log(n)/ε⁴) bit operations. Any algorithm for this learning problem must use Ω(k log(n)/ε³) samples.

Unimodal distributions: We give an algorithm that learns any mixture of k unimodal distributions over [n] using O(k log(n)/ε⁴) samples, running in Õ(k log²(n)/ε⁴) bit operations. Any algorithm for this problem must use Ω(k log(n)/ε³) samples.
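The structural property of being well-approximated by a histogram with few bins can be illustrated for a monotone (non-increasing) distribution, where geometrically growing bin widths (Birgé-style bucketing) give roughly log n bins; the distribution and doubling schedule below are illustrative, not the paper's exact parameters.

```python
import numpy as np

# A monotone distribution on [n], flattened within geometrically
# growing bins: widths 1, 2, 4, ... give O(log n) bins, yet the
# histogram stays close to p in total variation distance.
n = 1024
p = np.arange(n, 0, -1, dtype=float)
p /= p.sum()                          # non-increasing distribution on [n]

# Bin edges with doubling widths, clipped to the domain.
edges = [0]
w = 1
while edges[-1] < n:
    edges.append(min(edges[-1] + w, n))
    w *= 2

# Flatten p within each bin to obtain the histogram approximation h.
h = np.empty_like(p)
for lo, hi in zip(edges[:-1], edges[1:]):
    h[lo:hi] = p[lo:hi].mean()

tv = 0.5 * float(np.abs(p - h).sum())   # total variation error
print(len(edges) - 1, tv)               # ~log2(n) bins, small TV error
```

Flattening preserves the mass of each bin, so h is itself a distribution; this few-bins approximability is the condition the general mixture-learning algorithm exploits.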

FOCS Conference 2008 Conference Paper

A Dichotomy Theorem for the Resolution Complexity of Random Constraint Satisfaction Problems

  • Siu On Chan
  • Michael Molloy 0001

We consider random instances of constraint satisfaction problems where each variable has domain size O(1), each constraint is on O(1) variables and the constraints are chosen from a specified distribution. The number of constraints is cn where c is a constant. We prove that for every possible distribution, either the resolution complexity is almost surely polylogarithmic for sufficiently large c, or it is almost surely exponential for every c > 0. We characterize the distributions of each type. To do so, we introduce a closure operation on a set of constraints which yields the set of all constraints that, in some sense, appear implicitly in the random CSP.