Arrow Research · Search

Author name cluster

Li Chou

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

4 papers
2 author rows

Possible papers (4)

ICML 2022 · Conference Paper

One-Pass Diversified Sampling with Application to Terabyte-Scale Genomic Sequence Streams

  • Benjamin Coleman
  • Benito Geordie
  • Li Chou
  • Ryan A. Leo Elworth
  • Todd J. Treangen
  • Anshumali Shrivastava

A popular approach to reduce the size of a massive dataset is to apply efficient online sampling to the stream of data as it is read or generated. Online sampling routines are currently restricted to variations of reservoir sampling, where each sample is selected uniformly and independently of other samples. This renders them unsuitable for large-scale applications in computational biology, such as metagenomic community profiling and protein function annotation, which suffer from severe class imbalance. To maintain a representative and diverse sample, we must identify and preferentially select data that are likely to belong to rare classes. We argue that existing schemes for diversity sampling have prohibitive overhead for large-scale problems and high-throughput streams. We propose an efficient sampling routine that uses an online representation of the data distribution as a prefilter to retain elements from rare groups. We apply this method to several genomic data analysis tasks and demonstrate significant speedup in downstream analysis without sacrificing the quality of the results. Because our algorithm is 2x faster and uses 1000x less memory than coreset, reservoir and sketch-based alternatives, we anticipate that it will become a useful preprocessing step for applications with large-scale streaming data.
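
The abstract leaves the prefilter unspecified. Below is a minimal sketch of the general idea, assuming a count-min-style frequency estimate over a per-item group key and an acceptance probability that decays with the estimated group frequency; both choices are illustrative assumptions, not the paper's exact algorithm.

    import hashlib
    import random

    class DiversitySampler:
        """One-pass sampler that preferentially keeps items from rare groups.

        Sketch only: the count-min prefilter and the 1/(1 + count)
        acceptance rule are illustrative assumptions, not the paper's
        exact method.
        """

        def __init__(self, width=2**16, depth=4, seed=0):
            self.width, self.depth = width, depth
            self.tables = [[0] * width for _ in range(depth)]
            self.rng = random.Random(seed)
            self.sample = []

        def _cells(self, key):
            # One hash cell per table, derived from a salted blake2b digest.
            for i in range(self.depth):
                digest = hashlib.blake2b(key.encode(), digest_size=8,
                                         salt=bytes([i])).digest()
                yield i, int.from_bytes(digest, "big") % self.width

        def offer(self, item, group_key):
            cells = list(self._cells(group_key))
            # Count-min estimate of how many similar items have been seen.
            seen = min(self.tables[i][j] for i, j in cells)
            for i, j in cells:
                self.tables[i][j] += 1
            # Keep rare items almost surely; downsample frequent groups.
            if self.rng.random() < 1.0 / (1.0 + seen):
                self.sample.append(item)

Under these assumptions, streaming sequences through offer(seq, group_key=...) with any coarse group key keeps nearly every element of a rare class while aggressively downsampling dominant ones, in a single pass and constant memory.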

AAAI 2018 · Conference Paper

Automatic Parameter Tying: A New Approach for Regularized Parameter Learning in Markov Networks

  • Li Chou
  • Pracheta Sahoo
  • Somdeb Sarkhel
  • Nicholas Ruozzi
  • Vibhav Gogate

Parameter tying is a regularization method in which parameters (weights) of a machine learning model are partitioned into groups by leveraging prior knowledge and all parameters in each group are constrained to take the same value. In this paper, we consider the problem of parameter learning in Markov networks and propose a novel approach called automatic parameter tying (APT) that uses automatic instead of a priori and soft instead of hard parameter tying as a regularization method to alleviate overfitting. The key idea behind APT is to set up the learning problem as the task of finding parameters and groupings of parameters such that the likelihood plus a regularization term is maximized. The regularization term penalizes models where parameter values deviate from their group mean parameter value. We propose and use a block coordinate ascent algorithm to solve the optimization task. We analyze the sample complexity of our new learning algorithm and show that it yields optimal parameters with high probability when the groups are well separated. Experimentally, we show that our method improves upon L2 regularization and suggest several pragmatic techniques for good practical performance.
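
In symbols, the objective the abstract describes takes roughly this form (our notation, not the paper's), with parameters theta partitioned into groups G_1, ..., G_K:

    \max_{\theta,\,\{G_k\}} \; \ell(\theta \mid \mathcal{D})
      \;-\; \lambda \sum_{k=1}^{K} \sum_{i \in G_k} (\theta_i - \mu_k)^2,
    \qquad \mu_k = \frac{1}{|G_k|} \sum_{i \in G_k} \theta_i

Block coordinate ascent then alternates two steps: update theta with the grouping fixed (a penalized likelihood step), and update the grouping with theta fixed, which for a squared deviation-from-mean penalty is exactly one-dimensional k-means on the parameter values.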

UAI 2018 · Conference Paper

Dissociation-Based Oblivious Bounds for Weighted Model Counting

  • Li Chou
  • Wolfgang Gatterbauer
  • Vibhav Gogate

We consider the weighted model counting task which includes important tasks in graphical models, such as computing the partition function and probability of evidence as special cases. We propose a novel partition-based bounding algorithm that exploits logical structure and gives rise to a set of inequalities from which upper (or lower) bounds can be derived efficiently. The bounds come with optimality guarantees under certain conditions and are oblivious in that they require only limited observations of the structure and parameters of the problem. We experimentally compare our bounds with the mini-bucket scheme (which is also oblivious) and show that our new bounds are often superior and never worse on a wide variety of benchmark networks.
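
As a toy illustration of dissociation (our example, in the spirit of Gatterbauer and Suciu's oblivious-bounds framework rather than this paper's exact scheme): if a variable x with probability p is shared by two subformulas that are positive in x, replace its occurrences with independent copies x_1, x_2 whose probabilities are chosen without inspecting the rest of the problem:

    F = (x \lor a) \land (x \lor b)
      \quad\longrightarrow\quad
    F' = (x_1 \lor a) \land (x_2 \lor b),

    P(F') \ge P(F) \ \text{ for } p_1' = p,\ p_2' = 1,
    \qquad
    P(F') \le P(F) \ \text{ for } p_1' = p_2' = p.

The dissociated formula no longer couples the two subformulas through x, so it is cheaper to count over, and the bound holds regardless of the parameters elsewhere in the problem, which is what makes it oblivious.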

AAAI 2016 · Conference Paper

On Parameter Tying by Quantization

  • Li Chou
  • Somdeb Sarkhel
  • Nicholas Ruozzi
  • Vibhav Gogate

The maximum likelihood estimator (MLE) is generally asymptotically consistent but is susceptible to overfitting. To combat this problem, regularization methods which reduce the variance at the cost of (slightly) increasing the bias are often employed in practice. In this paper, we present an alternative variance reduction (regularization) technique that quantizes the MLE estimates as a post-processing step, yielding a smoother model having several tied parameters. We provide and prove error bounds for our new technique and demonstrate experimentally that it often yields models having higher test-set log-likelihood than the ones learned using the MLE. We also propose a new importance sampling algorithm for fast approximate inference in models having several tied parameters. Our experiments show that our new inference algorithm is superior to existing approaches such as Gibbs sampling and MC-SAT on models having tied parameters, learned using our quantization-based approach.
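
A minimal sketch of the post-processing step as described, assuming plain one-dimensional k-means as the quantizer; the function name and the k-means choice are ours, and the paper's exact scheme may differ.

    import numpy as np

    def quantize_parameters(theta, k, iters=50, seed=0):
        """Tie parameters by 1-D k-means quantization (post-processing).

        Sketch under stated assumptions: cluster the learned MLE
        parameter values into k groups and replace each parameter by its
        cluster centroid, yielding a smoother model with tied parameters.
        """
        rng = np.random.default_rng(seed)
        theta = np.asarray(theta, dtype=float)
        centroids = rng.choice(theta, size=k, replace=False)
        for _ in range(iters):
            # Assign each parameter to its nearest centroid.
            assign = np.abs(theta[:, None] - centroids[None, :]).argmin(axis=1)
            # Recompute each centroid as the mean of its assigned parameters.
            for j in range(k):
                if np.any(assign == j):
                    centroids[j] = theta[assign == j].mean()
        return centroids[assign], assign

Calling quantize_parameters(theta, k=8) returns the tied parameter vector and the group assignment; downstream inference can then exploit the fact that only k distinct parameter values remain.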