Arrow Research search

Author name cluster

Yi Hao

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

14 papers
2 author rows

Possible papers

14

JBHI Journal 2026 Journal Article

LncTracker: A Unified Multi-Channel Framework for Multi-Label lncRNA Localization

  • Yi Hao
  • Xudong Guo
  • Zixu Ran
  • Yue Bi
  • Fuyi Li

Long non-coding RNAs (lncRNAs) play essential roles in various biological processes, including chromatin modification, cell cycle regulation, transcription, and translation. Recent studies have revealed that the biological functions of lncRNAs are closely associated with their subcellular localizations, making accurate localization prediction critical for understanding their biological roles in cellular regulation and disease mechanisms. However, most existing methods mainly rely on sequence features while neglecting structural information, and they are often limited to single-label predictions covering only a small number of subcellular compartments. In this study, we proposed an efficient deep learning framework, LncTracker, for multi-label prediction of lncRNA subcellular localizations across seven distinct compartments. LncTracker adopts a multi-channel architecture that integrates diverse input features into model training, including both primary sequence and secondary structure information. Secondary structures are converted into attributed graphs to capture spatial relationships among nucleotides, including adjacency and base-pairing connections. These structural features are then combined with sequence-based features to predict subcellular localization probabilities. Such a design enables LncTracker to learn joint representations of sequences and structures, thereby enhancing predictive performance and robustness. Benchmarking experiments demonstrated the superiority of LncTracker over state-of-the-art approaches, particularly in handling imbalanced localization scenarios. Furthermore, we leveraged LncTracker to identify sequence motifs critical for each subcellular localization and analysed key sub-structures contributing to predictions.
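As a concrete illustration of the attributed-graph idea described above, the sketch below (hypothetical, not LncTracker's code) converts a dot-bracket secondary-structure string into backbone-adjacency and base-pairing edges; the dot-bracket input format and the function name are assumptions.

```python
# Illustrative sketch (not LncTracker's implementation): build a nucleotide graph
# from a dot-bracket secondary structure, with backbone and base-pair edges.
def structure_to_edges(sequence: str, dot_bracket: str):
    assert len(sequence) == len(dot_bracket)
    edges = []
    # Backbone edges: consecutive nucleotides are adjacent in the primary sequence.
    for i in range(len(sequence) - 1):
        edges.append((i, i + 1))
    # Base-pairing edges: matched '(' and ')' positions in the dot-bracket string.
    stack = []
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            edges.append((j, i))
    return edges

# Example: a short hairpin.
print(structure_to_edges("GGGAAACCC", "(((...)))"))
```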

ICML Conference 2022 Conference Paper

TURF: Two-Factor, Universal, Robust, Fast Distribution Learning Algorithm

  • Yi Hao
  • Ayush Jain 0001
  • Alon Orlitsky
  • Vaishakh Ravindrakumar

Approximating distributions from their samples is a canonical statistical-learning problem. One of its most powerful and successful modalities approximates every distribution to an $\ell_1$ distance essentially at most a constant times larger than its closest $t$-piece degree-$d$ polynomial, where $t\ge1$ and $d\ge0$. Letting $c_{t, d}$ denote the smallest such factor, clearly $c_{1, 0}=1$, and it can be shown that $c_{t, d}\ge 2$ for all other $t$ and $d$. Yet current computationally efficient algorithms show only $c_{t, 1}\le 2.25$ and the bound rises quickly to $c_{t, d}\le 3$ for $d\ge 9$. We derive a near-linear-time and essentially sample-optimal estimator that establishes $c_{t, d}=2$ for all $(t, d)\ne(1, 0)$. Additionally, for many practical distributions, the lowest approximation distance is achieved by polynomials with a vastly varying number of pieces. We provide a method that estimates this number near-optimally, hence helps approach the best possible approximation. Experiments combining the two techniques confirm improved performance over existing methodologies.

ICML Conference 2021 Conference Paper

Compressed Maximum Likelihood

  • Yi Hao
  • Alon Orlitsky

Maximum likelihood (ML) is one of the most fundamental and general statistical estimation techniques. Inspired by recent advances in estimating distribution functionals, we propose $\textit{compressed maximum likelihood}$ (CML) that applies ML to the compressed samples. We then show that CML is sample-efficient for several essential learning tasks over both discrete and continuous domains, including learning densities with structures, estimating probability multisets, and inferring symmetric distribution functionals.

ICML Conference 2020 Conference Paper

Data Amplification: Instance-Optimal Property Estimation

  • Yi Hao
  • Alon Orlitsky

The best-known and most commonly used technique for distribution-property estimation uses a plug-in estimator, with empirical frequency replacing the underlying distribution. We present novel linear-time-computable estimators that significantly “amplify” the effective amount of data available. For a large variety of distribution properties including four of the most popular ones and for every underlying distribution, they achieve the accuracy that the empirical-frequency plug-in estimators would attain using a logarithmic factor more samples. Specifically, for Shannon entropy and a broad class of Lipschitz properties including the $L_1$ distance to a fixed distribution, the new estimators use $n$ samples to achieve the accuracy attained by the empirical estimators with $n\log n$ samples. For support-size and coverage, the new estimators use $n$ samples to achieve the performance of empirical frequency with sample size $n$ times the logarithm of the property value. Significantly strengthening the traditional min-max formulation, these results hold not only for the worst distributions, but for each and every underlying distribution. Furthermore, the logarithmic amplification factors are optimal. Experiments on a wide variety of distributions show that the new estimators outperform the previous state-of-the-art estimators designed for each specific property.
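For reference, the empirical-frequency plug-in estimator that serves as the baseline in this abstract can be written in a few lines; this sketch covers only the Shannon-entropy baseline, not the amplified estimators themselves.

```python
import math
from collections import Counter

def plug_in_entropy(sample):
    """Empirical-frequency plug-in estimate of Shannon entropy (in nats):
    replace the unknown distribution by observed frequencies and evaluate."""
    n = len(sample)
    counts = Counter(sample)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

# Example: on small samples this baseline typically underestimates the entropy,
# which is the gap the amplified estimators aim to close.
print(plug_in_entropy([1, 1, 2, 3, 3, 3, 4]))
```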

NeurIPS Conference 2020 Conference Paper

Optimal Prediction of the Number of Unseen Species with Multiplicity

  • Yi Hao
  • Ping Li

Based on a sample of size $n$, we consider estimating the number of symbols that appear at least $\mu$ times in an independent sample of size $a \cdot n$, where $a$ is a given parameter. This formulation includes, as a special case, the well-known problem of inferring the number of unseen species introduced by [Fisher et al.] in 1943 and considered by many others. Of considerable interest in this line of work is the largest $a$ for which the quantity can be accurately predicted. We completely resolve this problem by determining the limit of estimation to be $a \approx (\log n)/\mu$, with both lower and upper bounds matching up to constant factors. For the particular case of $\mu = 1$, this implies the recent result by [Orlitsky et al.] on the unseen species problem. Experimental evaluations show that the proposed estimator performs exceptionally well in practice. Furthermore, the estimator is a simple linear combination of symbols' empirical counts, and hence linear-time computable.
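Since the proposed estimator is described only as a linear combination of empirical counts, the sketch below shows the classical Good-Toulmin estimator for the $\mu = 1$ (unseen-species) case, which has that same linear form; it illustrates the problem setup and is not the estimator from the paper.

```python
from collections import Counter

def good_toulmin_unseen(sample, a):
    """Classical Good-Toulmin estimate of the number of new symbols expected in a
    future sample of size a*n (the mu = 1 case).  Illustrative only; smoothed or
    truncated variants are needed for a > 1, where this raw form becomes unstable."""
    counts = Counter(sample)
    prevalences = Counter(counts.values())   # Phi_i: number of symbols seen exactly i times
    return -sum(((-a) ** i) * phi for i, phi in prevalences.items())

# Example with a = 0.5 (future sample half as long as the observed one).
print(good_toulmin_unseen(list("abracadabra"), a=0.5))
```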

NeurIPS Conference 2020 Conference Paper

Profile Entropy: A Fundamental Measure for the Learnability and Compressibility of Distributions

  • Yi Hao
  • Alon Orlitsky

The profile of a sample is the multiset of its symbol frequencies. We show that for samples of discrete distributions, profile entropy is a fundamental measure unifying the concepts of estimation, inference, and compression. Specifically, profile entropy: a) determines the speed of estimating the distribution relative to the best natural estimator; b) characterizes the rate of inferring all symmetric properties compared with the best estimator over any label-invariant distribution collection; c) serves as the limit of profile compression, for which we derive optimal near-linear-time block and sequential algorithms. To further our understanding of profile entropy, we investigate its attributes, provide algorithms for approximating its value, and determine its magnitude for numerous structural distribution families.
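A small sketch of the opening definition: computing a sample's profile as the multiset of its symbol frequencies (hashable symbols assumed).

```python
from collections import Counter

def profile(sample):
    """Profile of a sample: the multiset of its symbol frequencies, represented
    here as {frequency: number of symbols occurring with that frequency}."""
    counts = Counter(sample)
    return dict(Counter(counts.values()))

# "a" appears 3 times, "b" twice, "c" once -> profile {3: 1, 2: 1, 1: 1}
print(profile(list("aabbca")))
```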

NeurIPS Conference 2020 Conference Paper

SURF: A Simple, Universal, Robust, Fast Distribution Learning Algorithm

  • Yi Hao
  • Ayush Jain
  • Alon Orlitsky
  • Vaishakh Ravindrakumar

Sample- and computationally efficient distribution estimation is a fundamental tenet in statistics and machine learning. We present SURF, an algorithm for approximating distributions by piecewise polynomials. SURF is: simple, replacing prior complex optimization techniques by straightforward empirical-probability interpolation of each potential polynomial piece, and using plain divide-and-conquer to merge the pieces; universal, as well-known polynomial-approximation results imply that it accurately approximates a large class of common distributions; robust to distribution mis-specification, since for any degree $d \le 8$ it estimates any distribution to an $\ell_1$ distance $< 3$ times that of the nearest degree-$d$ piecewise polynomial, improving the known factor upper bounds of 3 for single polynomials and 15 for polynomials with arbitrarily many pieces; fast, using optimal sample complexity, running in near sample-linear time, and, given sorted samples, parallelizable to run in sub-linear time. In experiments, SURF outperforms state-of-the-art algorithms.
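For orientation, the degree-0 special case of a piecewise-polynomial estimate is an ordinary histogram; the sketch below fixes the number of pieces by hand and is not SURF's interpolation or divide-and-conquer merging.

```python
import numpy as np

def histogram_density(samples, t, lo=0.0, hi=1.0):
    """Degree-0 piecewise-'polynomial' (histogram) density estimate with t
    equal-width pieces on [lo, hi].  SURF additionally chooses the pieces and
    fits higher-degree polynomials; this sketch illustrates only the d = 0 case."""
    counts, edges = np.histogram(samples, bins=t, range=(lo, hi))
    widths = np.diff(edges)
    densities = counts / (len(samples) * widths)   # empirical mass divided by piece width
    return edges, densities

edges, dens = histogram_density(np.random.beta(2, 5, size=1000), t=10)
print(dens)
```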

ICML Conference 2019 Conference Paper

Doubly-Competitive Distribution Estimation

  • Yi Hao
  • Alon Orlitsky

Distribution estimation is a statistical-learning cornerstone. Its classical min-max formulation minimizes the estimation error for the worst distribution, hence under-performs for practical distributions that, like power-law, are often rather simple. Modern research has therefore focused on two frameworks: structural estimation that improves learning accuracy by assuming a simple structure of the underlying distribution; and competitive, or instance-optimal, estimation that achieves the performance of a genie-aided estimator up to a small excess error that vanishes as the sample size grows, regardless of the distribution. This paper combines and strengthens the two frameworks. It designs a single estimator whose excess error vanishes both at a universal rate as the sample size grows and when the (unknown) distribution gets simpler. We show that the resulting algorithm significantly improves the performance guarantees for numerous competitive- and structural-estimation results. The algorithm runs in near-linear time and is robust to model misspecification and domain-symbol permutations.

AAAI Conference 2019 Conference Paper

HSME: Hypersphere Manifold Embedding for Visible Thermal Person Re-Identification

  • Yi Hao
  • Nannan Wang
  • Jie Li
  • Xinbo Gao

Person re-identification (re-ID) has great potential to contribute to video surveillance that automatically searches and identifies people across different cameras. Heterogeneous person re-identification between thermal (infrared) and visible images is essentially a cross-modality problem and is important for night-time surveillance applications. Current methods usually train a model by combining classification and metric learning algorithms to obtain discriminative and robust feature representations. However, the combined loss function ignores the correlation between the classification subspace and the feature embedding subspace. In this paper, we use Sphere Softmax to learn a hypersphere manifold embedding and constrain the intra-modality variations and cross-modality variations on this hypersphere. We propose an end-to-end dual-stream hypersphere manifold embedding network (HSMEnet) with both classification and identification constraints. Meanwhile, we design a two-stage training scheme to acquire decorrelated features, and we refer to HSME with decorrelation as D-HSME. We conduct experiments on two cross-modality person re-identification datasets. Experimental results demonstrate that our method outperforms state-of-the-art methods on both datasets. On the RegDB dataset, rank-1 accuracy is improved from 33.47% to 50.85%, and mAP is improved from 31.83% to 47.00%.
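A minimal, hypothetical PyTorch-style sketch of a Sphere Softmax classifier head (L2-normalized features and class weights with a fixed scale); it illustrates the hypersphere constraint only, is not the authors' released implementation, and the scale value and class names are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SphereSoftmaxHead(nn.Module):
    """Illustrative Sphere Softmax head: features and class weights are both
    L2-normalized, so logits are scaled cosine similarities and embeddings are
    constrained to a hypersphere.  Sketch only, not the HSME code."""
    def __init__(self, feat_dim, num_classes, scale=14.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.scale = scale

    def forward(self, features):
        f = F.normalize(features, dim=1)          # unit-norm embeddings
        w = F.normalize(self.weight, dim=1)       # unit-norm class prototypes
        return self.scale * f @ w.t()             # feed to cross-entropy loss

head = SphereSoftmaxHead(feat_dim=512, num_classes=206)
logits = head(torch.randn(8, 512))
loss = F.cross_entropy(logits, torch.randint(0, 206, (8,)))
```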

NeurIPS Conference 2019 Conference Paper

The Broad Optimality of Profile Maximum Likelihood

  • Yi Hao
  • Alon Orlitsky

We study three fundamental statistical-learning problems: distribution estimation, property estimation, and property testing. We establish the profile maximum likelihood (PML) estimator as the first unified sample-optimal approach to a wide range of learning tasks. In particular, for every alphabet size $k$ and desired accuracy $\varepsilon$: (1) Distribution estimation: under $\ell_1$ distance, PML yields optimal $\Theta(k/(\varepsilon^2\log k))$ sample complexity for sorted-distribution estimation, and a PML-based estimator empirically outperforms the Good-Turing estimator on the actual distribution; (2) Additive property estimation: for a broad class of additive properties, the PML plug-in estimator uses just four times the sample size required by the best estimator to achieve roughly twice its error, with exponentially higher confidence; (3) $\alpha$-Rényi entropy estimation: for an integer $\alpha>1$, the PML plug-in estimator has optimal $k^{1-1/\alpha}$ sample complexity; for non-integer $\alpha>3/4$, the PML plug-in estimator has sample complexity lower than the state of the art; (4) Identity testing: in testing whether an unknown distribution is equal to, or at least $\varepsilon$-far from, a given distribution in $\ell_1$ distance, a PML-based tester achieves the optimal sample complexity up to logarithmic factors of $k$. With minor modifications, most of these results also hold for a near-linear-time computable variant of PML.

NeurIPS Conference 2019 Conference Paper

Unified Sample-Optimal Property Estimation in Near-Linear Time

  • Yi Hao
  • Alon Orlitsky

We consider the fundamental learning problem of estimating properties of distributions over large domains. Using a novel piecewise-polynomial approximation technique, we derive the first unified methodology for constructing sample- and time-efficient estimators for all sufficiently smooth, symmetric and non-symmetric, additive properties. This technique yields near-linear-time computable estimators whose approximation values are asymptotically optimal and highly-concentrated, resulting in the first: 1) estimators achieving the $\mathcal{O}(k/(\varepsilon^2\log k))$ min-max $\varepsilon$-error sample complexity for all $k$-symbol Lipschitz properties; 2) unified near-optimal differentially private estimators for a variety of properties; 3) unified estimator achieving optimal bias and near-optimal variance for five important properties; 4) near-optimal sample-complexity estimators for several important symmetric properties over both domain sizes and confidence levels.

NeurIPS Conference 2018 Conference Paper

Data Amplification: A Unified and Competitive Approach to Property Estimation

  • Yi Hao
  • Alon Orlitsky
  • Ananda Theertha Suresh
  • Yihong Wu

Estimating properties of discrete distributions is a fundamental problem in statistical learning. We design the first unified, linear-time, competitive, property estimator that for a wide class of properties and for all underlying distributions uses just $2n$ samples to achieve the performance attained by the empirical estimator with $n\sqrt{\log n}$ samples. This provides off-the-shelf, distribution-independent, “amplification” of the amount of data available relative to common-practice estimators. We illustrate the estimator's practical advantages by comparing it to existing estimators for a wide variety of properties and distributions. In most cases, its performance with $n$ samples is even as good as that of the empirical estimator with $n\log n$ samples, and for essentially all properties, its performance is comparable to that of the best existing estimator designed specifically for that property.

NeurIPS Conference 2018 Conference Paper

On Learning Markov Chains

  • Yi Hao
  • Alon Orlitsky
  • Venkatadheeraj Pichapati

The problem of estimating an unknown discrete distribution from its samples is a fundamental tenet of statistical learning. Over the past decade, it attracted significant research effort and has been solved for a variety of divergence measures. Surprisingly, an equally important problem, estimating an unknown Markov chain from its samples, is still far from understood. We consider two problems related to the min-max risk (expected loss) of estimating an unknown $k$-state Markov chain from its $n$ sequential samples: predicting the conditional distribution of the next sample with respect to the KL-divergence, and estimating the transition matrix with respect to a natural loss induced by KL or a more general $f$-divergence measure. For the first measure, we determine the min-max prediction risk to within a linear factor in the alphabet size, showing it is $\Omega(k\log\log n/n)$ and $O(k^2\log\log n/n)$. For the second, if the transition probabilities can be arbitrarily small, then only trivial uniform risk upper bounds can be derived. We therefore consider transition probabilities that are bounded away from zero, and resolve the problem for essentially all sufficiently smooth $f$-divergences, including KL-, $L_2$-, chi-squared-, Hellinger-, and alpha-divergences.
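For context, the sketch below shows the naive add-constant (smoothed empirical) transition-matrix estimate and a row-wise KL loss; the min-max-optimal estimators analysed in the paper are not reproduced, and the smoothing constant is an assumption.

```python
import numpy as np

def estimate_transition_matrix(chain, k, alpha=0.5):
    """Add-constant smoothed estimate of a k-state Markov chain's transition
    matrix from one sequential sample.  Illustrative baseline only."""
    counts = np.full((k, k), alpha)
    for s, t in zip(chain[:-1], chain[1:]):
        counts[s, t] += 1.0
    return counts / counts.sum(axis=1, keepdims=True)

def kl_rows(P, Q):
    """Row-wise KL divergence KL(P_i || Q_i), a natural loss for transition matrices."""
    return np.sum(P * np.log(P / Q), axis=1)

P_hat = estimate_transition_matrix([0, 1, 1, 0, 2, 1, 0, 0, 1], k=3)
print(P_hat)
print(kl_rows(P_hat, np.full((3, 3), 1 / 3)))
```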

NeurIPS Conference 2017 Conference Paper

Maxing and Ranking with Few Assumptions

  • Moein Falahatgar
  • Yi Hao
  • Alon Orlitsky
  • Venkatadheeraj Pichapati
  • Vaishakh Ravindrakumar

PAC maximum selection (maxing) and ranking of $n$ elements via random pairwise comparisons have diverse applications and have been studied under many models and assumptions. With just one simple natural assumption: strong stochastic transitivity, we show that maxing can be performed with linearly many comparisons yet ranking requires quadratically many. With no assumptions at all, we show that for the Borda-score metric, maximum selection can be performed with linearly many comparisons and ranking can be performed with $\mathcal{O}(n\log n)$ comparisons.
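To illustrate the comparison model, here is a toy sequential-maxing sketch with a fixed best-of-$m$ budget per challenger; the paper's PAC algorithm instead adapts the number of comparisons per pair, so this is only an illustration of the setting, and the comparison-probability matrix is an assumption.

```python
import random

def noisy_compare(i, j, p, m):
    """Return True if i wins a best-of-m series of noisy comparisons against j,
    where p[i][j] is the probability that i beats j in a single comparison."""
    wins = sum(random.random() < p[i][j] for _ in range(m))
    return wins > m / 2

def sequential_maxing(n, p, m=15):
    """Toy maxing sketch: keep a running champion and play each challenger a fixed
    best-of-m series, using O(n*m) comparisons in total.  Not the paper's algorithm,
    which chooses comparison budgets adaptively to obtain PAC guarantees."""
    champ = 0
    for challenger in range(1, n):
        if noisy_compare(challenger, champ, p, m):
            champ = challenger
    return champ

# 4 items where item 3 beats every other item with probability 0.8.
p = [[0.5, 0.6, 0.6, 0.2],
     [0.4, 0.5, 0.6, 0.2],
     [0.4, 0.4, 0.5, 0.2],
     [0.8, 0.8, 0.8, 0.5]]
print(sequential_maxing(4, p))
```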