Author name cluster

Michael Ng

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

9 papers

1 author row

TMLR Journal 2025 Journal Article

AlgoFormer: An Efficient Transformer Framework with Algorithmic Structures

Yihang Gao
Chuanyang Zheng
Enze Xie
Han Shi
Tianyang Hu
Yu Li
Michael Ng
Zhenguo Li

Besides natural language processing, transformers exhibit extraordinary performance in solving broader applications, including scientific computing and computer vision. Previous works try to explain this from the expressive power and capability perspectives that standard transformers are capable of performing some algorithms. To empower transformers with algorithmic capabilities and motivated by the recently proposed looped transformer (Yang et al., 2024; Giannou et al., 2023), we design a novel transformer framework, dubbed Algorithm Transformer (abbreviated as AlgoFormer). We provide an insight that efficient transformer architectures can be designed by leveraging prior knowledge of tasks and the underlying structure of potential algorithms. Compared with the standard transformer and vanilla looped transformer, the proposed AlgoFormer can perform efficiently in algorithm representation in some specific tasks. In particular, inspired by the structure of human-designed learning algorithms, our transformer framework consists of a pre-transformer that is responsible for task preprocessing, a looped transformer for iterative optimization algorithms, and a post-transformer for producing the desired results after post-processing. We provide theoretical evidence of the expressive power of the AlgoFormer in solving some challenging problems, mirroring human-designed algorithms. Furthermore, some theoretical and empirical results are presented to show that the designed transformer has the potential to perform algorithm representation and learning. Experimental results demonstrate the empirical superiority of the proposed transformer in that it outperforms the standard transformer and vanilla looped transformer in some specific tasks. An extensive experiment on real language tasks (e.g., neural machine translation of German and English, and text classification) further validates the expressiveness and effectiveness of AlgoFormer.

PDF Details

AAAI Conference 2025 Conference Paper

Hypergraph Learning for Unsupervised Graph Alignment via Optimal Transport

Yuguang Yan
Canlin Yang
Yuanlin Chen
Ruichu Cai
Michael Ng

Unsupervised graph alignment aims to find corresponding nodes across different graphs without supervision. Existing methods usually leverage the graph structure to aggregate features of nodes to find relations between nodes. However, the graph structure is inherently limited in pairwise relations between nodes without considering higher-order dependencies among multiple nodes. In this paper, we take advantage of the hypergraph structure to characterize higher-order structural information among nodes for better graph alignment. Specifically, we propose an optimal transport model to learn a hypergraph to capture complex relations among nodes, so that the nodes involved in one hyperedge can be adaptively based on local geometric information. In addition, inspired by the Dirichlet energy function of a hypergraph, we further refine our model to enhance the consistency between structural and feature information in each hyperedge. After that, we jointly leverage graphs and hypergraphs to extract structural and feature information to better model the relations between nodes, which is used to find node correspondences across graphs. We conduct experiments on several benchmark datasets with different settings, and the results demonstrate the effectiveness of our proposed method.

PDF Details DOI

NeurIPS Conference 2024 Conference Paper

DAPE: Data-Adaptive Positional Encoding for Length Extrapolation

Chuanyang Zheng
Yihang Gao
Han Shi
Minbin Huang
Jingyao Li
Jing Xiong
Xiaozhe Ren
Michael Ng

Positional encoding plays a crucial role in transformers, significantly impact- ing model performance and length generalization. Prior research has introduced absolute positional encoding (APE) and relative positional encoding (RPE) to distinguish token positions in given sequences. However, both APE and RPE remain fixed after model training regardless of input data, limiting their adaptability and flexibility. Hence, we expect that the desired positional encoding should be data-adaptive and can be dynamically adjusted with the given attention. In this paper, we propose a Data-Adaptive Positional Encoding (DAPE) method, which dynamically and semantically adjusts based on input context and learned fixed priors. Experimental validation on real-world datasets (Arxiv, Books3, and CHE) demonstrates that DAPE enhances model performances in terms of trained length and length generalization, where the improvements are statistically significant. The model visualization suggests that our model can keep both local and anti-local information. Finally, we successfully train the model on sequence length 128 and achieve better performance at evaluation sequence length 8192, compared with other static positional encoding methods, revealing the benefit of the adaptive positional encoding method.

PDF Details DOI

NeurIPS Conference 2023 Conference Paper

A Unified Framework for Uniform Signal Recovery in Nonlinear Generative Compressed Sensing

Junren Chen
Jonathan Scarlett
Michael Ng
Zhaoqiang Liu

In generative compressed sensing (GCS), we want to recover a signal $\mathbf{x^*}\in\mathbb{R}^n$ from $m$ measurements ($m\ll n$) using a generative prior $\mathbf{x^*}\in G(\mathbb{B}_2^k(r))$, where $G$ is typically an $L$-Lipschitz continuous generative model and $\mathbb{B}_2^k(r)$ represents the radius-$r$ $\ell_2$-ball in $\mathbb{R}^k$. Under nonlinear measurements, most prior results are non-uniform, i. e. , they hold with high probability for a fixed $\mathbf{x^*}$ rather than for all $\mathbf{x^*}$ simultaneously. In this paper, we build a unified framework to derive uniform recovery guarantees for nonlinear GCS where the observation model is nonlinear and possibly discontinuous or unknown. Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index model as canonical examples. Specifically, using a single realization of the sensing ensemble and generalized Lasso, all $\mathbf{x^*}\in G(\mathbb{B}_2^k(r))$ can be recovered up to an $\ell_2$-error at most $\epsilon$ using roughly $\tilde{O}({k}/{\epsilon^2})$ samples, with omitted logarithmic factors typically being dominated by $\log L$. Notably, this almost coincides with existing non-uniform guarantees up to logarithmic factors, hence the uniformity costs very little. As part of our technical contributions, we introduce Lipschitz approximation to handle discontinuous observation models. We also develop a concentration inequality that produces tighter bound for product process whose index sets have low metric entropy. Experimental results are presented to corroborate our theory.

PDF Details

NeurIPS Conference 2022 Conference Paper

Approximate Secular Equations for the Cubic Regularization Subproblem

Yihang Gao
Man-Chung Yue
Michael Ng

The cubic regularization method (CR) is a popular algorithm for unconstrained non-convex optimization. At each iteration, CR solves a cubically regularized quadratic problem, called the cubic regularization subproblem (CRS). One way to solve the CRS relies on solving the secular equation, whose computational bottleneck lies in the computation of all eigenvalues of the Hessian matrix. In this paper, we propose and analyze a novel CRS solver based on an approximate secular equation, which requires only some of the Hessian eigenvalues and is therefore much more efficient. Two approximate secular equations (ASEs) are developed. For both ASEs, we first study the existence and uniqueness of their roots and then establish an upper bound on the gap between the root and that of the standard secular equation. Such an upper bound can in turn be used to bound the distance from the approximate CRS solution based ASEs to the true CRS solution, thus offering a theoretical guarantee for our CRS solver. A desirable feature of our CRS solver is that it requires only matrix-vector multiplication but not matrix inversion, which makes it particularly suitable for high-dimensional applications of unconstrained non-convex optimization, such as low-rank recovery and deep learning. Numerical experiments with synthetic and real data-sets are conducted to investigate the practical performance of the proposed CRS solver. Experimental results show that the proposed solver outperforms two state-of-the-art methods.

PDF Details

AAAI Conference 2019 Conference Paper

Oversampling for Imbalanced Data via Optimal Transport

Yuguang Yan
Mingkui Tan
Yanwu Xu
Jiezhang Cao
Michael Ng
Huaqing Min
Qingyao Wu

The issue of data imbalance occurs in many real-world applications especially in medical diagnosis, where normal cases are usually much more than the abnormal cases. To alleviate this issue, one of the most important approaches is the oversampling method, which seeks to synthesize minority class samples to balance the numbers of different classes. However, existing methods barely consider global geometric information involved in the distribution of minority class samples, and thus may incur distribution mismatching between real and synthetic samples. In this paper, relying on optimal transport (Villani 2008), we propose an oversampling method by exploiting global geometric information of data to make synthetic samples follow a similar distribution to that of minority class samples. Moreover, we introduce a novel regularization based on synthetic samples and shift the distribution of minority class samples according to loss information. Experiments on toy and real-world data sets demonstrate the efficacy of our proposed method in terms of multiple metrics.

PDF Details

IJCAI Conference 2019 Conference Paper

Robust Low-Tubal-Rank Tensor Completion via Convex Optimization

Qiang Jiang
Michael Ng

This paper considers the problem of recovering multidimensional array, in particular third-order tensor, from a random subset of its arbitrarily corrupted entries. Our study is based on a recently proposed algebraic framework in which the tensor-SVD is introduced to capture the low-tubal-rank structure in tensor. We analyze the performance of a convex program, which minimizes a weighted combination of the tensor nuclear norm, a convex surrogate for the tensor tubal rank, and the tensor l1 norm. We prove that under certain incoherence conditions, this program can recover the tensor exactly with overwhelming probability, provided that its tubal rank is not too large and that the corruptions are reasonably sparse. The number of required observations is order optimal (up to a logarithm factor) when comparing with the degrees of freedom of the low-tubal-rank tensor. Numerical experiments verify our theoretical results and real-world applications demonstrate the effectiveness of our algorithm.

PDF Details

IJCAI Conference 2017 Conference Paper

Learning Discriminative Correlation Subspace for Heterogeneous Domain Adaptation

Yuguang Yan
Wen Li
Michael Ng
Mingkui Tan
Hanrui Wu
Huaqing Min
Qingyao Wu

Domain adaptation aims to reduce the effort on collecting and annotating target data by leveraging knowledge from a different source domain. The domain adaptation problem will become extremely challenging when the feature spaces of the source and target domains are different, which is also known as the heterogeneous domain adaptation (HDA) problem. In this paper, we propose a novel HDA method to find the optimal discriminative correlation subspace for the source and target data. The discriminative correlation subspace is inherited from the canonical correlation subspace between the source and target data, and is further optimized to maximize the discriminative ability for the target domain classifier. We formulate a joint objective in order to simultaneously learn the discriminative correlation subspace and the target domain classifier. We then apply an alternating direction method of multiplier (ADMM) algorithm to address the resulting non-convex optimization problem. Comprehensive experiments on two real-world data sets demonstrate the effectiveness of the proposed method compared to the state-of-the-art methods.

PDF Details

AAAI Conference 2010 Conference Paper

Multi-Instance Dimensionality Reduction

Yu-Yin Sun
Michael Ng
Zhi-Hua Zhou

Multi-instance learning deals with problems that treat bags of instances as training examples. In single-instance learning problems, dimensionality reduction is an essential step for high-dimensional data analysis and has been studied for years. The curse of dimensionality also exists in multiinstance learning tasks, yet this difficult task has not been studied before. Direct application of existing single-instance dimensionality reduction objectives to multi-instance learning tasks may not work well since it ignores the characteristic of multi-instance learning that the labels of bags are known while the labels of instances are unknown. In this paper, we propose an effective model and develop an efficient algorithm to solve the multi-instance dimensionality reduction problem. We formulate the objective as an optimization problem by considering orthonormality and sparsity constraints in the projection matrix for dimensionality reduction, and then solve it by the gradient descent along the tangent space of the orthonormal matrices. We also propose an approximation for improving the efficiency. Experimental results validate the effectiveness of the proposed method.

PDF Details