Arrow Research search

Author name cluster

Yiu-ming Cheung

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

32 papers
2 author rows

Possible papers

32

AAAI Conference 2026 Conference Paper

Break the Tie: Learning Cluster-Customized Category Relationships for Categorical Data Clustering

  • Mingjie Zhao
  • Zhanpei Huang
  • Yang Lu
  • Mengke Li
  • Yiqun Zhang
  • Weifeng Su
  • Yiu-ming Cheung

Categorical attributes with qualitative values are ubiquitous in cluster analysis of real datasets. Unlike numerical attributes with their Euclidean distance, categorical attributes lack well-defined relationships among their possible values (also interchangeably called categories), which hampers the exploration of compact categorical data clusters. Although most existing attempts focus on developing appropriate distance metrics, they typically assume a fixed topological relationship between categories when learning those metrics, which limits their adaptability to varying cluster structures and often leads to suboptimal clustering performance. This paper therefore breaks the intrinsic relationship tie of attribute categories and learns customized distance metrics suitable for flexibly and accurately revealing various cluster distributions. As a result, the fitting ability of the clustering algorithm is significantly enhanced, benefiting from the learnable category relationships. Moreover, the learned category relationships are proved to be compatible with the Euclidean distance metric, enabling a seamless extension to mixed datasets that include both numerical and categorical attributes. Comparative experiments on 12 real benchmark datasets with significance tests show the superior clustering accuracy of the proposed method, which achieves an average ranking of 1.25, significantly higher than the 5.21 average ranking of the best-performing existing methods. Code and an extended version with detailed proofs are provided online.

AAAI Conference 2026 Conference Paper

DRFGD: Disentangled Representation-Focused Generative Defense for Attack-Tolerant Cross-Modal Hashing

  • Zhongqing Yu
  • Xin Liu
  • Yiu-ming Cheung
  • Zhikai Hu
  • Wentao Fan
  • Pan Zhou

With the widespread deployment of cross-modal retrieval in real-world scenarios, ensuring robustness against adversarial attacks is increasingly critical. Notably, deep cross-modal hashing is highly vulnerable to adversarial attacks due to its discrete nature and low-dimensional hash codes, while existing defense methods often fail to suppress perturbations embedded in vulnerable features and lack the capacity to model modality-specific structural differences, resulting in suboptimal adversarial robustness. To address these challenges, we propose a novel Disentangled Representation-Focused Generative Defense (DRFGD) framework for attack-tolerant cross-modal hashing. Without altering the structure of the retrieval model, DRFGD defends against adversarial attacks by disentangling input representations into adversarial-robust and adversarial-vulnerable components via an efficient dual-branch semantic-aware encoder. Guided by these disentangled robust features, an attack-tolerant generative module synthesizes semantically aligned and perturbation-resilient examples for robust adversarial training, thereby significantly strengthening collaborative defense against attackers. Consequently, semantically consistent hash codes can be obtained to enhance adversarial robustness in complex cross-modal attack scenarios. Extensive experiments on public benchmarks demonstrate that DRFGD substantially improves retrieval robustness under various attack scenarios and outperforms state-of-the-art methods in defense performance.

NeurIPS Conference 2025 Conference Paper

Advancing Machine-Generated Text Detection from an Easy to Hard Supervision Perspective

  • Chenwang Wu
  • Yiu-ming Cheung
  • Bo Han
  • Defu Lian

Existing machine-generated text (MGT) detection methods implicitly assume labels as the "golden standard". However, we reveal boundary ambiguity in MGT detection, implying that traditional training paradigms are inexact. Moreover, limitations of human cognition and the superintelligence of detectors make inexact learning widespread and inevitable. To this end, we propose an easy-to-hard enhancement framework to provide reliable supervision under such inexact conditions. Distinct from knowledge distillation, our framework employs an easy supervisor targeting relatively simple longer-text detection tasks (despite weaker capabilities), to enhance the more challenging target detector. Firstly, longer texts targeted by supervisors theoretically alleviate the impact of inexact labels, laying the foundation for reliable supervision. Secondly, by structurally incorporating the detector into the supervisor, we theoretically model the supervisor as a lower performance bound for the detector. Thus, optimizing the supervisor indirectly optimizes the detector, ultimately approximating the underlying "golden" labels. Extensive experiments across diverse practical scenarios, including cross-LLM, cross-domain, mixed text, and paraphrase attacks, demonstrate the framework's significant detection effectiveness. The code is available at: https://github.com/tmlr-group/Easy2Hard.

AAAI Conference 2025 Conference Paper

Asynchronous Federated Clustering with Unknown Number of Clusters

  • Yunfan Zhang
  • Yiqun Zhang
  • Yang Lu
  • Mengke Li
  • Xi Chen
  • Yiu-ming Cheung

Federated Clustering (FC) is crucial to mining knowledge from unlabeled non-Independent Identically Distributed (non-IID) data provided by multiple clients while preserving their privacy. Most existing attempts learn cluster distributions at local clients, then securely pass the desensitized information to the server for aggregation. However, some tricky but common FC problems are still relatively unexplored, including heterogeneity in clients' communication capacity and the unknown number of proper clusters. To further bridge the gap between FC and real application scenarios, this paper first shows that clients' communication asynchrony and unknown proper cluster numbers are coupled problems, and then proposes an Asynchronous Federated Cluster Learning (AFCL) method accordingly. It spreads an excessive number of seed points to clients as a learning medium and coordinates them across clients to form a consensus. To alleviate the distribution imbalance accumulated due to unforeseen asynchronous uploading from heterogeneous clients, we also design a balancing mechanism for seed updating. As a result, the seeds gradually adapt to each other to reveal a proper number of clusters. Extensive experiments demonstrate the efficacy of AFCL.

ICML Conference 2025 Conference Paper

Bifurcate then Alienate: Incomplete Multi-view Clustering via Coupled Distribution Learning with Linear Overhead

  • Shengju Yu
  • Yiu-ming Cheung
  • Siwei Wang 0001
  • Xinwang Liu 0002
  • En Zhu

Despite remarkable advances, existing incomplete multi-view clustering (IMC) methods typically leverage either perspective-shared or perspective-specific determinants to encode cluster representations. To address this limitation, we introduce the BACDL algorithm, designed to explicitly capture both concurrently and thereby exploit heterogeneous data more effectively. It bifurcates feature clusters and further alienates them to enlarge discrimination. Through distribution learning, it couples view guidance into feature clusters to alleviate dimension inconsistency. Then, building on the principle that samples in one common cluster share similar marginal and conditional distributions, it unifies the association between feature clusters and sample clusters to bridge all views. Thereafter, all incomplete sample clusters are reordered and mapped to a common one to formulate the clustering embedding. Finally, its overall linear overhead makes it resource-efficient.

AAAI Conference 2025 Conference Paper

Component-Level Segmentation for Oracle Bone Inscription Decipherment

  • Zhikai Hu
  • Yiu-ming Cheung
  • Yonggang Zhang
  • Zhang Peiying
  • Tang Pui Ling

Oracle Bone Inscriptions (OBIs), as the earliest systematically organized pictographic script in China, hold significant importance in the study of the origins of Chinese civilization. Of the approximately 4,500 excavated OBI characters, only about one-third have been deciphered, leaving the remaining characters shrouded in mystery. Over the past decade, an increasing number of researchers have attempted to leverage artificial intelligence to assist in deciphering OBIs, but these efforts have not yet fully met the demands of this challenging objective. In this paper, we identify a key task—Component-Level OBI Segmentation—based on a successful deciphering case from 2018. This task aims to help experts quickly identify specific components within OBIs, thereby accelerating the deciphering process. Accordingly, we propose a new model to accomplish this task. Our model leverages a small amount of annotated data and a large amount of weakly annotated data and incorporates expert-provided prior knowledge, i.e., stroke rules, to automatically segment OBI components. Additionally, we train a series of auxiliary classifiers to evaluate the segmentation results during the test stage. We also invite experts to conduct a professional assessment of the results, which we cross-validated against our proposed evaluation metrics. Experimental results demonstrate that our method can accurately and clearly present the segmented components to experts.

NeurIPS Conference 2025 Conference Paper

Epistemic Uncertainty for Generated Image Detection

  • Jun Nie
  • Yonggang Zhang
  • Tongliang Liu
  • Yiu-ming Cheung
  • Bo Han
  • Xinmei Tian

We introduce a novel framework for AI-generated image detection through epistemic uncertainty, aiming to address critical security concerns in the era of generative models. Our key insight stems from the observation that distributional discrepancies between training and testing data manifest distinctively in the epistemic uncertainty space of machine learning models. In this context, the distribution shift between natural and generated images leads to elevated epistemic uncertainty in models trained on natural images when they evaluate generated ones. Hence, we exploit this phenomenon by using epistemic uncertainty as a proxy for detecting generated images. This converts the challenge of generated image detection into a problem of uncertainty estimation, whose success hinges on the generalization performance of the model used for uncertainty estimation. Fortunately, advanced large-scale vision models pre-trained on extensive natural images have shown excellent generalization across various scenarios. Thus, we utilize these pre-trained models to estimate the epistemic uncertainty of images and flag those with high uncertainty as generated. Extensive experiments demonstrate the efficacy of our method.
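The flag-by-uncertainty recipe described in this abstract can be illustrated with a toy sketch, where the mean k-nearest-neighbor distance in a frozen feature space stands in for epistemic uncertainty. The Gaussian stand-in features, the k-NN proxy, and the threshold value are all illustrative assumptions, not the paper's actual estimator.

```python
import numpy as np

def epistemic_uncertainty(feats, natural_bank, k=5):
    """Proxy uncertainty: mean distance from each feature vector to its
    k nearest neighbors in a bank of features from natural images."""
    dists = np.linalg.norm(feats[:, None, :] - natural_bank[None, :, :], axis=-1)
    return np.sort(dists, axis=1)[:, :k].mean(axis=1)

def flag_generated(feats, natural_bank, threshold, k=5):
    """Flag inputs whose proxy uncertainty exceeds the threshold."""
    return epistemic_uncertainty(feats, natural_bank, k) > threshold

rng = np.random.default_rng(0)
natural_bank = rng.normal(0.0, 1.0, size=(200, 16))  # stand-in for frozen-encoder features
held_out_nat = rng.normal(0.0, 1.0, size=(50, 16))   # unseen natural images: low uncertainty
generated = rng.normal(4.0, 1.0, size=(50, 16))      # distribution-shifted images: high uncertainty
flags_gen = flag_generated(generated, natural_bank, threshold=10.0)
flags_nat = flag_generated(held_out_nat, natural_bank, threshold=10.0)
```

In this toy setup the distribution-shifted samples sit far from the natural-feature bank, so their proxy uncertainty clears the threshold while held-out natural samples do not.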

NeurIPS Conference 2025 Conference Paper

FedGPS: Statistical Rectification Against Data Heterogeneity in Federated Learning

  • Zhiqin Yang
  • Yonggang Zhang
  • Chenxin Li
  • Yiu-ming Cheung
  • Bo Han
  • Yixuan Yuan

Federated Learning (FL) confronts a significant challenge known as data heterogeneity, which impairs model performance and convergence. Existing methods have made notable progress in addressing this issue. However, an overlooked question remains: how robust are these methods when deployed under diverse heterogeneity scenarios? To answer this, we conduct comprehensive evaluations across varied heterogeneity scenarios, showing that most existing methods exhibit limited robustness. Meanwhile, insights from these experiments highlight that sharing statistical information can mitigate heterogeneity by enabling clients to update with a global perspective. Motivated by this, we propose FedGPS (Federated Goal-Path Synergy), a novel framework that seamlessly integrates statistical distribution and gradient information from other clients. Specifically, FedGPS statically modifies each client's learning objective to implicitly model the global data distribution using surrogate information, while dynamically adjusting local update directions with gradient information from other clients at each round. Extensive experiments show that FedGPS outperforms state-of-the-art methods across diverse heterogeneity scenarios, validating its effectiveness and robustness. The code is available online.

AAAI Conference 2025 Conference Paper

GBRIP: Granular Ball Representation for Imbalanced Partial Label Learning

  • Jintao Huang
  • Yiu-ming Cheung
  • Chi-man Vong
  • Wenbin Qian

Partial label learning (PLL) is a complicated weakly supervised multi-class classification task, further compounded by class imbalance. Existing methods rely only on inter-class pseudo-labeling from inter-class features, often overlooking the significant impact of intra-class imbalanced features combined with the inter-class ones. To address these limitations, we introduce Granular Ball Representation for Imbalanced PLL (GBRIP), a novel framework for imbalanced PLL. GBRIP utilizes a coarse-grained granular-ball representation and a multi-center loss to construct a granular-ball-based feature space through unsupervised learning, effectively capturing the feature distribution within each class. It mitigates the impact of confusing features by systematically refining label disambiguation and estimating imbalance distributions. The novel multi-center loss enhances learning by emphasizing the relationships between samples and their respective centers within the granular balls. Extensive experiments on standard benchmarks demonstrate that GBRIP outperforms existing state-of-the-art methods, offering a robust solution to the challenges of imbalanced PLL.

ICLR Conference 2025 Conference Paper

Simple yet Effective Incomplete Multi-view Clustering: Similarity-level Imputation and Intra-view Hybrid-group Prototype Construction

  • Shengju Yu
  • Zhibin Dong
  • Siwei Wang 0001
  • Pei Zhang 0008
  • Yi Zhang 0104
  • Xinwang Liu 0002
  • Naiyang Guan
  • Tiejun Li

Most incomplete multi-view clustering (IMVC) methods choose to ignore the missing samples and only utilize observed unpaired samples to construct bipartite similarity. Moreover, they employ a single quantity of prototypes to extract the information of all views. To eliminate these drawbacks, we present a simple yet effective IMVC approach, SIIHPC, in this work. It first transforms partial bipartition learning into the original sample form by virtue of the reconstruction concept to split out the observed similarity, and then loosens traditional non-negative constraints via regularizing samples to characterize the similarity more freely. Subsequently, it learns to recover the incomplete parts by utilizing the connection built between the similarity exclusive to each view and the consensus graph shared by all views. On this foundation, it further introduces a group of hybrid prototype quantities for each individual view to flexibly extract the data features belonging to that view. Accordingly, the resulting graphs have various scales and describe the overall similarity more comprehensively. It is worth mentioning that all of these are optimized in one unified learning framework, enabling them to reciprocally promote one another. Then, to effectively solve the formulated optimization problem, we design an ingenious auxiliary function with theoretically proven monotonically increasing properties. Finally, the clustering results are obtained by applying spectral grouping to the eigenvectors of the stacked multi-scale consensus similarity. Experimental results confirm the effectiveness of SIIHPC.

NeurIPS Conference 2025 Conference Paper

Unlocker: Disentangle the Deadlock of Learning between Label-noisy and Long-tailed Data

  • Shu Chen
  • HongJun Xu
  • Ruichi Zhang
  • Mengke Li
  • Yonggang Zhang
  • Yang Lu
  • Bo Han
  • Yiu-ming Cheung

In the real world, the observed label distribution of a dataset often mismatches its true distribution due to noisy labels. In this situation, noisy label learning (NLL) methods directly integrated with long-tail learning (LTL) methods tend to fail due to a dilemma: NLL methods normally rely on unbiased model predictions to recover the true distribution by selecting and correcting noisy labels, while LTL methods like logit adjustment depend on the true distribution to adjust biased predictions, leading to what we define as a deadlock of mutual dependency. To address this, we propose Unlocker, a bilevel optimization framework that integrates NLL and LTL methods to iteratively disentangle this deadlock. The inner optimization leverages NLL to train the model, incorporating LTL methods to fairly select and correct noisy labels. The outer optimization adaptively determines an adjustment strength, mitigating model bias from over- or under-adjustment. We also theoretically prove that this bilevel optimization problem converges by transforming the outer optimization target into an equivalent problem with a closed-form solution. Extensive experiments on synthetic and real-world datasets demonstrate the effectiveness of our method in alleviating model bias and handling long-tailed noisy-label data. Code is available at https://anonymous.4open.science/r/neurips-2025-anonymous-1015/.

ECAI Conference 2024 Conference Paper

Adapt PointFormer: 3D Point Cloud Analysis via Adapting 2D Visual Transformers

  • Mengke Li 0001
  • Da Li
  • Guoqing Yang
  • Yiu-ming Cheung
  • Hui Huang 0004

Pre-trained large-scale models have exhibited remarkable efficacy in computer vision, particularly for 2D image analysis. However, when it comes to 3D point clouds, the constrained accessibility of data, in contrast to the vast repositories of images, poses a challenge for the development of 3D pre-trained models. This paper therefore attempts to directly leverage pre-trained models with 2D prior knowledge to accomplish the tasks for 3D point cloud analysis. Accordingly, we propose the Adaptive PointFormer (APF), which fine-tunes pre-trained 2D models with only a modest number of parameters to directly process point clouds, obviating the need for mapping to images. Specifically, we convert raw point clouds into point embeddings for aligning dimensions with image tokens. Given the inherent disorder in point clouds, in contrast to the structured nature of images, we then sequence the point embeddings to optimize the utilization of 2D attention priors. To calibrate attention across 3D and 2D domains and reduce computational overhead, a trainable PointFormer with a limited number of parameters is subsequently concatenated to a frozen pre-trained image model. Extensive experiments on various benchmarks demonstrate the effectiveness of the proposed APF. The source code and more details are available at https://vcc.tech/research/2024/PointFormer.

NeurIPS Conference 2024 Conference Paper

Ask, Attend, Attack: An Effective Decision-Based Black-Box Targeted Attack for Image-to-Text Models

  • Qingyuan Zeng
  • Zhenzhong Wang
  • Yiu-ming Cheung
  • Min Jiang

While image-to-text models have demonstrated significant advancements in various vision-language tasks, they remain susceptible to adversarial attacks. Existing white-box attacks on image-to-text models require access to the architecture, gradients, and parameters of the target model, resulting in low practicality. Although the recently proposed gray-box attacks have improved practicality, they suffer from semantic loss during the training process, which limits their targeted attack performance. To advance adversarial attacks of image-to-text models, this paper focuses on a challenging scenario: decision-based black-box targeted attacks where the attackers only have access to the final output text and aim to perform targeted attacks. Specifically, we formulate the decision-based black-box targeted attack as a large-scale optimization problem. To efficiently solve the optimization problem, a three-stage process, Ask, Attend, Attack (AAA), is proposed to coordinate with the solver. Ask guides attackers to create target texts that satisfy the specific semantics. Attend identifies the crucial regions of the image for attacking, thus reducing the search space for the subsequent Attack. Attack uses an evolutionary algorithm to attack the crucial regions, where the attacks are semantically related to the target texts of Ask, thus achieving targeted attacks without semantic loss. Experimental results on transformer-based and CNN+RNN-based image-to-text models confirm the effectiveness of our proposed AAA.

AAAI Conference 2024 Conference Paper

Feature Fusion from Head to Tail for Long-Tailed Visual Recognition

  • Mengke Li
  • Zhikai Hu
  • Yang Lu
  • Weichao Lan
  • Yiu-ming Cheung
  • Hui Huang

The imbalanced distribution of long-tailed data presents a considerable challenge for deep learning models, as it causes them to prioritize the accurate classification of head classes but largely disregard tail classes. The biased decision boundary caused by inadequate semantic information in tail classes is one of the key factors contributing to their low recognition accuracy. To rectify this issue, we propose to augment tail classes by grafting the diverse semantic information from head classes, referred to as head-to-tail fusion (H2T). We replace a portion of feature maps from tail classes with those belonging to head classes. These fused features substantially enhance the diversity of tail classes. Both theoretical analysis and practical experimentation demonstrate that H2T can contribute to a more optimized solution for the decision boundary. We seamlessly integrate H2T in the classifier adjustment stage, making it a plug-and-play module. Its simplicity and ease of implementation allow for smooth integration with existing long-tailed recognition methods, facilitating a further performance boost. Extensive experiments on various long-tailed benchmarks demonstrate the effectiveness of the proposed H2T. The source code is available at https://github.com/Keke921/H2T.
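The fusion step described above ("replace a portion of feature maps from tail classes with those belonging to head classes") can be sketched minimally; the channel-wise replacement, the fraction `rho`, and the toy tensor shapes are illustrative assumptions, not the paper's exact H2T fusion rule.

```python
import numpy as np

def h2t_fuse(tail_feat, head_feat, rho, rng):
    """Illustrative head-to-tail fusion: replace a random fraction `rho`
    of the tail sample's feature-map channels with the head sample's."""
    fused = tail_feat.copy()
    n_ch = tail_feat.shape[0]
    idx = rng.choice(n_ch, size=int(rho * n_ch), replace=False)
    fused[idx] = head_feat[idx]  # grafted head-class semantics
    return fused

rng = np.random.default_rng(0)
tail = np.zeros((8, 4, 4))  # (channels, H, W) features of a tail-class sample
head = np.ones((8, 4, 4))   # features of a head-class sample
fused = h2t_fuse(tail, head, rho=0.25, rng=rng)
```

With `rho=0.25`, two of the eight toy channels now carry head-class features while the original tail tensor is left untouched, mirroring the plug-and-play, augmentation-only role the abstract describes.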

AAAI Conference 2024 Conference Paper

Federated Learning with Extremely Noisy Clients via Negative Distillation

  • Yang Lu
  • Lin Chen
  • Yonggang Zhang
  • Yiliang Zhang
  • Bo Han
  • Yiu-ming Cheung
  • Hanzi Wang

Federated learning (FL) has shown remarkable success in cooperatively training deep models, while typically struggling with noisy labels. Advanced works propose to tackle label noise via a re-weighting strategy under a strong assumption, i.e., mild label noise. However, this assumption may be violated in many real-world FL scenarios because of highly contaminated clients, resulting in extreme noise ratios, e.g., >90%. To tackle extremely noisy clients, we study the robustness of the re-weighting strategy, reaching a pessimistic conclusion: minimizing the weight of clients trained over noisy data outperforms re-weighting strategies. To leverage models trained on noisy clients, we propose a novel approach, negative distillation (FedNed). FedNed first identifies noisy clients and, rather than discarding them, employs them in a knowledge distillation manner. In particular, clients identified as noisy are required to train models using both noisy labels and pseudo-labels obtained from global models. The model trained on noisy labels serves as a 'bad teacher' in knowledge distillation, aiming to decrease the risk of providing incorrect information. Meanwhile, the model trained on pseudo-labels is involved in model aggregation if the client is not identified as noisy. Consequently, through pseudo-labeling, FedNed gradually increases the trustworthiness of models trained on noisy clients, while leveraging all clients for model aggregation through negative distillation. To verify the efficacy of FedNed, we conduct extensive experiments under various settings, demonstrating that FedNed consistently outperforms baselines and achieves state-of-the-art performance.

NeurIPS Conference 2024 Conference Paper

FuseFL: One-Shot Federated Learning through the Lens of Causality with Progressive Model Fusion

  • Zhenheng Tang
  • Yonggang Zhang
  • Peijie Dong
  • Yiu-ming Cheung
  • Amelie C. Zhou
  • Bo Han
  • Xiaowen Chu

One-shot Federated Learning (OFL) significantly reduces communication costs in FL by aggregating trained models only once. However, the performance of advanced OFL methods lags far behind that of normal FL. In this work, we provide a causal view to show that this performance drop of OFL methods stems from an isolation problem: models trained in isolation on local clients in OFL may easily fit spurious correlations due to data heterogeneity. From the causal perspective, we observe that this spurious fitting can be alleviated by augmenting intermediate features from other clients. Built upon this observation, we propose a novel learning approach, termed FuseFL, that endows OFL with strong performance and low communication and storage costs. Specifically, FuseFL decomposes neural networks into several blocks and progressively trains and fuses each block in a bottom-up manner for feature augmentation, introducing no additional communication costs. Comprehensive experiments demonstrate that FuseFL outperforms existing OFL and ensemble FL methods by a significant margin, and show that FuseFL supports high client scalability, heterogeneous model training, and low memory costs. Our work is the first attempt to use causality to analyze and alleviate the data heterogeneity of OFL.

NeurIPS Conference 2024 Conference Paper

GaussianMarker: Uncertainty-Aware Copyright Protection of 3D Gaussian Splatting

  • Xiufeng Huang
  • Ruiqi Li
  • Yiu-ming Cheung
  • Ka Chun Cheung
  • Simon See
  • Renjie Wan

3D Gaussian Splatting (3DGS) has become a crucial method for acquiring 3D assets. To protect the copyright of these assets, digital watermarking techniques can be applied to embed ownership information discreetly within 3DGS models. However, existing watermarking methods for meshes, point clouds, and implicit radiance fields cannot be directly applied to 3DGS models, as 3DGS models use explicit 3D Gaussians with distinct structures and do not rely on neural networks. Naively embedding the watermark on a pre-trained 3DGS can cause obvious distortion in rendered images. In our work, we propose an uncertainty-based method that constrains the perturbation of model parameters to achieve invisible watermarking for 3DGS. At the message decoding stage, the copyright messages can be reliably extracted from both 3D Gaussians and 2D rendered images even under various forms of 3D and 2D distortions. We conduct extensive experiments on the Blender, LLFF, and MipNeRF-360 datasets to validate the effectiveness of our proposed method, demonstrating state-of-the-art performance on both message decoding accuracy and view synthesis quality.

NeurIPS Conference 2024 Conference Paper

Improving Visual Prompt Tuning by Gaussian Neighborhood Minimization for Long-Tailed Visual Recognition

  • Mengke Li
  • Ye Liu
  • Yang Lu
  • Yiqun Zhang
  • Yiu-ming Cheung
  • Hui Huang

Long-tailed visual recognition has received increasing attention recently. Despite fine-tuning techniques represented by visual prompt tuning (VPT) achieving substantial performance improvement by leveraging pre-trained knowledge, models still exhibit unsatisfactory generalization performance on tail classes. To address this issue, we propose a novel optimization strategy called Gaussian neighborhood minimization prompt tuning (GNM-PT), for VPT to address the long-tail learning problem. We introduce a novel Gaussian neighborhood loss, which provides a tight upper bound on the loss function of data distribution, facilitating a flattened loss landscape correlated to improved model generalization. Specifically, GNM-PT seeks the gradient descent direction within a random parameter neighborhood, independent of input samples, during each gradient update. Ultimately, GNM-PT enhances generalization across all classes while simultaneously reducing computational overhead. The proposed GNM-PT achieves state-of-the-art classification accuracies of 90.3%, 76.5%, and 50.1% on benchmark datasets CIFAR100-LT (IR 100), iNaturalist 2018, and Places-LT, respectively. The source code is available at https://github.com/Keke921/GNM-PT.
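The optimization idea described above (taking the gradient at a randomly perturbed copy of the parameters, independent of the input samples) can be sketched on a toy quadratic loss; the Gaussian perturbation scale, learning rate, and loss function here are illustrative assumptions rather than the GNM-PT algorithm itself.

```python
import numpy as np

def gnm_step(w, grad_fn, lr=0.1, sigma=0.01, rng=np.random.default_rng(0)):
    """One sketch step of Gaussian-neighborhood minimization: evaluate the
    gradient at a randomly perturbed copy of the parameters (a sample-free
    neighborhood), then apply it to the original parameters, which tends
    to favor flatter minima."""
    eps = rng.normal(0.0, sigma, size=w.shape)  # random neighbor of w
    g = grad_fn(w + eps)                        # gradient at the neighbor
    return w - lr * g                           # update the original params

# Toy loss L(w) = ||w||^2 / 2, whose gradient is simply w.
w = np.array([1.0, -2.0])
for _ in range(200):
    w = gnm_step(w, grad_fn=lambda v: v, lr=0.1, sigma=0.01)
```

Unlike sharpness-aware methods that compute an adversarial (gradient-ascent) perturbation, the neighbor here is drawn at random, so each update costs only one extra parameter perturbation and no second forward/backward pass over the data.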

ICML Conference 2024 Conference Paper

Interpreting and Improving Large Language Models in Arithmetic Calculation

  • Wei Zhang
  • Chaoqun Wan
  • Yonggang Zhang 0003
  • Yiu-ming Cheung
  • Xinmei Tian 0001
  • Xu Shen 0001
  • Jieping Ye

Large language models (LLMs) have demonstrated remarkable potential across numerous applications and have shown an emergent ability to tackle complex reasoning tasks, such as mathematical computations. However, even for the simplest arithmetic calculations, the intrinsic mechanisms behind LLMs remain mysterious, making it challenging to ensure reliability. In this work, we delve into uncovering a specific mechanism by which LLMs execute calculations. Through comprehensive experiments, we find that LLMs engage only a small fraction (<5%) of attention heads, which play a pivotal role in focusing on operands and operators during calculation processes. Subsequently, the information from these operands is processed through multi-layer perceptrons (MLPs), progressively leading to the final solution. These pivotal heads/MLPs, though identified on a specific dataset, exhibit transferability across different datasets and even distinct tasks. This insight prompted us to investigate the potential benefits of selectively fine-tuning these essential heads/MLPs to boost the LLMs' computational performance. We empirically find that such precise tuning can yield notable enhancements in mathematical prowess without compromising performance on non-mathematical tasks. Our work serves as a preliminary exploration into the arithmetic calculation abilities inherent in LLMs, laying a solid foundation to reveal more intricate mathematical tasks.
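Selective fine-tuning of a small set of pivotal heads, as described above, can be sketched as a masked gradient update; the toy parameter shapes, the mask, and the learning rate are hypothetical, and a real implementation would operate on the attention-head parameters of an actual LLM.

```python
import numpy as np

def tune_pivotal_heads(params, grads, head_mask, lr):
    """Illustrative selective fine-tuning: apply the gradient update only
    to the parameters of heads marked pivotal, freezing all other heads."""
    return params - lr * grads * head_mask[:, None]

params = np.ones((20, 4))  # toy: 20 heads x 4 parameters each
grads = np.ones((20, 4))   # toy gradients from a calculation task
mask = np.zeros(20)
mask[0] = 1.0              # ~5% of heads marked pivotal
updated = tune_pivotal_heads(params, grads, mask, lr=0.5)
```

Only the masked head moves; the remaining 19 heads keep their original parameters, which is the mechanism that lets the mathematical tuning leave non-mathematical behavior intact.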

ECAI Conference 2024 Conference Paper

Learning Order Forest for Qualitative-Attribute Data Clustering

  • Mingjie Zhao 0003
  • Sen Feng
  • Yiqun Zhang 0006
  • Mengke Li 0001
  • Yang Lu 0009
  • Yiu-ming Cheung

Clustering is a fundamental approach to understanding data patterns, and the intuitive Euclidean distance space is commonly adopted for it. However, this is not the case for implicit cluster distributions reflected by qualitative attribute values, e.g., the nominal values of attributes like symptoms, marital status, etc. This paper therefore discovers a tree-like distance structure to flexibly represent the local order relationships among intra-attribute qualitative values. That is, treating each value as a vertex of the tree makes it possible to capture rich order relationships between that value and the others. To obtain the trees in a clustering-friendly form, a joint learning mechanism is proposed to iteratively obtain more appropriate tree structures and clusters. It turns out that the latent distance space of the whole dataset can be well represented by a forest consisting of the learned trees. Extensive experiments demonstrate that the joint learning adapts the forest to the clustering task to yield accurate results. Comparisons with 10 counterparts on 12 real benchmark datasets with significance tests verify the superiority of the proposed method. Source code of the proposed method is available at [39].

NeurIPS Conference 2024 Conference Paper

Learning to Shape In-distribution Feature Space for Out-of-distribution Detection

  • Yonggang Zhang
  • Jie Lu
  • Bo Peng
  • Zhen Fang
  • Yiu-ming Cheung

Out-of-distribution (OOD) detection is critical for deploying machine learning models in the open world. To design scoring functions that discern OOD data from the in-distribution (ID) cases from a pre-trained discriminative model, existing methods tend to make rigorous distributional assumptions either explicitly or implicitly due to the lack of knowledge about the learned feature space in advance. The mismatch between the learned and assumed distributions motivates us to raise a fundamental yet under-explored question: is it possible to deterministically model the feature distribution while pre-training a discriminative model? This paper gives an affirmative answer to this question by presenting a Distributional Representation Learning (DRL) framework for OOD detection. In particular, DRL explicitly enforces the underlying feature space to conform to a pre-defined mixture distribution, together with an online approximation of normalization constants to enable end-to-end training. Furthermore, we formulate DRL into a provably convergent Expectation-Maximization algorithm to avoid trivial solutions and rearrange the sequential sampling to guide the training consistency. Extensive evaluations across mainstream OOD detection benchmarks empirically manifest the superiority of the proposed DRL over its advanced counterparts.

ICML Conference 2024 Conference Paper

MOKD: Cross-domain Finetuning for Few-shot Classification via Maximizing Optimized Kernel Dependence

  • Hongduan Tian
  • Feng Liu 0003
  • Tongliang Liu
  • Bo Du 0001
  • Yiu-ming Cheung
  • Bo Han 0003

In cross-domain few-shot classification, nearest centroid classifier (NCC) aims to learn representations to construct a metric space where few-shot classification can be performed by measuring the similarities between samples and the prototype of each class. An intuition behind NCC is that each sample is pulled closer to the class centroid it belongs to while pushed away from those of other classes. However, in this paper, we find that there exist high similarities between NCC-learned representations of two samples from different classes. In order to address this problem, we propose a bi-level optimization framework, maximizing optimized kernel dependence (MOKD) to learn a set of class-specific representations that match the cluster structures indicated by labeled data of the given task. Specifically, MOKD first optimizes the kernel adopted in Hilbert-Schmidt independence criterion (HSIC) to obtain the optimized kernel HSIC (opt-HSIC) that can capture the dependence more precisely. Then, an optimization problem regarding the opt-HSIC is addressed to simultaneously maximize the dependence between representations and labels and minimize the dependence among all samples. Extensive experiments on Meta-Dataset demonstrate that MOKD can not only achieve better generalization performance on unseen domains in most cases but also learn better data representation clusters. The project repository of MOKD is available at: https://github.com/tmlr-group/MOKD.
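MOKD builds on the Hilbert-Schmidt independence criterion; the standard biased empirical HSIC estimator it starts from is short enough to sketch (plain NumPy; the RBF kernel and its bandwidth below are illustrative choices, not the paper's learned opt-HSIC kernel):

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """RBF (Gaussian) kernel matrix over the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-gamma * d2)

def hsic(K, L):
    """Biased empirical HSIC between two kernel matrices:
    trace(K H L H) / (n - 1)^2, with H the centering matrix."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2
```

Maximizing HSIC between representation and label kernels while minimizing it among samples is the dependence trade-off the abstract describes; the snippet only shows the estimator itself.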

NeurIPS Conference 2024 Conference Paper

Variational Multi-scale Representation for Estimating Uncertainty in 3D Gaussian Splatting

  • Ruiqi Li
  • Yiu-ming Cheung

Recently, 3D Gaussian Splatting (3DGS) has become popular in reconstructing dense 3D representations of appearance and geometry. However, the learning pipeline in 3DGS inherently lacks the ability to quantify uncertainty, which is an important factor in applications like robotics mapping and navigation. In this paper, we propose an uncertainty estimation method built upon the Bayesian inference framework. Specifically, we propose a method to build variational multi-scale 3D Gaussians, where we leverage explicit scale information in 3DGS parameters to construct diversified parameter space samples. We develop an offset table technique to draw local multi-scale samples efficiently by offsetting selected attributes and sharing other base attributes. Then, the offset table is learned by variational inference with multi-scale prior. The learned offset posterior can quantify the uncertainty of each individual Gaussian component, and be used in the forward pass to infer the predictive uncertainty. Extensive experimental results on various benchmark datasets show that the proposed method provides well-aligned calibration performance on estimated uncertainty and better rendering quality compared with the previous methods that enable uncertainty quantification with view synthesis. Besides, by leveraging the model parameter uncertainty estimated by our method, we can remove noisy Gaussians automatically, thereby obtaining a high-fidelity part of the reconstructed scene, which is of great help in improving the visual quality.

ECAI Conference 2023 Conference Paper

BEDCOE: Borderline Enhanced Disjunct Cluster Based Oversampling Ensemble for Online Multi-Class Imbalance Learning

  • Shuxian Li
  • Liyan Song
  • Yiu-ming Cheung
  • Xin Yao 0001

Multi-class imbalance learning usually confronts more challenges, especially when learning from streaming data. Most existing methods focus on manipulating class imbalance ratios, disregarding other data properties such as the borderline and the disjunct. Recent studies have shown a non-negligible impact of disregarding these properties on deteriorating predictive performance, and online multi-class imbalance further exacerbates this negative impact. To bridge the research gap of online multi-class imbalance learning, we propose to increase the number of training times of borderline samples based on disjunct class-wise clusters that are adaptively constructed over time for each class individually. Specifically, we propose a borderline enhanced strategy for the ensemble, aiming to increase the number of training times of samples neighboring the borderline areas of different classes. We also propose to generate synthetic samples for training based on the adaptively learned disjunct clusters that are maintained online for each class individually, catering directly to the online multi-class imbalance problem. These two components constitute the Borderline Enhanced Disjunct Cluster Based Oversampling Ensemble (BEDCOE). Experimental studies demonstrate the effectiveness of BEDCOE and each of its components in dealing with online multi-class imbalance.
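BEDCOE draws its synthetic samples from adaptively maintained disjunct clusters; the underlying interpolation idea can be sketched as classic SMOTE-style oversampling (a simplified stand-in, not the paper's cluster-based online variant):

```python
import numpy as np

def interpolate_oversample(X_min, n_new, k=3, seed=0):
    """SMOTE-style generation: each synthetic point lies on the segment
    between a minority sample and one of its k nearest minority neighbors."""
    rng = np.random.default_rng(seed)
    # pairwise squared distances within the minority class
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)          # exclude self-matches
    nbrs = np.argsort(d2, axis=1)[:, :k]  # k nearest neighbors per sample
    base = rng.integers(0, len(X_min), size=n_new)
    pick = nbrs[base, rng.integers(0, k, size=n_new)]
    t = rng.random((n_new, 1))            # interpolation coefficients
    return X_min[base] + t * (X_min[pick] - X_min[base])
```

Restricting the neighbor search to samples of the same disjunct cluster, and biasing `base` toward borderline samples, would move this sketch in the direction the abstract describes.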

IJCAI Conference 2022 Conference Paper

Het2Hom: Representation of Heterogeneous Attributes into Homogeneous Concept Spaces for Categorical-and-Numerical-Attribute Data Clustering

  • Yiqun Zhang
  • Yiu-ming Cheung
  • An Zeng

Data sets composed of a mixture of categorical and numerical attributes (also called mixed data hereinafter) are common in real-world cluster analysis. However, insightful analysis of such data under an unsupervised scenario using clustering is extremely challenging because the information provided by the two different types of attributes is heterogeneous, being at different concept hierarchies. That is, the values of a categorical attribute represent a set of different concepts (e.g., professor, lawyer, and doctor of the attribute "occupation"), while the values of a numerical attribute describe the tendencies toward two different concepts (e.g., low and high of the attribute "income"). To appropriately use such heterogeneous information in clustering, this paper therefore proposes a novel attribute representation learning method called Het2Hom, which first converts the heterogeneous attributes into a homogeneous form, and then learns attribute representations and data partitions on such a homogeneous basis. Het2Hom features low time complexity and intuitive interpretability. Extensive experiments show that Het2Hom outperforms the state-of-the-art counterparts.

AAAI Conference 2020 Conference Paper

An Ordinal Data Clustering Algorithm with Automated Distance Learning

  • Yiqun Zhang
  • Yiu-ming Cheung

Clustering ordinal data is a common task in data mining and machine learning fields. As a major type of categorical data, ordinal data is composed of attributes with naturally ordered possible values (also called categories interchangeably in this paper). However, due to the lack of a dedicated distance metric, ordinal categories are usually treated as nominal ones, or coded as consecutive integers and treated as numerical ones. Both of these common treatments only roughly define the distances between ordinal categories, because the former ignores the order relationship and the latter simply assigns identical distances to different pairs of adjacent categories that may have intrinsically unequal distances. As a result, they may produce unsatisfactory ordinal data clustering results. This paper, therefore, proposes a novel ordinal data clustering algorithm, which iteratively learns: 1) the partition of the ordinal dataset, and 2) the inter-category distances. To the best of our knowledge, this is the first attempt to dynamically adjust inter-category distances during the clustering process to search for a better partition of ordinal data. The proposed algorithm features superior clustering accuracy, low time complexity, fast convergence, and is parameter-free. Extensive experiments show its efficacy.
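The alternating idea of jointly refining a partition and the inter-category distances can be caricatured in a few lines. The toy below is my own simplification, not the paper's algorithm: it encodes each ordinal category as a scalar `phi[c]` (shared across attributes), alternates k-means-style assignment with a re-fit of the encoding, and keeps the order constraint with a cumulative maximum:

```python
import numpy as np

def ordinal_cluster(X, k, n_cat, iters=20, seed=0):
    """Toy alternating scheme for ordinal data (X holds integer
    category codes): partition step, then encoding (distance) step."""
    rng = np.random.default_rng(seed)
    phi = np.linspace(0.0, 1.0, n_cat)       # equally spaced to start
    labels = rng.integers(0, k, size=len(X))
    for _ in range(iters):
        E = phi[X]                            # encode samples, shape (n, d)
        # update centroids; re-seed any empty cluster from a random sample
        cents = np.array([E[labels == j].mean(axis=0) if np.any(labels == j)
                          else E[rng.integers(len(E))] for j in range(k)])
        d = ((E[:, None, :] - cents[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)                  # partition step
        # encoding step: pull each category toward the centroid values
        # observed for it, then restore monotonicity (the order constraint)
        target = cents[labels]
        new_phi = np.array([target[X == c].mean() if np.any(X == c) else phi[c]
                            for c in range(n_cat)])
        phi = np.maximum.accumulate(new_phi)
    return labels, phi
```

The real algorithm learns full inter-category distances rather than a single scalar embedding, but the same partition/distance alternation drives it.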

AAAI Conference 2018 Conference Paper

Uplink Communication Efficient Differentially Private Sparse Optimization With Feature-Wise Distributed Data

  • Jian Lou
  • Yiu-ming Cheung

Preserving differential privacy during empirical risk minimization model training has been extensively studied under centralized and sample-wise distributed dataset settings. This paper considers a nearly unexplored context with features partitioned among different parties under privacy restriction. Motivated by the nearly optimal utility guarantee achieved by centralized private Frank-Wolfe algorithm (Talwar, Thakurta, and Zhang 2015), we develop a distributed variant with guaranteed privacy, utility and uplink communication complexity. To obtain these guarantees, we provide a much generalized convergence analysis for block-coordinate Frank-Wolfe under arbitrary sampling, which greatly extends known convergence results that are only applicable to two specific block sampling distributions. We also design an active feature sharing scheme by utilizing private Johnson-Lindenstrauss transform, which is the key to updating local partial gradients in a differentially private and communication efficient manner.
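The private Johnson-Lindenstrauss sharing scheme is beyond a snippet, but the plain JL transform it builds on is easy to sketch (the Gaussian construction below is the textbook version, not the differentially private variant used in the paper):

```python
import numpy as np

def jl_transform(X, out_dim, seed=0):
    """Gaussian Johnson-Lindenstrauss projection: R has i.i.d.
    N(0, 1/out_dim) entries, so squared pairwise distances are
    preserved in expectation and concentrate for moderate out_dim."""
    rng = np.random.default_rng(seed)
    R = rng.normal(0.0, 1.0 / np.sqrt(out_dim), size=(X.shape[1], out_dim))
    return X @ R
```

In the feature-wise distributed setting, each party would share such low-dimensional sketches of its local features instead of the raw columns, which is what makes the gradient updates communication efficient.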

AAAI Conference 2017 Conference Paper

Bilinear Probabilistic Canonical Correlation Analysis via Hybrid Concatenations

  • Yang Zhou
  • Haiping Lu
  • Yiu-ming Cheung

Canonical Correlation Analysis (CCA) is a classical technique for two-view correlation analysis, while Probabilistic CCA (PCCA) provides a generative and more general viewpoint for this task. Recently, PCCA has been extended to bilinear cases for dealing with two-view matrices in order to preserve and exploit the matrix structures in PCCA. However, existing bilinear PCCAs impose restrictive model assumptions for matrix structure preservation, sacrificing generative correctness or model flexibility. To overcome these drawbacks, we propose BPCCA, a new bilinear extension of PCCA, by introducing a hybrid joint model. Our new model preserves matrix structures indirectly via hybrid vector-based and matrix-based concatenations. This enables BPCCA to gain more model flexibility in capturing two-view correlations and obtain closed-form solutions for parameter estimation. Experimental results on two real-world applications demonstrate the superior performance of BPCCA over competing methods.

IJCAI Conference 2017 Conference Paper

Dynamic Weighted Majority for Incremental Learning of Imbalanced Data Streams with Concept Drift

  • Yang Lu
  • Yiu-ming Cheung
  • Yuan Yan Tang

Concept drifts occurring in data streams will jeopardize the accuracy and stability of the online learning process. If the data stream is imbalanced, it will be even more challenging to detect and cure the concept drift. In the literature, these two problems have been intensively addressed separately, but have yet to be well studied when they occur together. In this paper, we propose a chunk-based incremental learning method called Dynamic Weighted Majority for Imbalance Learning (DWMIL) to deal with data streams exhibiting both concept drift and class imbalance. DWMIL utilizes an ensemble framework, dynamically weighting the base classifiers according to their performance on the current data chunk. Compared with the existing methods, its merits are four-fold: (1) it remains stable on non-drifted streams and quickly adapts to new concepts; (2) it is totally incremental, i.e., no previous data needs to be stored; (3) it keeps a limited number of classifiers to ensure high efficiency; and (4) it is simple and needs only one thresholding parameter. Experiments on both synthetic and real data sets with concept drift show that DWMIL performs better than the state-of-the-art competitors, with less computational cost.
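The chunk-based weighting loop of DWMIL can be sketched roughly as follows. This is a simplified caricature: the nearest-centroid base learner, the accuracy-based weight decay, and the pruning threshold are all stand-ins for the paper's actual choices:

```python
import numpy as np

class CentroidClf:
    """Trivial base learner: predict the class of the nearest centroid."""
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.cents = np.array([X[y == c].mean(axis=0) for c in self.classes])
        return self

    def predict(self, X):
        d = ((X[:, None, :] - self.cents[None, :, :]) ** 2).sum(-1)
        return self.classes[d.argmin(1)]

class ChunkEnsemble:
    """DWMIL-flavored loop: on each chunk, decay member weights by their
    accuracy on the chunk, prune weak members, train a fresh member."""
    def __init__(self, theta=0.3, max_size=5):
        self.members, self.theta, self.max_size = [], theta, max_size

    def partial_fit(self, X, y):
        self.members = [(c, w * (c.predict(X) == y).mean())
                        for c, w in self.members]
        self.members = [(c, w) for c, w in self.members if w > self.theta]
        self.members.append((CentroidClf().fit(X, y), 1.0))
        self.members = self.members[-self.max_size:]  # bound ensemble size
        return self

    def predict(self, X):
        preds = np.array([c.predict(X) for c, _ in self.members])
        ws = np.array([w for _, w in self.members])
        labels = np.unique(preds)
        scores = np.array([(ws[:, None] * (preds == lab)).sum(0)
                           for lab in labels])
        return labels[scores.argmax(0)]  # weighted majority vote
```

After a drift, old members misclassify the new chunk, their weights decay below the threshold, and they are pruned, while the freshly trained member takes over: that is the adaptation mechanism the abstract refers to.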

IJCAI Conference 2015 Conference Paper

Efficient Generalized Conditional Gradient with Gradient Sliding for Composite Optimization

  • Yiu-ming Cheung
  • Jian Lou

The generalized conditional gradient method has regained increasing research interest as an alternative to the popular proximal gradient method for sparse optimization problems. For particular tasks, its low computational cost of evaluating the linear subproblem on each iteration leads to superior practical performance. However, its inferior iteration complexity incurs an excess number of gradient evaluations, which can counteract the efficiency gained by solving the low-cost linear subproblem. In this paper, we therefore propose a novel algorithm that requires the same optimal number of gradient evaluations as the proximal gradient method. We also present a refined variant for a type of gauge-regularized problem where approximation techniques are allowed to further accelerate the linear subproblem computation. Experiments on a CUR-like matrix factorization problem with a group lasso penalty on four real-world datasets demonstrate the efficiency of the proposed method.
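For concreteness, the basic conditional gradient (Frank-Wolfe) iteration that the paper accelerates looks like this on an l1-constrained least-squares problem (a textbook sketch, not the proposed algorithm): the cheap linear subproblem over the l1 ball is solved by inspecting a single gradient coordinate.

```python
import numpy as np

def frank_wolfe_l1(A, b, tau, iters=300):
    """Conditional gradient for min ||Ax - b||^2 s.t. ||x||_1 <= tau.
    The linear minimization oracle over the l1 ball returns the vertex
    +/- tau * e_i for the coordinate i with the largest |gradient|."""
    x = np.zeros(A.shape[1])
    for t in range(iters):
        g = 2 * A.T @ (A @ x - b)       # gradient of the quadratic
        i = np.argmax(np.abs(g))
        s = np.zeros_like(x)
        s[i] = -tau * np.sign(g[i])     # LMO vertex
        gamma = 2.0 / (t + 2.0)         # standard diminishing step size
        x = (1 - gamma) * x + gamma * s
    return x
```

Each iteration costs one gradient plus one argmax; the excess number of such gradient evaluations, relative to proximal gradient methods, is exactly the inefficiency the abstract targets.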

IJCAI Conference 2013 Conference Paper

Online Group Feature Selection

  • Jing Wang
  • Zhong-Qiu Zhao
  • Xuegang Hu
  • Yiu-ming Cheung
  • Meng Wang
  • Xindong Wu

Online feature selection with dynamic features has become an active research area in recent years. However, in some real-world applications such as image analysis and email spam filtering, features may arrive by groups. Existing online feature selection methods evaluate features individually, while existing group feature selection methods cannot handle online processing. Motivated by this, we formulate the online group feature selection problem, and propose a novel selection approach for this problem. Our proposed approach consists of two stages: online intra-group selection and online inter-group selection. In the intra-group selection, we use spectral analysis to select discriminative features in each group when it arrives. In the inter-group selection, we use Lasso to select a globally optimal subset of features. This two-stage procedure continues until there are no more features to come or some predefined stopping conditions are met. Extensive experiments conducted on benchmark and real-world data sets demonstrate that our proposed approach outperforms other state-of-the-art online feature selection methods.
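The two-stage flavor of the approach can be sketched with simple stand-ins: a correlation filter in place of the paper's spectral intra-group analysis, and a plain ISTA solver for the inter-group Lasso:

```python
import numpy as np

def select_group(Xg, y, top=2):
    """Intra-group stage (stand-in for spectral analysis): keep the
    features in the arriving group most correlated with the target."""
    corr = np.abs((Xg - Xg.mean(0)).T @ (y - y.mean()))
    return np.argsort(corr)[-top:]

def lasso_ista(X, y, lam, iters=500):
    """Inter-group stage: plain ISTA for
    min_w 0.5 * ||Xw - y||^2 + lam * ||w||_1."""
    L = np.linalg.norm(X, 2) ** 2        # Lipschitz constant (sigma_max^2)
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        w = w - (X.T @ (X @ w - y)) / L              # gradient step
        w = np.sign(w) * np.maximum(np.abs(w) - lam / L, 0.0)  # shrink
    return w
```

As each group arrives, `select_group` would prune it locally, the survivors would be appended to the running design matrix, and `lasso_ista` would re-select the global subset; features with zero Lasso weight are discarded.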