Arrow Research search

Author name cluster

Thanh-Toan Do

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

17 papers
2 author rows

Possible papers


AAAI Conference 2026 Conference Paper

Coverage-Constrained Human-AI Cooperation with Multiple Experts

  • Zheng Zhang
  • Cuong C. Nguyen
  • Kevin Wells
  • Thanh-Toan Do
  • David Rosewarne
  • Gustavo Carneiro

Human-AI cooperative classification (HAI-CC) aims to develop hybrid intelligent systems that enhance decision-making in various high-stakes real-world scenarios by leveraging both human expertise and AI capabilities. Current HAI-CC methods primarily focus on learning-to-defer (L2D), where decisions are deferred to human experts when the AI is not confident, and learning-to-complement (L2C), where the AI and human experts make predictions cooperatively. However, existing research in both L2D and L2C has not effectively exploited diverse expert knowledge to improve decision-making, particularly when constrained by the operational cost of human involvement. In this paper, we address this research gap by proposing the Coverage-constrained Learning to Defer and Complement with Specific Experts (CL2DC) method. In particular, CL2DC assesses each input before making the final decision, either through the AI prediction alone or by deferring to, or complementing, a specific human expert. Furthermore, we propose a coverage-constrained optimisation to control the cooperation cost, ensuring that the probability of AI-only selection approximates a target value. This approach enables an effective assessment of system performance within a specified budget. Comprehensive evaluations on both synthetic and real-world datasets demonstrate that CL2DC achieves superior performance compared to state-of-the-art HAI-CC methods.
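The coverage constraint can be pictured as a penalty on the rate at which a gating network selects the AI-only option across a batch. A minimal numpy sketch; the function name `coverage_penalty` and the squared-deviation form are illustrative assumptions, not the paper's actual formulation:

```python
import numpy as np

def coverage_penalty(gate_logits, target_ai_prob, ai_index=0):
    """Penalise deviation of the AI-only selection rate from a target budget.

    gate_logits: (batch, 1 + num_experts) scores; column `ai_index` is the
    AI-only option, the remaining columns defer to / complement an expert.
    """
    # softmax over the decision options for each input
    z = gate_logits - gate_logits.max(axis=1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    # empirical probability of choosing the AI alone across the batch
    ai_rate = probs[:, ai_index].mean()
    # squared deviation from the target coverage acts as the constraint term
    return (ai_rate - target_ai_prob) ** 2, ai_rate

# two inputs: one leaning AI-only, one leaning toward expert 1
logits = np.array([[2.0, 0.0, 0.0],
                   [0.0, 2.0, 0.0]])
penalty, rate = coverage_penalty(logits, target_ai_prob=0.5)
```

Adding such a term to the training loss pushes the empirical AI-only rate toward the chosen budget.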

NeurIPS Conference 2025 Conference Paper

Geometry-Aware Collaborative Multi-Solutions Optimizer for Model Fine-Tuning with Parameter Efficiency

  • Van-Anh Nguyen
  • Trung Le
  • Mehrtash Harandi
  • Ehsan Abbasnejad
  • Thanh-Toan Do
  • Dinh Phung

We propose a framework grounded in gradient flow theory and informed by geometric structure that provides multiple diverse solutions for a given task, ensuring collaborative results that enhance performance and adaptability across different tasks. This framework enables flexibility, allowing for efficient task-specific fine-tuning while preserving the knowledge of the pre-trained foundation models. Extensive experiments across transfer learning, few-shot learning, and domain generalization show that our proposed approach consistently outperforms existing Bayesian methods, delivering strong performance with affordable computational overhead and offering a practical solution by updating only a small subset of parameters.

TMLR Journal 2025 Journal Article

Maximising the Utility of Validation Sets for Imbalanced Noisy-label Meta-learning

  • Hoang Anh Dung
  • Cuong C. Nguyen
  • Vasileios Belagiannis
  • Thanh-Toan Do
  • Gustavo Carneiro

Meta-learning is an effective method to handle imbalanced and noisy-label learning, but it generally depends on a clean validation set. Unfortunately, this validation set scales poorly as the number of classes increases, since its samples traditionally need to be randomly selected, manually labelled, and balanced across classes. This problem has therefore motivated the development of meta-learning methods that automatically select validation samples likely to have clean labels and a balanced class distribution. Unfortunately, a common shortcoming of existing meta-learning methods for noisy-label learning is the lack of consideration for data informativeness when constructing the validation set. An informative validation set requires hard samples, i.e., samples for which the model makes low-confidence predictions, but such samples are more likely to be noisy, which can degrade the meta-reweighting process. The balance between sample informativeness and cleanliness is therefore an important criterion for validation set optimisation. In this paper, we propose new criteria to characterise the utility of such meta-learning validation sets, based on: 1) sample informativeness; 2) balanced class distribution; and 3) label cleanliness. We also introduce a new imbalanced noisy-label meta-learning (INOLML) algorithm that automatically builds a validation set by maximising these utility criteria. The proposed method shows state-of-the-art (SOTA) results compared to previous meta-learning and noisy-label learning approaches on several noisy-label learning benchmarks.
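The three utility criteria can be illustrated with a toy scoring-and-selection routine. This is a simplified sketch under assumed definitions (predictive entropy for informativeness, predicted label probability for cleanliness, per-class top-k selection for balance), not the INOLML algorithm itself:

```python
import numpy as np

def sample_utility(probs, labels, w_info=1.0, w_clean=1.0):
    """Score candidate validation samples by informativeness and cleanliness."""
    eps = 1e-12
    # informativeness: entropy of the model's predictive distribution
    entropy = -(probs * np.log(probs + eps)).sum(axis=1)
    # cleanliness proxy: model's probability for the sample's given label
    clean = probs[np.arange(len(labels)), labels]
    return w_info * entropy + w_clean * clean

def select_balanced(scores, labels, num_classes, per_class):
    """Enforce class balance by taking the top-scoring samples per class."""
    chosen = []
    for c in range(num_classes):
        idx = np.where(labels == c)[0]
        top = idx[np.argsort(scores[idx])[::-1][:per_class]]
        chosen.extend(top.tolist())
    return chosen

probs = np.array([[0.9, 0.1], [0.5, 0.5], [0.2, 0.8], [0.6, 0.4]])
labels = np.array([0, 0, 1, 1])
scores = sample_utility(probs, labels)
picked = select_balanced(scores, labels, num_classes=2, per_class=1)
```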

ICLR Conference 2025 Conference Paper

Multi-Perspective Data Augmentation for Few-shot Object Detection

  • Anh-Khoa Nguyen Vu
  • Quoc-Truong Truong
  • Vinh-Tiep Nguyen
  • Thanh Duc Ngo
  • Thanh-Toan Do
  • Tam V. Nguyen 0002

Recent few-shot object detection (FSOD) methods have focused on augmenting synthetic samples for novel classes, showing promising results with the rise of diffusion models. However, such datasets often lack representative diversity because they are unaware of typical and hard samples, especially in the context of foreground and background relationships. To tackle this issue, we propose a Multi-Perspective Data Augmentation (MPAD) framework. In terms of foreground-foreground relationships, we propose in-context learning for object synthesis (ICOS) with bounding box adjustments to enhance the detail and spatial information of synthetic samples. Inspired by the large margin principle, support samples play a vital role in defining class boundaries. Therefore, we design a Harmonic Prompt Aggregation Scheduler (HPAS) to mix prompt embeddings at each time step of the diffusion generation process, producing hard novel samples. For foreground-background relationships, we introduce a Background Proposal method (BAP) to sample typical and hard backgrounds. Extensive experiments on multiple FSOD benchmarks demonstrate the effectiveness of our approach. Our framework significantly outperforms traditional methods, achieving an average increase of $17.5\%$ in nAP50 over the baseline on PASCAL VOC.
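The idea of mixing prompt embeddings per diffusion timestep can be sketched with a simple blend whose coefficient depends on the timestep. The linear schedule and the name `mixed_prompt` below are purely illustrative stand-ins for the paper's HPAS scheduler:

```python
import numpy as np

def mixed_prompt(e_base, e_novel, t, T):
    """Blend two prompt embeddings with a timestep-dependent coefficient.

    A linear schedule over timesteps 0..T stands in for the actual
    scheduler; early steps follow e_base, late steps follow e_novel.
    """
    alpha = t / T
    return (1 - alpha) * e_base + alpha * e_novel

e_base, e_novel = np.array([1.0, 0.0]), np.array([0.0, 1.0])
start = mixed_prompt(e_base, e_novel, t=0, T=50)   # pure base prompt
end = mixed_prompt(e_base, e_novel, t=50, T=50)    # pure novel prompt
```

Intermediate timesteps yield embeddings between the two prompts, which is the mechanism that produces "hard" novel samples near class boundaries.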

ICLR Conference 2025 Conference Paper

Probabilistic Learning to Defer: Handling Missing Expert Annotations and Controlling Workload Distribution

  • Cuong Cao Nguyen
  • Thanh-Toan Do
  • Gustavo Carneiro 0001

Recent progress in machine learning research is gradually shifting its focus towards *human-AI cooperation* due to the advantages of exploiting the reliability of human experts and the efficiency of AI models. One of the promising approaches in human-AI cooperation is *learning to defer* (L2D), where the system analyses the input data and decides to make its own decision or defer to human experts. Although L2D has demonstrated state-of-the-art performance, in its standard setting, L2D entails a severe limitation: all human experts must annotate the whole training dataset of interest, resulting in a time-consuming and expensive annotation process that can subsequently influence the size and diversity of the training set. Moreover, the current L2D does not have a principled way to control workload distribution among human experts and the AI classifier, which is critical to optimise resource allocation. We, therefore, propose a new probabilistic modelling approach inspired by the mixture-of-experts, where the Expectation-Maximisation algorithm is leveraged to address the issue of missing expert annotations. Furthermore, we introduce a constraint, which can be solved efficiently during the E-step, to control the workload distribution among human experts and the AI classifier. Empirical evaluation on synthetic and real-world datasets shows that our proposed probabilistic approach performs competitively with, or surpasses, previously proposed methods assessed on the same benchmarks.
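The E-step under missing annotations can be sketched as computing responsibilities over the AI classifier and each expert, falling back to a prior correctness estimate wherever an expert's annotation is unobserved. This is a deliberately crude simplification for illustration; the paper's mixture-of-experts model and its marginalisation are not reproduced here:

```python
import numpy as np

def e_step(ai_probs, expert_correct, mask, prior):
    """Posterior over which agent (AI or expert m) should handle each sample.

    ai_probs: (n,) probability the AI assigns to the true label.
    expert_correct: (n, M) estimated probability each expert labels the
        sample correctly; mask: (n, M) is 1 where the annotation is observed.
    """
    # likelihood of each option; unobserved expert annotations fall back to
    # the prior correctness estimate (a crude marginalisation)
    lik = np.concatenate([ai_probs[:, None],
                          np.where(mask == 1, expert_correct, prior)], axis=1)
    # normalise into responsibilities (rows sum to one)
    return lik / lik.sum(axis=1, keepdims=True)

ai = np.array([0.8, 0.3])
exp_corr = np.array([[0.9, 0.6], [0.9, 0.6]])
mask = np.array([[1, 0], [0, 1]])     # each sample annotated by one expert
resp = e_step(ai, exp_corr, mask, prior=0.5)
```

The M-step would then re-fit the classifier and expert-correctness estimates under these responsibilities; the workload constraint of the paper would additionally reshape the responsibility columns.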

AAAI Conference 2024 Conference Paper

MaskDiff: Modeling Mask Distribution with Diffusion Probabilistic Model for Few-Shot Instance Segmentation

  • Minh-Quan Le
  • Tam V. Nguyen
  • Trung-Nghia Le
  • Thanh-Toan Do
  • Minh N. Do
  • Minh-Triet Tran

Few-shot instance segmentation extends the few-shot learning paradigm to the instance segmentation task, which tries to segment instance objects from a query image with a few annotated examples of novel categories. Conventional approaches have attempted to address the task via prototype learning, known as point estimation. However, this mechanism depends on prototypes (e.g. mean of K-shot) for prediction, leading to performance instability. To overcome the disadvantage of the point estimation mechanism, we propose a novel approach, dubbed MaskDiff, which models the underlying conditional distribution of a binary mask, which is conditioned on an object region and K-shot information. Inspired by augmentation approaches that perturb data with Gaussian noise for populating low data density regions, we model the mask distribution with a diffusion probabilistic model. We also propose to utilize classifier-free guided mask sampling to integrate category information into the binary mask generation process. Without bells and whistles, our proposed method consistently outperforms state-of-the-art methods on both base and novel classes of the COCO dataset while simultaneously being more stable than existing methods. The source code is available at: https://github.com/minhquanlecs/MaskDiff.

ICML Conference 2024 Conference Paper

Sharpness-Aware Data Generation for Zero-shot Quantization

  • Hoang Anh Dung
  • Cuong Pham 0007
  • Trung Le 0001
  • Jianfei Cai 0001
  • Thanh-Toan Do

Zero-shot quantization aims to learn a quantized model from a pre-trained full-precision model with no access to the original real training data. The common idea in zero-shot quantization approaches is to generate synthetic data for quantizing the full-precision model. While it is well-known that deep neural networks with low sharpness have better generalization ability, none of the previous zero-shot quantization works considers the sharpness of the quantized model as a criterion for generating training data. This paper introduces a novel methodology that takes into account quantized model sharpness in synthetic data generation to enhance generalization. Specifically, we first demonstrate that sharpness minimization can be attained by maximizing gradient matching between the reconstruction loss gradients computed on synthetic and real validation data, under certain assumptions. We then circumvent the problem of gradient matching without a real validation set by approximating it with the gradient matching between each generated sample and its neighbors. Experimental evaluations on CIFAR-100 and ImageNet datasets demonstrate the superiority of the proposed method over the state-of-the-art techniques in low-bit quantization settings.
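The neighbour-based surrogate can be illustrated as a cosine distance between a synthetic sample's loss gradient and the mean gradient over its neighbours. The cosine form is an assumption for illustration; the paper's exact matching objective may differ:

```python
import numpy as np

def gradient_matching_loss(g_sample, g_neighbors):
    """1 - cosine similarity between one sample's reconstruction-loss
    gradient and the mean gradient over its synthetic neighbours."""
    g_bar = g_neighbors.mean(axis=0)
    cos = g_sample @ g_bar / (np.linalg.norm(g_sample)
                              * np.linalg.norm(g_bar) + 1e-12)
    return 1.0 - cos

g = np.array([1.0, 0.0])
# neighbours whose gradients agree with the sample: loss near 0
loss_same = gradient_matching_loss(g, np.array([[1.0, 0.0], [1.0, 0.0]]))
# orthogonal neighbour gradients: loss near 1
loss_orth = gradient_matching_loss(g, np.array([[0.0, 1.0], [0.0, 1.0]]))
```

Minimising this term while generating samples encourages gradients that agree locally, the proxy for low sharpness described above.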

NeurIPS Conference 2023 Conference Paper

Flat Seeking Bayesian Neural Networks

  • Van-Anh Nguyen
  • Tung-Long Vuong
  • Hoang Phan
  • Thanh-Toan Do
  • Dinh Phung
  • Trung Le

Bayesian Neural Networks (BNNs) provide a probabilistic interpretation for deep learning models by imposing a prior distribution over model parameters and inferring a posterior distribution based on observed data. The model sampled from the posterior distribution can be used for providing ensemble predictions and quantifying prediction uncertainty. It is well-known that deep learning models with lower sharpness have better generalization ability. However, existing posterior inferences are not aware of sharpness/flatness in terms of formulation, possibly leading to high sharpness for the models sampled from them. In this paper, we develop theories, the Bayesian setting, and the variational inference approach for the sharpness-aware posterior. Specifically, the models sampled from our sharpness-aware posterior, and the optimal approximate posterior estimating this sharpness-aware posterior, have better flatness and hence potentially higher generalization ability. We conduct experiments by leveraging the sharpness-aware posterior with state-of-the-art Bayesian Neural Networks, showing that the flat-seeking counterparts outperform their baselines in all metrics of interest.

NeurIPS Conference 2023 Conference Paper

Model and Feature Diversity for Bayesian Neural Networks in Mutual Learning

  • Van Cuong Pham
  • Cuong Nguyen
  • Trung Le
  • Dinh Phung
  • Gustavo Carneiro
  • Thanh-Toan Do

Bayesian Neural Networks (BNNs) offer probability distributions for model parameters, enabling uncertainty quantification in predictions. However, they often underperform compared to deterministic neural networks. Utilizing mutual learning can effectively enhance the performance of peer BNNs. In this paper, we propose a novel approach to improve BNN performance through deep mutual learning. The proposed approach aims to increase diversity in both network parameter distributions and feature distributions, promoting peer networks to acquire distinct features that capture different characteristics of the input, which enhances the effectiveness of mutual learning. Experimental results demonstrate significant improvements in classification accuracy, negative log-likelihood, and expected calibration error when compared to traditional mutual learning for BNNs.

NeurIPS Conference 2023 Conference Paper

Optimal Transport Model Distributional Robustness

  • Van-Anh Nguyen
  • Trung Le
  • Anh Bui
  • Thanh-Toan Do
  • Dinh Phung

Distributional robustness is a promising framework for training deep learning models that are less vulnerable to adversarial examples and data distribution shifts. Previous works have mainly focused on exploiting distributional robustness in the data space. In this work, we explore an optimal transport-based distributional robustness framework in model spaces. Specifically, we examine a model distribution within a Wasserstein ball centered on a given model distribution that maximizes the loss. We have developed theories that enable us to learn the optimal robust center model distribution. Interestingly, our developed theories allow us to flexibly incorporate the concept of sharpness awareness into training, whether it's a single model, ensemble models, or Bayesian Neural Networks, by considering specific forms of the center model distribution. These forms include a Dirac delta distribution over a single model, a uniform distribution over several models, and a general Bayesian Neural Network. Furthermore, we demonstrate that Sharpness-Aware Minimization (SAM) is a specific case of our framework when using a Dirac delta distribution over a single model, while our framework can be seen as a probabilistic extension of SAM. To validate the effectiveness of our framework in the aforementioned settings, we conducted extensive experiments, and the results reveal remarkable improvements compared to the baselines.
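Since the framework recovers Sharpness-Aware Minimisation as the Dirac-delta special case, a minimal SAM update on a toy quadratic illustrates the underlying mechanism. This is standard SAM only, not the full optimal-transport construction:

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One Sharpness-Aware Minimisation update: ascend to the worst-case
    neighbour within an L2 ball of radius rho, then descend using the
    gradient evaluated at that perturbed point."""
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)   # worst-case perturbation
    g_sharp = grad_fn(w + eps)                    # gradient at perturbed point
    return w - lr * g_sharp

# toy quadratic loss L(w) = 0.5 * ||w||^2, whose gradient is w itself
w = np.array([1.0, -2.0])
for _ in range(50):
    w = sam_step(w, lambda v: v)
```

Replacing the single worst-case perturbation with a perturbed model distribution inside a Wasserstein ball is what generalises this step to the probabilistic setting described above.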

TMLR Journal 2023 Journal Article

Task Weighting in Meta-learning with Trajectory Optimisation

  • Cuong C. Nguyen
  • Thanh-Toan Do
  • Gustavo Carneiro

Developing meta-learning algorithms that are unbiased toward a subset of training tasks often requires hand-designed criteria to weight tasks, potentially resulting in sub-optimal solutions. In this paper, we introduce a new principled and fully-automated task-weighting algorithm for meta-learning methods. By considering the weights of tasks within the same mini-batch as an action, and the meta-parameter of interest as the system state, we cast the task-weighting meta-learning problem as a trajectory optimisation and employ the iterative linear quadratic regulator to determine the optimal action, or task weights. We theoretically show that the proposed algorithm converges to an $\epsilon_{0}$-stationary point, and empirically demonstrate that the proposed approach outperforms common hand-engineered weighting methods on two few-shot learning benchmarks.

IJCAI Conference 2022 Conference Paper

Logic Rules Meet Deep Learning: A Novel Approach for Ship Type Classification (Extended Abstract)

  • Manolis Pitsikalis
  • Thanh-Toan Do
  • Alexei Lisitsa
  • Shan Luo

The shipping industry is an important component of global trade and the economy. In order to ensure law compliance and safety, it needs to be monitored. In this paper, we present a novel ship type classification model that combines vessel data transmitted via the Automatic Identification System with vessel imagery. The main components of our approach are the Faster R-CNN Deep Neural Network and a Neuro-Fuzzy system with IF-THEN rules. We evaluate our model using real-world data, showcase the advantages of this combination, and compare it with other methods. Results show that our model can increase prediction scores by up to 15.4% compared with the next best model we considered, while also maintaining a level of explainability, as opposed to common black-box approaches.

UAI Conference 2021 Conference Paper

Probabilistic task modelling for meta-learning

  • Cuong Cao Nguyen
  • Thanh-Toan Do
  • Gustavo Carneiro 0001

We propose probabilistic task modelling – a generative probabilistic model for collections of tasks used in meta-learning. The proposed model combines variational auto-encoding and latent Dirichlet allocation to model each task as a mixture of Gaussian distributions in an embedding space. Such modelling provides an explicit representation of a task through its task-theme mixture. We present an efficient approximate inference technique based on variational inference for empirical Bayes parameter estimation. We perform empirical evaluations to validate the task uncertainty and task distance produced by the proposed method through correlation diagrams of the prediction accuracy on testing tasks. We also carry out task-selection experiments in meta-learning to demonstrate how the task relatedness inferred from the proposed model helps facilitate meta-learning algorithms.

IJCAI Conference 2020 Conference Paper

Direct Quantization for Training Highly Accurate Low Bit-width Deep Neural Networks

  • Tuan Hoang
  • Thanh-Toan Do
  • Tam V. Nguyen
  • Ngai-Man Cheung

This paper proposes two novel techniques to train deep convolutional neural networks with low bit-width weights and activations. First, to obtain low bit-width weights, most existing methods obtain the quantized weights by performing quantization on the full-precision network weights. However, this approach results in a mismatch: gradient descent updates the full-precision weights, but it does not update the quantized weights. To address this issue, we propose a novel method that enables direct updating of quantized weights with learnable quantization levels to minimize the cost function using gradient descent. Second, to obtain low bit-width activations, existing works consider all channels equally. However, the activation quantizers could be biased toward a few channels with high variance. To address this issue, we propose a method that takes into account the quantization errors of individual channels. With this approach, we can learn activation quantizers that minimize the quantization errors in the majority of channels. Experimental results demonstrate that our proposed method achieves state-of-the-art performance on the image classification task, using AlexNet, ResNet and MobileNetV2 architectures on the CIFAR-100 and ImageNet datasets.
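Direct updating of quantized weights with learnable levels can be sketched as nearest-level assignment plus a gradient routed back to each level. The function names and the sum-based routing rule below are illustrative assumptions, not the paper's exact scheme:

```python
import numpy as np

def quantize(x, levels):
    """Map each value to its nearest learnable quantization level."""
    idx = np.abs(x[:, None] - levels[None, :]).argmin(axis=1)
    return levels[idx], idx

def level_grad(grad_out, idx, num_levels):
    """Gradient w.r.t. each learnable level: accumulate the output
    gradients of all entries assigned to that level."""
    g = np.zeros(num_levels)
    np.add.at(g, idx, grad_out)   # unbuffered scatter-add per assignment
    return g

levels = np.array([-1.0, 0.0, 1.0])   # e.g. ternary weight levels
x = np.array([0.9, -0.2, -0.8, 0.4])
q, idx = quantize(x, levels)
g_levels = level_grad(np.ones(4), idx, len(levels))
```

Because the levels themselves receive gradients, gradient descent can move the quantized values directly rather than only the full-precision shadow weights.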

ICML Conference 2019 Conference Paper

Bayesian Generative Active Deep Learning

  • Toan Tran 0002
  • Thanh-Toan Do
  • Ian D. Reid 0001
  • Gustavo Carneiro 0001

Deep learning models have demonstrated outstanding performance in several problems, but they tend to require immense amounts of computational and human resources for training and labeling, constraining the types of problems that can be tackled. Therefore, the design of effective training methods that require small labeled training sets is an important research direction that will allow a more effective use of resources. Among current approaches designed to address this issue, two are particularly interesting: data augmentation and active learning. Data augmentation achieves this goal by artificially generating new training points, while active learning relies on the selection of the “most informative” subset of unlabeled training samples to be labelled by an oracle. Although successful in practice, data augmentation can waste computational resources because it indiscriminately generates samples that are not guaranteed to be informative, and active learning selects a small subset of informative samples (from a large un-annotated set) that may be insufficient for the training process. In this paper, we propose a Bayesian generative active deep learning approach that combines active learning with data augmentation – we provide theoretical and empirical evidence (MNIST, CIFAR-$\{10, 100\}$, and SVHN) that our approach has more efficient training and better classification results than data augmentation and active learning.
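The "most informative" selection step in active learning is commonly scored with a mutual-information acquisition such as BALD. The sketch below shows that standard acquisition as background context; it is not the paper's Bayesian generative scheme:

```python
import numpy as np

def bald_scores(mc_probs):
    """BALD-style mutual information from Monte-Carlo predictive samples.

    mc_probs: (T, N, C) class probabilities from T stochastic forward
    passes (e.g. MC dropout) over N candidate samples and C classes.
    """
    eps = 1e-12
    mean = mc_probs.mean(axis=0)                        # (N, C) mean prediction
    # entropy of the mean prediction (total predictive uncertainty)
    h_mean = -(mean * np.log(mean + eps)).sum(axis=1)
    # mean entropy of individual passes (aleatoric part)
    h_each = -(mc_probs * np.log(mc_probs + eps)).sum(axis=2).mean(axis=0)
    return h_mean - h_each                              # epistemic uncertainty

# sample 0: passes agree (low score); sample 1: passes disagree (high score)
mc = np.array([[[0.9, 0.1], [0.9, 0.1]],
               [[0.9, 0.1], [0.1, 0.9]]])
scores = bald_scores(mc)
```

Samples on which the stochastic passes disagree receive high scores and would be sent to the oracle for labelling.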

ICRA Conference 2018 Conference Paper

AffordanceNet: An End-to-End Deep Learning Approach for Object Affordance Detection

  • Thanh-Toan Do
  • Anh Nguyen 0003
  • Ian D. Reid 0001

We propose AffordanceNet, a new deep learning approach to simultaneously detect multiple objects and their affordances from RGB images. Our AffordanceNet has two branches: an object detection branch to localize and classify the object, and an affordance detection branch to assign each pixel in the object to its most probable affordance label. The proposed framework employs three key components for effectively handling the multiclass problem in the affordance mask: a sequence of deconvolutional layers, a robust resizing strategy, and a multi-task loss function. The experimental results on the public datasets show that our AffordanceNet outperforms recent state-of-the-art methods by a fair margin, while its end-to-end architecture allows inference at a speed of 150 ms per image. This makes our AffordanceNet well suited for real-time robotic applications. Furthermore, we demonstrate the effectiveness of AffordanceNet in different testing environments and in real robotic applications. The source code is available at https://github.com/nqanh/affordance-net.

ICRA Conference 2018 Conference Paper

SceneCut: Joint Geometric and Object Segmentation for Indoor Scenes

  • Trung T. Pham
  • Thanh-Toan Do
  • Niko Sünderhauf
  • Ian D. Reid 0001

This paper presents SceneCut, a novel approach to jointly discover previously unseen objects and non-object surfaces using a single RGB-D image. SceneCut's joint reasoning over scene semantics and geometry allows a robot to detect and segment object instances in complex scenes where modern deep learning-based methods either fail to separate object instances, or fail to detect objects that were not seen during training. SceneCut automatically decomposes a scene into meaningful regions which either represent objects or scene surfaces. The decomposition is quantified by a unified energy function over objectness and geometric fitting. We show how this energy function can be optimized efficiently by utilizing hierarchical segmentation trees. Moreover, we leverage a pre-trained convolutional oriented boundary network to predict accurate boundaries from images, which are used to construct high-quality region hierarchies. We evaluate SceneCut on several different indoor environments, and the results show that SceneCut significantly outperforms all the existing methods.