Author name cluster

Minh N. Do

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

6 papers

2 author rows

IJCAI Conference 2025 Conference Paper

Robult: Leveraging Redundancy and Modality-Specific Features for Robust Multimodal Learning

Duy A. Nguyen
Abhi Kamboj
Minh N. Do

Addressing missing modalities and limited labeled data is crucial for advancing robust multimodal learning. We propose Robult, a scalable framework designed to mitigate these challenges by preserving modality-specific information and leveraging redundancy through a novel information-theoretic approach. Robult optimizes two core objectives: (1) a soft Positive-Unlabeled (PU) contrastive loss that maximizes task-relevant feature alignment while effectively utilizing limited labeled data in semi-supervised settings, and (2) a latent reconstruction loss that ensures unique modality-specific information is retained. These strategies, embedded within a modular design, enhance performance across various downstream tasks and ensure resilience to incomplete modalities during inference. Experimental results across diverse datasets validate that Robult achieves superior performance over existing approaches in both semi-supervised learning and missing modality contexts. Furthermore, its lightweight design promotes scalability and seamless integration with existing architectures, making it suitable for real-world multimodal applications.

AAAI Conference 2024 Conference Paper

MaskDiff: Modeling Mask Distribution with Diffusion Probabilistic Model for Few-Shot Instance Segmentation

Minh-Quan Le
Tam V. Nguyen
Trung-Nghia Le
Thanh-Toan Do
Minh N. Do
Minh-Triet Tran

Few-shot instance segmentation extends the few-shot learning paradigm to the instance segmentation task, which tries to segment instance objects from a query image with a few annotated examples of novel categories. Conventional approaches have attempted to address the task via prototype learning, known as point estimation. However, this mechanism depends on prototypes (e.g. mean of K-shot) for prediction, leading to performance instability. To overcome the disadvantage of the point estimation mechanism, we propose a novel approach, dubbed MaskDiff, which models the underlying conditional distribution of a binary mask, which is conditioned on an object region and K-shot information. Inspired by augmentation approaches that perturb data with Gaussian noise for populating low data density regions, we model the mask distribution with a diffusion probabilistic model. We also propose to utilize classifier-free guided mask sampling to integrate category information into the binary mask generation process. Without bells and whistles, our proposed method consistently outperforms state-of-the-art methods on both base and novel classes of the COCO dataset while simultaneously being more stable than existing methods. The source code is available at: https://github.com/minhquanlecs/MaskDiff.
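The abstract's idea of perturbing a mask with Gaussian noise can be illustrated with the standard closed-form forward step of a diffusion probabilistic model. This is a minimal sketch under stated assumptions, not the MaskDiff implementation: the linear noise schedule, the scaling of the binary mask to {-1, +1}, and all function names are hypothetical choices made for illustration.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (an assumed choice, not taken from the paper)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def noise_mask(mask, t, alpha_bars, rng):
    """Sample x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps.

    The binary mask is first rescaled from {0, 1} to {-1, +1} (assumption).
    """
    a = alpha_bars[t]
    signal, sigma = math.sqrt(a), math.sqrt(1.0 - a)
    return [
        [signal * (2.0 * v - 1.0) + sigma * rng.gauss(0.0, 1.0) for v in row]
        for row in mask
    ]


rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
mask = [[0, 1, 1], [0, 1, 0]]                           # toy 2x3 binary mask
slightly_noisy = noise_mask(mask, 10, alpha_bars, rng)  # early step: mask still visible
almost_noise = noise_mask(mask, 999, alpha_bars, rng)   # final step: close to pure noise
```

At small `t` the sample stays close to the original mask, while at the last step almost all of the signal has been replaced by noise. A generative model of the kind the abstract describes would be trained to reverse this process, conditioned on the object region and the K-shot information.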

AIIM Journal 2024 Journal Article

Modelling-based joint embedding of histology and genomics using canonical correlation analysis for breast cancer survival prediction

Vaishnavi Subramanian
Tanveer Syeda-Mahmood
Minh N. Do

JBHI Journal 2024 Journal Article

Smartphone-Based Digitized Neurological Examination Toolbox for Multi-Test Neurological Abnormality Detection and Documentation

Trung-Hieu Hoang
Christopher Zallek
Minh N. Do

Understanding the efficacy of digital biomarkers in vision-based human motion analysis is essential, not only for interpreting computer-aided exam results but also for advancing the next generation of digital health tools. This study extensively analyzes digitized neurological examination (DNE) biomarkers for detecting and documenting exam features of Parkinson's disease (PD) and other neurological disorders (OD). We first propose DNE-113, a multi-test DNE database collected from 113 participants that comprises finger tapping, finger to finger, forearm roll, stand-up and walk, and facial activation tests and covers a broader range of neurological abnormalities beyond PD. Subsequently, DNE-113 is integrated into pyDNE, a convenient open-source toolbox that streamlines the creation and assessment of digital biomarkers. This toolbox allows us to assess the quality of DNE biomarkers across diverse classification tasks. We showcase the discriminative power of DNE biomarkers, which successfully characterize abnormal signals in neurological patients. Our findings highlight not only the potential use cases but also the persisting challenges in constructing digital biomarkers for computer-aided movement analysis of PD and OD patients.

JBHI Journal 2022 Journal Article

Towards a Comprehensive Solution for a Vision-Based Digitized Neurological Examination

Trung-Hieu Hoang
Mona Zehni
Huaijin Xu
George Heintz
Christopher Zallek
Minh N. Do

The ability to use digitally recorded and quantified neurological exam information is important to help healthcare systems deliver better care, in-person and via telehealth, as they compensate for a growing shortage of neurologists. Current neurological digital biomarker pipelines, however, are limited to a specific neurological exam component or to assessing specific conditions. In this paper, we propose an accessible vision-based exam and documentation solution called Digitized Neurological Examination (DNE) to expand exam biomarker recording options and clinical applications using a smartphone/tablet. Through our DNE software, healthcare providers in clinical settings and people at home can video-capture an examination while performing instructed neurological tests, including finger tapping, finger to finger, forearm roll, and stand-up and walk. The modular design of the DNE software supports the integration of additional tests. From the recorded examinations, the DNE extracts the 2D/3D human-body pose and quantifies kinematic and spatio-temporal features. The features are clinically relevant and allow clinicians to document and observe the quantified movements and the changes in these metrics over time. A web server and a user interface for viewing recordings and visualizing features are available. DNE was evaluated on a collected dataset of 21 subjects containing normal and simulated-impaired movements. The overall accuracy of DNE is demonstrated by classifying the recorded movements using various machine learning models. Our tests show an accuracy beyond 90% for upper-limb tests and 80% for the stand-up and walk tests.
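To make the idea of quantifying kinematic features from pose keypoints concrete, the sketch below derives a few finger-tapping measurements from per-frame 2D fingertip positions. It is an illustration only, not the DNE software: the feature definitions, the peak-counting rule, the `min_height` threshold, and every function name are assumptions introduced here.

```python
import math


def fingertip_distances(thumb_xy, index_xy):
    """Per-frame Euclidean distance between thumb tip and index tip."""
    return [math.dist(t, i) for t, i in zip(thumb_xy, index_xy)]


def count_peaks(signal, min_height):
    """Count strict local maxima whose value is at least min_height."""
    return sum(
        1
        for k in range(1, len(signal) - 1)
        if signal[k] >= min_height and signal[k - 1] < signal[k] > signal[k + 1]
    )


def tapping_features(thumb_xy, index_xy, fps, min_height):
    """Hypothetical finger-tapping features: tap count, rate, and max opening."""
    dist = fingertip_distances(thumb_xy, index_xy)
    taps = count_peaks(dist, min_height)
    duration_s = len(dist) / fps
    return {
        "taps": taps,
        "frequency_hz": taps / duration_s,
        "max_opening": max(dist),
    }


# Synthetic keypoints: thumb fixed at the origin, index tip moving up and down.
openings = [0, 1, 2, 1, 0, 1, 2, 1, 0, 1, 2, 1]
thumb = [(0.0, 0.0)] * len(openings)
index = [(0.0, float(y)) for y in openings]
features = tapping_features(thumb, index, fps=6, min_height=1.5)
```

In practice the keypoints would come from a pose estimator and would need smoothing before peak detection; the synthetic trajectory here only exercises the arithmetic.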

IROS Conference 2019 Conference Paper

Dense 3D Reconstruction for Visual Tunnel Inspection using Unmanned Aerial Vehicle

Ramanpreet Singh Pahwa
Kennard Yanting Chan
Jiamin Bai
Vincensius Billy Saputra
Minh N. Do
Shaohui Foong

Advances in Unmanned Aerial Vehicles (UAVs) open avenues for applications such as tunnel inspection. Owing to their ability to fly inside tunnels, UAVs can quickly identify defects and potential safety problems. However, long tunnels, especially those with repetitive or uniform structures, pose a significant problem for UAV navigation. Furthermore, post-processing of the visual data from the camera mounted on the UAV is required to generate useful information for the inspection task. In this work, we design a UAV with a single rotating camera to accomplish the task. Compared to other platforms, our solution meets the stringent requirements for tunnel inspection in terms of battery life, size, and weight. While current state-of-the-art methods can estimate camera pose and 3D geometry from a sequence of images, they assume large overlap, small rotational motion, and many distinct matching points between images. These assumptions severely limit their effectiveness in tunnel-like scenarios where the camera has erratic or large rotational motion, such as the one mounted on the UAV. This paper presents a novel solution that exploits Structure-from-Motion, Bundle Adjustment, and available geometry priors to robustly estimate camera pose and automatically reconstruct a fully dense 3D scene using the fewest possible images in various challenging tunnel-like environments. We validate our system with both a Virtual Reality application and experiments on a real dataset. The results demonstrate that the proposed reconstruction, along with texture mapping, allows for remote navigation and inspection of tunnel-like environments, even those that are inaccessible to humans.
