Author name cluster

Junhao Dong

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

7 papers

1 author row

AAAI Conference 2026 Conference Paper

TouchFormer: A Robust Transformer-based Framework for Multimodal Material Perception

Kailin Lyu
Long Xiao
Jianing Zeng
Junhao Dong
Xuexin Liu
Zhuojun Zou
Haoyue Yang
Lin Shu

Traditional vision-based material perception methods often experience substantial performance degradation under visually impaired conditions, thereby motivating the shift toward non-visual multimodal material perception. Despite this, existing approaches frequently perform naive fusion of multimodal inputs, overlooking key challenges such as modality-specific noise, missing modalities common in real-world scenarios, and the dynamically varying importance of each modality depending on the task. These limitations lead to suboptimal performance across several benchmark tasks. In this paper, we propose a robust multimodal fusion framework, TouchFormer. Specifically, we employ a Modality-Adaptive Gating (MAG) mechanism and intra- and inter-modality attention mechanisms to adaptively integrate cross-modal features, enhancing model robustness. Additionally, we introduce a Cross-Instance Embedding Regularization (CER) strategy, which significantly improves classification accuracy in fine-grained subcategory material recognition tasks. Experimental results demonstrate that, compared to existing non-visual methods, the proposed TouchFormer framework achieves classification accuracy improvements of 2.48% and 6.83% on SSMC and USMC tasks, respectively. Furthermore, real-world robotic experiments validate TouchFormer's effectiveness in enabling robots to better perceive and interpret their environment, paving the way for its deployment in safety-critical applications such as emergency response and industrial automation.

NeurIPS Conference 2025 Conference Paper

CrossSpectra: Exploiting Cross-Layer Smoothness for Parameter-Efficient Fine-Tuning

Yifei Zhang
Hao Zhu
Junhao Dong
Haoran Shi
Ziqiao Meng
Piotr Koniusz
Han Yu

Parameter-efficient fine-tuning (PEFT) is essential for adapting large foundation models without excessive storage cost. However, current approaches such as LoRA treat each layer’s adaptation independently, overlooking correlations across layers. This independence causes the number of trainable parameters to grow linearly with model depth. We provide theoretical and empirical evidence that skip connections in transformers create smooth gradient propagation across layers. This smoothness leads to weight adaptations that concentrate most of their energy in low-frequency spectral components, especially along the layer dimension. Empirical analysis confirms this effect, showing that most of the adaptation energy lies in low frequencies. Building on this insight, we propose CrossSpectra, which parameterizes all attention-weight adaptations $(Q, K, V)$ across layers as a single 3D tensor and represents them with sparse spectral coefficients ($\kappa_1, \kappa_2$). Using $\kappa_{1}$ non-zero coefficients within each layer’s frequency space and truncating to $\kappa_{2}$ frequencies across layers, CrossSpectra requires $\mathcal{O}(\kappa_{1}\kappa_{2})$ parameters instead of LoRA’s $\mathcal{O}(Lrd)$, where $L$ is the number of layers and $r$ the rank. Across natural-language and vision benchmarks, CrossSpectra matches or surpasses baseline performance while using fewer parameters than LoRA, achieving only $0.36\%$ of LoRA’s parameter count when fine-tuning LLaMA-7B on instruction-following tasks. These results show that exploiting the architectural smoothness of transformers through spectral analysis yields major efficiency gains in PEFT.
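The low-frequency claim in the abstract above can be illustrated with a small numerical sketch. This is not the CrossSpectra implementation: the abstract does not say which spectral transform is used, so a plain discrete Fourier transform along the layer axis is assumed here, and the function name `low_frequency_energy_fraction` and both tensors are hypothetical, synthetic stand-ins for real weight updates.

```python
import numpy as np


def low_frequency_energy_fraction(delta, keep):
    """Share of squared magnitude held by layer-axis frequencies with |k| < keep.

    delta: array of shape (L, d, d), one hypothetical weight update per layer.
    """
    num_layers = delta.shape[0]
    spectrum = np.fft.fft(delta, axis=0)  # DFT along the layer axis only
    energy = (np.abs(spectrum) ** 2).sum(axis=(1, 2))  # energy per frequency bin
    # Integer frequency index of each bin: 0, 1, ..., -2, -1.
    k = np.rint(np.fft.fftfreq(num_layers, d=1.0 / num_layers)).astype(int)
    return float(energy[np.abs(k) < keep].sum() / energy.sum())


rng = np.random.default_rng(0)
L, d = 32, 16
t = np.linspace(0.0, 1.0, L)[:, None, None]

# Synthetic update that drifts slowly from one layer to the next.
smooth = rng.standard_normal((d, d)) + t * rng.standard_normal((d, d))
# Synthetic update drawn independently for every layer.
rough = rng.standard_normal((L, d, d))

smooth_frac = low_frequency_energy_fraction(smooth, keep=4)
rough_frac = low_frequency_energy_fraction(rough, keep=4)
print(f"smooth: {smooth_frac:.3f}  rough: {rough_frac:.3f}")
```

When updates vary slowly across layers, a handful of layer-axis frequencies carries almost all of the energy, which is the property that would make truncating to a few cross-layer frequencies cheap; for updates that are independent per layer, the same truncation discards most of the signal.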

NeurIPS Conference 2025 Conference Paper

Machine Unlearning via Task Simplex Arithmetic

Junhao Dong
Hao Zhu
Yifei Zhang
Xinghua Qu
Yew Soon Ong
Piotr Koniusz

As foundation Vision-Language Models (VLMs) unlock fine-tuning on smaller datasets while leveraging large-scale pre-training data, machine unlearning becomes critical in addressing privacy concerns and regulatory compliance. The task vector, which represents the difference between the parameters of models fine-tuned with and without specific data, is a popular retraining-free unlearning strategy. However, we observe that task vectors exhibit substantial sensitivity to various fine-tuning configurations, resulting in unstable unlearning effectiveness that correlates negatively with the prediction-level variance. While aggregating multiple functions (e.g., VLM with classifier) whose parameters are represented by different task vectors reduces function variance and improves unlearning, the cost of obtaining numerous task vectors and aggregating functions is high. Thus, in order to capture the space of task vectors induced by diverse fine-tuning strategies, we propose modeling it within the convex hull of a $(Q-1)$-simplex whose vertices represent $Q$ task vectors. Although a function ensemble can be formed by sampling numerous task vectors from such a simplex, we derive a closed-form ensemble of an infinite number of functions whose parameters are uniformly sampled from the simplex, enabling efficient function-level task vector ensembling with enhanced unlearning performance. Extensive experiments and analyses across diverse datasets and scenarios demonstrate the efficacy of our method.
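The simplex construction described above can be sketched with a small Monte Carlo example. It covers only the sampling geometry, that is, the expensive route the abstract says the closed-form ensemble avoids; the closed form itself is not reproduced. Flattened parameter vectors are assumed, all function names are hypothetical, and uniform sampling relies on the standard fact that a Dirichlet distribution with all concentration parameters equal to 1 is uniform on the simplex.

```python
import numpy as np


def task_vector(theta_with, theta_without):
    """Parameter difference between models fine-tuned with and without the data."""
    return theta_with - theta_without


def sample_simplex_weights(rng, q, n):
    """Draw n weight vectors uniformly from the (q-1)-simplex.

    Dirichlet(1, ..., 1) is the uniform distribution on the simplex.
    """
    return rng.dirichlet(np.ones(q), size=n)


def sample_task_vectors(rng, vertices, n):
    """Sample n convex combinations of the Q vertex task vectors (rows of `vertices`)."""
    weights = sample_simplex_weights(rng, vertices.shape[0], n)  # shape (n, Q)
    return weights @ vertices  # shape (n, P)


rng = np.random.default_rng(0)
theta_without = rng.standard_normal(5)  # toy flattened parameters, P = 5
# Q = 3 toy models standing in for different fine-tuning configurations.
vertices = np.stack(
    [task_vector(theta_without + rng.standard_normal(5), theta_without) for _ in range(3)]
)

weights = sample_simplex_weights(rng, q=3, n=4)
samples = sample_task_vectors(rng, vertices, n=20000)
centroid = vertices.mean(axis=0)
print(np.abs(samples.mean(axis=0) - centroid).max())
```

Each sampled task vector would still need its own forward pass to build a function-level ensemble, which is why the cost grows with the number of samples.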

NeurIPS Conference 2025 Conference Paper

Robust SuperAlignment: Weak-to-Strong Robustness Generalization for Vision-Language Models

Junhao Dong
Cong Zhang
Xinghua Qu
Zejun Ma
Piotr Koniusz
Yew Soon Ong

Numerous well-established studies have demonstrated the superhuman capabilities of modern Vision-Language Models (VLMs) across a wide range of tasks. However, there is growing doubt about the continuing availability of reliable high-quality labeling (supervision) from human annotators, leading to stagnation of the model's performance. To address this challenge, "superalignment" employs the so-called weak-to-strong generalization paradigm, where the supervision from a weak model can provide generalizable knowledge for a strong model. While effective in aligning knowledge for clean samples between the strong and weak models, the standard weak-to-strong approach typically fails to capture adversarial robustness, exposing strong VLMs to adversarial attacks. This inability to transfer adversarial robustness arises because adversarial samples are normally missing in the superalignment stage. To this end, we are the first to propose a weak-to-strong (adversarial) robustness generalization method that elicits zero-shot robustness in large-scale models through an unsupervised scheme, mitigating the unreliable information source for alignment from two perspectives: alignment re-weighting and source guidance refinement. We analyze settings under which robustness generalization is possible. Extensive experiments across various vision-language benchmarks validate the effectiveness of our method in numerous scenarios, demonstrating its plug-and-play applicability to large-scale VLMs.

NeurIPS Conference 2025 Conference Paper

Solving Discrete (Semi) Unbalanced Optimal Transport with Equivalent Transformation Mechanism and KKT-Multiplier Regularization

Weiming Liu
Xinting Liao
Jun Dan
Fan Wang
Hua Yu
Junhao Dong
Shunjie Dong
Lianyong Qi

Semi-Unbalanced Optimal Transport (SemiUOT) shows great promise in matching two probability measures by relaxing one of the marginal constraints. Previous solvers often incorporate an entropy regularization term, which can result in inaccurate matching solutions. To address this issue, we focus on determining the marginal probability distribution of SemiUOT with KL divergence using the proposed Equivalent Transformation Mechanism (ETM) approach. Furthermore, we extend the ETM-based method to determine the marginal probability distribution of Unbalanced Optimal Transport (UOT) with KL divergence, validating its generality. Once the marginal probabilities of UOT/SemiUOT are determined, the problem can be transformed into a classical Optimal Transport (OT) problem. Moreover, we propose a KKT-Multiplier regularization term combined with Multiplier Regularized Optimal Transport (MROT) to achieve more accurate matching results. We conduct several numerical experiments to demonstrate the effectiveness of our proposed methods in addressing UOT/SemiUOT problems.
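To see why an entropy term can blur a matching, the sketch below runs the standard Sinkhorn iteration on a tiny balanced OT problem. It illustrates the entropic baseline the abstract contrasts against, not ETM or MROT, and it uses the balanced setting rather than SemiUOT for simplicity. The toy points, the regularization strength `eps`, and the function name `sinkhorn` are assumptions made for this example.

```python
import numpy as np


def sinkhorn(a, b, cost, eps, n_iter=500):
    """Entropy-regularized balanced OT plan via Sinkhorn scaling iterations."""
    kernel = np.exp(-cost / eps)  # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (kernel.T @ u)  # rescale to match column marginals
        u = a / (kernel @ v)    # rescale to match row marginals
    return u[:, None] * kernel * v[None, :]


# Three source and three target points at the same locations, so the exact
# optimal plan is the diagonal matching with zero transport cost.
x = np.array([0.0, 1.0, 2.0])
cost = (x[:, None] - x[None, :]) ** 2
a = np.full(3, 1.0 / 3.0)
b = np.full(3, 1.0 / 3.0)

plan = sinkhorn(a, b, cost, eps=0.5)
transport_cost = float((plan * cost).sum())
print(np.round(plan, 3))
print(transport_cost)
```

The regularized plan places mass off the diagonal and pays a strictly positive transport cost, even though a zero-cost one-to-one matching exists. Shrinking `eps` reduces the blur but makes the kernel entries numerically small.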

EAAI Journal 2024 Journal Article

Boundary-refined prototype generation: A general end-to-end paradigm for semi-supervised semantic segmentation

Junhao Dong
Zhu Meng
Delong Liu
Jiaxuan Liu
Zhicheng Zhao
Fei Su

Semi-supervised semantic segmentation has attracted increasing attention in computer vision, aiming to leverage unlabeled data through latent supervision. To achieve this goal, prototype-based classification has been introduced and has achieved considerable success. However, current approaches isolate prototype generation from the main training framework, resulting in a non-end-to-end workflow. Furthermore, most methods directly perform K-Means clustering on features to generate prototypes, which places the prototypes close to category semantic centers while overlooking the clear delineation of class boundaries. To address the above problems, we propose a novel end-to-end boundary-refined prototype generation (BRPG) method. Specifically, we perform online clustering on sampled features to incorporate the prototype generation into the whole training framework. In addition, to enhance the classification boundaries, we sample and cluster high- and low-confidence features separately based on confidence estimation, facilitating the generation of prototypes closer to the class boundaries. Moreover, an adaptive prototype optimization strategy is proposed to increase the number of prototypes for categories with scattered feature distributions, which further refines the class boundaries. Extensive experiments demonstrate the remarkable robustness and scalability of our method across diverse datasets, segmentation networks, and semi-supervised frameworks, outperforming the state-of-the-art approaches on three benchmark datasets: PASCAL VOC 2012, Cityscapes and MS COCO. The code is available at https://github.com/djh-dzxw/BRPG.

JBHI Journal 2024 Journal Article

NuSEA: Nuclei Segmentation With Ellipse Annotations

Zhu Meng
Junhao Dong
Binyu Zhang
Shichao Li
Ruixiao Wu
Fei Su
Guangxi Wang
Limei Guo

Objective: Nuclei segmentation is a crucial pre-task for pathological microenvironment quantification. However, acquiring precise manual nuclei annotations to improve the performance of deep learning models is time-consuming and expensive. Methods: In this paper, an efficient nuclear annotation tool called NuSEA is proposed to achieve accurate nucleus segmentation, where a simple but effective ellipse annotation is applied. Specifically, the core network U-Light of NuSEA is lightweight with only 0.86 M parameters, which is suitable for real-time nuclei segmentation. In addition, an Elliptical Field Loss and a Texture Loss are proposed to enhance the edge segmentation and constrain the smoothness simultaneously. Results: Extensive experiments on three public datasets (MoNuSeg, CPM-17, and CoNSeP) demonstrate that NuSEA is superior to the state-of-the-art (SOTA) methods and better than existing algorithms based on point, rectangle, and text annotations. Conclusions: With the assistance of NuSEA, a new dataset called NuSEA-dataset v1.0, encompassing 118,857 annotated nuclei from the whole-slide images of 12 organs, is released. Significance: NuSEA provides a rapid and effective annotation tool for nuclei in histopathological images, benefiting future explorations in deep learning algorithms.
