Author name cluster

Ran Ran

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

10 papers

2 author rows

AAAI Conference 2026 Conference Paper

DWTSG: Parameter-Efficient Fine-Tuning of Large Pre-trained Models via Discrete Wavelet Transform and Subband Guidance

Chengwei Sun
Jiwei Wei
Shiyuan He
Zeyu Ma
Yuyang Zhou
Ran Ran
Jie Zou
Yang Yang

Fully fine-tuning large pre-trained models for each downstream task is impractical due to prohibitive memory, computation, and storage costs. Although parameter-efficient fine-tuning (PEFT) methods address this issue, leading methods like LoRA still exhibit linear scaling of trainable parameters with hidden size. Recent studies have explored PEFT in the frequency domain to reduce computational costs by employing fast Fourier transform and discrete cosine transform with sparse frequency selection. These methods rely on global frequency representations that lack spatial locality and disperse energy across the domain. As a result, sparse coefficient selection struggles to preserve fine-grained structural information and often introduces artifacts such as ringing near boundaries. To address these limitations, we propose DWTSG, a novel PEFT framework based on discrete wavelet transform (DWT) and subband guidance. DWTSG decomposes pre-trained weights into four wavelet subbands that jointly encode global context and local details. It fine-tunes only the most informative coefficients in each subband through an energy-based selection strategy that prioritizes coefficients based on their individual importance and interactions. Finally, inverse DWT reconstructs the updated weights, enabling efficient and precise adaptation. Extensive experiments on natural language understanding, commonsense reasoning, and image classification demonstrate that DWTSG outperforms existing PEFT methods, achieving superior performance and higher parameter efficiency.
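The decompose, select, and reconstruct pipeline described in this abstract can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes a one-level orthonormal Haar wavelet, ranks coefficients by plain squared magnitude (the abstract's selection strategy also accounts for coefficient interactions, which are not modeled here), and uses hypothetical helper names throughout (`haar_dwt2`, `haar_idwt2`, `top_k_positions`, `apply_sparse_update`).

```python
def haar_dwt2(w):
    """One-level 2D Haar DWT of a matrix with even dimensions.

    Returns four half-size subbands (ll, lh, hl, hh). Subband naming
    conventions vary between libraries; ll is the low-pass average.
    """
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, len(w), 2):
        rows = ([], [], [], [])
        for j in range(0, len(w[0]), 2):
            a, b = w[i][j], w[i][j + 1]
            c, d = w[i + 1][j], w[i + 1][j + 1]
            rows[0].append((a + b + c + d) / 2)
            rows[1].append((a - b + c - d) / 2)
            rows[2].append((a + b - c - d) / 2)
            rows[3].append((a - b - c + d) / 2)
        for band, row in zip((ll, lh, hl, hh), rows):
            band.append(row)
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: rebuild the full-size matrix from four subbands."""
    out = []
    for i in range(len(ll)):
        top, bottom = [], []
        for j in range(len(ll[0])):
            s, h, v, x = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            top += [(s + h + v + x) / 2, (s - h + v - x) / 2]
            bottom += [(s + h - v - x) / 2, (s - h - v + x) / 2]
        out += [top, bottom]
    return out


def top_k_positions(band, k):
    """Positions of the k coefficients with the largest energy (squared value)."""
    cells = [(band[i][j] ** 2, i, j)
             for i in range(len(band)) for j in range(len(band[0]))]
    cells.sort(key=lambda t: -t[0])
    return [(i, j) for _, i, j in cells[:k]]


def apply_sparse_update(band, positions, deltas):
    """Return a copy of band with trainable deltas added at selected positions."""
    new = [row[:] for row in band]
    for (i, j), delta in zip(positions, deltas):
        new[i][j] += delta
    return new


# Toy "pre-trained weight" matrix (illustrative values only).
W = [[4, 2, 1, 0],
     [2, 4, 0, 1],
     [1, 0, 3, 1],
     [0, 1, 1, 3]]

ll, lh, hl, hh = haar_dwt2(W)
selected = top_k_positions(ll, 1)                   # one trainable coefficient
ll_new = apply_sparse_update(ll, selected, [2.0])   # stand-in for a learned delta
W_new = haar_idwt2(ll_new, lh, hl, hh)              # updated weights
```

In this sketch only the selected coefficients would be trainable, so the number of trainable values is set by `k` per subband rather than by the hidden size. Because each Haar coefficient covers a 2×2 block, a single updated coefficient changes only that block of `W_new`, which illustrates the spatial locality the abstract contrasts with global Fourier or cosine bases.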


AAAI Conference 2026 Conference Paper

NODiff: Neural Operator Diffusion for Multispectral Image Fusion

Junming Hou
Ran Ran
Sixing Chen
Zihao Chen
Xiaofeng Cong
Junling Li
Liang-Jian Deng

Pansharpening is a powerful technique for generating high-resolution multispectral (HRMS) images by fusing currently available image pairs of low-resolution multispectral (LRMS) and texture-rich panchromatic (PAN) data, effectively addressing the physical constraints of satellite sensors. While recent generative diffusion models have demonstrated impressive performance gains in this domain, their prohibitive computational demands and training costs hinder practicality in resource-constrained remote sensing satellite systems. In this work, we propose NODiff, a novel diffusion framework that replaces the conventional attention-based denoising backbone with a neural operator, seamlessly integrating operator learning and generative modeling into an efficient yet effective solution for pansharpening. In practice, we implement our approach through a two-stage learning paradigm: First, we pretrain the proposed Neural Operator-based diffusion model to learn the high-resolution texture priors essential for pansharpening. Afterward, we freeze the pretrained parameters, and design a lightweight conditional detail guidance adapter to enable efficient fine-tuning for generating desired HRMS images. Meanwhile, a time-aware low-rank adaptation is introduced to dynamically refine high-frequency details potentially affected by spectral mode truncation. Extensive experiments on multiple benchmark datasets demonstrate that NODiff achieves competitive pansharpening performance while significantly reducing training and inference costs. Beyond pansharpening, our method provides new insights into building resource-efficient generative models.


AAAI Conference 2025 Conference Paper

CDTR: Semantic Alignment for Video Moment Retrieval Using Concept Decomposition Transformer

Ran Ran
Jiwei Wei
Xiangyi Cai
Xiang Guan
Jie Zou
Yang Yang
Heng Tao Shen

Video Moment Retrieval (VMR) involves locating specific moments within a video based on natural language queries. However, existing VMR methods that employ various strategies for cross-modal alignment still face challenges such as limited understanding of fine-grained semantics, semantic overlap, and sparse constraints. To address these limitations, we propose a novel Concept Decomposition Transformer (CDTR) model for VMR. CDTR introduces a semantic concept decomposition module that disentangles video moments and sentence queries into concept representations, reflecting the relevance between various concepts and capturing fine-grained semantics, which are crucial for cross-modal matching. These decomposed concept representations are then used as pseudo-labels, determined as positive or negative samples by adaptive concept-specific thresholds. Subsequently, fine-grained concept alignment is performed in both the video intra-modal and the textual-visual cross-modal settings, aligning different conceptual components within features, enhancing the model's ability to distinguish fine-grained semantics, and alleviating issues related to semantic overlap and sparse constraints. Comprehensive experiments demonstrate the effectiveness of CDTR, which outperforms state-of-the-art methods on three widely used datasets: QVHighlights, Charades-STA, and TACoS.


JBHI Journal 2024 Journal Article

An Implicit-Explicit Prototypical Alignment Framework for Semi-Supervised Medical Image Segmentation

Chunna Tian
Zhenxi Zhang
Xinbo Gao
Heng Zhou
Ran Ran
Zhicheng Jiao

Semi-supervised learning methods have been explored to mitigate the scarcity of pixel-level annotation in medical image segmentation tasks. Consistency learning, a mainstream method in semi-supervised training, suffers from low efficiency and poor stability due to inaccurate supervision and insufficient feature representation. Prototypical learning is one plausible way to handle this problem, owing to the feature aggregation inherent in prototype calculation. However, previous works have not fully studied how to enhance the supervision quality and feature representation using prototypical learning under the semi-supervised condition. To address this issue, we propose an implicit-explicit alignment (IEPAlign) framework to foster semi-supervised consistency training. Specifically, we develop an implicit prototype alignment method based on dynamic multiple prototypes computed on the fly. We then design a multiple prediction voting strategy for reliable unlabeled mask generation and prototype calculation to improve the supervision quality. Afterward, to boost the intra-class consistency and inter-class separability of pixel-wise features in semi-supervised segmentation, we construct a region-aware hierarchical prototype alignment, which transmits information from labeled to unlabeled data and from certain regions to uncertain regions. We evaluate IEPAlign on three medical image segmentation tasks. The extensive experimental results demonstrate that the proposed method outperforms other popular semi-supervised segmentation methods and achieves performance comparable to fully-supervised training methods.


IJCAI Conference 2023 Conference Paper

Bidirectional Dilation Transformer for Multispectral and Hyperspectral Image Fusion

Shangqi Deng
Liang-Jian Deng
Xiao Wu
Ran Ran
Rui Wen

Transformer-based methods have proven to be effective in achieving long-distance modeling, capturing the spatial and spectral information, and exhibiting strong inductive bias in various computer vision tasks. Generally, the Transformer model includes two common modes of multi-head self-attention (MSA): spatial MSA (Spa-MSA) and spectral MSA (Spe-MSA). However, Spa-MSA is computationally efficient but limits the global spatial response within a local window. On the other hand, Spe-MSA can calculate channel self-attention to accommodate high-resolution images, but it disregards the crucial local information that is essential for low-level vision tasks. In this study, we propose a bidirectional dilation Transformer (BDT) for multispectral and hyperspectral image fusion (MHIF), which aims to leverage the advantages of both MSA and the latent multiscale information specific to MHIF tasks. The BDT consists of two designed modules: the dilation Spa-MSA (D-Spa), which dynamically expands the spatial receptive field through a given hollow strategy, and the grouped Spe-MSA (G-Spe), which extracts latent features within the feature map and learns local data behavior. Additionally, to fully exploit the multiscale information from both inputs with different spatial resolutions, we employ a bidirectional hierarchy strategy in the BDT, resulting in improved performance. Finally, extensive experiments on two commonly used datasets, CAVE and Harvard, demonstrate the superiority of BDT both visually and quantitatively. Furthermore, the related code will be available at the GitHub page of the authors.


IJCAI Conference 2023 Conference Paper

LGPConv: Learnable Gaussian Perturbation Convolution for Lightweight Pansharpening

Chen-Yu Zhao
Tian-Jing Zhang
Ran Ran
Zhi-Xuan Chen
Liang-Jian Deng

Pansharpening is a crucial and challenging task that aims to obtain a high spatial resolution image by merging a multispectral (MS) image and a panchromatic (PAN) image. Current methods use CNNs with standard convolution, but we've observed strong correlation among channel dimensions in the kernel, leading to computational burden and redundancy. To address this, we propose Learnable Gaussian Perturbation Convolution (LGPConv), surpassing standard convolution. LGPConv leverages two properties of standard convolution kernels: 1) correlations within channels, learning a premier kernel as a base to reduce parameters and training difficulties caused by redundancy; 2) introducing Gaussian noise perturbations to simulate randomness and enhance nonlinear representation within channels. We incorporate LGPConv into a well-designed pansharpening network and demonstrate its superiority through extensive experiments, achieving state-of-the-art performance with minimal parameters (27K). Code is available on the GitHub page of the authors.


NeurIPS Conference 2023 Conference Paper

LinGCN: Structural Linearized Graph Convolutional Network for Homomorphically Encrypted Inference

Hongwu Peng
Ran Ran
Yukui Luo
Jiahui Zhao
Shaoyi Huang
Kiran Thorat
Tong Geng
Chenghong Wang

The growth of Graph Convolution Network (GCN) model sizes has revolutionized numerous applications, surpassing human performance in areas such as personal healthcare and financial systems. The deployment of GCNs in the cloud raises privacy concerns due to potential adversarial attacks on client data. To address security concerns, Privacy-Preserving Machine Learning (PPML) using Homomorphic Encryption (HE) secures sensitive client data. However, it introduces substantial computational overhead in practical applications. To tackle those challenges, we present LinGCN, a framework designed to reduce multiplication depth and optimize the performance of HE-based GCN inference. LinGCN is structured around three key elements: (1) A differentiable structural linearization algorithm, complemented by a parameterized discrete indicator function, co-trained with model weights to meet the optimization goal. This strategy promotes fine-grained node-level non-linear location selection, resulting in a model with minimized multiplication depth. (2) A compact node-wise polynomial replacement policy with a second-order trainable activation function, steered towards superior convergence by a two-level distillation approach from an all-ReLU based teacher model. (3) An enhanced HE solution that enables finer-grained operator fusion for node-wise activation functions, further reducing multiplication level consumption in HE-based inference. Our experiments on the NTU-XVIEW skeleton joint dataset reveal that LinGCN excels in latency, accuracy, and scalability for homomorphically encrypted inference, outperforming solutions such as CryptoGCN. Remarkably, LinGCN achieves a 14.2× latency speedup relative to CryptoGCN, while preserving an inference accuracy of ~75% and notably reducing multiplication depth. Additionally, LinGCN proves scalable for larger models, delivering a substantial 85.78% accuracy with 6371s latency, a 10.47% accuracy improvement over CryptoGCN.
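To make the "second-order trainable activation function" concrete: HE schemes evaluate only additions and multiplications on encrypted data, so ReLU has to be replaced by a polynomial. The sketch below shows a generic quadratic activation in plain Python. The class name `QuadraticActivation` and the coefficient values are hypothetical and chosen only for illustration; they are not taken from the paper, where the coefficients are trained.

```python
from dataclasses import dataclass


@dataclass
class QuadraticActivation:
    """f(x) = a*x^2 + b*x + c, using only additions and multiplications."""
    a: float
    b: float
    c: float

    def __call__(self, x):
        # Horner form: (a*x + b)*x + c. Only the second multiplication
        # is between two data-dependent values.
        return (self.a * x + self.b) * x + self.c


def relu(x):
    """Reference non-polynomial activation that HE cannot evaluate directly."""
    return max(0.0, x)


# Illustrative coefficients (an assumption, not from the paper):
# 0.25*x^2 + 0.5*x agrees with ReLU at x = -2, 0, and 2.
act = QuadraticActivation(a=0.25, b=0.5, c=0.0)
samples = [-2.0, -1.0, 0.0, 1.0, 2.0]
errors = [abs(act(x) - relu(x)) for x in samples]
```

Every such activation that is kept consumes multiplicative depth in the encrypted computation, which is why the abstract's structural linearization aims to choose, per node, where a non-linearity is actually needed.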


NeurIPS Conference 2023 Conference Paper

Penguin: Parallel-Packed Homomorphic Encryption for Fast Graph Convolutional Network Inference

Ran Ran
Nuo Xu
Tao Liu
Wei Wang
Gang Quan
Wujie Wen

The marriage of Graph Convolutional Network (GCN) and Homomorphic Encryption (HE) enables the inference of graph data on the cloud with significantly enhanced client data privacy. However, the tremendous computation and memory overhead associated with HE operations challenges the practicality of HE-based GCN inference. GCN inference involves a sequence of expensive matrix-matrix multiplications, and we observe that directly applying the state-of-the-art HE-based secure matrix-matrix multiplication solutions to accelerate HE-GCN inference is far less efficient as it does not exploit the unique aggregation mechanism of two-dimension graph node-features in GCN layer computation. As a result, in this paper, we propose a novel HE-based ciphertext packing technique, i.e., Penguin, that can take advantage of the unique computation pattern during the HE-GCN inference to significantly reduce the computation and memory overhead associated with HE operations. Specifically, Penguin employs (i) an effective two-dimension parallel packing technique for feature ciphertext with optimal graph node partitioning and graph feature interleaving, and (ii) an interleaved assembly technique that can effectively make use of the blank slots to merge ciphertexts after feature reduction and significantly reduce the costly rotation operation. We provide theoretical analysis and experimental validation to demonstrate the speedup achieved by Penguin in accelerating GCN inference using popular GCN models and datasets. Our results show that Penguin can achieve up to ~10× speedup and around 79% reduction in computational memory overhead, significantly outperforming state-of-the-art solutions. To the best of our knowledge, this is the first work that can ensure the protection of both graph structure and features when accelerating HE-GCN inference on encrypted data. Our code is publicly available at https://github.com/ranran0523/Penguin.
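For readers unfamiliar with ciphertext packing, the sketch below simulates the generic rotate-and-add idiom on a plain Python list standing in for the slots of one packed ciphertext. It is not Penguin's packing scheme; it only illustrates why slot layout matters: a reduction over a window of `width` slots costs log2(`width`) rotations, and packing several vectors side by side lets one set of rotations serve all of them. The function names `rotate`, `add_slots`, and `windowed_sum` are hypothetical.

```python
def rotate(slots, k):
    """Cyclic left rotation by k slots, mimicking an HE slot rotation."""
    k %= len(slots)
    return slots[k:] + slots[:k]


def add_slots(x, y):
    """Slot-wise addition, mimicking an HE ciphertext addition."""
    return [a + b for a, b in zip(x, y)]


def windowed_sum(slots, width):
    """Sum each aligned window of `width` slots using log2(width) rotations.

    After the call, slot i*width holds the sum of window i; the other
    slots hold partial sums that a real pipeline would mask out or reuse.
    `width` must be a power of two that divides len(slots).
    """
    assert width > 0 and width & (width - 1) == 0
    assert len(slots) % width == 0
    acc = list(slots)
    rotations = 0
    step = 1
    while step < width:
        acc = add_slots(acc, rotate(acc, step))
        rotations += 1
        step *= 2
    return acc, rotations


# Two length-4 feature vectors packed into one 8-slot "ciphertext".
packed = [1, 2, 3, 4, 5, 6, 7, 8]
acc, used = windowed_sum(packed, 4)
first_total, second_total = acc[0], acc[4]
```

Here both vectors are reduced with the same two rotations. The slots left holding partial sums after such a reduction are the kind of underused space that the abstract's interleaved assembly step is described as reclaiming, although the mechanism there is specific to the paper.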


ICML Conference 2023 Conference Paper

SpENCNN: Orchestrating Encoding and Sparsity for Fast Homomorphically Encrypted Neural Network Inference

Ran Ran
Xinwei Luo
Wei Wang
Tao Liu 0023
Gang Quan
Xiaolin Xu 0001
Caiwen Ding
Wujie Wen

Homomorphic Encryption (HE) is a promising technology to protect clients’ data privacy for Machine Learning as a Service (MLaaS) on public clouds. However, HE operations can be orders of magnitude slower than their counterparts for plaintexts and thus result in prohibitively high inference latency, seriously hindering the practicality of HE. In this paper, we propose an HE-based fast neural network (NN) inference framework, SpENCNN, built upon the co-design of HE operation-aware model sparsity and the single-instruction-multiple-data (SIMD)-friendly data packing, to improve NN inference latency. In particular, we first develop an encryption-aware HE-group convolution technique that can partition channels among different groups based on the data size and ciphertext size, and then encode them into the same ciphertext by novel group-interleaved encoding, so as to dramatically reduce the number of bottlenecked operations in HE convolution. We further tailor an HE-friendly sub-block weight pruning to reduce the costly HE-based convolution operation. Our experiments show that SpENCNN can achieve overall speedups of 8.37×, 12.11×, 19.26×, and 1.87× for LeNet, VGG-5, HEFNet, and ResNet-20 respectively, with negligible accuracy loss. Our code is publicly available at https://github.com/ranran0523/SPECNN.


NeurIPS Conference 2022 Conference Paper

CryptoGCN: Fast and Scalable Homomorphically Encrypted Graph Convolutional Network Inference

Ran Ran
Wei Wang
Quan Gang
Jieming Yin
Nuo Xu
Wujie Wen

Recently, cloud-based graph convolutional networks (GCNs) have demonstrated great success and potential in many privacy-sensitive applications such as personal healthcare and financial systems. Despite their high inference accuracy and performance on the cloud, maintaining data privacy in GCN inference, which is of paramount importance to these practical applications, remains largely unexplored. In this paper, we make an initial attempt towards this and develop CryptoGCN, a homomorphic encryption (HE) based GCN inference framework. A key to the success of our approach is to reduce the tremendous computational overhead for HE operations, which can be orders of magnitude higher than that of their counterparts in the plaintext space. To this end, we develop a solution that can effectively take advantage of the sparsity of matrix operations in GCN inference to significantly reduce the encrypted computational overhead. Specifically, we propose a novel Adjacency Matrix-Aware (AMA) data formatting method along with the AMA assisted patterned sparse matrix partitioning, to exploit the complex graph structure and perform efficient matrix-matrix multiplication in HE computation. In this way, the number of HE operations can be significantly reduced. We also develop a co-optimization framework that can explore the trade-offs among the accuracy, security level, and computational overhead by judicious pruning and polynomial approximation of activation modules in GCNs. Based on the NTU-XVIEW skeleton joint dataset, i.e., the largest dataset evaluated homomorphically so far, as far as we are aware, our experimental results demonstrate that CryptoGCN outperforms state-of-the-art solutions in terms of the latency and number of homomorphic operations, i.e., achieving as much as a 3.10× speedup on latency and reducing the total Homomorphic Operation Count (HOC) by 77.4% with a small accuracy loss of 1-1.5%. Our code is publicly available at https://github.com/ranran0523/CryptoGCN.
