Arrow Research search

Author name cluster

Shanshan Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

13 papers
2 author rows

Possible papers

13

AAAI Conference 2026 Conference Paper

DeFT-LoRA: Decoupled and Fused Tuning with LoRA Experts for Universal Cross-Domain Retrieval

  • Ke Xu
  • Xiaozheng Shen
  • Shanshan Wang
  • Mengzhu Wang
  • Xun Yang

Universal Cross-Domain Retrieval (UCDR) aims to retrieve images across unseen domains and categories, a critical capability for real-world applications. While large-scale Vision-Language Models (VLMs) like CLIP offer strong zero-shot category generalization, they struggle with domain shifts. Existing methods often improve domain robustness at the cost of high computational overhead or by compromising the VLM's inherent knowledge. To address this, we propose Decoupled and Fused Tuning with LoRA (DeFT-LoRA), a novel and parameter-efficient framework that integrates Low-Rank Adaptation (LoRA) with a Mixture-of-Experts (MoE) mechanism. This approach resolves the intrinsic conflict between domain-invariant and domain-specific knowledge in a single adapter, enabling our model to construct a dedicated domain adapter for each input image. We propose a three-stage training strategy that first learns a shared Base LoRA for domain-invariant features, then derives Domain-Specific Experts to capture specific styles, and finally fuses them dynamically with a lightweight gating network. Extensive experiments on three UCDR benchmarks demonstrate that DeFT-LoRA achieves comparable or superior performance to state-of-the-art methods while requiring only 1.46 percent of CLIP's image-encoder parameters and reducing computational overhead, thereby establishing an exceptional balance between accuracy and efficiency.
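The fusion step the abstract describes (a shared Base LoRA plus gated Domain-Specific Experts) can be sketched in a few lines. This is a minimal NumPy illustration under assumed shapes and names (`deft_lora_forward`, `W_gate`, etc. are ours, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n_experts = 8, 2, 3  # hidden dim, LoRA rank, number of domain experts

# Frozen pretrained weight plus a shared (domain-invariant) Base LoRA pair.
W = rng.normal(size=(d, d))
A_base, B_base = rng.normal(size=(d, r)), rng.normal(size=(r, d))

# One low-rank Domain-Specific Expert per domain style.
experts = [(rng.normal(size=(d, r)), rng.normal(size=(r, d)))
           for _ in range(n_experts)]

# Lightweight gating network: one linear map followed by softmax.
W_gate = rng.normal(size=(d, n_experts))

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def deft_lora_forward(x):
    """Fuse the shared Base LoRA with gated domain experts for one input."""
    gate = softmax(x @ W_gate)           # per-input expert weights, sum to 1
    delta = A_base @ B_base              # domain-invariant low-rank update
    for g, (A_e, B_e) in zip(gate, experts):
        delta = delta + g * (A_e @ B_e)  # dynamically fused specific updates
    return x @ (W + delta)

y = deft_lora_forward(rng.normal(size=(d,)))
```

Because the gate is computed per input image, each image effectively gets its own fused adapter while the frozen backbone `W` is untouched.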

AAAI Conference 2026 Conference Paper

Learning to Cluster Rare Cell Types: Implicit Semantic Data Augmentation for Spatial Multi-modal Omics Analysis

  • Daixian Liu
  • Hau-Sing So
  • Haoran Chen
  • Jiao Li
  • Shanshan Wang
  • Mengzhu Wang
  • Jingcai Guo

Spatial multi-modal omics technologies have transformed biological research by enabling the simultaneous profiling of gene expression, protein abundance, and chromatin accessibility within their native spatial contexts. Despite these advances, accurately clustering rare cell types remains a major challenge due to data sparsity, high dimensionality, and limited annotated samples. While Graph Neural Networks (GNNs) have shown potential in modeling spatial omics data, their effectiveness is often constrained by the use of fixed K-nearest neighbor (KNN) graph structures, which fail to capture latent semantic relationships masked by sequencing noise. To overcome these limitations, we propose CRCT (Clustering Rare Cell Types): a novel framework that combines Implicit Semantic Data Augmentation (ISDA) with adaptive graph learning for spatial multi-modal omics analysis. Unlike traditional augmentation strategies that generate explicit synthetic samples, CRCT operates in the deep feature space by dynamically estimating intra-class covariance matrices and implicitly perturbing features along semantically meaningful directions. This enables effective augmentation for rare cell populations while preserving biological fidelity. Extensive experiments across four real-world datasets (HLN, MB, Stereo‑CITE‑seq, and SPOTS) and one synthetic benchmark demonstrate the state-of-the-art performance of CRCT, achieving improvements of up to +1.7 NMI and +7.8 ARI over strong baseline methods.
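The core ISDA idea above, perturbing features along intra-class covariance directions instead of generating explicit synthetic samples, can be sketched as follows. A toy NumPy illustration with assumed names and dimensions, not the CRCT implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy deep features for one rare cell type: 30 cells, 5-dim embeddings.
feats = rng.normal(size=(30, 5)) + np.array([2.0, 0.0, -1.0, 0.5, 0.0])

# Dynamically estimated intra-class covariance of this cell type.
cov = np.cov(feats, rowvar=False)

def isda_perturb(x, cov, strength=0.5, n_aug=4, rng=rng):
    """Implicitly augment a feature by sampling perturbations from
    N(0, strength * cov), i.e. along intra-class semantic directions,
    rather than creating explicit synthetic cells."""
    noise = rng.multivariate_normal(np.zeros(x.shape[0]), strength * cov,
                                    size=n_aug)
    return x + noise

augmented = isda_perturb(feats[0], cov)
```

Because the covariance is estimated per class, rare cell populations are augmented along directions that respect their own observed variation.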

TIST Journal 2026 Journal Article

Personalized Forgetting Mechanism with Concept-Driven Knowledge Tracing

  • Shanshan Wang
  • Ying Hu
  • Qianru Li
  • Xun Yang
  • Zhongzhou Zhang
  • Keyang Wang
  • Xingyi Zhang

Knowledge Tracing (KT) aims to trace changes in students’ knowledge states throughout their entire learning process by analyzing their historical learning data and predicting their future learning performance. Existing forgetting-curve-based knowledge tracing models only consider the general forgetting caused by time intervals, ignoring influences specific to the KT task. First, the discriminative information in the forgetting curve is personalized, owing to differences among students. Second, the relationships between knowledge concepts contribute generalized features to the forgetting process. Considering these two aspects, we propose a Concept-driven Personalized Forgetting knowledge tracing model (CPF) which integrates the relationships between knowledge concepts with the personalization of students’ cognitive abilities. First, personalized cognitive abilities are integrated into the learning and forgetting processes: individual cognitive differences are modeled to dynamically adjust learning gains and forgetting rates based on students’ knowledge mastery and learning strategies, enabling a more personalized learning experience. Second, the hierarchical relationships among knowledge concepts are captured by a precursor-successor knowledge concept matrix, so the potential impact of forgetting prior knowledge concepts on subsequent ones is also integrated into the KT task. Furthermore, the proposed personalized forgetting mechanism applies not only to the learning of specific knowledge concepts but also to the forgetting-review mechanism of life-long learning. Extensive experimental results on several public datasets show that CPF outperforms current forgetting-curve-based methods in predicting student performance, demonstrating that CPF can better simulate changes in students’ knowledge states through the personalized forgetting mechanism.
Our code is publicly available at https://github.com/lqr-1169/CPF.
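The two ingredients the abstract names, a personalized forgetting rate and precursor-successor coupling, can be illustrated with a toy forgetting-curve model. The exact functional forms below are assumptions for illustration, not CPF's equations:

```python
import math

def retention(dt_hours, stability, ability):
    """Ebbinghaus-style forgetting curve with a per-student modulation:
    students with higher cognitive ability forget more slowly
    (both parameters are illustrative stand-ins)."""
    return math.exp(-dt_hours / (stability * ability))

def successor_mastery(mastery, precursor_retention, link=0.3):
    """Precursor-successor coupling: forgetting a prerequisite concept
    partially degrades mastery of its successor; `link` plays the role
    of one entry of a precursor-successor concept matrix."""
    return mastery * (1.0 - link * (1.0 - precursor_retention))
```

With this shape, a student with `ability=2.0` retains more after 24 hours than one with `ability=1.0`, and forgetting a prerequisite lowers the effective mastery of concepts that build on it.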

ICRA Conference 2025 Conference Paper

Automated Video Object Detection of Motile Cells Under Microscopy

  • Haocong Song
  • Wenyuan Chen
  • Guanqiao Shan
  • Chen Sun 0015
  • Bingqing Wan
  • Changsheng Dai
  • Hang Liu 0004
  • Shanshan Wang

Video object detection (VOD) of motile cells (e.g., bacteria and sperm) under microscopy is challenging due to motion blur, sporadic defocus, and pose variations. Compared with VOD in generic scenes, the lower contrast and smaller color space of microscopy imaging further introduce feature overlap between the foreground objects and the background objects (e.g., impurity cells and contaminants). Transformer-based methods have achieved great success in the VOD of generic scenes by utilizing object queries to model inner-frame and inter-frame objects. However, the appearance overlap problem in microscopy video frames significantly compromises inter-frame query aggregation by introducing background features into the object query. To tackle this challenge, this paper reports a static-dynamic query-based VOD network that treats object queries of the current video frame and reference video frames differently. Specifically, a two-stage framework is implemented that first generates high-quality object queries of reference frames with a static Transformer decoder pre-trained on a still-image dataset. The network is then trained on a per-frame annotated dataset using a dynamic Transformer decoder to model the object queries of the current frame. A Reference Query Relation Module is further proposed to enhance the reference queries for more effective aggregation with the current query. Experiments on clinically collected biopsied sperm datasets validated the effectiveness of the proposed method.

IJCAI Conference 2025 Conference Paper

ESBN: Estimation Shift of Batch Normalization for Source-free Universal Domain Adaptation

  • Jiao Li
  • Houcheng Su
  • Bingli Wang
  • Yuandong Min
  • Mengzhu Wang
  • Nan Yin
  • Shanshan Wang
  • Jingcai Guo

Domain adaptation (DA) is crucial for transferring models trained in one domain to perform well in a different, often unseen domain. Traditional methods, including unsupervised domain adaptation (UDA) and source-free domain adaptation (SFDA), have made significant progress. However, most existing DA methods rely heavily on Batch Normalization (BN) layers, which are not optimal in source-free settings, where the source domain is unavailable for comparison. In this study, we propose a novel method, ESBN, which addresses the challenge of domain shift by adjusting the placement of normalization layers and replacing BN with Batch-free Normalization (BFN). Unlike BN, BFN is less dependent on batch statistics and provides more robust feature representations through instance-specific statistics. We systematically investigate the effects of different BN layer placements across various network configurations and demonstrate that selective replacement with BFN improves generalization performance. Extensive experiments on multiple domain adaptation benchmarks show that our approach outperforms state-of-the-art methods, particularly in challenging scenarios such as Open-Partial Domain Adaptation (OPDA).
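The abstract's key substitution, replacing batch statistics with instance-specific statistics, can be shown concretely. A minimal NumPy sketch of a batch-free normalization layer (per-sample, per-channel statistics); the name and details are our assumptions, not ESBN's exact BFN definition:

```python
import numpy as np

def batch_free_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each sample with its own spatial statistics (per sample,
    per channel), so no batch-level or source-domain statistics are
    needed at adaptation time."""
    mean = x.mean(axis=(2, 3), keepdims=True)   # x: (N, C, H, W)
    var = x.var(axis=(2, 3), keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(2)
x = rng.normal(loc=3.0, scale=2.0, size=(4, 3, 8, 8))
y = batch_free_norm(x)
```

Unlike BatchNorm, the output for one sample is independent of which other samples share its batch, which is exactly the property a source-free setting needs.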

JBHI Journal 2025 Journal Article

Medical Vision-Language Modeling With Semantic Interaction and Adaptive Refinement Prompting for Bias Mitigation

  • Cheng Li
  • Weijian Huang
  • Hao Yang
  • Jiarun Liu
  • Yong Liang
  • Shanshan Wang

Vision-Language Models (VLMs) have demonstrated impressive capabilities across various medical tasks, including report generation and visual question answering (VQA). However, pixel-level tasks such as image segmentation remain relatively underexplored, despite their critical importance for clinical decision-making, surgical planning, and model interpretability. Moreover, the scarcity of high-quality segmentation annotations in the medical domain often leads to biased data distributions, characterized by imbalances in disease types, anatomical coverage, and image quality. These biases are frequently overlooked during both model development and evaluation, limiting the robustness and real-world applicability of VLMs in healthcare scenarios. In this study, we propose a unified medical vision-language model applicable to a variety of clinical tasks, including report generation, VQA, and pixel-level image segmentation. Within the model, we propose a semantic interaction mechanism aimed at enhancing pixel-level vision and language representation learning. To mitigate the impact of biased data distributions, we explicitly develop an adaptive refinement prompting method involving the iterative re-prompting of hard samples. The proposed method is thoroughly validated through experiments on eight datasets and comparisons with nine state-of-the-art methods. The experimental results indicate that our model achieves superior performance in both medical VQA and segmentation tasks. These results highlight the potential of our approach in advancing the deployment of medical VLMs in real-world clinical applications. Code will be released at: https://github.com/SZUHvern/Unified-Medical-Vision-Language-Modeling

AAAI Conference 2024 Conference Paper

Boosting Neural Cognitive Diagnosis with Student’s Affective State Modeling

  • Shanshan Wang
  • Zhen Zeng
  • Xun Yang
  • Ke Xu
  • Xingyi Zhang

Cognitive Diagnosis Modeling aims to infer students' proficiency level on knowledge concepts from their response logs. Existing methods typically model students’ response processes as the interaction between students and exercises or concepts based on hand-crafted or deeply-learned interaction functions. Despite their promising achievements, they fail to consider the relationship between students' cognitive states and affective states in learning, e.g., the feelings of frustration, boredom, or confusion with the learning content, which is insufficient for comprehensive cognitive diagnosis in intelligent education. To fill the research gap, we propose a novel Affect-aware Cognitive Diagnosis (ACD) model which can effectively diagnose the knowledge proficiency levels of students by taking into consideration the affective factors. Specifically, we first design a student affect perception module under the assumption that the affective state is jointly influenced by the student's affect trait and the difficulty of the exercise. Then, our inferred affective distribution is further used to estimate the student's subjective factors, i.e., guessing and slipping, respectively. Finally, we integrate the estimated guessing and slipping parameters with the basic neural cognitive diagnosis framework based on the DINA model, which facilitates the modeling of complex exercising interactions in a more accurate and interpretable fashion. Besides, we also extend our affect perception module in an unsupervised learning setting based on contrastive learning, thus significantly improving the compatibility of our ACD. To the best of our knowledge, we are the first to unify the cognition modeling and affect modeling into the same framework for student cognitive diagnosis. Extensive experiments on real-world datasets clearly demonstrate the effectiveness of our ACD. Our code is available at https://github.com/zeng-zhen/ACD.
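The DINA-style integration of guessing and slipping mentioned above has a compact closed form. A one-function sketch treating mastery as a soft proficiency in [0, 1] (an assumption for illustration; ACD estimates `guess` and `slip` from the inferred affective distribution):

```python
def correct_probability(mastery, guess, slip):
    """DINA-style response model: a mastering student answers correctly
    unless they slip; a non-mastering student may still guess right."""
    return mastery * (1.0 - slip) + (1.0 - mastery) * guess
```

The probability interpolates between the guessing rate (no mastery) and the non-slip rate (full mastery), which keeps the diagnosis interpretable.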

AAAI Conference 2024 Conference Paper

PTMQ: Post-training Multi-Bit Quantization of Neural Networks

  • Ke Xu
  • Zhongcheng Li
  • Shanshan Wang
  • Xingyi Zhang

The ability of model quantization with arbitrary bit-width to dynamically meet diverse bit-width requirements at runtime has attracted significant attention. Recent research has focused on optimizing large-scale training methods to achieve robust bit-width adaptation, a time-consuming process requiring hundreds of GPU hours. Furthermore, converting bit-widths requires recalculating the statistical parameters of the norm layers, impeding real-time switching of the bit-width. To overcome these challenges, we propose an efficient Post-Training Multi-bit Quantization (PTMQ) scheme that requires only a small amount of calibration data to perform block-wise reconstruction of multi-bit quantization errors. It eliminates the influence of statistical parameters by fusing norm layers, and supports real-time bit-width switching in both uniform and mixed-precision quantization. To improve quantization accuracy and robustness, we propose a Multi-bit Feature Mixer (MFM) technique that fuses features of different bit-widths. Moreover, we introduce a Group-wise Distillation Loss (GD-Loss) to strengthen the correlation between different bit-width groups and further improve the overall performance of PTMQ. Extensive experiments demonstrate that PTMQ achieves performance comparable to existing state-of-the-art post-training quantization methods, while its optimization is 100× faster than recent multi-bit quantization works. Code is available at https://github.com/xuke225/PTMQ.
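The norm-layer fusion that enables statistics-free bit-width switching is standard BN folding. A NumPy sketch under assumed shapes (one linear layer, per-output-channel BN; the quantizer is a generic uniform symmetric one, not PTMQ's reconstruction procedure):

```python
import numpy as np

rng = np.random.default_rng(4)

def fold_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold a BatchNorm layer into the preceding linear weights, so that
    switching bit-widths needs no recomputation of BN statistics."""
    scale = gamma / np.sqrt(var + eps)
    return w * scale[:, None], (b - mean) * scale + beta

def quantize(x, bits):
    """Uniform symmetric quantization to a given bit-width."""
    qmax = 2.0 ** (bits - 1) - 1
    step = np.abs(x).max() / qmax
    return np.round(x / step) * step

w, b = rng.normal(size=(3, 4)), rng.normal(size=3)
gamma, beta = rng.normal(size=3), rng.normal(size=3)
mean, var = rng.normal(size=3), rng.uniform(0.5, 2.0, size=3)

x = rng.normal(size=4)
bn_out = gamma * ((w @ x + b) - mean) / np.sqrt(var + 1e-5) + beta
wf, bf = fold_bn(w, b, gamma, beta, mean, var)
folded_out = wf @ x + bf   # identical to the BN path, statistics fused away

w8 = quantize(wf, 8)       # folded weights can be re-quantized per bit-width
```

Once folded, the same weights can be quantized to any target bit-width on the fly, which is what makes real-time switching possible.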

AAAI Conference 2023 Conference Paper

Self-Supervised Graph Learning for Long-Tailed Cognitive Diagnosis

  • Shanshan Wang
  • Zhen Zeng
  • Xun Yang
  • Xingyi Zhang

Cognitive diagnosis is a fundamental yet critical research task in the field of intelligent education, which aims to discover the proficiency level of different students on specific knowledge concepts. Despite the effectiveness of existing efforts, previous methods have considered mastery over the student population as a whole, so they suffer from the long-tail effect: a large number of students with sparse interaction records are usually wrongly diagnosed during inference. To relieve this situation, we propose a Self-supervised Cognitive Diagnosis (SCD) framework which leverages self-supervision to assist graph-based cognitive diagnosis, improving performance for students with sparse data. Specifically, we design a graph confusion method that drops edges under special rules to generate different sparse views of the graph. By maximizing the cross-view consistency of node representations, our model pays more attention to long-tailed students. Additionally, we propose an importance-based view generation rule to increase the influence of long-tailed students. Extensive experiments on real-world datasets show the effectiveness of our approach, especially for students with much sparser interaction records. Our code is available at https://github.com/zeng-zhen/SCD.
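The view-generation idea, dropping edges while protecting long-tailed students, can be sketched with a toy rule. The keep-probability formula below is an illustrative assumption, not SCD's published rule:

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy student-exercise interaction edges (students 0-2, exercises 10-12).
edges = [(0, 10), (0, 11), (0, 12), (1, 10), (2, 11)]
degree = {}
for u, v in edges:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1

def drop_edges(edges, degree, base_keep=0.5, rng=rng):
    """Importance-based edge dropping: edges touching low-degree
    (long-tailed) nodes are kept with higher probability, so sparse
    students stay represented in both contrastive views."""
    kept = []
    for u, v in edges:
        keep_p = min(1.0, base_keep + 1.0 / min(degree[u], degree[v]))
        if rng.random() < keep_p:
            kept.append((u, v))
    return kept

view_a, view_b = drop_edges(edges, degree), drop_edges(edges, degree)
```

Students 1 and 2 each have a single interaction, so their only edges get keep probability 1 and survive in both views, while well-connected student 0 is the one whose edges get sparsified.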

JBHI Journal 2023 Journal Article

Variable Augmented Network for Invertible Modality Synthesis and Fusion

  • Yuhao Wang
  • Ruirui Liu
  • Zihao Li
  • Shanshan Wang
  • Cailian Yang
  • Qiegen Liu

As an effective way to integrate the information contained in multiple medical images under different modalities, medical image synthesis and fusion have emerged in various clinical applications such as disease diagnosis and treatment planning. In this paper, an invertible and variable augmented network (iVAN) is proposed for medical image synthesis and fusion. In iVAN, variable augmentation technology keeps the channel number of the network input and output the same and enhances data relevance, which is conducive to generating characterization information. Meanwhile, the invertible network is used to achieve bidirectional inference. Empowered by the invertible and variable augmentation schemes, iVAN can be applied not only to mappings of multi-input to one-output and multi-input to multi-output, but also to the case of one-input to multi-output. Experimental results demonstrate the superior performance and potential task flexibility of the proposed method compared with existing synthesis and fusion methods.

IJCAI Conference 2020 Conference Paper

Self-adaptive Re-weighted Adversarial Domain Adaptation

  • Shanshan Wang
  • Lei Zhang

Existing adversarial domain adaptation methods mainly consider the marginal distribution, which may lead to either under-transfer or negative transfer. To address this problem, we present a self-adaptive re-weighted adversarial domain adaptation approach, which enhances domain alignment from the perspective of the conditional distribution. To promote positive transfer and combat negative transfer, we reduce the weight of the adversarial loss for aligned features while increasing the adversarial force for poorly aligned ones, as measured by the conditional entropy. Additionally, a triplet loss leveraging source samples and pseudo-labeled target samples is employed on the confused domain. This metric loss keeps intra-class sample pairs closer than inter-class pairs to achieve class-level alignment. In this way, highly accurate pseudo-labeled target samples and semantic alignment can be captured simultaneously in the co-training process. Our method achieves a low joint error of the ideal source and target hypothesis, so the expected target error can be upper bounded following Ben-David's theorem. Empirical evidence demonstrates that the proposed model outperforms the state of the art on standard domain adaptation datasets.
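Entropy-based re-weighting of the adversarial loss can be sketched directly. Normalizing the entropy to [0, 1] is one simple choice made here for illustration; the paper's exact weighting scheme may differ:

```python
import numpy as np

def conditional_entropy(p, eps=1e-12):
    """Shannon entropy of a softmax prediction; high entropy marks
    samples whose features are still poorly aligned across domains."""
    p = np.clip(p, eps, 1.0)
    return -(p * np.log(p)).sum(axis=-1)

def adversarial_weight(p):
    """Normalized entropy in [0, 1]: shrink the adversarial loss for
    confidently aligned samples, raise it for uncertain ones."""
    return conditional_entropy(p) / np.log(p.shape[-1])

uncertain = np.array([0.25, 0.25, 0.25, 0.25])  # poorly aligned sample
aligned = np.array([0.97, 0.01, 0.01, 0.01])    # well-aligned sample
```

A uniform prediction gets full adversarial force (weight 1), while a confident one is largely left alone, which is the self-adaptive behavior the method targets.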