Arrow Research search

Author name cluster

Jiajia Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

12 papers
1 author row

Possible papers


AAAI Conference 2026 Conference Paper

Palimpsest: Reconciling the CISS Trilemma for Incremental Nuclei Segmentation

  • Jiajia Li
  • Huisi Wu

Adapting computational pathology models to evolving clinical diagnostics via Class-Incremental Semantic Segmentation (CISS) is critical. However, this task faces a unique CISS Trilemma: the inability to simultaneously preserve the intricate tissue background (stability), distinguish morphologically similar new nuclei (plasticity), and maintain a constant model size (scalability), all under a strict exemplar-free constraint. To resolve this, we introduce Palimpsest, a novel framework that systematically decouples these conflicting demands. Palimpsest integrates three synergistic mechanisms: a Parameter-Conserving Synthesis (PCS) module that merges lightweight adapters to ensure scalability; a Similarity-Aware Centroid Recalibration (SCR) module that executes differentiated recalibration to counteract non-uniform foreground drift, securing plasticity; and an Adaptive Residual Shading (ARS) module that performs logit-space decoupling to preserve background integrity, ensuring stability. Extensive experiments on two histopathology datasets demonstrate that Palimpsest significantly outperforms state-of-the-art methods, achieving a superior stability-plasticity balance, particularly in challenging long-term incremental scenarios.
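The abstract's PCS idea of merging lightweight adapters so model size stays constant across increments can be illustrated with a minimal, hypothetical sketch. The abstract does not specify the merge rule; simple parameter averaging is assumed here purely for illustration, and all names (`merge_adapters`, `down_proj`, `up_proj`) are made up:

```python
# Hypothetical sketch: merging two lightweight adapter weight sets by
# parameter averaging, so only one adapter's worth of parameters is kept.
# This is NOT the paper's PCS procedure, just an illustrative assumption.
def merge_adapters(adapter_a, adapter_b, alpha=0.5):
    """Blend two adapters' parameters; alpha weights the older adapter."""
    merged = {}
    for name in adapter_a:
        merged[name] = [alpha * a + (1.0 - alpha) * b
                        for a, b in zip(adapter_a[name], adapter_b[name])]
    return merged

old = {"down_proj": [0.25, 1.0], "up_proj": [1.0, -1.0]}
new = {"down_proj": [0.75, 0.0], "up_proj": [0.0, 1.0]}
print(merge_adapters(old, new))
```

With equal weighting, each merged parameter is the midpoint of the two adapters' values, so the merged model is no larger than a single adapter.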

JBHI Journal 2025 Journal Article

MHFNet: A Multimodal Hybrid-Embedding Fusion Network for Automatic Sleep Staging

  • Ruhan Liu
  • Jiajia Li
  • Yang Wen
  • Xian Huang
  • Bin Sheng
  • David Dagan Feng
  • Ping Zhang

Scoring sleep stages is essential for evaluating sleep continuity and understanding sleep structure. Despite previous attempts, automating sleep scoring remains challenging. First, most existing works do not fuse local and global temporal information. Second, the correlation between special waves across different signals is rarely used in sleep staging models. Third, the logic of scoring rules based on adjacent epochs is not considered when developing sleep staging models. This paper introduces a multimodal hybrid-embedding fusion network (MHFNet) that tackles these challenges in automated sleep stage scoring. MHFNet comprises multi-stream Xception blocks to extract wave characteristics, a hybrid time-embedding module to combine local and global temporal information, a dual-path gate transformer to fuse and enhance attention features, and a refined output header to reconstruct sleep scoring. We perform experiments on three publicly available datasets (SleepEDF-ST, SleepEDF-SC, and SHHS). Experimental results indicate the superiority of MHFNet over baseline approaches in cross-validation. Moreover, at the individual level, MHFNet yields an average R² score improvement of 9% on the testing dataset compared to state-of-the-art models, paving the way for applications in real-world sleep medicine.

AAAI Conference 2025 Conference Paper

SongSong: A Time Phonograph for Chinese SongCi Music from Thousand of Years Away

  • Jiliang Hu
  • Jiajia Li
  • Ziyi Pan
  • Chong Chen
  • Zuchao Li
  • Ping Wang
  • Lefei Zhang

Recently, there have been significant advancements in music generation. However, existing models primarily focus on creating modern pop songs, making it challenging to produce ancient music with distinct rhythms and styles, such as ancient Chinese SongCi. In this paper, we introduce SongSong, to our knowledge the first music generation model capable of restoring Chinese SongCi. Our model first predicts the melody from the input SongCi, then separately generates the singing voice and accompaniment based on that melody, and finally combines all elements into the final piece of music. Additionally, to address the lack of ancient music datasets, we create OpenSongSong, a comprehensive dataset of ancient Chinese SongCi music featuring 29.9 hours of compositions by various renowned SongCi music masters. To assess SongSong's proficiency in performing SongCi, we randomly select 85 SongCi sentences that were not part of the training set and evaluate SongSong against music generation platforms such as Suno and SkyMusic. The subjective and objective results indicate that our proposed model achieves leading performance in generating high-quality SongCi music.

AAAI Conference 2024 Conference Paper

N-gram Unsupervised Compoundation and Feature Injection for Better Symbolic Music Understanding

  • Jinhao Tian
  • Zuchao Li
  • Jiajia Li
  • Ping Wang

The first step to apply deep learning techniques for symbolic music understanding is to transform musical pieces (mainly in MIDI format) into sequences of predefined tokens like note pitch, note velocity, and chords. Subsequently, the sequences are fed into a neural sequence model to accomplish specific tasks. Music sequences exhibit strong correlations between adjacent elements, making them prime candidates for N-gram techniques from Natural Language Processing (NLP). Consider classical piano music: specific melodies might recur throughout a piece, with subtle variations each time. In this paper, we propose a novel method, NG-Midiformer, for understanding symbolic music sequences that leverages the N-gram approach. Our method involves first processing music pieces into word-like sequences with our proposed unsupervised compoundation, followed by using our N-gram Transformer encoder, which can effectively incorporate N-gram information to enhance the primary encoder for better understanding of music sequences. The pre-training process on large-scale music datasets enables the model to thoroughly learn the N-gram information contained within music sequences and subsequently apply it when making inferences during the fine-tuning stage. Experiments on various datasets demonstrate the effectiveness of our method, which achieves state-of-the-art performance on a series of music understanding downstream tasks. The code and model weights will be released at https://github.com/CinqueOrigin/NG-Midiformer.
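As a rough illustration of turning a token stream into "word-like" compound sequences, one BPE-style merge step over note tokens might look like the sketch below. This is not the paper's actual unsupervised compoundation (the abstract does not give its details); the pair-merging rule and token names are assumptions for illustration only:

```python
from collections import Counter

# Illustrative sketch: merge the single most frequent adjacent token pair
# in a symbolic music token stream into one compound token.
def merge_top_bigram(tokens):
    pairs = Counter(zip(tokens, tokens[1:]))
    if not pairs:
        return tokens
    (a, b), _ = pairs.most_common(1)[0]   # most frequent adjacent pair
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == (a, b):
            out.append(a + "_" + b)       # compound "word-like" token
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

seq = ["C4", "E4", "C4", "E4", "G4", "C4", "E4"]
print(merge_top_bigram(seq))  # the recurring C4-E4 motif becomes one token
```

Repeating such merges compresses recurring motifs into single units, which is the intuition behind feeding N-gram-aware sequences to a Transformer encoder.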

TIST Journal 2018 Journal Article

Quick Bootstrapping of a Personalized Gaze Model from Real-Use Interactions

  • Michael Xuelin Huang
  • Jiajia Li
  • Grace Ngai
  • Hong Va Leong

Understanding human visual attention is essential for understanding human cognition, which in turn benefits human–computer interaction. Recent work has demonstrated a Personalized, Auto-Calibrating Eye-tracking (PACE) system, which makes it possible to achieve accurate gaze estimation using only an off-the-shelf webcam by identifying and collecting data implicitly from user interaction events. However, this method is constrained by the need for large amounts of well-annotated data. We thus present fast-PACE, an adaptation of PACE that exploits knowledge from existing data from different users to accelerate the learning speed of the personalized model. The result is an adaptive, data-driven approach that continuously "learns" its user and recalibrates, adapts, and improves with additional usage. Experimental evaluations of fast-PACE demonstrate its competitive accuracy in iris localization, the validity of its alignment identification between gaze and interactions, and the effectiveness of its gaze transfer. In general, fast-PACE achieves an initial visual error of 3.98 degrees and then steadily improves to 2.52 degrees given incremental interaction-informed data. Its performance is comparable to the state of the art, but without the need for explicit training or calibration. Our technique addresses the data quality and quantity problems and therefore has the potential to enable comprehensive gaze-aware applications in the wild.