Arrow Research search

Author name cluster

Yi Shi

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

11 papers
1 author row

Possible papers (11)

AAAI Conference 2026 · Conference Paper

Enhancing Meme Emotion Understanding with Multi-Level Modality Enhancement and Dual-Stage Modal Fusion

  • Yi Shi
  • Wenlong Meng
  • Zhenyuan Guo
  • Chengkun Wei
  • Wenzhi Chen

With the rapid rise of social media and Internet culture, memes have become a popular medium for expressing emotional tendencies. This has sparked growing interest in Meme Emotion Understanding (MEU), which aims to classify the emotional intent behind memes by leveraging their multimodal content. While existing efforts have achieved promising results, two major challenges remain: (1) a lack of fine-grained multimodal fusion strategies, and (2) insufficient mining of memes' implicit meanings and background knowledge. To address these challenges, we propose MemoDetector, a novel framework for advancing MEU. First, we introduce a four-step textual enhancement module that utilizes the rich knowledge and reasoning capabilities of Multimodal Large Language Models (MLLMs) to progressively infer and extract implicit and contextual insights from memes. These enhanced texts significantly enrich the original meme content and provide valuable guidance for downstream classification. Next, we design a dual-stage modal fusion strategy: the first stage performs shallow fusion on the raw meme image and text, while the second stage deeply integrates the enhanced visual and textual features. This hierarchical fusion enables the model to better capture nuanced cross-modal emotional cues. Experiments on two datasets, MET-MEME and MOOD, demonstrate that our method consistently outperforms state-of-the-art baselines. Specifically, MemoDetector improves F1 scores by 4.3% on MET-MEME and 3.4% on MOOD. Further ablation studies and in-depth analyses validate the effectiveness and robustness of our approach, highlighting its strong potential for advancing MEU.
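A minimal sketch of the dual-stage idea, assuming PyTorch and illustrative dimensions (the paper's exact modules, feature sizes, and classifier head are not specified here): stage one shallowly mixes the raw image and text features, and stage two cross-attends over the MLLM-enhanced features.

```python
# Hypothetical sketch of dual-stage modal fusion (dimensions and modules assumed).
import torch
import torch.nn as nn

class DualStageFusion(nn.Module):
    def __init__(self, dim: int = 256, num_classes: int = 7):
        super().__init__()
        # Stage 1: shallow fusion of raw image/text features via concat + projection.
        self.shallow = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())
        # Stage 2: deep fusion of MLLM-enhanced token features via cross-attention.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.classifier = nn.Linear(2 * dim, num_classes)

    def forward(self, img_raw, txt_raw, img_enh, txt_enh):
        # img_raw, txt_raw: (B, dim); img_enh, txt_enh: (B, T, dim) token sequences.
        shallow = self.shallow(torch.cat([img_raw, txt_raw], dim=-1))  # (B, dim)
        deep, _ = self.cross_attn(txt_enh, img_enh, img_enh)           # (B, T, dim)
        fused = torch.cat([shallow, deep.mean(dim=1)], dim=-1)         # (B, 2*dim)
        return self.classifier(fused)

logits = DualStageFusion()(torch.randn(2, 256), torch.randn(2, 256),
                           torch.randn(2, 8, 256), torch.randn(2, 8, 256))
print(logits.shape)  # torch.Size([2, 7])
```

The cross-attention direction (enhanced text attending to enhanced image tokens) is one plausible wiring; the published model may connect the stages differently.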

NeurIPS Conference 2025 · Conference Paper

Adaptive Context Length Optimization with Low-Frequency Truncation for Multi-Agent Reinforcement Learning

  • Wenchang Duan
  • Yaoliang Yu
  • Jiwan He
  • Yi Shi

Recently, deep multi-agent reinforcement learning (MARL) has demonstrated promising performance on challenging tasks, such as those with long-term dependencies and non-Markovian environments. Its success is partly attributed to conditioning policies on a large fixed context length. However, such a large fixed context length may lead to limited exploration efficiency and redundant information. In this paper, we propose a novel MARL framework to obtain adaptive and effective contextual information. Specifically, we design a central agent that dynamically optimizes the context length via temporal gradient analysis, enhancing exploration to facilitate convergence to global optima in MARL. Furthermore, to enhance the adaptive optimization capability of the context length, we present an efficient input representation for the central agent, which effectively filters redundant information. By leveraging a Fourier-based low-frequency truncation method, we extract global temporal trends across decentralized agents, providing an effective and efficient representation of the MARL environment. Extensive experiments demonstrate that the proposed method achieves state-of-the-art (SOTA) performance on long-term dependency tasks, including PettingZoo, MiniGrid, Google Research Football (GRF), and the StarCraft Multi-Agent Challenge v2 (SMACv2).
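The low-frequency truncation step can be illustrated in a few lines: transform each agent's observation history along the time axis, zero out all but the lowest frequency bins, and invert the transform to recover smoothed global trends. The tensor layout and the number of retained bins (`keep`) are assumptions for illustration, not the paper's settings.

```python
import torch

def low_freq_truncate(history: torch.Tensor, keep: int = 4) -> torch.Tensor:
    """Keep only the lowest `keep` frequency bins of each agent's history."""
    # history: (num_agents, T, feat_dim) observation sequences.
    spec = torch.fft.rfft(history, dim=1)   # real FFT along the time axis
    spec[:, keep:, :] = 0                   # truncate high-frequency content
    return torch.fft.irfft(spec, n=history.shape[1], dim=1)  # smoothed trends

trends = low_freq_truncate(torch.randn(3, 64, 16))   # 3 agents, 64 steps
central_input = trends.mean(dim=0)                   # pooled view, (64, 16)
```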

JBHI Journal 2025 · Journal Article

Agnostic-Specific Modality Learning for Cancer Survival Prediction From Multiple Data

  • Honglei Liu
  • Yi Shi
  • Ying Xu
  • Ao Li
  • Minghui Wang

Cancer is a pressing public health problem and one of the main causes of mortality worldwide. The development of advanced computational methods for predicting cancer survival is pivotal in aiding clinicians to formulate effective treatment strategies and improve patient quality of life. Recent advances in survival prediction methods show that integrating diverse information from various cancer-related data, such as pathological images and genomics, is crucial for improving prediction accuracy. Despite the promising results of existing approaches, the modality gap and semantic redundancy present in multimodal cancer data remain major challenges that can hinder comprehensive integration and pose substantial obstacles to further enhancing cancer survival prediction. In this study, we propose a novel agnostic-specific modality learning (ASML) framework for accurate cancer survival prediction. To bridge the modality gap and provide a comprehensive view of distinct data modalities, we employ an agnostic-specific learning strategy to learn the commonality across modalities and the uniqueness of each modality. Moreover, a cross-modal fusion network is employed to integrate multimodal information by modeling modality correlations and to diminish semantic redundancy in a divide-and-conquer manner. Extensive experimental results on three TCGA datasets demonstrate that ASML outperforms existing cancer survival prediction methods on multimodal data.
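A minimal sketch of the agnostic-specific split, assuming simple linear encoders: each modality gets a specific encoder for its unique signal plus a projection into a shared, modality-agnostic space. In the actual framework the commonality would additionally be enforced by training objectives (alignment losses) not shown here.

```python
import torch
import torch.nn as nn

class AgnosticSpecific(nn.Module):
    def __init__(self, dims: dict, hidden: int = 128):
        super().__init__()
        # Modality-specific encoders keep what is unique to each modality.
        self.specific = nn.ModuleDict({m: nn.Linear(d, hidden) for m, d in dims.items()})
        # Projections into a shared, modality-agnostic space (commonality would be
        # enforced by an alignment loss during training, omitted in this sketch).
        self.agnostic = nn.ModuleDict({m: nn.Linear(d, hidden) for m, d in dims.items()})
        self.risk = nn.Linear(2 * hidden * len(dims), 1)

    def forward(self, inputs: dict) -> torch.Tensor:
        feats = []
        for m, x in inputs.items():
            feats += [torch.relu(self.specific[m](x)), torch.relu(self.agnostic[m](x))]
        return self.risk(torch.cat(feats, dim=-1))  # one risk score per patient

model = AgnosticSpecific({"pathology": 512, "genomics": 200})
risk = model({"pathology": torch.randn(4, 512), "genomics": torch.randn(4, 200)})
```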

JBHI Journal 2025 · Journal Article

DRLSurv: Disentangled Representation Learning for Cancer Survival Prediction by Mining Multimodal Consistency and Complementarity

  • Ying Xu
  • Yi Shi
  • Honglei Liu
  • Ao Li
  • Anli Zhang
  • Minghui Wang

Accurate cancer survival prediction is crucial in devising optimal treatment plans and offering individualized care to improve clinical outcomes. Recent research confirms that integrating heterogeneous cancer data, such as histopathological images and genomic data, can enhance our understanding of cancer progression and provide a multimodal perspective on patient survival chances. However, existing methods often overlook the fundamental aspects of multimodal data, i.e., consistency and complementarity, which in consequence significantly hinders advancements in cancer survival prediction. To address this issue, we present DRLSurv, a novel multimodal deep learning method that leverages disentangled representation learning for precise cancer survival prediction. Through dedicated deep encoding networks, DRLSurv decomposes each modality into modality-invariant and modality-specific representations, which are mapped to common and unique feature subspaces for simultaneously mining the distinct aspects of multimodal cancer data. Moreover, our method innovatively introduces a subspace-based proximity contrastive loss and a re-disentanglement loss, ensuring the successful decomposition of consistent and complementary information while maintaining multimodal fidelity during the learning of disentangled representations. Both quantitative analyses and visual assessments on different datasets validate the superiority of DRLSurv over existing survival prediction approaches, demonstrating its powerful capability to exploit enriched survival-related information from multimodal cancer data. Therefore, DRLSurv not only offers a unified and comprehensive deep learning framework for advancing multimodal survival prediction, but also provides valuable insights for cancer prognosis and survival analysis.
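The consistency/complementarity intuition can be written as two simple penalties. The paper's subspace-based proximity contrastive loss and re-disentanglement loss are not reproduced here; the MSE and orthogonality terms below are common stand-ins for the same two goals.

```python
import torch
import torch.nn.functional as F

def disentangle_losses(inv_a, inv_b, spec_a, spec_b):
    # inv_*: modality-invariant codes; spec_*: modality-specific codes, all (B, d).
    # Consistency: invariant codes of the same patient should agree across modalities.
    consistency = F.mse_loss(inv_a, inv_b)
    # Complementarity: specific codes should live apart from the invariant subspace;
    # penalizing their inner products is a common orthogonality proxy.
    orthogonality = ((inv_a * spec_a).sum(dim=-1) ** 2).mean() \
                  + ((inv_b * spec_b).sum(dim=-1) ** 2).mean()
    return consistency + orthogonality

loss = disentangle_losses(*[torch.randn(4, 64) for _ in range(4)])
print(loss.item())
```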

JBHI Journal 2025 · Journal Article

MIF: Multi-Shot Interactive Fusion Model for Cancer Survival Prediction Using Pathological Image and Genomic Data

  • Yi Shi
  • Minghui Wang
  • Honglei Liu
  • Fang Zhao
  • Ao Li
  • Xun Chen

Accurate cancer survival prediction is crucial for oncologists to determine therapeutic plans, which directly influence treatment efficacy and patient survival outcomes. Recently, multimodal fusion-based prognostic methods have demonstrated effectiveness for survival prediction by fusing diverse cancer-related data from different medical modalities, e.g., pathological images and genomic data. However, these works still face significant challenges. First, most approaches attempt multimodal fusion with a simple one-shot fusion strategy, which is insufficient to explore the complex interactions underlying highly disparate multimodal data. Second, current methods for investigating multimodal interactions face a capability-efficiency dilemma, i.e., the difficult balance between powerful modeling capability and applicable computational efficiency, which impedes effective multimodal fusion. In this study, to address these challenges, we propose an innovative multi-shot interactive fusion method named MIF for precise survival prediction using pathological and genomic data. In particular, a novel multi-shot fusion framework is introduced to promote multimodal fusion by decomposing it into successive fusion stages, thus delicately integrating modalities in a progressive way. Moreover, to address the capability-efficiency dilemma, various affinity-based interactive modules are introduced to synergize with the multi-shot framework. Specifically, by harnessing comprehensive affinity information as guidance for mining interactions, the proposed interactive modules can efficiently generate low-dimensional discriminative multimodal representations. Extensive experiments on different cancer datasets show that our method not only achieves state-of-the-art performance through effective multimodal fusion, but also possesses high computational efficiency compared to existing survival prediction methods.
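Reading "multi-shot" as repeated fusion stages, a small sketch of one plausible affinity-guided shot: compute a cosine affinity between pathology and genomic tokens and use it to refine the pathology stream, then iterate. Token counts, dimensions, and the residual update rule are assumptions, not the published modules.

```python
import torch
import torch.nn.functional as F

def affinity_fusion_shot(path_feats, gene_feats):
    # path_feats: (B, Np, d) pathology tokens; gene_feats: (B, Ng, d) genomic tokens.
    affinity = torch.einsum('bpd,bgd->bpg',
                            F.normalize(path_feats, dim=-1),
                            F.normalize(gene_feats, dim=-1))  # cosine affinity
    weights = affinity.softmax(dim=-1)                        # per-token attention
    # Residual refinement of the pathology stream with affinity-weighted genomics.
    return path_feats + torch.einsum('bpg,bgd->bpd', weights, gene_feats)

path = torch.randn(2, 16, 64)
gene = torch.randn(2, 8, 64)
for _ in range(3):                 # three "shots" instead of one-shot fusion
    path = affinity_fusion_shot(path, gene)
```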

AAAI Conference 2025 · Conference Paper

SpeHeaTal: A Cluster-Enhanced Segmentation Method for Sperm Morphology Analysis

  • Yi Shi
  • Yun-Kai Wang
  • Xu-Peng Tian
  • Tie-Yi Zhang
  • Bing Yao
  • Hui Wang
  • Yong Shao
  • Cen-Cen Wang

The accurate assessment of sperm morphology is crucial in andrological diagnostics, where the segmentation of sperm images presents significant challenges. Existing approaches frequently rely on large annotated datasets and often struggle with the segmentation of overlapping sperm and the presence of dye impurities. To address these challenges, this paper first analyzes the issue of overlapping sperm tails from a geometric perspective and introduces a novel clustering algorithm, Con2Dis, which effectively segments overlapping tails by considering three essential factors: CONnectivity, CONformity, and DIStance. Building on this foundation, we propose an unsupervised method, SpeHeaTal, designed for the comprehensive segmentation of the SPErm HEAd and TAiL. SpeHeaTal employs the Segment Anything Model (SAM) to generate masks for sperm heads while filtering out dye impurities, utilizes Con2Dis to segment tails, and then applies a tailored mask splicing technique to produce complete sperm masks. Experimental results underscore the superior performance of SpeHeaTal, particularly in handling images with overlapping sperm.
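The three Con2Dis criteria suggest a pairwise linking rule over skeleton points: accept a link only when two points are close (DIStance) and their local tangent directions agree (CONformity), then read tail clusters off the resulting graph (CONnectivity). The thresholds and the union-find grouping below are a rough stand-in, not the published algorithm.

```python
import numpy as np

def con2dis_link(points, tangents, d_max=5.0, angle_max=np.pi / 6):
    # points: (N, 2) skeleton coordinates; tangents: (N, 2) unit tangent directions.
    n = len(points)
    parent = list(range(n))                       # union-find forest
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(points[i] - points[j]) > d_max:
                continue                          # fails the DIStance test
            cos = abs(float(tangents[i] @ tangents[j]))
            if np.arccos(np.clip(cos, 0.0, 1.0)) > angle_max:
                continue                          # fails the CONformity test
            parent[find(i)] = find(j)             # CONnect into one tail cluster
    return [find(i) for i in range(n)]

pts = np.array([[0, 0], [1, 0.1], [10, 10]], dtype=float)
tan = np.array([[1, 0], [1, 0], [0, 1]], dtype=float)
print(con2dis_link(pts, tan))  # the first two points share a cluster label
```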

JBHI Journal 2022 · Journal Article

Multi-Dimensional Feature Combination Method for Continuous Blood Pressure Measurement Based on Wrist PPG Sensor

  • Pan Yao
  • Ning Xue
  • Siyuan Yin
  • Changhua You
  • Yusen Guo
  • Yi Shi
  • Tiezhu Liu
  • Lei Yao

The cuff-less blood pressure (BP) monitoring method based on photoplethysmogram (PPG) makes long-term BP monitoring possible for preventing and treating cardiovascular and cerebrovascular events. In this paper, a portable BP prediction system based on feature combination and an artificial neural network (ANN) is implemented. The robustness of the model is improved in three ways. First, an adaptive peak extraction algorithm is used to improve the accuracy of peak and trough detection. Second, multi-dimensional features are extracted and fused, including three groups of PPG-based features and one group of demographics-based features. Finally, a two-layer feedforward artificial neural network is used for regression. Thirty-three subjects distributed across three BP groups were recruited. The proposed method passed the European Society of Hypertension International Protocol revision 2010 (ESH-IP2). Experimental results show that the proposed method exhibits good accuracy for a diverse population, with an estimation error of −0.07 ± 4.47 mmHg for SBP and 0.00 ± 3.61 mmHg for DBP. Moreover, the model tracked the BP of two subjects for half a month, laying the groundwork for daily BP monitoring. This work will contribute to long-term wellness management and rehabilitation, enabling timely detection and improvement of users' physical health.
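The regression stage, a two-layer feedforward network over combined PPG and demographic features, can be sketched with scikit-learn. The feature matrix here is synthetic and the hidden-layer sizes are assumptions; the paper's actual features come from its adaptive peak extraction and feature fusion steps.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical per-beat feature matrix: PPG-derived timings/amplitudes plus
# demographics (age, height, ...), combined as in the feature-fusion step.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))                        # 200 beats, 12 features
y_sbp = 120 + 5 * X[:, 0] + rng.normal(size=200)      # synthetic SBP targets

model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(32, 16),         # two hidden layers
                 max_iter=2000, random_state=0),
)
model.fit(X, y_sbp)
print(model.predict(X[:3]))                           # predicted SBP, mmHg
```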