Arrow Research

Author name cluster

Xin Shen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

8 papers

Possible papers (8)

AAAI 2026 · Conference Paper

Zero-shot Implicit Neural Manifold Representation (INMR) for Ultra-high Temporal Resolution Dynamic MRI

  • Jie Feng
  • Rui Luo
  • Tian Zeng
  • Xin Shen
  • Haikun Qi
  • Yuyao Zhang
  • Dong Liang
  • Hongjiang Wei

Capturing accurate dynamic information of moving organs is essential for functional assessment using non-invasive imaging modalities. Achieving high-temporal-resolution visualization of physiological processes remains a critical challenge in dynamic magnetic resonance imaging (MRI) when reconstructing from extremely limited acquisitions. We introduce an unsupervised zero-shot reconstruction framework combining Implicit Neural Representation (INR) with manifold learning, capable of reconstructing dynamic MRI data at unprecedented temporal resolutions (less than 10 ms per frame for 2D imaging, less than 400 ms per frame for 3D imaging). The framework employs learnable low-dimensional manifold vectors to autonomously capture motion in real time directly from undersampled data and dynamically condition coordinate-based spatial representations to generate high-fidelity image sequences. Through a novel spatiotemporal coarse-to-fine (C2F) optimization strategy, our method outperforms current state-of-the-art (SOTA) techniques across multiple imaging scenarios, including cardiac, speech, and dynamic contrast-enhanced (DCE) abdominal MRI, demonstrating robust performance under challenging motion patterns and contrast dynamics. The learned manifolds additionally provide intuitive visualization of motion and contrast evolution during imaging. These advances indicate strong clinical potential for applications requiring extreme temporal resolution while maintaining both anatomical and temporal fidelity.
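
As a rough illustration of the mechanism this abstract describes, the sketch below conditions a coordinate-based network on a learnable per-frame manifold vector. It is a minimal PyTorch sketch with illustrative names and sizes, not the authors' implementation, and it omits the undersampled k-space data-consistency loss and the C2F schedule.

```python
# Minimal sketch of an INR conditioned on learnable per-frame manifold
# vectors, illustrating the conditioning pattern from the abstract.
# All names and sizes are illustrative assumptions, not the authors' code.
import torch
import torch.nn as nn

class ManifoldConditionedINR(nn.Module):
    def __init__(self, n_frames, latent_dim=8, hidden=256):
        super().__init__()
        # One learnable low-dimensional manifold vector per time frame.
        self.latents = nn.Embedding(n_frames, latent_dim)
        self.net = nn.Sequential(
            nn.Linear(2 + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),  # real/imaginary image intensity
        )

    def forward(self, coords, frame_idx):
        # coords: (N, 2) spatial positions in [-1, 1]; frame_idx: (N,)
        z = self.latents(frame_idx)                   # (N, latent_dim)
        return self.net(torch.cat([coords, z], dim=-1))

model = ManifoldConditionedINR(n_frames=500)
coords = torch.rand(1024, 2) * 2 - 1
frames = torch.randint(0, 500, (1024,))
out = model(coords, frames)                           # (1024, 2)
```

Because the manifold vectors are optimized jointly with the network, they end up tracing motion and contrast changes over time, which is what the abstract's visualization claim refers to.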

IJCAI 2025 · Conference Paper

Multimodal Retina Image Analysis Survey: Datasets, Tasks and Methods

  • Hongwei Sheng
  • Heming Du
  • Xin Shen
  • Sen Wang
  • Xin Yu

Retina images provide a non-invasive view of the central nervous system and microvasculature, making them essential for clinical applications. Changes in the retina often indicate both ophthalmic and systemic diseases, aiding diagnosis and early intervention. While deep learning algorithms have advanced retina image analysis, a comprehensive review of the related datasets, tasks, and benchmarks is still lacking. In this survey, we systematically categorize existing retina image datasets by their available data modalities and review the tasks these datasets support in multimodal retina image analysis. We also explain the key evaluation metrics used in various retina image analysis benchmarks. By thoroughly examining current datasets and methods, we highlight the challenges and limitations of existing benchmarks and discuss potential research topics in the field. We hope this work will guide future retina analysis methods and promote the shared use of existing data across different tasks.
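
As an example of the kind of evaluation metric such benchmarks report, the snippet below computes the Dice similarity coefficient, a standard measure in retinal vessel and lesion segmentation. It is a generic illustration, not a definition taken from the survey.

```python
# Dice similarity coefficient, commonly reported in retina image
# segmentation benchmarks: Dice = 2|A ∩ B| / (|A| + |B|).
import numpy as np

def dice_coefficient(pred, target, eps=1e-8):
    """Dice score between two binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

pred = np.array([[1, 1, 0], [0, 1, 0]])
gt   = np.array([[1, 0, 0], [0, 1, 1]])
print(dice_coefficient(pred, gt))  # 2*2 / (3+3) = 0.666...
```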

NeurIPS 2024 · Conference Paper

MM-WLAuslan: Multi-View Multi-Modal Word-Level Australian Sign Language Recognition Dataset

  • Xin Shen
  • Heming Du
  • Hongwei Sheng
  • Shuyun Wang
  • Hui Chen
  • Huiqiang Chen
  • Zhuojie Wu
  • Xiaobiao Du

Isolated Sign Language Recognition (ISLR) focuses on identifying individual sign language glosses. Considering the diversity of sign languages across geographical regions, developing region-specific ISLR datasets is crucial for supporting communication and research. Auslan, as a sign language specific to Australia, still lacks a dedicated large-scale word-level dataset for the ISLR task. To fill this gap, we curate the first large-scale Multi-view Multi-modal Word-Level Australian Sign Language recognition dataset, dubbed MM-WLAuslan. Compared to other publicly available datasets, MM-WLAuslan exhibits three significant advantages: (1) the largest amount of data, (2) the most extensive vocabulary, and (3) the most diverse multi-modal camera views. Specifically, we record 282K+ sign videos covering 3,215 commonly used Auslan glosses presented by 73 signers in a studio environment. Moreover, our filming system includes two different types of cameras, i.e., three Kinect-V2 cameras and a RealSense camera. We position the cameras hemispherically around the front half of the signer and simultaneously record videos using all four cameras. Furthermore, we benchmark state-of-the-art methods under various multi-modal ISLR settings on MM-WLAuslan, including multi-view, cross-camera, and cross-view. Experimental results indicate that MM-WLAuslan is a challenging ISLR dataset, and we hope it will contribute to the development of Auslan and the advancement of sign languages worldwide. All datasets and benchmarks are available at MM-WLAuslan.
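
For illustration only, the sketch below shows one plausible way a multi-view, multi-modal sample and a cross-camera evaluation split could be organized; the field and camera names are assumptions, not the dataset's actual schema.

```python
# Hypothetical sketch of a multi-view ISLR sample and the cross-camera
# protocol mentioned in the abstract (train on one camera type, test on
# another). Field names are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class ISLRSample:
    gloss: str                                   # word-level Auslan label
    signer_id: int
    views: dict = field(default_factory=dict)    # camera name -> video path

samples = [
    ISLRSample("hello", 1, {"kinect_front": "a.mp4", "realsense": "b.mp4"}),
    ISLRSample("thanks", 2, {"kinect_front": "c.mp4", "realsense": "d.mp4"}),
]

# Cross-camera split: fit on Kinect views, evaluate on RealSense views.
train = [(s.gloss, s.views["kinect_front"]) for s in samples]
test  = [(s.gloss, s.views["realsense"]) for s in samples]
print(train, test)
```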

NeurIPS 2023 · Conference Paper

Auslan-Daily: Australian Sign Language Translation for Daily Communication and News

  • Xin Shen
  • Shaozu Yuan
  • Hongwei Sheng
  • Heming Du
  • Xin Yu

Sign language translation (SLT) aims to convert a continuous sign language video clip into a spoken-language sentence. Considering that different geographic regions generally have their own native sign languages, it is valuable to establish corresponding SLT datasets to support related communication and research. Auslan, as a sign language specific to Australia, still lacks a dedicated large-scale dataset for SLT. To fill this gap, we curate an Australian Sign Language translation dataset, dubbed Auslan-Daily, collected from an Auslan educational TV series and Auslan TV programs. The former involves daily communication among multiple signers in the wild, while the latter comprises sign language videos for up-to-date news, weather forecasts, and documentaries. In particular, Auslan-Daily has two main features: (1) the topics are diverse and signed by multiple signers, and (2) the scenes are more complex, e.g., captured in various environments, with gesture interference during multi-signer interactions and varied camera positions. With a collection of more than 45 hours of high-quality Auslan video material, we invite Auslan experts to align fine-grained visual and language pairs, including video ↔ fingerspelling, video ↔ gloss, and video ↔ sentence. As a result, Auslan-Daily contains multi-grained annotations that can be used for various fundamental sign language tasks, such as signer detection, sign spotting, fingerspelling detection, isolated sign language recognition, sign language translation, and alignment. Moreover, we benchmark state-of-the-art models on each task in Auslan-Daily. Experiments indicate that Auslan-Daily is a highly challenging SLT dataset, and we hope it will contribute to the development of Auslan and, more broadly, to the advancement of sign languages worldwide. All datasets and benchmarks are available at Auslan-Daily.
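
For illustration, the snippet below sketches what the multi-grained alignments described above could look like as a nested record; the field names and values are hypothetical, not the dataset's published schema.

```python
# Hypothetical record showing how sentence-, gloss- and fingerspelling-
# level annotations with time spans could coexist for one video segment,
# supporting spotting, recognition, translation, and alignment tasks.
annotation = {
    "video": "episode_012.mp4",
    "sentence": {"start": 12.4, "end": 17.9,
                 "text": "The weather will be sunny tomorrow."},
    "glosses": [{"start": 12.4, "end": 13.1, "gloss": "WEATHER"},
                {"start": 13.2, "end": 14.0, "gloss": "SUNNY"}],
    "fingerspelling": [{"start": 15.0, "end": 15.8, "letters": "NSW"}],
}
# Sign spotting uses the gloss spans, SLT uses the sentence text, and
# alignment links the two granularities to the video timeline.
print(annotation["sentence"]["text"])
```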

AAAI 2023 · Conference Paper

MNER-QG: An End-to-End MRC Framework for Multimodal Named Entity Recognition with Query Grounding

  • Meihuizi Jia
  • Lei Shen
  • Xin Shen
  • Lejian Liao
  • Meng Chen
  • Xiaodong He
  • Zhendong Chen
  • Jiaqi Li

Multimodal named entity recognition (MNER) is a critical step in information extraction that aims to detect entity spans and classify them into the corresponding entity types given a sentence-image pair. Existing methods either (1) obtain named entities with coarse-grained visual clues from attention mechanisms, or (2) first detect fine-grained visual regions with external toolkits and then recognize named entities. However, they suffer from improper alignment between entity types and visual regions, or from error propagation in the two-stage pipeline, which ultimately introduces irrelevant visual information into the text. In this paper, we propose a novel end-to-end framework named MNER-QG that simultaneously performs MRC-based multimodal named entity recognition and query grounding. Specifically, with the assistance of queries, MNER-QG provides prior knowledge of entity types and visual regions and further enhances the representations of both text and image. To conduct the query grounding task, we provide manual annotations and weak supervision obtained by training a highly flexible visual grounding model with transfer learning. We conduct extensive experiments on two public MNER datasets, Twitter2015 and Twitter2017. Experimental results show that MNER-QG outperforms the current state-of-the-art models on the MNER task and also improves query grounding performance.
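
The MRC formulation at the core of MNER-QG turns each entity type into a natural-language query and extracts answer spans from the sentence (while a grounding head localizes the queried type in the image). The sketch below illustrates that query construction; the query wordings are illustrative, not the paper's exact prompts.

```python
# Sketch of MRC-style NER input construction: one (query, context) pair
# per entity type; a span-extraction head then predicts start/end
# positions, and a grounding head predicts the relevant image region.
# Query texts here are illustrative assumptions.
QUERIES = {
    "PER": "Find person names in the text.",
    "LOC": "Find locations in the text.",
    "ORG": "Find organizations in the text.",
    "MISC": "Find other named entities in the text.",
}

def build_mrc_inputs(sentence: str):
    return [(query, sentence) for query in QUERIES.values()]

for query, ctx in build_mrc_inputs("Messi joined Inter Miami."):
    print(f"[QUERY] {query} [CONTEXT] {ctx}")
```

Encoding the entity type as a query is what lets the model inject type-level prior knowledge into both the text and image representations, rather than relying on a separate region-detection stage.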

TMLR 2022 · Journal Article

A Unified Domain Adaptation Framework with Distinctive Divergence Analysis

  • Zhiri Yuan
  • Xixu Hu
  • Qi Wu
  • Shumin Ma
  • Cheuk Hang Leung
  • Xin Shen
  • Yiyan Huang

Unsupervised domain adaptation enables knowledge transfer from a labeled source domain to an unlabeled target domain by aligning the learnt features of both domains. The idea is theoretically supported by the generalization bound analysis of Ben-David et al. (2007), which specifies the applicable task (binary classification) and designates a specific distribution divergence measure. Although most distribution-aligning domain adaptation models seek theoretical grounding in this particular bound analysis, they do not actually satisfy its stringent conditions. In this paper, we bridge this long-standing theoretical gap in the literature by providing a unified generalization bound. Our analysis accommodates both classification and regression tasks and most commonly used divergence measures, and, more importantly, it can theoretically recover a large number of previous models. In addition, we identify the key differences in the distribution divergence measures underlying the diverse models and conduct a comprehensive, in-depth comparison of the commonly used divergence measures. Based on the unified generalization bound, we propose new domain adaptation models that achieve transferability through domain-invariant representations and conduct experiments on real-world datasets that corroborate our theoretical findings. We believe these insights will help guide the future design of distribution-aligning domain adaptation algorithms.
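
For reference, a commonly cited form of the bound family this abstract builds on (stated for binary classification; the paper's unified bound generalizes it to regression and other divergence measures) is:

```latex
\[
  \epsilon_T(h) \;\le\; \epsilon_S(h)
  \;+\; \tfrac{1}{2}\, d_{\mathcal{H}\Delta\mathcal{H}}\!\left(\mathcal{D}_S, \mathcal{D}_T\right)
  \;+\; \lambda
\]
% \epsilon_S(h), \epsilon_T(h): source and target risks of hypothesis h;
% d_{H \Delta H}: a divergence between the source and target feature
% distributions; \lambda: the risk of the ideal joint hypothesis on both
% domains.
```

Distribution-aligning methods target the divergence term by learning domain-invariant features; the paper's point is that swapping in other divergence measures or tasks steps outside the conditions of this particular analysis, which the unified bound repairs.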

IJCAI 2021 · Conference Paper

Two-stage Training for Learning from Label Proportions

  • Jiabin Liu
  • Bo Wang
  • Xin Shen
  • Zhiquan Qi
  • Yingjie Tian

Learning from label proportions (LLP) aims to learn an instance-level classifier from grouped training data annotated only with label proportions. Existing deep-learning-based LLP methods use end-to-end pipelines that compute a proportional loss as the Kullback-Leibler divergence between the bag-level prior and posterior class distributions. However, unconstrained optimization of this objective can hardly reach a solution consistent with the given proportions. Moreover, for a probabilistic classifier, this strategy unavoidably yields high-entropy conditional class distributions at the instance level. These issues further degrade the performance of instance-level classification. In this paper, we regard these problems as noisy pseudo-labeling and instead impose strict proportion consistency on the classifier via a constrained optimization that serves as a continuous training stage for existing LLP classifiers. In addition, we introduce the mixup strategy and symmetric cross-entropy to further reduce the label noise. Our framework is model-agnostic and demonstrates compelling performance improvements in extensive experiments when incorporated into other deep LLP models as a post-hoc phase.
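
The bag-level proportional loss the abstract describes can be sketched directly: the KL divergence between the given proportions (prior) and the average of instance-level predictions in a bag (posterior). A minimal PyTorch sketch, with illustrative shapes:

```python
# Bag-level proportional loss used by end-to-end LLP pipelines:
# KL(prior proportions || mean of instance-level softmax predictions).
import torch
import torch.nn.functional as F

def proportion_loss(logits, bag_proportions, eps=1e-8):
    # logits: (bag_size, n_classes) instance predictions for one bag
    # bag_proportions: (n_classes,) given class proportions for the bag
    posterior = F.softmax(logits, dim=-1).mean(dim=0)   # (n_classes,)
    return torch.sum(
        bag_proportions * torch.log((bag_proportions + eps) / (posterior + eps))
    )

logits = torch.randn(16, 3, requires_grad=True)  # one bag of 16 instances
props = torch.tensor([0.5, 0.25, 0.25])
loss = proportion_loss(logits, props)
loss.backward()
```

The paper's observation is that minimizing this bag-level objective alone neither guarantees the predictions actually match the proportions nor prevents high-entropy instance-level distributions, which motivates its constrained second training stage.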