Arrow Research search

Author name cluster

Xinyang Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

14 papers
2 author rows

Possible papers

14

ICLR 2025 Conference Paper

Advancing Graph Generation through Beta Diffusion

  • Xinyang Liu
  • Yilin He
  • Bo Chen
  • Mingyuan Zhou

Diffusion models have excelled in generating natural images and are now being adapted to a variety of data types, including graphs. However, conventional models often rely on Gaussian or categorical diffusion processes, which can struggle to accommodate the mixed discrete and continuous components characteristic of graph data. Graphs typically feature discrete structures and continuous node attributes that often exhibit rich statistical patterns, including sparsity, bounded ranges, skewed distributions, and long-tailed behavior. To address these challenges, we introduce Graph Beta Diffusion (GBD), a generative model specifically designed to handle the diverse nature of graph data. GBD leverages a beta diffusion process, effectively modeling both continuous and discrete elements. Additionally, we propose a modulation technique that enhances the realism of generated graphs by stabilizing critical graph topology while maintaining flexibility for other components. GBD competes strongly with existing models across multiple general and biochemical graph benchmarks, showcasing its ability to capture the intricate balance between discrete and continuous features inherent in real-world graph data. Our PyTorch code is available at https://github.com/xinyangATK/GraphBetaDiffusion.

ICML 2025 Conference Paper

Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation

  • Tiansheng Wen
  • Yifei Wang 0001
  • Zequn Zeng
  • Zhong Peng
  • Yudi Su
  • Xinyang Liu
  • Bo Chen 0001
  • Hongwei Liu 0001

Many large-scale systems rely on high-quality deep representations (embeddings) to facilitate tasks like retrieval, search, and generative modeling. Matryoshka Representation Learning (MRL) recently emerged as a solution for adaptive embedding lengths, but it requires full model retraining and suffers from noticeable performance degradation at short lengths. In this paper, we show that sparse coding offers a compelling alternative for achieving adaptive representation with minimal overhead and higher fidelity. We propose Contrastive Sparse Representation (CSR), a method that sparsifies pre-trained embeddings into a high-dimensional but selectively activated feature space. By leveraging lightweight autoencoding and task-aware contrastive objectives, CSR preserves semantic quality while allowing flexible, cost-effective inference at different sparsity levels. Extensive experiments on image, text, and multimodal benchmarks demonstrate that CSR consistently outperforms MRL in terms of both accuracy and retrieval speed—often by large margins—while also cutting training time to a fraction of that required by MRL. Our results establish sparse coding as a powerful paradigm for adaptive representation learning in real-world applications where efficiency and fidelity are both paramount. Code is available at this URL.

NeurIPS 2025 Conference Paper

Physics-informed Neural Operator for Pansharpening

  • Xinyang Liu
  • Junming Hou
  • Chenxu Wu
  • Xiaofeng Cong
  • Zihao Chen
  • Shangqi Deng
  • Junling Li
  • Liang-Jian Deng

Over the past decades, pansharpening has contributed greatly to numerous remote sensing applications, with methods evolving from theoretically grounded models to deep learning approaches and their hybrids. Though promising, existing methods rarely address pansharpening through the lens of the underlying physical imaging processes. In this work, we revisit the spectral imaging mechanism and propose a novel physics-informed neural operator framework for pansharpening, termed PINO, which faithfully models the end-to-end electro-optical sensor process. Specifically, PINO operates in three stages: (1) a spatial-spectral encoder pair is introduced to aggregate multi-granularity high-resolution panchromatic (PAN) and low-resolution multispectral (LRMS) features; (2) an iterative neural integral process utilizes these fused spatial-spectral characteristics to learn a continuous radiance field $L_i(x, y, \lambda)$ over spatial coordinates and wavelength, effectively emulating band-wise spectral integration; (3) the learned radiance field is modulated by the sensor's spectral responsivity $R_b(\lambda)$ to produce physically consistent spatial-spectral fusion products. This physics-grounded fusion paradigm offers a principled solution for reconstructing high-resolution multispectral and hyperspectral images in accordance with sensor imaging physics, effectively harnessing the unique advantages of spectral data to better uncover real-world characteristics. Experiments on multiple benchmark datasets show that our method surpasses state-of-the-art fusion algorithms, achieving reduced spectral aberrations and finer spatial textures. Furthermore, extension to hyperspectral (HS) data demonstrates its generalizability and universality. The code will be available upon acceptance.

AAAI 2025 Conference Paper

SemiDFL: A Semi-Supervised Paradigm for Decentralized Federated Learning

  • Xinyang Liu
  • Pengchao Han
  • Xuan Li
  • Bo Liu

Decentralized federated learning (DFL) realizes cooperative model training among connected clients without relying on a central server, thereby mitigating communication bottlenecks and eliminating the single-point-of-failure issue present in centralized federated learning (CFL). Most existing work on DFL focuses on supervised learning, assuming each client possesses sufficient labeled data for local training. However, in real-world applications, much of the data is unlabeled. We address this by considering a challenging yet practical semi-supervised learning (SSL) scenario in DFL, where clients may have varying data sources: some with few labeled samples, some with purely unlabeled data, and others with both. In this work, we propose SemiDFL, the first semi-supervised DFL method that enhances DFL performance in SSL scenarios by establishing a consensus in both data and model spaces. Specifically, we utilize neighborhood information to improve the quality of pseudo-labeling, which is crucial for effectively leveraging unlabeled data. We then design a consensus-based diffusion model to generate synthesized data, which is used in combination with pseudo-labeled data to create mixed datasets. Additionally, we develop an adaptive aggregation method that leverages the model accuracy on synthesized data to further enhance SemiDFL performance. Through extensive experimentation, we demonstrate the remarkable performance superiority of the proposed SemiDFL method over existing CFL and DFL schemes in both IID and non-IID SSL scenarios.

NeurIPS 2024 Conference Paper

ETO: Efficient Transformer-based Local Feature Matching by Organizing Multiple Homography Hypotheses

  • Junjie Ni
  • Guofeng Zhang
  • Guanglin Li
  • Yijin Li
  • Xinyang Liu
  • Zhaoyang Huang
  • Hujun Bao

We tackle the efficiency problem of learning local feature matching. Recent advancements have given rise to purely CNN-based and transformer-based approaches, each augmented with deep learning techniques. While CNN-based methods often excel in matching speed, transformer-based methods tend to provide more accurate matches. We propose an efficient transformer-based network architecture for local feature matching. This technique is built on constructing multiple homography hypotheses to approximate the continuous correspondence in the real world and uni-directional cross-attention to accelerate the refinement. On the YFCC100M dataset, our matching accuracy is competitive with LoFTR, a state-of-the-art transformer-based architecture, while our inference is roughly four times faster, even outperforming the CNN-based methods. Comprehensive evaluations on other open datasets such as Megadepth, ScanNet, and HPatches demonstrate our method's efficacy, highlighting its potential to significantly enhance a wide array of downstream applications.

UAI 2024 Conference Paper

Patch-Prompt Aligned Bayesian Prompt Tuning for Vision-Language Models

  • Xinyang Liu
  • Dongsheng Wang 0003
  • Bowei Fang
  • Miaoge Li
  • Yishi Xu
  • Zhibin Duan
  • Bo Chen 0001
  • Mingyuan Zhou

For downstream applications of vision-language pre-trained models, there has been significant interest in constructing effective prompts. Existing works on prompt engineering, which either require laborious manual designs or optimize prompt tuning as a point estimation problem, may fail to describe the diverse characteristics of categories and limit their applications. We introduce a Bayesian probabilistic resolution to prompt tuning, where the label-specific stochastic prompts are generated hierarchically by first sampling a latent vector from an underlying distribution and then employing a lightweight generative model. Importantly, we semantically regularize the tuning process by minimizing the statistical distance between the visual patches and linguistic prompts, which pushes the stochastic label representations to faithfully capture diverse visual concepts, instead of overfitting the training categories. We evaluate the effectiveness of our approach on four tasks: few-shot image recognition, base-to-new generalization, dataset transfer learning, and domain shifts. Extensive results on over 15 datasets show promising transferability and generalization performance of our proposed model, both quantitatively and qualitatively.

ICML 2023 Conference Paper

Bayesian Progressive Deep Topic Model with Knowledge Informed Textual Data Coarsening Process

  • Zhibin Duan
  • Xinyang Liu
  • Yudi Su
  • Yishi Xu
  • Bo Chen 0001
  • Mingyuan Zhou

Deep topic models have shown an impressive ability to extract multi-layer document latent representations and discover hierarchical, semantically meaningful topics. However, most deep topic models are limited to a single-step generative process, despite the fact that progressive generative processes have achieved impressive performance in modeling image data. To this end, in this paper, we propose a novel progressive deep topic model that consists of a knowledge-informed textual data coarsening process and a corresponding progressive generative model. The former is used to build multi-level observations ranging from concrete to abstract, while the latter is used to generate more concrete observations gradually. Additionally, we incorporate a graph-enhanced decoder to capture the semantic relationships among words at different levels of observation. Furthermore, we perform a theoretical analysis of the proposed model based on the principles of information theory and show how it can alleviate the well-known "latent variable collapse" problem. Finally, extensive experiments demonstrate that our proposed model effectively improves the ability of deep topic models, resulting in higher-quality latent document representations and topics.

NeurIPS 2023 Conference Paper

Context-guided Embedding Adaptation for Effective Topic Modeling in Low-Resource Regimes

  • Yishi Xu
  • Jianqiao Sun
  • Yudi Su
  • Xinyang Liu
  • Zhibin Duan
  • Bo Chen
  • Mingyuan Zhou

Embedding-based neural topic models have turned out to be a superior option for low-resource topic modeling. However, current approaches treat static word embeddings learnt from source tasks as general knowledge that can be transferred directly to the target task, discounting the dynamically changing nature of word meanings in different contexts, thus typically leading to sub-optimal results when adapting to new tasks with unfamiliar contexts. To address this issue, we provide an effective method that centers on adaptively generating semantically tailored word embeddings for each task by fully exploiting contextual information. Specifically, we first condense the contextual syntactic dependencies of words into a semantic graph for each task, which is then modeled by a Variational Graph Auto-Encoder to produce task-specific word representations. On this basis, we further impose a learnable Gaussian mixture prior on the latent space of words to efficiently learn topic representations from a clustering perspective, which contributes to diverse topic discovery and fast adaptation to novel tasks. We have conducted a wealth of quantitative and qualitative experiments, and the results show that our approach comprehensively outperforms established topic models.

NeurIPS 2023 Conference Paper

Tuning Multi-mode Token-level Prompt Alignment across Modalities

  • Dongsheng Wang
  • Miaoge Li
  • Xinyang Liu
  • MingSheng Xu
  • Bo Chen
  • Hanwang Zhang

Advancements in prompt tuning of vision-language models have underscored their potential in enhancing open-world visual concept comprehension. However, prior works primarily focus on single-mode (only one prompt for each modality) and holistic-level (image or sentence) semantic alignment, which fails to capture sample diversity, leading to sub-optimal prompt discovery. To address this limitation, we propose a multi-mode token-level tuning framework that leverages optimal transport to learn and align a set of prompt tokens across modalities. Specifically, we rely on two essential factors: 1) multi-mode prompt discovery, which guarantees diverse semantic representations, and 2) token-level alignment, which helps explore fine-grained similarity. Consequently, the similarity can be calculated as a hierarchical transportation problem between the modality-specific sets. Extensive experiments on popular image recognition benchmarks show the superior generalization and few-shot abilities of our approach. The qualitative analysis demonstrates that the learned prompt tokens have the ability to capture diverse visual concepts.

ICRA 2022 Conference Paper

Crossview Mapping with Graph-based Geolocalization on City-Scale Street Maps

  • Zhichao Ye
  • Chong Bao
  • Xinyang Liu
  • Hujun Bao
  • Zhaopeng Cui
  • Guofeng Zhang 0001

3D environment mapping has been actively studied recently with the development of autonomous driving and augmented reality. Although many image-based methods have been proposed due to their convenience and flexibility compared to other complex sensors, few works focus on fixing the inherent scale ambiguity of image-based methods and registering the reconstructed structure to a real-world 3D map, which is very important for autonomous driving. This paper presents a low-cost mapping solution that is able to refine and align the monocular reconstructed point cloud given a public street map. Specifically, we first find the association between the street map and the reconstructed point cloud structure by a novel graph-based geolocalization method. Then, by optimizing with these correspondences, the map accuracy is significantly improved. Rich environment information can also be associated with the point cloud by geographical location. Experiments show that our geolocalization algorithm can locate the scene on a gigantic city-scale map (173.46 km²) in two minutes and support 3D map reconstruction with absolute scale and rich environmental information from Internet videos.