Arrow Research search

Author name cluster

Liang Ma

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

11 papers
2 author rows

Possible papers (11)

JBHI Journal 2026 Journal Article

scProca: A Cross-Attention-Enhanced Deep Generative Model for Single-Cell Transcriptomics and Proteomics Integration and Imputation

  • Jiankang Xiong
  • Shuqiao Zheng
  • Fuzhou Gong
  • Liang Ma
  • Lin Wan

Understanding the molecular mechanisms of complex diseases requires insight into cellular interactions and protein expression. While large-scale sequencing enables disease subtyping and patient stratification, integrating proteomics and transcriptomics data offers a deeper view of cellular states. Recent methods combine scRNA-seq, which provides broad cellular coverage, with transcriptomics and proteomics co-profiling, which provides more comprehensive molecular measurements. However, many models adopt simplistic strategies for joint analysis. We introduce scProca, a deep generative model that incorporates inter-cellular relationships via cross-attention mechanisms to handle heterogeneous inputs, whether from RNA-seq or co-profiling datasets. scProca achieves state-of-the-art integration and imputation, remains robust under high protein sparsity, generalizes across species and tissues, scales to large datasets, and handles data from multiple experimental batches, demonstrating strong flexibility for complex experimental settings.
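The cross-attention step mentioned in this abstract is a standard transformer building block. Below is a minimal sketch of how embeddings from one modality can attend to another in PyTorch; the class name, dimensions, and residual layout are hypothetical illustrations, not scProca's actual architecture.

```python
# Minimal cross-attention sketch (illustrative only; not scProca's code).
# One modality's cell embeddings (queries) attend to another modality's
# embeddings (keys/values); all names and dimensions are hypothetical.
import torch
import torch.nn as nn

class CrossAttentionBlock(nn.Module):
    def __init__(self, dim: int = 64, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, rna_emb: torch.Tensor, protein_emb: torch.Tensor) -> torch.Tensor:
        # Queries come from the RNA latent space; keys/values from the protein
        # latent space, letting each cell borrow information across modalities.
        attended, _ = self.attn(query=rna_emb, key=protein_emb, value=protein_emb)
        return self.norm(rna_emb + attended)  # residual connection

# Toy usage: 2 batches of 100 cells embedded in a 64-dimensional latent space.
rna = torch.randn(2, 100, 64)
protein = torch.randn(2, 100, 64)
fused = CrossAttentionBlock()(rna, protein)
print(fused.shape)  # torch.Size([2, 100, 64])
```

In a sketch like this, the fused representation could feed downstream decoders for imputation; the actual model design should be taken from the paper.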

NeurIPS Conference 2025 Conference Paper

MineAnyBuild: Benchmarking Spatial Planning for Open-world AI Agents

  • Ziming Wei
  • Bingqian Lin
  • Zijian Jiao
  • Yunshuang Nie
  • Liang Ma
  • Yuecheng Liu
  • Yuzheng Zhuang
  • Xiaodan Liang

Spatial planning is a crucial part of spatial intelligence, requiring the understanding and planning of object arrangements from a spatial perspective. AI agents with spatial planning ability can better adapt to various real-world applications, including robotic manipulation, automatic assembly, and urban planning. Recent works have attempted to construct benchmarks for evaluating the spatial intelligence of Multimodal Large Language Models (MLLMs). Nevertheless, these benchmarks primarily focus on spatial reasoning in typical Visual Question-Answering (VQA) forms, which suffer from the gap between abstract spatial understanding and concrete task execution. In this work, we take a step further and build a comprehensive benchmark called MineAnyBuild, aiming to evaluate the spatial planning ability of open-world AI agents in the Minecraft game. Specifically, MineAnyBuild requires an agent to generate executable architecture building plans based on given multi-modal human instructions. It comprises 4,000 curated spatial planning tasks and also provides a paradigm for infinitely expandable data collection by utilizing rich player-generated content. MineAnyBuild evaluates spatial planning through four core supporting dimensions: spatial understanding, spatial reasoning, creativity, and spatial commonsense. Based on MineAnyBuild, we perform a comprehensive evaluation of existing MLLM-based agents, revealing severe limitations but also enormous potential in their spatial planning abilities. We believe MineAnyBuild will open new avenues for the evaluation of spatial intelligence and help promote further development of open-world AI agents capable of spatial planning.

NeurIPS Conference 2025 Conference Paper

PhyBlock: A Progressive Benchmark for Physical Understanding and Planning via 3D Block Assembly

  • Liang Ma
  • Jiajun Wen
  • Min Lin
  • Rongtao Xu
  • Xiwen Liang
  • Bingqian Lin
  • Jun Ma
  • Yongxin Wang

While vision-language models (VLMs) have demonstrated promising capabilities in reasoning and planning for embodied agents, their ability to comprehend physical phenomena, particularly within structured 3D environments, remains severely limited. To close this gap, we introduce PhyBlock, a progressive benchmark designed to assess VLMs on physical understanding and planning through robotic 3D block assembly tasks. PhyBlock integrates a novel four-level cognitive hierarchy of assembly tasks alongside targeted Visual Question Answering (VQA) samples, collectively aimed at evaluating progressive spatial reasoning and fundamental physical comprehension, including object properties, spatial relationships, and holistic scene understanding. PhyBlock includes 2,600 block tasks (400 assembly tasks, 2,200 VQA tasks) and evaluates models across three key dimensions: partial completion, failure diagnosis, and planning robustness. We benchmark 23 state-of-the-art VLMs, highlighting their strengths and limitations in physically grounded, multi-step planning. Our empirical findings indicate that VLMs exhibit pronounced limitations in high-level planning and reasoning, with performance declining notably as task complexity grows. Error analysis reveals persistent difficulties in spatial orientation and dependency reasoning. We position PhyBlock as a unified testbed to advance embodied reasoning, bridging vision-language understanding and real-world physical problem-solving.

ICML Conference 2025 Conference Paper

Unbiased Evaluation of Large Language Models from a Causal Perspective

  • Meilin Chen
  • Jian Tian
  • Liang Ma
  • Di Xie
  • Weijie Chen 0006
  • Jiang Zhu

Benchmark contamination has become a significant concern in the LLM evaluation community. Previous Agents-as-an-Evaluator methods address this issue by involving agents in the generation of questions. Despite their success, the biases in Agents-as-an-Evaluator methods remain largely unexplored. In this paper, we present a theoretical formulation of evaluation bias, providing valuable insights into designing unbiased evaluation protocols. Furthermore, we identify two types of bias in Agents-as-an-Evaluator through carefully designed probing tasks on a minimal Agents-as-an-Evaluator setup. To address these issues, we propose the Unbiased Evaluator, an evaluation protocol that delivers a more comprehensive, unbiased, and interpretable assessment of LLMs. Extensive experiments reveal significant room for improvement in current LLMs. Additionally, we demonstrate that the Unbiased Evaluator not only offers strong evidence of benchmark contamination but also provides interpretable evaluation results.

AAAI Conference 2019 Conference Paper

Learning Incremental Triplet Margin for Person Re-Identification

  • Yingying Zhang
  • Qiaoyong Zhong
  • Liang Ma
  • Di Xie
  • Shiliang Pu

Person re-identification (ReID) aims to match people across multiple non-overlapping video cameras deployed at different locations. To address this challenging problem, many metric learning approaches have been proposed, among which triplet loss is one of the state-of-the-art choices. In this work, we explore the margin between positive and negative pairs of triplets and prove that a large margin is beneficial. In particular, we propose a novel multi-stage training strategy that learns an incremental triplet margin and improves triplet loss effectively. Multiple levels of feature maps are exploited to make the learned features more discriminative. Besides, we introduce a global hard identity searching method to sample hard identities when generating a training batch. Extensive experiments on Market-1501, CUHK03, and DukeMTMC-reID show that our approach yields a performance boost and outperforms most existing state-of-the-art methods.
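The triplet loss with margin discussed in this abstract is a standard metric-learning objective. Below is a minimal sketch of the margin term plus a staged margin schedule in the spirit of an incremental margin; the stage count, margin values, and function names are hypothetical illustrations, not the paper's actual training recipe.

```python
# Minimal triplet-margin sketch (illustrative; not the paper's exact method).
# The loss encourages d(anchor, positive) + margin < d(anchor, negative).
import torch
import torch.nn.functional as F

def triplet_margin_loss(anchor, positive, negative, margin: float = 0.3):
    d_ap = F.pairwise_distance(anchor, positive)  # anchor-positive distance
    d_an = F.pairwise_distance(anchor, negative)  # anchor-negative distance
    return F.relu(d_ap - d_an + margin).mean()    # hinge on the margin

# Hypothetical "incremental margin": raise the margin at each training stage,
# so later stages demand larger separation between positives and negatives.
margins = [0.1, 0.3, 0.5]
for stage, margin in enumerate(margins):
    a, p, n = (torch.randn(8, 128) for _ in range(3))  # stand-in embeddings
    loss = triplet_margin_loss(a, p, n, margin)
    print(f"stage {stage}: margin={margin}, loss={loss.item():.3f}")
```

In practice the embeddings would come from a trained backbone and the batches from a hard-identity sampler as described in the abstract; the toy tensors above only demonstrate the loss computation.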