Arrow Research search

Author name cluster

Xi Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

42 papers
2 author rows

Possible papers

42

AAAI Conference 2026 Conference Paper

IGFuse: Interactive 3D Gaussian Scene Reconstruction via Multi-Scans Fusion

  • Wenhao Hu
  • Zesheng Li
  • Haonan Zhou
  • Liu Liu
  • Xuexiang Wen
  • Zhizhong Su
  • Xi Li
  • Gaoang Wang

Reconstructing complete and interactive 3D scenes remains a fundamental challenge in computer vision and robotics, particularly due to persistent object occlusions and limited sensor coverage. Even multi-view observations from a single scene scan often fail to capture the full structural details. Existing approaches typically rely on multi-stage pipelines—such as segmentation, background completion, and inpainting—or require per-object dense scanning, both of which are error-prone and not easily scalable. We propose IGFuse, a novel framework that reconstructs interactive Gaussian scenes by fusing observations from multiple scans, where natural object rearrangement between captures reveals previously occluded regions. Our method constructs segmentation-aware Gaussian fields and enforces bi-directional photometric and semantic consistency across scans. To handle spatial misalignments, we introduce a pseudo-intermediate scene state for symmetric alignment, alongside collaborative co-pruning strategies to refine geometry. IGFuse enables high-fidelity rendering and object-level scene manipulation without dense observations or complex pipelines. Extensive experiments validate the framework’s strong generalization to novel scene configurations, demonstrating its effectiveness for real-world 3D reconstruction and real-to-simulation transfer.

AAAI Conference 2026 Conference Paper

UniScene-MoTion: Unified Scene & Motion-aware Diffusion Transition Framework

  • Rui Jiang
  • Chongmian Wang
  • Xinghe Fu
  • Yehao Lu
  • Teng Li
  • Xi Li

Video transitions are critical for ensuring temporal coherence in edited media, yet existing methods often rely on handcrafted effects or relative-scale trajectories that fail to capture the physical structure of real-world scenes. In this work, we introduce a scale-aware video transition framework that explicitly incorporates depth-aware 3D reasoning into a diffusion-based generation pipeline. Built upon a powerful I2V foundation, our method leverages single-image depth prediction to align camera motion with metric-scale geometry, enabling physically consistent transitions. To reduce reliance on precise camera inputs, we propose a bidirectional conditional control module and a progressive training strategy with conditional dropout, enhancing generalization to loosely specified or missing camera trajectories. Extensive experiments demonstrate that our approach achieves state-of-the-art performance, delivering realistic, geometrically coherent transitions across diverse scenes and applications with minimal input guidance.

ICML Conference 2025 Conference Paper

AAAR-1.0: Assessing AI's Potential to Assist Research

  • Renze Lou
  • Hanzi Xu
  • Sijia Wang
  • Jiangshu Du
  • Ryo Kamoi
  • Xiaoxin Lu
  • Jian Xie
  • Yuxuan Sun 0002

Numerous studies have assessed the proficiency of AI systems, particularly large language models (LLMs), in facilitating everyday tasks such as email writing, question answering, and creative content generation. However, researchers face unique challenges and opportunities in leveraging LLMs for their own work, such as brainstorming research ideas, designing experiments, and writing or reviewing papers. In this study, we introduce AAAR-1.0, a benchmark dataset designed to evaluate LLM performance in three fundamental, expertise-intensive research tasks: (i) EquationInference, assessing the correctness of equations based on the contextual information in paper submissions; (ii) ExperimentDesign, designing experiments to validate research ideas and solutions; and (iii) PaperWeakness, identifying weaknesses in paper submissions. AAAR-1.0 differs from prior benchmarks in two key ways: first, it is explicitly research-oriented, with tasks requiring deep domain expertise; second, it is researcher-oriented, mirroring the primary activities that researchers engage in on a daily basis. An evaluation of both open-source and proprietary LLMs reveals their potential as well as limitations in conducting sophisticated research tasks. We will release AAAR-1.0 and keep iterating it to new versions.

NeurIPS Conference 2025 Conference Paper

Collaborating Vision, Depth, and Thermal Signals for Multi-Modal Tracking: Dataset and Algorithm

  • Xue-Feng Zhu
  • Tianyang Xu
  • Yifan Pan
  • Jinjie Gu
  • Xi Li
  • Jiwen Lu
  • Xiaojun Wu
  • Josef Kittler

Existing multi-modal object tracking approaches primarily focus on dual-modal paradigms, such as RGB-Depth or RGB-Thermal, yet remain challenged in complex scenarios due to limited input modalities. To address this gap, this work introduces a novel multi-modal tracking task that leverages three complementary modalities, including visible RGB, Depth (D), and Thermal Infrared (TIR), aiming to enhance robustness in complex scenarios. To support this task, we construct a new multi-modal tracking dataset, coined RGBDT500, which consists of 500 videos with synchronised frames across the three modalities. Each frame provides spatially aligned RGB, depth, and thermal infrared images with precise object bounding box annotations. Furthermore, we propose a novel multi-modal tracker, dubbed RDTTrack. RDTTrack integrates tri-modal information for robust tracking by leveraging a pretrained RGB-only tracking model and prompt learning techniques. Specifically, RDTTrack fuses the thermal infrared and depth modalities under a proposed orthogonal projection constraint, then integrates them with RGB signals as prompts for the pre-trained foundation tracking model, effectively harmonising tri-modal complementary cues. The experimental results demonstrate the effectiveness and advantages of the proposed method, showing significant improvements over existing dual-modal approaches in terms of tracking accuracy and robustness in complex scenarios. The dataset and source code are publicly available at https://xuefeng-zhu5.github.io/RGBDT500.

AAAI Conference 2025 Conference Paper

CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities

  • Tao Wu
  • Yong Zhang
  • Xintao Wang
  • Xianpan Zhou
  • Guangcong Zheng
  • Zhongang Qi
  • Ying Shan
  • Xi Li

Customized video generation aims to generate high-quality videos guided by text prompts and the subject's reference images. However, since it is only trained on static images, the fine-tuning process of subject learning disrupts the abilities of video diffusion models (VDMs) to combine concepts and generate motion. To restore these abilities, some methods use an additional video similar to the prompt to fine-tune or guide the model. This requires frequent changes of guiding videos and even re-tuning of the model when generating different motions, which is very inconvenient for users. In this paper, we propose CustomCrafter, a novel framework that preserves the model's motion generation and concept combination abilities without additional video or fine-tuning for recovery. To preserve the concept combination ability, we design a plug-and-play module that updates a few parameters in VDMs, enhancing the model's ability to capture appearance details and to combine concepts for new subjects. For motion generation, we observe that VDMs tend to restore the motion of a video in the early stage of denoising, while focusing on the recovery of subject details in the later stage. Therefore, we propose a Dynamic Weighted Video Sampling Strategy. Using the pluggability of our subject learning modules, we reduce the impact of these modules on motion generation in the early stage of denoising, preserving the VDMs' ability to generate motion. In the later stage of denoising, we restore these modules to repair the appearance details of the specified subject, thereby ensuring the fidelity of the subject's appearance. Experimental results show that our method achieves a significant improvement over previous methods.

AAAI Conference 2025 Conference Paper

Energy-Guided Optimization for Personalized Image Editing with Pretrained Text-to-Image Diffusion Models

  • Rui Jiang
  • Xinghe Fu
  • Guangcong Zheng
  • Teng Li
  • Taiping Yao
  • Xi Li

The rapid advancement of pretrained text-driven diffusion models has significantly enriched applications in image generation and editing. However, as the demand for personalized content editing increases, new challenges emerge, especially when dealing with arbitrary objects and complex scenes. Existing methods usually mistake the mask for an object shape prior, and thus struggle to achieve seamless integration. The commonly used inversion-noise initialization also hinders identity consistency with the target object. To address these challenges, we propose a novel training-free framework that formulates personalized content editing as the optimization of edited images in the latent space, using diffusion models as the energy function guidance conditioned by reference text-image pairs. A coarse-to-fine strategy is proposed that employs text energy guidance at the early stage to achieve a natural transition toward the target class and uses point-to-point feature-level image energy guidance to perform fine-grained appearance alignment with the target object. Additionally, we introduce latent space content composition to enhance overall identity consistency with the target. Extensive experiments demonstrate that our method excels in object replacement even with a large domain gap, highlighting its potential for high-quality, personalized image editing.

AAAI Conference 2025 Conference Paper

Envisioning Class Entity Reasoning by Large Language Models for Few-shot Learning

  • Mushui Liu
  • Fangtai Wu
  • Bozheng Li
  • Ziqian Lu
  • Yunlong Yu
  • Xi Li

Few-shot learning (FSL) aims to recognize new concepts using a limited number of visual samples. Existing methods attempt to incorporate semantic information into the limited visual data for category understanding. However, these methods often enrich class-level feature representations with abstract category names, failing to capture nuanced features essential for effective generalization. To address this issue, we propose a novel framework for FSL, which incorporates both the abstract class semantics and the concrete class entities extracted from Large Language Models (LLMs), to enhance the representation of the class prototypes. Specifically, our framework comprises a Semantic-guided Visual Pattern Extraction (SVPE) module and a Prototype-Calibration (PC) module, where the SVPE meticulously extracts semantic-aware visual patterns across diverse scales, while the PC module seamlessly integrates these patterns to refine the visual prototype, enhancing its representativeness. Extensive experiments on four few-shot classification benchmarks and the BSCD-FSL cross-domain benchmark showcase remarkable advancements over the current state-of-the-art methods. Notably, for the challenging one-shot setting, our approach, utilizing the ResNet-12 backbone, achieves an impressive average improvement of 1.95% over the second-best competitor.

AAAI Conference 2025 Conference Paper

Exploring Unbiased Deepfake Detection via Token-Level Shuffling and Mixing

  • Xinghe Fu
  • Zhiyuan Yan
  • Taiping Yao
  • Shen Chen
  • Xi Li

The generalization problem is broadly recognized as a critical challenge in detecting deepfakes. Most previous work believes that the generalization gap is caused by the differences among various forgery methods. However, our investigation reveals that the generalization issue can still occur when forgery-irrelevant factors shift. In this work, we identify two biases that detectors are also prone to overfitting to: position bias and content bias, as depicted in Fig. 1. For the position bias, we observe that detectors are prone to “lazily” depending on specific positions within an image (e.g., central regions, even when they contain no forgery). As for content bias, we argue that detectors may potentially and mistakenly utilize forgery-unrelated information for detection (e.g., background and hair). To intervene in these biases, we propose two branches for shuffling and mixing tokens in the latent space of transformers. For the shuffling branch, we rearrange the tokens and corresponding position embeddings for each image while maintaining the local correlation. For the mixing branch, we randomly select and mix the tokens in the latent space between two images with the same label within the mini-batch to recombine the content information. During the learning process, we align the outputs of detectors from different branches in both feature space and logit space. Contrastive losses for features and divergence losses for logits are applied to obtain unbiased feature representations and classifiers. We demonstrate and verify the effectiveness of our method through extensive experiments on widely used evaluation datasets.
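As a rough, non-authoritative sketch of the two branches described above, the Python snippet below permutes tokens together with their position embeddings (shuffling branch) and swaps a fraction of tokens between two same-label images in a mini-batch (mixing branch). The tensor shapes, the per-token permutation (the paper shuffles while maintaining local correlation), and the mixing ratio are illustrative assumptions; the alignment losses are omitted.

```python
import torch


def shuffle_branch(tokens: torch.Tensor, pos_embed: torch.Tensor):
    """Rearrange tokens together with their position embeddings.

    tokens, pos_embed: (batch, num_tokens, dim).
    """
    b, n, d = tokens.shape
    perm = torch.stack([torch.randperm(n) for _ in range(b)])  # one permutation per image
    idx = perm.unsqueeze(-1).expand(-1, -1, d)
    return tokens.gather(1, idx), pos_embed.gather(1, idx)


def mix_branch(tokens: torch.Tensor, labels: torch.Tensor, ratio: float = 0.3):
    """Replace a fraction of tokens with tokens from another same-label image."""
    b, n, _ = tokens.shape
    mixed = tokens.clone()
    for i in range(b):
        partners = (labels == labels[i]).nonzero(as_tuple=True)[0]
        partners = partners[partners != i]
        if len(partners) == 0:
            continue  # no other image with the same label in this mini-batch
        j = partners[torch.randint(len(partners), (1,))].item()
        pick = torch.randperm(n)[: max(1, int(ratio * n))]  # token positions to swap in
        mixed[i, pick] = tokens[j, pick]
    return mixed


if __name__ == "__main__":
    tok, pos = torch.randn(4, 196, 768), torch.randn(4, 196, 768)
    labels = torch.tensor([0, 0, 1, 1])  # real/fake labels within the mini-batch
    s_tok, s_pos = shuffle_branch(tok, pos)
    m_tok = mix_branch(tok, labels)
    print(s_tok.shape, m_tok.shape)
```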

ICLR Conference 2025 Conference Paper

Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model

  • Longrong Yang
  • Dong Shen
  • Chaoxiang Cai
  • Fan Yang
  • Tingting Gao
  • Di Zhang
  • Xi Li

The Mixture-of-Experts (MoE) has gained increasing attention in studying Large Vision-Language Models (LVLMs). It uses a sparse model to replace the dense model, achieving comparable performance while activating fewer parameters during inference, thus significantly reducing the inference cost. Existing MoE methods in LVLMs encourage different experts to specialize in different tokens, and they usually employ a router to predict the routing of each token. However, the router is not optimized with respect to the distinct parameter optimization directions generated by tokens within an expert. This may lead to severe interference between tokens within an expert. To address this problem, we propose Solving Token Gradient Conflict (STGC) via token-level gradient analysis in this paper. Specifically, we first use token-level gradients to identify conflicting tokens in experts. After that, we add a regularization loss tailored to encourage conflicting tokens to route from their current experts to other experts, reducing interference between tokens within an expert. Our method can serve as a plug-in for diverse LVLM methods, and extensive experimental results demonstrate its effectiveness. The code will be publicly available at https://github.com/longrongyang/STGC.

JBHI Journal 2025 Journal Article

TKR-FSOD: Fetal Anatomical Structure Few-Shot Detection Utilizing Topological Knowledge Reasoning

  • Xi Li
  • Ying Tan
  • Bocheng Liang
  • Bin Pu
  • Jiewen Yang
  • Lei Zhao
  • Yanqing Kong
  • Lixian Yang

Fetal multi-anatomical structure detection in ultrasound (US) images can clearly present the relationships and interactions between anatomical structures, providing more comprehensive information about fetal organ structures and assisting sonographers in making more accurate diagnoses; it is widely used in structure evaluation. Recently, deep learning methods have shown superior performance in detecting various anatomical structures in ultrasound images, but they still leave room for improvement in categories where it is difficult to obtain samples, such as rare diseases. Few-shot learning has attracted a lot of attention in medical image analysis due to its ability to address data scarcity. However, existing few-shot learning research in medical image analysis focuses on classification and segmentation, and research on object detection has been neglected. In this paper, we propose a novel fetal anatomical structure few-shot detection method for ultrasound images, TKR-FSOD, which learns topological knowledge through a Topological Knowledge Reasoning Module to help the model reason about and detect anatomical structures. Furthermore, we propose a Discriminate Ability Enhanced Feature Learning Module that extracts abundant discriminative features to enhance the model's discriminative ability. Experimental results demonstrate that our method outperforms the state-of-the-art baseline methods, exceeding the second-best method by a maximum margin of 4.8% in the 5-shot setting of split 1 under the four-chamber cardiac view.

AAAI Conference 2024 Conference Paper

Prompting Segmentation with Sound Is Generalizable Audio-Visual Source Localizer

  • Yaoting Wang
  • Weisong Liu
  • Guangyao Li
  • Jian Ding
  • Di Hu
  • Xi Li

Never having seen an object and heard its sound simultaneously, can the model still accurately localize its visual position from the input audio? In this work, we concentrate on the Audio-Visual Localization and Segmentation tasks but under the demanding zero-shot and few-shot scenarios. To achieve this goal, different from existing approaches that mostly employ the encoder-fusion-decoder paradigm to decode localization information from the fused audio-visual feature, we introduce the encoder-prompt-decoder paradigm, aiming to better fit the data scarcity and varying data distribution dilemmas with the help of abundant knowledge from pre-trained models. Specifically, we first propose to construct a Semantic-aware Audio Prompt (SAP) to help the visual foundation model focus on sounding objects, while also encouraging the semantic gap between the visual and audio modalities to shrink. Then, we develop a Correlation Adapter (ColA) to keep training efforts minimal while maintaining adequate knowledge of the visual foundation model. Equipped with these components, this new paradigm outperforms other fusion-based methods in both the unseen-class and cross-dataset settings, as extensive experiments demonstrate. We hope that our work can further promote the generalization study of Audio-Visual Localization and Segmentation in practical application scenarios. Project page: https://github.com/GeWu-Lab/Generalizable-Audio-Visual-Segmentation

AAAI Conference 2024 Conference Paper

SphereDiffusion: Spherical Geometry-Aware Distortion Resilient Diffusion Model

  • Tao Wu
  • Xuewei Li
  • Zhongang Qi
  • Di Hu
  • Xintao Wang
  • Ying Shan
  • Xi Li

Controllable spherical panoramic image generation holds substantial applicative potential across a variety of domains. However, it remains a challenging task due to the inherent spherical distortion and geometry characteristics, resulting in low-quality content generation. In this paper, we introduce a novel framework, SphereDiffusion, to address these unique challenges and to better generate high-quality and precisely controllable spherical panoramic images. For the spherical distortion characteristic, we embed the semantics of the distorted object with text encoding, then explicitly construct the relationship with text-object correspondence to better use the pre-trained knowledge of the planar images. Meanwhile, we employ a deformable technique to mitigate the semantic deviation in latent space caused by spherical distortion. For the spherical geometry characteristic, by virtue of spherical rotation invariance, we improve the data diversity and optimization objectives in the training process, enabling the model to better learn the spherical geometry characteristic. Furthermore, we enhance the denoising process of the diffusion model, enabling it to effectively use the learned geometric characteristic to ensure the boundary continuity of the generated images. With these specific techniques, experiments on the Structured3D dataset show that SphereDiffusion significantly improves the quality of controllable spherical image generation and reduces FID by around 35% on average in relative terms.

AAAI Conference 2024 Conference Paper

Temporal-Distributed Backdoor Attack against Video Based Action Recognition

  • Xi Li
  • Songhe Wang
  • Ruiquan Huang
  • Mahanth Gowda
  • George Kesidis

Deep neural networks (DNNs) have achieved tremendous success in various applications including video action recognition, yet remain vulnerable to backdoor attacks (Trojans). The backdoor-compromised model will mis-classify to the target class chosen by the attacker when a test instance (from a non-target class) is embedded with a specific trigger, while maintaining high accuracy on attack-free instances. Although there are extensive studies on backdoor attacks against image data, the susceptibility of video-based systems under backdoor attacks remains largely unexplored. Current studies are direct extensions of approaches proposed for image data, e.g., the triggers are independently embedded within the frames, which tend to be detectable by existing defenses. In this paper, we introduce a simple yet effective backdoor attack against video data. Our proposed attack, adding perturbations in a transformed domain, plants an imperceptible, temporally distributed trigger across the video frames, and is shown to be resilient to existing defensive strategies. The effectiveness of the proposed attack is demonstrated by extensive experiments with various well-known models on two video recognition benchmarks, UCF101 and HMDB51, and a sign language recognition benchmark, Greek Sign Language (GSL) dataset. We delve into the impact of several influential factors on our proposed attack and identify an intriguing effect termed "collateral damage" through extensive studies.
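A hedged illustration of the idea of a temporally distributed trigger planted in a transformed domain rather than frame-by-frame: the sketch below perturbs a single temporal frequency of a clip and maps it back to the pixel domain, so the change is small per frame but spread across all frames. The choice of transform (a temporal FFT), the frequency index, and the amplitude are assumptions made for illustration, not the paper's exact trigger design.

```python
import numpy as np


def embed_temporal_trigger(video: np.ndarray, freq_idx: int = 3, amp: float = 2.0) -> np.ndarray:
    """video: float array of shape (T, H, W, C) with pixel values in [0, 255]."""
    spectrum = np.fft.fft(video, axis=0)           # transform along the temporal axis
    spectrum[freq_idx] += amp                      # perturb one temporal frequency bin
    spectrum[-freq_idx] += amp                     # and its conjugate bin, keeping the signal real
    poisoned = np.fft.ifft(spectrum, axis=0).real  # back to the pixel domain
    return np.clip(poisoned, 0.0, 255.0)


if __name__ == "__main__":
    clip = np.random.uniform(0, 255, size=(16, 112, 112, 3))
    poisoned = embed_temporal_trigger(clip)
    print("mean per-pixel change:", np.abs(poisoned - clip).mean())
```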

IJCAI Conference 2023 Conference Paper

DenseDINO: Boosting Dense Self-Supervised Learning with Token-Based Point-Level Consistency

  • Yike Yuan
  • Xinghe Fu
  • Yunlong Yu
  • Xi Li

In this paper, we propose a simple yet effective transformer framework for self-supervised learning, called DenseDINO, to learn dense visual representations. To exploit the spatial information that dense prediction tasks require but that is neglected by existing self-supervised transformers, we introduce point-level supervision across views in a novel token-based way. Specifically, DenseDINO introduces some extra input tokens, called reference tokens, to match point-level features with the position prior. With the reference tokens, the model can maintain spatial consistency and deal with multi-object complex scene images, thus generalizing better on dense prediction tasks. Compared with the vanilla DINO, our approach obtains competitive performance when evaluated on classification in ImageNet and achieves a large improvement (+7.2% mIoU) in semantic segmentation on PascalVOC under the linear probing protocol for segmentation.

JBHI Journal 2023 Journal Article

dMIL-Transformer: Multiple Instance Learning Via Integrating Morphological and Spatial Information for Lymph Node Metastasis Classification

  • Yang Chen
  • Zhuchen Shao
  • Hao Bian
  • Zijie Fang
  • Yifeng Wang
  • Yuanhao Cai
  • Haoqian Wang
  • Guojun Liu

Automated classification of lymph node metastasis (LNM) plays an important role in diagnosis and prognosis. However, it is very challenging to achieve satisfactory performance in LNM classification, because both the morphology and the spatial distribution of tumor regions should be taken into account. To address this problem, this article proposes a two-stage dMIL-Transformer framework, which integrates both the morphological and spatial information of the tumor regions based on the theory of multiple instance learning (MIL). In the first stage, a double Max-Min MIL (dMIL) strategy is devised to select the suspected top-K positive instances from each input histopathology image, which contains tens of thousands of patches (primarily negative). The dMIL strategy enables a better decision boundary for selecting the critical instances compared with other methods. In the second stage, a Transformer-based MIL aggregator is designed to integrate all the morphological and spatial information of the selected instances from the first stage. The self-attention mechanism is further employed to characterize the correlation between different instances and learn the bag-level representation for predicting the LNM category. The proposed dMIL-Transformer can effectively handle the challenging LNM classification task while offering good visualization and interpretability. We conduct various experiments over three LNM datasets and achieve a 1.79%-7.50% performance improvement compared with other state-of-the-art methods.

AAAI Conference 2023 Conference Paper

PUPS: Point Cloud Unified Panoptic Segmentation

  • Shihao Su
  • Jianyun Xu
  • Huanyu Wang
  • Zhenwei Miao
  • Xin Zhan
  • Dayang Hao
  • Xi Li

Point cloud panoptic segmentation is a challenging task that seeks a holistic solution for both semantic and instance segmentation to predict groupings of coherent points. Previous approaches treat semantic and instance segmentation as surrogate tasks, and they either use clustering methods or bounding boxes to gather instance groupings, with costly computation and hand-crafted designs in the instance segmentation task. In this paper, we propose a simple but effective point cloud unified panoptic segmentation (PUPS) framework, which uses a set of point-level classifiers to directly predict semantic and instance groupings in an end-to-end manner. To realize PUPS, we introduce bipartite matching to our training pipeline so that our classifiers are able to exclusively predict groupings of instances, getting rid of hand-crafted designs, e.g. anchors and Non-Maximum Suppression (NMS). In order to achieve better grouping results, we utilize a transformer decoder to iteratively refine the point classifiers and develop a context-aware CutMix augmentation to overcome the class imbalance problem. As a result, PUPS achieves 1st place on the leaderboard of the SemanticKITTI panoptic segmentation task and state-of-the-art results on nuScenes.
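The bipartite matching step mentioned above can be pictured with the small sketch below: each point-level classifier is assigned to at most one ground-truth instance via Hungarian matching, so predictions stay exclusive without anchors or NMS. The cost used here (negative class probability plus a negative soft IoU over points) is an assumed stand-in for the paper's actual matching cost.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_predictions(pred_probs, pred_masks, gt_classes, gt_masks):
    """pred_probs: (Q, C) class probabilities; pred_masks: (Q, N) per-point scores;
    gt_classes: (G,) class ids; gt_masks: (G, N) binary point memberships."""
    cls_cost = -pred_probs[:, gt_classes]                  # (Q, G): favor the correct class
    inter = pred_masks @ gt_masks.T                        # soft intersection over points
    union = pred_masks.sum(1, keepdims=True) + gt_masks.sum(1) - inter
    iou_cost = -(inter / np.maximum(union, 1e-6))          # (Q, G): favor point overlap
    row, col = linear_sum_assignment(cls_cost + iou_cost)  # Hungarian matching
    return list(zip(row.tolist(), col.tolist()))           # (prediction, ground-truth) pairs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    probs = rng.random((8, 5))
    probs /= probs.sum(1, keepdims=True)
    masks = rng.random((8, 1000))
    gt_masks = (rng.random((2, 1000)) > 0.5).astype(float)
    print(match_predictions(probs, masks, np.array([1, 3]), gt_masks))
```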

AAAI Conference 2023 Conference Paper

Referring Expression Comprehension Using Language Adaptive Inference

  • Wei Su
  • Peihan Miao
  • Huanzhang Dou
  • Yongjian Fu
  • Xi Li

Different from universal object detection, referring expression comprehension (REC) aims to locate specific objects referred to by natural language expressions. The expression provides high-level concepts of relevant visual and contextual patterns, which vary significantly with different expressions and account for only a few of those encoded in the REC model. This leads us to a question: do we really need the entire network with a fixed structure for various referring expressions? Ideally, given an expression, only expression-relevant components of the REC model are required. These components should be small in number as each expression only contains very few visual and contextual clues. This paper explores the adaptation between expressions and REC models for dynamic inference. Concretely, we propose a neat yet efficient framework named Language Adaptive Dynamic Subnets (LADS), which can extract language-adaptive subnets from the REC model conditioned on the referring expressions. By using the compact subnet, the inference can be more economical and efficient. Extensive experiments on RefCOCO, RefCOCO+, RefCOCOg, and Referit show that the proposed method achieves faster inference speed and higher accuracy against state-of-the-art approaches.

IJCAI Conference 2023 Conference Paper

SGAT4PASS: Spherical Geometry-Aware Transformer for PAnoramic Semantic Segmentation

  • Xuewei Li
  • Tao Wu
  • Zhongang Qi
  • Gaoang Wang
  • Ying Shan
  • Xi Li

As an important and challenging problem in computer vision, PAnoramic Semantic Segmentation (PASS) gives complete scene perception based on an ultra-wide angle of view. Usually, prevalent PASS methods with 2D panoramic image input focus on solving image distortions but lack consideration of the 3D properties of the original 360-degree data. Therefore, their performance drops considerably when the input panoramic images contain 3D disturbances. To be more robust to 3D disturbance, we propose our Spherical Geometry-Aware Transformer for PAnoramic Semantic Segmentation (SGAT4PASS), considering 3D spherical geometry knowledge. Specifically, a spherical geometry-aware framework is proposed for PASS. It includes three modules, i.e., spherical geometry-aware image projection, spherical deformable patch embedding, and a panorama-aware loss, which take input images with 3D disturbance into account, add a spherical geometry-aware constraint on the existing deformable patch embedding, and indicate the pixel density of the original 360-degree data, respectively. Experimental results on the Stanford2D3D Panoramic dataset show that SGAT4PASS significantly improves performance and robustness, with approximately a 2% increase in mIoU, and when small 3D disturbances occur in the data, the stability of our performance is improved by an order of magnitude. Our code and supplementary material are available at https://github.com/TencentARC/SGAT4PASS.

IJCAI Conference 2021 Conference Paper

Automatic Translation of Music-to-Dance for In-Game Characters

  • Yinglin Duan
  • Tianyang Shi
  • Zhipeng Hu
  • Zhengxia Zou
  • Changjie Fan
  • Yi Yuan
  • Xi Li

Music-to-dance translation is an emerging and powerful feature in recent role-playing games. Previous works on this topic consider music-to-dance as a supervised motion generation problem based on time-series data. However, these methods require a large number of training data pairs and may suffer from degradation of the movements. This paper provides a new solution to this task, where we re-formulate the translation as a piece-wise dance phrase retrieval problem based on choreography theory. With such a design, players are allowed to optionally edit the dance movements on top of our generation, whereas other regression-based methods ignore such user interactivity. Considering that dance motion capture is expensive and requires the assistance of professional dancers, we train our method in a semi-supervised fashion with a large unlabeled music dataset (20x larger than our labeled one) and also introduce self-supervised pre-training to improve the training stability and generalization performance. Experimental results suggest that our method not only generalizes well over various styles of music but also succeeds in choreography for game players. Our project, including the large-scale dataset and supplemental materials, is available at https://github.com/FuxiCV/music-to-dance.

AAAI Conference 2019 Conference Paper

Learning a Key-Value Memory Co-Attention Matching Network for Person Re-Identification

  • Yaqing Zhang
  • Xi Li
  • Zhongfei Zhang

Person re-identification (Re-ID) is typically cast as the problem of semantic representation and alignment, which requires precisely discovering and modeling the inherent spatial structure information on person images. Motivated by this observation, we propose a Key-Value Memory Matching Network (KVM-MN) model that consists of key-value memory representation and key-value co-attention matching. The proposed KVM-MN model is capable of building an effective local position-aware person representation that encodes the spatial feature information in the form of multi-head key-value memory. Furthermore, the proposed KVM-MN model makes use of multi-head co-attention to automatically learn a number of cross-person-matching patterns, resulting in more robust and interpretable matching results. Finally, we build a set-wise learning mechanism that implements a more generalized query-to-gallery-image-set learning procedure. Experimental results demonstrate the effectiveness of the proposed model against the state-of-the-art.

AAAI Conference 2019 Conference Paper

Spatio-Temporal Graph Routing for Skeleton-Based Action Recognition

  • Bin Li
  • Xi Li
  • Zhongfei Zhang
  • Fei Wu

With the representation effectiveness, skeleton-based human action recognition has received considerable research attention, and has a wide range of real applications. In this area, many existing methods typically rely on a fixed physical-connectivity skeleton structure for recognition, which is incapable of well capturing the intrinsic high-order correlations among skeleton joints. In this paper, we propose a novel spatio-temporal graph routing (STGR) scheme for skeleton-based action recognition, which adaptively learns the intrinsic high-order connectivity relationships for physically apart skeleton joints. Specifically, the scheme is composed of two components: spatial graph router (SGR) and temporal graph router (TGR). The SGR aims to discover the connectivity relationships among the joints based on sub-group clustering along the spatial dimension, while the TGR explores the structural information by measuring the correlation degrees between temporal joint node trajectories. The proposed scheme is naturally and seamlessly incorporated into the framework of graph convolutional networks (GCNs) to produce a set of skeleton-joint-connectivity graphs, which are further fed into the classification networks. Moreover, an insightful analysis on the receptive field of graph nodes is provided to explain the necessity of our method. Experimental results on two benchmark datasets (NTU-RGB+D and Kinetics) demonstrate the effectiveness against the state-of-the-art.

IJCAI Conference 2018 Conference Paper

Deep Convolutional Neural Networks with Merge-and-Run Mappings

  • Liming Zhao
  • Mingjie Li
  • Depu Meng
  • Xi Li
  • Zhaoxiang Zhang
  • Yueting Zhuang
  • Zhuowen Tu
  • Jingdong Wang

A deep residual network, built by stacking a sequence of residual blocks, is easy to train, because identity mappings skip residual branches and thus improve information flow. To further reduce the training difficulty, we present a simple network architecture, deep merge-and-run neural networks. The novelty lies in a modularized building block, merge-and-run block, which assembles residual branches in parallel through a merge-and-run mapping: average the inputs of these residual branches (Merge), and add the average to the output of each residual branch as the input of the subsequent residual branch (Run), respectively. We show that the merge-and-run mapping is a linear idempotent function in which the transformation matrix is idempotent, and thus improves information flow, making training easy. In comparison with residual networks, our networks enjoy compelling advantages: they contain much shorter paths and the width, i.e., the number of channels, is increased, and the time complexity remains unchanged. We evaluate the performance on the standard recognition tasks. Our approach demonstrates consistent improvements over ResNets with the comparable setup, and achieves competitive results (e.g., 3.06% testing error on CIFAR-10, 17.55% on CIFAR-100, 1.51% on SVHN).
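The merge-and-run mapping described above is concrete enough to sketch: the inputs of two parallel residual branches are averaged (Merge), and the average is added to each branch's output to form the inputs of the next block (Run). The branch depth and layer choices below are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn


def conv_bn_relu(channels: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(channels),
        nn.ReLU(inplace=True),
    )


class MergeAndRunBlock(nn.Module):
    """Two parallel residual branches coupled by a merge-and-run mapping."""

    def __init__(self, channels: int):
        super().__init__()
        self.branch_a = nn.Sequential(conv_bn_relu(channels), conv_bn_relu(channels))
        self.branch_b = nn.Sequential(conv_bn_relu(channels), conv_bn_relu(channels))

    def forward(self, x_a: torch.Tensor, x_b: torch.Tensor):
        merged = 0.5 * (x_a + x_b)            # Merge: average the two branch inputs
        out_a = self.branch_a(x_a) + merged   # Run: add the average to each branch output,
        out_b = self.branch_b(x_b) + merged   # producing the inputs of the next block
        return out_a, out_b


if __name__ == "__main__":
    block = MergeAndRunBlock(channels=16)
    x = torch.randn(2, 16, 32, 32)
    y_a, y_b = block(x, x)  # both branches start from the same input tensor
    print(y_a.shape, y_b.shape)
```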

AAAI Conference 2018 Short Paper

FR-ANet: A Face Recognition Guided Facial Attribute Classification Network

  • Jiajiong Cao
  • Yingming Li
  • Xi Li
  • Zhongfei Zhang

In this paper, we study the problem of facial attribute learning. In particular, we propose a Face Recognition guided facial Attribute classification Network, called FR-ANet. All the attributes share low-level features, while high-level features are specially learned for attribute groups. Further, to utilize the identity information, high-level features are merged to perform face identity recognition. The experimental results on CelebA and LFWA datasets demonstrate the promise of the FR-ANet.

IJCAI Conference 2018 Conference Paper

Knowledge-Guided Agent-Tactic-Aware Learning for StarCraft Micromanagement

  • Yue Hu
  • Juntao Li
  • Xi Li
  • Gang Pan
  • Mingliang Xu

As an important and challenging problem in artificial intelligence (AI) game playing, StarCraft micromanagement involves a dynamically adversarial game playing process with complex multi-agent control within a large action space. In this paper, we propose a novel knowledge-guided agent-tactic-aware learning scheme, that is, opponent-guided tactic learning (OGTL), to cope with this micromanagement problem. In principle, the proposed scheme takes a two-stage cascaded learning strategy which is capable of not only transferring the human tactic knowledge from the human-made opponent agents to our AI agents but also improving the adversarial ability. With the power of reinforcement learning, such a knowledge-guided agent-tactic-aware scheme has the ability to guide the AI agents to achieve high winning-rate performances while accelerating the policy exploration process in a tactic-interpretable fashion. Experimental results demonstrate the effectiveness of the proposed scheme against the state-of-the-art approaches in several benchmark combat scenarios.

AAAI Conference 2018 Conference Paper

Multi-Channel Pyramid Person Matching Network for Person Re-Identification

  • Chaojie Mao
  • Yingming Li
  • Yaqing Zhang
  • Zhongfei Zhang
  • Xi Li

In this work, we present a Multi-Channel deep convolutional Pyramid Person Matching Network (MC-PPMN) based on the combination of the semantic-components and the color-texture distributions to address the problem of person re-identification. In particular, we learn separate deep representations for semantic-components and color-texture distributions from two person images and then employ pyramid person matching network (PPMN) to obtain correspondence representations. These correspondence representations are fused to perform the re-identification task. Further, the proposed framework is optimized via a unified end-to-end deep learning scheme. Extensive experiments on several benchmark datasets demonstrate the effectiveness of our approach against the state-of-the-art literature, especially on the rank-1 recognition rate.

IJCAI Conference 2018 Conference Paper

Progressive Blockwise Knowledge Distillation for Neural Network Acceleration

  • Hui Wang
  • Hanbin Zhao
  • Xi Li
  • Xu Tan

As an important and challenging problem in machine learning and computer vision, neural network acceleration essentially aims to enhance the computational efficiency without sacrificing the model accuracy too much. In this paper, we propose a progressive blockwise learning scheme for teacher-student model distillation at the subnetwork block level. The proposed scheme is able to distill the knowledge of the entire teacher network by locally extracting the knowledge of each block in terms of progressive blockwise function approximation. Furthermore, we propose a structure design criterion for the student subnetwork block, which is able to effectively preserve the original receptive field from the teacher network. Experimental results demonstrate the effectiveness of the proposed scheme against the state-of-the-art approaches.

IJCAI Conference 2018 Conference Paper

Semantic Locality-Aware Deformable Network for Clothing Segmentation

  • Wei Ji
  • Xi Li
  • Yueting Zhuang
  • Omar El Farouk Bourahla
  • Yixin Ji
  • Shihao Li
  • Jiabao Cui

Clothing segmentation is a challenging vision problem typically implemented within a fine-grained semantic segmentation framework. Different from conventional segmentation, clothing segmentation has some domain-specific properties such as texture richness, diverse appearance variations, non-rigid geometry deformations, and small sample learning. To deal with these points, we propose a semantic locality-aware segmentation model, which adaptively attaches an original clothing image with a semantically similar (e.g., appearance or pose) auxiliary exemplar by search. Through considering the interactions of the clothing image and its exemplar, more intrinsic knowledge about the locality manifold structures of clothing images is discovered to make the learning process of small sample problem more stable and tractable. Furthermore, we present a CNN model based on the deformable convolutions to extract the non-rigid geometry-aware features for clothing images. Experimental results demonstrate the effectiveness of the proposed model against the state-of-the-art approaches.

IJCAI Conference 2017 Conference Paper

Boosted Zero-Shot Learning with Semantic Correlation Regularization

  • Te Pi
  • Xi Li
  • Zhongfei (Mark) Zhang

We study zero-shot learning (ZSL) as a transfer learning problem, and focus on the two key aspects of ZSL, model effectiveness and model adaptation. For effective modeling, we adopt the boosting strategy to learn a zero-shot classifier from weak models to a strong model. For adaptable knowledge transfer, we devise a Semantic Correlation Regularization (SCR) approach to regularize the boosted model to be consistent with the inter-class semantic correlations. With SCR embedded in the boosting objective, and with a self-controlled sample selection for learning robustness, we propose a unified framework, Boosted Zero-shot classification with Semantic Correlation Regularization (BZ-SCR). By balancing the SCR-regularized boosted model selection and the self-controlled sample selection, BZ-SCR is capable of capturing both discriminative and adaptable feature-to-class semantic alignments, while ensuring the reliability and adaptability of the learned samples. The experiments on two ZSL datasets show the superiority of BZ-SCR over the state-of-the-art methods.

IJCAI Conference 2017 Conference Paper

Deep Optical Flow Estimation Via Multi-Scale Correspondence Structure Learning

  • Shanshan Zhao
  • Xi Li
  • Omar El Farouk Bourahla

As an important and challenging problem in computer vision, learning based optical flow estimation aims to discover the intrinsic correspondence structure between two adjacent video frames through statistical learning. Therefore, a key issue to solve in this area is how to effectively model the multi-scale correspondence structure properties in an adaptive end-to-end learning fashion. Motivated by this observation, we propose an end-to-end multi-scale correspondence structure learning (MSCSL) approach for optical flow estimation. In principle, the proposed MSCSL approach is capable of effectively capturing the multi-scale inter-image-correlation correspondence structures within a multi-level feature space from deep learning. Moreover, the proposed MSCSL approach builds a spatial Conv-GRU neural network model to adaptively model the intrinsic dependency relationships among these multi-scale correspondence structures. Finally, the above procedures for correspondence structure learning and multi-scale dependency modeling are implemented in a unified end-to-end deep learning framework. Experimental results on several benchmark datasets demonstrate the effectiveness of the proposed approach.

IJCAI Conference 2017 Conference Paper

Group-wise Deep Co-saliency Detection

  • Lina Wei
  • Shanshan Zhao
  • Omar El Farouk Bourahla
  • Xi Li
  • Fei Wu

In this paper, we propose an end-to-end group-wise deep co-saliency detection approach to address the co-salient object discovery problem based on the fully convolutional network (FCN) with group input and group output. The proposed approach captures the group-wise interaction information for group images by learning a semantics-aware image representation based on a convolutional neural network, which adaptively learns the group-wise features for co-saliency detection. Furthermore, the proposed approach discovers the collaborative and interactive relationships between group-wise feature representation and single-image individual feature representation, and models them in a collaborative learning framework. Finally, we set up a unified end-to-end deep learning scheme to jointly optimize the process of group-wise feature representation learning and the collaborative learning, leading to more reliable and robust co-saliency detection results. Experimental results demonstrate the effectiveness of our approach in comparison with the state-of-the-art approaches.

IJCAI Conference 2016 Conference Paper

Diverse Image Captioning via GroupTalk

  • Zhuhao Wang
  • Fei Wu
  • Weiming Lu
  • Jun Xiao
  • Xi Li
  • Zitong Zhang
  • Yueting Zhuang

Generally speaking, different persons tend to describe images from various aspects due to their individually subjective perception. As a result, generating the appropriate descriptions of images with both diversity and high quality is of great importance. In this paper, we propose a framework called GroupTalk to learn multiple image caption distributions simultaneously and effectively mimic the diversity of the image captions written by human beings. In particular, a novel iterative update strategy is proposed to separate training sentence samples into groups and learn their distributions at the same time. Furthermore, we introduce an efficient classifier to solve the problem brought about by the non-linear and discontinuous nature of language distributions which will impair performance. Experiments on several benchmark datasets show that GroupTalk naturally diversifies the generated captions of each image without sacrificing the accuracy.

IJCAI Conference 2016 Conference Paper

Self-Paced Boost Learning for Classification

  • Te Pi
  • Xi Li
  • Zhongfei Zhang
  • Deyu Meng
  • Fei Wu
  • Jun Xiao
  • Yueting Zhuang

Effectiveness and robustness are two essential aspects of supervised learning studies. For effective learning, ensemble methods are developed to build a strong effective model from ensemble of weak models. For robust learning, self-paced learning (SPL) is proposed to learn in a self-controlled pace from easy samples to complex ones. Motivated by simultaneously enhancing the learning effectiveness and robustness, we propose a unified framework, Self-Paced Boost Learning (SPBL). With an adaptive from-easy-to-hard pace in boosting process, SPBL asymptotically guides the model to focus more on the insufficiently learned samples with higher reliability. Via a max-margin boosting optimization with self-paced sample selection, SPBL is capable of capturing the intrinsic inter-class discriminative patterns while ensuring the reliability of the samples involved in learning. We formulate SPBL as a fully-corrective optimization for classification. The experiments on several real-world datasets show the superiority of SPBL in terms of both effectiveness and robustness.
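A loose sketch of the from-easy-to-hard idea behind SPBL: in each boosting round, only samples whose current loss is below a pace threshold are used to fit the next weak learner, and the threshold grows so that harder samples are admitted over time. The hard 0/1 selection rule, decision stumps, and the exponential loss are simplifying assumptions; the paper's fully-corrective max-margin formulation is not reproduced here.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def self_paced_boost(X, y, rounds: int = 10, pace: float = 0.5, pace_growth: float = 1.3):
    """y in {-1, +1}. Returns a list of (weight, weak_learner) pairs."""
    ensemble, scores = [], np.zeros(len(y), dtype=float)
    for _ in range(rounds):
        losses = np.exp(-y * scores)                 # per-sample exponential loss
        selected = losses < pace                     # self-paced step: keep easy samples
        if not selected.any():
            selected = losses <= np.median(losses)   # fall back to the easier half
        stump = DecisionTreeClassifier(max_depth=1).fit(X[selected], y[selected])
        pred = stump.predict(X)
        err = np.clip(np.mean(pred[selected] != y[selected]), 1e-6, 1 - 1e-6)
        alpha = 0.5 * np.log((1 - err) / err)        # standard boosting weight
        scores += alpha * pred
        ensemble.append((alpha, stump))
        pace *= pace_growth                          # admit harder samples next round
    return ensemble


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = np.where(X[:, 0] + 0.3 * rng.normal(size=200) > 0, 1, -1)
    print(len(self_paced_boost(X, y)), "weak learners trained")
```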

IJCAI Conference 2016 Conference Paper

Semantics-Aware Deep Correspondence Structure Learning for Robust Person Re-Identification

  • Yaqing Zhang
  • Xi Li
  • Liming Zhao
  • Zhongfei Zhang

In this paper, we propose an end-to-end deep correspondence structure learning (DCSL) approach to address the cross-camera person-matching problem in the person re-identification task. The proposed DCSL approach captures the intrinsic structural information on persons by learning a semantics-aware image representation based on convolutional neural networks, which adaptively learns discriminative features for person identification. Furthermore, the proposed DCSL approach seeks to adaptively learn a hierarchical data-driven feature matching function which outputs the matching correspondence results between the learned semantics-aware image representations for a person pair. Finally, we set up a unified end-to-end deep learning scheme to jointly optimize the processes of semantics-aware image representation learning and cross-person correspondence structure learning, leading to more reliable and robust person re-identification results in complicated scenarios. Experimental results on several benchmark datasets demonstrate the effectiveness of our approach against the state-of-the-art approaches.

AAAI Conference 2015 Conference Paper

Metric Learning Driven Multi-Task Structured Output Optimization for Robust Keypoint Tracking

  • Liming Zhao
  • Xi Li
  • Jun Xiao
  • Fei Wu
  • Yueting Zhuang

As an important and challenging problem in computer vision and graphics, keypoint-based object tracking is typically formulated in a spatio-temporal statistical learning framework. However, most existing keypoint trackers are incapable of effectively modeling and balancing the following three aspects in a simultaneous manner: temporal model coherence across frames, spatial model consistency within frames, and discriminative feature construction. To address this issue, we propose a robust keypoint tracker based on spatio-temporal multi-task structured output optimization driven by discriminative metric learning. Consequently, temporal model coherence is characterized by multi-task structured keypoint model learning over several adjacent frames, while spatial model consistency is modeled by solving a geometric verification based structured learning problem. Discriminative feature construction is enabled by metric learning to ensure the intra-class compactness and inter-class separability. Finally, the above three modules are simultaneously optimized in a joint learning scheme. Experimental results have demonstrated the effectiveness of our tracker.

AAAI Conference 2015 Conference Paper

Structured Embedding via Pairwise Relations and Long-Range Interactions in Knowledge Base

  • Fei Wu
  • Jun Song
  • Yi Yang
  • Xi Li
  • Zhongfei Zhang
  • Yueting Zhuang

We consider the problem of embedding entities and relations of knowledge bases into low-dimensional continuous vector spaces (distributed representations). Unlike most existing approaches, which are primarily efficient for modelling pairwise relations between entities, we attempt to explicitly model both pairwise relations and long-range interactions between entities, by interpreting them as linear operators on the low-dimensional embeddings of the entities. Therefore, in this paper we introduce path ranking to capture the long-range interactions of a knowledge graph while preserving its pairwise relations; we call the resulting model structured embedding via pairwise relations and long-range interactions (referred to as SePLi). Compared with the state-of-the-art models, SePLi achieves better embedding performance.

TIST Journal 2013 Journal Article

A survey of appearance models in visual object tracking

  • Xi Li
  • Weiming Hu
  • Chunhua Shen
  • Zhongfei Zhang
  • Anthony Dick
  • Anton van den Hengel

Visual object tracking is a significant computer vision task which can be applied to many domains, such as visual surveillance, human computer interaction, and video compression. Despite extensive research on this topic, it still suffers from difficulties in handling complex object appearance changes caused by factors such as illumination variation, partial occlusion, shape deformation, and camera motion. Therefore, effective modeling of the 2D appearance of tracked objects is a key issue for the success of a visual tracker. In the literature, researchers have proposed a variety of 2D appearance models. To help readers swiftly learn the recent advances in 2D appearance models for visual object tracking, we contribute this survey, which provides a detailed review of the existing 2D appearance models. In particular, this survey takes a module-based architecture that enables readers to easily grasp the key points of visual object tracking. In this survey, we first decompose the problem of appearance modeling into two different processing stages: visual representation and statistical modeling. Then, different 2D appearance models are categorized and discussed with respect to their composition modules. Finally, we address several issues of interest as well as the remaining challenges for future research on this topic. The contributions of this survey are fourfold. First, we review the literature of visual representations according to their feature-construction mechanisms (i.e., local and global). Second, the existing statistical modeling schemes for tracking-by-detection are reviewed according to their model-construction mechanisms: generative, discriminative, and hybrid generative-discriminative. Third, each type of visual representations or statistical modeling techniques is analyzed and discussed from a theoretical or practical viewpoint. Fourth, the existing benchmark resources (e.g., source codes and video datasets) are examined in this survey.

IJCAI Conference 2009 Conference Paper

  • Xi Li
  • Kazuhiro Fukui
  • Nanning Zheng

Object recognition using an image set or video sequence as input tends to be more robust, since an image set or video sequence provides much more information than a single snapshot about the variability in the appearance of the target subject. The Constrained Mutual Subspace Method (CMSM) is one of the state-of-the-art algorithms for image-set based object recognition; it first projects the image-set patterns onto the so-called generalized difference subspace and then classifies them based on the principal-angle-based mutual subspace distance. By treating the subspace bases of each image-set pattern as basic elements on the Grassmann manifold, this paper presents a framework for robust image-set based recognition via CMSM-based ensemble learning in a boosting manner. The proposed Boosting Constrained Mutual Subspace Method (BCMSM) improves the original CMSM in the following ways: a) the proposed BCMSM algorithm is insensitive to the dimension of the generalized difference subspace, whereas the performance of the original CMSM algorithm depends heavily on this dimension and selecting the optimal value is empirical and case-dependent; b) by taking advantage of both boosting and CMSM techniques, the generalization ability is improved and much higher classification performance can be achieved. Extensive experiments on real-life datasets (two face recognition tasks and one 3D object category classification task) show that the proposed method outperforms the previous state-of-the-art algorithms greatly in terms of classification accuracy.