Arrow Research

Author name cluster

Limin Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

32 papers
1 author row

Possible papers

32

AAAI Conference 2026 Conference Paper

Flowing Backwards: Improving Normalizing Flows via Reverse Representation Alignment

  • Yang Chen
  • Xiaowei Xu
  • Shuai Wang
  • Chenhui Zhu
  • Ruxue Wen
  • Xubin Li
  • Tiezheng Ge
  • Limin Wang

Normalizing Flows (NFs) are a class of generative models distinguished by a mathematically invertible architecture, where the forward pass transforms data into a latent space for density estimation, and the reverse pass generates new samples from this space. This characteristic creates an intrinsic synergy between representation learning and data generation. However, the generative quality of standard NFs is limited by the poor semantic representations learned from log-likelihood optimization. To remedy this, we propose a novel alignment strategy that creatively leverages the invertibility of NFs: instead of regularizing the forward pass, we align the intermediate features of the generative (reverse) pass with representations from a powerful vision foundation model, demonstrating superior effectiveness over naive alignment. We also introduce a novel training-free, test-time optimization algorithm for classification, which provides a more intrinsic evaluation of the NF's embedded semantic knowledge. Comprehensive experiments demonstrate that our approach accelerates the training of NFs by over 3.3×, while simultaneously delivering significant improvements in both generative quality and classification accuracy. New state-of-the-art results for NFs are established on ImageNet 64×64 and 256×256.
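
For reference, density estimation in the forward pass rests on the standard change-of-variables identity below; the second formula is a hedged sketch of how a reverse-pass alignment term could enter the training objective. The weight λ, the cosine form, the projection g, and the layer choice ℓ are illustrative assumptions, not the paper's exact loss (φ denotes the frozen vision foundation model).

    % Standard NF log-likelihood (change of variables):
    \log p_X(x) = \log p_Z\big(f_\theta(x)\big)
                + \log\Big|\det \frac{\partial f_\theta(x)}{\partial x}\Big|
    % Hedged sketch of a combined objective with reverse-pass alignment,
    % where h^{rev}_\ell are intermediate features of the reverse pass:
    \mathcal{L}(\theta) = -\mathbb{E}_x\big[\log p_X(x)\big]
        + \lambda\,\mathbb{E}_x\Big[\, 1 - \cos\big(g(h^{\mathrm{rev}}_{\ell}(x)),\ \phi(x)\big) \Big]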

EAAI Journal 2026 Journal Article

Intelligent pose correction of shield machines via an integrated convolutional long short-term memory Kolmogorov-Arnold network and model reference adaptive control

  • Xiangyu Li
  • Xuanyu Liu
  • Limin Wang
  • Yudong Wang
  • He Zhang
  • Yueyang Huang
  • Junzhi Lu

In underground tunnel construction, Earth Pressure Balance shield machines are required to advance along a designed alignment. However, complex geological conditions and equipment-related disturbances often lead to pose deviations, which can compromise construction quality. This study proposes an integrated intelligent pose correction framework that combines pose prediction with adaptive control. First, key input variables are selected through Pearson correlation analysis and denoised using a hybrid Complete Ensemble Empirical Mode Decomposition with Adaptive Noise-wavelet transform method. A pose prediction model is then developed based on a Convolutional Long Short-Term Memory Kolmogorov-Arnold Network (CL-KAN), which replaces the fully connected layers of a Convolutional Neural Network-Long Short-Term Memory (CNN-LSTM) with KAN layers to enhance nonlinear feature representation. Experimental results show that the CL-KAN model achieves high prediction accuracy, with root mean squared error values ranging from 0.88 to 1.68 mm for vertical deviations and coefficients of determination ranging from 0.90 to 0.97 for the key pose parameters. Compared with a baseline CNN-LSTM, the CL-KAN model reduces the root mean squared error by 12.3-18.6% while requiring fewer trainable parameters. To bridge prediction and control, a context-aware perturbation importance analysis (CA-PIA) method is employed to identify influential control features, which subsequently guide the parameter optimization of a model reference adaptive control (MRAC) strategy. Field validation under complex working conditions demonstrates that the proposed framework confines pose deviations within ±7 mm, showing strong robustness and practical applicability for intelligent pose correction in tunnel engineering based on artificial intelligence techniques.
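
As a concrete illustration of the Pearson-based input screening step, here is a minimal Python sketch that keeps only operating-data channels whose absolute correlation with the pose-deviation target clears a threshold. The variable names and the 0.3 cutoff are assumptions for illustration, not values from the paper.

    import numpy as np

    def select_by_pearson(X, y, threshold=0.3):
        """X: (n_samples, n_features) operating records; y: (n_samples,) pose deviation."""
        r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
        keep = np.flatnonzero(np.abs(r) >= threshold)   # indices of retained inputs
        return keep, r[keep]

    # Toy example: feature 3 drives the target, so it survives the screen.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((500, 20))
    y = 0.8 * X[:, 3] + 0.2 * rng.standard_normal(500)
    idx, corrs = select_by_pearson(X, y)
    print(idx, np.round(corrs, 2))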

AAAI Conference 2026 Conference Paper

VideoChat-A1: Thinking with Long Videos by Chain-of-Shot Reasoning

  • Zikang Wang
  • Boyu Chen
  • Zhengrong Yue
  • Yi Wang
  • Yu Qiao
  • Limin Wang
  • Yali Wang

Recent advances in video understanding have been driven by multimodal large language models (MLLMs). These MLLMs are good at analyzing short videos but have difficulty understanding videos with a longer context. To address this difficulty, several agent paradigms have recently been proposed, using MLLMs as agents to retrieve extra contextual knowledge from a long video. However, most existing agents ignore the key fact that a long video is composed of multiple shots; that is, to answer a user question about a long video, it is critical to deeply understand the relevant shots, as a human would. Without such insight, these agents often mistakenly retrieve redundant or even noisy temporal context, restricting their capacity for long video understanding. To fill this gap, we propose VideoChat-A1, a novel long video agent paradigm. Different from previous works, our VideoChat-A1 can deeply think with long videos via a distinct chain-of-shot reasoning paradigm. More specifically, it progressively selects the shots relevant to the user question and looks into these shots in a coarse-to-fine partition. By multi-modal reasoning along the shot chain, VideoChat-A1 can effectively mimic the step-by-step human thinking process, allowing interactive discovery of the preferable temporal context for thoughtful understanding of long videos. Extensive experiments show that VideoChat-A1 achieves state-of-the-art performance on mainstream long video QA benchmarks, e.g., 77.0 on VideoMME (w/ subs) and 70.1 on EgoSchema, outperforming its strong baselines (e.g., InternVL2.5-8B and InternVideo2.5-8B) by up to 10.1% and 6.2%. Compared to the leading closed-source GPT-4o and Gemini 1.5 Pro, VideoChat-A1 offers competitive accuracy with only 7% of the input frames and 12% of the inference time on average.

EAAI Journal 2025 Journal Article

A binary linear predictive evolutionary algorithm with feature analysis for multiobjective feature selection in classification

  • Ting Zhou
  • Limin Wang
  • Xuming Han
  • Zhiquan Liu
  • Minghan Gao

Multiobjective feature selection (MOFS), which aims to obtain a set of Pareto optimal feature subsets by simultaneously maximizing classification accuracy and minimizing the number of selected features, has attracted considerable attention recently. However, most existing studies still face the challenge of locating well-distributed Pareto optimal feature subsets, especially for high-dimensional complex datasets. In response, this paper proposes a binary linear predictive evolutionary algorithm with feature analysis (MBLPE) for MOFS. A feature analysis-based selection method is proposed to carry effective solutions into the next generation. Concretely, two subset evaluation indicators are designed to efficiently handle duplicated feature subsets during evolution. Then, a fitness allocation is constructed to select effective solutions, improving population diversity. Moreover, a Fisher score-based initialization scheme is designed for handling high-dimensional complex datasets. The proposed scheme effectively removes irrelevant and redundant features from the search space by identifying features with strong discriminative power in advance, thereby reducing computational cost. In comparison with seven state-of-the-art algorithms on 18 classification datasets with different characteristics, the proposed MBLPE finds more diverse feature subsets with better convergence when solving MOFS problems.
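
The Fisher score mentioned for initialization ranks each feature by between-class scatter over within-class scatter. Below is a minimal sketch of that score and of biasing an initial binary population toward high-scoring features; the keep ratio and the sampling scheme are illustrative assumptions, not MBLPE's exact procedure.

    import numpy as np

    def fisher_score(X, y):
        """X: (n_samples, n_features); y: integer class labels."""
        mean_all = X.mean(axis=0)
        num = np.zeros(X.shape[1])
        den = np.zeros(X.shape[1])
        for c in np.unique(y):
            Xc = X[y == c]
            num += len(Xc) * (Xc.mean(axis=0) - mean_all) ** 2   # between-class
            den += len(Xc) * Xc.var(axis=0)                      # within-class
        return num / (den + 1e-12)

    def init_population(X, y, pop_size, keep_ratio=0.2, seed=0):
        rng = np.random.default_rng(seed)
        top = np.argsort(fisher_score(X, y))[::-1][: max(1, int(keep_ratio * X.shape[1]))]
        pop = np.zeros((pop_size, X.shape[1]), dtype=bool)
        for ind in pop:   # each individual selects a random subset of top features
            ind[rng.choice(top, size=rng.integers(1, len(top) + 1), replace=False)] = True
        return pop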

NeurIPS Conference 2025 Conference Paper

Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models

  • Guo Chen
  • Zhiqi Li
  • Shihao Wang
  • Jindong Jiang
  • Yicheng Liu
  • Lidong Lu
  • De-An Huang
  • Wonmin Byeon

We introduce Eagle 2.5, a frontier vision-language model (VLM) for long-context multimodal learning. Our work addresses the challenges of long video comprehension and high-resolution image understanding, introducing a generalist framework for both tasks. The proposed training framework incorporates Automatic Degrade Sampling and Image Area Preservation, two techniques that preserve contextual integrity and visual details. The framework also includes numerous efficiency optimizations in the pipeline for long-context data training. Finally, we propose Eagle-Video-110K, a novel dataset that integrates both story-level and clip-level annotations, facilitating long-video understanding. Eagle 2.5 demonstrates substantial improvements on long-context multimodal benchmarks, providing a robust solution to the limitations of existing VLMs. Notably, our best model, Eagle 2.5-8B, achieves 72.4% on Video-MME with 512 input frames, matching the results of top-tier commercial models such as GPT-4o and large-scale open-source models like Qwen2.5-VL-72B and InternVL2.5-78B.

NeurIPS Conference 2025 Conference Paper

LongVPO: From Anchored Cues to Self-Reasoning for Long-Form Video Preference Optimization

  • Zhenpeng Huang
  • Jiaqi Li
  • Zihan Jia
  • Xinhao Li
  • Desen Meng
  • Lingxue Song
  • Xi Chen
  • Liang Li

We present LongVPO, a novel two-stage Direct Preference Optimization (DPO) framework that enables short-context vision-language models to robustly understand ultra-long videos without any long-video annotations. In Stage 1, we synthesize preference triples by anchoring questions to individual short clips, interleaving them with distractors, and applying visual-similarity and question-specificity filtering to mitigate positional bias and ensure unambiguous supervision. We also approximate the reference model's scoring over long contexts by evaluating only the anchor clip, reducing computational overhead. In Stage 2, we employ a recursive captioning pipeline on long videos to generate scene-level metadata, and then use a large language model to craft multi-segment reasoning queries and dispreferred responses, aligning the model's preferences through multi-segment reasoning tasks. With only 16K synthetic examples and no costly human labels, LongVPO outperforms state-of-the-art open-source models on multiple long-video benchmarks while maintaining strong short-video performance (e.g., on MVBench), offering a scalable paradigm for efficient long-form video understanding.
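
For readers unfamiliar with DPO, the sketch below shows the standard preference loss the two stages build on, given summed token log-probabilities from the policy and the frozen reference model; per the abstract, Stage 1 computes the reference scores on the anchor clip only. This is the generic objective, not LongVPO's exact code.

    import torch
    import torch.nn.functional as F

    def dpo_loss(pol_chosen, pol_rejected, ref_chosen, ref_rejected, beta=0.1):
        """Each input: (B,) sequence log-probs for preferred/dispreferred answers."""
        margin = (pol_chosen - ref_chosen) - (pol_rejected - ref_rejected)
        return -F.logsigmoid(beta * margin).mean()

    loss = dpo_loss(torch.tensor([-10.0]), torch.tensor([-14.0]),
                    torch.tensor([-11.0]), torch.tensor([-12.0]))
    print(float(loss))   # ~0.55: the policy already prefers the chosen answer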

NeurIPS Conference 2025 Conference Paper

MotionRAG: Motion Retrieval-Augmented Image-to-Video Generation

  • Chenhui Zhu
  • Yilu Wu
  • Shuai Wang
  • Gangshan Wu
  • Limin Wang

Image-to-video generation has made remarkable progress with the advancements in diffusion models, yet generating videos with realistic motion remains highly challenging. This difficulty arises from the complexity of accurately modeling motion, which involves capturing physical constraints, object interactions, and domain-specific dynamics that are not easily generalized across diverse scenarios. To address this, we propose MotionRAG, a retrieval-augmented framework that enhances motion realism by adapting motion priors from relevant reference videos through Context-Aware Motion Adaptation (CAMA). The key technical innovations include: (i) a retrieval-based pipeline extracting high-level motion features using a video encoder and specialized resamplers to distill semantic motion representations; (ii) an in-context learning approach for motion adaptation implemented through a causal transformer architecture; (iii) an attention-based motion injection adapter that seamlessly integrates transferred motion features into pretrained video diffusion models. Extensive experiments demonstrate that our method achieves significant improvements across multiple domains and various base models, all with negligible computational overhead during inference. Furthermore, our modular design enables zero-shot generalization to new domains by simply updating the retrieval database without retraining any components. This research enhances the core capability of video generation systems by enabling the effective retrieval and transfer of motion priors, facilitating the synthesis of realistic motion dynamics.

NeurIPS Conference 2025 Conference Paper

MUVR: A Multi-Modal Untrimmed Video Retrieval Benchmark with Multi-Level Visual Correspondence

  • Yue Feng
  • Jinwei Hu
  • Qijia Lu
  • Jiawei Niu
  • Li Tan
  • Shuo Yuan
  • Ziyi Yan
  • Yizhen Jia

We propose the Multi-modal Untrimmed Video Retrieval task, along with a new benchmark (MUVR), to advance video retrieval for long-video platforms. MUVR aims to retrieve untrimmed videos containing relevant segments using multi-modal queries. It has the following features: 1) Practical retrieval paradigm: MUVR supports video-centric multi-modal queries, expressing fine-grained retrieval needs through long text descriptions, video tag prompts, and mask prompts. It adopts a one-to-many retrieval paradigm and focuses on untrimmed videos, tailored for long-video platform applications. 2) Multi-level visual correspondence: To cover common video categories (e.g., news, travel, dance) and precisely define retrieval matching criteria, we construct multi-level visual correspondence based on the core video content (e.g., news events, travel locations, dance moves) that users are interested in and want to retrieve. It covers six levels: copy, event, scene, instance, action, and others. 3) Comprehensive evaluation criteria: We develop three versions of MUVR (i.e., Base, Filter, QA). MUVR-Base/Filter evaluates retrieval models, while MUVR-QA assesses MLLMs in a question-answering format. We also propose a Reranking Score to evaluate the reranking ability of MLLMs. MUVR consists of 53K untrimmed videos from the video platform Bilibili, with 1,050 multi-modal queries and 84K matches. Extensive evaluations of 3 state-of-the-art video retrieval models, 6 image-based VLMs, and 10 MLLMs are conducted. MUVR reveals the limitations of retrieval methods in processing untrimmed videos and multi-modal queries, as well as of MLLMs in multi-video understanding and reranking. Our code and benchmark are available at https://github.com/debby-0527/MUVR.

NeurIPS Conference 2025 Conference Paper

StreamForest: Efficient Online Video Understanding with Persistent Event Memory

  • Xiangyu Zeng
  • Kefan Qiu
  • Qingyu Zhang
  • Xinhao Li
  • Jing Wang
  • Jiaxin Li
  • Ziang Yan
  • Kun Tian

Multimodal Large Language Models (MLLMs) have recently achieved remarkable progress in video understanding. However, their effectiveness in real-time streaming scenarios remains limited due to storage constraints on historical visual features and insufficient real-time spatiotemporal reasoning. To address these challenges, we propose StreamForest, a novel architecture specifically designed for streaming video understanding. Central to StreamForest is the Persistent Event Memory Forest, a memory mechanism that adaptively organizes video frames into multiple event-level tree structures. This process is guided by penalty functions based on temporal distance, content similarity, and merge frequency, enabling efficient long-term memory retention under limited computational resources. To enhance real-time perception, we introduce a Fine-grained Spatiotemporal Window, which captures detailed short-term visual cues to improve current scene perception. Additionally, we present OnlineIT, an instruction-tuning dataset tailored for streaming video tasks. OnlineIT significantly boosts MLLM performance in both real-time perception and future prediction. To evaluate generalization in practical applications, we introduce ODV-Bench, a new benchmark focused on real-time streaming video understanding in autonomous driving scenarios. Experimental results demonstrate that StreamForest achieves state-of-the-art performance, with accuracies of 77.3% on StreamingBench, 60.5% on OVBench, and 55.6% on OVO-Bench. In particular, even under extreme visual token compression (limited to 1024 tokens), the model retains 96.8% of its average accuracy across eight benchmarks relative to the default setting. These results underscore the robustness, efficiency, and generalizability of StreamForest for streaming video understanding.
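
A minimal sketch of the kind of penalty-guided merging the Persistent Event Memory Forest performs: adjacent memory nodes are scored on temporal distance, content dissimilarity, and how often they have already been merged, and the lowest-penalty pair is merged until memory fits the budget. The functional form and the weights below are assumptions, not the paper's.

    import torch
    import torch.nn.functional as F

    def merge_penalty(feat_a, feat_b, t_a, t_b, merges_a, merges_b,
                      w_time=1.0, w_sim=1.0, w_freq=0.5):
        dissim = 1.0 - F.cosine_similarity(feat_a, feat_b, dim=-1)   # unlike content
        time_gap = torch.log1p(torch.tensor(float(abs(t_b - t_a))))  # distant frames
        freq = torch.tensor(float(merges_a + merges_b))              # over-merged nodes
        return w_time * time_gap + w_sim * dissim + w_freq * freq

    p = merge_penalty(torch.randn(256), torch.randn(256), t_a=3, t_b=4,
                      merges_a=0, merges_b=1)
    print(float(p))   # lower penalty -> better merge candidate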

NeurIPS Conference 2025 Conference Paper

VideoChat-R1.5: Visual Test-Time Scaling to Reinforce Multimodal Reasoning by Iterative Perception

  • Ziang Yan
  • Yinan He
  • Xinhao Li
  • Zhengrong Yue
  • Xiangyu Zeng
  • Yali Wang
  • Yu Qiao
  • Limin Wang

Inducing reasoning in multimodal large language models (MLLMs) is critical for achieving human-level perception and understanding. Existing methods mainly leverage LLM reasoning to analyze parsed visuals and are often limited by static perception stages. This paper introduces Visual Test-Time Scaling (VTTS), a novel approach to enhance MLLMs' reasoning via iterative perception during inference. VTTS mimics humans' hierarchical attention by progressively refining focus on high-confidence spatio-temporal regions, guided by updated textual predictions. Specifically, VTTS employs an Iterative Perception (ITP) mechanism, incorporating reinforcement learning with spatio-temporal supervision to optimize reasoning. To support this paradigm, we also present VTTS-80K, a dataset tailored for iterative perception. These designs allow an MLLM to enhance its performance by increasing its perceptual compute. Extensive experiments validate VTTS's effectiveness and generalization across diverse tasks and benchmarks. Our newly introduced VideoChat-R1.5 model achieves remarkable improvements, with an average increase of over 5% compared to robust baselines such as Qwen2.5-VL-3B and -7B, across more than 15 benchmarks encompassing video conversation, video reasoning, and spatio-temporal perception.

NeurIPS Conference 2024 Conference Paper

AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation

  • Yuhan Zhu
  • Yuyang Ji
  • Zhiyu Zhao
  • Gangshan Wu
  • Limin Wang

Pre-trained vision-language models (VLMs) have shown impressive results in various visual classification tasks. However, we often fail to fully unleash their potential when adapting them for new concept understanding due to limited information on new classes. To address this limitation, we introduce a novel adaptation framework, AWT (Augment, Weight, then Transport). AWT comprises three key components: augmenting inputs with diverse visual perspectives and enriched class descriptions through image transformations and language models; dynamically weighting inputs based on the prediction entropy; and employing optimal transport to mine semantic correlations in the vision-language space. AWT can be seamlessly integrated into various VLMs, enhancing their zero-shot capabilities without additional training and facilitating few-shot learning through an integrated multimodal adapter module. We verify AWT in multiple challenging scenarios, including zero-shot and few-shot image classification, zero-shot video action recognition, and out-of-distribution generalization. AWT consistently outperforms the state-of-the-art methods in each setting. In addition, our extensive studies further demonstrate AWT's effectiveness and adaptability across different VLMs, architectures, and scales.
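
The entropy-based weighting step can be illustrated in a few lines: augmented views whose predictions have lower entropy (i.e., more confident ones) receive larger weights. The temperature and the softmax-over-negative-entropy form are illustrative assumptions rather than AWT's exact formula.

    import torch

    def entropy_weights(logits_per_view, tau=0.5):
        """logits_per_view: (V, C) class logits for V augmented views of one input."""
        probs = logits_per_view.softmax(dim=-1)
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)  # (V,)
        return torch.softmax(-entropy / tau, dim=0)  # confident views weigh more

    weights = entropy_weights(torch.randn(8, 100))
    print(weights.sum())   # tensor(1.) -- a convex combination over views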

NeurIPS Conference 2024 Conference Paper

Does Video-Text Pretraining Help Open-Vocabulary Online Action Detection?

  • Qingsong Zhao
  • Yi Wang
  • Jilan Xu
  • Yinan He
  • Zifan Song
  • Limin Wang
  • Yu Qiao
  • Cairong Zhao

Video understanding relies on accurate action detection for temporal analysis. However, existing mainstream methods have limitations in real-world applications due to their offline and closed-set evaluation approaches, as well as their dependence on manual annotations. To address these challenges and enable real-time action understanding in open-world scenarios, we propose OV-OAD, a zero-shot online action detector that leverages vision-language models and learns solely from text supervision. By introducing an object-centered decoder unit into a Transformer-based model, we aggregate frames with similar semantics using video-text correspondence. Extensive experiments on four action detection benchmarks demonstrate that OV-OAD outperforms other advanced zero-shot methods. Specifically, it achieves 37.5% mean average precision on THUMOS'14 and 73.8% calibrated average precision on TVSeries. This research establishes a robust baseline for zero-shot transfer in online action detection, enabling scalable solutions for open-world temporal understanding. The code will be available for download at https://github.com/OpenGVLab/OV-OAD.

NeurIPS Conference 2024 Conference Paper

Exploring DCN-like architecture for fast image generation with arbitrary resolution

  • Shuai Wang
  • Zexian Li
  • Tianhui Song
  • Xubin Li
  • Tiezheng Ge
  • Bo Zheng
  • Limin Wang

Arbitrary-resolution image generation remains a challenging task in AIGC, as it requires handling varying resolutions and aspect ratios while maintaining high visual quality. Existing transformer-based diffusion methods suffer from quadratic computation cost and limited resolution extrapolation capabilities, making them less effective for this task. In this paper, we propose FlowDCN, a purely convolution-based generative model with linear time and memory complexity that can efficiently generate high-quality images at arbitrary resolutions. Equipped with a new design of learnable group-wise deformable convolution block, our FlowDCN yields higher flexibility and capability to handle different resolutions with a single model. FlowDCN achieves a state-of-the-art 4.30 sFID on the 256×256 ImageNet benchmark and comparable resolution extrapolation results, surpassing transformer-based counterparts in terms of convergence speed (only 1/5 of the images), visual quality, parameters (8% reduction), and FLOPs (20% reduction). We believe FlowDCN offers a promising solution to scalable and flexible image synthesis.
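
As a rough illustration of the deformable-convolution building block, the sketch below predicts per-location sampling offsets and applies torchvision's deform_conv2d. FlowDCN's group-wise design and other details are not reproduced; this is a generic deformable block under assumed dimensions.

    import torch
    import torch.nn as nn
    from torchvision.ops import deform_conv2d

    class DeformBlock(nn.Module):
        def __init__(self, channels, k=3):
            super().__init__()
            self.k = k
            self.weight = nn.Parameter(torch.randn(channels, channels, k, k) * 0.02)
            # Predict (dy, dx) offsets for each of the k*k sampling locations.
            self.offset = nn.Conv2d(channels, 2 * k * k, kernel_size=3, padding=1)

        def forward(self, x):
            return deform_conv2d(x, self.offset(x), self.weight, padding=self.k // 2)

    x = torch.randn(1, 32, 16, 16)   # convolutional, so any resolution works
    print(DeformBlock(32)(x).shape)  # torch.Size([1, 32, 16, 16])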

NeurIPS Conference 2024 Conference Paper

VFIMamba: Video Frame Interpolation with State Space Models

  • Guozhen Zhang
  • Chunxu Liu
  • Yutao Cui
  • Xiaotong Zhao
  • Kai Ma
  • Limin Wang

Inter-frame modeling is pivotal in generating intermediate frames for video frame interpolation (VFI). Current approaches predominantly rely on convolution or attention-based models, which often either lack sufficient receptive fields or entail significant computational overheads. Recently, Selective State Space Models (S6) have emerged, tailored specifically for long sequence modeling, offering both linear complexity and data-dependent modeling capabilities. In this paper, we propose VFIMamba, a novel frame interpolation method for efficient and dynamic inter-frame modeling by harnessing the S6 model. Our approach introduces the Mixed-SSM Block (MSB), which initially rearranges tokens from adjacent frames in an interleaved fashion and subsequently applies multi-directional S6 modeling. This design facilitates the efficient transmission of information across frames while upholding linear complexity. Furthermore, we introduce a novel curriculum learning strategy that progressively cultivates proficiency in modeling inter-frame dynamics across varying motion magnitudes, fully unleashing the potential of the S6 model. Experimental findings showcase that our method attains state-of-the-art performance across diverse benchmarks, particularly excelling in high-resolution scenarios. In particular, on the X-TEST dataset, VFIMamba demonstrates a noteworthy improvement of 0.80 dB for 4K frames and 0.96 dB for 2K frames.
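
The interleaved rearrangement at the heart of the MSB can be shown in a few lines: tokens from the two adjacent frames are alternated along the sequence before multi-directional S6 scanning, so inter-frame dependencies become neighboring positions. Shapes here are illustrative.

    import torch

    def interleave_frames(tok_a, tok_b):
        """tok_a, tok_b: (B, L, C) token sequences from two adjacent frames."""
        B, L, C = tok_a.shape
        mixed = torch.stack([tok_a, tok_b], dim=2)   # (B, L, 2, C)
        return mixed.reshape(B, 2 * L, C)            # a0, b0, a1, b1, ...

    a = torch.arange(6.0).reshape(1, 3, 2)
    b = -a
    print(interleave_frames(a, b)[0])   # rows alternate between the two frames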

AAAI Conference 2023 Conference Paper

CoMAE: Single Model Hybrid Pre-training on Small-Scale RGB-D Datasets

  • Jiange Yang
  • Sheng Guo
  • Gangshan Wu
  • Limin Wang

Current RGB-D scene recognition approaches often train two standalone backbones for RGB and depth modalities with the same Places or ImageNet pre-training. However, the pre-trained depth network is still biased by RGB-based models, which may result in a suboptimal solution. In this paper, we present a single-model self-supervised hybrid pre-training framework for RGB and depth modalities, termed CoMAE. Our CoMAE presents a curriculum learning strategy to unify the two popular self-supervised representation learning algorithms: contrastive learning and masked image modeling. Specifically, we first build a patch-level alignment task to pre-train a single encoder shared by the two modalities via cross-modal contrastive learning. Then, the pre-trained contrastive encoder is passed to a multi-modal masked autoencoder to capture finer context features from a generative perspective. In addition, our single-model design, which requires no fusion module, is very flexible and robust in generalizing to unimodal scenarios in both training and testing phases. Extensive experiments on the SUN RGB-D and NYUDv2 datasets demonstrate the effectiveness of our CoMAE for RGB and depth representation learning. In addition, our experimental results reveal that CoMAE is a data-efficient representation learner. Although we only use the small-scale and unlabeled training set for pre-training, our CoMAE pre-trained models are still competitive with state-of-the-art methods that use extra large-scale, supervised RGB dataset pre-training. Code will be released at https://github.com/MCG-NJU/CoMAE.

NeurIPS Conference 2023 Conference Paper

JourneyDB: A Benchmark for Generative Image Understanding

  • Keqiang Sun
  • Junting Pan
  • Yuying Ge
  • Hao Li
  • Haodong Duan
  • Xiaoshi Wu
  • Renrui Zhang
  • Aojun Zhou

While recent advancements in vision-language models have had a transformative impact on multi-modal comprehension, the extent to which these models can comprehend generated images remains uncertain. Synthetic images, in comparison to real data, encompass a higher level of diversity in both content and style, thereby presenting significant challenges for the models to fully grasp. In light of this challenge, we introduce a comprehensive dataset, referred to as JourneyDB, that caters to the domain of generative images within the context of multi-modal visual understanding. Our meticulously curated dataset comprises 4 million distinct and high-quality generated images, each paired with the text prompt that was employed in its creation. Furthermore, we introduce an external subset with results from another 22 text-to-image generative models, which makes JourneyDB a comprehensive benchmark for evaluating the comprehension of generated images. On our dataset, we have devised four benchmarks to assess generated-image comprehension in relation to both content and style interpretation: prompt inversion, style retrieval, image captioning, and visual question answering. Lastly, we evaluate the performance of state-of-the-art multi-modal models on JourneyDB, providing a comprehensive analysis of their strengths and limitations in comprehending generated content. We anticipate that the proposed dataset and benchmarks will facilitate further research in the field of generative content understanding. The dataset is publicly available at https://journeydb.github.io.

NeurIPS Conference 2023 Conference Paper

MixFormerV2: Efficient Fully Transformer Tracking

  • Yutao Cui
  • Tianhui Song
  • Gangshan Wu
  • Limin Wang

Transformer-based trackers have achieved strong accuracy on the standard benchmarks. However, their efficiency remains an obstacle to practical deployment on both GPU and CPU platforms. In this paper, to overcome this issue, we propose a fully transformer tracking framework, coined MixFormerV2, without any dense convolutional operation or complex score prediction module. Our key design is to introduce four special prediction tokens and concatenate them with the tokens from the target template and search areas. Then, we apply the unified transformer backbone to this mixed token sequence. These prediction tokens are able to capture the complex correlation between target template and search area via mixed attentions. Based on them, we can easily predict the tracking box and estimate its confidence score through simple MLP heads. To further improve the efficiency of MixFormerV2, we present a new distillation-based model reduction paradigm, including dense-to-sparse distillation and deep-to-shallow distillation. The former transfers knowledge from the dense-head MixViT to our fully transformer tracker, while the latter prunes some layers of the backbone. We instantiate two types of MixFormerV2: MixFormerV2-B achieves an AUC of 70.6% on LaSOT and an AUC of 56.7% on TNL2k with a high GPU speed of 165 FPS, and MixFormerV2-S surpasses FEAR-L by 2.7% AUC on LaSOT at real-time CPU speed.
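
A toy sketch of the prediction-token idea: learnable tokens are concatenated with template and search tokens, processed by a shared transformer, and read out by plain MLP heads. The tiny nn.TransformerEncoder backbone and all dimensions here are stand-in assumptions, not MixFormerV2's architecture.

    import torch
    import torch.nn as nn

    class TinyTracker(nn.Module):
        def __init__(self, dim=256, n_pred=4):
            super().__init__()
            self.pred_tokens = nn.Parameter(torch.zeros(1, n_pred, dim))
            layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
            self.backbone = nn.TransformerEncoder(layer, num_layers=2)
            self.box_head = nn.Linear(n_pred * dim, 4)     # (cx, cy, w, h)
            self.score_head = nn.Linear(n_pred * dim, 1)   # confidence

        def forward(self, template_tok, search_tok):       # assumes n_pred=4
            B = template_tok.size(0)
            tokens = torch.cat([self.pred_tokens.expand(B, -1, -1),
                                template_tok, search_tok], dim=1)
            out = self.backbone(tokens)[:, :4].flatten(1)  # read back pred tokens
            return self.box_head(out).sigmoid(), self.score_head(out).sigmoid()

    box, score = TinyTracker()(torch.randn(2, 64, 256), torch.randn(2, 256, 256))
    print(box.shape, score.shape)   # torch.Size([2, 4]) torch.Size([2, 1])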

AAAI Conference 2022 Conference Paper

DCAN: Improving Temporal Action Detection via Dual Context Aggregation

  • Guo Chen
  • Yin-Dong Zheng
  • Limin Wang
  • Tong Lu

Temporal action detection aims to locate the boundaries of actions in video. Current methods based on boundary matching enumerate and calculate all possible boundary matchings to generate proposals. However, these methods neglect long-range context aggregation in boundary prediction. At the same time, due to the similar semantics of adjacent matchings, local semantic aggregation of densely generated matchings cannot improve semantic richness and discrimination. In this paper, we propose an end-to-end proposal generation method named Dual Context Aggregation Network (DCAN) that aggregates context at two levels, namely the boundary level and the proposal level, to generate high-quality action proposals and thereby improve the performance of temporal action detection. Specifically, we design Multi-Path Temporal Context Aggregation (MTCA) to achieve smooth context aggregation at the boundary level and precise evaluation of boundaries. For matching evaluation, Coarse-to-fine Matching (CFM) is designed to aggregate context at the proposal level and refine the matching map from coarse to fine. We conduct extensive experiments on ActivityNet v1.3 and THUMOS-14. DCAN obtains an average mAP of 35.39% on ActivityNet v1.3 and reaches 54.14% mAP at IoU@0.5 on THUMOS-14, which demonstrates that DCAN can generate high-quality proposals and achieve state-of-the-art performance. We release the code at https://github.com/cg1177/DCAN.

JBHI Journal 2022 Journal Article

Joint Landmark and Structure Learning for Automatic Evaluation of Developmental Dysplasia of the Hip

  • Xindi Hu
  • Limin Wang
  • Xin Yang
  • Xu Zhou
  • Wufeng Xue
  • Yan Cao
  • Shengfeng Liu
  • Yuhao Huang

The ultrasound (US) screening of the infant hip is vital for the early diagnosis of developmental dysplasia of the hip (DDH). The US diagnosis of DDH involves measuring alpha and beta angles that quantify hip joint development. These two angles are calculated from key anatomical landmarks and structures of the hip. However, this measurement process is not trivial for sonographers and usually requires a thorough understanding of complex anatomical structures. In this study, we propose a multi-task framework to learn the relationships among landmarks and structures jointly and automatically evaluate DDH. Our multi-task networks are equipped with three novel modules. Firstly, we adopt Mask R-CNN as the basic framework to detect and segment key anatomical structures and add one landmark detection branch to form a new multi-task framework. Secondly, we propose a novel shape similarity loss to refine the incomplete anatomical structure prediction robustly and accurately. Thirdly, we further incorporate the landmark-structure consistency prior to ensure the consistency of the bony rim estimated from the segmented structure and the detected landmark. In our experiments, 1231 US images of the infant hip from 632 patients are collected, of which 247 images from 126 patients are used for testing. The average errors in the alpha and beta angles are 2.221° and 2.899°. About 93% of alpha-angle and 85% of beta-angle estimates have errors of less than 5 degrees. Experimental results demonstrate that the proposed method can accurately and robustly realize the automatic evaluation of DDH, showing great potential for clinical application.

AAAI Conference 2022 Conference Paper

Negative Sample Matters: A Renaissance of Metric Learning for Temporal Grounding

  • Zhenzhi Wang
  • Limin Wang
  • Tao Wu
  • Tianhao Li
  • Gangshan Wu

Temporal grounding aims to localize a video moment that is semantically aligned with a given natural language query. Existing methods typically apply a detection or regression pipeline on the fused representation, with the research focus on designing complicated prediction heads or fusion strategies. Instead, viewing temporal grounding as a metric-learning problem, we present a Mutual Matching Network (MMN) to directly model the similarity between language queries and video moments in a joint embedding space. This new metric-learning framework enables fully exploiting negative samples from two new aspects: constructing negative cross-modal pairs in a mutual matching scheme and mining negative pairs across different videos. These new negative samples enhance the joint representation learning of the two modalities via cross-modal mutual matching to maximize their mutual information. Experiments show that our MMN achieves highly competitive performance compared with state-of-the-art methods on four video grounding benchmarks. Based on MMN, we present the winning solution for the HC-STVG challenge of the 3rd PIC workshop. This suggests that metric learning is still a promising method for temporal grounding, via capturing the essential cross-modal correlation in a joint embedding space. Code is available at https://github.com/MCG-NJU/MMN.
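
The cross-video negative mining the abstract credits can be illustrated with a symmetric InfoNCE-style loss over a batch: every non-matching moment-query pair acts as a negative in the joint embedding space. This shows the general recipe, not necessarily MMN's exact formulation.

    import torch
    import torch.nn.functional as F

    def mutual_matching_loss(moment_emb, query_emb, tau=0.07):
        """moment_emb, query_emb: (B, D); row i of each forms the matching pair."""
        m = F.normalize(moment_emb, dim=-1)
        q = F.normalize(query_emb, dim=-1)
        sim = m @ q.t() / tau                 # (B, B) cross-modal similarities
        labels = torch.arange(sim.size(0))
        # Off-diagonal entries are negatives mined across videos in the batch.
        return 0.5 * (F.cross_entropy(sim, labels) + F.cross_entropy(sim.t(), labels))

    print(float(mutual_matching_loss(torch.randn(8, 128), torch.randn(8, 128))))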

NeurIPS Conference 2022 Conference Paper

PointTAD: Multi-Label Temporal Action Detection with Learnable Query Points

  • Jing Tan
  • Xiaotong Zhao
  • Xintian Shi
  • Bin Kang
  • Limin Wang

Traditional temporal action detection (TAD) usually handles untrimmed videos with a small number of action instances from a single label (e.g., ActivityNet, THUMOS). However, this setting can be unrealistic, as different classes of actions often co-occur in practice. In this paper, we focus on the task of multi-label temporal action detection, which aims to localize all action instances in a multi-label untrimmed video. Multi-label TAD is more challenging, as it requires fine-grained class discrimination within a single video and precise localization of co-occurring instances. To address this, we extend the sparse query-based detection paradigm from traditional TAD and propose the multi-label TAD framework PointTAD. Specifically, our PointTAD introduces a small set of learnable query points to represent the important frames of each action instance. This point-based representation provides a flexible mechanism to localize the discriminative frames at boundaries as well as the important frames inside the action. Moreover, we perform the action decoding process with a Multi-level Interactive Module to capture both point-level and instance-level action semantics. Finally, our PointTAD employs an end-to-end trainable framework based simply on RGB input for easy deployment. We evaluate the proposed method on two popular benchmarks and introduce the new metric of detection-mAP for multi-label TAD. Our model outperforms all previous methods by a large margin under the detection-mAP metric and also achieves promising results under the segmentation-mAP metric.

NeurIPS Conference 2022 Conference Paper

VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training

  • Zhan Tong
  • Yibing Song
  • Jue Wang
  • Limin Wang

Pre-training video transformers on extra large-scale datasets is generally required to achieve premier performance on relatively small datasets. In this paper, we show that video masked autoencoders (VideoMAE) are data-efficient learners for self-supervised video pre-training (SSVP). We are inspired by the recent ImageMAE and propose customized video tube masking with an extremely high ratio. This simple design makes video reconstruction a more challenging and meaningful self-supervision task, thus encouraging the extraction of more effective video representations during pre-training. We obtain three important findings with VideoMAE: (1) An extremely high masking ratio (i.e., 90% to 95%) still yields favorable performance for VideoMAE. The temporally redundant video content enables a higher masking ratio than for images. (2) VideoMAE achieves impressive results on very small datasets (i.e., around 3k-4k videos) without using any extra data. This is partially ascribed to the challenging task of video reconstruction enforcing high-level structure learning. (3) VideoMAE shows that data quality is more important than data quantity for SSVP. Domain shift between pre-training and target datasets is an important factor. Notably, our VideoMAE with the vanilla ViT backbone achieves 87.4% on Kinetics-400, 75.4% on Something-Something V2, 91.3% on UCF101, and 62.6% on HMDB51, without using any extra data. Code is available at https://github.com/MCG-NJU/VideoMAE.
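
The tube-masking design is simple enough to sketch directly: one spatial patch mask is sampled and repeated across all time steps, so a masked patch stays hidden in every frame and cannot be recovered from temporally redundant neighbors. The patch-grid sizes below are illustrative.

    import torch

    def tube_mask(batch, frames, h_patches, w_patches, ratio=0.9):
        n = h_patches * w_patches
        n_mask = int(n * ratio)
        mask = torch.zeros(batch, n, dtype=torch.bool)
        for b in range(batch):
            mask[b, torch.randperm(n)[:n_mask]] = True
        # Repeat the same spatial mask along time -> space-time "tubes".
        return mask.unsqueeze(1).expand(batch, frames, n).reshape(batch, -1)

    m = tube_mask(2, 8, 14, 14)
    print(m.shape, round(m.float().mean().item(), 3))   # (2, 1568), ~0.9 masked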

TCS Journal 2021 Journal Article

Approximation algorithms for the dynamic k-level facility location problems

  • Limin Wang
  • Zhao Zhang
  • Chenchen Wu
  • Dachuan Xu
  • Xiaoyan Zhang

In this paper, we first consider a dynamic k-level facility location problem, which generalizes the k-level facility location problem by taking the time factor into account. We present a combinatorial primal-dual approximation algorithm for this problem that finds a constant-factor approximate solution. Then, we investigate the dynamic k-level facility location problem with submodular penalties and outliers, which extends the existing problem on two fronts: from static to dynamic, and from disallowing penalties (outliers) to allowing them. Based on the primal-dual technique and the triangle inequality property, we also give constant-factor approximation algorithms for the dynamic problem with submodular penalties and with outliers, respectively.
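
For orientation, one common way to write the dynamic k-level objective adds a time index t to the static formulation: in each period, clients are served through a path p of open facilities, one per level. The notation below is illustrative, not the paper's exact model.

    % f_i^t: cost of opening facility i in period t; y_i^t: opening indicator.
    % c_{jp}^t: cost of connecting client j via facility path p in period t.
    \min \sum_{t}\sum_{i} f_i^t\, y_i^t
       + \sum_{t}\sum_{j \in D^t}\sum_{p} c_{jp}^t\, x_{jp}^t
    \quad \text{s.t.}\quad
    \sum_{p} x_{jp}^t \ge 1 \ \ \forall j \in D^t,\ t; \qquad
    x_{jp}^t \le y_i^t \ \ \forall j,\ t,\ p \ni i .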

IS Journal 2021 Journal Article

Saliency Detection With a Three-Stage Hierarchical Network

  • Dongjing Shan
  • Xiongwei Zhang
  • Limin Wang
  • Tieyong Cao
  • Chao Zhang

Deep learning approaches for saliency detection have attracted much attention and have been widely exploited in recent years. In this article, we propose a three-stage hierarchical neural network to model the detection. Initially, Fast R-CNN is used to extract features for each superpixel, and the high-level prior information of traditional models is incorporated to weight the deep learning features. Next, in the regional stage, a self-attention mechanism is used to expand the receptive field from one superpixel to its surrounding and relevant regions. Last, saliency scores are sampled via the Gumbel-Softmax trick in a global regression model. In the experiments, we compare our models, including two variations (networks without self-attention or prior weights), with 12 previous methods and test them on several benchmark datasets. Different kinds of evaluation strategies are also adopted, and the results demonstrate that our method achieves excellent performance.
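
The Gumbel-Softmax trick used in the global regression stage can be sketched in a few lines: Gumbel noise added to the logits plus a temperature-scaled softmax yields differentiable approximate samples (PyTorch also ships this as torch.nn.functional.gumbel_softmax). The two-class per-superpixel setup below is illustrative.

    import torch
    import torch.nn.functional as F

    def gumbel_softmax_sample(logits, tau=1.0):
        # Sample Gumbel(0, 1) noise via -log(-log(U)), then relax the argmax.
        gumbel = -torch.log(-torch.log(torch.rand_like(logits) + 1e-20) + 1e-20)
        return F.softmax((logits + gumbel) / tau, dim=-1)

    logits = torch.randn(5, 2)              # 5 superpixels, salient vs. not
    print(gumbel_softmax_sample(logits))    # differentiable "samples"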

AAAI Conference 2020 Conference Paper

Finding Action Tubes with a Sparse-to-Dense Framework

  • Yuxi Li
  • Weiyao Lin
  • Tao Wang
  • John See
  • Rui Qian
  • Ning Xu
  • Limin Wang
  • Shugong Xu

The task of spatial-temporal action detection has attracted increasing attention among researchers. Existing dominant methods solve this problem by relying on short-term information and dense serial-wise detection on individual frames or clips. Despite their effectiveness, these methods make inadequate use of long-term information and are prone to inefficiency. In this paper, we propose, for the first time, an efficient framework that generates action tube proposals from video streams with a single forward pass in a sparse-to-dense manner. There are two key characteristics in this framework: (1) both long-term and short-term sampled information are explicitly utilized in our spatiotemporal network, and (2) a new dynamic feature sampling module (DTS) is designed to effectively approximate the tube output while keeping the system tractable. We evaluate the efficacy of our model on the UCF101-24, JHMDB-21 and UCFSports benchmark datasets, achieving promising results that are competitive with state-of-the-art methods. The proposed sparse-to-dense strategy renders our framework about 7.6 times more efficient than the nearest competitor.

AAAI Conference 2020 Conference Paper

Knowledge Integration Networks for Action Recognition

  • Shiwen Zhang
  • Sheng Guo
  • Limin Wang
  • Weilin Huang
  • Matthew Scott

In this work, we propose Knowledge Integration Networks (referred to as KINet) for video action recognition. KINet is capable of aggregating meaningful context features that are of great importance for identifying an action, such as human information and scene context. We design a three-branch architecture consisting of a main branch for action recognition and two auxiliary branches for human parsing and scene recognition, which allow the model to encode knowledge of humans and scenes for action recognition. We explore two pre-trained models as teacher networks to distill human and scene knowledge for training the auxiliary tasks of KINet. Furthermore, we propose a two-level knowledge encoding mechanism which contains a Cross Branch Integration (CBI) module for encoding the auxiliary knowledge into medium-level convolutional features, and an Action Knowledge Graph (AKG) for effectively fusing high-level context information. This results in an end-to-end trainable framework where the three tasks can be trained collaboratively, allowing the model to compute strong context knowledge efficiently. The proposed KINet achieves state-of-the-art performance on the large-scale action recognition benchmark Kinetics-400, with a top-1 accuracy of 77.8%. We further demonstrate the strong transferability of KINet: the Kinetics-trained model obtains 97.8% top-1 accuracy on UCF-101.

AAAI Conference 2020 Conference Paper

TEINet: Towards an Efficient Architecture for Video Recognition

  • Zhaoyang Liu
  • Donghao Luo
  • Yabiao Wang
  • Limin Wang
  • Ying Tai
  • Chengjie Wang
  • Jilin Li
  • Feiyue Huang

Efficiency is an important issue in designing video architectures for action recognition. 3D CNNs have witnessed remarkable progress in action recognition from videos. However, compared with their 2D counterparts, 3D convolutions often introduce a large number of parameters and cause high computational cost. To relieve this problem, we propose an efficient temporal module, termed the Temporal Enhancement-and-Interaction (TEI) module, which can be plugged into existing 2D CNNs (the resulting network is denoted TEINet). The TEI module presents a different paradigm for learning temporal features by decoupling the modeling of channel correlation and temporal interaction. First, it contains a Motion Enhanced Module (MEM), which enhances motion-related features while suppressing irrelevant information (e.g., background). Then, it introduces a Temporal Interaction Module (TIM), which supplements temporal contextual information in a channel-wise manner. This two-stage modeling scheme is not only able to capture temporal structure flexibly and effectively but is also efficient for model inference. We conduct extensive experiments to verify the effectiveness of TEINet on several benchmarks (e.g., Something-Something V1&V2, Kinetics, UCF101 and HMDB51). Our proposed TEINet achieves good recognition accuracy on these datasets while preserving high efficiency.
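
A minimal sketch in the spirit of the channel-wise temporal interaction: a depthwise 1D convolution along the time axis supplements each channel with context from neighboring frames at low cost. The kernel size and placement are assumptions, not TEINet's exact module.

    import torch
    import torch.nn as nn

    class TemporalInteraction(nn.Module):
        def __init__(self, channels, k=3):
            super().__init__()
            # groups=channels -> one temporal filter per channel (channel-wise).
            self.conv = nn.Conv1d(channels, channels, k, padding=k // 2, groups=channels)

        def forward(self, x):                       # x: (B, T, C, H, W)
            B, T, C, H, W = x.shape
            y = x.permute(0, 3, 4, 2, 1).reshape(B * H * W, C, T)
            return self.conv(y).reshape(B, H, W, C, T).permute(0, 4, 3, 1, 2)

    x = torch.randn(2, 8, 16, 7, 7)
    print(TemporalInteraction(16)(x).shape)         # torch.Size([2, 8, 16, 7, 7])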

IJCAI Conference 2019 Conference Paper

Dynamically Visual Disambiguation of Keyword-based Image Search

  • Yazhou Yao
  • Zeren Sun
  • Fumin Shen
  • Li Liu
  • Limin Wang
  • Fan Zhu
  • Lizhong Ding
  • Gangshan Wu

Due to the high cost of manual annotation, learning directly from the web has attracted broad attention. One issue that limits the performance of such methods is visual polysemy. To address this issue, we present an adaptive multi-model framework that resolves polysemy by visual disambiguation. Compared to existing methods, the primary advantage of our approach is that it can adapt to dynamic changes in the search results. Our proposed framework consists of two major steps: we first discover and dynamically select text queries according to the image search results, then we employ the proposed saliency-guided deep multi-instance learning network to remove outliers and learn classification models for visual disambiguation. Extensive experiments demonstrate the superiority of our proposed approach.

AAAI Conference 2019 Conference Paper

StNet: Local and Global Spatial-Temporal Modeling for Action Recognition

  • Dongliang He
  • Zhichao Zhou
  • Chuang Gan
  • Fu Li
  • Xiao Liu
  • Yandong Li
  • Limin Wang
  • Shilei Wen

Despite the success of deep learning for static image understanding, it remains unclear what the most effective network architectures are for spatial-temporal modeling in videos. In this paper, in contrast to existing CNN+RNN or pure 3D convolution based approaches, we explore a novel spatial-temporal network (StNet) architecture for both local and global modeling in videos. Particularly, StNet stacks N successive video frames into a super-image with 3N channels and applies 2D convolution on super-images to capture local spatial-temporal relationships. To model global spatial-temporal structure, we apply temporal convolution on the local spatial-temporal feature maps. Specifically, a novel temporal Xception block is proposed in StNet, which employs separate channel-wise and temporal-wise convolutions over the feature sequence of a video. Extensive experiments on the Kinetics dataset demonstrate that our framework outperforms several state-of-the-art approaches in action recognition and can strike a satisfying trade-off between recognition accuracy and model complexity. We further demonstrate the generalization performance of the learned video representations on the UCF101 dataset.
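
The super-image construction is concrete enough to sketch: N consecutive RGB frames are stacked channel-wise into a single 3N-channel tensor, so an ordinary 2D convolution mixes local spatial-temporal information. The frame counts and conv configuration below are illustrative.

    import torch
    import torch.nn as nn

    B, T, N = 2, 12, 3                        # T frames grouped into T/N super-images
    frames = torch.randn(B, T, 3, 224, 224)
    supers = frames.reshape(B, T // N, 3 * N, 224, 224)   # (B, T/N, 3N, H, W)

    conv2d = nn.Conv2d(3 * N, 64, kernel_size=7, stride=2, padding=3)
    feats = conv2d(supers.flatten(0, 1))      # 2D conv over each super-image
    print(feats.shape)                        # torch.Size([8, 64, 112, 112])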