
Author name cluster

Xiaodong Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

17 papers
2 author rows

Possible papers


AAAI Conference 2026 Conference Paper

Fine-flow Distilling Coarse-flow Video Generation for Long-Term Driving World Model

  • Xiaodong Wang
  • Zhirong Wu
  • Peixi Peng

Driving world models simulate possible futures by generating videos conditioned on the current state and actions. However, current models often suffer from severe error accumulation when predicting the long-term future, which limits their practical application. Recent studies use the Diffusion Transformer (DiT) as the backbone of driving world models to improve learning flexibility. However, these models are always trained on short video clips, and multi-step roll-out generation struggles to produce consistent and plausible long videos due to the training-inference gap. To this end, we propose several solutions to build a simple yet effective long-term driving world model. First, we hierarchically decouple world model learning into large motion learning and bidirectional continuous motion learning. Then, considering the continuity of driving scenes, we propose a simple distillation method in which fine-grained video flows serve as self-supervised signals for coarse-grained flows. The distillation is designed to improve the coherence of infinite video generation. The coarse-grained and fine-grained modules are coordinated to generate long-term, temporally coherent videos. On NuScenes, compared with state-of-the-art front-view models, our model improves FVD by 27% and reduces inference time by 85% for the task of generating videos of 110+ frames.

AAAI Conference 2026 Conference Paper

LiViBench: An Omnimodal Benchmark for Interactive Livestream Video Understanding

  • Xiaodong Wang
  • Langling Huang
  • Zhirong Wu
  • Xu Zhao
  • Teng Xu
  • Xuhong Xia
  • Peixi Peng

The development of multimodal large language models (MLLMs) has advanced general video understanding. However, existing video evaluation benchmarks primarily focus on non-interactive videos, such as movies and recordings. To fill this gap, this paper proposes the first omnimodal benchmark for interactive livestream videos, LiViBench. It features a diverse set of 24 tasks, highlighting perceptual, reasoning, and livestream-specific challenges. To construct the dataset efficiently, we design a standardized semi-automatic annotation workflow that incorporates human-in-the-loop review at multiple stages. The workflow leverages multiple MLLMs to form a multi-agent system for comprehensive video description and uses a seed-question-driven method to construct high-quality annotations. All interactive videos in the benchmark include audio, speech, and real-time comment modalities. To enhance models' understanding of interactive videos, we design a tailored two-stage instruction-tuning scheme and propose a Video-to-Comment Retrieval (VCR) module to improve the model's ability to utilize real-time comments. Based on these advancements, we develop LiVi-LLM-7B, an MLLM with enhanced knowledge of interactive livestreams. Experiments show that our model outperforms larger open-source models with up to 72B parameters, narrows the gap with leading proprietary models on LiViBench, and achieves enhanced performance on general video benchmarks, including VideoMME, LongVideoBench, MLVU, and VideoEval-Pro.

EAAI Journal 2025 Journal Article

Integration of in-wheel motor sensorless systems and hierarchical direct yaw moment control for distributed drive electric vehicles

  • Xiaodong Wang
  • Maoping Ran
  • Xinglin Zhou

Ensuring robust and reliable control of distributed vehicles powered by in-wheel motor systems poses a significant challenge due to the harsh operating environments and high costs of such motor systems. Poor motor control, parameter variations, and sensor malfunction under these conditions can compromise the vehicle yaw stability. Integrating permanent magnet synchronous motor (PMSM) sensorless systems with vehicle yaw moment control offers a cost-effective solution for this issue without wheel angular speed sensors while enhancing yaw stability. In this paper, a composite nonlinear feedback sliding mode controller that can enhance the PMSM speed response is proposed. The proposed scheme exhibits a rotor speed overshoot and transient time of only 0.64% and 0.07 s, respectively, which are smaller and shorter compared with other methods under motor parameter changes. Subsequently, the key states and tire-road friction coefficients required for vehicle control were estimated using sensorless rotor speeds and unscented Kalman filters, enabling the integration of the PMSM sensorless system with the vehicle yaw moment control. Additionally, a fuzzy adaptive hybrid sliding mode method is presented for yaw moment control enhancement. This method maintained the smallest sideslip angle root mean square error during double lane changes (0.4192 deg) compared with other methods. Analysis results show that different motor controllers and parameter changes significantly affect the vehicle dynamics performance. The proposed integrated scheme is feasible and effectively enhances the yaw moment control via high-performance sensorless PMSM systems.
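The two figures of merit quoted above (rotor speed overshoot and sideslip-angle RMSE) are standard metrics. Below is a minimal sketch of how they are typically computed from sampled signals. It assumes a positive step response that settles at a known final value. The function names are hypothetical, and the abstract does not give the paper's exact definitions (for example, of transient time).

```python
import math
from typing import Sequence


def percent_overshoot(response: Sequence[float], final_value: float) -> float:
    """Peak overshoot of a sampled step response, in percent of the final value.

    Assumes a positive step (final_value > 0); returns 0.0 if the response
    never exceeds the final value.
    """
    if final_value <= 0:
        raise ValueError("final_value must be positive for this sketch")
    peak = max(response)
    return max(0.0, (peak - final_value) / final_value * 100.0)


def rmse(estimate: Sequence[float], reference: Sequence[float]) -> float:
    """Root mean square error between two equal-length sampled signals."""
    if len(estimate) != len(reference) or not estimate:
        raise ValueError("signals must be non-empty and of equal length")
    squared = [(e - r) ** 2 for e, r in zip(estimate, reference)]
    return math.sqrt(sum(squared) / len(squared))


# Toy data (not from the paper): a speed trace settling at 1000 rpm.
speed = [0.0, 600.0, 1003.2, 1001.0, 1000.0]
print(round(percent_overshoot(speed, 1000.0), 2))  # 0.32
```

For a negative step, or when the final value must itself be estimated from the tail of the trace, the definition would need adjusting.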

NeurIPS Conference 2025 Conference Paper

Spectral Compressive Imaging via Chromaticity-Intensity Decomposition

  • Xiaodong Wang
  • Zijun He
  • Ping Wang
  • Lishun Wang
  • Yanan Hu
  • Xin Yuan

In coded aperture snapshot spectral imaging (CASSI), the captured measurement entangles spatial and spectral information, posing a severely ill-posed inverse problem for hyperspectral image (HSI) reconstruction. Moreover, the captured radiance inherently depends on scene illumination, making it difficult to recover the intrinsic spectral reflectance that remains invariant to lighting conditions. To address these challenges, we propose a chromaticity-intensity decomposition framework, which disentangles an HSI into a spatially smooth intensity map and a spectrally variant chromaticity cube. The chromaticity encodes lighting-invariant reflectance, enriched with high-frequency spatial details and local spectral sparsity. Building on this decomposition, we develop CIDNet, a Chromaticity-Intensity Decomposition unfolding network within a dual-camera CASSI system. CIDNet integrates a hybrid spatial-spectral Transformer tailored to reconstruct fine-grained and sparse spectral chromaticity and a degradation-aware, spatially-adaptive noise estimation module that captures anisotropic noise across iterative stages. Extensive experiments on both synthetic and real-world CASSI datasets demonstrate that our method achieves superior performance in both spectral and chromaticity fidelity. Code is released at: https://github.com/xiaodongwo/CIDNet.
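The decomposition described above can be illustrated with a minimal sketch. It assumes that intensity is the per-pixel L2 norm of the spectrum and that chromaticity is the spectrum divided by that norm. The abstract does not give CIDNet's exact definition, so both choices and the function names are assumptions. Under this definition, a lighting change that scales every band by the same factor alters only the intensity map. That is the sense in which chromaticity is lighting-invariant here.

```python
import math
from typing import List, Tuple

# An HSI cube indexed as cube[row][col][band].
Cube = List[List[List[float]]]
Map = List[List[float]]


def decompose(hsi: Cube, eps: float = 1e-8) -> Tuple[Map, Cube]:
    """Split an HSI into an intensity map and a chromaticity cube.

    Assumption: intensity is the per-pixel L2 norm across bands, and
    chromaticity is the spectrum normalised by that intensity.
    """
    intensity: Map = []
    chroma: Cube = []
    for row in hsi:
        i_row, c_row = [], []
        for spectrum in row:
            norm = math.sqrt(sum(v * v for v in spectrum))
            i_row.append(norm)
            # eps avoids division by zero for all-dark pixels
            c_row.append([v / (norm + eps) for v in spectrum])
        intensity.append(i_row)
        chroma.append(c_row)
    return intensity, chroma


def recompose(intensity: Map, chroma: Cube) -> Cube:
    """Approximately invert decompose() by rescaling each chromaticity vector."""
    return [
        [[i * v for v in spectrum] for i, spectrum in zip(i_row, c_row)]
        for i_row, c_row in zip(intensity, chroma)
    ]


hsi = [[[1.0, 2.0, 2.0], [0.0, 3.0, 4.0]]]
intensity, chroma = decompose(hsi)
print(intensity)  # [[3.0, 5.0]]
```

Under this definition, decomposing a uniformly brightened copy of `hsi` changes `intensity` but leaves `chroma` unchanged up to `eps`.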

ECAI Conference 2024 Conference Paper

Complex-Valued Gabor-Attention Residual Fusion Network for Iris Recognition

  • Zhuoru Li
  • Jian Xiao
  • Xiaowei Bai
  • Xiaodong Wang
  • Yingxi Li
  • Zhenyu Fang
  • Liang Xie 0012
  • Ye Yan 0001

Iris recognition has gained significant attention in identity verification due to the unique, stable texture patterns of the iris. Successfully extracting these patterns is essential for fast and precise identification. Although deep learning methods have automated iris recognition, they predominantly rely on real-valued networks that overlook the complex-valued representation of iris texture. As a result, they cannot effectively process phase and amplitude information and fail to integrate domain-specific knowledge of the iris, so they do not fully capture the intricate details of iris texture. Inspired by classical handcrafted methods that efficiently harness the complex-valued representation of the iris to extract both amplitude and phase information, we integrate Gabor filters with complex-valued neural networks and propose a Complex-Valued Gabor-Attention Residual Fusion Network (GRFN) tailored for iris recognition, aiming to comprehensively capture the multi-scale and multi-orientation phase and amplitude features of iris texture. GRFN incorporates adaptive Gabor Complex-Valued Convolution Kernels (GCVK) to introduce a Gabor attention mechanism focused on iris biometric characteristics. Furthermore, we propose a novel residual feature fusion approach that selects and merges local and global features across multiple directions and scales, mitigating model degradation and enhancing the network’s ability to extract iris texture features. Extensive experiments show that the proposed network outperforms state-of-the-art methods on two benchmark datasets.
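A complex-valued Gabor filter returns a complex response. The magnitude and angle of that response carry the amplitude and phase information discussed above. The sketch below builds a standard complex Gabor kernel and reads off both quantities from its response to an image patch. The parameters (`half`, `wavelength`, `theta`, `sigma`, `gamma`) are illustrative assumptions. They are not GRFN's learnable GCVK parameters.

```python
import cmath
import math
from typing import List, Tuple

Kernel = List[List[complex]]


def gabor_kernel(half: int, wavelength: float, theta: float,
                 sigma: float, gamma: float = 1.0) -> Kernel:
    """Complex Gabor kernel of size (2*half+1) x (2*half+1).

    A Gaussian envelope multiplied by a complex sinusoidal carrier
    oriented at angle theta (radians).
    """
    kernel: Kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # rotate coordinates so the carrier runs along direction theta
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            carrier = cmath.exp(1j * 2 * math.pi * xr / wavelength)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def response(patch: List[List[float]], kernel: Kernel) -> complex:
    """Inner product of a real image patch with a same-sized complex kernel."""
    return sum(p * k for p_row, k_row in zip(patch, kernel)
               for p, k in zip(p_row, k_row))


def amplitude_and_phase(z: complex) -> Tuple[float, float]:
    """Amplitude and phase (radians, in (-pi, pi]) of a complex filter response."""
    return abs(z), cmath.phase(z)


half, wl = 3, 4.0
k = gabor_kernel(half, wavelength=wl, theta=0.0, sigma=2.0)
# A vertical sine grating whose period matches the kernel's wavelength.
patch = [[math.sin(2 * math.pi * x / wl) for x in range(-half, half + 1)]
         for _ in range(-half, half + 1)]
amp, phase = amplitude_and_phase(response(patch, k))
```

For the sine grating the real part cancels by symmetry, so the phase comes out at about pi/2. A cosine grating would give a phase near 0 with a similar amplitude. This amplitude/phase split is what a real-valued filter cannot express.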

AAAI Conference 2024 Conference Paper

ORES: Open-Vocabulary Responsible Visual Synthesis

  • Minheng Ni
  • Chenfei Wu
  • Xiaodong Wang
  • Shengming Yin
  • Lijuan Wang
  • Zicheng Liu
  • Nan Duan

Avoiding synthesizing specific visual concepts is an essential challenge in responsible visual synthesis. However, the visual concepts that need to be avoided for responsible visual synthesis tend to be diverse, depending on the region, context, and usage scenario. In this work, we formalize a new task, Open-vocabulary Responsible Visual Synthesis (ORES), where the synthesis model is able to avoid forbidden visual concepts while allowing users to input any desired content. To address this problem, we present a Two-stage Intervention (TIN) framework. By introducing 1) rewriting with learnable instruction through a large-scale language model (LLM) and 2) synthesizing with prompt intervention on a diffusion synthesis model, it can effectively synthesize images that avoid any specified concepts while following the user's query as closely as possible. To evaluate on ORES, we provide a publicly available dataset, baseline models, and benchmark. Experimental results demonstrate the effectiveness of our method in reducing the risks of image generation. Our work highlights the potential of LLMs in responsible visual synthesis. Our code and dataset are publicly available at https://github.com/kodenii/ORES.

EAAI Journal 2024 Journal Article

Smooth fusion of multi-spectral images via total variation minimization for traffic scene semantic segmentation

  • Ying Li
  • Aiqing Fang
  • Yangming Guo
  • Wei Sun
  • Xiaobao Yang
  • Xiaodong Wang

Achieving precise semantic segmentation for traffic scenes relies on multi-spectral image fusion techniques to obtain high-quality images. Many existing fusion solutions aim to enhance the similarity between the input and fusion results at the level of pixel intensity and texture details. However, this can result in smoothness issues that limit semantic segmentation performance. To address these issues, we present a smooth representation learning optimization mechanism (SFLM) that conducts image fusion at two levels: inter-image and intra-image. The former overcomes over- or under-smoothing problems via mutual information maximization between the fusion result and image samples (i.e., negative and positive). The latter balances under- and over-smoothing of fusion results by minimizing the total variation in pixel space and maximizing the total variation in gradient space based on contrastive learning. In this way, the proposed method effectively overcomes fusion quality issues, providing better feature representations for semantic segmentation in autonomous vehicles. Experimental results on four public datasets validate our method’s effectiveness, robustness, and overall superiority.
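The two total-variation terms can be made concrete with a small sketch on a grayscale image stored as nested lists. It uses anisotropic TV (the sum of absolute horizontal and vertical neighbor differences) and a forward-difference gradient map. It combines them as pixel-space TV minus a weighted gradient-space TV, so lower is better. The weight `lam` and this exact combination are assumptions for illustration. The mutual-information and contrastive components of SFLM are not reproduced.

```python
from typing import List

Grid = List[List[float]]


def total_variation(img: Grid) -> float:
    """Anisotropic TV: sum of |horizontal| and |vertical| neighbour differences."""
    rows, cols = len(img), len(img[0])
    tv = 0.0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                tv += abs(img[r][c + 1] - img[r][c])
            if r + 1 < rows:
                tv += abs(img[r + 1][c] - img[r][c])
    return tv


def gradient_map(img: Grid) -> Grid:
    """Per-pixel |dx| + |dy| from forward differences (zero past the last row/column)."""
    rows, cols = len(img), len(img[0])
    grad = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dx = img[r][c + 1] - img[r][c] if c + 1 < cols else 0.0
            dy = img[r + 1][c] - img[r][c] if r + 1 < rows else 0.0
            grad[r][c] = abs(dx) + abs(dy)
    return grad


def smoothness_objective(img: Grid, lam: float = 0.1) -> float:
    """Penalise pixel-space TV and reward gradient-space TV (lam is hypothetical)."""
    return total_variation(img) - lam * total_variation(gradient_map(img))


print(total_variation([[0.0, 1.0], [0.0, 1.0]]))  # 2.0
```

Minimizing the first term alone would flatten edges. The second term pushes back by rewarding variation among the gradients themselves, which is one way to read the abstract's "balance" between under- and over-smoothing.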

EAAI Journal 2023 Journal Article

Fine-grained Image Recognition via Attention Interaction and Counterfactual Attention Network

  • Lei Huang
  • Chen An
  • Xiaodong Wang
  • Leon Bevan Bullock
  • Zhiqiang Wei

Learning subtle and discriminative regions plays an important role in fine-grained image recognition, and attention mechanisms have shown great potential in such tasks. Recent research mainly focuses on employing the attention mechanism to locate key discriminative regions and learn salient features, while ignoring imperceptible complementary features and the causal relationship between prediction results and attention. To address these issues, we propose an Attention Interaction and Counterfactual Attention Network (AICA-Net). Specifically, we propose an Attention Interaction Fusion Module (AIFM) that models the negative correlation between attention map channels to locate complementary features, and fuses the complementary features with key discriminative features to generate richer fine-grained features. Simultaneously, an Enhanced Counterfactual Attention Module (ECAM) is proposed to generate a counterfactual attention map. By comparing the impact of the learned attention map and the counterfactual attention map on the final prediction results, the network quantifies the quality of attention, which drives it to learn more effective attention. Extensive experiments on the CUB-200-2011, FGVC-Aircraft, and Stanford Cars datasets show that AICA-Net achieves outstanding results. In particular, it achieves 90.83% and 95.87% accuracy on the two competitive open benchmark datasets CUB-200-2011 and Stanford Cars, respectively. Experiments demonstrate that our method outperforms state-of-the-art solutions.
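The counterfactual comparison described above can be sketched as the difference between two predictions. One uses the learned attention map and the other uses a counterfactual map. The toy below uses attention-weighted pooling of region features and a linear classifier. Using uniform attention as the counterfactual is an assumption, and so are all function names. ECAM's actual counterfactual generation and training loss are not shown.

```python
from typing import List, Sequence


def attend(features: List[List[float]], attention: Sequence[float]) -> List[float]:
    """Attention-weighted average of per-region feature vectors."""
    total = sum(attention)
    dim = len(features[0])
    return [sum(a * f[d] for a, f in zip(attention, features)) / total
            for d in range(dim)]


def logits(weights: List[List[float]], pooled: List[float]) -> List[float]:
    """Linear classifier: one weight vector per class."""
    return [sum(w * x for w, x in zip(row, pooled)) for row in weights]


def attention_effect(features: List[List[float]], weights: List[List[float]],
                     learned: Sequence[float],
                     counterfactual: Sequence[float]) -> List[float]:
    """Per-class effect of attention: prediction(learned) - prediction(counterfactual).

    A training loss on this difference would reward attention maps that beat
    the counterfactual; that loss is omitted here.
    """
    factual = logits(weights, attend(features, learned))
    counter = logits(weights, attend(features, counterfactual))
    return [f - c for f, c in zip(factual, counter)]


# Two regions with one-hot features; class 0 responds to region 0.
features = [[1.0, 0.0], [0.0, 1.0]]
weights = [[1.0, 0.0], [0.0, 1.0]]
uniform = [0.5, 0.5]  # assumed counterfactual: uniform attention
effect = attention_effect(features, weights, [0.9, 0.1], uniform)
```

A positive effect on the true class means the learned attention improves the prediction beyond what uninformative attention would give. That margin is the "quality of attention" the abstract refers to.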

IJCAI Conference 2023 Conference Paper

Learning 3D Photography Videos via Self-supervised Diffusion on Single Images

  • Xiaodong Wang
  • Chenfei Wu
  • Shengming Yin
  • Minheng Ni
  • Jianfeng Wang
  • Linjie Li
  • Zhengyuan Yang
  • Fan Yang

3D photography renders a static image into a video with appealing 3D visual effects. Existing approaches typically first conduct monocular depth estimation, then render the input frame to subsequent frames with various viewpoints, and finally use an inpainting model to fill the missing/occluded regions. The inpainting model plays a crucial role in rendering quality, but it is normally trained on out-of-domain data. To reduce the training-inference gap, we propose a novel self-supervised diffusion model as the inpainting module. Given a single input image, we automatically construct a training pair of the masked occluded image and the ground-truth image with random cycle rendering. The constructed training samples are closely aligned to the testing instances, without the need for data annotation. To make full use of the masked images, we design a Masked Enhanced Block (MEB), which can be easily plugged into the UNet to enhance the semantic conditions. Towards real-world animation, we present a novel task, out-animation, which extends the space and time of input objects. Extensive experiments on real datasets show that our method achieves results competitive with existing state-of-the-art (SOTA) methods.

YNIMG Journal 2023 Journal Article

Quantitative susceptibility mapping in rats with minimal hepatic encephalopathy: Does iron overload aggravate cognitive impairment by promoting neuroinflammation?

  • Xuhong Yang
  • Minglei Wang
  • Wenxiao Liu
  • Mingli Hou
  • Jianguo Zhao
  • Xueying Huang
  • Minxing Wang
  • Jiarui Zheng

BACKGROUND AND AIMS: Minimal hepatic encephalopathy (MHE) is a mild form of hepatic encephalopathy that lacks observable signs and symptoms. Nevertheless, MHE can cause neurocognitive dysfunction, although the neurobiological mechanisms are not fully understood. Here, the effects of hippocampal iron deposition on cognitive function and its role in MHE were investigated. MATERIALS AND METHODS: Eighteen rats were assigned to experimental and control groups. MHE was induced by thioacetamide. Spatial memory and exploratory behavior were assessed with the Morris water maze and the elevated plus maze. Hippocampal susceptibility was measured by quantitative susceptibility mapping, iron deposition in the hippocampus and liver by Prussian blue staining, and hippocampal inflammatory cytokine and ferritin levels by ELISA. RESULTS: MHE rats showed impaired spatial memory and exploratory behavior (P < 0.05 for all parameters). Bilateral hippocampal susceptibility values were significantly raised in MHE rats, together with evidence of neuroinflammation (increased pro-inflammatory and reduced anti-inflammatory cytokine levels; all P < 0.05). Further analysis indicated good correlations of hippocampal susceptibility values with latency time and inflammatory cytokine levels in MHE rats but not in control rats. CONCLUSION: MHE induced by thioacetamide was associated with hippocampal iron deposition and inflammation, suggesting that iron overload may be an important driver of neuroinflammatory responses.

NeurIPS Conference 2022 Conference Paper

Homomorphic Matrix Completion

  • Xiao-Yang Liu
  • Zechu (Steven) Li
  • Xiaodong Wang

In recommendation systems, global positioning, system identification, and mobile social networks, a fundamental routine is for a server to complete a low-rank matrix from an observed subset of its entries. However, sending data to a cloud server raises data privacy concerns due to eavesdropping attacks and the single-point failure problem; e.g., the Netflix Prize contest was canceled after a privacy lawsuit. In this paper, we propose a homomorphic matrix completion algorithm for privacy-preserving data completion. First, we formulate a homomorphic matrix completion problem, in which a server performs matrix completion on ciphertexts, and propose an encryption scheme that is fast and easy to implement. Second, we prove that the proposed scheme satisfies the homomorphism property: decrypting the matrix recovered from ciphertexts yields the target complete matrix in plaintext. Third, we prove that the proposed scheme satisfies an $(\epsilon, \delta)$-differential privacy property. With a similar level of privacy guarantee, we improve on the best-known error bound $O(\sqrt[10]{n_1^3n_2})$, achieving exact recovery at the price of more samples. Finally, on numerical and real-world data, we show that both homomorphic nuclear-norm minimization and alternating minimization algorithms achieve accurate recovery on ciphertexts, verifying the homomorphism property.

AAAI Conference 2020 Conference Paper

Towards Scale-Free Rain Streak Removal via Self-Supervised Fractal Band Learning

  • Wenhan Yang
  • Shiqi Wang
  • Dejia Xu
  • Xiaodong Wang
  • Jiaying Liu

Data-driven rain streak removal methods, most of which rely on synthesized paired data, usually suffer from poor generalization when applied to real cases. In this paper, we propose a novel deep-learning-based rain streak removal method injected with self-supervision to improve the ability to remove rain streaks at various scales. To realize this goal, we make efforts in two aspects. First, considering that rain streak removal is highly correlated with texture characteristics, we create a fractal band learning (FBL) network based on frequency band recovery. It integrates commonly seen band feature operations with neural modules and effectively improves the capacity to capture discriminative features for deraining. Second, to further improve the generalization ability of FBL for rain streaks at various scales, we add cross-scale self-supervision to regularize network training. The constraint forces the extracted features of inputs at different scales to be equivalent after rescaling. Therefore, FBL can offer similar responses based solely on image content, without interference from scale, and is capable of removing rain streaks at various scales. Extensive quantitative and qualitative evaluations demonstrate the superiority of FBL for rain streak removal, especially in real cases where very large rain streaks exist, and confirm the effectiveness of each of its components. Our code will be publicly available at: https://github.com/flyywh/AAAI-2020-FBL-SS.

JMLR Journal 2020 Journal Article

Union of Low-Rank Tensor Spaces: Clustering and Completion

  • Morteza Ashraphijuo
  • Xiaodong Wang

We consider the problem of clustering and completing a set of tensors with missing data that are drawn from a union of low-rank tensor spaces. In the clustering problem, given partially sampled tensor data composed of a number of subtensors, each chosen from one of a certain number of unknown tensor spaces, we need to group the subtensors that belong to the same tensor space. We provide a geometrical analysis of the sampling pattern and subsequently derive the sampling rate that guarantees correct clustering with high probability under some assumptions. Moreover, we investigate the fundamental conditions for finite/unique completability for the union of tensor spaces completion problem. Both deterministic and probabilistic conditions on the sampling pattern to ensure finite/unique completability are obtained. For both the clustering and completion problems, our tensor analysis provides significantly better bounds than those given by the matrix analysis applied to any unfolding of the tensor data.

TCS Journal 2017 Journal Article

A space efficient algorithm for the longest common subsequence in k-length substrings

  • Daxin Zhu
  • Lei Wang
  • Tinran Wang
  • Xiaodong Wang

Two space-efficient algorithms to solve the LCSk problem and the LCS≥k problem are presented in this paper. The algorithms improve the time and space complexities of the algorithms of Benson et al. [4]. The space cost of the first algorithm, which solves the LCSk problem, is reduced from $O(n^2)$ to $O(kn)$ when both input sequences have length $n$. The time and space costs of the second algorithm, which solves the LCS≥k problem, are both improved: the time cost is reduced from $O(kn^2)$ to $O(n^2)$, and the space cost from $O(n^2)$ to $O(kn)$. In the case of $k = O(1)$, both algorithms use linear space.
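For readers unfamiliar with the problem, here is a minimal sketch of LCSk. It follows the usual definition of Benson et al.: the maximum number of non-overlapping length-k substrings that occur in both strings in the same order. It is the textbook dynamic program, which takes $O(nm)$ time, that is, $O(n^2)$ for two length-$n$ inputs. Only the last k+1 rows are kept, which illustrates how an $O(kn)$ space bound can arise. It is not the algorithm of this paper, and the function name is hypothetical.

```python
from typing import List


def lcsk(x: str, y: str, k: int) -> int:
    """Number of k-blocks in a longest common subsequence in k-length substrings.

    Recurrence: dp[i][j] = max(dp[i-1][j], dp[i][j-1],
                               dp[i-k][j-k] + 1 if x[i-k:i] == y[j-k:j]).
    Only the last k+1 DP rows are stored (O(k*m) space), and the substring
    test uses a running common-suffix length, so time is O(n*m).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    n, m = len(x), len(y)
    rows = k + 1
    dp: List[List[int]] = [[0] * (m + 1) for _ in range(rows)]  # row i lives at i % rows
    run_prev = [0] * (m + 1)  # length of common suffix of x[:i-1] and y[:j]
    for i in range(1, n + 1):
        cur = dp[i % rows]            # reuses the buffer of row i - (k+1)
        prev = dp[(i - 1) % rows]
        back = dp[(i - k) % rows]     # row i-k, still intact
        run_cur = [0] * (m + 1)
        cur[0] = 0
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                run_cur[j] = run_prev[j - 1] + 1
            best = max(prev[j], cur[j - 1])
            # a common suffix of length >= k means x[i-k:i] == y[j-k:j]
            if run_cur[j] >= k:
                best = max(best, back[j - k] + 1)
            cur[j] = best
        run_prev = run_cur
    return dp[n % rows][m]


print(lcsk("ABCDE", "XABCD", 2))  # 2 (blocks "AB" and "CD")
```

With k = 1 the recurrence reduces to the classical LCS dynamic program, which is a convenient sanity check.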

JMLR Journal 2017 Journal Article

Fundamental Conditions for Low-CP-Rank Tensor Completion

  • Morteza Ashraphijuo
  • Xiaodong Wang

We consider the problem of low canonical polyadic (CP) rank tensor completion. A completion is a tensor whose entries agree with the observed entries and whose rank matches the given CP rank. We analyze the manifold structure corresponding to the tensors with the given rank and define a set of polynomials based on the sampling pattern and CP decomposition. Then, we show that finite completability of the sampled tensor is equivalent to having a certain number of algebraically independent polynomials among the defined polynomials. Our proposed approach results in characterizing the maximum number of algebraically independent polynomials in terms of a simple geometric structure of the sampling pattern, and therefore we obtain the deterministic necessary and sufficient condition on the sampling pattern for finite completability of the sampled tensor. Moreover, assuming that the entries of the tensor are sampled independently with probability $p$ and using the mentioned deterministic analysis, we propose a combinatorial method to derive a lower bound on the sampling probability $p$, or equivalently, the number of sampled entries that guarantees finite completability with high probability. We also show that the existing result for the matrix completion problem can be used to obtain a loose lower bound on the sampling probability $p$. In addition, we obtain deterministic and probabilistic conditions for unique completability. The number of samples required for finite or unique completability obtained by the proposed analysis on the CP manifold is orders of magnitude lower than that obtained by the existing analysis on the Grassmannian manifold.

JMLR Journal 2017 Journal Article

Rank Determination for Low-Rank Data Completion

  • Morteza Ashraphijuo
  • Xiaodong Wang
  • Vaneet Aggarwal

Recently, fundamental conditions on the sampling patterns have been obtained for finite completability of low-rank matrices or tensors given the corresponding ranks. In this paper, we consider the scenario where the rank is not given and we aim to approximate the unknown rank based on the location of sampled entries and some given completion. We consider a number of data models, including single-view matrix, multi-view matrix, CP tensor, tensor-train tensor and Tucker tensor. For each of these data models, we provide an upper bound on the rank when an arbitrary low-rank completion is given. We characterize these bounds both deterministically, i.e., with probability one given that the sampling pattern satisfies certain combinatorial properties, and probabilistically, i.e., with high probability given that the sampling probability is above some threshold. Moreover, for both single-view matrix and CP tensor, we are able to show that the obtained upper bound is exactly equal to the unknown rank if the lowest-rank completion is given. Furthermore, we provide numerical experiments for the case of single-view matrix, where we use nuclear norm minimization to find a low-rank completion of the sampled data and we observe that in most of the cases the proposed upper bound on the rank is equal to the true rank.

EAAI Journal 2010 Journal Article

Adaptive typhoon cloud image enhancement using genetic algorithm and non-linear gain operation in undecimated wavelet domain

  • Changjiang Zhang
  • Xiaodong Wang
  • Chunjiang Duanmu

By combining the discrete undecimated wavelet transform (UWT) with a genetic algorithm (GA), an efficient enhancement algorithm for typhoon cloud images is proposed. After the UWT is applied to a typhoon cloud image, noise is reduced by modifying the undecimated wavelet coefficients at fine resolution levels in combination with generalized cross-validation. GA and a non-linear gain operation are used to modify the undecimated wavelet coefficients at coarse resolution levels in order to bring out the details of the typhoon cloud image. Experimental results show that the proposed algorithm can efficiently reduce additive Gaussian white noise in a typhoon cloud image while bringing out its details well. To accurately assess the quality of an enhanced typhoon cloud image, an overall score index is proposed based on information entropy, a contrast measure, and peak signal-to-noise ratio. Finally, the proposed algorithm is compared with five other similar methods.
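Two of the three ingredients of the proposed overall score index have standard definitions: information entropy and peak signal-to-noise ratio. A minimal sketch for 8-bit grayscale images follows. The abstract does not specify the contrast measure, so RMS contrast (the population standard deviation of pixel values) is used here as a stand-in. No attempt is made to reproduce how the paper combines the three into one score. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import List

Image = List[List[int]]  # 8-bit grayscale, values in 0..255


def _pixels(img: Image) -> List[int]:
    return [v for row in img for v in row]


def psnr(reference: Image, test: Image, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    ref, tst = _pixels(reference), _pixels(test)
    if len(ref) != len(tst) or not ref:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(ref, tst)) / len(ref)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


def entropy(img: Image) -> float:
    """Shannon entropy, in bits, of the grey-level histogram."""
    counts = Counter(_pixels(img))
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def rms_contrast(img: Image) -> float:
    """Stand-in contrast measure: population standard deviation of pixel values."""
    px = _pixels(img)
    mean = sum(px) / len(px)
    return math.sqrt(sum((v - mean) ** 2 for v in px) / len(px))


print(entropy([[0, 255], [0, 255]]))  # 1.0
```

Entropy and contrast are no-reference measures computed on the enhanced image alone, while PSNR needs a reference image. This is why a combined index can reflect both detail richness and noise suppression.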