Author name cluster

Xiaohua Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

10 papers

2 author rows

AAAI Conference 2026 Conference Paper

Explainable Synthetic Image Detection Through Diffusion Timestep Ensembling

Yixin Wu
Feiran Zhang
Tianyuan Shi
Ruicheng Yin
Zhenghua Wang
Zhenliang Gan
Xiaohua Wang
Changze Lv

Recent advances in diffusion models have enabled the creation of deceptively real images, posing significant security risks when misused. In this study, we empirically show that different timesteps of DDIM inversion reveal varying subtle distinctions between synthetic and real images that are extractable for detection, taking forms such as Fourier power spectrum high-frequency discrepancies and inter-pixel variance distributions. Based on these observations, we propose a novel detection method named ESIDE that directly utilizes features of intermediately noised images by training an ensemble on multiple noised timesteps, circumventing the time overhead of conventional reconstruction-based strategies. To enhance human comprehension, we introduce a metric-grounded explanation refinement module to identify and explain AI-generated flaws. Additionally, we present the benchmarks GenHard and GenExplain, offering detection samples of greater difficulty and high-quality rationales for fake images. Extensive experiments show that ESIDE achieves state-of-the-art performance with 98.91% and 95.89% detection accuracy on regular and challenging samples respectively, and demonstrates generalizability and robustness.

EAAI Journal 2025 Journal Article

A real-time prediction model for instantaneous dam-break flood evolution of concrete gravity dams based on attention mechanism and spatiotemporal multiple features

Chao Wang
Yaofei Zhang
Sherong Zhang
Xiaohua Wang
Xingbo Zhou
Yishu Lai

Simulating the flood evolution following the sudden breach of concrete gravity dams is crucial for enabling prompt emergency flood control decisions. The real-time performance and reliability of these flood propagation simulations are essential for improving the accuracy and speed of emergency responses. This study introduces a deep learning model that integrates an attention mechanism to predict flood evolution parameters in real time. Initially, parameters such as water depth and flow rate were measured under 32 distinct dam-break scenarios using a hydrodynamic model. By combining terrain data with time-series flood discharge data, we compiled a dataset containing 1984 entries, enhanced through reduced-order methods. A novel deep learning model, the Flood-Swin-Transformer, was then developed to predict the spatiotemporal evolution of dam-break floods. This model was benchmarked against 11 baseline models and four state-of-the-art deep learning models. The results indicate: (1) Baseline models accurately predict water depth but are less effective at predicting flow rate parameters; (2) Deep learning models outperform baseline models in both accuracy and classification capabilities for water depth and flow rate parameters, showing robust performance; (3) Extensive analyses, including error, classification accuracy, effectiveness, robustness, and flood parameter error mapping, demonstrate the superior performance of the proposed model; (4) The proposed model predicts flood evolution up to 43.75 times faster than traditional hydrodynamic models, facilitating real-time prediction capabilities.

NeurIPS Conference 2025 Conference Paper

Chain-of-Model Learning for Language Model

Xiaohua Wang
Kaitao Song
Xu Tan
Huiqiang Jiang
Chengruidong Zhang
Yongliang Shen
Cen Lu
Zihao Li

In this paper, we propose a novel learning paradigm, termed Chain-of-Model (CoM), which incorporates the causal relationship into the hidden states of each layer in a chain style, thereby introducing great scaling efficiency in model training and inference flexibility in deployment. We introduce the concept of Chain-of-Representation (CoR), which formulates the hidden states at each layer as a combination of multiple sub-representations (i.e., chains). In each layer, each chain from the output representations can only view all of its preceding chains in the input representations. Consequently, the model built upon the CoM framework can progressively scale up the model size by increasing the chains based on the previous models (i.e., chains), and offer multiple sub-models at varying sizes for elastic inference by using different chain numbers. Based on this principle, we devise Chain-of-Language-Model (CoLM), which incorporates the idea of CoM into each layer of the Transformer architecture. Based on CoLM, we further introduce CoLM-Air by introducing a KV sharing mechanism that computes all keys and values within the first chain and then shares them across all chains. This design demonstrates additional extensibility, such as enabling seamless LM switching, prefilling acceleration and so on. Experimental results demonstrate our CoLM family can achieve comparable performance to the standard Transformer, while simultaneously enabling greater flexibility, such as progressive scaling to improve training efficiency and offering multiple varying model sizes for elastic inference, paving a new way toward building language models.

ICML Conference 2025 Conference Paper

Dendritic Localized Learning: Toward Biologically Plausible Algorithm

Changze Lv
Jingwen Xu
Yiyang Lu
Xiaohua Wang
Zhenghua Wang
Zhibo Xu
Di Yu 0001
Xin Du 0002

Backpropagation is the foundational algorithm for training neural networks and a key driver of deep learning’s success. However, its biological plausibility has been challenged due to three primary limitations: weight symmetry, reliance on global error signals, and the dual-phase nature of training, as highlighted by the existing literature. Although various alternative learning approaches have been proposed to address these issues, most either fail to satisfy all three criteria simultaneously or yield suboptimal results. Inspired by the dynamics and plasticity of pyramidal neurons, we propose Dendritic Localized Learning (DLL), a novel learning algorithm designed to overcome these challenges. Extensive empirical experiments demonstrate that DLL satisfies all three criteria of biological plausibility while achieving state-of-the-art performance among algorithms that meet these requirements. Furthermore, DLL exhibits strong generalization across a range of architectures, including MLPs, CNNs, and RNNs. These results, benchmarked against existing biologically plausible learning algorithms, offer valuable empirical insights for future research. We hope this study can inspire the development of new biologically plausible algorithms for training multilayer networks and advancing progress in both neuroscience and machine learning. Our code is available at https://github.com/Lvchangze/Dendritic-Localized-Learning.

NeurIPS Conference 2025 Conference Paper

Enhancing the Outcome Reward-based RL Training of MLLMs with Self-Consistency Sampling

Jiahao Wang
Weiye Xu
Aijun Yang
Wengang Zhou
Lewei Lu
Houqiang Li
Xiaohua Wang
Jinguo Zhu

Outcome-reward reinforcement learning (RL) is a common, and increasingly significant, way to refine the step-by-step reasoning of multimodal large language models (MLLMs). In the multiple-choice setting, a dominant format for multimodal reasoning benchmarks, the paradigm faces a significant yet often overlooked obstacle: unfaithful trajectories that guess the correct option after a faulty chain of thought receive the same reward as genuine reasoning, which is a flaw that cannot be ignored. We propose Self-Consistency Sampling (SCS) to correct this issue. For each question, SCS (i) introduces small visual perturbations and (ii) performs repeated truncation-and-resampling of a reference trajectory; agreement among the resulting trajectories yields a differentiable consistency score that down-weights unreliable traces during policy updates. Plugging SCS into the RLOO, GRPO, and REINFORCE++ series improves accuracy by up to 7.7 percentage points on six multimodal benchmarks with negligible extra computation, offering a simple, general remedy for outcome-reward RL in MLLMs.

EAAI Journal 2023 Journal Article

Integration of ROV and vision-based underwater inspection for Limnoperna fortunei in water conveyance structure

Xin Fang
Heng Li
Sherong Zhang
Jikang Zhang
Chao Wang
Xiaohua Wang
Ziao Ma
He Jia

The invasion of Limnoperna fortunei (L. fortunei) has been identified as one major biofouling in the operation of hydraulic engineering, which not only corrodes the concrete structures but also reduces the pipe diameter and increases the surface roughness, leading to the decrease of water conveyance capacity and the increase of the project operation cost. To better cope with this problem, an automated underwater inspection analysis scheme for the biofouling of L. fortunei is provided in this study, which innovatively integrates the underwater remote operating rover (ROV) and computer vision techniques to inspect and evaluate the invasion of L. fortunei in water conveyance structure. This scheme first presents an image enhancement approach based on the fusion strategy to improve the quality of images extracted from underwater robot inspection videos. Then, the L. fortunei is segmented by U-Net in the enhanced underwater images, and the definition of adherent area ratio quantitatively assesses the biofouling severity. At last, the underwater inspection analysis scheme is implemented in a typical aqueduct, and the automatic analysis results are compared with the field investigation during the emptying maintenance of the aqueduct. In this study, the dataset of real ROV inspection video sequences was first used to evaluate the effectiveness of the proposed method for inspecting L. fortunei invasions in realistic scenarios, and then for the comparison with state-of-the-art methods. The results show that the proposed automated inspection scheme is capable of efficiently improving the underwater imaging quality and accurately detecting the L. fortunei.

NeurIPS Conference 2022 Conference Paper

Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs

Jinguo Zhu
Xizhou Zhu
Wenhai Wang
Xiaohua Wang
Hongsheng Li
Xiaogang Wang
Jifeng Dai

To build an artificial neural network like the biological intelligence system, recent works have unified numerous tasks into a generalist model, which can process various tasks with shared parameters and does not have any task-specific modules. While generalist models achieve promising results on various benchmarks, they have performance degradation on some tasks compared with task-specialized models. In this work, we find that interference among different tasks and modalities is the main factor behind this phenomenon. To mitigate such interference, we introduce the Conditional Mixture-of-Experts (Conditional MoEs) to generalist models. Routing strategies under different levels of conditions are proposed to take both the training/inference cost and generalization ability into account. By incorporating the proposed Conditional MoEs, the recently proposed generalist model Uni-Perceiver can effectively mitigate the interference across tasks and modalities, and achieves state-of-the-art results on a series of downstream tasks via prompt tuning on 1% of downstream data. Moreover, the introduction of Conditional MoEs still holds the generalization ability of generalist models to conduct zero-shot inference on new tasks, e.g., video-text retrieval and video caption. Code and pre-trained generalist models are publicly released at https://github.com/fundamentalvision/Uni-Perceiver.

IROS Conference 2004 Conference Paper

Dynamic analysis and experiment of a 3mm swimming microrobot

Yi Zhang
Qimin Wang
Peiqiang Zhang
Xiaohua Wang
Tao Mei

A swimming microrobot driven by FMP (ferromagnetic polymer) actuators under magnetic field is developed. The size of the swimming microrobot is 3 mm × 2 mm × 0.4 mm. The robot can swim when the magnetic intensity is higher than 8 mT and the frequency is about 10 Hz, with a speed of 0.3 to 1 mm/s. Dynamic analysis of the microrobot is performed, and the results fit the experiment data reasonably. The experimental results indicate that the driving method is effective and the swimming speed can be controlled by modifying the intensity of magnetic field.

ICRA Conference 2004 Conference Paper

Grasp Characteristics of an Underactuated Robot Hand

Minzhou Luo
Tao Mei 0003
Xiaohua Wang
Yong Yu 0003

An underactuated robot hand with three fingers has been developed to achieve planar pinch grasp and enveloping grasp. The grasp characteristics of the underactuated hand are studied. In planar pinch grasp, which depends on two or three distal phalanges to work together, the key factors that influence grasp stability are investigated, and the sufficient and necessary conditions for stable grasp are proposed with the force-closure conception. The allowed maximum external wrench is calculated when the grasp can't meet the force-closure condition. In enveloping grasp, the phalanges driven by an actuator can't be controlled independently and have local motion, which leads to unstable grasp. Based on the definition of grasp configuration and driving force, the relationship between grasp configuration and external wrench is studied. The transition difficulty of grasp configuration is employed to evaluate the stability of enveloping grasp. Finally, some cases of simulation are given to validate the evaluation.

ICRA Conference 2004 Conference Paper

Multisensory Gripper and Local Autonomy of Extravehicular Mobile Robot

Yang Liu
Tao Mei
Xiaohua Wang
Bin Liang

This paper presents the development of the multisensory robot gripper for extravehicular mobile robot (EMR) and its sensor based local autonomy. For stable extravehicular walking and performing delicate tasks in unstructured and complex environment, our EMR gripper employed a simple and reliable mechanism and it is equipped with multisensory apparatus. Local autonomy of the space robot is an important requirement for on-orbit manipulation. Detecting contact state between robot gripper and environment is essential to fulfill space robot local autonomy. But we often face the problem of lack of sensory information when we try to know the contact state. A new way to detect contact state under inadequate sensory information is proposed. By combining force sensor information with gripper geometry and mechanical analysis, some spatial contact information between robot and the trusswork can be derived. Then robot can adjust its position and orientation by fine motion displacement based on contact information to fulfill steady grasping. This method is implemented on a walking/grasping task, which is a simple and important fundamental task for extravehicular space robot.
