Author name cluster

Wei Ke

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

10 papers

1 author row

AAAI Conference 2026 Conference Paper

EfficientFlow: Efficient Equivariant Flow Policy Learning for Embodied AI

Jianlei Chang
Ruofeng Mei
Wei Ke
Xiangyu Xu

Generative modeling has recently shown remarkable promise for visuomotor policy learning, enabling flexible and expressive control across diverse embodied AI tasks. However, existing generative policies often struggle with data inefficiency, requiring large-scale demonstrations, and sampling inefficiency, incurring slow action generation during inference. We introduce EfficientFlow, a unified framework for efficient embodied AI with flow-based policy learning. To enhance data efficiency, we bring equivariance into flow matching. We theoretically prove that when using an isotropic Gaussian prior and an equivariant velocity prediction network, the resulting action distribution remains equivariant, leading to improved generalization and substantially reduced data demands. To accelerate sampling, we propose a novel acceleration regularization strategy. As direct computation of acceleration is intractable for marginal flow trajectories, we derive a novel surrogate loss that enables stable and scalable training using only conditional trajectories. Across a wide range of robotic manipulation benchmarks, the proposed algorithm achieves competitive or superior performance under limited data while offering dramatically faster inference. These results highlight EfficientFlow as a powerful and efficient paradigm for high-performance embodied AI.
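The equivariance claim in this abstract can be written out roughly as below. This is a sketch of the standard argument under assumed notation, not the paper's own statement: the group $G$, its orthogonal action on actions $a$ and observations $o$, the prior scale $\sigma$, and the symbols $p_0$, $p_1$, $v_\theta$ are all introduced here for illustration.

```latex
% Assumed notation (not from the abstract): G is a symmetry group acting
% orthogonally on actions a and observations o; x_t is the flow state at time t.
\begin{align}
  x_0 &\sim p_0 = \mathcal{N}(0, \sigma^2 I),
      & p_0(g x) &= p_0(x) \quad \forall g \in G, \\
  \frac{d x_t}{d t} &= v_\theta(x_t, t \mid o),
      & v_\theta(g x, t \mid g o) &= g \, v_\theta(x, t \mid o), \\
  &\Longrightarrow & p_1(g a \mid g o) &= p_1(a \mid o).
\end{align}
```

Read this way, an invariant prior pushed through an equivariant velocity field yields an action distribution that transforms consistently with the observation, which is the property the abstract credits for the reduced data demands.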

EAAI Journal 2026 Journal Article

Hard constraints and soft learning dual-graph anomaly detection for industrial processes

Chuan Zhang
Ming-Qing Zhang
Yi Luo
Wei Ke
Qun-Xiong Zhu
Yan-Lin He
Yang Zhang
Yuan Xu

Anomaly detection is critical for safe and stable operation in industrial processes. Industrial data exhibits strong spatiotemporal dependence, while variable interactions often evolve dynamically. Traditional methods struggle to model both fixed physical constraints and dynamic data relationships. This paper proposes a hard constraints and soft learning dual-graph anomaly detection (HCSL-DGAD) method. First, a macro-graph with hard constraints is constructed based on the connections between various components of the industrial process, and a micro-graph with soft learning is constructed through an adaptive method based on attention mechanisms. The macro-graph transmits physical constraints through a spatiotemporal graph convolutional network to ensure the rationality of the abnormal propagation path. The micro-graph uses a multi-head attention mechanism to dynamically learn the implicit relationship between nodes and capture coupling information not covered by the physical topology. Second, to address multi-scale anomalies in the spatiotemporal domain, a dual-channel architecture is employed to extract features from both the macro-graph and the micro-graph. Edge weights and node states are alternately updated in the micro-graph channel to accurately identify anomaly patterns at different scales. At the same time, temporal attention and variable attention are combined in the macro-graph channel to jointly improve the detection accuracy. Extensive experiments on three benchmarks show that HCSL-DGAD achieves average F1-scores of 98.24%, 89.92%, and 87.00% on the Tennessee Eastman (TE) process, Secure Water Treatment (SWaT), and PROcess NeTwork Optimization (PRONTO) datasets, respectively.

AAAI Conference 2026 Conference Paper

Monte Carlo Diffusion for Generalizable Learning-Based RANSAC

Jiale Wang
Chen Zhao
Wei Ke
Tong Zhang

Random Sample Consensus (RANSAC) is a fundamental approach for robustly estimating parametric models from noisy data. Existing learning-based RANSAC methods utilize deep learning to enhance the robustness of RANSAC against outliers. However, these approaches are trained and tested on the data generated by the same algorithms, leading to limited generalization to out-of-distribution data during inference. Therefore, in this paper, we introduce a novel diffusion-based paradigm that progressively injects noise into ground-truth data, simulating the noisy conditions for training learning-based RANSAC. To enhance data diversity, we incorporate Monte Carlo sampling into the diffusion paradigm, approximating diverse data distributions by introducing different types of randomness at multiple stages. We evaluate our approach in the context of feature matching through comprehensive experiments on the ScanNet and MegaDepth datasets. The experimental results demonstrate that our Monte Carlo diffusion mechanism significantly improves the generalization ability of learning-based RANSAC. We also develop extensive ablation studies that highlight the effectiveness of key components in our framework.
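For readers unfamiliar with the classical baseline that this abstract builds on, here is a minimal sketch of plain RANSAC for 2D line fitting. It is not the paper's learning-based or diffusion-based method; the function names, the threshold, the iteration count, and the toy data are all assumptions chosen for illustration.

```python
import random


def fit_line(p, q):
    """Return (slope, intercept) of the line through two points, or None if vertical."""
    (x1, y1), (x2, y2) = p, q
    if x1 == x2:
        return None
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1


def ransac_line(points, n_iters=200, threshold=0.1, seed=0):
    """Classic RANSAC: sample minimal sets, keep the model with the most inliers."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(n_iters):
        # A line needs two points, so the minimal sample size is 2.
        model = fit_line(*rng.sample(points, 2))
        if model is None:
            continue
        slope, intercept = model
        # Inliers are points whose vertical residual is within the threshold.
        inliers = [(x, y) for x, y in points
                   if abs(y - (slope * x + intercept)) <= threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers


# Toy data (hypothetical): 20 points on y = 2x + 1 plus 4 gross outliers.
points = [(float(x), 2.0 * x + 1.0) for x in range(20)]
points += [(3.0, 40.0), (7.0, -25.0), (12.0, 90.0), (15.0, -60.0)]
model, inliers = ransac_line(points)
```

The hypothesize-and-verify loop is the part that learning-based variants typically augment, for example by guiding which samples are drawn; the noise-injection training scheme described in the abstract is not reproduced here.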

JBHI Journal 2025 Journal Article

HRMamba: Fusing Luminance Information for Remote Physiological Measurement in Varied Lighting Conditions

Kaiwen Yang
Nuoer Long
Wei Ke
Chan-Tong Lam
Tao Tan
Zitong Yu
Yue Sun

Camera-based photoplethysmography (cbPPG) represents a non-invasive technique for capturing physiological parameters through facial videos, enabling the extraction of vital signs such as heart rate, respiration rate, and blood oxygen saturation without direct physical contact. Existing deep learning methods face two core challenges when dealing with cbPPG: firstly, extracting weak PPG signals from video segments with large spatial and temporal redundancy and understanding their periodic patterns in long contexts; secondly, accurately extracting PPG signals in complex lighting environments, especially in low-light conditions. To address these issues, this paper proposes an end-to-end method based on Mamba, named HRMamba. This method employs temporal difference mamba to process temporal signals and combines bidirectional state space to enable Mamba to robustly understand the scene and learn the periodic patterns of PPG. Furthermore, a luminance post-processing module is designed to extract luminance information from the video without enhancing lighting or altering the original video data, and embed it into the PPG signal. Experimental results demonstrate that HRMamba achieves state-of-the-art performance, and the designed luminance post-processing module can be applied in various lighting environments, significantly enhancing the performance in dark environments without degrading the performance in normal light scenes.

EAAI Journal 2025 Journal Article

Latent temporal smoothness-induced Schatten-p norm factorization for sequential subspace clustering

Yuan Xu
Zhen-Zhen Zhao
Tong-Wei Lu
Wei Ke
Yi Luo
Yan-Lin He
Qun-Xiong Zhu
Yang Zhang

This paper presents an innovative latent temporal smoothness-induced Schatten-p norm factorization (SpFLTS) method aimed at addressing challenges in sequential subspace clustering tasks. Globally, SpFLTS employs a low-rank subspace clustering framework based on Schatten-2/3 norm factorization to enhance the comprehensive capture of the original data features. Locally, a total variation smoothing term is imposed on the temporal gradients of latent subspace matrices obtained from sub-orthogonal projections, thereby preserving smoothness in the sequential latent space. To efficiently solve the closed-form optimization problem, a fast Fourier transform is combined with the non-convex alternating direction method of multipliers to optimize the latent subspace matrix, which greatly speeds up computation. Experimental results demonstrate that the proposed SpFLTS method surpasses existing techniques on multiple benchmark databases, highlighting its superior clustering performance and extensive application potential.
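As background for the norm named in this abstract, the standard textbook definition of the Schatten-p (quasi-)norm is given below. The symbols $X$, $m$, $n$, and $\sigma_i$ are notation assumed here, and the paper's specific factorized form is not reproduced.

```latex
% Standard definition; \sigma_i(X) denotes the i-th singular value of X.
\[
  \|X\|_{S_p} \;=\; \Bigg( \sum_{i=1}^{\min(m,n)} \sigma_i(X)^{p} \Bigg)^{1/p},
  \qquad X \in \mathbb{R}^{m \times n},\; p > 0 .
\]
% Special cases: p = 1 is the nuclear norm, p = 2 is the Frobenius norm.
% For 0 < p < 1 (e.g. p = 2/3) this is a quasi-norm, often used as a
% tighter surrogate for matrix rank than the nuclear norm.
```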

EAAI Journal 2025 Journal Article

Regression loss-assisted conditional style generative adversarial network for virtual sample generation with small data in soft sensing

Xue-Yu Zhang
Qun-Xiong Zhu
Wei Ke
Yan-Lin He
Ming-Qing Zhang
Yuan Xu

Existing methods that extend virtual sample pools to address the small-sample problem caused by sample atypicality and uneven distribution often overlook data sparsity and inverse sample generation challenges, which limits the accuracy of subsequent modeling. To address this problem, we propose a novel regression-assisted conditional style generative adversarial network (RAC-StyleGAN). The proposed method leverages the strengths of StyleGAN in latent space mapping to enhance data diversity and granularity, while incorporating regression-assisted conditions to improve modeling performance. Specifically, RAC-StyleGAN utilizes kernel density estimation and radial basis function interpolation to ensure that the generated output variables are uniformly distributed. Based on the principle of inverse transformation, the interpolated output variables are then used as conditional inputs for the StyleGAN model, generating virtual input variables that faithfully reflect the marginal distribution of the original data. Furthermore, to preserve the complex nonlinear relationships between input and output variables, RAC-StyleGAN integrates a regression loss strategy based on empirical risk minimization into the StyleGAN framework. By fine-tuning the generation process, the soft-sensing model effectively captures the nonlinear mapping between inputs and outputs. Experimental validations on synthetic nonlinear functions, University of California Irvine machine learning (UCI) datasets, and a real-world purified terephthalic acid (PTA) solvent system demonstrate that RAC-StyleGAN effectively generates high-quality virtual samples, significantly enhancing the modeling performance.

EAAI Journal 2025 Journal Article

Spatio-temporal attention based collaborative local–global learning for traffic flow prediction

Haiyang Chi
Yuhuan Lu
Can Xie
Wei Ke
Bidong Chen

Traffic flow prediction is crucial for intelligent transportation systems (ITS), providing valuable insights for traffic control, route planning, and operation management. Existing work often separately models the spatial and temporal dependencies and primarily relies on predefined graphs to represent spatio-temporal dependencies, neglecting the traffic dynamics caused by unexpected events and the global relationships among road segments. Unlike previous models that primarily focus on local feature extraction, we propose a novel collaborative local–global learning model (LOGO) that employs spatio-temporal attention (STA) and graph convolutional networks (GCN). Specifically, LOGO simultaneously extracts hidden traffic features from both local and global perspectives. In local feature extraction, a novel STA is devised to directly attend to spatio-temporal coupling interdependencies instead of separately modeling temporal and spatial dependencies, and to capture in-depth spatio-temporal traffic context with an adaptive graph focusing on the dynamics in traffic flow. In global feature extraction, a global correlation matrix is constructed and GCNs are utilized to propagate messages on the obtained matrix to achieve interactions between both adjacent and similar road segments. Finally, the obtained local and global features are concatenated and fed into a gated aggregation to forecast future traffic flow. Extensive experiments on four real-world traffic datasets sourced from the Caltrans Performance Measurement System (PEMS03, PEMS04, PEMS07, and PEMS08) demonstrate the effectiveness of our proposed model. LOGO achieves the best performance over 18 state-of-the-art baselines and the best prediction performance with the highest improvement of 6.06% on the PEMS07 dataset. Additionally, two real-world case studies further substantiate the robustness and interpretability of LOGO.

EAAI Journal 2025 Journal Article

Stereo matching on epipolar plane image for light field depth estimation via oriented structure

Rongshan Chen
Hao Sheng
Ruixuan Cong
Da Yang
Zhenglong Cui
Sizhe Wang
Wei Ke

Depth estimation plays a pivotal role in civil engineering tasks such as road surface defect detection, offering high-precision, critical information about scene surface geometry. The Light Field captures both spatial and angular information of a scene, enabling precise depth estimation. The Epipolar Plane Image represents a specific 2-dimensional slice of the light field and is characterized by multiple depth-related lines. Previous epipolar plane image-based methods typically estimate depth maps by extracting the optimal slope for each line; however, they often neglect the visual relationships within this representation, leading to inaccuracies. In this paper, we explore these visual relationships and propose a novel visual feature, termed Oriented Structure, which can be utilized to compute scene depth. Similar to previous stereo matching-based methods, we design a new epipolar plane image-based cost volume to extract depth cues from this structure. The cost volume combines the occlusion robustness of epipolar plane image-based methods with the noise robustness of stereo matching-based methods, resulting in smoother depth maps with sharper edges. Building on the framework of existing stereo matching networks, we introduce an epipolar plane image-based stereo matching network for light field depth prediction. Finally, we conduct experiments using both synthetic and real datasets, demonstrating that our network produces higher-quality depth maps compared to previous state-of-the-art methods, ranking first (about 1.405 mean squared error) on the 4-dimensional light field benchmark. We also apply our method to defect detection tasks, providing accurate depth information that leads to improved results.

NeurIPS Conference 2022 Conference Paper

CoupAlign: Coupling Word-Pixel with Sentence-Mask Alignments for Referring Image Segmentation

Zicheng Zhang
Yi Zhu
Jianzhuang Liu
Xiaodan Liang
Wei Ke

Referring image segmentation aims at localizing all pixels of the visual objects described by a natural language sentence. Previous works learn to straightforwardly align the sentence embedding and pixel-level embedding for highlighting the referred objects, but ignore the semantic consistency of pixels within the same object, leading to incomplete masks and localization errors in predictions. To tackle this problem, we propose CoupAlign, a simple yet effective multi-level visual-semantic alignment method, to couple sentence-mask alignment with word-pixel alignment to enforce an object mask constraint for achieving more accurate localization and segmentation. Specifically, the Word-Pixel Alignment (WPA) module performs early fusion of linguistic and pixel-level features in intermediate layers of the vision and language encoders. Based on the word-pixel aligned embedding, a set of mask proposals is generated to hypothesize possible objects. Then in the Sentence-Mask Alignment (SMA) module, the masks are weighted by the sentence embedding to localize the referred object, and finally projected back to aggregate the pixels for the target. To further enhance the learning of the two alignment modules, an auxiliary loss is designed to contrast the foreground and background pixels. By hierarchically aligning pixels and masks with linguistic features, our CoupAlign captures the pixel coherence at both visual and semantic levels, thus generating more accurate predictions. Extensive experiments on popular datasets (e.g., RefCOCO and G-Ref) show that our method achieves consistent improvements over state-of-the-art methods, e.g., about 2% oIoU increase on the validation and testing set of RefCOCO. In particular, CoupAlign shows a remarkable ability to distinguish the target from multiple objects of the same class. Code will be available at https://gitee.com/mindspore/models/tree/master/research/cv/CoupAlign.

AAAI Conference 2021 Conference Paper

Error-Aware Density Isomorphism Reconstruction for Unsupervised Cross-Domain Crowd Counting

Yuhang He
Zhiheng Ma
Xing Wei
Xiaopeng Hong
Wei Ke
Yihong Gong

This paper focuses on the unsupervised domain adaptation problem for video-based crowd counting, in which we use labeled data as source domain and unlabelled video data as target domain. It is challenging as there is a huge gap between the source and the target domain and no annotations of samples are available in the target domain. The key issue is how to utilize unlabelled videos in the target domain for knowledge learning and transferring from the source domain. To tackle this problem, we propose a novel Error-aware Density Isomorphism REConstruction Network (EDIREC-Net) for cross-domain crowd counting. EDIREC-Net jointly transfers a pre-trained counting model to target domains using a density isomorphism reconstruction objective and models the reconstruction erroneousness by error reasoning. Specifically, as crowd flows in videos are consecutive, the density maps in adjacent frames turn out to be isomorphic. On this basis, we regard the density isomorphism reconstruction error as a self-supervised signal to transfer the pre-trained counting models to different target domains. Moreover, we leverage an estimation-reconstruction consistency to monitor the density reconstruction erroneousness and suppress unreliable density reconstructions during training. Experimental results on four benchmark datasets demonstrate the superiority of the proposed method and ablation studies investigate the efficiency and robustness. The source code is available at https://github.com/GehenHe/EDIREC-Net.
