Arrow Research search

Author name cluster

Wen Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

35 papers
2 author rows

Possible papers

35

AAAI Conference 2026 Conference Paper

FreDN: Spectral Disentanglement for Time Series Forecasting via Learnable Frequency Decomposition

  • Zhongde An
  • Jinhong You
  • Jiyanglin Li
  • Yiming Tang
  • Wen Li
  • Heming Du
  • Shouguo Du

Time series forecasting is essential in a wide range of real-world applications. Recently, frequency-domain methods have attracted increasing interest for their ability to capture global dependencies. However, when applied to non-stationary time series, these methods encounter spectral entanglement and the computational burden of complex-valued learning. Spectral entanglement refers to the overlap of trends, periodicities, and noise across the spectrum due to spectral leakage and the presence of non-stationarity, and existing decompositions are not suited to resolving it. To address this, we propose the Frequency Decomposition Network (FreDN), which introduces a learnable Frequency Disentangler module to separate trend and periodic components directly in the frequency domain. Furthermore, we propose a theoretically supported ReIm Block to reduce the complexity of complex-valued operations while maintaining performance. We also re-examine the frequency-domain loss function and provide new theoretical insights into its effectiveness. Extensive experiments on seven long-term forecasting benchmarks demonstrate that FreDN outperforms state-of-the-art methods by up to 10%. Furthermore, compared with standard complex-valued architectures, our real-imaginary shared-parameter design reduces the parameter count and computational cost by at least 50%.

AAAI Conference 2026 Conference Paper

RCP-LO: A Relative Coordinate Prediction Framework for Generalizable Deep LiDAR Odometry

  • Chen Liu
  • Wen Li
  • Yongshu Huang
  • Minghang Zhu
  • Yuyang Yang
  • Dunqiang Liu
  • Sheng Ao
  • Cheng Wang

LiDAR odometry is a critical component of SLAM in autonomous driving and robotics. Learning-based methods have shown remarkable performance by regressing relative poses in an end-to-end manner. However, when applying these trained models, originally developed on the widely used KITTI dataset, to other scenes, performance often drops significantly. In other words, existing methods struggle to generalize well to new environments. To address this challenge, we propose RCP-LO, a simple yet effective LiDAR odometry framework. We introduce a novel representation for relative poses, reformulating them as relative coordinates, which can then be solved using geometrical verification. This approach avoids overly simplified pose representations and makes better use of scene geometry, thereby improving generalization. Moreover, to capture the inherent uncertainties in relative pose estimation from occluded LiDAR point clouds in dynamic environments, we adapt our framework to learn a denoising diffusion model, allowing for sampling plausible relative coordinates while improving robustness. We also introduce a differentiable geometric weighted singular value decomposition module, enabling efficient pose estimation through a single forward pass. Extensive experiments demonstrate that RCP-LO, trained exclusively on the KITTI dataset, achieves competitive performance compared to SOTA learning-based methods and generalizes effectively to the KITTI-360, Ford, and Oxford datasets.

AAAI Conference 2026 Conference Paper

V2VLoc: Robust GNSS-Free Collaborative Perception via LiDAR Localization

  • Wenkai Lin
  • Qiming Xia
  • Wen Li
  • Xun Huang
  • Chenglu Wen

Multiple agents rely on accurate poses to share and align observations, enabling collaborative perception of the environment. However, traditional GNSS-based localization often fails in GNSS-denied environments, making consistent feature alignment difficult in collaboration. To tackle this challenge, we propose a robust GNSS-free collaborative perception framework based on LiDAR localization. Specifically, we propose a lightweight Pose Generator with Confidence (PGC) to estimate compact pose and confidence representations. To alleviate the effects of localization errors, we further develop the Pose-Aware Spatio-Temporal Alignment Transformer (PASTAT), which performs confidence-aware spatial alignment while capturing essential temporal context. Additionally, we present a new simulation dataset, V2VLoc, which can be adapted for both LiDAR localization and collaborative detection tasks. V2VLoc comprises three subsets: Town1Loc, Town4Loc, and V2VDet. Town1Loc and Town4Loc offer multi-traversal sequences for training in localization tasks, whereas V2VDet is specifically intended for the collaborative detection task. Extensive experiments conducted on the V2VLoc dataset demonstrate that our approach achieves state-of-the-art performance under GNSS-denied conditions. We further conduct extended experiments on the real-world V2V4Real dataset to validate the effectiveness and generalizability of PASTAT.

IROS Conference 2025 Conference Paper

DR-MPC: Disturbance-Resilient Model Predictive Visual Servoing Control for Quadrotor UAV Pipeline Inspection

  • Wen Li
  • Jinya Su
  • Cunjia Liu
  • Wen-Hua Chen 0001
  • Shihua Li 0001

Unmanned Aerial Vehicles (UAVs) are gaining attention for inspections due to their improved safety, efficiency, and accuracy, alongside reduced costs and environmental risks. Visual servoing is crucial for autonomous UAV flight in GPS-degraded environments, guiding the UAV by minimizing errors between observed and desired visual features. This study focuses on Image-Based Visual Servoing (IBVS) control for quadrotor UAVs under complex dynamics and environmental disturbances. A nonlinear model predictive control (MPC) framework is first integrated with visual servoing to handle dynamics nonlinearity, control optimality, and constraints. To address uncertainties and disturbances, a Generalized Extended State Observer (GESO) is incorporated into the MPC, forming the Disturbance-Resilient (DR-) MPC. The GESO estimates the lumped disturbance to improve model predictions within the MPC horizon. The proposed algorithm is validated in a realistic Gazebo environment for UAV pipeline inspection in 3D scenarios, showing better control accuracy and reduced inspection time compared to three baseline methods: IBVS, IBVS-MPC(K) with kinematics, and IBVS-MPC(D) with dynamics.

NeurIPS Conference 2025 Conference Paper

GD$^2$: Robust Graph Learning under Label Noise via Dual-View Prediction Discrepancy

  • Kailai Li
  • Jiong Lou
  • Jiawei Sun
  • Honghong Zeng
  • Wen Li
  • Chentao Wu
  • Yuan Luo
  • Wei Zhao

Graph Neural Networks (GNNs) achieve strong performance in node classification tasks but exhibit substantial performance degradation under label noise. Despite recent advances in noise-robust learning, a principled approach that exploits the node-neighbor interdependencies inherent in graph data for label noise detection remains underexplored. To address this gap, we propose GD$^2$, a noise-aware Graph learning framework that detects label noise by leveraging Dual-view prediction Discrepancies. The framework contrasts the ego-view, constructed from node-specific features, with the structure-view, derived through the aggregation of neighboring representations. The resulting discrepancy captures disruptions in semantic coherence between individual node representations and the structural context, enabling effective identification of mislabeled nodes. Building upon this insight, we further introduce a view-specific training strategy that enhances noise detection by amplifying prediction divergence through differentiated view-specific supervision. Extensive experiments on multiple datasets and noise settings demonstrate that GD$^2$ achieves superior performance over state-of-the-art baselines.

NeurIPS Conference 2025 Conference Paper

GTR-Loc: Geospatial Text Regularization Assisted Outdoor LiDAR Localization

  • Shangshu Yu
  • Wen Li
  • Xiaotian Sun
  • Zhimin Yuan
  • Xin Wang
  • Sijie Wang
  • Rui She
  • Cheng Wang

Prevailing scene coordinate regression methods for LiDAR localization suffer from localization ambiguities, as distinct locations can exhibit similar geometric signatures, a challenge that current geometry-based regression approaches have yet to solve. Recent vision-language models show that textual descriptions can enrich scene understanding, supplying potential localization cues missing from point cloud geometries. In this paper, we propose GTR-Loc, a novel text-assisted LiDAR localization framework that effectively generates and integrates geospatial text regularization to enhance localization accuracy. We propose two novel designs: a Geospatial Text Generator that produces discrete pose-aware text descriptions, and a LiDAR-Anchored Text Embedding Refinement module that dynamically constructs view-specific embeddings conditioned on current LiDAR features. The geospatial text embeddings act as regularization to effectively reduce localization ambiguities. Furthermore, we introduce a Modality Reduction Distillation strategy to transfer textual knowledge. It enables high-performance LiDAR-only localization during inference, without requiring runtime text generation. Extensive experiments on challenging large-scale outdoor datasets, including QEOxford, Oxford Radar RobotCar, and NCLT, demonstrate the effectiveness of GTR-Loc. Our method significantly outperforms state-of-the-art approaches, notably achieving a 9.64%/8.04% improvement in position/orientation accuracy on QEOxford. Our code is available at https://github.com/PSYZ1234/GTR-Loc.

ICML Conference 2025 Conference Paper

Integrating Intermediate Layer Optimization and Projected Gradient Descent for Solving Inverse Problems with Diffusion Models

  • Yang Zheng
  • Wen Li
  • Zhaoqiang Liu

Inverse problems (IPs) involve reconstructing signals from noisy observations. Recently, diffusion models (DMs) have emerged as a powerful framework for solving IPs, achieving remarkable reconstruction performance. However, existing DM-based methods frequently encounter issues such as heavy computational demands and suboptimal convergence. In this work, building upon the idea of the recent work DMPlug, we propose two novel methods, DMILO and DMILO-PGD, to address these challenges. Our first method, DMILO, employs intermediate layer optimization (ILO) to alleviate the memory burden inherent in DMPlug. Additionally, by introducing sparse deviations, we expand the range of DMs, enabling the exploration of underlying signals that may lie outside the range of the diffusion model. We further propose DMILO-PGD, which integrates ILO with projected gradient descent (PGD), thereby reducing the risk of suboptimal convergence. We provide an intuitive theoretical analysis of our approaches under appropriate conditions and validate their superiority through extensive experiments on diverse image datasets, encompassing both linear and nonlinear IPs. Our results demonstrate significant performance gains over state-of-the-art methods, highlighting the effectiveness of DMILO and DMILO-PGD in addressing common challenges in DM-based IP solvers.

ECAI Conference 2025 Conference Paper

PCFNet: Enhancing Time Series Forecasting Through Preserving Constant Frequency

  • Wenjun Yu
  • Wen Li
  • Wentao Gao
  • Wangyu Wu
  • Shouguo Du
  • Jiyanglin Li

Long-term time series forecasting has been widely applied in finance, traffic, and other domains. Stable periodic patterns serve as the foundation for conducting long-term forecasting. However, real-world time series often consist of multi-periodic components and trend components, which poses a significant challenge to time series prediction. In this paper, we introduce PCFNet, a simple yet effective time series forecasting model, which enhances time series forecasting by preserving the constant frequency components that represent the multi-periodicity of time series during the forecasting process. Specifically, PCFNet adaptively identifies the constant frequency components through a simple gated network. Then, the residual frequency components are predicted via a single complex-valued linear layer. Finally, the residual frequency components are added to the constant frequency components to obtain the final outcome. Extensive experimental results across multiple real-world time series datasets demonstrate that PCFNet achieves state-of-the-art performance with a simple architecture.

AAAI Conference 2025 Conference Paper

S-INF: Towards Realistic Indoor Scene Synthesis via Scene Implicit Neural Field

  • Zixi Liang
  • Guowei Xu
  • Haifeng Wu
  • Ye Huang
  • Wen Li
  • Lixin Duan

Learning-based methods have become increasingly popular in 3D indoor scene synthesis (ISS), showing superior performance over traditional optimization-based approaches. These learning-based methods typically model distributions on simple yet explicit scene representations using generative models. However, due to the oversimplified explicit representations that overlook detailed information and the lack of guidance from multimodal relationships within the scene, most learning-based methods struggle to generate indoor scenes with realistic object arrangements and styles. In this paper, we introduce a new method, Scene Implicit Neural Field (S-INF), for indoor scene synthesis, aiming to learn meaningful representations of multimodal relationships to enhance the realism of indoor scene synthesis. S-INF assumes that the scene layout is often related to detailed object information. It disentangles the multimodal relationships into scene layout relationships and detailed object relationships, fusing them later through implicit neural fields (INFs). By learning specialized scene layout relationships and projecting them into S-INF, we achieve a realistic generation of scene layout. Additionally, S-INF captures dense and detailed object relationships through differentiable rendering, ensuring stylistic consistency across objects. Through extensive experiments on the benchmark 3D-FRONT dataset, we demonstrate that our method consistently achieves state-of-the-art performance under different types of ISS.

NeurIPS Conference 2025 Conference Paper

SCOPE: Saliency-Coverage Oriented Token Pruning for Efficient Multimodal LLMs

  • Jinhong Deng
  • Wen Li
  • Joey Tianyi Zhou
  • Yang He

Multimodal Large Language Models (MLLMs) typically process a large number of visual tokens, leading to considerable computational overhead, even though many of these tokens are redundant. Existing visual token pruning methods primarily focus on selecting the most salient tokens based on attention scores, resulting in the semantic incompleteness of the selected tokens. In this paper, we propose a novel visual token pruning strategy, called Saliency-Coverage Oriented token Pruning for Efficient MLLMs (SCOPE), to jointly model both the saliency and coverage of the selected visual tokens to better preserve semantic completeness. Specifically, we introduce a set-coverage measure for a given set of selected tokens, computed based on the token relationships. We then define a token-coverage gain for each unselected token, quantifying how much additional coverage would be obtained by including it. By integrating the saliency score into the token-coverage gain, we propose our SCOPE score and iteratively select the token with the highest SCOPE score. We conduct extensive experiments on multiple vision-language understanding benchmarks using the LLaVA-1.5 and LLaVA-Next models. Experimental results demonstrate that our method consistently outperforms prior approaches.

AAAI Conference 2025 Conference Paper

STGC-NeRF: Spatial-Temporal Geometric Consistency for LiDAR Neural Radiance Fields in Dynamic Scenes

  • Shangshu Yu
  • Xiaotian Sun
  • Wen Li
  • Qingshan Xu
  • Zhimin Yuan
  • Sijie Wang
  • Rui She
  • Cheng Wang

While Neural Radiance Fields (NeRFs) have advanced the frontiers of novel view synthesis (NVS) using LiDAR data, they still struggle in dynamic scenes. Due to the low frequency and sparsity characteristics of LiDAR point clouds, it is challenging to spontaneously learn a dynamic and consistent scene representation from posed scans. In this paper, we propose STGC-NeRF, a novel LiDAR NeRF method that combines spatial-temporal geometry consistency to enhance the reconstruction of dynamic scenes. First, we propose a temporal geometry consistency regularization to enhance the regression of time-varying scene geometries from low-frequency LiDAR sequences. By estimating the pointwise correspondences between synthetic (or real) and real frames at different times, we convert them into various forms of temporal supervision. This alleviates the inconsistency caused by moving objects in dynamic scenes. Second, to improve the reconstruction of sparse LiDAR data, we propose spatial geometric consistency constraints. By computing multiple neighborhood feature descriptors incorporating geometric and contextual information, we capture structural geometry information from sparse LiDAR data. This helps encourage consistent direction, smoothness, and detail of the local surface. Extensive experiments on the KITTI-360 and nuScenes datasets demonstrate that STGC-NeRF outperforms state-of-the-art methods in both geometry and intensity accuracy for dynamic LiDAR scene reconstruction.

AAAI Conference 2025 Conference Paper

Text to Point Cloud Localization with Multi-Level Negative Contrastive Learning

  • Dunqiang Liu
  • Shujun Huang
  • Wen Li
  • Siqi Shen
  • Cheng Wang

Language-based localization is a crucial task in robotics and computer vision, enabling robots to understand spatial positions through language. Recent methods rely on contrastive learning to establish correspondences between global features of texts and point clouds. However, the inherent ambiguity of textual descriptions makes it difficult to convey geometric information accurately, and forcing their alignment in the feature space may compromise the expressiveness of the point clouds. Unlike previous methods, this paper proposes using language as a filter to distinguish dissimilar locations. To this end, we propose a robust framework of multi-level negative contrastive learning for language-based localization, fully leveraging the descriptive power of language for spatial localization. Our method learns multiple mismatched factors by minimizing the similarity of different locations at different levels, including global-level, instance-level, and relation-level. Extensive experiments conducted on the KITTI360Pose benchmark demonstrate that our method outperforms the state-of-the-art methods. Specifically, we achieve a 56.3% improvement in Top-1 retrieval recall and a 45.9% improvement in 5m localization recall.

AAAI Conference 2024 Conference Paper

Beyond Prototypes: Semantic Anchor Regularization for Better Representation Learning

  • Yanqi Ge
  • Qiang Nie
  • Ye Huang
  • Yong Liu
  • Chengjie Wang
  • Feng Zheng
  • Wen Li
  • Lixin Duan

One of the ultimate goals of representation learning is to achieve compactness within a class and well-separability between classes. Many outstanding metric-based and prototype-based methods following the Expectation-Maximization paradigm have been proposed for this objective. However, they inevitably introduce biases into the learning process, particularly with long-tail distributed training data. In this paper, we reveal that the class prototype need not be derived from training features, and propose a novel perspective to use pre-defined class anchors serving as feature centroids to unidirectionally guide feature learning. However, the pre-defined anchors may have a large semantic distance from the pixel features, which prevents them from being directly applied. To address this issue and generate feature centroids independent of feature learning, a simple yet effective Semantic Anchor Regularization (SAR) is proposed. SAR ensures the inter-class separability of semantic anchors in the semantic space by employing a classifier-aware auxiliary cross-entropy loss during training via disentanglement learning. By pulling the learned features to these semantic anchors, several advantages can be attained: 1) intra-class compactness and natural inter-class separability, 2) induced bias or errors from feature learning can be avoided, and 3) robustness to the long-tailed problem. The proposed SAR can be used in a plug-and-play manner in existing models. Extensive experiments demonstrate that SAR performs better than previous sophisticated prototype-based methods. The implementation is available at https://github.com/geyanqi/SAR.

NeurIPS Conference 2024 Conference Paper

Generalized Eigenvalue Problems with Generative Priors

  • Zhaoqiang Liu
  • Wen Li
  • Junren Chen

Generalized eigenvalue problems (GEPs) find applications in various fields of science and engineering. For example, principal component analysis, Fisher's discriminant analysis, and canonical correlation analysis are specific instances of GEPs and are widely used in statistical data processing. In this work, we study GEPs under generative priors, assuming that the underlying leading generalized eigenvector lies within the range of a Lipschitz continuous generative model. Under appropriate conditions, we show that any optimal solution to the corresponding optimization problems attains the optimal statistical rate. Moreover, from a computational perspective, we propose an iterative algorithm called the Projected Rayleigh Flow Method (PRFM) to approximate the optimal solution. We theoretically demonstrate that under suitable assumptions, PRFM converges linearly to an estimated vector that achieves the optimal statistical rate. Numerical results are provided to demonstrate the effectiveness of the proposed method.

JBHI Journal 2024 Journal Article

Model Generalizability Investigation for GFCE-MRI Synthesis in NPC Radiotherapy Using Multi-Institutional Patient-Based Data Normalization

  • Wen Li
  • Saikit Lam
  • Yinghui Wang
  • Chenyang Liu
  • Tian Li
  • Jens Kleesiek
  • Andy Lai-Yin Cheung
  • Ying Sun

Recently, deep learning has been demonstrated to be feasible in eliminating the use of gadolinium-based contrast agents (GBCAs) through synthesizing gadolinium-free contrast-enhanced MRI (GFCE-MRI) from contrast-free MRI sequences, providing the community with an alternative to get rid of GBCAs-associated safety issues in patients. Nevertheless, generalizability assessment of the GFCE-MRI model has been largely challenged by the high inter-institutional heterogeneity of MRI data, on top of the scarcity of multi-institutional data itself. Although various data normalization methods have been adopted to address the heterogeneity issue, they have been limited to single-institutional investigation and there is no standard normalization approach presently. In this study, we aimed at investigating the generalizability of GFCE-MRI models using data from seven institutions by manipulating the heterogeneity of MRI data under five popular normalization approaches. Three state-of-the-art neural networks were applied to map from T1-weighted and T2-weighted MRI to contrast-enhanced MRI (CE-MRI) for GFCE-MRI synthesis in patients with nasopharyngeal carcinoma. MRI data from three institutions were used separately to generate three uni-institution models and jointly for a tri-institution model. The five normalization methods were applied to normalize the data of each model. MRI data from the remaining four institutions served as external cohorts for model generalizability assessment. Quality of GFCE-MRI was quantitatively evaluated against ground-truth CE-MRI using mean absolute error (MAE) and peak signal-to-noise ratio (PSNR). Results showed that the performance of all uni-institution models remarkably dropped on the external cohorts. By contrast, the model trained using multi-institutional data with Z-Score normalization yielded the best model generalizability improvement.

NeurIPS Conference 2024 Conference Paper

Towards Unsupervised Model Selection for Domain Adaptive Object Detection

  • Hengfu Yu
  • Jinhong Deng
  • Wen Li
  • Lixin Duan

Evaluating the performance of deep models in new scenarios has drawn increasing attention in recent years due to the wide application of deep learning techniques in various fields. However, while it is possible to collect data from new scenarios, the annotations are not always available. Existing Domain Adaptive Object Detection (DAOD) works usually report their performance by selecting the best model on the validation set or even the test set of the target domain, which is highly impractical in real-world applications. In this paper, we propose a novel unsupervised model selection approach for domain adaptive object detection, which is able to select almost the optimal model for the target domain without using any target labels. Our approach is based on the flat minima principle, i.e., models located in the flat minima region in the parameter space usually exhibit excellent generalization ability. However, traditional methods require labeled data to evaluate how well a model is located in the flat minima region, which is unrealistic for the DAOD task. Therefore, we design a Detection Adaptation Score (DAS) approach to approximately measure the flat minima without using target labels. We show via a generalization bound that the flatness can be deemed as model variance, while the minima depend on the domain distribution distance for the DAOD task. Accordingly, we propose a Flatness Index Score (FIS) to assess the flatness by measuring the classification and localization fluctuation before and after perturbations of model parameters, and a Prototypical Distance Ratio (PDR) score to seek the minima by measuring the transferability and discriminability of the models. In this way, the proposed DAS approach can effectively represent the degree of flat minima and evaluate the model generalization ability on the target domain. We have conducted extensive experiments on various DAOD benchmarks and approaches, and the experimental results show that the proposed DAS correlates well with the performance of DAOD models and can be used as an effective tool for model selection after training. The code will be released at https://github.com/HenryYu23/DAS.

AAAI Conference 2023 Conference Paper

DC-Former: Diverse and Compact Transformer for Person Re-identification

  • Wen Li
  • Cheng Zou
  • Meng Wang
  • Furong Xu
  • Jianan Zhao
  • Ruobing Zheng
  • Yuan Cheng
  • Wei Chu

In the person re-identification (ReID) task, it is still challenging to learn discriminative representations by deep learning due to limited data. Generally speaking, the model will get better performance when increasing the amount of data. The addition of similar classes strengthens the ability of the classifier to identify similar identities, thereby improving the discrimination of representations. In this paper, we propose a Diverse and Compact Transformer (DC-Former) that can achieve a similar effect by splitting the embedding space into multiple diverse and compact subspaces. A compact embedding subspace helps the model learn more robust and discriminative embeddings to identify similar classes, and the fusion of these diverse embeddings containing more fine-grained information can further improve the effect of ReID. Specifically, multiple class tokens are used in a vision transformer to represent multiple embedding spaces. Then, a self-diverse constraint (SDC) is applied to these spaces to push them away from each other, which makes each embedding space diverse and compact. Further, a dynamic weight controller (DWC) is designed for balancing the relative importance among them during training. The experimental results of our method are promising, surpassing previous state-of-the-art methods on several commonly used person ReID benchmarks. Our code is available at https://github.com/ant-research/Diverse-and-Compact-Transformer.

TIST Journal 2023 Journal Article

Hyper-Laplacian Regularized Multi-View Clustering with Exclusive L21 Regularization and Tensor Log-Determinant Minimization Approach

  • Qilun Luo
  • Ming Yang
  • Wen Li
  • Mingqing Xiao

Multi-view clustering aims to capture the inherent information of multiple views by identifying the data clustering that reflects distinct features of datasets. Since there is a consensus in the literature that different views of a dataset share a common latent structure, most existing multi-view subspace learning methods rely on the nuclear norm to seek the low-rank representation of the underlying subspace. However, the nuclear norm often fails to distinguish the variance of features for each cluster due to its convex nature, and data tends to fall in multiple non-linear subspaces for multi-dimensional datasets. To address these problems, we propose a novel multi-view clustering method (HL-L21-TLD-MSC) that unifies Hyper-Laplacian (HL) and exclusive ℓ2,1 (L21) regularization with the Tensor Log-Determinant Rank Minimization (TLD) setting. Specifically, the hyper-Laplacian regularization maintains the local geometrical structure that makes the estimation prone to nonlinearities, and the mixed ℓ2,1 and ℓ1,2 regularization provides joint sparsity within-cluster as well as exclusive sparsity between-cluster. Furthermore, a log-determinant function is used as a tighter tensor rank approximation to discriminate the dimension of features. An efficient alternating algorithm is then derived to optimize the proposed model, and the construction of a convergent sequence to the Karush-Kuhn-Tucker (KKT) critical point solution is mathematically validated in detail. Extensive experiments are conducted on ten well-known datasets to demonstrate that the proposed approach outperforms the existing state-of-the-art approaches in various scenarios, in which six of them achieve perfect results under our framework developed in this article, demonstrating the high effectiveness of the proposed approach.

NeurIPS Conference 2023 Conference Paper

Learning Motion Refinement for Unsupervised Face Animation

  • Jiale Tao
  • Shuhang Gu
  • Wen Li
  • Lixin Duan

Unsupervised face animation aims to generate a human face video based on the appearance of a source image, mimicking the motion from a driving video. Existing methods typically adopted a prior-based motion model (e.g., the local affine motion model or the local thin-plate-spline motion model). While it is able to capture the coarse facial motion, artifacts can often be observed around the tiny motion in local areas (e.g., lips and eyes), due to the limited ability of these methods to model the finer facial motions. In this work, we design a new unsupervised face animation approach to learn simultaneously the coarse and finer motions. In particular, while exploiting the local affine motion model to learn the global coarse facial motion, we design a novel motion refinement module to compensate for the local affine motion model for modeling finer face motions in local areas. The motion refinement is learned from the dense correlation between the source and driving images. Specifically, we first construct a structure correlation volume based on the keypoint features of the source and driving images. Then, we train a model to generate the tiny facial motions iteratively from low to high resolution. The learned motion refinements are combined with the coarse motion to generate the new image. Extensive experiments on widely used benchmarks demonstrate that our method achieves the best results among state-of-the-art baselines.

AAAI Conference 2022 Conference Paper

Denoised Maximum Classifier Discrepancy for Source-Free Unsupervised Domain Adaptation

  • Tong Chu
  • Yahao Liu
  • Jinhong Deng
  • Wen Li
  • Lixin Duan

Source-Free Unsupervised Domain Adaptation (SFUDA) aims to adapt a pre-trained source model to an unlabeled target domain without access to the original labeled source domain samples. Many existing SFUDA approaches apply the self-training strategy, which involves iteratively selecting confidently predicted target samples as pseudo-labeled samples used to train the model to fit the target domain. However, the self-training strategy may also suffer from sample selection bias and be impacted by the label noise of the pseudo-labeled samples. In this work, we provide a rigorous theoretical analysis of how these two issues affect the model's generalization ability when applying the self-training strategy to the SFUDA problem. Based on this theoretical analysis, we then propose a new Denoised Maximum Classifier Discrepancy (D-MCD) method for SFUDA to effectively address these two issues. In particular, we first minimize the distribution mismatch between the selected pseudo-labeled samples and the remaining target domain samples to alleviate the sample selection bias. Moreover, we design a strong-weak self-training paradigm to denoise the selected pseudo-labeled samples, where the strong network is used to select pseudo-labeled samples while the weak network helps the strong network to filter out hard samples to avoid incorrect labels. In this way, we are able to ensure both the quality of the pseudo-labels and the generalization ability of the trained model on the target domain. We achieve state-of-the-art results on three domain adaptation benchmark datasets, which clearly validates the effectiveness of our proposed approach. Full code is available at https://github.com/kkkkkkon/D-MCD.
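The strong-weak selection step can be sketched as a simple filter: keep a target sample only when the strong network is confident and the weak network agrees with it. This is a minimal illustration of the denoising idea, not the full D-MCD training loop:

```python
def select_pseudo_labels(strong_preds, weak_preds, strong_conf, threshold=0.9):
    # Keep sample i only if the strong network is confident AND the weak
    # network predicts the same class; disagreement flags a hard sample
    # likely to carry an incorrect pseudo-label.
    selected = []
    for i, (s, w, c) in enumerate(zip(strong_preds, weak_preds, strong_conf)):
        if c >= threshold and s == w:
            selected.append((i, s))
    return selected

preds_strong = [0, 1, 2, 1]
preds_weak   = [0, 2, 2, 1]
confidence   = [0.95, 0.97, 0.99, 0.60]
print(select_pseudo_labels(preds_strong, preds_weak, confidence))
# -> [(0, 0), (2, 2)]: sample 1 is dropped by disagreement, sample 3 by low confidence
```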

AAAI Conference 2021 Conference Paper

Analogical Image Translation for Fog Generation

  • Rui Gong
  • Dengxin Dai
  • Yuhua Chen
  • Wen Li
  • Danda Pani Paudel
  • Luc Van Gool

Image-to-image translation maps images from one given style to another. While exceptionally successful, current methods assume the availability of training images in both source and target domains, which does not always hold in practice. Inspired by humans' reasoning capability of analogy, we propose analogical image translation (AIT), which exploits the concept of gist for the first time. Given images of two styles in the source domain, A and A′, along with images B of the first style in the target domain, we learn a model to translate B to B′ in the target domain, such that A : A′ :: B : B′. AIT is especially useful for translation scenarios in which training data of one style is hard to obtain but training data of the same two styles in another domain is available. For instance, when going from normal conditions to extreme, rare conditions, obtaining real training images for the latter is challenging, whereas obtaining synthetic data for both cases is relatively easy. In this work, we aim at adding adverse weather effects, more specifically fog, to images taken in clear weather. To circumvent the challenge of collecting real foggy images, AIT learns the gist of translating synthetic clear-weather to foggy images, then adds fog effects onto real clear-weather images, without ever seeing any real foggy image. AIT achieves zero-shot image translation capability, whose effectiveness and benefit are demonstrated by the downstream task of semantic foggy scene understanding.
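One concrete way to picture the analogy setting is through the standard atmospheric scattering model, foggy = t·clear + (1 − t)·A: a transmission map fitted on a synthetic clear/foggy pair can be reused to fog a real clear image. This is only a toy linear sketch of the "learn on synthetic, apply to real" idea; AIT itself learns the translation with a network, not a closed-form model:

```python
def fit_transmission(clear, foggy, airlight=1.0):
    # Invert the scattering model foggy = t*clear + (1 - t)*A per pixel
    # to recover the transmission t from a synthetic clear/foggy pair.
    return [(f - airlight) / (c - airlight) if c != airlight else 1.0
            for c, f in zip(clear, foggy)]

def apply_fog(clear, t, airlight=1.0):
    # Re-apply the fitted transmission to a different clear image.
    return [ti * c + (1 - ti) * airlight for ti, c in zip(t, clear)]

syn_clear = [0.2, 0.4, 0.6]                         # toy 3-pixel intensities
syn_foggy = apply_fog(syn_clear, [0.5, 0.5, 0.5])   # simulate fog at t = 0.5
t = fit_transmission(syn_clear, syn_foggy)          # recovers t ≈ [0.5] * 3
real_clear = [0.1, 0.3, 0.5]
print(apply_fog(real_clear, t))                     # fog the "real" image
```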

IJCAI Conference 2018 Conference Paper

Semi-Supervised Optimal Transport for Heterogeneous Domain Adaptation

  • Yuguang Yan
  • Wen Li
  • Hanrui Wu
  • Huaqing Min
  • Mingkui Tan
  • Qingyao Wu

Heterogeneous domain adaptation (HDA) aims to exploit knowledge from a heterogeneous source domain to improve learning performance in a target domain. Since the feature spaces of the source and target domains are different, transferring knowledge between them is extremely difficult. In this paper, we propose a novel semi-supervised algorithm for HDA by exploiting the theory of optimal transport (OT), a powerful tool originally designed for aligning two different distributions. To match samples between heterogeneous domains, we propose to preserve semantic consistency by incorporating label information into the entropic Gromov-Wasserstein discrepancy, an OT metric for different metric spaces, resulting in a new semi-supervised scheme. Under this scheme, the target and transported source samples with the same label are enforced to follow similar distributions. Finally, based on the Kullback-Leibler metric, we develop an efficient algorithm to optimize the resulting problem. Comprehensive experiments on both synthetic and real-world datasets demonstrate the effectiveness of our proposed method.
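The semantic-consistency idea can be illustrated with plain entropic OT: adding a large penalty to the ground cost whenever two samples carry different labels makes the transport plan concentrate mass on same-label pairs. A minimal Sinkhorn sketch under that assumption (the paper uses the entropic Gromov-Wasserstein discrepancy, which compares distances within each space rather than a cross-domain cost as here):

```python
import math

def sinkhorn(cost, a, b, eps=0.1, iters=300):
    # Entropic OT: alternately rescale rows and columns of K = exp(-cost/eps)
    # until the transport plan T = diag(u) K diag(v) matches marginals a, b.
    n, m = len(cost), len(cost[0])
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Base cost between 2 source and 2 target samples, plus a label penalty.
base = [[0.2, 0.3], [0.3, 0.2]]
src_labels, tgt_labels = [0, 1], [0, 1]
cost = [[base[i][j] + (0.0 if src_labels[i] == tgt_labels[j] else 5.0)
         for j in range(2)] for i in range(2)]
T = sinkhorn(cost, [0.5, 0.5], [0.5, 0.5])
# Nearly all mass stays on same-label pairs: T[0][0] and T[1][1] ≈ 0.5.
```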

IJCAI Conference 2017 Conference Paper

Learning Discriminative Correlation Subspace for Heterogeneous Domain Adaptation

  • Yuguang Yan
  • Wen Li
  • Michael Ng
  • Mingkui Tan
  • Hanrui Wu
  • Huaqing Min
  • Qingyao Wu

Domain adaptation aims to reduce the effort of collecting and annotating target data by leveraging knowledge from a different source domain. The problem becomes extremely challenging when the feature spaces of the source and target domains differ, which is known as heterogeneous domain adaptation (HDA). In this paper, we propose a novel HDA method to find the optimal discriminative correlation subspace for the source and target data. The discriminative correlation subspace is inherited from the canonical correlation subspace between the source and target data, and is further optimized to maximize the discriminative ability of the target domain classifier. We formulate a joint objective to simultaneously learn the discriminative correlation subspace and the target domain classifier, and then apply an alternating direction method of multipliers (ADMM) algorithm to address the resulting non-convex optimization problem. Comprehensive experiments on two real-world datasets demonstrate the effectiveness of the proposed method compared to state-of-the-art methods.
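The canonical correlation subspace that the method starts from can be computed with classical CCA. A compact sketch using a regularized whitening-plus-SVD formulation (the paper then optimizes this subspace jointly with the classifier, which is not shown here):

```python
import numpy as np

def cca(X, Y, k=1, reg=1e-6):
    # Classical CCA: whiten each view with a Cholesky factor of its
    # (regularized) covariance, then take the SVD of the whitened
    # cross-covariance; singular values are the canonical correlations.
    X = X - X.mean(0); Y = Y - Y.mean(0)
    n = len(X)
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n
    Lx, Ly = np.linalg.cholesky(Cxx), np.linalg.cholesky(Cyy)
    U, s, Vt = np.linalg.svd(np.linalg.inv(Lx) @ Cxy @ np.linalg.inv(Ly).T)
    # Map canonical directions back to the original feature spaces.
    Wx = np.linalg.inv(Lx).T @ U[:, :k]
    Wy = np.linalg.inv(Ly).T @ Vt.T[:, :k]
    return Wx, Wy, s[:k]

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
Y = X @ rng.standard_normal((3, 2)) + 0.01 * rng.standard_normal((200, 2))
Wx, Wy, corr = cca(X, Y)
print(corr)  # close to 1: the two "views" share a linear latent structure
```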

ICRA Conference 2014 Conference Paper

Toward featureless visual navigation: Simultaneous localization and planar surface extraction using motion vectors in video streams

  • Wen Li
  • Dezhen Song

Unlike traditional feature-based methods, we propose using motion vectors (MVs) from video streams as inputs for visual navigation. Although MVs are very noisy and have low spatial resolution, they possess high temporal resolution, which makes it possible to merge MVs from different frames to improve signal quality. Homography filtering and MV thresholding are proposed to further improve MV quality so that we can establish plane observations from MVs. We propose an extended Kalman filter (EKF) based approach to simultaneously track robot motion and planes. We formally model the error propagation of MVs and derive the variance of the merged MVs. We have implemented the proposed method and tested it in physical experiments. Results show that the system is capable of performing robot localization and plane mapping with a relative trajectory error of less than 5.1%.
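Why merging high-temporal-resolution MVs helps can be seen with simple inverse-variance fusion: combining independent noisy estimates of the same motion always lowers the variance. A sketch with scalar MVs and assumed per-frame variances (the paper derives these variances from a formal error-propagation model):

```python
def merge_motion_vectors(mvs, variances):
    # Inverse-variance (BLUE) fusion of independent noisy estimates of the
    # same underlying motion: weight each MV by 1/variance and normalize.
    weights = [1.0 / v for v in variances]
    wsum = sum(weights)
    merged = sum(w * m for w, m in zip(weights, mvs)) / wsum
    merged_var = 1.0 / wsum  # always <= the smallest input variance
    return merged, merged_var

# Two equally reliable per-frame MVs (hypothetical values, in pixels).
m, v = merge_motion_vectors([1.0, 3.0], [1.0, 1.0])
print(m, v)  # -> 2.0 0.5: the fused estimate is twice as reliable as either input
```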

ICRA Conference 2013 Conference Paper

Automatic bird species detection using periodicity of salient extremities

  • Wen Li
  • Dezhen Song

To assist nature observation, we develop an automatic bird species filtering method that takes videos from cameras with unknown parameters as input and outputs the likelihood of candidate species. The method recognizes the time series of the salient extremity, namely the inter-wing-tip distance (IWTD), performs frequency analysis on its periodicity, and provides a species prediction metric using likelihood ratios. To analyze the feasibility of the proposed method, we derive the probability that the salient extremity can be recognized in an image for an arbitrary camera perspective. We also prove that the periodicity of the IWTD in the image is the same as the wingbeat frequency in 3D space regardless of camera parameters, with the exception of ignorable degenerate cases. Experimental results validate our analysis and show that the algorithm is robust to segmentation error and data loss of up to 30%.
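Because the image-plane periodicity of the IWTD matches the 3D wingbeat frequency, a simple spectral peak on the tracked distance signal suffices to estimate the wingbeat rate. A naive-DFT sketch on a hypothetical IWTD track (stdlib only; the paper's likelihood-ratio scoring is not shown):

```python
import math

def dominant_frequency(signal, fps):
    # Naive DFT: return the strongest non-DC frequency (in Hz) of a 1-D
    # time series, e.g. an inter-wing-tip-distance (IWTD) track.
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]
    best_k, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(centered))
        im = sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(centered))
        mag = re * re + im * im
        if mag > best_mag:
            best_k, best_mag = k, mag
    return best_k * fps / n

# A 5 Hz sine sampled at 30 fps for 60 frames is recovered as 5 Hz.
sig = [math.sin(2 * math.pi * 5 * t / 30) for t in range(60)]
print(dominant_frequency(sig, 30))  # -> 5.0
```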

IROS Conference 2009 Conference Paper

Fuzzy logic vorticity control of ocillating foil UUV

  • Wen Li
  • Tianmiao Wang
  • Jianhong Liang
  • Jinlan Li

This paper describes the design of a biomimetic, fish-like swimming UUV based on a 2-D oscillating foil mechanism, which plays a guiding role in the biomimetic propulsion research field. The relation between vorticity control and motion control parameters for the 2-D oscillating foil is established, and a fuzzy logic vorticity controller is designed to achieve maneuverable straight-line swimming while ensuring thrust efficiency. Experimental results from field tests show an obvious improvement in vehicle performance. Meanwhile, burst-and-coast maneuvering is found to be a more effective swimming mode for the vehicle compared with steady swimming.
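A fuzzy logic controller of the general kind described above maps a crisp input through overlapping membership functions, fires a small rule base, and defuzzifies by a weighted average. A minimal Mamdani-style sketch on a heading-error input; the rule set and ranges here are purely illustrative, not the paper's vorticity controller:

```python
def tri(x, a, b, c):
    # Triangular membership function peaking at b, zero outside [a, c].
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def fuzzy_heading_correction(error):
    # Fuzzify the error, fire three rules, and defuzzify by the weighted
    # average of the rules' output singletons.
    rules = [
        (tri(error, -2.0, -1.0, 0.0), -1.0),  # error negative -> steer left
        (tri(error, -1.0,  0.0, 1.0),  0.0),  # error near zero -> hold course
        (tri(error,  0.0,  1.0, 2.0),  1.0),  # error positive -> steer right
    ]
    num = sum(w * out for w, out in rules)
    den = sum(w for w, _ in rules)
    return num / den if den else 0.0

print(fuzzy_heading_correction(0.0))  # -> 0.0 (hold course)
print(fuzzy_heading_correction(0.5))  # -> 0.5 (blend of "hold" and "steer right")
```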

ICRA Conference 2000 Conference Paper

Manipulative Difficulty Index of a Mobile Robot with Multiple Trailers in Pushing and Towing with Imperfect Measurement

  • Wen Li
  • Takashi Tsubouchi
  • Shin'ichi Yuta

A quantitative evaluation method of manipulative difficulty for a tractor pushing or towing multiple trailers is introduced. A manipulative difficulty index (MDI) that accounts for imperfect measurement is proposed. This difficulty varies when physical parameters, such as the position of a joint or the number of trailers, are changed. The authors focus on this change when some state variables have measurement errors. Quantitative comparisons for typical examples of changes to the physical parameters of tractor-trailers under this setting are also presented.