Arrow Research search

Author name cluster

Xian Wei

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

20 papers
2 author rows

Possible papers

20

AAAI Conference 2026 Conference Paper

Automatic Paper Reviewing with Heterogeneous Graph Reasoning over LLM-Simulated Reviewer-Author Debates

  • Shuaimin Li
  • Liyang Fan
  • Yufang Lin
  • Zeyang Li
  • Xian Wei
  • Shiwen Ni
  • Hamid Alinejad-Rokny
  • Min Yang

Existing paper review methods often rely on superficial manuscript features or directly on large language models (LLMs), which are prone to hallucinations, biased scoring, and limited reasoning capabilities. Moreover, these methods often fail to capture the complex argumentative reasoning and negotiation dynamics inherent in reviewer-author interactions. To address these limitations, we propose ReViewGraph (Reviewer-Author Debates Graph Reasoner), a novel framework that performs heterogeneous graph reasoning over LLM-simulated multi-round reviewer-author debates. In our approach, reviewer-author exchanges are simulated through LLM-based multi-agent collaboration. Diverse opinion relations (e.g., acceptance, rejection, clarification, and compromise) are then explicitly extracted and encoded as typed edges within a heterogeneous interaction graph. By applying graph neural networks to reason over these structured debate graphs, ReViewGraph captures fine-grained argumentative dynamics and enables more informed review decisions. Extensive experiments on three datasets demonstrate that ReViewGraph outperforms strong baselines with an average relative improvement of 15.73%, underscoring the value of modeling detailed reviewer–author debate structures.

ICRA Conference 2025 Conference Paper

3D Dense Captioning via Prototypical Momentum Distillation

  • Jinpeng Mi
  • Ying Wang
  • Shaofei Jin
  • Shiming Zhang
  • Xian Wei
  • Jianwei Zhang

3D dense captioning aims to describe the crucial regions in 3D visual scenes in the form of natural language. Recent prevailing approaches achieve promising results by leveraging complicated structures incorporated with large-scale models, which necessitate abundant parameters and pose challenges regarding its practical applications. Besides, with limited training data, 3D dense captioners are often susceptible to overfitting, directly degrading caption generation performance. Drawing inspiration from the recent advancements in knowledge distillation, we propose a novel approach termed Prototypical Momentum Distillation (PMD) to prompt the model to generate more detailed captions. PMD incorporates Momentum Distillation (MD) with an Uncertainty-aware Prototype-anchored Clustering (UPC) strategy to transfer knowledge by considering the uncertainty of the teacher knowledge. Specifically, we employ the original captioner as the student model and maintain an Exponential Moving Average (EMA) copy of the captioner as the teacher model to impart knowledge as the auxiliary supervision of the student. To abate the misleading caused by uncertain knowledge, we present an Uncertainty-aware Prototype-anchored Clustering (UPC) strategy to cluster the distilled knowledge according to its confidence. We then transfer the rearranged knowledge from the teacher to guide the training route of the student. We conduct extensive experiments and ablation studies on two widely used benchmark datasets, ScanRefer and Nr3D. Experimental results demonstrate that PMD outperforms all state-of-the-art approaches on the benchmarks with MLE training, highlighting its effectiveness.

ICRA Conference 2025 Conference Paper

Dual-BEV Nav: Dual-Layer BEV-Based Heuristic Path Planning for Robotic Navigation in Unstructured Outdoor Environments

  • Jianfeng Zhang
  • Hanlin Dong
  • Jian Yang 0034
  • Jiahui Liu
  • Shibo Huang
  • Ke Li 0005
  • Xuan Tang
  • Xian Wei

Path planning with strong environmental adaptability plays a crucial role in robotic navigation in unstructured outdoor environments, especially in the case of low-quality location and map information. The path planning ability of a robot depends on the identification of the traversability of global and local ground areas. In real-world scenarios, the complexity of outdoor open environments makes it difficult for robots to identify the traversability of ground areas that lack a clearly defined structure. Moreover, most existing methods have rarely analyzed the integration of local and global traversability identifications in unstructured outdoor scenarios. To address this problem, we propose a novel method, Dual-BEV Nav, first introducing Bird's Eye View (BEV) representations into local planning to generate high-quality traversable paths. Then, these paths are projected into the global traversability probability map generated by the global BEV planning model to obtain the optimal path. By integrating the traversability from both local and global BEV, we establish a dual-layer BEV heuristic planning paradigm, enabling long-distance navigation in unstructured outdoor environments. We test our approach through both public dataset evaluations and real-world robot deployments, yielding promising results. Compared to baselines, the Dual-BEV Nav improved temporal distance prediction accuracy by up to 18. 26%. In the real-world deployment, under conditions significantly different from the training set and with notable occlusions in the global BEV, the Dual-BEV Nav successfully achieved a 65-meter-long outdoor navigation. Further analysis demonstrates that the local BEV representation significantly enhances the rationality of the planning, while the global BEV probability map ensures the robustness of the overall planning.

IJCAI Conference 2025 Conference Paper

Generative Multi-Agent Collaboration in Embodied AI: A Systematic Review

  • Di Wu
  • Xian Wei
  • Guang Chen
  • Hao Shen
  • Bo Jin

Embodied multi-agent systems (EMAS) have attracted growing attention for their potential to address complex, real-world challenges in areas such as logistics and robotics. Recent advances in foundation models pave the way for generative agents capable of richer communication and adaptive problem-solving. This survey provides a systematic examination of how EMAS can benefit from these generative capabilities. We propose a taxonomy that categorizes EMAS by system architectures and embodiment modalities, emphasizing how collaboration spans both physical and virtual contexts. Central building blocks, perception, planning, communication, and feedback, are then analyzed to illustrate how generative techniques bolster system robustness and flexibility. Through concrete examples, we demonstrate the transformative effects of integrating foundation models into embodied, multi-agent frameworks. Finally, we discuss challenges and future directions, underlining the significant promise of EMAS to reshape the landscape of AI-driven collaboration.

IROS Conference 2025 Conference Paper

KiteRunner: Language-Driven Cooperative Local-Global Navigation Policy with UAV Mapping in Outdoor Environments

  • Shibo Huang
  • Chenfan Shi
  • Jian Yang 0034
  • Hanlin Dong
  • Jinpeng Mi
  • Ke Li 0005
  • Jianfeng Zhang
  • Miao Ding

Autonomous navigation in open-world outdoor environments faces challenges in integrating dynamic conditions, long-distance spatial reasoning, and semantic understanding. Traditional methods struggle to balance local planning, global planning, and semantic task execution, while existing large language models (LLMs) enhance semantic comprehension but lack spatial reasoning capabilities. Although diffusion models excel in local optimization, they fall short in large-scale long-distance navigation. To address these gaps, this paper proposes KiteRunner, a language-driven cooperative local-global navigation strategy that combines UAV orthophoto-based global planning with diffusion model-driven local path generation for long-distance navigation in open-world scenarios. Our method innovatively leverages real-time UAV orthophotography to construct a global probability map, providing traversability guidance for the local planner, while integrating large models like CLIP and GPT to interpret natural language instructions. Experiments demonstrate that KiteRunner achieves 5. 6% and 12. 8% improvements in path efficiency over state-of-the-art methods in structured and unstructured environments, respectively, with significant reductions in human interventions and execution time.

IROS Conference 2025 Conference Paper

NeuroLoc: Encoding Navigation Cells for 6-DOF Camera Localization

  • Xun Li
  • Jian Yang 0034
  • Fenli Jia
  • Muyu Wang
  • Jun Wu
  • Jinpeng Mi
  • Jilin Hu
  • Peidong Liang

Recently, camera localization has been widely adopted in autonomous robotic navigation due to its efficiency and convenience. However, autonomous navigation in unknown environments often suffers from scene ambiguity, environmental disturbances, and dynamic object transformation in camera localization. To address this problem, inspired by the brain cognitive navigation mechanism (such as grid cells, place cells, and head direction cells), we propose a novel neurobiological camera location method, namely NeuroLoc. Firstly, we designed a Hebbian learning module driven by place cells to save and replay historical information, aiming to restore the details of historical representations and solve the issue of scene fuzziness. Secondly, we utilized the head direction cell-inspired internal direction learning as multi-head attention embedding to help restore the true orientation in similar scenes. Finally, we added a 3D grid center prediction in the pose regression module to reduce the final wrong prediction. We evaluate the proposed NeuroLoc on commonly used benchmark indoor and outdoor datasets. The experimental results show that our NeuroLoc can enhance the robustness in complex environments and improve the performance of pose regression by using only a single image.

IJCAI Conference 2025 Conference Paper

PDDFormer: Pairwise Distance Distribution Graph Transformer for Crystal Material Property Prediction

  • Xiangxiang Shen
  • Zheng Wan
  • Lingfeng Wen
  • Licheng Sun
  • Jian Yang
  • Xuan Tang
  • Shing-Ho J. Lin
  • Xiao He

Crystal structures can be simplified as a periodic point set that repeats across three-dimensional space along an underlying lattice. Traditionally, crystal representation methods rely on descriptors such as lattice parameters, symmetry, and space groups to characterize the structure. However, in reality, atoms in materials always vibrate above absolute zero, causing their positions to fluctuate continuously. This dynamic behavior disrupts the fundamental periodicity of the lattice, making crystal graphs based on static lattice parameters and conventional descriptors discontinuous under slight perturbations. Chemists proposed the pairwise distance distribution (PDD) method to address this. However, the completeness of PDD requires defining a large number of neighboring atoms, leading to high computational costs. Additionally, PDD does not account for atomic information, making it challenging to apply it directly to crystal material property prediction tasks. To tackle these challenges, we introduce the atom-weighted Pairwise Distance Distribution (WPDD) and Unit cell Pairwise Distance Distribution (UPDD) for the first time, applying them to the construction of multi-edge crystal graphs. We demonstrate the continuity and general completeness of crystal graphs under slight atomic position perturbations. Moreover, by modeling PDD as global information and integrating it into matrix-based message passing, we significantly reduce computational costs. Comprehensive evaluation results show that WPDDFormer achieves state-of-the-art predictive accuracy across tasks on benchmark datasets such as the Materials Project and JARVIS-DFT.

ICLR Conference 2025 Conference Paper

R2Det: Exploring Relaxed Rotation Equivariance in 2D Object Detection

  • Zhiqiang Wu
  • Yingjie Liu
  • Hanlin Dong
  • Xuan Tang
  • Jian Yang 0034
  • Bo Jin
  • Mingsong Chen 0001
  • Xian Wei

Group Equivariant Convolution (GConv) empowers models to explore underlying symmetry in data, improving performance. However, real-world scenarios often deviate from ideal symmetric systems caused by physical permutation, characterized by non-trivial actions of a symmetry group, resulting in asymmetries that affect the outputs, a phenomenon known as Symmetry Breaking. Traditional GConv-based methods are constrained by rigid operational rules within group space, assuming data remains strictly symmetry after limited group transformations. This limitation makes it difficult to adapt to Symmetry-Breaking and non-rigid transformations. Motivated by this, we mainly focus on a common scenario: Rotational Symmetry-Breaking. By relaxing strict group transformations within Strict Rotation-Equivariant group $\mathbf{C}_n$, we redefine a Relaxed Rotation-Equivariant group $\mathbf{R}_n$ and introduce a novel Relaxed Rotation-Equivariant GConv (R2GConv) with only a minimal increase of $4n$ parameters compared to GConv. Based on R2GConv, we propose a Relaxed Rotation-Equivariant Network (R2Net) as the backbone and develop a Relaxed Rotation-Equivariant Object Detector (R2Det) for 2D object detection. Experimental results demonstrate the effectiveness of the proposed R2GConv in natural image classification, and R2Det achieves excellent performance in 2D object detection with improved generalization capabilities and robustness. The code is available in \texttt{https://github.com/wuer5/r2det}.

AAAI Conference 2025 Conference Paper

Relaxed Rotational Equivariance via G-Biases in Vision

  • Zhiqiang Wu
  • Yingjie Liu
  • Licheng Sun
  • Jian Yang
  • Hanlin Dong
  • Shing-Ho J. Lin
  • Xuan Tang
  • Jinpeng Mi

Group Equivariant Convolution (GConv) can capture rotational equivariance from original data. It assumes uniform and strict rotational equivariance across all features as the transformations under the specific group. However, the presentation or distribution of real-world data rarely conforms to strict rotational equivariance, commonly referred to as Rotational Symmetry-Breaking (RSB) in the system or dataset, making GConv unable to adapt effectively to this phenomenon. Motivated by this, we propose a simple but highly effective method to address this problem, which utilizes a set of learnable biases called G-Biases under the group order to break strict group constraints and then achieve a Relaxed Rotational Equivariant Convolution (RREConv). To validate the efficiency of RREConv, we conduct extensive ablation experiments on the discrete rotational group Cn. Experiments demonstrate that the proposed RREConv-based methods achieve excellent performance compared to existing GConv-based methods in both classification and 2D object detection tasks on the natural image datasets.

AAAI Conference 2024 Conference Paper

Hyperbolic Graph Diffusion Model

  • Lingfeng Wen
  • Xuan Tang
  • Mingjie Ouyang
  • Xiangxiang Shen
  • Jian Yang
  • Daxin Zhu
  • Mingsong Chen
  • Xian Wei

Diffusion generative models (DMs) have achieved promising results in image and graph generation. However, real-world graphs, such as social networks, molecular graphs, and traffic graphs, generally share non-Euclidean topologies and hidden hierarchies. For example, the degree distributions of graphs are mostly power-law distributions. The current latent diffusion model embeds the hierarchical data in a Euclidean space, which leads to distortions and interferes with modeling the distribution. Instead, hyperbolic space has been found to be more suitable for capturing complex hierarchical structures due to its exponential growth property. In order to simultaneously utilize the data generation capabilities of diffusion models and the ability of hyperbolic embeddings to extract latent hierarchical distributions, we propose a novel graph generation method called, Hyperbolic Graph Diffusion Model (HGDM), which consists of an auto-encoder to encode nodes into successive hyperbolic embeddings, and a DM that operates in the hyperbolic latent space. HGDM captures the crucial graph structure distributions by constructing a hyperbolic potential node space that incorporates edge information. Extensive experiments show that HGDM achieves better performance in generic graph and molecule generation benchmarks, with a 48% improvement in the quality of graph generation with highly hierarchical structures.

ICRA Conference 2024 Conference Paper

ProEqBEV: Product Group Equivariant BEV Network for 3D Object Detection in Road Scenes of Autonomous Driving

  • Hongwei Liu
  • Jian Yang 0034
  • Zhengyu Li
  • Ke Li 0005
  • Jianzhang Zheng
  • Xihao Wang
  • Xuan Tang
  • Mingsong Chen 0001

With the rapid development of autonomous driving systems, 3D object detection based on Bird’s Eye View (BEV) in road scenes has witnessed great progress over the past few years. As a road scene exhibits a part-whole hierarchy between the within objects and the scene itself, simple parts (e. g. , roads, lane lines, vehicles and pedestrians) can be assembled into progressively more complex shapes to form a BEV representation of the whole road scene. Therefore, a BEV often has multiple levels of freedom on motion, i. e. , the rotation and the moving shift of the whole BEV, and the random movements of objects (e. g. , pedestrians and vehicles) inside the BEV. However, most of the current single-sensor or multi-sensor fusion-based BEV object detection methods have not yet taken into account capturing such multi-level motion in a BEV. To address this problem, we propose a product group equivariant object detection network framework that is equivariant with respect to multiple levels of symmetry groups based on multi-sensor fusion. The proposed framework extracts local equivariant features of objects in point clouds, while global equivariant features are extracted in both point clouds and images. Furthermore, the network learns diverse rotation-equivariant features and mitigates a significant amount of detection errors caused by rotations of BEV and objects inside a BEV, thereby further enhancing the performance of object detection. The experiment results show that the network architecture significantly improves object detection on mAP and NDS, respectively. In addition, in order to demonstrate the effectiveness of the proposed local-multi-global equivariant components, we conduct sufficient ablation experiments. The results show that the individual components are indispensable for the object detection performance improvement of the overall network architecture.

NeurIPS Conference 2024 Conference Paper

SampDetox: Black-box Backdoor Defense via Perturbation-based Sample Detoxification

  • Yanxin Yang
  • Chentao Jia
  • Dengke Yan
  • Ming Hu
  • Tianlin Li
  • Xiaofei Xie
  • Xian Wei
  • Mingsong Chen

The advancement of Machine Learning has enabled the widespread deployment of Machine Learning as a Service (MLaaS) applications. However, the untrustworthy nature of third-party ML services poses backdoor threats. Existing defenses in MLaaS are limited by their reliance on training samples or white-box model analysis, highlighting the need for a black-box backdoor purification method. In our paper, we attempt to use diffusion models for purification by introducing noise in a forward diffusion process to destroy backdoors and recover clean samples through a reverse generative process. However, since a higher noise also destroys the semantics of the original samples, it still results in a low restoration performance. To investigate the effectiveness of noise in eliminating different types of backdoors, we conducted a preliminary study, which demonstrates that backdoors with low visibility can be easily destroyed by lightweight noise and those with high visibility need to be destroyed by high noise but can be easily detected. Based on the study, we propose SampDetox, which strategically combines lightweight and high noise. SampDetox applies weak noise to eliminate low-visibility backdoors and compares the structural similarity between the recovered and original samples to localize high-visibility backdoors. Intensive noise is then applied to these localized areas, destroying the high-visibility backdoors while preserving global semantic information. As a result, detoxified samples can be used for inference, even by poisoned models. Comprehensive experiments demonstrate the effectiveness of SampDetox in defending against various state-of-the-art backdoor attacks.

NeurIPS Conference 2024 Conference Paper

WaveAttack: Asymmetric Frequency Obfuscation-based Backdoor Attacks Against Deep Neural Networks

  • Jun Xia
  • Zhihao Yue
  • Yingbo Zhou
  • Zhiwei Ling
  • Yiyu Shi
  • Xian Wei
  • Mingsong Chen

Due to the increasing popularity of Artificial Intelligence (AI), more and more backdoor attacks are designed to mislead Deep Neural Network (DNN) predictions by manipulating training samples or processes. Although backdoor attacks have been investigated in various scenarios, they still suffer from the problems of both low fidelity of poisoned samples and non-negligible transfer in latent space, which make them easily identified by existing backdoor detection algorithms. To overcome this weakness, this paper proposes a novel frequency-based backdoor attack method named WaveAttack, which obtains high-frequency image features through Discrete Wavelet Transform (DWT) to generate highly stealthy backdoor triggers. By introducing an asymmetric frequency obfuscation method, our approach adds an adaptive residual to the training and inference stages to improve the impact of triggers, thus further enhancing the effectiveness of WaveAttack. Comprehensive experimental results show that, WaveAttack can not only achieve higher effectiveness than state-of-the-art backdoor attack methods, but also outperform them in the fidelity of images (i. e. , by up to 28. 27\% improvement in PSNR, 1. 61\% improvement in SSIM, and 70. 59\% reduction in IS). Our code is available at https: //github. com/BililiCode/WaveAttack.

ICRA Conference 2023 Conference Paper

DuEqNet: Dual-Equivariance Network in Outdoor 3D Object Detection for Autonomous Driving

  • Xihao Wang
  • JiaMing Lei
  • Hai Lan
  • Arafat Al-Jawari
  • Xian Wei

Outdoor 3D object detection has played an essential role in the environment perception of autonomous driving. In complicated traffic situations, precise object recognition provides indispensable information for prediction and planning in the dynamic system, improving self-driving safety and reliability. However, with the vehicle's veering, the constant rotation of the surrounding scenario makes a challenge for the perception systems. Yet most existing methods have not focused on alleviating the detection accuracy impairment brought by the vehicle's rotation, especially in outdoor 3D detection. In this paper, we propose DuEqNet, which first introduces the concept of equivariance into 3D object detection network by leveraging a hierarchical embedded framework. The dual-equivariance of our model can extract the equivariant features at both local and global levels, respectively. For the local feature, we utilize the graph-based strategy to guarantee the equivariance of the feature in point cloud pillars. In terms of the global feature, the group equivariant convolution layers are adopted to aggregate the local feature to achieve the global equivariance. In the experiment part, we evaluate our approach with different baselines in 3D object detection tasks and obtain State-Of-The-Art performance. According to the results, our model presents higher accuracy on orientation and better prediction efficiency. Moreover, our dual-equivariance strategy exhibits the satisfied plug-and-play ability on various popular object detection frameworks to improve their performance.

NeurIPS Conference 2023 Conference Paper

Learning Dictionary for Visual Attention

  • Yingjie Liu
  • Xuan Liu
  • Hui Yu
  • Xuan Tang
  • Xian Wei

Recently, the attention mechanism has shown outstanding competence in capturing global structure information and long-range relationships within data, thus enhancing the performance of deep vision models on various computer vision tasks. In this work, we propose a novel dictionary learning-based attention (\textit{Dic-Attn}) module, which models this issue as a decomposition and reconstruction problem with the sparsity prior, inspired by sparse coding in the human visual perception system. The proposed \textit{Dic-Attn} module decomposes the input into a dictionary and corresponding sparse representations, allowing for the disentanglement of underlying nonlinear structural information in visual data and the reconstruction of an attention embedding. By applying transformation operations in the spatial and channel domains, the module dynamically selects the dictionary's atoms and sparse representations. Finally, the updated dictionary and sparse representations capture the global contextual information and reconstruct the attention maps. The proposed \textit{Dic-Attn} module is designed with plug-and-play compatibility, allowing for integration into deep attention encoders. Our approach offers an intuitive and elegant means to exploit the discriminative information from data, promoting visual attention construction. Extensive experimental results on various computer vision tasks, e. g. , image and point cloud classification, validate that our method achieves promising performance, and shows a strong competitive comparison with state-of-the-art attention methods.

IJCAI Conference 2022 Conference Paper

Eliminating Backdoor Triggers for Deep Neural Networks Using Attention Relation Graph Distillation

  • Jun Xia
  • Ting Wang
  • Jiepin Ding
  • Xian Wei
  • Mingsong Chen

Due to the prosperity of Artificial Intelligence (AI) techniques, more and more backdoors are designed by adversaries to attack Deep Neural Networks (DNNs). Although the state-of-the-art method Neural Attention Distillation (NAD) can effectively erase backdoor triggers from DNNs, it still suffers from non-negligible Attack Success Rate (ASR) together with lowered classification ACCuracy (ACC), since NAD focuses on backdoor defense using attention features (i. e. , attention maps) of the same order. In this paper, we introduce a novel backdoor defense framework named Attention Relation Graph Distillation (ARGD), which fully explores the correlation among attention features with different orders using our proposed Attention Relation Graphs (ARGs). Based on the alignment of ARGs between teacher and student models during knowledge distillation, ARGD can more effectively eradicate backdoors than NAD. Comprehensive experimental results show that, against six latest backdoor attacks, ARGD outperforms NAD by up to 94. 85% reduction in ASR, while ACC can be improved by up to 3. 23%.

NeurIPS Conference 2022 Conference Paper

Geodesic Self-Attention for 3D Point Clouds

  • Zhengyu Li
  • Xuan Tang
  • Zihao Xu
  • Xihao Wang
  • Hui Yu
  • Mingsong Chen
  • Xian Wei

Due to the outstanding competence in capturing long-range relationships, self-attention mechanism has achieved remarkable progress in point cloud tasks. Nevertheless, point cloud object often has complex non-Euclidean spatial structures, with the behavior changing dynamically and unpredictably. Most current self-attention modules highly rely on the dot product multiplication in Euclidean space, which cannot capture internal non-Euclidean structures of point cloud objects, especially the long-range relationships along the curve of the implicit manifold surface represented by point cloud objects. To address this problem, in this paper, we introduce a novel metric on the Riemannian manifold to capture the long-range geometrical dependencies of point cloud objects to replace traditional self-attention modules, namely, the Geodesic Self-Attention (GSA) module. Our approach achieves state-of-the-art performance compared to point cloud Transformers on object classification, few-shot classification and part segmentation benchmarks.

NeurIPS Conference 2021 Conference Paper

Boost Neural Networks by Checkpoints

  • Feng Wang
  • Guoyizhe Wei
  • Qiao Liu
  • Jinxiang Ou
  • Xian Wei
  • Hairong Lv

Training multiple deep neural networks (DNNs) and averaging their outputs is a simple way to improve the predictive performance. Nevertheless, the multiplied training cost prevents this ensemble method to be practical and efficient. Several recent works attempt to save and ensemble the checkpoints of DNNs, which only requires the same computational cost as training a single network. However, these methods suffer from either marginal accuracy improvements due to the low diversity of checkpoints or high risk of divergence due to the cyclical learning rates they adopted. In this paper, we propose a novel method to ensemble the checkpoints, where a boosting scheme is utilized to accelerate model convergence and maximize the checkpoint diversity. We theoretically prove that it converges by reducing exponential loss. The empirical evaluation also indicates our proposed ensemble outperforms single model and existing ensembles in terms of accuracy and efficiency. With the same training budget, our method achieves 4. 16% lower error on Cifar-100 and 6. 96% on Tiny-ImageNet with ResNet-110 architecture. Moreover, the adaptive sample weights in our method make it an effective solution to address the imbalanced class distribution. In the experiments, it yields up to 5. 02% higher accuracy over single EfficientNet-B0 on the imbalanced datasets.

EAAI Journal 2020 Journal Article

Robust RGB-D tracking via compact CNN features

  • Yong Wang
  • Xian Wei
  • Lingkun Luo
  • Wen Wen
  • Yang Wang

Feature representation is at the core of visual tracking. This paper presents a robust tracking method in RGB-D videos. Firstly, the RGB and depth images are separately encoded using a hierarchical convolutional neural network (CNN) features. Secondly, in order to reduce computation cost, we exploit random projection to compress the CNN features. The high dimensional CNN features are randomly projected into a low dimensional feature space. The correlation filter tracking framework is then independently carried out in RGB and depth images. And backward tracking scheme is adopted to evaluate the tracking results in these two images. The final position is determined according to the tracked location in the two image channels. In addition, model updating is implemented adaptively. Our tracker is evaluated on two RGB-D benchmark datasets and achieves comparable results to the other state-of-the-art RGB-D tracking methods.

EAAI Journal 2019 Journal Article

Robust visual tracking based on response stability

  • Yong Wang
  • Xinbin Luo
  • Lu Ding
  • Shan Fu
  • Xian Wei

In this paper, a new approach of response stability based for visual object tracking is developed. This approach proposes a response stability criterion to measure the tracking quality and fuse tracking results of multiple layers of a convolutional neural network (CNN). Inspired by recent detection based methods for visual tracking, the detection capability of EdgeBoxes is investigated, and proposes to re-detect target when tracking failure occurs. In addition, 3D locally adaptive regression kernels (LARK) feature is employed in correlation filter based tracking framework. Extensive experimental results and performance compared with state-of-the-art tracking algorithms on challenging benchmark datasets show that our method is more accurate and robust.