Arrow Research search

Author name cluster

Wenbin Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

26 papers
2 author rows

Possible papers

26

AAAI Conference 2026 Conference Paper

Branch, or Layer? Zeroth-Order Optimization for Continual Learning of Vision-Language Models

  • Ziwei Liu
  • Borui Kang
  • Wei Li
  • Hangjie Yuan
  • Yanbing Yang
  • Wenbin Li
  • Yifan Zhu
  • Tao Feng

Vision-Language Continual Learning (VLCL) has attracted significant research attention for its robust capabilities, and the adoption of Parameter-Efficient Fine-Tuning (PEFT) strategies enables these models to achieve competitive performance with substantially reduced resource consumption. However, the dominant First-Order (FO) optimization is prone to trapping models in suboptimal local minima, especially within the limited exploration subspace of PEFT. To overcome this challenge, this paper pioneers a systematic exploration of Zeroth-Order (ZO) optimization for PEFT-based VLCL. We first identify the incompatibility of naive full-ZO adoption in VLCL due to instability of the optimization process. We then investigate applying ZO optimization at granularities ranging from modality branch-wise to fine-grained layer-wise across various training units to identify an optimal strategy. Moreover, a key theoretical insight reveals that the vision modality exhibits higher variance than its language counterpart during ZO optimization in VLCL, and we propose a modality-aware stabilized ZO strategy, which adopts gradient sign normalization in ZO and constrains vision-modality perturbations to further improve performance. Benefiting from the adoption of ZO optimization, PEFT-based VLCL gains a better ability to escape local minima during optimization; extensive experiments on four benchmarks demonstrate that our method achieves state-of-the-art results.

NeurIPS Conference 2025 Conference Paper

DON’T NEED RETRAINING: A Mixture of DETR and Vision Foundation Models for Cross-Domain Few-Shot Object Detection

  • Changhan Liu
  • Xunzhi Xiang
  • Zixuan Duan
  • Wenbin Li
  • Qi Fan
  • Yang Gao

Cross-Domain Few-Shot Object Detection (CD-FSOD) aims to generalize to unseen domains by leveraging a few annotated samples of the target domain, requiring models to exhibit both strong generalization and localization capabilities. However, existing well-trained detectors typically have strong localization capabilities but limited generalization, whereas vision foundation models (VFMs) generally exhibit better generalization but lack accurate localization. In this paper, we propose a novel Mixture-of-Experts (MoE) structure that integrates the detector's localization capability with the VFM's generalization by using VFM features to improve detector features. Specifically, we propose an Expert-wise Router (ER) that selects the most relevant VFM experts for each backbone layer, and a Region-wise Router (RR) that emphasizes foreground and suppresses background. To bridge representation gaps, we further propose a Shared Expert Projection (SEP) module and a Private Expert Projection (PEP) module, which align VFM features to the detector feature space while decoupling shared image features from private image features in the VFM feature map. Finally, we propose an MoE module to transfer the VFM's generalization to the detector without altering the detector's original architecture. Furthermore, our method extends well-trained detectors to detect novel classes in unseen domains without retraining on the base classes. Experimental results on multiple cross-domain datasets validate the effectiveness of our method.

AAAI Conference 2025 Conference Paper

Effective and Efficient Representation Learning for Flight Trajectories

  • Shuo Liu
  • Wenbin Li
  • Di Yao
  • Jingping Bi

Flight trajectory data plays a vital role in the traffic management community, especially for downstream tasks such as trajectory prediction, flight recognition, and anomaly detection. Existing works often utilize handcrafted features and design models for different tasks individually, which heavily rely on domain expertise and are hard to extend. We argue that different flight analysis tasks share the same useful features of the trajectory. Jointly learning a unified representation for flight trajectories could be beneficial for improving the performance of various tasks. However, flight trajectory representation learning (TRL) faces two primary challenges, i.e., unbalanced behavior density and 3D spatial continuity, which render recent general TRL methods ineffective. In this paper, we propose Flight2Vec, a flight-specific representation learning method to address these challenges. Specifically, a behavior-adaptive patching mechanism is used to encourage the learned representation to pay more attention to behavior-dense segments. Moreover, we introduce a motion trend learning technique that guides the model to memorize not only the precise locations, but also the motion trend, to generate better representations. Extensive experimental results demonstrate that Flight2Vec significantly improves performance in downstream tasks such as flight trajectory prediction, flight recognition, and anomaly detection.

NeurIPS Conference 2025 Conference Paper

Efficient Last-Iterate Convergence in Solving Extensive-Form Games

  • Linjian Meng
  • Tianpei Yang
  • Youzhi Zhang
  • Zhenxing Ge
  • Shangdong Yang
  • Tianyu Ding
  • Wenbin Li
  • Bo An

To establish last-iterate convergence for Counterfactual Regret Minimization (CFR) algorithms in learning a Nash equilibrium (NE) of extensive-form games (EFGs), recent studies reformulate learning an NE of the original EFG as learning the NEs of a sequence of (perturbed) regularized EFGs. Hence, proving last-iterate convergence in solving the original EFG reduces to proving last-iterate convergence in solving (perturbed) regularized EFGs. However, these studies only establish last-iterate convergence for Online Mirror Descent (OMD)-based CFR algorithms, rather than Regret Matching (RM)-based CFR algorithms, in solving perturbed regularized EFGs, resulting in a poor empirical convergence rate, as RM-based CFR algorithms typically outperform OMD-based CFR algorithms. In addition, as solving multiple perturbed regularized EFGs is required, fine-tuning across multiple perturbed regularized EFGs is infeasible, making parameter-free algorithms highly desirable. This paper shows that CFR$^+$, a classical parameter-free RM-based CFR algorithm, achieves last-iterate convergence in learning an NE of perturbed regularized EFGs. This is the first parameter-free last-iterate convergence result for RM-based CFR algorithms in perturbed regularized EFGs. Leveraging CFR$^+$ to solve perturbed regularized EFGs, we obtain Reward Transformation CFR$^+$ (RTCFR$^+$). Importantly, we extend prior work on the parameter-free property of CFR$^+$, enhancing its stability, which is vital for the empirical convergence of RTCFR$^+$. Experiments show that RTCFR$^+$ exhibits a significantly faster empirical convergence rate than existing algorithms that achieve theoretical last-iterate convergence. Interestingly, RTCFR$^+$ shows performance no worse than that of average-iterate convergent CFR algorithms, and it is the first last-iterate convergent algorithm to achieve such performance. Our code is available at https://github.com/menglinjian/NeurIPS-2025-RTCFR.
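The Regret Matching family at the heart of this line of work is easy to sketch in a toy setting. Below is a textbook RM$^+$ self-play loop on a two-player zero-sum matrix game, with linearly averaged strategies; this is an illustrative sketch only, not the paper's RTCFR$^+$, which operates on extensive-form games via CFR$^+$:

```python
import numpy as np

def rm_plus(payoff, iters=20000):
    """Regret Matching+ self-play on a zero-sum matrix game.

    payoff: (n, m) matrix of row-player payoffs; the column player
    receives the negated payoff. Returns the players' average strategies.
    """
    n, m = payoff.shape
    r1, r2 = np.zeros(n), np.zeros(m)           # clipped cumulative regrets
    s1_sum, s2_sum = np.zeros(n), np.zeros(m)
    for _ in range(iters):
        # Current strategies: regrets normalized, or uniform if all zero.
        s1 = r1 / r1.sum() if r1.sum() > 0 else np.full(n, 1.0 / n)
        s2 = r2 / r2.sum() if r2.sum() > 0 else np.full(m, 1.0 / m)
        u1 = payoff @ s2                        # row player's action values
        u2 = -payoff.T @ s1                     # column player's action values
        # RM+ update: accumulate instantaneous regrets, clip at zero.
        r1 = np.maximum(r1 + u1 - s1 @ u1, 0.0)
        r2 = np.maximum(r2 + u2 - s2 @ u2, 0.0)
        s1_sum += s1
        s2_sum += s2
    return s1_sum / iters, s2_sum / iters
```

On a 2x2 game such as `[[2, -1], [-1, 1]]` (unique mixed NE at 0.4/0.6 for both players), the averaged strategies approach the equilibrium as the iteration count grows.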

NeurIPS Conference 2025 Conference Paper

Last-Iterate Convergence of Smooth Regret Matching$^+$ Variants in Learning Nash Equilibria

  • Linjian Meng
  • Youzhi Zhang
  • Zhenxing Ge
  • Tianyu Ding
  • Shangdong Yang
  • Zheng Xu
  • Wenbin Li
  • Yang Gao

Regret Matching$^+$ (RM$^+$) variants are widely used to build superhuman Poker AIs, yet few studies investigate their last-iterate convergence in learning a Nash equilibrium (NE). Although their last-iterate convergence is established for games satisfying the Minty Variational Inequality (MVI), no studies have demonstrated that these algorithms achieve such convergence in the broader class of games satisfying the weak MVI. A key challenge in proving last-iterate convergence for RM$^+$ variants in games satisfying the weak MVI is that even if the game's loss gradient satisfies the weak MVI, RM$^+$ variants operate on a transformed loss feedback which does not satisfy the weak MVI. To provide last-iterate convergence for RM$^+$ variants, we introduce a concise yet novel proof paradigm that involves: (i) transforming an RM$^+$ variant into an Online Mirror Descent (OMD) instance that updates within the original strategy space of the game to recover the weak MVI, and (ii) showing last-iterate convergence by proving the distance between accumulated regrets converges to zero via the recovered weak MVI of the feedback. Inspired by our proof paradigm, we propose Smooth Optimistic Gradient Based RM$^+$ (SOGRM$^+$) and show that it achieves last-iterate and finite-time best-iterate convergence in learning an NE of games satisfying the weak MVI, the weakest condition among all known RM$^+$ variants. Experiments show that SOGRM$^+$ significantly outperforms other algorithms. Our code is available at https://github.com/menglinjian/NeurIPS-2025-SOGRM.

AAAI Conference 2025 Conference Paper

Text and Image Are Mutually Beneficial: Enhancing Training-Free Few-Shot Classification with CLIP

  • Yayuan Li
  • Jintao Guo
  • Lei Qi
  • Wenbin Li
  • Yinghuan Shi

Contrastive Language-Image Pretraining (CLIP) has been widely used in vision tasks. Notably, CLIP has demonstrated promising performance in few-shot learning (FSL). However, existing CLIP-based methods in training-free FSL (i.e., without the requirement of additional training) mainly learn different modalities independently, leading to two essential issues: 1) severe anomalous matching in the image modality; 2) varying quality of generated text prompts. To address these issues, we build a mutual guidance mechanism that introduces an Image-Guided-Text (IGT) component to rectify the varying quality of text prompts through image representations, and a Text-Guided-Image (TGI) component to mitigate the anomalous matching in the image modality through text representations. By integrating IGT and TGI, we adopt the perspective of Text-Image Mutual guidance Optimization, proposing TIMO. Extensive experiments show that TIMO significantly outperforms the state-of-the-art (SOTA) training-free method. Additionally, by exploring the extent of mutual guidance, we propose an enhanced variant, TIMO-S, which even surpasses the best training-required methods by 0.33% at approximately 100x lower time cost.

IJCAI Conference 2024 Conference Paper

DTS-TPT: Dual Temporal-Sync Test-time Prompt Tuning for Zero-shot Activity Recognition

  • Rui Yan
  • Hongyu Qu
  • Xiangbo Shu
  • Wenbin Li
  • Jinhui Tang
  • Tieniu Tan

Fine-tuning large vision-language models on video data with a set of learnable prompts has shown promising performance on zero-shot activity recognition, but still requires extra video data and expensive training costs. Inspired by recent Test-time Prompt Tuning (TPT) in the image domain, this work attempts to extend TPT to video data for zero-shot activity recognition. However, monotonous spatial augmentation and short class names cannot meet the need to capture the diverse and complicated semantics of human behavior during prompt tuning. To this end, this work proposes a Dual Temporal-Sync Test-time Prompt Tuning (DTS-TPT) framework for zero-shot activity recognition. DTS-TPT tunes the learnable prompts appended to text inputs on video feature sequences of different temporal scales in multiple steps during test time. In each tuning step, we minimize the semantic inconsistency among the predictions from video feature sequences randomly augmented via AugMix, using both the original class names and the corresponding descriptions generated through an LLM. Compared with the state-of-the-art methods, the proposed method improves the zero-shot top-1 accuracy by approximately 2% to 5% on popular benchmarks. The code is available at https://github.com/quhongyu/DTS-TPT.

AAAI Conference 2024 Conference Paper

Optimistic Value Instructors for Cooperative Multi-Agent Reinforcement Learning

  • Chao Li
  • Yupeng Zhang
  • Jianqi Wang
  • Yujing Hu
  • Shaokang Dong
  • Wenbin Li
  • Tangjie Lv
  • Changjie Fan

In cooperative multi-agent reinforcement learning, decentralized agents hold the promise of overcoming the combinatorial explosion of the joint action space and enabling greater scalability. However, they are susceptible to a game-theoretic pathology called relative overgeneralization (RO) that shadows the optimal joint action. Although recent value-decomposition algorithms guide decentralized agents by learning a factored global action value function, the representational limitation and the inaccurate sampling of optimal joint actions during learning leave this problem unresolved. To address this limitation, this paper proposes a novel algorithm called Optimistic Value Instructors (OVI). The main idea behind OVI is to introduce multiple optimistic instructors into the value-decomposition paradigm, which are capable of suggesting potentially optimal joint actions and rectifying the factored global action value function to recover these optimal actions. Specifically, the instructors maintain optimistic value estimations of per-agent local actions and thus eliminate the negative effects caused by other agents' exploratory or sub-optimal non-cooperation, enabling accurate identification and suggestion of optimal joint actions. Based on the instructors' suggestions, the paper further presents two instructive constraints to rectify the factored global action value function to recover these optimal joint actions, thus overcoming the RO problem. Experimental evaluation of OVI on various cooperative multi-agent tasks demonstrates its superior performance against multiple baselines, highlighting its effectiveness.

IJCAI Conference 2024 Conference Paper

STAR: Spatio-Temporal State Compression for Multi-Agent Tasks with Rich Observations

  • Chao Li
  • Yujing Hu
  • Shangdong Yang
  • Tangjie Lv
  • Changjie Fan
  • Wenbin Li
  • Chongjie Zhang
  • Yang Gao

This paper focuses on the problem of learning compressed state representations for multi-agent tasks. Under the assumption of rich observation, we pinpoint that the state representations should be compressed both spatially and temporally to enable efficient prioritization of task-relevant features, while existing works typically fail to do so. To overcome this limitation, we propose a novel method named Spatio-Temporal stAte compRession (STAR) that explicitly defines both spatial and temporal compression operations on the learned state representations to encode per-agent task-relevant features. Specifically, we first formalize this problem by introducing the Task Informed Partially Observable Stochastic Game (TI-POSG). Then, we identify the spatial representation compression in it as encoding the latent states from the joint observations of all agents, and achieve this by learning representations that approximate the latent states based on an information-theoretic principle. After that, we further extract the task-relevant features of each agent from these representations by aligning them based on their reward similarities, which we regard as the temporal representation compression. Structurally, we implement these two compression operations by learning a set of agent-specific decoding functions and incorporating them into a critic shared by all agents for scalable learning. We evaluate our method by developing decentralized policies on 12 maps of the StarCraft Multi-Agent Challenge benchmark, and the superior performance demonstrates its effectiveness.

IROS Conference 2024 Conference Paper

Visual Perception System for Autonomous Driving

  • Qi Zhang
  • Siyuan Gou
  • Wenbin Li

The recent surge in interest in autonomous driving is fueled by its rapidly developing capacity to enhance safety, efficiency, and convenience. A key component of autonomous driving technology lies in its perceptual systems, where advancements have led to more precise algorithms applicable to autonomous driving, such as vision-based Simultaneous Localization and Mapping (SLAM), object detection, and tracking algorithms. This work introduces a vision-based perception system for autonomous driving that integrates trajectory tracking and prediction of moving objects to prevent collisions while addressing the localization and mapping needs of autonomous driving. The system leverages motion cues from pedestrians to monitor and forecast their movements while simultaneously mapping the environment. This integrated approach resolves camera localization and tracks other moving objects in the scene, ultimately generating a sparse map to facilitate vehicle navigation. The performance, efficiency, and resilience of this approach are demonstrated through comprehensive evaluations of both simulated and real-world datasets.

IS Journal 2023 Journal Article

Effective Interpretable Policy Distillation via Critical Experience Point Identification

  • Xiao Liu
  • Shuyang Liu
  • Bo An
  • Yang Gao
  • Shangdong Yang
  • Wenbin Li

Interpretable policy distillation aims to imitate a deep reinforcement learning (DRL) policy with a self-explainable model. However, the distilled policy usually does not generalize well to complex tasks. To investigate this phenomenon, we examine the experience pools of DRL tasks and find that these interactive experience distributions are heavy-tailed. However, this critical issue is largely ignored by existing approaches, and, thus, they do not fully utilize the less frequent but very critical experience points. To address this issue, we propose characterizing decision boundaries via minimum experience retention to deal with the heavy-tailed experience distributions. Our method identifies critical experience points that are close to the model's decision boundaries, and such experience points are more critical because they portray the prerequisite of a model to take an action. As a result, our method distills the DRL policy into a self-explainable structure without a neural structure and ambiguous intermediate parameters. Through experiments on six games, we show that our method outperforms the state-of-the-art baselines in cumulative rewards, stability, and faithfulness.

NeurIPS Conference 2023 Conference Paper

Efficient Subgame Refinement for Extensive-form Games

  • Zhenxing Ge
  • Zheng Xu
  • Tianyu Ding
  • Wenbin Li
  • Yang Gao

Subgame solving is an essential technique in addressing large imperfect information games, with various approaches developed to enhance the performance of refined strategies in the abstraction of the target subgame. However, directly applying existing subgame solving techniques may be difficult, due to the intricate nature and substantial size of many real-world games. To overcome this issue, recent subgame solving methods allow for subgame solving on limited knowledge order subgames, increasing their applicability in large games; yet this may still face obstacles due to extensive information set sizes. To address this challenge, we propose a generative subgame solving (GS2) framework, which utilizes a generation function to identify a subset of the earliest-reached nodes, reducing the size of the subgame. Our method is supported by a theoretical analysis and employs a diversity-based generation function to enhance safety. Experiments conducted on medium-sized games as well as the challenging large game of GuanDan demonstrate a significant improvement over the blueprint.

AAAI Conference 2023 Conference Paper

Learning Explicit Credit Assignment for Cooperative Multi-Agent Reinforcement Learning via Polarization Policy Gradient

  • Wubing Chen
  • Wenbin Li
  • Xiao Liu
  • Shangdong Yang
  • Yang Gao

Cooperative multi-agent policy gradient (MAPG) algorithms have recently attracted wide attention and are regarded as a general scheme for multi-agent systems. Credit assignment plays an important role in MAPG and can induce cooperation among multiple agents. However, most MAPG algorithms cannot achieve good credit assignment because of the game-theoretic pathology known as centralized-decentralized mismatch. To address this issue, this paper presents a novel method, Multi-Agent Polarization Policy Gradient (MAPPG). MAPPG employs a simple but efficient polarization function to transform the optimal consistency of joint and individual actions into easily realized constraints, thus enabling efficient credit assignment. Theoretically, we prove that individual policies of MAPPG can converge to the global optimum. Empirically, we evaluate MAPPG on the well-known matrix game and differential game, and verify that MAPPG can converge to the global optimum for both discrete and continuous action spaces. We also evaluate MAPPG on a set of StarCraft II micromanagement tasks and demonstrate that MAPPG outperforms the state-of-the-art MAPG algorithms.

AAMAS Conference 2023 Conference Paper

TiLD: Third-person Imitation Learning by Estimating Domain Cognitive Differences of Visual Demonstrations

  • Zixuan Chen
  • Wenbin Li
  • Yang Gao
  • Yiyu Chen

To enable agents to effectively imitate third-person visual demonstrations in complex imitation learning (IL) tasks, this paper proposes a new IL method named third-person imitation learning by estimating domain cognitive differences (TiLD). The proposed TiLD is able to eliminate the domain cognitive difference between samples from different perspectives, allowing the agent to learn directly from third-person demonstrations. Experimental results indicate that TiLD achieves significant performance improvements over existing state-of-the-art IL methods when dealing with imitation learning tasks with third-person expert demonstrations.

TMLR Journal 2023 Journal Article

Trip-ROMA: Self-Supervised Learning with Triplets and Random Mappings

  • Wenbin Li
  • Xuesong Yang
  • Meihao Kong
  • Lei Wang
  • Jing Huo
  • Yang Gao
  • Jiebo Luo

Contrastive self-supervised learning (SSL) methods, such as MoCo and SimCLR, have achieved great success in unsupervised visual representation learning. They rely on a large number of negative pairs and thus require either large memory banks or large batches. Some recent non-contrastive SSL methods, such as BYOL and SimSiam, attempt to discard negative pairs and have also shown remarkable performance. To avoid collapsed solutions caused by not using negative pairs, these methods require non-trivial asymmetry designs. However, in small data regimes, we cannot obtain a sufficient number of negative pairs or effectively avoid the over-fitting problem when negatives are not used at all. To address this situation, we argue that negative pairs are still important but one is generally sufficient for each positive pair. We show that a simple Triplet-based loss (Trip) can achieve surprisingly good performance without requiring large batches or asymmetry designs. Moreover, to alleviate the over-fitting problem in small data regimes and further enhance the effect of Trip, we propose a simple plug-and-play RandOm MApping (ROMA) strategy by randomly mapping samples into other spaces and requiring these randomly projected samples to satisfy the same relationship indicated by the triplets. Integrating the triplet-based loss with random mapping, we obtain the proposed method Trip-ROMA. Extensive experiments, including unsupervised representation learning and unsupervised few-shot learning, have been conducted on ImageNet-1K and seven small datasets. They successfully demonstrate the effectiveness of Trip-ROMA and consistently show that ROMA can further effectively boost other SSL methods. Code is available at https://github.com/WenbinLee/Trip-ROMA.
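The triplet loss plus random-mapping idea in this abstract can be sketched in a few lines of NumPy. This is an illustrative reading of the abstract, not the authors' released implementation; the number of random projections, the output dimension, and the margin below are arbitrary choices:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Margin-based triplet loss: pull the positive closer than the negative."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

def random_mapping_loss(anchor, positive, negative,
                        n_maps=4, dim_out=32, margin=1.0, seed=0):
    """ROMA-style regularizer (sketch): require the same triplet relation
    to hold after several random linear projections of the embeddings."""
    rng = np.random.default_rng(seed)
    dim_in = anchor.shape[0]
    total = triplet_loss(anchor, positive, negative, margin)
    for _ in range(n_maps):
        # Random Gaussian projection, scaled to roughly preserve norms.
        W = rng.standard_normal((dim_out, dim_in)) / np.sqrt(dim_in)
        total += triplet_loss(W @ anchor, W @ positive, W @ negative, margin)
    return total / (n_maps + 1)
```

In a real SSL pipeline the loss would be differentiated through an encoder; the sketch only illustrates the constraint the random mappings impose.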

IJCAI Conference 2020 Conference Paper

Asymmetric Distribution Measure for Few-shot Learning

  • Wenbin Li
  • Lei Wang
  • Jing Huo
  • Yinghuan Shi
  • Yang Gao
  • Jiebo Luo

The core idea of metric-based few-shot image classification is to directly measure the relations between query images and support classes to learn transferable feature embeddings. Previous work mainly focuses on image-level feature representations, which actually cannot effectively estimate a class's distribution due to the scarcity of samples. Some recent work shows that local descriptor based representations can achieve richer representations than image-level based representations. However, such works are still based on a less effective instance-level metric, especially a symmetric metric, to measure the relation between a query image and a support class. Given the natural asymmetric relation between a query image and a support class, we argue that an asymmetric measure is more suitable for metric-based few-shot learning. To that end, we propose a novel Asymmetric Distribution Measure (ADM) network for few-shot learning by calculating a joint local and global asymmetric measure between two multivariate local distributions of a query and a class. Moreover, a task-aware Contrastive Measure Strategy (CMS) is proposed to further enhance the measure function. On the popular miniImageNet and tieredImageNet benchmarks, ADM achieves state-of-the-art results, validating our innovative design of asymmetric distribution measures for few-shot learning. The source code can be downloaded from https://github.com/WenbinLee/ADM.git.
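To see why a measure between distributions is naturally asymmetric, the KL divergence between two Gaussians is the standard example: KL(P||Q) and KL(Q||P) generally differ. The snippet below is only a stand-in to illustrate asymmetry, not the joint local-global measure defined in the paper:

```python
import numpy as np

def gaussian_kl(mu0, cov0, mu1, cov1):
    """KL( N(mu0, cov0) || N(mu1, cov1) ) -- asymmetric by construction.

    Closed form: 0.5 * ( tr(S1^-1 S0) + (m1-m0)^T S1^-1 (m1-m0)
                         - k + ln(det S1 / det S0) )
    """
    k = mu0.shape[0]
    inv1 = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(inv1 @ cov0)
                  + diff @ inv1 @ diff
                  - k
                  + np.log(np.linalg.det(cov1) / np.linalg.det(cov0)))
```

Swapping the arguments changes the value whenever the covariances differ, which is exactly the directional behavior an asymmetric query-to-class measure exploits.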

IJCAI Conference 2020 Conference Paper

Biased Feature Learning for Occlusion Invariant Face Recognition

  • Changbin Shao
  • Jing Huo
  • Lei Qi
  • Zhen-Hua Feng
  • Wenbin Li
  • Chuanqi Dong
  • Yang Gao

To address the challenges posed by unknown occlusions, we propose a Biased Feature Learning (BFL) framework for occlusion-invariant face recognition. We first construct an extended dataset using a multi-scale data augmentation method. For model training, we modify the label loss to adjust the impact of normal and occluded samples. Further, we propose a biased guidance strategy to manipulate the optimization of a network so that the feature embedding space is dominated by non-occluded faces. BFL not only enhances the robustness of a network to unknown occlusions but also maintains or even improves its performance for normal faces. Experimental results demonstrate its superiority as well as the generalization capability with different network architectures and loss functions.

AAAI Conference 2020 Conference Paper

Deep Embedded Complementary and Interactive Information for Multi-View Classification

  • Jinglin Xu
  • Wenbin Li
  • Xinwang Liu
  • Dingwen Zhang
  • Ji Liu
  • Junwei Han

Multi-view classification optimally integrates various features from different views to improve classification tasks. Though most of the existing works demonstrate promising performance in various computer vision applications, we observe that they can be further improved by sufficiently utilizing complementary view-specific information, deep interactive information between different views, and the strategy of fusing various views. In this work, we propose a novel multi-view learning framework that seamlessly embeds various view-specific information and deep interactive information and introduces a novel multi-view fusion strategy to make a joint decision during the optimization for classification. Specifically, we utilize different deep neural networks to learn multiple view-specific representations, and model deep interactive information through a shared interactive network using the cross-correlations between attributes of these representations. After that, we adaptively integrate multiple neural networks by flexibly tuning the power exponent of weight, which not only avoids the trivial solution of weight but also provides a new approach to fuse outputs from different deterministic neural networks. Extensive experiments on several public datasets demonstrate the rationality and effectiveness of our method.

IJCAI Conference 2020 Conference Paper

Joint Multi-view 2D Convolutional Neural Networks for 3D Object Classification

  • Jinglin Xu
  • Xiangsen Zhang
  • Wenbin Li
  • Xinwang Liu
  • Junwei Han

Three-dimensional (3D) object classification is widely involved in various computer vision applications, e.g., autonomous driving and simultaneous localization and mapping, and has attracted much attention in the community. However, solving 3D object classification by directly employing 3D convolutional neural networks (CNNs) generally suffers from high computational cost. Besides, existing view-based methods cannot fully explore the content relationships between views. To this end, this work proposes a novel multi-view framework that jointly uses multiple 2D-CNNs to capture discriminative information with relationships, as well as a new multi-view loss fusion strategy, in an end-to-end manner. Specifically, we utilize multiple 2D views of a 3D object as input and integrate the intra-view and inter-view information of each view through the view-specific 2D-CNN and a series of modules (outer product, view pair pooling, 1D convolution, and fully connected transformation). Furthermore, we design a novel view ensemble mechanism that selects several discriminative and informative views to jointly infer the category of a 3D object. Extensive experiments demonstrate that the proposed method is able to outperform current state-of-the-art methods on 3D object classification. More importantly, this work provides a new way to improve 3D object classification from the perspective of fully utilizing well-established 2D-CNNs.

AAAI Conference 2020 Conference Paper

Layerwise Sparse Coding for Pruned Deep Neural Networks with Extreme Compression Ratio

  • Xiao Liu
  • Wenbin Li
  • Jing Huo
  • Lili Yao
  • Yang Gao

Deep neural network compression is important and increasingly developed, especially in resource-constrained environments such as autonomous drones and wearable devices. Basically, we can easily and largely reduce the number of weights of a trained deep model by adopting a widely used model compression technique, e.g., pruning. In this way, two kinds of data are usually preserved for the compressed model, i.e., non-zero weights and meta-data, where meta-data is employed to help encode and decode the non-zero weights. Although we can obtain an ideally small number of non-zero weights through pruning, existing sparse matrix coding methods still need a much larger amount of meta-data (possibly several times larger than the non-zero weights), which becomes a severe bottleneck for deploying very deep models. To tackle this issue, we propose a layerwise sparse coding (LSC) method to maximize the compression ratio by extremely reducing the amount of meta-data. We first divide a sparse matrix into multiple small blocks and remove zero blocks, and then propose a novel signed relative index (SRI) algorithm to encode the remaining non-zero blocks (with much less meta-data). In addition, the proposed LSC performs parallel matrix multiplication without full decoding, which traditional methods cannot. Through extensive experiments, we demonstrate that LSC achieves substantial gains in pruned DNN compression (e.g., a 51.03x compression ratio on ADMM-Lenet) and inference computation (i.e., time reduction and extremely less memory bandwidth) over state-of-the-art baselines.
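The block-removal-plus-relative-index idea can be sketched as follows. This is a simplified illustration of relative index coding over non-zero blocks, not the paper's exact SRI algorithm; the block size and row-major block layout are arbitrary choices here:

```python
import numpy as np

def encode_blocks(mat, bs=2):
    """Split a matrix into bs x bs blocks, keep only non-zero blocks,
    and store each kept block with the gap (relative index) from the
    previously kept block instead of its absolute position."""
    rows, cols = mat.shape
    blocks, rel_idx = [], []
    prev, flat_id = -1, 0
    for r in range(0, rows, bs):
        for c in range(0, cols, bs):
            block = mat[r:r + bs, c:c + bs]
            if np.any(block):
                rel_idx.append(flat_id - prev)   # gap since last kept block
                blocks.append(block.copy())
                prev = flat_id
            flat_id += 1
    return blocks, rel_idx

def decode_blocks(blocks, rel_idx, shape, bs=2):
    """Inverse of encode_blocks: rebuild the dense matrix."""
    mat = np.zeros(shape)
    n_col_blocks = shape[1] // bs
    flat_id = -1
    for block, gap in zip(blocks, rel_idx):
        flat_id += gap                           # recover absolute block index
        r, c = divmod(flat_id, n_col_blocks)
        mat[r * bs:(r + 1) * bs, c * bs:(c + 1) * bs] = block
    return mat
```

Storing gaps rather than absolute positions is what keeps the meta-data small when non-zero blocks cluster; the "signed" refinement in the paper further shrinks the index width.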

IJCAI Conference 2020 Conference Paper

Learning Task-aware Local Representations for Few-shot Learning

  • Chuanqi Dong
  • Wenbin Li
  • Jing Huo
  • Zheng Gu
  • Yang Gao

Few-shot learning for visual recognition aims to adapt to novel unseen classes with only a few images. Recent work, especially work based on low-level information, has achieved great progress. In these works, local representations (LRs) are typically employed, because LRs are more consistent between the seen and unseen classes. However, most of them are limited to an individual image-to-image or image-to-class measure, which cannot fully exploit the capabilities of LRs, especially in the context of a certain task. This paper proposes an Adaptive Task-aware Local Representations Network (ATL-Net) to address this limitation by introducing episodic attention, which can adaptively select the important local patches in the entire task, mirroring the process of human recognition. We achieve much superior results on multiple benchmarks. On miniImageNet, ATL-Net gains 0.93% and 0.88% improvements over the compared methods under the 5-way 1-shot and 5-shot settings. Moreover, ATL-Net can naturally tackle the problem of how to adaptively identify and weight the importance of different key local parts, which is the major concern of fine-grained recognition. Specifically, on the fine-grained dataset Stanford Dogs, ATL-Net outperforms the second best method with 5.39% and 9.69% gains under the 5-way 1-shot and 5-shot settings.

AAAI Conference 2019 Conference Paper

Distribution Consistency Based Covariance Metric Networks for Few-Shot Learning

  • Wenbin Li
  • Jinglin Xu
  • Jing Huo
  • Lei Wang
  • Yang Gao
  • Jiebo Luo

Few-shot learning aims to recognize new concepts from very few examples. However, most existing few-shot learning methods mainly concentrate on the first-order statistics of the concept representation or on a fixed metric for the relation between a sample and a concept. In this work, we propose a novel end-to-end deep architecture, named Covariance Metric Networks (CovaMNet). CovaMNet is designed to exploit both a covariance representation and a covariance metric based on distribution consistency for few-shot classification tasks. Specifically, we construct an embedded local covariance representation to extract the second-order statistics of each concept and describe its underlying distribution. Upon this covariance representation, we further define a new deep covariance metric to measure the consistency of the distributions between query samples and new concepts. Furthermore, we employ the episodic training mechanism to train the entire network end-to-end from scratch. Extensive experiments on two tasks, generic few-shot image classification and fine-grained few-shot image classification, demonstrate the superiority of the proposed CovaMNet. The source code is available at https://github.com/WenbinLee/CovaMNet.git.
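
The two ingredients named in the abstract, a local covariance representation and a covariance-based score, can be sketched in a few lines of NumPy. This is an illustrative simplification under assumed inputs (pooled local descriptors per class), not the authors' implementation; see the linked repository for the real one.

```python
import numpy as np

def local_covariance(descriptors):
    """Second-order (covariance) representation of one concept.
    `descriptors` is (m, d): the d-dimensional local features
    pooled from the few support images of that class."""
    mu = descriptors.mean(axis=0, keepdims=True)
    centered = descriptors - mu
    return centered.T @ centered / (len(descriptors) - 1)

def covariance_score(query_descriptors, cov):
    """Toy covariance-metric score: average the quadratic form
    x^T C x over the query's local descriptors. Larger values mean
    the query's local features lie along the class's high-variance
    directions, i.e. are consistent with its distribution."""
    scores = np.einsum('nd,de,ne->n',
                       query_descriptors, cov, query_descriptors)
    return scores.mean()
```

A query is then assigned to the class whose covariance representation gives it the highest score; the paper additionally learns the embedding producing the descriptors end-to-end with episodic training.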

YNICL Journal 2018 Journal Article

Volume alteration of hippocampal subfields in first-episode antipsychotic-naïve schizophrenia patients before and after acute antipsychotic treatment

  • Wenbin Li
  • Kaiming Li
  • Pujun Guan
  • Ying Chen
  • Yuan Xiao
  • Su Lui
  • John A. Sweeney
  • Qiyong Gong

The nature of hippocampal changes in schizophrenia before first treatment, and whether hippocampal subfields are affected by antipsychotic treatment, are important questions for schizophrenia research. Forty-one first-episode antipsychotic-naïve acutely ill schizophrenia inpatients had MRI scans before and six weeks after antipsychotic treatment. Thirty-nine matched healthy controls were also scanned, twenty-two of whom were scanned a second time six weeks later. Volumes of hippocampal subfields were measured with FreeSurfer v6.0 using a longitudinal analysis pipeline. Before treatment, schizophrenia patients showed no significant difference in total hippocampal volume but exhibited significantly greater subfield volumes than controls in the bilateral molecular layers of the hippocampus (ML), bilateral granular cell layers of the dentate gyrus (GC-DG), and bilateral cornu ammonis area 4 (CA4). After six weeks of antipsychotic treatment, patients showed volume reductions relative to pretreatment scans in the total hippocampus bilaterally, with subfield volume reductions noted in the previously enlarged subfields (i.e., bilateral ML, GC-DG, and CA4) and in the bilateral hippocampal tails, left CA1, CA3, and fimbria. Subfields with volume increases before treatment were reduced to the level of healthy controls (bilateral ML and GC-DG) or near it (bilateral CA4) after treatment. These results indicate subfield-specific hippocampal hypertrophy prior to treatment; these abnormalities were reduced after acute antipsychotic therapy in a dose-related manner, together with volume reductions in areas that were not hypertrophic before treatment.

AAAI Conference 2017 Conference Paper

Beyond IID: Learning to Combine Non-IID Metrics for Vision Tasks

  • Yinghuan Shi
  • Wenbin Li
  • Yang Gao
  • Longbing Cao
  • Dinggang Shen

Metric learning has been widely employed, especially in various computer vision tasks, under the fundamental assumption that all samples (e.g., regions/superpixels in images/videos) are independent and identically distributed (IID). However, since samples are usually spatially connected or temporally correlated with their physically connected neighbours, they are not IID (non-IID for short), which existing methods cannot handle directly. Thus, we propose to learn and integrate non-IID metrics (NIME). To incorporate the non-IID spatial/temporal relations, instead of directly applying metric learning to non-IID features as previous methods do, NIME first builds several non-IID representations on the original (non-IID) features via various graph kernel functions, and then automatically learns the metric under the best combination of these non-IID representations. NIME is applied to two typical computer vision tasks: interactive image segmentation and histology image identification. The results show that learning and integrating non-IID metrics improves performance compared to IID methods. Moreover, our method achieves results comparable to or better than the state of the art.
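
The "best combination of representations" step can be illustrated with a toy multiple-kernel-style sketch. This is an assumption-laden simplification, not NIME's actual formulation: given several precomputed similarity (kernel) matrices built from different graph kernels, it learns convex weights so that same-label pairs score higher than different-label pairs.

```python
import numpy as np

def combine_kernels(kernels, labels, lr=0.1, steps=200):
    """Learn convex weights over precomputed kernel matrices with
    a hinge-style objective: the combined similarity should exceed
    a margin of 1 for same-label pairs and fall below -1 for
    different-label pairs. Returns the weight vector on the simplex."""
    K = np.stack(kernels)                                # (m, n, n)
    same = (labels[:, None] == labels[None, :]).astype(float)
    target = 2 * same - 1                                # +1 same, -1 different
    w = np.full(len(kernels), 1.0 / len(kernels))
    for _ in range(steps):
        combined = np.tensordot(w, K, axes=1)            # (n, n)
        active = (target * combined < 1).astype(float)   # margin-violating pairs
        grad = -np.tensordot(K, active * target, axes=([1, 2], [0, 1]))
        w -= lr * grad / target.size
        w = np.clip(w, 0, None)                          # project back
        w /= w.sum()                                     # ...onto the simplex
    return w
```

A kernel whose similarity pattern matches the label structure ends up with most of the weight; in NIME the combination is learned jointly with the metric rather than by this stand-alone heuristic.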

IS Journal 2014 Journal Article

WaaS: Wisdom as a Service

  • Jianhui Chen
  • Jianhua Ma
  • Ning Zhong
  • Yiyu Yao
  • Jiming Liu
  • Runhe Huang
  • Wenbin Li
  • Zhisheng Huang

An emerging hyper-world encompasses all human activities in a social-cyber-physical space. Its power derives from the Wisdom Web of Things (W2T) cycle, namely, "from things to data, information, knowledge, wisdom, services, humans, and then back to things." The W2T cycle leads to a harmonious symbiosis among humans, computers, and things, which can be constructed through the large-scale convergence of intelligent information technology applications with an open and interoperable architecture. Recent advances in cloud computing, the Internet of Things, the Web of Things, big data, and other research fields have provided just such an open system architecture with resource sharing and services. The next step is to develop an open and interoperable content architecture with intelligent sharing and services for the organization and transformation of the data, information, knowledge, and wisdom (DIKW) hierarchy. This article introduces Wisdom as a Service (WaaS), a content architecture based on the pay-as-you-go IT trend. The WaaS infrastructure and the main challenges in WaaS research and applications are discussed, and a case study is described. Relying on cloud computing and big data, WaaS provides a practical approach to realizing the W2T cycle in the hyper-world for the coming age of ubiquitous intelligent IT applications.