Arrow Research search

Author name cluster

Wenbin Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

26 papers
2 author rows

Possible papers

26

AAAI Conference 2026 Conference Paper

Branch, or Layer? Zeroth-Order Optimization for Continual Learning of Vision-Language Models

  • Ziwei Liu
  • Borui Kang
  • Wei Li
  • Hangjie Yuan
  • Yanbing Yang
  • Wenbin Li
  • Yifan Zhu
  • Tao Feng

Vision-Language Continual Learning (VLCL) has attracted significant research attention for its robust capabilities, and the adoption of Parameter-Efficient Fine-Tuning (PEFT) strategies enables these models to achieve competitive performance with substantially reduced resource consumption. However, the dominant First-Order (FO) optimization is prone to trapping models in suboptimal local minima, especially within the limited exploration subspace of PEFT. To overcome this challenge, this paper pioneers a systematic exploration of Zeroth-Order (ZO) optimization for PEFT-based VLCL. We first identify the incompatibility of naive full-ZO adoption in VLCL due to instability of the optimization process. We then investigate applying ZO optimization at granularities ranging from modality branch-wise to fine-grained layer-wise across various training units to identify an optimal strategy. Moreover, a key theoretical insight reveals that the vision modality exhibits higher variance than its language counterpart during ZO optimization in VLCL, and we propose a modality-aware stabilized ZO strategy, which adopts gradient sign normalization in ZO and constrains vision-modality perturbations to further improve performance. Benefiting from the adoption of ZO optimization, PEFT-based VLCL gains a better ability to escape local minima during optimization; extensive experiments on four benchmarks demonstrate that our method achieves state-of-the-art results.

NeurIPS Conference 2025 Conference Paper

DON’T NEED RETRAINING: A Mixture of DETR and Vision Foundation Models for Cross-Domain Few-Shot Object Detection

  • Changhan Liu
  • Xunzhi Xiang
  • Zixuan Duan
  • Wenbin Li
  • Qi Fan
  • Yang Gao

Cross-Domain Few-Shot Object Detection (CD-FSOD) aims to generalize to unseen domains by leveraging a few annotated samples of the target domain, requiring models to exhibit both strong generalization and localization capabilities. However, existing well-trained detectors typically have strong localization capabilities but limited generalization, whereas vision foundation models (VFMs) generally exhibit better generalization but lack accurate localization. In this paper, we propose a novel Mixture-of-Experts (MoE) structure that integrates the detector's localization capability with the VFM's generalization by using VFM features to improve detector features. Specifically, we propose an Expert-wise Router (ER) that selects the most relevant VFM experts for each backbone layer, and a Region-wise Router (RR) that emphasizes foreground and suppresses background. To bridge representation gaps, we further propose a Shared Expert Projection (SEP) module and a Private Expert Projection (PEP) module, which align VFM features to the detector feature space while decoupling shared image features from private image features in the VFM feature map. Finally, we propose an MoE module to transfer the VFM's generalization to the detector without altering the detector's original architecture. Furthermore, our method extends well-trained detectors to detect novel classes in unseen domains without retraining on the base classes. Experimental results on multiple cross-domain datasets validate the effectiveness of our method.

AAAI Conference 2025 Conference Paper

Effective and Efficient Representation Learning for Flight Trajectories

  • Shuo Liu
  • Wenbin Li
  • Di Yao
  • Jingping Bi

Flight trajectory data plays a vital role in the traffic management community, especially for downstream tasks such as trajectory prediction, flight recognition, and anomaly detection. Existing works often utilize handcrafted features and design models for different tasks individually, which heavily rely on domain expertise and are hard to extend. We argue that different flight analysis tasks share the same useful features of the trajectory. Jointly learning a unified representation for flight trajectories could be beneficial for improving the performance of various tasks. However, flight trajectory representation learning (TRL) faces two primary challenges, i.e., unbalanced behavior density and 3D spatial continuity, which render recent general TRL methods ineffective. In this paper, we propose Flight2Vec, a flight-specific representation learning method to address these challenges. Specifically, a behavior-adaptive patching mechanism is used to encourage the learned representation to pay more attention to behavior-dense segments. Moreover, we introduce a motion trend learning technique that guides the model to memorize not only the precise locations, but also the motion trend, to generate better representations. Extensive experimental results demonstrate that Flight2Vec significantly improves performance in downstream tasks such as flight trajectory prediction, flight recognition, and anomaly detection.

NeurIPS Conference 2025 Conference Paper

Efficient Last-Iterate Convergence in Solving Extensive-Form Games

  • Linjian Meng
  • Tianpei Yang
  • Youzhi Zhang
  • Zhenxing Ge
  • Shangdong Yang
  • Tianyu Ding
  • Wenbin Li
  • Bo An

To establish last-iterate convergence for Counterfactual Regret Minimization (CFR) algorithms in learning a Nash equilibrium (NE) of extensive-form games (EFGs), recent studies reformulate learning an NE of the original EFG as learning the NEs of a sequence of (perturbed) regularized EFGs. Hence, proving last-iterate convergence in solving the original EFG reduces to proving last-iterate convergence in solving (perturbed) regularized EFGs. However, these studies only establish last-iterate convergence for Online Mirror Descent (OMD)-based CFR algorithms, rather than Regret Matching (RM)-based CFR algorithms, in solving perturbed regularized EFGs, resulting in a poor empirical convergence rate, as RM-based CFR algorithms typically outperform OMD-based CFR algorithms. In addition, as solving multiple perturbed regularized EFGs is required, fine-tuning across multiple perturbed regularized EFGs is infeasible, making parameter-free algorithms highly desirable. This paper shows that CFR$^+$, a classical parameter-free RM-based CFR algorithm, achieves last-iterate convergence in learning an NE of perturbed regularized EFGs. This is the first parameter-free last-iterate convergence result for RM-based CFR algorithms in perturbed regularized EFGs. Leveraging CFR$^+$ to solve perturbed regularized EFGs, we obtain Reward Transformation CFR$^+$ (RTCFR$^+$). Importantly, we extend prior work on the parameter-free property of CFR$^+$, enhancing its stability, which is vital for the empirical convergence of RTCFR$^+$. Experiments show that RTCFR$^+$ exhibits a significantly faster empirical convergence rate than existing algorithms that achieve theoretical last-iterate convergence. Interestingly, RTCFR$^+$ shows performance no worse than that of average-iterate convergent CFR algorithms, and it is the first last-iterate convergent algorithm to achieve such performance. Our code is available at https://github.com/menglinjian/NeurIPS-2025-RTCFR.
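The Regret Matching family at the heart of this line of work is easy to sketch in a toy setting. Below is a textbook RM$^+$ self-play loop on a two-player zero-sum matrix game, with linearly averaged strategies; this is an illustrative sketch only, not the paper's RTCFR$^+$, which operates on extensive-form games via CFR$^+$:

```python
import numpy as np

def rm_plus(payoff, iters=20000):
    """Regret Matching+ self-play on a zero-sum matrix game.

    payoff: (n, m) matrix of row-player payoffs; the column player
    receives the negated payoff. Returns the players' average strategies.
    """
    n, m = payoff.shape
    r1, r2 = np.zeros(n), np.zeros(m)           # clipped cumulative regrets
    s1_sum, s2_sum = np.zeros(n), np.zeros(m)
    for _ in range(iters):
        # Current strategies: regrets normalized, or uniform if all zero.
        s1 = r1 / r1.sum() if r1.sum() > 0 else np.full(n, 1.0 / n)
        s2 = r2 / r2.sum() if r2.sum() > 0 else np.full(m, 1.0 / m)
        u1 = payoff @ s2                        # row player's action values
        u2 = -payoff.T @ s1                     # column player's action values
        # RM+ update: accumulate instantaneous regrets, clip at zero.
        r1 = np.maximum(r1 + u1 - s1 @ u1, 0.0)
        r2 = np.maximum(r2 + u2 - s2 @ u2, 0.0)
        s1_sum += s1
        s2_sum += s2
    return s1_sum / iters, s2_sum / iters
```

On a 2x2 game such as `[[2, -1], [-1, 1]]` (unique mixed NE at 0.4/0.6 for both players), the averaged strategies approach the equilibrium as the iteration count grows.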

NeurIPS Conference 2025 Conference Paper

Last-Iterate Convergence of Smooth Regret Matching$^+$ Variants in Learning Nash Equilibria

  • Linjian Meng
  • Youzhi Zhang
  • Zhenxing Ge
  • Tianyu Ding
  • Shangdong Yang
  • Zheng Xu
  • Wenbin Li
  • Yang Gao

Regret Matching$^+$ (RM$^+$) variants are widely used to build superhuman Poker AIs, yet few studies investigate their last-iterate convergence in learning a Nash equilibrium (NE). Although their last-iterate convergence is established for games satisfying the Minty Variational Inequality (MVI), no studies have demonstrated that these algorithms achieve such convergence in the broader class of games satisfying the weak MVI. A key challenge in proving last-iterate convergence for RM$^+$ variants in games satisfying the weak MVI is that even if the game's loss gradient satisfies the weak MVI, RM$^+$ variants operate on a transformed loss feedback which does not satisfy the weak MVI. To provide last-iterate convergence for RM$^+$ variants, we introduce a concise yet novel proof paradigm that involves: (i) transforming an RM$^+$ variant into an Online Mirror Descent (OMD) instance that updates within the original strategy space of the game to recover the weak MVI, and (ii) showing last-iterate convergence by proving the distance between accumulated regrets converges to zero via the recovered weak MVI of the feedback. Inspired by our proof paradigm, we propose Smooth Optimistic Gradient Based RM$^+$ (SOGRM$^+$) and show that it achieves last-iterate and finite-time best-iterate convergence in learning an NE of games satisfying the weak MVI, the weakest condition among all known RM$^+$ variants. Experiments show that SOGRM$^+$ significantly outperforms other algorithms. Our code is available at https://github.com/menglinjian/NeurIPS-2025-SOGRM.

AAAI Conference 2025 Conference Paper

Text and Image Are Mutually Beneficial: Enhancing Training-Free Few-Shot Classification with CLIP

  • Yayuan Li
  • Jintao Guo
  • Lei Qi
  • Wenbin Li
  • Yinghuan Shi

Contrastive Language-Image Pretraining (CLIP) has been widely used in vision tasks. Notably, CLIP has demonstrated promising performance in few-shot learning (FSL). However, existing CLIP-based methods in training-free FSL (i.e., without the requirement of additional training) mainly learn different modalities independently, leading to two essential issues: 1) severe anomalous matching in the image modality; 2) varying quality of generated text prompts. To address these issues, we build a mutual guidance mechanism that introduces an Image-Guided-Text (IGT) component to rectify the varying quality of text prompts through image representations, and a Text-Guided-Image (TGI) component to mitigate the anomalous matching in the image modality through text representations. By integrating IGT and TGI, we adopt the perspective of Text-Image Mutual guidance Optimization, proposing TIMO. Extensive experiments show that TIMO significantly outperforms the state-of-the-art (SOTA) training-free method. Additionally, by exploring the extent of mutual guidance, we propose an enhanced variant, TIMO-S, which even surpasses the best training-required methods by 0.33% at approximately 100x lower time cost.

IJCAI Conference 2024 Conference Paper

DTS-TPT: Dual Temporal-Sync Test-time Prompt Tuning for Zero-shot Activity Recognition

  • Rui Yan
  • Hongyu Qu
  • Xiangbo Shu
  • Wenbin Li
  • Jinhui Tang
  • Tieniu Tan

Fine-tuning large vision-language models on video data with a set of learnable prompts has shown promising performance on zero-shot activity recognition, but still requires extra video data and expensive training costs. Inspired by recent Test-time Prompt Tuning (TPT) in the image domain, this work attempts to extend TPT to video data for zero-shot activity recognition. However, monotonous spatial augmentation and short class names cannot meet the need to capture the diverse and complicated semantics of human behavior during prompt tuning. To this end, this work proposes a Dual Temporal-Sync Test-time Prompt Tuning (DTS-TPT) framework for zero-shot activity recognition. DTS-TPT tunes the learnable prompts appended to text inputs on video feature sequences of different temporal scales in multiple steps during test time. In each tuning step, we minimize the semantic inconsistency among the predictions from video feature sequences randomly augmented via AugMix, using both the original class names and the corresponding descriptions generated through an LLM. Compared with the state-of-the-art methods, the proposed method improves the zero-shot top-1 accuracy by approximately 2% to 5% on popular benchmarks. The code is available at https://github.com/quhongyu/DTS-TPT.

AAAI Conference 2024 Conference Paper

Optimistic Value Instructors for Cooperative Multi-Agent Reinforcement Learning

  • Chao Li
  • Yupeng Zhang
  • Jianqi Wang
  • Yujing Hu
  • Shaokang Dong
  • Wenbin Li
  • Tangjie Lv
  • Changjie Fan

In cooperative multi-agent reinforcement learning, decentralized agents hold the promise of overcoming the combinatorial explosion of the joint action space and enabling greater scalability. However, they are susceptible to a game-theoretic pathology called relative overgeneralization (RO) that shadows the optimal joint action. Although recent value-decomposition algorithms guide decentralized agents by learning a factored global action value function, the representational limitation and the inaccurate sampling of optimal joint actions during learning leave this problem unresolved. To address this limitation, this paper proposes a novel algorithm called Optimistic Value Instructors (OVI). The main idea behind OVI is to introduce multiple optimistic instructors into the value-decomposition paradigm, which are capable of suggesting potentially optimal joint actions and rectifying the factored global action value function to recover these optimal actions. Specifically, the instructors maintain optimistic value estimations of per-agent local actions and thus eliminate the negative effects caused by other agents' exploratory or sub-optimal non-cooperation, enabling accurate identification and suggestion of optimal joint actions. Based on the instructors' suggestions, the paper further presents two instructive constraints to rectify the factored global action value function to recover these optimal joint actions, thus overcoming the RO problem. Experimental evaluation of OVI on various cooperative multi-agent tasks demonstrates its superior performance against multiple baselines, highlighting its effectiveness.

IJCAI Conference 2024 Conference Paper

STAR: Spatio-Temporal State Compression for Multi-Agent Tasks with Rich Observations

  • Chao Li
  • Yujing Hu
  • Shangdong Yang
  • Tangjie Lv
  • Changjie Fan
  • Wenbin Li
  • Chongjie Zhang
  • Yang Gao

This paper focuses on the problem of learning compressed state representations for multi-agent tasks. Under the assumption of rich observation, we pinpoint that the state representations should be compressed both spatially and temporally to enable efficient prioritization of task-relevant features, while existing works typically fail to do so. To overcome this limitation, we propose a novel method named Spatio-Temporal stAte compRession (STAR) that explicitly defines both spatial and temporal compression operations on the learned state representations to encode per-agent task-relevant features. Specifically, we first formalize this problem by introducing the Task Informed Partially Observable Stochastic Game (TI-POSG). Then, we identify the spatial representation compression in it as encoding the latent states from the joint observations of all agents, and achieve this by learning representations that approximate the latent states based on an information-theoretic principle. After that, we further extract the task-relevant features of each agent from these representations by aligning them based on their reward similarities, which we regard as the temporal representation compression. Structurally, we implement these two compression operations by learning a set of agent-specific decoding functions and incorporating them into a critic shared by all agents for scalable learning. We evaluate our method by developing decentralized policies on 12 maps of the StarCraft Multi-Agent Challenge benchmark, and the superior performance demonstrates its effectiveness.

IROS Conference 2024 Conference Paper

Visual Perception System for Autonomous Driving

  • Qi Zhang
  • Siyuan Gou
  • Wenbin Li

The recent surge in interest in autonomous driving is fueled by its rapidly developing capacity to enhance safety, efficiency, and convenience. A key component of autonomous driving technology lies in its perceptual systems, where advancements have led to more precise algorithms applicable to autonomous driving, such as vision-based Simultaneous Localization and Mapping (SLAM), object detection, and tracking algorithms. This work introduces a vision-based perception system for autonomous driving that integrates trajectory tracking and prediction of moving objects to prevent collisions while addressing the localization and mapping needs of autonomous driving. The system leverages motion cues from pedestrians to monitor and forecast their movements while simultaneously mapping the environment. This integrated approach resolves camera localization and tracks other moving objects in the scene, ultimately generating a sparse map to facilitate vehicle navigation. The performance, efficiency, and resilience of this approach are demonstrated through comprehensive evaluations of both simulated and real-world datasets.

IS Journal 2023 Journal Article

Effective Interpretable Policy Distillation via Critical Experience Point Identification

  • Xiao Liu
  • Shuyang Liu
  • Bo An
  • Yang Gao
  • Shangdong Yang
  • Wenbin Li

Interpretable policy distillation aims to imitate a deep reinforcement learning (DRL) policy with a self-explainable model. However, the distilled policy usually does not generalize well to complex tasks. To investigate this phenomenon, we examine the experience pools of DRL tasks and find that these interactive experience distributions are heavy-tailed. However, this critical issue is largely ignored by existing approaches, and, thus, they do not fully utilize the less frequent but very critical experience points. To address this issue, we propose characterizing decision boundaries via minimum experience retention to deal with the heavy-tailed experience distributions. Our method identifies critical experience points that are close to the model's decision boundaries, and such experience points are more critical because they portray the prerequisite of a model to take an action. As a result, our method distills the DRL policy into a self-explainable structure without a neural structure and ambiguous intermediate parameters. Through experiments on six games, we show that our method outperforms the state-of-the-art baselines in cumulative rewards, stability, and faithfulness.

NeurIPS Conference 2023 Conference Paper

Efficient Subgame Refinement for Extensive-form Games

  • Zhenxing Ge
  • Zheng Xu
  • Tianyu Ding
  • Wenbin Li
  • Yang Gao

Subgame solving is an essential technique in addressing large imperfect information games, with various approaches developed to enhance the performance of refined strategies in the abstraction of the target subgame. However, directly applying existing subgame solving techniques may be difficult, due to the intricate nature and substantial size of many real-world games. To overcome this issue, recent subgame solving methods allow for subgame solving on limited knowledge order subgames, increasing their applicability in large games; yet this may still face obstacles due to extensive information set sizes. To address this challenge, we propose a generative subgame solving (GS2) framework, which utilizes a generation function to identify a subset of the earliest-reached nodes, reducing the size of the subgame. Our method is supported by a theoretical analysis and employs a diversity-based generation function to enhance safety. Experiments conducted on medium-sized games as well as the challenging large game of GuanDan demonstrate a significant improvement over the blueprint.

AAAI Conference 2023 Conference Paper

Learning Explicit Credit Assignment for Cooperative Multi-Agent Reinforcement Learning via Polarization Policy Gradient

  • Wubing Chen
  • Wenbin Li
  • Xiao Liu
  • Shangdong Yang
  • Yang Gao

Cooperative multi-agent policy gradient (MAPG) algorithms have recently attracted wide attention and are regarded as a general scheme for multi-agent systems. Credit assignment plays an important role in MAPG and can induce cooperation among multiple agents. However, most MAPG algorithms cannot achieve good credit assignment because of the game-theoretic pathology known as centralized-decentralized mismatch. To address this issue, this paper presents a novel method, Multi-Agent Polarization Policy Gradient (MAPPG). MAPPG employs a simple but efficient polarization function to transform the optimal consistency of joint and individual actions into easily realized constraints, thus enabling efficient credit assignment. Theoretically, we prove that individual policies of MAPPG can converge to the global optimum. Empirically, we evaluate MAPPG on the well-known matrix game and differential game, and verify that MAPPG can converge to the global optimum for both discrete and continuous action spaces. We also evaluate MAPPG on a set of StarCraft II micromanagement tasks and demonstrate that MAPPG outperforms the state-of-the-art MAPG algorithms.

AAMAS Conference 2023 Conference Paper

TiLD: Third-person Imitation Learning by Estimating Domain Cognitive Differences of Visual Demonstrations

  • Zixuan Chen
  • Wenbin Li
  • Yang Gao
  • Yiyu Chen

To enable agents to effectively imitate third-person visual demonstrations in complex imitation learning (IL) tasks, this paper proposes a new IL method named third-person imitation learning by estimating domain cognitive differences (TiLD). The proposed TiLD is able to eliminate the domain cognitive difference between samples from different perspectives, allowing the agent to learn directly from third-person demonstrations. Experimental results indicate that TiLD achieves significant performance improvements over existing state-of-the-art IL methods when dealing with imitation learning tasks with third-person expert demonstrations.

TMLR Journal 2023 Journal Article

Trip-ROMA: Self-Supervised Learning with Triplets and Random Mappings

  • Wenbin Li
  • Xuesong Yang
  • Meihao Kong
  • Lei Wang
  • Jing Huo
  • Yang Gao
  • Jiebo Luo

Contrastive self-supervised learning (SSL) methods, such as MoCo and SimCLR, have achieved great success in unsupervised visual representation learning. They rely on a large number of negative pairs and thus require either large memory banks or large batches. Some recent non-contrastive SSL methods, such as BYOL and SimSiam, attempt to discard negative pairs and have also shown remarkable performance. To avoid collapsed solutions caused by not using negative pairs, these methods require non-trivial asymmetry designs. However, in small data regimes, we cannot obtain a sufficient number of negative pairs or effectively avoid the over-fitting problem when negatives are not used at all. To address this situation, we argue that negative pairs are still important but one is generally sufficient for each positive pair. We show that a simple Triplet-based loss (Trip) can achieve surprisingly good performance without requiring large batches or asymmetry designs. Moreover, to alleviate the over-fitting problem in small data regimes and further enhance the effect of Trip, we propose a simple plug-and-play RandOm MApping (ROMA) strategy by randomly mapping samples into other spaces and requiring these randomly projected samples to satisfy the same relationship indicated by the triplets. Integrating the triplet-based loss with random mapping, we obtain the proposed method Trip-ROMA. Extensive experiments, including unsupervised representation learning and unsupervised few-shot learning, have been conducted on ImageNet-1K and seven small datasets. They successfully demonstrate the effectiveness of Trip-ROMA and consistently show that ROMA can further effectively boost other SSL methods. Code is available at https://github.com/WenbinLee/Trip-ROMA.
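The triplet loss plus random-mapping idea in this abstract can be sketched in a few lines of NumPy. This is an illustrative reading of the abstract, not the authors' released implementation; the number of random projections, the output dimension, and the margin below are arbitrary choices:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Margin-based triplet loss: pull the positive closer than the negative."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

def random_mapping_loss(anchor, positive, negative,
                        n_maps=4, dim_out=32, margin=1.0, seed=0):
    """ROMA-style regularizer (sketch): require the same triplet relation
    to hold after several random linear projections of the embeddings."""
    rng = np.random.default_rng(seed)
    dim_in = anchor.shape[0]
    total = triplet_loss(anchor, positive, negative, margin)
    for _ in range(n_maps):
        # Random Gaussian projection, scaled to roughly preserve norms.
        W = rng.standard_normal((dim_out, dim_in)) / np.sqrt(dim_in)
        total += triplet_loss(W @ anchor, W @ positive, W @ negative, margin)
    return total / (n_maps + 1)
```

In a real SSL pipeline the loss would be differentiated through an encoder; the sketch only illustrates the constraint the random mappings impose.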

IJCAI Conference 2020 Conference Paper

Asymmetric Distribution Measure for Few-shot Learning

  • Wenbin Li
  • Lei Wang
  • Jing Huo
  • Yinghuan Shi
  • Yang Gao
  • Jiebo Luo

The core idea of metric-based few-shot image classification is to directly measure the relations between query images and support classes to learn transferable feature embeddings. Previous work mainly focuses on image-level feature representations, which actually cannot effectively estimate a class's distribution due to the scarcity of samples. Some recent work shows that local descriptor based representations can achieve richer representations than image-level based representations. However, such works are still based on a less effective instance-level metric, especially a symmetric metric, to measure the relation between a query image and a support class. Given the natural asymmetric relation between a query image and a support class, we argue that an asymmetric measure is more suitable for metric-based few-shot learning. To that end, we propose a novel Asymmetric Distribution Measure (ADM) network for few-shot learning by calculating a joint local and global asymmetric measure between two multivariate local distributions of a query and a class. Moreover, a task-aware Contrastive Measure Strategy (CMS) is proposed to further enhance the measure function. On the popular miniImageNet and tieredImageNet benchmarks, ADM achieves state-of-the-art results, validating our innovative design of asymmetric distribution measures for few-shot learning. The source code can be downloaded from https://github.com/WenbinLee/ADM.git.
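To see why a measure between distributions is naturally asymmetric, the KL divergence between two Gaussians is the standard example: KL(P||Q) and KL(Q||P) generally differ. The snippet below is only a stand-in to illustrate asymmetry, not the joint local-global measure defined in the paper:

```python
import numpy as np

def gaussian_kl(mu0, cov0, mu1, cov1):
    """KL( N(mu0, cov0) || N(mu1, cov1) ) -- asymmetric by construction.

    Closed form: 0.5 * ( tr(S1^-1 S0) + (m1-m0)^T S1^-1 (m1-m0)
                         - k + ln(det S1 / det S0) )
    """
    k = mu0.shape[0]
    inv1 = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(inv1 @ cov0)
                  + diff @ inv1 @ diff
                  - k
                  + np.log(np.linalg.det(cov1) / np.linalg.det(cov0)))
```

Swapping the arguments changes the value whenever the covariances differ, which is exactly the directional behavior an asymmetric query-to-class measure exploits.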

IJCAI Conference 2020 Conference Paper

Biased Feature Learning for Occlusion Invariant Face Recognition

  • Changbin Shao
  • Jing Huo
  • Lei Qi
  • Zhen-Hua Feng
  • Wenbin Li
  • Chuanqi Dong
  • Yang Gao

To address the challenges posed by unknown occlusions, we propose a Biased Feature Learning (BFL) framework for occlusion-invariant face recognition. We first construct an extended dataset using a multi-scale data augmentation method. For model training, we modify the label loss to adjust the impact of normal and occluded samples. Further, we propose a biased guidance strategy to manipulate the optimization of a network so that the feature embedding space is dominated by non-occluded faces. BFL not only enhances the robustness of a network to unknown occlusions but also maintains or even improves its performance for normal faces. Experimental results demonstrate its superiority as well as the generalization capability with different network architectures and loss functions.

AAAI Conference 2020 Conference Paper

Deep Embedded Complementary and Interactive Information for Multi-View Classification

  • Jinglin Xu
  • Wenbin Li
  • Xinwang Liu
  • Dingwen Zhang
  • Ji Liu
  • Junwei Han

Multi-view classification optimally integrates various features from different views to improve classification tasks. Though most of the existing works demonstrate promising performance in various computer vision applications, we observe that they can be further improved by sufficiently utilizing complementary view-specific information, deep interactive information between different views, and the strategy of fusing various views. In this work, we propose a novel multi-view learning framework that seamlessly embeds various view-specific information and deep interactive information and introduces a novel multi-view fusion strategy to make a joint decision during the optimization for classification. Specifically, we utilize different deep neural networks to learn multiple view-specific representations, and model deep interactive information through a shared interactive network using the cross-correlations between attributes of these representations. After that, we adaptively integrate multiple neural networks by flexibly tuning the power exponent of weight, which not only avoids the trivial solution of weight but also provides a new approach to fuse outputs from different deterministic neural networks. Extensive experiments on several public datasets demonstrate the rationality and effectiveness of our method.

IJCAI Conference 2020 Conference Paper

Joint Multi-view 2D Convolutional Neural Networks for 3D Object Classification

  • Jinglin Xu
  • Xiangsen Zhang
  • Wenbin Li
  • Xinwang Liu
  • Junwei Han

Three-dimensional (3D) object classification is widely involved in various computer vision applications, e.g., autonomous driving and simultaneous localization and mapping, and has attracted much attention in the community. However, solving 3D object classification by directly employing 3D convolutional neural networks (CNNs) generally suffers from high computational cost. Besides, existing view-based methods cannot fully explore the content relationships between views. To this end, this work proposes a novel multi-view framework that jointly uses multiple 2D-CNNs to capture discriminative information with relationships, as well as a new multi-view loss fusion strategy, in an end-to-end manner. Specifically, we utilize multiple 2D views of a 3D object as input and integrate the intra-view and inter-view information of each view through the view-specific 2D-CNN and a series of modules (outer product, view pair pooling, 1D convolution, and fully connected transformation). Furthermore, we design a novel view ensemble mechanism that selects several discriminative and informative views to jointly infer the category of a 3D object. Extensive experiments demonstrate that the proposed method is able to outperform current state-of-the-art methods on 3D object classification. More importantly, this work provides a new way to improve 3D object classification from the perspective of fully utilizing well-established 2D-CNNs.

AAAI Conference 2020 Conference Paper

Layerwise Sparse Coding for Pruned Deep Neural Networks with Extreme Compression Ratio

  • Xiao Liu
  • Wenbin Li
  • Jing Huo
  • Lili Yao
  • Yang Gao

Deep neural network compression is important and increasingly developed, especially in resource-constrained environments such as autonomous drones and wearable devices. Basically, we can easily and largely reduce the number of weights of a trained deep model by adopting a widely used model compression technique, e.g., pruning. In this way, two kinds of data are usually preserved for the compressed model, i.e., non-zero weights and meta-data, where meta-data is employed to help encode and decode the non-zero weights. Although we can obtain an ideally small number of non-zero weights through pruning, existing sparse matrix coding methods still need a much larger amount of meta-data (possibly several times larger than the non-zero weights), which becomes a severe bottleneck for deploying very deep models. To tackle this issue, we propose a layerwise sparse coding (LSC) method to maximize the compression ratio by extremely reducing the amount of meta-data. We first divide a sparse matrix into multiple small blocks and remove zero blocks, and then propose a novel signed relative index (SRI) algorithm to encode the remaining non-zero blocks (with much less meta-data). In addition, the proposed LSC performs parallel matrix multiplication without full decoding, which traditional methods cannot. Through extensive experiments, we demonstrate that LSC achieves substantial gains in pruned DNN compression (e.g., a 51.03x compression ratio on ADMM-Lenet) and inference computation (i.e., time reduction and extremely less memory bandwidth) over state-of-the-art baselines.
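The block-removal-plus-relative-index idea can be sketched as follows. This is a simplified illustration of relative index coding over non-zero blocks, not the paper's exact SRI algorithm; the block size and row-major block layout are arbitrary choices here:

```python
import numpy as np

def encode_blocks(mat, bs=2):
    """Split a matrix into bs x bs blocks, keep only non-zero blocks,
    and store each kept block with the gap (relative index) from the
    previously kept block instead of its absolute position."""
    rows, cols = mat.shape
    blocks, rel_idx = [], []
    prev, flat_id = -1, 0
    for r in range(0, rows, bs):
        for c in range(0, cols, bs):
            block = mat[r:r + bs, c:c + bs]
            if np.any(block):
                rel_idx.append(flat_id - prev)   # gap since last kept block
                blocks.append(block.copy())
                prev = flat_id
            flat_id += 1
    return blocks, rel_idx

def decode_blocks(blocks, rel_idx, shape, bs=2):
    """Inverse of encode_blocks: rebuild the dense matrix."""
    mat = np.zeros(shape)
    n_col_blocks = shape[1] // bs
    flat_id = -1
    for block, gap in zip(blocks, rel_idx):
        flat_id += gap                           # recover absolute block index
        r, c = divmod(flat_id, n_col_blocks)
        mat[r * bs:(r + 1) * bs, c * bs:(c + 1) * bs] = block
    return mat
```

Storing gaps rather than absolute positions is what keeps the meta-data small when non-zero blocks cluster; the "signed" refinement in the paper further shrinks the index width.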

IJCAI Conference 2020 Conference Paper

Learning Task-aware Local Representations for Few-shot Learning

  • Chuanqi Dong
  • Wenbin Li
  • Jing Huo
  • Zheng Gu
  • Yang Gao

Few-shot learning for visual recognition aims to adapt to novel unseen classes with only a few images. Recent work, especially work based on low-level information, has achieved great progress. In these works, local representations (LRs) are typically employed, because LRs are more consistent between the seen and unseen classes. However, most of them are limited to an individual image-to-image or image-to-class measure, which cannot fully exploit the capabilities of LRs, especially in the context of a certain task. This paper proposes an Adaptive Task-aware Local Representations Network (ATL-Net) to address this limitation by introducing episodic attention, which can adaptively select the important local patches in the entire task, mirroring the process of human recognition. We achieve much superior results on multiple benchmarks. On miniImageNet, ATL-Net gains 0.93% and 0.88% improvements over the compared methods under the 5-way 1-shot and 5-shot settings. Moreover, ATL-Net can naturally tackle the problem of how to adaptively identify and weight the importance of different key local parts, which is the major concern of fine-grained recognition. Specifically, on the fine-grained dataset Stanford Dogs, ATL-Net outperforms the second best method with 5.39% and 9.69% gains under the 5-way 1-shot and 5-shot settings.

AAAI Conference 2019 Conference Paper

Distribution Consistency Based Covariance Metric Networks for Few-Shot Learning

  • Wenbin Li
  • Jinglin Xu
  • Jing Huo
  • Lei Wang
  • Yang Gao
  • Jiebo Luo

Few-shot learning aims to recognize new concepts from very few examples. However, most existing few-shot learning methods mainly concentrate on the first-order statistics of the concept representation or on a fixed metric for the relation between a sample and a concept. In this work, we propose a novel end-to-end deep architecture, named Covariance Metric Networks (CovaMNet). CovaMNet is designed to exploit both a covariance representation and a covariance metric based on distribution consistency for few-shot classification tasks. Specifically, we construct an embedded local covariance representation to extract the second-order statistics of each concept and describe its underlying distribution. Upon this covariance representation, we further define a new deep covariance metric to measure the consistency of the distributions between query samples and new concepts. Furthermore, we employ the episodic training mechanism to train the entire network end-to-end from scratch. Extensive experiments on two tasks, generic few-shot image classification and fine-grained few-shot image classification, demonstrate the superiority of the proposed CovaMNet. The source code is available at https://github.com/WenbinLee/CovaMNet.git.
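
The two ingredients named in the abstract, a local covariance representation and a covariance-based score, can be sketched in a few lines of NumPy. This is an illustrative simplification under assumed inputs (pooled local descriptors per class), not the authors' implementation; see the linked repository for the real one.

```python
import numpy as np

def local_covariance(descriptors):
    """Second-order (covariance) representation of one concept.
    `descriptors` is (m, d): the d-dimensional local features
    pooled from the few support images of that class."""
    mu = descriptors.mean(axis=0, keepdims=True)
    centered = descriptors - mu
    return centered.T @ centered / (len(descriptors) - 1)

def covariance_score(query_descriptors, cov):
    """Toy covariance-metric score: average the quadratic form
    x^T C x over the query's local descriptors. Larger values mean
    the query's local features lie along the class's high-variance
    directions, i.e. are consistent with its distribution."""
    scores = np.einsum('nd,de,ne->n',
                       query_descriptors, cov, query_descriptors)
    return scores.mean()
```

A query is then assigned to the class whose covariance representation gives it the highest score; the paper additionally learns the embedding producing the descriptors end-to-end with episodic training.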

YNICL Journal 2018 Journal Article

Volume alteration of hippocampal subfields in first-episode antipsychotic-naïve schizophrenia patients before and after acute antipsychotic treatment

  • Wenbin Li
  • Kaiming Li
  • Pujun Guan
  • Ying Chen
  • Yuan Xiao
  • Su Lui
  • John A. Sweeney
  • Qiyong Gong

The nature of hippocampal changes in schizophrenia before first treatment, and whether hippocampal subfields are affected by antipsychotic treatment, are important questions for schizophrenia research. Forty-one first-episode antipsychotic-naïve acutely ill schizophrenia inpatients had MRI scans before and six weeks after antipsychotic treatment. Thirty-nine matched healthy controls were also scanned, twenty-two of whom were scanned a second time six weeks later. Volumes of hippocampal subfields were measured with FreeSurfer v6.0 using a longitudinal analysis pipeline. Before treatment, schizophrenia patients showed no significant difference in total hippocampal volume but exhibited significantly greater subfield volumes than controls in the bilateral molecular layers of the hippocampus (ML), bilateral granular cell layers of the dentate gyrus (GC-DG), and bilateral cornu ammonis area 4 (CA4). After six weeks of antipsychotic treatment, patients showed volume reductions relative to pretreatment scans in the total hippocampus bilaterally, with subfield volume reductions noted in the previously enlarged subfields (i.e., bilateral ML, GC-DG, and CA4) and in the bilateral hippocampal tails, left CA1, CA3, and fimbria. Subfields with volume increases before treatment were reduced to the level of healthy controls (bilateral ML and GC-DG) or near it (bilateral CA4) after treatment. These results indicate subfield-specific hippocampal hypertrophy prior to treatment; these abnormalities were reduced after acute antipsychotic therapy in a dose-related manner, together with volume reductions in areas that were not hypertrophic before treatment.

AAAI Conference 2017 Conference Paper

Beyond IID: Learning to Combine Non-IID Metrics for Vision Tasks

  • Yinghuan Shi
  • Wenbin Li
  • Yang Gao
  • Longbing Cao
  • Dinggang Shen

Metric learning has been widely employed, especially in various computer vision tasks, under the fundamental assumption that all samples (e.g., regions/superpixels in images/videos) are independent and identically distributed (IID). However, since samples are usually spatially connected or temporally correlated with their physically connected neighbours, they are not IID (non-IID for short), which existing methods cannot handle directly. Thus, we propose to learn and integrate non-IID metrics (NIME). To incorporate the non-IID spatial/temporal relations, instead of directly applying metric learning to non-IID features as previous methods do, NIME first builds several non-IID representations on the original (non-IID) features via various graph kernel functions, and then automatically learns the metric under the best combination of these non-IID representations. NIME is applied to two typical computer vision tasks: interactive image segmentation and histology image identification. The results show that learning and integrating non-IID metrics improves performance compared to IID methods. Moreover, our method achieves results comparable to or better than the state of the art.
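
The "best combination of representations" step can be illustrated with a toy multiple-kernel-style sketch. This is an assumption-laden simplification, not NIME's actual formulation: given several precomputed similarity (kernel) matrices built from different graph kernels, it learns convex weights so that same-label pairs score higher than different-label pairs.

```python
import numpy as np

def combine_kernels(kernels, labels, lr=0.1, steps=200):
    """Learn convex weights over precomputed kernel matrices with
    a hinge-style objective: the combined similarity should exceed
    a margin of 1 for same-label pairs and fall below -1 for
    different-label pairs. Returns the weight vector on the simplex."""
    K = np.stack(kernels)                                # (m, n, n)
    same = (labels[:, None] == labels[None, :]).astype(float)
    target = 2 * same - 1                                # +1 same, -1 different
    w = np.full(len(kernels), 1.0 / len(kernels))
    for _ in range(steps):
        combined = np.tensordot(w, K, axes=1)            # (n, n)
        active = (target * combined < 1).astype(float)   # margin-violating pairs
        grad = -np.tensordot(K, active * target, axes=([1, 2], [0, 1]))
        w -= lr * grad / target.size
        w = np.clip(w, 0, None)                          # project back
        w /= w.sum()                                     # ...onto the simplex
    return w
```

A kernel whose similarity pattern matches the label structure ends up with most of the weight; in NIME the combination is learned jointly with the metric rather than by this stand-alone heuristic.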

IS Journal 2014 Journal Article

WaaS: Wisdom as a Service

  • Jianhui Chen
  • Jianhua Ma
  • Ning Zhong
  • Yiyu Yao
  • Jiming Liu
  • Runhe Huang
  • Wenbin Li
  • Zhisheng Huang

An emerging hyper-world encompasses all human activities in a social-cyber-physical space. Its power derives from the Wisdom Web of Things (W2T) cycle, namely, "from things to data, information, knowledge, wisdom, services, humans, and then back to things." The W2T cycle leads to a harmonious symbiosis among humans, computers, and things, which can be constructed through the large-scale convergence of intelligent information technology applications with an open and interoperable architecture. Recent advances in cloud computing, the Internet of Things, the Web of Things, big data, and other research fields have provided just such an open system architecture with resource sharing and services. The next step is to develop an open and interoperable content architecture with intelligent sharing and services for the organization and transformation of the data, information, knowledge, and wisdom (DIKW) hierarchy. This article introduces Wisdom as a Service (WaaS), a content architecture based on the pay-as-you-go IT trend. The WaaS infrastructure and the main challenges in WaaS research and applications are discussed, and a case study is described. Relying on cloud computing and big data, WaaS provides a practical approach to realizing the W2T cycle in the hyper-world for the coming age of ubiquitous intelligent IT applications.