Arrow Research search

Author name cluster

Yang Lu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

30 papers
2 author rows

Possible papers

30

AAAI Conference 2026 Conference Paper

Break the Tie: Learning Cluster-Customized Category Relationships for Categorical Data Clustering

  • Mingjie Zhao
  • Zhanpei Huang
  • Yang Lu
  • Mengke Li
  • Yiqun Zhang
  • Weifeng Su
  • Yiu-ming Cheung

Categorical attributes with qualitative values are ubiquitous in cluster analysis of real datasets. Unlike the Euclidean distance of numerical attributes, categorical attributes lack well-defined relationships among their possible values (also interchangeably called categories), which hampers the exploration of compact categorical data clusters. Although most existing attempts focus on developing appropriate distance metrics, they typically assume a fixed topological relationship between categories when learning those metrics, which limits their adaptability to varying cluster structures and often leads to suboptimal clustering performance. This paper therefore breaks the intrinsic relationship tie of attribute categories and learns customized distance metrics suitable for flexibly and accurately revealing various cluster distributions. As a result, the fitting ability of the clustering algorithm is significantly enhanced, benefiting from the learnable category relationships. Moreover, the learned category relationships are proved to be compatible with the Euclidean distance metric, enabling a seamless extension to mixed datasets that include both numerical and categorical attributes. Comparative experiments on 12 real benchmark datasets with significance tests show the superior clustering accuracy of the proposed method, with an average rank of 1.25, significantly better than the 5.21 average rank of the best-performing existing methods. Code and an extended version with detailed proofs are provided online.

AAAI Conference 2026 Conference Paper

Joint Implicit and Explicit Language Learning for Pedestrian Attribute Recognition

  • Yukang Zhang
  • Lei Tan
  • Yang Lu
  • Yan Yan
  • Hanzi Wang

Pedestrian attribute recognition (PAR) has received increasing attention due to its wide application in video surveillance and pedestrian analysis. Some text-enhanced methods tackle this task by converting attributes into language descriptions to facilitate interactive learning between attributes and visual images. However, these generic languages fail to uniquely describe different pedestrian images and miss individual characteristics. In this paper, we propose a Joint Implicit and Explicit Language Guidance Enhancement Learning (JGEL) method, which converts each pedestrian image into a language description with dual language learning to effectively learn enhanced attribute information. Specifically, we first propose an Implicit Language Guidance Learning (ILGL) stream. It projects visual image features into the text embedding space to generate pseudo-word tokens, implicitly modeling image attributes and providing personalized descriptions. Moreover, we propose an Explicit Attribute Enhancement Learning (EAEL) stream that explicitly aligns the pseudo-word tokens generated by ILGL with pedestrian attributes, effectively grounding them in the attribute concepts of the text embedding space. Extensive experiments show that JGEL has significant advantages in improving performance on PAR and on the challenging zero-shot PAR task.

AAAI Conference 2026 Conference Paper

Target Refocusing via Attention Redistribution for Open-Vocabulary Semantic Segmentation: An Explainability Perspective

  • Jiahao Li
  • Yang Lu
  • Yachao Zhang
  • Yong Xie
  • Fangyong Wang
  • Yuan Xie
  • Yanyun Qu

Open-vocabulary semantic segmentation (OVSS) employs pixel-level vision-language alignment to associate category-related prompts with corresponding pixels. A key challenge is enhancing the multimodal dense prediction capability, specifically this pixel-level multimodal alignment. Although existing methods achieve promising results by leveraging CLIP’s vision-language alignment, they rarely investigate the performance boundaries of CLIP for dense prediction from an interpretability mechanisms perspective. In this work, we systematically investigate CLIP's internal mechanisms and identify a critical phenomenon: analogous to human distraction, CLIP diverts significant attention resources from target regions to irrelevant tokens. Our analysis reveals that these tokens arise from dimension-specific over-activation; filtering them enhances CLIP's dense prediction performance. Consequently, we propose Refocusing CLIP (RF-CLIP), a training-free approach that emulates human distraction-refocusing behavior to redirect attention from distraction tokens back to target regions, thereby refining CLIP's multimodal alignment granularity. Our method achieves SOTA performance on eight benchmarks while maintaining high inference efficiency.

AAAI Conference 2025 Conference Paper

Asynchronous Federated Clustering with Unknown Number of Clusters

  • Yunfan Zhang
  • Yiqun Zhang
  • Yang Lu
  • Mengke Li
  • Xi Chen
  • Yiu-ming Cheung

Federated Clustering (FC) is crucial to mining knowledge from unlabeled non-Independent Identically Distributed (non-IID) data provided by multiple clients while preserving their privacy. Most existing attempts learn cluster distributions at local clients, then securely pass the desensitized information to the server for aggregation. However, some tricky but common FC problems are still relatively unexplored, including heterogeneity in clients' communication capacity and the unknown number of proper clusters. To further bridge the gap between FC and real application scenarios, this paper first shows that clients' communication asynchrony and unknown proper cluster numbers are complex coupled problems, and then proposes an Asynchronous Federated Cluster Learning (AFCL) method accordingly. It spreads an excess number of seed points to the clients as a learning medium and coordinates them across clients to form a consensus. To alleviate the distribution imbalance accumulated due to unforeseen asynchronous uploading from heterogeneous clients, we also design a balancing mechanism for seed updating. As a result, the seeds gradually adapt to each other to reveal a proper number of clusters. Extensive experiments demonstrate the efficacy of AFCL.

AAAI Conference 2025 Conference Paper

MaskViM: Domain Generalized Semantic Segmentation with State Space Models

  • Jiahao Li
  • Yang Lu
  • Yuan Xie
  • Yanyun Qu

Domain Generalized Semantic Segmentation (DGSS) aims to train a segmentation model on known source domains that can make predictions on unknown target domains. Currently, there are two network architectures: one based on Convolutional Neural Networks (CNNs) and the other based on Vision Transformers (ViTs). However, both CNN-based and ViT-based DGSS methods face challenges: the former lacks a global receptive field, while the latter has higher computational demands. Drawing inspiration from State Space Models (SSMs), which possess a global receptive field while maintaining linear complexity, we propose an SSM-based method for DGSS. In this work, we first elucidate why masking makes sense in SSM-based DGSS and propose a mask learning mechanism. Leveraging this mechanism, we present our Mask Vision Mamba network (MaskViM), a model for SSM-based DGSS, and design a mask loss to optimize MaskViM. Our method achieves superior performance on four diverse DGSS settings, which demonstrates its effectiveness.

NeurIPS Conference 2025 Conference Paper

NeurIPT: Foundation Model for Neural Interfaces

  • Zitao Fang
  • Chenxuan Li
  • Hongting Zhou
  • Shuyang Yu
  • Guodong DU
  • Ashwaq Qasem
  • Yang Lu
  • Jing Li

Electroencephalography (EEG) has wide-ranging applications, from clinical diagnosis to brain-computer interfaces (BCIs). With the increasing volume and variety of EEG data, there has been growing interest in establishing foundation models (FMs) to scale up and generalize neural decoding. Despite showing early potential, applying FMs to EEG remains challenging due to substantial inter-subject, inter-task, and inter-condition variability, as well as diverse electrode configurations across recording setups. To tackle these open challenges, we propose NeurIPT, a foundation model tailored for diverse EEG-based Neural Interfaces with a Pre-trained Transformer that captures both homogeneous and heterogeneous spatio-temporal characteristics inherent in EEG signals. Temporally, we introduce Amplitude-Aware Masked Pretraining (AAMP), which masks based on signal amplitude rather than random intervals, to learn robust representations across varying signal intensities beyond local interpolation. Moreover, this temporal representation is enhanced by a progressive Mixture-of-Experts (MoE) architecture, where specialized expert subnetworks are progressively introduced at deeper layers, adapting effectively to the diverse temporal characteristics of EEG signals. Spatially, NeurIPT leverages the 3D physical coordinates of electrodes, enabling effective transfer across varying EEG settings, and develops Intra-Inter Lobe Pooling (IILP) during fine-tuning to efficiently exploit regional brain features. Empirical evaluations across nine downstream BCI datasets, via fine-tuning and training from scratch, demonstrate that NeurIPT consistently achieves state-of-the-art performance, highlighting its broad applicability and robust generalization. Our work pushes forward the state of FMs in EEG and offers insights into scalable and generalizable neural information processing systems.
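The amplitude-aware masking idea can be illustrated with a small sketch. This is an assumption-laden reading of AAMP, not the paper's implementation: here fixed-size windows are ranked by mean absolute amplitude and the highest-amplitude fraction is masked, rather than choosing intervals at random. The function name and parameters are invented for illustration.

```python
def amplitude_aware_mask(signal, window=4, mask_ratio=0.25):
    """Hypothetical sketch of amplitude-based masking: rank fixed-size
    windows by mean absolute amplitude and mask the top fraction,
    instead of sampling mask intervals uniformly at random."""
    n_windows = len(signal) // window
    amps = [sum(abs(v) for v in signal[i * window:(i + 1) * window]) / window
            for i in range(n_windows)]
    n_mask = max(1, int(n_windows * mask_ratio))
    # Indices of the highest-amplitude windows.
    masked_idx = sorted(range(n_windows), key=lambda i: -amps[i])[:n_mask]
    mask = [False] * len(signal)
    for i in masked_idx:
        for j in range(i * window, (i + 1) * window):
            mask[j] = True  # True marks samples hidden from the encoder
    return mask
```

A pretraining objective would then reconstruct the masked samples from the unmasked context.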

NeurIPS Conference 2025 Conference Paper

Progressive Data Dropout: An Embarrassingly Simple Approach to Train Faster

  • Shriram M S
  • Xinyue Hao
  • Shihao Hou
  • Yang Lu
  • Laura Sevilla-Lara
  • Anurag Arnab
  • Shreyank Gowda

The success of the machine learning field has reliably depended on training on large datasets. While effective, this trend comes at an extraordinary cost, due to two deeply intertwined factors: the size of models and the size of datasets. While promising research efforts focus on reducing the size of models, the other half of the equation remains fairly mysterious. Indeed, it is surprising that the standard approach to training remains to iterate over and over, uniformly sampling the training dataset. In this paper we explore a series of alternative training paradigms that leverage insights from hard-data-mining and dropout, and that are simple enough to implement and use that they could become the new training standard. The proposed Progressive Data Dropout reduces the number of effective epochs to as little as 12.4% of the baseline. These savings come at no cost in accuracy; surprisingly, the proposed method improves accuracy by up to 4.82%. Our approach requires no changes to model architecture or optimizer, and can be applied across standard training pipelines, thus posing an excellent opportunity for wide adoption. Code can be found here: https://github.com/bazyagami/LearningWithRevision.
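The general idea of shrinking the effective epoch can be caricatured in a few lines. This is a hedged sketch, not the schedule from the linked repository: `train_step` and `is_learned` are placeholder callables, and the policy (drop half of the already-learned examples each epoch) is invented for illustration.

```python
import random

def progressive_data_dropout(dataset, train_step, is_learned,
                             epochs, drop_frac=0.5):
    """Illustrative sketch: after each epoch, drop a fraction of the
    examples the model already handles, so later epochs touch less data."""
    active = list(dataset)
    for _ in range(epochs):
        random.shuffle(active)
        for example in active:
            train_step(example)  # one optimization step per example
        learned = [ex for ex in active if is_learned(ex)]
        not_learned = [ex for ex in active if not is_learned(ex)]
        # Keep all hard examples, drop a fraction of the easy ones.
        keep = learned[int(len(learned) * drop_frac):]
        active = not_learned + keep
        if not active:  # everything learned: stop early
            break
    return active
```

With `drop_frac=0.5` and a dataset the model learns immediately, the active set halves every epoch, which is the mechanism behind the reduced effective-epoch count claimed above.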

NeurIPS Conference 2025 Conference Paper

Unlocker: Disentangle the Deadlock of Learning between Label-noisy and Long-tailed Data

  • Shu Chen
  • HongJun Xu
  • Ruichi Zhang
  • Mengke Li
  • Yonggang Zhang
  • Yang Lu
  • Bo Han
  • Yiu-ming Cheung

In the real world, the observed label distribution of a dataset often mismatches its true distribution due to noisy labels. In this situation, noisy label learning (NLL) methods directly integrated with long-tail learning (LTL) methods tend to fail due to a dilemma: NLL methods normally rely on unbiased model predictions to recover the true distribution by selecting and correcting noisy labels, while LTL methods like logit adjustment depend on the true distribution to adjust biased predictions, leading to a deadlock of mutual dependency as defined in this paper. To address this, we propose Unlocker, a bilevel optimization framework that integrates NLL and LTL methods to iteratively disentangle this deadlock. The inner optimization leverages NLL to train the model, incorporating LTL methods to fairly select and correct noisy labels. The outer optimization adaptively determines an adjustment strength, mitigating model bias from over- or under-adjustment. We also theoretically prove that this bilevel optimization problem is convergent by transforming the outer optimization target into an equivalent problem with a closed-form solution. Extensive experiments on synthetic and real-world datasets demonstrate the effectiveness of our method in alleviating model bias and handling long-tailed noisy-label data. Code is available at https://anonymous.4open.science/r/neurips-2025-anonymous-1015/.

ICRA Conference 2025 Conference Paper

Versatile Distributed Maneuvering With Generalized Formations Using Guiding Vector Fields

  • Yang Lu
  • Sha Luo
  • Pengming Zhu
  • Weijia Yao
  • Héctor García de Marina
  • Xinglong Zhang
  • Xin Xu 0001

This paper presents a unified approach to realize versatile distributed maneuvering with generalized formations. Specifically, we decompose the robots' maneuvers into two independent components, i.e., interception and enclosing, which are parameterized by two independent virtual coordinates. Treating these two virtual coordinates as dimensions of an abstract manifold, we derive the corresponding singularity-free guiding vector field (GVF), which, along with a distributed coordination mechanism based on consensus theory, guides robots to achieve various motions (i.e., versatile maneuvering), including (a) formation tracking, (b) target enclosing, and (c) circumnavigation. Additional motion parameters can generate more complex cooperative robot motions. Based on GVFs, we design a controller for a nonholonomic robot model. Besides the theoretical results, extensive simulations and experiments are performed to validate the effectiveness of the approach.

AAAI Conference 2024 Conference Paper

CLIP-Guided Federated Learning on Heterogeneity and Long-Tailed Data

  • Jiangming Shi
  • Shanshan Zheng
  • Xiangbo Yin
  • Yang Lu
  • Yuan Xie
  • Yanyun Qu

Federated learning (FL) provides a decentralized machine learning paradigm where a server collaborates with a group of clients to learn a global model without accessing the clients' data. User heterogeneity is a significant challenge for FL, and together with class-distribution imbalance it further increases the difficulty of FL. Great progress has been made in large vision-language models, such as Contrastive Language-Image Pre-training (CLIP), which paves a new way for image classification and object recognition. Inspired by the success of CLIP on few-shot and zero-shot learning, we use CLIP to optimize the federated learning between server and client models under its vision-language supervision. CLIP is promising for mitigating user heterogeneity and class-distribution imbalance thanks to its powerful cross-modality representation and rich open-vocabulary prior knowledge. In this paper, we propose the CLIP-guided FL (CLIP2FL) method for heterogeneous and long-tailed data. In CLIP2FL, the knowledge of the off-the-shelf CLIP model is transferred to the client-server models, and a bridge is built between the client and server. Specifically, for client-side learning, knowledge distillation is conducted between client models and CLIP to improve the ability of client-side feature representation. For server-side learning, in order to mitigate the heterogeneity and class-distribution imbalance, we generate federated features to retrain the server model. Prototype contrastive learning supervised by the text encoder of CLIP is introduced to generate federated features from the client-side gradients, and they are used to retrain a balanced server classifier. Extensive experimental results on several benchmarks demonstrate that CLIP2FL achieves impressive performance and effectively deals with data heterogeneity and long-tail distribution. The code is available at https://github.com/shijiangming1/CLIP2FL.

IJCAI Conference 2024 Conference Paper

Dynamically Anchored Prompting for Task-Imbalanced Continual Learning

  • Chenxing Hong
  • Yan Jin
  • Zhiqi Kang
  • Yizhou Chen
  • Mengke Li
  • Yang Lu
  • Hanzi Wang

Existing continual learning literature relies heavily on a strong assumption that tasks arrive with a balanced data stream, which is often unrealistic in real-world applications. In this work, we explore task-imbalanced continual learning (TICL) scenarios where the distribution of task data is non-uniform across the whole learning process. We find that imbalanced tasks significantly challenge the capability of models to control the trade-off between stability and plasticity from the perspective of recent prompt-based continual learning methods. On top of the above finding, we propose Dynamically Anchored Prompting (DAP), a prompt-based method that only maintains a single general prompt to adapt to the shifts within a task stream dynamically. This general prompt is regularized in the prompt space with two specifically designed prompt anchors, called boosting anchor and stabilizing anchor, to balance stability and plasticity in TICL. Remarkably, DAP achieves this balance by only storing a prompt across the data stream, therefore offering a substantial advantage in rehearsal-free CL. Extensive experiments demonstrate that the proposed DAP results in 4.5% to 15% absolute improvements over state-of-the-art methods on benchmarks under task-imbalanced settings. Our code is available at https://github.com/chenxing6666/DAP.

AAAI Conference 2024 Conference Paper

Feature Fusion from Head to Tail for Long-Tailed Visual Recognition

  • Mengke Li
  • Zhikai Hu
  • Yang Lu
  • Weichao Lan
  • Yiu-ming Cheung
  • Hui Huang

The imbalanced distribution of long-tailed data presents a considerable challenge for deep learning models, as it causes them to prioritize the accurate classification of head classes but largely disregard tail classes. The biased decision boundary caused by inadequate semantic information in tail classes is one of the key factors contributing to their low recognition accuracy. To rectify this issue, we propose to augment tail classes by grafting the diverse semantic information from head classes, referred to as head-to-tail fusion (H2T). We replace a portion of feature maps from tail classes with those belonging to head classes. These fused features substantially enhance the diversity of tail classes. Both theoretical analysis and practical experimentation demonstrate that H2T can contribute to a more optimized solution for the decision boundary. We seamlessly integrate H2T in the classifier adjustment stage, making it a plug-and-play module. Its simplicity and ease of implementation allow for smooth integration with existing long-tailed recognition methods, facilitating a further performance boost. Extensive experiments on various long-tailed benchmarks demonstrate the effectiveness of the proposed H2T. The source code is available at https://github.com/Keke921/H2T.

AAAI Conference 2024 Conference Paper

Federated Learning with Extremely Noisy Clients via Negative Distillation

  • Yang Lu
  • Lin Chen
  • Yonggang Zhang
  • Yiliang Zhang
  • Bo Han
  • Yiu-ming Cheung
  • Hanzi Wang

Federated learning (FL) has shown remarkable success in cooperatively training deep models, while typically struggling with noisy labels. Advanced works propose to tackle label noise by a re-weighting strategy with a strong assumption, i.e., mild label noise. However, it may be violated in many real-world FL scenarios because of highly contaminated clients, resulting in extreme noise ratios, e.g., >90%. To tackle extremely noisy clients, we study the robustness of the re-weighting strategy, showing a pessimistic conclusion: minimizing the weight of clients trained over noisy data outperforms re-weighting strategies. To leverage models trained on noisy clients, we propose a novel approach, called negative distillation (FedNed). FedNed first identifies noisy clients and employs rather than discards the noisy clients in a knowledge distillation manner. In particular, clients identified as noisy ones are required to train models using noisy labels and pseudo-labels obtained by global models. The model trained on noisy labels serves as a ‘bad teacher’ in knowledge distillation, aiming to decrease the risk of providing incorrect information. Meanwhile, the model trained on pseudo-labels is involved in model aggregation if not identified as a noisy client. Consequently, through pseudo-labeling, FedNed gradually increases the trustworthiness of models trained on noisy clients, while leveraging all clients for model aggregation through negative distillation. To verify the efficacy of FedNed, we conduct extensive experiments under various settings, demonstrating that FedNed can consistently outperform baselines and achieve state-of-the-art performance.

NeurIPS Conference 2024 Conference Paper

Improving Visual Prompt Tuning by Gaussian Neighborhood Minimization for Long-Tailed Visual Recognition

  • Mengke Li
  • Ye Liu
  • Yang Lu
  • Yiqun Zhang
  • Yiu-ming Cheung
  • Hui Huang

Long-tailed visual recognition has received increasing attention recently. Despite fine-tuning techniques represented by visual prompt tuning (VPT) achieving substantial performance improvement by leveraging pre-trained knowledge, models still exhibit unsatisfactory generalization performance on tail classes. To address this issue, we propose a novel optimization strategy called Gaussian neighborhood minimization prompt tuning (GNM-PT), for VPT to address the long-tail learning problem. We introduce a novel Gaussian neighborhood loss, which provides a tight upper bound on the loss function of the data distribution, facilitating a flattened loss landscape correlated with improved model generalization. Specifically, GNM-PT seeks the gradient descent direction within a random parameter neighborhood, independent of input samples, during each gradient update. Ultimately, GNM-PT enhances generalization across all classes while simultaneously reducing computational overhead. The proposed GNM-PT achieves state-of-the-art classification accuracies of 90.3%, 76.5%, and 50.1% on the benchmark datasets CIFAR100-LT (IR 100), iNaturalist 2018, and Places-LT, respectively. The source code is available at https://github.com/Keke921/GNM-PT.

NeurIPS Conference 2024 Conference Paper

Relationship Prompt Learning is Enough for Open-Vocabulary Semantic Segmentation

  • Jiahao Li
  • Yang Lu
  • Yuan Xie
  • Yanyun Qu

Open-vocabulary semantic segmentation (OVSS) aims to segment unseen classes without corresponding labels. Existing Vision-Language Model (VLM)-based methods leverage the VLM's rich knowledge to enhance additional explicit segmentation-specific networks, yielding competitive results but at extensive training cost. To reduce this cost, we attempt to enable the VLM to directly produce segmentation results without any segmentation-specific network. Prompt learning offers a direct and parameter-efficient approach, yet it falls short in guiding the VLM for pixel-level visual classification. Therefore, we propose the Relationship Prompt Module (RPM), which generates a relationship prompt that directs the VLM to extract pixel-level semantic embeddings suitable for OVSS. Moreover, RPM integrates with the VLM to construct the Relationship Prompt Network (RPN), achieving OVSS without any segmentation-specific networks. RPN attains state-of-the-art performance with merely about 3M trainable parameters (2% of total parameters).

ICRA Conference 2023 Conference Paper

DDK: A Deep Koopman Approach for Longitudinal and Lateral Control of Autonomous Ground Vehicles

  • Yongqian Xiao
  • Xinglong Zhang
  • Xin Xu 0001
  • Yang Lu
  • Junxiang Li

Autonomous driving has attracted lots of attention in recent years. For some tasks, e.g., trajectory prediction, motion planning, and trajectory tracking, an accurate vehicle model can reduce the difficulty of these tasks and improve task completion performance. Prior works focused on parameter estimation of physical models or modeling nonlinear dynamics using neural networks. Still, these methods rely on internal parameters of vehicles or are not friendly for control due to the strong nonlinearity of the models. This paper proposes a data-driven method to approximate vehicle dynamics based on the Koopman operator. The resulting model is an interpretable linear time-invariant model, facilitating controller design and solving related optimization problems. In the proposed approach, the state transition matrix is constructed based on the learned Koopman eigenvalues, while the input matrix is trained as a tensor. Based on the resulting model, a linear model predictive controller is designed to implement coupled longitudinal and lateral trajectory tracking. Simulations and experiments, including vehicle dynamics modeling and coupled longitudinal and lateral trajectory tracking, are performed in a high-fidelity CarSim environment and on a real vehicle platform. An oil-driven D-Class SUV is selected in the simulation, while a real electric SUV is utilized in the experiment. Simulation and experiment results illustrate that the nonlinear vehicle dynamics can be identified effectively via the proposed method, and high-quality trajectory tracking performance can be obtained with the resulting model.

AAAI Conference 2023 Conference Paper

Fast and Accurate Binary Neural Networks Based on Depth-Width Reshaping

  • Ping Xue
  • Yang Lu
  • Jingfei Chang
  • Xing Wei
  • Zhen Wei

Network binarization (i.e., binary neural networks, BNNs) can efficiently compress deep neural networks and accelerate model inference but cause severe accuracy degradation. Existing BNNs are mainly implemented based on the commonly used full-precision network backbones, and then the accuracy is improved with various techniques. However, there is a question of whether the full-precision network backbone is well adapted to BNNs. We start from the factors of the performance degradation of BNNs and analyze the problems of directly using full-precision network backbones for BNNs: for a given computational budget, the backbone of a BNN may need to be shallower and wider compared to the backbone of a full-precision network. With this in mind, Depth-Width Reshaping (DWR) is proposed to reshape the depth and width of existing full-precision network backbones and further optimize them by incorporating pruning techniques to better fit the BNNs. Extensive experiments demonstrate the analytical result and the effectiveness of the proposed method. Compared with the original backbones, the DWR backbones constructed by the proposed method result in close to O(√s) decrease in activations, while achieving an absolute accuracy increase by up to 1.7% with comparable computational cost. Besides, by using the DWR backbones, existing methods can achieve new state-of-the-art (SOTA) accuracy (e.g., 67.2% on ImageNet with ResNet-18 as the original backbone). We hope this work provides a novel insight into the backbone design of BNNs. The code is available at https://github.com/pingxue-hfut/DWR.

IJCAI Conference 2022 Conference Paper

Federated Learning on Heterogeneous and Long-Tailed Data via Classifier Re-Training with Federated Features

  • Xinyi Shang
  • Yang Lu
  • Gang Huang
  • Hanzi Wang

Federated learning (FL) provides a privacy-preserving solution for distributed machine learning tasks. One challenging problem that severely damages the performance of FL models is the co-occurrence of data heterogeneity and long-tail distribution, which frequently appears in real FL applications. In this paper, we reveal an intriguing fact that the biased classifier is the primary factor leading to the poor performance of the global model. Motivated by the above finding, we propose a novel and privacy-preserving FL method for heterogeneous and long-tailed data via Classifier Re-training with Federated Features (CReFF). The classifier re-trained on federated features can produce comparable performance as the one re-trained on real data in a privacy-preserving manner without information leakage of local data or class distribution. Experiments on several benchmark datasets show that the proposed CReFF is an effective solution to obtain a promising FL model under heterogeneous and long-tailed data. Comparative results with the state-of-the-art FL methods also validate the superiority of CReFF. Our code is available at https://github.com/shangxinyi/CReFF-FL.

AAAI Conference 2018 Conference Paper

Cooperative Learning of Energy-Based Model and Latent Variable Model via MCMC Teaching

  • Jianwen Xie
  • Yang Lu
  • Ruiqi Gao
  • Ying Nian Wu

This paper proposes a cooperative learning algorithm to train both the undirected energy-based model and the directed latent variable model jointly. The learning algorithm interweaves the maximum likelihood algorithms for learning the two models, and each iteration consists of the following two steps: (1) Modified contrastive divergence for energy-based model: The learning of the energy-based model is based on the contrastive divergence, but the finite-step MCMC sampling of the model is initialized from the synthesized examples generated by the latent variable model instead of being initialized from the observed examples. (2) MCMC teaching of the latent variable model: The learning of the latent variable model is based on how the MCMC in (1) changes the initial synthesized examples generated by the latent variable model, where the latent variables that generate the initial synthesized examples are known so that the learning is essentially supervised. Our experiments show that the cooperative learning algorithm can learn realistic models of images.
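The two-step iteration described in the abstract can be written schematically. The callables below are placeholders standing in for the paper's networks and MCMC sampler; this shows only the control flow of one cooperative iteration, not the actual models.

```python
def cooperative_step(observed, sample_latent_model, mcmc_revise,
                     update_energy_model, update_latent_model):
    """One iteration of the cooperative scheme (schematic, placeholder APIs):
    (1) the latent variable model generates initial synthesized examples;
        finite-step MCMC under the energy-based model revises them, and the
        energy model is updated by modified contrastive divergence;
    (2) MCMC teaching: the latent model learns to map the known latent codes
        to the MCMC-revised examples, which is effectively supervised."""
    # Generator proposes examples, keeping the latent codes that produced them.
    latents, synthesized0 = sample_latent_model(len(observed))
    # Finite-step MCMC initialized from the generator's samples,
    # not from the observed examples.
    synthesized = mcmc_revise(synthesized0)
    # Modified contrastive divergence update for the energy-based model.
    update_energy_model(observed, synthesized)
    # Teach the latent model to reproduce the revised examples.
    update_latent_model(latents, synthesized)
    return synthesized
```

The key point the sketch preserves is the data flow: the energy model's MCMC starts from the generator's output, and the generator is then trained on how the MCMC moved those samples.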

NeurIPS Conference 2018 Conference Paper

DeepPINK: reproducible feature selection in deep neural networks

  • Yang Lu
  • Yingying Fan
  • Jinchi Lv
  • William Stafford Noble

Deep learning has become increasingly popular in both supervised and unsupervised machine learning thanks to its outstanding empirical performance. However, because of their intrinsic complexity, most deep learning methods are largely treated as black box tools with little interpretability. Even though recent attempts have been made to facilitate the interpretability of deep neural networks (DNNs), existing methods are susceptible to noise and lack robustness. Therefore, scientists are justifiably cautious about the reproducibility of the discoveries, which is often related to the interpretability of the underlying statistical models. In this paper, we describe a method to increase the interpretability and reproducibility of DNNs by incorporating the idea of feature selection with controlled error rate. By designing a new DNN architecture and integrating it with the recently proposed knockoffs framework, we perform feature selection with a controlled error rate, while maintaining high power. This new method, DeepPINK (Deep feature selection using Paired-Input Nonlinear Knockoffs), is applied to both simulated and real data sets to demonstrate its empirical utility.
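DeepPINK builds on the knockoffs framework, whose selection step is standard and easy to state: given importance statistics W_j that compare each feature to its knockoff copy, select features above the smallest threshold whose estimated false discovery proportion is below the target level q (the knockoff+ rule). The DNN architecture itself is not reproduced here; only the generic filter step is shown.

```python
def knockoff_select(W, q=0.1):
    """Knockoff+ selection: find the smallest threshold t such that
    (1 + #{j : W_j <= -t}) / #{j : W_j >= t} <= q, then select all
    features with W_j >= t. Returns the selected feature indices."""
    thresholds = sorted(abs(w) for w in W if w != 0)
    for t in thresholds:
        neg = sum(1 for w in W if w <= -t)   # knockoff "wins"
        pos = sum(1 for w in W if w >= t)    # original-feature "wins"
        if pos > 0 and (1 + neg) / pos <= q:
            return [j for j, w in enumerate(W) if w >= t]
    return []  # no threshold achieves the target FDR level
```

Large positive W_j means the original feature looks more important than its knockoff; the threshold trades off selections against the proportion of knockoff wins that slip through.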

AAMAS Conference 2018 Conference Paper

Protecting Election from Bribery: New Approach and Computational Complexity Characterization

  • Lin Chen
  • Lei Xu
  • Shouhuai Xu
  • Zhimin Gao
  • Nolan Shah
  • Yang Lu
  • Weidong Shi

The bribery problem in elections has received a considerable amount of attention. In this paper, we initiate the study of a related, but new problem, the protection problem, namely protecting elections from bribery. In this problem, there is a defender who is given a defense budget and can use the budget to award some of the voters such that they cannot be bribed anymore. This naturally leads to the following bi-level decision problem: Is it possible for the defender with a given defense budget to protect an election from being manipulated by the attacker with a given attack budget for bribing voters? We characterize the computational complexity of the protection problem. We show that it is in general significantly harder than the bribery problem. However, the protection problem can be solved, under certain circumstances, in polynomial time.
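The bi-level decision problem can be made concrete with a brute-force sketch under simplifying assumptions not from the paper: plurality voting, unit bribe and award prices, and an attacker who succeeds if their candidate becomes a (co-)winner:

```python
from itertools import combinations
from collections import Counter

def attacker_succeeds(votes, protected, attack_budget, target):
    # Can the attacker, bribing at most attack_budget UNPROTECTED voters to
    # vote for `target`, make `target` a plurality (co-)winner?
    free = [i for i in range(len(votes)) if i not in protected]
    for k in range(attack_budget + 1):
        for bribed in combinations(free, k):
            tally = Counter(target if i in bribed else v
                            for i, v in enumerate(votes))
            if tally[target] == max(tally.values()):
                return True
    return False

def can_protect(votes, defense_budget, attack_budget, target):
    # Bi-level decision: is there a set of at most defense_budget voters to
    # award (protect) so that NO attack within attack_budget succeeds?
    for k in range(defense_budget + 1):
        for protected in combinations(range(len(votes)), k):
            if not attacker_succeeds(votes, set(protected), attack_budget, target):
                return True
    return False
```

With votes A, A, A, B and attack budget 1, protecting all three A-voters blocks the attacker, while any defense of size two leaves one A-voter bribable; the nested exhaustive searches reflect why the problem is in general harder than bribery itself.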

AAAI Conference 2017 Conference Paper

Alternating Back-Propagation for Generator Network

  • Tian Han
  • Yang Lu
  • Song-Chun Zhu
  • Ying Nian Wu

This paper proposes an alternating back-propagation algorithm for learning the generator network model. The model is a nonlinear generalization of factor analysis. In this model, the mapping from the continuous latent factors to the observed signal is parametrized by a convolutional neural network. The alternating back-propagation algorithm iterates the following two steps: (1) Inferential back-propagation, which infers the latent factors by Langevin dynamics or gradient descent. (2) Learning back-propagation, which updates the parameters given the inferred latent factors by gradient descent. The gradient computations in both steps are powered by back-propagation, and they share most of their code in common. We show that the alternating back-propagation algorithm can learn realistic generator models of natural images, video sequences, and sounds. Moreover, it can also be used to learn from incomplete or indirect training data.
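A minimal sketch of the two alternating steps, using a toy linear generator (plain factor analysis rather than a ConvNet), unit noise variance, and the gradient-descent variant of inference; all sizes and step sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 5, 2, 200
W_true = rng.normal(size=(d, k))
X = W_true @ rng.normal(size=(k, n)) + 0.1 * rng.normal(size=(d, n))

W = 0.1 * rng.normal(size=(d, k))   # generator parameters
Z = np.zeros((k, n))                # latent factors, one column per example
mse0 = np.mean((X - W @ Z) ** 2)    # initial reconstruction error

for _ in range(300):
    # (1) Inferential back-propagation: ascend log p(X, Z) in Z
    # (the gradient-descent variant; Langevin would add noise here).
    Z += 0.1 * (W.T @ (X - W @ Z) - Z)
    # (2) Learning back-propagation: ascend the log-likelihood in W
    # given the inferred latent factors.
    W += 0.1 * (X - W @ Z) @ Z.T / n

mse = np.mean((X - W @ Z) ** 2)     # reconstruction error after training
```

In the paper both gradients come from back-propagation through the same ConvNet, which is why the two steps share most of their code; in this linear toy they reduce to the matrix products above.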

IJCAI Conference 2017 Conference Paper

Dynamic Weighted Majority for Incremental Learning of Imbalanced Data Streams with Concept Drift

  • Yang Lu
  • Yiu-ming Cheung
  • Yuan Yan Tang

Concept drift occurring in data streams jeopardizes the accuracy and stability of the online learning process. If the data stream is imbalanced, it is even more challenging to detect and cure the concept drift. In the literature, these two problems have been intensively addressed separately, but have yet to be well studied when they occur together. In this paper, we propose a chunk-based incremental learning method called Dynamic Weighted Majority for Imbalance Learning (DWMIL) to deal with data streams exhibiting both concept drift and class imbalance. DWMIL utilizes an ensemble framework that dynamically weights the base classifiers according to their performance on the current data chunk. Compared with the existing methods, its merits are four-fold: (1) it remains stable on non-drifted streams and quickly adapts to new concepts; (2) it is totally incremental, i.e., no previous data needs to be stored; (3) it keeps a limited number of classifiers to ensure high efficiency; and (4) it is simple and needs only one thresholding parameter. Experiments on both synthetic and real data sets with concept drift show that DWMIL performs better than the state-of-the-art competitors, with less computational cost.
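The chunk-based framework can be sketched as below; the base learner (a nearest class-mean classifier) and the exact reweighting rule are stand-ins for illustration, not the paper's formulation:

```python
import numpy as np

class NearestMean:
    """Stand-in base learner (not from the paper): nearest class-mean."""
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.means = np.array([X[y == c].mean(axis=0) for c in self.classes])
        return self

    def predict(self, X):
        d = ((X[:, None, :] - self.means[None, :, :]) ** 2).sum(axis=-1)
        return self.classes[d.argmin(axis=1)]

class DWMILSketch:
    """Minimal sketch of the DWMIL framework: one base learner per chunk,
    weights decayed by performance on the current chunk, and members pruned
    below theta (the single thresholding parameter)."""
    def __init__(self, theta=0.01, max_size=5):
        self.theta = theta
        self.max_size = max_size
        self.ensemble = []        # list of (classifier, weight) pairs

    def partial_fit(self, X, y):
        # Reweight by accuracy on the new chunk (simplified rule), then prune.
        self.ensemble = [(c, w * np.mean(c.predict(X) == y))
                         for c, w in self.ensemble]
        self.ensemble = [(c, w) for c, w in self.ensemble if w > self.theta]
        # Train a new member on this chunk only: no previous data is stored.
        if len(self.ensemble) < self.max_size:
            self.ensemble.append((NearestMean().fit(X, y), 1.0))
        return self

    def predict(self, X):
        scores = [dict() for _ in range(len(X))]
        for clf, w in self.ensemble:
            for s, p in zip(scores, clf.predict(X)):
                s[p] = s.get(p, 0.0) + w
        return np.array([max(s, key=s.get) for s in scores])

# Usage on one synthetic chunk (two well-separated classes).
rng = np.random.default_rng(0)
X1 = np.vstack([rng.normal(size=(20, 2)), rng.normal(size=(20, 2)) + 4.0])
y1 = np.array([0] * 20 + [1] * 20)
pred = DWMILSketch().partial_fit(X1, y1).predict(np.array([[0.0, 0.0], [4.0, 4.0]]))
```

Feeding successive chunks to `partial_fit` shows the mechanism: members that fit the current concept keep high weight, while members trained on an outdated concept decay and are eventually pruned.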

AAAI Conference 2016 Conference Paper

Learning FRAME Models Using CNN Filters

  • Yang Lu
  • Song-Chun Zhu
  • Ying Wu

The convolutional neural network (ConvNet or CNN) has proven to be very successful in many tasks such as those in computer vision. In this conceptual paper, we study the generative perspective of the discriminative CNN. In particular, we propose to learn the generative FRAME (Filters, Random field, And Maximum Entropy) model using the highly expressive filters pre-learned by the CNN at the convolutional layers. We show that the learning algorithm can generate realistic and rich object and texture patterns in natural scenes. We explain that each learned model corresponds to a new CNN unit at a layer above the layer of filters employed by the model. We further show that it is possible to learn a new layer of CNN units using a generative CNN model, which is a product of experts model, and the learning algorithm admits an EM interpretation with binary latent variables.

ICRA Conference 2015 Conference Paper

Precise quadrotor autonomous landing with SRUKF vision perception

  • Shuo Yang
  • Jiahang Ying
  • Yang Lu
  • Zexiang Li 0001

We present an autonomous quadrotor system that is able to perform high-precision landing on a small platform in both indoor and outdoor environments. Its take-off and landing processes are fully autonomous. We use a vision sensor to detect the landing platform, and the vision measurement is enhanced by an IMU through an SRUKF-based sensor fusion method. All computation is done in real time and on-board. We implement the system and carry out a series of experiments under various environmental conditions. The experimental results confirm the robustness and precision of our system in real use cases.