Arrow Research search

Author name cluster

Junjie Yan

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

16 papers
2 author rows

Possible papers

16

NeurIPS Conference 2025 Conference Paper

MoBA: Mixture of Block Attention for Long-Context LLMs

  • Enzhe Lu
  • Zhejun Jiang
  • Jingyuan Liu
  • Yulun Du
  • Tao Jiang
  • Chao Hong
  • Shaowei Liu
  • Weiran He

Scaling the effective context length is essential for advancing large language models (LLMs) toward artificial general intelligence (AGI). However, the quadratic increase in computational complexity inherent in traditional attention mechanisms presents a prohibitive overhead. Existing approaches either impose strongly biased structures, such as sink or window attention which are task-specific, or radically modify the attention mechanism into linear approximations, whose performance in complex reasoning tasks remains inadequately explored. In this work, we propose a solution that adheres to the "less structure" principle, allowing the model to determine where to attend autonomously, rather than introducing predefined biases. We introduce Mixture of Block Attention (MoBA), an innovative approach that applies the principles of Mixture of Experts (MoE) to the attention mechanism. This novel architecture demonstrates superior performance on long-context tasks while offering a key advantage: the ability to seamlessly transition between full and sparse attention, enhancing efficiency without the risk of compromising performance. MoBA has already been deployed to handle actual production workloads with long-context requirements, demonstrating significant advancements in efficient attention computation for LLMs. Our code is available at https://github.com/MoonshotAI/MoBA.
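The select-then-attend structure the abstract describes can be sketched in a few lines. This is a toy, non-causal NumPy illustration under assumed details (mean-key block gating, a single head, made-up names); it is not the authors' implementation, which fuses routing and attention into efficient kernels:

```python
import numpy as np

def moba_attention(q, K, V, block_size=4, top_k=2):
    """Toy sketch of block-sparse attention in the spirit of MoBA.

    Each query scores key *blocks* via their mean key (MoE-style gating)
    and attends only within its top-k blocks.
    """
    n, d = K.shape
    n_blocks = n // block_size
    K_blocks = K[: n_blocks * block_size].reshape(n_blocks, block_size, d)
    V_blocks = V[: n_blocks * block_size].reshape(n_blocks, block_size, d)
    centroids = K_blocks.mean(axis=1)            # one summary key per block

    gate = centroids @ q                         # block relevance scores
    chosen = np.argsort(gate)[-top_k:]           # keep the top-k blocks

    K_sel = K_blocks[chosen].reshape(-1, d)
    V_sel = V_blocks[chosen].reshape(-1, d)
    logits = K_sel @ q / np.sqrt(d)
    w = np.exp(logits - logits.max())            # softmax over selected keys
    w /= w.sum()
    return w @ V_sel                             # attended value, shape (d,)
```

With `top_k = n_blocks` this reduces to ordinary full attention, which is the seamless full/sparse transition the abstract highlights.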

ICLR Conference 2022 Conference Paper

cosFormer: Rethinking Softmax In Attention

  • Zhen Qin 0003
  • Weixuan Sun
  • Hui Deng
  • Dongxu Li 0003
  • Yunshen Wei
  • Baohong Lv
  • Junjie Yan
  • Lingpeng Kong

Transformers have shown great success in natural language processing, computer vision, and audio processing. As one of their core components, softmax attention helps capture long-range dependencies, yet prohibits scale-up due to its quadratic space and time complexity in the sequence length. Kernel methods are often adopted to reduce the complexity by approximating the softmax operator. Nevertheless, due to approximation errors, their performance varies across tasks and corpora, and they suffer crucial performance drops compared with vanilla softmax attention. In this paper, we propose a linear transformer called cosFormer that can achieve accuracy comparable to or better than the vanilla transformer in both causal and cross attention. cosFormer is based on two key properties of softmax attention: (i) non-negativeness of the attention matrix; (ii) a non-linear re-weighting scheme that concentrates the distribution of the attention matrix. As its linear substitute, cosFormer fulfills these properties with a linear operator and a cosine-based distance re-weighting mechanism. Extensive experiments on language modeling and text understanding tasks demonstrate the effectiveness of our method. We further examine our method on long sequences and achieve state-of-the-art performance on the Long-Range Arena benchmark. The source code is available at https://github.com/OpenNLPLab/cosFormer.
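The two properties the abstract names (non-negativity plus a concentrating cosine re-weighting) can be illustrated with a small non-causal sketch. The ReLU feature map and the angle-addition split of cos(π/2 · (i − j)/M), which keeps the computation linear in sequence length, follow the paper's description; the exact shapes and the eps guard here are illustrative:

```python
import numpy as np

def cosformer_attention(Q, K, V):
    """Toy sketch of cosFormer-style linear attention (non-causal)."""
    n, d = Q.shape
    M = n
    Qp, Kp = np.maximum(Q, 0), np.maximum(K, 0)   # non-negative features
    idx = np.arange(n)
    c = np.cos(np.pi * idx / (2 * M))[:, None]
    s = np.sin(np.pi * idx / (2 * M))[:, None]

    # cos((pi/2)(i-j)/M) = cos_i*cos_j + sin_i*sin_j, so the re-weighted
    # attention splits into two associative (linear-time) products.
    num = (Qp * c) @ ((Kp * c).T @ V) + (Qp * s) @ ((Kp * s).T @ V)
    den = (Qp * c) @ (Kp * c).sum(axis=0) + (Qp * s) @ (Kp * s).sum(axis=0)
    return num / (den[:, None] + 1e-6)
```

Because the (K-side) sums are computed first, no n×n attention matrix is ever materialized.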

ICLR Conference 2022 Conference Paper

Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm

  • Yangguang Li 0001
  • Feng Liang
  • Lichen Zhao
  • Yufeng Cui
  • Wanli Ouyang
  • Jing Shao
  • Fengwei Yu
  • Junjie Yan

Recently, large-scale Contrastive Language-Image Pre-training (CLIP) has attracted unprecedented attention for its impressive zero-shot recognition ability and excellent transferability to downstream tasks. However, CLIP is quite data-hungry and requires 400M image-text pairs for pre-training, thereby restricting its adoption. This work proposes a novel training paradigm, Data efficient CLIP (DeCLIP), to alleviate this limitation. We demonstrate that by carefully utilizing the widespread supervision among the image-text pairs, our DeCLIP can learn generic visual features more efficiently. Instead of using the single image-text contrastive supervision, we fully exploit data potential through the use of (1) self-supervision within each modality; (2) multi-view supervision across modalities; (3) nearest-neighbor supervision from other similar pairs. Benefiting from intrinsic supervision, our DeCLIP-ResNet50 can achieve 60.4% zero-shot top-1 accuracy on ImageNet, which is 0.8% above the CLIP-ResNet50 while using 7.1× fewer data. Our DeCLIP-ResNet50 outperforms its counterpart in 8 out of 11 visual datasets when transferred to downstream tasks. Moreover, scaling up the model and compute also works well in our framework.

ICML Conference 2021 Conference Paper

AutoSampling: Search for Effective Data Sampling Schedules

  • Ming Sun 0008
  • Haoxuan Dou
  • Baopu Li
  • Junjie Yan
  • Wanli Ouyang
  • Lei Cui

Data sampling plays a pivotal role in training deep learning models. However, an effective sampling schedule is difficult to learn due to its inherently high dimensionality as a hyper-parameter. In this paper, we propose AutoSampling, a method to automatically learn sampling schedules for model training, which consists of a multi-exploitation step aiming for optimal local sampling schedules and an exploration step for the ideal sampling distribution. More specifically, we achieve sampling-schedule search with a shortened exploitation cycle to provide enough supervision. In addition, we periodically estimate the sampling distribution from the learned sampling schedules and perturb it to search in the distribution space. The combination of the two searches allows us to learn a robust sampling schedule. We apply our AutoSampling method to a variety of image classification tasks, illustrating the effectiveness of the proposed method.

AAAI Conference 2021 Conference Paper

BSN++: Complementary Boundary Regressor with Scale-Balanced Relation Modeling for Temporal Action Proposal Generation

  • Haisheng Su
  • Weihao Gan
  • Wei Wu
  • Yu Qiao
  • Junjie Yan

Generating human action proposals in untrimmed videos is an important yet challenging task with wide applications. Current methods often suffer from noisy boundary locations and the inferior quality of the confidence scores used for proposal retrieval. In this paper, we present BSN++, a new framework which exploits a complementary boundary regressor and relation modeling for temporal proposal generation. First, we propose a novel boundary regressor based on the complementary characteristics of both starting and ending boundary classifiers. Specifically, we utilize a U-shaped architecture with nested skip connections to capture rich contexts and introduce a bi-directional boundary matching mechanism to improve boundary precision. Second, to account for the proposal-proposal relations ignored in previous methods, we devise a proposal relation block which includes two self-attention modules covering the position and channel aspects. Furthermore, we find that data imbalance problems inevitably exist in the positive/negative proposals and temporal durations, which harm model performance on tail distributions. To relieve this issue, we introduce a scale-balanced re-sampling strategy. Extensive experiments are conducted on two popular benchmarks, ActivityNet-1.3 and THUMOS14, which demonstrate that BSN++ achieves state-of-the-art performance. Not surprisingly, the proposed BSN++ ranked 1st in the CVPR19 ActivityNet challenge leaderboard on the temporal action localization task.

AAAI Conference 2021 Conference Paper

Context-Aware Graph Convolution Network for Target Re-identification

  • Deyi Ji
  • Haoran Wang
  • Hanzhe Hu
  • Weihao Gan
  • Wei Wu
  • Junjie Yan

Most existing re-identification methods focus on learning robust and discriminative features with deep convolution networks. However, many of them consider content similarity in isolation and fail to utilize the context information of the query and gallery sets, e.g. probe-gallery and gallery-gallery relations, so hard samples may not be resolved well given the limited or even misleading information. In this paper, we present a novel Context-Aware Graph Convolution Network (CAGCN), where the probe-gallery relations are encoded into the graph nodes and the graph edge connections are controlled by the gallery-gallery relations. In this way, hard samples can be addressed via the context information flowing from other easy samples during graph reasoning. Specifically, we adopt an effective hard gallery sampler to obtain high recall for positive samples while keeping a reasonable graph size, which also mitigates the imbalance problem in the training process with low computation complexity. Experiments show that the proposed method achieves state-of-the-art performance on both person and vehicle re-identification datasets in a plug-and-play fashion with limited overhead.

NeurIPS Conference 2021 Conference Paper

MQBench: Towards Reproducible and Deployable Model Quantization Benchmark

  • Yuhang Li
  • Mingzhu Shen
  • Jian Ma
  • Yan Ren
  • Mingxin Zhao
  • Qi Zhang
  • Ruihao Gong
  • Fengwei Yu

Model quantization has emerged as an indispensable technique to accelerate deep learning inference. Although researchers continue to push the frontier of quantization algorithms, existing quantization work is often unreproducible and undeployable, because researchers do not choose consistent training pipelines and ignore the requirements of hardware deployment. In this work, we propose the Model Quantization Benchmark (MQBench), a first attempt to evaluate, analyze, and benchmark the reproducibility and deployability of model quantization algorithms. We choose multiple different platforms for real-world deployment, including CPU, GPU, ASIC, and DSP, and evaluate extensive state-of-the-art quantization algorithms under a unified training pipeline. MQBench acts as a bridge connecting algorithms and hardware. We conduct a comprehensive analysis and find considerable intuitive and counter-intuitive insights. By aligning the training settings, we find that existing algorithms perform about the same on the conventional academic track, while for hardware-deployable quantization there remains a huge accuracy gap and a long way to go. Surprisingly, no existing algorithm wins every challenge in MQBench, and we hope this work can inspire future research directions.
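A minimal example of the operation such algorithms tune is uniform "fake quantization": quantize to integers, then dequantize, so training sees the rounding error. Per-channel scales, observers, and rounding modes — the things that vary across hardware backends in a benchmark like this — are omitted; the sketch is illustrative only:

```python
import numpy as np

def fake_quantize(x, n_bits=8):
    """Symmetric uniform fake quantization of a tensor.

    Maps x onto the signed integer grid [-2^(b-1), 2^(b-1)-1] and back,
    so downstream layers see the quantization error during training.
    """
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(x).max() / qmax          # one scale for the whole tensor
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale
```

The per-element error is bounded by half a quantization step, which is why lowering `n_bits` trades accuracy for speed.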

AAAI Conference 2021 Conference Paper

SSN3D: Self-Separated Network to Align Parts for 3D Convolution in Video Person Re-Identification

  • Xiaoke Jiang
  • Yu Qiao
  • Junjie Yan
  • Qichen Li
  • Wanrong Zheng
  • Dapeng Chen

Temporal appearance misalignment is a crucial problem in video person re-identification. The same part of a person (e.g. head or hand) appearing at different locations across a video sequence weakens its discriminative ability, especially when we apply standard temporal aggregation such as 3D convolution or LSTM. To address this issue, we propose a Self-Separated Network (SSN) to seek out the same parts in different images. As the name implies, SSN, if trained in an unsupervised strategy, guarantees that the selected parts are distinct. With a few samples of labeled parts to guide SSN training, the semi-supervised trained SSN seeks out parts that are human-understandable within a frame and stable across a video snippet. Given the distinct and stable person parts, rather than performing aggregation on features, we then apply 3D convolution across different frames for person re-identification. This SSN + 3D pipeline, dubbed SSN3D, proves effective through extensive experiments on both synthetic and real data.

ICLR Conference 2020 Conference Paper

Computation Reallocation for Object Detection

  • Feng Liang
  • Chen Lin 0003
  • Ronghao Guo
  • Ming Sun 0008
  • Wei Wu 0021
  • Junjie Yan
  • Wanli Ouyang

The allocation of computation resources in the backbone is a crucial issue in object detection. However, the allocation pattern used for classification is usually adopted directly for object detectors, which proves to be sub-optimal. In order to reallocate the engaged computation resources in a more efficient way, we present CR-NAS (Computation Reallocation Neural Architecture Search), which can learn computation reallocation strategies across different feature resolutions and spatial positions directly on the target detection dataset. A two-level reallocation space is proposed for both stage and spatial reallocation. A novel hierarchical search procedure is adopted to cope with the complex search space. We apply CR-NAS to multiple backbones and achieve consistent improvements. Our CR-ResNet50 and CR-MobileNetV2 outperform the baseline by 1.9% and 1.7% COCO AP respectively without any additional computation budget. The models discovered by CR-NAS can be equipped with other powerful detection necks/heads and easily transferred to other datasets, e.g. PASCAL VOC, and other vision tasks, e.g. instance segmentation. CR-NAS can thus be used as a plugin to improve the performance of various networks where such gains are in demand.

NeurIPS Conference 2020 Conference Paper

Improving Auto-Augment via Augmentation-Wise Weight Sharing

  • Keyu Tian
  • Chen Lin
  • Ming Sun
  • Luping Zhou
  • Junjie Yan
  • Wanli Ouyang

The recent progress on automatically searching augmentation policies has boosted the performance substantially for various tasks. A key component of automatic augmentation search is the evaluation process for a particular augmentation policy, which is utilized to return reward and usually runs thousands of times. A plain evaluation process, which includes full model training and validation, would be time-consuming. To achieve efficiency, many choose to sacrifice evaluation reliability for speed. In this paper, we dive into the dynamics of augmented training of the model. This inspires us to design a powerful and efficient proxy task based on the Augmentation-Wise Weight Sharing (AWS) to form a fast yet accurate evaluation process in an elegant way. Comprehensive analysis verifies the superiority of this approach in terms of effectiveness and efficiency. The augmentation policies found by our method achieve superior accuracies compared with existing auto-augmentation search methods. On CIFAR-10, we achieve a top-1 error rate of 1.24%, which is currently the best performing single model without extra training data. On ImageNet, we get a top-1 error rate of 20.36% for ResNet-50, which leads to a 3.34% absolute error rate reduction over the baseline augmentation.

AAAI Conference 2020 Conference Paper

Learning to Auto Weight: Entirely Data-Driven and Highly Efficient Weighting Framework

  • Zhenmao Li
  • Yichao Wu
  • Ken Chen
  • Yudong Wu
  • Shunfeng Zhou
  • Jiaheng Liu
  • Junjie Yan

Example weighting is an effective solution to the training-bias problem; however, most previous methods are limited by human knowledge and require laborious tuning of hyperparameters. In this paper, we propose a novel example weighting framework called Learning to Auto Weight (LAW). The proposed framework finds step-dependent weighting policies adaptively, and can be jointly trained with target networks without any assumptions or prior knowledge about the dataset. It consists of three key components: a Stage-based Searching Strategy (3SM) is adopted to shrink the huge search space across a complete training process; Duplicate Network Reward (DNR) gives more accurate supervision by removing randomness during the searching process; and Full Data Update (FDU) further improves the updating efficiency. Experimental results demonstrate the superiority of the weighting policies explored by LAW over the standard training pipeline. Compared with baselines, LAW finds better weighting schedules that achieve far superior accuracy on both biased CIFAR and ImageNet.

ICLR Conference 2020 Conference Paper

Towards Stabilizing Batch Statistics in Backward Propagation of Batch Normalization

  • Junjie Yan
  • Ruosi Wan
  • Xiangyu Zhang 0005
  • Wei Zhang 0016
  • Yichen Wei
  • Jian Sun 0001

Batch Normalization (BN) is one of the most widely used techniques in the deep learning field, but its performance can degrade badly with insufficient batch size. This weakness limits the usage of BN in many computer vision tasks like detection or segmentation, where batch size is usually small due to the constraint of memory consumption. Therefore many modified normalization techniques have been proposed, which either fail to restore the performance of BN completely, or have to introduce additional nonlinear operations in the inference procedure and incur large extra cost. In this paper, we reveal that there are two extra batch statistics involved in the backward propagation of BN, which have never been well discussed before. These extra batch statistics associated with gradients can also severely affect the training of deep neural networks. Based on our analysis, we propose a novel normalization method, named Moving Average Batch Normalization (MABN). MABN can completely restore the performance of vanilla BN in small-batch cases, without introducing any additional nonlinear operations in the inference procedure. We prove the benefits of MABN by both theoretical analysis and experiments. Our experiments demonstrate the effectiveness of MABN in multiple computer vision tasks, including ImageNet and COCO. The code has been released at https://github.com/megvii-model/MABN.
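The core idea of replacing noisy per-batch statistics with moving averages can be sketched as follows. This forward-only toy omits the paper's key contribution of stabilizing the *backward* statistics (and the learnable affine parameters); it is a NumPy illustration, not the released implementation:

```python
import numpy as np

class MovingAverageBN:
    """Toy 1-D normalization layer that normalizes with exponential
    moving averages of the batch mean/variance, so tiny batches do not
    inject noisy estimates into the forward pass."""

    def __init__(self, dim, momentum=0.9, eps=1e-5):
        self.mu = np.zeros(dim)
        self.var = np.ones(dim)
        self.momentum = momentum
        self.eps = eps

    def __call__(self, x, training=True):
        if training:
            m = self.momentum
            self.mu = m * self.mu + (1 - m) * x.mean(axis=0)
            self.var = m * self.var + (1 - m) * x.var(axis=0)
        # Inference uses the same linear normalization, so no extra
        # nonlinear ops are introduced at deployment time.
        return (x - self.mu) / np.sqrt(self.var + self.eps)
```

Because the normalization is affine in `x` at inference, it can be folded into the preceding convolution, matching the "no additional nonlinear operations" property the abstract emphasizes.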

NeurIPS Conference 2019 Conference Paper

Efficient Neural Architecture Transformation Search in Channel-Level for Object Detection

  • Junran Peng
  • Ming Sun
  • ZHAO-XIANG ZHANG
  • Tieniu Tan
  • Junjie Yan

Recently, Neural Architecture Search has achieved great success in large-scale image classification. In contrast, there have been limited works focusing on architecture search for object detection, mainly because the costly ImageNet pretraining is always required for detectors. Training from scratch, as a substitute, demands more epochs to converge and brings no computation saving. To overcome this obstacle, we introduce a practical neural architecture transformation search (NATS) algorithm for object detection in this paper. Instead of searching and constructing an entire network, NATS explores the architecture space on the basis of an existing network, reusing its weights. We propose a novel neural architecture search strategy at channel level instead of path level and devise a search space specially targeting object detection. With the combination of these two designs, an architecture transformation scheme can be discovered to adapt a network designed for image classification to the task of object detection. Since our method is gradient-based and only searches for a transformation scheme, the weights of models pretrained on ImageNet can be utilized in both the searching and retraining stages, which makes the whole process very efficient. The transformed network requires no extra parameters or FLOPs, and is friendly to hardware optimization, which is practical for use in real-time applications. In experiments, we demonstrate the effectiveness of NATS on networks like ResNet and ResNeXt. Our transformed networks, combined with various detection frameworks, achieve significant improvements on the COCO dataset while remaining fast.

AAAI Conference 2018 Conference Paper

Accelerated Training for Massive Classification via Dynamic Class Selection

  • Xingcheng Zhang
  • Lei Yang
  • Junjie Yan
  • Dahua Lin

Massive classification, a classification task defined over a vast number of classes (hundreds of thousands or even millions), has become an essential part of many real-world systems, such as face recognition. Existing methods, including the deep networks that achieved remarkable success in recent years, were mostly devised for problems with a moderate number of classes. They would meet with substantial difficulties, e.g. excessive memory demand and computational cost, when applied to massive problems. We present a new method to tackle this problem. This method can efficiently and accurately identify a small number of "active classes" for each mini-batch, based on a set of dynamic class hierarchies constructed on the fly. We also develop an adaptive allocation scheme thereon, which leads to a better tradeoff between performance and cost. On several large-scale benchmarks, our method significantly reduces the training cost and memory demand, while maintaining competitive performance.
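The "active classes per mini-batch" idea can be sketched with a simple stand-in for the paper's dynamic class hierarchies: keep the classes present in the batch, then fill the remaining budget with the classes whose centroids score highest against the batch features. Every name and the centroid-ranking heuristic here are illustrative assumptions, not the authors' algorithm:

```python
import numpy as np

def select_active_classes(labels, class_centroids, features, budget):
    """Pick a small set of classes for this mini-batch's softmax.

    labels:          (batch,) ground-truth class ids in the batch
    class_centroids: (n_classes, d) one representative vector per class
    features:        (batch, d) batch embeddings
    budget:          total number of active classes to keep
    """
    active = set(labels.tolist())                  # always keep true classes
    scores = features @ class_centroids.T          # (batch, n_classes)
    ranked = np.argsort(-scores.max(axis=0))       # hardest classes first
    for c in ranked:
        if len(active) >= budget:
            break
        active.add(int(c))
    return sorted(active)
```

The softmax and its gradients then only touch `budget` columns of the classifier weight matrix instead of all million, which is where the memory and compute savings come from.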

NeurIPS Conference 2018 Conference Paper

Synaptic Strength For Convolutional Neural Network

  • Chen Lin
  • Zhao Zhong
  • Wu Wei
  • Junjie Yan

Convolutional Neural Networks (CNNs) are both computation and memory intensive, which has hindered their deployment on mobile devices. Inspired by the relevant concept in the neural science literature, we propose Synaptic Pruning: a data-driven method to prune connections between input and output feature maps with a newly proposed class of parameters called Synaptic Strength. Synaptic Strength is designed to capture the importance of a connection based on the amount of information it transports. Experiment results show the effectiveness of our approach. On CIFAR-10, we prune connections for various CNN models by up to 96%, which results in significant size reduction and computation saving. Further evaluation on ImageNet demonstrates that synaptic pruning is able to discover efficient models which are competitive with state-of-the-art compact CNNs such as MobileNet-V2 and NasNet-Mobile. Our contribution is summarized as follows: (1) We introduce Synaptic Strength, a new class of parameters for CNNs to indicate the importance of each connection. (2) Our approach can prune various CNNs with high compression without compromising accuracy. (3) Further investigation shows the proposed Synaptic Strength is a better indicator for kernel pruning compared with the previous approach in both empirical results and theoretical analysis.
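The pruning step itself reduces to thresholding a learned per-connection importance score. The sketch below assumes a matrix of such scores (one per input/output feature-map pair) and keeps only the strongest fraction; the training procedure that learns the scores is the paper's contribution and is not shown:

```python
import numpy as np

def prune_by_strength(strength, keep_ratio):
    """Return a boolean mask keeping the top `keep_ratio` fraction of
    connections by importance magnitude; the rest are pruned.

    strength: (in_maps, out_maps) learned per-connection importance scores
    """
    flat = np.abs(strength).ravel()
    k = max(1, int(round(keep_ratio * flat.size)))
    thresh = np.sort(flat)[-k]                 # k-th largest magnitude
    return np.abs(strength) >= thresh
```

Masked connections contribute neither parameters nor multiply-accumulates, which is the source of the size and computation savings the abstract reports.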

IROS Conference 2015 Conference Paper

Haptic passwords

  • Junjie Yan
  • Kevin Huang 0001
  • Tamara Bonaci
  • Howard Jay Chizeck

Haptic technologies have made it possible for human users to interact with cyber systems not only via traditional interfaces like keyboards and mice but also by applying force and motion. With these extra information channels, how a user haptically interacts with a system potentially presents unique user dependent features and can thus be used for authentication purposes. In this paper, we propose a new biometric technology based on haptic interaction. Our technique leverages artificial neural network (ANN) based wavelet analysis to perform user identification. Identification and authentication are done in two steps: a discrete wavelet transform (DWT) is applied to extract features, and then the neural network is used to perform identification and authentication. The performance of the model is evaluated based on identification and authentication accuracies. The results show that our proposed haptic password system has a high identification accuracy and that it is resistant to forgery attacks.