Author name cluster

Tan Yu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

7 papers

2 author rows

AAAI Conference 2025 Conference Paper

KALAHash: Knowledge-Anchored Low-Resource Adaptation for Deep Hashing

Shu Zhao
Tan Yu
Xiaoshuai Hao
Wenchao Ma
Vijaykrishnan Narayanan

Deep hashing has been widely used for large-scale approximate nearest neighbor search due to its storage and search efficiency. However, existing deep hashing methods predominantly rely on abundant training data, leaving the more challenging scenario of low-resource adaptation for deep hashing relatively underexplored. This setting involves adapting pre-trained models to downstream tasks with only an extremely small number of training samples available. Our preliminary benchmarks reveal that current methods suffer significant performance degradation due to the distribution shift caused by limited training samples. To address these challenges, we introduce Class-Calibration LoRA (CLoRA), a novel plug-and-play approach that dynamically constructs low-rank adaptation matrices by leveraging class-level textual knowledge embeddings. CLoRA effectively incorporates prior class knowledge as anchors, enabling parameter-efficient fine-tuning while maintaining the original data distribution. Furthermore, we propose Knowledge-Guided Discrete Optimization (KIDDO), a framework to utilize class knowledge to compensate for the scarcity of visual information and enhance the discriminability of hash codes. Extensive experiments demonstrate that our proposed method, Knowledge-Anchored Low-Resource Adaptation Hashing (KALAHash), significantly boosts retrieval performance and achieves 4× data efficiency in low-resource scenarios.


ICLR Conference 2022 Conference Paper

Constructing Orthogonal Convolutions in an Explicit Manner

Tan Yu
Jun Li 0098
Yunfeng Cai
Ping Li 0001

Convolutions with orthogonal input-output Jacobian matrix, i.e., orthogonal convolution, have recently attracted substantial attention. A convolution layer with an orthogonal Jacobian matrix is 1-Lipschitz in the 2-norm, making the output robust to the perturbation in input. Meanwhile, an orthogonal Jacobian matrix preserves the gradient norm in back-propagation, which is critical for stable training of deep networks. Nevertheless, existing orthogonal convolutions are burdened by high computational costs for preserving orthogonality. In this work, we exploit the relation between the singular values of the convolution layer's Jacobian and the structure of the convolution kernel. To achieve orthogonality, we explicitly construct the convolution kernel so that all singular values of the convolution layer's Jacobian are $1$. After training, the explicitly constructed orthogonal (ECO) convolution is constructed only once, and its weights are stored. Then, in evaluation, we only need to load the stored weights of the trained ECO convolution, and the computational cost of ECO convolution is the same as the standard dilated convolution. It is more efficient in evaluation than the recent state-of-the-art approach, skew orthogonal convolution (SOC). Experiments on CIFAR-10 and CIFAR-100 demonstrate that the proposed ECO convolution is faster than SOC in evaluation while leading to competitive standard and certified robust accuracies.
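
The two norm-preservation claims in this abstract can be checked in one line each under an idealization that is assumed here for illustration and is not taken from the paper: the layer is treated as a linear map $z = Jx$ whose Jacobian $J$ is a square orthogonal matrix, so $J^\top J = J J^\top = I$.

$$
\|Jx - Jy\|_2^2 = (x-y)^\top J^\top J\,(x-y) = \|x-y\|_2^2,
\qquad
\left\|\frac{\partial \mathcal{L}}{\partial x}\right\|_2
= \left\|J^\top \frac{\partial \mathcal{L}}{\partial z}\right\|_2
= \left\|\frac{\partial \mathcal{L}}{\partial z}\right\|_2 .
$$

The first identity is the 1-Lipschitz property in the 2-norm; the second says the back-propagated gradient keeps its norm when passing through the layer.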


AAAI Conference 2022 Conference Paper

Efficient Compact Bilinear Pooling via Kronecker Product

Tan Yu
Yunfeng Cai
Ping Li

Bilinear pooling has achieved excellent performance in fine-grained recognition tasks. Nevertheless, high-dimensional bilinear features suffer from over-fitting and inefficiency. To alleviate these issues, compact bilinear pooling (CBP) methods were developed to generate low-dimensional features. Although the low-dimensional features from existing CBP methods enable high efficiency in subsequent classification, CBP methods themselves are inefficient. Thus, the inefficiency issue of the bilinear pooling is still unsolved. In this work, we propose an efficient compact bilinear pooling method to thoroughly solve the inefficiency problem inherent in bilinear pooling. It decomposes the huge-scale projection matrix into a two-level Kronecker product of several small-scale matrices. By exploiting the “vec trick” and the tensor modal product, we can obtain the compact bilinear feature through the decomposed projection matrices in a speedy manner. Systematic experiments on four public benchmarks using two backbones demonstrate the efficiency and effectiveness of the proposed method in fine-grained recognition.
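
The "vec trick" mentioned in this abstract is the standard identity (A ⊗ B) vec(X) = vec(B X Aᵀ), where vec stacks the columns of X. The sketch below only checks that generic identity on small random matrices in plain Python; it does not reproduce the paper's two-level decomposition, and the helper names (`matmul`, `kron`, `vec`, `rand_matrix`) and the matrix sizes are illustrative choices, not taken from the paper.

```python
import random

def matmul(P, Q):
    """Plain nested-list matrix product P @ Q."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def transpose(P):
    return [list(col) for col in zip(*P)]

def kron(A, B):
    """Kronecker product: block (i, j) of the result is A[i][j] * B."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def vec(X):
    """Column-major vectorization: stack the columns of X into one list."""
    return [x for col in zip(*X) for x in col]

rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

A = rand_matrix(3, 4)  # m x n
B = rand_matrix(2, 5)  # p x q
X = rand_matrix(5, 4)  # q x n, so that B @ X @ A^T is p x m

# Naive route: materialize the (m*p) x (n*q) Kronecker matrix, then multiply.
big = kron(A, B)
v = vec(X)
naive = [sum(k * x for k, x in zip(row, v)) for row in big]

# Vec trick: two small matrix products, the large matrix is never formed.
fast = vec(matmul(matmul(B, X), transpose(A)))

max_diff = max(abs(a - b) for a, b in zip(naive, fast))
print(len(big), len(big[0]), max_diff)
```

The naive route stores m·p·n·q entries, while the vec-trick route touches only the small factors, which is the kind of saving a Kronecker-structured projection is meant to exploit.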


UAI Conference 2021 Conference Paper

Efficient greedy coordinate descent via variable partitioning

Huang Fang
Guanhua Fang
Tan Yu
Ping Li 0001

Greedy coordinate descent (GCD) is an efficient optimization algorithm for a wide range of machine learning and data mining applications. GCD can be significantly faster than randomized coordinate descent (RCD) if they have similar per-iteration cost. Nevertheless, in some cases, the greedy rule used in GCD cannot be efficiently implemented, leading to huge per-iteration cost and making GCD slower than RCD. To alleviate the cost per iteration, the existing solutions rely on maximum inner product search (MIPS) as an approximate greedy rule. But it has been empirically shown that GCD with an approximate greedy rule can suffer from slow convergence even with the state-of-the-art MIPS algorithms. We propose a hybrid coordinate descent algorithm with a simple variable partition strategy to tackle the cases when the greedy rule cannot be implemented efficiently. The convergence rate and theoretical properties of the new algorithm are presented. The proposed method is shown to be especially useful when the data matrix has a group structure. Numerical experiments with both synthetic and real-world data demonstrate that our new algorithm is competitive against RCD, GCD, approximate GCD with MIPS and their accelerated variants.
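
To make the contrast between the greedy and randomized rules concrete, here is a minimal sketch on a small strongly convex quadratic f(x) = ½ xᵀAx − bᵀx. It shows only the two baseline rules (greedy selection by largest gradient magnitude, and uniform random selection), not the paper's hybrid variable-partitioning algorithm. The test problem, the function names, and the step counts are assumptions made for illustration.

```python
import random

def gradient(A, b, x):
    """Gradient of f(x) = 0.5 * x^T A x - b^T x, i.e. A x - b."""
    return [sum(a * xj for a, xj in zip(row, x)) - bi for row, bi in zip(A, b)]

def objective(A, b, x):
    Ax = [sum(a * xj for a, xj in zip(row, x)) for row in A]
    return 0.5 * sum(xi * axi for xi, axi in zip(x, Ax)) - sum(bi * xi for bi, xi in zip(b, x))

def coordinate_descent(A, b, steps, rule, rng=None):
    """Exact coordinate minimization with a 'greedy' or 'random' selection rule."""
    x = [0.0] * len(b)
    for _ in range(steps):
        # Recomputing the full gradient is the naive, expensive way to evaluate
        # the greedy rule; this is the per-iteration cost the abstract refers to.
        g = gradient(A, b, x)
        if rule == "greedy":
            i = max(range(len(x)), key=lambda k: abs(g[k]))
        else:
            i = rng.randrange(len(x))
        x[i] -= g[i] / A[i][i]  # minimizes f exactly along coordinate i
    return x

n = 5
# Symmetric positive definite test matrix: 6 on the diagonal, 1 elsewhere.
A = [[6.0 if i == j else 1.0 for j in range(n)] for i in range(n)]
b = [1.0, 2.0, 3.0, 4.0, 5.0]

x_gcd = coordinate_descent(A, b, steps=300, rule="greedy")
x_rcd = coordinate_descent(A, b, steps=300, rule="random", rng=random.Random(0))

grad_inf_gcd = max(abs(g) for g in gradient(A, b, x_gcd))
print(grad_inf_gcd, objective(A, b, x_gcd), objective(A, b, x_rcd))
```

On a toy problem like this both rules converge quickly; the trade-off described in the abstract matters when evaluating the greedy rule is much more expensive than a single random coordinate update.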


AAAI Conference 2021 Conference Paper

Fast and Compact Bilinear Pooling by Shifted Random Maclaurin

Tan Yu
Xiaoyun Li
Ping Li

Bilinear pooling has achieved excellent performance in many computer vision tasks. However, the high-dimensional features from bilinear pooling can sometimes be inefficient and prone to over-fitting. Random Maclaurin (RM) is a widely used GPU-friendly approximation method to reduce the dimensionality of bilinear features. However, to achieve good performance, huge projection matrices are usually required in practice, making it extremely costly in computation and memory. In this paper, we propose a Shifted Random Maclaurin (SRM) strategy for fast and compact bilinear pooling. With merely negligible extra computational cost, the proposed SRM provides an estimator with a provably smaller variance than RM, which benefits accurate kernel approximation and thus the learning performance. Using a small projection matrix, the proposed SRM achieves estimation performance comparable to that of RM with a large projection matrix, and thus considerably boosts the efficiency. Furthermore, we upgrade the proposed SRM to SRM+ to further improve the efficiency and make the compact bilinear pooling compatible with fast matrix normalization. Fast and Compact Bilinear Network (FCBN) built upon the proposed SRM+ is devised, achieving end-to-end training. Systematic experiments conducted on four public datasets demonstrate the effectiveness and efficiency of the proposed FCBN.
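
For context, the baseline Random Maclaurin idea can be sketched as follows. The inner product of two bilinear features vec(xxᵀ) and vec(yyᵀ) equals (x·y)², and RM approximates this degree-2 kernel with D random features of the form (w·x)(v·x)/√D, where w and v are independent random ±1 vectors. The code below illustrates plain RM only; it does not implement the shifted variant (SRM) or SRM+ proposed in the paper, and the dimensions, seed, and function names are illustrative assumptions.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def rademacher_matrix(rows, cols, rng):
    """Matrix with independent entries drawn uniformly from {-1, +1}."""
    return [[rng.choice((-1.0, 1.0)) for _ in range(cols)] for _ in range(rows)]

def rm_features(x, W, V):
    """Random Maclaurin features for the kernel (x . y)^2."""
    scale = len(W) ** -0.5
    return [scale * dot(w, x) * dot(v, x) for w, v in zip(W, V)]

rng = random.Random(0)
d, D = 4, 20000  # input dimension and number of random features
W = rademacher_matrix(D, d, rng)
V = rademacher_matrix(D, d, rng)

x = [0.5, 0.5, 0.5, 0.5]
y = [0.5, -0.5, 0.5, 0.5]

exact = dot(x, y) ** 2  # kernel value, equal to <vec(x x^T), vec(y y^T)>
approx = dot(rm_features(x, W, V), rm_features(y, W, V))
print(exact, approx)
```

The estimate is unbiased, but its error shrinks only as 1/√D, which is why a plain RM projection needs to be large for good accuracy; reducing the estimator's variance is the route the abstract describes for getting away with a smaller projection.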


AAAI Conference 2017 Conference Paper

Efficient Object Instance Search Using Fuzzy Objects Matching

Tan Yu
Yuwei Wu
Sreyasee Bhattacharjee
Junsong Yuan

Recently, global features aggregated from local convolutional features of the convolutional neural network have been shown to be much more effective than hand-crafted features for image retrieval. However, the global feature might not effectively capture the relevance between the query object and reference images in the object instance search task, especially when the query object is relatively small and there exist multiple types of objects in reference images. Moreover, object instance search requires localizing the object in the reference image, which may not be achieved through global representations. In this paper, we propose a Fuzzy Objects Matching (FOM) framework to effectively and efficiently capture the relevance between the query object and reference images in the dataset. In the proposed FOM scheme, object proposals are utilized to detect the potential regions of the query object in reference images. To achieve high search efficiency, we factorize the feature matrix of all the object proposals from one reference image into the product of a set of fuzzy objects and sparse codes. In addition, we refine the features of the generated fuzzy objects according to their neighborhoods in the feature space to generate more robust representations. The experimental results demonstrate that the proposed FOM framework significantly outperforms the state-of-the-art methods in precision with less memory and computational cost on three public datasets.


IJCAI Conference 2017 Conference Paper

Is My Object in This Video? Reconstruction-based Object Search in Videos

Tan Yu
Jingjing Meng
Junsong Yuan

This paper addresses the problem of video-level object instance search, which aims to retrieve the videos in the database that contain a given query object instance. Without prior knowledge about "when" and "where" an object of interest may appear in a video, determining "whether" a video contains the target object is computationally prohibitive, as it requires exhaustively matching the query against all possible spatial-temporal locations in each video where an object may appear. To alleviate the computational and memory cost, we propose the Reconstruction-based Object SEarch (ROSE) method. It characterizes a huge corpus of features of possible spatial-temporal locations in the video by the parameters of the reconstruction model. Since the memory cost of storing the reconstruction model is much less than that of storing features of possible spatial-temporal locations in the video, the efficiency of the search is significantly boosted. Comprehensive experiments on three benchmark datasets demonstrate the promising performance of the proposed ROSE method.
