Arrow Research search

Author name cluster

Weiping Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

33 papers
2 author rows

Possible papers

33

AAAI Conference 2026 Conference Paper

Test-time Prompt Intervention

  • Chenxu Yang
  • Qingyi Si
  • Mz Dai
  • Dingyu Yao
  • Mingyu Zheng
  • Minghui Chen
  • Zheng Lin
  • Weiping Wang

Test-time compute has led to remarkable success in the large language model (LLM) community, particularly for complex tasks, where longer chains of thought (CoTs) are generated to enhance reasoning capabilities. However, growing evidence reveals that such reasoning models often produce CoTs plagued by excessive redundancy, including repetitive verification steps and unnecessary reasoning shifts. The root cause lies in their post-training, which relies heavily on outcome-reward paradigms, since data for process-reward paradigms, which regulate intermediate reasoning steps, is difficult to construct at scale. To address this, we propose PI, a novel framework for Test-time Prompt Intervention. PI provides an interface to dynamically guide and regulate reasoning paths during inference through timely (When module) and proper (How module) interventions and post-intervention sampling (Which module). This allows human problem-solving expertise and cognitive science principles to be seamlessly integrated into LLMs’ reasoning processes, enhancing controllability and interpretability. Extensive experiments across multiple models and datasets demonstrate that PI significantly shortens CoTs while reducing hallucination, yielding more concise and reliable reasoning.
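
The When/How/Which decomposition described in the abstract can be sketched as a tiny generation loop. This is a toy illustration only, not the authors' implementation: the stand-in model, the repetition heuristic for When, the brevity instruction for How, and the shortest-sample rule for Which are all invented for the example.

```python
# Toy sketch of test-time prompt intervention (PI): a When module decides
# whether to intervene, a How module injects an intervention prompt, and
# a Which module picks one post-intervention sample. All heuristics and
# names here are illustrative assumptions, not the paper's code.

def when_to_intervene(cot_steps):
    """Intervene once the chain of thought starts repeating itself."""
    return len(cot_steps) >= 2 and cot_steps[-1] == cot_steps[-2]

def how_to_intervene(prompt):
    """Append a brevity instruction to redirect the reasoning path."""
    return prompt + " [Intervention: stop re-verifying; state the answer.]"

def which_sample(candidates):
    """Among post-intervention samples, keep the most concise one."""
    return min(candidates, key=len)

def generate_with_pi(model, prompt, max_steps=6):
    """model maps a prompt to a list of candidate continuations."""
    steps = []
    for _ in range(max_steps):
        if when_to_intervene(steps):
            prompt = how_to_intervene(prompt)
            return steps + [which_sample(model(prompt))]
        steps.append(model(prompt)[0])
    return steps
```

With a stand-in model that loops on verification until an intervention appears in the prompt, the loop cuts the CoT short at the first detected repetition.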

NeurIPS Conference 2025 Conference Paper

NeedleInATable: Exploring Long-Context Capability of Large Language Models towards Long-Structured Tables

  • Lanrui Wang
  • Mingyu Zheng
  • Hongyin Tang
  • Zheng Lin
  • Yanan Cao
  • Jingang Wang
  • Xunliang Cai
  • Weiping Wang

Processing structured tabular data, particularly large and lengthy tables, constitutes a fundamental yet challenging task for large language models (LLMs). However, existing long-context benchmarks like Needle-in-a-Haystack primarily focus on unstructured text, neglecting the challenge of diverse structured tables. Meanwhile, previous tabular benchmarks mainly consider downstream tasks that require high-level reasoning abilities, and overlook models' underlying fine-grained perception of individual table cells, which is crucial for practical and robust LLM-based table applications. To address this gap, we introduce NeedleInATable (NIAT), a new long-context tabular benchmark that treats each table cell as a "needle" and requires models to extract the target cell based on cell locations or lookup questions. Our comprehensive evaluation of various LLMs and multimodal LLMs reveals a substantial performance gap between popular downstream tabular tasks and the simpler NIAT task, suggesting that they may rely on dataset-specific correlations or shortcuts to obtain better benchmark results but lack truly robust long-context understanding of structured tables. Furthermore, we demonstrate that using synthesized NIAT training data can effectively improve performance on both the NIAT task and downstream tabular tasks, which validates the importance of NIAT capability for LLMs' genuine table understanding ability. Our data, code and models will be released to facilitate future research.
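
The two probe types the abstract mentions, extraction by cell location and by lookup question, can be illustrated with a minimal table. The table contents, question templates, and function names below are invented for illustration and are not the benchmark's actual format.

```python
# Minimal illustration of the NIAT idea: one table cell is the "needle",
# queried either by its location or by a lookup question. The data and
# templates are illustrative assumptions, not the benchmark's format.

def location_probe(table, row, col):
    """Ask for a cell by (row, column-name) coordinates."""
    header = table[0]
    question = f"What is the value in row {row}, column '{header[col]}'?"
    return question, table[row][col]

def lookup_probe(table, key_col, key_value, target_col):
    """Ask for a cell via a lookup on another column in the same row."""
    header = table[0]
    for r in table[1:]:
        if r[key_col] == key_value:
            q = (f"For the row where {header[key_col]} is {key_value}, "
                 f"what is {header[target_col]}?")
            return q, r[target_col]
    raise KeyError(key_value)

table = [
    ["city", "country", "population_m"],
    ["Paris", "France", "2.1"],
    ["Osaka", "Japan", "2.7"],
]
```

A model passes such a probe only if it returns exactly the gold cell value, which is what makes the task a fine-grained perception check rather than a reasoning benchmark.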

AAAI Conference 2025 Conference Paper

Relation Also Knows: Rethinking the Recall and Editing of Factual Associations in Auto-Regressive Transformer Language Models

  • Xiyu Liu
  • Zhengxiao Liu
  • Naibin Gu
  • Zheng Lin
  • Wanli Ma
  • Ji Xiang
  • Weiping Wang

The storage and recall of factual associations in auto-regressive transformer language models (LMs) have drawn a great deal of attention, inspiring knowledge editing that directly modifies the located model weights. Most editing works achieve knowledge editing under the guidance of existing interpretations of knowledge recall that mainly focus on subject knowledge. However, these interpretations are seriously flawed: they neglect relation information, leading to the over-generalizing problem in editing. In this work, we develop a novel relation-focused perspective to interpret the knowledge recall of transformer LMs during inference and apply it to single knowledge editing to avoid over-generalizing. Experimental results on the dataset supplemented with a new R-Specificity criterion demonstrate that our editing approach significantly alleviates over-generalizing while remaining competitive on other criteria, breaking the domination of subject-focused editing for future research.

NeurIPS Conference 2025 Conference Paper

SSTAG: Structure-Aware Self-Supervised Learning Method for Text-Attributed Graphs

  • Ruyue Liu
  • Rong Yin
  • Xiangzhen Bo
  • Xiaoshuai Hao
  • Yong Liu
  • Jinwen Zhong
  • Can Ma
  • Weiping Wang

Large-scale pre-trained models have revolutionized Natural Language Processing (NLP) and Computer Vision (CV), showcasing remarkable cross-domain generalization abilities. However, in graph learning, models are typically trained on individual graph datasets, limiting their capacity to transfer knowledge across different graphs and tasks. This approach also heavily relies on large volumes of annotated data, which presents a significant challenge in resource-constrained settings. Unlike NLP and CV, graph-structured data presents unique challenges due to its inherent heterogeneity, including domain-specific feature spaces and structural diversity across various applications. To address these challenges, we propose a novel structure-aware self-supervised learning method for Text-Attributed Graphs (SSTAG). By leveraging text as a unified representation medium for graph learning, SSTAG bridges the gap between the semantic reasoning of Large Language Models (LLMs) and the structural modeling capabilities of Graph Neural Networks (GNNs). Our approach introduces a dual knowledge distillation framework that co-distills both LLMs and GNNs into structure-aware multilayer perceptrons (MLPs), enhancing the scalability of large-scale TAGs. Additionally, we introduce an in-memory mechanism that stores typical graph representations, aligning them with memory anchors in an in-memory repository to integrate invariant knowledge, thereby improving the model’s generalization ability. Extensive experiments demonstrate that SSTAG outperforms state-of-the-art models on cross-domain transfer learning tasks, achieves exceptional scalability, and reduces inference costs while maintaining competitive performance.

IJCAI Conference 2025 Conference Paper

The Role of Video Generation in Enhancing Data-Limited Action Understanding

  • Wei Li
  • Dezhao Luo
  • Dongbao Yang
  • Zhenhang Li
  • Weiping Wang
  • Yu Zhou

Video action understanding tasks in real-world scenarios often suffer from data limitations. In this paper, we address the data-limited action understanding problem by bridging the data-scarcity gap: we propose a novel method that leverages a text-to-video diffusion transformer to generate annotated data for model training. This paradigm enables the generation of realistic annotated data at unlimited scale without human intervention. We propose an Information Enhancement Strategy and an Uncertainty-Based Soft Target tailored to training with generated samples. Through quantitative and qualitative analyses, we found that real samples generally contain a richer level of information than generated samples. Based on this observation, the Information Enhancement Strategy enhances the informational content of the generated samples from two perspectives: the environment and the character. Furthermore, we observed that a portion of low-quality generated samples might negatively affect model training. To address this, we devised an uncertainty-based label-smoothing strategy that smooths these low-quality samples more strongly, thereby reducing their impact. We demonstrate the effectiveness of the proposed method on four datasets and five tasks, and achieve state-of-the-art performance for zero-shot action recognition.

AAAI Conference 2024 Conference Paper

ASWT-SGNN: Adaptive Spectral Wavelet Transform-Based Self-Supervised Graph Neural Network

  • Ruyue Liu
  • Rong Yin
  • Yong Liu
  • Weiping Wang

Graph Contrastive Learning (GCL) is a self-supervised method that combines the advantages of Graph Convolutional Networks (GCNs) and contrastive learning, making it promising for learning node representations. However, the GCN encoders used in these methods rely on the Fourier transform to learn fixed graph representations, which is inherently limited by the uncertainty principle involving spatial and spectral localization trade-offs. To overcome the inflexibility of existing methods and their computationally expensive eigen-decomposition and dense matrix multiplication, this paper proposes an Adaptive Spectral Wavelet Transform-based Self-Supervised Graph Neural Network (ASWT-SGNN). The proposed method employs spectral-adaptive polynomials to approximate the filter function and optimizes the wavelet using a contrastive loss. This design enables the creation of local filters in both spectral and spatial domains, allowing flexible aggregation of neighborhood information at various scales and facilitating controlled transformation between local and global information. Compared to existing methods, the proposed approach reduces computational complexity and addresses the limitation of graph convolutional neural networks, which are constrained by graph size and lack flexible control over neighborhood scope. Extensive experiments on eight benchmark datasets demonstrate that ASWT-SGNN accurately approximates the filter function in high-density spectral regions, avoiding costly eigen-decomposition. Furthermore, ASWT-SGNN achieves performance comparable to state-of-the-art models in node classification tasks.

AAAI Conference 2024 Conference Paper

Convolutional Spectral Kernel Learning with Generalization Guarantees (Abstract Reprint)

  • Jian Li
  • Yong Liu
  • Weiping Wang

Kernel methods are powerful tools for capturing nonlinear patterns behind given data but often perform poorly on complicated tasks compared to convolutional neural networks. The reason is that kernel methods are still shallow, fully connected models, failing to reveal hierarchical features and local interdependencies. In this paper, to acquire hierarchical and local knowledge, we incorporate kernel methods with deep architectures and convolutional operators in a spectral kernel learning framework. Based on the inverse Fourier transform and Rademacher complexity theory, we provide generalization error bounds for the proposed model and prove that, under suitable initialization, deeper networks lead to tighter error bounds. Inspired by these theoretical findings, we complete the convolutional spectral kernel network (CSKN) with two additional regularizers and an initialization strategy. Extensive ablation results validate the effectiveness of the non-stationary spectral kernel, multiple layers, additional regularizers, and the convolutional filters, which coincides with our theoretical findings. We further devise an 8-layer VGG-type CSKN, and it outperforms existing kernel-based networks and popular CNN models on medium-sized image classification tasks.

IJCAI Conference 2024 Conference Paper

DANCE: Dual-View Distribution Alignment for Dataset Condensation

  • Hansong Zhang
  • Shikun Li
  • Fanzhao Lin
  • Weiping Wang
  • Zhenxing Qian
  • Shiming Ge

Dataset condensation addresses the problem of data burden by learning a small synthetic training set that preserves essential knowledge from the larger real training set. To date, the state-of-the-art (SOTA) results are often yielded by optimization-oriented methods, but their inefficiency hinders their application to realistic datasets. On the other hand, the Distribution-Matching (DM) methods show remarkable efficiency but sub-optimal results compared to optimization-oriented methods. In this paper, we reveal the limitations of current DM-based methods from the inner-class and inter-class views, i.e., Persistent Training and Distribution Shift. To address these problems, we propose a new DM-based method named Dual-view distribution AligNment for dataset CondEnsation (DANCE), which exploits a few pre-trained models to improve DM from both inner-class and inter-class views. Specifically, from the inner-class view, we construct multiple "mid encoders" to perform pseudo long-term distribution alignment, making the condensed set a good proxy of the real one during the whole training process; while from the inter-class view, we use the expert models to perform distribution calibration, ensuring the synthetic data remains in the real class region during condensing. Experiments demonstrate the proposed method achieves SOTA performance while maintaining comparable efficiency with the original DM across various scenarios. Source codes are available at https://github.com/Hansong-Zhang/DANCE.

AAAI Conference 2024 Conference Paper

FedNS: A Fast Sketching Newton-Type Algorithm for Federated Learning

  • Jian Li
  • Yong Liu
  • Weiping Wang

Recent Newton-type federated learning algorithms have demonstrated linear convergence with respect to the communication rounds. However, communicating Hessian matrices is often infeasible due to their quadratic communication complexity. In this paper, we introduce a novel approach to tackle this issue while still achieving fast convergence rates. Our proposed method, named Federated Newton Sketch (FedNS), approximates the centralized Newton's method by communicating the sketched square-root Hessian instead of the exact Hessian. To enhance communication efficiency, we reduce the sketch size to match the effective dimension of the Hessian matrix. We provide convergence analysis based on statistical learning for the federated Newton sketch approaches. Specifically, our approaches reach super-linear convergence rates with respect to the communication rounds for the first time. We validate the effectiveness of our algorithms through various experiments, which coincide with our theoretical findings.
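
The communication saving behind sketching the square-root Hessian can be shown numerically: a client with n samples and d features holds an n x d square-root Hessian A, and sending a k x d sketch S A (k < d here) costs k*d numbers instead of the d*d of the exact Hessian, while (S A)^T (S A) still reconstructs an approximation server-side. The Rademacher sketch, the pure-Python matrix helpers, and the sizes below are illustrative assumptions, not the paper's implementation.

```python
# Sketched square-root Hessian communication, as a toy: send S @ A
# (k x d) instead of A^T @ A (d x d). Helpers are pure Python; the
# Rademacher sketch and all sizes are illustrative assumptions.
import random

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def sketch_sqrt_hessian(A, k, seed=0):
    """Rademacher sketch S @ A: k rows of random +/-1 signs times A,
    scaled by 1/sqrt(k) so (SA)^T (SA) approximates A^T A."""
    rng = random.Random(seed)
    n = len(A)
    S = [[rng.choice((-1.0, 1.0)) / k ** 0.5 for _ in range(n)]
         for _ in range(k)]
    return matmul(S, A)

# Client side: n = 50 samples, d = 6 features, sketch size k = 3.
n, d, k = 50, 6, 3
rng = random.Random(1)
A = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
SA = sketch_sqrt_hessian(A, k)                 # message: k*d numbers
approx_hessian = matmul(transpose(SA), SA)     # server-side d x d rebuild
```

Here the message shrinks from 36 entries to 18; in the paper's setting the sketch size is tied to the effective dimension of the Hessian rather than chosen by hand.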

AAAI Conference 2024 Conference Paper

High-Dimensional Analysis for Generalized Nonlinear Regression: From Asymptotics to Algorithm

  • Jian Li
  • Yong Liu
  • Weiping Wang

Overparameterization often leads to benign overfitting, where deep neural networks can be trained to overfit the training data but still generalize well on unseen data. However, a generalized asymptotic framework for nonlinear regression, with connections to conventional complexity notions, is still lacking. In this paper, we propose a generalized high-dimensional analysis for nonlinear regression models, including various nonlinear feature mapping methods and subsampling. Specifically, we first provide an implicit regularization parameter and asymptotic equivalents related to a classical complexity notion, i.e., the effective dimension. We then present a high-dimensional analysis for nonlinear ridge regression and extend it to ridgeless regression in the under-parameterized and over-parameterized regimes, respectively. We find that the limiting risks decrease with the effective dimension. Motivated by these theoretical findings, we propose an algorithm, namely RFRed, to improve generalization ability. Finally, we validate our theoretical findings and the proposed algorithm through several experiments.

ECAI Conference 2024 Conference Paper

JOSAL: Joint Learning Framework for Open-Set Active Learning

  • Jun Xie
  • Xiaohui Song
  • Yangjie Cao
  • Zhi Liu 0002
  • Weiping Wang
  • Hongli Xu

Previous research in active learning has primarily focused on selecting examples from closed-set data, which consists solely of unlabeled examples from the target classes. However, this approach overlooks the more prevalent scenario of open-set data in real-world applications. Open-set data encompasses examples from both target classes and non-target classes. To fill this gap, we propose a novel framework called JOSAL, which enhances the accuracy of the classifier by precisely selecting the target class examples from open-set data. The JOSAL framework introduces the concept of joint learning, where the Sampler and Classifier components perform sampling and classification tasks, respectively, by sharing example features extracted from a pre-trained Encoder. To maximize the classification accuracy of the Classifier, the framework adopts a novel joint learning strategy. This strategy initially prioritizes optimizing the Sampler and gradually shifts the optimization attention to the Classifier. The experimental results demonstrate that, compared to baselines, our approach exhibits stronger sampling precision and achieves higher classification accuracy. To the best of our knowledge, this is the first work to address the open-set active learning problem using the joint learning paradigm.

AAAI Conference 2024 Conference Paper

Pairwise-Label-Based Deep Incremental Hashing with Simultaneous Code Expansion

  • Dayan Wu
  • Qinghang Su
  • Bo Li
  • Weiping Wang

Deep incremental hashing has become a subject of considerable interest due to its capability to learn hash codes in an incremental manner, eliminating the need to generate codes for classes that have already been learned. However, accommodating more classes requires longer hash codes, and regenerating database codes becomes inevitable when code expansion is required. In this paper, we present a unified deep hash framework that can simultaneously learn new classes and increase hash code capacity. Specifically, we design a triple-channel asymmetric framework to optimize a new CNN model with a target code length and a code projection matrix. This enables us to directly generate hash codes for new images, and efficiently generate expanded hash codes for original database images from the old ones with the learned projection matrix. Meanwhile, we propose a pairwise-label-based incremental similarity-preserving loss to optimize the new CNN model, which can incrementally preserve new similarities while maintaining the old ones. Additionally, we design a double-end quantization loss to reduce the quantization error from new and original query images. As a result, our method efficiently embeds both new and original similarities into the expanded hash codes, while keeping the original database codes unchanged. We conduct extensive experiments on three widely-used image retrieval benchmarks, demonstrating that our method can significantly reduce the time required to expand existing database codes, while maintaining state-of-the-art retrieval performance.
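
The code-expansion step described above, where existing database codes are mapped to longer codes through a learned projection instead of re-encoding every image, can be sketched in a few lines. The random projection matrix and all sizes below stand in for the learned projection and are illustrative assumptions, not the paper's method.

```python
# Toy of hash-code expansion: old {-1,+1} codes of length d_old are
# mapped to length d_new via sign(code @ P), so database images are
# never re-encoded. P is random here, standing in for a learned matrix.
import random

def expand_codes(old_codes, projection):
    """Expand each code to len(projection[0]) bits via sign(c @ P)."""
    expanded = []
    for code in old_codes:
        new_code = []
        for col in zip(*projection):
            s = sum(b * w for b, w in zip(code, col))
            new_code.append(1 if s >= 0 else -1)
        expanded.append(new_code)
    return expanded

rng = random.Random(0)
d_old, d_new = 8, 12
P = [[rng.gauss(0, 1) for _ in range(d_new)] for _ in range(d_old)]
db = [[rng.choice((-1, 1)) for _ in range(d_old)] for _ in range(4)]
longer = expand_codes(db, P)   # 12-bit codes from stored 8-bit codes
```

The appeal is that expansion touches only the compact stored codes, which is why the paper reports large savings in database-update time.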

AAAI Conference 2023 Conference Paper

One-Shot Replay: Boosting Incremental Object Detection via Retrospecting One Object

  • Dongbao Yang
  • Yu Zhou
  • Xiaopeng Hong
  • Aoting Zhang
  • Weiping Wang

Modern object detectors are ill-equipped to incrementally learn new emerging object classes over time due to the well-known phenomenon of catastrophic forgetting. Due to data privacy or limited storage, few or no images of the old data can be stored for replay. In this paper, we design a novel One-Shot Replay (OSR) method for incremental object detection, which is an augmentation-based method. Rather than storing original images, only one object-level sample for each old class is stored to reduce memory usage significantly, and we find that copy-paste is a harmonious way to replay for incremental object detection. In the incremental learning procedure, diverse augmented samples with co-occurrence of old and new objects to existing training data are generated. To introduce more variants for objects of old classes, we propose two augmentation modules. The object augmentation module aims to enhance the ability of the detector to perceive potential unknown objects. The feature augmentation module explores the relations between old and new classes and augments the feature space via analogy. Extensive experimental results on VOC2007 and COCO demonstrate that OSR can outperform the state-of-the-art incremental object detection methods without using extra wild data.

JMLR Journal 2023 Journal Article

Optimal Convergence Rates for Distributed Nystroem Approximation

  • Jian Li
  • Yong Liu
  • Weiping Wang

The distributed kernel ridge regression (DKRR) has shown great potential in processing complicated tasks. However, DKRR makes use only of local samples and thus fails to capture global characteristics. Besides, the existing optimal learning guarantees were provided in expectation and pertain only to the attainable case, in which the target regression lies exactly in the kernel space. In this paper, we propose distributed learning with globally-shared Nystroem centers (DNystroem), which utilizes global information across the local clients. We also study the statistical properties of DNystroem in expectation and in probability, respectively, and obtain several state-of-the-art results with minimax-optimal learning rates. Notably, the optimal convergence rates for DNystroem pertain to the non-attainable case, while the statistical results allow more partitions and require fewer Nystroem centers. Finally, we conduct experiments on several real-world datasets to validate the effectiveness of the proposed algorithm, and the empirical results coincide with our theoretical findings.

NeurIPS Conference 2022 Conference Paper

A Win-win Deal: Towards Sparse and Robust Pre-trained Language Models

  • Yuanxin Liu
  • Fandong Meng
  • Zheng Lin
  • Jiangnan Li
  • Peng Fu
  • Yanan Cao
  • Weiping Wang
  • Jie Zhou

Despite the remarkable success of pre-trained language models (PLMs), they still face two challenges: First, large-scale PLMs are inefficient in terms of memory footprint and computation. Second, on downstream tasks, PLMs tend to rely on dataset bias and struggle to generalize to out-of-distribution (OOD) data. In response to the efficiency problem, recent studies show that dense PLMs can be replaced with sparse subnetworks without hurting performance. Such subnetworks can be found in three scenarios: 1) the fine-tuned PLMs, 2) the raw PLMs that are then fine-tuned in isolation, and even 3) PLMs without any parameter fine-tuning. However, these results are only obtained in the in-distribution (ID) setting. In this paper, we extend the study of PLM subnetworks to the OOD setting, investigating whether sparsity and robustness to dataset bias can be achieved simultaneously. To this end, we conduct extensive experiments with the pre-trained BERT model on three natural language understanding (NLU) tasks. Our results demonstrate that sparse and robust subnetworks (SRNets) can consistently be found in BERT, across the aforementioned three scenarios, using different training and compression methods. Furthermore, we explore the upper bound of SRNets using the OOD information and show that there exist sparse and almost unbiased BERT subnetworks. Finally, we present 1) an analytical study that provides insights on how to promote the efficiency of the SRNets search process and 2) a solution to improve subnetworks' performance at high sparsity. The code is available at https://github.com/llyx97/sparse-and-robust-PLM.

AAAI Conference 2022 Conference Paper

Imagine by Reasoning: A Reasoning-Based Implicit Semantic Data Augmentation for Long-Tailed Classification

  • Xiaohua Chen
  • Yucan Zhou
  • Dayan Wu
  • Wanqian Zhang
  • Yu Zhou
  • Bo Li
  • Weiping Wang

Real-world data often follows a long-tailed distribution, which heavily degrades the performance of existing classification algorithms. A key issue is that samples in tail categories fail to depict their intra-class diversity. Humans can imagine a sample in new poses, scenes, and view angles using prior knowledge, even when seeing a category for the first time. Inspired by this, we propose a novel reasoning-based implicit semantic data augmentation method that borrows transformation directions from other classes. Since the covariance matrix of each category represents its feature transformation directions, we can sample new directions from similar categories to generate genuinely different instances. Specifically, the long-tailed data is first used to train a backbone and a classifier. Then, a covariance matrix for each category is estimated, and a knowledge graph is constructed to store the relations between any two categories. Finally, tail samples are adaptively enhanced by propagating information from all similar categories in the knowledge graph. Experimental results on CIFAR-100-LT, ImageNet-LT, and iNaturalist 2018 demonstrate the effectiveness of our proposed method compared with state-of-the-art methods.
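
The core move, perturbing a tail-class feature along directions sampled from a similar class's covariance, is easy to sketch. Diagonal covariances, 3-D features, and the function names below are illustrative simplifications, not the paper's formulation.

```python
# Toy of reasoning-based implicit augmentation: a tail sample borrows
# transformation directions from a similar, better-populated class by
# sampling from that class's (here diagonal) covariance. All values
# and names are illustrative assumptions.
import random

def augment_from_similar(tail_feature, similar_cov_diag, rng):
    """Perturb a tail feature with a direction drawn from the similar
    class's diagonal covariance, yielding a new virtual instance."""
    delta = [rng.gauss(0.0, var ** 0.5) for var in similar_cov_diag]
    return [x + d for x, d in zip(tail_feature, delta)]

rng = random.Random(42)
tail = [1.0, 0.5, -0.2]
head_cov = [0.04, 0.09, 0.01]   # rich intra-class diversity of a head class
virtual = [augment_from_similar(tail, head_cov, rng) for _ in range(5)]
```

Each virtual instance keeps the tail sample's identity while inheriting the head class's diversity, which is exactly what the missing intra-class variety of tail categories calls for.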

IJCAI Conference 2022 Conference Paper

Neutral Utterances are Also Causes: Enhancing Conversational Causal Emotion Entailment with Social Commonsense Knowledge

  • Jiangnan Li
  • Fandong Meng
  • Zheng Lin
  • Rui Liu
  • Peng Fu
  • Yanan Cao
  • Weiping Wang
  • Jie Zhou

Conversational Causal Emotion Entailment aims to detect causal utterances for a non-neutral targeted utterance from a conversation. In this work, we build conversations as graphs to overcome implicit contextual modelling of the original entailment style. Following the previous work, we further introduce the emotion information into graphs. Emotion information can markedly promote the detection of causal utterances whose emotion is the same as the targeted utterance. However, it is still hard to detect causal utterances with different emotions, especially neutral ones. The reason is that models are limited in reasoning causal clues and passing them between utterances. To alleviate this problem, we introduce social commonsense knowledge (CSK) and propose a Knowledge Enhanced Conversation graph (KEC). KEC propagates the CSK between two utterances. As not all CSK is emotionally suitable for utterances, we therefore propose a sentiment-realized knowledge selecting strategy to filter CSK. To process KEC, we further construct the Knowledge Enhanced Directed Acyclic Graph networks. Experimental results show that our method outperforms baselines and infers more causes with different emotions from the targeted utterance.

NeurIPS Conference 2022 Conference Paper

Randomized Sketches for Clustering: Fast and Optimal Kernel k-Means

  • Rong Yin
  • Yong Liu
  • Weiping Wang
  • Dan Meng

Kernel k-means is arguably one of the most common approaches to clustering. In this paper, we investigate the efficiency of kernel k-means combined with randomized sketches in terms of both statistical analysis and computational requirements. More precisely, we propose a unified randomized sketches framework for kernel k-means and investigate its excess risk bounds, obtaining the state-of-the-art risk bound with only a fraction of the computations. Indeed, we prove that it suffices to choose a sketch dimension of Ω(√n) to obtain the same accuracy as exact kernel k-means while greatly reducing the computational costs, for sub-Gaussian sketches, randomized orthogonal system (ROS) sketches, and Nyström kernel k-means, where n is the number of samples. To the best of our knowledge, this is the first result of this kind for unsupervised learning. Finally, numerical experiments on simulated data and real-world datasets validate our theoretical analysis.

IJCAI Conference 2021 Conference Paper

Learning Class-Transductive Intent Representations for Zero-shot Intent Detection

  • Qingyi Si
  • Yuanxin Liu
  • Peng Fu
  • Zheng Lin
  • Jiangnan Li
  • Weiping Wang

Zero-shot intent detection (ZSID) aims to deal with the continuously emerging intents without annotated training data. However, existing ZSID systems suffer from two limitations: 1) They are not good at modeling the relationship between seen and unseen intents. 2) They cannot effectively recognize unseen intents under the generalized intent detection (GZSID) setting. A critical problem behind these limitations is that the representations of unseen intents cannot be learned in the training stage. To address this problem, we propose a novel framework that utilizes unseen class labels to learn Class-Transductive Intent Representations (CTIR). Specifically, we allow the model to predict unseen intents during training, with the corresponding label names serving as input utterances. On this basis, we introduce a multi-task learning objective, which encourages the model to learn the distinctions among intents, and a similarity scorer, which estimates the connections among intents more accurately. CTIR is easy to implement and can be integrated with existing ZSID and GZSID methods. Experiments on two real-world datasets show that CTIR brings considerable improvement to the baseline systems.

IJCAI Conference 2021 Conference Paper

Rescuing Deep Hashing from Dead Bits Problem

  • Shu Zhao
  • Dayan Wu
  • Yucan Zhou
  • Bo Li
  • Weiping Wang

Deep hashing methods have shown great retrieval accuracy and efficiency in large-scale image retrieval. How to optimize discrete hash bits is always the focus of deep hashing methods. A common strategy in these methods is to adopt an activation function, e.g., sigmoid() or tanh(), and minimize a quantization loss to approximate discrete values. However, this paradigm may leave more and more hash bits stuck in the wrong saturated area of the activation functions, never to escape. We call this the Dead Bits Problem (DBP). Moreover, the existing quantization loss aggravates DBP as well. In this paper, we propose a simple but effective gradient amplifier that acts before the activation functions to alleviate DBP. We also devise an error-aware quantization loss to further alleviate DBP: it avoids the negative effect of the quantization loss based on the similarity between two images. The proposed gradient amplifier and error-aware quantization loss are compatible with a variety of deep hashing methods. Experimental results on three datasets demonstrate the effectiveness of the proposed gradient amplifier and error-aware quantization loss.
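
The dead-bits phenomenon is easy to see numerically: deep in tanh's saturated region the gradient nearly vanishes, so the bit can no longer move, and a pre-activation rescaling restores a usable signal. The amplification constant and function names below are illustrative, not the paper's setting.

```python
# Tiny numeric illustration of the Dead Bits Problem: a saturated tanh
# pre-activation has an almost-zero gradient, and an amplifier acting
# before the activation rescales the gradient that reaches it. The
# scale factor is an illustrative assumption.
import math

def tanh_grad(z):
    """Derivative of tanh at z: 1 - tanh(z)^2."""
    return 1.0 - math.tanh(z) ** 2

def amplified_grad(z, upstream, scale=10.0):
    """Act before the activation: boost the gradient reaching z."""
    return scale * upstream * tanh_grad(z)

saturated, healthy = 5.0, 0.5
plain = 1.0 * tanh_grad(saturated)       # effectively zero: a dead bit
boosted = amplified_grad(saturated, 1.0) # rescued update signal
```

A healthy pre-activation (|z| small) keeps a large gradient on its own; only the saturated bits need the boost, which is why the amplifier sits before the activation rather than on the loss.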

AAAI Conference 2020 Conference Paper

Automated Spectral Kernel Learning

  • Jian Li
  • Yong Liu
  • Weiping Wang

The generalization performance of kernel methods is largely determined by the kernel, but spectral representations of stationary kernels are both input-independent and output-independent, which limits their application to complicated tasks. In this paper, we propose an efficient learning framework that unifies the process of finding suitable kernels with model training. Using non-stationary spectral kernels and backpropagation w.r.t. the objective, we obtain favorable spectral representations that depend on both inputs and outputs. Further, based on Rademacher complexity, we derive data-dependent generalization error bounds, in which we investigate the effect of these factors and introduce regularization terms to improve performance. Extensive experimental results validate the effectiveness of the proposed algorithm and coincide with our theoretical findings.

AAAI Conference 2020 Conference Paper

Divide-and-Conquer Learning with Nyström: Optimal Rate and Algorithm

  • Rong Yin
  • Yong Liu
  • Lijing Lu
  • Weiping Wang
  • Dan Meng

Kernel Regularized Least Squares (KRLS) is a fundamental learner in machine learning. However, due to its high time and space requirements, it cannot scale to large-scale scenarios. We therefore propose DC-NY, a novel algorithm that combines the divide-and-conquer method, Nyström approximation, conjugate gradient, and preconditioning to scale up KRLS; it attains the same accuracy as exact KRLS with the lowest time and space complexity among state-of-the-art approximate KRLS estimators. We present a theoretical analysis of DC-NY, including a novel error decomposition with optimal statistical accuracy guarantees. Extensive experimental results on several real-world large-scale datasets containing up to 1M data points show that DC-NY significantly outperforms the state-of-the-art approximate KRLS estimators.

AAAI Conference 2020 Conference Paper

Video Cloze Procedure for Self-Supervised Spatio-Temporal Learning

  • Dezhao Luo
  • Chang Liu
  • Yu Zhou
  • Dongbao Yang
  • Can Ma
  • Qixiang Ye
  • Weiping Wang

We propose a novel self-supervised method, referred to as Video Cloze Procedure (VCP), to learn rich spatial-temporal representations. VCP first generates “blanks” by withholding video clips and then creates “options” by applying spatio-temporal operations on the withheld clips. Finally, it fills the blanks with “options” and learns representations by predicting the categories of operations applied on the clips. VCP can act as either a proxy task or a target task in self-supervised learning. As a proxy task, it converts rich self-supervised representations into video clip operations (options), which enhances the flexibility and reduces the complexity of representation learning. As a target task, it can assess learned representation models in a uniform and interpretable manner. With VCP, we train spatial-temporal representation models (3D-CNNs) and apply such models on action recognition and video retrieval tasks. Experiments on commonly used benchmarks show that the trained models outperform the state-of-the-art self-supervised models by significant margins.

IJCAI Conference 2019 Conference Paper

Approximate Manifold Regularization: Scalable Algorithm and Generalization Analysis

  • Jian Li
  • Yong Liu
  • Rong Yin
  • Weiping Wang

Graph-based semi-supervised learning is one of the most popular and successful semi-supervised learning approaches. Unfortunately, it suffers from high time and space complexity, at least quadratic in the number of training samples. In this paper, we propose an efficient graph-based semi-supervised algorithm with a sound theoretical guarantee. The proposed method combines Nyström subsampling and preconditioned conjugate gradient descent, substantially improving computational efficiency and reducing memory requirements. Extensive empirical results reveal that our method achieves state-of-the-art performance in a short time even with limited computing resources.

AAAI Conference 2019 Conference Paper

Community Focusing: Yet Another Query-Dependent Community Detection

  • Zhuo Wang
  • Weiping Wang
  • Chaokun Wang
  • Xiaoyan Gu
  • Bo Li
  • Dan Meng

As a major kind of query-dependent community detection, community search finds a densely connected subgraph containing a set of query nodes. As density is the major consideration of community search, most methods of community search often find a dense subgraph with many vertices far from the query nodes, which are not very related to the query nodes. Motivated by this, a new problem called community focusing (CF) is studied. It finds a community where the members are close and densely connected to the query nodes. A distance-sensitive dense subgraph structure called β-attention-core is proposed to remove the vertices loosely connected to or far from the query nodes, and a combinational density is designed to guarantee the density of a subgraph. Then CF is formalized as finding a subgraph with the largest combinational density among the β-attention-core subgraphs containing the query nodes with the largest β. Thereafter, effective methods are devised for CF. Furthermore, a speed-up strategy is developed to make the methods scalable to large networks. Extensive experimental results on real and synthetic networks demonstrate the performance of our methods.

IJCAI Conference 2019 Conference Paper

Multi-Class Learning using Unlabeled Samples: Theory and Algorithm

  • Jian Li
  • Yong Liu
  • Rong Yin
  • Weiping Wang

In this paper, we investigate the generalization performance of multi-class classification, for which we obtain a sharper error bound by using the notion of local Rademacher complexity and additional unlabeled samples, substantially improving the state-of-the-art bounds of existing multi-class learning methods. This statistical analysis motivates us to devise an efficient multi-class learning framework with local Rademacher complexity and Laplacian regularization. Coinciding with the theoretical analysis, experimental results demonstrate that the proposed approach achieves better performance.
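The Laplacian-regularization ingredient can be sketched concretely: build a similarity graph over labeled plus unlabeled points and penalize predictions that vary across strongly connected neighbors. This is a generic illustration of the technique, not the paper's specific multi-class framework; the RBF similarity and its bandwidth are demo assumptions.

```python
import numpy as np

# Unnormalized graph Laplacian L = D - W from an RBF similarity graph.
# The smoothness penalty f^T L f = 0.5 * sum_ij W_ij (f_i - f_j)^2 is what
# Laplacian regularization adds to the training objective.
def graph_laplacian(X, gamma=1.0):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-gamma * d2)
    np.fill_diagonal(W, 0.0)
    return np.diag(W.sum(1)) - W

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 2))       # labeled + unlabeled points together
L = graph_laplacian(X)
f = rng.standard_normal(20)            # a candidate prediction vector
smoothness = f @ L @ f                 # nonnegative: L is positive semidefinite
print(smoothness)
```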

IJCAI Conference 2018 Conference Paper

Fast Cross-Validation

  • Yong Liu
  • Hailun Lin
  • Lizhong Ding
  • Weiping Wang
  • Shizhong Liao

Cross-validation (CV) is the most widely adopted approach for selecting the optimal model. However, CV has high computational complexity because it requires training the learner multiple times, making it prohibitive for large-scale model selection. In this paper, we present an approximate approach to CV based on the theoretical notion of the Bouligand influence function (BIF) and the Nyström method for kernel methods. We first establish the relationship between the BIF and CV, and propose a method to approximate CV via the Taylor expansion of the BIF. Then, we provide a novel computing method to calculate the BIF for a general distribution, and evaluate the BIF for the sample distribution. Finally, we use the Nyström method to accelerate the computation of the BIF matrix, yielding the final approximate CV criterion. The proposed approximate CV requires training only once and is suitable for a wide variety of kernel methods. Experimental results on numerous datasets show that our approximate CV has no statistical discrepancy with the original CV, yet significantly improves efficiency.
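The "train once, recover CV" idea has a classical exact instance that makes the appeal concrete, though it is not the paper's BIF-based method: for ridge regression, leave-one-out residuals follow from a single fit via the hat matrix H = X (X^T X + λI)^{-1} X^T, as e_i^loo = e_i / (1 - H_ii).

```python
import numpy as np

# Exact leave-one-out residuals for ridge regression from ONE fit, instead of
# n separate refits. This is the single-training shortcut in its simplest form.
def ridge_loo_errors(X, y, lam=1.0):
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T)
    resid = y - H @ y
    return resid / (1.0 - np.diag(H))

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(50)

fast = ridge_loo_errors(X, y)

# Brute-force check: n separate leave-one-out fits with the same lam = 1.
brute = np.empty(50)
for i in range(50):
    m = np.arange(50) != i
    w = np.linalg.solve(X[m].T @ X[m] + np.eye(3), X[m].T @ y[m])
    brute[i] = y[i] - X[i] @ w

print(np.allclose(fast, brute))        # the shortcut is exact for ridge
```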

AAAI Conference 2018 Conference Paper

Learning Sentiment-Specific Word Embedding via Global Sentiment Representation

  • Peng Fu
  • Zheng Lin
  • Fengcheng Yuan
  • Weiping Wang
  • Dan Meng

Context-based word embedding learning approaches can model rich semantic and syntactic information. However, they are problematic for sentiment analysis because words with similar contexts but opposite sentiment polarities, such as good and bad, are mapped into close word vectors in the embedding space. Recently, some sentiment embedding learning methods have been proposed, but most of them are designed to work well on sentence-level texts. Directly applying those models to document-level texts often leads to unsatisfactory results. To address this issue, we present a sentiment-specific word embedding learning architecture that utilizes local context information as well as a global sentiment representation. The architecture is applicable for both sentence-level and document-level texts. We take the global sentiment representation as a simple average of word embeddings in the text, and use a corruption strategy as a sentiment-dependent regularization. Extensive experiments conducted on several benchmark datasets demonstrate that the proposed architecture outperforms the state-of-the-art methods for sentiment classification.
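The global representation described above (a simple average of the text's word embeddings) is easy to sketch; the dropout-style corruption below is a hypothetical stand-in for the paper's corruption strategy, and the toy vocabulary and embedding dimension are demo assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {"good": 0, "movie": 1, "bad": 2, "plot": 3}
E = rng.standard_normal((len(vocab), 8))     # toy word embedding matrix

# Global representation: average of the document's word embeddings, with an
# optional corruption that randomly drops words before averaging.
def doc_representation(words, drop_prob=0.0):
    ids = np.array([vocab[w] for w in words])
    keep = rng.random(len(ids)) >= drop_prob
    if not keep.any():
        keep[:] = True                        # never drop every word
    return E[ids[keep]].mean(axis=0)

doc = ["good", "movie", "good", "plot"]
clean = doc_representation(doc)
corrupted = doc_representation(doc, drop_prob=0.5)
print(clean.shape, corrupted.shape)           # both vectors of dimension 8
```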

NeurIPS Conference 2018 Conference Paper

Multi-Class Learning: From Theory to Algorithm

  • Jian Li
  • Yong Liu
  • Rong Yin
  • Hua Zhang
  • Lizhong Ding
  • Weiping Wang

In this paper, we study the generalization performance of multi-class classification and obtain a sharper data-dependent generalization error bound with a fast convergence rate, substantially improving the state-of-the-art bounds in the existing data-dependent generalization analysis. The theoretical analysis motivates us to devise two effective multi-class kernel learning algorithms with statistical guarantees. Experimental results show that our proposed methods significantly outperform existing multi-class classification methods.

IJCAI Conference 2017 Conference Paper

Efficient Kernel Selection via Spectral Analysis

  • Jian Li
  • Yong Liu
  • Hailun Lin
  • Yinliang Yue
  • Weiping Wang

Kernel selection is a fundamental problem of kernel methods. Existing measures for kernel selection either provide little theoretical guarantee or have high computational complexity. In this paper, we propose a novel kernel selection criterion based on a newly defined spectral measure of a kernel matrix, with a sound theoretical foundation and high computational efficiency. We first show that the spectral measure can be used to derive generalization bounds for some kernel-based algorithms. By minimizing the derived generalization bounds, we propose the kernel selection criterion with spectral measure. Moreover, we demonstrate that the popular minimum graph cut and maximum mean discrepancy are two special cases of the proposed criterion. Experimental results on numerous datasets show that our proposed criterion not only gives results comparable to the state-of-the-art criteria, but also significantly improves efficiency.

AAAI Conference 2017 Conference Paper

Generalization Analysis for Ranking Using Integral Operator

  • Yong Liu
  • Shizhong Liao
  • Hailun Lin
  • Yinliang Yue
  • Weiping Wang

The study on generalization performance of ranking algorithms is one of the fundamental issues in ranking learning theory. Although several generalization bounds have been proposed based on different measures, the convergence rates of the existing bounds are usually at most O(1/√n), where n is the size of the data set. In this paper, we derive novel generalization bounds for regularized ranking in a reproducing kernel Hilbert space via the integral operator of the kernel function. We prove that the rates of our bounds are much faster than O(1/√n). Specifically, we first introduce a notion of local Rademacher complexity for ranking, called local ranking Rademacher complexity, which is used to measure the complexity of the space of loss functions of the ranking. Then, we use the local ranking Rademacher complexity to obtain a basic generalization bound. Finally, we establish the relationship between the local Rademacher complexity and the eigenvalues of the integral operator, and further derive sharp generalization bounds with faster convergence rates.

AAAI Conference 2017 Conference Paper

Infinite Kernel Learning: Generalization Bounds and Algorithms

  • Yong Liu
  • Shizhong Liao
  • Hailun Lin
  • Yinliang Yue
  • Weiping Wang

Kernel learning is a fundamental problem in both research on and applications of kernel methods. Existing kernel learning methods commonly use some measure of generalization error to learn the optimal kernel in a convex (or conic) combination of prescribed basic kernels. However, the generalization bounds derived from these measures usually have slow convergence rates, and the basic kernels are finite and must be specified in advance. In this paper, we propose a new kernel learning method based on a novel measure of generalization error, called principal eigenvalue proportion (PEP), which can learn the optimal kernel with sharp generalization bounds over the convex hull of a possibly infinite set of basic kernels. We first derive sharp generalization bounds based on the PEP measure. Then we design two kernel learning algorithms, for finite kernels and infinite kernels respectively, in which the derived sharp generalization bounds are exploited to guarantee faster convergence rates; moreover, the basic kernels can be learned automatically for infinite kernel learning instead of being prescribed in advance. Theoretical analysis and empirical results demonstrate that the proposed kernel learning method outperforms the state-of-the-art kernel learning methods.