Arrow Research search

Author name cluster

Qiang Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

102 papers
2 author rows

Possible papers

102

TMLR Journal 2026 Journal Article

Muon Optimizes Under Spectral Norm Constraints

  • Lizhang Chen
  • Jonathan Li
  • Qiang Liu

The pursuit of faster optimization algorithms remains an active and important research direction in deep learning. Recently, the Muon optimizer has demonstrated promising empirical performance, but its theoretical foundation remains less understood. In this paper, we bridge this gap and provide a theoretical analysis of Muon by placing it within the Lion-$\mathcal{K}$ family of optimizers. Specifically, we show that Muon corresponds to Lion-$\mathcal{K}$ when equipped with the nuclear norm, and we leverage the theoretical results of Lion-$\mathcal{K}$ to establish that Muon (with decoupled weight decay) implicitly solves an optimization problem that enforces a constraint on the spectral norm of weight matrices. This perspective not only demystifies the implicit regularization effects of Muon but also leads to natural generalizations through varying the choice of convex map $\mathcal{K}$, allowing for the exploration of a broader class of implicitly regularized and constrained optimization algorithms.
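
To make the claimed update concrete, here is a minimal PyTorch sketch of a Muon-style step read through the Lion-$\mathcal{K}$ lens: the momentum is replaced by its nearest semi-orthogonal matrix before being applied, with decoupled weight decay. The function name, hyperparameter values, and the SVD-based orthogonalization (practical implementations typically use a Newton-Schulz iteration instead) are illustrative assumptions, not the paper's code.

```python
import torch

def muon_step(param, grad, momentum, lr=0.02, beta=0.95, weight_decay=0.01):
    """Illustrative Muon-style update on a 2D weight matrix.

    The update direction is the orthogonal factor U @ Vh from the SVD of the
    momentum -- the "matrix sign" that connects Muon to Lion-K with the
    nuclear norm -- and weight decay is applied in decoupled form, which is
    what yields the implicit spectral-norm constraint described above.
    """
    momentum.mul_(beta).add_(grad)                 # momentum accumulation
    U, _, Vh = torch.linalg.svd(momentum, full_matrices=False)
    param.mul_(1 - lr * weight_decay)              # decoupled weight decay
    param.add_(U @ Vh, alpha=-lr)                  # orthogonalized step
    return param, momentum
```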

AAAI Conference 2026 Conference Paper

Q Cache: Visual Attention Is Valuable in Less than Half of Decode Layers for Multimodal Large Language Model

  • Jiedong Zhuang
  • Lu Lu
  • Ming Dai
  • Rui Hu
  • Jian Chen
  • Qiang Liu
  • Haoji Hu

Multimodal large language models (MLLMs) are plagued by exorbitant inference costs attributable to the profusion of visual tokens within the vision encoder. The redundant visual tokens engender a substantial computational load and a key-value (KV) cache footprint bottleneck. Existing approaches focus on token-wise optimization, leveraging diverse intricate token pruning techniques to eliminate non-crucial visual tokens. Nevertheless, these methods often unavoidably undermine the integrity of the KV cache, resulting in failures on long-text generation tasks. To this end, we conduct an in-depth investigation of the model's attention mechanism from a new perspective, and discern that the attention in more than half of all decode layers is semantically similar. Upon this finding, we contend that the attention in certain layers can be streamlined by inheriting the attention from their preceding layers. Consequently, we propose Lazy Attention, an efficient attention mechanism that enables cross-layer sharing of similar attention patterns. It ingeniously reduces layer-wise redundant computation in attention. In Lazy Attention, we develop a novel layer-shared cache, Q Cache, tailored for MLLMs, which facilitates the reuse of queries across adjacent layers. In particular, Q Cache is lightweight and fully compatible with existing inference frameworks, including Flash Attention and the KV cache. Additionally, our method is highly flexible, as it is orthogonal to existing token-wise techniques and can be deployed independently or combined with token pruning approaches. Empirical evaluations on multiple benchmarks demonstrate that our method can reduce KV cache usage by over 35% and achieve a 1.5x throughput improvement, while sacrificing only approximately 1% of performance on various MLLMs. Compared with SOTA token-wise methods, our technique achieves superior accuracy preservation.
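
A rough sketch of the attention-inheritance idea, assuming a simplified single-batch decode stack: layers flagged as "lazy" reuse the attention probabilities computed by their predecessor instead of recomputing scores and the softmax. The function and variable names are hypothetical; the actual Q Cache reuses queries and integrates with Flash Attention and the KV cache.

```python
import torch
import torch.nn.functional as F

def lazy_attention_stack(qs, ks, vs, lazy_layers):
    """qs/ks/vs: per-layer (heads, seq, dim) tensors; lazy_layers: set of ints.

    Layers in lazy_layers inherit the previous layer's attention pattern,
    skipping the score computation entirely -- the redundancy the abstract
    reports in more than half of the decode layers.
    """
    outputs, prev_attn = [], None
    for i, (q, k, v) in enumerate(zip(qs, ks, vs)):
        if i in lazy_layers and prev_attn is not None:
            attn = prev_attn                              # inherit attention
        else:
            scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
            attn = F.softmax(scores, dim=-1)
        prev_attn = attn
        outputs.append(attn @ v)
    return outputs
```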

AAAI Conference 2025 Conference Paper

CoRA: Collaborative Information Perception by Large Language Model’s Weights for Recommendation

  • Yuting Liu
  • Jinghao Zhang
  • Yizhou Dang
  • Yuliang Liang
  • Qiang Liu
  • Guibing Guo
  • Jianzhe Zhao
  • Xingwei Wang

Involving collaborative information in Large Language Models (LLMs) is a promising technique for adapting LLMs for recommendation. Existing methods achieve this by concatenating collaborative features with text tokens into a unified sequence input and then fine-tuning to align these features with LLM's input space. Although effective, in this work we identify two limitations of this approach when adapting LLMs to recommendation tasks, which hinder the integration of general knowledge and collaborative information, resulting in sub-optimal recommendation performance. (1) Fine-tuning LLM with recommendation data can undermine its inherent world knowledge and fundamental competencies, which are crucial for interpreting and inferring recommendation text. (2) Incorporating collaborative features into textual prompts disrupts the semantics of the original prompts, preventing LLM from generating appropriate outputs. In this paper, we propose a new paradigm, Collaborative LoRA (CoRA), with a collaborative query generator. Rather than input space alignment, this method aligns collaborative information with LLM's parameter space, representing it as incremental weights to update LLM's output. This way, LLM perceives collaborative information without altering its general knowledge and text inference capabilities. Specifically, we employ a collaborative filtering model to extract user and item embeddings and inject them into a set number of learnable queries. We then convert collaborative queries into collaborative weights with low-rank properties and merge the collaborative weights into LLM's weights, enabling LLM to perceive the collaborative signals and generate personalized recommendations without fine-tuning or extra collaborative tokens in prompts. Extensive experiments confirm that CoRA effectively integrates collaborative information into LLM, enhancing recommendation performance.
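
As a loose illustration of the parameter-space alignment, the sketch below maps collaborative-filtering embeddings through learnable queries to a low-rank weight increment that is added to a frozen LLM weight. All module names, shapes, and the attention-based injection are assumptions for exposition, not the authors' architecture.

```python
import torch
import torch.nn as nn

class CollaborativeLoRA(nn.Module):
    """Map CF embeddings to a per-example low-rank delta on a frozen weight."""

    def __init__(self, d_llm, d_cf, n_queries=8, rank=4, n_heads=4):
        super().__init__()  # d_llm must be divisible by n_heads
        self.queries = nn.Parameter(torch.randn(n_queries, d_llm) * 0.02)
        self.inject = nn.MultiheadAttention(d_llm, n_heads, kdim=d_cf,
                                            vdim=d_cf, batch_first=True)
        self.to_a = nn.Linear(d_llm, d_llm * rank)
        self.to_b = nn.Linear(d_llm, d_llm * rank)
        self.rank, self.d_llm = rank, d_llm

    def forward(self, frozen_weight, cf_embeddings):
        # cf_embeddings: (batch, n_user+item, d_cf) from a CF model
        b = cf_embeddings.shape[0]
        q = self.queries.unsqueeze(0).expand(b, -1, -1)
        collab, _ = self.inject(q, cf_embeddings, cf_embeddings)
        pooled = collab.mean(dim=1)                          # (b, d_llm)
        A = self.to_a(pooled).view(b, self.d_llm, self.rank)
        B = self.to_b(pooled).view(b, self.rank, self.d_llm)
        return frozen_weight + A @ B / self.rank             # weight delta
```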

ICLR Conference 2025 Conference Paper

PN-GAIL: Leveraging Non-optimal Information from Imperfect Demonstrations

  • Qiang Liu
  • Huiqiao Fu
  • Kaiqiang Tang
  • Chunlin Chen 0001
  • Daoyi Dong

Imitation learning aims at constructing an optimal policy by emulating expert demonstrations. However, the prevailing approaches in this domain typically presume that the demonstrations are optimal, an assumption that seldom holds true in the complexities of real-world applications. The data collected in practical scenarios often contains imperfections, encompassing both optimal and non-optimal examples. In this study, we propose Positive-Negative Generative Adversarial Imitation Learning (PN-GAIL), a novel approach that falls within the framework of Generative Adversarial Imitation Learning (GAIL). PN-GAIL innovatively leverages non-optimal information from imperfect demonstrations, allowing the discriminator to comprehensively assess the positive and negative risks associated with these demonstrations. Furthermore, it requires only a small subset of labeled confidence scores. Theoretical analysis indicates that PN-GAIL deviates from the non-optimal data while mimicking imperfect demonstrations. Experimental results demonstrate that PN-GAIL surpasses conventional baseline methods in dealing with imperfect demonstrations, thereby significantly augmenting the practical utility of imitation learning in real-world contexts. Our codes are available at https://github.com/QiangLiuT/PN-GAIL.

NeurIPS Conference 2025 Conference Paper

Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing

  • Junfei Wu
  • Jian Guan
  • Kaituo Feng
  • Qiang Liu
  • Shu Wu
  • Liang Wang
  • Wei Wu
  • Tieniu Tan

As textual reasoning with large language models (LLMs) has advanced significantly, there has been growing interest in enhancing the multimodal reasoning capabilities of large vision-language models (LVLMs). However, existing methods primarily approach multimodal reasoning in a straightforward, text-centric manner, where both reasoning and answer derivation are conducted purely through text, with the only difference being the presence of multimodal input. As a result, these methods often encounter fundamental limitations in spatial reasoning tasks that demand precise geometric understanding and continuous spatial tracking, capabilities that humans achieve through mental visualization and manipulation. To address these limitations, we propose drawing to reason in space, a novel paradigm that enables LVLMs to reason through elementary drawing operations in the visual space. By equipping models with basic drawing operations, including annotating bounding boxes and drawing auxiliary lines, we empower them to express and analyze spatial relationships through direct visual manipulation, meanwhile avoiding the performance ceiling imposed by specialized perception tools in previous tool-integrated reasoning approaches. To cultivate this capability, we develop a three-stage training framework: cold-start training with synthetic data to establish basic drawing abilities, reflective rejection sampling to enhance self-reflection behaviors, and reinforcement learning to directly optimize for target rewards. Extensive experiments demonstrate that our model, named Spark, consistently outperforms existing methods across diverse spatial reasoning benchmarks involving maze navigation, static spatial reasoning, video-based reasoning, and multi-view-based reasoning tasks, with an average improvement of 11.5%. Ablation studies reveal the critical role of each training stage, with reflective rejection sampling particularly enhancing the model's self-correction capabilities and reasoning potential.

AAAI Conference 2025 Conference Paper

ST3: Accelerating Multimodal Large Language Model by Spatial-Temporal Visual Token Trimming

  • Jiedong Zhuang
  • Lu Lu
  • Ming Dai
  • Rui Hu
  • Jian Chen
  • Qiang Liu
  • Haoji Hu

Multimodal large language models (MLLMs) enhance their perceptual capabilities by integrating visual and textual information. However, processing the massive number of visual tokens incurs a significant computational cost. Existing analysis of the MLLM attention mechanisms remains shallow, leading to coarse-grained token pruning strategies that fail to effectively balance speed and accuracy. In this paper, we conduct a comprehensive investigation of MLLM attention mechanisms with LLaVA. We find that numerous visual tokens and partial attention computations are redundant during the decoding process. Based on this insight, we propose Spatial-Temporal Visual Token Trimming (ST3), a framework designed to accelerate MLLM inference without retraining. ST3 consists of two primary components: 1) Progressive Visual Token Pruning (PVTP), which eliminates inattentive visual tokens across layers, and 2) Visual Token Annealing (VTA), which dynamically reduces the number of visual tokens in each layer as the generated tokens grow. Together, these techniques deliver around 2x faster inference with only about 30% KV cache memory compared to the original LLaVA, while maintaining consistent performance across various datasets. Crucially, ST3 can be seamlessly integrated into existing pre-trained MLLMs, providing a plug-and-play solution for efficient inference.
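
The two components lend themselves to a compact sketch, shown below with hypothetical names: pruning keeps the visual tokens that receive the most attention mass, and annealing shrinks the keep ratio as decoding proceeds. This is a schematic reading of PVTP and VTA, not the released implementation.

```python
import torch

def prune_visual_tokens(visual_hidden, attn_to_visual, keep_ratio):
    """PVTP-style step: drop visual tokens that text tokens rarely attend to.

    visual_hidden: (n_visual, d); attn_to_visual: (n_visual,) mean attention.
    """
    k = max(1, int(keep_ratio * visual_hidden.shape[0]))
    idx = attn_to_visual.topk(k).indices.sort().values   # keep original order
    return visual_hidden[idx], idx

def annealed_keep_ratio(step, r0=1.0, r_min=0.1, rate=0.02):
    """VTA-style schedule: fewer visual tokens as more text is generated."""
    return max(r_min, r0 * (1.0 - rate * step))
```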

ICML Conference 2024 Conference Paper

A Computational Framework for Solving Wasserstein Lagrangian Flows

  • Kirill Neklyudov
  • Rob Brekelmans
  • Alexander Tong 0001
  • Lazar Atanackovic
  • Qiang Liu
  • Alireza Makhzani

The dynamical formulation of optimal transport can be extended through various choices of the underlying geometry (kinetic energy) and the regularization of density paths (potential energy). These combinations yield different variational problems (Lagrangians), encompassing many variations of the optimal transport problem such as the Schrödinger bridge, unbalanced optimal transport, and optimal transport with physical constraints, among others. In general, the optimal density path is unknown, and solving these variational problems can be computationally challenging. We propose a novel deep learning based framework approaching all of these problems from a unified perspective. Leveraging the dual formulation of the Lagrangians, our method does not require simulating or backpropagating through the trajectories of the learned dynamics, and does not need access to optimal couplings. We showcase the versatility of the proposed framework by outperforming previous approaches for single-cell trajectory inference, where incorporating prior knowledge into the dynamics is crucial for correct predictions.

NeurIPS Conference 2024 Conference Paper

AdaFlow: Imitation Learning with Variance-Adaptive Flow-Based Policies

  • Xixi Hu
  • Bo Liu
  • Xingchao Liu
  • Qiang Liu

Diffusion-based imitation learning improves Behavioral Cloning (BC) on multi-modal decision-making, but comes at the cost of significantly slower inference due to the recursion in the diffusion process. It urges us to design efficient policy generators while keeping the ability to generate diverse actions. To address this challenge, we propose AdaFlow, an imitation learning framework based on flow-based generative modeling. AdaFlow represents the policy with state-conditioned ordinary differential equations (ODEs), which are known as probability flows. We reveal an intriguing connection between the conditional variance of their training loss and the discretization error of the ODEs. With this insight, we propose a variance-adaptive ODE solver that can adjust its step size in the inference stage, making AdaFlow an adaptive decision-maker, offering rapid inference without sacrificing diversity. Interestingly, it automatically reduces to a one-step generator when the action distribution is uni-modal. Our comprehensive empirical evaluation shows that AdaFlow achieves high performance with fast inference speed.
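
The variance-adaptive solver can be sketched in a few lines, under the assumption of two learned callables: a probability-flow velocity and an estimator of the conditional training-loss variance. Names and the specific step-size rule are illustrative; the point is that near-zero variance lets the solver take one big step.

```python
import torch

@torch.no_grad()
def adaptive_flow_sampler(velocity_fn, var_fn, state, action_dim,
                          eta=1.0, max_steps=10):
    """Integrate dx/dt = velocity_fn(x, t, state) with variance-scaled steps.

    velocity_fn(x, t, state): learned flow velocity; var_fn(x, t, state):
    estimated conditional variance. High variance (multi-modal actions)
    yields small steps; zero variance collapses to one-step generation.
    """
    x, t = torch.randn(action_dim), 0.0
    for _ in range(max_steps):
        v = velocity_fn(x, t, state)
        sigma2 = float(var_fn(x, t, state))
        dt = min(1.0 - t, 1.0 / (1.0 + eta * sigma2))   # adaptive step size
        x, t = x + dt * v, t + dt
        if t >= 1.0:
            break
    return x
```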

NeurIPS Conference 2024 Conference Paper

Beyond Efficiency: Molecular Data Pruning for Enhanced Generalization

  • Dingshuo Chen
  • Zhixun Li
  • Yuyan Ni
  • Guibin Zhang
  • Ding Wang
  • Qiang Liu
  • Shu Wu
  • Jeffrey X. Yu

With the emergence of various molecular tasks and massive datasets, how to perform efficient training has become an urgent yet under-explored issue in the area. Data pruning (DP), as an oft-stated approach to saving training burdens, filters out less influential samples to form a coreset for training. However, the increasing reliance on pretrained models for molecular tasks renders traditional in-domain DP methods incompatible. Therefore, we propose a Molecular data Pruning framework for enhanced Generalization (MolPeg), which focuses on the source-free data pruning scenario, where data pruning is applied with pretrained models. By maintaining two models with different updating paces during training, we introduce a novel scoring function to measure the informativeness of samples based on the loss discrepancy. As a plug-and-play framework, MolPeg realizes the perception of both source and target domains and consistently outperforms existing DP methods across four downstream tasks. Remarkably, it can surpass the performance obtained from full-dataset training, even when pruning up to 60-70% of the data on the HIV and PCBA datasets. Our work suggests that the discovery of effective data-pruning metrics could provide a viable path to both enhanced efficiency and superior generalization in transfer learning.
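
A minimal sketch of the loss-discrepancy score, assuming two copies of the pretrained model updated at different paces and a per-example loss function accepting `reduction="none"` (e.g. `F.binary_cross_entropy_with_logits`); details such as how the paces are set are not taken from the paper.

```python
import torch

@torch.no_grad()
def molpeg_informativeness(fast_model, slow_model, inputs, targets, loss_fn):
    """Score each sample by how much the fast and slow models disagree.

    Samples with the largest per-example loss discrepancy are treated as the
    most informative and are retained in the coreset.
    """
    loss_fast = loss_fn(fast_model(inputs), targets, reduction="none")
    loss_slow = loss_fn(slow_model(inputs), targets, reduction="none")
    return (loss_fast - loss_slow).abs()        # higher score = keep
```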

NeurIPS Conference 2024 Conference Paper

Communication Efficient Distributed Training with Distributed Lion

  • Bo Liu
  • Lemeng Wu
  • Lizhang Chen
  • Kaizhao Liang
  • Jiaxu Zhu
  • Chen Liang
  • Raghuraman Krishnamoorthi
  • Qiang Liu

The Lion optimizer has been a promising competitor to AdamW for training large AI models, with advantages in memory, computation, and sample efficiency. In this paper, we introduce Distributed Lion, an innovative adaptation of Lion for distributed training environments. Leveraging the sign operator in Lion, our Distributed Lion only requires communicating binary or lower-precision vectors between workers and the center server, significantly reducing the communication cost. Our theoretical analysis confirms Distributed Lion's convergence properties. Empirical results demonstrate its robustness across a range of tasks, worker counts, and batch sizes, on both vision and language problems. Notably, Distributed Lion attains comparable performance to standard Lion or AdamW optimizers applied on aggregated gradients, but with significantly reduced communication bandwidth. This feature is particularly advantageous for training large models. In addition, we also demonstrate that Distributed Lion presents a more favorable performance-bandwidth balance compared to existing efficient distributed methods such as deep gradient compression and ternary gradients.
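
Because each worker only ships the sign of its Lion update, one communication round can be sketched as below (a single-tensor toy with illustrative hyperparameters; the paper also analyzes aggregations beyond the majority vote shown here).

```python
import torch

def distributed_lion_round(params, worker_grads, momenta, lr=1e-4,
                           beta1=0.9, beta2=0.99, weight_decay=0.01):
    """One round: workers send sign vectors, the server majority-votes them.

    Each worker computes the usual Lion direction sign(b1*m + (1-b1)*g)
    locally, so only one bit per coordinate travels to the server.
    """
    votes = []
    for g, m in zip(worker_grads, momenta):
        votes.append(torch.sign(beta1 * m + (1 - beta1) * g))
        m.mul_(beta2).add_(g, alpha=1 - beta2)          # local momentum update
    direction = torch.sign(torch.stack(votes).sum(dim=0))   # majority vote
    params.mul_(1 - lr * weight_decay)                  # decoupled weight decay
    params.add_(direction, alpha=-lr)
    return params
```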

NeurIPS Conference 2024 Conference Paper

Enhancing Protein Mutation Effect Prediction through a Retrieval-Augmented Framework

  • Ruihan Guo
  • Rui Wang
  • Ruidong Wu
  • Zhizhou Ren
  • Jiahan Li
  • Shitong Luo
  • Zuofan Wu
  • Qiang Liu

Predicting the effects of protein mutations is crucial for analyzing protein functions and understanding genetic diseases. However, existing models struggle to effectively extract mutation-related local structure motifs from protein databases, which hinders their predictive accuracy and robustness. To tackle this problem, we design a novel retrieval-augmented framework for incorporating similar structure information in known protein structures. We create a vector database consisting of local structure motif embeddings from a pre-trained protein structure encoder, which allows for efficient retrieval of similar local structure motifs during mutation effect prediction. Our findings demonstrate that leveraging this method results in the SOTA performance across multiple protein mutation prediction datasets, and offers a scalable solution for studying mutation effects.
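
The retrieval step itself is straightforward; a minimal stand-in for the vector-database lookup (names assumed, and a real system would use an approximate-nearest-neighbor index rather than a dense matrix product) might look like:

```python
import torch
import torch.nn.functional as F

def retrieve_similar_motifs(query_embed, motif_db, k=16):
    """Return indices/scores of the k motifs most similar to the query.

    query_embed: (d,) local-structure embedding from the pre-trained encoder;
    motif_db: (n_db, d) precomputed motif embeddings.
    """
    q = F.normalize(query_embed, dim=-1)
    db = F.normalize(motif_db, dim=-1)
    sims = db @ q                      # cosine similarity, (n_db,)
    top = sims.topk(k)
    return top.indices, top.values
```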

JBHI Journal 2024 Journal Article

Explainable Federated Medical Image Analysis Through Causal Learning and Blockchain

  • Junsheng Mu
  • Michel Kadoch
  • Tongtong Yuan
  • Wenzhe Lv
  • Qiang Liu
  • Bohan Li

Federated learning (FL) enables collaborative training of machine learning models across distributed medical data sources without compromising privacy. However, applying FL to medical image analysis presents challenges like high communication overhead and data heterogeneity. This paper proposes novel FL techniques using explainable artificial intelligence (XAI) for efficient, accurate, and trustworthy analysis. A heterogeneity-aware causal learning approach selectively sparsifies model weights based on their causal contributions, significantly reducing communication requirements while retaining performance and improving interpretability. Furthermore, blockchain provides decentralized quality assessment of client datasets. The assessment scores adjust aggregation weights so higher-quality data has more influence during training, improving model generalization. Comprehensive experiments show our XAI-integrated FL framework enhances efficiency, accuracy and interpretability. The causal learning method decreases communication overhead while maintaining segmentation accuracy. The blockchain-based data valuation mitigates issues from low-quality local datasets. Our framework provides essential model explanations and trust mechanisms, making FL viable for clinical adoption in medical image analysis.

AAAI Conference 2024 Conference Paper

Heterogeneous Graph Reasoning for Fact Checking over Texts and Tables

  • Haisong Gong
  • Weizhi Xu
  • Shu Wu
  • Qiang Liu
  • Liang Wang

Fact checking aims to predict claim veracity by reasoning over multiple evidence pieces. It usually involves evidence retrieval and veracity reasoning. In this paper, we focus on the latter, reasoning over unstructured text and structured table information. Previous works have primarily relied on fine-tuning pretrained language models or training homogeneous-graph-based models. Despite their effectiveness, we argue that they fail to explore the rich semantic information underlying the evidence with different structures. To address this, we propose a novel word-level Heterogeneous-graph-based model for Fact Checking over unstructured and structured information, namely HeterFC. Our approach leverages a heterogeneous evidence graph, with words as nodes and thoughtfully designed edges representing different evidence properties. We perform information propagation via a relational graph neural network, facilitating interactions between claims and evidence. An attention-based method is utilized to integrate information, combined with a language model for generating predictions. We introduce a multitask loss function to account for potential inaccuracies in evidence retrieval. Comprehensive experiments on the large fact checking dataset FEVEROUS demonstrate the effectiveness of HeterFC. Code will be released at: https://github.com/Deno-V/HeterFC.

AAAI Conference 2024 Conference Paper

Layer Compression of Deep Networks with Straight Flows

  • Chengyue Gong
  • Xiaocong Du
  • Bhargav Bhushanam
  • Lemeng Wu
  • Xingchao Liu
  • Dhruv Choudhary
  • Arun Kejariwal
  • Qiang Liu

Very deep neural networks lead to significantly better performance on various real tasks. However, they usually cause slow inference and are hard to deploy on real-world devices. How to reduce the number of layers to save memory and accelerate inference is therefore an important question. In this work, we introduce an intermediate objective, a continuous-time network, before distilling deep networks into shallow networks. First, we distill a given deep network into a continuous-time neural flow model, which can be discretized with an ODE solver; inference then requires passing through the network multiple times. By forcing the flow transport trajectory to be straight lines, we find that it is easier to compress the infinite-step model into a one-step neural flow model, which only requires passing through the flow model once. Second, we refine the one-step flow model together with the final head layer with knowledge distillation, and finally we can replace the given deep network with this one-step flow network. Empirically, we demonstrate that our method outperforms direct distillation and other baselines on different model architectures (e.g. ResNet, ViT) on image classification and semantic segmentation tasks. We also show that our distilled model naturally serves as an early-exit dynamic inference model.

NeurIPS Conference 2024 Conference Paper

Memory-Efficient LLM Training with Online Subspace Descent

  • Kaizhao Liang
  • Bo Liu
  • Lizhang Chen
  • Qiang Liu

Recently, a wide range of memory-efficient LLM training algorithms have gained substantial popularity. These methods leverage the low-rank structure of gradients to project optimizer states into a subspace using a projection matrix found by singular value decomposition (SVD). However, the convergence of these algorithms is highly dependent on the update rules of their projection matrix. In this work, we provide the first convergence guarantee for arbitrary update rules of the projection matrix. This guarantee is generally applicable to optimizers that can be analyzed with Hamiltonian descent, including most common ones, such as Lion and Adam. Inspired by our theoretical understanding, we propose Online Subspace Descent, a new family of subspace descent optimizers without SVD. Instead of updating the projection matrix with eigenvectors, Online Subspace Descent updates the projection matrix with online PCA. Online Subspace Descent is flexible and introduces only minimal overhead to training. We demonstrate that, for the task of pretraining LLaMA models ranging from 60M to 1B parameters on the C4 dataset, Online Subspace Descent achieves lower perplexity than state-of-the-art low-rank training methods across different settings and narrows the gap with full-rank baselines.
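
A minimal sketch of one step, assuming a matrix-shaped gradient `G` of shape (m, n) and a rank-r basis `P` of shape (m, r): the basis follows an Oja-style online-PCA update instead of a periodic SVD, and the optimizer state lives in the subspace. Names and the momentum-style state are illustrative.

```python
import torch

def online_subspace_step(param, G, P, state, lr=1e-3, pca_lr=1e-2, beta=0.9):
    """One Online-Subspace-Descent-style step (illustrative).

    P is nudged toward the gradient's dominant subspace by an online-PCA
    (Oja) update and re-orthonormalized; the paper's convergence guarantee
    covers arbitrary update rules for P, of which this is one instance.
    """
    P.add_(pca_lr * G @ (G.T @ P))            # Oja step toward top subspace
    P.copy_(torch.linalg.qr(P).Q)             # re-orthonormalize the basis
    g_low = P.T @ G                           # project gradient, (r, n)
    state.mul_(beta).add_(g_low)              # e.g. momentum in the subspace
    param.add_(P @ state, alpha=-lr)          # map the update back
    return param, P, state
```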

NeurIPS Conference 2024 Conference Paper

PeRFlow: Piecewise Rectified Flow as Universal Plug-and-Play Accelerator

  • Hanshu Yan
  • Xingchao Liu
  • Jiachun Pan
  • Jun Hao Liew
  • Qiang Liu
  • Jiashi Feng

We present Piecewise Rectified Flow (PeRFlow), a flow-based method for accelerating diffusion models. PeRFlow divides the sampling process of generative flows into several time windows and straightens the trajectories in each interval via the reflow operation, thereby approaching piecewise linear flows. PeRFlow achieves superior performance in few-step generation. Moreover, through dedicated parameterizations, the PeRFlow models inherit knowledge from the pretrained diffusion models. Thus, the training converges fast and the obtained models show advantageous transfer ability, serving as universal plug-and-play accelerators that are compatible with various workflows based on the pre-trained diffusion models.
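
Per window, the training signal reduces to a straightening objective; a hedged sketch (variable names assumed, teacher-solving and parameterization details omitted) is:

```python
import torch

def piecewise_reflow_loss(v_theta, x_start, x_end, t0, t1):
    """Reflow restricted to one time window [t0, t1].

    x_start: samples at t0; x_end: the teacher ODE's solutions at t1. The
    student velocity is trained to be constant along the straight segment
    joining them, which is what makes each window traversable in one or a
    few steps.
    """
    bsz = x_start.shape[0]
    tau = torch.rand(bsz, 1) * (t1 - t0) + t0        # random time in window
    w = (tau - t0) / (t1 - t0)
    x_tau = (1 - w) * x_start + w * x_end            # linear interpolation
    target = (x_end - x_start) / (t1 - t0)           # constant velocity
    return (v_theta(x_tau, tau) - target).pow(2).mean()
```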

NeurIPS Conference 2024 Conference Paper

Pin-Tuning: Parameter-Efficient In-Context Tuning for Few-Shot Molecular Property Prediction

  • Qiang Liu
  • Shaozhen Liu
  • Xin Sun
  • Shu Wu
  • Liang Wang

Molecular property prediction (MPP) is integral to drug discovery and material science, but often faces the challenge of data scarcity in real-world scenarios. To address this, few-shot molecular property prediction (FSMPP) has been developed. Unlike other few-shot tasks, FSMPP typically employs a pre-trained molecular encoder and a context-aware classifier, benefiting from molecular pre-training and molecular context information. Despite these advancements, existing methods struggle with the ineffective fine-tuning of pre-trained encoders. We attribute this issue to the imbalance between the abundance of tunable parameters and the scarcity of labeled molecules, and the lack of contextual perceptiveness in the encoders. To overcome this hurdle, we propose a parameter-efficient in-context tuning method, named Pin-Tuning. Specifically, we propose a lightweight adapter for pre-trained message passing layers (MP-Adapter) and Bayesian weight consolidation for pre-trained atom/bond embedding layers (Emb-BWC), to achieve parameter-efficient tuning while preventing over-fitting and catastrophic forgetting. Additionally, we enhance the MP-Adapters with contextual perceptiveness. This innovation allows for in-context tuning of the pre-trained encoder, thereby improving its adaptability for specific FSMPP tasks. When evaluated on public datasets, our method demonstrates superior tuning with fewer trainable parameters, improving few-shot predictive performance.
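
In spirit, the MP-Adapter is a small trainable bottleneck around a frozen message-passing layer; the sketch below shows that skeleton only (the contextual conditioning and the Emb-BWC regularizer are omitted, and all names are illustrative).

```python
import torch
import torch.nn as nn

class MPAdapter(nn.Module):
    """Bottleneck adapter over a frozen pre-trained message-passing layer."""

    def __init__(self, mp_layer, d_model, d_bottleneck=16):
        super().__init__()
        self.mp_layer = mp_layer
        for p in self.mp_layer.parameters():
            p.requires_grad_(False)        # keep the pre-trained layer frozen
        self.down = nn.Linear(d_model, d_bottleneck)
        self.up = nn.Linear(d_bottleneck, d_model)
        nn.init.zeros_(self.up.weight)     # start as the identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, h, *mp_args):
        h = self.mp_layer(h, *mp_args)
        return h + self.up(torch.relu(self.down(h)))   # residual bottleneck
```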

NeurIPS Conference 2024 Conference Paper

Quadratic Quantum Variational Monte Carlo

  • Baiyu Su
  • Qiang Liu

This paper introduces the Quadratic Quantum Variational Monte Carlo (Q$^2$VMC) algorithm, an innovative algorithm in quantum chemistry that significantly enhances the efficiency and accuracy of solving the Schrödinger equation. Inspired by the discretization of imaginary-time Schrödinger evolution, Q$^2$VMC employs a novel quadratic update mechanism that integrates seamlessly with neural network-based ansatzes. Our extensive experiments showcase Q$^2$VMC's superior performance, achieving faster convergence and lower ground state energies in wavefunction optimization across various molecular systems, without additional computational cost. This study not only advances the field of computational quantum chemistry but also highlights the important role of discretized evolution in variational quantum algorithms, offering a scalable and robust framework for future quantum research.

AAAI Conference 2024 Conference Paper

Rethinking Graph Masked Autoencoders through Alignment and Uniformity

  • Xiang Tao
  • Qiang Liu
  • Shu Wu
  • Liang Wang

Self-supervised learning on graphs can be bifurcated into contrastive and generative methods. Contrastive methods, also known as graph contrastive learning (GCL), have dominated graph self-supervised learning in the past few years, but the recent advent of graph masked autoencoder (GraphMAE) rekindles the momentum behind generative methods. Despite the empirical success of GraphMAE, there is still a dearth of theoretical understanding regarding its efficacy. Moreover, while both generative and contrastive methods have been shown to be effective, their connections and differences have yet to be thoroughly investigated. Therefore, we theoretically build a bridge between GraphMAE and GCL, and prove that the node-level reconstruction objective in GraphMAE implicitly performs context-level GCL. Based on our theoretical analysis, we further identify the limitations of the GraphMAE from the perspectives of alignment and uniformity, which have been considered as two key properties of high-quality representations in GCL. We point out that GraphMAE's alignment performance is restricted by the masking strategy, and the uniformity is not strictly guaranteed. To remedy the aforementioned limitations, we propose an Alignment-Uniformity enhanced Graph Masked AutoEncoder, named AUG-MAE. Specifically, we propose an easy-to-hard adversarial masking strategy to provide hard-to-align samples, which improves the alignment performance. Meanwhile, we introduce an explicit uniformity regularizer to ensure the uniformity of the learned representations. Experimental results on benchmark datasets demonstrate the superiority of our model over existing state-of-the-art methods. The code is available at: https://github.com/AzureLeon1/AUG-MAE.

AAAI Conference 2024 Conference Paper

Text-Guided Molecule Generation with Diffusion Language Model

  • Haisong Gong
  • Qiang Liu
  • Shu Wu
  • Liang Wang

Text-guided molecule generation is a task where molecules are generated to match specific textual descriptions. Recently, most existing SMILES-based molecule generation methods rely on an autoregressive architecture. In this work, we propose the Text-Guided Molecule Generation with Diffusion Language Model (TGM-DLM), a novel approach that leverages diffusion models to address the limitations of autoregressive methods. TGM-DLM updates token embeddings within the SMILES string collectively and iteratively, using a two-phase diffusion generation process. The first phase optimizes embeddings from random noise, guided by the text description, while the second phase corrects invalid SMILES strings to form valid molecular representations. We demonstrate that TGM-DLM outperforms MolT5-Base, an autoregressive model, without the need for additional data resources. Our findings underscore the remarkable effectiveness of TGM-DLM in generating coherent and precise molecules with specific properties, opening new avenues in drug discovery and related scientific domains. Code will be released at: https://github.com/Deno-V/tgm-dlm.

NeurIPS Conference 2024 Conference Paper

VLKEB: A Large Vision-Language Model Knowledge Editing Benchmark

  • Han Huang
  • Haitian Zhong
  • Tao Yu
  • Qiang Liu
  • Shu Wu
  • Liang Wang
  • Tieniu Tan

Recently, knowledge editing on large language models (LLMs) has received considerable attention. Compared to this, editing Large Vision-Language Models (LVLMs) faces extra challenges from diverse data modalities and complicated model components, and data for LVLM editing are limited. The existing LVLM editing benchmark, which comprises three metrics (Reliability, Locality, and Generality), falls short in the quality of synthesized evaluation images and cannot assess whether models apply edited knowledge in relevant content. Therefore, we employ more reliable data collection methods to construct a new Large Vision-Language Model Knowledge Editing Benchmark, VLKEB, and extend the Portability metric for more comprehensive evaluation. Leveraging a multi-modal knowledge graph, our image data are bound with knowledge entities. This can be further used to extract entity-related knowledge, which constitutes the base of editing data. We conduct experiments of different editing methods on five LVLMs, and thoroughly analyze how they impact the models. The results reveal strengths and deficiencies of these methods and hopefully provide insights for future research. The code and dataset are available at: https://github.com/VLKEB/VLKEB.

NeurIPS Conference 2023 Conference Paper

FAMO: Fast Adaptive Multitask Optimization

  • Bo Liu
  • Yihao Feng
  • Peter Stone
  • Qiang Liu

One of the grand enduring goals of AI is to create generalist agents that can learn multiple different tasks from diverse data via multitask learning (MTL). However, in practice, applying gradient descent (GD) on the average loss across all tasks may yield poor multitask performance due to severe under-optimization of certain tasks. Previous approaches that manipulate task gradients for a more balanced loss decrease require storing and computing all task gradients ($\mathcal{O}(k)$ space and time where $k$ is the number of tasks), limiting their use in large-scale scenarios. In this work, we introduce Fast Adaptive Multitask Optimization (FAMO), a dynamic weighting method that decreases task losses in a balanced way using $\mathcal{O}(1)$ space and time. We conduct an extensive set of experiments covering multi-task supervised and reinforcement learning problems. Our results indicate that FAMO achieves comparable or superior performance to state-of-the-art gradient manipulation techniques while offering significant improvements in space and computational efficiency. Code is available at https://github.com/Cranial-XIX/FAMO.
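
The O(1) bookkeeping can be sketched as follows: one logit per task, a softmax-weighted combined loss, and a logit update driven only by the observed log-loss decreases. This is a simplified reading of the mechanism with assumed names and learning rates, not the released code (see the repository above for that).

```python
import torch
import torch.nn.functional as F

class FAMOWeighter:
    """Keep one logit per task; store only k scalars between steps."""

    def __init__(self, n_tasks, w_lr=0.025):
        self.logits = torch.zeros(n_tasks, requires_grad=True)
        self.opt = torch.optim.Adam([self.logits], lr=w_lr)
        self.prev = None

    def combine(self, losses):
        """Weighted loss for the model step; weights are detached."""
        self.prev = losses.detach()
        z = F.softmax(self.logits, dim=0).detach()
        return (z * torch.log(losses + 1e-8)).sum()

    def update(self, new_losses):
        """After the model step, favor tasks whose log-loss fell least."""
        delta = torch.log(self.prev + 1e-8) - torch.log(new_losses.detach() + 1e-8)
        w_loss = -(F.softmax(self.logits, dim=0) * delta).sum()
        self.opt.zero_grad()
        w_loss.backward()
        self.opt.step()
```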

NeurIPS Conference 2023 Conference Paper

GSLB: The Graph Structure Learning Benchmark

  • Zhixun Li
  • Xin Sun
  • Yifan Luo
  • Yanqiao Zhu
  • Dingshuo Chen
  • Yingtao Luo
  • Xiangxin Zhou
  • Qiang Liu

Graph Structure Learning (GSL) has recently garnered considerable attention due to its ability to optimize both the parameters of Graph Neural Networks (GNNs) and the computation graph structure simultaneously. Despite the proliferation of GSL methods developed in recent years, there is no standard experimental setting or fair comparison for performance evaluation, which creates a great obstacle to understanding the progress in this field. To fill this gap, we systematically analyze the performance of GSL in different scenarios and develop a comprehensive Graph Structure Learning Benchmark (GSLB) curated from 20 diverse graph datasets and 16 distinct GSL algorithms. Specifically, GSLB systematically investigates the characteristics of GSL in terms of three dimensions: effectiveness, robustness, and complexity. We comprehensively evaluate state-of-the-art GSL algorithms in node- and graph-level tasks, and analyze their performance in robust learning and model complexity. Further, to facilitate reproducible research, we have developed an easy-to-use library for training, evaluating, and visualizing different GSL methods. Empirical results of our extensive experiments demonstrate the ability of GSL and reveal its potential benefits on various downstream tasks, offering insights and opportunities for future research. The code of GSLB is available at: https://github.com/GSL-Benchmark/GSLB.

NeurIPS Conference 2023 Conference Paper

Language Is Not All You Need: Aligning Perception with Language Models

  • Shaohan Huang
  • Li Dong
  • Wenhui Wang
  • Yaru Hao
  • Saksham Singhal
  • Shuming Ma
  • Tengchao Lv
  • Lei Cui

A big convergence of language, multimodal perception, action, and world modeling is a key step toward artificial general intelligence. In this work, we introduce KOSMOS-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot). Specifically, we train KOSMOS-1 from scratch on web-scale multi-modal corpora, including arbitrarily interleaved text and images, image-caption pairs, and text data. We evaluate various settings, including zero-shot, few-shot, and multimodal chain-of-thought prompting, on a wide range of tasks without any gradient updates or finetuning. Experimental results show that KOSMOS-1 achieves impressive performance on (i) language understanding, generation, and even OCR-free NLP (directly fed with document images), (ii) perception-language tasks, including multimodal dialogue, image captioning, visual question answering, and (iii) vision tasks, such as image recognition with descriptions (specifying classification via text instructions). We also show that MLLMs can benefit from cross-modal transfer, i.e., transferring knowledge from language to multimodal, and from multimodal to language. In addition, we introduce a Raven IQ test dataset, which diagnoses the nonverbal reasoning capability of MLLMs.

NeurIPS Conference 2023 Conference Paper

LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning

  • Bo Liu
  • Yifeng Zhu
  • Chongkai Gao
  • Yihao Feng
  • Qiang Liu
  • Yuke Zhu
  • Peter Stone

Lifelong learning offers a promising paradigm of building a generalist agent that learns and adapts over its lifespan. Unlike traditional lifelong learning problems in image and text domains, which primarily involve the transfer of declarative knowledge of entities and concepts, lifelong learning in decision-making (LLDM) also necessitates the transfer of procedural knowledge, such as actions and behaviors. To advance research in LLDM, we introduce LIBERO, a novel benchmark of lifelong learning for robot manipulation. Specifically, LIBERO highlights five key research topics in LLDM: 1) how to efficiently transfer declarative knowledge, procedural knowledge, or the mixture of both; 2) how to design effective policy architectures and 3) effective algorithms for LLDM; 4) the robustness of a lifelong learner with respect to task ordering; and 5) the effect of model pretraining for LLDM. We develop an extendible procedural generation pipeline that can in principle generate infinitely many tasks. For benchmarking purposes, we create four task suites (130 tasks in total) that we use to investigate the above-mentioned research topics. To support sample-efficient learning, we provide high-quality human-teleoperated demonstration data for all tasks. Our extensive experiments present several insightful or even unexpected discoveries: sequential finetuning outperforms existing lifelong learning methods in forward transfer, no single visual encoder architecture excels at all types of knowledge transfer, and naive supervised pretraining can hinder agents' performance in the subsequent LLDM.

AAAI Conference 2023 Conference Paper

Metric Residual Network for Sample Efficient Goal-Conditioned Reinforcement Learning

  • Bo Liu
  • Yihao Feng
  • Qiang Liu
  • Peter Stone

Goal-conditioned reinforcement learning (GCRL) has a wide range of potential real-world applications, including manipulation and navigation problems in robotics. Especially in such robotics tasks, sample efficiency is of the utmost importance for GCRL since, by default, the agent is only rewarded when it reaches its goal. While several methods have been proposed to improve the sample efficiency of GCRL, one relatively under-studied approach is the design of neural architectures to support sample efficiency. In this work, we introduce a novel neural architecture for GCRL that achieves significantly better sample efficiency than the commonly-used monolithic network architecture. The key insight is that the optimal action-value function must satisfy the triangle inequality in a specific sense. Furthermore, we introduce the metric residual network (MRN) that deliberately decomposes the action-value function into the negated summation of a metric plus a residual asymmetric component. MRN provably approximates any optimal action-value function, thus making it a fitting neural architecture for GCRL. We conduct comprehensive experiments across 12 standard benchmark environments in GCRL. The empirical results demonstrate that MRN uniformly outperforms other state-of-the-art GCRL neural architectures in terms of sample efficiency. The code is available at https://github.com/Cranial-XIX/metric-residual-network.
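
The decomposition in the abstract translates directly into an architecture sketch: a symmetric metric term plus an asymmetric residual, negated. Dimensions, MLP sizes, and the particular residual (a clamped max over coordinates) are illustrative choices here, not the paper's exact parameterization.

```python
import torch
import torch.nn as nn

class MetricResidualNetwork(nn.Module):
    """Q(sa, g) = -(metric(sa, g) + asymmetric residual(sa, g))."""

    def __init__(self, d_in, d_embed=64, d_hidden=128):
        super().__init__()
        def mlp():
            return nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU(),
                                 nn.Linear(d_hidden, d_embed))
        self.phi_sym, self.psi_sym = mlp(), mlp()
        self.phi_asym, self.psi_asym = mlp(), mlp()

    def forward(self, sa, g):          # sa, g: (batch, d_in)
        metric = torch.norm(self.phi_sym(sa) - self.psi_sym(g), dim=-1)
        res = (self.phi_asym(sa) - self.psi_asym(g)).max(dim=-1).values
        return -(metric + res.clamp(min=0))   # negated quasimetric value
```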

ICLR Conference 2023 Conference Paper

Sampling with Mollified Interaction Energy Descent

  • Lingxiao Li
  • Qiang Liu
  • Anna Korba
  • Mikhail Yurochkin
  • Justin Solomon 0001

Sampling from a target measure whose density is only known up to a normalization constant is a fundamental problem in computational statistics and machine learning. In this paper, we present a new optimization-based method for sampling called mollified interaction energy descent (MIED). MIED minimizes a new class of energies on probability measures called mollified interaction energies (MIEs). These energies rely on mollifier functions, smooth approximations of the Dirac delta originating from PDE theory. We show that as the mollifier approaches the Dirac delta, the MIE converges to the chi-square divergence with respect to the target measure and the gradient flow of the MIE agrees with that of the chi-square divergence. Optimizing this energy with proper discretization yields a practical first-order particle-based algorithm for sampling in both unconstrained and constrained domains. We show experimentally that for unconstrained sampling problems our algorithm performs on par with existing particle-based algorithms like SVGD, while for constrained sampling problems our method readily incorporates constrained optimization techniques to handle more flexible constraints with strong performance compared to alternatives.
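
One plausible particle discretization, assuming an unnormalized target log-density `log_p` and a Gaussian mollifier: a pairwise energy of the form below approaches (up to constants) the chi-square divergence as the mollifier width shrinks, and descending it with a standard first-order optimizer gives the kind of sampler the abstract describes. All names and the exact weighting are assumptions, not the paper's formulation.

```python
import torch

def mied_energy(x, log_p, eps=0.1):
    """Pairwise mollified interaction energy for particles x: (n, d).

    Gaussian mollifier of width eps; each pair is weighted by
    (p(x_i) p(x_j))^(-1/2) so the minimizer concentrates on the target.
    """
    diff = x.unsqueeze(0) - x.unsqueeze(1)                 # (n, n, d)
    mollifier = torch.exp(-diff.pow(2).sum(-1) / (2 * eps ** 2))
    logp = log_p(x)                                        # (n,)
    weight = torch.exp(-0.5 * (logp.unsqueeze(0) + logp.unsqueeze(1)))
    return (mollifier * weight).sum() / x.shape[0] ** 2

def mied_sample(log_p, x0, steps=500, lr=1e-2):
    """Descend the energy with respect to the particle locations."""
    x = x0.clone().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        mied_energy(x, log_p).backward()
        opt.step()
    return x.detach()
```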

NeurIPS Conference 2023 Conference Paper

Uncovering Neural Scaling Laws in Molecular Representation Learning

  • Dingshuo Chen
  • Yanqiao Zhu
  • Jieyu Zhang
  • Yuanqi Du
  • Zhixun Li
  • Qiang Liu
  • Shu Wu
  • Liang Wang

Molecular Representation Learning (MRL) has emerged as a powerful tool for drug and materials discovery in a variety of tasks such as virtual screening and inverse design. While there has been a surge of interest in advancing model-centric techniques, the influence of both data quantity and quality on molecular representations is not yet clearly understood within this field. In this paper, we delve into the neural scaling behaviors of MRL from a data-centric viewpoint, examining four key dimensions: (1) data modalities, (2) dataset splitting, (3) the role of pre-training, and (4) model capacity. Our empirical studies confirm a consistent power-law relationship between data volume and MRL performance across these dimensions. Additionally, through detailed analysis, we identify potential avenues for improving learning efficiency. To challenge these scaling laws, we adapt seven popular data pruning strategies to molecular data and benchmark their performance. Our findings underline the importance of data-centric MRL and highlight possible directions for future research.

TIST Journal 2023 Journal Article

Unsupervised Graph Representation Learning with Cluster-aware Self-training and Refining

  • Yanqiao Zhu
  • Yichen Xu
  • Feng Yu
  • Qiang Liu
  • Shu Wu

Unsupervised graph representation learning aims to learn low-dimensional node embeddings without supervision while preserving graph topological structures and node attributive features. Previous graph neural network (GNN) models require a large number of labeled nodes, which may not be accessible in real-world applications. To this end, we present a novel unsupervised graph neural network model with Cluster-aware Self-training and Refining (CLEAR). Specifically, in the proposed CLEAR model, we perform clustering on the node embeddings and update the model parameters by predicting the cluster assignments. To avoid degenerate solutions of clustering, we formulate the graph clustering problem as an optimal transport problem and leverage a balanced clustering strategy. Moreover, we observe that graphs often contain inter-class edges, which mislead the GNN model to aggregate noisy information from neighborhood nodes. Therefore, we propose to refine the graph topology by strengthening intra-class edges and reducing node connections between different classes based on cluster labels, which better preserves cluster structures in the embedding space. We conduct comprehensive experiments on two benchmark tasks using real-world datasets. The results demonstrate the superior performance of the proposed model over baseline methods. Notably, our model gains over 7% improvement in accuracy on node clustering over the state of the art.
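
The balanced-clustering step can be realized with a few Sinkhorn iterations over node-to-centroid similarity scores; a minimal sketch (iteration count and temperature assumed) is:

```python
import torch

def sinkhorn_assign(scores, n_iter=50, eps=0.05):
    """Balanced cluster assignment via Sinkhorn normalization.

    scores: (n_nodes, n_clusters) similarities. Alternating row/column
    normalization approximately enforces one cluster per node and equal-size
    clusters, avoiding the degenerate all-in-one-cluster solution.
    """
    Q = torch.exp(scores / eps)
    n, k = Q.shape
    for _ in range(n_iter):
        Q = Q / Q.sum(dim=1, keepdim=True) / n    # rows: one node each
        Q = Q / Q.sum(dim=0, keepdim=True) / k    # cols: balanced clusters
    return Q.argmax(dim=1)                        # hard cluster labels
```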

NeurIPS Conference 2023 Conference Paper

Wasserstein Quantum Monte Carlo: A Novel Approach for Solving the Quantum Many-Body Schrödinger Equation

  • Kirill Neklyudov
  • Jannes Nys
  • Luca Thiede
  • Juan Carrasquilla
  • Qiang Liu
  • Max Welling
  • Alireza Makhzani

Solving the quantum many-body Schrödinger equation is a fundamental and challenging problem in the fields of quantum physics, quantum chemistry, and material sciences. One of the common computational approaches to this problem is Quantum Variational Monte Carlo (QVMC), in which ground-state solutions are obtained by minimizing the energy of the system within a restricted family of parameterized wave functions. Deep learning methods partially address the limitations of traditional QVMC by representing a rich family of wave functions in terms of neural networks. However, the optimization objective in QVMC remains notoriously hard to minimize and requires second-order optimization methods such as natural gradient. In this paper, we first reformulate energy functional minimization in the space of Born distributions corresponding to particle-permutation (anti-)symmetric wave functions, rather than the space of wave functions. We then interpret QVMC as the Fisher-Rao gradient flow in this distributional space, followed by a projection step onto the variational manifold. This perspective provides us with a principled framework to derive new QMC algorithms, by endowing the distributional space with better metrics, and following the projected gradient flow induced by those metrics. More specifically, we propose "Wasserstein Quantum Monte Carlo" (WQMC), which uses the gradient flow induced by the Wasserstein metric, rather than the Fisher-Rao metric, and corresponds to transporting the probability mass, rather than teleporting it. We demonstrate empirically that the dynamics of WQMC results in faster convergence to the ground state of molecular systems.

AAAI Conference 2022 Conference Paper

AFDetV2: Rethinking the Necessity of the Second Stage for Object Detection from Point Clouds

  • Yihan Hu
  • Zhuangzhuang Ding
  • Runzhou Ge
  • Wenxin Shao
  • Li Huang
  • Kun Li
  • Qiang Liu

There have been two streams in the 3D detection from point clouds: single-stage methods and two-stage methods. While the former is more computationally efficient, the latter usually provides better detection accuracy. By carefully examining the two-stage approaches, we have found that if appropriately designed, the first stage can produce accurate box regression. In this scenario, the second stage mainly rescores the boxes such that the boxes with better localization get selected. From this observation, we have devised a single-stage anchor-free network that can fulfill these requirements. This network, named AFDetV2, extends the previous work by incorporating a self-calibrated convolution block in the backbone, a keypoint auxiliary supervision, and an IoU prediction branch in the multi-task head. We take a simple product of the predicted IoU score with the classification heatmap to form the final classification confidence. The enhanced backbone strengthens the box localization capability, and the rescoring approach effectively joins the object presence confidence and the box regression accuracy. As a result, the detection accuracy is drastically boosted in the single stage. To evaluate our approach, we have conducted extensive experiments on the Waymo Open Dataset and the nuScenes Dataset. We have observed that our AFDetV2 achieves state-of-the-art results on these two datasets, superior to all the prior arts, including both the single-stage and the two-stage 3D detectors. AFDetV2 won the 1st place in the Real-Time 3D Detection of the Waymo Open Dataset Challenge 2021. In addition, a variant of our model, AFDetV2-Base, was named the “Most Efficient Model” by the Challenge Sponsor, showing superior computational efficiency. To demonstrate the generality of this single-stage method, we have also applied it to the first stage of the two-stage networks. Without exception, the results show that with the strengthened backbone and the rescoring approach, the second stage refinement is no longer needed.
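
The rescoring itself is a one-liner worth spelling out; the exponent below is an illustrative knob, not the paper's exact formulation:

```python
import torch

def rescore(cls_heatmap, iou_pred, alpha=0.5):
    """Fuse presence confidence with localization quality before NMS.

    The final confidence is the classification heatmap score modulated by
    the predicted IoU, so better-localized boxes survive suppression.
    """
    return cls_heatmap * iou_pred.clamp(0.0, 1.0).pow(alpha)
```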

NeurIPS Conference 2022 Conference Paper

BOME! Bilevel Optimization Made Easy: A Simple First-Order Approach

  • Bo Liu
  • Mao Ye
  • Stephen Wright
  • Peter Stone
  • Qiang Liu

Bilevel optimization (BO) is useful for solving a variety of important machine learning problems including but not limited to hyperparameter optimization, meta-learning, continual learning, and reinforcement learning. Conventional BO methods need to differentiate through the low-level optimization process with implicit differentiation, which requires expensive calculations related to the Hessian matrix. There has been a recent quest for first-order methods for BO, but the methods proposed to date tend to be complicated and impractical for large-scale deep learning applications. In this work, we propose a simple first-order BO algorithm that depends only on first-order gradient information, requires no implicit differentiation, and is practical and efficient for large-scale non-convex functions in deep learning. We provide non-asymptotic convergence analysis of the proposed method to stationary points for non-convex objectives and present empirical results that show its superior practical performance.
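
A hedged sketch of the value-function idea with assumed names: the lower-level optimality is expressed as a gap q that is driven to zero alongside the upper-level objective, using only gradients (the fixed multiplier `eta` stands in for BOME's adaptive combination rule). Both objectives are assumed to depend on both variables.

```python
import torch

def bome_step(theta, w, f, g, inner_steps=5, inner_lr=0.1,
              outer_lr=0.1, eta=0.5):
    """First-order bilevel step: min_theta f s.t. w minimizes g(theta, .).

    f(theta, w): upper-level loss; g(theta, w): lower-level loss. No Hessians
    and no implicit differentiation are used anywhere.
    """
    # Approximate the lower-level solution with a few inner gradient steps.
    w_t = w.detach().clone().requires_grad_(True)
    for _ in range(inner_steps):
        (gw,) = torch.autograd.grad(g(theta.detach(), w_t), w_t)
        w_t = (w_t - inner_lr * gw).detach().requires_grad_(True)

    q = g(theta, w) - g(theta, w_t)           # value-function gap, q >= 0
    df = torch.autograd.grad(f(theta, w), (theta, w))
    dq = torch.autograd.grad(q, (theta, w))
    with torch.no_grad():                     # descend grad f + eta * grad q
        theta -= outer_lr * (df[0] + eta * dq[0])
        w -= outer_lr * (df[1] + eta * dq[1])
    return theta, w
```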

NeurIPS Conference 2022 Conference Paper

Diffusion-based Molecule Generation with Informative Prior Bridges

  • Lemeng Wu
  • Chengyue Gong
  • Xingchao Liu
  • Mao Ye
  • Qiang Liu

AI-based molecule generation provides a promising approach to a large area of biomedical sciences and engineering, such as antibody design, hydrolase engineering, or vaccine development. Because the molecules are governed by physical laws, a key challenge is to incorporate prior information into the training procedure to generate high-quality and realistic molecules. We propose a simple and novel approach to steer the training of diffusion-based generative models with physical and statistical prior information. This is achieved by constructing physically informed diffusion bridges, stochastic processes that are guaranteed to yield a given observation at the fixed terminal time. We develop a Lyapunov function based method to construct and determine bridges, and propose several informative prior bridges for both high-quality molecule generation and uniformity-promoted 3D point cloud generation. With comprehensive experiments, we show that our method provides a powerful approach to the 3D generation task, yielding molecule structures with better quality and stability scores and more uniformly distributed high-quality point clouds.

NeurIPS Conference 2022 Conference Paper

First Hitting Diffusion Models for Generating Manifold, Graph and Categorical Data

  • Mao Ye
  • Lemeng Wu
  • Qiang Liu

We propose a family of First Hitting Diffusion Models (FHDM), deep generative models that generate data with a diffusion process that terminates at a random first hitting time. This yields an extension of the standard fixed-time diffusion models that terminate at a pre-specified deterministic time. Although standard diffusion models are designed for continuous unconstrained data, FHDM is naturally designed to learn distributions on continuous as well as a range of discrete and structured domains. Moreover, FHDM enables instance-dependent termination times and accelerates the diffusion process to sample higher quality data with fewer diffusion steps. Technically, we train FHDM by maximum likelihood estimation on diffusion trajectories augmented from observed data with conditional first hitting processes (i.e., bridges) derived based on Doob's $h$-transform, deviating from the commonly used time-reversal mechanism. We apply FHDM to generate data in various domains such as point clouds (general continuous distribution), climate and geographical events on earth (continuous distribution on the sphere), unweighted graphs (distribution of binary matrices), and segmentation maps of 2D images (high-dimensional categorical distribution). We observe considerable improvement compared with the state-of-the-art approaches in both quality and speed.

IJCAI Conference 2022 Conference Paper

GraphDIVE: Graph Classification by Mixture of Diverse Experts

  • Fenyu Hu
  • Liping Wang
  • Qiang Liu
  • Shu Wu
  • Liang Wang
  • Tieniu Tan

Graph classification is a challenging research task in many applications across a broad range of domains. Recently, Graph Neural Network (GNN) models have achieved superior performance on various real-world graph datasets. Despite their successes, most current GNN models largely suffer from the ubiquitous class imbalance problem, which typically results in prediction bias towards majority classes. Although many imbalanced learning methods have been proposed, they mainly focus on regular Euclidean data and cannot well utilize the topological structure of graph (non-Euclidean) data. To boost the performance of GNNs and investigate the relationship between topological structure and class imbalance, we propose GraphDIVE, which learns multi-view graph representations and combines multi-view experts (i.e., classifiers). Specifically, the multi-view graph representations correspond to intrinsic, diverse graph topological structure characteristics. Extensive experiments on molecular benchmark datasets demonstrate the effectiveness of the proposed approach.

JBHI Journal 2022 Journal Article

Multiparametric Quantitative US Examination of Liver Fibrosis: A Feature-Engineering and Machine-Learning Based Analysis

  • Huiying Wen
  • Wei Zheng
  • Min Li
  • Qing Li
  • Qiang Liu
  • Jianhua Zhou
  • Zhong Liu
  • Xin Chen

Quantitative ultrasound (QUS), which attempts to extract quantitative features from the US radiofrequency (RF) or envelope data for tissue characterization, is becoming a promising technique for noninvasive assessments of liver fibrosis. However, the number of feature variables examined and finally used in the existing QUS methods is typically small, limiting the diagnostic performance. Therefore, this paper devises a new multiparametric QUS (MP-QUS) method which enables the extraction of a large number of feature variables from US RF signals and allows for the use of feature-engineering and machine-learning based algorithms for liver fibrosis assessment. In the MP-QUS, eighty-four feature variables were extracted from multiple QUS parametric maps derived from the RF signals and the envelope data. Afterwards, feature reduction and selection were performed in turn to remove the feature redundancy and identify the best combination of features in the reduced feature set. Finally, a variety of machine-learning algorithms were tested for fibrosis classification with the selected features, based on the results of which the optimal classifier was established. The performance of the proposed MP-QUS method for staging liver fibrosis was evaluated on an animal model, with histologic examination as the reference standard. The mean accuracy, sensitivity, specificity and area under the receiver-operating-characteristic curve achieved by MP-QUS are respectively 83.38%, 86.04%, 80.82%, and 0.891 for recognizing significant liver fibrosis, and 85.50%, 88.92%, 85.24%, and 0.924 for diagnosing liver cirrhosis. The proposed MP-QUS method paves a way for its future extension to assess liver fibrosis in human subjects.

NeurIPS Conference 2022 Conference Paper

Sampling in Constrained Domains with Orthogonal-Space Variational Gradient Descent

  • Ruqi Zhang
  • Qiang Liu
  • Xin Tong

Sampling methods, as important inference and learning techniques, are typically designed for unconstrained domains. However, constraints are ubiquitous in machine learning problems, such as those on safety, fairness, robustness, and many other properties that must be satisfied to apply sampling results in real-life applications. Enforcing these constraints often leads to implicitly-defined manifolds, making efficient sampling with constraints very challenging. In this paper, we propose a new variational framework with a designed orthogonal-space gradient flow (O-Gradient) for sampling on a manifold $\mathcal{G}_0$ defined by general equality constraints. O-Gradient decomposes the gradient into two parts: one decreases the distance to $\mathcal{G}_0$ and the other decreases the KL divergence in the orthogonal space. While most existing manifold sampling methods require initialization on $\mathcal{G}_0$, O-Gradient does not require such prior knowledge. We prove that O-Gradient converges to the target constrained distribution with rate $\widetilde{O}(1/T)$, where $T$ is the number of iterations, under mild conditions. Our proof relies on a new Stein characterization of conditional measure which could be of independent interest. We implement O-Gradient through both Langevin dynamics and Stein variational gradient descent and demonstrate its effectiveness in various experiments, including Bayesian deep neural networks.
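
The decomposition reads directly as an update rule; a noise-free sketch with assumed names (`constraint` returns the scalar g(x), `log_p` the unnormalized target log-density) is:

```python
import torch

def o_gradient_step(x, log_p, constraint, lr=1e-2, alpha=1.0):
    """One O-Gradient-style step for sampling on {x : g(x) = 0}.

    The drift is split into a component that decreases |g(x)| along the
    level-set normal and the target score projected onto the orthogonal
    complement; Langevin noise is omitted for clarity.
    """
    x = x.detach().requires_grad_(True)
    gx = constraint(x)
    (ggrad,) = torch.autograd.grad(gx, x)
    (score,) = torch.autograd.grad(log_p(x), x)
    n = ggrad / (ggrad.norm() + 1e-12)           # unit normal to the manifold
    drift = -alpha * gx.detach() * n             # pull toward g(x) = 0
    drift = drift + (score - (score @ n) * n)    # KL descent, orthogonal part
    return (x + lr * drift).detach()
```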

ICRA Conference 2022 Conference Paper

SEHLNet: Separate Estimation of High- and Low-Frequency components for Depth Completion

  • Qiang Liu
  • Haosong Yue
  • Zhanggang Lyu
  • Wei Wang 0036
  • Zhong Liu 0005
  • Weihai Chen

Depth completion refers to inferring a dense depth map from a sparse depth map, with or without a corresponding color image. Numerous neural networks have been proposed to accomplish this task. However, insufficient utilization of the heteromorphic data, together with the fact that the predicted dense depth is biased toward the sparse depth input, severely damages the performance of existing approaches. To reduce this data preference and fully utilize the two modalities, this paper proposes a novel network that predicts the high- and low-frequency components of the dense depth separately. Specifically, the framework consists of a Low-Frequency (LF) branch and a High-Frequency (HF) branch. In the LF branch, we recover the low-frequency depth component from sparse depth through an Adaptive Graph-Generate Graph Attention Network, which can be seen as a low-pass filter. In the HF branch, we model the high-frequency component, e.g., boundaries, as residuals to mitigate the impact of data preferences. Moreover, in this branch, we propose an Attention-based Self-Fusion mechanism to efficiently fuse multi-scale features extracted from the sparse depth and the color image. Extensive experiments demonstrate that our approach achieves state-of-the-art performance on the KITTI benchmark and ranks 1st in root mean squared error among published approaches.

NeurIPS Conference 2022 Conference Paper

VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts

  • Hangbo Bao
  • Wenhui Wang
  • Li Dong
  • Qiang Liu
  • Owais Khan Mohammed
  • Kriti Aggarwal
  • Subhojit Som
  • Songhao Piao

We present a unified Vision-Language pretrained Model (VLMo) that jointly learns a dual encoder and a fusion encoder with a modular Transformer network. Specifically, we introduce the Multiway Transformer, where each block contains a pool of modality-specific experts and a shared self-attention layer. Because of the modeling flexibility of the Multiway Transformer, pretrained VLMo can be fine-tuned as a fusion encoder for vision-language classification tasks, or used as a dual encoder for efficient image-text retrieval. Moreover, we propose a stagewise pre-training strategy, which effectively leverages large-scale image-only and text-only data besides image-text pairs. Experimental results show that VLMo achieves state-of-the-art results on various vision-language tasks, including VQA, NLVR2 and image-text retrieval.
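
A hedged sketch of what a Multiway Transformer block could look like: a shared self-attention layer followed by a pool of modality-specific feed-forward experts selected by the input modality. The dimensions, routing rule (one expert per modality), and layer-norm placement are assumptions for illustration, not VLMo's exact architecture.

```python
# Sketch of a Multiway-style block: shared attention, per-modality FFN experts.
import torch
import torch.nn as nn

class MultiwayBlock(nn.Module):
    def __init__(self, dim=768, heads=12,
                 modalities=("vision", "language", "fusion")):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)  # shared
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.experts = nn.ModuleDict({   # one feed-forward expert per modality
            m: nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                             nn.Linear(4 * dim, dim))
            for m in modalities
        })

    def forward(self, x, modality):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        x = x + self.experts[modality](self.norm2(x))  # modality-specific FFN
        return x

block = MultiwayBlock()
image_tokens = torch.randn(2, 196, 768)
out = block(image_tokens, modality="vision")  # same block reused for text/fusion
```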

JMLR Journal 2021 Journal Article

A General Framework for Empirical Bayes Estimation in Discrete Linear Exponential Family

  • Trambak Banerjee
  • Qiang Liu
  • Gourab Mukherjee
  • Wenguang Sun

We develop a Nonparametric Empirical Bayes (NEB) framework for compound estimation in the discrete linear exponential family, which includes a wide class of discrete distributions frequently arising from modern big data applications. We propose to directly estimate the Bayes shrinkage factor in the generalized Robbins' formula by solving a convex program, which is carefully developed based on an RKHS representation of Stein's discrepancy measure. The new NEB estimation framework is flexible for incorporating various structural constraints into the data-driven rule, and provides a unified approach to compound estimation with both regular and scaled squared error losses. We develop theory to show that the class of NEB estimators enjoys strong asymptotic properties. Comprehensive simulation studies as well as analyses of real data examples are carried out to demonstrate the superiority of the NEB estimator over competing methods.
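
For intuition, the classical Robbins plug-in estimator for the Poisson case, which the NEB framework stabilizes via its convex program, fits in a few lines; the gamma prior and the raw empirical-frequency pmf below are illustrative assumptions.

```python
# Robbins' formula for Poisson compound estimation:
# E[theta | X = k] ~= (k + 1) * f(k + 1) / f(k), with f the marginal pmf.
import numpy as np

rng = np.random.default_rng(1)
theta = rng.gamma(shape=2.0, scale=1.0, size=5000)  # latent means
x = rng.poisson(theta)                              # observed counts

counts = np.bincount(x, minlength=x.max() + 2).astype(float)
f = counts / counts.sum()                           # empirical marginal pmf

robbins = (x + 1) * f[x + 1] / np.maximum(f[x], 1e-12)
mle = x.astype(float)                               # unshrunk estimate

print("MSE, MLE:    ", np.mean((mle - theta) ** 2))
print("MSE, Robbins:", np.mean((robbins - theta) ** 2))
```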

NeurIPS Conference 2021 Conference Paper

An Empirical Study of Graph Contrastive Learning

  • Yanqiao Zhu
  • Yichen Xu
  • Qiang Liu
  • Shu Wu

Graph Contrastive Learning (GCL) establishes a new paradigm for learning graph representations without human annotations. Although remarkable progress has been witnessed recently, the success behind GCL still remains somewhat mysterious. In this work, we first identify several critical design considerations within a general GCL paradigm, including augmentation functions, contrasting modes, contrastive objectives, and negative mining strategies. Then, to understand the interplay of different GCL components, we conduct comprehensive, controlled experiments over benchmark tasks on datasets across various domains. Our empirical studies suggest a set of general recipes for effective GCL, e.g., simple topology augmentations that produce sparse graph views bring promising performance improvements; contrasting modes should be aligned with the granularities of end tasks. In addition, to foster future research and ease the implementation of GCL algorithms, we develop an easy-to-use library PyGCL, featuring modularized CL components, standardized evaluation, and experiment management. We envision this work to provide useful empirical evidence of effective GCL algorithms and offer several insights for future research.
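
As one concrete instance of the contrastive objectives such a study compares, a standard InfoNCE loss between two augmented graph views might look like the sketch below; the encoder producing `z1` and `z2` and the temperature value are assumptions, not the paper's prescription.

```python
# InfoNCE between two augmented views; node embeddings z1, z2 are assumed
# to come from a shared GNN encoder applied to two graph augmentations.
import torch
import torch.nn.functional as F

def infonce(z1, z2, tau=0.2):
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau            # cosine similarities of all pairs
    labels = torch.arange(z1.size(0))     # positives sit on the diagonal
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))

z1, z2 = torch.randn(256, 128), torch.randn(256, 128)
loss = infonce(z1, z2)
```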

NeurIPS Conference 2021 Conference Paper

argmax centroid

  • Chengyue Gong
  • Mao Ye
  • Qiang Liu

We propose a general method to construct centroid approximation for the distribution of maximum points of a random function (a.k.a. the argmax distribution), which finds broad applications in machine learning. Our method optimizes a set of centroid points to compactly approximate the argmax distribution with a simple objective function, without explicitly drawing exact samples from the argmax distribution. Theoretically, the argmax centroid method can be shown to minimize a surrogate of the Wasserstein distance between the ground-truth argmax distribution and the centroid approximation under proper conditions. We demonstrate the applicability and effectiveness of our method on a variety of real-world multi-task learning applications, including few-shot image classification, personalized dialogue systems and multi-target domain adaptation.

NeurIPS Conference 2021 Conference Paper

Automatic and Harmless Regularization with Constrained and Lexicographic Optimization: A Dynamic Barrier Approach

  • Chengyue Gong
  • Xingchao Liu
  • Qiang Liu

Many machine learning tasks have to make a trade-off between two loss functions, typically the main data-fitting loss and an auxiliary loss. The most widely used approach is to optimize the linear combination of the objectives, which, however, requires manual tuning of the combination coefficient and is theoretically unsuitable for non-convex functions. In this work, we consider constrained optimization as a more principled approach for trading off two losses, with a special emphasis on lexicographic optimization, a degenerate limit of constrained optimization that optimizes a secondary loss inside the optimal set of the main loss. We propose a dynamic barrier gradient descent algorithm which provides a unified solution to both constrained and lexicographic optimization. We establish the convergence of the method for general non-convex functions.
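
A toy sketch of a dynamic-barrier style update for min f(x) subject to g(x) ≤ 0: the main gradient is corrected by λ·∇g, with λ chosen from the current gradients so the combined step keeps decreasing g when the constraint is active. The barrier choice φ(x) = α·g(x) and the example problem are illustrative assumptions, not the paper's exact algorithm.

```python
# Dynamic-barrier style gradient step for min f(x) s.t. g(x) <= 0.
import numpy as np

def dynamic_barrier_step(x, grad_f, grad_g, g, step=1e-2, alpha=1.0):
    gf, gg = grad_f(x), grad_g(x)
    phi = alpha * g(x)                               # simple barrier choice
    lam = max((phi - gf @ gg) / (gg @ gg + 1e-12), 0.0)
    return x - step * (gf + lam * gg)

# Example: minimize ||x - (2, 2)||^2 subject to ||x||^2 - 1 <= 0.
x = np.zeros(2)
for _ in range(5000):
    x = dynamic_barrier_step(
        x,
        grad_f=lambda z: 2 * (z - np.array([2.0, 2.0])),
        grad_g=lambda z: 2 * z,
        g=lambda z: z @ z - 1.0,
    )
print(x)  # should approach the feasible optimum (1/sqrt(2), 1/sqrt(2))
```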

NeurIPS Conference 2021 Conference Paper

Conflict-Averse Gradient Descent for Multi-task learning

  • Bo Liu
  • Xingchao Liu
  • Xiaojie Jin
  • Peter Stone
  • Qiang Liu

The goal of multi-task learning is to enable more efficient learning than single-task learning by sharing model structures for a diverse set of tasks. A standard multi-task learning objective is to minimize the average loss across all tasks. While straightforward, using this objective often results in much worse final performance for each task than learning them independently. A major challenge in optimizing a multi-task model is the conflicting gradients, where gradients of different task objectives are not well aligned, so that following the average gradient direction can be detrimental to specific tasks' performance. Previous work has proposed several heuristics to manipulate the task gradients for mitigating this problem, but most of them lack convergence guarantees and/or could converge to any Pareto-stationary point. In this paper, we introduce Conflict-Averse Gradient descent (CAGrad), which minimizes the average loss function while leveraging the worst local improvement of individual tasks to regularize the algorithm trajectory. CAGrad balances the objectives automatically and still provably converges to a minimum of the average loss. It includes the regular gradient descent (GD) and the multiple gradient descent algorithm (MGDA) in the multi-objective optimization (MOO) literature as special cases. On a series of challenging multi-task supervised learning and reinforcement learning tasks, CAGrad achieves improved performance over prior state-of-the-art multi-objective gradient manipulation methods.
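
A minimal two-task sketch of a CAGrad-style update: search for a task-gradient weighting that minimizes a dual objective, then move along the average gradient corrected by that weighting, keeping the update within distance c·||g0|| of the average gradient. The coarse grid search and the constant c are illustrative simplifications of the paper's inner optimization.

```python
# Two-task CAGrad-style update direction from conflicting gradients g1, g2.
import numpy as np

def cagrad_direction(g1, g2, c=0.5):
    g0 = 0.5 * (g1 + g2)                        # average gradient
    phi = (c * np.linalg.norm(g0)) ** 2
    best_w, best_val = 0.5, np.inf
    for w in np.linspace(0.0, 1.0, 101):        # coarse inner search over the simplex
        gw = w * g1 + (1 - w) * g2
        val = gw @ g0 + np.sqrt(phi) * np.linalg.norm(gw)
        if val < best_val:
            best_w, best_val = w, val
    gw = best_w * g1 + (1 - best_w) * g2
    lam = np.sqrt(phi) / (np.linalg.norm(gw) + 1e-12)
    return g0 + lam * gw                        # used in place of g0 in the update

g1, g2 = np.array([1.0, 0.0]), np.array([-0.5, 1.0])  # conflicting task gradients
print(cagrad_direction(g1, g2))
```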

TIST Journal 2021 Journal Article

Disentangled Item Representation for Recommender Systems

  • Zeyu Cui
  • Feng Yu
  • Shu Wu
  • Qiang Liu
  • Liang Wang

Item representations in recommendation systems are expected to reveal the properties of items. Collaborative recommender methods usually represent an item as a single latent vector. Nowadays, e-commerce platforms provide various kinds of attribute information for items (e.g., category, price, and style of clothing). Utilizing this attribute information for better item representations has become popular in recent years. Some studies use the given attribute information as side information, which is concatenated with the item latent vector to augment representations. However, the mixed item representations fail to fully exploit the rich attribute information or provide explanations in recommender systems. To this end, we propose a fine-grained Disentangled Item Representation (DIR) for recommender systems in this article, where items are represented as several separate attribute vectors instead of a single latent vector. In this way, items are represented at the attribute level, which can provide fine-grained information about items in recommendation. We introduce a learning strategy, LearnDIR, which can allocate the corresponding attribute vectors to items. We show how DIR can be applied to two typical models, Matrix Factorization (MF) and Recurrent Neural Network (RNN). Experimental results on two real-world datasets show that the models developed under the framework of DIR are effective and efficient. Even using fewer parameters, the proposed model can outperform the state-of-the-art methods, especially in the cold-start situation. In addition, we provide visualizations to show that our proposition can provide explanations for users in real-world applications.

AAAI Conference 2021 Conference Paper

Post-training Quantization with Multiple Points: Mixed Precision without Mixed Precision

  • Xingchao Liu
  • Mao Ye
  • Dengyong Zhou
  • Qiang Liu

We consider the post-training quantization problem, which discretizes the weights of pre-trained deep neural networks without re-training the model. We propose multipoint quantization, a quantization method that approximates a full-precision weight vector using a linear combination of multiple vectors of low-bit numbers; this is in contrast to typical quantization methods that approximate each weight using a single low-precision number. Computationally, we construct the multipoint quantization with an efficient greedy selection procedure, and adaptively decide the number of low-precision points on each quantized weight vector based on the error of its output. This allows us to achieve higher precision levels for important weights that greatly influence the outputs, yielding an “effect of mixed precision” without physical mixed-precision implementations (which require specialized hardware accelerators (Wang et al. 2019)). Empirically, our method can be implemented by common operands, bringing almost no memory and computation overhead. We show that our method outperforms a range of state-of-the-art methods on ImageNet classification and that it can be generalized to more challenging tasks like PASCAL VOC object detection.
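
A rough sketch of the greedy idea: repeatedly quantize the residual of a full-precision weight vector onto a low-bit grid with an optimally fitted scalar coefficient, stopping once the approximation error is small, so the weight becomes a linear combination of low-bit vectors. The uniform 4-level grid and the stopping tolerance are illustrative assumptions.

```python
# Greedy residual quantization: w ~= sum_k a_k * q_k with low-bit q_k.
import numpy as np

def quantize_to_grid(v, n_levels=4):
    grid = np.linspace(v.min(), v.max(), n_levels)      # uniform low-bit grid
    return grid[np.abs(v[:, None] - grid[None, :]).argmin(axis=1)]

def multipoint_quantize(w, max_points=4, tol=1e-2):
    residual, terms = w.copy(), []
    for _ in range(max_points):                         # adaptively add points
        q = quantize_to_grid(residual)
        a = (residual @ q) / (q @ q + 1e-12)            # optimal coefficient
        terms.append((a, q))
        residual = residual - a * q
        if np.linalg.norm(residual) < tol * np.linalg.norm(w):
            break
    return terms

w = np.random.default_rng(2).normal(size=256)
terms = multipoint_quantize(w)
w_hat = sum(a * q for a, q in terms)
print(len(terms), "points, relative error:",
      np.linalg.norm(w - w_hat) / np.linalg.norm(w))
```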

NeurIPS Conference 2021 Conference Paper

Profiling Pareto Front With Multi-Objective Stein Variational Gradient Descent

  • Xingchao Liu
  • Xin Tong
  • Qiang Liu

Finding diverse and representative Pareto solutions from the Pareto front is a key challenge in multi-objective optimization (MOO). In this work, we propose a novel gradient-based algorithm for profiling the Pareto front using Stein variational gradient descent (SVGD). We also provide a counterpart of our method based on Langevin dynamics. Our methods iteratively update a set of points in a parallel fashion to push them towards the Pareto front using multiple gradient descent, while encouraging diversity between the particles through the repulsive force mechanism in SVGD or the diffusion noise in Langevin dynamics. Compared with existing gradient-based methods that require predefined preference functions, our method can work efficiently in high-dimensional problems and can obtain more diverse solutions evenly distributed on the Pareto front. Moreover, our methods are theoretically guaranteed to converge to the Pareto front. We demonstrate the effectiveness of our method, especially the SVGD algorithm, through extensive experiments, showing its superiority over existing gradient-based algorithms.

NeurIPS Conference 2021 Conference Paper

Sampling with Trustworthy Constraints: A Variational Gradient Framework

  • Xingchao Liu
  • Xin Tong
  • Qiang Liu

Sampling-based inference and learning techniques, especially Bayesian inference, provide an essential approach to handling uncertainty in machine learning (ML). As these techniques are increasingly used in daily life, it becomes essential to safeguard ML systems with various trustworthiness-related constraints, such as fairness, safety, and interpretability. Mathematically, enforcing these constraints in probabilistic inference can be cast as sampling from intractable distributions subject to general nonlinear constraints, for which practical efficient algorithms are still largely missing. In this work, we propose a family of constrained sampling algorithms which generalize Langevin Dynamics (LD) and Stein Variational Gradient Descent (SVGD) to incorporate a moment constraint specified by a general nonlinear function. By exploiting the gradient flow structure of LD and SVGD, we derive two types of algorithms for handling constraints: a primal-dual gradient approach and a constraint-controlled gradient descent approach. We investigate the continuous-time mean-field limit of these algorithms and show that they have $O(1/t)$ convergence under mild conditions. Moreover, the LD variant converges linearly assuming that a log-Sobolev-like inequality holds. Various numerical experiments are conducted to demonstrate the efficiency of our algorithms in trustworthy settings.

NeurIPS Conference 2020 Conference Paper

Black-Box Certification with Randomized Smoothing: A Functional Optimization Based Framework

  • Dinghuai Zhang
  • Mao Ye
  • Chengyue Gong
  • Zhanxing Zhu
  • Qiang Liu

Randomized classifiers have been shown to provide a promising approach for achieving certified robustness against adversarial attacks in deep learning. However, most existing methods only leverage Gaussian smoothing noise and only work for $\ell_2$ perturbations. We propose a general framework of adversarial certification with non-Gaussian noise and for more general types of attacks, from a unified functional optimization perspective. Our new framework allows us to identify a key trade-off between accuracy and robustness via designing smoothing distributions, helping to design new families of non-Gaussian smoothing distributions that work more efficiently for different $\ell_p$ settings, including $\ell_1$, $\ell_2$ and $\ell_\infty$ attacks. Our proposed methods achieve better certification results than previous works and provide a new perspective on randomized smoothing certification.
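
For reference, the Gaussian / $\ell_2$ special case that such a framework generalizes (in the style of Cohen et al., 2019) can be sketched as follows; a rigorous implementation would replace the crude confidence proxy below with a proper binomial lower confidence bound.

```python
# Gaussian randomized-smoothing certification sketch: estimate the top-class
# probability under noise and convert it to an l2 radius via the Gaussian CDF.
import numpy as np
from scipy.stats import norm

def certify_l2(f, x, sigma=0.25, n=1000, shrink=0.999):
    votes = np.array([f(x + sigma * np.random.randn(*x.shape))
                      for _ in range(n)])
    top = np.bincount(votes).argmax()
    p_a = (votes == top).mean() * shrink   # crude stand-in for a binomial LCB
    if p_a <= 0.5:
        return top, 0.0                    # abstain: no certified radius
    return top, sigma * norm.ppf(p_a)      # certified l2 radius

# f is any base classifier mapping an input to a class index, e.g.:
f = lambda z: int(z.sum() > 0)
print(certify_l2(f, np.ones(10)))
```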

NeurIPS Conference 2020 Conference Paper

Certified Monotonic Neural Networks

  • Xingchao Liu
  • Xing Han
  • Na Zhang
  • Qiang Liu

Learning models that are monotonic with respect to a subset of the inputs is a desirable feature for effectively addressing fairness, interpretability, and generalization issues in practice. Existing methods for learning monotonic neural networks either require specifically designed model structures to ensure monotonicity, which can be too restrictive or complicated, or enforce monotonicity by adjusting the learning process, which cannot provably guarantee that the learned model is monotonic on the selected features. In this work, we propose to certify the monotonicity of general piecewise-linear neural networks by solving a mixed integer linear programming problem. This provides a new general approach for learning monotonic neural networks with arbitrary model structures. Our method allows us to train neural networks with heuristic monotonicity regularizations, gradually increasing the regularization magnitude until the learned network is certified monotonic. Compared to prior work, our method does not require human-designed constraints on the weight space and also yields a more accurate approximation. Empirical studies on various datasets demonstrate the efficiency of our approach over state-of-the-art methods such as Deep Lattice Networks.
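
A small sketch of the heuristic monotonicity regularization such a training loop can use between certification rounds: penalize negative partial derivatives of the network output with respect to the selected features. The penalty weight and toy network are illustrative; the MILP certification step itself is not shown.

```python
# Monotonicity penalty: push d(output)/d(x_j) >= 0 for selected features j.
import torch
import torch.nn as nn

def monotonicity_penalty(net, x, monotone_idx):
    x = x.clone().requires_grad_(True)
    grad = torch.autograd.grad(net(x).sum(), x, create_graph=True)[0]
    return torch.relu(-grad[:, monotone_idx]).mean()  # penalize decreasing dirs

net = nn.Sequential(nn.Linear(5, 16), nn.ReLU(), nn.Linear(16, 1))
x = torch.randn(64, 5)
# Toy objective plus the penalty on features 0 and 2; in practice one would
# raise the weight until the network certifies monotonic.
loss = net(x).pow(2).mean() + 10.0 * monotonicity_penalty(net, x, [0, 2])
loss.backward()
```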

NeurIPS Conference 2020 Conference Paper

Firefly Neural Architecture Descent: a General Approach for Growing Neural Networks

  • Lemeng Wu
  • Bo Liu
  • Peter Stone
  • Qiang Liu

We propose firefly neural architecture descent, a general framework for progressively and dynamically growing neural networks to jointly optimize the networks' parameters and architectures. Our method works in a steepest descent fashion, which iteratively finds the best network within a functional neighborhood of the original network that includes a diverse set of candidate network structures. By using Taylor approximation, the optimal network structure in the neighborhood can be found with a greedy selection procedure. We show that firefly descent can flexibly grow networks both wider and deeper, and can be applied to learn accurate but resource-efficient neural architectures that avoid catastrophic forgetting in continual learning. Empirically, firefly descent achieves promising results on both neural architecture search and continual learning. In particular, on a challenging continual image classification task, it learns networks that are smaller in size but have higher average accuracy than those learned by the state-of-the-art methods.

NeurIPS Conference 2020 Conference Paper

Greedy Optimization Provably Wins the Lottery: Logarithmic Number of Winning Tickets is Enough

  • Mao Ye
  • Lemeng Wu
  • Qiang Liu

Despite the great success of deep learning, recent works show that large deep neural networks are often highly redundant and can be significantly reduced in size. However, the theoretical question of how much we can prune a neural network given a specified tolerance of accuracy drop is still open. This paper provides one answer to this question by proposing a greedy optimization based pruning method. The proposed method has the guarantee that the discrepancy between the pruned network and the original network decays at an exponentially fast rate w.r.t. the size of the pruned network, under weak assumptions that apply to most practical settings. Empirically, our method improves on prior art in pruning various network architectures, including ResNet and MobileNetV2/V3, on ImageNet.

NeurIPS Conference 2020 Conference Paper

Implicit Regularization and Convergence for Weight Normalization

  • Xiaoxia Wu
  • Edgar Dobriban
  • Tongzheng Ren
  • Shanshan Wu
  • Zhiyuan Li
  • Suriya Gunasekar
  • Rachel Ward
  • Qiang Liu

Normalization methods such as batch, weight, instance, and layer normalization are commonly used in modern machine learning. Here, we study the weight normalization (WN) method (Salimans and Kingma, 2016) and a variant called reparametrized projected gradient descent (rPGD) for overparametrized least squares regression and some more general loss functions. WN and rPGD reparametrize the weights with a scale $g$ and a unit vector such that the objective function becomes non-convex. We show that this non-convex formulation has beneficial regularization effects compared to gradient descent on the original objective. These methods adaptively regularize the weights and converge linearly close to the minimum $\ell_2$ norm solution even for initializations far from zero. For certain two-phase variants, they can converge to the min norm solution. This is different from the behavior of gradient descent, which only converges to the min norm solution when started at zero, and is thus more sensitive to initialization.
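
The WN reparametrization itself is one line: the weight is factored into a learned scale and a learned direction, $w = g \, v / \|v\|$. A minimal sketch for a single linear map follows (PyTorch also ships a built-in `torch.nn.utils.weight_norm` wrapper for modules).

```python
# Weight normalization for a single linear map: w = g * v / ||v||.
import torch

def wn_linear(x, v, g):
    w = g * v / v.norm()   # scale g and direction v/||v|| are learned jointly
    return x @ w

v = torch.randn(10, requires_grad=True)
g = torch.ones(1, requires_grad=True)
x = torch.randn(32, 10)
y = wn_linear(x, v, g)     # gradients flow to both g and v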

NeurIPS Conference 2020 Conference Paper

Off-Policy Interval Estimation with Lipschitz Value Iteration

  • Ziyang Tang
  • Yihao Feng
  • Na Zhang
  • Jian Peng
  • Qiang Liu

Off-policy evaluation provides an essential tool for evaluating the effects of different policies or treatments using only observed data. When applied to high-stakes scenarios such as medical diagnosis or financial decision-making, it is essential to provide provably correct upper and lower bounds of the expected reward, not just a classical single point estimate, to the end-users, as executing a poor policy can be very costly. In this work, we propose a provably correct method for obtaining interval bounds for off-policy evaluation in a general continuous setting. The idea is to search for the maximum and minimum values of the expected reward among all the Lipschitz Q-functions that are consistent with the observations, which amounts to solving a constrained optimization problem on a Lipschitz function space. We go on to introduce a Lipschitz value iteration method to monotonically tighten the interval, which is simple yet efficient and provably convergent. We demonstrate the practical efficiency of our method on a range of benchmarks.

NeurIPS Conference 2020 Conference Paper

Stein Self-Repulsive Dynamics: Benefits From Past Samples

  • Mao Ye
  • Tongzheng Ren
  • Qiang Liu

We propose a new Stein self-repulsive dynamics for obtaining diversified samples from intractable un-normalized distributions. Our idea is to introduce Stein variational gradient as a repulsive force to push the samples of Langevin dynamics away from the past trajectories. This simple idea allows us to significantly decrease the auto-correlation in Langevin dynamics and hence increase the effective sample size. Importantly, as we establish in our theoretical analysis, the asymptotic stationary distribution remains correct even with the addition of the repulsive force, thanks to the special properties of the Stein variational gradient. We perform extensive empirical studies of our new algorithm, showing that our method yields much higher sample efficiency and better uncertainty estimation than vanilla Langevin dynamics.

NeurIPS Conference 2019 Conference Paper

A Kernel Loss for Solving the Bellman Equation

  • Yihao Feng
  • Lihong Li
  • Qiang Liu

Value function learning plays a central role in many state-of-the-art reinforcement learning algorithms. Many popular algorithms like Q-learning do not optimize any objective function, but are fixed-point iterations of some variants of Bellman operator that are not necessarily a contraction. As a result, they may easily lose convergence guarantees, as can be observed in practice. In this paper, we propose a novel loss function, which can be optimized using standard gradient-based methods with guaranteed convergence. The key advantage is that its gradient can be easily approximated using sampled transitions, avoiding the need for double samples required by prior algorithms like residual gradient. Our approach may be combined with general function classes such as neural networks, using either on- or off-policy data, and is shown to work reliably and effectively in several benchmarks, including classic problems where standard algorithms are known to diverge.

NeurIPS Conference 2019 Conference Paper

Exploration via Hindsight Goal Generation

  • Zhizhou Ren
  • Kefan Dong
  • Yuan Zhou
  • Qiang Liu
  • Jian Peng

Goal-oriented reinforcement learning has recently become a practical framework for robotic manipulation tasks, in which an agent is required to reach a certain goal defined by a function on the state space. However, the sparsity of such reward definitions makes traditional reinforcement learning algorithms very inefficient. Hindsight Experience Replay (HER), a recent advance, has greatly improved sample efficiency and practical applicability for such problems. It exploits previous replays by constructing imaginary goals in a simple heuristic way, acting like an implicit curriculum to alleviate the challenge of sparse reward signals. In this paper, we introduce Hindsight Goal Generation (HGG), a novel algorithmic framework that generates valuable hindsight goals which are easy for an agent to achieve in the short term and also have the potential to guide the agent toward the actual goal in the long term. We have extensively evaluated our goal generation algorithm on a number of robotic manipulation tasks and demonstrated substantial improvement over the original HER in terms of sample efficiency.

NeurIPS Conference 2019 Conference Paper

Regularization Matters: Generalization and Optimization of Neural Nets v.s. their Induced Kernel

  • Colin Wei
  • Jason Lee
  • Qiang Liu
  • Tengyu Ma

Recent works have shown that on sufficiently over-parametrized neural nets, gradient descent with relatively large initialization optimizes a prediction function in the RKHS of the Neural Tangent Kernel (NTK). This analysis leads to global convergence results but does not work when there is a standard $\ell_2$ regularizer, which is useful to have in practice. We show that sample efficiency can indeed depend on the presence of the regularizer: we construct a simple distribution in $d$ dimensions which the optimal regularized neural net learns with $O(d)$ samples but the NTK requires $\Omega(d^2)$ samples to learn. To prove this, we establish two analysis tools: i) for multi-layer feedforward ReLU nets, we show that the global minimizer of a weakly-regularized cross-entropy loss is the max normalized margin solution among all neural nets, which generalizes well; ii) we develop a new technique for proving lower bounds for kernel methods, which relies on showing that the kernel cannot focus on informative features. Motivated by our generalization results, we study whether the regularized global optimum is attainable. We prove that for infinite-width two-layer nets, noisy gradient descent optimizes the regularized neural net loss to a global minimum in polynomial iterations.

AAAI Conference 2019 Conference Paper

Robustness Can Be Cheap: A Highly Efficient Approach to Discover Outliers under High Outlier Ratios

  • Siqi Wang
  • En Zhu
  • Xiping Hu
  • Xinwang Liu
  • Qiang Liu
  • Jianping Yin
  • Fei Wang

Efficient detection of outliers from massive data with a high outlier ratio is challenging but has not been explicitly discussed yet. In such a case, existing methods either suffer from poor robustness or require expensive computations. This paper proposes a Low-rank based Efficient Outlier Detection (LEOD) framework to achieve favorable robustness against high outlier ratios with much cheaper computations. Specifically, it is worth highlighting the following aspects of LEOD: (1) Our framework exploits the low-rank structure embedded in the similarity matrix and considers inliers/outliers equally based on this low-rank structure, which makes it easier to achieve satisfactory robustness at low computational cost later; (2) A novel re-weighting algorithm is derived as a new general solution to the constrained eigenvalue problem, which is a major bottleneck of the optimization process. Instead of the high space and time complexity ($O((2n)^2)$ space / $O((2n)^3)$ time) required by the classic solution, our algorithm enjoys $O(n)$ space complexity and a faster optimization speed in the experiments; (3) A new alternative formulation is proposed for further acceleration of the solution process, where a cheap closed-form solution can be obtained. Experiments show that LEOD achieves strong robustness under outlier ratios from 20% to 60%, while it is at most 100 times more memory efficient and 1000 times faster than its previous counterpart that attains comparable performance. The code of LEOD is publicly available at https://github.com/demonzyj56/LEOD.

NeurIPS Conference 2019 Conference Paper

Splitting Steepest Descent for Growing Neural Architectures

  • Lemeng Wu
  • Dilin Wang
  • Qiang Liu

We develop a progressive training approach for neural networks which adaptively grows the network structure by splitting existing neurons into multiple offspring. By leveraging a functional steepest descent idea, we derive a simple criterion for deciding the best subset of neurons to split and a splitting gradient for optimally updating the offspring. Theoretically, our splitting strategy is a second-order functional steepest descent for escaping saddle points in an $L^\infty$-Wasserstein metric space, on which the standard parametric gradient descent is a first-order steepest descent. Our method provides a new computationally efficient approach for optimizing neural network structures, especially for learning lightweight neural architectures in resource-constrained settings.

NeurIPS Conference 2019 Conference Paper

Stein Variational Gradient Descent With Matrix-Valued Kernels

  • Dilin Wang
  • Ziyang Tang
  • Chandrajit Bajaj
  • Qiang Liu

Stein variational gradient descent (SVGD) is a particle-based inference algorithm that leverages gradient information for efficient approximate inference. In this work, we enhance SVGD by leveraging preconditioning matrices, such as the Hessian and Fisher information matrix, to incorporate geometric information into SVGD updates. We achieve this by presenting a generalization of SVGD that replaces the scalar-valued kernels in vanilla SVGD with more general matrix-valued kernels. This yields a significant extension of SVGD, and more importantly, allows us to flexibly incorporate various preconditioning matrices to accelerate the exploration in the probability landscape. Empirical results show that our method outperforms vanilla SVGD and a variety of baseline approaches over a range of real-world Bayesian inference tasks.

NeurIPS Conference 2018 Conference Paper

Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation

  • Qiang Liu
  • Lihong Li
  • Ziyang Tang
  • Dengyong Zhou

We consider the off-policy estimation problem of estimating the expected reward of a target policy using samples collected by a different behavior policy. Importance sampling (IS) has been a key technique to derive (nearly) unbiased estimators, but is known to suffer from excessively high variance in long-horizon problems. In the extreme case of infinite-horizon problems, the variance of an IS-based estimator may even be unbounded. In this paper, we propose a new off-policy estimation method that applies IS directly to the stationary state-visitation distributions to avoid the exploding variance issue faced by existing estimators. Our key contribution is a novel approach to estimating the density ratio of two stationary distributions, with trajectories sampled from only the behavior distribution. We develop a mini-max loss function for the estimation problem, and derive a closed-form solution for the RKHS case. We support our method with both theoretical and empirical analyses.
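
Once the stationary density ratio has been estimated (the paper's core contribution, not reproduced here), the final estimator is a self-normalized weighted average of rewards; the placeholder ratios below are illustrative assumptions.

```python
# Final estimator given learned ratios: weight each behavior-policy sample by
# the stationary state ratio rho(s) = d_pi(s)/d_beta(s) times the per-step
# action ratio pi(a|s)/beta(a|s), then average rewards.
import numpy as np

def off_policy_reward(rewards, rho_s, pi_over_beta):
    w = rho_s * pi_over_beta          # stationary ratio x action ratio
    return (w * rewards).sum() / w.sum()

rng = np.random.default_rng(3)
rewards = rng.normal(size=10000)
rho_s = np.exp(rng.normal(scale=0.1, size=10000))        # placeholder ratios
pi_over_beta = np.exp(rng.normal(scale=0.1, size=10000))
print(off_policy_reward(rewards, rho_s, pi_over_beta))
```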

IJCAI Conference 2018 Conference Paper

Efficient Localized Inference for Large Graphical Models

  • Jinglin Chen
  • Jian Peng
  • Qiang Liu

We propose a new localized inference algorithm for answering marginalization queries in large graphical models with the correlation decay property. Given a query variable and a large graphical model, we define a much smaller model in a local region around the query variable in the target model so that the marginal distribution of the query variable can be accurately approximated. We introduce two approximation error bounds based on the Dobrushin’s comparison theorem and apply our bounds to derive a greedy expansion algorithm that efficiently guides the selection of neighbor nodes for localized inference. We verify our theoretical bounds on various datasets and demonstrate that our localized inference algorithm can provide fast and accurate approximation for large graphical models.

IJCAI Conference 2018 Conference Paper

Energy-efficient Amortized Inference with Cascaded Deep Classifiers

  • Jiaqi Guan
  • Yang Liu
  • Qiang Liu
  • Jian Peng

Deep neural networks have been remarkably successful in various AI tasks but often incur high computation and energy costs in energy-constrained applications such as mobile sensing. We address this problem by proposing a novel framework that optimizes the prediction accuracy and energy cost simultaneously, thus enabling effective cost-accuracy trade-off at test time. In our framework, each data instance is pushed into a cascade of deep neural networks with increasing sizes, and a selection module is used to sequentially determine when a sufficiently accurate classifier can be used for this data instance. The cascade of neural networks and the selection module are jointly trained in an end-to-end fashion by the REINFORCE algorithm to optimize a trade-off between the computational cost and the predictive accuracy. Our method is able to simultaneously improve the accuracy and efficiency by learning to assign easy instances to fast yet sufficiently accurate classifiers to save computation and energy cost, while assigning harder instances to deeper and more powerful classifiers to ensure satisfactory accuracy. Moreover, we demonstrate our method's effectiveness with extensive experiments on CIFAR-10/100, ImageNet32x32 and the original ImageNet dataset.
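
A sketch of the cascade at inference time: each instance passes through classifiers of increasing size until a stopping rule fires. The max-softmax threshold below stands in for the paper's learned selection module, purely for illustration.

```python
# Early-exit cascade inference: cheapest classifier first, stop when confident.
import torch
import torch.nn.functional as F

def cascade_predict(models, x, threshold=0.9):
    for i, model in enumerate(models):            # ordered small -> large
        probs = F.softmax(model(x), dim=-1)
        conf, pred = probs.max(dim=-1)
        if conf.item() >= threshold or i == len(models) - 1:
            return pred.item(), i                 # prediction and exit index

# Toy usage; in practice these would be networks of increasing capacity.
models = [torch.nn.Linear(10, 3), torch.nn.Linear(10, 3)]
pred, exit_idx = cascade_predict(models, torch.randn(1, 10))
```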

ICML Conference 2018 Conference Paper

Goodness-of-fit Testing for Discrete Distributions via Stein Discrepancy

  • Jiasen Yang
  • Qiang Liu
  • Vinayak A. Rao
  • Jennifer Neville

Recent work has combined Stein’s method with reproducing kernel Hilbert space theory to develop nonparametric goodness-of-fit tests for un-normalized probability distributions. However, the currently available tests apply exclusively to distributions with smooth density functions. In this work, we introduce a kernelized Stein discrepancy measure for discrete spaces, and develop a nonparametric goodness-of-fit test for discrete distributions with intractable normalization constants. Furthermore, we propose a general characterization of Stein operators that encompasses both discrete and continuous distributions, providing a recipe for constructing new Stein operators. We apply the proposed goodness-of-fit test to three statistical models involving discrete distributions, and our experiments show that the proposed test typically outperforms a two-sample test based on the maximum mean discrepancy.

TIST Journal 2018 Journal Article

Mining Significant Microblogs for Misinformation Identification

  • Qiang Liu
  • Feng Yu
  • Shu Wu
  • Liang Wang

With the rapid growth of social media, massive misinformation is also spreading widely on social media, e.g., Weibo and Twitter, and brings negative effects to human life. Today, automatic misinformation identification has drawn attention from the academic and industrial communities. Whereas an event on social media usually consists of multiple microblogs, current methods are mainly constructed based on global statistical features. However, information on social media is full of noise, which should be alleviated. Moreover, most of the microblogs about an event contribute little to the identification of misinformation, where useful information can be easily overwhelmed by useless information. Thus, it is important to mine significant microblogs for constructing a reliable misinformation identification method. In this article, we propose an attention-based approach for the identification of misinformation (AIM). Based on the attention mechanism, AIM can select the microblogs with the largest attention values for misinformation identification. The attention mechanism in AIM contains two parts: content attention and dynamic attention. Content attention is calculated based on the textual features of each microblog. Dynamic attention is related to the time interval between the posting time of a microblog and the beginning of the event. To evaluate AIM, we conduct a series of experiments on the Weibo and Twitter datasets, and the experimental results show that the proposed AIM model outperforms the state-of-the-art methods.

IJCAI Conference 2018 Conference Paper

Multi-agent Epistemic Planning with Common Knowledge

  • Qiang Liu
  • Yongmei Liu

In the past decade, multi-agent epistemic planning has received much attention from both dynamic logic and planning communities. Common knowledge is an essential part of multi-agent modal logics, and plays an important role in coordination and interaction of multiple agents. However, existing implementations of multi-agent epistemic planning provide very limited support for common knowledge, basically static propositional common knowledge. Our work aims to extend an existing multi-agent epistemic planning framework based on higher-order belief change with the capability to deal with common knowledge. We propose a novel normal form for multi-agent KD45 logic with common knowledge. We propose satisfiability solving, revision and update algorithms for this normal form. Based on our algorithms, we implemented a multi-agent epistemic planner with common knowledge called MEPC. Our planner successfully generated solutions for several domains that demonstrate the typical usage of common knowledge.

NeurIPS Conference 2018 Conference Paper

Stein Variational Gradient Descent as Moment Matching

  • Qiang Liu
  • Dilin Wang

Stein variational gradient descent (SVGD) is a non-parametric inference algorithm that evolves a set of particles to fit a given distribution of interest. We analyze the non-asymptotic properties of SVGD, showing that there exists a set of functions, which we call the Stein matching set, whose expectations are exactly estimated by any set of particles that satisfies the fixed point equation of SVGD. This set is the image of the Stein operator applied on the feature maps of the positive definite kernel used in SVGD. Our results provide a theoretical framework for analyzing the properties of SVGD with different kernels, shedding light on the optimal kernel choice. In particular, we show that SVGD with linear kernels yields exact estimation of means and variances on Gaussian distributions, while random Fourier features enable probabilistic bounds for distributional approximation. Our results offer a refreshing view of the classical inference problem as fitting Stein's identity or solving the Stein equation, which may motivate more efficient algorithms.

NeurIPS Conference 2018 Conference Paper

Variational Inference with Tail-adaptive f-Divergence

  • Dilin Wang
  • Hao Liu
  • Qiang Liu

Variational inference with α-divergences has been widely used in modern probabilistic machine learning. Compared to the Kullback-Leibler (KL) divergence, a major advantage of using α-divergences (with positive α values) is their mass-covering property. However, estimating and optimizing α-divergences requires the use of importance sampling, which could have extremely large or infinite variance due to the heavy tails of the importance weights. In this paper, we propose a new class of tail-adaptive f-divergences that adaptively change the convex function f with the tail of the importance weights, in a way that theoretically guarantees finite moments while simultaneously achieving mass-covering properties. We test our methods on Bayesian neural networks, as well as deep reinforcement learning, in which our method is applied to improve a recent soft actor-critic (SAC) algorithm (Haarnoja et al., 2018). Our results show that our approach yields significant advantages compared with existing methods based on classical KL and α-divergences.

IJCAI Conference 2017 Conference Paper

A Convolutional Approach for Misinformation Identification

  • Feng Yu
  • Qiang Liu
  • Shu Wu
  • Liang Wang
  • Tieniu Tan

The fast expansion of social media fuels the spreading of misinformation, which disrupts people's normal lives. It is urgent to achieve the goals of misinformation identification and early detection in social media. In dynamic and complicated social media scenarios, some conventional methods mainly concentrate on feature engineering, fail to cover potential features in new scenarios, and have difficulty in shaping elaborate high-level interactions among significant features. Moreover, a recent Recurrent Neural Network (RNN) based method suffers from the deficiencies that it is not qualified for practical early detection of misinformation and poses a bias toward the latest input. In this paper, we propose a novel method, the Convolutional Approach for Misinformation Identification (CAMI), based on Convolutional Neural Networks (CNN). CAMI can flexibly extract key features scattered among an input sequence and shape high-level interactions among significant features, which help effectively identify misinformation and achieve practical early detection. Experiment results on two large-scale datasets validate the effectiveness of the CAMI model on both misinformation identification and early detection tasks.

JMLR Journal 2017 Journal Article

Communication-efficient Sparse Regression

  • Jason D. Lee
  • Qiang Liu
  • Yuekai Sun
  • Jonathan E. Taylor

We devise a communication-efficient approach to distributed sparse regression in the high-dimensional setting. The key idea is to average debiased or desparsified lasso estimators. We show the approach converges at the same rate as the lasso as long as the dataset is not split across too many machines, and consistently estimates the support under weaker conditions than the lasso. On the computational side, we propose a new parallel and computationally-efficient algorithm to compute the approximate inverse covariance required in the debiasing approach, when the dataset is split across samples. We further extend the approach to generalized linear models.

AAAI Conference 2017 Conference Paper

Reference Based LSTM for Image Captioning

  • Minghai Chen
  • Guiguang Ding
  • Sicheng Zhao
  • Hui Chen
  • Qiang Liu
  • Jungong Han

Image captioning is an important problem in artificial intelligence, related to both computer vision and natural language processing. There are two main problems in existing methods: in the training phase, it is difficult to find which parts of the captions are more essential to the image; in the caption generation phase, the objects or the scenes are sometimes misrecognized. In this paper, we consider the training images as references and propose a Reference based Long Short Term Memory (R-LSTM) model, aiming to solve these two problems at once. When training the model, we assign different weights to different words, which enables the network to better learn the key information of the captions. When generating a caption, the consensus score is utilized to exploit the reference information of neighbor images, which might fix misrecognitions and make the descriptions more natural-sounding. The proposed R-LSTM model outperforms the state-of-the-art approaches on the benchmark dataset MS COCO and ranks in the top two on 11 of the 14 metrics on the online test server.

NeurIPS Conference 2017 Conference Paper

Stein Variational Gradient Descent as Gradient Flow

  • Qiang Liu

Stein variational gradient descent (SVGD) is a deterministic sampling algorithm that iteratively transports a set of particles to approximate given distributions, based on a gradient-based update constructed to optimally decrease the KL divergence within a function space. This paper develops the first theoretical analysis of SVGD. We establish that the empirical measures of the SVGD samples weakly converge to the target distribution, and show that the asymptotic behavior of SVGD is characterized by a nonlinear Fokker-Planck equation known as the Vlasov equation in physics. We develop a geometric perspective that views SVGD as a gradient flow of the KL divergence functional under a new metric structure on the space of distributions induced by the Stein operator.

NeurIPS Conference 2016 Conference Paper

Bootstrap Model Aggregation for Distributed Statistical Learning

  • Jun Han
  • Qiang Liu

In distributed, or privacy-preserving learning, we are often given a set of probabilistic models estimated from different local repositories, and asked to combine them into a single model that gives efficient statistical estimation. A simple method is to linearly average the parameters of the local models, which, however, tends to be degenerate or not applicable on non-convex models, or models with different parameter dimensions. A more practical strategy is to generate bootstrap samples from the local models, and then learn a joint model based on the combined bootstrap set. Unfortunately, the bootstrap procedure introduces additional noise and can significantly deteriorate the performance. In this work, we propose two variance reduction methods to correct the bootstrap noise, including a weighted M-estimator that is both statistically efficient and practically powerful. Both theoretical and empirical analyses are provided to demonstrate our methods.

UAI Conference 2016 Conference Paper

Importance Weighted Consensus Monte Carlo for Distributed Bayesian Inference

  • Qiang Liu

The recent explosion in big data has created a significant challenge for efficient and scalable Bayesian inference. In this paper, we consider a divide-and-conquer setting in which the data is partitioned into different subsets with communication constraints, and a proper combination strategy is used to aggregate the Monte Carlo samples drawn from the local posteriors based on the dataset subsets. We propose a new importance weighted consensus Monte Carlo method for efficient Bayesian inference in this setting. Our method outperforms the previous one-shot combination strategies in terms of accuracy, and is more computation- and communication-efficient than the previous iterative combination methods that require iterative re-sampling and communication steps. We provide two practical versions of our approach, and illustrate their properties both theoretically and empirically.

AAAI Conference 2016 Conference Paper

Information Credibility Evaluation on Social Media

  • Shu Wu
  • Qiang Liu
  • Yong Liu
  • Liang Wang
  • Tieniu Tan

With the growth of social media, rumors spread fast and are viewed by more and more people on the Internet. Rumors bring significant harm to daily life and public security. It is crucial to evaluate the credibility of information and detect rumors on social media automatically. In this work, we establish a Network Information Credibility Evaluation (NICE) platform, which collects a database of rumors that have been verified on Sina Weibo and automatically evaluates information that is generated by users on social media but has not been verified. Users can use a query to search related information. If the corresponding information appears in our database, users can identify it as a rumor immediately. Otherwise, NICE shows users real-time results crawled automatically from social media and can calculate the credibility of a specific result with our algorithm. Our algorithm learns dynamic representations for information on social media based on behavior information, dynamic information, user information and comment information. Then, we use an ordinary logistic regression to classify information into rumors and non-rumors. Based on our algorithm, the NICE system achieves satisfactory performance on evaluating information credibility and detecting rumors on social media.

NeurIPS Conference 2016 Conference Paper

Learning Infinite RBMs with Frank-Wolfe

  • Wei Ping
  • Qiang Liu
  • Alexander Ihler

In this work, we propose an infinite restricted Boltzmann machine (RBM), whose maximum likelihood estimation (MLE) corresponds to a constrained convex optimization. We consider the Frank-Wolfe algorithm to solve the program, which provides a sparse solution that can be interpreted as inserting a hidden unit at each iteration, so that the optimization process takes the form of a sequence of finite models of increasing complexity. As a side benefit, this can be used to easily and efficiently identify an appropriate number of hidden units during the optimization. The resulting model can also be used as an initialization for typical state-of-the-art RBM training algorithms such as contrastive divergence, leading to models with consistently higher test likelihood than random initialization.

AAAI Conference 2016 Conference Paper

Predicting the Next Location: A Recurrent Model with Spatial and Temporal Contexts

  • Qiang Liu
  • Shu Wu
  • Liang Wang
  • Tieniu Tan

Spatial and temporal contextual information plays a key role in analyzing user behaviors, and is helpful for predicting where a user will go next. With the growing ability to collect information, more and more temporal and spatial contextual information is collected in systems, and the location prediction problem becomes crucial and feasible. Some works have been proposed to address this problem, but they all have limitations. Factorizing Personalized Markov Chain (FPMC) is constructed based on a strong independence assumption among different factors, which limits its performance. Tensor Factorization (TF) faces the cold-start problem in predicting future actions. The Recurrent Neural Network (RNN) model shows promising performance compared with FPMC and TF, but all these methods have problems modeling continuous time intervals and geographical distances. In this paper, we extend RNN and propose a novel method called Spatial Temporal Recurrent Neural Networks (ST-RNN). ST-RNN can model local temporal and spatial contexts in each layer, with time-specific transition matrices for different time intervals and distance-specific transition matrices for different geographical distances. Experimental results show that the proposed ST-RNN model yields significant improvements over the competitive compared methods on two typical datasets, i.e., the Global Terrorism Database (GTD) and the Gowalla dataset.

AAAI Conference 2016 Conference Paper

SAPE: A System for Situation-Aware Public Security Evaluation

  • Shu Wu
  • Qiang Liu
  • Ping Bai
  • Liang Wang
  • Tieniu Tan

Public security events are occurring all over the world, bringing threats to personal and property safety and homeland security. It is vital to construct an effective model to evaluate and predict public security. In this work, we establish a Situation-Aware Public Security Evaluation (SAPE) platform. Based on conventional Recurrent Neural Networks (RNN), we develop a new variant for the temporal contexts in public security event datasets. This model achieves better performance than the compared state-of-the-art methods. SAPE has two demonstrations, i.e., global public security evaluation and China public security evaluation. In the global part, based on the Global Terrorism Database from UMD, for each country, SAPE can predict the risk level and the top-n potential terrorist organizations that might attack the country. Users can also view the actual attacking organizations and predicted results. For each province in China, SAPE can predict the risk level and the probability scores of different types of events in the next month. Users can also view the actual numbers of events and predicted risk levels over the past year.

NeurIPS Conference 2016 Conference Paper

Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm

  • Qiang Liu
  • Dilin Wang

We propose a general purpose variational inference algorithm that forms a natural counterpart of gradient descent for optimization. Our method iteratively transports a set of particles to match the target distribution, by applying a form of functional gradient descent that minimizes the KL divergence. Empirical studies are performed on various real world models and datasets, on which our method is competitive with existing state-of-the-art methods. The derivation of our method is based on a new theoretical result that connects the derivative of KL divergence under smooth transforms with Stein’s identity and a recently proposed kernelized Stein discrepancy, which is of independent interest.
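
The SVGD update itself is compact: each particle moves along a kernel-smoothed score plus a repulsive term from the kernel gradient. A minimal NumPy sketch with an RBF kernel and the median bandwidth heuristic follows; details such as the step size are illustrative.

```python
# One SVGD step: phi(x_i) = (1/n) sum_j [ k(x_j, x_i) score(x_j)
#                                         + grad_{x_j} k(x_j, x_i) ].
import numpy as np

def svgd_step(x, score, step=0.1):
    n = x.shape[0]
    diff = x[:, None, :] - x[None, :, :]                   # x_i - x_j
    d2 = (diff ** 2).sum(-1)
    h = np.median(d2) / np.log(n + 1) + 1e-12              # median heuristic
    k = np.exp(-d2 / h)                                    # RBF kernel matrix
    repulsion = (2.0 / h) * (k[:, :, None] * diff).sum(1)  # kernel gradient term
    return x + step * (k @ score(x) + repulsion) / n

# Approximate a standard 2D Gaussian (score(x) = -x) with 100 particles.
rng = np.random.default_rng(4)
x = rng.normal(loc=5.0, size=(100, 2))
for _ in range(500):
    x = svgd_step(x, score=lambda z: -z)
print(x.mean(0), x.var(0))  # should be near zero mean, unit variance
```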

AAAI Conference 2015 Conference Paper

COT: Contextual Operating Tensor for Context-Aware Recommender Systems

  • Qiang Liu
  • Shu Wu
  • Liang Wang

With the rapid growth of information on the internet, recommender systems have become fundamental for helping users alleviate the problem of information overload. Since contextual information can be used as a significant factor in modeling user behavior, various context-aware recommendation methods have been proposed. However, state-of-the-art context modeling methods treat contexts as dimensions similar to the dimensions of users and items, and cannot capture the special semantic operation of contexts. On the other hand, some works on multi-domain relation prediction can be used for context-aware recommendation, but they have problems generating recommendations under a large amount of contextual information. In this work, we propose the Contextual Operating Tensor (COT) model, which represents the common semantic effects of contexts as a contextual operating tensor and represents a context as a latent vector. Then, to model the semantic operation of a context combination, we generate a contextual operating matrix from the contextual operating tensor and the latent vectors of contexts. Thus, latent vectors of users and items can be operated by the contextual operating matrices. Experimental results show that the proposed COT model yields significant improvements over the competitive compared methods on three typical datasets, i.e., the Food, Adom and Movielens-1M datasets.

NeurIPS Conference 2015 Conference Paper

Decomposition Bounds for Marginal MAP

  • Wei Ping
  • Qiang Liu
  • Alexander Ihler

Marginal MAP inference involves making MAP predictions in systems defined with latent variables or missing information. It is significantly more difficult than pure marginalization and MAP tasks, for which a large class of efficient and convergent variational algorithms, such as dual decomposition, exist. In this work, we generalize dual decomposition to a generic powered-sum inference task, which includes marginal MAP, along with pure marginalization and MAP, as special cases. Our method is based on a block coordinate descent algorithm on a new convex decomposition bound, that is guaranteed to converge monotonically, and can be parallelized efficiently. We demonstrate our approach on various inference queries over real-world problems from the UAI approximate inference challenge, showing that our framework is faster and more reliable than previous methods.

NeurIPS Conference 2015 Conference Paper

Probabilistic Variational Bounds for Graphical Models

  • Qiang Liu
  • John Fisher III
  • Alexander Ihler

Variational algorithms such as tree-reweighted belief propagation can provide deterministic bounds on the partition function, but are often loose and difficult to use in an "any-time" fashion, expending more computation for tighter bounds. On the other hand, Monte Carlo estimators such as importance sampling have excellent any-time behavior, but depend critically on the proposal distribution. We propose a simple Monte Carlo based inference method that augments convex variational bounds by adding importance sampling (IS). We argue that convex variational methods naturally provide good IS proposals that cover the probability of the target distribution, and reinterpret the variational optimization as designing a proposal that minimizes an upper bound on the variance of our IS estimator. This both provides an accurate estimator and enables the construction of any-time probabilistic bounds that improve quickly and directly on state-of-the-art variational bounds, which provide certificates of accuracy given enough samples relative to the error in the initial bound.

NeurIPS Conference 2014 Conference Paper

Distributed Estimation, Information Loss and Exponential Families

  • Qiang Liu
  • Alexander Ihler

Distributed learning of probabilistic models from multiple data repositories with minimum communication is increasingly important. We study a simple communication-efficient learning framework that first calculates the local maximum likelihood estimates (MLE) based on the data subsets, and then combines the local MLEs to achieve the best possible approximation to the global MLE based on the whole dataset jointly. We study the statistical properties of this framework, showing that the loss of efficiency compared to the global setting relates to how much the underlying distribution families deviate from full exponential families, drawing a connection to the theory of information loss by Fisher, Rao, and Efron. We show that the "full-exponential-family-ness" represents the lower bound of the error rate of arbitrary combinations of local MLEs, and is achieved by a KL-divergence-based combination method but not by the more common linear combination method. We also study the empirical properties of the KL and linear combination methods, showing that the KL method significantly outperforms linear combination in practical settings with issues such as model misspecification, non-convexity, and heterogeneous data partitions.
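
The gap between the two combination methods can be illustrated on a Gaussian with heterogeneous partitions. For a full exponential family, the KL-based combination reduces to averaging sufficient statistics and then moment matching, which recovers the global MLE exactly, whereas linearly averaging local parameter estimates does not. The setup below is a simplified illustration, not the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Heterogeneous partitions: the two machines see data with different means.
x1 = rng.normal(-2.0, 1.0, size=10_000)
x2 = rng.normal(+2.0, 1.0, size=10_000)
x = np.concatenate([x1, x2])

def local_stats(xk):
    # Sufficient statistics of the Gaussian family: E[x], E[x^2].
    return np.array([xk.mean(), (xk ** 2).mean()])

# KL-based combination = average sufficient statistics, then moment-match.
s = (local_stats(x1) + local_stats(x2)) / 2
mu_kl, var_kl = s[0], s[1] - s[0] ** 2

# Linear combination = average the local (mu, sigma^2) parameter estimates.
mu_lin = (x1.mean() + x2.mean()) / 2
var_lin = (x1.var() + x2.var()) / 2

print(mu_kl, var_kl)     # matches the global MLE: x.mean(), x.var()
print(mu_lin, var_lin)   # variance badly underestimated (~1 instead of ~5)
print(x.mean(), x.var())
```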

IS Journal 2013 Journal Article

Extreme Learning Machines [Trends & Controversies]

  • Erik Cambria
  • Guang-Bin Huang
  • Liyanaarachchi Lekamalage Chamara Kasun
  • Hongming Zhou
  • Chi Man Vong
  • Jiarun Lin
  • Jianping Yin
  • Zhiping Cai

This special issue includes eight original works that detail the further developments of ELMs in theories, applications, and hardware implementation. In "Representational Learning with ELMs for Big Data," Liyanaarachchi Lekamalage Chamara Kasun, Hongming Zhou, Guang-Bin Huang, and Chi Man Vong propose using the ELM as an auto-encoder for learning feature representations using singular values. In "A Secure and Practical Mechanism for Outsourcing ELMs in Cloud Computing," Jiarun Lin, Jianping Yin, Zhiping Cai, Qiang Liu, Kuan Li, and Victor C. M. Leung propose a method for handling large data applications by outsourcing to the cloud that would dramatically reduce ELM training time. In "ELM-Guided Memetic Computation for Vehicle Routing," Liang Feng, Yew-Soon Ong, and Meng-Hiot Lim consider the ELM as an engine for automating the encapsulation of knowledge memes from past problem-solving experiences. In "ELMVIS: A Nonlinear Visualization Technique Using Random Permutations and ELMs," Anton Akusok, Amaury Lendasse, Rui Nian, and Yoan Miche propose an ELM method for data visualization based on random permutations to map original data and their corresponding visualization points. In "Combining ELMs with Random Projections," Paolo Gastaldo, Rodolfo Zunino, Erik Cambria, and Sergio Decherchi analyze the relationships between ELM feature-mapping schemas and the paradigm of random projections. In "Reduced ELMs for Causal Relation Extraction from Unstructured Text," Xuefeng Yang and Kezhi Mao propose combining ELMs with neuron selection to optimize the neural network architecture and improve the ELM ensemble's computational efficiency. In "A System for Signature Verification Based on Horizontal and Vertical Components in Hand Gestures," Beom-Seok Oh, Jehyoung Jeon, Kar-Ann Toh, Andrew Beng Jin Teoh, and Jaihie Kim propose a novel paradigm for hand signature biometry for touchless applications without the need for handheld devices. Finally, in "An Adaptive and Iterative Online Sequential ELM-Based Multi-Degree-of-Freedom Gesture Recognition System," Hanchao Yu, Yiqiang Chen, Junfa Liu, and Guang-Bin Huang propose an online sequential ELM-based efficient gesture recognition algorithm for touchless human-machine interaction.

NeurIPS Conference 2013 Conference Paper

Scoring Workers in Crowdsourcing: How Many Control Questions are Enough?

  • Qiang Liu
  • Alexander Ihler
  • Mark Steyvers

We study the problem of estimating continuous quantities, such as prices, probabilities, and point spreads, using a crowdsourcing approach. A challenging aspect of combining the crowd's answers is that workers' reliabilities and biases are usually unknown and highly diverse. Control items with known answers can be used to evaluate workers' performance, and hence improve the combined results on the target items with unknown answers. This raises the problem of how many control items to use when the total number of items each worker can answer is limited: more control items evaluate the workers better, but leave fewer resources for the target items that are of direct interest, and vice versa. We give theoretical results for this problem under different scenarios, and provide a simple rule of thumb for crowdsourcing practitioners. As a byproduct, we also provide theoretical analysis of the accuracy of different consensus methods.
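
One simple instantiation of the trade-off, offered as an illustration rather than the paper's estimator: spend the control items to estimate each worker's error variance, then aggregate the target items with inverse-variance weights.

```python
import numpy as np

rng = np.random.default_rng(0)
n_workers, n_control, n_target = 20, 10, 50

# Each worker reports the truth plus Gaussian noise with a worker-specific scale.
true_control = rng.normal(size=n_control)
true_target = rng.normal(size=n_target)
noise_sd = rng.uniform(0.2, 2.0, size=n_workers)

control_ans = true_control + rng.normal(size=(n_workers, n_control)) * noise_sd[:, None]
target_ans = true_target + rng.normal(size=(n_workers, n_target)) * noise_sd[:, None]

# Score workers on the control items (whose answers are known)...
est_var = ((control_ans - true_control) ** 2).mean(axis=1)

# ...then aggregate target answers with inverse-variance weights.
w = 1.0 / est_var
weighted = (w[:, None] * target_ans).sum(axis=0) / w.sum()
plain = target_ans.mean(axis=0)

print(np.mean((weighted - true_target) ** 2))  # lower error than the plain average
print(np.mean((plain - true_target) ** 2))
```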

JMLR Journal 2013 Journal Article

Variational Algorithms for Marginal MAP

  • Qiang Liu
  • Alexander Ihler

The marginal maximum a posteriori probability (MAP) estimation problem, which calculates the mode of the marginal posterior distribution of a subset of variables with the remaining variables marginalized, is an important inference problem in many models, such as those with hidden variables or uncertain parameters. Unfortunately, marginal MAP can be NP-hard even on trees, and has attracted less attention in the literature compared to the joint MAP (maximization) and marginalization problems. We derive a general dual representation for marginal MAP that naturally integrates the marginalization and maximization operations into a joint variational optimization problem, making it possible to easily extend most or all variational-based algorithms to marginal MAP. In particular, we derive a set of 'mixed-product' message passing algorithms for marginal MAP, whose form is a hybrid of max-product, sum-product and a novel 'argmax-product' message updates. We also derive a class of convergent algorithms based on proximal point methods, including one that transforms the marginal MAP problem into a sequence of standard marginalization problems. Theoretically, we provide guarantees under which our algorithms give globally or locally optimal solutions, and provide novel upper bounds on the optimal objectives. Empirically, we demonstrate that our algorithms significantly outperform the existing approaches, including a state-of-the-art algorithm based on local search methods.
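
To make the task concrete, here is marginal MAP computed by brute force on a tiny random table (sizes are arbitrary): sum out the marginalized variables, then take the argmax over the rest. The message-passing algorithms in the paper approximate exactly this computation when the brute-force sums are intractable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Joint table over (A, B, C): maximize over A, marginalize out B and C.
p = rng.random(size=(3, 4, 5))
p /= p.sum()

marginal_A = p.sum(axis=(1, 2))    # sum out B and C
a_star = int(marginal_A.argmax())  # marginal MAP assignment of A
print(a_star, marginal_A)
```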

NeurIPS Conference 2013 Conference Paper

Variational Planning for Graph-based MDPs

  • Qiang Cheng
  • Qiang Liu
  • Feng Chen
  • Alexander Ihler

Markov Decision Processes (MDPs) are extremely useful for modeling and solving sequential decision making problems. Graph-based MDPs provide a compact representation for MDPs with large numbers of random variables. However, the complexity of exactly solving a graph-based MDP usually grows exponentially in the number of variables, which limits their application. We present a new variational framework to describe and solve the planning problem of MDPs, and derive both exact and approximate planning algorithms. In particular, by exploiting the graph structure of graph-based MDPs, we propose a factored variational value iteration algorithm in which the value function is first approximated as a product of local-scope value functions and then solved by minimizing a Kullback-Leibler (KL) divergence. The KL divergence is optimized using the belief propagation algorithm, with complexity exponential in only the cluster size of the graph. Experimental comparison on different models shows that our algorithm outperforms existing approximation algorithms at finding good policies.
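
For reference, the flat (non-factored) value iteration that the paper's variational framework generalizes, on a small random tabular MDP; it is this recursion whose cost blows up exponentially for graph-based MDPs. A minimal sketch with arbitrary sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma = 6, 3, 0.95

P = rng.random(size=(A, S, S))
P /= P.sum(axis=2, keepdims=True)   # row-normalized transition kernels per action
R = rng.random(size=(A, S))         # rewards per (action, state)

V = np.zeros(S)
for _ in range(500):
    # Bellman backup: Q(a, s) = R(a, s) + gamma * sum_s' P(a, s, s') V(s')
    Q = R + gamma * P @ V
    V = Q.max(axis=0)

policy = Q.argmax(axis=0)
print(V, policy)
```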

NeurIPS Conference 2012 Conference Paper

Variational Inference for Crowdsourcing

  • Qiang Liu
  • Jian Peng
  • Alexander Ihler

Crowdsourcing has become a popular paradigm for labeling large datasets. However, it has given rise to the computational task of aggregating the crowdsourced labels provided by a collection of unreliable annotators. We approach this problem by transforming it into a standard inference problem in graphical models, and applying approximate variational methods, including belief propagation (BP) and mean field (MF). We show that our BP algorithm generalizes both majority voting and a recent algorithm by Karger et al., while our MF method is closely related to a commonly used EM algorithm. In both cases, we find that the performance of the algorithms critically depends on the choice of a prior distribution on the workers' reliability; by choosing the prior properly, both BP and MF (and EM) perform surprisingly well on both simulated and real-world datasets, competitive with state-of-the-art algorithms based on more complicated modeling assumptions.
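
A bare-bones version of the EM-style aggregation that the abstract relates MF to, written from scratch for a one-coin worker model (each worker is correct with some unknown probability); this is an illustration of the general idea, not the paper's BP or MF updates.

```python
import numpy as np

rng = np.random.default_rng(0)
n_workers, n_items = 15, 200

truth = rng.integers(0, 2, size=n_items)
reliab = rng.uniform(0.55, 0.95, size=n_workers)   # P(correct) per worker
correct = rng.random((n_workers, n_items)) < reliab[:, None]
labels = np.where(correct, truth, 1 - truth)

# EM on a one-coin model: q[i] = posterior P(truth_i = 1).
q = labels.mean(axis=0)                            # initialize with majority vote
for _ in range(20):
    # M-step: each worker's accuracy against the current soft labels.
    p = (labels * q + (1 - labels) * (1 - q)).mean(axis=1)
    p = p.clip(1e-3, 1 - 1e-3)
    # E-step: log-odds vote with weight log(p / (1 - p)), uniform prior assumed.
    w = np.log(p / (1 - p))
    score = (w[:, None] * (2 * labels - 1)).sum(axis=0)
    q = 1 / (1 + np.exp(-score))

pred = (q > 0.5).astype(int)
print((pred == truth).mean())                      # typically beats plain majority vote
```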