Arrow Research search

Author name cluster

Quanming Yao

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

47 papers
2 author rows

Possible papers

47

AAAI Conference 2026 Conference Paper

Efficient Reinforcement Learning for Zero-Shot Coordination in Evolving Games

  • Bingyu Hui
  • Lebin Yu
  • Quanming Yao
  • Yunpeng Qu
  • Xudong Zhang
  • Jian Wang

Zero-shot coordination (ZSC), a key challenge in multi-agent game theory, has recently become a hot topic in reinforcement learning (RL) research, especially in complex evolving games. It focuses on the generalization ability of agents, requiring them to coordinate well, without any fine-tuning, with previously unseen collaborators drawn from a diverse and potentially evolving pool of partners. Population-based training, which approximates such an evolving partner pool, has been proven to provide good zero-shot coordination performance; nevertheless, existing methods are limited by computational resources, mainly focusing on optimizing diversity in small populations while neglecting the potential performance gains from scaling up the population size. To address this issue, this paper proposes Scalable Population Training (ScaPT), an efficient RL training framework comprising two key components: a meta-agent that efficiently realizes a population by selectively sharing parameters across agents, and a mutual information regularizer that guarantees population diversity. To empirically validate the effectiveness of ScaPT, this paper evaluates it along with representative frameworks in the Hanabi cooperative game and confirms its superiority.
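
The meta-agent idea, realizing a whole population through shared parameters plus small per-agent components, can be sketched as follows. This is a minimal illustration under my own assumptions (a shared trunk modulated by per-agent embeddings); the names and architecture are hypothetical, not the paper's actual design.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, HID, ACT_DIM, POP_SIZE = 16, 32, 4, 64

# Shared trunk parameters: one set of weights realizes the whole population.
W_shared = rng.normal(scale=0.1, size=(OBS_DIM, HID))
# Small per-agent modulation vectors: the only unshared parameters.
agent_embed = rng.normal(scale=0.1, size=(POP_SIZE, HID))
W_head = rng.normal(scale=0.1, size=(HID, ACT_DIM))

def policy_logits(obs, agent_id):
    """Meta-agent: a shared trunk, modulated by a per-agent embedding."""
    h = np.tanh(obs @ W_shared) * (1.0 + agent_embed[agent_id])
    return h @ W_head

obs = rng.normal(size=OBS_DIM)
print(policy_logits(obs, agent_id=3))
```

The point of the sketch: the per-agent state grows only with HID, not with the full network size, so the population can be scaled far beyond what independent agents would allow.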

NeurIPS Conference 2025 Conference Paper

Adaptive Preference Arithmetic: A Personalized Agent with Adaptive Preference Arithmetic for Dynamic Preference Modeling

  • Hongyi Nie
  • Yaqing Wang
  • Mingyang Zhou
  • Feiyang Pan
  • Quanming Yao
  • Zhen Wang

As large language models (LLMs) are increasingly used as personalized user assistants, effectively adapting to users' evolving preferences is critical for delivering high-quality personalized responses. While user preferences are often stable in content, their relative strengths shift over time due to changing goals and contexts. Therefore, modeling these dynamic preference strengths can enable finer-grained personalization. However, current methods face two major challenges: (i) limited user feedback makes it difficult to estimate preference strengths accurately, and (ii) natural language ambiguity limits the controllability of preference-guided generation. To address these issues, we propose AdaPA-Agent, an LLM-agent personalization framework that models dynamic preference strengths via Adaptive Preference Arithmetic. First, instead of requiring additional user feedback, AdaPA-Agent employs an alignment-based strength estimation module to estimate the strength of user preferences from existing user-agent interactions. Then, it guides controllable personalized generation by linearly combining next-token distributions, weighted by the estimated strengths of individual preferences. Experiments on two personalization tasks, conversational recommendation and personalized web interaction, demonstrate that AdaPA-Agent aligns better with users' changing intents, achieving over 18.9% and 14.2% improvements, respectively, compared to ReAct, the widely used agent framework.
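
The "linearly combining next-token distributions" step lends itself to a small sketch. The following is a toy illustration in my own notation (the function name and normalization choices are mine, not the paper's implementation):

```python
import numpy as np

def combine_next_token_probs(prob_dists, strengths):
    """Linearly combine per-preference next-token distributions,
    weighted by the estimated strength of each preference."""
    w = np.asarray(strengths, dtype=float)
    w = w / w.sum()                      # normalize strengths into weights
    mixed = np.tensordot(w, np.asarray(prob_dists), axes=1)
    return mixed / mixed.sum()           # renormalize for safety

# Toy vocabulary of 5 tokens; two preferences with different strengths.
p_pref_a = np.array([0.70, 0.10, 0.10, 0.05, 0.05])
p_pref_b = np.array([0.05, 0.05, 0.10, 0.10, 0.70])
print(combine_next_token_probs([p_pref_a, p_pref_b], strengths=[0.8, 0.2]))
```

As the estimated strengths drift over a session, the mixture shifts accordingly, which is what gives the generation its controllability.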

IJCAI Conference 2025 Conference Paper

Automated Decision-Making on Networks with LLMs through Knowledge-Guided Evolution

  • Xiaohan Zheng
  • Lanning Wei
  • Yong Li
  • Quanming Yao

Effective decision-making on networks often relies on learning from graph-structured data, where Graph Neural Networks (GNNs) play a central role, but they take effort to configure and tune. In this demo, we propose LLMNet, which shows how to automate GNN design through Large Language Models. Our system develops a set of agents that construct graph-related knowledge bases and then leverages Retrieval-Augmented Generation (RAG) to support automated configuration and refinement of GNN models through a knowledge-guided evolution process. These agents, equipped with specialized knowledge bases, extract insights into tasks and graph structures by interacting with the knowledge bases. Empirical results show that LLMNet excels on twelve datasets across three graph learning tasks, validating its effectiveness in GNN model design.

ICLR Conference 2025 Conference Paper

Curriculum-aware Training for Discriminating Molecular Property Prediction Models

  • Hansi Yang
  • Quanming Yao
  • James T. Kwok

Despite their wide application across various fields, current molecular property prediction models struggle with the challenge of activity cliffs, which refer to situations where molecules with similar chemical structures display remarkably different properties. This phenomenon hinders existing models' ability to learn distinctive representations for molecules with similar chemical structures, and results in inaccurate predictions on molecules with activity cliffs. To address this limitation, we first present empirical evidence demonstrating the ineffectiveness of standard training pipelines on molecules with activity cliffs. We then propose a novel approach that reformulates molecular property prediction as a node classification problem, introducing two innovative tasks at both the node and edge levels to improve learning outcomes for these challenging molecules. Our method is versatile, allowing seamless integration with a variety of base models, whether pre-trained or randomly initialized. Extensive evaluation across different molecular property prediction datasets validates the effectiveness of our approach.

ICLR Conference 2025 Conference Paper

Erasing Concept Combination from Text-to-Image Diffusion Model

  • Hongyi Nie
  • Quanming Yao
  • Yang Liu
  • Zhen Wang 0004
  • Yatao An Bian

Advancements in the text-to-image diffusion model have raised security concerns due to their potential to generate images with inappropriate themes such as societal biases and copyright infringements. Current studies have made notable progress in preventing the model from generating images containing specific high-risk visual concepts. However, these methods neglect the issue that inappropriate themes may also arise from the combination of benign visual concepts. A crucial challenge arises because the same image theme can be represented through multiple distinct visual concept combinations, and the model's ability to generate individual concepts may become distorted when processing these combinations. Consequently, effectively erasing such visual concept combinations from the diffusion model remains a formidable challenge. To tackle this problem, we formalize the problem as the Concept Combination Erasing (CCE) problem and propose a Concept Graph-based high-level Feature Decoupling framework (CoGFD) to address CCE. CoGFD identifies and decomposes visual concept combinations with a consistent image theme from an LLM-induced concept logic graph, and erases these combinations through decoupling co-occurrent high-level features. These techniques enable CoGFD to eliminate undesirable visual concept combinations while minimizing adverse effects on the generative fidelity of related individual concepts, outperforming state-of-the-art baselines. Extensive experiments across diverse visual concept combination scenarios verify the effectiveness of CoGFD.

ICML Conference 2025 Conference Paper

Hierarchical Graph Tokenization for Molecule-Language Alignment

  • Yongqiang Chen 0002
  • Quanming Yao
  • Juzheng Zhang
  • James Cheng
  • Yatao An Bian

Recently, there has been a surge of interest in extending the success of large language models (LLMs) from texts to molecules. Most existing approaches adopt a graph neural network to represent a molecule as a series of node tokens for molecule-language alignment, which, however, overlook the inherent hierarchical structures in molecules. Notably, higher-order molecular structures contain rich semantics of functional groups, which encode crucial biochemical functionalities of the molecules. We show that neglecting the hierarchical information in tokenization leads to subpar molecule-language alignment and severe hallucination. To address this limitation, we propose HIerarchical GrapH Tokenization (HIGHT). HIGHT employs a hierarchical graph tokenizer that encodes the hierarchy of atom, motif, and molecular levels of informative tokens to improve the molecular perception of LLMs. HIGHT also adopts an augmented instruction tuning dataset, enriched with the hierarchical graph information, to further enhance the molecule-language alignment. Extensive experiments on 14 real-world benchmarks verify the effectiveness of HIGHT, reducing hallucination by 40% and bringing significant improvements in various molecule-language downstream tasks. The project is available at https://higraphllm.github.io/.

NeurIPS Conference 2025 Conference Paper

Learning to Learn with Contrastive Meta-Objective

  • Shiguang Wu
  • Yaqing Wang
  • Yatao Bian
  • Quanming Yao

Meta-learning enables learning systems to adapt quickly to new tasks, similar to humans. Different meta-learning approaches all work within the mini-batch episodic training framework, which naturally provides information about task identity; this can serve as additional supervision during meta-training to improve generalizability. We propose to exploit task identity as additional supervision in meta-training, inspired by the alignment and discrimination abilities intrinsic to humans' fast learning. This is achieved by contrasting what meta-learners learn, i.e., model representations. The proposed ConML evaluates and optimizes this contrastive meta-objective under a problem- and learner-agnostic meta-training framework. We demonstrate that ConML integrates seamlessly with existing meta-learners, as well as in-context learning models, and brings a significant boost in performance at a small implementation cost.

TIST Journal 2025 Journal Article

Modeling N-ary Relational Knowledge Bases with Tensor Decomposition

  • Yu Liu
  • Quanming Yao
  • Yong Li

The binary relational knowledge base (KB, a.k.a. knowledge graph), representing real-world knowledge with binary relations and entities, has been an important research topic in artificial intelligence; however, considerable knowledge also involves beyond-binary relations. Recently, the field has proposed to model n-ary relational KBs with both binary and beyond-binary relations included. However, most current models are extended from translational distance and neural network models in binary relational KBs, which suffer from weak expressiveness and high complexity, respectively. To overcome such issues, in this work, we propose a novel two-step modeling framework, GETD, generalizing the powerful tensor decomposition technique from binary relational KBs to the n-ary case. For n-ary relational KBs with single-arity relations, the GETD framework introduces Tucker decomposition and Tensor Ring decomposition for expressive and efficient modeling. Furthermore, the framework is technically extended for the representation of n-ary relational KBs with mixed-arity relations. The existing negative sampling technique is also generalized to the n-ary case for GETD. In addition, we theoretically prove that the GETD framework is fully expressive, i.e., able to completely represent any KB. Empirical results on two representative datasets show that the proposed framework significantly outperforms the state-of-the-art methods, achieving 11–26% and 4–7% improvements on Hits@10 for the single-arity and the mixed-arity cases, respectively.
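
As a hedged illustration of the tensor-decomposition view (notation mine, following standard Tucker conventions; the paper's exact parameterization, including the Tensor Ring factorization of the core, may differ), the plausibility of an n-ary fact with relation r and entities e_1, ..., e_n can be scored as a multilinear product against a core tensor:

```latex
\phi(r, e_1, \dots, e_n)
  = \mathcal{Z} \times_1 \mathbf{r} \times_2 \mathbf{e}_1 \times_3 \mathbf{e}_2 \cdots \times_{n+1} \mathbf{e}_n
```

Here \mathcal{Z} is a learnable core tensor and \times_k denotes the mode-k tensor-vector product; the Tensor Ring step further factorizes \mathcal{Z} into a ring of small cores so that the parameter count stays manageable as the arity n grows.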

IJCAI Conference 2025 Conference Paper

Unified Molecule-Text Language Model with Discrete Token Representation

  • Shuhan Guo
  • Yatao Bian
  • Ruibing Wang
  • Nan Yin
  • Zhen Wang
  • Quanming Yao

The remarkable success of Large Language Models (LLMs) across diverse tasks has driven the research community to extend their capabilities to molecular applications. However, most molecular LLMs employ adapter-based architectures that fail to equally integrate molecule and text modalities and lack explicit supervision signals for the molecular modality. To address these issues, we introduce UniMoT, a Unified Molecule-Text LLM adopting a tokenizer-based architecture that expands the vocabulary of LLMs with molecule tokens. Specifically, we introduce a Vector Quantization-driven tokenizer that incorporates a Q-Former to bridge the modality gap between molecule and text. This tokenizer transforms molecular structures into sequences of tokens exhibiting causal dependency, thereby encapsulating both high-level molecular features and textual information. Equipped with this tokenizer, UniMoT unifies molecule and text modalities under a shared token representation and an autoregressive training paradigm. This enables the model to process molecular structures as a distinct linguistic system and generate them in textual form. Through a four-stage training scheme, UniMoT functions as a multi-modal generalist capable of performing both molecule-to-text and text-to-molecule tasks. Extensive experiments demonstrate that UniMoT achieves state-of-the-art performance across a wide range of molecule comprehension and generation tasks.

ICLR Conference 2025 Conference Paper

Why In-Context Learning Models are Good Few-Shot Learners?

  • Shiguang Wu 0002
  • Yaqing Wang 0002
  • Quanming Yao

We explore in-context learning (ICL) models from a learning-to-learn perspective. Unlike studies that identify specific learning algorithms in ICL models, we compare ICL models with typical meta-learners to understand their superior performance. We theoretically prove the expressiveness of ICL models as learning algorithms and examine their learnability and generalizability. Our findings show that ICL with transformers can effectively construct data-dependent learning algorithms instead of directly following existing ones (including gradient-based, metric-based, and amortization-based meta-learners). The construction of such a learning algorithm is determined by the pre-training process, as a function fitting the training distribution, which raises generalizability as an important issue. With this understanding, we propose strategies to transfer techniques for classical deep networks to the meta-level to further improve ICL. As examples, we implement meta-level meta-learning for domain adaptability with limited data and meta-level curriculum learning for accelerated convergence during pre-training, demonstrating their empirical effectiveness.

NeurIPS Conference 2024 Conference Paper

Customized Subgraph Selection and Encoding for Drug-drug Interaction Prediction

  • Haotong Du
  • Quanming Yao
  • Juzheng Zhang
  • Yang Liu
  • Zhen Wang

Subgraph-based methods have proven to be effective and interpretable in predicting drug-drug interactions (DDIs), which are essential for medical practice and drug development. Subgraph selection and encoding are critical stages in these methods, yet customizing these components remains underexplored due to the high cost of manual adjustments. In this study, inspired by the success of neural architecture search (NAS), we propose a method to search for data-specific components within subgraph-based frameworks. Specifically, we introduce extensive subgraph selection and encoding spaces that account for the diverse contexts of drug interactions in DDI prediction. To address the challenge of large search spaces and high sampling costs, we design a relaxation mechanism that uses an approximation strategy to efficiently explore optimal subgraph configurations. This approach allows for robust exploration of the search space. Extensive experiments demonstrate the effectiveness and superiority of the proposed method, with the discovered subgraphs and encoding functions highlighting the model’s adaptability.

ICLR Conference 2024 Conference Paper

Less is More: One-shot Subgraph Reasoning on Large-scale Knowledge Graphs

  • Zhanke Zhou
  • Yongqi Zhang
  • Jiangchao Yao
  • Quanming Yao
  • Bo Han 0003

To deduce new facts on a knowledge graph (KG), a link predictor learns from the graph structure and collects local evidence to find the answer to a given query. However, existing methods suffer from a severe scalability problem due to the utilization of the whole KG for prediction, which hinders their promise on large-scale KGs and cannot be directly addressed by vanilla sampling methods. In this work, we propose one-shot-subgraph link prediction to achieve efficient and adaptive prediction. The design principle is that, instead of directly acting on the whole KG, the prediction procedure is decoupled into two steps: (i) extracting only one subgraph according to the query and (ii) predicting on this single, query-dependent subgraph. We reveal that the non-parametric and computation-efficient heuristic Personalized PageRank (PPR) can effectively identify the potential answers and supporting evidence. With efficient subgraph-based prediction, we further introduce automated searching of the optimal configurations in both data and model spaces. Empirically, we achieve improved efficiency and leading performance on five large-scale benchmarks. The code is publicly available at: https://github.com/tmlr-group/one-shot-subgraph.
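
The PPR-based extraction step can be illustrated with a small power-iteration sketch. This is a generic PPR implementation under my own assumptions (dense adjacency, fixed iteration count), not the paper's code:

```python
import numpy as np

def personalized_pagerank(A, seed, alpha=0.85, iters=50):
    """Power iteration for PPR scores, restarting at the query's seed nodes."""
    deg = A.sum(axis=1, keepdims=True)
    P = np.divide(A, deg, out=np.zeros_like(A, dtype=float), where=deg > 0)
    s = seed / seed.sum()
    p = s.copy()
    for _ in range(iters):
        p = alpha * (P.T @ p) + (1 - alpha) * s
    return p

# Toy 6-node graph; extract the top-3 nodes as the one-shot subgraph.
A = np.array([[0,1,1,0,0,0],
              [1,0,1,0,0,0],
              [1,1,0,1,0,0],
              [0,0,1,0,1,1],
              [0,0,0,1,0,1],
              [0,0,0,1,1,0]], dtype=float)
seed = np.zeros(6); seed[0] = 1.0          # query entity = node 0
scores = personalized_pagerank(A, seed)
subgraph_nodes = np.argsort(-scores)[:3]   # keep only the highest-scoring nodes
print(subgraph_nodes, scores.round(3))
```

Because PPR is non-parametric, this extraction adds no trainable components, which is what keeps the prediction step decoupled and cheap.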

IJCAI Conference 2024 Conference Paper

PACIA: Parameter-Efficient Adapter for Few-Shot Molecular Property Prediction

  • Shiguang Wu
  • Yaqing Wang
  • Quanming Yao

Molecular property prediction (MPP) plays a crucial role in biomedical applications, but it often encounters challenges due to a scarcity of labeled data. Existing works commonly adopt a gradient-based strategy to update a large number of parameters for task-level adaptation. However, the increase in adaptive parameters can lead to overfitting and poor performance. Observing that a graph neural network (GNN) performs well as both encoder and predictor, we propose PACIA, a parameter-efficient GNN adapter for few-shot MPP. We design a unified adapter to generate a few adaptive parameters to modulate the message-passing process of the GNN. We then adopt a hierarchical adaptation mechanism to adapt the encoder at the task level and the predictor at the query level via the unified GNN adapter. Extensive results show that PACIA obtains state-of-the-art performance on few-shot MPP problems, and that our proposed hierarchical adaptation mechanism is rational and effective.

AAAI Conference 2024 Conference Paper

Robust Communicative Multi-Agent Reinforcement Learning with Active Defense

  • Lebin Yu
  • Yunbo Qiu
  • Quanming Yao
  • Yuan Shen
  • Xudong Zhang
  • Jian Wang

Communication in multi-agent reinforcement learning (MARL) has recently been proven to effectively promote cooperation among agents. Since communication in real-world scenarios is vulnerable to noise and adversarial attacks, it is crucial to develop robust communicative MARL techniques. However, existing research in this domain has predominantly focused on passive defense strategies, where agents receive all messages equally, making it hard to balance performance and robustness. We propose an active defense strategy, where agents automatically reduce the impact of potentially harmful messages on the final decision. Implementing this strategy raises two challenges: defining unreliable messages and properly adjusting their impact on the final decision. To address them, we design an Active Defense Multi-Agent Communication framework (ADMAC), which estimates the reliability of received messages and adjusts their impact on the final decision accordingly with the help of a decomposable decision structure. The superiority of ADMAC over existing methods is validated by experiments in three communication-critical tasks under four types of attacks.

AAAI Conference 2024 Conference Paper

Towards Human-like Learning from Relational Structured Data

  • Quanming Yao

Relational structured data is a way of representing knowledge using nodes and edges, while also capturing the meaning of that knowledge in a structured form that can be used for machine learning. Compared with vision and natural language data, relational structured data represents and manipulates structured knowledge, which can be beneficial for tasks that involve reasoning or inference. On the other hand, vision and NLP deal more with unstructured data (like images and text), and they often require different types of models and algorithms to extract useful information or features from the data. Human-like Learning develops methods that can harness relational structures and learning-to-learn to rapidly acquire and generalize knowledge to new tasks and situations. With Human-like Learning, the learning algorithm is efficient and can adapt to new or unseen situations, which is crucial in real-world applications where environments may change unpredictably. Moreover, the models are easier for humans to understand and interpret, which is important for transparency and trust in AI systems. In this talk, we present our recent attempts towards human-like learning from relational structured data.

ICLR Conference 2024 Conference Paper

Understanding Expressivity of GNN in Rule Learning

  • Haiquan Qiu
  • Yongqi Zhang
  • Yong Li 0008
  • Quanming Yao

Rule learning is critical to improving knowledge graph (KG) reasoning due to its ability to provide logical and interpretable explanations. Recently, Graph Neural Networks (GNNs) with tail entity scoring have achieved state-of-the-art performance on KG reasoning. However, theoretical understanding of these GNNs is either lacking or focused on single-relational graphs, leaving the kinds of rules these GNNs can learn an open problem. We propose to fill this gap in this paper. Specifically, GNNs with tail entity scoring are unified into a common framework. Then, we analyze their expressivity by formally describing the rule structures they can learn and theoretically demonstrating their superiority. These results further inspire us to propose a novel labeling strategy to learn more rules in KG reasoning. Experimental results are consistent with our theoretical findings and verify the effectiveness of our proposed method. The code is publicly available at https://github.com/LARS-research/Rule-learning-expressivity.

NeurIPS Conference 2023 Conference Paper

Combating Bilateral Edge Noise for Robust Link Prediction

  • Zhanke Zhou
  • Jiangchao Yao
  • Jiaxu Liu
  • Xiawei Guo
  • Quanming Yao
  • Li He
  • Liang Wang
  • Bo Zheng

Although link prediction on graphs has achieved great success with the development of graph neural networks (GNNs), the potential robustness under edge noise is still less investigated. To close this gap, we first conduct an empirical study to disclose that the edge noise bilaterally perturbs both input topology and target label, yielding severe performance degradation and representation collapse. To address this dilemma, we propose an information-theory-guided principle, Robust Graph Information Bottleneck (RGIB), to extract reliable supervision signals and avoid representation collapse. Different from the basic information bottleneck, RGIB further decouples and balances the mutual dependence among graph topology, target labels, and representation, building new learning objectives for robust representation against the bilateral noise. Two instantiations, RGIB-SSL and RGIB-REP, are explored to leverage the merits of different methodologies, i.e., self-supervised learning and data reparameterization, for implicit and explicit data denoising, respectively. Extensive experiments on six datasets and three GNNs with diverse noisy scenarios verify the effectiveness of our RGIB instantiations. The code is publicly available at: https://github.com/tmlr-group/RGIB.
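
For context, the "basic information bottleneck" that RGIB generalizes can be written as below. This is the standard graph-information-bottleneck-style objective in my notation (H is the learned representation, A the input topology, Y the labels); it is not RGIB's exact formulation, which further decouples the dependencies among noisy topology and noisy labels:

```latex
\min_{\mathbf{H}}\; -\, I(\mathbf{H}; Y) + \lambda\, I(\mathbf{H}; A)
```

That is, keep the representation predictive of the labels while compressing away information about the (possibly noisy) input topology.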

ICLR Conference 2023 Conference Paper

Combating Exacerbated Heterogeneity for Robust Models in Federated Learning

  • Jianing Zhu
  • Jiangchao Yao
  • Tongliang Liu
  • Quanming Yao
  • Jianliang Xu
  • Bo Han 0003

Privacy and security concerns in real-world applications have led to the development of adversarially robust federated models. However, the straightforward combination of adversarial training and federated learning in one framework can lead to undesired robustness deterioration. We discover that the reason behind this phenomenon is that the generated adversarial data can exacerbate the data heterogeneity among local clients, making the wrapped federated learning perform poorly. To deal with this problem, we propose a novel framework called Slack Federated Adversarial Training (SFAT), which assigns client-wise slack during aggregation to combat the intensified heterogeneity. Theoretically, we analyze the convergence of the proposed method to properly relax the objective when combining federated learning and adversarial training. Experimentally, we verify the rationality and effectiveness of SFAT on various benchmark and real-world datasets with different adversarial training and federated optimization methods. The code is publicly available at: https://github.com/ZFancy/SFAT.

NeurIPS Conference 2023 Conference Paper

Efficient Hyper-parameter Optimization with Cubic Regularization

  • Zhenqian Shen
  • Hansi Yang
  • Yong Li
  • James Kwok
  • Quanming Yao

As hyper-parameters are ubiquitous and can significantly affect model performance, hyper-parameter optimization is extremely important in machine learning. In this paper, we consider a sub-class of hyper-parameter optimization problems where the hyper-gradients are not available. Such problems frequently appear when the performance metric is non-differentiable or the hyper-parameter is not continuous. However, existing algorithms, like Bayesian optimization and reinforcement learning, often get trapped in local optima with poor performance. To address the above limitations, we propose to use cubic regularization to accelerate convergence and avoid saddle points. First, we adopt stochastic relaxation, which allows obtaining gradient and Hessian information without hyper-gradients. Then, we exploit the rich curvature information via cubic regularization. Theoretically, we prove that the proposed method can converge to approximate second-order stationary points, and that convergence is also guaranteed when the lower-level problem is solved inexactly. Experiments on synthetic and real-world data demonstrate the effectiveness of our proposed method.
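
The cubic-regularization step at the heart of the method follows the classical Nesterov–Polyak update; a hedged sketch in my notation (lambda_t is the hyper-parameter iterate, g_t and H_t the gradient and Hessian estimates obtained through stochastic relaxation):

```latex
s_t = \arg\min_{s}\; g_t^{\top} s + \tfrac{1}{2}\, s^{\top} H_t\, s + \tfrac{\sigma}{3}\, \lVert s \rVert^{3},
\qquad
\lambda_{t+1} = \lambda_t + s_t
```

The cubic term with coefficient sigma bounds the step and is what allows escaping saddle points and converging to approximate second-order stationary points.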

ICAPS Conference 2023 Conference Paper

Improving Zero-Shot Coordination Performance Based on Policy Similarity

  • Lebin Yu
  • Yunbo Qiu
  • Quanming Yao
  • Xudong Zhang 0001
  • Jian Wang 0030

In recent years, multi-agent reinforcement learning has achieved remarkable performance in multi-agent planning and scheduling tasks. It typically follows the self-play setting, where agents are trained by playing with a fixed group of agents. However, in the face of zero-shot coordination, where an agent must coordinate with unseen partners, self-play agents may fail. Several methods have been proposed to handle this problem, but they either take a lot of time or lack generalizability. In this paper, we first reveal an important phenomenon: zero-shot coordination performance is strongly linearly correlated with the similarity between an agent

ICLR Conference 2023 Conference Paper

Learning Symbolic Models for Graph-structured Physical Mechanism

  • Hongzhi Shi
  • Jingtao Ding
  • Yufan Cao 0002
  • Quanming Yao
  • Li Liu
  • Yong Li 0008

Graph-structured physical mechanisms are ubiquitous in real-world scenarios, thus revealing the underlying formulas is of great importance for scientific discovery. However, classical symbolic regression methods fail on this task since they can only handle input-output pairs that are not graph-structured. In this paper, we propose a new approach that generalizes symbolic regression to graph-structured physical mechanisms. The essence of our method is to model the formula skeleton with a message-passing flow, which helps transform the discovery of the skeleton into the search for the message-passing flow. Such a transformation guarantees that we are able to search for a message-passing flow that is efficient and Pareto-optimal in terms of both accuracy and simplicity. Subsequently, the underlying formulas can be identified by interpreting the component functions of the searched message-passing flow, reusing classical symbolic regression methods. We conduct extensive experiments on datasets from different physical domains, including mechanics, electricity, and thermology, and on real-world datasets of pedestrian dynamics without ground-truth formulas. The experimental results not only verify the rationale of our design but also demonstrate that the proposed method can automatically learn precise and interpretable formulas for graph-structured physical mechanisms.

ICML Conference 2023 Conference Paper

On Strengthening and Defending Graph Reconstruction Attack with Markov Chain Approximation

  • Zhanke Zhou
  • Chenyu Zhou
  • Xuan Li 0005
  • Jiangchao Yao
  • Quanming Yao
  • Bo Han 0003

Although powerful graph neural networks (GNNs) have boosted numerous real-world applications, the potential privacy risk is still underexplored. To close this gap, we perform the first comprehensive study of the graph reconstruction attack, which aims to reconstruct the adjacency of nodes. We show that a range of factors in GNNs can lead to the surprising leakage of private links. Especially by taking GNNs as a Markov chain and attacking GNNs via a flexible chain approximation, we systematically explore the underlying principles of the graph reconstruction attack, and propose two information-theory-guided mechanisms: (1) a chain-based attack method with adaptive designs for extracting more private information; (2) a chain-based defense method that sharply reduces the attack fidelity with moderate accuracy loss. These two objectives disclose a critical belief: to recover better in attack, you must extract more multi-aspect knowledge from the trained GNN; while to learn safer for defense, you must forget more link-sensitive information in training GNNs. Empirically, we achieve state-of-the-art results on six datasets and three common GNNs. The code is publicly available at: https://github.com/tmlr-group/MC-GRA.

TMLR Journal 2023 Journal Article

Understanding and Simplifying Architecture Search in Spatio-Temporal Graph Neural Networks

  • Zhen Xu
  • Quanming Yao
  • Yong Li
  • Qiang Yang

Compiling together spatial and temporal modules via a unified framework, Spatio-Temporal Graph Neural Networks (STGNNs) have been popularly used in multivariate spatio-temporal forecasting tasks, e.g., traffic prediction. After numerous manually designed architectures had been proposed, researchers became interested in Neural Architecture Search (NAS) for STGNNs. Existing methods suffer from two issues: (1) hyperparameters like the learning rate and channel size cannot be integrated into the NAS framework, which makes model evaluation less accurate and can mislead the architecture search; (2) the current search space, which basically mimics DARTS-like methods, is too large for the search algorithm to find a sufficiently good candidate. In this work, we deal with both issues at the same time. We first re-examine the importance and transferability of the training hyperparameters to ensure a fair and fast comparison. Next, we set up a framework that disentangles architecture design into three disjoint angles according to how spatio-temporal representations flow and transform in architectures, which allows us to understand the behavior of architectures from a distributional perspective. This way, we obtain good guidelines to reduce the STGNN search space and find state-of-the-art architectures by simple random search. As an illustrative example, we combine these principles with random search, which already significantly outperforms both state-of-the-art hand-designed models and recently automatically searched ones.

ICML Conference 2022 Conference Paper

Fast and Provable Nonconvex Tensor RPCA

  • Haiquan Qiu
  • Yao Wang 0003
  • Shaojie Tang 0001
  • Deyu Meng
  • Quanming Yao

In this paper, we study nonconvex tensor robust principal component analysis (RPCA) based on the $t$-SVD. We first propose an alternating projection method, i.e., APT, which converges linearly to the ground truth under the incoherence conditions of tensors. However, as the projection onto the low-rank tensor space in APT can be slow, we further propose to speed up this process by utilizing properties of the tangent space of low-rank tensors. The resulting algorithm, i.e., EAPT, is not only more efficient than APT but also keeps the linear convergence. Compared with existing tensor RPCA works, the proposed method, especially EAPT, is not only more effective, due to the recovery guarantee and adaptation to the transformed (frequency) domain, but also more efficient, due to a faster convergence rate and lower iteration complexity. These benefits are also empirically verified on both synthetic data and real applications, e.g., hyperspectral image denoising and video background subtraction.

JMLR Journal 2022 Journal Article

Low-rank Tensor Learning with Nonconvex Overlapped Nuclear Norm Regularization

  • Quanming Yao
  • Yaqing Wang
  • Bo Han
  • James T. Kwok

Nonconvex regularization has been popularly used in low-rank matrix learning. However, extending it to low-rank tensor learning is still computationally expensive. To address this problem, we develop an efficient solver for use with a nonconvex extension of the overlapped nuclear norm regularizer. Based on the proximal average algorithm, the proposed algorithm avoids expensive tensor folding/unfolding operations. A special “sparse plus low-rank” structure is maintained throughout the iterations, and allows fast computation of the individual proximal steps. Empirical convergence is further improved with the use of adaptive momentum. We provide convergence guarantees to critical points on smooth losses and also on objectives satisfying the Kurdyka-Lojasiewicz condition. While the optimization problem is nonconvex and nonsmooth, we show that its critical points still have good statistical performance on the tensor completion problem. Experiments on various synthetic and real-world data sets show that the proposed algorithm is efficient in both time and space, and more accurate than the existing state-of-the-art.
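
The proximal average idea that the solver builds on can be stated compactly. A hedged sketch in my notation (r_k is the penalty on the mode-k unfolding of the tensor X; the paper's nonconvex extension and folding-free computation add further structure on top of this):

```latex
\bar r(\mathcal{X}) = \frac{1}{K} \sum_{k=1}^{K} r_k(\mathcal{X}),
\qquad
\operatorname{prox}_{\gamma \bar r}(\mathcal{X}) \approx \frac{1}{K} \sum_{k=1}^{K} \operatorname{prox}_{\gamma r_k}(\mathcal{X})
```

So one never needs the intractable joint proximal step of the overlapped regularizer: each summand's proximal step reduces to singular-value thresholding of a single unfolding, and the "sparse plus low-rank" iterate structure keeps those steps cheap.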

NeurIPS Conference 2021 Conference Paper

Automorphic Equivalence-aware Graph Neural Network

  • Fengli Xu
  • Quanming Yao
  • Pan Hui
  • Yong Li

Distinguishing the automorphic equivalence of nodes in a graph plays an essential role in many scientific domains, e.g., computational biology and social network analysis. However, existing graph neural networks (GNNs) fail to capture this important property. To make GNNs aware of automorphic equivalence, we first introduce a localized variant of this concept: ego-centered automorphic equivalence (Ego-AE). Then, we design a novel variant of GNN, i.e., GRAPE, that uses learnable AE-aware aggregators to explicitly differentiate the Ego-AE of each node's neighbors with the aid of various subgraph templates. As the design of subgraph templates can be hard, we further propose a genetic algorithm to automatically search for them from graph data. Moreover, we theoretically prove that GRAPE is expressive in terms of generating distinct representations for nodes with different Ego-AE features, which fills a fundamental gap in existing GNN variants. Finally, we empirically validate our model on eight real-world graph datasets, including a social network, an e-commerce co-purchase network, and citation networks, and show that it consistently outperforms existing GNNs. The source code is publicly available at https://github.com/tsinghua-fib-lab/GRAPE.

NeurIPS Conference 2021 Conference Paper

Progressive Feature Interaction Search for Deep Sparse Network

  • Chen Gao
  • Yinfeng Li
  • Quanming Yao
  • Depeng Jin
  • Yong Li

Deep sparse networks (DSNs), of which the crux is exploring the high-order feature interactions, have become the state of the art on prediction tasks with high-sparsity features. However, these models suffer from low computational efficiency, including large model size and slow model inference, which largely limits their application value. In this work, we approach this problem with neural architecture search, automatically searching for the critical component in DSNs: the feature-interaction layer. We propose a distilled search space to cover the desired architectures with fewer parameters. We then develop a progressive search algorithm to search the space efficiently and to capture the order-priority property in sparse prediction tasks. Experiments on three real-world benchmark datasets show promising results of the resulting method, PROFIT, in both accuracy and efficiency. Further studies validate the feasibility of our designed search space and search algorithm.

NeurIPS Conference 2021 Conference Paper

Property-Aware Relation Networks for Few-Shot Molecular Property Prediction

  • Yaqing Wang
  • Abulikemu Abuduweili
  • Quanming Yao
  • Dejing Dou

Molecular property prediction plays a fundamental role in drug discovery to identify candidate molecules with target properties. However, molecular property prediction is essentially a few-shot problem, which makes it hard to use regular machine learning models. In this paper, we propose Property-Aware Relation networks (PAR) to handle this problem. In comparison to existing works, we leverage the fact that both the relevant substructures and the relationships among molecules change across different molecular properties. We first introduce a property-aware embedding function to transform generic molecular embeddings into a substructure-aware space relevant to the target property. Further, we design an adaptive relation graph learning module to jointly estimate the molecular relation graph and refine molecular embeddings w.r.t. the target property, such that the limited labels can be effectively propagated among similar molecules. We adopt a meta-learning strategy where the parameters are selectively updated within tasks in order to model generic and property-aware knowledge separately. Extensive experiments on benchmark molecular property prediction datasets show that PAR consistently outperforms existing methods and can obtain property-aware molecular embeddings and model the molecular relation graph properly.

AAAI Conference 2020 Conference Paper

Efficient Neural Architecture Search via Proximal Iterations

  • Quanming Yao
  • Ju Xu
  • Wei-Wei Tu
  • Zhanxing Zhu

Neural architecture search (NAS) attracts much research attention because of its ability to identify better architectures than handcrafted ones. Recently, differentiable search methods have become the state of the art in NAS, as they can obtain high-performance architectures in several days. However, they still suffer from huge computation costs and inferior performance due to the construction of the supernet. In this paper, we propose an efficient NAS method based on proximal iterations (denoted as NASP). Different from previous works, NASP reformulates the search process as an optimization problem with a discrete constraint on architectures and a regularizer on model complexity. As the new objective is hard to solve, we further propose an efficient algorithm inspired by proximal iterations for optimization. In this way, NASP is not only much faster than existing differentiable search methods, but can also find better architectures and balance model complexity. Finally, extensive experiments on various tasks demonstrate that NASP can obtain high-performance architectures with a more than 10x speedup over the state of the art.

NeurIPS Conference 2020 Conference Paper

Interstellar: Searching Recurrent Architecture for Knowledge Graph Embedding

  • Yongqi Zhang
  • Quanming Yao
  • Lei Chen

Knowledge graph (KG) embedding is well-known in learning representations of KGs. Many models have been proposed to learn the interactions between entities and relations of the triplets. However, long-term information among multiple triplets is also important to KG. In this work, based on the relational paths, which are composed of a sequence of triplets, we define the Interstellar as a recurrent neural architecture search problem for the short-term and long-term information along the paths. First, we analyze the difficulty of using a unified model to work as the Interstellar. Then, we propose to search for recurrent architecture as the Interstellar for different KG tasks. A case study on synthetic data illustrates the importance of the defined search problem. Experiments on real datasets demonstrate the effectiveness of the searched models and the efficiency of the proposed hybrid-search algorithm.

ICML Conference 2020 Conference Paper

Searching to Exploit Memorization Effect in Learning with Noisy Labels

  • Quanming Yao
  • Hansi Yang
  • Bo Han 0003
  • Gang Niu 0001
  • James T. Kwok

Sample selection approaches are popular in robust learning from noisy labels. However, how to properly control the selection process so that deep networks can benefit from the memorization effect is a hard problem. In this paper, motivated by the success of automated machine learning (AutoML), we model this issue as a function approximation problem. Specifically, we design a domain-specific search space based on general patterns of the memorization effect and propose a novel Newton algorithm to solve the bi-level optimization problem efficiently. We further provide a theoretical analysis of the algorithm, which ensures a good approximation to critical points. Experiments are performed on both benchmark and real-world data sets. Results demonstrate that the proposed method is much better than the state-of-the-art noisy-label-learning approaches, and also much more efficient than existing AutoML algorithms.

ICML Conference 2020 Conference Paper

SIGUA: Forgetting May Make Learning with Noisy Labels More Robust

  • Bo Han 0003
  • Gang Niu 0001
  • Xingrui Yu
  • Quanming Yao
  • Miao Xu 0001
  • Ivor W. Tsang
  • Masashi Sugiyama

Given data with noisy labels, over-parameterized deep networks can gradually memorize the data, and fit everything in the end. Although equipped with corrections for noisy labels, many learning methods in this area still suffer overfitting due to undesired memorization. In this paper, to relieve this issue, we propose stochastic integrated gradient underweighted ascent (SIGUA): in a mini-batch, we adopt gradient descent on good data as usual, and learning-rate-reduced gradient ascent on bad data; the proposal is a versatile approach where data goodness or badness is w.r.t. desired or undesired memorization given a base learning method. Technically, SIGUA pulls optimization back for generalization when their goals conflict with each other; philosophically, SIGUA shows forgetting undesired memorization can reinforce desired memorization. Experiments demonstrate that SIGUA successfully robustifies two typical base learning methods, so that their performance is often significantly improved.
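
The descent-on-good / reduced-ascent-on-bad update can be sketched in a few lines. A minimal illustration, assuming a per-sample goodness mask is available from some base method; all names and the toy setup are mine, not the authors' implementation:

```python
import torch
import torch.nn.functional as F

def sigua_style_step(model, optimizer, x, y, good_mask, ascent_scale=0.01):
    """One SIGUA-style mini-batch update: gradient descent on 'good'
    samples, learning-rate-reduced gradient ascent on 'bad' ones."""
    per_sample = F.cross_entropy(model(x), y, reduction="none")
    # +1 weight descends on good data; a small negative weight ascends on bad data.
    weights = torch.where(good_mask, torch.ones_like(per_sample),
                          -ascent_scale * torch.ones_like(per_sample))
    loss = (weights * per_sample).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with a linear model and a random good/bad split.
model = torch.nn.Linear(10, 3)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(32, 10), torch.randint(0, 3, (32,))
good = torch.rand(32) > 0.3   # placeholder for a real goodness criterion
print(sigua_style_step(model, opt, x, y, good))
```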

NeurIPS Conference 2020 Conference Paper

Simplify and Robustify Negative Sampling for Implicit Collaborative Filtering

  • Jingtao Ding
  • Yuhan Quan
  • Quanming Yao
  • Yong Li
  • Depeng Jin

Negative sampling approaches are prevalent in implicit collaborative filtering for obtaining negative labels from massive unlabeled data. As two major concerns in negative sampling, efficiency and effectiveness are still not fully achieved by recent works that use complicated structures and overlook the risk of false negative instances. In this paper, we first provide a novel understanding of negative instances by empirically observing that only a few instances are potentially important for model learning, and that false negatives tend to have stable predictions over many training iterations. The above findings motivate us to simplify the model by sampling from a designed memory that only stores a few important candidates and, more importantly, to tackle the untouched false negative problem by favouring high-variance samples stored in the memory, which achieves efficient sampling of true negatives with high quality. Empirical results on two synthetic datasets and three real-world datasets demonstrate both the robustness and superiority of our negative sampling method. The implementation is available at https://github.com/dingjingtao/SRNS.
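
The variance-favouring selection can be illustrated with a toy sampler. A hedged sketch under my own assumptions (softmax sampling over mean-plus-std priorities; the paper's actual scoring and memory-update rules differ in detail):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_negative(memory_scores, beta=1.0):
    """Pick a negative from a small candidate memory, favouring items whose
    predicted scores vary a lot across recent iterations (false negatives
    tend to have stable predictions, so high variance suggests a true negative)."""
    scores = np.asarray(memory_scores, dtype=float)  # (candidates, iterations)
    priority = scores.mean(axis=1) + beta * scores.std(axis=1)
    probs = np.exp(priority - priority.max())        # numerically stable softmax
    probs /= probs.sum()
    return rng.choice(len(priority), p=probs)

# 4 memory candidates, each with predictions from the last 5 iterations.
memory = [[0.9, 0.9, 0.9, 0.9, 0.9],   # stable and high: likely false negative
          [0.2, 0.8, 0.1, 0.9, 0.3],   # high variance: promising true negative
          [0.1, 0.1, 0.2, 0.1, 0.1],
          [0.5, 0.4, 0.6, 0.5, 0.5]]
print(sample_negative(memory))
```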

ICML Conference 2019 Conference Paper

Efficient Nonconvex Regularized Tensor Completion with Structure-aware Proximal Iterations

  • Quanming Yao
  • James T. Kwok
  • Bo Han 0003

Nonconvex regularizers have been successfully used in low-rank matrix learning. In this paper, we extend this to the more challenging problem of low-rank tensor completion. Based on the proximal average algorithm, we develop an efficient solver that avoids expensive tensor folding and unfolding. A special “sparse plus low-rank” structure, which is essential for fast computation of individual proximal steps, is maintained throughout the iterations. We also incorporate adaptive momentum to further speed up empirical convergence. Convergence results to critical points are provided under smoothness and Kurdyka-Lojasiewicz conditions. Experimental results on a number of synthetic and real-world data sets show that the proposed algorithm is more efficient in both time and space, and is also more accurate than existing approaches.

IJCAI Conference 2019 Conference Paper

Privacy-Preserving Stacking with Application to Cross-organizational Diabetes Prediction

  • Quanming Yao
  • Xiawei Guo
  • James Kwok
  • Weiwei Tu
  • Yuqiang Chen
  • Wenyuan Dai
  • Qiang Yang

To meet the standard of differential privacy, noise is usually added to the original data, which inevitably deteriorates the predictive performance of subsequent learning algorithms. In this paper, motivated by the success of improving predictive performance via ensemble learning, we propose to enhance privacy-preserving logistic regression by stacking. We show that this can be done either by sample-based or feature-based partitioning. However, we prove that when privacy budgets are the same, feature-based partitioning requires fewer samples than the sample-based one, and thus likely has better empirical performance. As transfer learning is difficult to integrate with a differential privacy guarantee, we further combine the proposed method with hypothesis transfer learning to address the problem of learning across different organizations. Finally, we not only demonstrate the effectiveness of our method on two benchmark data sets, i.e., MNIST and NEWS20, but also apply it to a real application of cross-organizational diabetes prediction on the RUIJIN data set, where privacy is of significant concern.

IJCAI Conference 2019 Conference Paper

Robust Learning from Noisy Side-information by Semidefinite Programming

  • En-Liang Hu
  • Quanming Yao

Robustness has recently become one of the major concerns in the machine learning community, since learning algorithms are usually vulnerable to outliers or corruptions. Motivated by this trend and these needs, we pursue robustness in semi-definite programming (SDP) in this paper. Specifically, this is done by replacing the commonly used squared loss with the more robust L1-loss in the low-rank SDP. However, the resulting objective becomes neither convex nor smooth. As no existing algorithms can be applied, we design an efficient algorithm, based on majorization-minimization, to optimize the objective. The proposed algorithm not only has cheap iterations and low space complexity but also theoretically converges to some critical points. Finally, an empirical study shows that the new objective, armed with the proposed algorithm, outperforms the state-of-the-art in terms of both speed and accuracy.

NeurIPS Conference 2018 Conference Paper

Co-teaching: Robust training of deep neural networks with extremely noisy labels

  • Bo Han
  • Quanming Yao
  • Xingrui Yu
  • Gang Niu
  • Miao Xu
  • Weihua Hu
  • Ivor Tsang
  • Masashi Sugiyama

Deep learning with noisy labels is practically challenging, as the capacity of deep models is so high that they can totally memorize these noisy labels sooner or later during training. Nonetheless, recent studies on the memorization effects of deep neural networks show that they would first memorize training data with clean labels and then those with noisy labels. Therefore, in this paper, we propose a new deep learning paradigm called ''Co-teaching'' for combating noisy labels. Namely, we train two deep neural networks simultaneously, and let them teach each other given every mini-batch: firstly, each network feeds forward all data and selects some data with possibly clean labels; secondly, the two networks communicate with each other what data in this mini-batch should be used for training; finally, each network back-propagates the data selected by its peer network and updates itself. Empirical results on noisy versions of MNIST, CIFAR-10 and CIFAR-100 demonstrate that Co-teaching is much superior to the state-of-the-art methods in the robustness of trained deep models.
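
The cross-update loop reads naturally as code. A minimal sketch of the small-loss selection and peer exchange described above (toy linear models and a fixed keep ratio; the actual Co-teaching anneals the ratio over epochs, and all names here are illustrative):

```python
import torch
import torch.nn.functional as F

def co_teaching_step(net1, net2, opt1, opt2, x, y, keep_ratio=0.7):
    """One Co-teaching-style update: each network picks its small-loss
    (likely clean) samples, and its peer trains on them."""
    with torch.no_grad():
        loss1 = F.cross_entropy(net1(x), y, reduction="none")
        loss2 = F.cross_entropy(net2(x), y, reduction="none")
    k = int(keep_ratio * len(y))
    idx1 = torch.topk(-loss1, k).indices   # small-loss samples chosen by net1
    idx2 = torch.topk(-loss2, k).indices   # small-loss samples chosen by net2

    # Each network updates on the samples selected by its *peer*.
    opt1.zero_grad()
    F.cross_entropy(net1(x[idx2]), y[idx2]).backward()
    opt1.step()
    opt2.zero_grad()
    F.cross_entropy(net2(x[idx1]), y[idx1]).backward()
    opt2.step()

net1, net2 = torch.nn.Linear(10, 3), torch.nn.Linear(10, 3)
opt1 = torch.optim.SGD(net1.parameters(), lr=0.1)
opt2 = torch.optim.SGD(net2.parameters(), lr=0.1)
x, y = torch.randn(32, 10), torch.randint(0, 3, (32,))
co_teaching_step(net1, net2, opt1, opt2, x, y)
```

The peer exchange is the key design choice: because the two networks start differently, they make different selection mistakes, so errors are less likely to be self-reinforcing than in single-network sample selection.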

JMLR Journal 2018 Journal Article

Efficient Learning with a Family of Nonconvex Regularizers by Redistributing Nonconvexity

  • Quanming Yao
  • James T. Kwok

The use of convex regularizers allows for easy optimization, though they often produce biased estimation and inferior prediction performance. Recently, nonconvex regularizers have attracted a lot of attention and outperformed convex ones. However, the resultant optimization problem is much harder. In this paper, a popular subclass of $\ell_1$-based nonconvex sparsity-inducing and low-rank regularizers is considered. This includes nonconvex variants of the lasso, sparse group lasso, tree-structured lasso, nuclear norm and total variation regularizers. We propose to move the nonconvexity from the regularizer to the loss. The nonconvex regularizer is then transformed into a familiar convex one, while the resultant loss function can still be guaranteed to be smooth. Learning with the convexified regularizer can be performed by existing efficient algorithms originally designed for convex regularizers (such as the proximal algorithm, Frank-Wolfe algorithm, alternating direction method of multipliers and stochastic gradient descent). This is further extended to cases where the convexified regularizer does not have a closed-form proximal step, and where the loss function is nonconvex and nonsmooth. Extensive experiments on a variety of machine learning application scenarios show that optimizing the transformed problem is much faster than running the state-of-the-art on the original problem.
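
A hedged reconstruction of the transformation, in my own notation, for a sparsity-inducing instance: suppose the regularizer is r(x) = lambda * sum_i kappa(|x_i|) with kappa concave and increasing. Splitting off the linearization at zero gives

```latex
f(x) + \lambda \sum_i \kappa(|x_i|)
  = \underbrace{f(x) + \lambda \sum_i \big( \kappa(|x_i|) - \kappa_0 |x_i| \big)}_{\text{new smooth loss } \bar f(x)}
  + \underbrace{\lambda \kappa_0 \lVert x \rVert_1}_{\text{convex regularizer}},
\qquad \kappa_0 := \kappa'(0^{+})
```

so the nonconvex (concave) part is absorbed into the loss, and the remaining regularizer is the plain ell-1 norm, whose proximal step is ordinary soft-thresholding. The smoothness of the new loss and the exact conditions on kappa are what the paper establishes; the display above is only the shape of the idea.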

ICML Conference 2018 Conference Paper

Online Convolutional Sparse Coding with Sample-Dependent Dictionary

  • Yaqing Wang 0002
  • Quanming Yao
  • James T. Kwok
  • Lionel M. Ni

Convolutional sparse coding (CSC) has been popularly used for the learning of shift-invariant dictionaries in image and signal processing. However, existing methods have limited scalability. In this paper, instead of convolving with a dictionary shared by all samples, we propose the use of a sample-dependent dictionary in which each filter is a linear combination of a small set of base filters learned from data. This added flexibility allows a large number of sample-dependent patterns to be captured, which is especially useful in the handling of large or high-dimensional data sets. Computationally, the resultant model can be efficiently learned by online learning. Extensive experimental results on a number of data sets show that the proposed method outperforms existing CSC algorithms with significantly reduced time and space complexities.

NeurIPS Conference 2018 Conference Paper

Scalable Robust Matrix Factorization with Nonconvex Loss

  • Quanming Yao
  • James Kwok

Robust matrix factorization (RMF), which uses the $\ell_1$-loss, often outperforms standard matrix factorization using the $\ell_2$-loss, particularly when outliers are present. The state-of-the-art RMF solver is the RMF-MM algorithm, which, however, cannot utilize data sparsity. Moreover, sometimes even the (convex) $\ell_1$-loss is not robust enough. In this paper, we propose the use of nonconvex loss to enhance robustness. To address the resultant difficult optimization problem, we use majorization-minimization (MM) optimization and propose a new MM surrogate. To improve scalability, we exploit data sparsity and optimize the surrogate via its dual with the accelerated proximal gradient algorithm. The resultant algorithm has low time and space complexities and is guaranteed to converge to a critical point. Extensive experiments demonstrate its superiority over the state-of-the-art in terms of both accuracy and scalability.

IJCAI Conference 2017 Conference Paper

Efficient Inexact Proximal Gradient Algorithm for Nonconvex Problems

  • Quanming Yao
  • James T. Kwok
  • Fei Gao
  • Wei Chen
  • Tie-Yan Liu

While the proximal gradient algorithm was originally designed for convex optimization, several variants have recently been proposed for nonconvex problems. Among them, nmAPG [Li and Lin, 2015] is the state of the art. However, it is inefficient when the proximal step does not have a closed-form solution, or when such a solution exists but is expensive, as it requires more than one proximal step to be solved exactly in each iteration. In this paper, we propose an efficient inexact accelerated proximal gradient (niAPG) algorithm for nonconvex problems. In each iteration, it requires only one inexact (less expensive) proximal step. Convergence to a critical point is still guaranteed, and an O(1/k) convergence rate is derived. Experiments on image inpainting and matrix completion problems demonstrate that the proposed algorithm has comparable performance to the state-of-the-art, but is much faster.

AAAI Conference 2017 Conference Paper

Efficient Sparse Low-Rank Tensor Completion Using the Frank-Wolfe Algorithm

  • Xiawei Guo
  • Quanming Yao
  • James Kwok

Most tensor problems are NP-hard, and low-rank tensor completion is much more difficult than low-rank matrix completion. In this paper, we propose a time- and space-efficient low-rank tensor completion algorithm by using the scaled latent nuclear norm for regularization and the Frank-Wolfe (FW) algorithm for optimization. We show that all the steps can be performed efficiently. In particular, FW's linear subproblem has a closed-form solution which can be obtained from rank-one SVD. By utilizing the sparsity of the observed tensor, we only need to maintain sparse tensors and a set of small basis matrices. Experimental results show that the proposed algorithm is more accurate, much faster and more scalable than the state-of-the-art.

ICML Conference 2016 Conference Paper

Efficient Learning with a Family of Nonconvex Regularizers by Redistributing Nonconvexity

  • Quanming Yao
  • James T. Kwok

The use of convex regularizers allows for easy optimization, though they often produce biased estimation and inferior prediction performance. Recently, nonconvex regularizers have attracted a lot of attention and outperformed convex ones. However, the resultant optimization problem is much harder. In this paper, for a large class of nonconvex regularizers, we propose to move the nonconvexity from the regularizer to the loss. The nonconvex regularizer is then transformed into a familiar convex regularizer, while the resultant loss function can still be guaranteed to be smooth. Learning with the convexified regularizer can be performed by existing efficient algorithms originally designed for convex regularizers (such as the standard proximal algorithm and Frank-Wolfe algorithm). Moreover, it can be shown that critical points of the transformed problem are also critical points of the original problem. Extensive experiments on a number of nonconvex regularization problems show that the proposed procedure is much faster than state-of-the-art nonconvex solvers.

IJCAI Conference 2016 Conference Paper

Greedy Learning of Generalized Low-Rank Models

  • Quanming Yao
  • James T. Kwok

Learning of low-rank matrices is fundamental to many machine learning applications. A state-of-the-art algorithm is the rank-one matrix pursuit (R1MP). However, it can only be used in matrix completion problems with the square loss. In this paper, we develop a more flexible greedy algorithm for generalized low-rank models whose optimization objective can be smooth or nonsmooth, general convex or strongly convex. The proposed algorithm has low per-iteration time complexity and fast convergence rate. Experimental results show that it is much faster than the state-of-the-art, with comparable or even better prediction performance.

IJCAI Conference 2015 Conference Paper

Accelerated Inexact Soft-Impute for Fast Large-Scale Matrix Completion

  • Quanming Yao
  • James T. Kwok

Matrix factorization tries to recover a low-rank matrix from limited observations. A state-of-the-art algorithm is Soft-Impute, which exploits a special “sparse plus low-rank” structure of the matrix iterates to allow efficient SVD in each iteration. Though Soft-Impute is also a proximal gradient algorithm, it is generally believed that acceleration techniques are not useful and would destroy the special structure. In this paper, we show that Soft-Impute can indeed be accelerated without compromising the “sparse plus low-rank” structure. To further reduce the per-iteration time complexity, we propose an approximate singular value thresholding scheme based on the power method. Theoretical analysis shows that the proposed algorithm enjoys the fast O(1/T^2) convergence rate of accelerated proximal gradient algorithms. Extensive experiments on both synthetic and large recommendation data sets show that the proposed algorithm is much faster than Soft-Impute and other state-of-the-art matrix completion algorithms.
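
The “sparse plus low-rank” iterate is easy to see in a toy Soft-Impute loop. A minimal numpy sketch under my own simplifications (dense matrices, full SVD instead of the paper's power-method approximation, a fixed threshold); it illustrates the structure, not the accelerated algorithm itself:

```python
import numpy as np

rng = np.random.default_rng(0)

def soft_impute_step(X_obs, mask, U, V, tau):
    """One Soft-Impute-style iteration. The SVD input keeps the 'sparse plus
    low-rank' form: a sparse residual on observed entries plus the current
    low-rank estimate U @ V.T (densified here only for brevity)."""
    Z = mask * (X_obs - U @ V.T) + U @ V.T       # sparse part + low-rank part
    Uf, s, Vft = np.linalg.svd(Z, full_matrices=False)
    s = np.maximum(s - tau, 0.0)                 # singular value soft-thresholding
    r = int((s > 0).sum())
    return Uf[:, :r] * s[:r], Vft[:r].T          # new low-rank factors

# Toy 20x15 rank-2 completion problem with ~50% observed entries.
A = rng.normal(size=(20, 2)) @ rng.normal(size=(2, 15))
mask = rng.random((20, 15)) < 0.5
U, V = np.zeros((20, 1)), np.zeros((15, 1))
for _ in range(50):
    U, V = soft_impute_step(mask * A, mask, U, V, tau=0.5)
print("observed-entry RMSE:", np.sqrt(((mask * (U @ V.T - A))**2).mean()))
```

The paper's contribution is showing that the acceleration step can be interleaved without breaking the structure fed to the SVD, plus a power-method approximation of the thresholding that the full SVD above stands in for.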

AAAI Conference 2015 Conference Paper

Colorization by Patch-Based Local Low-Rank Matrix Completion

  • Quanming Yao
  • James T. Kwok

Colorization aims at recovering the original color of a monochrome image from only a few color pixels. A state-of-the-art approach is based on matrix completion, which assumes that the target color image is low-rank. However, this low-rank assumption is often invalid on natural images. In this paper, we propose a patch-based approach that divides the image into patches and then imposes a low-rank structure only on groups of similar patches. Each local matrix completion problem is solved by an accelerated version of the alternating direction method of multipliers (ADMM), and each ADMM subproblem is solved efficiently by divide-and-conquer. Experiments on a number of benchmark images demonstrate that the proposed method outperforms existing approaches.