Arrow Research

Author name cluster

Daochen Zha

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

22 papers
2 author rows

Possible papers

22

ICLR Conference 2025 Conference Paper

MQuAKE-Remastered: Multi-Hop Knowledge Editing Can Only Be Advanced with Reliable Evaluations

  • Shaochen (Henry) Zhong
  • Yifan Lu
  • Lize Shao
  • Bhargav Bhushanam
  • Xiaocong Du
  • Yixin Wan
  • Yucheng Shi
  • Daochen Zha

Large language models (LLMs) can give out erroneous answers to factually rooted questions either as a result of undesired training outcomes or simply because the world has moved on after a certain knowledge cutoff date. Under such scenarios, *knowledge editing* often comes to the rescue by delivering efficient patches for such erroneous answers without significantly altering the rest, and many editing methods have seen reasonable success when the editing targets are simple and direct (e.g., *"what club does Lionel Messi currently play for?"*). However, knowledge fragments like this are often deeply intertwined in the real world, making it a practical challenge to effectively propagate the editing effect to non-directly related questions (to entertain an extreme example: [*"What car did the wife of the owner of the club that Messi currently plays for use to get to school in the 80s?"*](https://youtube.com/watch?v=DbwiHC1Fu-E&t=132s)). Prior work has coined this task *multi-hop knowledge editing*, with the most popular dataset being MQuAKE, which serves as the sole evaluation benchmark for many later-proposed editing methods due to the expensive nature of constructing knowledge editing datasets at scale. In this work, we reveal that **up to 33% or 76% of MQuAKE's questions and ground truth labels are, in fact, corrupted in various fashions due to some unintentional clerical or procedural oversights.** Our work provides a detailed audit of MQuAKE's error pattern and a comprehensive fix without sacrificing its dataset capacity. Additionally, we benchmarked almost all proposed MQuAKE-evaluated editing methods on our post-fix dataset, **MQuAKE-Remastered**. We observe that many methods try to overfit the original MQuAKE by exploiting some of its dataset idiosyncrasies. We provide a guideline on how to approach such datasets faithfully and show that a simple, minimally invasive approach, **GWalk**, can offer beyond-SOTA editing performance without such exploitation. The MQuAKE-Remastered datasets and utilities are available at [huggingface.co/datasets/henryzhongsc/MQuAKE-Remastered](https://huggingface.co/datasets/henryzhongsc/MQuAKE-Remastered) and [github.com/henryzhongsc/MQuAKE-Remastered](https://github.com/henryzhongsc/MQuAKE-Remastered), respectively.
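
The remastered data lives on the Hugging Face Hub at the repository linked above, so it should be loadable with the standard `datasets` API. A minimal sketch, assuming the repo loads without a required config name (inspect the returned object before relying on split or field names):

```python
from datasets import load_dataset

# Repository ID taken from the links above; the available configs/splits are
# an assumption here, so print the object before relying on any field names.
ds = load_dataset("henryzhongsc/MQuAKE-Remastered")
print(ds)
```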

NeurIPS Conference 2024 Conference Paper

Cost-efficient Knowledge-based Question Answering with Large Language Models

  • Junnan Dong
  • Qinggang Zhang
  • Chuang Zhou
  • Hao Chen
  • Daochen Zha
  • Xiao Huang

Knowledge-based question answering (KBQA) is widely used in many scenarios that necessitate domain knowledge. Large language models (LLMs) bring opportunities to KBQA, but their costs are significantly higher and they lack domain-specific knowledge from pre-training. We are motivated to combine LLMs and prior small models on knowledge graphs (KGMs) for both inferential accuracy and cost saving. However, this remains challenging since accuracy and cost are not readily combined in the optimization as two distinct metrics. Model selection is also laborious since different models excel at different knowledge. To this end, we propose Coke, a novel cost-efficient strategy for KBQA with LLMs, modeled as a tailored multi-armed bandit problem to minimize calls to LLMs within limited budgets. We first formulate the accuracy expectation with a cluster-level Thompson Sampling for either KGMs or LLMs. A context-aware policy is optimized to further distinguish the expert model subject to the question semantics. The overall decision is bounded by the cost regret according to historical expenditure on failures. Extensive experiments showcase the superior performance of Coke, which moves the Pareto frontier with up to 20.89% saving of GPT-4 fees while achieving a 2.74% higher accuracy on the benchmark datasets.
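
A minimal sketch of the bandit framing described above, not the authors' implementation: each candidate model (a small KG model or an LLM) is an arm with a Beta posterior over answer accuracy, and we call the affordable arm with the highest sampled accuracy. Arm names, costs, and `call_model` are hypothetical stand-ins.

```python
import random

class Arm:
    def __init__(self, name, cost):
        self.name, self.cost = name, cost
        self.wins, self.losses = 1, 1            # Beta(1, 1) prior on accuracy

    def draw(self):
        return random.betavariate(self.wins, self.losses)

    def update(self, correct):
        self.wins += correct
        self.losses += 1 - correct

def call_model(name, question):
    # Stub standing in for querying the actual KGM/LLM and checking the answer.
    return random.random() < {"kgm": 0.6, "llm": 0.8}[name]

def route(question, arms, budget):
    affordable = [a for a in arms if a.cost <= budget]
    arm = max(affordable, key=Arm.draw)          # Thompson sampling step
    arm.update(int(call_model(arm.name, question)))
    return arm.name, budget - arm.cost

arms, budget = [Arm("kgm", cost=0.01), Arm("llm", cost=1.0)], 10.0
while budget >= min(a.cost for a in arms):
    choice, budget = route("some KBQA question", arms, budget)
```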

IJCAI Conference 2024 Conference Paper

Denoising-Aware Contrastive Learning for Noisy Time Series

  • Shuang Zhou
  • Daochen Zha
  • Xiao Shen
  • Xiao Huang
  • Rui Zhang
  • Korris Chung

Time series self-supervised learning (SSL) aims to exploit unlabeled data for pre-training to mitigate the reliance on labels. Despite the great success in recent years, there is limited discussion on the potential noise in the time series, which can severely impair the performance of existing SSL methods. To mitigate the noise, the de facto strategy is to apply conventional denoising methods before model training. However, this pre-processing approach may not fully eliminate the effect of noise in SSL for two reasons: (i) the diverse types of noise in time series make it difficult to automatically determine suitable denoising methods; (ii) noise can be amplified after mapping raw data into latent space. In this paper, we propose denoising-aware contrastive learning (DECL), which uses contrastive learning objectives to mitigate the noise in the representation and automatically selects suitable denoising methods for every sample. Extensive experiments on various datasets verify the effectiveness of our method. The code is open-sourced.

IJCAI Conference 2024 Conference Paper

Enhanced DouDiZhu Card Game Strategy Using Oracle Guiding and Adaptive Deep Monte Carlo Method

  • Qian Luo
  • Tien Ping Tan
  • Daochen Zha
  • Tianqiao Zhang

Deep Reinforcement Learning (DRL) exhibits significant advancements in games with both perfect and imperfect information, such as Go, Chess, Texas Hold'em, and Dota 2. However, DRL encounters considerable challenges when tackling the card game DouDiZhu because of its imperfect information, large state-action space, and sparse rewards. This paper presents OADMCDou, which combines Oracle Guiding and an Adaptive Deep Monte Carlo Method to address the challenges in DouDiZhu. Oracle Guiding trains an Oracle agent with both imperfect and perfect information, gradually reducing reliance on perfect information to transition to a standard agent. Adaptive Deep Monte Carlo uses gradient weight clipping and constrains the magnitude of updates to prevent extreme policy updates. We conduct extensive experiments to evaluate the effectiveness of the proposed methods, demonstrating OADMCDou's superior performance over the state-of-the-art DouDiZhu AI, DouZero. This superiority over DouZero is reflected in two metrics: a 95% confidence interval of 0.104 ± 0.041 for performance, and a 28.6% reduction in loss.

NeurIPS Conference 2024 Conference Paper

KnowGPT: Knowledge Graph based Prompting for Large Language Models

  • Qinggang Zhang
  • Junnan Dong
  • Hao Chen
  • Daochen Zha
  • Zailiang Yu
  • Xiao Huang

Large Language Models (LLMs) have demonstrated remarkable capabilities in many real-world applications. Nonetheless, LLMs are often criticized for their tendency to produce hallucinations, wherein the models fabricate incorrect statements on tasks beyond their knowledge and perception. To alleviate this issue, graph retrieval-augmented generation (GraphRAG), which leverages the factual knowledge in knowledge graphs (KGs) to ground the LLM's responses in established facts and principles, has been extensively explored. However, most state-of-the-art LLMs are closed-source, making it challenging to develop a prompting framework that can efficiently and effectively integrate KGs into LLMs with hard prompts only. Generally, existing KG-enhanced LLMs suffer from three critical issues, including a huge search space, high API costs, and laborious prompt engineering, that impede their widespread application in practice. To this end, we introduce a novel Knowledge Graph based PrompTing framework, namely KnowGPT, to enhance LLMs with domain knowledge. KnowGPT contains a knowledge extraction module to extract the most informative knowledge from KGs, and a context-aware prompt construction module to automatically convert extracted knowledge into effective prompts. Experiments on three benchmarks demonstrate that KnowGPT significantly outperforms all competitors. Notably, KnowGPT achieves 92.6% accuracy on the OpenbookQA leaderboard, comparable to human-level performance.
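
Since the framework targets closed-source models reachable only through hard prompts, the end product is plain text. A toy sketch of serializing extracted triples into a prompt (the template and triple format are assumptions, not the paper's learned construction module):

```python
# Hypothetical template: the paper's context-aware module *learns* which
# format works best, whereas this sketch hard-codes one.
def build_prompt(triples, question):
    facts = "\n".join(f"- {h} {r} {t}." for h, r, t in triples)
    return ("Answer the question using the following facts.\n"
            f"Facts:\n{facts}\n"
            f"Question: {question}\nAnswer:")

triples = [("aspirin", "treats", "headache"), ("aspirin", "is a", "NSAID")]
print(build_prompt(triples, "What class of drug can treat a headache?"))
```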

NeurIPS Conference 2023 Conference Paper

One Less Reason for Filter Pruning: Gaining Free Adversarial Robustness with Structured Grouped Kernel Pruning

  • Shaochen (Henry) Zhong
  • Zaichuan You
  • Jiamu Zhang
  • Sebastian Zhao
  • Zachary LeClaire
  • Zirui Liu
  • Daochen Zha
  • Vipin Chaudhary

Densely structured pruning methods utilizing simple pruning heuristics can deliver immediate compression and acceleration benefits with acceptable benign performance. However, empirical findings indicate such naively pruned networks are extremely fragile under simple adversarial attacks. Naturally, we would be interested in knowing if such a phenomenon also holds for carefully designed modern structured pruning methods. If so, how severe is it? And what kind of remedies are available? Unfortunately, both the questions and the solution remain largely unaddressed: no prior art is able to provide a thorough investigation on the adversarial performance of modern structured pruning methods (spoiler: it is not good), yet the few works that attempt to provide mitigation often do so at various extra costs with only to-be-desired performance. In this work, we answer both questions by fairly and comprehensively investigating the adversarial performance of 10+ popular structured pruning methods. Solution-wise, we take advantage of Grouped Kernel Pruning's (GKP) recent success in pushing densely structured pruning freedom to a more fine-grained level. By mixing kernel smoothness — a classic robustness-related kernel-level metric — into a modified GKP procedure, we present a one-shot, post-train, weight-dependent GKP method capable of advancing SOTA performance on both the benign and adversarial scale, while requiring no extra (in fact, often less) cost than a standard pruning procedure. Please refer to our GitHub repository for code implementation, tool sharing, and model checkpoints.
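
For intuition, here is one plausible instantiation of a kernel-level smoothness score (an assumption on our part, not the paper's exact formula): the total variation across adjacent weights of each k x k kernel, which a robustness-aware grouped pruner could use to rank kernels.

```python
import numpy as np

def kernel_roughness(w):
    """w: conv weight (out_ch, in_ch, k, k); higher score = less smooth."""
    dh = np.diff(w, axis=2)                      # vertical neighbor deltas
    dw = np.diff(w, axis=3)                      # horizontal neighbor deltas
    return (dh ** 2).sum(axis=(2, 3)) + (dw ** 2).sum(axis=(2, 3))

w = np.random.randn(8, 4, 3, 3)                  # a toy conv layer
print(kernel_roughness(w).shape)                 # (8, 4): one score per kernel
```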

NeurIPS Conference 2023 Conference Paper

OpenGSL: A Comprehensive Benchmark for Graph Structure Learning

  • Zhiyao Zhou
  • Sheng Zhou
  • Bochao Mao
  • Xuanyi Zhou
  • Jiawei Chen
  • Qiaoyu Tan
  • Daochen Zha
  • Yan Feng

Graph Neural Networks (GNNs) have emerged as the de facto standard for representation learning on graphs, owing to their ability to effectively integrate graph topology and node attributes. However, the inherent suboptimal nature of node connections, resulting from the complex and contingent formation process of graphs, presents significant challenges in modeling them effectively. To tackle this issue, Graph Structure Learning (GSL), a family of data-centric learning approaches, has garnered substantial attention in recent years. The core concept behind GSL is to jointly optimize the graph structure and the corresponding GNN models. Despite the proposal of numerous GSL methods, the progress in this field remains unclear due to inconsistent experimental protocols, including variations in datasets, data processing techniques, and splitting strategies. In this paper, we introduce OpenGSL, the first comprehensive benchmark for GSL, aimed at addressing this gap. OpenGSL enables a fair comparison among state-of-the-art GSL methods by evaluating them across various popular datasets using uniform data processing and splitting strategies. Through extensive experiments, we observe that existing GSL methods do not consistently outperform vanilla GNN counterparts. We also find that there is no significant correlation between the homophily of the learned structure and task performance, challenging the common belief. Moreover, we observe that the learned graph structure demonstrates a strong generalization ability across different GNN models, despite the high computational and space consumption. We hope that our open-sourced library will facilitate rapid and equitable evaluation and inspire further innovative research in this field. The code of the benchmark can be found at https://github.com/OpenGSL/OpenGSL.

ICML Conference 2023 Conference Paper

RSC: Accelerate Graph Neural Networks Training via Randomized Sparse Computations

  • Zirui Liu 0001
  • Shengyuan Chen
  • Kaixiong Zhou
  • Daochen Zha
  • Xiao Huang 0001
  • Xia Hu 0001

Training graph neural networks (GNNs) is extremely time-consuming because sparse graph-based operations are hard to accelerate on commodity hardware. Prior art successfully reduces the computation cost of dense matrix based operations (e.g., convolution and linear) via sampling-based approximation. However, unlike dense matrices, sparse matrices are stored in an irregular data format such that each row/column may have a different number of non-zero entries. Thus, compared to the dense counterpart, approximating sparse operations has two unique challenges: (1) we cannot directly control the efficiency of an approximated sparse operation since the computation is only executed on non-zero entries; (2) sampling sparse matrices is much more inefficient due to the irregular data format. To address these issues, our key idea is to control the accuracy-efficiency trade-off by optimizing computation resource allocation layer-wise and epoch-wise. For the first challenge, we customize the computation resources for different sparse operations while limiting the total resources used below a certain budget. For the second challenge, we cache previously sampled sparse matrices to reduce the epoch-wise sampling overhead. Finally, we propose a switching mechanism to improve the generalization of GNNs trained with approximated operations. To this end, we propose Randomized Sparse Computation (RSC). In practice, RSC can achieve up to 11.6× speedup for a single sparse operation and 1.6× end-to-end wall-clock time speedup with almost no accuracy drop.
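
A hedged sketch of the sampling-plus-caching idea (not the paper's system): approximate the sparse-dense product A @ X with a sampled subset of A's columns and the matching rows of X, rescaled to stay unbiased, and reuse the cached sample indices across epochs so the sampling overhead is not paid every time.

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
A = sp.random(1000, 1000, density=0.01, format="csc", random_state=0)
X = rng.standard_normal((1000, 64))

def sampled_spmm(A, X, idx, probs):
    # Unbiased: each kept column/row pair is scaled by 1 / (k * p_i).
    scale = 1.0 / (len(idx) * probs[idx])
    return A[:, idx].multiply(scale[None, :]) @ X[idx]

col_mass = np.asarray(abs(A).sum(axis=0)).ravel()
probs = col_mass / col_mass.sum()
cached_idx = rng.choice(A.shape[1], size=200, p=probs)   # cache across epochs

Y_approx = sampled_spmm(A, X, cached_idx, probs)         # ~ A @ X
```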

ICML Conference 2023 Conference Paper

SurCo: Learning Linear SURrogates for COmbinatorial Nonlinear Optimization Problems

  • Aaron M. Ferber
  • Taoan Huang
  • Daochen Zha
  • Martin Schubert
  • Benoit Steiner
  • Bistra Dilkina
  • Yuandong Tian

Optimization problems with nonlinear cost functions and combinatorial constraints appear in many real-world applications but remain challenging to solve efficiently compared to their linear counterparts. To bridge this gap, we propose SurCo, which learns linear Surrogate costs that can be used in existing Combinatorial solvers to output good solutions to the original nonlinear combinatorial optimization problem. The surrogate costs are learned end-to-end with the nonlinear loss by differentiating through the linear surrogate solver, combining the flexibility of gradient-based methods with the structure of linear combinatorial optimization. We propose three SurCo variants: SurCo-zero for individual nonlinear problems, SurCo-prior for problem distributions, and SurCo-hybrid to combine both distribution and problem-specific information. We give theoretical intuition motivating SurCo and evaluate it empirically. Experiments show that SurCo finds better solutions faster than state-of-the-art and domain-expert approaches on real-world optimization problems such as embedding table sharding, inverse photonic design, and nonlinear route planning.
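
A toy, heavily hedged illustration of the surrogate idea in its SurCo-zero flavor: learn a linear cost vector c so that a solver for argmin_x c·x returns a selection with low nonlinear cost f. For differentiability we substitute a softmax relaxation for the solver, which is our simplification; the paper differentiates through actual linear combinatorial solvers.

```python
import torch

def f(x):  # an arbitrary nonlinear objective over a (relaxed) selection x
    return (x * torch.tensor([3.0, 1.0, 2.0])).sum() ** 2 + x[0] * x[2]

c = torch.zeros(3, requires_grad=True)           # linear surrogate costs
opt = torch.optim.Adam([c], lr=0.1)
for _ in range(200):
    x = torch.softmax(-c / 0.1, dim=0)           # relaxed argmin_x c.x
    loss = f(x)                                  # nonlinear loss, end to end
    opt.zero_grad(); loss.backward(); opt.step()

print(torch.softmax(-c / 0.1, dim=0).detach())   # ~one-hot on the best option
```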

NeurIPS Conference 2023 Conference Paper

Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model

  • Zirui Liu
  • Guanchu Wang
  • Shaochen (Henry) Zhong
  • Zhaozhuo Xu
  • Daochen Zha
  • Ruixiang (Ryan) Tang
  • Zhimeng (Stephen) Jiang
  • Kaixiong Zhou

As model sizes grow rapidly, fine-tuning large pre-trained language models has become increasingly difficult due to their extensive memory usage. Previous works usually focus on reducing the number of trainable parameters in the network. While the model parameters do contribute to memory usage, the primary memory bottleneck during training arises from storing feature maps, also known as activations, as they are crucial for gradient calculation. Notably, machine learning models are typically trained using stochastic gradient descent. We argue that in stochastic optimization, models can handle noisy gradients as long as the gradient estimator is unbiased with reasonable variance. Following this motivation, we propose a new family of unbiased estimators, WTA-CRS, for matrix products with reduced variance, which only requires storing the sub-sampled activations for calculating the gradient. Our work provides both theoretical and experimental evidence that, in the context of tuning transformers, our proposed estimators exhibit lower variance compared to existing ones. By replacing the linear operation with our approximated one in transformers, we can achieve up to 2.7× peak memory reduction with almost no accuracy drop and enable up to a 6.4× larger batch size. Under the same hardware, WTA-CRS enables better downstream task performance by applying larger models and/or faster training speed with larger batch sizes. The code is available at https://anonymous.4open.science/r/WTACRS-A5C5/.
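
For context, a minimal sketch of the estimator family WTA-CRS belongs to: classic column-row sampling for a matrix product, which is unbiased after rescaling (this is the textbook CR estimator, not the winner-take-all variant the paper derives).

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((128, 512))
B = rng.standard_normal((512, 64))

k = 128                                           # column/row pairs to keep
p = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
p /= p.sum()                                      # variance-minimizing probs
idx = rng.choice(512, size=k, p=p)

approx = (A[:, idx] / (k * p[idx])) @ B[idx]      # E[approx] == A @ B
err = np.linalg.norm(approx - A @ B) / np.linalg.norm(A @ B)
print(f"relative error: {err:.3f}")
```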

IJCAI Conference 2022 Conference Paper

AutoVideo: An Automated Video Action Recognition System

  • Daochen Zha
  • Zaid Pervaiz Bhat
  • Yi-Wei Chen
  • Yicheng Wang
  • Sirui Ding
  • Jiaben Chen
  • Kwei-Herng Lai
  • Mohammad Qazim Bhat

Action recognition is an important task for video understanding with broad applications. However, developing an effective action recognition solution often requires extensive engineering efforts in building and testing different combinations of modules and their hyperparameters. In this demo, we present AutoVideo, a Python system for automated video action recognition. AutoVideo features 1) a highly modular and extendable infrastructure following the standard pipeline language, 2) an exhaustive list of primitives for pipeline construction, 3) data-driven tuners to save the effort of pipeline tuning, and 4) an easy-to-use Graphical User Interface (GUI). AutoVideo is released under the MIT license at https://github.com/datamllab/autovideo.

NeurIPS Conference 2022 Conference Paper

DreamShard: Generalizable Embedding Table Placement for Recommender Systems

  • Daochen Zha
  • Louis Feng
  • Qiaoyu Tan
  • Zirui Liu
  • Kwei-Herng Lai
  • Bhargav Bhushanam
  • Yuandong Tian
  • Arun Kejariwal

We study embedding table placement for distributed recommender systems, which aims to partition and place the tables on multiple hardware devices (e.g., GPUs) to balance the computation and communication costs. Although prior work has explored learning-based approaches for the device placement of computational graphs, embedding table placement remains a challenging problem because of 1) the operation fusion of embedding tables, and 2) the generalizability requirement on unseen placement tasks with different numbers of tables and/or devices. To this end, we present DreamShard, a reinforcement learning (RL) approach for embedding table placement. DreamShard achieves the reasoning of operation fusion and generalizability with 1) a cost network to directly predict the costs of the fused operation, and 2) a policy network that is efficiently trained on an estimated Markov decision process (MDP) without real GPU execution, where the states and the rewards are estimated with the cost network. Equipped with sum and max representation reductions, the two networks can directly generalize to any unseen tasks with different numbers of tables and/or devices without fine-tuning. Extensive experiments show that DreamShard substantially outperforms the existing human expert and RNN-based strategies, with up to 19% speedup over the strongest baseline on large-scale synthetic tables and our production tables. The code is available.
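
The sum and max reductions are what make the networks size-invariant; a minimal sketch of that trick under our reading (the released code may differ): per-table feature vectors are pooled element-wise, so the state embedding has a fixed width no matter how many tables a task contains.

```python
import torch

def reduce_tables(table_feats):
    """table_feats: (num_tables, d) -> fixed-size (2 * d,) embedding."""
    return torch.cat([table_feats.sum(dim=0), table_feats.max(dim=0).values])

print(reduce_tables(torch.randn(5, 16)).shape)    # torch.Size([32])
print(reduce_tables(torch.randn(37, 16)).shape)   # same width, more tables
```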

NeurIPS Conference 2021 Conference Paper

Dirichlet Energy Constrained Learning for Deep Graph Neural Networks

  • Kaixiong Zhou
  • Xiao Huang
  • Daochen Zha
  • Rui Chen
  • Li Li
  • Soo-Hyun Choi
  • Xia Hu

Graph neural networks (GNNs) integrate deep architectures and topological structure modeling in an effective way. However, the performance of existing GNNs would decrease significantly when they stack many layers, because of the over-smoothing issue. Node embeddings tend to converge to similar vectors when GNNs keep recursively aggregating the representations of neighbors. To enable deep GNNs, several methods have been explored recently. But they are developed from either techniques in convolutional neural networks or heuristic strategies. There is no generalizable and theoretical principle to guide the design of deep GNNs. To this end, we analyze the bottleneck of deep GNNs by leveraging the Dirichlet energy of node embeddings, and propose a generalizable principle to guide the training of deep GNNs. Based on it, a novel deep GNN framework, Energetic Graph Neural Networks (EGNN), is designed. It could provide lower and upper constraints in terms of Dirichlet energy at each layer to avoid over-smoothing. Experimental results demonstrate that EGNN achieves state-of-the-art performance by using deep layers.
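
The quantity the analysis is built on, in its simplest unnormalized form (the paper works with a degree-normalized variant): the Dirichlet energy sums squared embedding differences across edges, so it collapses toward zero exactly when node embeddings over-smooth into near-identical vectors.

```python
import numpy as np

def dirichlet_energy(X, edges):
    """X: (n, d) node embeddings; edges: iterable of (i, j) index pairs."""
    return 0.5 * sum(np.sum((X[i] - X[j]) ** 2) for i, j in edges)

X = np.random.randn(4, 8)
edges = [(0, 1), (1, 2), (2, 3)]
print(dirichlet_energy(X, edges))                 # > 0 for distinct embeddings
print(dirichlet_energy(np.ones((4, 8)), edges))   # identical embeddings -> 0.0
```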

ICML Conference 2021 Conference Paper

DouZero: Mastering DouDizhu with Self-Play Deep Reinforcement Learning

  • Daochen Zha
  • Jingru Xie
  • Wenye Ma
  • Sheng Zhang
  • Xiangru Lian
  • Xia Hu 0001
  • Ji Liu

Games are abstractions of the real world, where artificial agents learn to compete and cooperate with other agents. While significant achievements have been made in various perfect- and imperfect-information games, DouDizhu (a.k.a. Fighting the Landlord), a three-player card game, is still unsolved. DouDizhu is a very challenging domain with competition, collaboration, imperfect information, a large state space, and particularly a massive set of possible actions, where the legal actions vary significantly from turn to turn. Unfortunately, modern reinforcement learning algorithms mainly focus on simple and small action spaces and, not surprisingly, are shown not to make satisfactory progress in DouDizhu. In this work, we propose a conceptually simple yet effective DouDizhu AI system, namely DouZero, which enhances traditional Monte-Carlo methods with deep neural networks, action encoding, and parallel actors. Starting from scratch on a single server with four GPUs, DouZero outperformed all existing DouDizhu AI programs in days of training and ranked first on the Botzone leaderboard among 344 AI agents. Through building DouZero, we show that classic Monte-Carlo methods can be made to deliver strong results in a hard domain with a complex action space. The code and an online demo are released at https://github.com/kwai/DouZero with the hope that this insight could motivate future work.
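
The core update behind the enhanced Monte-Carlo framing, sketched under our assumptions (the real system adds action encoding and parallel actors): regress Q(s, a) toward the full episode return rather than a bootstrapped one-step target, which suits a huge, turn-varying action space.

```python
import torch

q_net = torch.nn.Sequential(torch.nn.Linear(10, 64), torch.nn.ReLU(),
                            torch.nn.Linear(64, 1))
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def mc_update(state_action_batch, episode_return):
    # state_action_batch: (T, 10) encoded (state, action) pairs of one episode.
    target = torch.full((state_action_batch.shape[0], 1), episode_return)
    loss = torch.nn.functional.mse_loss(q_net(state_action_batch), target)
    opt.zero_grad(); loss.backward(); opt.step()

mc_update(torch.randn(20, 10), episode_return=1.0)   # toy encoded episode
```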

ICLR Conference 2021 Conference Paper

Rank the Episodes: A Simple Approach for Exploration in Procedurally-Generated Environments

  • Daochen Zha
  • Wenye Ma
  • Lei Yuan
  • Xia Hu 0001
  • Ji Liu

Exploration under sparse reward is a long-standing challenge of model-free reinforcement learning. The state-of-the-art methods address this challenge by introducing intrinsic rewards to encourage exploration in novel states or uncertain environment dynamics. Unfortunately, methods based on intrinsic rewards often fall short in procedurally-generated environments, where a different environment is generated in each episode so that the agent is not likely to visit the same state more than once. Motivated by how humans distinguish good exploration behaviors by looking into the entire episode, we introduce RAPID, a simple yet effective episode-level exploration method for procedurally-generated environments. RAPID regards each episode as a whole and gives an episodic exploration score from both per-episode and long-term views. Those highly scored episodes are treated as good exploration behaviors and are stored in a small ranking buffer. The agent then imitates the episodes in the buffer to reproduce the past good exploration behaviors. We demonstrate our method on several procedurally-generated MiniGrid environments, a first-person-view 3D Maze navigation task from MiniWorld, and several sparse MuJoCo tasks. The results show that RAPID significantly outperforms the state-of-the-art intrinsic reward strategies in terms of sample efficiency and final performance. The code is available at https://github.com/daochenzha/rapid
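
A compact sketch of the ranking-buffer mechanic (the scoring below keeps only the per-episode novelty term; the paper also mixes in a long-term, count-based view and learned weights):

```python
import heapq
import itertools
import random

BUFFER_SIZE = 32
buffer = []                     # min-heap of (score, tiebreak, episode)
_tiebreak = itertools.count()   # avoids comparing episodes on score ties

def episode_score(episode):
    # episode: list of (state, action) pairs; states assumed hashable.
    states = [s for s, _ in episode]
    return len(set(states)) / len(states)    # fraction of distinct states

def store(episode):
    item = (episode_score(episode), next(_tiebreak), episode)
    if len(buffer) < BUFFER_SIZE:
        heapq.heappush(buffer, item)
    else:
        heapq.heappushpop(buffer, item)       # evict the lowest-scoring episode

def sample_for_imitation(batch_size):
    pairs = [sa for _, _, ep in buffer for sa in ep]
    return random.sample(pairs, batch_size)   # (state, action) pairs to imitate
```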

NeurIPS Conference 2021 Conference Paper

Revisiting Time Series Outlier Detection: Definitions and Benchmarks

  • Kwei-Herng Lai
  • Daochen Zha
  • Junjie Xu
  • Yue Zhao
  • Guanchu Wang
  • Xia Hu

Time series outlier detection has been extensively studied with many advanced algorithms proposed in the past decade. Despite these efforts, very few studies have investigated how we should benchmark the existing algorithms. In particular, using synthetic datasets for evaluation has become a common practice in the literature, and thus it is crucial to have a general synthetic criterion to benchmark algorithms. This is a non-trivial task because the existing synthetic methods are very different in different applications and the outlier definitions are often ambiguous. To bridge this gap, we propose a behavior-driven taxonomy for time series outliers and categorize outliers into point- and pattern-wise outliers with clear context definitions. Following the new taxonomy, we then present a general synthetic criterion and generate 35 synthetic datasets accordingly. We further identify 4 multivariate real-world datasets from different domains and benchmark 9 algorithms on the synthetic and the real-world datasets. Surprisingly, we observe that some classical algorithms could outperform many recent deep learning approaches. The datasets, pre-processing and synthetic scripts, and the algorithm implementations are made publicly available at https://github.com/datamllab/tods/tree/benchmark
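
A toy generator in the spirit of the point- vs. pattern-wise taxonomy (magnitudes and the base signal are arbitrary choices, not the benchmark's actual synthetic criterion):

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 20 * np.pi, 2000)
x = np.sin(t) + 0.05 * rng.standard_normal(t.size)   # clean-ish base series

# Point-wise outliers: isolated values deviating from their local context.
point_idx = rng.choice(t.size, size=10, replace=False)
x[point_idx] += rng.choice([-1.0, 1.0], size=10) * 3.0

# Pattern-wise outlier: a contiguous segment whose shape breaks the regime.
x[800:900] = 0.5 * rng.standard_normal(100)          # flat noise replaces sine
```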

AAAI Conference 2021 System Paper

TODS: An Automated Time Series Outlier Detection System

  • Kwei-Herng Lai
  • Daochen Zha
  • Guanchu Wang
  • Junjie Xu
  • Yue Zhao
  • Devesh Kumar
  • Yile Chen
  • Purav Zumkhawaka

We present TODS, an automated Time Series Outlier Detection System for research and industrial applications. TODS is a highly modular system that supports easy pipeline construction. The basic building block of TODS is the primitive, an implementation of a function with hyperparameters. TODS currently supports 70 primitives, including data processing, time series processing, feature analysis, detection algorithms, and a reinforcement module. Users can freely construct a pipeline using these primitives and perform end-to-end outlier detection with the constructed pipeline. TODS provides a Graphical User Interface (GUI), where users can flexibly design a pipeline with drag-and-drop. Moreover, a data-driven searcher is provided to automatically discover the most suitable pipelines for a given dataset. TODS is released under the Apache 2.0 license at https://github.com/datamllab/tods. A video is available on YouTube.

IJCAI Conference 2020 Conference Paper

Dual Policy Distillation

  • Kwei-Herng Lai
  • Daochen Zha
  • Yuening Li
  • Xia Hu

Policy distillation, which transfers a teacher policy to a student policy, has achieved great success in challenging tasks of deep reinforcement learning. This teacher-student framework requires a well-trained teacher model, which is computationally expensive. Moreover, the performance of the student model could be limited by the teacher model if the teacher model is not optimal. In light of collaborative learning, we study the feasibility of involving joint intellectual efforts from diverse perspectives of student models. In this work, we introduce dual policy distillation (DPD), a student-student framework in which two learners operate on the same environment to explore different perspectives of the environment and extract knowledge from each other to enhance their learning. The key challenge in developing this dual learning framework is to identify the beneficial knowledge from the peer learner for contemporary learning-based reinforcement learning algorithms, since it is unclear whether the knowledge distilled from an imperfect and noisy peer learner would be helpful. To address the challenge, we theoretically justify that distilling knowledge from a peer learner will lead to policy improvement and propose a disadvantageous distillation strategy based on the theoretical results. The conducted experiments on several continuous control tasks show that the proposed framework achieves superior performance with a learning-based agent and function approximation, without the use of expensive teacher models.
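
A hedged sketch of what a disadvantageous distillation term could look like: distill from the peer only on states where the peer's value estimate is ahead, so an imperfect peer cannot drag the learner down everywhere. The gating rule and shapes are illustrative stand-ins for the paper's derivation.

```python
import torch

def dpd_distill_loss(logits_self, logits_peer, v_self, v_peer):
    gap = (v_peer - v_self).clamp(min=0.0)        # peer advantage per state
    kl = torch.nn.functional.kl_div(
        torch.log_softmax(logits_self, dim=-1),   # student log-probs
        torch.softmax(logits_peer, dim=-1),       # peer probs as target
        reduction="none",
    ).sum(dim=-1)
    return (gap * kl).mean()                      # distill only where behind

loss = dpd_distill_loss(torch.randn(8, 4), torch.randn(8, 4),
                        torch.randn(8), torch.randn(8))
```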

IJCAI Conference 2020 Conference Paper

Multi-Channel Graph Neural Networks

  • Kaixiong Zhou
  • Qingquan Song
  • Xiao Huang
  • Daochen Zha
  • Na Zou
  • Xia Hu

The classification of graph-structured data has become increasingly crucial in many disciplines. It has been observed that the implicit or explicit hierarchical community structures preserved in real-world graphs could be useful for downstream classification applications. A straightforward way to leverage the hierarchical structure is to make use of pooling algorithms to cluster nodes into fixed groups, and shrink the input graph layer by layer to learn the pooled graphs. However, the pooling shrinkage discards graph details, making it hard to distinguish two non-isomorphic graphs, and the fixed clustering ignores the inherent multiple characteristics of nodes. To compensate for the shrinking loss and learn the various characteristics of nodes, we propose multi-channel graph neural networks (MuchGNN). Motivated by the underlying mechanisms developed in convolutional neural networks, we define tailored graph convolutions to learn a series of graph channels at each layer, and shrink the graphs hierarchically to encode the pooled structures. Experimental results on real-world datasets demonstrate the superiority of MuchGNN over the state-of-the-art methods.

IJCAI Conference 2020 Conference Paper

RLCard: A Platform for Reinforcement Learning in Card Games

  • Daochen Zha
  • Kwei-Herng Lai
  • Songyi Huang
  • Yuanpu Cao
  • Keerthana Reddy
  • Juan Vargas
  • Alex Nguyen
  • Ruzhe Wei

We present RLCard, a Python platform for reinforcement learning research and development in card games. RLCard supports various card environments and several baseline algorithms with unified easy-to-use interfaces, aiming at bridging reinforcement learning and imperfect information games. The platform provides flexible configurations of state representation, action encoding, and reward design. RLCard also supports visualizations for algorithm debugging. In this demo, we showcase two representative environments and their visualization results. We conclude this demo with challenges and research opportunities brought by RLCard. A video is available on YouTube.
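
A minimal usage sketch following RLCard's documented interface (pin the library version when reproducing, since the API has shifted across releases):

```python
import rlcard
from rlcard.agents import RandomAgent

env = rlcard.make("doudizhu")                     # any supported env id works
env.set_agents([RandomAgent(num_actions=env.num_actions)
                for _ in range(env.num_players)])
trajectories, payoffs = env.run(is_training=False)
print(payoffs)                                    # one payoff per player
```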

NeurIPS Conference 2020 Conference Paper

Towards Deeper Graph Neural Networks with Differentiable Group Normalization

  • Kaixiong Zhou
  • Xiao Huang
  • Yuening Li
  • Daochen Zha
  • Rui Chen
  • Xia Hu

Graph neural networks (GNNs), which learn the representation of a node by aggregating its neighbors, have become an effective computational tool in downstream applications. Over-smoothing is one of the key issues that limit the performance of GNNs as the number of layers increases, because the stacked aggregators would make node representations converge to indistinguishable vectors. Several attempts have been made to tackle the issue by bringing linked node pairs close and unlinked pairs distinct. However, they often ignore the intrinsic community structures and would result in sub-optimal performance. The representations of nodes within the same community/class need to be similar to facilitate classification, while different classes are expected to be separated in the embedding space. To bridge the gap, we introduce two over-smoothing metrics and a novel technique, i.e., differentiable group normalization (DGN). It normalizes nodes within the same group independently to increase their smoothness, and separates node distributions among different groups to significantly alleviate the over-smoothing issue. Experiments on real-world datasets demonstrate that DGN makes GNN models more robust to over-smoothing and achieves better performance with deeper GNNs.
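
A sketch of the mechanism as we read it (shapes and the learned soft assignment are our assumptions): softly assign nodes to groups, normalize embeddings within each soft group, and add the result back residually.

```python
import torch

def dgn(X, W, eps=1e-5):
    """X: (n, d) node embeddings; W: (d, g) learned group-assignment weights."""
    S = torch.softmax(X @ W, dim=1)               # (n, g) soft memberships
    out = torch.zeros_like(X)
    for k in range(S.shape[1]):
        w = S[:, k:k + 1]                         # (n, 1) weights for group k
        mu = (w * X).sum(0) / (w.sum() + eps)     # weighted group mean
        var = (w * (X - mu) ** 2).sum(0) / (w.sum() + eps)
        out = out + w * (X - mu) / torch.sqrt(var + eps)
    return X + out                                # residual keeps raw signal

print(dgn(torch.randn(100, 16), torch.randn(16, 4)).shape)
```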

IJCAI Conference 2019 Conference Paper

Experience Replay Optimization

  • Daochen Zha
  • Kwei-Herng Lai
  • Kaixiong Zhou
  • Xia Hu

Experience replay enables reinforcement learning agents to memorize and reuse past experiences, just as humans replay memories for the situation at hand. Contemporary off-policy algorithms either replay past experiences uniformly or utilize a rule-based replay strategy, which may be sub-optimal. In this work, we consider learning a replay policy to optimize the cumulative reward. Replay learning is challenging because the replay memory is noisy and large, and the cumulative reward is unstable. To address these issues, we propose a novel experience replay optimization (ERO) framework which alternately updates two policies: the agent policy, and the replay policy. The agent is updated to maximize the cumulative reward based on the replayed data, while the replay policy is updated to provide the agent with the most useful experiences. The conducted experiments on various continuous control tasks demonstrate the effectiveness of ERO, empirically showing promise in experience replay learning to improve the performance of off-policy reinforcement learning algorithms.
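
A hedged sketch of the alternating loop's replay half: parameterize the sampling distribution over stored transitions with a softmax over simple features, then nudge it with a REINFORCE-style signal from the change in cumulative reward (the features and the agent-side update are stand-ins, not the paper's exact formulation).

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.zeros(3)                       # replay-policy weights

def replay_probs(features):           # features: (N, 3) per stored transition
    logits = features @ w
    e = np.exp(logits - logits.max())
    return e / e.sum()

def update_replay_policy(features, sampled_idx, reward_delta, lr=0.01):
    # REINFORCE for a softmax sampler: grad log p_i = x_i - E_p[x].
    global w
    baseline = replay_probs(features) @ features
    w += lr * reward_delta * (features[sampled_idx] - baseline).mean(0)

features = rng.standard_normal((1000, 3))
idx = rng.choice(1000, size=32, p=replay_probs(features))
update_replay_policy(features, idx, reward_delta=+0.5)  # reward improved
```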