Author name cluster

Zixiang Ding

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

4 papers

1 author row

AAAI Conference 2024 Conference Paper

How to Trade Off the Quantity and Capacity of Teacher Ensemble: Learning Categorical Distribution to Stochastically Employ a Teacher for Distillation

Zixiang Ding
Guoqing Jiang
Shuai Zhang
Lin Guo
Wei Lin

We observe two phenomena with respect to quantity and capacity: 1) more teachers are not always better for multi-teacher knowledge distillation, and 2) a stronger teacher is not always better for single-teacher knowledge distillation. To trade off the quantity and capacity of a teacher ensemble, we propose a new distillation paradigm named Dynamic Knowledge Distillation (DynaKD) that learns an adaptive categorical distribution to stochastically employ one teacher from the ensemble in each step, transferring knowledge from the teacher ensemble into the student. DynaKD has three advantages: 1) it preserves the diversity of each teacher through one-to-one rather than several-for-one distillation, 2) it makes the best of a powerful teacher via the multi-level assistant teachers in the ensemble, and 3) it dynamically determines the importance of each teacher for various tasks. To verify the effectiveness of the proposed approach, we conduct extensive experiments on BERT compression on the GLUE benchmark. Experimental results show that the proposed approach achieves state-of-the-art scores compared to previous compression approaches on five out of seven downstream tasks, including pushing MRPC F1 and accuracy to 92.2 (1.4 point absolute improvement) and RTE accuracy to 76.2 (2.8 point absolute improvement). Moreover, we also conduct extensive experiments on image classification on CIFAR-100, where DynaKD likewise achieves state-of-the-art performance.
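
A minimal sketch of the per-step teacher selection described in this abstract, under stated assumptions: the categorical distribution is parameterized as a softmax over one logit per teacher, and the distillation loss is the common temperature-softened KL divergence. The abstract specifies neither, the names `selection_logits`, `sample_teacher`, and `kd_loss` are hypothetical, and how the logits are learned is not shown.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_teacher(selection_logits, rng):
    """Draw one teacher index from the categorical distribution softmax(selection_logits)."""
    probs = softmax(selection_logits)
    index = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return index, probs


def kd_loss(student_logits, teacher_out_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened outputs (assumed loss form)."""
    p = softmax(teacher_out_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


rng = random.Random(0)
selection_logits = [0.0, 0.5, 1.0]  # hypothetical: one learnable logit per teacher
index, probs = sample_teacher(selection_logits, rng)
# Only the sampled teacher supervises the student in this step (one-to-one distillation).
loss = kd_loss(student_logits=[1.0, 0.2, -0.5], teacher_out_logits=[2.0, 0.1, -1.0])
```

Because a single teacher is drawn per step, the student never sees an averaged ensemble output, which is the one-to-one property the abstract contrasts with several-for-one distillation.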


AAAI Conference 2023 Conference Paper

SKDBERT: Compressing BERT via Stochastic Knowledge Distillation

Zixiang Ding
Guoqing Jiang
Shuai Zhang
Lin Guo
Wei Lin

In this paper, we propose Stochastic Knowledge Distillation (SKD) to obtain a compact BERT-style language model dubbed SKDBERT. In each distillation iteration, SKD samples a teacher model from a pre-defined teacher team, which consists of multiple teacher models with multi-level capacities, to transfer knowledge into the student model in a one-to-one manner. The sampling distribution plays an important role in SKD. We heuristically present three types of sampling distributions to assign appropriate probabilities to the multi-level teacher models. SKD has two advantages: 1) it preserves the diversity of the multi-level teacher models by stochastically sampling a single teacher model in each distillation iteration, and 2) it improves the efficacy of knowledge distillation via multi-level teacher models when a large capacity gap exists between the teacher model and the student model. Experimental results on the GLUE benchmark show that SKDBERT reduces the size of a BERT model by 40% while retaining 99.5% of its language understanding performance and being 100% faster.
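
The sampling loop in this abstract can be sketched as follows. The abstract does not name its three sampling distributions, so the two shown here (uniform, and weights proportional to parameter count) are illustrative assumptions, as are the teacher names and sizes.

```python
import random
from collections import Counter

# Hypothetical teacher team with multi-level capacities (parameter counts in millions).
TEACHER_TEAM = {"teacher_small": 14, "teacher_medium": 66, "teacher_large": 110}


def uniform_distribution(team):
    """Assign every teacher the same sampling probability."""
    return {name: 1.0 / len(team) for name in team}


def capacity_proportional_distribution(team):
    """Assign probabilities proportional to capacity (an assumed heuristic)."""
    total = sum(team.values())
    return {name: size / total for name, size in team.items()}


def run_distillation(distribution, num_iterations, seed=0):
    """Sample exactly one teacher per iteration and count how often each is used."""
    rng = random.Random(seed)
    names = list(distribution)
    weights = [distribution[n] for n in names]
    usage = Counter()
    for _ in range(num_iterations):
        teacher = rng.choices(names, weights=weights, k=1)[0]
        # A real loop would compute the distillation loss against `teacher` here.
        usage[teacher] += 1
    return usage


usage = run_distillation(capacity_proportional_distribution(TEACHER_TEAM), 1000)
```

Swapping `capacity_proportional_distribution` for `uniform_distribution` changes only the weights, which mirrors the abstract's point that the choice of sampling distribution is a separate design decision from the distillation step itself.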


AAAI Conference 2019 Conference Paper

From Independent Prediction to Reordered Prediction: Integrating Relative Position and Global Label Information to Emotion Cause Identification

Zixiang Ding
Huihui He
Mengran Zhang
Rui Xia

Emotion cause identification aims at identifying the potential causes that lead to a certain emotion expression in text. Several techniques, including rule-based methods and traditional machine learning methods, have been proposed to address this problem based on manually designed rules and features. More recently, some deep learning methods have also been applied to this task, in an attempt to automatically capture the causal relationship between an emotion and its causes embodied in the text. In this work, we find that in addition to the content of the text, two other kinds of information, namely relative position and global labels, are also very important for emotion cause identification. To integrate such information, we propose a neural network model that encodes the three elements (i.e., text content, relative position and global label) in a unified and end-to-end fashion. We introduce a relative position augmented embedding learning algorithm, and transform the task from an independent prediction problem to a reordered prediction problem, where the dynamic global label information is incorporated. Experimental results on a benchmark emotion cause dataset show that our model achieves new state-of-the-art performance and performs significantly better than a number of competitive baselines. Further analysis shows the effectiveness of the relative position augmented embedding learning algorithm and the reordered prediction mechanism with dynamic global labels.
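
To illustrate the relative position feature, the sketch below assumes a common convention: a clause's relative position is its index minus the index of the emotion clause. The abstract does not give the exact definition or the embedding scheme, so the function names and the clipping range `max_distance` are assumptions.

```python
def relative_positions(num_clauses, emotion_index, max_distance=10):
    """Return each clause's signed offset from the emotion clause, clipped to +/- max_distance."""
    if not 0 <= emotion_index < num_clauses:
        raise ValueError("emotion_index must refer to a clause in the document")
    offsets = []
    for clause_index in range(num_clauses):
        offset = clause_index - emotion_index
        offsets.append(max(-max_distance, min(max_distance, offset)))
    return offsets


def embedding_ids(offsets, max_distance=10):
    """Shift offsets into non-negative ids suitable for an embedding lookup table."""
    return [offset + max_distance for offset in offsets]


# A five-clause document whose emotion expression is in the fourth clause (index 3).
offsets = relative_positions(num_clauses=5, emotion_index=3)
ids = embedding_ids(offsets)
```

Here `offsets` is `[-3, -2, -1, 0, 1]` and `ids` is `[7, 8, 9, 10, 11]`; each id would index a learned position vector that is combined with the clause's text representation.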


IJCAI Conference 2019 Conference Paper

RTHN: A RNN-Transformer Hierarchical Network for Emotion Cause Extraction

Rui Xia
Mengran Zhang
Zixiang Ding

The emotion cause extraction (ECE) task aims at discovering the potential causes behind a certain emotion expression in a document. Techniques including rule-based methods, traditional machine learning methods and deep neural networks have been proposed to solve this task. However, most of the previous work considered ECE as a set of independent clause classification problems and ignored the relations between multiple clauses in a document. In this work, we propose a joint emotion cause extraction framework, named RNN-Transformer Hierarchical Network (RTHN), to encode and classify multiple clauses synchronously. RTHN is composed of a lower word-level encoder based on RNNs to encode multiple words in each clause, and an upper clause-level encoder based on Transformer to learn the correlation between multiple clauses in a document. We furthermore propose ways to encode the relative position and global predication information into Transformer that can capture the causality between clauses and make RTHN more efficient. We finally achieve the best performance among 12 compared systems and improve the F1 score of the state-of-the-art from 72.69% to 76.77%.
