Arrow Research search

Author name cluster

Xinyu Duan

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

8 papers
2 author rows

Possible papers

8

ICLR Conference 2025 Conference Paper

Beware of Calibration Data for Pruning Large Language Models

  • Yixin Ji
  • Yang Xiang 0003
  • Juntao Li 0005
  • Qingrong Xia
  • Ping Li 0016
  • Xinyu Duan
  • Zhefeng Wang 0001
  • Min Zhang 0005

As large language models (LLMs) are widely applied across various fields, model compression has become increasingly crucial for reducing costs and improving inference efficiency. Post-training pruning is a promising method that does not require resource-intensive iterative training and only needs a small amount of calibration data to assess the importance of parameters. Recent research has enhanced post-training pruning from different aspects, but few studies systematically explore the effects of calibration data, and it is unclear whether better calibration data construction strategies exist. We fill this gap and surprisingly observe that calibration data is also crucial to post-training pruning, especially at high sparsity. Through controlled experiments on the key factors influencing calibration data, including the pruning settings, the amount of data, and its similarity to the pre-training data, we observe that a small amount of data is adequate, and that data more similar to the pre-training data yields better performance. As pre-training data is usually inaccessible for advanced LLMs, we further provide a self-generating calibration data synthesis strategy to construct feasible calibration data. Experimental results on recent strong open-source LLMs (e.g., DCLM and LLaMA-3) show that the proposed strategy can enhance the performance of strong pruning methods (e.g., Wanda, DSnoT, OWL) by a large margin (up to 2.68%).
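The abstract mentions Wanda, whose importance score is publicly documented: each weight's magnitude is multiplied by the L2 norm of its input activation over the calibration tokens, which is why the calibration data matters. Below is a minimal illustrative sketch of that score and a row-wise prune, not the paper's code; the function names `wanda_scores` and `prune_rowwise` are made up for illustration:

```python
import numpy as np

def wanda_scores(W, X):
    """Importance of each weight: |W_ij| * ||X_j||_2 over calibration tokens.
    W: (out_features, in_features); X: (n_tokens, in_features)."""
    return np.abs(W) * np.linalg.norm(X, axis=0)

def prune_rowwise(W, X, sparsity=0.5):
    """Zero the lowest-scoring `sparsity` fraction of weights in each row."""
    scores = wanda_scores(W, X)
    k = int(W.shape[1] * sparsity)
    drop = np.argsort(scores, axis=1)[:, :k]  # lowest-importance columns per row
    W_pruned = W.copy()
    np.put_along_axis(W_pruned, drop, 0.0, axis=1)
    return W_pruned
```

Because the activation norms come from the calibration set, swapping in a different calibration distribution changes which weights survive, which is the sensitivity the paper studies.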

AAAI Conference 2025 Conference Paper

Multi-Branch Self-Drafting for LLM Inference Acceleration

  • Zipeng Gao
  • Qingrong Xia
  • Tong Xu
  • Xinyu Duan
  • Zhi Zheng
  • Zhefeng Wang
  • Enhong Chen

The autoregressive decoding paradigm endows large language models (LLMs) with superior language generation capabilities; however, its step-by-step decoding process inherently limits decoding speed. To mitigate this constraint, the prevalent “draft and validation” strategy enables parallel validation of candidate drafts, allowing LLMs to decode multiple tokens simultaneously during one forward propagation. However, existing methods for obtaining drafts often incur additional overhead in communication or training, or statistical biases from the corpus. To this end, we propose an innovative draft generation and maintenance approach that leverages the capabilities of the LLM itself. Specifically, we extend the autoregressive decoding paradigm to a multi-branch drafting procedure, which can efficiently generate draft sequences without any additional models or training while preserving the quality of the generated content by keeping the LLM parameters unchanged. Experiments across various open-source benchmarks show that our method generates 2.0 to 3.2 tokens per forward step and achieves around a 2x improvement in end-to-end throughput compared to the autoregressive decoding strategy.
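The “draft and validation” loop this abstract builds on can be sketched with toy next-token functions standing in for the draft and target models. This is a generic greedy accept/reject sketch of the general strategy, not the paper's multi-branch method; all names are illustrative:

```python
def speculative_decode(target_next, draft_next, prompt, n_draft=4, max_len=12):
    """Generic draft-and-verify decoding loop.
    target_next / draft_next: callables mapping a token sequence to the next token."""
    seq = list(prompt)
    while len(seq) < max_len:
        # Draft phase: a cheap proposer generates n_draft tokens autoregressively.
        draft, ctx = [], list(seq)
        for _ in range(n_draft):
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)
        # Verify phase: the target model checks each draft token in order
        # (in a real system all positions are checked in one forward pass).
        accepted = 0
        for i, t in enumerate(draft):
            if target_next(seq + draft[:i]) == t:
                accepted += 1
            else:
                break
        seq += draft[:accepted]
        # Always emit one token from the target so progress is guaranteed.
        seq.append(target_next(seq))
    return seq[:max_len]
```

When the draft agrees with the target, each loop iteration commits several tokens for one (batched) target evaluation, which is where the throughput gain comes from.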

ICLR Conference 2024 Conference Paper

Are BERT Family Good Instruction Followers? A Study on Their Potential and Limitations

  • Yisheng Xiao
  • Juntao Li 0005
  • Zechen Sun
  • Zechang Li
  • Qingrong Xia
  • Xinyu Duan
  • Zhefeng Wang 0001
  • Min Zhang 0005

Language modeling at scale has proven very effective and brought unprecedented success to natural language models. Many typical representatives, especially decoder-only models, e.g., BLOOM and LLaMA, and encoder-decoder models, e.g., Flan-T5 and AlexaTM, have exhibited incredible instruction-following capabilities while keeping strong task completion ability. These large language models can achieve superior performance in various tasks and even yield emergent capabilities, e.g., reasoning and universal generalization. Though the above two paradigms are mainstream and well explored, the potential of the BERT family, which consists of encoder-only models and was once among the most representative families of pre-trained models, also deserves attention and at least merits discussion. In this work, we adopt XML-R to explore the effectiveness of the BERT family for instruction following and zero-shot learning. We first design a simple yet effective strategy to utilize the encoder-only models for generation tasks and then conduct multi-task instruction tuning. Experimental results demonstrate that our fine-tuned model, Instruct-XMLR, outperforms Bloomz on all evaluation tasks and achieves comparable performance with mT0 on most tasks. Surprisingly, Instruct-XMLR also possesses strong task and language generalization abilities, indicating that Instruct-XMLR can serve as a good instruction follower and zero-shot learner. Besides, Instruct-XMLR can accelerate decoding due to its non-autoregressive generation manner, achieving a roughly 3x speedup over current autoregressive large language models. Although we also observed several limitations in our experiments, such as the performance decline in long-generation tasks and the shortcoming of length prediction, Instruct-XMLR can still become a good member of the family of current large language models.

ICLR Conference 2024 Conference Paper

AuG-KD: Anchor-Based Mixup Generation for Out-of-Domain Knowledge Distillation

  • Zihao Tang
  • Zheqi Lv
  • Shengyu Zhang 0001
  • Yifan Zhou
  • Xinyu Duan
  • Fei Wu 0001
  • Kun Kuang 0001

Due to privacy or patent concerns, a growing number of large models are released without granting access to their training data, making transferring their knowledge inefficient and problematic. In response, Data-Free Knowledge Distillation (DFKD) methods have emerged as direct solutions. However, simply adopting models derived from DFKD for real-world applications suffers from significant performance degradation, due to the discrepancy between teachers' training data and real-world scenarios (student domain). The degradation stems from the portions of teachers' knowledge that are not applicable to the student domain. They are specific to the teacher domain and would undermine students' performance. Hence, selectively transferring teachers' appropriate knowledge becomes the primary challenge in DFKD. In this work, we propose a simple but effective method, AuG-KD. It utilizes an uncertainty-guided and sample-specific anchor to align student-domain data with the teacher domain and leverages a generative method to progressively trade off the learning process between OOD knowledge distillation and domain-specific information learning via mixup learning. Extensive experiments on 3 datasets and 8 settings demonstrate the stability and superiority of our approach.
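The "progressive trade-off via mixup" the abstract describes rests on standard mixup: a convex combination of two inputs weighted by a coefficient that can be annealed over training. The sketch below shows only that generic building block under assumed names (`mixup`, `lam_schedule`); it is not AuG-KD's actual anchor or generator machinery:

```python
import numpy as np

def mixup(x_teacherlike, x_student, lam):
    """Convex combination of a teacher-domain-aligned sample and a
    student-domain sample; lam controls the trade-off."""
    return lam * x_teacherlike + (1.0 - lam) * x_student

def lam_schedule(step, total_steps):
    """Linear anneal from 1.0 (favor OOD distillation) toward 0.0
    (favor student-domain learning) as training progresses."""
    return max(0.0, 1.0 - step / total_steps)
```

Annealing lam shifts the loss from imitating the teacher on anchor-aligned inputs toward fitting the student domain directly, matching the progressive trade-off described above.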

AAAI Conference 2024 Conference Paper

Dialogues Are Not Just Text: Modeling Cognition for Dialogue Coherence Evaluation

  • Xue Li
  • Jia Su
  • Yang Yang
  • Zipeng Gao
  • Xinyu Duan
  • Yi Guan

The generation of logically coherent dialogues by humans relies on underlying cognitive abilities. Based on this, we redefine the dialogue coherence evaluation process, combining cognitive judgment with the basic text to achieve a more human-like evaluation. We propose a novel dialogue evaluation framework based on a Dialogue Cognition Graph (DCGEval) to implement this fusion through in-depth interaction between cognition modeling and text modeling. The proposed Abstract Meaning Representation (AMR) based graph structure, called DCG, aims to uniformly model four dialogue cognitive abilities. Specifically, core-semantic cognition is modeled by converting each utterance into an AMR graph, which extracts essential semantic information without redundancy. Temporal and role cognition are modeled by establishing logical relationships among the different AMR graphs. Finally, commonsense knowledge from ConceptNet is fused to express commonsense cognition. Experiments demonstrate the necessity of modeling human cognition for dialogue evaluation, and our DCGEval presents stronger correlations with human judgments compared to other state-of-the-art evaluation metrics.

AAAI Conference 2024 Conference Paper

StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis

  • Yu Zhang
  • Rongjie Huang
  • Ruiqi Li
  • JinZheng He
  • Yan Xia
  • Feiyang Chen
  • Xinyu Duan
  • Baoxing Huai

Style transfer for out-of-domain (OOD) singing voice synthesis (SVS) focuses on generating high-quality singing voices with unseen styles (such as timbre, emotion, pronunciation, and articulation skills) derived from reference singing voice samples. However, the endeavor to model the intricate nuances of singing voice styles is an arduous task, as singing voices possess a remarkable degree of expressiveness. Moreover, existing SVS methods encounter a decline in the quality of synthesized singing voices in OOD scenarios, as they rest upon the assumption that the target vocal attributes are discernible during the training phase. To overcome these challenges, we propose StyleSinger, the first singing voice synthesis model for zero-shot style transfer of out-of-domain reference singing voice samples. StyleSinger incorporates two critical approaches for enhanced effectiveness: 1) the Residual Style Adaptor (RSA) which employs a residual quantization module to capture diverse style characteristics in singing voices, and 2) the Uncertainty Modeling Layer Normalization (UMLN) to perturb the style attributes within the content representation during the training phase and thus improve the model generalization. Our extensive evaluations in zero-shot style transfer undeniably establish that StyleSinger outperforms baseline models in both audio quality and similarity to the reference singing voice samples. Access to singing voice samples can be found at https://stylesinger.github.io/.

AAAI Conference 2018 Short Paper

Multi-Label Community-Based Question Classification via Personalized Sequence Memory Network Learning

  • Xinyu Duan
  • Shengyu Zhang
  • Zhou Zhao
  • Fei Wu
  • Yueting Zhuang

Multi-label community-based question classification is a challenging problem in Community-based Question Answering (CQA), arising in many real applications such as question navigation and expert finding. Most existing approaches treat the problem as a content-based tag suggestion task, which suffers from the textual sparsity issue. In this paper, we consider the problem from the viewpoint of personalized sequence learning. We introduce the personalized sequence memory network, which leverages not only the semantics of questions but also the personalized information of askers to provide a sequence tag learning function that captures high-order tag dependency. Experiments on a real-world dataset show the effectiveness of our method.

AAAI Conference 2016 Conference Paper

Community-Based Question Answering via Heterogeneous Social Network Learning

  • Hanyin Fang
  • Fei Wu
  • Zhou Zhao
  • Xinyu Duan
  • Yueting Zhuang
  • Martin Ester

Community-based question answering (cQA) sites have accumulated vast amounts of questions and corresponding crowdsourced answers over time. How to efficiently share the underlying information and knowledge from reliable (usually highly reputable) answerers has become an increasingly popular research topic. A major challenge in cQA tasks is the accurate matching of high-quality answers w.r.t. given questions. Many traditional approaches recommend answers merely based on the content similarity between questions and answers, and therefore suffer from the sparsity bottleneck of cQA data. In this paper, we propose a novel framework which encodes not only the contents of question-answer (Q-A) pairs but also the social interaction cues in the community to boost cQA tasks. More specifically, our framework collaboratively utilizes the rich interaction among questions, answers, and answerers to learn the relative quality rank of different answers w.r.t. the same question. Moreover, the information in heterogeneous social networks is comprehensively employed to enhance the quality of question-answering (QA) matching by our deep random walk learning framework. Extensive experiments on a large-scale dataset from a real-world cQA site show that leveraging the heterogeneous social information indeed achieves better performance than other state-of-the-art cQA methods.