Author name cluster

Yifeng Ding

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

7 papers

2 author rows

EAAI Journal 2025 Journal Article

Joint class attention knowledge and self-knowledge for multi-teacher knowledge distillation

Yifeng Ding
Gaoming Yang
Xinxin Ye
Xiujun Wang
Zhi Liu

Intelligent applications using large-scale deep neural networks face significant challenges due to their high storage and computational demands, hindering deployment on resource-limited edge devices. Knowledge distillation addresses this by transferring knowledge from an extensive teacher network to a smaller student network, thereby reducing computational costs while preserving performance. Multi-teacher Knowledge Distillation (MKD) further enhances this by allowing the student to learn from multiple teachers. However, MKD methods have two key limitations: (1) They typically use non-interpretable logits or features as knowledge, limiting the transparency of the learning process. (2) They focus primarily on teacher-guided learning, neglecting the potential of combining teacher supervision with self-learning. To address these limitations, this study presents a novel method, Joint Class attention knowledge and Self-knowledge for Multi-teacher Knowledge Distillation (JCS-MKD), which combines both teacher supervision and self-learning. Our method introduces two key innovations: (1) A class attention mechanism that integrates class activation maps from multiple teachers to deliver more interpretable knowledge to the student. Additionally, an adaptive weighting scheme is employed to assign greater importance to teacher predictions that are closer to the ground truth, ensuring the student primarily learns from high-quality teacher knowledge. (2) A self-knowledge mechanism that decouples the student's logit into target and non-target components, customizing soft labels respectively to achieve adaptive self-supervision, enabling the student to refine their understanding independently. Experimental results on standard benchmark datasets demonstrate that JCS-MKD consistently outperforms state-of-the-art distillation methods across various teacher-student architectures. The code is available at: https://github.com/EifelTing/JCS-MKD.
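As a rough illustration of the adaptive weighting idea mentioned in the abstract above, the following sketch gives a larger weight to a teacher whose prediction is closer to the ground truth. The abstract does not state the actual formula, so the softmax over negative cross-entropy, the function names, the `temperature` parameter, and the example numbers are all assumptions made for illustration; this is not the JCS-MKD implementation.

```python
import math


def cross_entropy(probs, label):
    """Cross-entropy of one predicted distribution against a hard label."""
    return -math.log(max(probs[label], 1e-12))


def teacher_weights(teacher_probs, label, temperature=1.0):
    """Softmax over negative cross-entropy: lower loss gives a larger weight.

    Assumption: this weighting rule is illustrative only; the paper's
    exact scheme is not given in the abstract.
    """
    losses = [cross_entropy(p, label) for p in teacher_probs]
    scores = [math.exp(-loss / temperature) for loss in losses]
    total = sum(scores)
    return [s / total for s in scores]


def aggregate(teacher_probs, weights):
    """Weighted average of the teachers' class distributions."""
    num_classes = len(teacher_probs[0])
    return [
        sum(w * p[c] for w, p in zip(weights, teacher_probs))
        for c in range(num_classes)
    ]


# Hypothetical predictions from two teachers over three classes; true class is 0.
teachers = [[0.8, 0.1, 0.1], [0.4, 0.5, 0.1]]
weights = teacher_weights(teachers, label=0)
soft_target = aggregate(teachers, weights)
```

With these made-up inputs, the first teacher assigns more probability to the true class and therefore receives the larger weight, so the aggregated soft target leans toward its prediction.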

ICML Conference 2024 Conference Paper

Magicoder: Empowering Code Generation with OSS-Instruct

Yuxiang Wei 0003
Zhe Wang
Jiawei Liu 0004
Yifeng Ding
Lingming Zhang 0001

We introduce Magicoder, a series of fully open-source (code, weights, and data) Large Language Models (LLMs) for code that significantly closes the gap with top code models while having no more than 7B parameters. Magicoder models are trained on 75K synthetic instruction data using OSS-Instruct, a novel approach to enlightening LLMs with open-source code snippets to generate diverse instruction data for code. Our main motivation is to mitigate the inherent bias of the synthetic data generated by LLMs through the wealth of open-source references for the production of more realistic and controllable data. The orthogonality of OSS-Instruct and other data generation methods like Evol-Instruct further enables us to build an enhanced MagicoderS. Both Magicoder and MagicoderS substantially outperform state-of-the-art code models with similar or even larger sizes on a wide range of coding benchmarks. Notably, MagicoderS-CL-7B based on CodeLlama even surpasses the prominent ChatGPT on HumanEval+ (66.5 vs. 65.9 in pass@1). Overall, OSS-Instruct opens a new direction for crafting diverse synthetic instruction data for code using abundant open-source references.

NeurIPS Conference 2024 Conference Paper

SelfCodeAlign: Self-Alignment for Code Generation

Yuxiang Wei
Federico Cassano
Jiawei Liu
Yifeng Ding
Naman Jain
Zachary Mueller
Harm de Vries
Leandro Von Werra

Instruction tuning is a supervised fine-tuning approach that significantly improves the ability of large language models (LLMs) to follow human instructions. For programming tasks, most models are finetuned with costly human-annotated instruction-response pairs or those generated by large, proprietary LLMs, which may not be permitted. We propose SelfCodeAlign, the first fully transparent and permissive pipeline for self-aligning code LLMs without extensive human annotations or distillation. SelfCodeAlign employs the same base model for inference throughout the data generation process. It first extracts diverse coding concepts from high-quality seed snippets to generate new tasks. It then samples multiple responses per task, pairs each with test cases, and validates them in a sandbox environment. Finally, passing examples are selected for instruction tuning. In our primary experiments, we use SelfCodeAlign with CodeQwen1.5-7B to generate a dataset of 74k instruction-response pairs. Finetuning on this dataset leads to a model that achieves a 67.1 pass@1 on HumanEval+, surpassing CodeLlama-70B-Instruct despite being ten times smaller. Across all benchmarks, this finetuned model consistently outperforms the original version trained with OctoPack, the previous state-of-the-art method for instruction tuning without human annotations or distillation. Additionally, we show that SelfCodeAlign is effective across LLMs of various sizes, from 3B to 33B, and that the base models can benefit more from alignment with their own data distribution. We further validate each component’s effectiveness in our pipeline, showing that SelfCodeAlign outperforms both direct distillation from GPT-4o and leading GPT-3.5-based distillation methods, such as OSS-Instruct and Evol-Instruct. SelfCodeAlign has also led to the creation of StarCoder2-Instruct, the first fully transparent, permissively licensed, and self-aligned code LLM that achieves state-of-the-art coding performance. Overall, SelfCodeAlign shows for the first time that a strong instruction-tuned code LLM can result from self-alignment rather than distillation.
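The validate-and-select step described in the abstract above (sample several responses per task, pair each with test cases, run them, keep the ones that pass) might be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the example task, and the candidate responses are hypothetical, and a plain subprocess with a timeout stands in for the sandbox environment; it is not the SelfCodeAlign code and provides no real isolation.

```python
import subprocess
import sys


def passes(response_code, test_code, timeout=5.0):
    """Run a response together with its tests in a separate interpreter.

    Assumption: a subprocess with a timeout is a stand-in for a sandbox;
    a real pipeline would need proper isolation.
    """
    program = response_code + "\n" + test_code
    try:
        result = subprocess.run(
            [sys.executable, "-c", program],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def select_passing(task, candidates):
    """Keep only (instruction, response) pairs whose paired tests pass."""
    return [
        (task, response)
        for response, tests in candidates
        if passes(response, tests)
    ]


# Hypothetical task with two sampled responses, each paired with a test.
task = "Write add(a, b) that returns the sum of two numbers."
candidates = [
    ("def add(a, b):\n    return a + b", "assert add(2, 3) == 5"),
    ("def add(a, b):\n    return a - b", "assert add(2, 3) == 5"),
]
selected = select_passing(task, candidates)
```

Only the first candidate survives the filter here, since the second fails its assertion and the interpreter exits with a non-zero status.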

LORI Conference 2023 Conference Paper

Modal Logics with Non-rigid Propositional Designators

Yifeng Ding

In most modal logics, atomic propositional symbols are directly representing the meaning of sentences (such as sets of possible worlds). In other words, they use only rigid propositional designators. This means they are not able to handle uncertainty in meaning directly at the sentential level. In this paper, we offer a modal language involving non-rigid propositional designators which can also carefully distinguish de re and de dicto use of these designators. Then, we axiomatize the logics in this language with respect to all Kripke models with multiple modalities and with respect to S5 Kripke models with a single modality.

LORI Conference 2021 Conference Paper

Hypergraphs, Local Reasoning, and Weakly Aggregative Modal Logic

Yifeng Ding
Jixin Liu
Yanjing Wang 0001

This paper connects the following three apparently unrelated topics: an epistemic framework fighting logical omniscience, a class of generalized graphs without the arities of relations, and a family of non-normal modal logics rejecting the aggregative axiom. Through neighborhood frames as their meeting point, we show that, among many completeness results obtained in this paper, the limit of a family of weakly aggregative logics is both exactly the modal logic of hypergraphs and also the epistemic logic of local reasoning with veracity and positive introspection. The logics studied are shown to be decidable based on a filtration construction.

LORI Conference 2019 Conference Paper

Weakly Aggregative Modal Logic: Characterization and Interpolation

Jixin Liu
Yanjing Wang 0001
Yifeng Ding

Weakly Aggregative Modal Logic (\(\textsf{WAML}\)) is a collection of disguised polyadic modal logics with n-ary modalities whose arguments are all the same. \(\textsf{WAML}\) has some interesting applications on epistemic logic and logic of games, so we study some basic model theoretical aspects of \(\textsf{WAML}\) in this paper. Specifically, we give a van Benthem-Rosen characterization theorem of \(\textsf{WAML}\) based on an intuitive notion of bisimulation and show that each basic \(\textsf{WAML}\) system \(\mathbb{K}_n\) lacks Craig Interpolation.

TARK Conference 2019 Conference Paper

When Do Introspection Axioms Matter for Multi-Agent Epistemic Reasoning?

Yifeng Ding
Wesley H. Holliday
Cedegao Zhang

The early literature on epistemic logic in philosophy focused on reasoning about the knowledge or belief of a single agent, especially on controversies about "introspection axioms" such as the 4 and 5 axioms. By contrast, the later literature on epistemic logic in computer science and game theory has focused on multi-agent epistemic reasoning, with the single-agent 4 and 5 axioms largely taken for granted. In the relevant multi-agent scenarios, it is often important to reason about what agent A believes about what agent B believes about what agent A believes; but it is rarely important to reason just about what agent A believes about what agent A believes. This raises the question of the extent to which single-agent introspection axioms actually matter for multi-agent epistemic reasoning. In this paper, we formalize and answer this question. To formalize the question, we first define a set of multi-agent formulas that we call agent-alternating formulas, including formulas like Box_a Box_b Box_a p but not formulas like Box_a Box_a p. We then prove, for the case of belief, that if one starts with multi-agent K or KD, then adding both the 4 and 5 axioms (or adding the B axiom) does not allow the derivation of any new agent-alternating formulas -- in this sense, introspection axioms do not matter. By contrast, we show that such conservativity results fail for knowledge and multi-agent KT, though they hold with respect to a smaller class of agent-nonrepeating formulas.
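To make the two example formulas in the abstract above concrete, here is a small sketch of a syntactic check that accepts Box_a Box_b Box_a p and rejects Box_a Box_a p. The abstract does not give the formal definition of agent-alternating formulas, so the rule used here is an assumption: a box for agent a may not occur directly inside another box for the same agent, with Boolean connectives treated as transparent. The tuple encoding of formulas and the function name are likewise hypothetical.

```python
def is_agent_alternating(formula, enclosing_agent=None):
    """Check that no box is nested directly under a box of the same agent.

    Formulas are nested tuples (an assumed encoding):
      ("atom", name), ("not", f), ("and", f, g), ("box", agent, f)
    `enclosing_agent` is the agent of the nearest enclosing box, if any.
    """
    kind = formula[0]
    if kind == "atom":
        return True
    if kind == "not":
        return is_agent_alternating(formula[1], enclosing_agent)
    if kind == "and":
        return all(
            is_agent_alternating(sub, enclosing_agent) for sub in formula[1:]
        )
    if kind == "box":
        _, agent, body = formula
        if agent == enclosing_agent:
            return False
        return is_agent_alternating(body, agent)
    raise ValueError(f"unknown formula kind: {kind!r}")


p = ("atom", "p")
# Box_a Box_b Box_a p
alternating = ("box", "a", ("box", "b", ("box", "a", p)))
# Box_a Box_a p
repeating = ("box", "a", ("box", "a", p))

alternating_ok = is_agent_alternating(alternating)
repeating_ok = is_agent_alternating(repeating)
```

Under the assumed rule, `alternating_ok` is true and `repeating_ok` is false, matching the two examples given in the abstract; the paper's own definition should be consulted for borderline cases.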
