Arrow Research search

Author name cluster

Wenjun Ke

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

17 papers
1 author row

Possible papers (17)

AAAI 2026 · Conference Paper

Balanced Knowledge Distillation for Large Language Models with Mix-of-Experts

  • Jiajun Liu
  • Yao He
  • Wenjun Ke
  • Peng Wang
  • Ziyu Shang
  • Guozheng Li
  • Zijie Xu

Mixture-of-Experts (MoE) architectures have recently become a more prevalent choice for large language models (LLMs) than dense architectures due to their superior performance. However, their billions of parameters impose a huge deployment and inference cost on MoE LLMs. To address these issues, knowledge distillation (KD) has become a widely adopted technique to compress LLMs. Existing KD methods for LLMs can be divided into dense-to-dense and MoE-to-dense distillation. Dense-to-dense distillation transfers knowledge between single dense LLMs, while MoE-to-dense distillation attempts to transfer knowledge from MoE LLMs to dense LLMs. However, the architectural mismatch prevents the student from fully absorbing knowledge when distilling MoE LLMs. To address this limitation, we investigate a new distillation setting, MoE-to-MoE, which aims to fully leverage the expert knowledge of teachers and enable the student to absorb it more effectively. Compared to dense-to-dense and MoE-to-dense, MoE-to-MoE suffers from two imbalance issues. First, expert-coverage deficiency reflects an imbalanced knowledge transfer of teacher experts: traditional distillation utilizes only the few experts activated by the teacher router. Second, routing imbalance appears when the student routing distribution drifts from the teacher's, which makes it difficult for the student to learn how to distribute inputs across experts. To overcome these issues, we propose a novel distillation framework for MoE-to-MoE, Balanced Distillation (B-Distill), which equally spreads teacher expertise across student experts while regularizing the student router toward teacher-consistent balance. First, to mitigate expert-coverage deficiency, we introduce Monte Carlo exploration, which stochastically perturbs router probabilities so every teacher and student expert is sampled without enlarging the search space. Second, to correct routing imbalance and avert load collapse, we propose an entropy-aware router distillation mechanism that aligns the student router with the teacher while curbing over-concentration. Experiments show that B-Distill outperforms baselines by up to 6.6% in ROUGE-L.
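
As a rough illustration of the two mechanisms named in the abstract, the sketch below pairs a stochastic router perturbation (Monte Carlo exploration) with a KL router-alignment loss carrying an entropy bonus. Every name and shape here is a hypothetical toy, not the authors' B-Distill implementation.

```python
# Minimal sketch of the two losses described above (not the authors' code).
import torch
import torch.nn.functional as F

def perturbed_routing(router_logits, noise_scale=0.5):
    """Stochastically perturb router probabilities (Monte Carlo exploration)
    so that rarely-activated experts still receive distillation signal."""
    noise = noise_scale * torch.randn_like(router_logits)
    return F.softmax(router_logits + noise, dim=-1)

def router_distill_loss(student_logits, teacher_probs, entropy_weight=0.1):
    """Align the student router with the teacher (KL term); subtracting an
    entropy bonus discourages over-concentration on a few experts."""
    log_p_student = F.log_softmax(student_logits, dim=-1)
    kl = F.kl_div(log_p_student, teacher_probs, reduction="batchmean")
    p_student = log_p_student.exp()
    entropy = -(p_student * log_p_student).sum(dim=-1).mean()
    return kl - entropy_weight * entropy

# Toy usage: 4 experts on both sides for simplicity (assumed shapes).
t_logits = torch.randn(16, 4)                      # teacher router logits per token
s_logits = torch.randn(16, 4, requires_grad=True)  # student router logits
loss = router_distill_loss(s_logits, perturbed_routing(t_logits))
loss.backward()
```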

AAAI 2026 · Conference Paper

Benchmarking and Enhancing Rule Knowledge-Driven Reasoning of Large Language Models

  • Zijie Xu
  • Wenjun Ke
  • Peng Wang
  • Guozheng Li
  • Qingjian Ni
  • Jiajun Liu
  • Ziyu Shang
  • Jing Zhou

Large Language Models (LLMs) have demonstrated strong capabilities across diverse tasks under the example-driven learning paradigm. However, in high-stakes domains such as emergency response and industrial safety, historical incidents are scarce, confidential, or both, while concise rule books are abundant. We formalize this underexplored setting as rule knowledge-driven reasoning and ask: Can LLMs reason reliably when rules are plentiful but examples are nearly absent? To study this question, we introduce RULER, an automatic benchmark that generates 32K rigorously verified questions from 1K expert-curated emergency response rules to probe three core abilities: rule memorization, single-rule application, and multi-rule complex reasoning. RULER is further equipped with a hallucination-aware evaluation suite and novel relational metrics. A comprehensive empirical study of five representative LLMs and five enhancement strategies shows that, even when models achieve reliable performance on rule memorization and single-rule application, multi-rule complex reasoning plateaus at 5.4 on a 10-point scale. To address this limitation, we propose RAMPS, a Rule knowledge-Aware Monte Carlo Tree Search Process-reward Supervision framework. RAMPS injects rule knowledge priors into MCTS, distills 12K step-level traces without human annotation, and trains an advantage-based reward model that scores candidate reasoning paths during beam search inference. Experimental results show that RAMPS significantly improves multi-rule complex reasoning performance to 7.7.
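
For readers unfamiliar with process-reward-guided inference, here is a minimal sketch of beam search scored by a step-level reward model, the inference pattern the abstract describes; the proposer and reward model below are trivial stand-ins for an LLM and RAMPS's trained advantage-based scorer.

```python
# Hedged sketch of process-reward-guided beam search (hypothetical scorer;
# not the released RAMPS implementation).
from dataclasses import dataclass, field

@dataclass
class Beam:
    steps: list = field(default_factory=list)
    score: float = 0.0

def beam_search(expand, reward_model, width=3, depth=4):
    """Keep the `width` best partial reasoning paths, scoring each new
    step with a step-level (process) reward instead of only a final answer."""
    beams = [Beam()]
    for _ in range(depth):
        candidates = []
        for b in beams:
            for step in expand(b.steps):          # candidate next steps
                r = reward_model(b.steps, step)   # step-level reward
                candidates.append(Beam(b.steps + [step], b.score + r))
        beams = sorted(candidates, key=lambda b: -b.score)[:width]
    return beams[0]

# Toy components standing in for an LLM proposer and a trained reward model.
expand = lambda steps: [f"step{len(steps)}-{i}" for i in range(2)]
reward_model = lambda steps, step: 1.0 if step.endswith("0") else 0.5
print(beam_search(expand, reward_model).steps)
```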

AAAI 2026 · Conference Paper

Optimizing LoRA Allocation of MoE with the Alignment of Topic Correlation

  • Hengyuan Xu
  • Wenjun Ke
  • Yao He
  • Jiajun Liu
  • Dong Nie
  • Peng Wang
  • Ziyu Shang
  • Zijie Xu

Mixture of experts (MoE) dynamically routes inputs to specialized expert networks to scale model capacity with low inference overhead. However, the excessive parameter growth in MoE models poses challenges in low-resource settings. To address this issue, MoE combined with parameter-efficient fine-tuning (PEFT) has emerged as a lightweight adaptation paradigm that distributes knowledge among experts via multiple LoRA blocks. Existing MoE-PEFT methods can be broadly categorized into external and internal PEFT methods. External PEFT methods incorporate lightweight models into existing MoE architectures without modifying their routing, which limits parameter efficiency. To overcome this, internal PEFT methods integrate MoE architectures into PEFT, enabling minimal parameter overhead. However, they still face two major challenges: (1) a lack of expert functional differentiation, resulting in overlapping specialization across modules, and (2) the absence of a structured attribution mechanism to guide expert selection based on semantic relevance. To alleviate these challenges, we propose TopicLoRA, a novel three-stage framework that leverages topic knowledge as semantic anchors to guide expert allocation. Specifically, (1) to address expert redundancy, we construct a topic-level prior graph using Graph Neural Network-enhanced representation learning over Big-Bench categories, enforcing structural separation among expert embeddings, and (2) to introduce semantic attribution, we design a dual-loss training mechanism that softly aligns input-query relevance with topic-guided routing distributions via KL divergence. Extensive experiments on representative datasets (e.g., MMLU, GSM8K, Flanv2) demonstrate that TopicLoRA outperforms state-of-the-art PEFT baselines by 2.40% on average in accuracy, with a maximum improvement of 4.21%. Furthermore, ablation studies demonstrate our framework's robustness to intricate topics and input-sequence variations, which stems from the dual-loss training mechanism.
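
The dual-loss idea, a task loss plus a KL term softly aligning routing with a topic prior, can be sketched in a few lines; all tensors and weights below are illustrative assumptions, not TopicLoRA's actual three-stage pipeline.

```python
# Illustrative sketch of the dual-loss idea (task loss + KL aligning the
# LoRA-expert routing distribution with a topic-relevance prior).
import torch
import torch.nn.functional as F

def dual_loss(task_loss, router_logits, topic_relevance, kl_weight=0.5):
    """Softly align input-conditioned routing with topic-guided targets."""
    log_route = F.log_softmax(router_logits, dim=-1)
    topic_prior = F.softmax(topic_relevance, dim=-1)   # semantic anchor
    kl = F.kl_div(log_route, topic_prior, reduction="batchmean")
    return task_loss + kl_weight * kl

router_logits = torch.randn(8, 6, requires_grad=True)  # 6 LoRA experts (toy)
topic_relevance = torch.randn(8, 6)                    # query-topic scores (toy)
loss = dual_loss(torch.tensor(1.2), router_logits, topic_relevance)
loss.backward()
```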

AAAI 2024 · Conference Paper

ConsistNER: Towards Instructive NER Demonstrations for LLMs with the Consistency of Ontology and Context

  • Chenxiao Wu
  • Wenjun Ke
  • Peng Wang
  • Zhizhao Luo
  • Guozheng Li
  • Wanyi Chen

Named entity recognition (NER) aims to identify and classify specific entities mentioned in textual sentences. Most existing high-performing NER models employ the standard fully supervised paradigm, which requires a large amount of annotated data during training. In order to maintain performance with insufficient annotation resources (i.e., low resources), in-context learning (ICL) has drawn a lot of attention due to its plug-and-play nature compared to other methods (e.g., meta-learning and prompt learning). In this paradigm, retrieving highly correlated demonstrations for target sentences is the key to eliciting ICL ability. For the NER task, this correlation implies the consistency of both ontology (i.e., generalized entity type) and context (i.e., sentence semantics), which is ignored by previous NER demonstration retrieval techniques. To address this issue, we propose ConsistNER, a novel three-stage framework that incorporates ontological and contextual information for low-resource NER. First, ConsistNER employs large language models (LLMs) to pre-recognize potential entities in a zero-shot manner. Second, ConsistNER retrieves sentence-specific demonstrations for each target sentence based on the following two considerations: (1) regarding ontological consistency, demonstrations are filtered into a candidate set based on ontology distribution; (2) regarding contextual consistency, an entity-aware self-attention mechanism is introduced to focus more on the potential entities and semantically correlated tokens. Finally, ConsistNER feeds the retrieved demonstrations for all target sentences into LLMs for prediction. We conduct experiments on four widely adopted NER datasets, covering both general and specific domains. Experimental results show that ConsistNER achieves a 6.01%-26.37% and 3.07%-21.18% improvement over the state-of-the-art baselines on Micro-F1 scores under 1- and 5-shot settings, respectively.
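
A minimal sketch of the two-stage retrieval described above: filter candidates by ontological overlap, then rank by contextual similarity. The toy encoder and overlap threshold are assumptions; the paper's entity-aware self-attention is approximated here by a plain embedding dot product.

```python
# Hedged sketch of consistency-based demonstration retrieval.
import numpy as np

def ontology_overlap(types_a, types_b):
    """Jaccard overlap of (pre-recognized) entity-type sets."""
    a, b = set(types_a), set(types_b)
    return len(a & b) / max(len(a | b), 1)

def retrieve(target, pool, embed, k=5, min_overlap=0.5):
    # Stage 1: filter candidates by ontological consistency.
    cand = [d for d in pool
            if ontology_overlap(target["types"], d["types"]) >= min_overlap]
    # Stage 2: rank by contextual similarity of sentence embeddings.
    t = embed(target["text"])
    cand.sort(key=lambda d: -float(np.dot(t, embed(d["text"]))))
    return cand[:k]

embed = lambda s: np.ones(4) * len(s) / 100.0   # stand-in sentence encoder
pool = [{"text": "Paris is in France", "types": ["LOC"]},
        {"text": "Apple hired Tim", "types": ["ORG", "PER"]}]
print(retrieve({"text": "Berlin lies in Germany", "types": ["LOC"]},
               pool, embed, k=1))
```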

IJCAI 2024 · Conference Paper

Domain-Hierarchy Adaptation via Chain of Iterative Reasoning for Few-shot Hierarchical Text Classification

  • Ke Ji
  • Peng Wang
  • Wenjun Ke
  • Guozheng Li
  • Jiajun Liu
  • Jingsheng Gao
  • Ziyu Shang

Recently, various pre-trained language models (PLMs) have demonstrated impressive performance on a wide range of few-shot tasks. However, limited by the unstructured prior knowledge in PLMs, it is difficult to maintain consistent performance on complex, hierarchically dependent tasks, especially when the downstream data is extremely scarce. The main challenge is how to transfer the unstructured semantic space in PLMs to the downstream domain hierarchy. Unlike previous work on hierarchical text classification (HTC), which directly performs multi-label classification or uses graph neural networks (GNNs) to inject the label hierarchy, in this work we study the HTC problem under a few-shot setting to adapt knowledge in PLMs from an unstructured manner to the downstream hierarchy. Technically, we design a simple yet effective method named Hierarchical Iterative Conditional Random Field (HierICRF), which searches the most domain-challenging directions and crafts domain-hierarchy adaptation as a hierarchical iterative language modeling problem; it then encourages the model to perform hierarchical consistency self-correction during inference, thereby achieving knowledge transfer with hierarchical consistency preservation. We apply HierICRF to various architectures, and extensive experiments on two popular HTC datasets demonstrate that prompting with HierICRF significantly boosts few-shot HTC performance, with average Micro-F1 improvements of 28.80% to 1.50% and Macro-F1 improvements of 36.29% to 1.50% over the previous state-of-the-art (SOTA) baselines under few-shot settings (1->16), while retaining SOTA hierarchical consistency performance.
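
One simple way to picture "hierarchical consistency self-correction" is completing missing ancestors along a predicted label path; the toy below does only that and is far simpler than the iterative CRF the paper actually trains.

```python
# Toy illustration: repair a prediction so every child label's ancestors
# are present (one simplified reading of hierarchical consistency; not
# the HierICRF model). The label hierarchy here is assumed.
parent = {"sports.football": "sports", "sports": None,
          "politics.election": "politics", "politics": None}

def enforce_hierarchy(labels):
    fixed = set(labels)
    for lab in list(fixed):
        p = parent.get(lab)
        while p:                      # walk up and add missing ancestors
            fixed.add(p)
            p = parent.get(p)
    return sorted(fixed)

print(enforce_hierarchy(["sports.football"]))  # ['sports', 'sports.football']
```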

IJCAI 2024 · Conference Paper

Fast and Continual Knowledge Graph Embedding via Incremental LoRA

  • Jiajun Liu
  • Wenjun Ke
  • Peng Wang
  • Jiahao Wang
  • Jinhua Gao
  • Ziyu Shang
  • Guozheng Li
  • Zijie Xu

Continual Knowledge Graph Embedding (CKGE) aims to efficiently learn new knowledge and simultaneously preserve old knowledge. Dominant approaches primarily focus on alleviating catastrophic forgetting of old knowledge but neglect efficient learning for the emergence of new knowledge. However, in real-world scenarios, knowledge graphs (KGs) are continuously growing, which brings a significant challenge to fine-tuning KGE models efficiently. To address this issue, we propose a fast CKGE framework (FastKGE), incorporating an incremental low-rank adapter (IncLoRA) mechanism to efficiently acquire new knowledge while preserving old knowledge. Specifically, to mitigate catastrophic forgetting, FastKGE isolates and allocates new knowledge to specific layers based on the fine-grained influence between old and new KGs. Subsequently, to accelerate fine-tuning, FastKGE devises an efficient IncLoRA mechanism, which embeds the specific layers into incremental low-rank adapters with fewer training parameters. Moreover, IncLoRA introduces adaptive rank allocation, which makes the LoRA aware of the importance of entities and adjusts its rank scale adaptively. We conduct experiments on four public datasets and two new datasets with a larger initial scale. Experimental results demonstrate that FastKGE can reduce training time by 34%-49% while still achieving competitive link prediction performance against state-of-the-art models on four public datasets (average MRR score of 21.0% vs. 21.1%). Meanwhile, on two newly constructed datasets, FastKGE saves 51%-68% training time and improves link prediction performance by 1.5%.
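
A hedged sketch of the core IncLoRA idea, learning new knowledge in a small low-rank adapter over frozen old embeddings, with the rank chosen adaptively from an importance score; the scoring rule and shapes below are hypothetical.

```python
# Sketch of an incremental low-rank adapter for entity embeddings: new
# knowledge lives in a low-rank delta while old embeddings stay frozen.
import torch

class IncLoRA(torch.nn.Module):
    def __init__(self, num_entities, dim, rank):
        super().__init__()
        self.A = torch.nn.Parameter(torch.randn(num_entities, rank) * 0.01)
        self.B = torch.nn.Parameter(torch.zeros(rank, dim))

    def forward(self, frozen_emb, idx):
        # frozen embedding + trainable low-rank correction
        return frozen_emb[idx] + self.A[idx] @ self.B

frozen = torch.randn(1000, 128)      # old entity embeddings (not trained)
importance = 0.7                     # assumed influence score of new KG on old KG
rank = max(2, int(16 * importance))  # adaptive rank allocation (toy rule)
adapter = IncLoRA(1000, 128, rank)
out = adapter(frozen, torch.tensor([3, 42]))
```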

IJCAI 2024 · Conference Paper

Incorporating Schema-Aware Description into Document-Level Event Extraction

  • Zijie Xu
  • Peng Wang
  • Wenjun Ke
  • Guozheng Li
  • Jiajun Liu
  • Ke Ji
  • Xiye Chen
  • Chenxiao Wu

Document-level event extraction (DEE) aims to extract the structured event information from a given document, facing two critical challenges: (1) event arguments always scatter across sentences (arguments-scattering); (2) multiple events can co-occur in one document (multi-event). Most recent studies mainly follow two simplified settings to ease the challenges: one simplifies DEE with the no-trigger-words design (NDEE), and the other focuses on event argument extraction (DEAE), a sub-task of DEE. However, the former excludes trigger extraction and suffers from error propagation in the sub-tasks. The latter relies heavily on the gold triggers as prerequisites and struggles to distinguish multiple arguments playing the same role in different events. To address the limitations above, we propose a novel joint trigger and argument extraction paradigm SEELE to enhance the DEE model via incorporating SchEma-awarE descriptions into Document-Level Event extraction. Specifically, the schema-aware descriptions are leveraged from two aspects: (1) guiding the attention mechanism among event-aware tokens across sentences, which relieves arguments-scattering without error propagation; (2) performing the fine-grained contrastive learning to distinguish different events, which mitigates multi-event without gold triggers. Extensive experiments show the superiority of SEELE, achieving notable improvements (2.1% to 9.7% F1) on three NDEE datasets and competitive performance on two DEAE datasets. Our code is available at https://github.com/TheoryRhapsody/SEELE.
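
The first use of schema-aware descriptions, biasing attention toward event-aware tokens across sentences, might look roughly like the toy below; the bias term and mask are assumptions, not the SEELE architecture.

```python
# Toy sketch: additively bias attention scores toward tokens flagged as
# event-aware by a schema description (illustrative only).
import torch

def schema_biased_attention(q, k, v, event_mask, bias=2.0):
    scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5
    scores = scores + bias * event_mask.unsqueeze(-2)  # boost event-aware keys
    return torch.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(1, 6, 16)                      # (batch, tokens, dim)
event_mask = torch.tensor([[0., 1., 0., 0., 1., 0.]])  # schema-matched tokens
out = schema_biased_attention(q, k, v, event_mask)
```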

IJCAI 2024 · Conference Paper

Learning Multi-Granularity and Adaptive Representation for Knowledge Graph Reasoning

  • Ziyu Shang
  • Peng Wang
  • Wenjun Ke
  • Jiajun Liu
  • Hailang Huang
  • Guozheng Li
  • Chenxiao Wu
  • Jianghan Liu

Knowledge graph reasoning (KGR) aims to infer new factual triples from existing knowledge graphs (KGs). Recently, a new category of methods, possessing both transductive and inductive reasoning capabilities, has been proposed to tackle this task via learning entity-independent representations from local neighboring structures. However, these methods are plagued by inefficiency issues, and they exclusively capture evidence from well-designed local structures, ignoring the correlation between the query and different structures within KGs. In this work, we first propose a novel multi-granularity and adaptive representation framework, MulGA, exploiting the connectivity subgraph to uniformly and hierarchically model query-related triples, relation paths, and subgraphs without explicitly extracting any graph structure, hence mitigating inefficiency issues. Second, we introduce a message-passing mechanism across connectivity subgraphs, facilitating all entities to attain query-related structural representations of diverse granularity levels, i.e., triples and relation paths of different lengths. Third, we design a self-attention-based merging mechanism that allocates weights to different granularities and then consolidates them into subgraph-granularity representations for reasoning. Systematic experiments were conducted on 15 benchmarks; MulGA achieves a significant improvement in MRR, by an average of 1.5% on transductive and 2.7% on inductive tasks, over existing state-of-the-art methods. Moreover, MulGA boasts faster convergence speed, competitive inference time, and alleviates the over-smoothing prevalent in graph neural networks.
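
The self-attention-based merging of granularities could be sketched as a softmax-weighted combination of triple-, path-, and subgraph-level representations; the scoring below is a deliberately simplified stand-in for MulGA's mechanism.

```python
# Minimal sketch of merging multi-granularity representations with
# attention-style weights (illustrative; not MulGA itself).
import torch

def merge_granularities(reps):
    """reps: (num_granularities, dim) query-related representations,
    e.g., triple-level and path-level; returns a weighted merged rep."""
    scores = reps @ reps.mean(dim=0)      # relevance to the mean context
    weights = torch.softmax(scores, dim=0)
    return (weights.unsqueeze(-1) * reps).sum(dim=0)

reps = torch.randn(3, 32)   # e.g., triple, 2-hop path, 3-hop path granularities
print(merge_granularities(reps).shape)    # torch.Size([32])
```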

IJCAI 2024 · Conference Paper

Making LLMs as Fine-Grained Relation Extraction Data Augmentor

  • Yifan Zheng
  • Wenjun Ke
  • Qi Liu
  • Yuting Yang
  • Ruizhuo Zhao
  • Dacheng Feng
  • Jianwei Zhang
  • Zhi Fang

Relation Extraction (RE) identifies relations between entities in text, typically relying on supervised models that demand abundant high-quality data. Various approaches, including Data Augmentation (DA), have been proposed as promising solutions for addressing low-resource challenges in RE. However, existing DA methods in RE often struggle to ensure consistency and contextual diversity in generated data due to the fine-grained nature of RE. Inspired by the extensive generative capabilities of large language models (LLMs), we introduce a novel framework named ConsistRE, aiming to maintain context consistency in RE. ConsistRE initiates by collecting a substantial corpus from external resources and employing statistical algorithms and semantics to identify keyword hints closely related to relation instances. These keyword hints are subsequently integrated as contextual constraints in sentence generation, ensuring the preservation of relation dependence and diversity with LLMs. Additionally, we implement syntactic dependency selection to enhance the syntactic structure of the generated sentences. Experimental results on the SemEval, TACRED, and TACREV datasets unequivocally demonstrate that ConsistRE outperforms other baselines in F1 score by 1.76%, 3.92%, and 2.53%, respectively, particularly when operating under low-resource experimental conditions.
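
The keyword-hint constraint reduces, at its simplest, to prompt construction like the sketch below; the prompt wording is hypothetical, and the paper's statistical hint mining and syntactic dependency selection are omitted.

```python
# Hedged sketch of keyword-hint-constrained augmentation prompting
# (hypothetical prompt format, not ConsistRE's actual templates).
def build_prompt(head, tail, relation, hints):
    return (
        f"Write one sentence expressing the relation '{relation}' "
        f"between '{head}' and '{tail}'. "
        f"The sentence must contain the words: {', '.join(hints)}."
    )

print(build_prompt("Marie Curie", "physics", "field_of_work",
                   ["research", "pioneering"]))
```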

AAAI 2024 · Conference Paper

OntoFact: Unveiling Fantastic Fact-Skeleton of LLMs via Ontology-Driven Reinforcement Learning

  • Ziyu Shang
  • Wenjun Ke
  • Nana Xiu
  • Peng Wang
  • Jiajun Liu
  • Yanhui Li
  • Zhizhao Luo
  • Ke Ji

Large language models (LLMs) have demonstrated impressive proficiency in information retrieval, but they are prone to generating incorrect responses that conflict with reality, a phenomenon known as intrinsic hallucination. The critical challenge lies in the unclear and unreliable fact distribution within LLMs trained on vast amounts of data. The prevalent approach frames the factual detection task as a question-answering paradigm, where the LLMs are asked about factual knowledge and examined for correctness. However, existing studies have primarily focused on deriving test cases only from several specific domains, such as movies and sports, limiting the comprehensive observation of missing knowledge and the analysis of unexpected hallucinations. To address this issue, we propose OntoFact, an adaptive framework for detecting unknown facts of LLMs, devoted to mining the ontology-level skeleton of the missing knowledge. Specifically, we argue that LLMs could expose the ontology-based similarity among missing facts and introduce five representative knowledge graphs (KGs) as benchmarks. We further devise a sophisticated ontology-driven reinforcement learning (ORL) mechanism to produce error-prone test cases with specific entities and relations automatically. The ORL mechanism rewards navigation of the KGs toward feasible directions for unveiling factual errors. Moreover, empirical efforts demonstrate that dominant LLMs are biased towards answering Yes rather than No, regardless of whether this knowledge is included. To mitigate the overconfidence of LLMs, we leverage a hallucination-free detection (HFD) strategy to tackle unfair comparisons between baselines, thereby boosting the result robustness. Experimental results on 5 datasets, using 32 representative LLMs, reveal a general lack of factual knowledge in current LLMs. Notably, ChatGPT exhibits fact error rates of 51.6% on DBpedia and 64.7% on YAGO. Additionally, the ORL mechanism demonstrates promising error prediction scores, with F1 scores ranging from 70% to 90% across most LLMs. Compared to exhaustive testing, ORL achieves an average recall of 80% while reducing evaluation time by 35.29% to 63.12%.
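
The ontology-driven exploration can be pictured as a bandit over (class, relation) cells of the ontology, steering test generation toward error-prone regions; this epsilon-greedy toy is a simplified stand-in for the paper's ORL mechanism.

```python
# Toy sketch: treat (class, relation) ontology cells as bandit arms and
# steer test-case generation toward error-prone regions (simplified
# stand-in for ontology-driven RL; all names are hypothetical).
import random
from collections import defaultdict

value = defaultdict(float)   # running error-rate estimate per arm
count = defaultdict(int)

def pick_arm(arms, eps=0.2):
    if random.random() < eps:
        return random.choice(arms)            # explore
    return max(arms, key=lambda a: value[a])  # exploit error-prone cells

def update(arm, llm_was_wrong):
    count[arm] += 1
    value[arm] += (float(llm_was_wrong) - value[arm]) / count[arm]

arms = [("Person", "birthPlace"), ("Film", "director")]
arm = pick_arm(arms)         # choose where to generate the next test case
update(arm, llm_was_wrong=True)
```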

IJCAI 2024 · Conference Paper

Recall, Retrieve and Reason: Towards Better In-Context Relation Extraction

  • Guozheng Li
  • Peng Wang
  • Wenjun Ke
  • Yikai Guo
  • Ke Ji
  • Ziyu Shang
  • Jiajun Liu
  • Zijie Xu

Relation extraction (RE) aims to identify relations between entities mentioned in texts. Although large language models (LLMs) have demonstrated impressive in-context learning (ICL) abilities in various tasks, they still underperform most supervised fine-tuned RE methods. Utilizing ICL for RE with LLMs encounters two challenges: (1) retrieving good demonstrations from training examples, and (2) enabling LLMs to exhibit strong ICL abilities in RE. On the one hand, retrieving good demonstrations is a non-trivial process in RE, which easily results in low relevance regarding entities and relations. On the other hand, ICL with an LLM achieves poor performance in RE when RE differs from language modeling in nature or when the LLM is not large enough. In this work, we propose a novel recall-retrieve-reason RE framework that synergizes LLMs with retrieval corpora (training examples) to enable relevant retrieving and reliable in-context reasoning. Specifically, we distill consistent ontological knowledge from training datasets to let LLMs generate relevant entity pairs, grounded by the retrieval corpora, as valid queries. These entity pairs are then used to retrieve relevant training examples from the retrieval corpora as demonstrations for LLMs to conduct better ICL via instruction tuning. Extensive experiments on different LLMs and RE datasets demonstrate that our method generates relevant and valid entity pairs and boosts the ICL abilities of LLMs, achieving competitive or new state-of-the-art performance on sentence-level RE compared to previous supervised fine-tuning methods and ICL-based methods.
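
A minimal sketch of the recall-retrieve step: recalled entity pairs act as queries over the training corpus to fetch demonstrations. The recall itself is stubbed here; in the paper it is an instruction-tuned LLM grounded by the corpora.

```python
# Hedged sketch of retrieval keyed on recalled entity pairs (toy corpus
# and matching rule; not the paper's implementation).
def retrieve_demos(recalled_pairs, corpus, k=2):
    demos = []
    for head, tail in recalled_pairs:
        hits = [ex for ex in corpus
                if ex["head"] == head or ex["tail"] == tail]
        demos.extend(hits[:k])       # top-k matches per recalled pair
    return demos

corpus = [{"head": "Einstein", "tail": "physics", "rel": "field_of_work",
           "text": "Einstein worked on physics."}]
print(retrieve_demos([("Einstein", "relativity")], corpus))
```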

AAAI 2024 · Conference Paper

Towards Continual Knowledge Graph Embedding via Incremental Distillation

  • Jiajun Liu
  • Wenjun Ke
  • Peng Wang
  • Ziyu Shang
  • Jinhua Gao
  • Guozheng Li
  • Ke Ji
  • Yanhe Liu

Traditional knowledge graph embedding (KGE) methods typically require preserving the entire knowledge graph (KG), with significant training costs when new knowledge emerges. To address this issue, the continual knowledge graph embedding (CKGE) task has been proposed to train the KGE model by learning emerging knowledge efficiently while preserving old knowledge effectively. However, the explicit graph structure in KGs, which is critical for this goal, has been heavily ignored by existing CKGE methods. On the one hand, existing methods usually learn new triples in a random order, destroying the inner structure of new KGs. On the other hand, old triples are preserved with equal priority, failing to alleviate catastrophic forgetting effectively. In this paper, we propose a competitive method for CKGE based on incremental distillation (IncDE), which makes full use of the explicit graph structure in KGs. First, to optimize the learning order, we introduce a hierarchical strategy, ranking new triples for layer-by-layer learning. By employing the inter- and intra-hierarchical orders together, new triples are grouped into layers based on graph structure features. Second, to preserve old knowledge effectively, we devise a novel incremental distillation mechanism, which facilitates the seamless transfer of entity representations from the previous layer to the next one, promoting old-knowledge preservation. Finally, we adopt a two-stage training paradigm to avoid the over-corruption of old knowledge by under-trained new knowledge. Experimental results demonstrate the superiority of IncDE over state-of-the-art baselines. Notably, the incremental distillation mechanism contributes improvements of 0.2%-6.5% in the mean reciprocal rank (MRR) score. Further exploratory experiments validate the effectiveness of IncDE in learning new knowledge proficiently while preserving old knowledge across all time steps.
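
The layer-by-layer ordering might be approximated by grouping new triples by their BFS distance from old-KG entities, as in the sketch below; the distillation between layers and the two-stage training are omitted, and the layering rule is an assumption.

```python
# Sketch of a layer-by-layer learning order: sort new triples by how
# structurally close they are to the old KG (illustrative; not IncDE).
from collections import deque

def layer_triples(new_triples, old_entities):
    # Build an undirected adjacency over the new triples.
    adj = {}
    for h, r, t in new_triples:
        adj.setdefault(h, []).append(t)
        adj.setdefault(t, []).append(h)
    # BFS distances from the old KG's entities.
    dist = {e: 0 for e in old_entities}
    frontier = deque(old_entities)
    while frontier:
        u = frontier.popleft()
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                frontier.append(v)
    # A triple's layer = min distance of its endpoints to the old KG.
    return sorted(new_triples,
                  key=lambda tr: min(dist.get(tr[0], 99), dist.get(tr[2], 99)))

triples = [("c", "r1", "d"), ("a", "r2", "c")]
print(layer_triples(triples, old_entities={"a"}))
```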

AAAI 2024 · Conference Paper

Unify Named Entity Recognition Scenarios via Contrastive Real-Time Updating Prototype

  • Yanhe Liu
  • Peng Wang
  • Wenjun Ke
  • Guozheng Li
  • Xiye Chen
  • Jiteng Zhao
  • Ziyu Shang

Supervised named entity recognition (NER) aims to classify entity mentions into a fixed number of pre-defined types. However, in real-world scenarios, unknown entity types are continually involved. Naive fine-tuning will result in catastrophic forgetting of old entity types. Existing continual methods usually depend on knowledge distillation to alleviate forgetting, which is less effective on long task sequences. Moreover, most of them are specific to the class-incremental scenario and cannot adapt to the online scenario, which is more common in practice. In this paper, we propose a unified framework called Contrastive Real-time Updating Prototype (CRUP) that can handle different scenarios for NER. Specifically, we train a Gaussian projection model with a regularized contrastive objective. After training on each batch, we store the mean vectors of representations belonging to new entity types as their prototypes. Meanwhile, we update existing prototypes belonging to old types based only on representations from the current batch. The final prototypes are used for nearest-class-mean classification. In this way, CRUP can handle different scenarios through its batch-wise learning. Moreover, CRUP can alleviate forgetting in continual scenarios with only current data instead of old data. To comprehensively evaluate CRUP, we construct extensive benchmarks based on various datasets. Experimental results show that CRUP significantly outperforms baselines in continual scenarios and is also competitive in the supervised scenario.
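
The real-time prototype update is essentially a running per-type mean over batch representations, followed by nearest-class-mean classification; the sketch below shows that bookkeeping and omits the regularized contrastive training.

```python
# Minimal sketch of real-time prototype updating and nearest-class-mean
# classification (toy shapes; the contrastive objective is omitted).
import torch

prototypes, counts = {}, {}

def update_prototypes(reps, labels):
    """Running mean per entity type, updated from the current batch only."""
    for z, y in zip(reps, labels):
        n = counts.get(y, 0)
        prototypes[y] = (prototypes.get(y, torch.zeros_like(z)) * n + z) / (n + 1)
        counts[y] = n + 1

def classify(z):
    # Assign to the nearest prototype (nearest class mean).
    return min(prototypes, key=lambda y: torch.dist(z, prototypes[y]).item())

update_prototypes(torch.randn(4, 8), ["PER", "LOC", "PER", "ORG"])
print(classify(torch.randn(8)))
```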

NeurIPS 2024 · Conference Paper

Unveiling LoRA Intrinsic Ranks via Salience Analysis

  • Wenjun Ke
  • Jiahao Wang
  • Peng Wang
  • Jiajun Liu
  • Dong Nie
  • Guozheng Li
  • Yining Li

The immense parameter scale of large language models underscores the necessity for parameter-efficient fine-tuning methods. Methods based on Low-Rank Adaptation (LoRA) assume the low-rank characteristics of the incremental matrix and optimize the matrix obtained from low-rank decomposition. Although effective, these methods are constrained by a fixed and unalterable intrinsic rank, neglecting the variable importance of matrices. Consequently, methods for adaptive rank allocation have been proposed, among which AdaLoRA demonstrates excellent fine-tuning performance. AdaLoRA conducts adaptation based on singular value decomposition (SVD), dynamically allocating intrinsic ranks according to importance. However, it still struggles to balance fine-tuning effectiveness and efficiency, leading to limited rank allocation space. Additionally, its importance measurement focuses only on parameters with minimal impact on the loss, neglecting the dominant role of singular values in SVD-based matrices and the fluctuations during training. To address these issues, we propose SalientLoRA, which adaptively optimizes the intrinsic ranks of LoRA via salience measurement. First, during rank allocation, the salience measurement analyzes the variation of singular value magnitudes across multiple time steps and establishes their inter-dependency relationships to assess matrix importance. This measurement mitigates the instability and randomness that may arise during importance assessment. Second, to balance fine-tuning performance and efficiency, we propose an adaptive time-series window adjustment, which controls the window size used for salience measurement and rank reduction during training, allowing for rapid rank allocation while maintaining training stability. This mechanism enables matrices to start from a higher initial rank, thus expanding the allocation space for ranks. To evaluate the generality of our method across various tasks, we conduct experiments on natural language understanding (NLU), natural language generation (NLG), and large model instruction tuning tasks. Experimental results demonstrate the superiority of SalientLoRA, which outperforms state-of-the-art methods by 0.96%-3.56% on multiple datasets. Furthermore, as the rank allocation space expands, our method ensures fine-tuning efficiency, achieving a 94.5% speed improvement compared to AdaLoRA. The code is publicly available at https://github.com/Heyest/SalientLoRA.
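
A toy reading of salience from singular-value trajectories: score each rank by the magnitude of its singular value over a time window, penalized by its fluctuation, then keep the top ranks. The exact scoring is an assumption, not the released SalientLoRA code.

```python
# Sketch of salience scoring over a window of singular-value snapshots.
import torch

def salience(sv_history):
    """sv_history: (window, rank) singular values across training steps.
    Prefer singular values that are both large and stable."""
    magnitude = sv_history.mean(dim=0)
    instability = sv_history.std(dim=0)
    return magnitude / (1.0 + instability)

def prune_ranks(sv_history, keep):
    scores = salience(sv_history)
    return torch.topk(scores, keep).indices   # indices of ranks to retain

history = torch.rand(10, 16)                  # 10-step window, 16 ranks (toy)
print(prune_ranks(history, keep=8))
```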

AAAI 2023 · Conference Paper

fmLRE: A Low-Resource Relation Extraction Model Based on Feature Mapping Similarity Calculation

  • Peng Wang
  • Tong Shao
  • Ke Ji
  • Guozheng Li
  • Wenjun Ke

Low-resource relation extraction (LRE) aims to extract relations from limited labeled corpora. Existing work takes advantage of self-training or distant supervision to expand the limited labeled data in data-driven approaches, while the selection bias of pseudo labels may cause error accumulation in subsequent relation classification. To address this issue, this paper proposes fmLRE, an iterative feedback method based on feature mapping similarity calculation to improve the accuracy of pseudo labels. First, it calculates the similarities between pseudo-label and real-label data of the same category in a feature mapping space, based on the semantic features of the labeled dataset after feature projection. Then, it fine-tunes the initial model according to an iterative reinforcement learning process. Finally, the similarity is used as a threshold for screening high-precision pseudo labels and as the basis for setting different rewards; it also acts as a penalty term in the loss function of the relation classifier. Experimental results demonstrate that fmLRE achieves state-of-the-art performance compared with strong baselines on two public datasets.
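
The screening step can be sketched as cosine similarity in a projected feature space against same-class real-label features, with the similarity doubling as a reward; the threshold and features below are hypothetical.

```python
# Sketch of similarity-thresholded pseudo-label screening (a simplified
# reading of fmLRE's feature-mapping similarity).
import torch
import torch.nn.functional as F

def screen_pseudo_labels(pseudo_feats, class_feats, labels, tau=0.8):
    """Keep a pseudo-labeled sample only if it is close to real-label
    features of the same class; the similarity also serves as a reward."""
    kept, rewards = [], []
    for i, y in enumerate(labels):
        sim = F.cosine_similarity(pseudo_feats[i], class_feats[y], dim=0)
        if sim >= tau:
            kept.append(i)
            rewards.append(sim.item())
    return kept, rewards

class_feats = {0: torch.randn(32), 1: torch.randn(32)}  # real-label centroids
kept, rewards = screen_pseudo_labels(torch.randn(5, 32), class_feats,
                                     [0, 1, 0, 1, 0], tau=0.2)
```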

AAAI 2023 · Conference Paper

Online Noisy Continual Relation Learning

  • Guozheng Li
  • Peng Wang
  • Qiqing Luo
  • Yanhe Liu
  • Wenjun Ke

Recent work on continual relation learning has achieved remarkable progress. However, most existing methods focus only on tackling catastrophic forgetting to improve performance in the existing setup, while continual relation learning in the real world must overcome many other challenges. One is that data may arrive in an online streaming fashion, with data distributions gradually changing and without distinct task boundaries. Another is that noisy labels are inevitable in the real world, as relation samples may be contaminated by label inconsistencies or labeled with distant supervision. In this work, we therefore propose a novel continual relation learning framework that simultaneously addresses both the online and noisy relation learning challenges. Our framework contains three key modules: (i) a sample-separated online purifying module that divides the online data stream into clean and noisy samples, (ii) a self-supervised online learning module that circumvents inferior training signals caused by noisy data, and (iii) a semi-supervised offline fine-tuning module that ensures the participation of both clean and noisy samples. Experimental results on FewRel, TACRED, and NYT-H with real-world noise demonstrate that our framework greatly outperforms combinations of state-of-the-art online continual learning and noisy label learning methods.
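
The purifying module is in the spirit of small-loss separation: on each streaming batch, treat low-loss samples as clean and the rest as noisy. The sketch below shows only that heuristic, a stand-in for the paper's sample-separated purifier.

```python
# Minimal sketch of loss-based clean/noisy separation on a streaming batch
# (small-loss heuristic; the clean fraction is an assumed hyperparameter).
import torch

def separate(losses, clean_fraction=0.7):
    k = max(1, int(len(losses) * clean_fraction))
    order = torch.argsort(losses)    # small-loss samples first
    return order[:k], order[k:]      # (clean indices, noisy indices)

losses = torch.tensor([0.2, 2.5, 0.4, 3.1, 0.1])
clean, noisy = separate(losses)
```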

IJCAI 2023 · Conference Paper

Towards Incremental NER Data Augmentation via Syntactic-aware Insertion Transformer

  • Wenjun Ke
  • Zongkai Tian
  • Qi Liu
  • Peng Wang
  • Jinhua Gao
  • Rui Qi

Named entity recognition (NER) aims to locate and classify named entities in natural language texts. Most existing high-performance NER models employ a supervised paradigm, which requires a large quantity of high-quality annotated data during training. To help NER models perform well in few-shot scenarios, data augmentation approaches attempt to build extra data by means of random editing or end-to-end generation with PLMs. However, these methods focus only on the fluency of generated sentences, ignoring the syntactic correlation between the new and raw sentences. This lack of correlation also brings low diversity and inconsistent labeling of synthetic samples. To fill this gap, we present SAINT (Syntactic-Aware InsertioN Transformer), a hard-constraint controlled text generation model that incorporates syntactic information. The proposed method operates by inserting new tokens between existing entities in a parallel manner. During the insertion procedure, new tokens are added taking both semantic and syntactic factors into account. Hence the resulting sentences retain syntactic correctness with respect to the raw data. Experimental results on two benchmark datasets, i.e., OntoNotes and WikiAnn, demonstrate the comparable performance of SAINT to the state-of-the-art baselines.
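
Parallel insertion between entity spans can be skeletonized as below; the `propose` scorer is a hypothetical stand-in for SAINT's semantic-plus-syntactic token proposer.

```python
# Toy skeleton of parallel token insertion between entity spans
# (illustrative control flow only; not the SAINT model).
def insert_between_entities(tokens, entity_spans, propose):
    out, last = [], 0
    for start, end in entity_spans:               # spans are (start, end) indices
        out.extend(tokens[last:start])
        out.extend(propose(tokens, last, start))  # new tokens before the entity
        out.extend(tokens[start:end])
        last = end
    out.extend(tokens[last:])
    return out

# Stand-in proposer: insert a filler token when there is room between spans.
propose = lambda toks, i, j: ["reportedly"] if j - i > 0 else []
print(insert_between_entities(["Obama", "visited", "Paris"],
                              [(0, 1), (2, 3)], propose))
```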