Author name cluster

Huajun Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

61 papers
2 author rows

Possible papers

AAAI Conference 2026 Conference Paper

Breaking the Modality Barrier: Generative Modeling for Accurate Molecule Retrieval from Mass Spectra

  • Yiwen Zhang
  • Keyan Ding
  • Yihang Wu
  • Xiang Zhuang
  • Yi Yang
  • Qiang Zhang
  • Huajun Chen

Retrieving molecular structures from tandem mass spectra is a crucial step in rapid compound identification. Existing retrieval methods, such as traditional mass spectral library matching, suffer from limited spectral library coverage, while recent cross-modal representation learning frameworks often encounter modality misalignment, resulting in suboptimal retrieval accuracy and generalization. To address these limitations, we propose GLMR, a Generative Language Model-based Retrieval framework that mitigates the cross-modal misalignment through a two-stage process. In the pre-retrieval stage, a contrastive learning-based model identifies top candidate molecules as contextual priors for the input mass spectrum. In the generative retrieval stage, these candidate molecules are integrated with the input mass spectrum to guide a generative model in producing refined molecular structures, which are then used to re-rank the candidates based on molecular similarity. Experiments on both MassSpecGym and the proposed MassRET-20k dataset demonstrate that GLMR significantly outperforms existing methods, achieving over 40% improvement in top-1 accuracy and exhibiting strong generalizability.
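
The re-ranking step described above can be sketched as follows. This is a minimal illustration, not the GLMR implementation: it assumes molecules are represented as sets of fingerprint bit indices and that "molecular similarity" means Tanimoto similarity, neither of which the abstract specifies. The names `tanimoto`, `rerank`, and the toy fingerprints are hypothetical.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two fingerprint bit sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rerank(candidates: dict, generated_fp: set) -> list:
    """Sort candidate names by similarity to the generated structure's fingerprint."""
    return sorted(
        candidates,
        key=lambda name: tanimoto(candidates[name], generated_fp),
        reverse=True,
    )


# Hypothetical toy data: each set holds the indices of the "on" fingerprint bits.
# In the abstract's terms, `candidates` would come from the pre-retrieval stage
# and `generated_fp` from the structure produced by the generative model.
candidates = {
    "mol_a": {1, 2, 3, 4},
    "mol_b": {2, 3, 9},
    "mol_c": {7, 8},
}
generated_fp = {2, 3, 4, 9}

ranking = rerank(candidates, generated_fp)
print(ranking)  # prints ['mol_b', 'mol_a', 'mol_c']
```

In practice the fingerprints would be computed by a cheminformatics toolkit; plain sets are used here only to keep the sketch self-contained.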

AAAI Conference 2026 Conference Paper

Template-Theorems Graph Construction to Enhance Mathematical Reasoning Capabilities of LLM

  • Yarong Lan
  • Yajing Xu
  • Huajun Chen

Large language models (LLMs) have made significant strides in mathematical reasoning, particularly at the elementary level. However, they continue to face substantial challenges when confronted with complex, advanced mathematical problems. In contrast to humans—who can effectively draw upon prior experiences in solving similar problems and retrieve relevant knowledge and theorems from memory—LLMs often struggle to accurately identify analogous problems and to recall or apply appropriate theorems. To overcome these limitations, we introduce a novel framework for constructing a template-theorems knowledge base, leveraging the capabilities of large language models. Inspired by the associative mechanisms of human cognition, our approach abstracts real-world problems into generalized templates and establishes intricate linkages between these templates and pertinent theorems. This design enables the efficient expansion of a comprehensive knowledge base, even when starting from a limited set of seed examples. Moreover, we develop an efficient retrieval strategy that, given a new problem, systematically extracts and presents the most relevant knowledge from the knowledge base as contextual input to the LLM. Extensive experiments on multiple public mathematical datasets and models demonstrate that our approach consistently surpasses conventional methods. Comprehensive ablation studies further corroborate the effectiveness of both our knowledge base construction and retrieval modules.

AAAI Conference 2026 Conference Paper

Thinker: Training LLMs in Hierarchical Thinking for Deep Search via Multi-Turn Interaction

  • Jun Xu
  • Xinkai Du
  • Yu Ao
  • Peilong Zhao
  • Yang Li
  • Ling Zhong
  • Lin Yuan
  • Zhongpu Bo

Efficient retrieval of external knowledge bases and web pages is crucial for enhancing the reasoning abilities of LLMs. Previous works on training LLMs to leverage external retrievers for solving complex problems have predominantly employed end-to-end reinforcement learning. However, these approaches neglect supervision over the reasoning process, making it difficult to guarantee logical coherence and rigor. To address these limitations, we propose Thinker, a hierarchical thinking model for deep search through multi-turn interaction, making the reasoning process supervisable and verifiable. It decomposes complex problems into independently solvable sub-problems, each dually represented in both natural language and an equivalent logical function to support knowledge base and web searches. Concurrently, dependencies between sub-problems are passed as parameters via these logical functions, enhancing the logical coherence of the problem-solving process. To avoid unnecessary external searches, we perform knowledge boundary determination to check if a sub-problem is within the LLM's intrinsic knowledge, allowing it to answer directly. Experimental results indicate that with as few as several hundred training samples, the performance of Thinker is competitive with established baselines. Furthermore, when scaled to the full training set, Thinker significantly outperforms these methods across various datasets and model sizes.

AAAI Conference 2026 Conference Paper

Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study

  • Yuqi Zhu
  • Yi Zhong
  • Jintian Zhang
  • Ziheng Zhang
  • Shuofei Qiao
  • Yujie Luo
  • Lun Du
  • Da Zheng

Large Language Models (LLMs) hold promise in automating data analysis tasks, yet open-source models face significant limitations in these kinds of reasoning-intensive scenarios. In this work, we investigate strategies to enhance the data analysis capabilities of open-source LLMs. By curating a seed dataset of diverse, realistic scenarios, we evaluate models across three dimensions: data understanding, code generation, and strategic planning. Our analysis reveals three key findings: (1) Strategic planning quality serves as the primary determinant of model performance; (2) Interaction design and task complexity significantly influence reasoning capabilities; (3) Data quality demonstrates a greater impact than diversity in achieving optimal performance. We leverage these insights to develop a data synthesis methodology, demonstrating significant improvements in open-source LLMs' analytical reasoning capabilities.

ICLR Conference 2025 Conference Paper

Benchmarking Agentic Workflow Generation

  • Shuofei Qiao
  • Runnan Fang
  • Zhisong Qiu
  • Xiaobin Wang
  • Ningyu Zhang 0001
  • Yong Jiang 0005
  • Pengjun Xie
  • Fei Huang 0002

Large Language Models (LLMs), with their exceptional ability to handle a wide range of tasks, have driven significant advancements in tackling reasoning and planning tasks, in which decomposing complex problems into executable workflows is a crucial step. Existing workflow evaluation frameworks either focus solely on holistic performance or suffer from limitations such as restricted scenario coverage, simplistic workflow structures, and lax evaluation standards. To this end, we introduce WorfBench, a unified workflow generation benchmark with multi-faceted scenarios and intricate graph workflow structures. Additionally, we present WorfEval, a systemic evaluation protocol utilizing subsequence and subgraph matching algorithms to accurately quantify the LLM agent's workflow generation capabilities. Through comprehensive evaluations across different types of LLMs, we discover distinct gaps between the sequence planning capabilities and graph planning capabilities of LLM agents, with even GPT-4 exhibiting a gap of around 15%. We also train two open-source models and evaluate their generalization abilities on held-out tasks. Furthermore, we observe that the generated workflows can enhance downstream tasks, enabling them to achieve superior performance with less time during inference. Code and dataset are available at https://github.com/zjunlp/WorfBench.
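
Subsequence matching of the kind mentioned above can be illustrated with a longest-common-subsequence (LCS) score. This is only a sketch under assumptions: the abstract does not say how WorfEval matches nodes or aggregates scores, so exact string equality between steps and an F1-style score are choices made here for illustration, and `lcs_length` and `subsequence_f1` are hypothetical names.

```python
def lcs_length(pred: list, gold: list) -> int:
    """Length of the longest common subsequence, by dynamic programming."""
    m, n = len(pred), len(gold)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if pred[i - 1] == gold[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[m][n]


def subsequence_f1(pred: list, gold: list) -> float:
    """F1 over steps that appear in the same relative order in both workflows."""
    if not pred or not gold:
        return 0.0
    matched = lcs_length(pred, gold)
    if matched == 0:
        return 0.0
    precision = matched / len(pred)
    recall = matched / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical workflows: steps are compared by exact string equality (an assumption).
gold = ["search", "filter", "summarize", "answer"]
pred = ["search", "summarize", "translate", "answer"]

score = subsequence_f1(pred, gold)
print(score)  # prints 0.75
```

Three of the four predicted steps occur in the gold order, so precision and recall are both 3/4. A graph-structured workflow would need subgraph matching instead, which this sketch does not cover.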

NeurIPS Conference 2025 Conference Paper

HiMoLE: Towards OOD-Robust LoRA via Hierarchical Mixture of Experts

  • Yinuo Jiang
  • Yan Xiaodong
  • Keyan Ding
  • Deng Zhao
  • Lei Liang
  • Qiang Zhang
  • Huajun Chen

Parameter-efficient fine-tuning (PEFT) methods, such as LoRA, have enabled the efficient adaptation of large language models (LLMs) by updating only a small subset of parameters. However, their robustness under out-of-distribution (OOD) conditions remains insufficiently studied. In this paper, we identify the limitations of conventional LoRA in handling distributional shifts and propose HiMoLE (Hierarchical Mixture of LoRA Experts), a new framework designed to improve OOD generalization. HiMoLE integrates hierarchical expert modules and hierarchical routing strategies into the LoRA architecture and introduces a two-phase training procedure enhanced by a diversity-driven loss. This design mitigates negative transfer and promotes effective knowledge adaptation across diverse data distributions. We evaluate HiMoLE on three representative tasks in natural language processing. Experimental results show that HiMoLE consistently outperforms existing LoRA-based approaches, significantly reducing performance degradation on OOD data while improving in-distribution performance. Our work bridges the gap between parameter efficiency and distributional robustness, advancing the practical deployment of LLMs in real-world applications.

AAAI Conference 2025 Conference Paper

K-ON: Stacking Knowledge on the Head Layer of Large Language Model

  • Lingbing Guo
  • Yichi Zhang
  • Zhongpu Bo
  • Zhuo Chen
  • Mengshu Sun
  • Zhiqiang Zhang
  • Wen Zhang
  • Huajun Chen

Recent advancements in large language models (LLMs) have significantly improved various natural language processing (NLP) tasks. Typically, LLMs are trained to predict the next token, aligning well with many NLP tasks. However, in knowledge graph (KG) scenarios, entities are the fundamental units and identifying an entity requires at least several tokens. This leads to a granularity mismatch between KGs and natural languages. To address this issue, we propose K-ON, which integrates KG knowledge into the LLM by employing multiple head layers for next k-step prediction. K-ON not only generates entity-level results in one step, but also enables a contrastive loss against entities, which is the most powerful tool in KG representation learning. Experimental results show that K-ON outperforms state-of-the-art methods that incorporate text and even other modalities.

ICLR Conference 2025 Conference Paper

MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation

  • Chenxi Wang
  • Xiang Chen 0016
  • Ningyu Zhang 0001
  • Bozhong Tian
  • Haoming Xu
  • Shumin Deng
  • Huajun Chen

Multimodal Large Language Models (MLLMs) frequently exhibit hallucination phenomena, but the underlying reasons remain poorly understood. In this paper, we present an empirical analysis and find that, although MLLMs incorrectly generate the objects in the final output, they are actually able to recognize visual objects in the preceding layers. We speculate that this may be due to the strong knowledge priors of the language model suppressing the visual information, leading to hallucinations. Motivated by this, we propose DeCo, a novel dynamic correction decoding method for MLLMs, which adaptively selects the appropriate preceding layers and proportionally integrates knowledge into the final layer to adjust the output logits. Note that DeCo is model-agnostic and can be seamlessly incorporated with various classic decoding strategies and applied to different MLLMs. We evaluate DeCo on widely used benchmarks, demonstrating that it can reduce hallucination rates by a large margin compared to baselines, highlighting its potential to mitigate hallucinations. Code is available at https://github.com/zjunlp/DeCo.

ICLR Conference 2025 Conference Paper

Multiple Heads are Better than One: Mixture of Modality Knowledge Experts for Entity Representation Learning

  • Yichi Zhang 0009
  • Zhuo Chen 0007
  • Lingbing Guo
  • Yajing Xu
  • Bin-Bin Hu
  • Ziqi Liu
  • Wen Zhang 0015
  • Huajun Chen

Learning high-quality multi-modal entity representations is an important goal of multi-modal knowledge graph (MMKG) representation learning, which can enhance reasoning tasks within the MMKGs, such as MMKG completion (MMKGC). The main challenge is to collaboratively model the structural information concealed in massive triples and the multi-modal features of the entities. Existing methods focus on crafting elegant entity-wise multi-modal fusion strategies, yet they overlook the utilization of multi-perspective features concealed within the modalities under diverse relational contexts. To address this issue, we introduce a novel framework with Mixture of Modality Knowledge experts (MOMOK for short) to learn adaptive multi-modal entity representations for better MMKGC. We design relation-guided modality knowledge experts to acquire relation-aware modality embeddings and integrate the predictions from multi-modalities to achieve joint decisions. Additionally, we disentangle the experts by minimizing their mutual information. Experiments on four public MMKG benchmarks demonstrate the outstanding performance of MOMOK under complex scenarios. Our code and data are available at https://github.com/zjukg/MoMoK.

NeurIPS Conference 2025 Conference Paper

ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning

  • Mingyang Chen
  • Linzhuang Sun
  • Tianpeng Li
  • Haoze Sun
  • Chenzheng Zhu
  • Haofen Wang
  • Jeff Pan
  • Wen Zhang

Large Language Models (LLMs) have shown remarkable capabilities in reasoning, exemplified by the success of OpenAI-o1 and DeepSeek-R1. However, integrating reasoning with external search processes remains challenging, especially for complex multi-hop questions requiring multiple retrieval steps. We propose ReSearch, a novel framework that trains LLMs to Reason with Search via reinforcement learning without using any supervised data on reasoning steps. Our approach treats search operations as integral components of the reasoning chain, where when and how to perform searches is guided by text-based thinking, and search results subsequently influence further reasoning. We train ReSearch on Qwen2.5-7B(-Instruct) and Qwen2.5-32B(-Instruct) models and conduct extensive experiments. Despite being trained on only one dataset, our models demonstrate strong generalizability across various benchmarks. Analysis reveals that ReSearch naturally elicits advanced reasoning capabilities such as reflection and self-correction during the reinforcement learning process.

ICLR Conference 2025 Conference Paper

SaMer: A Scenario-aware Multi-dimensional Evaluator for Large Language Models

  • Kehua Feng
  • Keyan Ding
  • Jing Yu
  • Yiwen Qu
  • Zhiwen Chen 0002
  • Chengfei Lv
  • Gang Yu
  • Qiang Zhang 0026

Evaluating the response quality of large language models (LLMs) for open-ended questions poses a significant challenge, especially given the subjectivity and multi-dimensionality of "quality" in natural language generation. Existing LLM evaluators often neglect that different scenarios require distinct evaluation criteria. In this work, we propose SaMer, a scenario-aware multi-dimensional evaluator designed to provide both overall and fine-grained assessments of LLM-generated responses. Unlike fixed-dimension evaluation approaches, SaMer adapts to different scenarios by automatically identifying and prioritizing relevant evaluation dimensions tailored to the given query. To achieve this, we construct a large-scale fine-grained preference dataset spanning multiple real-world scenarios, each with distinct evaluation dimensions. We then leverage a text embedding model combined with three specialized heads to predict the appropriate evaluation dimensions and corresponding scores, as well as the respective weights that contribute to the overall score. The resulting model offers fine-grained and interpretable evaluations and shows robust adaptability across diverse scenarios. Extensive experiments on eight single rating and pairwise comparison datasets demonstrate that SaMer outperforms existing baselines in a variety of evaluation tasks, showcasing its robustness, versatility, and generalizability.
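
The way per-dimension scores and weights might combine into an overall score can be sketched as a weighted average over the selected dimensions. This is an assumption made for illustration: the abstract says only that three heads predict dimensions, scores, and weights, and does not give the aggregation formula. The function name, dimension names, and values below are hypothetical.

```python
def overall_score(scores: dict, weights: dict, selected: set) -> float:
    """Weighted average of scores over the selected dimensions.

    Weights are renormalized over the selected dimensions so they sum to 1
    (an assumption of this sketch, not a documented detail of SaMer).
    """
    total_weight = sum(weights[d] for d in selected)
    if total_weight == 0:
        return 0.0
    return sum(scores[d] * weights[d] for d in selected) / total_weight


# Hypothetical outputs of the three heads for one query-response pair.
scores = {"accuracy": 4.0, "clarity": 3.0, "creativity": 1.0}
weights = {"accuracy": 0.6, "clarity": 0.2, "creativity": 0.2}
selected = {"accuracy", "clarity"}  # dimensions judged relevant to this scenario

result = overall_score(scores, weights, selected)
```

Here `result` is about 3.75: "creativity" is ignored because it was not selected, and the remaining weights (0.6 and 0.2) are rescaled before averaging.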

AAAI Conference 2025 Conference Paper

Tokenization, Fusion, and Augmentation: Towards Fine-grained Multi-modal Entity Representation

  • Yichi Zhang
  • Zhuo Chen
  • Lingbing Guo
  • Yajing Xu
  • Binbin Hu
  • Ziqi Liu
  • Wen Zhang
  • Huajun Chen

Multi-modal knowledge graph completion (MMKGC) aims to discover unobserved knowledge from given multi-modal knowledge graphs (MMKG), collaboratively leveraging structural information from the triples and multi-modal information of the entities to overcome the inherent incompleteness. Existing MMKGC methods usually extract multi-modal features with pre-trained models and employ fusion modules to integrate multi-modal features for the entities. This often results in coarse handling of multi-modal entity information, overlooking the nuanced, fine-grained semantic details and their complex interactions. To tackle this shortfall, we introduce a novel framework MyGO to tokenize, fuse, and augment the fine-grained multi-modal representations of entities and enhance the MMKGC performance. Motivated by the tokenization technology, MyGO tokenizes multi-modal entity information as fine-grained discrete tokens and learns entity representations with a cross-modal entity encoder. To further augment the multi-modal representations, MyGO incorporates fine-grained contrastive learning to highlight the specificity of the entity representations. Experiments on standard MMKGC benchmarks reveal that our method surpasses 19 of the latest models, underlining its superior performance.

AAAI Conference 2025 Conference Paper

TrustUQA: A Trustful Framework for Unified Structured Data Question Answering

  • Wen Zhang
  • Long Jin
  • Yushan Zhu
  • Jiaoyan Chen
  • Zhiwei Huang
  • Junjie Wang
  • Yin Hua
  • Lei Liang

Natural language question answering (QA) over structured data sources such as tables and knowledge graphs has been widely investigated, especially with Large Language Models (LLMs) in recent years. The main solutions include question to formal query parsing and retrieval-based answer generation. However, current methods of the former often suffer from weak generalization, failing to deal with multiple types of sources, while the latter is limited in trustfulness. In this paper, we propose TrustUQA, a trustful QA framework that can simultaneously support multiple types of structured data in a unified way. To this end, it adopts an LLM-friendly and unified knowledge representation method called Condition Graph (CG), and uses an LLM and demonstration-based two-level method for CG querying. For enhancement, it is also equipped with dynamic demonstration retrieval. We have evaluated TrustUQA with 5 benchmarks covering 3 types of structured data. It outperforms 2 existing unified structured data QA methods. In comparison with the baselines that are specific to one data type, it achieves state-of-the-art on 2 of the datasets. Furthermore, we have demonstrated the potential of our method for more general QA tasks, QA over mixed structured data and QA across structured data.

NeurIPS Conference 2024 Conference Paper

Agent Planning with World Knowledge Model

  • Shuofei Qiao
  • Runnan Fang
  • Ningyu Zhang
  • Yuqi Zhu
  • Xiang Chen
  • Shumin Deng
  • Yong Jiang
  • Pengjun Xie

Recent endeavors towards directly using large language models (LLMs) as agent models to execute interactive planning tasks have shown commendable results. Despite their achievements, however, they still struggle with brainless trial-and-error in global planning and generating hallucinatory actions in local planning due to their poor understanding of the "real" physical world. Imitating humans' mental world knowledge model, which provides global prior knowledge before the task and maintains local dynamic knowledge during the task, in this paper we introduce a parametric World Knowledge Model (WKM) to facilitate agent planning. Concretely, we steer the agent model to self-synthesize knowledge from both expert and sampled trajectories. Then we develop WKM, providing prior task knowledge to guide the global planning and dynamic state knowledge to assist the local planning. Experimental results on three real-world simulated datasets with Mistral-7B, Gemma-7B, and Llama-3-8B demonstrate that our method can achieve superior performance compared to various strong baselines. In addition, our analysis illustrates that WKM can effectively alleviate the blind trial-and-error and hallucinatory action issues, providing strong support for the agent's understanding of the world. Other interesting findings include: 1) our instance-level task knowledge can generalize better to unseen tasks, 2) weak WKM can guide strong agent model planning, and 3) unified WKM training has promising potential for further development.

IJCAI Conference 2024 Conference Paper

Continual Multimodal Knowledge Graph Construction

  • Xiang Chen
  • Jingtian Zhang
  • Xiaohan Wang
  • Ningyu Zhang
  • Tongtong Wu
  • Yuxiang Wang
  • Yongheng Wang
  • Huajun Chen

Current Multimodal Knowledge Graph Construction (MKGC) models struggle with the real-world dynamism of continuously emerging entities and relations, often succumbing to catastrophic forgetting—loss of previously acquired knowledge. This study introduces benchmarks aimed at fostering the development of the continual MKGC domain. We further introduce the MSPT framework, designed to surmount the shortcomings of existing MKGC approaches during multimedia data processing. MSPT harmonizes the retention of learned knowledge (stability) and the integration of new data (plasticity), outperforming current continual learning and multimodal methods. Our results confirm MSPT's superior performance in evolving knowledge environments, showcasing its capacity to navigate the balance between stability and plasticity.

NeurIPS Conference 2024 Conference Paper

DePLM: Denoising Protein Language Models for Property Optimization

  • Zeyuan Wang
  • Keyan Ding
  • Ming Qin
  • Xiaotong Li
  • Xiang Zhuang
  • Yu Zhao
  • Jianhua Yao
  • Qiang Zhang

Protein optimization is a fundamental biological task aimed at enhancing the performance of proteins by modifying their sequences. Computational methods primarily rely on evolutionary information (EI) encoded by protein language models (PLMs) to predict fitness landscape for optimization. However, these methods suffer from a few limitations. (1) Evolutionary processes involve the simultaneous consideration of multiple functional properties, often overshadowing the specific property of interest. (2) Measurements of these properties tend to be tailored to experimental conditions, leading to reduced generalizability of trained models to novel proteins. To address these limitations, we introduce Denoising Protein Language Models (DePLM), a novel approach that refines the evolutionary information embodied in PLMs for improved protein optimization. Specifically, we conceptualize EI as comprising both property-relevant and irrelevant information, with the latter acting as “noise” for the optimization task at hand. Our approach involves denoising this EI in PLMs through a diffusion process conducted in the rank space of property values, thereby enhancing model generalization and ensuring dataset-agnostic learning. Extensive experimental results have demonstrated that DePLM not only surpasses the state-of-the-art in mutation effect prediction but also exhibits strong generalization capabilities for novel proteins.

ICLR Conference 2024 Conference Paper

Domain-Agnostic Molecular Generation with Chemical Feedback

  • Yin Fang
  • Ningyu Zhang 0001
  • Zhuo Chen 0007
  • Lingbing Guo
  • Xiaohui Fan
  • Huajun Chen

The generation of molecules with desired properties has become increasingly popular, revolutionizing the way scientists design molecular structures and providing valuable support for chemical and drug design. However, despite the potential of language models in molecule generation, they face challenges such as generating syntactically or chemically flawed molecules, having narrow domain focus, and struggling to create diverse and feasible molecules due to limited annotated data or external molecular databases. To tackle these challenges, we introduce MolGen, a pre-trained molecular language model tailored specifically for molecule generation. Through the reconstruction of over 100 million molecular SELFIES, MolGen internalizes structural and grammatical insights. This is further enhanced by domain-agnostic molecular prefix tuning, fostering robust knowledge transfer across diverse domains. Importantly, our chemical feedback paradigm steers the model away from "molecular hallucinations", ensuring alignment between the model's estimated probabilities and real-world chemical preferences. Extensive experiments on well-known benchmarks underscore MolGen's optimization capabilities in properties such as penalized logP, QED, and molecular docking. Additional analyses confirm its proficiency in accurately capturing molecule distributions, discerning intricate structural patterns, and efficiently exploring the chemical space (https://github.com/zjunlp/MolGen).

AAAI Conference 2024 Conference Paper

Editing Language Model-Based Knowledge Graph Embeddings

  • Siyuan Cheng
  • Ningyu Zhang
  • Bozhong Tian
  • Xi Chen
  • Qingbin Liu
  • Huajun Chen

Recent decades have witnessed the empirical success of framing Knowledge Graph (KG) embeddings via language models. However, language model-based KG embeddings are usually deployed as static artifacts, making them difficult to modify after deployment without re-training. To address this issue, we propose a new task of editing language model-based KG embeddings in this paper. This task is designed to facilitate rapid, data-efficient updates to KG embeddings without compromising the performance of other aspects. We build four new datasets: E-FB15k237, A-FB15k237, E-WN18RR, and A-WN18RR, and evaluate several knowledge editing baselines, demonstrating the limited ability of previous models to handle the proposed challenging task. We further propose a simple yet strong baseline dubbed KGEditor, which utilizes additional parametric layers of the hypernetwork to edit/add facts. Our comprehensive experimental results reveal that KGEditor excels in updating specific facts without impacting the overall performance, even when faced with limited training resources. Code and datasets will be available at https://github.com/AnonymousForPapers/DeltaKG.

IJCAI Conference 2024 Conference Paper

FactCHD: Benchmarking Fact-Conflicting Hallucination Detection

  • Xiang Chen
  • Duanzheng Song
  • Honghao Gui
  • Chenxi Wang
  • Ningyu Zhang
  • Yong Jiang
  • Fei Huang
  • Chengfei Lyu

Despite their impressive generative capabilities, LLMs are hindered by fact-conflicting hallucinations in real-world applications. The accurate identification of hallucinations in texts generated by LLMs, especially in complex inferential scenarios, is a relatively unexplored area. To address this gap, we present FactCHD, a dedicated benchmark designed for the detection of fact-conflicting hallucinations from LLMs. FactCHD features a diverse dataset that spans various factuality patterns, including vanilla, multi-hop, comparison, and set operation. A distinctive element of FactCHD is its integration of fact-based evidence chains, significantly enhancing the depth of evaluating the detectors' explanations. Experiments on different LLMs expose the shortcomings of current approaches in detecting factual errors accurately. Furthermore, we introduce TRUTH-TRIANGULATOR which synthesizes reflective considerations by tool-enhanced ChatGPT and LoRA-tuning based on Llama2, aiming to yield more credible detection through the amalgamation of predictive results and evidence.

IJCAI Conference 2024 Conference Paper

InstructEdit: Instruction-Based Knowledge Editing for Large Language Models

  • Ningyu Zhang
  • Bozhong Tian
  • Siyuan Cheng
  • Xiaozhuan Liang
  • Yi Hu
  • Kouying Xue
  • Yanjie Gou
  • Xi Chen

Knowledge editing for large language models can offer an efficient solution to alter a model’s behavior without negatively impacting the overall performance. However, the current approaches encounter issues with limited generalizability across tasks, necessitating one distinct editor for each task, significantly hindering the broader applications. To address this, we take the first step to analyze the multi-task generalization issue in knowledge editing. Specifically, we develop an instruction-based editing technique, termed InstructEdit, which facilitates the editor's adaptation to various task performances simultaneously using simple instructions. With only one unified editor for each LLM, we empirically demonstrate that InstructEdit can improve the editor's control, leading to an average 14.86% increase in Reliability in the multi-task editing setting. Furthermore, experiments involving a held-out unseen task illustrate that InstructEdit consistently surpasses previous strong baselines. To further investigate the underlying mechanisms of instruction-based knowledge editing, we analyze the principal components of the editing gradient directions, which reveals that instructions can help control optimization direction with stronger OOD generalization.

NeurIPS Conference 2024 Conference Paper

Knowledge Circuits in Pretrained Transformers

  • Yunzhi Yao
  • Ningyu Zhang
  • Zekun Xi
  • Mengru Wang
  • Ziwen Xu
  • Shumin Deng
  • Huajun Chen

The remarkable capabilities of modern large language models are rooted in their vast repositories of knowledge encoded within their parameters, enabling them to perceive the world and engage in reasoning. The inner workings of how these models store knowledge have long been a subject of intense interest and investigation among researchers. To date, most studies have concentrated on isolated components within these models, such as the Multilayer Perceptrons and attention heads. In this paper, we delve into the computation graph of the language model to uncover the knowledge circuits that are instrumental in articulating specific knowledge. The experiments, conducted with GPT2 and TinyLLAMA, have allowed us to observe how certain information heads, relation heads, and Multilayer Perceptrons collaboratively encode knowledge within the model. Moreover, we evaluate the impact of current knowledge editing techniques on these knowledge circuits, providing deeper insights into the functioning and constraints of these editing methodologies. Finally, we utilize knowledge circuits to analyze and interpret language model behaviors such as hallucinations and in-context learning. We believe the knowledge circuit holds potential for advancing our understanding of Transformers and guiding the improved design of knowledge editing.

ICML Conference 2024 Conference Paper

Knowledge-aware Reinforced Language Models for Protein Directed Evolution

  • Yuhao Wang
  • Qiang Zhang 0026
  • Ming Qin
  • Xiang Zhuang
  • Xiaotong Li
  • Zhichen Gong
  • Zeyuan Wang
  • Yu Zhao 0009

Directed evolution, a cornerstone of protein optimization, harnesses natural mutational processes to enhance protein functionality. Existing Machine Learning-assisted Directed Evolution (MLDE) methodologies typically rely on data-driven strategies and often overlook the profound domain knowledge in biochemical fields. In this paper, we introduce a novel Knowledge-aware Reinforced Language Model (KnowRLM) for MLDE. An Amino Acid Knowledge Graph (AAKG) is constructed to represent the intricate biochemical relationships among amino acids. We further propose a Protein Language Model (PLM)-based policy network that iteratively samples mutants through preferential random walks on the AAKG using a dynamic sliding window mechanism. The novel mutants are actively sampled to fine-tune a fitness predictor as the reward model, providing feedback to the knowledge-aware policy. Finally, we optimize the whole system in an active learning approach that mimics biological settings in practice. KnowRLM stands out for its ability to utilize contextual amino acid information from knowledge graphs, thus attaining advantages from both statistical patterns of protein sequences and biochemical properties of amino acids. Extensive experiments demonstrate the superior performance of KnowRLM in more efficiently identifying high-fitness mutants compared to existing methods.

NeurIPS Conference 2024 Conference Paper

MKGL: Mastery of a Three-Word Language

  • Lingbing Guo
  • Zhongpu Bo
  • Zhuo Chen
  • Yichi Zhang
  • Jiaoyan Chen
  • Yarong Lan
  • Mengshu Sun
  • Zhiqiang Zhang

Large language models (LLMs) have significantly advanced performance across a spectrum of natural language processing (NLP) tasks. Yet, their application to knowledge graphs (KGs), which describe facts in the form of triplets and allow minimal hallucinations, remains an underexplored frontier. In this paper, we investigate the integration of LLMs with KGs by introducing a specialized KG Language (KGL), where a sentence consists of exactly three words: an entity noun, a relation verb, and another entity noun. Despite KGL's unfamiliar vocabulary to the LLM, we facilitate its learning through a tailored dictionary and illustrative sentences, and enhance context understanding via real-time KG context retrieval and KGL token embedding augmentation. Our results reveal that LLMs can achieve fluency in KGL, drastically reducing errors compared to conventional KG embedding methods on KG completion. Furthermore, our enhanced LLM shows exceptional competence in generating accurate three-word sentences from an initial entity and interpreting new unseen terms out of KGs.

ICLR Conference 2024 Conference Paper

Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models

  • Yin Fang
  • Xiaozhuan Liang
  • Ningyu Zhang 0001
  • Kangwei Liu 0002
  • Rui Huang
  • Zhuo Chen 0007
  • Xiaohui Fan
  • Huajun Chen

Large Language Models (LLMs), with their remarkable task-handling capabilities and innovative outputs, have catalyzed significant advancements across a spectrum of fields. However, their proficiency within specialized domains such as biomolecular studies remains limited. To address this challenge, we introduce Mol-Instructions, a comprehensive instruction dataset designed for the biomolecular domain. Mol-Instructions encompasses three key components: molecule-oriented instructions, protein-oriented instructions, and biomolecular text instructions. Each component aims to improve the understanding and prediction capabilities of LLMs concerning biomolecular features and behaviors. Through extensive instruction tuning experiments on LLMs, we demonstrate the effectiveness of Mol-Instructions in enhancing large models' performance in the intricate realm of biomolecular studies, thus fostering progress in the biomolecular research community. Mol-Instructions is publicly available for ongoing research and will undergo regular updates to enhance its applicability (https://github.com/zjunlp/Mol-Instructions).

ICLR Conference 2024 Conference Paper

Revisit and Outstrip Entity Alignment: A Perspective of Generative Models

  • Lingbing Guo
  • Zhuo Chen 0007
  • Jiaoyan Chen 0001
  • Yin Fang
  • Wen Zhang 0015
  • Huajun Chen

Recent embedding-based methods have achieved great success in exploiting entity alignment from knowledge graph (KG) embeddings of multiple modalities. In this paper, we study embedding-based entity alignment (EEA) from a perspective of generative models. We show that EEA shares similarities with typical generative models and prove the effectiveness of the recently developed generative adversarial network (GAN)-based EEA methods theoretically. We then reveal that their incomplete objective limits their capacity for both entity alignment and entity synthesis (i.e., generating new entities). We mitigate this problem by introducing a generative EEA (GEEA) framework with the proposed mutual variational autoencoder (M-VAE) as the generative model. M-VAE enables entity conversion between KGs and generation of new entities from random noise vectors. We demonstrate the power of GEEA with theoretical analysis and empirical experiments on both entity alignment and entity synthesis tasks. The source code and datasets are available at github.com/zjukg/GEEA.

ICLR Conference 2024 Conference Paper

Unveiling the Pitfalls of Knowledge Editing for Large Language Models

  • Zhoubo Li
  • Ningyu Zhang 0001
  • Yunzhi Yao
  • Mengru Wang
  • Xi Chen 0003
  • Huajun Chen

As the cost associated with fine-tuning Large Language Models (LLMs) continues to rise, recent research efforts have pivoted towards developing methodologies to edit implicit knowledge embedded within LLMs. Yet a dark cloud still lingers overhead: will knowledge editing trigger a butterfly effect? It remains unclear whether knowledge editing might introduce side effects that pose potential risks. This paper pioneers the investigation into the potential pitfalls associated with knowledge editing for LLMs. To achieve this, we introduce new benchmark datasets and propose innovative evaluation metrics. Our results underline two pivotal concerns: (1) Knowledge Conflict: Editing groups of facts that logically clash can magnify the inherent inconsistencies in LLMs, a facet neglected by previous methods. (2) Knowledge Distortion: Altering parameters with the aim of editing factual knowledge can irrevocably warp the innate knowledge structure of LLMs. Experimental results vividly demonstrate that knowledge editing might inadvertently cast a shadow of unintended consequences on LLMs, which warrants attention and efforts in future work. Code and data are available at https://github.com/zjunlp/PitfallsKnowledgeEditing.

AAAI Conference 2024 Conference Paper

When Do Program-of-Thought Works for Reasoning?

  • Zhen Bi
  • Ningyu Zhang
  • Yinuo Jiang
  • Shumin Deng
  • Guozhou Zheng
  • Huajun Chen

In the realm of embodied artificial intelligence, the reasoning capabilities of Large Language Models (LLMs) play a pivotal role. Although there are effective methods like program-of-thought prompting for LLMs, which uses a programming language to tackle complex reasoning tasks, the specific impact of code data on the improvement of reasoning capabilities remains under-explored. To address this gap, we propose the complexity-impacted reasoning score (CIRS), which combines structural and logical attributes, to measure the correlation between code and reasoning abilities. Specifically, we use the abstract syntax tree to encode the structural information and calculate logical complexity by considering the difficulty and the cyclomatic complexity. Through an empirical analysis, we find that not all code data, regardless of its complexity, can be learned or understood by LLMs. An optimal level of complexity is critical to the improvement of reasoning abilities by program-aided prompting. We then design an auto-synthesizing and stratifying algorithm, and apply it to instruction generation for mathematical reasoning and code data filtering for code generation tasks. Extensive results demonstrate the effectiveness of our proposed approach.
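The abstract above describes measuring code complexity from the abstract syntax tree and from cyclomatic complexity. As a rough illustration only, the sketch below gathers a few such statistics with Python's standard `ast` module. It is not the CIRS formula from the paper: the function names `ast_depth` and `structural_stats`, the choice of branch node types, and the "decision points plus one" approximation of cyclomatic complexity are all assumptions made for this example.

```python
import ast

# Node types treated as decision points (an assumption for this sketch;
# each BoolOp counts once, which only approximates cyclomatic complexity).
_BRANCH_NODES = (
    ast.If, ast.For, ast.While, ast.IfExp,
    ast.ExceptHandler, ast.BoolOp, ast.comprehension,
)


def ast_depth(node):
    """Return the height of the AST rooted at `node` (a leaf has depth 1)."""
    children = list(ast.iter_child_nodes(node))
    if not children:
        return 1
    return 1 + max(ast_depth(child) for child in children)


def structural_stats(source):
    """Collect simple structural and logical statistics for a code snippet."""
    tree = ast.parse(source)
    nodes = list(ast.walk(tree))
    decisions = sum(isinstance(n, _BRANCH_NODES) for n in nodes)
    return {
        "node_count": len(nodes),     # structural size
        "depth": ast_depth(tree),     # structural nesting
        "cyclomatic": decisions + 1,  # approximate logical complexity
    }


SIMPLE = "x = 1\ny = x + 2\n"
BRANCHY = (
    "def f(xs):\n"
    "    total = 0\n"
    "    for x in xs:\n"
    "        if x > 0:\n"
    "            total += x\n"
    "    return total\n"
)

simple_stats = structural_stats(SIMPLE)
branchy_stats = structural_stats(BRANCHY)
print(simple_stats)
print(branchy_stats)
```

Statistics of this kind could then be used to bucket code samples by complexity, which is the general idea behind stratifying data; how the paper actually combines them into a single score is not reproduced here.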

NeurIPS Conference 2024 Conference Paper

WISE: Rethinking the Knowledge Memory for Lifelong Model Editing of Large Language Models

  • Peng Wang
  • Zexi Li
  • Ningyu Zhang
  • Ziwen Xu
  • Yunzhi Yao
  • Yong Jiang
  • Pengjun Xie
  • Fei Huang

Large language models (LLMs) need knowledge updates to keep up with ever-growing world facts and to correct hallucinated responses, which motivates methods for lifelong model editing. Where the updated knowledge resides in memories is a fundamental question for model editing. In this paper, we find that editing either long-term memory (direct model parameters) or working memory (non-parametric knowledge of neural network activations/representations by retrieval) will result in an impossible triangle: reliability, generalization, and locality cannot be realized together in the lifelong editing settings. For long-term memory, directly editing the parameters will cause conflicts with irrelevant pretrained knowledge or previous edits (poor reliability and locality). For working memory, retrieval-based activations can hardly make the model understand the edits and generalize (poor generalization). Therefore, we propose WISE to bridge the gap between memories. In WISE, we design a dual parametric memory scheme, which consists of the main memory for the pretrained knowledge and a side memory for the edited knowledge. We only edit the knowledge in the side memory and train a router to decide which memory to go through when given a query. For continual editing, we devise a knowledge-sharding mechanism where different sets of edits reside in distinct subspaces of parameters, and are subsequently merged into a shared memory without conflicts. Extensive experiments show that WISE can outperform previous model editing methods and overcome the impossible triangle under lifelong model editing of question answering, hallucination, and out-of-distribution settings across trending LLM architectures, e.g., GPT, LLaMA, and Mistral.

ECAI Conference 2023 Conference Paper

Active Finetuning Protein Language Model: A Budget-Friendly Method for Directed Evolution

  • Ming Qin
  • Keyan Ding
  • Bin Wu 0025
  • Zhenping Li
  • Haihong Yang
  • Zeyuan Wang
  • Hongbin Ye
  • Haoran Yu

Directed evolution is a widely-used strategy of protein engineering to improve protein function via mimicking natural mutation and selection. Machine learning-assisted directed evolution (MLDE) approaches aim to learn a fitness predictor, thereby efficiently searching for optimal mutants within the vast combinatorial mutation space. Since annotating mutants is both costly and labor-intensive, how to efficiently sample and utilize informative protein mutants to train the predictor is a critical problem in MLDE. Previous MLDE works simply utilized pre-trained protein language models (PPLMs) for sampling without tailoring them to the specific target protein of interest, and so have not fully exploited the potential of PPLMs. In this work, we propose a novel method, the Actively-Finetuned Protein language model for Directed Evolution (AFP-DE), which leverages PPLMs to actively sample and fine-tune themselves, continuously improving the model’s sampling and overall performance through iterations, to achieve efficient directed protein evolution. Extensive experiments have shown the effectiveness of our method in generating optimal mutants with minimal annotation effort, outperforming previous works even with fewer annotated mutants, making it budget-friendly for biological experiments.

AAAI Conference 2023 Conference Paper

Analogical Inference Enhanced Knowledge Graph Embedding

  • Zhen Yao
  • Wen Zhang
  • Mingyang Chen
  • Yufeng Huang
  • Yi Yang
  • Huajun Chen

Knowledge graph embedding (KGE), which maps entities and relations in a knowledge graph into continuous vector spaces, has achieved great success in predicting missing links in knowledge graphs. However, knowledge graphs often contain incomplete triples that are difficult to inductively infer by KGEs. To address this challenge, we resort to analogical inference and propose a novel and general self-supervised framework AnKGE to enhance KGE models with analogical inference capability. We propose an analogical object retriever that retrieves appropriate analogical objects at the entity level, relation level, and triple level. In AnKGE, we train an analogy function for each level of analogical inference, which takes the original element embedding from a well-trained KGE model as input and outputs the analogical object embedding. In order to combine inductive inference capability from the original KGE model and analogical inference capability enhanced by AnKGE, we interpolate the analogy score with the base model score and introduce the adaptive weights in the score function for prediction. Through extensive experiments on FB15k-237 and WN18RR datasets, we show that AnKGE achieves competitive results on the link prediction task and performs analogical inference well.

AAAI Conference 2023 Conference Paper

DUET: Cross-Modal Semantic Grounding for Contrastive Zero-Shot Learning

  • Zhuo Chen
  • Yufeng Huang
  • Jiaoyan Chen
  • Yuxia Geng
  • Wen Zhang
  • Yin Fang
  • Jeff Z. Pan
  • Huajun Chen

Zero-shot learning (ZSL) aims to predict unseen classes whose samples have never appeared during training. Among the most effective and widely used forms of semantic information for zero-shot image classification are attributes, which are annotations for class-level visual characteristics. However, the current methods often fail to discriminate those subtle visual distinctions between images due to not only the shortage of fine-grained annotations, but also the attribute imbalance and co-occurrence. In this paper, we present a transformer-based end-to-end ZSL method named DUET, which integrates latent semantic knowledge from the pre-trained language models (PLMs) via a self-supervised multi-modal learning paradigm. Specifically, we (1) developed a cross-modal semantic grounding network to investigate the model's capability of disentangling semantic attributes from the images; (2) applied an attribute-level contrastive learning strategy to further enhance the model's discrimination on fine-grained visual characteristics against the attribute co-occurrence and imbalance; (3) proposed a multi-task learning policy for considering multi-model objectives. We find that our DUET can achieve state-of-the-art performance on three standard ZSL benchmarks and a knowledge graph equipped ZSL benchmark. Its components are effective and its predictions are interpretable.

AAAI Conference 2023 Conference Paper

Entity-Agnostic Representation Learning for Parameter-Efficient Knowledge Graph Embedding

  • Mingyang Chen
  • Wen Zhang
  • Zhen Yao
  • Yushan Zhu
  • Yang Gao
  • Jeff Z. Pan
  • Huajun Chen

We propose an entity-agnostic representation learning method for handling the problem of inefficient parameter storage costs brought by embedding knowledge graphs. Conventional knowledge graph embedding methods map elements in a knowledge graph, including entities and relations, into continuous vector spaces by assigning them one or multiple specific embeddings (i.e., vector representations). Thus the number of embedding parameters increases linearly with the growth of knowledge graphs. In our proposed model, Entity-Agnostic Representation Learning (EARL), we only learn the embeddings for a small set of entities and refer to them as reserved entities. To obtain the embeddings for the full set of entities, we encode their distinguishable information from their connected relations, k-nearest reserved entities, and multi-hop neighbors. We learn universal and entity-agnostic encoders for transforming distinguishable information into entity embeddings. This approach allows our proposed EARL to have a static, efficient, and lower parameter count than conventional knowledge graph embedding methods. Experimental results show that EARL uses fewer parameters and performs better on link prediction tasks than baselines, reflecting its parameter efficiency.

IJCAI Conference 2023 Conference Paper

Generalizing to Unseen Elements: A Survey on Knowledge Extrapolation for Knowledge Graphs

  • Mingyang Chen
  • Wen Zhang
  • Yuxia Geng
  • Zezhong Xu
  • Jeff Z. Pan
  • Huajun Chen

Knowledge graphs (KGs) have become valuable knowledge resources in various applications, and knowledge graph embedding (KGE) methods have garnered increasing attention in recent years. However, conventional KGE methods still face challenges when it comes to handling unseen entities or relations during model testing. To address this issue, much effort has been devoted to various fields of KGs. In this paper, we use a set of general terminologies to unify these methods and refer to them collectively as Knowledge Extrapolation. We comprehensively summarize these methods, classified by our proposed taxonomy, and describe their interrelationships. Additionally, we introduce benchmarks and provide comparisons of these methods based on aspects that are not captured by the taxonomy. Finally, we suggest potential directions for future research.

IJCAI Conference 2023 Conference Paper

Graph Sampling-based Meta-Learning for Molecular Property Prediction

  • Xiang Zhuang
  • Qiang Zhang
  • Bin Wu
  • Keyan Ding
  • Yin Fang
  • Huajun Chen

Molecular property is usually observed with a limited number of samples, and researchers have considered property prediction as a few-shot problem. One important fact that has been ignored by prior works is that each molecule can be recorded with several different properties simultaneously. To effectively utilize many-to-many correlations of molecules and properties, we propose a Graph Sampling-based Meta-learning (GS-Meta) framework for few-shot molecular property prediction. First, we construct a Molecule-Property relation Graph (MPG): molecules and properties are nodes, while property labels decide edges. Then, to utilize the topological information of MPG, we reformulate an episode in meta-learning as a subgraph of the MPG, containing a target property node, molecule nodes, and auxiliary property nodes. Third, as episodes in the form of subgraphs are no longer independent of each other, we propose to schedule the subgraph sampling process with a contrastive loss function, which considers the consistency and discrimination of subgraphs. Extensive experiments on 5 commonly-used benchmarks show GS-Meta consistently outperforms state-of-the-art methods by 5.71%-6.93% in ROC-AUC and verify the effectiveness of each proposed module. Our code is available at https://github.com/HICAI-ZJU/GS-Meta.

NeurIPS Conference 2023 Conference Paper

Learning Invariant Molecular Representation in Latent Discrete Space

  • Xiang Zhuang
  • Qiang Zhang
  • Keyan Ding
  • Yatao Bian
  • Xiao Wang
  • Jingsong Lv
  • Hongyang Chen
  • Huajun Chen

Molecular representation learning lays the foundation for drug discovery. However, existing methods suffer from poor out-of-distribution (OOD) generalization, particularly when data for training and testing originate from different environments. To address this issue, we propose a new framework for learning molecular representations that exhibit invariance and robustness against distribution shifts. Specifically, we propose a strategy called "first-encoding-then-separation" to identify invariant molecule features in the latent space, which deviates from conventional practices. Prior to the separation step, we introduce a residual vector quantization module that mitigates the over-fitting to training data distributions while preserving the expressivity of encoders. Furthermore, we design a task-agnostic self-supervised learning objective to encourage precise invariance identification, which makes our method widely applicable to a variety of tasks, such as regression and multi-label classification. Extensive experiments on 18 real-world molecular datasets demonstrate that our model achieves stronger generalization against state-of-the-art baselines in the presence of various distribution shifts. Our code is available at https://github.com/HICAI-ZJU/iMoLD.

ICLR Conference 2023 Conference Paper

Multi-level Protein Structure Pre-training via Prompt Learning

  • Zeyuan Wang
  • Qiang Zhang 0026
  • Shuangwei Hu
  • Haoran Yu
  • Xurui Jin
  • Zhichen Gong
  • Huajun Chen

A protein can focus on different structure levels to implement its functions. Each structure has its own merit and driving forces in describing some specific characteristics, and they cannot replace each other. Most existing function prediction methods take the tertiary structure as input, unintentionally ignoring the other levels of protein structures. Considering protein sequences can determine multi-level structures, in this paper, we aim to realize the comprehensive potential of protein sequences for function prediction. Specifically, we propose a new prompt-guided multi-task pre-training and fine-tuning framework, and the resulting protein model is called PromptProtein. Through the prompt-guided multi-task pre-training, we learn multiple prompt signals to steer the model to focus on different structure levels. We also design a prompt fine-tuning module to provide downstream tasks the on-demand flexibility of utilizing respective levels of structure information. Extensive experiments on function prediction and protein engineering show that PromptProtein outperforms state-of-the-art methods by large margins.

AAAI Conference 2023 Short Paper

Multi-Modal Protein Knowledge Graph Construction and Applications (Student Abstract)

  • Siyuan Cheng
  • Xiaozhuan Liang
  • Zhen Bi
  • Huajun Chen
  • Ningyu Zhang

Existing data-centric methods for protein science generally cannot sufficiently capture and leverage biology knowledge, which may be crucial for many protein tasks. To facilitate research in this field, we create ProteinKG65, a knowledge graph for protein science. Using gene ontology and Uniprot knowledge base as a basis, we transform and integrate various kinds of knowledge with aligned descriptions and protein sequences, respectively, to GO terms and protein entities. ProteinKG65 is mainly dedicated to providing a specialized protein knowledge graph, bringing the knowledge of Gene Ontology to protein function and structure prediction. We also illustrate the potential applications of ProteinKG65 with a prototype. Our dataset can be downloaded at https://w3id.org/proteinkg65.

ICLR Conference 2023 Conference Paper

Multimodal Analogical Reasoning over Knowledge Graphs

  • Ningyu Zhang 0001
  • Lei Li 0040
  • Xiang Chen 0016
  • Xiaozhuan Liang
  • Shumin Deng
  • Huajun Chen

Analogical reasoning is fundamental to human cognition and holds an important place in various fields. However, previous studies mainly focus on single-modal analogical reasoning and ignore taking advantage of structure knowledge. Notably, the research in cognitive psychology has demonstrated that information from multimodal sources always brings more powerful cognitive transfer than single modality sources. To this end, we introduce the new task of multimodal analogical reasoning over knowledge graphs, which requires multimodal reasoning ability with the help of background knowledge. Specifically, we construct a Multimodal Analogical Reasoning dataSet (MARS) and a multimodal knowledge graph MarKG. We evaluate with multimodal knowledge graph embedding and pre-trained Transformer baselines, illustrating the potential challenges of the proposed task. We further propose a novel model-agnostic Multimodal analogical reasoning framework with Transformer (MarT) motivated by the structure mapping theory, which can obtain better performance. We hope our work can deliver benefits and inspire future research. Code and datasets are available at https://github.com/zjunlp/MKG_Analogy.

NeurIPS Conference 2023 Conference Paper

Newton–Cotes Graph Neural Networks: On the Time Evolution of Dynamic Systems

  • Lingbing Guo
  • Weiqing Wang
  • Zhuo Chen
  • Ningyu Zhang
  • Zequn Sun
  • Yixuan Lai
  • Qiang Zhang
  • Huajun Chen

Reasoning about system dynamics is one of the most important analytical approaches for many scientific studies. With the initial state of a system as input, the recent graph neural networks (GNNs)-based methods are capable of predicting the future state distant in time with high accuracy. Although these methods have diverse designs in modeling the coordinates and interacting forces of the system, we show that they actually share a common paradigm that learns the integration of the velocity over the interval between the initial and terminal coordinates. However, their integrand is constant w.r.t. time. Inspired by this observation, we propose a new approach to predict the integration based on several velocity estimations with Newton–Cotes formulas and prove its effectiveness theoretically. Extensive experiments on several benchmarks empirically demonstrate consistent and significant improvement compared with the state-of-the-art methods.
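To make the contrast in the abstract above concrete, the sketch below compares a constant-integrand estimate of displacement with a three-point closed Newton–Cotes rule (Simpson's rule). It is only a numerical illustration under stated assumptions: the velocity is a hypothetical analytic function `velocity(t) = 1 + t^2` rather than a quantity predicted by a graph neural network, and the function names are invented for this example.

```python
def constant_velocity_displacement(velocity, t0, t1):
    """Integrate velocity over [t0, t1] assuming it stays at its initial value."""
    return velocity(t0) * (t1 - t0)


def simpson_displacement(velocity, t0, t1):
    """Three-point closed Newton-Cotes rule (Simpson's rule) on [t0, t1]."""
    mid = 0.5 * (t0 + t1)
    return (t1 - t0) / 6.0 * (velocity(t0) + 4.0 * velocity(mid) + velocity(t1))


def velocity(t):
    """Hypothetical time-varying velocity, used only for illustration."""
    return 1.0 + t * t


x0 = 0.0  # initial coordinate
x_const = x0 + constant_velocity_displacement(velocity, 0.0, 1.0)
x_simpson = x0 + simpson_displacement(velocity, 0.0, 1.0)
exact = x0 + 1.0 + 1.0 / 3.0  # closed-form integral of 1 + t^2 over [0, 1]

print(x_const, x_simpson, exact)
```

For this quadratic velocity the constant-integrand estimate misses the displacement by one third, while Simpson's rule is exact up to floating-point rounding, since it integrates polynomials up to degree three exactly.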

AAAI Conference 2023 Short Paper

On Analyzing the Role of Image for Visual-Enhanced Relation Extraction (Student Abstract)

  • Lei Li
  • Xiang Chen
  • Shuofei Qiao
  • Feiyu Xiong
  • Huajun Chen
  • Ningyu Zhang

Multimodal relation extraction is an essential task for knowledge graph construction. In this paper, we conduct an in-depth empirical analysis that indicates the inaccurate information in the visual scene graph leads to poor modal alignment weights, further degrading performance. Moreover, the visual shuffle experiments illustrate that the current approaches may not take full advantage of visual information. Based on the above observation, we further propose a strong baseline with an implicit fine-grained multimodal alignment based on Transformer for multimodal relation extraction. Experimental results demonstrate the better performance of our method. Code is available at https://github.com/zjunlp/DeepKE/tree/main/example/re/multimodal.

IJCAI Conference 2023 Conference Paper

One Model for All Domains: Collaborative Domain-Prefix Tuning for Cross-Domain NER

  • Xiang Chen
  • Lei Li
  • Shuofei Qiao
  • Ningyu Zhang
  • Chuanqi Tan
  • Yong Jiang
  • Fei Huang
  • Huajun Chen

Cross-domain NER is a challenging task to address the low-resource problem in practical scenarios. Previous typical solutions mainly obtain a NER model by pre-trained language models (PLMs) with data from a rich-resource domain and adapt it to the target domain. Owing to the mismatch issue among entity types in different domains, previous approaches normally tune all parameters of PLMs, ending up with an entirely new NER model for each domain. Moreover, current models only focus on leveraging knowledge in one general source domain while failing to successfully transfer knowledge from multiple sources to the target. To address these issues, we introduce Collaborative Domain-Prefix Tuning for cross-domain NER (CP-NER) based on text-to-text generative PLMs. Specifically, we present text-to-text generation grounding domain-related instructors to transfer knowledge to new domain NER tasks without structural modifications. We utilize frozen PLMs and conduct collaborative domain-prefix tuning to stimulate the potential of PLMs to handle NER tasks across various domains. Experimental results on the Cross-NER benchmark show that the proposed approach has flexible transfer ability and performs better on both one-source and multiple-source cross-domain NER tasks.

NeurIPS Conference 2022 Conference Paper

Decoupling Knowledge from Memorization: Retrieval-augmented Prompt Learning

  • Xiang Chen
  • Lei Li
  • Ningyu Zhang
  • Xiaozhuan Liang
  • Shumin Deng
  • Chuanqi Tan
  • Fei Huang
  • Luo Si

Prompt learning approaches have made waves in natural language processing by inducing better few-shot performance while they still follow a parametric-based learning paradigm; the oblivion and rote memorization problems in learning may encounter unstable generalization issues. Specifically, vanilla prompt learning may struggle to utilize atypical instances by rote during fully-supervised training or overfit shallow patterns with low-shot data. To alleviate such limitations, we develop RetroPrompt with the motivation of decoupling knowledge from memorization to help the model strike a balance between generalization and memorization. In contrast with vanilla prompt learning, RetroPrompt constructs an open-book knowledge-store from training instances and implements a retrieval mechanism during the process of input, training and inference, thus equipping the model with the ability to retrieve related contexts from the training corpus as cues for enhancement. Extensive experiments demonstrate that RetroPrompt can obtain better performance in both few-shot and zero-shot settings. Besides, we further illustrate that our proposed RetroPrompt can yield better generalization abilities with new datasets. Detailed analysis of memorization indeed reveals RetroPrompt can reduce the reliance of language models on memorization, thus improving generalization for downstream tasks. Code is available at https://github.com/zjunlp/PromptKG/tree/main/research/RetroPrompt.

ICLR Conference 2022 Conference Paper

Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners

  • Ningyu Zhang 0001
  • Luoqiu Li
  • Xiang Chen 0016
  • Shumin Deng
  • Zhen Bi
  • Chuanqi Tan
  • Fei Huang 0002
  • Huajun Chen

Large-scale pre-trained language models have contributed significantly to natural language processing by demonstrating remarkable abilities as few-shot learners. However, their effectiveness depends mainly on scaling the model parameters and prompt design, hindering their implementation in most real-world applications. This study proposes a novel pluggable, extensible, and efficient approach named DifferentiAble pRompT (DART), which can convert small language models into better few-shot learners. The main principle behind this approach involves reformulating potential natural language processing tasks into the task of a pre-trained language model and differentially optimizing the prompt template as well as the target label with backpropagation. Furthermore, the proposed approach can be: (i) Plugged to any pre-trained language models; (ii) Extended to widespread classification tasks. A comprehensive evaluation of standard NLP tasks demonstrates that the proposed approach achieves a better few-shot performance.

AAAI Conference 2022 Short Paper

Learning to Ask for Data-Efficient Event Argument Extraction (Student Abstract)

  • Hongbin Ye
  • Ningyu Zhang
  • Zhen Bi
  • Shumin Deng
  • Chuanqi Tan
  • Hui Chen
  • Fei Huang
  • Huajun Chen

Event argument extraction (EAE) is an important task for information extraction to discover specific argument roles. In this study, we cast EAE as a question-based cloze task and empirically analyze fixed discrete token template performance. As generating human-annotated question templates is often time-consuming and labor-intensive, we further propose a novel approach called “Learning to Ask,” which can learn optimized question templates for EAE without human annotations. Experiments using the ACE-2005 dataset demonstrate that our method based on optimized questions achieves state-of-the-art performance in both the few-shot and supervised settings.

IJCAI Conference 2022 Conference Paper

Meta-Learning Based Knowledge Extrapolation for Knowledge Graphs in the Federated Setting

  • Mingyang Chen
  • Wen Zhang
  • Zhen Yao
  • Xiangnan Chen
  • Mengxiao Ding
  • Fei Huang
  • Huajun Chen

We study the knowledge extrapolation problem to embed new components (i.e., entities and relations) that come with emerging knowledge graphs (KGs) in the federated setting. In this problem, a model trained on an existing KG needs to embed an emerging KG with unseen entities and relations. To solve this problem, we introduce the meta-learning setting, where a set of tasks are sampled on the existing KG to mimic the link prediction task on the emerging KG. Based on sampled tasks, we meta-train a graph neural network framework that can construct features for unseen components based on structural information and output embeddings for them. Experimental results show that our proposed method can effectively embed unseen components and outperforms models that consider inductive settings for KGs and baselines that directly use conventional KG embedding methods.

AAAI Conference 2022 Conference Paper

Molecular Contrastive Learning with Chemical Element Knowledge Graph

  • Yin Fang
  • Qiang Zhang
  • Haihong Yang
  • Xiang Zhuang
  • Shumin Deng
  • Wen Zhang
  • Ming Qin
  • Zhuo Chen

Molecular representation learning contributes to multiple downstream tasks such as molecular property prediction and drug design. To properly represent molecules, graph contrastive learning is a promising paradigm as it utilizes self-supervision signals and has no requirements for human annotations. However, prior works fail to incorporate fundamental domain knowledge into graph semantics and thus ignore the correlations between atoms that have common attributes but are not directly connected by bonds. To address these issues, we construct a Chemical Element Knowledge Graph (KG) to summarize microscopic associations between elements and propose a novel Knowledge-enhanced Contrastive Learning (KCL) framework for molecular representation learning. KCL framework consists of three modules. The first module, knowledge-guided graph augmentation, augments the original molecular graph based on the Chemical Element KG. The second module, knowledge-aware graph representation, extracts molecular representations with a common graph encoder for the original molecular graph and a Knowledge-aware Message Passing Neural Network (KMPNN) to encode complex information in the augmented molecular graph. The final module is a contrastive objective, where we maximize agreement between these two views of molecular graphs. Extensive experiments demonstrated that KCL obtained superior performances against state-of-the-art baselines on eight molecular datasets. Visualization experiments properly interpret what KCL has learned from atoms and attributes in the augmented molecular graphs.

NeurIPS Conference 2022 Conference Paper

Neural-Symbolic Entangled Framework for Complex Query Answering

  • Zezhong Xu
  • Wen Zhang
  • Peng Ye
  • Hui Chen
  • Huajun Chen

Answering complex queries over knowledge graphs (KG) is an important yet challenging task because of the KG incompleteness issue and cascading errors during reasoning. Recent query embedding (QE) approaches embed the entities and relations in a KG and the first-order logic (FOL) queries into a low dimensional space, so that the query can be answered by dense similarity searching. However, previous works mainly concentrate on the target answers, ignoring intermediate entities' usefulness, which is essential for relieving the cascading error problem in logical query answering. In addition, these methods are usually designed with their own geometric or distributional embeddings to handle logical operators like union, intersection, and negation, with the sacrifice of the accuracy of the basic operator -- projection, and they could not absorb other embedding methods to their models. In this work, we propose a Neural and Symbolic Entangled framework (ENeSy) for complex query answering, which enables the neural and symbolic reasoning to enhance each other to alleviate the cascading error and KG incompleteness. The projection operator in ENeSy could be any embedding method with the capability of link prediction, and the other FOL operators are handled without parameters. With both neural and symbolic reasoning results contained, ENeSy answers queries in ensembles. We evaluate ENeSy on complex query answering benchmarks, and ENeSy achieves the state-of-the-art, especially in the setting of training model only with the link prediction task.

ICLR Conference 2022 Conference Paper

OntoProtein: Protein Pretraining With Gene Ontology Embedding

  • Ningyu Zhang 0001
  • Zhen Bi
  • Xiaozhuan Liang
  • Siyuan Cheng 0008
  • Haosen Hong
  • Shumin Deng
  • Qiang Zhang 0026
  • Jiazhang Lian

Self-supervised protein language models have proved their effectiveness in learning protein representations. With the increasing computational power, current protein language models pre-trained with millions of diverse sequences can advance the parameter scale from million-level to billion-level and achieve remarkable improvement. However, those prevailing approaches rarely consider incorporating knowledge graphs (KGs), which can provide rich structured knowledge facts for better protein representations. We argue that informative biology knowledge in KGs can enhance protein representation with external knowledge. In this work, we propose OntoProtein, the first general framework that makes use of structure in GO (Gene Ontology) into protein pre-training models. We construct a novel large-scale knowledge graph that consists of GO and its related proteins, and gene annotation texts or protein sequences describe all nodes in the graph. We propose novel contrastive learning with knowledge-aware negative sampling to jointly optimize the knowledge graph and protein embedding during pre-training. Experimental results show that OntoProtein can surpass state-of-the-art methods with pre-trained protein language models in TAPE benchmark and yield better performance compared with baselines in protein-protein interaction and protein function prediction.

ICML Conference 2022 Conference Paper

Understanding and Improving Knowledge Graph Embedding for Entity Alignment

  • Lingbing Guo
  • Qiang Zhang 0026
  • Zequn Sun 0001
  • Mingyang Chen 0002
  • Wei Hu 0007
  • Huajun Chen

Embedding-based entity alignment (EEA) has recently received great attention. Despite significant performance improvement, few efforts have been paid to facilitate understanding of EEA methods. Most existing studies rest on the assumption that a small number of pre-aligned entities can serve as anchors connecting the embedding spaces of two KGs. Nevertheless, no one has investigated the rationality of such an assumption. To fill the research gap, we define a typical paradigm abstracted from existing EEA methods and analyze how the embedding discrepancy between two potentially aligned entities is implicitly bounded by a predefined margin in the score function. Further, we find that such a bound cannot guarantee to be tight enough for alignment learning. We mitigate this problem by proposing a new approach, named NeoEA, to explicitly learn KG-invariant and principled entity embeddings. In this sense, an EEA model not only pursues the closeness of aligned entities based on geometric distance, but also aligns the neural ontologies of two KGs by eliminating the discrepancy in embedding distribution and underlying ontology knowledge. Our experiments demonstrate consistent and significant performance improvement against the best-performing EEA methods.

AAAI Conference 2021 Conference Paper

Contrastive Triple Extraction with Generative Transformer

  • Hongbin Ye
  • Ningyu Zhang
  • Shumin Deng
  • Mosha Chen
  • Chuanqi Tan
  • Fei Huang
  • Huajun Chen

Triple extraction is an essential task in information extraction for natural language processing and knowledge graph construction. In this paper, we revisit the end-to-end triple extraction task for sequence generation. Since generative triple extraction may struggle to capture long-term dependencies and generate unfaithful triples, we introduce a novel model, contrastive triple extraction with a generative transformer. Specifically, we introduce a single shared transformer module for encoder-decoder-based generation. To generate faithful results, we propose a novel triplet contrastive training object. Moreover, we introduce two mechanisms to further improve model performance (i.e., batch-wise dynamic attention-masking and triple-wise calibration). Experimental results on three datasets (i.e., NYT, WebNLG, and MIE) show that our approach achieves better performance than that of baselines.

IJCAI Conference 2021 Conference Paper

Document-level Relation Extraction as Semantic Segmentation

  • Ningyu Zhang
  • Xiang Chen
  • Xin Xie
  • Shumin Deng
  • Chuanqi Tan
  • Mosha Chen
  • Fei Huang
  • Luo Si

Document-level relation extraction aims to extract relations among multiple entity pairs from a document. Previously proposed graph-based or transformer-based models utilize the entities independently, regardless of global information among relational triples. This paper approaches the problem by predicting an entity-level relation matrix to capture local and global information, parallel to the semantic segmentation task in computer vision. Herein, we propose a Document U-shaped Network for document-level relation extraction. Specifically, we leverage an encoder module to capture the context information of entities and a U-shaped segmentation module over the image-style feature map to capture global interdependency among triples. Experimental results show that our approach can obtain state-of-the-art performance on three benchmark datasets DocRED, CDR, and GDA.

IJCAI Conference 2021 Conference Paper

Drop Redundant, Shrink Irrelevant: Selective Knowledge Injection for Language Pretraining

  • Ningyu Zhang
  • Shumin Deng
  • Xu Cheng
  • Xi Chen
  • Yichi Zhang
  • Wei Zhang
  • Huajun Chen

Previous research has demonstrated the power of leveraging prior knowledge to improve the performance of deep models in natural language processing. However, traditional methods neglect the fact that redundant and irrelevant knowledge exists in external knowledge bases. In this study, we launched an in-depth empirical investigation into downstream tasks and found that knowledge-enhanced approaches do not always exhibit satisfactory improvements. To this end, we investigate the fundamental reasons for ineffective knowledge infusion and present selective injection for language pretraining, which constitutes a model-agnostic method and is readily pluggable into previous approaches. Experimental results on benchmark datasets demonstrate that our approach can enhance state-of-the-art knowledge injection methods.

IJCAI Conference 2021 Conference Paper

Knowledge-aware Zero-Shot Learning: Survey and Perspective

  • Jiaoyan Chen
  • Yuxia Geng
  • Zhuo Chen
  • Ian Horrocks
  • Jeff Z. Pan
  • Huajun Chen

Zero-shot learning (ZSL) which aims at predicting classes that have never appeared during the training using external knowledge (a.k.a. side information) has been widely investigated. In this paper we present a literature review towards ZSL in the perspective of external knowledge, where we categorize the external knowledge, review their methods and compare different external knowledge. With the literature review, we further discuss and outlook the role of symbolic knowledge in addressing ZSL and other machine learning sample shortage issues.

IJCAI Conference 2020 Conference Paper

Neural Entity Summarization with Joint Encoding and Weak Supervision

  • Junyou Li
  • Gong Cheng
  • Qingxia Liu
  • Wen Zhang
  • Evgeny Kharlamov
  • Kalpa Gunaratna
  • Huajun Chen

In a large-scale knowledge graph (KG), an entity is often described by a large number of triple-structured facts. Many applications require abridged versions of entity descriptions, called entity summaries. Existing solutions to entity summarization are mainly unsupervised. In this paper, we present a supervised approach NEST that is based on our novel neural model to jointly encode graph structure and text in KGs and generate high-quality diversified summaries. Since it is costly to obtain manually labeled summaries for training, our supervision is weak as we train with programmatically labeled data which may contain noise but is free of manual work. Evaluation results show that our approach significantly outperforms the state of the art on two public benchmarks.

KR Conference 2020 Conference Paper

Ontology-guided Semantic Composition for Zero-shot Learning

  • Jiaoyan Chen
  • Freddy Lécué
  • Yuxia Geng
  • Jeff Z. Pan
  • Huajun Chen

Zero-shot learning (ZSL) is a popular research problem that aims at predicting for those classes that have never appeared in the training stage by utilizing the inter-class relationship with some side information. In this study, we propose to model the compositional and expressive semantics of class labels by an OWL (Web Ontology Language) ontology, and further develop a new ZSL framework with ontology embedding. The effectiveness has been verified by some primary experiments on animal image classification and visual question answering.

AAAI Conference 2020 Short Paper

When Low Resource NLP Meets Unsupervised Language Model: Meta-Pretraining then Meta-Learning for Few-Shot Text Classification (Student Abstract)

  • Shumin Deng
  • Ningyu Zhang
  • Zhanlin Sun
  • Jiaoyan Chen
  • Huajun Chen

Text classification tends to be difficult when data are deficient or when it is required to adapt to unseen classes. In such challenging scenarios, recent studies have often used meta-learning to simulate the few-shot task, thus negating implicit common linguistic features across tasks. This paper addresses such problems using meta-learning and unsupervised language models. Our approach is based on the insight that having a good generalization from a few examples relies on both a generic model initialization and an effective strategy for adapting this model to newly arising tasks. We show that our approach is not only simple but also produces a state-of-the-art performance on a well-studied sentiment classification dataset. It can thus be further suggested that pretraining could be a promising solution for few-shot learning of many other NLP tasks. The code and the dataset to replicate the experiments are made available at https://github.com/zxlzr/FewShotNLP.

IJCAI Conference 2019 Conference Paper

Augmenting Transfer Learning with Semantic Reasoning

  • Freddy Lécué
  • Jiaoyan Chen
  • Jeff Z. Pan
  • Huajun Chen

Transfer learning aims at building robust prediction models by transferring knowledge gained from one problem to another. In the semantic Web, learning tasks are enhanced with semantic representations. We exploit their semantics to augment transfer learning by dealing with when to transfer with semantic measurements and what to transfer with semantic embeddings. We further present a general framework that integrates the above measurements and embeddings with existing transfer learning algorithms for higher performance. It has demonstrated to be robust in two real-world applications: bus delay forecasting and air quality forecasting.

IJCAI Conference 2017 Conference Paper

Learning from Ontology Streams with Semantic Concept Drift

  • Jiaoyan Chen
  • Freddy Lecue
  • Jeff Z. Pan
  • Huajun Chen

Data stream learning has been largely studied for extracting knowledge structures from continuous and rapid data records. In the semantic Web, data is interpreted in ontologies and its ordered sequence is represented as an ontology stream. Our work exploits the semantics of such streams to tackle the problem of concept drift, i.e., unexpected changes in data distribution, causing most of models to be less accurate as time passes. To this end we revisited (i) semantic inference in the context of supervised stream learning, and (ii) models with semantic embeddings. The experiments show accurate prediction with data from Dublin and Beijing.

AIIM Journal 2010 Journal Article

Semantic SenseLab: Implementing the vision of the Semantic Web in neuroscience

  • Matthias Samwald
  • Huajun Chen
  • Alan Ruttenberg
  • Ernest Lim
  • Luis Marenco
  • Perry Miller
  • Gordon Shepherd
  • Kei-Hoi Cheung

Objective: Integrative neuroscience research needs a scalable informatics framework that enables semantic integration of diverse types of neuroscience data. This paper describes the use of the Web Ontology Language (OWL) and other Semantic Web technologies for the representation and integration of molecular-level data provided by several of SenseLab suite of neuroscience databases. Methods: Based on the original database structure, we semi-automatically translated the databases into OWL ontologies with manual addition of semantic enrichment. The SenseLab ontologies are extensively linked to other biomedical Semantic Web resources, including the Subcellular Anatomy Ontology, Brain Architecture Management System, the Gene Ontology, BIRNLex and UniProt. The SenseLab ontologies have also been mapped to the Basic Formal Ontology and Relation Ontology, which helps ease interoperability with many other existing and future biomedical ontologies for the Semantic Web. In addition, approaches to representing contradictory research statements are described. The SenseLab ontologies are designed for use on the Semantic Web that enables their integration into a growing collection of biomedical information resources. Conclusion: We demonstrate that our approach can yield significant potential benefits and that the Semantic Web is rapidly becoming mature enough to realize its anticipated promises. The ontologies are available online at http://neuroweb.med.yale.edu/senselab/.

IS Journal 2005 Journal Article

DartGrid II: a semantic grid platform for ITS

  • Zhaohui Wu
  • Shuiguang Deng
  • Jian Wu
  • Huajun Chen
  • Shuming Tang
  • Haijun Gao

Intelligent transportation systems offer an alternative approach to solving many problems by implementing advances in information, Internet, communication, and cybernetics technologies. Grid computing can support traffic data semantization, resource sharing, ITS subsystem cooperation, and global-scale distributed computing that connects all kinds of resources. We are currently using grid technology to build DartGrid II, a semantic ITS platform to support resource sharing, service flow management, and cross-domain cooperation.