Arrow Research search

Author name cluster

Jing Tao

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

13 papers
2 author rows

Possible papers

13

AAAI Conference 2025 Conference Paper

Debate on Graph: A Flexible and Reliable Reasoning Framework for Large Language Models

  • Jie Ma
  • Zhitao Gao
  • Qi Chai
  • Wangchun Sun
  • Pinghui Wang
  • Hongbin Pei
  • Jing Tao
  • Lingyun Song

Large Language Models (LLMs) may suffer from hallucinations in real-world applications due to the lack of relevant knowledge. In contrast, knowledge graphs encompass extensive, multi-relational structures that store a vast array of symbolic facts. Consequently, integrating LLMs with knowledge graphs has been extensively explored, with Knowledge Graph Question Answering (KGQA) serving as a critical touchstone for the integration. This task requires LLMs to answer natural language questions by retrieving relevant triples from knowledge graphs. However, existing methods face two significant challenges: *excessively long reasoning paths distracting from the answer generation*, and *false-positive relations hindering the path refinement*. In this paper, we propose an iterative interactive KGQA framework that leverages the interactive learning capabilities of LLMs to perform reasoning and Debating over Graphs (DoG). Specifically, DoG employs a subgraph-focusing mechanism, allowing LLMs to perform answer trying after each reasoning step, thereby mitigating the impact of lengthy reasoning paths. On the other hand, DoG utilizes a multi-role debate team to gradually simplify complex questions, reducing the influence of false-positive relations. This debate mechanism ensures the reliability of the reasoning process. Experimental results on five public datasets demonstrate the effectiveness and superiority of our architecture. Notably, DoG outperforms the state-of-the-art method ToG by 23.7% and 9.1% in accuracy on WebQuestions and GrailQA, respectively. Furthermore, the integration experiments with various LLMs on the mentioned datasets highlight the flexibility of DoG.

NeurIPS Conference 2025 Conference Paper

Deliberation on Priors: Trustworthy Reasoning of Large Language Models on Knowledge Graphs

  • Jie Ma
  • NING QU
  • Zhitao Gao
  • Xing Rui
  • Jun Liu
  • Hongbin Pei
  • Jiang Xie
  • Lingyun Song

Knowledge graph-based retrieval-augmented generation seeks to mitigate hallucinations in Large Language Models (LLMs) caused by insufficient or outdated knowledge. However, existing methods often fail to fully exploit the prior knowledge embedded in knowledge graphs (KGs), particularly their structural information and explicit or implicit constraints. The former can enhance the faithfulness of LLMs' reasoning, while the latter can improve the reliability of response generations. Motivated by these, we propose a trustworthy reasoning framework, termed Deliberation over Priors (\texttt{DP}), which sufficiently utilizes the priors contained in KGs. Specifically, \texttt{DP} adopts a progressive knowledge distillation strategy that integrates structural priors into LLMs through a combination of supervised fine-tuning and Kahneman-Tversky Optimization, thereby improving the faithfulness of relation path generation. Furthermore, our framework employs a reasoning-introspection strategy, which guides LLMs to perform refined reasoning verification based on extracted constraint priors, ensuring the reliability of response generation. Extensive experiments on three benchmark datasets demonstrate that \texttt{DP} achieves new state-of-the-art performance, especially a H@1 improvement of 13% on the ComplexWebQuestions dataset, and generates highly trustworthy responses. We also conduct various analyses to verify its flexibility and practicality. Code is available at https: //github. com/mira-ai-lab/Deliberation-on-Priors.

ICML Conference 2025 Conference Paper

Non-Stationary Predictions May Be More Informative: Exploring Pseudo-Labels with a Two-Phase Pattern of Training Dynamics

  • Hongbin Pei
  • Jingxin Hai
  • Yu Li
  • Huiqi Deng
  • Denghao Ma
  • Jie Ma 0001
  • Pinghui Wang
  • Jing Tao

Pseudo-labeling is a widely used strategy in semi-supervised learning. Existing methods typically select predicted labels with high confidence scores and high training stationarity, as pseudo-labels to augment training sets. In contrast, this paper explores the pseudo-labeling potential of predicted labels that do not exhibit these characteristics. We discover a new type of predicted labels suitable for pseudo-labeling, termed two-phase labels, which exhibit a two-phase pattern during training: they are initially predicted as one category in early training stages and switch to another category in subsequent epochs. Case studies show the two-phase labels are informative for decision boundaries. To effectively identify the two-phase labels, we design a 2- phasic metric that mathematically characterizes their spatial and temporal patterns. Furthermore, we propose a loss function tailored for two-phase pseudo-labeling learning, allowing models not only to learn correct correlations but also to eliminate false ones. Extensive experiments on eight datasets show that our proposed 2- phasic metric acts as a powerful booster for existing pseudo-labeling methods by additionally incorporating the two-phase labels, achieving an average classification accuracy gain of 1. 73% on image datasets and 1. 92% on graph datasets.

AAAI Conference 2024 Conference Paper

HAGO-Net: Hierarchical Geometric Message Passing for Molecular Representation Learning

  • Hongbin Pei
  • Taile Chen
  • Chen A
  • Huiqi Deng
  • Jing Tao
  • Pinghui Wang
  • Xiaohong Guan

Molecular representation learning has emerged as a game-changer at the intersection of AI and chemistry, with great potential in applications such as drug design and materials discovery. A substantial obstacle in successfully applying molecular representation learning is the difficulty of effectively and completely characterizing and learning molecular geometry, which has not been well addressed to date. To overcome this challenge, we propose a novel framework that features a novel geometric graph, termed HAGO-Graph, and a specifically designed geometric graph learning model, HAGO-Net. In the framework, the foundation is HAGO-Graph, which enables a complete characterization of molecular geometry in a hierarchical manner. Specifically, we leverage the concept of n-body in physics to characterize geometric patterns at multiple spatial scales. We then specifically design a message passing scheme, HAGO-MPS, and implement the scheme as a geometric graph neural network, HAGO-Net, to effectively learn the representation of HAGO-Graph by horizontal and vertical aggregation. We further prove DHAGO-Net, the derivative function of HAGO-Net, is an equivariant model. The proposed models are validated by extensive comparisons on four challenging benchmarks. Notably, the models exhibited state-of-the-art performance in molecular chirality identification and property prediction, achieving state-of-the-art performance on five properties of QM9 dataset. The models also achieved competitive results on molecular dynamics prediction task.

ICML Conference 2024 Conference Paper

Multi-Track Message Passing: Tackling Oversmoothing and Oversquashing in Graph Learning via Preventing Heterophily Mixing

  • Hongbin Pei
  • Yu Li
  • Huiqi Deng
  • Jingxin Hai
  • Pinghui Wang
  • Jie Ma 0001
  • Jing Tao
  • Yuheng Xiong

The advancement toward deeper graph neural networks is currently obscured by two inherent issues in message passing, oversmoothing and oversquashing. We identify the root cause of these issues as information loss due to heterophily mixing in aggregation, where messages of diverse category semantics are mixed. We propose a novel multi-track graph convolutional network to address oversmoothing and oversquashing effectively. Our basic idea is intuitive: if messages are separated and independently propagated according to their category semantics, heterophilic mixing can be prevented. Consequently, we present a novel multi-track message passing scheme capable of preventing heterophilic mixing, enhancing long-distance information flow, and improving separation condition. Empirical validations show that our model achieved state-of-the-art performance on several graph datasets and effectively tackled oversmoothing and oversquashing, setting a new benchmark of $86. 4$% accuracy on Cora.

AAAI Conference 2023 Conference Paper

Multi-Action Dialog Policy Learning from Logged User Feedback

  • Shuo Zhang
  • Junzhou Zhao
  • Pinghui Wang
  • Tianxiang Wang
  • Zi Liang
  • Jing Tao
  • Yi Huang
  • Junlan Feng

Multi-action dialog policy (MADP), which generates multiple atomic dialog actions per turn, has been widely applied in task-oriented dialog systems to provide expressive and efficient system responses. Existing MADP models usually imitate action combinations from the labeled multi-action dialog samples. Due to data limitations, they generalize poorly toward unseen dialog flows. While reinforcement learning-based methods are proposed to incorporate the service ratings from real users and user simulators as external supervision signals, they suffer from sparse and less credible dialog-level rewards. To cope with this problem, we explore to improve MADPL with explicit and implicit turn-level user feedback received for historical predictions (i.e., logged user feedback) that are cost-efficient to collect and faithful to real-world scenarios. The task is challenging since the logged user feedback provides only partial label feedback limited to the particular historical dialog actions predicted by the agent. To fully exploit such feedback information, we propose BanditMatch, which addresses the task from a feedback-enhanced semi-supervised learning perspective with a hybrid learning objective of SSL and bandit learning. BanditMatch integrates pseudo-labeling methods to better explore the action space through constructing full label feedback. Extensive experiments show that our BanditMatch improves MADPL over the state-of-the-art methods by generating more concise and informative responses. The source code and the appendix of this paper can be obtained from https://github.com/ShuoZhangXJTU/BanditMatch.

NeurIPS Conference 2020 Conference Paper

Node Classification on Graphs with Few-Shot Novel Labels via Meta Transformed Network Embedding

  • Lin Lan
  • Pinghui Wang
  • Xuefeng Du
  • Kaikai Song
  • Jing Tao
  • Xiaohong Guan

We study the problem of node classification on graphs with few-shot novel labels, which has two distinctive properties: (1) There are novel labels to emerge in the graph; (2) The novel labels have only a few representative nodes for training a classifier. The study of this problem is instructive and corresponds to many applications such as recommendations for newly formed groups with only a few users in online social networks. To cope with this problem, we propose a novel Meta Transformed Network Embedding framework (MetaTNE), which consists of three modules: (1) A \emph{structural module} provides each node a latent representation according to the graph structure. (2) A \emph{meta-learning module} captures the relationships between the graph structure and the node labels as prior knowledge in a meta-learning manner. Additionally, we introduce an \emph{embedding transformation function} that remedies the deficiency of the straightforward use of meta-learning. Inherently, the meta-learned prior knowledge can be used to facilitate the learning of few-shot novel labels. (3) An \emph{optimization module} employs a simple yet effective scheduling strategy to train the above two modules with a balance between graph structure learning and meta-learning. Experiments on four real-world datasets show that MetaTNE brings a huge improvement over the state-of-the-art methods.

IJCAI Conference 2019 Conference Paper

MR-GNN: Multi-Resolution and Dual Graph Neural Network for Predicting Structured Entity Interactions

  • Nuo Xu
  • Pinghui Wang
  • Long Chen
  • Jing Tao
  • Junzhou Zhao

Predicting interactions between structured entities lies at the core of numerous tasks such as drug regimen and new material design. In recent years, graph neural networks have become attractive. They represent structured entities as graphs, and then extract features from each individual graph using graph convolution operations. However, these methods have some limitations: i) their networks only extract features from a fix-sized subgraph structure (i. e. , a fix-sized receptive field) of each node, and ignore features in substructures of different sizes, and ii) features are extracted by considering each entity independently, which may not effectively reflect the interaction between two entities. To resolve these problems, we present {\em MR-GNN}, an end-to-end graph neural network with the following features: i) it uses a multi-resolution based architecture to extract node features from different neighborhoods of each node, and, ii) it uses dual graph-state long short-term memory networks (LSTMs) to summarize local features of each graph and extracts the interaction features between pairwise graphs. Experiments conducted on real-world datasets show that MR-GNN improves the prediction of state-of-the-art methods.