
Author name cluster

Yi Guan

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

11 papers
1 author row

Possible papers


AAAI Conference 2026 Conference Paper

AgriEval: A Comprehensive Chinese Agricultural Benchmark for Large Language Models

  • Lian Yan
  • Haotian Wang
  • Chen Tang
  • Haifeng Liu
  • Tianyang Sun
  • Liangliang Liu
  • Yi Guan
  • Jingchi Jiang

In the agricultural domain, the deployment of large language models (LLMs) is hindered by the lack of training data and evaluation benchmarks. To mitigate this issue, we propose AgriEval, the first comprehensive Chinese agricultural benchmark, with three main characteristics: (1) Comprehensive Capability Evaluation. AgriEval covers six major agricultural categories and 29 subcategories, addressing four core cognitive scenarios: memorization, understanding, inference, and generation. (2) High-Quality Data. The dataset is curated from university-level examinations and assignments, providing a natural and robust benchmark for assessing the capacity of LLMs to apply knowledge and make expert-like decisions. (3) Diverse Formats and Extensive Scale. AgriEval comprises 14,697 multiple-choice questions and 2,167 open-ended question-answering items, making it the most extensive agricultural benchmark available to date. We also present comprehensive experimental results for 51 open-source and commercial LLMs. The results reveal that most existing LLMs struggle to reach 60 percent accuracy, indicating substantial room for improvement in agricultural LLMs. Additionally, we conduct extensive experiments to investigate the factors influencing model performance and propose strategies for improving it.
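For the multiple-choice portion, "accuracy" is simply the fraction of questions whose predicted option matches the gold option, reported per category and overall. The sketch below illustrates that scoring under stated assumptions: the record format and the category names are hypothetical, the toy predictions are made up, and it is not AgriEval's official evaluation script.

```python
from collections import defaultdict


def accuracy_by_category(records):
    """records: iterable of (category, predicted_choice, gold_choice) tuples.

    Returns ({category: accuracy}, overall_accuracy).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for category, predicted, gold in records:
        total[category] += 1
        # Normalize option letters so that " b" and "B" count as the same answer.
        if predicted.strip().upper() == gold.strip().upper():
            correct[category] += 1
    per_category = {c: correct[c] / total[c] for c in total}
    overall = sum(correct.values()) / sum(total.values()) if total else 0.0
    return per_category, overall


# Hypothetical category names and made-up predictions, for illustration only.
records = [
    ("plant_protection", "A", "A"),
    ("plant_protection", "C", "B"),
    ("animal_science", "d", "D"),
    ("animal_science", "B", "B"),
    ("animal_science", "A", "C"),
]
per_category, overall = accuracy_by_category(records)
print(per_category, overall)
```

Open-ended answers would need a separate judging procedure (human or model-based), which this sketch does not attempt.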

AAAI Conference 2026 Conference Paper

Invariant Feature Learning for Counterfactual Watch-time Prediction in Video Recommendation

  • Chenghou Jin
  • Yixin Ren
  • Hongxu Ma
  • Yewei Xia
  • Yi Guan
  • Hao Zhang
  • Jiandong Ding
  • Jihong Guan

Video recommendation systems rely heavily on user watch-time feedback, making accurate watch-time prediction a crucial task. However, this task inherently suffers from bias, as recommendation models tend to favor long-duration videos to maximize watch time. This issue, known as duration bias in watch-time prediction, can be explained from a causal perspective in which video duration acts as a confounder. Recent works address this bias using backdoor adjustment, isolating the direct effect of content on watch time from observational data. These methods typically discretize video duration into groups, estimate group-wise effects, and then aggregate them via a unified prediction model. However, this aggregation strategy is prone to model misspecification due to feature distribution shift across groups. In this paper, we reinterpret the problem through the lens of invariant learning and propose a novel framework: Duration-Invariant Feature Learning (DIFL). DIFL employs a kernel-based regularization that enforces representation invariance across duration groups, reducing sensitivity to group design and improving generalization. This enables more accurate modeling of the direct causal effect and supports counterfactual inference. Extensive experiments on both public and large-scale real-world production datasets demonstrate the effectiveness of our approach, which achieves state-of-the-art (SOTA) performance.
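The abstract does not spell out the form of DIFL's kernel-based regularizer. One widely used kernel-based way to penalize distribution shift between groups is the maximum mean discrepancy (MMD) with a Gaussian kernel, so the sketch below assumes that choice: it averages squared MMD over all pairs of duration groups and adds it to a squared-error watch-time loss. The function names, the bandwidth, and the weight `lam` are hypothetical; this illustrates the general technique, not the paper's exact objective.

```python
import math
from itertools import combinations


def rbf(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2 * bandwidth ** 2))


def mmd2(xs, ys, bandwidth=1.0):
    """Biased estimate of squared MMD between two samples of representations."""
    kxx = sum(rbf(a, b, bandwidth) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(rbf(a, b, bandwidth) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(rbf(a, b, bandwidth) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2 * kxy


def invariance_penalty(groups, bandwidth=1.0):
    """Average pairwise squared MMD across duration groups.

    groups: {group_id: [representation_vector, ...]}
    """
    pairs = list(combinations(groups.values(), 2))
    if not pairs:
        return 0.0
    return sum(mmd2(a, b, bandwidth) for a, b in pairs) / len(pairs)


def total_loss(predictions, targets, groups, lam=0.1):
    """Mean squared watch-time error plus the weighted invariance penalty."""
    mse = sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)
    return mse + lam * invariance_penalty(groups)


# Representations that look the same in every group incur no penalty;
# representations that differ by duration group are penalized.
same = {"short": [[0.0, 1.0], [1.0, 0.0]], "long": [[0.0, 1.0], [1.0, 0.0]]}
shifted = {"short": [[0.0, 0.0], [0.1, 0.0]], "long": [[3.0, 3.0], [3.1, 3.0]]}
print(invariance_penalty(same), invariance_penalty(shifted))
```

Because the penalty depends only on the learned representations, not on how finely duration is bucketed, it is one plausible reading of the claim that the regularizer reduces sensitivity to group design.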

AAAI Conference 2024 Conference Paper

Dialogues Are Not Just Text: Modeling Cognition for Dialogue Coherence Evaluation

  • Xue Li
  • Jia Su
  • Yang Yang
  • Zipeng Gao
  • Xinyu Duan
  • Yi Guan

The generation of logically coherent dialogues by humans relies on underlying cognitive abilities. Building on this observation, we redefine the dialogue coherence evaluation process, combining cognitive judgment with the dialogue text itself to achieve a more human-like evaluation. We propose DCGEval, a novel dialogue evaluation framework based on the Dialogue Cognition Graph (DCG), which fuses cognition modeling and text modeling through in-depth interaction between the two. DCG, the proposed graph structure based on Abstract Meaning Representation (AMR), uniformly models four dialogue cognitive abilities. Specifically, core-semantic cognition is modeled by converting each utterance into an AMR graph, which extracts essential semantic information without redundancy. Temporal and role cognition are modeled by establishing logical relationships among the different AMR graphs. Finally, commonsense knowledge from ConceptNet is fused in to express commonsense cognition. Experiments demonstrate the necessity of modeling human cognition for dialogue evaluation, and DCGEval shows stronger correlations with human judgments than other state-of-the-art evaluation metrics.
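To make the graph construction concrete, the sketch below assembles a toy dialogue graph from per-utterance AMR triples, adding one hypothetical edge type for temporal order (`next`, between consecutive utterances) and one for role (`same_speaker`, linking each utterance to that speaker's previous turn). It assumes the AMR graphs are already parsed, uses made-up relation names for the cross-utterance edges, and omits the ConceptNet commonsense step; the actual DCG edge definitions may differ.

```python
from dataclasses import dataclass, field


@dataclass
class DialogueCognitionGraph:
    # Nodes are (utterance_index, concept) pairs; edges are (src, relation, dst).
    nodes: set = field(default_factory=set)
    edges: list = field(default_factory=list)


def build_dcg(utterances):
    """utterances: list of (speaker, root_concept, amr_triples), where
    amr_triples are (head_concept, relation, tail_concept) from a pre-parsed AMR graph."""
    g = DialogueCognitionGraph()
    for i, (_, root, triples) in enumerate(utterances):
        g.nodes.add((i, root))
        # Core-semantic cognition: copy each utterance's AMR triples into the graph.
        for head, rel, tail in triples:
            g.nodes.update({(i, head), (i, tail)})
            g.edges.append(((i, head), rel, (i, tail)))
    # Temporal cognition (assumed form): link the roots of consecutive utterances.
    for i in range(1, len(utterances)):
        prev_root, root = utterances[i - 1][1], utterances[i][1]
        g.edges.append(((i - 1, prev_root), "next", (i, root)))
    # Role cognition (assumed form): link each utterance to the same speaker's previous one.
    last_by_speaker = {}
    for i, (speaker, root, _) in enumerate(utterances):
        if speaker in last_by_speaker:
            j = last_by_speaker[speaker]
            g.edges.append(((j, utterances[j][1]), "same_speaker", (i, root)))
        last_by_speaker[speaker] = i
    return g


utterances = [
    ("A", "want-01", [("want-01", ":ARG0", "i"), ("want-01", ":ARG1", "coffee")]),
    ("B", "have-03", [("have-03", ":ARG1", "coffee")]),
    ("A", "thank-01", [("thank-01", ":ARG1", "you")]),
]
g = build_dcg(utterances)
print(len(g.nodes), len(g.edges))
```

Keeping the utterance index in every node lets the same concept (here `coffee`) appear once per utterance, so cross-utterance links stay explicit rather than being collapsed into a single node.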

JBHI Journal 2024 Journal Article

EIRAD: An Evidence-Based Dialogue System With Highly Interpretable Reasoning Path for Automatic Diagnosis

  • Lian Yan
  • Yi Guan
  • Haotian Wang
  • Yi Lin
  • Yang Yang
  • Boran Wang
  • Jingchi Jiang

A Dialogue System for Medical Diagnosis (DSMD) based on reinforcement learning (RL) can simulate patient-doctor interactions and plays a crucial role in clinical diagnosis. However, due to the complexity of disease etiology, DSMD suffers from inefficient search for diagnostic evidence. Moreover, a purely RL-based DSMD, without the constraints of professional medical knowledge, often generates irrational, meaningless, or even erroneous symptom inquiries, leading to poorly interpretable diagnostic paths and high misdiagnosis rates. To address these issues, we propose an Evidence-based dialogue system with highly Interpretable Reasoning path for Automatic Diagnosis (EIRAD), grounded in a medical knowledge graph (MKG). Specifically, our automated diagnostic model captures key symptoms of suspected diseases by explicitly leveraging the topology of the MKG, enhancing both the interpretability and the accuracy of diagnosis. To expedite the retrieval of factual evidence, we develop two mechanisms: 1) A mapping mechanism between the entity set of the MKG and DSMD's diagnostic evidence and diseases. Based on the patient's symptoms, EIRAD prunes irrelevant disease and symptom nodes from the MKG, which truncates invalid actions of the RL-based DSMD. 2) A reward mechanism that integrates the effectiveness of symptom inquiry and the accuracy of disease diagnosis. This comprehensive reward design suits intelligent consultation and effectively drives DSMD to accelerate evidence collection. Experimental results demonstrate that our model significantly outperforms competitive baseline methods in symptom inquiry efficiency and diagnostic accuracy.
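The two mechanisms can be illustrated with a toy example. The sketch below assumes a simple pruning rule (keep only diseases linked to at least one confirmed symptom, and allow inquiries only about not-yet-asked symptoms of those diseases) and a hand-written step reward that favors confirmed symptoms and correct final diagnoses. The knowledge graph, function names, and reward values are all hypothetical; EIRAD's actual mapping and reward formulation are not given in the abstract.

```python
def prune_mkg(mkg, confirmed, asked):
    """mkg: {disease: set_of_symptoms}.

    Returns (candidate_diseases, allowed_inquiries): diseases sharing at least one
    confirmed symptom, and their symptoms that have not been asked about yet.
    Every other inquiry is treated as an invalid action and masked out.
    """
    candidates = {d for d, symptoms in mkg.items() if symptoms & confirmed}
    allowed = set().union(*(mkg[d] for d in candidates)) - asked
    return candidates, allowed


def step_reward(symptom_confirmed=False, diagnosis=None, true_disease=None,
                hit=1.0, miss=-0.1, correct=10.0, wrong=-10.0):
    """Hypothetical reward combining inquiry effectiveness and diagnostic accuracy."""
    if diagnosis is not None:
        # Terminal step: reward the final diagnosis.
        return correct if diagnosis == true_disease else wrong
    # Inquiry step: reward questions that uncover a present symptom.
    return hit if symptom_confirmed else miss


# A made-up miniature knowledge graph, not real clinical data.
mkg = {
    "flu": {"fever", "cough", "fatigue"},
    "cold": {"cough", "sneezing"},
    "migraine": {"headache", "nausea"},
}
candidates, allowed = prune_mkg(mkg, confirmed={"cough"}, asked={"cough"})
print(sorted(candidates), sorted(allowed))
```

Masking actions this way shrinks the RL agent's action space to symptoms that can still discriminate between the remaining candidate diseases, which is one way to read "truncating invalid actions".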

AAAI Conference 2014 Conference Paper

Representing Words as Lymphocytes

  • Jinfeng Yang
  • Yi Guan
  • Xishuang Dong
  • Bin He

Similarity between words is becoming a generic problem for many applications of computational linguistics, and computing word similarities depends on word representations. Inspired by the analogies between words and lymphocytes, a lymphocyte-style word representation is proposed. The representation is built on the dependency syntax of sentences and represents a word's context as the word's head properties and dependent properties. Lymphocyte-style word representations are evaluated by computing similarities between words, with experiments conducted on the Penn Chinese Treebank 5.1. Experimental results indicate that the proposed word representations are effective.