Author name cluster

Minghui Yang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

7 papers
1 author row

Possible papers

AAAI Conference 2026 Conference Paper

MHB: Medical Hallucination Benchmark for Large Language Models in Complex Clinical Tasks

  • Jianrong Lu
  • Junwei Liu
  • Xingyun Zheng
  • Minghui Yang
  • Jian Wang
  • Ping Wang
  • Yechao Zhang

The integration of Large Language Models (LLMs) into clinical applications presents transformative potential but is undermined by the critical risk of hallucination, the generation of plausible but factually incorrect information. Such failures pose a direct threat to patient safety and the integrity of clinical decision-making. To address this challenge, we introduce MHB, a novel and comprehensive benchmark framework designed to evaluate LLM reliability in two complex, high-stakes clinical contexts: multi-turn medical dialogues and clinical case report analysis. The core of our contribution is a systematic methodology for generating adversarial test cases by injecting "hallucination traps" into realistic medical data, guided by a fine-grained taxonomy of clinical errors. MHB, comprising 4,695 samples and 20,288 evaluation rubrics, underwent a rigorous, two-stage validation by a panel of 60 licensed physicians from top-tier hospitals, ensuring high clinical realism and consistency. This comprehensive assessment of leading LLMs revealed significant, clinically relevant shortcomings across the board. Even the best-performing model, Claude-4-Sonnet, exhibited a hallucination rate of 29.1%, with some open-source models exceeding 57.0%. All models struggled with specific traps, like fabricated medical data or non-existent guidelines, highlighting prevalent systemic weaknesses.

IJCAI Conference 2023 Conference Paper

Learning Heuristically-Selected and Neurally-Guided Feature for Age Group Recognition Using Unconstrained Smartphone Interaction

  • Yingmao Miao
  • Qiwei Tian
  • Chenhao Lin
  • Tianle Song
  • Yajie Zhou
  • Junyi Zhao
  • Shuxin Gao
  • Minghui Yang

Owing to the boom of smartphone industries, the expansion of phone users has also been significant. Besides adults, children and elders have also begun to join the population of daily smartphone users. Such an expansion indeed facilitates the further exploration of the versatility and flexibility of digitization. However, these new users may also be susceptible to issues such as addiction, fraud, and insufficient accessibility. To fully utilize the capability of mobile devices without breaching personal privacy, we build the first corpus for age group recognition on smartphones with more than 1,445,087 unrestricted actions from 2,100 subjects. Then a series of heuristically-selected and neurally-guided features are proposed to increase the separability of the above dataset. Finally, we develop AgeCare, the first implicit and continuous system incorporated with bottom-to-top functionality without any restriction on user-phone interaction scenarios, for accurate age group recognition and age-tailored assistance on smartphones. Our system performs impressively well on this dataset and significantly surpasses the state-of-the-art methods.

IJCAI Conference 2023 Conference Paper

Towards Hierarchical Policy Learning for Conversational Recommendation with Hypergraph-based Reinforcement Learning

  • Sen Zhao
  • Wei Wei
  • Yifan Liu
  • Ziyang Wang
  • Wendi Li
  • Xian-Ling Mao
  • Shuai Zhu
  • Minghui Yang

Conversational recommendation systems (CRS) aim to acquire users' dynamic preferred attributes in a timely and proactive manner through conversations for item recommendation. In each turn of CRS, there are naturally two decision-making processes with different roles that influence each other: 1) the director, which selects the follow-up option (i.e., ask or recommend) that is more effective for reducing the action space and acquiring user preferences; and 2) the actor, which accordingly chooses primitive actions (i.e., the attribute to ask about or the item to recommend) to estimate the effectiveness of the director’s option. However, existing methods rely heavily on a unified decision-making module or heuristic rules, while neglecting to distinguish the roles of different decision procedures, as well as the mutual influences between them. To address this, we propose a novel Director-Actor Hierarchical Conversational Recommender (DAHCR), where the director selects the most effective option, followed by the actor accordingly choosing primitive actions that satisfy user preferences. Specifically, we develop a dynamic hypergraph to model user preferences and introduce an intrinsic motivation to train from weak supervision over the director. Finally, to alleviate the adverse effect of model bias on the mutual influence between the director and actor, we model the director’s option by sampling from a categorical distribution. Extensive experiments demonstrate that DAHCR outperforms state-of-the-art methods.
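
The director/actor control flow described in this abstract can be sketched roughly as follows. This is only an illustration of the two-level decision, not the DAHCR model: the function name `director_actor_step`, the probability and score dictionaries, and the greedy actor are all assumptions, and the paper's hypergraph and learned policies are not represented.

```python
import random
from typing import Dict, Tuple


def director_actor_step(
    option_probs: Dict[str, float],
    attribute_scores: Dict[str, float],
    item_scores: Dict[str, float],
    rng: random.Random,
) -> Tuple[str, str]:
    """One hypothetical turn: the director picks an option, the actor picks an action."""
    # Director: sample the option ("ask" or "recommend") from a categorical distribution.
    options = list(option_probs)
    weights = [option_probs[o] for o in options]
    option = rng.choices(options, weights=weights, k=1)[0]

    # Actor: choose the highest-scoring primitive action under the sampled option
    # (greedy choice is an assumption made for this sketch).
    candidates = attribute_scores if option == "ask" else item_scores
    action = max(candidates, key=candidates.get)
    return option, action


# Toy inputs (all hypothetical).
attribute_scores = {"genre": 0.7, "price": 0.2}
item_scores = {"item_17": 0.4, "item_42": 0.9}
rng = random.Random(0)

ask_turn = director_actor_step(
    {"ask": 1.0, "recommend": 0.0}, attribute_scores, item_scores, rng
)
recommend_turn = director_actor_step(
    {"ask": 0.0, "recommend": 1.0}, attribute_scores, item_scores, rng
)
```

With all probability mass on one option, the director's sample is deterministic, so `ask_turn` selects an attribute and `recommend_turn` selects an item.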

AAAI Conference 2021 System Paper

IFDDS: An Anti-fraud Outbound Robot

  • Zihao Wang
  • Minghui Yang
  • Chunxiang Jin
  • Jia Liu
  • Zujie Wen
  • Saishuai Liu
  • Zhe Zhang

With the rapid growth of internet finance and e-payment, payment fraud has attracted increasing attention. To prevent customers from being cheated, systems often block risky payments depending on a risk factor. However, this may also inadvertently block cases which are not actually risky. To solve this problem, we present IFDDS, a system that proactively chats with customers through intelligent speech interaction to precisely determine the actual payment risk. Our system adopts imitation learning to learn dialogue policies. In addition, it encompasses a dialogue risk detection module which identifies fraud probability every turn based on the dialogue state. We create a web-based user interface which simulates a practical voice-based dialogue system.

IJCAI Conference 2021 Conference Paper

IIAS: An Intelligent Insurance Assessment System through Online Real-time Conversation Analysis

  • Mengdi Zhou
  • Shuang Peng
  • Minghui Yang
  • Nan Li
  • Hongbin Wang
  • Li Qiao
  • Haitao Mi
  • Zujie Wen

With the development of the Chinese medical insurance industry, the number of claim cases is growing rapidly. As a result, insurance companies have to spend much time assessing claims and deciding how much compensation each claimant should receive, which is a highly professional process that involves many complex operations. Therefore, the insurance assessor's role is essential. However, for junior assessors, who often lack practical experience, it is not easy to quickly handle such an online procedure. In order to alleviate assessors' cognitive workload, we propose an Intelligent Insurance Assessment System (IIAS) that helps effectively collect claimant information through online real-time conversation analysis. With the assistance of IIAS, the average time cost of the insurance assessment procedure is reduced from 55 minutes to 35 minutes.

IJCAI Conference 2020 Conference Paper

Two-stage Behavior Cloning for Spoken Dialogue System in Debt Collection

  • Zihao Wang
  • Jia Liu
  • Hengbin Cui
  • Chunxiang Jin
  • Minghui Yang
  • Yafang Wang
  • Xiaolong Li
  • Renxin Mao

With the rapid growth of internet finance and the boom in financial lending, intelligent calling for debt collection in FinTech companies has drawn increasing attention. Nowadays, the widely used intelligent calling system is based on dialogue flow, namely configuring the interaction flow with a finite-state machine. In our debt collection scenario, the completed dialogue flow contains more than one thousand interactive paths. All the dialogue procedures are manually specified, which is error-prone and incurs extremely high maintenance costs. To solve this problem, we propose a behavior-cloning-based collection robot framework without any dialogue flow configuration, called two-stage behavior cloning (TSBC). In the first stage, we use a multi-label classification model to obtain policies that may be able to cope with the current situation according to the dialogue state; in the second stage, we score several scripts under each obtained policy and select the script with the highest score as the reply for the current state. This framework makes full use of the massive manual collection records without labeling and fully absorbs human wisdom and experience. We have conducted extensive experiments in both single-round and multi-round scenarios and showed the effectiveness of the proposed system. The accuracy of a single round of dialogue can be improved by 5%, and the accuracy of multiple rounds of dialogue can be increased by 3.1%.
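
The two stages described in this abstract can be sketched as a small selection routine. This is a minimal illustration under stated assumptions, not the TSBC implementation: `two_stage_reply`, the 0.5 threshold, the toy policy probabilities, and the word-overlap scorer are all hypothetical stand-ins for the learned models.

```python
import re
from typing import Callable, Dict, List, Tuple


def two_stage_reply(
    state: str,
    policy_probs: Callable[[str], Dict[str, float]],
    scripts_by_policy: Dict[str, List[str]],
    score_script: Callable[[str, str], float],
    threshold: float = 0.5,  # assumed cut-off for the multi-label step
) -> Tuple[str, str]:
    """Return the (policy, script) pair with the highest script score."""
    # Stage 1: multi-label step keeps every policy whose probability clears the threshold.
    probs = policy_probs(state)
    candidates = [p for p, prob in probs.items() if prob >= threshold]
    if not candidates:
        # Fall back to the single most likely policy (an assumption of this sketch).
        candidates = [max(probs, key=probs.get)]

    # Stage 2: score every script under each candidate policy and keep the best one.
    pairs = [(p, s) for p in candidates for s in scripts_by_policy.get(p, [])]
    if not pairs:
        raise ValueError("no scripts available for the candidate policies")
    return max(pairs, key=lambda pair: score_script(state, pair[1]))


def toy_policy_probs(state: str) -> Dict[str, float]:
    # Hypothetical fixed probabilities standing in for a trained classifier.
    return {"remind": 0.8, "negotiate": 0.6, "close": 0.1}


def toy_score(state: str, script: str) -> float:
    # Hypothetical scorer: number of distinct script words that also occur in the state.
    state_words = set(re.findall(r"[a-z]+", state.lower()))
    script_words = set(re.findall(r"[a-z]+", script.lower()))
    return float(len(state_words & script_words))


scripts = {
    "remind": ["Your payment is overdue.", "Please pay today."],
    "negotiate": ["Can we agree on a payment plan?"],
    "close": ["Thank you, goodbye."],
}

reply = two_stage_reply(
    "customer asks about payment plan", toy_policy_probs, scripts, toy_score
)
```

Here the "close" policy is filtered out in the first stage, and the "negotiate" script wins the second stage because it shares two words with the dialogue state.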