Arrow Research search

Author name cluster

Jiateng Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

3 papers
2 author rows

Possible papers (3)

IS Journal 2026 Journal Article

Pavlov’s Dog and Large Language Models: The Double-Edged Power of Context Conditioning

  • Denghui Zhang
  • Rushi Wang
  • Jiateng Liu
  • Kezia Oketch
  • Yiyu Shi
  • Heng Ji
  • Ahmed Abbasi

We introduce context conditioning, a phenomenon analogous to Pavlovian learning, in which large language models (LLMs) display heightened sensitivity to small amounts of novel contextual signals. This conditioning is double-edged. Carefully curated contexts can quickly steer models toward trustworthy, inclusive behavior, while minor malicious or biased signals can provoke unsafe, toxic, or privacy-compromising responses. We reveal this double-edged behavior with two studies that collectively highlight the underlying associative amplification mechanism through which novel or low-frequency contextual cues exert outsized influence on model attention and response distributions. Trust in context-based artificial intelligence (AI) thus depends not only on model design but also on how context governs behavior at inference time. We outline five research directions for building trustworthy context-based LLM systems and argue that the future of responsible AI lies not only in safer models but in safer contexts, meaning systems that understand, audit, and adapt to the stimuli that condition them.

ICLR Conference 2024 Conference Paper

MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback

  • Xingyao Wang 0002
  • Zihan Wang 0010
  • Jiateng Liu
  • Yangyi Chen
  • Lifan Yuan
  • Hao Peng 0009
  • Heng Ji 0001

To solve complex tasks, large language models (LLMs) often require multiple rounds of interaction with the user, sometimes assisted by external tools. However, current evaluation protocols often emphasize benchmark performance on single-turn exchanges, neglecting the nuanced interactions among the user, LLMs, and external tools, while also underestimating the importance of natural language feedback from users. These oversights contribute to discrepancies between research benchmark evaluations and real-world use cases. We introduce MINT, a benchmark that evaluates LLMs' ability to solve tasks through multi-turn interaction by (1) using tools and (2) leveraging natural language feedback. To ensure reproducibility, we provide an evaluation framework in which LLMs can access tools by executing Python code and receive users' natural language feedback simulated by GPT-4. We repurpose a diverse set of established evaluation datasets focusing on reasoning, coding, and decision-making, and carefully curate them into a compact subset for efficient evaluation. Our analysis of 20 open- and closed-source LLMs offers intriguing findings. (a) LLMs generally benefit from tools and language feedback, with performance gains (absolute, same below) of 1--8% per turn of tool use and 2--17% with natural language feedback. (b) Better single-turn performance does not guarantee better multi-turn performance. (c) Surprisingly, among the LLMs evaluated, supervised instruction-finetuning (SIFT) and reinforcement learning from human feedback (RLHF) generally hurt multi-turn capabilities. We expect MINT to help measure progress and incentivize research on improving LLMs' multi-turn interaction capabilities, especially for open-source communities, where multi-turn human evaluation is less accessible than it is for commercial LLMs with larger user bases.

AAAI Conference 2023 Conference Paper

CMNet: Contrastive Magnification Network for Micro-Expression Recognition

  • Mengting Wei
  • Xingxun Jiang
  • Wenming Zheng
  • Yuan Zong
  • Cheng Lu
  • Jiateng Liu

Micro-Expression Recognition (MER) is challenging because micro-expression (ME) motion is too weak to distinguish. This hurdle can be tackled by magnifying intensity for a more accurate capture of movements. However, existing magnification strategies tend to treat facial-image features, which carry more than intensity cues alone, as intensity features, leaving the intensity representation short on credibility. In addition, the intensity variation over time, which is crucial for encoding movement, is also neglected. To this end, we provide a reliable scheme that extracts intensity clues while accounting for how they vary over time. First, we devise an Intensity Distillation (ID) loss that acquires intensity clues by contrasting the differences between frames, given that frames of the same video differ only in intensity. Then, the intensity clues are calibrated to follow the trend of the original video. Specifically, since the original video lacks ground-truth intensity annotations, we build the intensity tendency by assigning each intensity vacancy an uncertain value, which guides the extracted intensity clues to converge toward this trend rather than toward fixed values. A Wilcoxon rank-sum test (WRST) method is employed to implement the calibration. Experimental results on three public ME databases, i.e., CASME II, SAMM, and SMIC-HS, validate the superiority of our method over state-of-the-art approaches.