Arrow Research search

Author name cluster

Xintian Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

4 papers
1 author row

Possible papers

AAAI 2026 Conference Paper

AMS-IO-Bench and AMS-IO-Agent: Benchmarking and Structured Reasoning for Analog and Mixed-Signal Integrated Circuit Input/Output Design

  • Zhishuai Zhang
  • Xintian Li
  • Shilong Liu
  • Aodong Zhang
  • Lu Jie
  • Nan Sun

In this paper, we propose AMS-IO-Agent, a domain-specialized LLM-based agent for structure-aware input/output (I/O) subsystem generation in analog and mixed-signal (AMS) integrated circuits (ICs). The central contribution of this work is a framework that connects natural language design intent with industrial-level AMS IC design deliverables. AMS-IO-Agent integrates two key capabilities: (1) a structured domain knowledge base that captures reusable constraints and design conventions; (2) design intent structuring, which converts ambiguous user intent into verifiable logic steps using JSON and Python as intermediate formats. We further introduce AMS-IO-Bench, a benchmark for wirebond-packaged AMS I/O ring automation. On this benchmark, AMS-IO-Agent achieves over 70% DRC+LVS pass rate and reduces design turnaround time from hours to minutes, outperforming the baseline LLM. Furthermore, an agent-generated I/O ring was fabricated and validated in a 28 nm CMOS tape-out, demonstrating the practical effectiveness of the approach in real AMS IC design flows. To our knowledge, this is the first reported human-agent collaborative AMS IC design in which an LLM-based agent completes a nontrivial subtask with outputs directly used in silicon.
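The abstract's "design intent structuring" step — converting ambiguous user intent into verifiable logic steps with JSON as an intermediate format — can be illustrated with a minimal sketch. The field names (`pad_count`, `supply_domains`, `package`) and the constraint checks are hypothetical illustrations, not the paper's actual schema:

```python
import json

# Hypothetical JSON intermediate for an I/O ring request; the schema is
# illustrative, not the one used by AMS-IO-Agent.
intent_json = """
{
  "pad_count": 48,
  "supply_domains": ["VDD_CORE", "VDD_IO"],
  "corner_cells": true,
  "package": "wirebond"
}
"""

def validate_intent(raw: str) -> dict:
    """Parse a structured design intent and check simple, verifiable
    constraints before handing it to downstream generation steps."""
    intent = json.loads(raw)
    errors = []
    if intent.get("pad_count", 0) % 4 != 0:
        errors.append("pad_count must divide evenly across four ring sides")
    if intent.get("package") not in {"wirebond", "flip-chip"}:
        errors.append("unsupported package style")
    if not intent.get("supply_domains"):
        errors.append("at least one supply domain is required")
    return {"intent": intent, "errors": errors, "valid": not errors}

result = validate_intent(intent_json)
```

The point of the intermediate format is that each constraint becomes a mechanically checkable step, rather than something the LLM asserts in free text.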

AAAI 2026 Conference Paper

MrM: Black-Box Membership Inference Attacks Against Multimodal RAG Systems

  • Peiru Yang
  • Jinhua Yin
  • Haoran Zheng
  • Xueying Bai
  • Huili Wang
  • Yufei Sun
  • Xintian Li
  • Songwei Pei

Multimodal retrieval-augmented generation (RAG) systems enhance large vision-language models by integrating cross-modal knowledge, enabling their increasing adoption across real-world multimodal tasks. These knowledge databases may contain sensitive information that requires privacy protection. However, multimodal RAG systems inherently grant external users indirect access to such data, making them potentially vulnerable to privacy attacks, particularly membership inference attacks (MIAs). Existing MIA methods targeting RAG systems predominantly focus on the textual modality, while the visual modality remains relatively underexplored. To bridge this gap, we propose MrM, the first black-box MIA framework targeting multimodal RAG systems. It utilizes a multi-object data perturbation framework constrained by counterfactual attacks, which can concurrently induce the RAG system to retrieve the target data and generate responses that leak membership information. Our method first employs an object-aware data perturbation method to constrain the perturbation to key semantics and ensure successful retrieval. Building on this, we design a counterfact-informed mask selection strategy to prioritize the most informative masked regions, aiming to eliminate the interference of model self-knowledge and amplify attack efficacy. Finally, we perform statistical membership inference by modeling query trials to extract features that reflect the reconstruction of masked semantics from response patterns. Experiments on two visual datasets and eight mainstream commercial vision-language models (e.g., GPT-4o, Gemini-2) demonstrate that MrM achieves consistently strong performance across both sample-level and set-level evaluations, and remains robust under adaptive defenses.
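The final step the abstract describes — statistical membership inference over query trials that reflect how well masked semantics are reconstructed — can be sketched in simplified form. The token-overlap scoring, the example responses, and the `threshold` value are all illustrative assumptions, not MrM's actual feature extraction:

```python
from statistics import mean

def reconstruction_score(response: str, masked_label: str) -> float:
    """Toy proxy for reconstruction quality: fraction of the masked
    object label's tokens that reappear in the model's response."""
    resp_tokens = set(response.lower().split())
    label_tokens = masked_label.lower().split()
    hits = sum(1 for t in label_tokens if t in resp_tokens)
    return hits / len(label_tokens)

def infer_membership(trials, threshold=0.5):
    """Aggregate per-trial reconstruction scores; a high mean suggests
    the target data was present in the retrieval database (member)."""
    scores = [reconstruction_score(resp, label) for resp, label in trials]
    return mean(scores) >= threshold

# Hypothetical trials: (model response, hidden masked-object label).
member_trials = [("a red fire hydrant on a sidewalk", "fire hydrant"),
                 ("the hydrant is red", "fire hydrant")]
nonmember_trials = [("an object in a street scene", "fire hydrant"),
                    ("cannot tell what is masked", "fire hydrant")]
```

The intuition matches the abstract: if the RAG system retrieves the member image, its responses tend to reconstruct the masked semantics, which a non-member query cannot do.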

AAAI 2026 Conference Paper

OncoCoT: A Temporal-causal Chain-of-Thought Dataset for Oncologic Decision-Making

  • Peiru Yang
  • Yudong Li
  • Shiting Wang
  • Xinyi Liu
  • Haotian Gan
  • Xintian Li
  • Qingyu Gao
  • Yongfeng Huang

Long Chain-of-Thought (CoT) reasoning has shown great promise in complex reasoning tasks, but its application to medical decision-making presents unique challenges. Unlike structured tasks relying on static verification frameworks, medical decision-making requires dynamic validation through longitudinal clinical outcomes, exhibiting temporal-causal dependencies that complicate the verification of reasoning processes. Therefore, we introduce a novel data construction framework specifically designed for medical decision-making. First, the framework analyzes real-world clinical cases to construct a timeline of medical events and identify critical decision points, including examination, diagnosis, and treatment. Subsequently, it employs a clinical causality-aware strategy to generate decision-making questions at the identified points, along with reasoning traces and corresponding answers. Finally, information drawn from future nodes serves as clinical logic-constrained criteria to re-evaluate and refine the soundness of the generated reasoning and responses. Building on this, we present OncoCoT, an oncologic decision-making dataset derived from four years of clinical records spanning eight common cancer types. Furthermore, we distill a subset of OncoCoT into a dedicated benchmark, OncoEval, to facilitate systematic evaluation of clinical reasoning capabilities in LLMs. Evaluation results show that existing state-of-the-art reasoning models, such as DeepSeek-R1 and OpenAI o3, exhibit limited capability in addressing clinical problems in OncoEval, highlighting the need for further improvement.
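The first framework step — building an event timeline, marking decision points, and attaching future events as outcome-based verification criteria — can be sketched minimally. The event types, fields, and example records are hypothetical, not the dataset's actual schema:

```python
from datetime import date

# Decision-point categories named in the abstract.
DECISION_TYPES = {"examination", "diagnosis", "treatment"}

# Illustrative clinical events; dates and notes are invented.
events = [
    {"date": date(2021, 3, 1), "type": "examination", "note": "CT scan"},
    {"date": date(2021, 3, 9), "type": "diagnosis", "note": "stage II"},
    {"date": date(2021, 4, 2), "type": "treatment", "note": "chemo start"},
    {"date": date(2021, 9, 15), "type": "follow_up", "note": "partial response"},
]

def decision_points(evts):
    """Return decision points in temporal order, each paired with the
    future events that can later validate the generated reasoning."""
    timeline = sorted(evts, key=lambda e: e["date"])
    points = []
    for i, e in enumerate(timeline):
        if e["type"] in DECISION_TYPES:
            points.append({"event": e, "future": timeline[i + 1:]})
    return points

points = decision_points(events)
```

Pairing each decision point with only the events that come after it is what enforces the temporal-causal structure: a question generated at the diagnosis node can be re-evaluated against the later treatment and follow-up outcomes.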

AAAI 2026 Conference Paper

ShieldRAG: Safeguarding Retrieval-Augmented Generation from Untrusted Knowledge Bases

  • Peiru Yang
  • Haoran Zheng
  • Yi Luo
  • Xinyi Liu
  • Jinrui Wang
  • Huili Wang
  • Xintian Li
  • Yongfeng Huang

Open knowledge bases (e.g., websites) are widely adopted in Retrieval-Augmented Generation (RAG) systems to provide supplementary knowledge (e.g., the latest information). However, such sources inevitably contain biased or harmful content, and incorporating this untrusted content into the RAG process introduces significant safety risks, including the degradation of LLM performance and the potential generation of harmful outputs. Recent studies have shown that this vulnerability can be further amplified by adversarial poisoning attacks specifically targeting the knowledge sources. Most existing methods primarily emphasize improving the accuracy and efficiency of RAG systems, usually overlooking these critical safety concerns. In this paper, we propose a safety-aware retrieval framework (ShieldRAG) designed to augment language model generation by jointly optimizing for both relevance and safety in the retrieved knowledge content. The core idea of ShieldRAG is to transfer the safety knowledge implicitly encoded in powerful LLMs into the retriever model through an adversarial knowledge alignment mechanism. This empowers the retriever with safety awareness and adapts it to the diverse and unknown distribution of unsafe content encountered in practical scenarios. We evaluate ShieldRAG on seven real-world datasets using five widely-used LLMs and two state-of-the-art poisoning attack strategies. Experimental results show that our method substantially improves the robustness of RAG systems against unsafe knowledge sources, while maintaining competitive performance in terms of generation accuracy and efficiency.
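The joint relevance-and-safety objective can be illustrated with a toy ranking sketch. In ShieldRAG the safety signal is distilled from an LLM into the retriever itself; here both scores are supplied directly, and the weight `alpha` and the example passages are illustrative assumptions, not from the paper:

```python
def shielded_rank(passages, alpha=0.5):
    """Rank passages by a convex combination of relevance and safety,
    so unsafe passages are demoted even when they are highly relevant."""
    scored = [(alpha * p["relevance"] + (1 - alpha) * p["safety"], p["text"])
              for p in passages]
    return [text for _, text in sorted(scored, reverse=True)]

# Hypothetical candidate passages with pre-computed scores in [0, 1].
passages = [
    {"text": "benign but off-topic", "relevance": 0.2, "safety": 0.9},
    {"text": "relevant and safe", "relevance": 0.8, "safety": 0.9},
    {"text": "relevant but poisoned", "relevance": 0.9, "safety": 0.1},
]
```

The design point is that a poisoned passage crafted to maximize retrieval relevance still ranks last once safety enters the scoring, which is the failure mode pure-relevance retrievers cannot handle.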