Arrow Research search

Author name cluster

Mingyu Derek Ma

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

7 papers
2 author rows

Possible papers

7

AAAI Conference 2026 Conference Paper

iSeal: Encrypted Fingerprinting for Reliable LLM Ownership Verification

  • Zixun Xiong
  • Gaoyi Wu
  • Qingyang Yu
  • Mingyu Derek Ma
  • Lingfeng Yao
  • Miao Pan
  • Xiaojiang Du
  • Hao Wang

Given the high cost of large language model (LLM) training from scratch, safeguarding LLM intellectual property (IP) becomes increasingly crucial. As the standard paradigm for IP ownership verification, LLM fingerprinting thus plays a vital role in addressing this challenge. Existing LLM fingerprinting methods verify ownership by extracting or injecting model-specific features. However, they overlook potential attacks during the verification process, leaving them ineffective when the model thief fully controls the LLM's inference process. In such settings, attackers may share prompt-response pairs to enable fingerprint unlearning, or manipulate outputs to evade exact-match verification. We propose iSeal, the first fingerprinting method designed for reliable verification when the model thief controls the suspected LLM in an end-to-end manner. It injects unique features into both the model and an external module, reinforced by an error-correction mechanism and a similarity-based verification strategy. These components are resistant to verification-time attacks, including collusion-based fingerprint unlearning and response manipulation, backed by both theoretical analysis and empirical results. iSeal achieves 100% Fingerprint Success Rate (FSR) on 12 LLMs against more than 10 attacks, while baselines fail under unlearning and response manipulations.
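The contrast the abstract draws between exact-match and similarity-based verification can be illustrated with a minimal sketch. This is not iSeal's actual mechanism: the function names, the fingerprint string, the 0.8 threshold, and the use of Python's `difflib` as the similarity measure are all assumptions chosen for illustration only.

```python
from difflib import SequenceMatcher


def exact_match_verify(expected: str, response: str) -> bool:
    """Ownership check that passes only on a byte-for-byte identical response."""
    return expected == response


def similarity_verify(expected: str, response: str, threshold: float = 0.8) -> bool:
    """Ownership check that passes when the response is close enough to the
    expected fingerprint. The threshold value is an illustrative assumption."""
    ratio = SequenceMatcher(None, expected, response).ratio()
    return ratio >= threshold


# Hypothetical fingerprint response and a lightly manipulated copy of it.
expected = "fingerprint-7f3a9c21-owner-key"
manipulated = expected + "!"  # attacker appends one character to the output
unrelated = "I cannot help with that request."

print(exact_match_verify(expected, manipulated))  # exact match is evaded
print(similarity_verify(expected, manipulated))   # similarity check still passes
print(similarity_verify(expected, unrelated))     # unrelated output is rejected
```

A small output perturbation defeats the exact-match check while the similarity check tolerates it, which is the failure mode the abstract attributes to exact-match baselines.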

ICML Conference 2025 Conference Paper

GIVE: Structured Reasoning of Large Language Models with Knowledge Graph Inspired Veracity Extrapolation

  • Jiashu He
  • Mingyu Derek Ma
  • Jinxuan Fan
  • Dan Roth 0001
  • Wei Wang 0010
  • Alejandro Ribeiro

Existing approaches based on context prompting or reinforcement learning (RL) to improve the reasoning capacities of large language models (LLMs) depend on the LLMs’ internal knowledge to produce reliable Chain-Of-Thought (CoT). However, no matter the size of LLMs, certain problems cannot be resolved in a single forward pass. Meanwhile, agent-based reasoning systems require access to a comprehensive nonparametric knowledge base, which is often costly or not feasible for use in scientific and niche domains. We present Graph Inspired Veracity Extrapolation (GIVE), a novel reasoning method that merges parametric and non-parametric memories to improve accurate reasoning with minimal external input. GIVE guides the LLM agent to select the most pertinent expert data (observe), engage in query-specific associative thinking (reflect), and then synthesize this information to produce the final output (speak). Extensive experiments demonstrated the following benefits of our framework: (1) GIVE increases the performance of LLMs across various sizes. (2) In some scenarios, GIVE allows smaller LLMs to surpass larger, more sophisticated ones in scientific tasks (GPT3.5T + GIVE > GPT4). (3) GIVE is effective on scientific and open-domain assessments. (4) GIVE is a training-free method that enables LLMs to tackle new problems that extend beyond their training data (up to 43.5% → 88.2% accuracy improvement). (5) GIVE allows LLM agents to reason using both restricted (very small) and noisy (very large) knowledge sources, accommodating knowledge graphs (KG) ranging from 135 to more than 840k nodes. (6) The reasoning process involved in GIVE is fully interpretable. Our code is available at https://github.com/Jason-Tree/GIVE

AAAI Conference 2025 Conference Paper

Memorize and Rank: Elevating Large Language Models for Clinical Diagnosis Prediction

  • Mingyu Derek Ma
  • Xiaoxuan Wang
  • Yijia Xiao
  • Anthony Cuturrufo
  • Vijay S Nori
  • Eran Halperin
  • Wei Wang

Clinical diagnosis prediction models, when provided with a patient's medical history, aim to detect potential diseases early, facilitating timely intervention and improving prognostic outcomes. However, the inherent scarcity of patient data and large disease candidate space often pose challenges in developing satisfactory models for this intricate task. The exploration of leveraging Large Language Models (LLMs) for encapsulating clinical decision processes has been limited. We introduce MERA, a clinical diagnosis prediction model that bridges pretraining natural language knowledge with medical practice. We apply hierarchical contrastive learning on a disease candidate ranking list to alleviate the large decision space issue. With concept memorization through fine-tuning, we bridge the natural language clinical knowledge with medical codes. Experimental results on MIMIC-III and IV datasets show that MERA achieves state-of-the-art diagnosis prediction performance and dramatically elevates the diagnosis prediction capabilities of generative LMs.

ICLR Conference 2025 Conference Paper

MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding

  • Fei Wang 0060
  • Xingyu Fu
  • James Y. Huang
  • Zekun Li 0007
  • Qin Liu 0010
  • Xiaogeng Liu
  • Mingyu Derek Ma
  • Nan Xu 0014

We introduce MuirBench, a comprehensive benchmark that focuses on robust multi-image understanding capabilities of multimodal LLMs. MuirBench consists of 12 diverse multi-image tasks (e.g., scene understanding, ordering) that involve 10 categories of multi-image relations (e.g., multiview, temporal relations). Comprising 11,264 images and 2,600 multiple-choice questions, MuirBench is created in a pairwise manner, where each standard instance is paired with an unanswerable variant that has minimal semantic differences, to enable reliable assessment. Our evaluation of 20 recent multimodal LLMs reveals that even the best-performing models like GPT-4o and Gemini Pro find it challenging to solve MuirBench, achieving 68.0% and 49.3% accuracy, respectively. Open-source multimodal LLMs trained on single images can hardly generalize to multi-image questions, hovering below 33.3% in accuracy. These results highlight the importance of MuirBench in encouraging the community to develop multimodal LLMs that can look beyond a single image, suggesting potential pathways for future improvements.

NeurIPS Conference 2024 Conference Paper

GraphVis: Boosting LLMs with Visual Knowledge Graph Integration

  • Yihe Deng
  • Chenchen Ye
  • Zijie Huang
  • Mingyu Derek Ma
  • Yiwen Kou
  • Wei Wang

The rapid evolution of large language models (LLMs) has expanded their capabilities across various data modalities, extending from well-established image data to increasingly popular graph data. Given the limitation of LLMs in hallucinations and inaccuracies in recalling factual knowledge, Knowledge Graph (KG) has emerged as a crucial data modality to support more accurate reasoning by LLMs. However, integrating structured knowledge from KGs into LLMs remains challenging, as most current KG-enhanced LLM methods directly convert the KG into linearized text triples, which is not as expressive as the original structured data. To address this, we introduce GraphVis, which conserves the intricate graph structure through the visual modality to enhance the comprehension of KGs with the aid of Large Vision Language Models (LVLMs). Our approach incorporates a unique curriculum fine-tuning scheme which first instructs LVLMs to recognize basic graphical features from the images, and subsequently incorporates reasoning on QA tasks with the visual graphs. This cross-modal methodology not only markedly enhances performance on standard textual QA but also shows improved zero-shot VQA performance by utilizing synthetic graph images to augment the data for VQA tasks. We present comprehensive evaluations across commonsense reasoning QA benchmarks, where GraphVis provides an average improvement of 11.1% over its base model and outperforms existing KG-enhanced LLM approaches. Across VQA benchmarks such as ScienceQA that share similar scientific diagram images, GraphVis provides a notable gain of 4.32%.

AAAI Conference 2024 System Paper

MIDDAG: Where Does Our News Go? Investigating Information Diffusion via Community-Level Information Pathways

  • Mingyu Derek Ma
  • Alexander K. Taylor
  • Nuan Wen
  • Yanchen Liu
  • Po-Nien Kung
  • Wenna Qin
  • Shicheng Wen
  • Azure Zhou

We present MIDDAG, an intuitive, interactive system that visualizes the information propagation paths on social media triggered by COVID-19-related news articles accompanied by comprehensive insights including user/community susceptibility level, as well as events and popular opinions raised by the crowd while propagating the information. Besides discovering information flow patterns among users, we construct communities among users and develop the propagation forecasting capability, enabling tracing and understanding of how information is disseminated at a higher level. A demo video and more are available at https://info-pathways.github.io.

AAAI Conference 2024 Conference Paper

STAR: Boosting Low-Resource Information Extraction by Structure-to-Text Data Generation with Large Language Models

  • Mingyu Derek Ma
  • Xiaoxuan Wang
  • Po-Nien Kung
  • P. Jeffrey Brantingham
  • Nanyun Peng
  • Wei Wang

Information extraction tasks such as event extraction require an in-depth understanding of the output structure and sub-task dependencies. They heavily rely on task-specific training data in the form of (passage, target structure) pairs to obtain reasonable performance. However, obtaining such data through human annotation is costly, leading to a pressing need for low-resource information extraction approaches that require minimal human labeling for real-world applications. Fine-tuning supervised models with synthesized training data would be a generalizable method, but the existing data generation methods either still rely on large-scale ground-truth data or cannot be applied to complicated IE tasks due to their poor performance. To address these challenges, we propose STAR, a data generation method that leverages Large Language Models (LLMs) to synthesize data instances given limited seed demonstrations, thereby boosting low-resource information extraction performance. Our approach involves generating target structures (Y) followed by generating passages (X), all accomplished with the aid of LLMs. We design fine-grained step-by-step instructions to obtain the initial data instances. We further reduce errors and improve data quality through self-reflection error identification and self-refinement with iterative revision. Our experiments show that the data generated by STAR significantly improve the performance of low-resource event extraction and relation extraction tasks, even surpassing the effectiveness of human-curated data. Human assessment of the data quality shows STAR-generated data exhibit higher passage quality and better align with the task definitions compared with the human-curated data.
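The generation order described in the abstract (target structure Y first, then passage X, then self-reflection and iterative revision) can be sketched as a small pipeline. This is a hedged outline, not the authors' implementation: the `synthesize_instance` function, the prompt wording, the "OK" stop signal, the revision cap, and the `fake_llm` stand-in are all hypothetical. A real setup would pass in a function that calls an actual LLM.

```python
from typing import Callable, List, Tuple

# Any function that maps a prompt string to a completion string (assumption).
LLM = Callable[[str], str]


def synthesize_instance(
    llm: LLM, seed_structures: List[str], max_revisions: int = 2
) -> Tuple[str, str]:
    """Return one synthetic (passage, target structure) pair."""
    # Step 1: generate the target structure Y from a few seed demonstrations.
    structure = llm(
        "Generate a new target structure like these examples:\n"
        + "\n".join(seed_structures)
    )
    # Step 2: generate a passage X that expresses the structure.
    passage = llm(f"Write a passage that expresses this structure:\n{structure}")
    # Step 3: self-reflection and revision, repeated up to max_revisions times.
    for _ in range(max_revisions):
        critique = llm(
            "Check the passage against the structure and list any errors, "
            f"or reply OK.\nStructure: {structure}\nPassage: {passage}"
        )
        if critique.strip() == "OK":
            break
        passage = llm(
            f"Revise the passage to fix these errors:\n{critique}\n"
            f"Structure: {structure}\nPassage: {passage}"
        )
    return passage, structure


def fake_llm(prompt: str) -> str:
    """Deterministic stand-in for an LLM, keyed on the prompt's first word."""
    if prompt.startswith("Generate"):
        return "Attack(attacker=rebels, target=convoy)"
    if prompt.startswith("Write"):
        return "Rebels attacked a convoy."
    if prompt.startswith("Check"):
        return "OK"
    return "Rebels attacked a convoy near the border."


passage, structure = synthesize_instance(
    fake_llm, ["Attack(attacker=army, target=base)"]
)
print(structure)
print(passage)
```

Generating the structure before the passage means the label is fixed up front and the text is written to fit it, rather than labeling free text afterwards.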