Arrow Research

Author name cluster

Min Peng

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

8 papers
1 author row

Possible papers (8)

AAAI 2026 · Conference Paper

HyperGLLM: An Efficient Framework for Endpoint Threat Detection via Hypergraph-Enhanced Large Language Models

  • Hongyi Zhou
  • Jianfeng Pan
  • Min Peng
  • Shaomang Huang
  • Hanzhong Zheng

Endpoint Detection and Response (EDR) systems are a cornerstone of modern threat detection and endpoint protection. However, conventional heuristic- and learning-based approaches often fail to address sophisticated and continuously evolving attack patterns. Recent advances in large language models (LLMs) offer promising capabilities for behavioral analysis in EDR logs, yet their effectiveness is hindered by the high volume of events and the interleaved nature of behavior sequences, posing significant challenges for long-context modeling and stealthy threat detection. To address these issues, we propose HyperGLLM, a novel detection framework that introduces hypergraph reasoning into LLMs. It first constructs an attribute-value level relation-aware graph to model low-order structural semantics while reducing textual redundancy. Then, it introduces a differential hypergraph module with multi-granularity clustering to capture high-order behavioral dependencies embedded in interleaved events and reinforce threat semantics. Finally, the hypergraph representations are aligned with an LLM for efficient contextual reasoning over potential malicious behaviors. To facilitate empirical evaluation, we curate EDR3.6B-63F, a large-scale EDR dataset containing 3.6 billion events across 63 distinct behavior families. Extensive experiments demonstrate that HyperGLLM significantly outperforms state-of-the-art methods by reducing the false alarm rate to 1.67%, achieving 94.65% accuracy across 63 behavior families, and improving the modeling efficiency of LLMs on long EDR logs. Our framework and dataset provide a solid foundation for future research and support the development of advanced detection solutions in endpoint security.
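
To make the two graph-construction steps concrete, here is a minimal Python sketch under an assumed EDR event schema (pid, session, action, and target are invented field names): low-order edges link attribute-value pairs that co-occur within one event, and hyperedges group interleaved events at two granularities. The paper's actual schema, differential clustering module, and LLM alignment are not reproduced here.

```python
# Minimal sketch of relation-aware graph + multi-granularity hyperedges,
# under an assumed EDR event schema (fields here are invented).
from collections import defaultdict

events = [
    {"pid": 101, "session": "s1", "action": "open", "target": "/etc/passwd"},
    {"pid": 101, "session": "s1", "action": "exec", "target": "/bin/sh"},
    {"pid": 202, "session": "s1", "action": "connect", "target": "10.0.0.5:443"},
]

# Low-order: attribute-value nodes, with edges between pairs that co-occur
# inside a single event (deduplicates repeated raw text across events).
nodes, edges = set(), set()
for ev in events:
    pairs = [f"{k}={v}" for k, v in ev.items()]
    nodes.update(pairs)
    edges.update((a, b) for i, a in enumerate(pairs) for b in pairs[i + 1:])

# High-order: hyperedges at two granularities (per-process, per-session),
# each grouping all attribute-value nodes of its member events, so that
# interleaved events from one session end up in a shared hyperedge.
hyperedges = defaultdict(set)
for ev in events:
    for key in (("pid", ev["pid"]), ("session", ev["session"])):
        hyperedges[key].update(f"{k}={v}" for k, v in ev.items())

print(len(nodes), "nodes,", len(edges), "pairwise edges")
for key, members in hyperedges.items():
    print("hyperedge", key, "->", sorted(members))
```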

AAAI 2026 · Conference Paper

LogicCat: A Chain-of-Thought Text-to-SQL Benchmark for Complex Reasoning

  • Tao Liu
  • Xutao Mao
  • Dixuan Zhang
  • Yifan Li
  • Haixin Liu
  • Lulu Kong
  • Jiaming Hou
  • Rui Li

Text-to-SQL is a critical task in natural language processing that aims to transform natural language questions into accurate and executable SQL queries. In real-world scenarios, these reasoning tasks are often accompanied by complex mathematical computations, domain knowledge, and hypothetical reasoning scenarios. However, existing large-scale Text-to-SQL datasets typically focus on business logic and task logic, neglecting critical factors such as vertical domain knowledge, complex mathematical reasoning, and hypothetical reasoning, which are essential for realistically reflecting the reasoning demands of practical data querying and analysis. To bridge this gap, we introduce LogicCat, the first Text-to-SQL benchmark dataset specifically designed for complex reasoning and chain-of-thought parsing, encompassing physics, arithmetic, commonsense, and hypothetical reasoning scenarios. LogicCat comprises 4,038 English questions paired with 12,114 detailed chain-of-thought reasoning steps, spanning 45 databases across diverse domains, significantly surpassing existing datasets in complexity. Experimental results demonstrate that LogicCat substantially raises task difficulty: current state-of-the-art models achieve at most 33.20% execution accuracy, indicating that the task remains exceptionally challenging. The advancement of LogicCat represents a crucial step toward developing systems suitable for real-world enterprise data analysis and autonomous query generation.
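
As a concrete illustration of the execution-accuracy metric reported above, here is a minimal sketch using Python's built-in sqlite3: a predicted query counts as correct only if it returns the same rows as the gold query. The schema and queries are invented for the example and are not drawn from LogicCat.

```python
# Minimal execution-accuracy check: compare result sets of gold vs. predicted
# SQL on the same database. Schema and queries below are invented examples.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orbits (planet TEXT, period_days REAL)")
conn.executemany("INSERT INTO orbits VALUES (?, ?)",
                 [("Mercury", 88.0), ("Venus", 224.7), ("Earth", 365.25)])

def execution_match(pred_sql: str, gold_sql: str) -> bool:
    try:
        pred = sorted(conn.execute(pred_sql).fetchall())
    except sqlite3.Error:
        return False  # unexecutable predictions simply count as wrong
    gold = sorted(conn.execute(gold_sql).fetchall())
    return pred == gold

gold = "SELECT planet FROM orbits WHERE period_days < 365.25"
pred = "SELECT planet FROM orbits WHERE period_days < 300"
print(execution_match(pred, gold))  # True: both return Mercury and Venus
```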

NeurIPS 2025 · Conference Paper

Enhanced Expert Merging for Mixture-of-Experts in Graph Foundation Models

  • Lei Liu
  • Xingyu Xia
  • Qianqian Xie
  • Ben Liu
  • Wenjie Xu
  • Min Peng

Graph foundation models (GFMs) have emerged as a promising paradigm for learning transferable knowledge across diverse graph-structured data. The inherent heterogeneity in features and graph structures poses significant challenges for building scalable and generalizable GFMs. Existing research has employed mixture-of-experts (MoE) models to handle these challenges, assigning the most suitable expert to each graph. Despite this, the underlying mechanisms of MoE within the context of GFMs remain insufficiently explored. In this work, we conduct an in-depth experimental study on an MoE-based GFM and uncover an intriguing finding: the experts ranked second and third assigned by the router perform better than the top-ranked expert. This insight motivates us to investigate the potential of leveraging knowledge embedded across multiple experts. However, directly ensembling the outputs of multiple experts would incur substantial computational overhead, while applying a standard expert merging strategy risks suboptimal performance. To address these challenges, we introduce two enhanced expert merging strategies that retain the computational efficiency of expert merging, while improving performance to approach the effectiveness of expert ensembling. Specifically, we propose (i) a knowledge distillation-inspired expert merging method that aligns the behavior of parameter-fused experts with expert ensembles, and (ii) a theoretical parameter proximity approach that leverages the similarity of expert parameters to approximate ensemble outputs while preserving diversity. Extensive experiments demonstrate that our methods effectively enhance model performance.
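
The trade-off the abstract describes can be shown in a few lines. The sketch below, assuming simple linear experts and a softmax router, contrasts output ensembling (k forward passes) with parameter merging (one pass); the paper's distillation-based alignment and parameter-proximity theory are not reproduced here.

```python
# Ensembling vs. merging for top-k experts, assuming linear experts.
import torch

d, n_experts, k = 8, 4, 3
experts = [torch.nn.Linear(d, d) for _ in range(n_experts)]
router = torch.nn.Linear(d, n_experts)

x = torch.randn(1, d)
gates = torch.softmax(router(x), dim=-1).squeeze(0)
topk = torch.topk(gates, k)
idx = topk.indices.tolist()
w = (topk.values / topk.values.sum()).tolist()  # renormalized gate weights

# Ensembling: run every selected expert, then gate-weight their outputs.
ensemble = sum(wi * experts[i](x) for wi, i in zip(w, idx))

# Merging: fuse the selected experts' parameters once, then run one pass.
merged = torch.nn.Linear(d, d)
with torch.no_grad():
    merged.weight.copy_(sum(wi * experts[i].weight for wi, i in zip(w, idx)))
    merged.bias.copy_(sum(wi * experts[i].bias for wi, i in zip(w, idx)))

# For purely linear experts the two coincide exactly; real experts are
# nonlinear, which is the gap the paper's two strategies aim to close.
print(torch.allclose(ensemble, merged(x), atol=1e-5))
```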

NeurIPS 2025 · Conference Paper

MoodAngels: A Retrieval-augmented Multi-agent Framework for Psychiatry Diagnosis

  • Mengxi Xiao
  • Ben Liu
  • He Li
  • Jimin Huang
  • Qianqian Xie
  • Xiaofen Zong
  • Mang Ye
  • Min Peng

The application of AI in psychiatric diagnosis faces significant challenges, including the subjective nature of mental health assessments, symptom overlap across disorders, and privacy constraints limiting data availability. To address these issues, we present MoodAngels, the first specialized multi-agent framework for mood disorder diagnosis. Our approach combines granular-scale analysis of clinical assessments with a structured verification process, enabling more accurate interpretation of complex psychiatric data. Complementing this framework, we introduce MoodSyn, an open-source dataset of 1,173 synthetic psychiatric cases that preserves clinical validity while ensuring patient privacy. Experimental results demonstrate that MoodAngels outperforms conventional methods, with our baseline agent achieving 12.3% higher accuracy than GPT-4o on real-world cases, and our full multi-agent system delivering further improvements. Together, these contributions provide both an advanced diagnostic tool and a critical research resource for computational psychiatry, bridging important gaps in AI-assisted mental health assessment.
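
As a rough illustration of the granular-analysis-plus-verification control flow described above, here is a minimal sketch; the real MoodAngels agents are LLM-backed and retrieval-augmented, and the scoring rules and thresholds below are invented placeholders, not clinical criteria.

```python
# Diagnose-then-verify control flow; all rules here are invented placeholders.
def assessor(case: dict) -> str:
    # Granular-scale pass: score individual assessment items.
    score = sum(case["items"].values())
    return "mood_disorder" if score >= 10 else "no_diagnosis"

def verifier(case: dict, diagnosis: str) -> bool:
    # Structured verification: also require a minimum symptom duration.
    return diagnosis == "no_diagnosis" or case["duration_weeks"] >= 2

def diagnose(case: dict) -> str:
    diagnosis = assessor(case)
    if not verifier(case, diagnosis):
        return "escalate_for_review"  # verification failed: defer, don't guess
    return diagnosis

case = {"items": {"sleep": 3, "anhedonia": 4, "energy": 4}, "duration_weeks": 1}
print(diagnose(case))  # escalate_for_review
```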

NeurIPS 2024 · Conference Paper

FinBen: A Holistic Financial Benchmark for Large Language Models

  • Qianqian Xie
  • Weiguang Han
  • Zhengyu Chen
  • Ruoyu Xiang
  • Xiao Zhang
  • Yueru He
  • Mengxi Xiao
  • Dong Li

LLMs have transformed NLP and shown promise in various fields, yet their potential in finance is underexplored due to a lack of comprehensive benchmarks, the rapid development of LLMs, and the complexity of financial tasks. In this paper, we introduce FinBen, the first extensive open-source evaluation benchmark, including 42 datasets spanning 24 financial tasks, covering eight critical aspects: information extraction (IE), textual analysis, question answering (QA), text generation, risk management, forecasting, decision-making, and bilingual tasks (English and Spanish). FinBen offers several key innovations: a broader range of tasks and datasets, the first evaluation of stock trading, novel agent and Retrieval-Augmented Generation (RAG) evaluation, and two novel datasets for regulations and stock trading. Our evaluation of 21 representative LLMs, including GPT-4, ChatGPT, and the latest Gemini, reveals several key findings: While LLMs excel in IE and textual analysis, they struggle with advanced reasoning and complex tasks like text generation and forecasting. GPT-4 excels in IE and stock trading, while Gemini is better at text generation and forecasting. Instruction-tuned LLMs improve textual analysis but offer limited benefits for complex tasks such as QA. FinBen has been used to host the first financial LLMs shared task at the FinNLP-AgentScen workshop during IJCAI-2024, attracting 12 teams. Their novel solutions outperformed GPT-4, showcasing FinBen's potential to drive innovations in financial LLMs. All datasets and code are publicly available for the research community, with results shared and updated regularly on the Open Financial LLM Leaderboard.
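
For readers implementing a comparable harness, here is a minimal sketch of scoring a model across many datasets and aggregating by aspect; the task names, scores, and the evaluate stub are placeholders, not FinBen's actual datasets or metrics.

```python
# Benchmark-harness skeleton: per-dataset scores rolled up by aspect.
# The registry and evaluate() stub below are invented placeholders.
from collections import defaultdict
from statistics import mean

def evaluate(model, task):  # stand-in for a real per-dataset metric
    return {"ner_finance": 0.71, "sentiment": 0.83, "stock_trading": 0.42}[task]

TASKS = {  # hypothetical aspect -> datasets mapping
    "information_extraction": ["ner_finance"],
    "textual_analysis": ["sentiment"],
    "decision_making": ["stock_trading"],
}

by_aspect = defaultdict(list)
for aspect, datasets in TASKS.items():
    for name in datasets:
        by_aspect[aspect].append(evaluate("my-llm", name))

for aspect, scores in by_aspect.items():
    print(f"{aspect}: {mean(scores):.2f}")
```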

AAAI 2023 · Conference Paper

Efficient End-to-End Video Question Answering with Pyramidal Multimodal Transformer

  • Min Peng
  • Chongyang Wang
  • Yu Shi
  • Xiang-Dong Zhou

This paper presents a new method for end-to-end Video Question Answering (VideoQA), in contrast to the current popularity of large-scale pre-training with huge feature extractors. We achieve this with a pyramidal multimodal transformer (PMT) model, which incorporates only a learnable word embedding layer and a few convolutional and transformer layers. We use an anisotropic pyramid to fulfill video-language interactions across different spatio-temporal scales. In addition to the canonical pyramid, which includes both bottom-up and top-down pathways with lateral connections, novel strategies are proposed to decompose the visual feature stream into spatial and temporal sub-streams at different scales and implement their interactions with the linguistic semantics while preserving the integrity of local and global semantics. We demonstrate better or on-par performance with high computational efficiency against state-of-the-art methods on five VideoQA benchmarks. Our ablation study shows the scalability of our model, which achieves competitive results for text-to-video retrieval by leveraging feature extractors with reusable pre-trained weights, as well as the effectiveness of the pyramid. Code available at: https://github.com/Trunpm/PMT-AAAI23.
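
The canonical pyramid the abstract mentions, bottom-up pooling plus a top-down pathway with lateral connections, can be sketched in PyTorch over a 1-D stream of frame features. PMT's spatial/temporal sub-stream decomposition and language interaction are omitted, and all shapes and layer sizes are illustrative.

```python
# Canonical feature pyramid over temporal frame features; shapes illustrative.
import torch
import torch.nn.functional as F

frames = torch.randn(1, 256, 32)  # (batch, channels, 32 frame features)

# Bottom-up: coarser temporal scales via strided pooling.
bottom_up = [frames]
for _ in range(2):
    bottom_up.append(F.avg_pool1d(bottom_up[-1], kernel_size=2))

# Top-down: upsample the coarser level and fuse via a lateral 1x1 conv.
lateral = torch.nn.ModuleList(torch.nn.Conv1d(256, 256, 1) for _ in bottom_up)
top_down = [lateral[-1](bottom_up[-1])]
for level in range(len(bottom_up) - 2, -1, -1):
    up = F.interpolate(top_down[0], scale_factor=2, mode="nearest")
    top_down.insert(0, lateral[level](bottom_up[level]) + up)

for t in top_down:
    print(t.shape)  # multiscale features a QA head could attend over
```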

NeurIPS 2023 · Conference Paper

PIXIU: A Comprehensive Benchmark, Instruction Dataset and Large Language Model for Finance

  • Qianqian Xie
  • Weiguang Han
  • Xiao Zhang
  • Yanzhao Lai
  • Min Peng
  • Alejandro Lopez-Lira
  • Jimin Huang

Although large language models (LLMs) have shown great performance in natural language processing (NLP) in the financial domain, there are no publicly available financially tailored LLMs, instruction tuning datasets, or evaluation benchmarks, all of which are critical for continually pushing forward the open-source development of financial artificial intelligence (AI). This paper introduces PIXIU, a comprehensive framework including the first financial LLM based on fine-tuning LLaMA with instruction data, the first instruction dataset with 128K samples to support the fine-tuning, and an evaluation benchmark with 8 tasks and 15 datasets. We first construct the large-scale multi-task instruction data considering a variety of financial tasks, financial document types, and financial data modalities. We then propose a financial LLM called FinMA by fine-tuning LLaMA with the constructed dataset, enabling it to follow instructions for various financial tasks. To support the evaluation of financial LLMs, we propose a standardized benchmark that covers a set of critical financial tasks, including six financial NLP tasks and two financial prediction tasks. With this benchmark, we conduct a detailed analysis of FinMA and several existing LLMs, uncovering their strengths and weaknesses in handling critical financial tasks. The model, datasets, benchmark, and experimental results are open-sourced to facilitate future research in financial AI.
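
As an illustration of the kind of instruction-tuning records a model like FinMA is trained on, here is a minimal sketch that turns financial NLP examples into prompt/response pairs; the field names and templates are invented for the example and are not PIXIU's released schema.

```python
# Building instruction-tuning records from financial NLP examples.
# Templates and field names are illustrative, not PIXIU's actual format.
import json

TEMPLATES = {
    "sentiment": "Classify the sentiment of this financial text as "
                 "positive, negative, or neutral.\nText: {text}",
    "headline_qa": "Answer the question about this headline.\n"
                   "Headline: {text}\nQuestion: {question}",
}

def to_instruction_record(task, example):
    prompt = TEMPLATES[task].format(**example)
    return {"instruction": prompt, "output": example["label"]}

example = {"text": "Shares plunged 12% after the earnings miss.",
           "label": "negative"}
print(json.dumps(to_instruction_record("sentiment", example), indent=2))
```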

IJCAI 2022 · Conference Paper

Multilevel Hierarchical Network with Multiscale Sampling for Video Question Answering

  • Min Peng
  • Chongyang Wang
  • Yuan Gao
  • Yu Shi
  • Xiang-Dong Zhou

Video question answering (VideoQA) is challenging given its multimodal combination of visual understanding and natural language processing. Most existing approaches ignore visual appearance-motion information at different temporal scales, and it remains unclear how to incorporate the multilevel processing capacity of a deep learning model with such multiscale information. Targeting these issues, this paper proposes a novel Multilevel Hierarchical Network (MHN) with multiscale sampling for VideoQA. MHN comprises two modules, namely Recurrent Multimodal Interaction (RMI) and Parallel Visual Reasoning (PVR). With multiscale sampling, RMI iterates the interaction of appearance-motion information at each scale with the question embeddings to build multilevel question-guided visual representations. Then, with a shared transformer encoder, PVR infers the visual cues at each level in parallel, suiting different question types that may rely on visual information at different levels. Through extensive experiments on three VideoQA datasets, we demonstrate improved performance over previous state-of-the-art methods and justify the effectiveness of each part of our method.
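
A minimal PyTorch sketch of the two stages under simplified shapes: multiscale sampling with question-guided gating standing in for RMI, and one shared encoder applied to every level in parallel standing in for PVR. All dimensions and the gating form are illustrative assumptions, not the paper's exact architecture.

```python
# Multiscale question-guided levels + one shared encoder; shapes illustrative.
import torch

d = 64
video = torch.randn(1, 16, d)    # (batch, 16 clip features, dim)
question = torch.randn(1, 1, d)  # pooled question embedding

# RMI (simplified): sample the clip sequence at three temporal scales and
# let the question gate each level's features.
levels = []
for stride in (1, 2, 4):
    sampled = video[:, ::stride, :]
    gate = torch.sigmoid((sampled * question).sum(-1, keepdim=True))
    levels.append(sampled * gate)  # question-guided representation

# PVR (simplified): a single shared encoder reasons over every level, so
# answers can draw on whichever temporal granularity a question needs.
shared = torch.nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
cues = [shared(level).mean(dim=1) for level in levels]
print([c.shape for c in cues])  # one visual cue vector per level
```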