Arrow Research search

Author name cluster

Rui Meng

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

8 papers
2 author rows

Possible papers (8)

AAAI 2026 · Conference Paper

Benchmarking LLMs for Political Science: A United Nations Perspective

  • Yueqing Liang
  • Liangwei Yang
  • Chen Wang
  • Congying Xia
  • Rui Meng
  • Xiongxiao Xu
  • Haoran Wang
  • Ali Payani

Large Language Models (LLMs) have achieved significant advances in natural language processing, yet their potential for high-stakes political decision-making remains largely unexplored. This paper addresses this gap by focusing on the application of LLMs to the United Nations (UN) decision-making process, where the stakes are particularly high and political decisions can have far-reaching consequences. We introduce a novel dataset comprising publicly available UN Security Council (UNSC) records from 1994 to 2024, including draft resolutions, voting records, and diplomatic speeches. Using this dataset, we propose the United Nations Benchmark (UNBench), the first comprehensive benchmark designed to evaluate LLMs across four interconnected political science tasks: co-penholder judgment, representative voting simulation, draft adoption prediction, and representative statement generation. These tasks span the three stages of the UN decision-making process (drafting, voting, and discussing) and aim to assess LLMs' ability to understand and simulate political dynamics. Our experimental analysis demonstrates the potential and challenges of applying LLMs in this domain, providing insights into their strengths and limitations in political science. To the best of our knowledge, this is the first benchmark to systematically evaluate LLMs in UN decision-making, contributing to the growing intersection of AI and political science.
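
Editor's illustration only: the exact task format used in UNBench is not given in this listing, but a representative-voting-style query can be framed as a simple prompt-and-parse call to a language model. Here `llm` is a hypothetical callable that returns the model's text response; the real benchmark's prompts and labels may differ.

```python
def simulate_vote(llm, country: str, draft_text: str) -> str:
    """Hypothetical sketch of a voting-simulation query (not UNBench's actual format)."""
    prompt = (
        f"You are the UN Security Council representative of {country}.\n"
        f"Draft resolution:\n{draft_text}\n\n"
        "How does your country vote? Answer with exactly one of: "
        "IN FAVOUR, AGAINST, ABSTAIN."
    )
    answer = llm(prompt).strip().upper()
    for label in ("IN FAVOUR", "AGAINST", "ABSTAIN"):
        if label in answer:
            return label
    return "UNPARSED"  # the model's reply did not contain a recognizable vote
```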

AAAI 2026 · Conference Paper

SMPRO: Self-Supervised Visual Preference Alignment via Differentiable Multi-Preference Multi-Group Ranking

  • Sirnam Swetha
  • Rui Meng
  • Shwetha Ram
  • Tal Neiman
  • Son Tran
  • Mubarak Shah

Direct Preference Optimization (DPO) has emerged as a simple and effective approach for aligning models with human preferences. However, existing DPO-based methods suffer from three key drawbacks: they rely on only a single positive-negative preference pair per question, restricting the diversity and richness of feedback; they often emphasize minimizing negative preference scores while neglecting to strengthen the positive preferences; and they depend on either human-annotated preferences or expert model outputs, both of which are expensive and difficult to scale. Moreover, the deterministic ranking assumptions of recent group-based preference optimization methods break down in open-ended tasks such as Visual Question Answering (VQA), where multiple answers can be equally plausible but differ subtly in relevance or specificity. Given this subtle variance in preferences, we propose to perform ranking over groups of preferences rather than relying on fine-grained ranking of individual ones, which is often noisy and subjective. To address these challenges, we introduce Self-Supervised Visual Preference Alignment via Differentiable Multi-Preference Multi-Group Ranking (SMPRO), a novel framework that (1) self-generates rich, diverse preference groups while eliminating the need for external annotations, (2) employs a fully differentiable ranking objective based on sorting networks to capture nuanced preference gradients across arbitrary numbers of preferences both within and across these groups, and (3) incorporates multiple positive preferences to enrich the positive preference group, capturing subtle distinctions among high-quality preferences. Extensive experiments across diverse visual tasks show that our approach achieves state-of-the-art performance in the self-supervised setting. Specifically, our model surpasses existing baselines, achieving notable gains such as 82.4% on MM-Bench, 63.2% on MMStar, 94.6% on LLaVA-W, and 81.9% on AI2D. These results underscore the effectiveness of our approach in capturing richer preference signals and demonstrate its scalability for open-ended, ambiguous VQA tasks.
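
The abstract does not spell out the ranking objective, so the following is only a minimal PyTorch sketch of the general idea of differentiable ranking over preference groups. `soft_rank`, `group_ranking_loss`, and the margin formulation are illustrative assumptions, not the paper's sorting-network objective.

```python
import torch
import torch.nn.functional as F

def soft_rank(scores: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """Differentiable surrogate rank: roughly 1 for the highest score, len(scores) for the lowest."""
    diff = scores.unsqueeze(0) - scores.unsqueeze(1)   # diff[i, j] = s_j - s_i
    return torch.sigmoid(diff / tau).sum(dim=1) + 0.5  # ~1 + #{j : s_j > s_i}

def group_ranking_loss(scores: torch.Tensor, group_ids: torch.Tensor,
                       margin: float = 1.0) -> torch.Tensor:
    """Encourage answers in a more-preferred group (smaller group_id) to be
    soft-ranked above answers in any less-preferred group."""
    ranks = soft_rank(scores)
    loss, pairs = scores.new_zeros(()), 0
    for g_pos in group_ids.unique():
        for g_neg in group_ids.unique():
            if g_pos >= g_neg:          # only compare preferred vs. less-preferred groups
                continue
            pos_rank = ranks[group_ids == g_pos].mean()
            neg_rank = ranks[group_ids == g_neg].mean()
            loss = loss + F.relu(pos_rank - neg_rank + margin)
            pairs += 1
    return loss / max(pairs, 1)

# Toy usage: five candidate answers, group 0 preferred over group 1.
scores = torch.tensor([2.1, 1.8, 0.3, 0.9, -0.5], requires_grad=True)
groups = torch.tensor([0, 0, 1, 1, 1])
group_ranking_loss(scores, groups).backward()
```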

TMLR 2026 · Journal Article

VLM2Vec-V2: Advancing Multimodal Embedding for Videos, Images, and Visual Documents

  • Rui Meng
  • Ziyan Jiang
  • Ye Liu
  • Mingyi Su
  • Xinyi Yang
  • Yuepeng Fu
  • Can Qin
  • Raghuveer Thirukovalluru

Multimodal embedding models have been crucial in enabling various downstream tasks such as semantic similarity, information retrieval, and clustering over different modalities. However, existing multimodal embedding models such as VLM2Vec, E5-V, and GME are predominantly focused on natural images, with limited support for other visual forms such as videos and visual documents. This restricts their applicability in real-world scenarios, including AI agents, retrieval-augmented generation (RAG) systems, and recommendation systems. To close this gap, we propose VLM2Vec-V2, a unified framework for learning embeddings across diverse visual forms. First, we introduce MMEB-V2, a comprehensive benchmark that extends MMEB with five new task types: visual document retrieval, video retrieval, temporal grounding, video classification, and video question answering, spanning text, image, video, and visual document inputs. Next, we train VLM2Vec-V2, a general-purpose embedding model that supports text, image, video, and visual document inputs. Extensive experiments show that VLM2Vec-V2 not only achieves strong performance on the newly introduced video and document retrieval tasks but also improves over prior baselines on the original image benchmarks. Through extensive evaluation, our study offers insights into the generalizability of various multimodal embedding models and highlights effective strategies for unified embedding learning, laying the groundwork for more scalable and adaptable representation learning in both research and real-world settings.
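
No API details appear in this listing; purely as an illustration of how a unified multimodal embedding model is typically used for retrieval, here is a small NumPy sketch. `model`, `encode`, and the instruction string are hypothetical stand-ins, not the released VLM2Vec-V2 interface.

```python
import numpy as np

def encode(model, instruction: str, content) -> np.ndarray:
    """Hypothetical wrapper: return an L2-normalized embedding for an
    (instruction, content) pair, where content may be text, an image,
    video frames, or a rendered document page."""
    vec = np.asarray(model(instruction, content), dtype=np.float32)  # stand-in call
    return vec / np.linalg.norm(vec)

def retrieve(model, query: str, candidates, k: int = 5,
             instruction: str = "Find the item that matches the query."):
    """Rank candidates of any modality against a text query by cosine similarity."""
    q = encode(model, instruction, query)
    cand = np.stack([encode(model, instruction, c) for c in candidates])
    scores = cand @ q                        # cosine similarity (unit-norm vectors)
    top = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in top]
```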

ICLR 2025 · Conference Paper

Beyond Content Relevance: Evaluating Instruction Following in Retrieval Models

  • Jianqun Zhou
  • Yuanlei Zheng
  • Wei Chen
  • Qianqian Zheng
  • Zeyuan Shang
  • Wei Zhang 0185
  • Rui Meng
  • Xiaoyu Shen 0001

Instruction-following capabilities in large language models (LLMs) have progressed significantly, enabling more complex user interactions through detailed prompts. However, retrieval systems have not matched these advances; most still rely on traditional lexical and semantic matching techniques that fail to fully capture user intent. Recent efforts have introduced instruction-aware retrieval models, but these primarily focus on intrinsic content relevance and neglect customized preferences over broader document-level attributes. This study evaluates the instruction-following capabilities of various retrieval models beyond content relevance, including LLM-based dense retrieval and reranking models. We develop InfoSearch, a novel retrieval evaluation benchmark spanning six document-level attributes: Audience, Keyword, Format, Language, Length, and Source, and introduce two novel metrics, Strict Instruction Compliance Ratio (SICR) and Weighted Instruction Sensitivity Evaluation (WISE), to accurately assess the models' responsiveness to instructions. Our findings indicate that although fine-tuning models on instruction-aware retrieval datasets and increasing model size enhance performance, most models still fall short of instruction compliance. We release our dataset and code at https://github.com/EIT-NLP/InfoSearch.

NeurIPS 2025 · Conference Paper

Breaking the Batch Barrier (B3) of Contrastive Learning via Smart Batch Mining

  • Raghuveer Thirukovalluru
  • Rui Meng
  • Ye Liu
  • Karthikeyan K
  • Mingyi Su
  • Ping Nie
  • Semih Yavuz
  • Yingbo Zhou

Contrastive learning (CL) is a prevalent technique for training embedding models, which pulls semantically similar examples (positives) closer in the representation space while pushing dissimilar ones (negatives) further apart. A key source of negatives is "in-batch" examples, i.e., positives from other examples in the batch. The effectiveness of such models is therefore strongly influenced by the size and quality of training batches. In this work, we propose Breaking the Batch Barrier (B3), a novel batch construction strategy designed to curate high-quality batches for CL. Our approach begins by using a pretrained teacher embedding model to rank all examples in the dataset, from which a sparse similarity graph is constructed. A community detection algorithm is then applied to this graph to identify clusters of examples that serve as strong negatives for one another. The clusters are then used to construct batches that are rich in in-batch negatives. Empirical results on the MMEB multimodal embedding benchmark (36 tasks) demonstrate that our method sets a new state of the art, outperforming previous best methods by +1.3 and +2.9 points at the 7B and 2B model scales, respectively. Notably, models trained with B3 surpass existing state-of-the-art results even with a batch size as small as 64, which is 4–16× smaller than that required by other methods. Moreover, experiments show that B3 generalizes well across domains and tasks, maintaining strong performance even when trained with considerably weaker teachers.
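
The abstract names the pipeline (teacher ranking, sparse similarity graph, community detection, batch filling) without further detail. The sketch below is one plausible reading of it, using greedy modularity from networkx as a stand-in for whichever community-detection algorithm B3 actually uses; it is not the paper's implementation.

```python
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def mine_batches(teacher_embeddings: np.ndarray, batch_size: int, top_k: int = 10):
    """Connect each example to its top-k neighbours under the teacher embedding,
    find communities in the resulting sparse graph, and fill batches community
    by community so that in-batch examples act as hard negatives for each other."""
    emb = teacher_embeddings / np.linalg.norm(teacher_embeddings, axis=1, keepdims=True)
    sims = emb @ emb.T
    np.fill_diagonal(sims, -np.inf)          # never link an example to itself

    graph = nx.Graph()
    graph.add_nodes_from(range(len(emb)))
    for i in range(len(emb)):
        for j in np.argpartition(-sims[i], top_k)[:top_k]:   # keep the graph sparse
            graph.add_edge(i, int(j), weight=float(max(sims[i, j], 0.0)))

    batches, current = [], []
    for community in greedy_modularity_communities(graph, weight="weight"):
        for idx in community:
            current.append(idx)
            if len(current) == batch_size:
                batches.append(current)
                current = []
    if current:
        batches.append(current)              # last, possibly partial, batch
    return batches
```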

ICLR 2025 · Conference Paper

VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks

  • Ziyan Jiang
  • Rui Meng
  • Xinyi Yang 0002
  • Semih Yavuz
  • Yingbo Zhou 0002
  • Wenhu Chen

Embedding models play a crucial role in a variety of downstream tasks, including semantic similarity, information retrieval, and clustering. While there has been a surge of interest in developing universal text embedding models that generalize across tasks (e.g., MTEB), progress in learning universal multimodal embedding models has been comparatively slow, despite their importance and practical applications. In this work, we explore the potential of building universal multimodal embeddings capable of handling a broad range of downstream tasks. Our contributions are twofold: (1) we propose MMEB (Massive Multimodal Embedding Benchmark), which covers four meta-tasks (classification, visual question answering, multimodal retrieval, and visual grounding) and 36 datasets, including 20 training datasets and 16 evaluation datasets spanning both in-distribution and out-of-distribution tasks, and (2) VLM2Vec (Vision-Language Model → Vector), a contrastive training framework that transforms any vision-language model into an embedding model through contrastive training on MMEB. Unlike previous models such as CLIP and BLIP, which encode text and images independently without task-specific guidance, VLM2Vec can process any combination of images and text while incorporating task instructions to generate a fixed-dimensional vector. We develop a series of VLM2Vec models based on state-of-the-art VLMs, including Phi-3.5-V, LLaVA-1.6, and Qwen2-VL, and evaluate them on MMEB. With LoRA tuning, VLM2Vec achieves a 10% to 20% improvement over existing multimodal embedding models on MMEB's evaluation sets. Our findings reveal that VLMs are surprisingly strong embedding models.
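
The abstract describes contrastive training that turns a VLM into an instruction-conditioned embedding model. As a hedged sketch of that idea, the snippet below shows the standard in-batch InfoNCE objective; `encode_query` and `encode_target` only indicate where the VLM-backed encoders would plug in and are not VLM2Vec APIs.

```python
import torch
import torch.nn.functional as F

def info_nce(query_vecs: torch.Tensor, target_vecs: torch.Tensor,
             temperature: float = 0.05) -> torch.Tensor:
    """Standard in-batch InfoNCE: the i-th query should match the i-th target;
    every other target in the batch serves as a negative."""
    q = F.normalize(query_vecs, dim=-1)
    t = F.normalize(target_vecs, dim=-1)
    logits = q @ t.T / temperature                     # (B, B) similarity matrix
    labels = torch.arange(q.size(0), device=q.device)  # positives sit on the diagonal
    return F.cross_entropy(logits, labels)

# Hypothetical usage: encode_query/encode_target wrap the VLM and pool its last
# hidden state into one fixed-dimensional vector per example.
# query_vecs  = encode_query(task_instruction, images, text)      # (B, d)
# target_vecs = encode_target(candidate_images, candidate_text)   # (B, d)
# loss = info_nce(query_vecs, target_vecs)
```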

ICLR 2024 · Conference Paper

Decoupled Marked Temporal Point Process using Neural Ordinary Differential Equations

  • Yujee Song
  • Donghyun Lee 0006
  • Rui Meng
  • Won Hwa Kim

A Marked Temporal Point Process (MTPP) is a stochastic process whose realization is a set of event-time data. MTPPs are often used to understand the complex dynamics of asynchronous temporal events such as money transactions, social media, and healthcare. Recent studies have utilized deep neural networks to capture complex temporal dependencies of events and generate embeddings that aptly represent the observed events. While most previous studies focus on inter-event dependencies and their representations, how individual events influence the overall dynamics over time has been under-explored. To this end, we propose a Decoupled MTPP framework that disentangles the characterization of a stochastic process into a set of evolving influences from different events. Our approach employs Neural Ordinary Differential Equations (Neural ODEs) to learn flexible continuous dynamics of these influences while simultaneously addressing multiple inference problems, such as density estimation and survival rate computation. We emphasize the significance of disentangling the influences by comparing our framework with state-of-the-art methods on real-life datasets, and we provide an analysis of the model's behavior for potential applications.
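
A minimal sketch of the general idea under stated assumptions: each past event keeps its own influence state, a shared neural ODE evolves that state over time (approximated here with Euler steps rather than a proper ODE solver), and the event intensity is read out from the summed influences. The module, its dimensions, and the readout are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

class DecoupledInfluence(nn.Module):
    """Rough sketch: one evolving influence state per observed event."""

    def __init__(self, dim: int = 16, num_marks: int = 10):
        super().__init__()
        self.dynamics = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, dim))
        self.init_state = nn.Embedding(num_marks, dim)  # initial influence per event mark
        self.readout = nn.Linear(dim, 1)

    def evolve(self, state: torch.Tensor, dt: float, steps: int = 10) -> torch.Tensor:
        # Euler integration of dh/dt = f(h) as a stand-in for a Neural ODE solver.
        h = state
        for _ in range(steps):
            h = h + (dt / steps) * self.dynamics(h)
        return h

    def intensity(self, marks: torch.Tensor, event_times: torch.Tensor, t: float) -> torch.Tensor:
        # Each observed event (mark, time) contributes an influence evolved from
        # its own initial state up to the query time t.
        influences = [self.evolve(self.init_state(m), float(t - t_i))
                      for m, t_i in zip(marks, event_times)]
        total = torch.stack(influences).sum(dim=0)
        return torch.nn.functional.softplus(self.readout(total))

model = DecoupledInfluence()
marks = torch.tensor([1, 3, 2])              # event types observed so far
times = torch.tensor([0.5, 1.2, 2.0])
lam = model.intensity(marks, times, t=2.5)   # intensity shortly after the last event
```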

AAAI 2022 · Conference Paper

Unsupervised Deep Keyphrase Generation

  • Xianjie Shen
  • Yinghan Wang
  • Rui Meng
  • Jingbo Shang

Keyphrase generation aims to summarize long documents with a collection of salient phrases. Deep neural models have demonstrated remarkable success in this task, with the capability of predicting keyphrases that are even absent from a document. However, such abstractiveness is acquired at the expense of a substantial amount of annotated data. In this paper, we present a novel method for keyphrase generation, AutoKeyGen, without the supervision of any annotated document-keyphrase pairs. Motivated by the observation that an absent keyphrase in a document may appear in other places, in whole or in part, we construct a phrase bank by pooling all phrases extracted from a corpus. With this phrase bank, we assign phrase candidates to new documents by a simple partial matching algorithm, and then we rank these candidates by their relevance to the document from both lexical and semantic perspectives. Moreover, we bootstrap a deep generative model using these top-ranked pseudo keyphrases to produce more absent candidates. Extensive experiments demonstrate that AutoKeyGen outperforms all unsupervised baselines and can even beat a strong supervised method in certain cases.
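
As a rough illustration of the extraction-side pipeline described above (phrase bank, partial matching, lexical-plus-semantic ranking), the following sketch uses simple word overlap for the partial match and a hypothetical `embed` function (any sentence encoder returning unit-norm NumPy vectors would do) for the semantic score; the bootstrapped generative model is omitted.

```python
from collections import Counter

def build_phrase_bank(corpus_phrases):
    """Pool every phrase extracted from the corpus into one bank (with counts)."""
    return Counter(p.lower() for doc in corpus_phrases for p in doc)

def candidate_phrases(document: str, phrase_bank, min_overlap: int = 1):
    """Partial-matching stand-in: a bank phrase becomes a candidate if at least
    min_overlap of its words occur in the document, so phrases that never appear
    verbatim can still be proposed."""
    doc_words = set(document.lower().split())
    candidates = []
    for phrase in phrase_bank:
        overlap = sum(w in doc_words for w in phrase.split())
        if overlap >= min_overlap:
            candidates.append((phrase, overlap))
    return candidates

def rank_candidates(document, candidates, embed, top_k=10):
    """Rank candidates by lexical overlap plus a semantic similarity score."""
    doc_vec = embed(document)
    scored = [(phrase, overlap + float(doc_vec @ embed(phrase)))
              for phrase, overlap in candidates]
    return sorted(scored, key=lambda x: -x[1])[:top_k]
```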