Arrow Research search

Author name cluster

Ryan A. Rossi

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

30 papers
2 author rows

Possible papers

30

AAAI Conference 2026 Conference Paper

VipAct: Visual-Perception Enhancement via Specialized VLM Agent Collaboration and Tool-use

  • Zhehao Zhang
  • Ryan A. Rossi
  • Tong Yu
  • Franck Dernoncourt
  • Ruiyi Zhang
  • Jiuxiang Gu
  • Sungchul Kim
  • Xiang Chen

While vision-language models (VLMs) have demonstrated remarkable performance across various tasks combining textual and visual information, they continue to struggle with fine-grained visual perception tasks that require detailed pixel-level analysis. Effectively eliciting comprehensive reasoning from VLMs on such intricate visual elements remains an open challenge. In this paper, we present VipAct, an agent framework that enhances VLMs by integrating multi-agent collaboration and vision expert models, enabling more precise visual understanding and comprehensive reasoning. VipAct consists of an orchestrator agent, which manages task requirement analysis, planning, and coordination, along with specialized agents that handle specific tasks such as image captioning and vision expert models that provide high-precision perceptual information. This multi-agent approach allows VLMs to better perform fine-grained visual perception tasks by synergizing planning, reasoning, and tool use. We evaluate VipAct on benchmarks featuring a diverse set of visual perception tasks, with experimental results demonstrating significant performance improvements over state-of-the-art baselines across all tasks. Furthermore, comprehensive ablation studies reveal the critical role of multi-agent collaboration in eliciting more detailed System-2 reasoning and highlight the importance of image input for task planning. Additionally, our error analysis identifies patterns of VLMs' inherent limitations in visual perception, providing insights into potential future improvements. VipAct offers a flexible and extensible framework, paving the way for more advanced visual perception systems across various real-world applications.

ICLR Conference 2025 Conference Paper

A Large-scale Training Paradigm for Graph Generative Models

  • Yu Wang 0160
  • Ryan A. Rossi
  • Namyong Park 0001
  • Huiyuan Chen
  • Nesreen K. Ahmed
  • Puja Trivedi
  • Franck Dernoncourt
  • Danai Koutra

Large Generative Models (LGMs) such as GPT, Stable Diffusion, Sora, and Suno are trained on a huge amount of text, images, videos, and audio that is extremely diverse and drawn from numerous domains. This large-scale training paradigm on diverse, well-curated data enhances the creativity and diversity of the generated content. However, all previous graph generative models (e.g., GraphRNN, MDVAE, MoFlow, GDSS, and DiGress) have been trained on only one dataset at a time, which cannot replicate the revolutionary success achieved by LGMs in other fields. To remedy this crucial gap, we propose a large-scale training paradigm that uses a large corpus of graphs (over 5000 graphs) from 13 domains, leading to the development of large graph generative models (LGGMs). We empirically demonstrate that the pre-trained LGGMs have superior zero-shot generative capability to existing graph generative models. Furthermore, our pre-trained LGGMs can be easily fine-tuned with graphs from target domains and demonstrate even better performance than models trained directly from scratch, serving as a solid starting point for real-world customization. Inspired by Stable Diffusion, we further equip LGGMs with Text-to-Graph generation capability, such as providing a description of the network name and domain (e.g., "The power-1138-bus graph represents a network of buses in a power distribution system.") and network statistics (e.g., "The graph has a low average degree, suitable for modeling social media interactions."). This Text-to-Graph capability integrates the extensive world knowledge of the underlying language model, offering users fine-grained control over the generated graphs. We release the code, the model checkpoint, and the datasets at https://github.com/KINDLab-Fly/LGGM.

TMLR Journal 2025 Journal Article

CodeLutra: Boosting LLM Code Generation via Preference-Guided Refinement

  • Leitian Tao
  • Xiang Chen
  • Tong Yu
  • Tung Mai
  • Ryan A. Rossi
  • Yixuan Li
  • Saayan Mitra

Large Language Models (LLMs) have revolutionized code generation but require significant resources and tend to over-generalize, limiting their task-specific efficiency. Fine-tuning smaller, open-source LLMs is a cost-effective alternative, yet standard supervised approaches rely solely on correct examples, overlooking valuable insights from failures. We introduce CodeLutra, a new framework that leverages both correct and incorrect code attempts. Instead of purely instructing with correct solutions, CodeLutra uses iterative preference-based refinement, comparing successful and failed outputs to better approximate desired results. This process narrows the performance gap with state-of-the-art, larger models, without requiring massive datasets or auxiliary models. For example, on a challenging data science coding task, using only 500 samples improved Llama-3-8B’s accuracy from 28.2% to 48.6%, approaching GPT-4’s level. By capitalizing on both successes and mistakes, CodeLutra offers a scalable, efficient path to high-quality code generation, making smaller open-source models more competitive with leading closed-source alternatives.
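The data-construction step described above, pairing successful and failed attempts on the same task, can be sketched in a few lines. The input format and function name below are illustrative, not CodeLutra's actual interface:

```python
def build_preference_pairs(attempts):
    """Pair each failed code attempt with a successful attempt on the same
    prompt, yielding (prompt, chosen, rejected) triples suitable for
    preference-based fine-tuning.

    `attempts` maps a prompt to a list of (code, passed) tuples; this
    format is illustrative, not CodeLutra's actual interface.
    """
    pairs = []
    for prompt, results in attempts.items():
        passed = [code for code, ok in results if ok]
        failed = [code for code, ok in results if not ok]
        for good in passed:
            for bad in failed:
                pairs.append((prompt, good, bad))
    return pairs
```

Each triple can then feed a DPO-style objective that pushes the model toward the chosen completion and away from the rejected one.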

ICML Conference 2025 Conference Paper

Fully Dynamic Embedding into ℓp Spaces

  • Kiarash Banihashem
  • Xiang Chen 0010
  • MohammadTaghi Hajiaghayi
  • Sungchul Kim
  • Kanak Mahadik
  • Ryan A. Rossi
  • Tong Yu 0001

Metric embeddings are fundamental in machine learning, enabling similarity search, dimensionality reduction, and representation learning. They underpin modern architectures like transformers and large language models, facilitating scalable training and improved generalization. Theoretically, the classic problem in embedding design is mapping arbitrary metrics into $\ell_p$ spaces while approximately preserving pairwise distances. We study this problem in a fully dynamic setting, where the underlying metric is a graph metric subject to edge insertions and deletions. Our goal is to maintain an efficient embedding after each update. We present the first fully dynamic algorithm for this problem, achieving $O(\log(n))^{2q} O(\log(nW))^{q-1}$ expected distortion with $O(m^{1/q + o(1)})$ update time and $O(q \log(n) \log(nW))$ query time, where $q \ge 2$ is an integer parameter.

ICML Conference 2025 Conference Paper

NoLiMa: Long-Context Evaluation Beyond Literal Matching

  • Ali Modarressi
  • Hanieh Deilamsalehy
  • Franck Dernoncourt
  • Trung Bui
  • Ryan A. Rossi
  • Seunghyun Yoon 0002
  • Hinrich Schütze

Recent large language models (LLMs) support long contexts ranging from 128K to 1M tokens. A popular method for evaluating these capabilities is the needle-in-a-haystack (NIAH) test, which involves retrieving a "needle" (relevant information) from a "haystack" (long irrelevant context). Extensions of this approach include increasing distractors, fact chaining, and in-context reasoning. However, in these benchmarks, models can exploit existing literal matches between the needle and haystack to simplify the task. To address this, we introduce NoLiMa, a benchmark extending NIAH with a carefully designed needle set, where questions and needles have minimal lexical overlap, requiring models to infer latent associations to locate the needle within the haystack. We evaluate 13 popular LLMs that claim to support contexts of at least 128K tokens. While they perform well in short contexts (<1K), performance degrades significantly as context length increases. At 32K, for instance, 11 models drop below 50% of their strong short-length baselines. Even GPT-4o, one of the top-performing exceptions, experiences a reduction from an almost-perfect baseline of 99.3% to 69.7%. Our analysis suggests these declines stem from the increased difficulty the attention mechanism faces in longer contexts when literal matches are absent, making it harder to retrieve relevant information. Even models enhanced with reasoning capabilities or CoT prompting struggle to maintain performance in long contexts. We publicly release the dataset and evaluation code at https://github.com/adobe-research/NoLiMa.

AAAI Conference 2025 Conference Paper

Numerical Pruning for Efficient Autoregressive Models

  • Xuan Shen
  • Zhao Song
  • Yufa Zhou
  • Bo Chen
  • Jing Liu
  • Ruiyi Zhang
  • Ryan A. Rossi
  • Hao Tan

Transformers have emerged as the leading architecture in deep learning, proving to be versatile and highly effective across diverse domains beyond language and image processing. However, their impressive performance often incurs high computational costs due to their substantial model size. This paper focuses on compressing decoder-only transformer-based autoregressive models through structural weight pruning to improve the model efficiency while preserving performance for both language and image generation tasks. Specifically, we propose a training-free pruning method that calculates a numerical score with Newton's method for the Attention and MLP modules, respectively. Besides, we further propose another compensation algorithm to recover the pruned model for better performance. To verify the effectiveness of our method, we provide both theoretical support and extensive experiments. Our experiments show that our method achieves state-of-the-art performance with reduced memory usage and faster generation speeds on GPUs.

TMLR Journal 2025 Journal Article

Personalization of Large Language Models: A Survey

  • Zhehao Zhang
  • Ryan A. Rossi
  • Branislav Kveton
  • Yijia Shao
  • Diyi Yang
  • Hamed Zamani
  • Franck Dernoncourt
  • Joe Barrow

Personalization of Large Language Models (LLMs) has recently become increasingly important with a wide range of applications. Despite the importance and recent progress, most existing works on personalized LLMs have focused either entirely on (a) personalized text generation or (b) leveraging LLMs for personalization-related downstream applications, such as recommendation systems. In this work, we bridge the gap between these two separate main directions for the first time by introducing a taxonomy for personalized LLM usage and summarizing the key differences and challenges. We provide a formalization of the foundations of personalized LLMs that consolidates and expands notions of personalization of LLMs, defining and discussing novel facets of personalization, usage, and desiderata of personalized LLMs. We then unify the literature across these diverse fields and usage scenarios by proposing systematic taxonomies for the granularity of personalization, personalization techniques, datasets, evaluation methods, and applications of personalized LLMs. Finally, we highlight challenges and important open problems that remain to be addressed. By unifying and surveying recent research using the proposed taxonomies, we aim to provide a clear guide to the existing literature and different facets of personalization in LLMs, empowering both researchers and practitioners.

ICLR Conference 2025 Conference Paper

SV-RAG: LoRA-Contextualizing Adaptation of MLLMs for Long Document Understanding

  • Jian Chen 0043
  • Ruiyi Zhang 0002
  • Yufan Zhou 0001
  • Tong Yu 0001
  • Franck Dernoncourt
  • Jiuxiang Gu
  • Ryan A. Rossi
  • Changyou Chen

Multimodal large language models (MLLMs) have recently shown great progress in text-rich image understanding, yet they still struggle with complex, multi-page visually-rich documents. Traditional methods using document parsers for retrieval-augmented generation suffer from performance and efficiency limitations, while directly presenting all pages to MLLMs leads to inefficiencies, especially with lengthy documents. In this work, we present a novel framework named **S**elf-**V**isual **R**etrieval-**A**ugmented **G**eneration (SV-RAG), which can broaden the horizons of *any* MLLM to support long-document understanding. We demonstrate that **MLLMs themselves can be an effective multimodal retriever** to fetch relevant pages and then answer user questions based on these pages. SV-RAG is implemented with two specific MLLM adapters, one for evidence page retrieval and the other for question answering. Empirical results show state-of-the-art performance on public benchmarks, demonstrating the effectiveness of SV-RAG.

ICLR Conference 2025 Conference Paper

Towards Optimal Multi-draft Speculative Decoding

  • Zhengmian Hu
  • Tong Zheng
  • Vignesh Viswanathan
  • Ziyi Chen 0002
  • Ryan A. Rossi
  • Yihan Wu
  • Dinesh Manocha
  • Heng Huang 0001

Large Language Models (LLMs) have become an indispensable part of natural language processing tasks. However, autoregressive sampling has become an efficiency bottleneck. Multi-Draft Speculative Decoding (MDSD) is a recent approach where, when generating each token, a small draft model generates multiple drafts, and the target LLM verifies them in parallel, ensuring that the final output conforms to the target model distribution. The two main design choices in MDSD are the draft sampling method and the verification algorithm. For a fixed draft sampling method, the optimal acceptance rate is a solution to an optimal transport problem, but the complexity of this problem makes it difficult to solve for the optimal acceptance rate and measure the gap between existing verification algorithms and the theoretical upper bound. This paper discusses the dual of the optimal transport problem, providing a way to efficiently compute the optimal acceptance rate. For the first time, we measure the theoretical upper bound of MDSD efficiency for vocabulary sizes in the thousands and quantify the gap between existing verification algorithms and this bound. We also compare different draft sampling methods based on their optimal acceptance rates. Our results show that the draft sampling method strongly influences the optimal acceptance rate, with sampling without replacement outperforming sampling with replacement. Additionally, existing verification algorithms do not reach the theoretical upper bound for both without replacement and with replacement sampling. Our findings suggest that carefully designed draft sampling methods can potentially improve the optimal acceptance rate and enable the development of verification algorithms that closely match the theoretical upper bound.
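For background on the verification step discussed above, the standard single-draft verification rule accepts the draft token with probability min(1, p(t)/q(t)) and otherwise resamples from the normalized residual distribution max(p - q, 0); MDSD runs this kind of check over several drafts in parallel. A minimal sketch with an injectable rng for determinism (the function name and interface are ours, not the paper's):

```python
import random

def speculative_verify(p, q, draft_token, rng=random.random):
    """Single-draft speculative verification: accept `draft_token` with
    probability min(1, p[t] / q[t]), where p is the target distribution
    and q the draft distribution; on rejection, resample from the
    residual max(p - q, 0), renormalized. The output is distributed
    exactly according to p."""
    t = draft_token
    if rng() < min(1.0, p[t] / q[t]):
        return t
    # Rejected: sample from the normalized residual distribution.
    residual = [max(pi - qi, 0.0) for pi, qi in zip(p, q)]
    z = sum(residual)
    r, acc = rng() * z, 0.0
    for i, w in enumerate(residual):
        acc += w
        if r < acc:
            return i
    return len(p) - 1
```

The draft sampling method determines q; the paper's analysis asks how close any such verification scheme can get to the optimal acceptance rate.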

ICML Conference 2024 Conference Paper

Continuous Treatment Effects with Surrogate Outcomes

  • Zhenghao Zeng
  • David Arbour
  • Avi Feller
  • Raghavendra Addanki
  • Ryan A. Rossi
  • Ritwik Sinha
  • Edward H. Kennedy

In many real-world causal inference applications, the primary outcomes (labels) are often partially missing, especially if they are expensive or difficult to collect. If the missingness depends on covariates (i. e. , missingness is not completely at random), analyses based on fully observed samples alone may be biased. Incorporating surrogates, which are fully observed post-treatment variables related to the primary outcome, can improve estimation in this case. In this paper, we study the role of surrogates in estimating continuous treatment effects and propose a doubly robust method to efficiently incorporate surrogates in the analysis, which uses both labeled and unlabeled data and does not suffer from the above selection bias problem. Importantly, we establish the asymptotic normality of the proposed estimator and show possible improvements on the variance compared with methods that solely use labeled data. Extensive simulations show our methods enjoy appealing empirical performance.

ICML Conference 2024 Conference Paper

Editing Partially Observable Networks via Graph Diffusion Models

  • Puja Trivedi
  • Ryan A. Rossi
  • David Arbour
  • Tong Yu 0001
  • Franck Dernoncourt
  • Sungchul Kim
  • Nedim Lipka
  • Namyong Park 0001

Most real-world networks are noisy and incomplete samples from an unknown target distribution. Refining them by correcting corruptions or inferring unobserved regions typically improves downstream performance. Inspired by the impressive generative capabilities that have been used to correct corruptions in images, and the similarities between "in-painting" and filling in missing nodes and edges conditioned on the observed graph, we propose a novel graph generative framework, SGDM, which is based on subgraph diffusion. Our framework not only improves the scalability and fidelity of graph diffusion models, but also leverages the reverse process to perform novel, conditional generation tasks. In particular, through extensive empirical analysis and a set of novel metrics, we demonstrate that our proposed model effectively supports the following refinement tasks for partially observable networks: (T1) denoising extraneous subgraphs, (T2) expanding existing subgraphs and (T3) performing "style" transfer by regenerating a particular subgraph to match the characteristics of a different node or subgraph.

ICLR Conference 2024 Conference Paper

Forward Learning of Graph Neural Networks

  • Namyong Park 0001
  • Xing Wang
  • Antoine Simoulin
  • Shuai Yang
  • Grey Yang
  • Ryan A. Rossi
  • Puja Trivedi
  • Nesreen K. Ahmed

Graph neural networks (GNNs) have achieved remarkable success across a wide range of applications, such as recommendation, drug discovery, and question answering. Behind the success of GNNs lies the backpropagation (BP) algorithm, which is the de facto standard for training deep neural networks (NNs). However, despite its effectiveness, BP imposes several constraints, which are not only biologically implausible, but also limit the scalability, parallelism, and flexibility in learning NNs. Examples of such constraints include storage of neural activities computed in the forward pass for use in the subsequent backward pass, and the dependence of parameter updates on non-local signals. To address these limitations, the forward-forward algorithm (FF) was recently proposed as an alternative to BP in the image classification domain, which trains NNs by performing two forward passes over positive and negative data. Inspired by this advance, we propose ForwardGNN in this work, a new forward learning procedure for GNNs, which avoids the constraints imposed by BP via an effective layer-wise local forward training. ForwardGNN extends the original FF to deal with graph data and GNNs, and makes it possible to operate without generating negative inputs (hence no longer forward-forward). Further, ForwardGNN enables each layer to learn from both the bottom-up and top-down signals without relying on the backpropagation of errors. Extensive experiments on real-world datasets show the effectiveness and generality of the proposed forward graph learning framework. We release our code at https://github.com/facebookresearch/forwardgnn.

AAAI Conference 2024 Conference Paper

Knowledge Graph Prompting for Multi-Document Question Answering

  • Yu Wang
  • Nedim Lipka
  • Ryan A. Rossi
  • Alexa Siu
  • Ruiyi Zhang
  • Tyler Derr

The `pre-train, prompt, predict' paradigm of large language models (LLMs) has achieved remarkable success in open-domain question answering (OD-QA). However, few works explore this paradigm in multi-document question answering (MD-QA), a task demanding a thorough understanding of the logical associations among the contents and structures of documents. To fill this crucial gap, we propose a Knowledge Graph Prompting (KGP) method to formulate the right context in prompting LLMs for MD-QA, which consists of a graph construction module and a graph traversal module. For graph construction, we create a knowledge graph (KG) over multiple documents with nodes symbolizing passages or document structures (e.g., pages/tables), and edges denoting the semantic/lexical similarity between passages or document structural relations. For graph traversal, we design an LLM-based graph traversal agent that navigates across nodes and gathers supporting passages assisting LLMs in MD-QA. The constructed graph serves as the global ruler that regulates the transitional space among passages and reduces retrieval latency. Concurrently, the graph traversal agent acts as a local navigator that gathers pertinent context to progressively approach the question and guarantee retrieval quality. Extensive experiments underscore the efficacy of KGP for MD-QA, signifying the potential of leveraging graphs in enhancing the prompt design and retrieval augmented generation for LLMs. Our code: https://github.com/YuWVandy/KG-LLM-MDQA.
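The two modules can be illustrated with a toy version: Jaccard word overlap stands in for the semantic/lexical similarity edges, and a breadth-first sweep stands in for the LLM-based traversal agent. All names here are illustrative, not the KGP implementation:

```python
def build_passage_graph(passages, threshold=0.2):
    """Nodes are passage indices; add an edge when the Jaccard word
    overlap of two passages exceeds `threshold` (a crude stand-in for
    the semantic/lexical similarity edges in the abstract)."""
    sets = [set(p.lower().split()) for p in passages]
    adj = {i: [] for i in range(len(passages))}
    for i in range(len(passages)):
        for j in range(i + 1, len(passages)):
            sim = len(sets[i] & sets[j]) / len(sets[i] | sets[j])
            if sim > threshold:
                adj[i].append(j)
                adj[j].append(i)
    return adj

def gather_context(adj, seed, budget=3):
    """Breadth-first traversal from a seed passage, collecting up to
    `budget` passages as supporting context; KGP replaces this blind
    sweep with an LLM agent that scores which neighbor to visit next."""
    visited, frontier = [seed], [seed]
    while frontier and len(visited) < budget:
        nxt = [n for f in frontier for n in adj[f] if n not in visited]
        frontier = nxt
        visited.extend(nxt[: budget - len(visited)])
    return visited
```

The collected passages would then be concatenated into the prompt for the answering LLM.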

ICML Conference 2024 Conference Paper

On Mechanistic Knowledge Localization in Text-to-Image Generative Models

  • Samyadeep Basu
  • Keivan Rezaei
  • Priyatham Kattakinda
  • Vlad I. Morariu
  • Nanxuan Zhao
  • Ryan A. Rossi
  • Varun Manjunatha
  • Soheil Feizi

Identifying layers within text-to-image models which control visual attributes can facilitate efficient model editing through closed-form updates. Recent work leveraging causal tracing shows that early Stable-Diffusion variants confine knowledge primarily to the first layer of the CLIP text-encoder, while it diffuses throughout the UNet. Extending this framework, we observe that for recent models (e.g., SD-XL, DeepFloyd), causal tracing fails to pinpoint localized knowledge, highlighting challenges in model editing. To address this issue, we introduce the concept of mechanistic localization in text-to-image models, where knowledge about various visual attributes (e.g., "style", "objects", "facts") can be mechanistically localized to a small fraction of layers in the UNet, thus facilitating efficient model editing. We localize knowledge using our method LocoGen, which measures the direct effect of intermediate layers on output generation by performing interventions in the cross-attention layers of the UNet. We then employ LocoEdit, a fast closed-form editing method, across popular open-source text-to-image models (including the latest SD-XL) and explore the possibilities of neuron-level model editing. Using mechanistic localization, our work offers a better view of successes and failures in localization-based text-to-image model editing.

JMLR Journal 2024 Journal Article

Structured Dynamic Pricing: Optimal Regret in a Global Shrinkage Model

  • Rashmi Ranjan Bhuyan
  • Adel Javanmard
  • Sungchul Kim
  • Gourab Mukherjee
  • Ryan A. Rossi
  • Tong Yu
  • Handong Zhao

We consider dynamic pricing strategies in a streamed longitudinal data set-up where the objective is to maximize, over time, the cumulative profit across a large number of customer segments. We consider a dynamic model with the consumers’ preferences as well as price sensitivity varying over time. Building on the well-known finding that consumers sharing similar characteristics act in similar ways, we consider a global shrinkage structure, which assumes that the consumers’ preferences across the different segments can be well approximated by a spatial autoregressive (SAR) model. In such a streamed longitudinal setup, we measure the performance of a dynamic pricing policy via regret, which is the expected revenue loss compared to a clairvoyant that knows the sequence of model parameters in advance. We propose a pricing policy based on penalized stochastic gradient descent (PSGD) and explicitly characterize its regret as functions of time, the temporal variability in the model parameters as well as the strength of the auto-correlation network structure spanning the varied customer segments. Our regret analysis results not only demonstrate asymptotic optimality of the proposed policy but also show that for policy planning it is essential to incorporate available structural information, as policies based on unshrunken models are highly sub-optimal in the aforementioned set-up. We conduct simulation experiments across a wide range of regimes as well as real-world network-based studies and report encouraging performance for our proposed method.
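A single penalized-gradient step conveys the flavor of the PSGD policy: follow each segment's revenue gradient while shrinking its price toward those of its network neighbors. Uniform neighbor averaging below is a crude stand-in for the SAR shrinkage structure, and all names are illustrative:

```python
def psgd_price_update(prices, grads, neighbors, lr=0.05, penalty=0.1):
    """One penalized-SGD step over per-segment prices.

    `grads[i]` is the estimated revenue gradient for segment i, and
    `neighbors[i]` lists the segments connected to i in the network.
    The penalty term pulls each price toward its neighbors' average,
    a crude stand-in for the SAR shrinkage structure in the abstract.
    """
    new = []
    for i, p in enumerate(prices):
        nbr = neighbors[i]
        avg = sum(prices[j] for j in nbr) / len(nbr) if nbr else p
        new.append(p + lr * grads[i] - penalty * (p - avg))
    return new
```

Iterating this update as demand data streams in is the basic shape of the policy; the paper's regret analysis concerns how the penalty strength should track the network structure and temporal drift.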

ICLR Conference 2024 Conference Paper

ToolChain*: Efficient Action Space Navigation in Large Language Models with A* Search

  • Yuchen Zhuang
  • Xiang Chen 0010
  • Tong Yu 0001
  • Saayan Mitra
  • Victor S. Bursztyn
  • Ryan A. Rossi
  • Somdeb Sarkhel
  • Chao Zhang 0014

Large language models (LLMs) have demonstrated powerful decision-making and planning capabilities in solving complicated real-world problems. LLM-based autonomous agents can interact with diverse tools (e.g., functional APIs) and generate solution plans that execute a series of API function calls in a step-by-step manner. The multitude of candidate API function calls significantly expands the action space, amplifying the critical need for efficient action space navigation. However, existing methods either struggle with unidirectional exploration in expansive action spaces, getting trapped in locally optimal solutions, or exhaustively traverse all potential actions, causing inefficient navigation. To address these issues, we propose ToolChain*, an efficient tree search-based planning algorithm for LLM-based agents. It formulates the entire action space as a decision tree, where each node represents a possible API function call involved in a solution plan. By incorporating the A$^*$ search algorithm with a task-specific cost function design, it efficiently prunes high-cost branches that may involve incorrect actions, identifying the lowest-cost valid path as the solution. Extensive experiments on multiple tool-use and reasoning tasks demonstrate that ToolChain* efficiently balances exploration and exploitation within an expansive action space. It outperforms state-of-the-art baselines on planning and reasoning tasks by 3.1% and 3.5% on average while requiring 7.35x and 2.31x less time, respectively.
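The search backbone is classical A*; a generic sketch over an action tree looks like the following, with the task-specific cost and heuristic (the heart of ToolChain*) reduced to arbitrary callables:

```python
import heapq
from itertools import count

def a_star(start, successors, is_goal, cost, heuristic):
    """Generic A* over an action tree. Nodes stand for partial plans;
    `cost(a, b)` is the step cost and `heuristic(n)` an estimate of the
    remaining cost to a goal. ToolChain*'s contribution is the design
    of these two functions for LLM tool-use; here they are arbitrary."""
    tick = count()  # tie-breaker so unlike nodes/paths are never compared
    frontier = [(heuristic(start), 0.0, next(tick), start, [start])]
    seen = set()
    while frontier:
        f, g, _, node, path = heapq.heappop(frontier)
        if is_goal(node):
            return path
        if node in seen:
            continue
        seen.add(node)
        for nxt in successors(node):
            g2 = g + cost(node, nxt)
            heapq.heappush(
                frontier, (g2 + heuristic(nxt), g2, next(tick), nxt, path + [nxt])
            )
    return None  # no valid plan found
```

In the agent setting, `successors` would query the LLM for candidate next API calls and the returned path is the executed solution plan.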

ICML Conference 2023 Conference Paper

A Model-free Closeness-of-influence Test for Features in Supervised Learning

  • Mohammad Mehrabi
  • Ryan A. Rossi

Understanding the effect of a feature vector $x\in \mathbb{R}^d$ on the response value (label) $y\in \mathbb{R}$ is the cornerstone of many statistical learning problems. Ideally, it is desired to understand how a set of collected features combine together and influence the response value, but this problem is notoriously difficult, due to the high-dimensionality of data and the limited number of labeled data points, among many other reasons. In this work, we take a new perspective on this problem, and we study the question of assessing the difference of influence that two given features have on the response value. We first propose a notion of closeness for the influence of features, and show that our definition recovers the familiar notion of the magnitude of coefficients in the parametric model. We then propose a novel method to test for the closeness of influence in general model-free supervised learning problems. Our proposed test can be used with a finite number of samples with control on the type I error rate, no matter the ground-truth conditional law $\mathcal{L}(Y|X)$. We analyze the power of our test for two general learning problems: i) linear regression, and ii) binary classification under a mixture of Gaussians model, and show that under the proper choice of score function, an internal component of our test, full statistical power is achieved with a sufficient number of samples. We evaluate our findings through extensive numerical simulations; specifically, we adopt the datamodel framework (Ilyas et al., 2022) for the CIFAR-10 dataset to identify pairs of training samples with different influence on the trained model via black-box training mechanisms.

ICLR Conference 2023 Conference Paper

Direct Embedding of Temporal Network Edges via Time-Decayed Line Graphs

  • Sudhanshu Chanpuriya
  • Ryan A. Rossi
  • Sungchul Kim
  • Tong Yu 0001
  • Jane Hoffswell
  • Nedim Lipka
  • Shunan Guo
  • Cameron Musco

Temporal networks model a variety of important phenomena involving timed interactions between entities. Existing methods for machine learning on temporal networks generally exhibit at least one of two limitations. First, many methods assume time to be discretized, so if the time data is continuous, the user must determine the discretization and discard precise time information. Second, edge representations can only be calculated indirectly from the nodes, which may be suboptimal for tasks like edge classification. We present a simple method that avoids both shortcomings: construct the line graph of the network, which includes a node for each interaction, and weigh the edges of this graph based on the difference in time between interactions. From this derived graph, edge representations for the original network can be computed with efficient classical methods. The simplicity of this approach facilitates explicit theoretical analysis: we can constructively show the effectiveness of our method's representations for a natural synthetic model of temporal networks. Empirical results on real-world networks demonstrate our method's efficacy and efficiency on both link classification and prediction.
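The construction described above is simple enough to sketch directly: one line-graph node per timed interaction, and an edge between two interactions that share an endpoint, weighted by a decay in their time difference. The exponential decay form below is one natural choice, not necessarily the paper's exact weighting:

```python
import math

def time_decayed_line_graph(interactions, decay=1.0):
    """Build the time-decayed line graph of a temporal network.

    `interactions` is a list of (u, v, t) timed edges. Each interaction
    becomes a line-graph node; two interactions sharing an endpoint are
    connected with weight exp(-decay * |t1 - t2|), so temporally close
    interactions are linked more strongly. No time discretization needed.
    """
    nodes = list(range(len(interactions)))
    edges = {}
    for i in range(len(interactions)):
        u1, v1, t1 = interactions[i]
        for j in range(i + 1, len(interactions)):
            u2, v2, t2 = interactions[j]
            if {u1, v1} & {u2, v2}:  # interactions share an endpoint
                edges[(i, j)] = math.exp(-decay * abs(t1 - t2))
    return nodes, edges
```

Any classical embedding method can then be run on this weighted graph to obtain edge representations for the original network.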

ICLR Conference 2023 Conference Paper

MetaGL: Evaluation-Free Selection of Graph Learning Models via Meta-Learning

  • Namyong Park 0001
  • Ryan A. Rossi
  • Nesreen K. Ahmed
  • Christos Faloutsos

Given a graph learning task, such as link prediction, on a new graph, how can we select the best method as well as its hyperparameters (collectively called a model) without having to train or evaluate any model on the new graph? Model selection for graph learning has been largely ad hoc. A typical approach has been to apply popular methods to new datasets, but this is often suboptimal. On the other hand, systematically comparing models on the new graph quickly becomes too costly, or even impractical. In this work, we develop the first meta-learning approach for evaluation-free graph learning model selection, called MetaGL, which utilizes the prior performances of existing methods on various benchmark graph datasets to automatically select an effective model for the new graph, without any model training or evaluations. To quantify similarities across a wide variety of graphs, we introduce specialized meta-graph features that capture the structural characteristics of a graph. Then we design the G-M network, which represents the relations among graphs and models, and develop a graph-based meta-learner operating on this G-M network, which estimates the relevance of each model to different graphs. Extensive experiments show that using MetaGL to select a model for the new graph greatly outperforms several existing meta-learning techniques tailored for graph learning model selection (up to 47% better), while being extremely fast at test time (∼1 sec).
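A nearest-neighbor simplification conveys the idea of evaluation-free selection: compare the new graph's meta-features against those of benchmark graphs and return the model that performed best on the closest one. MetaGL's learned G-M network replaces this naive lookup; the code below is only a sketch with illustrative names:

```python
def select_model(meta_features, perf_table, new_graph_feats):
    """Evaluation-free model selection by nearest meta-feature neighbor.

    `meta_features` maps benchmark graph -> feature vector, and
    `perf_table` maps benchmark graph -> {model: observed performance}.
    Returns the best-performing model on the benchmark graph whose
    meta-features are closest to `new_graph_feats` (squared Euclidean).
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = min(meta_features, key=lambda g: dist(meta_features[g], new_graph_feats))
    return max(perf_table[nearest], key=perf_table[nearest].get)
```

No model is trained or evaluated on the new graph; only its meta-features are computed, which is what makes selection fast at test time.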

ICML Conference 2022 Conference Paper

One-Pass Algorithms for MAP Inference of Nonsymmetric Determinantal Point Processes

  • Aravind Reddy
  • Ryan A. Rossi
  • Zhao Song 0002
  • Anup B. Rao
  • Tung Mai
  • Nedim Lipka
  • Gang Wu 0013
  • Eunyee Koh

In this paper, we initiate the study of one-pass algorithms for solving the maximum-a-posteriori (MAP) inference problem for Non-symmetric Determinantal Point Processes (NDPPs). In particular, we formulate streaming and online versions of the problem and provide one-pass algorithms for solving these problems. In our streaming setting, data points arrive in an arbitrary order and the algorithms are constrained to use a single-pass over the data as well as sub-linear memory, and only need to output a valid solution at the end of the stream. Our online setting has an additional requirement of maintaining a valid solution at any point in time. We design new one-pass algorithms for these problems and show that they perform comparably to (or even better than) the offline greedy algorithm while using substantially lower memory.
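The streaming setting can be illustrated with a single-pass heuristic: items arrive once, and an item is kept only if adding it increases the determinant of the selected kernel submatrix. This is a simplification in the spirit of the setting described above, not the paper's algorithm:

```python
def det(M):
    """Determinant via Laplace expansion; fine for the tiny matrices here."""
    if len(M) == 1:
        return M[0][0]
    total = 0.0
    for j in range(len(M)):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

def streaming_greedy(kernel, stream, k):
    """Single pass over `stream`: keep an arriving item only if it raises
    det(L_S) of the selected set S, up to k items. DPP/NDPP MAP inference
    seeks the size-k subset maximizing det(L_S); this one-pass heuristic
    is illustrative, not the paper's algorithm."""
    S = []
    for i in stream:
        if len(S) >= k:
            break
        cand = S + [i]
        sub = [[kernel[a][b] for b in cand] for a in cand]
        cur = det([[kernel[a][b] for b in S] for a in S]) if S else 0.0
        if det(sub) > cur:
            S = cand
    return S
```

Note that this sketch stores only the selected items, the kind of sub-linear memory constraint the streaming model imposes; maintaining a valid solution at every step, as the online setting requires, it gets for free.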

ICML Conference 2021 Conference Paper

Asymptotics of Ridge Regression in Convolutional Models

  • Mojtaba Sahraee-Ardakan
  • Tung Mai
  • Anup B. Rao
  • Ryan A. Rossi
  • Sundeep Rangan
  • Alyson K. Fletcher

Understanding the generalization and estimation error of estimators for simple models such as linear and generalized linear models has attracted a lot of attention recently. This is in part due to an interesting observation made in the machine learning community that highly over-parameterized neural networks achieve zero training error, and yet they are able to generalize well over the test samples. This phenomenon is captured by the so-called double descent curve, where the generalization error starts decreasing again after the interpolation threshold. A series of recent works has tried to explain this phenomenon for simple models. In this work, we analyze the asymptotics of the estimation error of ridge estimators for convolutional linear models. These convolutional inverse problems, also known as deconvolution, arise naturally in different fields such as seismology, imaging, and acoustics, among others. Our results hold for a large class of input distributions that include i.i.d. features as a special case. We derive exact formulae for the estimation error of ridge estimators that hold in a certain high-dimensional regime. We demonstrate the double descent phenomenon in our experiments for convolutional models and show that our theoretical results match the experiments.
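For reference, the ridge estimator whose error the paper characterizes has the usual closed form. A minimal sketch with generic i.i.d. Gaussian features (the paper's convolutional design matrix is not reproduced here):

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge estimator: theta = (X^T X + lam I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Simulated linear model y = X theta* + noise with i.i.d. features.
rng = np.random.default_rng(0)
n, d = 200, 50
theta_star = rng.normal(size=d) / np.sqrt(d)
X = rng.normal(size=(n, d))
y = X @ theta_star + 0.1 * rng.normal(size=n)
est = ridge(X, y, lam=1.0)
err = np.linalg.norm(est - theta_star)   # estimation error, small when n >> d
```

Sweeping the ratio d/n (and shrinking the ridge penalty) in such a simulation is how the double descent curve is typically exhibited empirically.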

ICML Conference 2021 Conference Paper

Fundamental Tradeoffs in Distributionally Adversarial Training

  • Mohammad Mehrabi
  • Adel Javanmard
  • Ryan A. Rossi
  • Anup B. Rao
  • Tung Mai

Adversarial training is among the most effective techniques for improving the robustness of models against adversarial perturbations. However, the full effect of this approach on models is not well understood. For example, while adversarial training can reduce the adversarial risk (prediction error against an adversary), it sometimes increases standard risk (generalization error when there is no adversary). In this paper, we focus on the distribution-perturbing adversary framework, wherein the adversary can change the test distribution within a neighborhood of the training data distribution. The neighborhood is defined via the Wasserstein distance between distributions, and the radius of the neighborhood is a measure of the adversary’s manipulative power. We study the tradeoff between standard risk and adversarial risk and derive the Pareto-optimal tradeoff, achievable over specific classes of models, in the infinite data limit with the feature dimension kept fixed. We consider three learning settings: 1) regression with the class of linear models; 2) binary classification under the Gaussian mixtures data model, with the class of linear classifiers; 3) regression with the class of random features models (which can be equivalently represented as two-layer neural networks with random first-layer weights). We show that a tradeoff between standard and adversarial risk is manifested in all three settings. We further characterize the Pareto-optimal tradeoff curves and discuss how a variety of factors, such as feature correlation, the adversary’s power, or the width of the two-layer neural network, would affect this tradeoff.

AAAI Conference 2021 Conference Paper

Graph Neural Networks with Heterophily

  • Jiong Zhu
  • Ryan A. Rossi
  • Anup Rao
  • Tung Mai
  • Nedim Lipka
  • Nesreen K. Ahmed
  • Danai Koutra

Graph Neural Networks (GNNs) have proven to be useful for many different practical applications. However, many existing GNN models have implicitly assumed homophily among the nodes connected in the graph, and have therefore largely overlooked the important setting of heterophily, where most connected nodes are from different classes. In this work, we propose a novel framework called CPGNN that generalizes GNNs for graphs with either homophily or heterophily. The proposed framework incorporates an interpretable compatibility matrix for modeling the heterophily or homophily level in the graph, which can be learned in an end-to-end fashion, enabling it to go beyond the assumption of strong homophily. Theoretically, we show that replacing the compatibility matrix in our framework with the identity (which represents pure homophily) reduces to GCN. Our extensive experiments demonstrate the effectiveness of our approach in more realistic and challenging experimental settings with significantly less training data compared to previous works: CPGNN variants achieve state-of-the-art results in heterophily settings with or without contextual node features, while maintaining comparable performance in homophily settings.
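The role of the compatibility matrix can be shown with one belief-propagation-style step in the spirit of CPGNN (the actual architecture, prior estimation, and end-to-end learning are not reproduced here): with an identity matrix, neighbors reinforce the same class (homophily); with an off-diagonal matrix, they push toward the opposite class (heterophily).

```python
import numpy as np

def propagate(A, B, H):
    """One propagation step: each node aggregates its neighbors' class
    beliefs B through a class compatibility matrix H, then beliefs are
    renormalized. H = identity recovers the homophily assumption."""
    M = A @ B @ H                              # aggregate compatible beliefs
    M = np.clip(M, 1e-9, None)
    return M / M.sum(axis=1, keepdims=True)    # rows sum to 1 again

# Two classes, perfect heterophily: neighbors belong to the other class.
H_hetero = np.array([[0.0, 1.0],
                     [1.0, 0.0]])
A = np.array([[0, 1],              # two connected nodes
              [1, 0]])
B = np.array([[0.9, 0.1],          # node 0: believed class 0
              [0.5, 0.5]])         # node 1: unknown
print(propagate(A, B, H_hetero))   # node 1 is pushed toward class 1
```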

ICLR Conference 2021 Conference Paper

Learning to Deceive Knowledge Graph Augmented Models via Targeted Perturbation

  • Mrigank Raman
  • Aaron Chan
  • Siddhant Agarwal
  • Peifeng Wang
  • Hansen Wang
  • Sungchul Kim
  • Ryan A. Rossi
  • Handong Zhao

Knowledge graphs (KGs) have helped neural models improve performance on various knowledge-intensive tasks, like question answering and item recommendation. By using attention over the KG, such KG-augmented models can also "explain" which KG information was most relevant for making a given prediction. In this paper, we question whether these models are really behaving as we expect. We show that, through a reinforcement learning policy (or even simple heuristics), one can produce deceptively perturbed KGs, which maintain the downstream performance of the original KG while significantly deviating from the original KG's semantics and structure. Our findings raise doubts about KG-augmented models' ability to reason about KG information and give sensible explanations.

SODA Conference 2020 Conference Paper

Approximate Maximum Matching in Random Streams

  • Alireza Farhadi 0001
  • MohammadTaghi Hajiaghayi
  • Tung Mai
  • Anup B. Rao
  • Ryan A. Rossi

In this paper, we study the problem of finding a maximum matching in the semi-streaming model when edges arrive in a random order. In the semi-streaming model, an algorithm receives a stream of edges and is allowed a memory of Õ(n), where n is the number of vertices in the graph. A recent inspiring work by Assadi et al. [1] shows that there exists a streaming algorithm with an approximation ratio of ⅔ that uses Õ(n^1.5) memory. However, the memory of their algorithm is much larger than the memory constraint of semi-streaming algorithms. In this work, we further investigate this problem in the semi-streaming model, and we present simple and clean algorithms for approximating maximum matching. Our main results are as follows. We show that there exists a single-pass deterministic semi-streaming algorithm that finds an approximation of the maximum matching in bipartite graphs using Õ(n) memory. This result significantly outperforms the state-of-the-art result of Konrad [12], which finds a 0.539 approximation of the maximum matching using Õ(n) memory. By giving a black-box reduction from finding a matching in general graphs to finding a matching in bipartite graphs, we show there exists a single-pass deterministic semi-streaming algorithm that finds a (≈ 0.545) approximation of the maximum matching in general graphs, improving upon the state-of-the-art 0.506 approximation by Gamlath et al. [8].
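For context, the baseline these results improve on is the classic single-pass greedy matching, which keeps an edge whenever both endpoints are still free and guarantees a 1/2-approximation on any arrival order:

```python
def greedy_matching(edge_stream):
    """Single-pass greedy: keep an edge iff both endpoints are free.
    A 1/2-approximation on any stream; the paper's algorithms achieve
    strictly better ratios when edges arrive in random order."""
    matched, M = set(), []
    for u, v in edge_stream:
        if u not in matched and v not in matched:
            matched.update((u, v))
            M.append((u, v))
    return M

# On the path 1-2-3-4, greedy takes (1,2), must skip (2,3), takes (3,4).
print(greedy_matching([(1, 2), (2, 3), (3, 4)]))  # → [(1, 2), (3, 4)]
```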

JMLR Journal 2020 Journal Article

Ensemble Learning for Relational Data

  • Hoda Eldardiry
  • Jennifer Neville
  • Ryan A. Rossi

We present a theoretical analysis framework for relational ensemble models. We show that ensembles of collective classifiers can improve predictions for graph data by reducing errors due to variance in both learning and inference. In addition, we propose a relational ensemble framework that combines a relational ensemble learning approach with a relational ensemble inference approach for collective classification. The proposed ensemble techniques are applicable to both single and multiple graph settings. Experiments on both synthetic and real-world data demonstrate the effectiveness of the proposed framework. Finally, our experimental results support the theoretical analysis and confirm that ensemble algorithms that explicitly focus on both the learning and inference processes, and aim at reducing errors associated with both, are the best performers.
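The variance-reduction mechanism the analysis rests on can be sketched with plain majority voting (the paper's ensembles are relational/collective classifiers, which this toy does not model): averaging members whose errors are partly independent cancels variance.

```python
import numpy as np

def bagged_predict(models, X):
    """Majority vote over an ensemble of binary (0/1) classifiers.
    Each model is a callable returning per-sample labels; averaging
    their votes reduces variance when their errors are independent."""
    votes = np.stack([m(X) for m in models])         # (n_models, n_samples)
    return (votes.mean(axis=0) > 0.5).astype(int)    # majority vote

# Three toy classifiers that agree only on the first sample.
models = [lambda X: np.array([1, 0, 1]),
          lambda X: np.array([1, 1, 0]),
          lambda X: np.array([1, 0, 0])]
print(bagged_predict(models, None))  # → [1 0 0]
```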

ICML Conference 2020 Conference Paper

Structured Policy Iteration for Linear Quadratic Regulator

  • Youngsuk Park
  • Ryan A. Rossi
  • Zheng Wen
  • Gang Wu 0013
  • Handong Zhao

Linear quadratic regulator (LQR) is one of the most popular frameworks to tackle continuous Markov decision process tasks. With its fundamental theory and tractable optimal policy, LQR has been revisited and analyzed in recent years, in terms of reinforcement learning scenarios such as the model-free or model-based setting. In this paper, we introduce Structured Policy Iteration (S-PI) for LQR, a method capable of deriving a structured linear policy. Such a structured policy, with (block) sparsity or low rank, can have significant advantages over the standard LQR policy: it is more interpretable, memory-efficient, and well-suited for the distributed setting. In order to derive such a policy, we first formulate a regularized LQR problem when the model is known. Then, our Structured Policy Iteration (S-PI) algorithm, which takes a policy evaluation step and a policy improvement step in an iterative manner, solves this regularized LQR efficiently. We further extend the S-PI algorithm to the model-free setting, where a smoothing procedure is adopted to estimate the gradient. In both the known-model and model-free settings, we prove convergence under a proper choice of parameters. Finally, experiments demonstrate the advantages of S-PI in balancing LQR performance and the level of structure by varying the weight parameter.
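The evaluate/improve loop underlying S-PI can be sketched with plain (unregularized) policy iteration for discrete-time LQR: evaluate the current gain K by solving the Lyapunov fixed point for its cost matrix P, then improve K greedily. S-PI's structured, regularized improvement step is deliberately omitted here.

```python
import numpy as np

def policy_iteration_lqr(A, B, Q, R, iters=50):
    """Plain policy iteration for discrete-time LQR with dynamics
    x' = A x + B u, cost x'Qx + u'Ru, and policy u = -K x.
    Assumes the initial gain K = 0 is stabilizing (A stable)."""
    n, m = B.shape
    K = np.zeros((m, n))
    P = Q.copy()
    for _ in range(iters):
        Acl = A - B @ K
        # Policy evaluation: fixed point P = Q + K'RK + Acl' P Acl.
        P = Q.copy()
        for _ in range(500):
            P = Q + K.T @ R @ K + Acl.T @ P @ Acl
        # Policy improvement: greedy gain for the evaluated cost.
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, P

A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
Bm = np.eye(2); Q = np.eye(2); R = np.eye(2)
K, P = policy_iteration_lqr(A, Bm, Q, R)
```

At convergence, P satisfies the discrete algebraic Riccati equation and A − B K is stable, which the checks below confirm on this toy system.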

UAI Conference 2019 Conference Paper

On Densification for Minwise Hashing

  • Tung Mai
  • Anup B. Rao
  • Matt Kapilevich
  • Ryan A. Rossi
  • Yasin Abbasi-Yadkori
  • Ritwik Sinha

One Permutation Hashing (OPH) is a significantly more efficient alternative to the popular minwise hashing. To produce a sketch of size k, OPH requires just one hash function, whereas classical minwise hashing requires k hash functions. However, OPH does not have the desirable locality sensitive hashing (LSH) property that is important for indexing. [Shrivastava and Li 2014, ICML] proposed the novel idea of densification to produce LSH sketches from OPH sketches, and gave the first densification routine. In this paper, we give a necessary and sufficient condition for a densification routine to result in LSH sketches when applied to OPH sketches. Furthermore, we give a novel densification routine that, for every input, takes O(k log k) time in expectation and achieves better variance than the previous best bound obtained by [Shrivastava 2017]. The running time of the densification routine given in [Shrivastava 2017] for worst-case inputs is O(k²) in expectation.
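A minimal sketch of the setup: OPH hashes every item once, splits the hash range into k bins, and keeps the minimum per bin; densification fills the empty bins. The rotation-style routine below is the classic baseline in the spirit of Shrivastava & Li (2014), not the paper's new routine; the hash construction is an illustrative choice.

```python
import hashlib

def _h(x, seed=0):
    """Deterministic hash of a string to [0, 1) (illustrative choice)."""
    d = hashlib.sha1(f"{seed}:{x}".encode()).hexdigest()
    return int(d[:12], 16) / 16**12

def oph_sketch(items, k, seed=0):
    """One Permutation Hashing: one hash per item, range split into k
    equal bins, minimum value kept per bin. None marks an empty bin,
    which is what densification must fill."""
    bins = [None] * k
    for x in items:
        v = _h(x, seed)
        b = min(int(v * k), k - 1)
        if bins[b] is None or v < bins[b]:
            bins[b] = v
    return bins

def densify(bins):
    """Rotation densification: each empty bin copies the nearest
    non-empty bin to its right, wrapping around. Assumes at least
    one bin is non-empty."""
    k = len(bins)
    out = list(bins)
    for i in range(k):
        j = i
        while out[i] is None:
            j = (j + 1) % k
            out[i] = bins[j]
    return out

print(densify([None, 0.3, None, 0.7]))  # → [0.3, 0.3, 0.7, 0.7]
```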

TIST Journal 2018 Journal Article

Interactive Visual Graph Mining and Learning

  • Ryan A. Rossi
  • Nesreen K. Ahmed
  • Rong Zhou
  • Hoda Eldardiry

This article presents a platform for interactive graph mining and relational machine learning called GraphVis. The platform combines interactive visual representations with state-of-the-art graph mining and relational machine learning techniques to aid in revealing important insights quickly as well as learning an appropriate and highly predictive model for a particular task (e.g., classification, link prediction, discovering the roles of nodes, and finding influential nodes). Visual representations and interaction techniques and tools are developed for simple, fast, and intuitive real-time interactive exploration, mining, and modeling of graph data. In particular, we propose techniques for interactive relational learning (e.g., node/link classification), interactive link prediction and weighting, role discovery and community detection, higher-order network analysis (via graphlets, network motifs), among others. GraphVis also allows for the refinement and tuning of graph mining and relational learning methods for specific application domains and constraints via an end-to-end interactive visual analytic pipeline that learns, infers, and provides rapid interactive visualization with immediate feedback at each change/prediction in real-time. Other key aspects include interactive filtering, querying, ranking, manipulating, exporting, as well as tools for dynamic network analysis and visualization, interactive graph generators (including new block model approaches), and a variety of multi-level network analysis techniques.

KER Journal 2018 Journal Article

Relational time series forecasting

  • Ryan A. Rossi

Networks encode dependencies between entities (people, computers, proteins) and allow us to study phenomena across social, technological, and biological domains. These networks naturally evolve over time by the addition, deletion, and changing of links, nodes, and attributes. Despite the importance of modeling these dynamics, existing work in relational machine learning has ignored relational time series data. Relational time series learning lies at the intersection of traditional time series analysis and statistical relational learning, and bridges the gap between these two fundamentally important problems. This paper formulates the relational time series learning problem and presents a general framework and taxonomy for representation discovery tasks of both nodes and links, including predicting their existence, label, and weight (importance), as well as systematically constructing features. We also reinterpret the prediction task, leading to the proposal of two important relational time series forecasting tasks: (i) relational time series classification (predicting a future class or label of an entity), and (ii) relational time series regression (predicting a future real-valued attribute or weight). Relational time series models are designed to leverage both relational and temporal dependencies to minimize forecasting error for both relational time series classification and regression. Finally, we discuss challenges and open problems that remain to be addressed.