Arrow Research search

Author name cluster

Vikram Adve

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

2 papers
1 author row

Possible papers

NeurIPS Conference 2025 Conference Paper

AgMMU: A Comprehensive Agricultural Multimodal Understanding Benchmark

  • Aruna Gauba
  • Irene Pi
  • Yunze Man
  • Ziqi Pang
  • Vikram Adve
  • Yu-Xiong Wang

We present AgMMU, a challenging real-world benchmark for evaluating and advancing vision-language models (VLMs) in the knowledge-intensive domain of agriculture. Unlike prior datasets that rely on crowdsourced prompts, AgMMU is distilled from 116,231 authentic dialogues between everyday growers and USDA-authorized Cooperative Extension experts. Through a three-stage pipeline (automated knowledge extraction, QA generation, and human verification), we construct (i) AgMMU, an evaluation set of 746 multiple-choice questions (MCQs) and 746 open-ended questions (OEQs), and (ii) AgBase, a development corpus of 57,079 multimodal facts covering five high-stakes agricultural topics: insect identification, species identification, disease categorization, symptom description, and management instruction.

AgMMU has three key advantages:

  • Authentic & Expert-Verified: All facts, images, and answers originate from real farmer and gardener inquiries answered by credentialed specialists, ensuring high-fidelity agricultural knowledge.
  • Complete Development Suite: AgMMU uniquely couples a dual-format evaluation benchmark (MCQ and OEQ) with AgBase, a large-scale training set, enabling both rigorous assessment and targeted improvement of VLMs.
  • Knowledge-intensive Challenge: Our tasks demand the synergy of nuanced visual perception and domain expertise, exposing fundamental limitations of current general-purpose models and charting a path toward robust, application-ready agricultural AI.

Benchmarking 12 leading VLMs reveals pronounced gaps in fine-grained perception and factual grounding. Open-source models trail proprietary ones by a wide margin. Simple fine-tuning on AgBase boosts open-source model performance on challenging OEQs by up to 11.6% on average, narrowing this gap and motivating future research on better strategies for knowledge extraction and distillation from AgBase. We hope AgMMU stimulates research on domain-specific knowledge integration and trustworthy decision support in agricultural AI development.

NeurIPS Conference 2025 Conference Paper

MIRAGE: A Benchmark for Multimodal Information-Seeking and Reasoning in Agricultural Expert-Guided Conversations

  • Vardhan Dongre
  • Chi Gui
  • Shubham Garg
  • Hooshang Nayyeri
  • Gokhan Tur
  • Dilek Hakkani-Tur
  • Vikram Adve

We introduce MIRAGE, a new benchmark for multimodal expert-level reasoning and decision-making in consultative interaction settings. Designed for the domain of agriculture, MIRAGE captures the full complexity of expert consultations by combining natural user queries, expert-authored responses, and image-based context, offering a high-fidelity benchmark for evaluating models on grounded reasoning, clarification strategies, and long-form generation in a real-world, knowledge-intensive domain. Grounded in over 35,000 real user-expert interactions and curated through a carefully designed multi-step pipeline, MIRAGE spans diverse crop health, pest diagnosis, and crop management scenarios. The benchmark includes more than 7,000 unique biological entities, covering plant species, pests, and diseases, making it one of the most taxonomically diverse benchmarks available for vision-language models in real-world expert-guided domains. Unlike existing benchmarks that rely on well-specified user inputs, MIRAGE features underspecified, context-rich scenarios, requiring models to infer latent knowledge gaps and either proactively guide the interaction or respond. Our benchmark comprises two core components: a Single-Turn Challenge, in which models reason over a single user turn and image set, identify relevant entities, infer causal explanations, and generate actionable recommendations; and a Multi-Turn Challenge for dialogue state tracking, goal-driven generation, and expert-level conversational decision-making. We evaluate more than 20 closed and open-source frontier vision-language models (VLMs), using three reasoning language models as evaluators, highlighting the significant challenges posed by MIRAGE in both single-turn and multi-turn interaction settings. Even the advanced GPT4.1 and GPT4o models achieve only 44.6% and 40.9% accuracy, respectively, indicating significant room for improvement.