Author name cluster

Anuj Kumar

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

4 papers

1 author row

EAAI Journal 2026 Journal Article

An explainable fine-tuned transfer learning approach for multi-classification of skin cancer disease

Ishwari Singh Rajput
Anuj Kumar
Jeewan Singh Koranga
Megha Papola
Tanuja Bisht
Tanay Pratap Singh

Details DOI

NeurIPS Conference 2025 Conference Paper

VisualLens: Personalization through Task-Agnostic Visual History

Wang Bill Zhu
Deqing Fu
Kai Sun
Yi Lu
Zhaojiang Lin
Seungwhan Moon
Kanika Narang
Mustafa Canim

Existing recommendation systems either rely on user interaction logs, such as online shopping history for shopping recommendations, or focus on text signals. However, item-based histories are not always accessible and generalizable for multimodal recommendation. We hypothesize that a user's visual history --- comprising images from daily life --- can offer rich, task-agnostic insights into their interests and preferences, and thus be leveraged for effective personalization. To this end, we propose VisualLens, a novel framework that leverages multimodal large language models (MLLMs) to enable personalization using task-agnostic visual history. VisualLens extracts, filters, and refines a spectrum user profile from the visual history to support personalized recommendation. We created two new benchmarks, Google-Review-V and Yelp-V, with task-agnostic visual histories, and show that VisualLens improves over state-of-the-art item-based multimodal recommendations by 5-10\% on Hit@3, and outperforms GPT-4o by 2-5\%. Further analysis shows that VisualLens is robust across varying history lengths and excels at adapting to both longer histories and unseen content categories.

PDF Details

NeurIPS Conference 2025 Conference Paper

WearVQA: A Visual Question Answering Benchmark for Wearables in Egocentric Authentic Real-world scenarios

Eun Chang
Zhuangqun Huang
Yiwei Liao
Sagar Bhavsar
Amogh Param
Tammy Stark
Adel Ahmadyan
Xiao Yang

We introduce WearVQA, the first benchmark specifically designed to evaluate the visual questionanswering (VQA) capabilities of multi-modal AI assistant on wearable devices like smart glasses. Unlikeprior benchmarks that focus on high-quality, third-person imagery, WearVQA reflects the unique chal-lenges of ego-centric interaction—where visual inputs may be occluded, poorly lit, unzoomed, or blurry, and questions are grounded in realistic wearable use cases. The benchmark comprises 2, 500 carefullycurated image-question-answer triplets, spanning 7 diverse image domains including both text-centricand general scenes, 10 cognitive task types ranging from basic recognition to various forms of reasoning, and 6 common wearables-specific image quality issues. All questions are designed to be answerable usingonly the visual input and common senses. WearVQA is paired with a rigorous LLM-as-a-judge evaluationframework with 96% labeling accuracy. Open-source and proprietary multi-modal LLMs achieved a QAaccuracy as low as 24–52% on WearVQA, with substantial drops on lower-quality images and reasoning-heavy tasks. These observations position WearVQA as a comprehensive and challenging benchmark forguiding technicial advancement towards robust, real-world multi-modal wearables AI systems.

PDF Details

NeurIPS Conference 2024 Conference Paper

CRAG - Comprehensive RAG Benchmark

Xiao Yang
Kai Sun
Hao Xin
Yushi Sun
Nikita Bhalla
Xiangsen Chen
Sajal Choudhary
Rongze D. Gui

Retrieval-Augmented Generation (RAG) has recently emerged as a promising solution to alleviate Large Language Model (LLM)’s deficiency in lack of knowledge. Existing RAG datasets, however, do not adequately represent the diverse and dynamic nature of real-world Question Answering (QA) tasks. To bridge this gap, we introduce the Comprehensive RAG Benchmark (CRAG), a factual question answering benchmark of 4, 409 question-answer pairs and mock APIs to simulate web and Knowledge Graph (KG) search. CRAG is designed to encapsulate a diverse array of questions across five domains and eight question categories, reflecting varied entity popularity from popular to long-tail, and temporal dynamisms ranging from years to seconds. Our evaluation on this benchmark highlights the gap to fully trustworthy QA. Whereas most advanced LLMs achieve $\le 34\%$ accuracy on CRAG, adding RAG in a straightforward manner improves the accuracy only to 44%. State-of-the-art industry RAG solutions only answer 63% questions without any hallucination. CRAG also reveals much lower accuracy in answering questions regarding facts with higher dynamism, lower popularity, or higher complexity, suggesting future research directions. The CRAG benchmark laid the groundwork for a KDD Cup 2024 challenge, attracted thousands of participants and submissions. We commit to maintaining CRAG to serve research communities in advancing RAG solutions and general QA solutions. CRAG is available at https: //github. com/facebookresearch/CRAG/.

PDF Details DOI