
Author name cluster

Benjamin Zhang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

2 papers

Possible papers


NeurIPS 2025 · Conference Paper

ExAct: A Video-Language Benchmark for Expert Action Analysis

  • Han Yi
  • Yulu Pan
  • Feihong He
  • Xinyu Liu
  • Benjamin Zhang
  • Oluwatumininu Oguntola
  • Gedas Bertasius

We present ExAct, a new video-language benchmark for expert-level understanding of skilled physical human activities. Our new benchmark contains 3,521 expert-curated video question-answer pairs spanning 11 physical activities in 6 domains: Sports, Bike Repair, Cooking, Health, Music, and Dance. ExAct requires the correct answer to be selected from five carefully designed candidate options, thus necessitating a nuanced, fine-grained, expert-level understanding of physical human skills. Evaluating the recent state-of-the-art VLMs on ExAct reveals a substantial performance gap relative to human expert performance. Specifically, the best-performing Gemini 2.5 Pro model achieves only 55.35% accuracy, well below the 82.02% attained by trained human experts. We believe that ExAct will be beneficial for developing and evaluating VLMs capable of precise understanding of human skills in various physical and procedural domains. Dataset and code are available at https://texaser.github.io/exact_project_page/.

NeurIPS 2025 · Conference Paper

Optimal Control for Transformer Architectures: Enhancing Generalization, Robustness and Efficiency

  • Kelvin Kan
  • Xingjian Li
  • Benjamin Zhang
  • Tuhin Sahai
  • Stanley Osher
  • Markos Katsoulakis

We study Transformers through the perspective of optimal control theory, using tools from continuous-time formulations to derive actionable insights into training and architecture design. This framework improves the performance of existing Transformer models while providing desirable theoretical guarantees, including generalization and robustness. Our framework is designed to be plug-and-play, enabling seamless integration with established Transformer models and requiring only slight changes to the implementation. We conduct seven extensive experiments on tasks motivated by text generation, sentiment analysis, image classification, and point cloud classification. Experimental results show that the framework improves the test performance of the baselines, while being more parameter-efficient. On character-level text generation with nanoGPT, our framework achieves a 46% reduction in final test loss while using 42% fewer parameters. On GPT-2, our framework achieves a 9.3% reduction in final test loss, demonstrating scalability to larger models. To the best of our knowledge, this is the first work that applies optimal control theory to both the training and architecture of Transformers. It offers a new foundation for systematic, theory-driven improvements and moves beyond costly trial-and-error approaches.