Author name cluster

Leena Mathur

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

3 papers
2 author rows

Possible papers

NeurIPS Conference 2024 Conference Paper

HEMM: Holistic Evaluation of Multimodal Foundation Models

  • Paul Pu Liang
  • Akshay Goindani
  • Talha Chafekar
  • Leena Mathur
  • Haofei Yu
  • Ruslan Salakhutdinov
  • Louis-Philippe Morency

Multimodal foundation models that can holistically process text alongside images, video, audio, and other sensory modalities are increasingly used in a variety of real-world applications. However, it is challenging to characterize and study progress in multimodal foundation models, given the range of possible modeling decisions, tasks, and domains. In this paper, we introduce Holistic Evaluation of Multimodal Models (HEMM) to systematically evaluate the capabilities of multimodal foundation models across a set of 3 dimensions: basic skills, information flow, and real-world use cases. Basic multimodal skills are internal abilities required to solve problems, such as learning interactions across modalities, fine-grained alignment, multi-step reasoning, and the ability to handle external knowledge. Information flow studies how multimodal content changes during a task through querying, translation, editing, and fusion. Use cases span domain-specific challenges introduced in real-world multimedia, affective computing, natural sciences, healthcare, and human-computer interaction applications. Through comprehensive experiments across the 30 tasks in HEMM, we (1) identify key dataset dimensions (e.g., basic skills, information flows, and use cases) that pose challenges to today’s models, and (2) distill performance trends regarding how different modeling dimensions (e.g., scale, pre-training data, multimodal alignment, pre-training, and instruction tuning objectives) influence performance. Our conclusions regarding challenging multimodal interactions, use cases, and tasks requiring reasoning and external knowledge, the benefits of data and model scale, and the impacts of instruction-tuning yield actionable insights for future work in multimodal foundation models.

ICLR Conference 2024 Conference Paper

SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents

  • Xuhui Zhou
  • Hao Zhu 0011
  • Leena Mathur
  • Ruohong Zhang
  • Haofei Yu
  • Zhengyang Qi
  • Louis-Philippe Morency
  • Yonatan Bisk

Humans are social beings; we pursue social goals in our daily interactions, which is a crucial aspect of social intelligence. Yet, AI systems' abilities in this realm remain elusive. We present SOTOPIA, an open-ended environment to simulate complex social interactions between artificial agents and evaluate their social intelligence. In our environment, agents role-play and interact under a wide variety of scenarios; they coordinate, collaborate, exchange, and compete with each other to achieve complex social goals. We simulate the role-play interaction between LLM-based agents and humans within this task space and evaluate their performance with a holistic evaluation framework called SOTOPIA-Eval. With SOTOPIA, we find significant differences between these models in terms of their social intelligence, and we identify a subset of SOTOPIA scenarios, SOTOPIA-hard, that is generally challenging for all models. We find that on this subset, GPT-4 achieves a significantly lower goal completion rate than humans and struggles to exhibit social commonsense reasoning and strategic communication skills. These findings demonstrate SOTOPIA's promise as a general platform for research on evaluating and improving social intelligence in artificial agents.

AAAI Conference 2021 Short Paper

Affect-Aware Machine Learning Models for Deception Detection

  • Leena Mathur

Automated deception detection systems can enhance societal well-being by helping humans detect deceivers and support people in high-stakes situations across health, social work, and legal domains. Existing computational approaches for detecting deception have not leveraged dimensional representations of affect, specifically valence and arousal, expressed during communication. My research presents a novel analysis of the potential for including affect in machine learning models for detecting deception. My work informs and motivates the development of affect-aware machine learning approaches for modeling deception and other social behaviors during human interactions in the wild. This research, independently defined and conducted by me, is from work-in-progress towards my undergraduate thesis in the Department of Computer Science at the University of Southern California.