Arrow Research search

Author name cluster

Yubin Kim

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

5 papers
1 author row

Possible papers

5

NeurIPS 2025 Conference Paper

ZeroS: Zero‑Sum Linear Attention for Efficient Transformers

  • Jiecheng Lu
  • Xu Han
  • Yan Sun
  • Viresh Pati
  • Yubin Kim
  • Siddhartha Somani
  • Shihao Yang

Linear attention methods offer Transformers $O(N)$ complexity but typically underperform standard softmax attention. We identify two fundamental limitations affecting these approaches: the restriction to convex combinations that only permits additive information blending, and uniform accumulated weight bias that dilutes attention in long contexts. We propose Zero-Sum Linear Attention (ZeroS), which addresses these limitations by removing the constant zero-order term $1/t$ and reweighting the remaining zero-sum softmax residuals. This modification creates mathematically stable weights, enabling both positive and negative values and allowing a single attention layer to perform contrastive operations. While maintaining $O(N)$ complexity, ZeroS theoretically expands the set of representable functions compared to convex combinations. Empirically, it matches or exceeds standard softmax attention across various sequence modeling benchmarks.
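The decomposition the abstract describes can be illustrated with a small NumPy sketch (variable names and the `alpha` reweighting parameter are hypothetical; the paper's actual parameterization may differ): softmax weights over a prefix of length $t$ split into a uniform $1/t$ term plus a zero-sum residual, and ZeroS keeps only the reweighted residual, which admits negative weights.

```python
import numpy as np

def zero_sum_weights(scores, alpha=1.0):
    """Decompose softmax weights into uniform 1/t plus a zero-sum
    residual, then keep only the reweighted residual (illustrative
    sketch of the idea, not the paper's implementation)."""
    t = scores.shape[-1]
    p = np.exp(scores - scores.max(-1, keepdims=True))
    p /= p.sum(-1, keepdims=True)      # standard softmax, sums to 1
    r = p - 1.0 / t                    # zero-sum residual: sums to 0
    return alpha * r                   # reweighted; entries may be negative

rng = np.random.default_rng(0)
w = zero_sum_weights(rng.normal(size=8), alpha=2.0)
print(abs(float(w.sum())) < 1e-9)     # → True: weights sum to zero
```

Because the weights sum to zero rather than one, a single layer can subtract one value vector from another, which is the "contrastive operation" the abstract refers to.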

NeurIPS 2024 Conference Paper

MDAgents: An Adaptive Collaboration of LLMs for Medical Decision-Making

  • Yubin Kim
  • Chanwoo Park
  • Hyewon Jeong
  • Yik S. Chan
  • Xuhai Xu
  • Daniel McDuff
  • Hyeonhoon Lee
  • Marzyeh Ghassemi

Foundation models are becoming valuable tools in medicine. Yet despite their promise, the best way to leverage Large Language Models (LLMs) in complex medical tasks remains an open question. We introduce a novel multi-agent framework, named **M**edical **D**ecision-making **Agents** (**MDAgents**), that helps to address this gap by automatically assigning a collaboration structure to a team of LLMs. The assigned solo or group collaboration structure is tailored to the medical task at hand, a simple emulation inspired by the way real-world medical decision-making processes are adapted to tasks of different complexities. We evaluate our framework and baseline methods using state-of-the-art LLMs across a suite of real-world medical knowledge and clinical diagnosis benchmarks, including a comparison of LLMs' medical complexity classification against human physicians. MDAgents achieved the **best performance in seven out of ten** benchmarks on tasks requiring an understanding of medical knowledge and multi-modal reasoning, showing a significant **improvement of up to 4.2%** ($p < 0.05$) compared to previous methods' best performances. Ablation studies reveal that MDAgents effectively determines medical complexity to optimize for efficiency and accuracy across diverse medical tasks. Notably, the combination of moderator review and external medical knowledge in group collaboration resulted in an average accuracy **improvement of 11.8%**. Our code can be found at https://github.com/mitmedialab/MDAgents.
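The adaptive assignment described above can be caricatured as a small router that maps an estimated task complexity to a solo or group structure. All names, tiers, and team sizes below are hypothetical illustrations of the idea, not the MDAgents API:

```python
def assign_collaboration(complexity: str) -> dict:
    """Map an estimated medical-task complexity to a collaboration
    structure (hypothetical sketch, not the authors' implementation)."""
    routes = {
        "low":      {"structure": "solo",  "agents": 1},
        "moderate": {"structure": "group", "agents": 3, "moderator": True},
        "high":     {"structure": "group", "agents": 5, "moderator": True,
                     "external_knowledge": True},
    }
    if complexity not in routes:
        raise ValueError(f"unknown complexity tier: {complexity}")
    return routes[complexity]

print(assign_collaboration("low")["structure"])  # → solo
```

The abstract's ablation result (moderator review plus external knowledge helping in group mode) corresponds to the extra fields attached to the higher-complexity routes here.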

AAMAS 2023 Conference Paper

Joint Engagement Classification using Video Augmentation Techniques for Multi-person HRI in the wild

  • Yubin Kim
  • Huili Chen
  • Sharifa Alghowinem
  • Cynthia Breazeal
  • Hae Won Park

Affect understanding capability is essential for social robots to autonomously interact with a group of users in an intuitive and reciprocal way. However, the challenge of multi-person affect understanding comes from not only the accurate perception of each user's affective state (e.g., engagement) but also the recognition of the affect interplay between the members (e.g., joint engagement) that presents as complex, but subtle, nonverbal exchanges between them. Here, we present a novel hybrid framework for identifying a parent-child dyad's joint engagement by combining a deep learning framework with various video augmentation techniques. Using a dataset of parent-child dyads reading storybooks together with a social robot at home, we first train RGB frame- and skeleton-based joint engagement recognition models on datasets augmented with four video augmentation techniques (General Aug, DeepFake, CutOut, and Mixed) to improve joint engagement classification performance. Second, we demonstrate experimental results on the use of trained models in the robot-parent-child interaction context. Third, we introduce a behavior-based metric for evaluating the learned representation of the models to investigate the model interpretability when recognizing joint engagement. This work serves as the first step toward fully unlocking the potential of end-to-end video understanding models pre-trained on large public datasets and augmented with data augmentation and visualization techniques for affect recognition in multi-person human-robot interaction in the wild. Our code and detailed experimental results are available at https://github.com/ybkim95/multi_person_joint_engagement
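Of the four augmentations listed, CutOut is simple enough to sketch: zero out a random square patch of each frame so the model cannot rely on any single region. This is a generic CutOut implementation with illustrative parameters, not the authors' exact pipeline:

```python
import numpy as np

def cutout(frame: np.ndarray, size: int, rng=None) -> np.ndarray:
    """Zero out a random size x size patch of an H x W x C frame
    (generic CutOut augmentation; patch size is illustrative)."""
    rng = rng or np.random.default_rng()
    h, w = frame.shape[:2]
    y = int(rng.integers(0, max(1, h - size + 1)))
    x = int(rng.integers(0, max(1, w - size + 1)))
    out = frame.copy()
    out[y:y + size, x:x + size] = 0   # occlude the sampled patch
    return out

frame = np.ones((32, 32, 3))
aug = cutout(frame, size=8, rng=np.random.default_rng(0))
print(int(frame.sum() - aug.sum()))   # → 192 (8 * 8 * 3 zeroed values)
```

Applied per-frame across a video, this forces the recognition model to attend to distributed body and face cues rather than one occludable region.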

IJCAI 2023 Conference Paper

MultiPar-T: Multiparty-Transformer for Capturing Contingent Behaviors in Group Conversations

  • Dong Won Lee
  • Yubin Kim
  • Rosalind W. Picard
  • Cynthia Breazeal
  • Hae Won Park

As we move closer to real-world social AI systems, AI agents must be able to deal with multiparty (group) conversations. Recognizing and interpreting multiparty behaviors is challenging, as the system must recognize individual behavioral cues, deal with the complexity of multiple streams of data from multiple people, and recognize the subtle contingent social exchanges that take place amongst group members. To tackle this challenge, we propose the Multiparty-Transformer (Multipar-T), a transformer model for multiparty behavior modeling. The core component of our proposed approach is Crossperson Attention, which is specifically designed to detect contingent behavior between pairs of people. We verify the effectiveness of Multipar-T on a publicly available video-based group engagement detection benchmark, where it outperforms state-of-the-art approaches in average F-1 scores by 5.2% and individual class F-1 scores by up to 10.0%. Through qualitative analysis, we show that our Crossperson Attention module is able to discover contingent behaviors.
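Crossperson Attention, as described, lets one person's behavior stream query another's. A minimal scaled dot-product cross-attention between two feature sequences might look like the following (shapes, names, and the single-head simplification are assumptions, not the paper's code):

```python
import numpy as np

def crossperson_attention(x_i, x_j, scale=None):
    """Queries from person i attend over keys/values from person j
    (plain scaled dot-product cross-attention sketch).
    x_i: (T_i, d) features for person i; x_j: (T_j, d) for person j."""
    d = x_i.shape[-1]
    scale = scale or 1.0 / np.sqrt(d)
    scores = x_i @ x_j.T * scale               # (T_i, T_j) pairwise affinities
    scores -= scores.max(-1, keepdims=True)    # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(-1, keepdims=True)        # row-wise softmax
    return attn @ x_j                          # i's queries read j's stream

rng = np.random.default_rng(0)
out = crossperson_attention(rng.normal(size=(4, 8)), rng.normal(size=(4, 8)))
print(out.shape)  # → (4, 8)
```

Running this in both directions for every pair of participants is one way such a module could surface the contingent (stimulus-response) exchanges the abstract describes.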

TIST 2016 Journal Article

Using the Crowd to Improve Search Result Ranking and the Search Experience

  • Yubin Kim
  • Kevyn Collins-Thompson
  • Jaime Teevan

Despite technological advances, algorithmic search systems still have difficulty with complex or subtle information needs. For example, scenarios requiring deep semantic interpretation are a challenge for computers. People, on the other hand, are well suited to solving such problems. As a result, there is an opportunity for humans and computers to collaborate during the course of a search in a way that takes advantage of the unique abilities of each. While search tools that rely on human intervention will never be able to respond as quickly as current search engines do, recent research suggests that there are scenarios where a search engine could take more time if it resulted in a much better experience. This article explores how crowdsourcing can be used at query time to augment key stages of the search pipeline. We first explore the use of crowdsourcing to improve search result ranking. When the crowd is used to replace or augment traditional retrieval components such as query expansion and relevance scoring, we find that we can increase robustness against failure for query expansion and improve overall precision for results filtering. However, the gains that we observe are limited and unlikely to make up for the extra cost and time that the crowd requires. We then explore ways to incorporate the crowd into the search process that more drastically alter the overall experience. We find that using crowd workers to support rich query understanding and result processing appears to be a more worthwhile way to make use of the crowd during search. Our results confirm that crowdsourcing can positively impact the search experience but suggest that significant changes to the search process may be required for crowdsourcing to fulfill its potential in search systems.