Arrow Research

Author name cluster

Naoki Makishima

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

2 papers · 1 author row

Possible papers (2)

AAAI 2026 · Conference Paper

Difference Vector Equalization for Robust Fine-tuning of Vision-Language Models

  • Satoshi Suzuki
  • Shin'ya Yamaguchi
  • Shoichiro Takeda
  • Taiga Yamane
  • Naoki Makishima
  • Naotaka Kawata
  • Mana Ihori
  • Tomohiro Tanaka

Contrastive pre-trained vision-language models, such as CLIP, demonstrate strong generalization abilities in zero-shot classification by leveraging embeddings extracted from image and text encoders. This paper aims to robustly fine-tune these vision-language models on in-distribution (ID) data without compromising their generalization abilities in out-of-distribution (OOD) and zero-shot settings. Current robust fine-tuning methods tackle this challenge by reusing contrastive learning, which was used in pre-training, for fine-tuning. However, we found that these methods distort the geometric structure of the embeddings, which plays a crucial role in the generalization of vision-language models, resulting in limited OOD and zero-shot performance. To address this, we propose Difference Vector Equalization (DiVE), which preserves the geometric structure during fine-tuning. The idea behind DiVE is to constrain difference vectors, each of which is obtained by subtracting the embeddings extracted from the pre-trained and fine-tuning models for the same data sample. By constraining the difference vectors to be equal across various data samples, we effectively preserve the geometric structure. To realize this constraint, we introduce two losses: average vector loss (AVL) and pairwise vector loss (PVL). AVL preserves the geometric structure globally by constraining difference vectors to be equal to their weighted average. PVL preserves the geometric structure locally by ensuring a consistent multimodal alignment. Our experiments demonstrate that DiVE effectively preserves the geometric structure, achieving strong results across ID, OOD, and zero-shot metrics.
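The mechanism in the abstract translates naturally into code. Below is a minimal, unofficial PyTorch sketch of the two losses, not the authors' implementation: the function name is hypothetical, AVL uses a plain mean where the paper specifies a weighted average, and PVL is given one plausible reading (matching the image-side and text-side difference vectors of each pair).

```python
import torch


def dive_losses(img_pre, img_ft, txt_pre, txt_ft):
    """Hypothetical sketch of DiVE's two losses (AVL and PVL).

    img_pre/txt_pre: embeddings from the frozen pre-trained encoders, (N, D)
    img_ft/txt_ft:   embeddings from the model being fine-tuned, (N, D)
    """
    # Difference vectors: pre-trained minus fine-tuned, one per sample.
    d_img = img_pre - img_ft                 # (N, D)
    d_txt = txt_pre - txt_ft                 # (N, D)
    diff = torch.cat([d_img, d_txt], dim=0)  # (2N, D)

    # Average vector loss (AVL): pull every difference vector toward their
    # average, preserving the geometric structure globally. (The paper uses
    # a weighted average; an unweighted mean is used here for simplicity.)
    avg = diff.mean(dim=0, keepdim=True)     # (1, D)
    avl = ((diff - avg) ** 2).sum(dim=1).mean()

    # Pairwise vector loss (PVL), one plausible reading: the image-side and
    # text-side difference vectors of a matched image-text pair should agree,
    # keeping the multimodal alignment locally consistent.
    pvl = ((d_img - d_txt) ** 2).sum(dim=1).mean()

    return avl, pvl
```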

AAAI 2025 · Conference Paper

Multimodal Fine-Grained Apparent Personality Trait Recognition: Joint Modeling of Big Five and Questionnaire Item-level Scores

  • Ryo Masumura
  • Shota Orihashi
  • Mana Ihori
  • Tomohiro Tanaka
  • Naoki Makishima
  • Satoshi Suzuki
  • Saki Mizuno
  • Nobukatsu Hojo

This paper presents a novel method for automatically recognizing people's apparent personality traits as perceived by others. Previous studies typically model apparent personality trait recognition from multimodal human behavior as direct estimation of personality trait scores, i.e., the "Big Five" scores. During model training, ground-truth personality trait scores are usually derived from personality test results scored by many raters using fine-grained questionnaires; however, the rich information in these test results has not been leveraged for anything other than determining the ground-truth Big Five scores. The scores assigned to individual questionnaire items are thought to capture finer-grained differences in personality characteristics. We therefore propose joint modeling methods that estimate not only the Big Five scores but also questionnaire item-level scores, which improves the model's awareness of multimodal human behavior. In addition, we present a newly created self-introduction video dataset with 50-item Big Five questionnaire results, since previous apparent personality trait recognition datasets do not provide such personality test results. Experiments on this dataset demonstrate that our joint modeling methods with a multimodal transformer backbone improve Big Five score estimation and effectively estimate questionnaire item-level scores. We also verify that the estimation performance reaches that of human evaluators.
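The joint modeling idea admits a compact sketch. Below is a minimal, hypothetical PyTorch illustration, not the authors' architecture: the class and function names, the simple linear heads, the MSE objective, and the alpha weighting are all assumptions; only the shared-backbone, two-output structure comes from the abstract.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class JointPersonalityHeads(nn.Module):
    """Hypothetical joint heads on top of a multimodal transformer backbone:
    one head regresses the five Big Five scores, the other the 50
    questionnaire item-level scores, from the same pooled embedding."""

    def __init__(self, dim: int, n_traits: int = 5, n_items: int = 50):
        super().__init__()
        self.trait_head = nn.Linear(dim, n_traits)  # Big Five scores
        self.item_head = nn.Linear(dim, n_items)    # item-level scores

    def forward(self, h: torch.Tensor):
        # h: pooled backbone output for one video, shape (B, dim)
        return self.trait_head(h), self.item_head(h)


def joint_loss(pred_traits, pred_items, gt_traits, gt_items, alpha=1.0):
    # Joint objective: both targets are regressed together so the item-level
    # supervision can shape the shared representation. The relative weight
    # alpha is a placeholder; the abstract does not specify the balancing.
    return (F.mse_loss(pred_traits, gt_traits)
            + alpha * F.mse_loss(pred_items, gt_items))
```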