Jae-Jun Lee Papers

ICLR Conference 2025 Conference Paper

Can One Modality Model Synergize Training of Other Modality Models?

Jae-Jun Lee
Sung Whan Yoon

Learning with multiple modalities has recently demonstrated significant gains in many domains by maximizing the shared information across modalities. However, current approaches rely heavily on high-quality paired datasets, which allow co-training on paired labels from different modalities. In this context, we raise a pivotal question: Can a model of one modality synergize the training of models of other modalities, even without paired multimodal labels? Our answer is 'Yes'. Figuratively, we argue that a writer, i.e., a language model, can promote the training of a painter, i.e., a visual model, even without paired ground truth of text and image. We theoretically argue that a superior representation can be achieved through the synergy between two different modalities without paired supervision. As proofs of concept, we broadly confirm considerable performance gains from the synergy among visual, language, and audio models. From a theoretical viewpoint, we first establish a mathematical foundation for the synergy between two different modality models, where each one is trained with its own modality. From a practical viewpoint, our work aims to broaden the scope of multimodal learning to encompass the synergistic usage of single-modality models, relieving the strong limitation of paired supervision. The code is available at https://github.com/johnjaejunlee95/synergistic-multimodal.

AAAI Conference 2020 Short Paper

An Automatic Shoplifting Detection from Surveillance Videos (Student Abstract)

U-Ju Gim
Jae-Jun Lee
Jeong-Hun Kim
Young-Ho Park
Aziz Nasridinov

The use of closed-circuit television (CCTV) surveillance devices is increasing every year to prevent abnormal behaviors, including shoplifting. However, damage from shoplifting is also increasing every year. Thus, there is a need for intelligent CCTV surveillance systems that ensure the integrity of shops, despite workforce shortages. In this study, we propose a system that automatically detects shoplifting behaviors in surveillance videos. Instead of extracting features from the whole frame, we use a Region of Interest (ROI) optical-flow fusion network to highlight the necessary features more accurately.

YNIMG Journal 2010 Journal Article

Altered working memory process in the manganese-exposed brain

Yongmin Chang
Jae-Jun Lee
Jee-Hye Seo
Hui-Jin Song
Joo-Hyun Kim
Sung-Jin Bae
Joon-Ho Ahn
Sin-Jae Park
