Arrow Research search

Author name cluster

Jong Wook Kim

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

4 papers
2 author rows

Possible papers (4)

EAAI Journal · 2025 · Journal Article

High-quality three-dimensional cartoon avatar reconstruction with Gaussian splatting

  • MinHyuk Jang
  • Jong Wook Kim
  • Youngdong Jang
  • Donghyun Kim
  • Wonseok Roh
  • InYong Hwang
  • Guang Lin
  • Sangpil Kim

The growth of the augmented reality industry has increased demand for three-dimensional (3D) cartoon avatars, requiring expertise from computer graphics designers. Recent 3D Gaussian splatting methods have successfully reconstructed 3D avatars from videos, establishing them as a promising solution for this task. However, these methods primarily focus on real-world videos, limiting their effectiveness in the cartoon domain. In this paper, we present an artificial intelligence (AI)-based method for 3D avatar reconstruction from animated cartoon videos, addressing the physically unrealistic and unstructured geometries of cartoons, as well as the varying texture styles across frames. Our surface fitting module models the unstructured geometry of cartoon characters by integrating the surfaces observed from multiple views into a 3D avatar. We design a style normalizer that adjusts color distributions to reduce texture color inconsistencies in each frame of animated cartoons. Additionally, to better capture the simplified color distributions of cartoons, we design a frequency transform loss that focuses on low-frequency components. Our method significantly outperforms state-of-the-art methods, achieving approximately a 25% improvement in Learned Perceptual Image Patch Similarity (LPIPS) with a score of 0.052 over baselines across the Cartoon Neuman and ToonVid datasets, which comprise 10 videos with diverse styles and poses. Consequently, this paper presents a promising solution to meet the growing demand for high-quality 3D cartoon avatar modeling.
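The abstract mentions a frequency transform loss that emphasizes low-frequency components. One plausible way to realize such a loss is to compare two images only on the low-frequency coefficients of their 2-D Fourier transforms; the sketch below uses that formulation, with the cutoff `radius` and the L1 norm as assumptions rather than the paper's exact design.

```python
import numpy as np

def low_frequency_loss(rendered, target, radius=8):
    """L1 distance between the low-frequency FFT coefficients of two images.

    A hypothetical sketch of a 'frequency transform loss' focused on
    low frequencies; the cutoff radius and norm are assumptions, not
    the paper's exact formulation.
    """
    # Shift the zero-frequency component to the center of the spectrum.
    fr = np.fft.fftshift(np.fft.fft2(rendered))
    ft = np.fft.fftshift(np.fft.fft2(target))

    # Circular low-pass mask around the spectrum center.
    h, w = rendered.shape
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2

    # Compare only the retained low-frequency coefficients.
    return np.abs((fr - ft) * mask).mean()
```

Because the mask discards high-frequency detail, small texture differences between frames contribute little to the loss, which matches the stated goal of capturing the simplified color distributions of cartoons.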

ICML Conference · 2023 · Conference Paper

Robust Speech Recognition via Large-Scale Weak Supervision

  • Alec Radford
  • Jong Wook Kim
  • Tao Xu
  • Greg Brockman
  • Christine McLeavey
  • Ilya Sutskever

We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet. When scaled to 680,000 hours of multilingual and multitask supervision, the resulting models generalize well to standard benchmarks and are often competitive with prior fully supervised results without the need for any dataset specific fine-tuning. When compared to humans, the models approach their accuracy and robustness. We are releasing models and inference code to serve as a foundation for further work on robust speech processing.

ICML Conference · 2021 · Conference Paper

Learning Transferable Visual Models From Natural Language Supervision

  • Alec Radford
  • Jong Wook Kim
  • Chris Hallacy
  • Aditya Ramesh
  • Gabriel Goh
  • Sandhini Agarwal
  • Girish Sastry
  • Amanda Askell

State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This restricted form of supervision limits their generality and usability since additional labeled data is needed to specify any other visual concept. Learning directly from raw text about images is a promising alternative which leverages a much broader source of supervision. We demonstrate that the simple pre-training task of predicting which caption goes with which image is an efficient and scalable way to learn SOTA image representations from scratch on a dataset of 400 million (image, text) pairs collected from the internet. After pre-training, natural language is used to reference learned visual concepts (or describe new ones) enabling zero-shot transfer of the model to downstream tasks. We study the performance of this approach by benchmarking on over 30 different existing computer vision datasets, spanning tasks such as OCR, action recognition in videos, geo-localization, and many types of fine-grained object classification. The model transfers non-trivially to most tasks and is often competitive with a fully supervised baseline without the need for any dataset specific training. For instance, we match the accuracy of the original ResNet-50 on ImageNet zero-shot without needing to use any of the 1.28 million training examples it was trained on.

JBHI Journal · 2020 · Journal Article

Collaborative Ehealth Privacy and Security: An Access Control With Attribute Revocation Based on OBDD Access Structure

  • Kennedy Edemacu
  • Beakcheol Jang
  • Jong Wook Kim

The digitization of health records due to technological developments has paved the way for patients to be collaboratively treated by different healthcare institutions. In collaborative ehealth systems, a patient's health data is stored remotely in the cloud for sharing with different healthcare service providers. However, the use of third parties for storage exposes the data to several privacy and security violation threats. Ciphertext policy attribute-based encryption (CP-ABE), which provides fine-grained access control, is a promising solution to privacy and security issues in the cloud environment and as a result, it has been widely studied for secure sharing of health data in cloud-based ehealth systems. Addressing the aspects of expressiveness, efficiency, user collusion resistance and attribute/user revocation in CP-ABE has been at the forefront of these studies. Thus, in this article, we propose a novel expressive, efficient and collusion-resistant access control scheme with immediate attribute/user revocation for secure sharing of health data in collaborative ehealth systems. The proposed scheme additionally achieves forward and backward security. To realize these features, our access control is based on the ordered binary decision diagram (OBDD) access structure and it binds the user keys to the user identities. Security and performance analyses show that our proposed scheme is secure, expressive and efficient.
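The OBDD access structure the abstract refers to represents an access policy as an ordered binary decision diagram: each node tests one attribute in a fixed variable order, and following the high or low branch according to whether the user holds that attribute reaches a grant or deny leaf. The toy sketch below illustrates only that evaluation idea; the node layout, attribute names, and example policy are hypothetical, not the paper's cryptographic construction.

```python
# Toy ordered binary decision diagram (OBDD) for an attribute-based
# access policy. Policy (an illustrative assumption, not from the paper):
# "doctor AND (cardiology OR emergency)",
# variable order: doctor < cardiology < emergency.
#
# Each internal node: (attribute, low_child, high_child); leaves are booleans.
OBDD = {
    "root": ("doctor", False, "n1"),   # without 'doctor' -> deny
    "n1": ("cardiology", "n2", True),  # with 'cardiology' -> grant
    "n2": ("emergency", False, True),  # otherwise require 'emergency'
}

def evaluate(node, attributes):
    """Walk the OBDD, taking the high branch when the user holds the attribute."""
    while not isinstance(node, bool):
        attr, low, high = OBDD[node]
        node = high if attr in attributes else low
    return node
```

A user with attributes `{"doctor", "cardiology"}` reaches a grant leaf, while `{"cardiology", "emergency"}` alone is denied at the root; in the actual scheme this structure governs which attribute sets can decrypt the ciphertext rather than a plain boolean check.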