AAAI 2026 Conference Paper
GrayKD: Distilling Better Knowledge from Black-box LLM via Multi-rationale Injection
- Hyeongsoo Lim
- Hyung Yong Kim
- Jin Young Kim
- Min Ho Jang
- Eun Seo Seo
- Youshin Lim
- Shukjae Choi
- Jihwan Park
Knowledge distillation (KD) is a promising compression technique for reducing the computational burden of large language models (LLMs). Depending on access to the teacher model’s internal parameters, KD is typically categorized into white-box and black-box KD. While white-box KD benefits from full access to intrinsic knowledge such as softmax distributions, black-box KD adopts a black-box LLM (e.g., GPT-4) as the teacher, which provides only text-level outputs via API calls. This limited supervision makes black-box KD generally less effective than its white-box counterpart. To bridge the gap between white-box and black-box KD, we propose GrayKD, a novel framework that effectively distills text-level knowledge from a black-box LLM in a single-stage manner. In particular, rationales generated by the black-box LLM are injected into the student via a lightweight cross-attention module (teacher mode), enabling the student to approximate the black-box teacher’s output distribution without access to its internal parameters. The student is then trained with the softmax-level knowledge provided by the teacher mode (student mode). Since the teacher and student modes share the same backbone, the teacher mode remains highly parameter-efficient, requiring only a small number of additional parameters for rationale injection. Experimental results on instruction-following tasks demonstrate that GrayKD achieves substantial performance improvements over existing KD methods.
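To make the two-mode scheme concrete, the following is a minimal, hypothetical sketch of the idea in pure Python (not the paper's implementation): the same backbone hidden state and output head are used in both modes, the teacher mode additionally attends over rationale embeddings through a single cross-attention step with a residual add, and the student mode is supervised at the softmax level via a KL term against the teacher-mode distribution. All names (`cross_attend`, `W_out`, dimensions) are illustrative assumptions; a real system would use learned projections inside a deep-learning framework.

```python
import math
import random

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_attend(query, rationale_embs, scale):
    """Single-head cross-attention: the student's hidden state (query)
    attends over rationale token embeddings (keys == values here)."""
    scores = [dot(query, r) / scale for r in rationale_embs]
    weights = softmax(scores)
    d = len(query)
    return [sum(w * r[i] for w, r in zip(weights, rationale_embs))
            for i in range(d)]

def kl_divergence(p, q):
    """KL(p || q): softmax-level supervision from teacher mode to student mode."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

random.seed(0)
d, vocab, n_rationale = 8, 5, 4
hidden = [random.gauss(0, 1) for _ in range(d)]  # shared-backbone hidden state
rationales = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_rationale)]
W_out = [[random.gauss(0, 0.5) for _ in range(d)] for _ in range(vocab)]  # shared output head

# Student mode: logits computed from the backbone hidden state alone.
student_logits = [dot(row, hidden) for row in W_out]

# Teacher mode: inject rationale knowledge via cross-attention (residual add),
# then reuse the same output head -- only the attention path adds parameters.
injected = cross_attend(hidden, rationales, math.sqrt(d))
teacher_hidden = [h + a for h, a in zip(hidden, injected)]
teacher_logits = [dot(row, teacher_hidden) for row in W_out]

# KD objective for the student mode: match the teacher mode's softmax.
p_teacher = softmax(teacher_logits)
p_student = softmax(student_logits)
loss = kl_divergence(p_teacher, p_student)
print(loss)
```

In this toy form the cross-attention path has no extra weight matrices at all; the point is structural: both modes share `hidden` and `W_out`, so switching off the injection recovers the plain student, matching the parameter-efficiency claim above.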