Author name cluster

Jong-Seok Lee

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

6 papers

2 author rows

ICLR Conference 2025 Conference Paper

Exploring the Camera Bias of Person Re-identification

Myungseo Song
Jin-Woo Park
Jong-Seok Lee

We empirically investigate the camera bias of person re-identification (ReID) models. Previously, camera-aware methods have been proposed to address this issue, but they are largely confined to training domains of the models. We measure the camera bias of ReID models on unseen domains and reveal that camera bias becomes more pronounced under data distribution shifts. As a debiasing method for unseen domain data, we revisit feature normalization on embedding vectors. While the normalization has been used as a straightforward solution, its underlying causes and broader applicability remain unexplored. We analyze why this simple method is effective at reducing bias and show that it can be applied to detailed bias factors such as low-level image properties and body angle. Furthermore, we validate its generalizability across various models and benchmarks, highlighting its potential as a simple yet effective test-time postprocessing method for ReID. In addition, we explore the inherent risk of camera bias in unsupervised learning of ReID models. The unsupervised models remain highly biased towards camera labels even for seen domain data, indicating substantial room for improvement. Based on observations of the negative impact of camera-biased pseudo labels on training, we suggest simple training strategies to mitigate the bias. By applying these strategies to existing unsupervised learning algorithms, we show that significant performance improvements can be achieved with minor modifications.
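The test-time feature normalization mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact form of the normalization, so this example assumes per-camera standardization (subtracting each camera's mean embedding and dividing by its standard deviation) followed by L2 normalization. The function name `camera_normalize` and the `eps` parameter are hypothetical and chosen only for illustration.

```python
import numpy as np

def camera_normalize(features, camera_ids, eps=1e-8):
    """Standardize embeddings within each camera group, then L2-normalize rows.

    Assumption: per-camera mean/std statistics; the paper may differ in detail.
    """
    features = np.asarray(features, dtype=np.float64)
    camera_ids = np.asarray(camera_ids)
    out = np.empty_like(features)
    for cam in np.unique(camera_ids):
        mask = camera_ids == cam
        group = features[mask]
        mean = group.mean(axis=0, keepdims=True)
        std = group.std(axis=0, keepdims=True)
        # Remove the camera-specific offset and scale from the embeddings.
        out[mask] = (group - mean) / (std + eps)
    # Unit-length rows so that dot products act as cosine similarities.
    norms = np.linalg.norm(out, axis=1, keepdims=True)
    return out / np.maximum(norms, eps)
```

Because it only needs the embeddings and their camera labels, a step like this can be applied after inference without retraining the model.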


AAAI Conference 2024 Conference Paper

Curved Representation Space of Vision Transformers

Juyeop Kim
Junha Park
Songkuk Kim
Jong-Seok Lee

Neural networks with self-attention (a.k.a. Transformers) like ViT and Swin have emerged as a better alternative to traditional convolutional neural networks (CNNs). However, our understanding of how the new architecture works is still limited. In this paper, we focus on the phenomenon that Transformers show higher robustness against corruptions than CNNs, while not being overconfident. This is contrary to the intuition that robustness increases with confidence. We resolve this contradiction by empirically investigating how the output of the penultimate layer moves in the representation space as the input data moves linearly within a small area. In particular, we show the following. (1) While CNNs exhibit a fairly linear relationship between the input and output movements, Transformers show a nonlinear relationship for some data. For those data, the output of Transformers moves in a curved trajectory as the input moves linearly. (2) When a data point is located in a curved region, it is hard to move it out of the decision region since the output moves along a curved trajectory instead of a straight line to the decision boundary, resulting in high robustness of Transformers. (3) If a data point is slightly modified to jump out of the curved region, the movements afterwards become linear and the output goes to the decision boundary directly. In other words, there does exist a decision boundary near the data, which is hard to find only because of the curved representation space. This explains the underconfident prediction of Transformers. Also, we examine mathematical properties of the attention operation that induce nonlinear response to linear perturbation. Finally, we share our additional findings, regarding what contributes to the curved representation space of Transformers, and how the curvedness evolves during training.
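The idea of tracking how the output moves while the input moves linearly can be sketched as follows. The straightness ratio used here (end-to-end distance divided by path length) is an assumption made for illustration and is not necessarily the measure used in the paper. The names `straightness`, `linear_map`, and `curved_map` are hypothetical, and the two toy maps stand in for real networks.

```python
import numpy as np

def straightness(f, x, direction, steps=50, scale=1.0):
    """Ratio of chord length to path length of f along a linear input sweep.

    A value of 1.0 means the output moved on a straight line; smaller values
    mean the output trajectory was curved.
    """
    ts = np.linspace(0.0, scale, steps)
    outputs = np.stack([f(x + t * direction) for t in ts])
    segments = np.linalg.norm(np.diff(outputs, axis=0), axis=1)
    path_length = segments.sum()
    chord = np.linalg.norm(outputs[-1] - outputs[0])
    return chord / path_length if path_length > 0 else 1.0

# Toy stand-ins for a model's penultimate-layer output (hypothetical).
_A = np.random.default_rng(0).normal(size=(3, 4))

def linear_map(x):
    return _A @ x

def curved_map(x):
    angle = x.sum()
    return np.array([np.cos(angle), np.sin(angle)])
```

For a real model, `f` would return the penultimate-layer features for an input, and `direction` would be a small perturbation direction in input space.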


ICLR Conference 2024 Conference Paper

Scalable Monotonic Neural Networks

Hyunho Kim
Jong-Seok Lee

In this research, we focus on the problem of learning monotonic neural networks, as preserving the monotonicity of a model with respect to a subset of inputs is crucial for practical applications across various domains. Although several methods have recently been proposed to address this problem, they have limitations such as not guaranteeing monotonicity in certain cases, requiring additional inference time, lacking scalability with increasing network size and number of monotonic inputs, and manipulating network weights during training. To overcome these limitations, we introduce a simple but novel architecture of the partially connected network which incorporates a 'scalable monotonic hidden layer' comprising three units: the exponentiated unit, ReLU unit, and confluence unit. This allows for the repetitive integration of the scalable monotonic hidden layers without other structural constraints. Consequently, our method offers ease of implementation and rapid training through the conventional error-backpropagation algorithm. We accordingly term this method Scalable Monotonic Neural Networks (SMNN). Numerical experiments demonstrated that our method achieved comparable prediction accuracy to the state-of-the-art approaches while effectively addressing the aforementioned weaknesses.
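As a rough illustration of how exponentiating weights can enforce monotonicity, the sketch below builds a tiny two-layer network whose effective weights are always positive. This is not the SMNN architecture from the paper: it omits the ReLU and confluence units and treats every input as monotonic. The function `monotone_mlp` and its parameter names are hypothetical.

```python
import numpy as np

def monotone_mlp(x, raw_w1, b1, raw_w2, b2):
    """Two-layer network that is non-decreasing in every input.

    Assumption: positivity is enforced by exponentiating unconstrained
    parameters, and tanh is used as a monotone activation.
    """
    w1 = np.exp(raw_w1)  # strictly positive effective weights
    w2 = np.exp(raw_w2)
    hidden = np.tanh(x @ w1 + b1)  # monotone activation keeps the ordering
    return hidden @ w2 + b2
```

Since the raw parameters are unconstrained, a network like this can be trained with ordinary gradient descent without clipping or projecting weights during training.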


AAAI Conference 2023 Conference Paper

Demystifying Randomly Initialized Networks for Evaluating Generative Models

Junghyuk Lee
Jun-Hyuk Kim
Jong-Seok Lee

Evaluation of generative models is mostly based on the comparison between the estimated distribution and the ground truth distribution in a certain feature space. To embed samples into informative features, previous works often use convolutional neural networks optimized for classification, which is criticized by recent studies. Therefore, various feature spaces have been explored to discover alternatives. Among them, a surprising approach is to use a randomly initialized neural network for feature embedding. However, the fundamental basis to employ the random features has not been sufficiently justified. In this paper, we rigorously investigate the feature space of models with random weights in comparison to that of trained models. Furthermore, we provide empirical evidence to choose networks for random features to obtain consistent and reliable results. Our results indicate that the features from random networks can evaluate generative models well, similarly to those from trained networks, and furthermore, the two types of features can be used together in a complementary way.
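Comparing two sample sets in a random feature space can be sketched as below. The abstract does not name a specific metric, so this example assumes a Fréchet distance between Gaussians fitted to the features, with a single random ReLU projection standing in for a randomly initialized network. The names `random_features` and `frechet_distance` are hypothetical.

```python
import numpy as np

def random_features(x, weights):
    """Embed samples with a fixed, untrained ReLU projection (assumed stand-in)."""
    return np.maximum(x @ weights, 0.0)

def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two feature sets."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # tr((cov_a cov_b)^(1/2)) equals the sum of square roots of the
    # eigenvalues of the product, which are real and non-negative in theory.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = np.sum((mu_a - mu_b) ** 2)
    return float(mean_term + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)
```

The same `frechet_distance` function could be applied to features from a trained network, which makes it straightforward to compare the two kinds of feature space on the same samples.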


ECAI Conference 2020 Conference Paper

Dynamic Thresholding for Learning Sparse Neural Networks

Jin-Woo Park
Jong-Seok Lee

This paper proposes a method called Dynamic Thresholding, which can dynamically adjust the size of deep neural networks by removing redundant weights during training. The key idea is to learn the pruning threshold values applied for weight removal, instead of fixing them manually. We approximate a discontinuous pruning function with a differentiable form involving the thresholds, which can be optimized via the gradient descent learning procedure. While previous sparsity-promoting methods perform pruning with manually determined thresholds, our method can directly obtain a sparse network at each training iteration and thus does not need a trial-and-error process to choose proper threshold values. We examine the performance of the proposed method on image classification tasks including MNIST, CIFAR10, and ImageNet. It is demonstrated that our method achieves competitive results with existing methods and, at the same time, requires fewer training iterations than other approaches based on train-prune-retrain cycles.
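The idea of replacing a discontinuous pruning function with a differentiable one can be sketched as follows. The abstract does not state which differentiable form is used, so this example assumes a sigmoid gate on the weight magnitude. The function names and the `sharpness` parameter are hypothetical.

```python
import numpy as np

def hard_prune(w, t):
    """Discontinuous pruning: zero out weights whose magnitude is at most t."""
    return w * (np.abs(w) > t)

def _gate(w, t, sharpness):
    # Sigmoid surrogate for the step function 1[|w| > t] (an assumption).
    return 1.0 / (1.0 + np.exp(-sharpness * (np.abs(w) - t)))

def soft_prune(w, t, sharpness=50.0):
    """Differentiable approximation of hard_prune."""
    return w * _gate(w, t, sharpness)

def soft_prune_grad_t(w, t, sharpness=50.0):
    """Derivative of soft_prune with respect to the threshold t."""
    g = _gate(w, t, sharpness)
    return -sharpness * w * g * (1.0 - g)
```

With a gradient available for `t`, the threshold can be updated by the same optimizer step as the weights. A larger `sharpness` brings the soft gate closer to the hard step, at the cost of gradients that vanish away from the threshold.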


ECAI Conference 2020 Conference Paper

Learning Conjunctive Information of Signals in Multi-Sensor Systems

Seong-Eun Moon
Jong-Seok Lee

This paper proposes a novel deep learning method for extraction of the conjunctive information that describes the relationship between signals in multi-sensor systems to enhance the performance of the given classification task. The signals obtained from different sensors included in the multi-sensor systems are closely related. Handcrafted metrics have been used to extract the relationship between the signals in some work, which is hardly optimal for the given task. Our proposed method learns the pair-wise relationship from data to maximize the performance of the given task, which is fully data-driven, multi-aspect, and target-oriented. We demonstrate the effectiveness of the proposed method on a toy example and two real-world problems, i.e., activity recognition using accelerometer signals and emotional video classification using brain signals.
