Arrow Research search

Author name cluster

Eojindl Yi

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

9 papers
2 author rows

Possible papers (9)

ICRA 2023 · Conference Paper

Lightweight Monocular Depth Estimation via Token-Sharing Transformer

  • Dong-Jae Lee
  • Jae Young Lee 0002
  • Hyunguk Shon
  • Eojindl Yi
  • Yeong-Hun Park
  • Sung-Sik Cho
  • Junmo Kim 0002

Depth estimation is an important task in various robotics systems and applications. In mobile robotics systems, monocular depth estimation is desirable, since a single RGB camera can be deployed at low cost and in a compact size. Owing to this significant and growing need, many lightweight monocular depth estimation networks have been proposed for mobile robotics systems. While most lightweight monocular depth estimation methods have been developed using convolutional neural networks, the Transformer has recently been gradually adopted for monocular depth estimation as well. However, the Transformer's massive parameter count and large computational cost hinder its deployment on embedded devices. In this paper, we present the Token-Sharing Transformer (TST), an architecture that uses the Transformer for monocular depth estimation and is optimized especially for embedded devices. The proposed TST utilizes global token sharing, which enables the model to obtain accurate depth predictions with high throughput on embedded devices. Experimental results show that TST outperforms existing lightweight monocular depth estimation methods. On the NYU Depth v2 dataset, TST can deliver depth maps at up to 63.4 FPS on the NVIDIA Jetson Nano and 142.6 FPS on the NVIDIA Jetson TX2, with lower errors than existing methods. Furthermore, TST achieves real-time depth estimation of high-resolution images on the Jetson TX2 with competitive results.
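
The global token sharing idea lends itself to a brief illustration. Below is a minimal PyTorch sketch of the general pattern, not the paper's actual TST architecture: patch tokens attend to a small, shared set of learned global tokens rather than to each other, so attention cost scales with the patch count times a small constant. All names and sizes here (`GlobalTokenSharing`, `num_global_tokens`) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GlobalTokenSharing(nn.Module):
    """Patch tokens attend to a small, shared set of learned global tokens
    instead of to all other patches, so attention cost grows with the
    number of patches times the (small) global token count."""
    def __init__(self, dim, num_global_tokens=8, num_heads=4):
        super().__init__()
        self.global_tokens = nn.Parameter(torch.randn(1, num_global_tokens, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):                        # x: (B, N, dim) patch tokens
        g = self.global_tokens.expand(x.size(0), -1, -1)
        out, _ = self.attn(query=x, key=g, value=g)
        return x + out                           # residual connection
```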

ICRA 2023 · Conference Paper

Test-Time Synthetic-to-Real Adaptive Depth Estimation

  • Eojindl Yi
  • Junmo Kim 0002

Can a neural network for single-image depth estimation, adapted from a synthetic to a realistic domain, truly generalize to real-world data? The resulting adapted model will generalize only to the realistic-domain dataset, which reflects just a small portion of the real world. As a result, the network still has to cope with the potential danger of domain shift between the realistic-domain dataset and real-world data. Instead, a viable solution is to design the model to continuously adapt to the distribution of the data it receives at test time. In this paper, we propose a depth estimation method that is capable of adapting to domain shift at test time. Our method adapts to the unseen test-time domain by updating the network using our proposed objective functions. Following prior work, we reduce the entropy of the current prediction for refinement and adaptation. We propose a Logit Order Enforcement loss that prevents the network from deviating into wrong solutions, which can result from merely reducing the aforementioned entropy. Qualitative and quantitative results show the effectiveness of our method. Our method reduces the dependency on training data by 5.8× on average, while achieving performance comparable to state-of-the-art unsupervised domain adaptation (UDA) and domain generalization (DG) methods on the KITTI dataset.
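
As a rough illustration of the test-time objective described above, here is a hedged PyTorch sketch. It assumes the network outputs per-pixel depth-bin logits; the paper's actual depth representation and loss form may differ. The order term is only an approximation of the Logit Order Enforcement idea: it penalizes deviations from a pre-adaptation logit ranking.

```python
import torch
import torch.nn.functional as F

def adaptation_step(model, optimizer, image, ref_order):
    """One test-time update on a single image (hypothetical interface).

    ref_order is the ascending argsort of the logits from a forward pass
    taken *before* adaptation:  ref_order = model(image).argsort(dim=1)
    """
    logits = model(image)                        # (B, bins, H, W) assumed output
    probs = F.softmax(logits, dim=1)
    # Entropy of the per-pixel depth-bin distribution (to be minimized).
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1).mean()

    # Order-enforcement sketch: re-ordering logits by the reference ranking
    # should give a non-decreasing sequence; any decrease is a rank flip.
    reordered = logits.gather(1, ref_order)
    order_loss = F.relu(reordered[:, :-1] - reordered[:, 1:]).mean()

    loss = entropy + order_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```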

ICRA 2022 · Conference Paper

Enhanced Prototypical Learning for Unsupervised Domain Adaptation in LiDAR Semantic Segmentation

  • Eojindl Yi
  • JuYoung Yang
  • Junmo Kim 0002

Despite its importance, unsupervised domain adaptation (UDA) for LiDAR semantic segmentation is a task that has not received much attention from the research community. Only recently has a completion-based 3D method been proposed to tackle the problem and formally set up the adaptation scenarios. However, the proposed pipeline is complex, voxel-based, and requires multi-stage inference, which prevents real-time use. We propose a range image-based, effective and efficient method for solving UDA on LiDAR segmentation. The method exploits class prototypes from the source domain to pseudo-label target domain pixels, a research direction that has shown good performance in UDA for natural image semantic segmentation. Applying such approaches to LiDAR scans has not been considered, because of the severe domain shift and the lack of a pre-trained feature extractor in the LiDAR segmentation setup. However, we show that proper strategies, including reconstruction-based pre-training, enhanced prototypes, and selective pseudo labeling based on distance to prototypes, are sufficient to enable the use of prototypical approaches. We evaluate the performance of our method on the recently proposed LiDAR segmentation UDA scenarios. Our method achieves remarkable performance among contemporary methods.
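
The prototype-based pseudo-labeling step can be illustrated compactly. The sketch below is a simplification rather than the paper's method: it computes class prototypes from source features and assigns each target pixel to its nearest prototype, rejecting pixels whose distance exceeds a threshold (the "selective pseudo labeling" idea). The function name and interface are assumptions, and it assumes every class appears in the source batch.

```python
import torch

def prototype_pseudo_labels(src_feats, src_labels, tgt_feats, num_classes, max_dist):
    """src_feats: (Ns, D) source pixel features with labels src_labels: (Ns,);
    tgt_feats: (Nt, D). Returns target pseudo labels with -1 for rejected pixels.
    Assumes every class is present in the source batch."""
    protos = torch.stack([src_feats[src_labels == c].mean(dim=0)
                          for c in range(num_classes)])   # (C, D) prototypes
    dists = torch.cdist(tgt_feats, protos)                # (Nt, C) distances
    min_dist, labels = dists.min(dim=1)
    labels[min_dist > max_dist] = -1             # selective: drop distant pixels
    return labels
```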

IROS 2022 · Conference Paper

Fully Convolutional Transformer with Local-Global Attention

  • Sihaeng Lee
  • Eojindl Yi
  • Janghyeon Lee 0001
  • Jinsu Yoo
  • Honglak Lee
  • Seung Hwan Kim

In an attempt to replicate the success of transformers in natural language processing on computer vision tasks, vision transformers (ViTs) have recently gained attention. Performance breakthroughs have been achieved in coarse-grained tasks like classification. However, dense prediction tasks, such as detection, segmentation, and depth estimation, require additional modifications and have been tackled only in an ad-hoc manner, by replacing the convolutional neural network encoder backbone of an existing architecture with a ViT. This study proposes a fully convolutional transformer that can perform both coarse and dense prediction tasks. The proposed architecture is, to the best of our knowledge, the first composed of attention layers even in the decoder part of the network. This is possible because our newly proposed local-global attention (LGA) can flexibly perform both downsampling and upsampling of spatial features, which are key operations required for dense prediction. Compared with existing ViTs on classification tasks, our architecture shows a reasonable trade-off between performance and efficiency. On the depth estimation task, our architecture achieves performance comparable to that of state-of-the-art transformer-based methods.
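
The key property claimed for local-global attention, that a single attention layer can both downsample and upsample spatial features, can be sketched with resolution-changing cross-attention. The module below is an illustrative stand-in, not the paper's LGA: queries live on the output-resolution grid, while keys and values stay on the input grid.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResamplingAttention(nn.Module):
    """Cross-attention whose query grid lives at the *output* resolution,
    so one layer can downsample (smaller grid) or upsample (larger grid)."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x, out_hw):                # x: (B, C, H, W)
        B, C, H, W = x.shape
        q = F.interpolate(x, size=out_hw, mode="bilinear", align_corners=False)
        q = q.flatten(2).transpose(1, 2)         # (B, out_H*out_W, C) queries
        kv = x.flatten(2).transpose(1, 2)        # (B, H*W, C) keys/values
        out, _ = self.attn(q, kv, kv)
        return out.transpose(1, 2).reshape(B, C, *out_hw)
```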

IROS 2022 · Conference Paper

Multi-Scaled and Densely Connected Locally Convolutional Layers for Depth Completion

  • Sihaeng Lee
  • Eojindl Yi
  • Janghyeon Lee 0001
  • Junmo Kim 0002

The depth completion task aims to predict a dense depth map from a sparse LiDAR point cloud and an RGB image. This task is critical because an accurate depth map can be used as prior information for many computer vision tasks, such as downstream tasks in autonomous vehicles and robot vision. Previous deep learning methods that focus on local affinity have achieved impressive results. However, an architecture directly designed to extract local affinity has not yet been proposed. In this paper, we propose multi-scaled and densely connected locally convolutional layers to learn the affinity of the neighborhood. We set a different grid factor for each step of this module, and each step consists of several convolutional layers applied only to the local area assigned by the grid factor. In addition, the steps are densely connected in sequence to take advantage of multi-scale receptive fields. The proposed module effectively learns the neighborhood's affinity in a local area at multiple scales, while keeping the network size small. As a result, our architecture achieves state-of-the-art performance compared to published works on the KITTI depth completion benchmark. On the NYU Depth V2 completion benchmark, our method achieves performance comparable to state-of-the-art approaches.
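
A convolution restricted to grid cells is simple to sketch. The code below is one interpretation of "locally convolutional layers", not the published module: the map is split into grid cells by a grid factor, each cell is convolved independently so features never cross cell borders, and steps with different grid factors are combined via a simplified stand-in for dense connections.

```python
import torch
import torch.nn as nn

class LocalConvBlock(nn.Module):
    """One step: the feature map is split into (g x g) grid cells and each
    cell is convolved independently (no mixing across cell boundaries)."""
    def __init__(self, channels, grid_factor):
        super().__init__()
        self.g = grid_factor
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):                        # (B, C, H, W); H, W divisible by g
        B, C, H, W = x.shape
        g, h, w = self.g, H // self.g, W // self.g
        cells = x.reshape(B, C, g, h, g, w).permute(0, 2, 4, 1, 3, 5)
        cells = self.conv(cells.reshape(B * g * g, C, h, w))
        cells = cells.reshape(B, g, g, C, h, w).permute(0, 3, 1, 4, 2, 5)
        return cells.reshape(B, C, H, W)

class MultiScaleLocalConv(nn.Module):
    """Steps with different grid factors; each step consumes the sum of all
    earlier outputs (a simplified stand-in for dense connections)."""
    def __init__(self, channels, grid_factors=(1, 2, 4)):
        super().__init__()
        self.steps = nn.ModuleList(LocalConvBlock(channels, g) for g in grid_factors)

    def forward(self, x):
        feats = [x]
        for step in self.steps:
            feats.append(step(sum(feats)))
        return sum(feats)
```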

IROS 2022 · Conference Paper

Subspace-based Feature Alignment for Unsupervised Domain Adaptation

  • Eojindl Yi
  • Junmo Kim 0002

Autonomous agents need to perceive the world in a robust way, such that a shift in data distribution does not lead to faulty perception results. When agents cannot be trained with abundant data, they may need to operate in real-world environments while trained on simulated data, and thus suffer from domain shift. This paper proposes an effective and robust unsupervised domain adaptation (UDA) method that can resolve these situations. In the UDA setup, we are given a labeled source domain and an unlabeled target domain that share the same set of classes but are sampled from different distributions. This domain shift prevents agents that employ deep neural networks from generalizing well on the target domain. Recent methods adopt the strategy of self-training the networks with pseudo-labeled target samples. However, falsely labeled samples cause negative transfer and deteriorate the network's generalization. To reduce negative transfer, we propose an algorithm that filters the pseudo labels and uses the filtered labels to align the domains in feature space. The samples whose labels do not pass the filtering process can be used as an index to tune the hyperparameters of our method. We validate the performance of our method across various benchmarks. In particular, our method achieves strong performance on the synthetic-to-real adaptation scenario.
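
The pseudo-label filtering step admits a compact sketch. Below is a hedged, confidence-threshold version, one plausible filtering rule rather than the paper's actual criterion; the rejected fraction is returned because, as the abstract notes, the non-passing samples can serve as an unsupervised tuning signal.

```python
import torch
import torch.nn.functional as F

def filter_pseudo_labels(logits, threshold=0.9):
    """Keep target pseudo labels whose softmax confidence clears a threshold.
    The rejected fraction doubles as an unsupervised tuning signal."""
    probs = F.softmax(logits, dim=1)             # (N, num_classes)
    conf, labels = probs.max(dim=1)
    keep = conf >= threshold
    reject_rate = 1.0 - keep.float().mean().item()
    return labels[keep], keep, reject_rate
```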

AAAI 2021 · Conference Paper

Linearly Replaceable Filters for Deep Network Channel Pruning

  • Donggyu Joo
  • Eojindl Yi
  • Sunghyun Baek
  • Junmo Kim

Convolutional neural networks (CNNs) have achieved remarkable results; however, despite the development of deep learning, practical user applications are fairly limited because heavy networks can be used only with the latest hardware and software support. Therefore, network pruning is gaining attention for general applications in various fields. This paper proposes a novel channel pruning method, Linearly Replaceable Filter (LRF), based on the idea that a filter that can be approximated by a linear combination of other filters is replaceable. Moreover, an additional method called Weights Compensation is proposed to support the LRF method: a technique that effectively reduces the output difference caused by removing filters, via direct weight modification. Through various experiments, we confirm that our method achieves state-of-the-art performance on several benchmarks. In particular, on ImageNet, LRF-60 reduces approximately 56% of the FLOPs of ResNet-50 without a drop in top-5 accuracy. Further, through extensive analyses, we demonstrate the effectiveness of our approaches.
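
The LRF criterion, that a filter well approximated by a linear combination of the remaining filters is replaceable, maps naturally onto a least-squares residual score. The sketch below is illustrative, not the published algorithm; in the paper, a separate Weights Compensation step additionally folds the removed filter's contribution into later weights, which this score alone does not do.

```python
import torch

def lrf_scores(weight):
    """Hedged sketch of the LRF idea: reconstruct each filter from the
    others by least squares; a small residual marks a pruning candidate.
    weight: (out_ch, in_ch, k, k) convolution kernel."""
    flat = weight.flatten(1)                     # (out_ch, D) flattened filters
    scores = []
    for i in range(flat.size(0)):
        others = torch.cat([flat[:i], flat[i + 1:]])      # (out_ch-1, D)
        target = flat[i].unsqueeze(1)                     # (D, 1)
        coef = torch.linalg.lstsq(others.T, target).solution
        scores.append((others.T @ coef - target).norm().item())
    return scores                                # low score -> replaceable filter
```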

AAAI 2021 · Conference Paper

Patch-Wise Attention Network for Monocular Depth Estimation

  • Sihaeng Lee
  • Janghyeon Lee
  • Byungju Kim
  • Eojindl Yi
  • Junmo Kim

In computer vision, monocular depth estimation is the problem of obtaining a high-quality depth map from a two-dimensional image. This map provides information on three-dimensional scene geometry, which is necessary for various applications in academia and industry, such as robotics and autonomous driving. Recent studies based on convolutional neural networks have achieved impressive results for this task. However, most previous studies did not consider the relationships between neighboring pixels in a local area of the scene. To overcome the drawbacks of existing methods, we propose a patch-wise attention method for focusing on each local area. After extracting patches from an input feature map, our module generates an attention map for each local patch, using two attention modules per patch along the channel and spatial dimensions. Subsequently, the attention maps return to their initial positions and merge into one attention feature. Our method is straightforward but effective. Experimental results on two challenging datasets, KITTI and NYU Depth V2, demonstrate that the proposed method achieves strong performance. Furthermore, our method outperforms other state-of-the-art methods on the KITTI depth estimation benchmark.
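
The pipeline described above, extract patches, attend within each along the channel and spatial dimensions, then return them to their positions, can be sketched directly. The module below is an assumption-laden stand-in for the paper's design: the patch size, the squeeze-excite style channel attention, and the 7x7 spatial map are all illustrative choices.

```python
import torch
import torch.nn as nn

class PatchWiseAttention(nn.Module):
    """Split the map into non-overlapping patches, attend within each patch
    along channel and spatial dimensions, then restore patch positions."""
    def __init__(self, channels, patch=8, reduction=4):
        super().__init__()
        self.p = patch
        self.channel = nn.Sequential(            # squeeze-excite per patch
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())
        self.spatial = nn.Sequential(            # single-channel spatial map
            nn.Conv2d(channels, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, x):                        # (B, C, H, W); H, W divisible by p
        B, C, H, W = x.shape
        p, gh, gw = self.p, H // self.p, W // self.p
        patches = x.reshape(B, C, gh, p, gw, p).permute(0, 2, 4, 1, 3, 5)
        patches = patches.reshape(B * gh * gw, C, p, p)
        patches = patches * self.channel(patches) * self.spatial(patches)
        patches = patches.reshape(B, gh, gw, C, p, p).permute(0, 3, 1, 4, 2, 5)
        return patches.reshape(B, C, H, W)
```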

IROS 2020 · Conference Paper

PBP-Net: Point Projection and Back-Projection Network for 3D Point Cloud Segmentation

  • JuYoung Yang
  • Chanho Lee
  • Pyunghwan Ahn
  • Haeil Lee
  • Eojindl Yi
  • Junmo Kim 0002

Following considerable development in 3D scanning technologies, many studies have recently been proposed with various approaches for 3D vision tasks, including some methods that utilize 2D convolutional neural networks (CNNs). However, even though 2D CNNs have achieved high performance in many 2D vision tasks, existing works have not effectively applied them to 3D vision tasks. In particular, segmentation has not been well studied because of the difficulty of dense prediction for each point, which requires rich feature representation. In this paper, we propose a simple and efficient architecture named the point projection and back-projection network (PBP-Net), which leverages 2D CNNs for 3D point cloud segmentation. Three modules are introduced, which respectively project the 3D point cloud onto 2D planes, extract features using a 2D CNN backbone, and back-project the features onto the original 3D point cloud. To demonstrate effective 3D feature extraction using 2D CNNs, we perform various experiments, including comparisons with recent methods. We analyze the proposed modules through ablation studies and perform experiments on object part segmentation (the ShapeNet-Part dataset) and indoor scene semantic segmentation (the S3DIS dataset). The experimental results show that the proposed PBP-Net achieves performance comparable to existing state-of-the-art methods.
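
The project/extract/back-project pattern is easy to demonstrate for a single plane. The functions below are a hedged sketch, not PBP-Net itself: points are scatter-added into a 2D grid by their (x, y) coordinates so a 2D CNN can process the result, and per-point features are later gathered back using the stored pixel indices. The grid size and normalization are illustrative assumptions.

```python
import torch

def project_to_plane(points, feats, H=64, W=64):
    """Scatter per-point features (N, C) onto an H x W plane using the
    points' (x, y) coordinates; also returns each point's pixel index."""
    xy = points[:, :2]
    mn, mx = xy.min(dim=0).values, xy.max(dim=0).values
    grid = ((xy - mn) / (mx - mn + 1e-8) * torch.tensor([W - 1.0, H - 1.0])).long()
    idx = grid[:, 1] * W + grid[:, 0]            # flat pixel index per point
    plane = torch.zeros(feats.size(1), H * W).index_add_(1, idx, feats.T)
    return plane.reshape(-1, H, W), idx          # 2D-CNN-ready map (C, H, W)

def back_project(plane, idx):
    """Gather each point's feature back from the processed 2D plane."""
    return plane.reshape(plane.size(0), -1)[:, idx].T    # (N, C)
```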