Author name cluster

Mete Ozay

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

13 papers

2 author rows

IROS Conference 2024 Conference Paper

Cross-Architecture Auxiliary Feature Space Translation for Efficient Few-Shot Personalized Object Detection

Francesco Barbato
Umberto Michieli
Jijoong Moon
Pietro Zanuttigh
Mete Ozay

Recent years have seen object detection robotic systems deployed in several personal devices (e. g. , home robots and appliances). This has highlighted a challenge in their design, i. e. , they cannot efficiently update their knowledge to distinguish between general classes and user-specific instances (e. g. , a dog vs. user’s dog). We refer to this challenging task as Instance-level Personalized Object Detection (IPOD). The personalization task requires many samples for model tuning and optimization in a centralized server, raising privacy concerns. An alternative is provided by approaches based on recent large-scale Foundation Models, but their compute costs preclude on-device applications. In our work we tackle both problems at the same time, designing a Few-Shot IPOD strategy called AuXFT. We introduce a conditional coarse-to-fine few-shot learner to refine the coarse predictions made by an efficient object detector, showing that using an off-the-shelf model leads to poor personalization due to neural collapse. Therefore, we introduce a Translator block that generates an auxiliary feature space where features generated by a self-supervised model (e. g. , DINOv2) are distilled without impacting the performance of the detector. We validate AuXFT on three publicly available datasets and one in-house benchmark designed for the IPOD task, achieving remarkable gains in all considered scenarios with excellent time-complexity trade-off: AuXFT reaches a performance of 80% its upper bound at just 32% of the inference time, 13% of VRAM and 19% of the model size.

Details

IROS Conference 2024 Conference Paper

Enhanced Model Robustness to Input Corruptions by Per-corruption Adaptation of Normalization Statistics

Elena Camuffo
Umberto Michieli
Simone Milani
Jijoong Moon
Mete Ozay

Developing a reliable vision system is a fundamental challenge for robotic technologies (e. g. , indoor service robots and outdoor autonomous robots) which can ensure reliable navigation even in challenging environments such as adverse weather conditions (e. g. , fog, rain), poor lighting conditions (e. g. , over/under exposure), or sensor degradation (e. g. , blurring, noise), and can guarantee high performance in safety-critical functions. Current solutions proposed to improve model robustness usually rely on generic data augmentation techniques or employ costly test-time adaptation methods. In addition, most approaches focus on addressing a single vision task (typically, image recognition) utilising synthetic data. In this paper, we introduce Per-corruption Adaptation of Normalization statistics (PAN) to enhance the model robustness of vision systems. Our approach entails three key components: (i) a corruption type identification module, (ii) dynamic adjustment of normalization layer statistics based on identified corruption type, and (iii) real-time update of these statistics according to input data. PAN can integrate seamlessly with any convolutional model for enhanced accuracy in several robot vision tasks. In our experiments, PAN obtains robust performance improvement on challenging real-world corrupted image datasets (e. g. , OpenLoris, ExDark, ACDC), where most of the current solutions tend to fail. Moreover, PAN outperforms the baseline models by 20-30% on synthetic benchmarks in object recognition tasks.

Details

AAAI Conference 2024 Conference Paper

HOP to the Next Tasks and Domains for Continual Learning in NLP

Umberto Michieli
Mete Ozay

Continual Learning (CL) aims to learn a sequence of problems (i.e., tasks and domains) by transferring knowledge acquired on previous problems, whilst avoiding forgetting of past ones. Different from previous approaches which focused on CL for one NLP task or domain in a specific use-case, in this paper, we address a more general CL setting to learn from a sequence of problems in a unique framework. Our method, HOP, permits to hop across tasks and domains by addressing the CL problem along three directions: (i) we employ a set of adapters to generalize a large pre-trained model to unseen problems, (ii) we compute high-order moments over the distribution of embedded representations to distinguish independent and correlated statistics across different tasks and domains, (iii) we process this enriched information with auxiliary heads specialized for each end problem. Extensive experimental campaign on 4 NLP applications, 5 benchmarks and 2 CL setups demonstrates the effectiveness of our HOP.

PDF Details DOI

IROS Conference 2024 Conference Paper

Swiss DINO: Efficient and Versatile Vision Framework for On-device Personal Object Search

Kirill Paramonov
Jia-Xing Zhong
Umberto Michieli
Jijoong Moon
Mete Ozay

In this paper, we address a recent trend in robotic home appliances to include vision systems on personal devices, capable of personalizing the appliances on the fly. In particular, we formulate and address an important technical task of personal object search, which involves localization and identification of personal items of interest on images captured by robotic appliances, with each item referenced only by a few annotated images. The task is crucial for robotic home appliances and mobile systems, which need to process personal visual scenes or to operate with particular personal objects (e. g. , for grasping or navigation). In practice, personal object search presents two main technical challenges. First, a robot vision system needs to be able to distinguish between many fine-grained classes, in the presence of occlusions and clutter. Second, the strict resource requirements for the on-device system restrict the usage of most state-of-the-art methods for few-shot learning and often prevent on-device adaptation. In this work, we propose Swiss DINO: a simple yet effective framework for one-shot personal object search based on the recent DINOv2 transformer model, which was shown to have strong zero-shot generalization properties. Swiss DINO handles challenging on-device personalized scene understanding requirements and does not require any adaptation training. We show significant improvement (up to 55%) in segmentation and recognition accuracy compared to the common lightweight solutions, and significant footprint reduction of backbone inference time (up to 100×) and GPU consumption (up to 10×) compared to the heavy transformer-based solutions 1.

Details

IROS Conference 2023 Conference Paper

Online Continual Learning for Robust Indoor Object Recognition

Umberto Michieli
Mete Ozay

Vision systems mounted on home robots need to interact with unseen classes in changing environments. Robots have limited computational resources, labelled data and storage capability. These requirements pose some unique challenges: models should adapt without forgetting past knowledge in a data- and parameter-efficient way. We characterize the problem as few-shot (FS) online continual learning (OCL), where robotic agents learn from a non-repeated stream of few-shot data updating only a few model parameters. Additionally, such models experience variable conditions at test time, where objects may appear in different poses (e. g. , horizontal or vertical) and environments (e. g. , day or night). To improve robustness of CL agents, we propose RobOCLe, which; 1) constructs an enriched feature space computing high order statistical moments from the embedded features of samples; and 2) computes similarity between high order statistics of the samples on the enriched feature space, and predicts their class labels. We evaluate robustness of CL models to train/test augmentations in various cases. We show that different moments allow RobOCLe to capture different properties of deformations, providing higher robustness with no decrease of inference speed.

Details

ICML Conference 2022 Conference Paper

Exploring the Gap between Collapsed & Whitened Features in Self-Supervised Learning

Bobby He
Mete Ozay

Avoiding feature collapse, when a Neural Network (NN) encoder maps all inputs to a constant vector, is a shared implicit desideratum of various methodological advances in self-supervised learning (SSL). To that end, whitened features have been proposed as an explicit objective to ensure uncollapsed features \cite{zbontar2021barlow, ermolov2021whitening, hua2021feature, bardes2022vicreg}. We identify power law behaviour in eigenvalue decay, parameterised by exponent $\beta{\geq}0$, as a spectrum that bridges between the collapsed & whitened feature extremes. We provide theoretical & empirical evidence highlighting the factors in SSL, like projection layers & regularisation strength, that influence eigenvalue decay rate, & demonstrate that the degree of feature whitening affects generalisation, particularly in label scarce regimes. We use our insights to motivate a novel method, PMP (PostMan-Pat), which efficiently post-processes a pretrained encoder to enforce eigenvalue decay rate with power law exponent $\beta$, & find that PostMan-Pat delivers improved label efficiency and transferability across a range of SSL methods and encoder architectures.

Details

ICLR Conference 2022 Conference Paper

Feature Kernel Distillation

Bobby He
Mete Ozay

Trained Neural Networks (NNs) can be viewed as data-dependent kernel machines, with predictions determined by the inner product of last-layer representations across inputs, referred to as the feature kernel. We explore the relevance of the feature kernel for Knowledge Distillation (KD), using a mechanistic understanding of an NN’s optimisation process. We extend the theoretical analysis of Allen-Zhu & Li (2020) to show that a trained NN’s feature kernel is highly dependent on its parameter initialisation, which biases different initialisations of the same architecture to learn different data attributes in a multi-view data setting. This enables us to prove that KD using only pairwise feature kernel comparisons can improve NN test accuracy in such settings, with both single & ensemble teacher models, whereas standard training without KD fails to generalise. We further use our theory to motivate practical considerations for improving student generalisation when using distillation with feature kernels, which allows us to propose a novel approach: Feature Kernel Distillation (FKD). Finally, we experimentally corroborate our theory in the image classification setting, showing that FKD is amenable to ensemble distillation, can transfer knowledge across datasets, and outperforms both vanilla KD & other feature kernel based KD baselines across a range of standard architectures & datasets.

Details

EAAI Journal 2021 Journal Article

Learning from experience for rapid generation of local car maneuvers

Piotr Kicki
Tomasz Gawron
Krzysztof Ćwian
Mete Ozay
Piotr Skrzypczyński

Being able to rapidly respond to the changing scenes and traffic situations by generating feasible local paths is of pivotal importance for car autonomy. We propose to train a deep neural network (DNN) to plan feasible and nearly-optimal paths for kinematically constrained vehicles in a small constant time. Our DNN model is trained using a novel weakly supervised approach and a gradient-based policy search. On real and simulated scenes and a large set of local planning problems, we demonstrate that our approach outperforms the existing planners with respect to the number of successfully completed tasks. While the path generation time is about 40 ms, the generated paths are smooth and comparable to those obtained from conventional path planners.

Details DOI

IROS Conference 2019 Conference Paper

A Generative Model of Underwater Images for Active Landmark Detection and Docking

Shuang Liu 0002
Mete Ozay
Hongli Xu
Yang Lin
Takayuki Okatani

Underwater active landmarks (UALs) are widely used for short-range underwater navigation in underwater robotics tasks. Detection of UALs is challenging due to large variance of underwater illumination, water quality and change of camera viewpoint. Moreover, improvement of detection accuracy relies upon statistical diversity of images used to train detection models. We propose a generative adversarial network, called Tank-to-field GAN (T2FGAN), to learn generative models of underwater images, and use the learned models for data augmentation to improve detection accuracy. To this end, first a T2FGAN is trained using images of UALs captured in a tank. Then, the learned model of the T2FGAN is used to generate images of UALs according to different water quality, illumination, pose and landmark configurations (WIPCs). In experimental analyses, we first explore statistical properties of images of UALs generated by T2FGAN under various WIPCs for active landmark detection. Then, we use the generated images for training detection algorithms. Experimental results show that training detection algorithms using the generated images can improve detection accuracy. In field experiments, underwater docking tasks are successfully performed in a lake by employing detection models trained on datasets generated by T2FGAN.

Details

NeurIPS Conference 2019 Conference Paper

Fine-grained Optimization of Deep Neural Networks

Mete Ozay

In recent studies, several asymptotic upper bounds on generalization errors on deep neural networks (DNNs) are theoretically derived. These bounds are functions of several norms of weights of the DNNs, such as the Frobenius and spectral norms, and they are computed for weights grouped according to either input and output channels of the DNNs. In this work, we conjecture that if we can impose multiple constraints on weights of DNNs to upper bound the norms of the weights, and train the DNNs with these weights, then we can attain empirical generalization errors closer to the derived theoretical bounds, and improve accuracy of the DNNs. To this end, we pose two problems. First, we aim to obtain weights whose different norms are all upper bounded by a constant number. To achieve these bounds, we propose a two-stage renormalization procedure; (i) normalization of weights according to different norms used in the bounds, and (ii) reparameterization of the normalized weights to set a constant and finite upper bound of their norms. In the second problem, we consider training DNNs with these renormalized weights. To this end, we first propose a strategy to construct joint spaces (manifolds) of weights according to different constraints in DNNs. Next, we propose a fine-grained SGD algorithm (FG-SGD) for optimization on the weight manifolds to train DNNs with assurance of convergence to minima. Experimental analyses show that image classification accuracy of baseline DNNs can be boosted using FG-SGD on collections of manifolds identified by multiple constraints.

PDF Details

ICML Conference 2018 Conference Paper

Exploiting the Potential of Standard Convolutional Autoencoders for Image Restoration by Evolutionary Search

Masanori Suganuma
Mete Ozay
Takayuki Okatani

Researchers have applied deep neural networks to image restoration tasks, in which they proposed various network architectures, loss functions, and training methods. In particular, adversarial training, which is employed in recent studies, seems to be a key ingredient to success. In this paper, we show that simple convolutional autoencoders (CAEs) built upon only standard network components, i. e. , convolutional layers and skip connections, can outperform the state-of-the-art methods which employ adversarial training and sophisticated loss functions. The secret is to search for good architectures using an evolutionary algorithm. All we did was to train the optimized CAEs by minimizing the l2 loss between reconstructed images and their ground truths using the ADAM optimizer. Our experimental results show that this approach achieves 27. 8 dB peak signal to noise ratio (PSNR) on the CelebA dataset and 33. 3 dB on the SVHN dataset, compared to 22. 8 dB and 19. 0 dB provided by the former state-of-the-art methods, respectively.

Details

AAAI Conference 2018 Conference Paper

Training CNNs With Normalized Kernels

Mete Ozay
Takayuki Okatani

Several methods of normalizing convolution kernels have been proposed in the literature to train convolutional neural networks (CNNs), and have shown some success. However, our understanding of these methods has lagged behind their success in application; there are a lot of open questions, such as why a certain type of kernel normalization is effective and what type of normalization should be employed for each (e. g. , higher or lower) layer of a CNN. As the ﬁrst step towards answering these questions, we propose a framework that enables us to use a variety of kernel normalization methods at any layer of a CNN. A naive integration of kernel normalization with a general optimization method, such as SGD, often entails instability while updating parameters. Thus, existing methods employ ad-hoc procedures to empirically assure convergence. In this study, we pose estimation of convolution kernels under normalization constraints as constraint-free optimization on kernel submanifolds that are identiﬁed by the employed constraints. Note that naive application of the established optimization methods for matrix manifolds to the aforementioned problems is not feasible because of the hierarchical nature of CNNs. To this end, we propose an algorithm for optimization on kernel manifolds in CNNs by appropriate scaling of the space of kernels based on structure of CNNs and statistics of data. We theoretically prove that the proposed algorithm has assurance of almost sure convergence to a solution at single minimum. Our experimental results show that the proposed method can successfully train popular CNN models using several different types of kernel normalization methods. Moreover, they show that the proposed method improves classiﬁcation performance of baseline CNNs, and provides state-of-the-art performance for major image classiﬁcation benchmarks.

PDF Details

ICRA Conference 2014 Conference Paper

A hierarchical approach for joint multi-view object pose estimation and categorization

Mete Ozay
Krzysztof Walas
Ales Leonardis

We propose a joint object pose estimation and categorization approach which extracts information about object poses and categories from the object parts and compositions constructed at different layers of a hierarchical object representation algorithm, namely Learned Hierarchy of Parts (LHOP) [7]. In the proposed approach, we first employ the LHOP to learn hierarchical part libraries which represent entity parts and compositions across different object categories and views. Then, we extract statistical and geometric features from the part realizations of the objects in the images in order to represent the information about object pose and category at each different layer of the hierarchy. Unlike the traditional approaches which consider specific layers of the hierarchies in order to extract information to perform specific tasks, we combine the information extracted at different layers to solve a joint object pose estimation and categorization problem using distributed optimization algorithms. We examine the proposed generative-discriminative learning approach and the algorithms on two benchmark 2-D multi-view image datasets. The proposed approach and the algorithms outperform state-of-the-art classification, regression and feature extraction algorithms. In addition, the experimental results shed light on the relationship between object categorization, pose estimation and the part realizations observed at different layers of the hierarchy.

Details