Arrow Research search

Author name cluster

Petros Maragos

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

20 papers
2 author rows

Possible papers

20

IROS Conference 2025 Conference Paper

Category-Level 6D Object Pose Estimation in Agricultural Settings Using a Lattice-Deformation Framework and Diffusion-Augmented Synthetic Data

  • Marios Glytsos
  • Panagiotis Paraskevas Filntisis
  • George Retsinas
  • Petros Maragos

Accurate 6D object pose estimation is essential for robotic grasping and manipulation, particularly in agriculture, where fruits and vegetables exhibit high intra-class variability in shape, size, and texture. The vast majority of existing methods rely on instance-specific CAD models or require depth sensors to resolve geometric ambiguities, making them impractical for real-world agricultural applications. In this work, we introduce PLANTPose, a novel framework for category-level 6D pose estimation that operates purely on RGB input. PLANTPose predicts both the 6D pose and deformation parameters relative to a base mesh, allowing a single category-level CAD model to adapt to unseen instances. This enables accurate pose estimation across varying shapes without relying on instance-specific data. To enhance realism and improve generalization, we also leverage Stable Diffusion to refine synthetic training images with realistic texturing, mimicking variations due to ripeness and environmental factors and bridging the domain gap between synthetic data and the real world. Our evaluations on a challenging benchmark that includes bananas of various shapes, sizes, and ripeness status demonstrate the effectiveness of our framework in handling large intra-class variations while maintaining accurate 6D pose predictions, significantly outperforming the state-of-the-art RGB-based approach MegaPose. Our code, data, and models are publicly available at https://github.com/mariosgly/PLANTPose.
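The deform-then-pose idea described in the abstract can be sketched in a few lines. This toy uses per-axis scaling as the "deformation" and a yaw-only rotation; PLANTPose's actual lattice-deformation parametrization is richer, so every name and parameter here is illustrative, not from the paper:

```python
import math

def deform_and_pose(base_vertices, scale, yaw, translation):
    """Apply a simple per-axis scaling deformation to a base mesh,
    then a rigid transform (rotation about z plus translation).
    Toy stand-in for a lattice deformation followed by a 6D pose."""
    c, s = math.cos(yaw), math.sin(yaw)
    out = []
    for (x, y, z) in base_vertices:
        # deformation: stretch the canonical mesh per axis
        dx, dy, dz = x * scale[0], y * scale[1], z * scale[2]
        # rigid pose (reduced to yaw + translation for brevity)
        rx = c * dx - s * dy + translation[0]
        ry = s * dx + c * dy + translation[1]
        rz = dz + translation[2]
        out.append((rx, ry, rz))
    return out

verts = deform_and_pose([(1.0, 0.0, 0.0)], (2.0, 1.0, 1.0),
                        math.pi / 2, (0.0, 0.0, 0.5))
```

Because deformation is applied in the canonical frame before the rigid transform, a single base mesh can cover instances of different proportions.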

NeurIPS Conference 2025 Conference Paper

Instance-Level Composed Image Retrieval

  • Bill Psomas
  • George Retsinas
  • Nikos Efthymiadis
  • Panagiotis Filntisis
  • Yannis Avrithis
  • Petros Maragos
  • Ondrej Chum
  • Giorgos Tolias

Composed image retrieval (CIR), a popular research direction in which a combined visual and textual query is used, is held back by the absence of high-quality training and evaluation data. We introduce a new evaluation dataset, i-CIR, which, unlike existing datasets, focuses on an instance-level class definition. The goal is to retrieve images that contain the same particular object as the visual query, presented under a variety of modifications defined by textual queries. Its design and curation process keep the dataset compact to facilitate future research, while maintaining its challenge (comparable to retrieval among more than 40M random distractors) through a semi-automated selection of hard negatives. To overcome the challenge of obtaining clean, diverse, and suitable training data, we leverage pre-trained vision-and-language models (VLMs) in a training-free approach called BASIC. The method separately estimates query-image-to-image and query-text-to-image similarities, performing late fusion to upweight images that satisfy both queries, while down-weighting those that exhibit high similarity with only one of the two. Each individual similarity is further improved by a set of components that are simple and intuitive. BASIC sets a new state of the art not only on i-CIR but also on existing CIR datasets that follow a semantic-level class definition. Project page: https://vrg.fel.cvut.cz/icir/.
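The late-fusion behavior the abstract describes (reward images scoring high on both queries, penalize one-sided matches) can be illustrated with a harmonic mean of the two similarities. This is one simple choice with that property, not BASIC's exact fusion rule:

```python
def fuse_scores(sim_img, sim_txt, eps=1e-8):
    """Late fusion of two per-image similarities in [0, 1].
    The harmonic mean rewards images that score high on BOTH the
    visual and the textual query; a high score on only one query
    is pulled down. Illustrative choice only."""
    return [2 * a * b / (a + b + eps) for a, b in zip(sim_img, sim_txt)]

# image 0 satisfies both queries, image 1 only the visual one,
# image 2 mostly the textual one
scores = fuse_scores([0.9, 0.9, 0.2], [0.8, 0.1, 0.9])
ranking = sorted(range(len(scores)), key=lambda i: -scores[i])
```

Image 0 ranks first even though image 1 has an equally high visual-query similarity, because image 1's textual-query similarity drags its fused score down.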

ICRA Conference 2025 Conference Paper

Proactive Tactile Exploration for Object-Agnostic Shape Reconstruction from Minimal Visual Priors

  • Paris Oikonomou
  • George Retsinas
  • Petros Maragos
  • Costas S. Tzafestas

The perception of an object's surface is important for robotic applications, enabling robust object manipulation. The level of accuracy of such a representation affects the outcome of action planning, especially during tasks that require physical contact, e.g., grasping. In this paper, we propose a novel iterative method for 3D shape reconstruction consisting of two steps. First, a mesh is fitted to data points acquired from the object's surface, based on a single primitive template. Subsequently, the mesh is adjusted to adequately represent local deformities. Moreover, a novel proactive tactile exploration strategy aims at minimizing the total uncertainty with the least number of contacts, while reducing the risk of contact failure in case the estimated surface differs significantly from the real one. The performance of the methodology is evaluated both in 3D simulation and on a real setup.

ICRA Conference 2025 Conference Paper

Towards Open-Ended Robotic Exploration Using Vision-Inspired Similarity and Foundation Models

  • Panagiotis Paraskevas Filntisis
  • Efthymios Tsaprazlis
  • Paraskevas Oikonomou
  • Francesco Mattioli 0003
  • Vieri Giuliano Santucci
  • George Retsinas
  • Petros Maragos

In the domain of robotics, achieving Lifelong Open-ended Learning Autonomy (LOLA) represents a significant milestone, especially in contexts where autonomous agents must adapt to unforeseen environmental variations and evolving objectives. This paper introduces VISOR (Vision Similarity for Open-ended Robotic exploration), a vision-based framework designed to assist robotic agents in autonomously exploring and learning from new environments and objects, whether through guided or random exploration, without reliance on predefined design considerations. In that direction, VISOR acts as a perception mediator, classifying everything a robot encounters in a scene as either known or unknown. It further identifies potential distractors (e.g., background elements), known categories, or objects specified through text seeds. By leveraging recent advancements in vision foundation models, VISOR operates in a training-free manner. It begins by segmenting a scene into its constituent entities, regardless of familiarity, and then extracts robust visual representations for each one. These representations are compared against an adaptive memory system that evolves over time; unknown objects are assigned unique IDs and added to this memory as new classes, enriching the robot's understanding of its environment. We argue that this evolving memory can facilitate guided exploration through prior knowledge, enhancing the efficiency of robotic exploration, and validate this by designing two exploration scenarios and running both simulated and real-world experiments.

IROS Conference 2024 Conference Paper

SDPL-SLAM: Introducing Lines in Dynamic Visual SLAM and Multi-Object Tracking

  • Argyris Manetas
  • Panagiotis Mermigkas
  • Petros Maragos

The need for a robust visual SLAM system operating in real human environments has led to the gradual abandonment of the static world assumption and to the creation of many dynamic SLAM algorithms. Even though there have been many dynamic SLAM proposals, the vast majority of them relied on point features. However, research in static SLAM systems has demonstrated that the use of more complex geometric shapes such as lines can improve performance. Motivated by this, we have created a new dynamic SLAM system that estimates the camera poses and the motion of rigid objects by exploiting both static and dynamic points and lines. Line segments have been incorporated in a novel way in every aspect of our algorithm: by improving their correspondences through optical flow refinement, and by introducing line error terms in both camera and object motion estimation and in batch optimization. Our proposal has been tested extensively on indoor and outdoor datasets and has achieved significant improvement compared to other state-of-the-art dynamic SLAM systems. Our results demonstrate that line segments enhance robustness, contributing towards a fully operational SLAM system. Code is publicly available.
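A common way to turn line segments into an optimization residual, of the kind the abstract mentions, is the distance of a segment's projected endpoints to the observed 2D line. This is a generic formulation for illustration, not necessarily SDPL-SLAM's exact error term:

```python
def line_error(endpoint_a, endpoint_b, observed_line):
    """Line reprojection error: sum of distances of two projected
    endpoints to the observed image line a*x + b*y + c = 0, where
    (a, b) is unit-norm. Generic residual, shown for illustration."""
    a, b, c = observed_line
    point_dist = lambda p: abs(a * p[0] + b * p[1] + c)
    return point_dist(endpoint_a) + point_dist(endpoint_b)

# endpoints lying exactly on the line y = 2 give zero error
err = line_error((1.0, 2.0), (3.0, 2.0), (0.0, 1.0, -2.0))
```

Minimizing this residual over camera (or object) pose pulls the projected segment onto the detected line, which is how line terms enter bundle-adjustment-style optimization.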

IROS Conference 2022 Conference Paper

Child Engagement Estimation in Heterogeneous Child-Robot Interactions Using Spatiotemporal Visual Cues

  • Dafni Anagnostopoulou
  • Niki Efthymiou
  • Christina Papailiou
  • Petros Maragos

Robots are increasingly introduced in various Child-Robot Interactions with educational, entertainment or even therapeutic goals. In order to achieve qualitative interactions, robots need to adjust their behavior according to children's responses. A robot's ability to successfully estimate its partner's engagement is of great importance in this direction. In this research, we propose a method to estimate the engagement level of children during heterogeneous and challenging child-robot interactions. Our method uses spatiotemporal residual R(2+1)D blocks to simultaneously leverage the rich RGB and temporal information, which is crucial for engagement estimation. We present results on three different groups of data, including the PInSoRo open dataset, demonstrating our method's robustness and improvement over previous works.

ICLR Conference 2022 Conference Paper

Neural Network Approximation based on Hausdorff distance of Tropical Zonotopes

  • Panagiotis Misiakos
  • Georgios Smyrnis
  • George Retsinas
  • Petros Maragos

In this work we theoretically contribute to neural network approximation by providing a novel tropical geometrical viewpoint to structured neural network compression. In particular, we show that the approximation error between two neural networks with ReLU activations and one hidden layer depends on the Hausdorff distance of the tropical zonotopes of the networks. This theorem comes as a first step towards a purely geometrical interpretation of neural network approximation. Based on this theoretical contribution, we propose geometrical methods that employ the K-means algorithm to compress the fully connected parts of ReLU activated deep neural networks. We analyze the error bounds of our algorithms theoretically based on our approximation theorem and evaluate them empirically on neural network compression. Our experiments follow a proof-of-concept strategy and indicate that our geometrical tools achieve improved performance over relevant tropical geometry techniques and can be competitive against non-tropical methods.
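The distance appearing in the paper's bound can be made concrete for finite point sets. The sketch below computes the Hausdorff distance between two point sets; note this is only an illustration of the metric itself, since the actual theorem compares the zonotopes (Minkowski sums of the generators), not the raw generator sets:

```python
def hausdorff(A, B):
    """Hausdorff distance between two finite point sets A and B:
    the larger of the two directed distances
    max_{x in A} min_{y in B} |x - y| and its symmetric counterpart.
    Illustrates the metric in the paper's bound; point sets here
    stand in for the zonotopes themselves."""
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
    def directed(X, Y):
        return max(min(dist(x, y) for y in Y) for x in X)
    return max(directed(A, B), directed(B, A))

h = hausdorff([(0.0, 0.0), (1.0, 0.0)], [(0.0, 0.0), (1.0, 1.0)])
```

Intuitively, a small Hausdorff distance between the two networks' zonotopes certifies that no input direction can make their outputs diverge much, which is what licenses the K-means compression step.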

ICRA Conference 2021 Conference Paper

Engagement Estimation During Child Robot Interaction Using Deep Convolutional Networks Focusing on ASD Children

  • Dafni Anagnostopoulou
  • Niki Efthymiou
  • Christina Papailiou
  • Petros Maragos

Estimating the engagement of children is an essential prerequisite for constructing natural Child-Robot Interaction. Especially in the case of children with Autism Spectrum Disorder, monitoring the engagement of the other party allows robots to adjust their actions according to the educational and therapeutic goals at hand. In this work we delve into engagement estimation with a focus on children with autism spectrum disorder. We propose deep convolutional architectures for engagement estimation that outperform previous methods, and explore their performance under variable conditions, in four databases depicting ASD and TD children interacting with robots or humans.

ICML Conference 2020 Conference Paper

Multiclass Neural Network Minimization via Tropical Newton Polytope Approximation

  • Georgios Smyrnis
  • Petros Maragos

The field of tropical algebra is closely linked with the domain of neural networks with piecewise linear activations, since their output can be described via tropical polynomials in the max-plus semiring. In this work, we attempt to make use of methods stemming from a form of approximate division of such polynomials, which relies on the approximation of their Newton Polytopes, in order to minimize networks trained for multiclass classification problems. We make theoretical contributions in this domain, by proposing and analyzing methods which seek to reduce the size of such networks. In addition, we make experimental evaluations on the MNIST and Fashion-MNIST datasets, with our results demonstrating a significant reduction in network size, while retaining adequate performance.
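The link the abstract describes is easy to see in code: a max-plus (tropical) polynomial is a max of affine terms, and a ReLU unit is exactly a two-term instance of one. A minimal sketch:

```python
def tropical_poly(monomials, x):
    """Evaluate a max-plus (tropical) polynomial
    p(x) = max_i (b_i + <a_i, x>).
    Each monomial is a pair (a, b): slope vector a and bias b."""
    return max(b + sum(a_j * x_j for a_j, x_j in zip(a, x))
               for a, b in monomials)

# A single ReLU unit, max(2*x1 - x2 + 1, 0), written as a two-term
# tropical polynomial: monomials ((2, -1), bias 1) and ((0, 0), bias 0).
relu = lambda x: tropical_poly([((2.0, -1.0), 1.0), ((0.0, 0.0), 0.0)], x)
```

The Newton polytope of such a polynomial is the convex hull of the slope vectors a_i; approximating it with fewer vertices corresponds to a network with fewer units, which is the minimization idea the paper develops.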

IROS Conference 2019 Conference Paper

A Deep Learning Approach for Multi-View Engagement Estimation of Children in a Child-Robot Joint Attention Task

  • Jack Hadfield
  • Georgia Chalvatzaki
  • Petros Koutras
  • Mehdi Khamassi
  • Costas S. Tzafestas
  • Petros Maragos

In this work, we tackle the problem of child engagement estimation while children freely interact with a robot in a friendly, room-like environment. We propose a deep learning-based multi-view solution that takes advantage of recent developments in human pose detection. We extract the child’s pose from different RGB-D cameras placed regularly in the room, fuse the results and feed them to a deep Neural Network (NN) trained for classifying engagement levels. The deep network contains a recurrent layer, in order to exploit the rich temporal information contained in the pose data. The resulting method outperforms a number of baseline classifiers and provides a promising tool for better automatic understanding of a child’s attitude, interest and attention while cooperating with a robot. The goal is to integrate this model in next-generation social robots as an attention monitoring tool during various Child Robot Interaction (CRI) tasks both for Typically Developed (TD) children and children affected by autism (ASD).
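The multi-camera fusion step that precedes the recurrent classifier can be sketched as a confidence-weighted average of per-camera keypoints expressed in a common world frame. The weighting by detector confidence is an assumption made for illustration, not a detail from the paper:

```python
def fuse_keypoints(views):
    """Fuse one skeletal keypoint seen from several calibrated RGB-D
    cameras. Each view is ((x, y, z), weight) in a shared world frame;
    the fused point is the weight-normalized average. Toy stand-in
    for the paper's multi-view fusion step."""
    total = sum(w for _, w in views)
    return tuple(sum(p[i] * w for p, w in views) / total
                 for i in range(3))

# a confident view and a noisy, low-confidence one
pt = fuse_keypoints([((1.0, 0.0, 2.0), 0.9), ((1.2, 0.0, 2.0), 0.1)])
```

The fused trajectory of such keypoints over time is what a recurrent layer can then consume to classify engagement levels.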

ICRA Conference 2019 Conference Paper

LSTM-based Network for Human Gait Stability Prediction in an Intelligent Robotic Rollator

  • Georgia Chalvatzaki
  • Petros Koutras
  • Jack Hadfield
  • Xanthi S. Papageorgiou
  • Costas S. Tzafestas
  • Petros Maragos

In this work, we present a novel framework for on-line human gait stability prediction of the elderly users of an intelligent robotic rollator using Long Short Term Memory (LSTM) networks, fusing multimodal RGB-D and Laser Range Finder (LRF) data from non-wearable sensors. A Deep Learning (DL) based approach is used for the upper body pose estimation. The detected pose is used for estimating the body Center of Mass (CoM) using an Unscented Kalman Filter (UKF). An Augmented Gait State Estimation framework exploits the LRF data to estimate the legs' positions and the respective gait phase. These estimates are the inputs of an encoder-decoder sequence-to-sequence model which predicts the gait stability state as Safe or Fall Risk walking. The approach is validated with data from real patients by exploring different network architectures and hyperparameter settings, and by comparing the proposed method with other baselines. The presented LSTM-based human gait stability predictor is shown to provide robust predictions of the human stability state, and thus has the potential to be integrated into a general user-adaptive control architecture as a fall-risk alarm.

ICRA Conference 2018 Conference Paper

Multi3: Multi-Sensory Perception System for Multi-Modal Child Interaction with Multiple Robots

  • Antigoni Tsiami
  • Petros Koutras
  • Niki Efthymiou
  • Panagiotis Paraskevas Filntisis
  • Gerasimos Potamianos
  • Petros Maragos

Child-robot interaction is an interdisciplinary research area that has been attracting growing interest, primarily focusing on edutainment applications. A crucial factor in the successful deployment and wide adoption of such applications remains the robust perception of the child's multi-modal actions, when interacting with the robot in a natural and untethered fashion. Since robotic sensory and perception capabilities are platform-dependent and most often rather limited, we propose a multiple Kinect-based system to perceive the child-robot interaction scene that is robot-independent and suitable for indoor interaction scenarios. The audio-visual input from the Kinect sensors is fed into speech, gesture, and action recognition modules, appropriately developed in this paper to address the challenging nature of child-robot interaction. For this purpose, data from multiple children are collected and used for module training or adaptation. Further, information from the multiple sensors is fused to enhance module performance. The perception system is integrated in a modular multi-robot architecture demonstrating its flexibility and scalability with different robotic platforms. The whole system, called Multi3, is evaluated, both objectively at the module level and subjectively in its entirety, under appropriate child-robot interaction scenarios containing several carefully designed games between children and robots.

IROS Conference 2018 Conference Paper

Object Assembly Guidance in Child-Robot Interaction using RGB-D based 3D Tracking

  • Jack Hadfield
  • Petros Koutras
  • Niki Efthymiou
  • Gerasimos Potamianos
  • Costas S. Tzafestas
  • Petros Maragos

This work examines how and to what benefit an autonomous humanoid robot can supervise a child in an object assembly task. In order to understand the child's actions, a novel 3D object tracking algorithm for RGB-D data is employed. The tracker consists of two stages: the first performs a tracking-by-detection scheme on the color stream, to locate the objects on the image plane, while the second uses a particle filter that operates on the depth data stream to refine the first stage output and infer the objects' rotations. Given the six degrees-of-freedom of the assembly part poses, the system is able to recognize which connections have been completed at any given time. This information is then used to select an appropriate verbal or gestural response for the robot. Experimental results show that (a) the tracking algorithm is accurate, fast and robust to severe occlusions and fast movements, (b) the proposed method of assembly state estimation is indeed effective, and (c) the resulting Child-Robot Interaction scenario is educational and enjoyable for the children involved.

IROS Conference 2018 Conference Paper

User-Adaptive Human-Robot Formation Control for an Intelligent Robotic Walker Using Augmented Human State Estimation and Pathological Gait Characterization

  • Georgia Chalvatzaki
  • Xanthi S. Papageorgiou
  • Petros Maragos
  • Costas S. Tzafestas

In this paper we describe a control strategy for a user-adaptive human-robot system for an intelligent robotic Mobility Assistive Device (MAD), using raw data from a single laser range finder (LRF) mounted on the MAD and scanning the walking area. The proposed control architecture consists of three modules. In the first module, a previously proposed methodology (termed IMM-PDA-PF) delivers the augmented human state estimation of the user by providing robust leg tracking and on-line estimation of the human gait phases. This information is processed at the next module to provide pathological gait parametrization and characterization, by computing specific gait parameters for each gait cycle. These gait parameters form the feature vector that classifies the user into a class related to fall risk. These are of particular significance to the system, since the gait parameters and the respective class are used in the third module, i.e., the human-robot formation controller, to adapt the desired formation of the human-robot system by selecting the appropriate control variables. The experimental evaluation comprises gait data from real patients and demonstrates the stability of the human-robot formation control, indicating the importance of incorporating on-line gait characterization of the user, using non-wearable and non-invasive methods, in the context of a robotic MAD.

ICRA Conference 2017 Conference Paper

Comparative experimental validation of human gait tracking algorithms for an intelligent robotic rollator

  • Georgia Chalvatzaki
  • Xanthi S. Papageorgiou
  • Costas S. Tzafestas
  • Petros Maragos

Tracking human gait accurately and robustly constitutes a key factor for a smart robotic walker, aiming to provide assistance to patients with different mobility impairments. A context-aware assistive robot needs constant knowledge of the user's kinematic state to assess the gait status and adjust its movement properly to provide optimal assistance. In this work, we experimentally validate the performance of two gait tracking algorithms using data from elderly patients; the first algorithm employs a Kalman Filter (KF), while the second one tracks the user's legs separately using two probabilistically associated Particle Filters (PFs). The algorithms are compared according to their accuracy and robustness, using data captured from real experiments, where elderly subjects performed specific walking scenarios with physical assistance from a prototype Robotic Rollator. Sensorial data were provided by a laser rangefinder mounted on the robotic platform recording the movement of the user's legs. The accuracy of the proposed algorithms is analysed and validated with respect to ground truth data provided by a Motion Capture system tracking a set of visual markers worn by the patients. The robustness of the two tracking algorithms is also analysed comparatively in a complex maneuvering scenario. Current experimental findings demonstrate the superior performance of the PFs in difficult cases of occlusions and clutter, where KF tracking often fails.
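For intuition on the KF baseline, here is a scalar Kalman filter tracking one leg coordinate from noisy range measurements under a random-walk motion model. This is a deliberately simplified stand-in for the paper's trackers; the noise parameters are arbitrary:

```python
def kalman_1d(zs, q=0.01, r=0.1):
    """Scalar Kalman filter with a random-walk motion model.
    zs: noisy measurements of one leg coordinate (e.g. from an LRF);
    q: process-noise variance; r: measurement-noise variance."""
    x, p = zs[0], 1.0              # initialize at first measurement
    estimates = [x]
    for z in zs[1:]:
        p = p + q                  # predict: state mean unchanged, variance grows
        k = p / (p + r)            # Kalman gain
        x = x + k * (z - x)        # update with the innovation z - x
        p = (1 - k) * p
        estimates.append(x)
    return estimates

zs = [0.0, 0.1, 0.05, 0.12, 0.08]
est = kalman_1d(zs)
```

Each estimate is a convex combination of past measurements, so the output is a smoothed version of the input track; the PF alternative trades this single-Gaussian assumption for a sample-based posterior that survives occlusions and clutter better.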

IROS Conference 2017 Conference Paper

Real-time end-effector motion behavior planning approach using on-line point-cloud data towards a user adaptive assistive bath robot

  • Athanasios C. Dometios
  • Xanthi S. Papageorgiou
  • Antonis Arvanitakis
  • Costas S. Tzafestas
  • Petros Maragos

Elderly people have particular needs in performing bathing activities, since these tasks require body flexibility. Our aim is to build an assistive robotic bath system, in order to increase the independence and safety of this procedure. Towards this end, the expertise of professional carers for bathing sequences and appropriate motions has to be adopted, in order to achieve natural, physical human-robot interaction. In this paper, a real-time end-effector motion planning method for an assistive bath robot, using on-line Point-Cloud information, is proposed. The visual feedback obtained from a Kinect depth sensor is employed to adapt suitable washing paths to the motion and deformable surface of the user's body part. We make use of a navigation function-based controller, with guaranteed global uniform asymptotic stability, and bijective transformations for the adaptation of the paths. Experiments were conducted with a rigid rectangular object for validation purposes, while a female subject took part in the experiment to evaluate and demonstrate the basic concepts of the proposed methodology.

IROS Conference 2015 Conference Paper

Hidden Markov modeling of human pathological gait using laser range finder for an assisted living intelligent robotic walker

  • Xanthi S. Papageorgiou
  • Georgia Chalvatzaki
  • Costas S. Tzafestas
  • Petros Maragos

The precise analysis of a patient's or an elderly person's walking pattern is very important for an effective intelligent active mobility assistance robot. This walking pattern can be described by a cyclic motion, which can be modeled using the consecutive gait phases. In this paper, we present a completely non-invasive framework for analyzing and recognizing a pathological human walking gait pattern. Our framework utilizes a laser range finder sensor to detect and track the human legs, and an appropriately synthesized Hidden Markov Model (HMM) for state estimation, and recognition of the gait patterns. We demonstrate the applicability of this setup using real data, collected from an ensemble of different elderly persons with a number of pathologies. The results presented in this paper demonstrate that the proposed human data analysis scheme has the potential to provide the necessary methodological (modeling, inference, and learning) framework for a cognitive behavior-based robot control system. More specifically, the proposed framework has the potential to be used for the classification of specific walking pathologies, which is needed for the development of a context-aware robot mobility assistant.
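The HMM-based gait-phase recognition the abstract describes boils down to decoding the most likely phase sequence from discretized leg observations. A minimal Viterbi sketch with a toy two-phase model (the paper's HMM over consecutive gait phases is richer; all probabilities here are invented for illustration):

```python
def viterbi(obs, states, start, trans, emit):
    """Most likely hidden state sequence for a discrete HMM."""
    V = [{s: start[s] * emit[s][obs[0]] for s in states}]
    back = []
    for o in obs[1:]:
        col, ptr = {}, {}
        for s in states:
            prob, prev = max((V[-1][p] * trans[p][s] * emit[s][o], p)
                             for p in states)
            col[s], ptr[s] = prob, prev
        V.append(col)
        back.append(ptr)
    last = max(V[-1], key=V[-1].get)
    path = [last]
    for ptr in reversed(back):          # trace back the best path
        path.append(ptr[path[-1]])
    return path[::-1]

states = ("stance", "swing")
start = {"stance": 0.6, "swing": 0.4}
trans = {"stance": {"stance": 0.7, "swing": 0.3},
         "swing": {"stance": 0.3, "swing": 0.7}}
# observations: discretized leg-velocity symbols from the LRF tracker
emit = {"stance": {"slow": 0.8, "fast": 0.2},
        "swing": {"slow": 0.2, "fast": 0.8}}
phases = viterbi(["slow", "slow", "fast", "fast"],
                 states, start, trans, emit)
```

Slow leg motion maps to stance and fast motion to swing, so the decoded sequence flips phase mid-way through the observation stream.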

JMLR Journal 2015 Journal Article

Multimodal Gesture Recognition via Multiple Hypotheses Rescoring

  • Vassilis Pitsikalis
  • Athanasios Katsamanis
  • Stavros Theodorakis
  • Petros Maragos

We present a new framework for multimodal gesture recognition that is based on a multiple hypotheses rescoring fusion scheme. We specifically deal with a demanding Kinect-based multimodal data set, introduced in a recent gesture recognition challenge (ChaLearn 2013), where multiple subjects freely perform multimodal gestures. We employ multiple modalities, that is, visual cues, such as skeleton data, color and depth images, as well as audio, and we extract feature descriptors of the hands' movement, handshape, and audio spectral properties. Using a common hidden Markov model framework we build single-stream gesture models based on which we can generate multiple single stream-based hypotheses for an unknown gesture sequence. By multimodally rescoring these hypotheses via constrained decoding and a weighted combination scheme, we end up with a multimodally-selected best hypothesis. This is further refined by means of parallel fusion of the monomodal gesture models applied at a segmental level. In this setup, accurate gesture modeling is proven to be critical and is facilitated by an activity detection system that is also presented. The overall approach achieves 93.3% gesture recognition accuracy in the ChaLearn Kinect-based multimodal data set, significantly outperforming all recently published approaches on the same challenging multimodal gesture recognition task, providing a relative error rate reduction of at least 47.6%.

ICRA Conference 2014 Conference Paper

Hidden Markov modeling of human normal gait using laser range finder for a mobility assistance robot

  • Xanthi S. Papageorgiou
  • Georgia Chalvatzaki
  • Costas S. Tzafestas
  • Petros Maragos

For an effective intelligent active mobility assistance robot, the walking pattern of a patient or an elderly person has to be analyzed precisely. A well-known fact is that the walking patterns are gaits, that is, cyclic patterns with several consecutive phases. These cyclic motions can be modeled using the consecutive gait phases. In this paper, we present a completely non-invasive framework for analyzing a normal human walking gait pattern. Our framework utilizes a laser range finder sensor to collect the data, a combination of filters to preprocess these data, and an appropriately synthesized Hidden Markov Model (HMM) for state estimation, and recognition of the gait data. We demonstrate the applicability of this setup using real data, collected from an ensemble of different persons. The results presented in this paper demonstrate that the proposed human data analysis scheme has the potential to provide the necessary methodological (modeling, inference, and learning) framework for a cognitive behavior-based robot control system. More specifically, the proposed framework has the potential to be used for the recognition of abnormal gait patterns and the subsequent classification of specific walking pathologies, which is needed for the development of a context-aware robot mobility assistant.

JMLR Journal 2013 Journal Article

Dynamic Affine-Invariant Shape-Appearance Handshape Features and Classification in Sign Language Videos

  • Anastasios Roussos
  • Stavros Theodorakis
  • Vassilis Pitsikalis
  • Petros Maragos

We propose the novel approach of dynamic affine-invariant shape-appearance model (Aff-SAM) and employ it for handshape classification and sign recognition in sign language (SL) videos. Aff-SAM offers a compact and descriptive representation of hand configurations as well as regularized model-fitting, assisting hand tracking and extracting handshape features. We construct SA images representing the hand's shape and appearance without landmark points. We model the variation of the images by linear combinations of eigenimages followed by affine transformations, accounting for 3D hand pose changes and improving the model's compactness. We also incorporate static and dynamic handshape priors, offering robustness in occlusions, which occur often in signing. The approach includes an affine signer adaptation component at the visual level, without requiring training a new signer-specific model from scratch. We rather employ a short development data set to adapt the models for a new signer. Experiments on the Boston-University-400 continuous SL corpus demonstrate improvements on handshape classification when compared to other feature extraction approaches. Supplementary sign recognition experiments are conducted on a multi-signer, 100-sign data set from the Greek sign language lemmas corpus. These explore the fusion with movement cues as well as signer adaptation of Aff-SAM to multiple signers, providing promising results.