Arrow Research search

Author name cluster

Sergio Escalera

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

11 papers
2 author rows

Possible papers

AAAI Conference 2026 Conference Paper

4D Point Cloud Segmentation via Active Test-Time Adaptation

  • Mingrong Gong
  • Chaoqi Chen
  • Luyao Tang
  • Yuxi Wang
  • Sergio Escalera

4D point cloud segmentation is crucial for autonomous driving with continuous LiDAR streams. While test-time adaptation (TTA) is the standard approach for handling dynamic environments, current methods suffer from catastrophic error accumulation due to over-reliance on pseudo-labels. Active learning could provide reliable annotations for critical samples, but combining it with TTA faces severe challenges: realtime processing requirements and expensive 3D labeling costs. In this paper, we propose ATTA-4DSeg, the first framework to achieve efficient active test-time adaptation for 4D point cloud segmentation under extreme budget constraints. Our key insight is a self-reinforcing loop: oracle annotations refine adaptation prototypes, which then guide the selection of subsequent high-value samples from regions with severe distribution shifts, maximizing each annotation’s impact. Specifically, we propose three key innovations: (1) dual-prototype comparison that precisely localizes distribution shift boundaries to narrow annotation scope, (2) Class-Inverse Budget Allocation (CIBA) ensuring balanced adaptation across all categories, coupled with hybrid uncertainty scoring combining voxel-level geometry and point-wise variance for optimal sample selection, and (3) a refinement strategy leveraging sparse oracle annotations to improve predictions on unlabeled points, maximizing annotation utility. Extensive experiments show ATTA-4DSeg improves mIoU by 18.87%, 19.92%, and 3.6% on three domain adaptation benchmarks using only 1% annotation budget. Our method operates 2.28× faster than state-of-the-art methods. Remarkably, our approach reaches 90% of fully-supervised performance using only 5% annotation budget.

ICLR Conference 2025 Conference Paper

MANTRA: The Manifold Triangulations Assemblage

  • Rubén Ballester
  • Ernst Röell
  • Daniel Bin Schmid
  • Mathieu Alain
  • Sergio Escalera
  • Carles Casacuberta
  • Bastian Rieck

The rising interest in leveraging higher-order interactions present in complex systems has led to a surge in more expressive models exploiting higher-order structures in the data, especially in topological deep learning (TDL), which designs neural networks on higher-order domains such as simplicial complexes. However, progress in this field is hindered by the scarcity of datasets for benchmarking these architectures. To address this gap, we introduce MANTRA, the first large-scale, diverse, and intrinsically higher-order dataset for benchmarking higher-order models, comprising over 43,000 and 250,000 triangulations of surfaces and three-dimensional manifolds, respectively. With MANTRA, we assess several graph- and simplicial complex-based models on three topological classification tasks. We demonstrate that while simplicial complex-based neural networks generally outperform their graph-based counterparts in capturing simple topological invariants, they also struggle, suggesting a rethink of TDL. Thus, MANTRA serves as a benchmark for assessing and advancing topological methods, paving the way towards more effective higher-order models.

AAAI Conference 2025 Conference Paper

Mixture-of-Attack-Experts with Class Regularization for Unified Physical-Digital Face Attack Detection

  • Shunxin Chen
  • Ajian Liu
  • Junze Zheng
  • Jun Wan
  • Kailai Peng
  • Sergio Escalera
  • Zhen Lei

Unified detection of digital and physical attacks in facial recognition systems has become a focal point of research in recent years. However, current multi-modal methods typically ignore the intra-class and inter-class variability across different types of attacks, leading to degraded performance. To address this limitation, we propose MoAE-CR, a framework that effectively leverages class-aware information for improved attack detection. Our improvements manifest at two levels, i.e., the feature and loss level. At the feature level, we propose Mixture-of-Attack-Experts (MoAEs) to capture more subtle differences among various types of fake faces. At the loss level, we introduce Class Regularization (CR) through the Disentanglement Module (DM) and the Cluster Distillation Module (CDM). The DM enhances class separability by increasing the distance between the centers of live and fake face classes. However, center-to-center constraints alone are insufficient to ensure distinctive representations for individual features. Thus, we propose the CDM to further cluster features around their class centers while maintaining separation from other classes. Moreover, specific attacks that significantly deviate from common attack patterns are often overlooked. To address this issue, our distance calculation prioritizes more distant features. Extensive experiments on two unified physical-digital attack datasets demonstrate the state-of-the-art performance of the proposed method.

JMLR Journal 2024 Journal Article

TopoX: A Suite of Python Packages for Machine Learning on Topological Domains

  • Mustafa Hajij
  • Mathilde Papillon
  • Florian Frantzen
  • Jens Agerberg
  • Ibrahem AlJabea
  • Rubén Ballester
  • Claudio Battiloro
  • Guillermo Bernárdez

We introduce TopoX, a Python software suite that provides reliable and user-friendly building blocks for computing and machine learning on topological domains that extend graphs: hypergraphs, simplicial, cellular, path and combinatorial complexes. TopoX consists of three packages: TopoNetX facilitates constructing and computing on these domains, including working with nodes, edges and higher-order cells; TopoEmbedX provides methods to embed topological domains into vector spaces, akin to popular graph-based embedding algorithms such as node2vec; TopoModelX is built on top of PyTorch and offers a comprehensive toolbox of higher-order message passing functions for neural networks on topological domains. The extensively documented and unit-tested source code of TopoX is available under MIT license at https://pyt-team.github.io.

IJCAI Conference 2024 Conference Paper

Unified Physical-Digital Face Attack Detection

  • Hao Fang
  • Ajian Liu
  • Haocheng Yuan
  • Junze Zheng
  • Dingheng Zeng
  • Yanhong Liu
  • Jiankang Deng
  • Sergio Escalera

Face Recognition (FR) systems can suffer from physical (i.e., print photo) and digital (i.e., DeepFake) attacks. However, previous related work rarely considers both situations at the same time. This implies the deployment of multiple models and thus more computational burden. This lack of an integrated model stems from two factors: (1) the lack of a dataset that includes both physical and digital attacks and in which each ID covers the real face and all attack types; (2) given the large intra-class variance between these two attacks, it is difficult to learn a compact feature space to detect both attacks simultaneously. To address these issues, we collect a Unified physical-digital Attack dataset, called UniAttackData. The dataset consists of 1,800 participations of 2 and 12 physical and digital attacks, respectively, resulting in a total of 28,706 videos. Then, we propose a Unified Attack Detection framework based on Vision-Language Models (VLMs), namely UniAttackDetection, which includes three main modules: the Teacher-Student Prompts (TSP) module, focused on acquiring unified and specific knowledge respectively; the Unified Knowledge Mining (UKM) module, designed to capture a comprehensive feature space; and the Sample-Level Prompt Interaction (SLPI) module, aimed at grasping sample-level semantics. These three modules seamlessly form a robust unified attack detection framework. Extensive experiments on UniAttackData and three other datasets demonstrate the superiority of our approach for unified face attack detection. Dataset link: https://sites.google.com/view/face-anti-spoofing-challenge/dataset-download/uniattackdatacvpr2024

JMLR Journal 2023 Journal Article

CodaLab Competitions: An Open Source Platform to Organize Scientific Challenges

  • Adrien Pavao
  • Isabelle Guyon
  • Anne-Catherine Letournel
  • Dinh-Tuan Tran
  • Xavier Baro
  • Hugo Jair Escalante
  • Sergio Escalera
  • Tyler Thomas

CodaLab Competitions is an open source web platform designed to help data scientists and research teams to crowd-source the resolution of machine learning problems through the organization of competitions, also called challenges or contests. CodaLab Competitions provides useful features such as multiple phases, results and code submissions, multi-score leaderboards, and jobs running inside Docker containers. The platform is very flexible and can handle large scale experiments, by allowing organizers to upload large datasets and provide their own CPU or GPU compute workers.

JBHI Journal 2023 Journal Article

Deep Learning Segmentation of the Right Ventricle in Cardiac MRI: The M&Ms Challenge

  • Carlos Martín-Isla
  • Víctor M. Campello
  • Cristian Izquierdo
  • Kaisar Kushibar
  • Carla Sendra-Balcells
  • Polyxeni Gkontra
  • Alireza Sojoudi
  • Mitchell J. Fulton

In recent years, several deep learning models have been proposed to accurately quantify and diagnose cardiac pathologies. These automated tools heavily rely on the accurate segmentation of cardiac structures in MRI images. However, segmentation of the right ventricle is challenging due to its highly complex shape and ill-defined borders. Hence, there is a need for new methods to handle such structure's geometrical and textural complexities, notably in the presence of pathologies such as Dilated Right Ventricle, Tricuspid Regurgitation, Arrhythmogenesis, Tetralogy of Fallot, and Inter-atrial Communication. The last MICCAI challenge on right ventricle segmentation was held in 2012 and included only 48 cases from a single clinical center. As part of the 12th Workshop on Statistical Atlases and Computational Models of the Heart (STACOM 2021), the M&Ms-2 challenge was organized to promote the interest of the research community around right ventricle segmentation in multi-disease, multi-view, and multi-center cardiac MRI. Three hundred sixty CMR cases, including short-axis and long-axis 4-chamber views, were collected from three Spanish hospitals using nine different scanners from three different vendors, and included a diverse set of right and left ventricle pathologies. The solutions provided by the participants show that nnU-Net achieved the best results overall. However, multi-view approaches were able to capture additional information, highlighting the need to integrate multiple cardiac diseases, views, scanners, and acquisition protocols to produce reliable automatic cardiac segmentation algorithms.

NeurIPS Conference 2022 Conference Paper

Meta-Album: Multi-domain Meta-Dataset for Few-Shot Image Classification

  • Ihsan Ullah
  • Dustin Carrión-Ojeda
  • Sergio Escalera
  • Isabelle Guyon
  • Mike Huisman
  • Felix Mohr
  • Jan N. van Rijn
  • Haozhe Sun

We introduce Meta-Album, an image classification meta-dataset designed to facilitate few-shot learning, transfer learning, meta-learning, among other tasks. It includes 40 open datasets, each having at least 20 classes with 40 examples per class, with verified licences. They stem from diverse domains, such as ecology (fauna and flora), manufacturing (textures, vehicles), human actions, and optical character recognition, featuring various image scales (microscopic, human scales, remote sensing). All datasets are preprocessed, annotated, and formatted uniformly, and come in 3 versions (Micro $\subset$ Mini $\subset$ Extended) to match users’ computational resources. We showcase the utility of the first 30 datasets on few-shot learning problems. The other 10 will be released shortly after. Meta-Album is already more diverse and larger (in number of datasets) than similar efforts, and we are committed to keep enlarging it via a series of competitions. As competitions terminate, their test data are released, thus creating a rolling benchmark, available through OpenML.org. Our website https://meta-album.github.io/ contains the source code of challenge winning methods, baseline methods, data loaders, and instructions for contributing either new datasets or algorithms to our expandable meta-dataset.

JMLR Journal 2016 Journal Article

Challenges in multimodal gesture recognition

  • Sergio Escalera
  • Vassilis Athitsos
  • Isabelle Guyon

This paper surveys the state of the art on multimodal gesture recognition and introduces the JMLR special topic on gesture recognition 2011-2015. We began right at the start of the Kinect revolution when inexpensive infrared cameras providing image depth recordings became available. We published papers using this technology and other more conventional methods, including regular video cameras, to record data, thus providing a good overview of uses of machine learning and computer vision using multimodal data in this area of application. Notably, we organized a series of challenges and made available several datasets we recorded for that purpose, including tens of thousands of videos, which are available to conduct further research. We also overview recent state of the art works on gesture recognition based on a proposed taxonomy for gesture recognition, discussing challenges and future lines of research.

IJCAI Conference 2011 Conference Paper

Human Behavior Analysis from Video Data Using Bag-of-Gestures

  • Víctor Ponce
  • Mario Gorga
  • Xavier Baró
  • Sergio Escalera

Human Behavior Analysis in Uncontrolled Environments can be categorized in two main challenges: 1) Feature extraction and 2) Behavior analysis from a set of corporal language vocabulary. In this work, we present our achievements characterizing some simple behaviors from visual data on different real applications and discuss our plan for future work: low level vocabulary definition from bag-of-gesture units and high level modelling and inference of human behaviors.

JMLR Journal 2010 Journal Article

Error-Correcting Output Codes Library

  • Sergio Escalera
  • Oriol Pujol
  • Petia Radeva

In this paper, we present an open source Error-Correcting Output Codes (ECOC) library. The ECOC framework is a powerful tool to deal with multi-class categorization problems. This library contains both state-of-the-art coding (one-versus-one, one-versus-all, dense random, sparse random, DECOC, forest-ECOC, and ECOC-ONE) and decoding designs (hamming, euclidean, inverse hamming, laplacian, β-density, attenuated, loss-based, probabilistic kernel-based, and loss-weighted) with the parameters defined by the authors, as well as the option to include your own coding, decoding, and base classifier.
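
As a rough illustration of two of the designs named in the abstract above, the following minimal Python sketch pairs one-versus-all coding with Hamming decoding. It is not taken from the ECOC library and does not reflect its API: the function names, the +1/-1 code convention, and the example classifier outputs are all assumptions made for illustration.

```python
# Illustrative sketch only: one-versus-all ECOC coding with Hamming decoding.
# All names here are hypothetical; this is not the ECOC library's API.

def one_versus_all_codes(num_classes):
    """Build one codeword per class: +1 at the class's own position, -1 elsewhere."""
    return [[1 if bit == cls else -1 for bit in range(num_classes)]
            for cls in range(num_classes)]

def hamming_distance(a, b):
    """Count the positions at which two equal-length codewords differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

def hamming_decode(outputs, codes):
    """Return the index of the class whose codeword is closest to the outputs.

    Ties are broken in favor of the lowest class index.
    """
    distances = [hamming_distance(outputs, code) for code in codes]
    return distances.index(min(distances))

codes = one_versus_all_codes(4)
# Assumed signs produced by four binary base classifiers for one test sample.
prediction = hamming_decode([-1, -1, 1, -1], codes)
```

Each column of the code matrix corresponds to one binary base classifier, and decoding assigns the class whose codeword is nearest to the vector of classifier outputs. Other coding designs change how the code matrix is built, and other decoding designs change the distance measure.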