Arrow Research search

Author name cluster

Nassir Navab

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

77 papers
2 author rows

Possible papers

77

AAAI Conference 2026 Conference Paper

Conformable Convolution for Topologically Constrained Learning of Complex Anatomical Structures

  • Yousef Yeganeh
  • Goktug Guvercin
  • Nassir Navab
  • Azade Farshad

While conventional computer vision emphasizes pixel-level and feature-based objectives, medical image analysis of intricate biological structures necessitates explicit representation of their complex topological properties. Despite their successes, deep learning models often struggle to accurately capture the connectivity and continuity of fine, sometimes pixel-thin, yet critical structures due to their reliance on implicit learning from data. To address this challenge, we introduce Conformable Convolution, a novel convolutional layer designed to explicitly impose topological consistency. Conformable Convolution learns adaptive kernel offsets that focus on regions of high topological significance within an image. This prioritization is guided by our proposed Topological Posterior Generator (TPG) module, which leverages persistent homology. The TPG module identifies key topological features and guides the convolutional layers by applying persistent homology to feature maps transformed into cubical complexes. Unlike existing approaches that are merely aware of topology, our method explicitly constrains the learning process to ensure topological correctness. The proposed modules are architecture-agnostic, enabling them to be integrated seamlessly into various architectures. We showcase the effectiveness of our framework in the segmentation task, where preserving the interconnectedness of structures is critical. The results on three diverse datasets demonstrate that our framework effectively preserves the topology both quantitatively and qualitatively.
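The persistent-homology step that the TPG module relies on can be sketched in miniature. The following is an illustrative example, not the paper's implementation: it computes 0-dimensional (connected-component) persistence of a 2D feature map under a superlevel-set filtration with a union-find, which is the cubical-complex computation restricted to dimension 0.

```python
import numpy as np

def h0_persistence(fmap):
    """0-dim persistence of a 2D map under a superlevel-set filtration.

    Pixels enter from highest to lowest value; when two components merge,
    the one with the lower birth value dies. Returns (birth, death) pairs,
    with death = -inf for the essential (never-dying) component.
    """
    h, w = fmap.shape
    order = np.argsort(fmap.ravel())[::-1]   # high values first
    parent = {}                              # union-find forest
    birth = {}                               # root -> birth value
    pairs = []

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]    # path halving
            i = parent[i]
        return i

    for idx in order:
        r, c = divmod(int(idx), w)
        parent[idx] = idx
        birth[idx] = fmap[r, c]
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            nidx = nr * w + nc
            if 0 <= nr < h and 0 <= nc < w and nidx in parent:
                ra, rb = find(idx), find(nidx)
                if ra != rb:
                    # the component with the lower birth value dies here
                    if birth[ra] > birth[rb]:
                        ra, rb = rb, ra
                    pairs.append((birth[ra], fmap[r, c]))
                    parent[ra] = rb
    # the last surviving component never dies
    for rt in {find(i) for i in parent}:
        pairs.append((birth[rt], -np.inf))
    return pairs
```

A map with two peaks (values 3 and 2 on a zero background) yields one finite pair (2, 0), the weaker peak dying when the components merge, plus one essential feature born at 3; in the paper's setting such long-lived features mark topologically significant regions.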

AAAI Conference 2026 Conference Paper

Where It Moves, It Matters: Referring Surgical Instrument Segmentation via Motion

  • Meng Wei
  • Kun Yuan
  • Shi Li
  • Yue Zhou
  • Long Bai
  • Nassir Navab
  • Hongliang Ren
  • Hong Joo Lee

Enabling intuitive, language-driven interaction with surgical scenes is a critical step toward intelligent operating rooms and autonomous surgical robotic assistance. However, the task of referring segmentation, localizing surgical instruments based on natural language descriptions, remains underexplored in surgical videos, with existing approaches struggling to generalize due to reliance on static visual cues and predefined instrument names. In this work, we introduce SurgRef, a novel motion-guided framework that grounds free-form language expressions in instrument motion, capturing how tools move and interact across time, rather than what they look like. This allows models to understand and segment instruments even under occlusion, ambiguity, or unfamiliar terminology. To train and evaluate SurgRef, we present Ref-IMotion, a diverse, multi-institutional video dataset with dense spatiotemporal masks and rich motion-centric expressions. SurgRef achieves state-of-the-art accuracy and generalization across surgical procedures, setting a new benchmark for robust, language-driven surgical video segmentation.

ICRA Conference 2025 Conference Paper

Design and Effectiveness of Virtual Monitors and AR-Based Endoscope Control for Robotically Assisted Laparoscopic Surgery

  • Nikola Budjakoski
  • Dominik Schneider
  • Tianyu Song 0002
  • Michael Sommersperger
  • Bernhard M. Weber
  • Nassir Navab
  • Julian Klodmann

As a minimally invasive procedure, laparoscopy requires physicians to manage indirect access to the surgical site, which poses its own challenges. In particular, an endoscope must be navigated to achieve adequate visualization of the surgical anatomy, while coping with unergonomic poses, tremor, and fatigue. Furthermore, the alignment of visual perception and physical movement, dictated by the endoscope's position relative to the monitor, can lead to hand-eye coordination challenges. We propose the unified deployment of a robotic endoscope holder together with an augmented reality display to counteract the aforementioned challenges in laparoscopy. Our augmented reality system provides an interactive, stereoscopic, virtual monitor displaying the endoscopic stream. In addition, our design enables direct control of the robotic endoscope holder. Our user study demonstrates the potential of the proposed method to significantly improve hand-eye coordination, while insights from our usability study for robotic control indicate promising trends, including high usability and low cognitive demand.

NeurIPS Conference 2025 Conference Paper

EgoExOR: An Ego-Exo-Centric Operating Room Dataset for Surgical Activity Understanding

  • Ege Özsoy
  • Arda Mamur
  • Felix Tristram
  • Chantal Pellegrini
  • Magdalena Wysocki
  • Benjamin Busam
  • Nassir Navab

Operating rooms (ORs) demand precise coordination among surgeons, nurses, and equipment in a fast-paced, occlusion-heavy environment, necessitating advanced perception models to enhance safety and efficiency. Existing datasets either provide partial egocentric views or sparse exocentric multi-view context, but do not explore the comprehensive combination of both. We introduce EgoExOR, the first OR dataset and accompanying benchmark to fuse first-person and third-person perspectives. Spanning 94 minutes (84,553 frames at 15 FPS) of two emulated spine procedures, Ultrasound-Guided Needle Insertion and Minimally Invasive Spine Surgery, EgoExOR integrates egocentric data (RGB, gaze, hand tracking, audio) from wearable glasses, exocentric RGB and depth from RGB-D cameras, and ultrasound imagery. Its detailed scene graph annotations, covering 36 entities and 22 relations (568,235 triplets), enable robust modeling of clinical interactions, supporting tasks like action recognition and human-centric perception. We evaluate the surgical scene graph generation performance of two adapted state-of-the-art models and offer a new baseline that explicitly leverages EgoExOR's multimodal and multi-perspective signals. This dataset and benchmark lay a new foundation for OR perception, offering a rich, multimodal resource for next-generation clinical perception. Our code and data are available at https://github.com/ardamamur/EgoExOR.

ICRA Conference 2025 Conference Paper

Improving Probe Localization for Freehand 3D Ultrasound Using Lightweight Cameras

  • Dianye Huang
  • Nassir Navab
  • Zhongliang Jiang

Ultrasound (US) probe localization relative to the examined subject is essential for freehand 3D US imaging, which offers significant clinical value due to its affordability and unrestricted field of view. However, existing methods often rely on expensive tracking systems or bulky probes, while recent US image-based deep learning methods suffer from accumulated errors during probe maneuvering. To address these challenges, this study proposes a versatile, cost-effective probe pose localization method for freehand 3D US imaging, utilizing two lightweight cameras. To eliminate accumulated errors during US scans, we introduce PoseNet, which directly predicts the probe's 6D pose relative to a preset world coordinate system based on camera observations. We first jointly train pose and camera image encoders on pairs of 6D poses and camera observations densely sampled in simulation. This encourages each pair of probe pose and its corresponding camera observation to share the same representation in latent space. To ensure the two encoders handle unseen images and poses effectively, we incorporate a triplet loss that enforces smaller differences in latent features between nearby poses compared to distant ones. Then, the pose decoder uses the latent representation of the camera images to predict the probe's 6D pose. To bridge the sim-to-real gap, in the real world we use the trained image encoder and pose decoder for initial predictions, followed by an additional MLP layer that refines the estimated pose, improving accuracy. The results obtained from an arm phantom demonstrate the effectiveness of the proposed method, which notably surpasses state-of-the-art techniques, achieving average positional and rotational errors of 2.03 mm and 0.37°, respectively. Code: https://github.com/dianyeHuang/FreehandUS_Pose_Estimation
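The triplet constraint described in this abstract admits a compact illustration. A minimal sketch (not the paper's code): the margin loss pushes the latent distance to a nearby pose (positive) below the distance to a distant pose (negative) by at least a margin.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet margin loss: make the anchor's latent closer to the
    positive (nearby pose) than to the negative (distant pose) by `margin`."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)
```

When the positive already sits well inside the margin the loss is zero; swapping positive and negative yields a positive penalty, which is what drives nearby poses together in latent space.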

ICRA Conference 2025 Conference Paper

Intraoperative Trocar-Based Eyeball Rotation Estimation Using Only 2D Microscope Images

  • Junjie Yang 0001
  • Satoshi Inagaki
  • Zhihao Zhao
  • Daniel Zapp
  • Mathias Maier
  • Peter C. Issa
  • Kai Huang 0001
  • Nassir Navab

In ophthalmic surgery, surgeons or robots manipulate a light probe and an instrument around two separated trocars following sclerotomy to achieve orbital control for eyeball pose adjustment and subsequent surgical tasks referenced to microscope frames. However, current methods face significant challenges in directly extracting the eyeball pose from real-time microscope frames due to the limited microscope perspective and the darkened operating room (OR). This paper decomposes eyeball rotations along only the x and y axes. A method of calculating eyeball poses from eyeball geometry and microscopic trocar positions is then presented. This method is tested in simulation and on a phantom system, with a current error range of [2.0, 2.8] degrees, providing assistive intraoperative eyeball status in the dark OR; extended discussions of the method are included.

AAAI Conference 2025 Conference Paper

Medical Multimodal Model Stealing Attacks via Adversarial Domain Alignment

  • Yaling Shen
  • Zhixiong Zhuang
  • Kun Yuan
  • Maria-Irina Nicolae
  • Nassir Navab
  • Nicolas Padoy
  • Mario Fritz

Medical multimodal large language models (MLLMs) are becoming an instrumental part of healthcare systems, assisting medical personnel with decision making and results analysis. Models for radiology report generation are able to interpret medical imagery, thus reducing the workload of radiologists. As medical data is scarce and protected by privacy regulations, medical MLLMs represent valuable intellectual property. However, these assets are potentially vulnerable to model stealing, where attackers aim to replicate their functionality via black-box access. So far, model stealing for the medical domain has focused on image classification; however, existing attacks are not effective against MLLMs. In this paper, we introduce Adversarial Domain Alignment (ADA-Steal), the first stealing attack against medical MLLMs. ADA-Steal relies on natural images, which are public and widely available, as opposed to their medical counterparts. We show that data augmentation with adversarial noise is sufficient to overcome the data distribution gap between natural images and the domain-specific distribution of the victim MLLM. Experiments on the IU X-RAY and MIMIC-CXR radiology datasets demonstrate that Adversarial Domain Alignment enables attackers to steal the medical MLLM without any access to medical data.
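The adversarial augmentation at the heart of ADA-Steal can be illustrated with the classic fast-gradient-sign step, used here purely for illustration (the paper's exact perturbation scheme may differ): a natural image is nudged along the sign of a loss gradient to shift it toward the victim model's domain.

```python
import numpy as np

def fgsm_step(image, grad, eps=0.03):
    """One fast-gradient-sign perturbation: move each pixel by ±eps along
    the sign of the loss gradient, then clip back to the valid [0, 1] range."""
    return np.clip(image + eps * np.sign(grad), 0.0, 1.0)
```

In a stealing pipeline the gradient would come from a domain-alignment objective evaluated on the attacker's surrogate; here `grad` is simply an assumed input.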

ICRA Conference 2025 Conference Paper

Pre-Surgical Planner for Robot-Assisted Vitreoretinal Surgery: Integrating Eye Posture, Robot Position and Insertion Point

  • Satoshi Inagaki
  • Alireza Alikhani
  • Nassir Navab
  • Peter C. Issa
  • M. Ali Nasseri

Several robotic frameworks have been recently developed to assist ophthalmic surgeons in performing complex vitreoretinal procedures such as subretinal injection of advanced therapeutics. These surgical robots show promising capabilities; however, most of them have to limit their working volume to achieve maximum accuracy. Moreover, the visible area seen through the surgical microscope is limited and depends solely on the eye posture. If the eye posture, trocar position, and robot configuration are not correctly arranged, the instrument may not reach the target position, and the preparation will have to be redone. Therefore, this paper proposes an optimization framework for eye tilting and robot positioning to reach various target areas for different patients. Our method was validated with an adjustable phantom eye model, and the error of this workflow was 0.13 ± 1.65 deg (rotational joint around the Y axis), -1.40 ± 1.13 deg (around the X axis), and 1.80 ± 1.51 mm (depth, Z). The potential error sources are also analyzed in the discussion section.

ICRA Conference 2025 Conference Paper

Real-Time Deformation-Aware Control for Autonomous Robotic Subretinal Injection Under iOCT Guidance

  • Demir Arikan
  • Peiyao Zhang
  • Michael Sommersperger
  • Shervin Dehghani
  • Mojtaba Esfandiari
  • Russell H. Taylor
  • M. Ali Nasseri
  • Peter Gehlbach

Robotic platforms provide consistent and precise tool positioning that significantly enhances retinal microsurgery. Integrating such systems with intraoperative optical coherence tomography (iOCT) enables image-guided robotic interventions, allowing autonomous performance of advanced treatments, such as injecting therapeutic agents into the subretinal space. However, tissue deformations due to tool-tissue interactions constitute a significant challenge in autonomous iOCT-guided robotic subretinal injections. Such interactions impact correct needle positioning and procedure outcomes. This paper presents a novel method for autonomous subretinal injection under iOCT guidance that considers tissue deformations during the insertion procedure. The technique is achieved through real-time segmentation and 3D reconstruction of the surgical scene from densely sampled iOCT B-scans, which we refer to as B5-scans. Using B5-scans, we monitor the position of the instrument relative to a virtual target layer between the ILM and RPE. Our experiments on ex-vivo porcine eyes demonstrate dynamic adjustment of the insertion depth and overall improved accuracy in needle positioning compared to prior autonomous insertion approaches. Compared to a 35% success rate in subretinal bleb generation with previous approaches, our method reliably created subretinal blebs in 90% of our experiments. The source code and data used in this study are publicly available on GitHub: https://github.com/demirarikan/virtual-layer-retinal-surgery.

IROS Conference 2025 Conference Paper

Shape Completion and Real-Time Visualization in Robotic Ultrasound Spine Acquisitions

  • Miruna-Alexandra Gafencu
  • Reem Shaban
  • Yordanka Velikova
  • Mohammad Farid Azampour
  • Nassir Navab

Ultrasound (US) imaging is increasingly used in spinal procedures due to its real-time, radiation-free capabilities; however, its effectiveness is hindered by shadowing artifacts that obscure deeper tissue structures. Traditional approaches, such as CT-to-US registration, incorporate anatomical information from preoperative CT scans to guide interventions, but they are limited by complex registration requirements, differences in spine curvature, and the need for recent CT imaging. Recent shape completion methods can offer an alternative by reconstructing spinal structures in US data, while being pretrained on a large set of publicly available CT scans. However, these approaches are typically offline and have limited reproducibility. In this work, we introduce a novel integrated system that combines robotic ultrasound with real-time shape completion to enhance spinal visualization. Our robotic platform autonomously acquires US sweeps of the lumbar spine, extracts vertebral surfaces from ultrasound, and reconstructs the complete anatomy using a deep learning-based shape completion network. This framework provides interactive, real-time visualization with the capability to autonomously repeat scans and can enable navigation to target locations. This can contribute to better consistency, reproducibility, and understanding of the underlying anatomy. We validate our approach through quantitative experiments assessing shape completion accuracy and evaluations of multiple spine acquisition protocols on a phantom setup. Additionally, we present qualitative results of the visualization on a volunteer scan.

IROS Conference 2025 Conference Paper

Tactile-Guided Robotic Ultrasound: Mapping Preplanned Scan Paths for Intercostal Imaging

  • Yifan Zhang
  • Dianye Huang
  • Nassir Navab
  • Zhongliang Jiang

Medical ultrasound (US) imaging is widely used in clinical examinations due to its portability, real-time capability, and radiation-free nature. To address inter- and intra-operator variability, robotic ultrasound systems have gained increasing attention. However, their application in challenging intercostal imaging remains limited due to the lack of an effective scan path generation method within the constrained acoustic window. To overcome this challenge, we explore the potential of tactile cues for characterizing subcutaneous rib structures as an alternative signal for ultrasound segmentation-free bone surface point cloud extraction. Compared to 2D US images, 1D tactile-related signals offer higher processing efficiency and are less susceptible to acoustic noise and artifacts. By leveraging robotic tracking data, a sparse tactile point cloud is generated through a few scans along the rib, mimicking human palpation. To robustly map the scanning trajectory into the intercostal space, the sparse tactile bone location point cloud is first interpolated to form a denser representation. This refined point cloud is then registered to an image-based dense bone surface point cloud, enabling accurate scan path mapping for individual patients. Additionally, to ensure full coverage of the object of interest, we introduce an automated tilt angle adjustment method to visualize structures beneath the bone. To validate the proposed method, we conducted comprehensive experiments on four distinct phantoms. The final scanning waypoint mapping achieved Mean Nearest Neighbor Distance (MNND) and Hausdorff distance (HD) errors of 3.41 mm and 3.65 mm, respectively, while the reconstructed object beneath the bone had errors of 0.69 mm and 2.2 mm compared to the CT ground truth.
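One building block of the mapping step described above, registering the interpolated tactile point cloud to the image-based bone-surface point cloud, can be sketched with the Kabsch algorithm. This is an illustrative sketch assuming known point correspondences; the paper's registration pipeline may differ (e.g., ICP over unknown correspondences).

```python
import numpy as np

def kabsch(src, dst):
    """Least-squares rigid transform (R, t) mapping src points onto dst
    (Kabsch algorithm), so that R @ src_i + t ≈ dst_i, correspondences known."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_d - R @ mu_s
    return R, t
```

With noise-free correspondences the true rotation and translation are recovered exactly up to floating-point precision, which makes the routine easy to unit-test against synthetic rigid motions.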

IROS Conference 2025 Conference Paper

Video-Rate 4D OCT Segmentation Based on Motion-Aware Probabilistic A-Scan Sampling

  • Shervin Dehghani
  • Michael Sommersperger
  • Nassir Navab

Recent advancements in robotic eye surgery and intraoperative 4D optical coherence tomography (iOCT) imaging could enable fully or partially autonomous robotic procedures and enhanced surgical visualization. A fundamental requirement for such applications is rapid semantic segmentation of intraoperative 4D OCT data, which is capable of acquiring volumes at video rate, to provide real-time three-dimensional scene perception. Significant advancements have been made in learning-based 2D and 3D OCT segmentation techniques, pushing the boundaries of accuracy and performance. However, despite these achievements, the computational demands of 2D and 3D convolutions make real-time intraoperative processing of 4D OCT infeasible, even with substantial computational resources. This work introduces a novel real-time iOCT volume segmentation methodology. The novelty consists of a dynamic motion-aware A-scan sampling strategy, followed by an efficient segmentation approach, guaranteeing both speed and accuracy of segmentation. Our A-scan-based processing network leverages a 1D convolution approach to resolve the complexities of multi-dimensional kernels and allow for maximum parallelization, resulting in significantly faster performance. We further show that OCT volume segmentation can be reconstructed from a sparse A-scan sampling strategy that prioritizes areas in which inter-volume motion was detected, and that even missing anatomical surface information below the surgical tools can be reconstructed. Our results show high segmentation performance in dynamic surgical environments and video-rate segmentation performance meeting the demanding processing requirements of 4D OCT and leading to substantial speed improvements over previous methods.
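The motion-aware sampling idea above, drawing A-scan positions preferentially where inter-volume motion was detected, can be sketched as follows. This is an illustrative sketch with an assumed (X, Y, depth) volume layout, not the paper's implementation.

```python
import numpy as np

def sample_ascans(prev_vol, curr_vol, n_samples, rng=None):
    """Draw A-scan (x, y) positions with probability proportional to
    inter-volume motion (per-A-scan mean absolute intensity difference).
    Volumes are assumed to be laid out as (X, Y, depth)."""
    rng = rng if rng is not None else np.random.default_rng()
    motion = np.abs(curr_vol - prev_vol).mean(axis=-1)  # (X, Y) motion map
    probs = motion.ravel() + 1e-8   # small floor keeps every A-scan reachable
    probs = probs / probs.sum()
    flat = rng.choice(probs.size, size=n_samples, replace=False, p=probs)
    return np.stack(np.unravel_index(flat, motion.shape), axis=1)  # (n, 2)
```

The small probability floor means static regions are still occasionally revisited, which matches the intuition that a sparse sampler must not permanently ignore any part of the volume.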

ICRA Conference 2024 Conference Paper

AiAReSeg: Catheter Detection and Segmentation in Interventional Ultrasound using Transformers

  • Alex Ranne
  • Yordanka Velikova
  • Nassir Navab
  • Ferdinando Rodriguez y Baena

This work proposes a state-of-the-art transformer architecture to detect and segment catheters in axial interventional ultrasound image sequences. The network architecture is inspired by the Attention in Attention mechanism and temporal tracking networks, and introduces a novel 3D segmentation head that performs 3D deconvolution across time. To train the network, we introduce a new data synthesis pipeline that uses physics-based catheter insertion simulations, along with a convolutional ray-casting ultrasound simulator, to produce synthetic ultrasound images of endovascular interventions. The proposed method is validated on a hold-out validation dataset, demonstrating robustness to ultrasound noise and a wide range of scanning angles. It was also tested on data collected from silicon aorta phantoms, demonstrating its potential for sim-to-real translation. This work represents a significant step towards safer and more efficient endovascular surgery using interventional ultrasound.

ICRA Conference 2024 Conference Paper

Analyzing Accessibility in Robot-Assisted Vitreoretinal Surgery: Integrating Eye Posture and Robot Position

  • Satoshi Inagaki
  • Alireza Alikhani
  • Nassir Navab
  • Mathias Maier
  • M. Ali Nasseri

Several robotic frameworks have been recently developed to assist ophthalmic surgeons in performing complex vitreoretinal procedures such as subretinal injection. However, to intuitively integrate robots into the surgical workflow, an accessibility analysis framework for vitreoretinal surgery is an essential component. Such a framework ideally considers the comprehensive factors of the eye anatomy and its positioning, the insertion point, and the initial pose and position of the robot. By combining mobilization of the eyeball with adjustment of the pose and position of the robot, the accessibility of such systems is significantly optimized. At the same time, the accessible-visible area is better and faster matched to the working volume of the robot. This paper presents an analysis of an expansion strategy for the robot's accessibility and visibility area. The outcomes of this method demonstrate promising potential to enhance the robot's accessibility, as evidenced in our analytical and experimental findings: coverage of the required working area on an adjustable phantom model improves from 22.4% to 99.0%.

IROS Conference 2024 Conference Paper

CathFlow: Self-Supervised Segmentation of Catheters in Interventional Ultrasound Using Optical Flow and Transformers

  • Alex Ranne
  • Liming Kuang
  • Yordanka Velikova
  • Nassir Navab
  • Ferdinando Rodriguez y Baena

In minimally invasive endovascular procedures, contrast-enhanced angiography remains the most robust imaging technique, but exposes patients and surgeons to prolonged radiation. Alternatives such as ultrasound are difficult to interpret, are highly prone to artifacts and noise, and vary in quality, depending on the experience of the interventional radiologist and machine settings. In this work, we seek to address both problems by introducing a self-supervised deep learning architecture to segment catheters in longitudinal ultrasound images, without demanding any labeled data. The network architecture builds upon AiAReSeg, a segmentation transformer built with the Attention in Attention mechanism, and is capable of learning feature changes across time and space. To facilitate training, we used synthetic ultrasound data based on physics-driven catheter insertion simulations, and translated the data into a unique CT-Ultrasound common domain, CACTUSS, to improve the segmentation performance. We generated ground truth segmentation masks by computing the optical flow between adjacent frames using FlowNet2, and performed thresholding to obtain a binary mask estimate. Finally, we validated our model on a test dataset, consisting of unseen synthetic data and images collected from silicon aorta phantoms, thus demonstrating its potential for applications to clinical data in the future.
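The pseudo-label step in this abstract, thresholding the magnitude of the optical flow between adjacent frames, can be sketched as follows. FlowNet2 is replaced here by an arbitrary dense flow field, and the threshold value is an assumption; this is not the paper's code.

```python
import numpy as np

def flow_to_mask(flow, thresh=1.0):
    """Turn a dense optical-flow field (H, W, 2) into a binary pseudo-mask:
    pixels whose flow magnitude exceeds `thresh` are labeled as moving
    (catheter) pixels."""
    mag = np.linalg.norm(flow, axis=-1)   # per-pixel flow magnitude
    return (mag > thresh).astype(np.uint8)
```

In the self-supervised setup, such masks stand in for manual annotations: the static background has near-zero flow, while the advancing catheter produces the dominant motion.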

ICRA Conference 2024 Conference Paper

Colibri5: Real-Time Monocular 5-DoF Trocar Pose Tracking for Robot-Assisted Vitreoretinal Surgery

  • Shervin Dehghani
  • Michael Sommersperger
  • Mahdi Saleh
  • Alireza Alikhani
  • Benjamin Busam
  • Peter Gehlbach
  • Iulian I. Iordachita
  • Nassir Navab

Retinal surgery is a complex medical procedure that requires high precision and dexterity to perform delicate instrument maneuvers with sub-millimeter accuracy. Minimizing manual tremor and achieving precise, repeatable execution of surgical tasks has motivated the development of robotic platforms to overcome the limitations of manual surgery. However, specific tasks, such as instrument insertion through the trocar, are more challenging in robotic surgery than in conventional manual procedures since the robot control is often optimized for navigation inside the eye. This challenges the integration of robotic systems, creating a high cognitive load on the operator and prolonging the surgery time. Moreover, misalignment of the robot's remote center of motion (RCM) and the trocar position during the procedure can lead to excessive forces between the instrument and the trocar, potentially causing patient trauma. Precise and rapid localization of the trocars enables automation of the insertion procedure and dynamic compensation of eye motion. In this work, we present a real-time marker-less method for 3D pose tracking of the trocar, achieved with only a single monocular camera. Our experiments show promising results towards real-time trocar pose estimation and tracking, achieving an average error of 3° in trocar orientation estimation at an average processing rate of 15 fps. This could serve as a foundation to improve the automation, integration, and efficiency of robotic systems for retinal surgery. The dataset created for this work is made publicly available.

IROS Conference 2024 Conference Paper

DNS-SLAM: Dense Neural Semantic-Informed SLAM

  • Kunyi Li
  • Michael Niemeyer
  • Nassir Navab
  • Federico Tombari

In recent years, coordinate-based neural implicit representations have shown promising results for the task of Simultaneous Localization and Mapping (SLAM). While achieving impressive performance on small synthetic scenes, these methods often suffer from losing details, especially in complex real-world scenes. In this work, we introduce DNS-SLAM, a novel neural RGB-D semantic SLAM approach featuring a hybrid representation. Relying only on 2D semantic priors, we propose the first semantic neural SLAM method that trains class-wise scene representations while providing stable camera tracking at the same time. Our method integrates multi-view geometry constraints with image-based feature extraction to improve appearance details and to output color, occupancy, and semantic class information, enabling many downstream applications. To further enable fast tracking, we introduce a lightweight coarse scene representation which is trained in a self-supervised manner in latent space. Our method achieves state-of-the-art tracking performance on both synthetic and real-world data while maintaining a commendable operational speed on off-the-shelf hardware. Further, our method outputs class-wise decomposed reconstructions with better texture, capturing appearance and geometric details.

ICRA Conference 2024 Conference Paper

Envibroscope: Real-Time Monitoring and Prediction of Environmental Motion for Enhancing Safety in Robot-Assisted Microsurgery

  • Alireza Alikhani
  • Satoshi Inagaki
  • Shervin Dehghani
  • Mathias Maier
  • Nassir Navab
  • M. Ali Nasseri

Several robotic systems have emerged in the recent past to enhance the precision of micro-surgeries such as retinal procedures. Significant advancements have recently been achieved to increase the precision of such systems beyond surgeon capabilities. However, little attention has been paid to the impact of non-predicted and sudden movements of the patient and the environment. Therefore, analyzing environmental motion and vibrations is crucial to ensuring the optimal performance and reliability of medical systems that require micron-level precision, especially in real-life scenarios. To address this challenge, this paper introduces a novel environmental motion analysis system that employs a grid layout with distributed sensing nodes throughout the environment. This system effectively tracks undesired motions at designated locations and predicts upcoming motions using neural network-based approaches. The outcomes of our experiments exhibit promising prospects for real-time motion monitoring and prediction, which has the potential to form a solid basis for enhancing the automation, safety, integration, and overall efficiency of robot-assisted micro-surgeries.

ICRA Conference 2024 Conference Paper

Exploring the Needle Tip Interaction Force with Retinal Tissue Deformation in Vitreoretinal Surgery

  • Simon Pannek
  • Shervin Dehghani
  • Michael Sommersperger
  • Peiyao Zhang
  • Peter Gehlbach
  • M. Ali Nasseri
  • Iulian I. Iordachita
  • Nassir Navab

Recent advancements in age-related macular degeneration treatments necessitate precision delivery into the subretinal space, emphasizing minimally invasive procedures targeting the retinal pigment epithelium (RPE)-Bruch’s membrane complex without causing trauma. Even for skilled surgeons, the inherent hand tremors during manual surgery can jeopardize the safety of these critical interventions. This has fostered the evolution of robotic systems designed to prevent such tremors. These robots are enhanced by FBG sensors, which sense the small force interactions between the surgical instruments and retinal tissue. To enable the community to design algorithms taking advantage of such force feedback data, this paper focuses on the need to provide a specialized dataset, integrating optical coherence tomography (OCT) imaging together with the aforementioned force data. We introduce a unique dataset, integrating force sensing data synchronized with OCT B-scan images, derived from a sophisticated setup involving robotic assistance and OCT integrated microscopes. Furthermore, we present a neural network model for image-based force estimation to demonstrate the dataset’s applicability.

ICRA Conference 2024 Conference Paper

Implicit Neural Representations for Breathing-compensated Volume Reconstruction in Robotic Ultrasound

  • Yordanka Velikova
  • Mohammad Farid Azampour
  • Walter Simson
  • Marco Esposito
  • Nassir Navab

Ultrasound (US) imaging is widely used in diagnosing and staging abdominal diseases due to its lack of ionizing radiation and broad availability. However, significant inter-operator variability and inconsistent image acquisition hinder the widespread adoption of extensive screening programs. Robotic ultrasound systems have emerged as a promising solution, offering standardized acquisition protocols and the possibility of automated acquisition. Additionally, these systems enable access to 3D data via robotic tracking, enhancing volumetric reconstruction for improved ultrasound interpretation and precise disease diagnosis. However, the interpretability of 3D US reconstruction of abdominal images can be affected by the patient's breathing motion. This study introduces a method to compensate for breathing motion in 3D US compounding by leveraging implicit neural representations. Our approach employs a robotic ultrasound system for automated screenings. To demonstrate the method's effectiveness, we evaluate our proposed method for the diagnosis and monitoring of abdominal aortic aneurysms as a representative use case. Our experiments demonstrate that our proposed pipeline facilitates robust automated robotic acquisition, mitigating artifacts from breathing motion, and yields smoother 3D reconstructions for enhanced screening and medical diagnosis.

IROS Conference 2024 Conference Paper

Intraocular Reflection Modeling and Avoidance Planning in Image-Guided Ophthalmic Surgeries

  • Junjie Yang 0001
  • Zhihao Zhao
  • Yinzheng Zhao
  • Daniel Zapp
  • Mathias Maier
  • Kai Huang 0001
  • Nassir Navab
  • M. Ali Nasseri

Intuitive enhancement of surgical precision in robotic retinal surgery highly depends on the stable acquisition of intraocular imaging data. Such acquisition requires segmenting intraocular components, especially instrument-tip positions, to achieve state estimation and subsequent navigation and motion control. However, intraocular light reflections and glares significantly impact instrument segmentation, state estimation, and subsequent visual servoing in retinal surgery. At the same time, light reflections are among the sources of information for intraoperative navigation. In this work, we propose a method for modeling and optimizing light reflections using microscopy as the standard surgical imaging modality. Beyond optimization, our approach seamlessly integrates the optimized reflection with path planning, strategically circumventing reflection areas and ensuring uninterrupted visibility of instrument tips throughout the surgical procedure. Experiments demonstrate the methodology’s efficacy in avoiding glare effects during eye surgeries.
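The specular building block behind such reflection modelling can be sketched with the standard mirror-reflection formula r = d - 2(d.n)n; the vectors below are illustrative, not taken from the paper:

```python
import numpy as np

def reflect(d, n):
    """Mirror a unit light direction d about a unit surface normal n --
    the basic specular model underlying intraocular reflection prediction."""
    return d - 2.0 * np.dot(d, n) * n

d = np.array([0.0, 0.0, -1.0])   # light travelling straight down
n = np.array([0.0, 0.0, 1.0])    # upward-facing surface normal
print(reflect(d, n))             # [0, 0, 1]: bounced straight back up
```

Predicting where such reflected rays land in the microscope image is what allows a planner to route the instrument around glare regions.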

IROS Conference 2024 Conference Paper

Neural Semantic Map-Learning for Autonomous Vehicles

  • Markus Herb
  • Nassir Navab
  • Federico Tombari

Autonomous vehicles demand detailed maps to maneuver reliably through traffic, which need to be kept up-to-date to ensure safe operation. A promising way to adapt the maps to the ever-changing road network is to use crowd-sourced data from a fleet of vehicles. In this work, we present a mapping system that fuses local submaps gathered from a fleet of vehicles at a central instance to produce a coherent map of the road environment including drivable area, lane markings, poles, obstacles and more as a 3D mesh. Each vehicle contributes locally reconstructed submaps as lightweight meshes, making our method applicable to a wide range of reconstruction methods and sensor modalities. Our method jointly aligns and merges the noisy and incomplete local submaps using a scene-specific Neural Signed Distance Field, which is supervised using the submap meshes to predict a fused environment representation. We leverage memory-efficient sparse feature-grids to scale to large areas and introduce a confidence score to model uncertainty in scene reconstruction. Our approach is evaluated on two datasets with different local mapping methods, showing improved pose alignment and reconstruction over existing methods. Additionally, we demonstrate the benefit of multi-session mapping and examine the required amount of data to enable high-fidelity map learning for autonomous vehicles.
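As a toy stand-in for the learned fusion, confidence-weighted averaging shows how a confidence score lets a better-observed submap dominate the fused signed distance at a query point (the paper trains a neural SDF rather than averaging; this sketch only conveys the role of the confidence):

```python
import numpy as np

def fuse_sdf(samples, confidences):
    """Confidence-weighted fusion of signed-distance samples that several
    noisy submaps report for the same query point (illustrative scheme)."""
    w = np.asarray(confidences, float)
    s = np.asarray(samples, float)
    return float((w * s).sum() / w.sum())

# Two submaps disagree about a point near a wall; the better-observed
# submap (higher confidence) dominates the fused distance.
print(fuse_sdf([0.10, -0.05], [0.2, 0.8]))  # -0.02 -> just inside the surface
```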

ICRA Conference 2024 Conference Paper

Physics-Encoded Graph Neural Networks for Deformation Prediction under Contact

  • Mahdi Saleh
  • Michael Sommersperger
  • Nassir Navab
  • Federico Tombari

In robotics, it is crucial to understand object deformation during tactile interactions. A precise understanding of deformation can elevate robotic simulations and have broad implications across different industries. We introduce a method using Physics-Encoded Graph Neural Networks (GNNs) for such predictions. Similar to robotic grasping and manipulation scenarios, we focus on modeling the dynamics between a rigid mesh contacting a deformable mesh under external forces. Our approach represents both the soft body and the rigid body within graph structures, where nodes hold the physical states of the meshes. We also incorporate cross-attention mechanisms to capture the interplay between the objects. By jointly learning geometry and physics, our model reconstructs consistent and detailed deformations. We have made our code and dataset public to advance research in robotic simulation and grasping.
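One ingredient mentioned above, cross-attention between the two meshes, can be sketched as plain scaled dot-product attention in which deformable-mesh node states attend over rigid-mesh node states (single head, no learned projections; purely illustrative, not the paper's module):

```python
import numpy as np

rng = np.random.default_rng(0)

def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention: each soft-body node attends
    over all rigid-body nodes and aggregates their states."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)          # (Ns, Nr) similarities
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)        # rows sum to 1
    return attn @ values                            # (Ns, D) aggregated states

soft  = rng.normal(size=(6, 8))   # 6 deformable-mesh nodes, 8-dim states
rigid = rng.normal(size=(4, 8))   # 4 rigid-mesh nodes
out = cross_attention(soft, rigid, rigid)
print(out.shape)  # (6, 8)
```

In a full model the queries, keys, and values would pass through learned projections, and the aggregated rigid-body context would feed each soft node's deformation update.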

NeurIPS Conference 2024 Conference Paper

Procedure-Aware Surgical Video-language Pretraining with Hierarchical Knowledge Augmentation

  • Kun Yuan
  • Vinkle Srivastav
  • Nassir Navab
  • Nicolas Padoy

Surgical video-language pretraining (VLP) faces unique challenges due to the knowledge domain gap and the scarcity of multi-modal data. This study aims to bridge the gap by addressing issues regarding textual information loss in surgical lecture videos and the spatial-temporal challenges of surgical VLP. To tackle these issues, we propose a hierarchical knowledge augmentation approach and a novel Procedure-Encoded Surgical Knowledge-Augmented Video-Language Pretraining (PeskaVLP) framework. The proposed knowledge augmentation approach uses large language models (LLM) to refine and enrich surgical concepts, thus providing comprehensive language supervision and reducing the risk of overfitting. The PeskaVLP framework combines language supervision with visual self-supervision, constructing hard negative samples and employing a Dynamic Time Warping (DTW) based loss function to effectively comprehend the cross-modal procedural alignment. Extensive experiments on multiple public surgical scene understanding and cross-modal retrieval datasets show that our proposed method significantly improves zero-shot transfer performance and offers a generalist visual representation for further advancements in surgical scene understanding. The source code will be available at https://github.com/CAMMA-public/PeskaVLP.
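A DTW-based loss rests on the classic dynamic-programming alignment cost; a minimal (non-differentiable) version over two embedding sequences looks like this, with toy 1-D features standing in for video and text embeddings:

```python
import numpy as np

def dtw_cost(a, b):
    """Dynamic Time Warping alignment cost between two feature sequences
    (rows are per-step embeddings); a stand-in for the procedural
    video/text alignment described above."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed alignment moves.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

x = np.array([[0.0], [1.0], [2.0]])
print(dtw_cost(x, x))          # 0.0 -- identical sequences align perfectly
print(dtw_cost(x, x[::-1]))    # > 0 -- reversed step order is penalised
```

Because DTW respects temporal order, sequences whose surgical steps occur in the wrong order incur a higher cost, which is exactly the signal a procedure-aware contrastive loss needs; training uses a differentiable (soft) variant.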

ICRA Conference 2024 Conference Paper

RIDE: Self-Supervised Learning of Rotation-Equivariant Keypoint Detection and Invariant Description for Endoscopy

  • Mert Asim Karaoglu
  • Viktoria Markova
  • Nassir Navab
  • Benjamin Busam
  • Alexander Ladikos

Unlike in natural images, in endoscopy there is no clear notion of an up-right camera orientation. Endoscopic videos therefore often contain large rotational motions, which require keypoint detection and description algorithms to be robust to these conditions. While most classical methods achieve rotation-equivariant detection and invariant description by design, many learning-based approaches learn to be robust only up to a certain degree. At the same time, learning-based methods under moderate rotations often outperform classical approaches. In order to address this shortcoming, in this paper we propose RIDE, a learning-based method for rotation-equivariant detection and invariant description. Following recent advancements in group-equivariant learning, RIDE models rotation-equivariance implicitly within its architecture. Trained in a self-supervised manner on a large curation of endoscopic images, RIDE requires no manual labeling of training data. We test RIDE in the context of surgical tissue tracking on the SuPeR dataset as well as in the context of relative pose estimation on a repurposed version of the SCARED dataset. In addition, we perform explicit studies showing its robustness to large rotations. Our comparison against recent learning-based and classical approaches shows that RIDE sets a new state-of-the-art performance on matching and relative pose estimation tasks and scores competitively on surgical tissue tracking.

NeurIPS Conference 2024 Conference Paper

SCRREAM : SCan, Register, REnder And Map: A Framework for Annotating Accurate and Dense 3D Indoor Scenes with a Benchmark

  • Hyunjun Jung
  • Weihang Li
  • Shun-Cheng Wu
  • William Bittner
  • Nikolas Brasch
  • Jifei Song
  • Eduardo Pérez-Pellitero
  • Zhensong Zhang

Traditionally, 3D indoor datasets have generally prioritized scale over ground-truth accuracy in order to obtain improved generalization. However, using these datasets to evaluate dense geometry tasks, such as depth rendering, can be problematic, as the meshes of the dataset are often incomplete and may produce incorrect ground truth for evaluating fine details. In this paper, we propose SCRREAM, a dataset annotation framework that allows annotation of fully dense meshes of objects in the scene and registers camera poses on the real image sequence, which can produce accurate ground truth for both sparse 3D as well as dense 3D tasks. We show the details of the dataset annotation pipeline and showcase four possible variants of datasets that can be obtained from our framework with example scenes, such as indoor reconstruction and SLAM, scene editing & object removal, human reconstruction and 6d pose estimation. Recent pipelines for indoor reconstruction and SLAM serve as new benchmarks. In contrast to previous indoor datasets, our design allows evaluating dense geometry tasks on eleven sample scenes against accurately rendered ground-truth depth maps.

ICRA Conference 2024 Conference Paper

SG-Bot: Object Rearrangement via Coarse-to-Fine Robotic Imagination on Scene Graphs

  • Guangyao Zhai
  • Xiaoni Cai
  • Dianye Huang
  • Yan Di
  • Fabian Manhardt
  • Federico Tombari
  • Nassir Navab
  • Benjamin Busam

Object rearrangement is pivotal in robotic-environment interactions, representing a significant capability in embodied AI. In this paper, we present SG-Bot, a novel rearrangement framework that utilizes a coarse-to-fine scheme with a scene graph as the scene representation. Unlike previous methods that rely on either known goal priors or zero-shot large models, SG-Bot exemplifies lightweight, real-time, and user-controllable characteristics, seamlessly blending the consideration of commonsense knowledge with automatic generation capabilities. SG-Bot employs a three-fold procedure (observation, imagination, and execution) to adeptly address the task. Initially, objects are discerned and extracted from a cluttered scene during observation. These objects are coarsely organized and depicted within a scene graph, guided by either commonsense or user-defined criteria. This scene graph then informs a generative model, which forms a fine-grained goal scene considering the shape information from the initial scene and object semantics. Finally, for execution, the initial and envisioned goal scenes are matched to formulate robotic action policies. Experimental results demonstrate that SG-Bot outperforms competitors by a large margin.

IROS Conference 2024 Conference Paper

Shadow Maintenance for Automatic Light-Probe Control in Ophthalmic Surgeries Using Only 2D information

  • Junjie Yang 0001
  • Satoshi Inagaki
  • Zhihao Zhao
  • Daniel Zapp
  • Mathias Maier
  • Kai Huang 0001
  • Nassir Navab
  • M. Ali Nasseri

In ophthalmic surgeries, the light probe is responsible for providing safe intraocular illumination and ensuring the visibility of the instrument and its shadow as the only available reference for qualitative depth estimation and landing point prediction in fundus microscopic images. To achieve sustainable shadow-based estimation during surgeries, we propose controlling the light probe automatically to keep the shadow position close to the instrument tip using only 2D information from the microscope. We also integrate an intensity balancing sub-module to guarantee a normal intensity distribution and a safe depth of light-tip placement. Without motor-based pose coordination between the light probe and the instrument, our experiments analyze the performance of image-based shadow maintenance under remote center of motion (RCM) constraints and discuss the working-volume and segmentation limitations observed in simulation and real-robot tests.

ICRA Conference 2024 Conference Paper

Shadow-Based 3D Pose Estimation of Intraocular Instrument Using Only 2D Images

  • Junjie Yang 0001
  • Zhihao Zhao
  • Mathias Maier
  • Kai Huang 0001
  • Nassir Navab
  • M. Ali Nasseri

In ophthalmic surgeries, such as vitreoretinal operations, surgeons rely on imaging systems, primarily microscopes, for real-time instrument monitoring and motion planning. However, novice surgeons struggle to extract 3D instrument positions from 2D microscope frames, necessitating extensive trial-and-error experience, given that additional imaging modalities such as iOCT remain inaccessible in most operating rooms. Targeting intraocular assessment within the current surgical setup, this paper presents an image-based pose estimation method to obtain real-time instrument tip positions in a standard 12 mm-radius spherical eyeball model, which links floating instruments to objects on the retina based on the intraocular shadowing principle. We validate this estimation method in a Unity simulator and verify its depth estimation capability using a specially designed eyeball phantom. Both simulator and phantom experiments demonstrate an average needle-tip estimation error within [1.0, 2.0] mm using only 2D microscope frames.
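A toy geometric sketch of the shadowing principle under a spherical eye model centred at the origin (the light and tip positions below are made up): casting a ray from the light source through the instrument tip and intersecting it with the sphere gives the point where the tip's shadow falls, and the tip-to-shadow distance shrinks as the tip approaches the retina, which is the depth cue.

```python
import numpy as np

R = 12.0  # eyeball radius in mm (spherical model, as above)

def shadow_on_retina(light, tip):
    """Cast a ray from the light source through the instrument tip and
    intersect it with a sphere of radius R centred at the origin; the
    forward intersection approximates where the tip's shadow lands."""
    d = tip - light
    d = d / np.linalg.norm(d)
    # Solve |light + t * d|^2 = R^2 for t (quadratic, unit direction).
    b = 2.0 * np.dot(light, d)
    c = np.dot(light, light) - R * R
    disc = b * b - 4.0 * c
    if disc < 0:
        return None                  # ray misses the spherical model
    t = (-b + np.sqrt(disc)) / 2.0   # forward root: exit point of the ray
    return light + t * d

light = np.array([0.0, 6.0, 0.0])    # light probe inside the eye
tip   = np.array([0.0, 2.0, 0.0])    # instrument tip below it
p = shadow_on_retina(light, tip)
print(p)  # lies on the sphere: ||p|| == 12
```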

ICRA Conference 2024 Conference Paper

Uncertainty-Aware Contextual Visualization for Human Supervision of OCT-Guided Autonomous Robotic Subretinal Injection

  • Michael Sommersperger
  • Shervin Dehghani
  • Philipp Matten
  • Hessam Roodaki
  • Nassir Navab

The injection of therapeutic agents into the sub-retinal space might allow improved treatment of age-related macular degeneration. Various robotic systems have been developed to achieve the required precision and, in combination with intraoperative Optical Coherence Tomography (iOCT) imaging, methods for autonomous robotic guidance have been proposed. In such systems, the robot’s cognition is often governed by machine learning algorithms, such as convolutional neural networks (CNNs), which provide semantic scene information from iOCT images. Although the robot performs a surgical task autonomously, human supervision is critical to monitor the robot’s execution and, if necessary, stop the robot or take control to avoid trauma to the patient. In this paper, we propose a novel visualization concept for improved human supervision of autonomous robotic subretinal injection that integrates uncertainty information of the data provided to the robot. We design a focus and context visualization that renders an automatically identified instrument-aligned B-scan in the context of the 3D OCT volume. Our visualization is enriched by augmenting the uncertainty information on the instrument-aligned B-scan. To dynamically model task-specific uncertainty, we introduce a weighting scheme to assign an importance factor to each pair of classes, controlling the impact of their confusion on the overall uncertainty. We demonstrate our visualization concept on iOCT volumes acquired at different stages during subretinal injection on ex-vivo porcine eyes. We show that our processing pipeline achieves sufficient update rates for surgical display and discuss the impact of our visualization concept on the acceptance of robotic task autonomy for subretinal injection procedures.
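The pairwise weighting idea can be sketched as follows; the class names and the weight matrix are made up for illustration, not taken from the paper:

```python
import numpy as np

def weighted_confusion_uncertainty(probs, W):
    """Aggregate uncertainty as a weighted sum over class pairs: the
    product p_i * p_j is large when classes i and j are confused, and
    W[i, j] encodes how much that particular confusion matters
    (hypothetical weighting, in the spirit of the scheme above)."""
    p = np.asarray(probs, float)
    u = 0.0
    for i in range(len(p)):
        for j in range(i + 1, len(p)):
            u += W[i, j] * p[i] * p[j]
    return u

# 3 hypothetical classes: background, retina, instrument; confusing
# retina with instrument (pair 1-2) is weighted as far more critical.
W = np.array([[0.0, 1.0,  1.0],
              [1.0, 0.0, 10.0],
              [1.0, 10.0, 0.0]])
print(weighted_confusion_uncertainty([0.9, 0.05, 0.05], W))  # confident -> low
print(weighted_confusion_uncertainty([0.1, 0.45, 0.45], W))  # critical confusion -> high
```

Rendering such a task-weighted score on the instrument-aligned B-scan is what tells the supervising surgeon which confusions actually threaten the procedure.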

NeurIPS Conference 2023 Conference Paper

CommonScenes: Generating Commonsense 3D Indoor Scenes with Scene Graph Diffusion

  • Guangyao Zhai
  • Evin Pınar Örnek
  • Shun-Cheng Wu
  • Yan Di
  • Federico Tombari
  • Nassir Navab
  • Benjamin Busam

Controllable scene synthesis aims to create interactive environments for numerous industrial use cases. Scene graphs provide a highly suitable interface to facilitate these applications by abstracting the scene context in a compact manner. Existing methods, reliant on retrieval from extensive databases or pre-trained shape embeddings, often overlook scene-object and object-object relationships, leading to inconsistent results due to their limited generation capacity. To address this issue, we present CommonScenes, a fully generative model that converts scene graphs into corresponding controllable 3D scenes, which are semantically realistic and conform to commonsense. Our pipeline consists of two branches, one predicting the overall scene layout via a variational auto-encoder and the other generating compatible shapes via latent diffusion, capturing global scene-object and local inter-object relationships in the scene graph while preserving shape diversity. The generated scenes can be manipulated by editing the input scene graph and sampling the noise in the diffusion model. Due to the lack of a scene graph dataset offering high-quality object-level meshes with relations, we also construct SG-FRONT, enriching the off-the-shelf indoor dataset 3D-FRONT with additional scene graph labels. Extensive experiments are conducted on SG-FRONT, where CommonScenes shows clear advantages over other methods regarding generation consistency, quality, and diversity. Codes and the dataset are available on the website.

ICRA Conference 2023 Conference Paper

MonoGraspNet: 6-DoF Grasping with a Single RGB Image

  • Guangyao Zhai
  • Dianye Huang
  • Shun-Cheng Wu
  • HyunJun Jung
  • Yan Di
  • Fabian Manhardt
  • Federico Tombari
  • Nassir Navab

6-DoF robotic grasping is a long-standing but unsolved problem. Recent methods utilize strong 3D networks to extract geometric grasping representations from depth sensors, demonstrating superior accuracy on common objects but performing unsatisfactorily on photometrically challenging objects, e.g., objects made of transparent or reflective materials. The bottleneck is that the surfaces of these objects cannot yield accurate depth measurements due to the absorption or refraction of light. In this paper, in contrast to exploiting the inaccurate depth data, we propose the first RGB-only 6-DoF grasping pipeline, called MonoGraspNet, that utilizes stable 2D features to simultaneously handle arbitrary object grasping and overcome the problems induced by photometrically challenging objects. MonoGraspNet leverages a keypoint heatmap and a normal map to recover the 6-DoF grasping poses represented by our novel representation parameterized with 2D keypoints with corresponding depth, grasping direction, grasping width, and angle. Extensive experiments in real scenes demonstrate that our method can achieve competitive results in grasping common objects and surpass the depth-based competitor by a large margin in grasping photometrically challenging objects. To further stimulate robotic manipulation research, we annotate and open-source a multi-view grasping dataset in the real world containing 44 sequence collections of mixed photometric complexity with nearly 20M accurate grasping labels.
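A representation built on 2D keypoints plus predicted depth ultimately relies on lifting each keypoint into 3D with the camera intrinsics, i.e., standard pinhole back-projection (the intrinsic matrix K below is a hypothetical example, not from the paper):

```python
import numpy as np

def backproject(u, v, depth, K):
    """Lift a 2D pixel (u, v) with known depth to a 3D camera-frame point
    using pinhole intrinsics K -- the step that lets RGB-only keypoints
    parameterise a metric 6-DoF grasp."""
    x = (u - K[0, 2]) / K[0, 0] * depth
    y = (v - K[1, 2]) / K[1, 1] * depth
    return np.array([x, y, depth])

K = np.array([[600.0,   0.0, 320.0],
              [  0.0, 600.0, 240.0],
              [  0.0,   0.0,   1.0]])
print(backproject(320.0, 240.0, 0.5, K))  # [0, 0, 0.5]: the principal point
```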

IROS Conference 2023 Conference Paper

Motion Magnification in Robotic Sonography: Enabling Pulsation-Aware Artery Segmentation

  • Dianye Huang
  • Yuan Bi
  • Nassir Navab
  • Zhongliang Jiang

Ultrasound (US) imaging is widely used for diagnosing and monitoring arterial diseases, mainly due to the advantages of being non-invasive, radiation-free, and real-time. In order to provide additional information to assist clinicians in diagnosis, tubular structures are often segmented from US images. To improve artery segmentation accuracy and stability during scans, this work presents a novel pulsation-assisted segmentation neural network (PAS-NN) that explicitly takes advantage of cardiac-induced motions. Motion magnification techniques are employed to amplify the subtle motion within the frequency band of interest and extract pulsation signals from sequential US images. The extracted real-time pulsation information can help locate the arteries in cross-sectional US images; therefore, we explicitly integrated the pulsation into the proposed PAS-NN as attention guidance. Notably, a robotic arm is necessary to provide stable movement during US imaging, since magnifying the target motions from US images captured along a scan path is not feasible by hand due to hand tremor. To validate the proposed robotic US system for imaging arteries, experiments were carried out on volunteers’ carotid and radial arteries. The results demonstrate that PAS-NN achieves results comparable to the state of the art on the carotid artery and effectively improves segmentation performance for small vessels (radial artery). The code (https://github.com/dianyeHuang/RobPMEPASNN) and a demonstration video (https://youtu.be/c9AM042_lUQ) are publicly accessible.
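The core of temporal motion magnification can be sketched on a single pixel's intensity over time: isolate the frequency band of interest, scale it, and add it back, leaving slower motions (e.g., probe drift) untouched. The signals and parameters below are invented for illustration; this is a minimal Eulerian-style sketch, not the paper's pipeline:

```python
import numpy as np

def magnify_band(signal, fs, f_lo, f_hi, alpha):
    """Amplify only the temporal frequency band [f_lo, f_hi] of a signal
    sampled at fs Hz: FFT, scale the band by (1 + alpha), inverse FFT."""
    spec = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    spec[band] *= (1.0 + alpha)
    return np.fft.irfft(spec, n=len(signal))

fs = 30.0                                    # imaging frame rate (Hz)
t = np.arange(0, 4, 1.0 / fs)
pulse = 0.05 * np.sin(2 * np.pi * 1.25 * t)  # subtle ~75 bpm pulsation
drift = 0.5 * np.sin(2 * np.pi * 0.25 * t)   # slow probe/tissue drift
sig = pulse + drift
out = magnify_band(sig, fs, 0.8, 3.0, alpha=9.0)
# the pulsation band is ~10x larger in `out`; the slow drift is unchanged
```

Applied over a whole B-scan sequence, the amplified cardiac band highlights the pulsating vessel wall, which is the cue PAS-NN uses as attention guidance.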

ICRA Conference 2023 Conference Paper

Robotic Navigation Autonomy for Subretinal Injection via Intelligent Real-Time Virtual iOCT Volume Slicing

  • Shervin Dehghani
  • Michael Sommersperger
  • Peiyao Zhang
  • Alejandro Martin-Gomez
  • Benjamin Busam
  • Peter Gehlbach
  • Nassir Navab
  • M. Ali Nasseri

In the last decade, various robotic platforms have been introduced that could support delicate retinal surgeries. Concurrently, to provide semantic understanding of the surgical area, recent advances have enabled microscope-integrated intraoperative Optical Coherence Tomography (iOCT) with high-resolution 3D imaging at near video rate. The combination of robotics and semantic understanding enables task autonomy in robotic retinal surgery, such as for subretinal injection. This procedure requires precise needle insertion for best treatment outcomes. However, merging robotic systems with iOCT introduces new challenges. These include, but are not limited to, high demands on data processing rates and dynamic registration of these systems during the procedure. In this work, we propose a framework for autonomous robotic navigation for subretinal injection, based on intelligent real-time processing of iOCT volumes. Our method consists of an instrument pose estimation method, an online registration between the robotic and the iOCT system, and trajectory planning tailored for navigation to an injection target. We also introduce intelligent virtual B-scans, a volume slicing approach for rapid instrument pose estimation, which is enabled by Convolutional Neural Networks (CNNs). Our experiments on ex-vivo porcine eyes demonstrate the precision and repeatability of the method. Finally, we discuss identified challenges in this work and suggest potential solutions to further the development of such systems.

IROS Conference 2023 Conference Paper

Thoracic Cartilage Ultrasound-CT Registration Using Dense Skeleton Graph

  • Zhongliang Jiang
  • Chenyang Li 0004
  • Xuesong Li
  • Nassir Navab

Autonomous ultrasound (US) imaging has gained increased interest recently, and it has been seen as a potential solution to overcome the limitations of free-hand US examinations, such as inter-operator variations. However, it is still challenging to accurately map planned paths from a generic atlas to individual patients, particularly for thoracic applications with high acoustic-impedance bone structures below the skin. To address this challenge, a dense graph-based non-rigid registration is proposed to transfer planned paths from the atlas to the current setup by explicitly considering the subcutaneous bone surface. To this end, the sternum and cartilage branches are segmented using template matching to assist coarse alignment of US and CT point clouds. Afterward, a directed graph is generated based on the CT template. Then, the self-organizing map using geographical distance is performed twice in succession to extract the optimal graph representations for the CT and US point clouds, individually. To evaluate the proposed approach, five cartilage point clouds from distinct patients are employed. The results demonstrate that the proposed graph-based registration can effectively map trajectories from CT to the current setup for US examination through the limited intercostal space. The non-rigid registration error in terms of Hausdorff distance (mean±SD) is 9.48 ± 0.27 mm, and the path transfer error in terms of Euclidean distance is 2.21 ± 1.11 mm. The code (https://github.com/marslicy/Cartilage-graph-based-US-CT-Registration) and video (https://www.youtube.com/watch?v=QJz2fkwgbP8) are publicly accessible.
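The Hausdorff distance reported above is the symmetric worst-case nearest-neighbour distance between two point clouds; a direct (O(NM)-memory) NumPy version with toy points:

```python
import numpy as np

def hausdorff(A, B):
    """Symmetric Hausdorff distance between point clouds A (N,3) and
    B (M,3): the largest nearest-neighbour distance in either direction."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)  # (N, M) pairwise
    return max(d.min(axis=1).max(), d.min(axis=0).max())

A = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
B = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 3.0, 0.0]])
print(hausdorff(A, B))  # 3.0 -- driven by B's outlier point
```

Because it is a worst-case metric, a single badly registered region dominates the score, which makes it a stringent summary for surface registration quality.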

IROS Conference 2022 Conference Paper

CloudAttention: Efficient Multi-Scale Attention Scheme For 3D Point Cloud Learning

  • Mahdi Saleh
  • Yige Wang
  • Nassir Navab
  • Benjamin Busam
  • Federico Tombari

Processing 3D data efficiently has always been a challenge. Spatial operations on large-scale point clouds, stored as sparse data, require extra cost. Attracted by the success of transformers, researchers are using multi-head attention for vision tasks. However, attention calculations in transformers come with quadratic complexity in the number of inputs and miss spatial intuition on sets like point clouds. We redesign set transformers in this work and incorporate them into a hierarchical framework for shape classification and part and scene segmentation. We propose our local attention unit, which captures features in a spatial neighborhood. We also compute efficient and dynamic global cross attentions by leveraging sampling and grouping at each iteration. Finally, to mitigate the non-heterogeneity of point clouds, we propose an efficient Multi-Scale Tokenization (MST), which extracts scale-invariant tokens for attention operations. The proposed hierarchical model achieves state-of-the-art shape classification in mean accuracy and yields results on par with the previous segmentation methods while requiring significantly fewer computations. Our proposed architecture predicts segmentation labels with around half the latency and parameter count of the previously most efficient method with comparable performance. The code is available at https://github.com/YigeWang-WHU/CloudAttention.

ICRA Conference 2022 Conference Paper

ColibriDoc: an Eye-in-Hand Autonomous Trocar Docking System

  • Shervin Dehghani
  • Michael Sommersperger
  • Junjie Yang 0001
  • Mehrdad Salehi
  • Benjamin Busam
  • Kai Huang 0001
  • Peter Gehlbach
  • Iulian I. Iordachita

Retinal surgery is a complex medical procedure that requires exceptional expertise and dexterity. For this purpose, several robotic platforms are currently under development to enable or improve the outcome of microsurgical tasks. Since the control of such robots is often designed for navigation inside the eye in proximity to the retina, successful trocar docking and insertion of the instrument into the eye represents an additional cognitive effort, and is therefore one of the open challenges in robotic retinal surgery. For this purpose, we present a platform for autonomous trocar docking that combines computer vision and a robotic setup. Inspired by the Cuban Colibri (hummingbird) aligning its beak to a flower using only vision, we mount a camera onto the end-effector of a robotic system. By estimating the position and pose of the trocar, the robot is able to autonomously align and navigate the instrument towards the Trocar Entry Point (TEP) and finally perform the insertion. Our experiments show that the proposed method is able to accurately estimate the position and pose of the trocar and achieve repeatable autonomous docking. The aim of this work is to reduce the complexity of the robotic setup prior to the surgical task and therefore, increase the intuitiveness of the system integration into clinical workflow.

IROS Conference 2021 Conference Paper

DemoGrasp: Few-Shot Learning for Robotic Grasping with Human Demonstration

  • Pengyuan Wang 0002
  • Fabian Manhardt
  • Luca Minciullo
  • Lorenzo Garattoni
  • Sven Meier
  • Nassir Navab
  • Benjamin Busam

The ability to successfully grasp objects is crucial in robotics, as it enables several interactive downstream applications. To this end, most approaches either compute the full 6D pose for the object of interest or learn to predict a set of grasping points. While the former approaches do not scale well to multiple object instances or classes yet, the latter require large annotated datasets and are hampered by their poor generalization capabilities to new geometries. To overcome these shortcomings, we propose to teach a robot how to grasp an object with a simple and short human demonstration. Hence, our approach neither requires many annotated images nor is it restricted to a specific geometry. We first present a small sequence of RGB-D images displaying a human-object interaction. This sequence is then leveraged to build associated hand and object meshes that represent the depicted interaction. Subsequently, we complete missing parts of the reconstructed object shape and estimate the relative transformation between the reconstruction and the visible object in the scene. Finally, we combine the a-priori knowledge of the relative pose between object and human hand with the estimate of the current object pose in the scene to derive the grasping instructions for the robot. Exhaustive evaluations with Toyota’s Human Support Robot (HSR) in real and synthetic environments demonstrate the applicability of our proposed methodology and its advantage in comparison to previous approaches.

NeurIPS Conference 2021 Conference Paper

Fine-Grained Neural Network Explanation by Identifying Input Features with Predictive Information

  • Yang Zhang
  • Ashkan Khakzar
  • Yawei Li
  • Azade Farshad
  • Seong Tae Kim
  • Nassir Navab

One principal approach for illuminating a black-box neural network is feature attribution, i.e., identifying the importance of input features for the network’s prediction. The predictive information of features has recently been proposed as a proxy for the measure of their importance. So far, the predictive information is only identified for latent features by placing an information bottleneck within the network. We propose a method to identify features with predictive information in the input domain. The method results in fine-grained identification of input features’ information and is agnostic to network architecture. The core idea of our method is leveraging a bottleneck on the input that only lets input features associated with predictive latent features pass through. We compare our method with several feature attribution methods using mainstream feature attribution evaluation experiments. The code is publicly available.
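A crude occlusion-style proxy conveys the intuition of "predictive information of an input feature": replace one feature with noise and measure how much the output moves. The paper instead optimises a learned bottleneck on the input; the toy model and parameters below are invented, and this sketch only illustrates the resulting importance ranking:

```python
import numpy as np

def input_information(f, x, sigma=1.0, trials=200, rng=None):
    """Score each input feature by the mean output change when that
    feature alone is replaced with Gaussian noise (a crude proxy for
    per-feature predictive information)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    base = f(x)
    scores = np.zeros(len(x))
    for i in range(len(x)):
        deltas = []
        for _ in range(trials):
            xp = x.copy()
            xp[i] = rng.normal(scale=sigma)  # destroy feature i's information
            deltas.append(abs(f(xp) - base))
        scores[i] = np.mean(deltas)
    return scores

f = lambda x: 3.0 * x[0] + 0.1 * x[1]   # feature 0 is far more predictive
s = input_information(f, np.array([1.0, 1.0]))
print(s.argmax())  # 0 -> the predictive feature is ranked first
```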

ICRA Conference 2021 Conference Paper

Lightweight Semantic Mesh Mapping for Autonomous Vehicles

  • Markus Herb
  • Tobias Weiherer
  • Nassir Navab
  • Federico Tombari

Lightweight and semantically meaningful environment maps are crucial for many applications in robotics and autonomous driving to facilitate higher-level tasks such as navigation and planning. In this paper we present a novel approach to incrementally build a meaningful and lightweight semantic map directly as a 3D mesh from a monocular or stereo sequence. Our system leverages existing feature-based visual odometry paired with learned depth prediction and semantic image segmentation to identify and reconstruct semantically relevant environment structure. We introduce a probabilistic fusion scheme to incrementally refine and extend a 3D mesh with semantic labels for each face without intermediate voxel-based fusion. To demonstrate its effectiveness, we evaluate our system in outdoor driving scenarios with monocular depth prediction and stereo and present quantitative and qualitative reconstruction results with comparison to ground truth. Our results show that the proposed approach achieves reconstruction quality comparable to current state-of-the-art voxel-based methods while being much more lightweight both in storage and computation.
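The recursive per-face fusion can be sketched as a Bayesian multiplicative update of a class distribution; the class set is invented for illustration and the exact update in the paper may differ:

```python
import numpy as np

def fuse_labels(prior, observation):
    """Bayesian-style label fusion for one mesh face: multiply the current
    class distribution by a new semantic observation and renormalise."""
    post = np.asarray(prior, float) * np.asarray(observation, float)
    return post / post.sum()

# A face starts undecided between {road, sidewalk, car} and receives two
# noisy per-frame segmentation observations.
p = np.array([1 / 3, 1 / 3, 1 / 3])
for obs in ([0.7, 0.2, 0.1], [0.6, 0.3, 0.1]):
    p = fuse_labels(p, obs)
print(p.argmax())  # 0 -> "road" wins after both observations
```

Running this update per face as frames arrive is what lets the label confidence sharpen incrementally without any intermediate voxel volume.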

ICRA Conference 2021 Conference Paper

Motion-Aware Robotic 3D Ultrasound

  • Zhongliang Jiang
  • Hanyu Wang 0007
  • Zhenyu Li
  • Matthias Grimm
  • Mingchuan Zhou
  • Ulrich Eck
  • Sandra V. Brecht
  • Tim C. Lueth

Robotic three-dimensional (3D) ultrasound (US) imaging has been employed to overcome the drawbacks of traditional US examinations, such as high inter-operator variability and lack of repeatability. However, object movement remains a challenge, as unexpected motion decreases the quality of the 3D compounding. Furthermore, repositioning of objects, e.g., adjusting a limb to display the entire limb artery tree, is not supported by conventional robotic US systems. To address this challenge, we propose a vision-based robotic US system that can monitor the object’s motion and automatically update the sweep trajectory to provide 3D compounded images of the target anatomy seamlessly. To achieve these functions, a depth camera is employed to extract the manually planned sweep trajectory, after which the normal direction of the object is estimated using the extracted 3D trajectory. Subsequently, to monitor the movement and further compensate for this motion to accurately follow the trajectory, the position of firmly attached passive markers is tracked in real-time. Finally, stepwise compounding is performed. The experiments on a gel phantom demonstrate that the system can resume a sweep when the object is not stationary during scanning.

ICRA Conference 2021 Conference Paper

RGB-D SLAM with Structural Regularities

  • Yanyan Li 0001
  • Raza Yunus
  • Nikolas Brasch
  • Nassir Navab
  • Federico Tombari

This work proposes an RGB-D SLAM system specifically designed for structured environments, aimed at improved tracking and mapping accuracy by relying on geometric features extracted from the surroundings. In addition to points, structured environments offer an abundance of geometric features such as lines and planes, which we exploit in the design of both the tracking and mapping components of our SLAM system. For tracking, we explore geometric relationships between these features based on the assumption of a Manhattan World (MW). We propose a decoupling-refinement method based on points, lines, and planes, as well as the use of Manhattan relationships in an additional pose refinement module. For mapping, different levels of maps, from sparse to dense, are reconstructed at low computational cost. We propose an instance-wise meshing strategy that builds a dense map by meshing plane instances independently. The overall performance in terms of pose estimation and reconstruction is evaluated on public benchmarks and shows improved performance compared to state-of-the-art methods. The code is released at https://github.com/yanyan-li/PlanarSLAM.

JBHI Journal 2021 Journal Article

Seamless Virtual Whole Slide Image Synthesis and Validation Using Perceptual Embedding Consistency

  • Amal Lahiani
  • Irina Klaman
  • Nassir Navab
  • Shadi Albarqouni
  • Eldad Klaiman

Stain virtualization is an application of growing interest in digital pathology, allowing simulation of stained tissue images and thus saving lab and tissue resources. Thanks to the success of Generative Adversarial Networks (GANs) and the progress of unsupervised learning, unsupervised style transfer GANs have been successfully used to generate realistic, clinically meaningful and interpretable images. The large size of high-resolution Whole Slide Images (WSIs) presents an additional computational challenge, making tilewise processing necessary during training and inference of deep learning networks. Instance normalization has a substantial positive effect in style transfer GAN applications, but with tilewise inference it tends to cause a tiling artifact in reconstructed WSIs. In this paper we propose a novel perceptual embedding consistency (PEC) loss that forces the network to learn color, contrast and brightness invariant features in the latent space, substantially reducing the aforementioned tiling artifact. Our approach results in more seamless reconstruction of the virtual WSIs. We validate our method quantitatively by comparing the virtually generated images to their corresponding consecutive real stained images. We compare our results to state-of-the-art unsupervised style transfer methods and to the measures obtained from consecutive real stained tissue slide images. We demonstrate our hypothesis about the effect of the PEC loss by comparing model robustness to color, contrast and brightness perturbations and by visualizing bottleneck embeddings. We validate the robustness of the bottleneck feature maps by measuring their sensitivity to the different perturbations and by using them in a tumor segmentation task. Additionally, we propose a preliminary validation of the virtual staining application by comparing the interpretations of two pathologists on real and virtual tiles and measuring inter-pathologist agreement.
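As an illustration of the idea behind the PEC loss, the sketch below penalizes embedding drift between a tile and a brightness-perturbed copy of it. The stand-in linear "encoder" and all names are assumptions for illustration only; the paper uses deep network features:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 16)) * 0.1  # stand-in "encoder": flat 8x8 tile -> 16-D embedding

def embed(tile):
    return np.tanh(tile.reshape(-1) @ W)

def pec_loss(tile, perturbed_tile):
    """Perceptual embedding consistency: penalize embedding differences
    between a tile and its color/contrast/brightness-perturbed version."""
    return np.mean((embed(tile) - embed(perturbed_tile)) ** 2)

tile = rng.random((8, 8))
bright = np.clip(tile + 0.2, 0.0, 1.0)   # brightness perturbation
loss = pec_loss(tile, bright)            # minimized -> perturbation-invariant features
```

Training with such a term pushes the encoder toward features that ignore stain appearance shifts, which is what suppresses the tiling artifact.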

IROS Conference 2021 Conference Paper

Semantic Image Alignment for Vehicle Localization

  • Markus Herb
  • Matthias Lemberger
  • Marcel M. Schmitt
  • Alexander Kurz 0003
  • Tobias Weiherer
  • Nassir Navab
  • Federico Tombari

Accurate and reliable localization is a fundamental requirement for autonomous vehicles to use map information in higher-level tasks such as navigation or planning. In this paper, we present a novel approach to vehicle localization in dense semantic maps, including vectorized high-definition maps or 3D meshes, using semantic segmentation from a monocular camera. We formulate the localization task as a direct image alignment problem on semantic images, which allows our approach to robustly track the vehicle pose in semantically labeled maps by aligning virtual camera views rendered from the map to sequences of semantically segmented camera images. In contrast to existing visual localization approaches, the system does not require additional keypoint features, handcrafted localization landmark extractors or expensive LiDAR sensors. We demonstrate the wide applicability of our method on a diverse set of semantic mesh maps generated from stereo or LiDAR as well as manually annotated HD maps and show that it achieves reliable and accurate localization in real-time.

AIIM Journal 2021 Journal Article

Simultaneous imputation and classification using Multigraph Geometric Matrix Completion (MGMC): Application to neurodegenerative disease classification

  • Gerome Vivar
  • Anees Kazi
  • Hendrik Burwinkel
  • Andreas Zwergal
  • Nassir Navab
  • Seyed-Ahmad Ahmadi

Large-scale population-based studies in medicine are a key resource towards better diagnosis, monitoring, and treatment of diseases. They also serve as enablers of clinical decision support systems, in particular computer-aided diagnosis (CADx) using machine learning (ML). Numerous ML approaches for CADx have been proposed in the literature. However, these approaches assume feature-complete data, which is often not the case in clinical data. To account for missing data, incomplete data samples are either removed or imputed, which could lead to data bias and may negatively affect classification performance. As a solution, we propose an end-to-end learning of imputation and disease prediction of incomplete medical datasets via Multi-graph Geometric Matrix Completion (MGMC). MGMC uses multiple recurrent graph convolutional networks, where each graph represents an independent population model based on a key clinical meta-feature like age, sex, or cognitive function. Graph signal aggregation from local patient neighborhoods, combined with multi-graph signal fusion via self-attention, has a regularizing effect on both matrix reconstruction and classification performance. Our proposed approach is able to impute class-relevant features as well as perform accurate and robust classification on two publicly available medical datasets. We empirically show the superiority of our proposed approach in terms of classification and imputation performance when compared with state-of-the-art approaches. MGMC enables disease prediction in multimodal and incomplete medical datasets. These findings could serve as a baseline for future CADx approaches which utilize incomplete datasets.
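The multi-graph aggregation with self-attention fusion could be sketched as follows. This is a minimal numpy illustration, assuming simple row-normalized propagation and a single attention vector; it is not the authors' recurrent GCN implementation, and all names are assumptions:

```python
import numpy as np

def normalize_adj(A):
    """Row-normalized adjacency with self-loops (simple GCN-style propagation)."""
    A = A + np.eye(A.shape[0])
    return A / A.sum(axis=1, keepdims=True)

def fuse_multigraph(X, adjs, attn_w):
    """Aggregate patient features over each meta-feature graph (age, sex, ...),
    then fuse the per-graph signals with softmax self-attention weights."""
    agg = np.stack([normalize_adj(A) @ X for A in adjs])   # (G, N, F) per-graph signals
    scores = np.einsum('gnf,f->gn', agg, attn_w)           # per-graph, per-node score
    alpha = np.exp(scores) / np.exp(scores).sum(axis=0)    # softmax over graphs
    return (alpha[..., None] * agg).sum(axis=0)            # fused (N, F) signal
```

When every meta-feature graph carries the same structure, the fusion degenerates to plain aggregation; differing graphs are re-weighted per node, which is the regularizing effect the abstract refers to.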

IROS Conference 2021 Conference Paper

Unsupervised Traffic Scene Generation with Synthetic 3D Scene Graphs

  • Artem Savkin
  • Rachid Ellouze
  • Nassir Navab
  • Federico Tombari

Image synthesis driven by computer graphics has recently achieved remarkable realism, yet synthetic image data generated this way reveals a significant domain gap with respect to real-world data. This is especially true in autonomous driving scenarios, a critical application area for utilizing synthetic data to train neural networks. We propose a method based on a domain-invariant scene representation to directly synthesize traffic scene imagery without rendering. Specifically, we rely on synthetic scene graphs as our internal representation and introduce an unsupervised neural network architecture for realistic traffic scene synthesis. We enhance synthetic scene graphs with spatial information about the scene and demonstrate the effectiveness of our approach through scene manipulation.

YNICL Journal 2020 Journal Article

Analyzing the co-localization of substantia nigra hyper-echogenicities and iron accumulation in Parkinson's disease: A multi-modal atlas study with transcranial ultrasound and MRI

  • Seyed-Ahmad Ahmadi
  • Kai Bötzel
  • Johannes Levin
  • Juliana Maiostre
  • Tassilo Klein
  • Wolfgang Wein
  • Verena Rozanski
  • Olaf Dietrich

BACKGROUND: Transcranial B-mode sonography (TCS) can detect hyperechogenic speckles in the area of the substantia nigra (SN) in Parkinson's disease (PD). These speckles correlate with iron accumulation in the SN tissue, but an exact volumetric localization in and around the SN is still unknown. Areas of increased iron content in brain tissue can be detected in vivo with magnetic resonance imaging, using quantitative susceptibility mapping (QSM). METHODS: In this work, we i) acquire, co-register and transform TCS and QSM imaging from a cohort of 23 PD patients and 27 healthy control subjects into a normalized atlas template space and ii) analyze and compare the 3D spatial distributions of iron accumulation in the midbrain, as detected by a signal increase (TCS+ and QSM+) in both modalities. RESULTS: We achieved sufficiently accurate intra-modal target registration errors (TRE < 1 mm) for all MRI volumes and multi-modal TCS-MRI co-localization (TRE < 4 mm) for 66.7% of TCS scans. In the caudal part of the midbrain, enlarged TCS+ and QSM+ areas were located within the SN pars compacta in PD patients in comparison to healthy controls. More cranially, overlapping TCS+ and QSM+ areas in PD subjects were found in the region of the ventral tegmental area (VTA). CONCLUSION: Our findings are concordant with several QSM-based studies on iron-related alterations in the area of the SN pars compacta. They substantiate that TCS+ is an indicator of iron accumulation in Parkinson's disease within and in the vicinity of the SN. Furthermore, they are in favor of an involvement of the VTA, and thereby the mesolimbic system, in Parkinson's disease.

AIIM Journal 2020 Journal Article

GANs for medical image analysis

  • Salome Kazeminia
  • Christoph Baur
  • Arjan Kuijper
  • Bram van Ginneken
  • Nassir Navab
  • Shadi Albarqouni
  • Anirban Mukhopadhyay

Generative adversarial networks (GANs) and their extensions have carved open many exciting ways to tackle well known and challenging medical image analysis problems such as medical image de-noising, reconstruction, segmentation, data simulation, detection or classification. Furthermore, their ability to synthesize images at unprecedented levels of realism also gives hope that the chronic scarcity of labeled data in the medical field can be resolved with the help of these generative models. In this review paper, a broad overview of recent literature on GANs for medical applications is given, the shortcomings and opportunities of the proposed methods are thoroughly discussed, and potential future work is elaborated. We review the most relevant papers published until the submission date. For quick access, essential details such as the underlying method, datasets, and performance are tabulated. An interactive visualization that categorizes all papers to keep the review alive is available at http://livingreview.in.tum.de/GANs_for_Medical_Applications/.

JBHI Journal 2020 Journal Article

Machine Learning Techniques for Ophthalmic Data Processing: A Review

  • Mhd Hasan Sarhan
  • M. Ali Nasseri
  • Daniel Zapp
  • Mathias Maier
  • Chris P. Lohmann
  • Nassir Navab
  • Abouzar Eslami

Machine learning and especially deep learning techniques are dominating medical image and data analysis. This article reviews machine learning approaches proposed for diagnosing ophthalmic diseases during the last four years. Three diseases are addressed in this survey, namely diabetic retinopathy, age-related macular degeneration, and glaucoma. The review covers over 60 publications and 25 public datasets and challenges related to the detection, grading, and lesion segmentation of the three considered diseases. Each section provides a summary of the public datasets and challenges related to each pathology and the current methods that have been applied to the problem. Furthermore, recent machine learning approaches for retinal vessel segmentation, as well as methods for retinal layer and fluid segmentation, are reviewed. Two main imaging modalities are considered in this survey, namely color fundus imaging and optical coherence tomography. Machine learning approaches that use eye measurements and visual field data for glaucoma detection are also included. Finally, the authors provide their views and expectations on the future of these techniques in clinical practice, along with their limitations.

IROS Conference 2020 Conference Paper

Towards Unsupervised Learning for Instrument Segmentation in Robotic Surgery with Cycle-Consistent Adversarial Networks

  • Daniil Pakhomov
  • Wei Shen
  • Nassir Navab

Surgical tool segmentation in endoscopic images is an important problem: it is a crucial step towards full instrument pose estimation and is used for integrating pre- and intra-operative images into the endoscopic view. While many recent approaches based on convolutional neural networks have shown great results, a key barrier to progress lies in the acquisition of the large number of manually-annotated images necessary for an algorithm to generalize and work well in diverse surgical scenarios. Unlike the surgical image data itself, annotations are difficult to acquire and may be of variable quality. On the other hand, synthetic annotations can be generated automatically using the forward kinematic model of the robot and CAD models of the tools, by projecting them onto the image plane. Unfortunately, this model is very inaccurate and cannot be used for supervised learning of image segmentation models. Since the generated annotations will not directly correspond to the endoscopic images due to these errors, we formulate the problem as unpaired image-to-image translation, where the goal is to learn the mapping between an input endoscopic image and a corresponding annotation using an adversarial model. Our approach allows training image segmentation models without the need to acquire expensive annotations and can potentially exploit large unlabeled endoscopic image collections outside the annotated distribution of image/annotation data. We test our proposed method on the EndoVis 2017 challenge dataset and show that it is competitive with supervised segmentation methods.
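The cycle-consistency objective at the heart of such unpaired translation can be illustrated with a toy example. The linear stand-in "generators" below are assumptions chosen so the cycle closes exactly; the real models are deep networks trained adversarially:

```python
import numpy as np

def cycle_consistency_loss(x, y, G, F):
    """L1 cycle loss for unpaired image-to-annotation translation:
    F(G(x)) should reconstruct x and G(F(y)) should reconstruct y,
    even though no (x, y) pair is ever observed together."""
    return np.abs(F(G(x)) - x).mean() + np.abs(G(F(y)) - y).mean()

# stand-in "generators": exact inverse linear maps, so the cycle is lossless
A = np.array([[2.0, 0.0],
              [0.0, 0.5]])
G = lambda v: v @ A                  # "image -> annotation" direction
F = lambda v: v @ np.linalg.inv(A)   # "annotation -> image" direction
```

Minimizing this loss alongside the adversarial terms constrains the mapping without requiring paired image/annotation data.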

IROS Conference 2020 Conference Paper

Ultrasound-Guided Robotic Navigation with Deep Reinforcement Learning

  • Hannes Hase
  • Mohammad Farid Azampour
  • Maria Tirindelli
  • Magdalini Paschali
  • Walter Simson
  • Emad Fatemizadeh
  • Nassir Navab

In this paper we introduce the first reinforcement learning (RL) based robotic navigation method which utilizes ultrasound (US) images as an input. Our approach combines state-of-the-art RL techniques, specifically deep Q-networks (DQN) with memory buffers, and a binary classifier for deciding when to terminate the task. Our method is trained and evaluated on an in-house collected dataset of 34 volunteers and, when compared to pure RL and supervised learning (SL) techniques, performs substantially better, which highlights the suitability of RL navigation for US-guided procedures. When testing our proposed model, we obtained an 82.91% chance of navigating correctly to the sacrum from 165 different starting positions on 5 different unseen simulated environments.
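A minimal sketch of the decision loop combining an epsilon-greedy choice over Q-values with a separate binary termination classifier. The action set, thresholds, and names are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

ACTIONS = ["up", "down", "left", "right"]

def select_action(q_values, stop_prob, eps=0.1, stop_threshold=0.5, rng=None):
    """Epsilon-greedy action from the Q-network, with a separate binary
    classifier output (stop_prob) deciding when the probe has reached
    the target and the navigation task should terminate."""
    if stop_prob >= stop_threshold:
        return "stop"                              # classifier says: target reached
    if rng is None:
        rng = np.random.default_rng()
    if rng.random() < eps:                         # explore
        return ACTIONS[rng.integers(len(ACTIONS))]
    return ACTIONS[int(np.argmax(q_values))]       # exploit
```

Separating the stop decision from the Q-network avoids forcing the value function to also learn a reliable terminal condition.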

ICRA Conference 2019 Conference Paper

A Real-Time Interactive Augmented Reality Depth Estimation Technique for Surgical Robotics

  • Megha Kalia
  • Nassir Navab
  • Septimiu E. Salcudean

Augmented reality (AR) is a promising technology that lets the surgeon see the medical abnormality in the context of the patient, making anatomy of interest visible that would otherwise remain hidden. It can result in better surgical precision and therefore potentially better surgical outcomes and faster recovery times. Despite these benefits, current AR systems suffer from two major challenges: incorrect depth perception and the lack of suitable evaluation systems. In this paper we address both problems. We propose a color depth encoding (CDE) technique to estimate the distance between the tumor and the tissue surface using a surgical instrument, mapping this distance to the blue-red color spectrum. For evaluation of and interaction with our AR technique, we propose a virtual surgical instrument method using the CAD model of the instrument. Users were asked to reach the judged distance in the surgical field using the virtual tool. Realistic tool movement was simulated by collecting forward-kinematics joint encoder data. The results showed significant improvements in depth estimation, time for task completion and confidence using our CDE technique, with and without stereo, versus the other two cases, namely Stereo-No CDE and No Stereo-No CDE.
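The CDE idea of mapping distance linearly onto a blue-red spectrum can be sketched as below; the 20 mm range and the exact color ramp are illustrative assumptions, not values from the paper:

```python
def color_depth_encoding(distance_mm, max_mm=20.0):
    """Map the instrument-to-tumor distance onto a blue-red spectrum:
    pure red when touching (0 mm), pure blue at max_mm or beyond.
    Returns an (R, G, B) triple with components in [0, 1]."""
    t = min(max(distance_mm / max_mm, 0.0), 1.0)   # normalized, clamped distance
    return (1.0 - t, 0.0, t)                        # red fades out, blue fades in
```

Overlaying this color on the virtual instrument tip gives the surgeon a continuous depth cue even on a monoscopic display.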

ICRA Conference 2019 Conference Paper

Attention-based Lane Change Prediction

  • Oliver Scheel
  • Naveen Shankar Nagaraja
  • Loren Arthur Schwarz
  • Nassir Navab
  • Federico Tombari

Lane change prediction of surrounding vehicles is a key building block of path planning. The focus has been on increasing the accuracy of prediction by posing it purely as a function estimation problem at the cost of model understandability. However, the efficacy of any lane change prediction model can be improved when both corner and failure cases are humanly understandable. We propose an attention-based recurrent model to tackle both understandability and prediction quality. We also propose metrics which reflect the discomfort felt by the driver. We show encouraging results on a publicly available dataset and proprietary fleet data.

YNIMG Journal 2019 Journal Article

Bayesian QuickNAT: Model uncertainty in deep whole-brain segmentation for structure-wise quality control

  • Abhijit Guha Roy
  • Sailesh Conjeti
  • Nassir Navab
  • Christian Wachinger

We introduce Bayesian QuickNAT for the automated quality control of whole-brain segmentation on MRI T1 scans. Next to the Bayesian fully convolutional neural network, we also present inherent measures of segmentation uncertainty that allow for quality control per brain structure. For estimating model uncertainty, we follow a Bayesian approach wherein Monte Carlo (MC) samples from the posterior distribution are generated by keeping the dropout layers active at test time. Entropy over the MC samples provides a voxel-wise model uncertainty map, whereas expectation over the MC predictions provides the final segmentation. Next to voxel-wise uncertainty, we introduce four metrics to quantify structure-wise uncertainty in segmentation for quality control. We report experiments on four out-of-sample datasets comprising diverse age ranges, pathologies and imaging artifacts. The proposed structure-wise uncertainty metrics are highly correlated with the Dice score estimated with manual annotation and therefore present an inherent measure of segmentation quality. In particular, the intersection over union over all the MC samples is a suitable proxy for the Dice score. In addition to quality control at scan level, we propose to incorporate the structure-wise uncertainty as a measure of confidence for reliable group analysis on large data repositories. We envisage that the introduced uncertainty metrics will help assess the fidelity of automated deep learning based segmentation methods for large-scale population studies, as they enable automated quality control and group analyses in processing large data repositories.
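The MC-dropout measures described above (expectation for the final segmentation, entropy for the voxel-wise uncertainty map, IoU over MC samples as a structure-wise proxy) can be sketched in numpy for a binary case. The shapes and the foreground-class convention are assumptions for illustration:

```python
import numpy as np

def mc_uncertainty(mc_probs):
    """mc_probs: (T, V, C) class probabilities from T stochastic forward
    passes with dropout kept active at test time, over V voxels and C classes.
    Returns final labels (argmax of the MC expectation), a voxel-wise
    entropy map, and the IoU of the T foreground segmentations."""
    mean_p = mc_probs.mean(axis=0)                        # expectation over MC samples
    labels = mean_p.argmax(axis=1)                        # final segmentation
    entropy = -(mean_p * np.log(mean_p + 1e-12)).sum(axis=1)
    fg = mc_probs.argmax(axis=2) == 1                     # per-sample foreground masks
    iou = fg.all(axis=0).sum() / max(fg.any(axis=0).sum(), 1)
    return labels, entropy, iou
```

A low IoU across samples flags a structure whose boundary the model is unsure about, which is the quality-control signal correlated with the Dice score.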

IROS Conference 2019 Conference Paper

Crowd-sourced Semantic Edge Mapping for Autonomous Vehicles

  • Markus Herb
  • Tobias Weiherer
  • Nassir Navab
  • Federico Tombari

Highly accurate maps of the road infrastructure are a crucial cornerstone for self-driving cars to enable navigation in complex traffic scenarios. Traditional methods for creating detailed maps of road environments involve expensive survey vehicles that cannot keep up with the frequent changes in the road network. In this paper, we propose a novel method to derive detailed high-definition maps by crowd sourcing data using commodity sensors. Our system uses multi-session feature-based visual SLAM to align submaps recorded by individual vehicles on a central backend server. We reconstruct 3D boundaries of road infrastructure elements such as road markings and road boundaries from semantic object contours detected in keyframes by a neural network. The result is a concise map of semantically meaningful objects suitable both for localization and higher-level planning tasks of automated vehicles. We evaluate our method on real-world data against a globally referenced ground-truth map demonstrating a high level of detail and metric accuracy.

ICRA Conference 2019 Conference Paper

Needle Localization for Robot-assisted Subretinal Injection based on Deep Learning

  • Mingchuan Zhou
  • Xijia Wang
  • Jakob Weiss
  • Abouzar Eslami
  • Kai Huang 0001
  • Mathias Maier
  • Chris P. Lohmann
  • Nassir Navab

Subretinal injection is known to be a complicated task for ophthalmologists to perform; the main sources of difficulty are the fine anatomy of the retina, insufficient visual feedback, and the high surgical precision required. Image-guided robot-assisted surgery is one of the promising solutions that brings significant enhancement in treatment outcome and reduces the physical limitations of human surgeons. In this paper, we demonstrate a robust framework for needle detection and localization in subretinal injection using microscope-integrated Optical Coherence Tomography (MI-OCT) based on deep learning. The proposed method consists of two main steps: a) preprocessing of the OCT volumetric images; b) needle localization in the processed images. The first step coarsely localizes the needle position based on the needle information above the retinal surface and crops the original image to a small region of interest (ROI). Afterward, the cropped image is fed into a well-trained network for detection and localization of the needle segment. The entire framework is extensively validated in ex-vivo pig eye experiments with robotic subretinal injection. The results show that the proposed method can localize the needle accurately with a confidence of 99.2%.

YNIMG Journal 2019 Journal Article

QuickNAT: A fully convolutional network for quick and accurate segmentation of neuroanatomy

  • Abhijit Guha Roy
  • Sailesh Conjeti
  • Nassir Navab
  • Christian Wachinger

Whole brain segmentation from structural magnetic resonance imaging (MRI) is a prerequisite for most morphological analyses, but is computationally intense and can therefore delay the availability of image markers after scan acquisition. We introduce QuickNAT, a fully convolutional, densely connected neural network that segments an MRI brain scan in 20 s. To enable training of the complex network with millions of learnable parameters using limited annotated data, we propose to first pre-train on auxiliary labels created from existing segmentation software. Subsequently, the pre-trained model is fine-tuned on manual labels to rectify errors in the auxiliary labels. With this learning strategy, we are able to use large neuroimaging repositories without manual annotations for training. In an extensive set of evaluations on eight datasets that cover a wide age range, pathology, and different scanners, we demonstrate that QuickNAT achieves superior segmentation accuracy and reliability in comparison to state-of-the-art methods, while being orders of magnitude faster. The speed-up facilitates processing of large data repositories and supports translation of imaging biomarkers by making them available within seconds for fast clinical decision making.

IROS Conference 2019 Conference Paper

Robotic Ultrasound for Catheter Navigation in Endovascular Procedures

  • Fernanda Langsch
  • Salvatore Virga
  • Javier Esteban
  • Rüdiger Göbl
  • Nassir Navab

Endovascular procedures require real time visual feedback on the location of inserted catheters. This is currently achieved using X-ray fluoroscopy, which causes exposure to radiation. This study describes an alternative method using a robotic ultrasound system for catheter tracking and navigation in endovascular interventions, focusing on endovascular aneurysm repair. This approach relies on the registration of pre-operative images to provide both a tracking trajectory and visual feedback of the real-time catheter position. The procedure was validated on healthy volunteers and on a phantom that included a realistic vessel structure, showing an average tracking error of the moving catheter tip of 1.78 ± 1.02 mm.

ICRA Conference 2018 Conference Paper

An Observer-Based Fusion Method Using Multicore Optical Shape Sensors and Ultrasound Images for Magnetically-Actuated Catheters

  • Alper Denasi
  • Fouzia Khan
  • Klaas Jelmer Boskma
  • Mert Kaya
  • Christoph Hennersperger
  • Rüdiger Göbl
  • Maria Tirindelli
  • Nassir Navab

Minimally invasive surgery involves the use of flexible medical instruments such as endoscopes and catheters. Magnetically actuated catheters can provide improved steering precision over conventional catheters. However, besides the actuation method, an accurate tip position is required for precise control of the medical instruments. In this study, the tip positions obtained from transverse 2D ultrasound images and from multicore optical shape sensors are combined using a robust sensor fusion algorithm. The tip position is tracked in the ultrasound images using a template-based tracker and a convolutional-neural-network-based tracker. Experimental results for a rhombus path are presented, where data obtained from both tracking sources are fused using Luenberger and Kalman state estimators. The mean and standard deviation of the Euclidean error are 0.2 ± 0.11 mm for the Luenberger observer and 0.18 ± 0.13 mm for the Kalman filter.
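A scalar sketch of the Kalman-filter variant of this fusion, sequentially updating the tip estimate with both measurements per cycle. The noise values and names are illustrative assumptions, not the paper's parameters:

```python
def kalman_fuse(x, P, z_us, z_fbg, q=1e-4, r_us=0.04, r_fbg=0.01):
    """One predict/update cycle of a scalar constant-position Kalman filter
    fusing an ultrasound tip measurement (noisier, variance r_us) with an
    optical shape-sensor one (variance r_fbg); q is the process noise."""
    P = P + q                          # predict: state assumed quasi-static
    for z, r in ((z_us, r_us), (z_fbg, r_fbg)):
        K = P / (P + r)                # Kalman gain for this sensor
        x = x + K * (z - x)            # sequential measurement update
        P = (1.0 - K) * P
    return x, P
```

The steady-state estimate lands at the inverse-variance-weighted average of the two sensors, i.e. closer to the lower-noise shape sensor.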

ICRA Conference 2018 Conference Paper

Situation Assessment for Planning Lane Changes: Combining Recurrent Models and Prediction

  • Oliver Scheel
  • Loren Arthur Schwarz
  • Nassir Navab
  • Federico Tombari

One of the greatest challenges towards fully autonomous cars is the understanding of complex and dynamic scenes. Such understanding is needed for planning of maneuvers, especially those that are particularly frequent such as lane changes. While in recent years advanced driver-assistance systems have made driving safer and more comfortable, these have mostly focused on car following scenarios, and less on maneuvers involving lane changes. In this work we propose a situation assessment algorithm for classifying driving situations with respect to their suitability for lane changing. For this, we propose a deep learning architecture based on a Bidirectional Recurrent Neural Network, which uses Long Short-Term Memory units, and integrates a prediction component in the form of the Intelligent Driver Model. We prove the feasibility of our algorithm on the publicly available NGSIM datasets, where we outperform existing methods.

ICRA Conference 2018 Conference Paper

When Regression Meets Manifold Learning for Object Recognition and Pose Estimation

  • Mai Bui 0001
  • Sergey Zakharov
  • Shadi Albarqouni
  • Slobodan Ilic
  • Nassir Navab

In this work, we propose a method for object recognition and pose estimation from depth images using convolutional neural networks. Previous methods addressing this problem rely on manifold learning to learn low dimensional viewpoint descriptors and employ them in a nearest neighbor search on an estimated descriptor space. In comparison, we create an efficient multi-task learning framework combining manifold descriptor learning and pose regression. By combining the strengths of manifold learning via a triplet loss with pose regression, we can either estimate the pose directly, reducing the complexity compared to NN search, or use the learned descriptor for NN descriptor matching. In an in-depth experimental evaluation of the novel loss function, we observed that the view descriptors learned by the network are much more discriminative, resulting in an almost 30% increase in relative pose accuracy compared to related works. Regarding directly regressed poses, we obtained a substantial improvement compared to simple pose regression. By leveraging the advantages of both manifold learning and regression, we improve the current state-of-the-art for object recognition and pose retrieval.
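The triplet loss component used for descriptor learning has the standard hinge form, sketched here in numpy; the margin value is an illustrative assumption:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge triplet loss on view descriptors: views of the same object in
    similar poses (anchor, positive) are pulled together, while dissimilar
    views (negative) are pushed at least `margin` further away."""
    d_pos = np.sum((anchor - positive) ** 2)   # squared distance to positive
    d_neg = np.sum((anchor - negative) ** 2)   # squared distance to negative
    return max(d_pos - d_neg + margin, 0.0)
```

The loss is zero once the negative is sufficiently far, so training effort concentrates on hard, confusable viewpoints, which is what makes the descriptor space discriminative.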

IROS Conference 2016 Conference Paper

Automatic force-compliant robotic ultrasound screening of abdominal aortic aneurysms

  • Salvatore Virga
  • Oliver Zettinig
  • Marco Esposito
  • Karin Pfister
  • Benjamin Frisch
  • Thomas Neff
  • Nassir Navab
  • Christoph Hennersperger

Ultrasound (US) imaging is commonly employed for the diagnosis and staging of abdominal aortic aneurysms (AAA), mainly due to its non-invasiveness and high availability. High inter-operator variability and a lack of repeatability of current US image acquisition impair the implementation of extensive screening programs for affected patient populations. However, this opens the way to a possible automation of the procedure, and recent works have exploited the use of robotic platforms for US applications, both in diagnostic and interventional scenarios. In this work, we propose a system for autonomous robotic US acquisitions aimed at the quantitative assessment of patients' vessel diameter for abdominal aortic aneurysm screening. Using a probabilistic measure of the US quality, we introduce an automatic estimation of the optimal pressure to be applied during the acquisition, and an online optimization of the out-of-plane rotation of the US probe to maximize the visibility of the aorta. We evaluate our method on healthy volunteers and compare the results to manual acquisitions performed by a clinical expert, demonstrating the feasibility of the presented system for AAA screening.

ICRA Conference 2016 Conference Paper

Confidence-driven control of an ultrasound probe: Target-specific acoustic window optimization

  • Pierre Chatelain
  • Alexandre Krupa
  • Nassir Navab

We propose a control framework to optimize the quality of robotic ultrasound imaging while tracking an anatomical target. We use a multitask approach to control the in-plane motion of a convex probe mounted on the end-effector of a robotic arm, based not only on the position of the target in the image, but also on features extracted from an ultrasound confidence map. The resulting control law therefore guarantees a good image quality, while keeping the target aligned with the central ultrasound scan-line. Potential applications of the proposed approach are, for example, teleoperated ultrasound examination, motion compensation for ultrasound-guided interventions, or automatic ultrasound acquisition. We demonstrate our approach with experiments on an ultrasound examination training phantom in motion.

IROS Conference 2016 Conference Paper

Incremental scene understanding on dense SLAM

  • Chi Li
  • Han Xiao
  • Keisuke Tateno
  • Federico Tombari
  • Nassir Navab
  • Gregory D. Hager

We present an architecture for online, incremental scene modeling which combines a SLAM-based scene understanding framework with semantic segmentation and object pose estimation. The core of this approach comprises a probabilistic inference scheme that predicts semantic labels for object hypotheses at each new frame. From these hypotheses, recognized scene structures are incrementally constructed and tracked. Semantic labels are inferred using a multi-domain convolutional architecture which operates on the image time series and which enables efficient propagation of features as well as robust model registration. To evaluate this architecture, we introduce a large-scale RGB-D dataset JHUSEQ-25 as a new benchmark for sequence-based scene understanding in complex and densely cluttered scenes. This dataset contains 25 RGB-D video sequences with 100,000 labeled frames in total. We validate our method on this dataset and demonstrate improved performance of semantic segmentation and 6-DoF object pose estimation compared with single-view methods.

JBHI Journal 2016 Journal Article

Lumen Segmentation in Intravascular Optical Coherence Tomography Using Backscattering Tracked and Initialized Random Walks

  • Abhijit Guha Roy
  • Sailesh Conjeti
  • Stephane G. Carlier
  • Pranab K. Dutta
  • Adnan Kastrati
  • Andrew F. Laine
  • Nassir Navab
  • Amin Katouzian

Intravascular imaging using ultrasound or optical coherence tomography (OCT) is predominantly used as an adjunct to clinical information in interventional cardiology. OCT provides high-resolution images for detailed investigation of atherosclerosis-induced thickening of the lumen wall resulting in arterial blockage and triggering acute coronary events. However, the stochastic uncertainty of speckles limits effective visual investigation over large volumes of pullback data, and clinicians are challenged by their inability to investigate subtle variations in the lumen topology associated with plaque vulnerability and onset of necrosis. This paper presents a lumen segmentation method using OCT imaging physics-based graph representation of signals and random walks image segmentation approaches. The edge weights in the graph are assigned incorporating OCT signal attenuation physics models. Optical backscattering maxima are tracked along each A-scan of OCT and subsequently refined using global gray-level statistics, and used for initializing seeds for the random walks image segmentation. Accuracy of lumen versus tunica segmentation has been measured on 15 in vitro and 6 in vivo pullbacks, each with 150–200 frames, using 1) Cohen's kappa coefficient (0.9786 ± 0.0061) measured with respect to cardiologists' annotations and 2) divergence of the histograms of the segments computed with Kullback-Leibler (5.17 ± 2.39) and Bhattacharyya measures (0.56 ± 0.28). High segmentation accuracy and consistency substantiate the ability of this method to reliably segment the lumen across pullbacks in the presence of vulnerability cues and necrotic pools, with a deterministic finite time-complexity. More broadly, this paper illustrates the development of methods and frameworks for tissue classification and segmentation that incorporate cues of tissue-energy interaction physics in imaging.
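The seed-initialization step described above (track the backscattering peak along each A-scan, then keep only peaks above a global gray-level threshold) can be sketched as follows. This is a simplified stand-in for the paper's physics-informed pipeline; the threshold choice (frame mean) and function names are illustrative.

```python
def track_backscatter_maxima(frame, min_intensity=None):
    """Seed candidates for random-walks segmentation of one OCT frame.

    frame is a 2-D list indexed as frame[depth][ascan]. For each A-scan
    (column) we take the depth of maximum backscattering intensity, then
    discard peaks below a global gray-level threshold (the frame mean by
    default, standing in for the paper's global statistics refinement).
    Returns (depth, ascan) seed coordinates.
    """
    n_depth, n_ascans = len(frame), len(frame[0])
    values = [v for row in frame for v in row]
    thresh = min_intensity if min_intensity is not None else sum(values) / len(values)
    seeds = []
    for a in range(n_ascans):
        column = [frame[d][a] for d in range(n_depth)]
        peak_depth = max(range(n_depth), key=column.__getitem__)
        if column[peak_depth] >= thresh:
            seeds.append((peak_depth, a))
    return seeds
```

In the actual method these seeds initialize a random-walks solver whose edge weights encode OCT attenuation physics; the sketch covers only the seeding.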

ICRA Conference 2016 Conference Paper

Robotic ultrasound trajectory planning for volume of interest coverage

  • Christoph Graumann
  • Bernhard Fuerst
  • Christoph Hennersperger
  • Felix Bork
  • Nassir Navab

Medical robotic ultrasound offers potential to assist interventions, ease long-term monitoring and reduce operator dependency. Various techniques for remote control of ultrasound probes through telemanipulation systems have been presented in the past, but these do not exploit the potential of fully autonomous acquisitions performed directly by robotic systems. In this paper, a trajectory planning algorithm for automatic robotic ultrasound acquisition under expert supervision is introduced. The objective is to compute a suitable path for covering a volume of interest selected in diagnostic images, for example by prior segmentation. A 3D patient surface point cloud is acquired using a depth camera, which is the sole prerequisite besides the volume delineation. An easily parameterizable path function generates single or multiple parallel scan trajectories capable of dealing with large target volumes. A spline is generated through the preliminary path points and is transferred to a lightweight robot to perform the ultrasound scan using an impedance control mode. The proposed approach is validated via simulation as well as on phantoms and on animal viscera.
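A "parameterizable path function generating multiple parallel scan trajectories" over a region of interest is, at its simplest, a boustrophedon (serpentine) sweep. The sketch below generates such preliminary path points over a flat rectangular region; the real system fits a spline through these points and projects them onto the 3D patient surface from the depth camera, which is omitted here. Names and parameters are illustrative.

```python
def frange(a, b, step):
    """Inclusive range of evenly spaced samples from a to b."""
    n = int(round((b - a) / step))
    return [a + i * step for i in range(n + 1)]

def serpentine_path(x_min, x_max, y_min, y_max, line_spacing, step):
    """Boustrophedon scan lines covering [x_min, x_max] x [y_min, y_max].

    Successive parallel lines alternate direction so consecutive path
    points stay close, i.e. the probe never has to jump between lines.
    """
    path = []
    y, forward = y_min, True
    while y <= y_max + 1e-9:
        xs = frange(x_min, x_max, step)
        if not forward:
            xs = list(reversed(xs))
        path.extend((x, y) for x in xs)
        forward = not forward
        y += line_spacing
    return path
```

The `line_spacing` parameter would be tied to the probe footprint and the required overlap between sweeps; a spline through the returned points then gives the robot a smooth reference trajectory.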

IROS Conference 2016 Conference Paper

Sensor substitution for video-based action recognition

  • Christian Rupprecht 0001
  • Colin Lea
  • Federico Tombari
  • Nassir Navab
  • Gregory D. Hager

There are many applications where domain-specific sensing, such as accelerometers, kinematics, or force sensing, provides unique and important information for control or for analysis of motion. However, it is not always the case that these sensors can be deployed or accessed beyond laboratory environments. For example, it is possible to instrument humans or robots to measure motion in the laboratory in ways that cannot be replicated in the wild. An alternative, which we explore in this paper, is to address situations where accurate sensing is available while training an algorithm, but for which only video is available for deployment. We present two examples of this sensor substitution methodology. The first variation trains a convolutional neural network to regress real-valued signals, including robot end-effector pose, from video. The second example regresses binary signals derived from accelerometer data which signify when specific objects are in motion. We evaluate these on the JIGSAWS dataset for robotic surgery training assessment and the 50 Salads dataset for modeling complex structured cooking tasks. We evaluate the trained models for video-based action recognition and show that the trained models provide information that is comparable to the sensory signals they replace.

AIIM Journal 2016 Journal Article

Survival analysis for high-dimensional, heterogeneous medical data: Exploring feature extraction as an alternative to feature selection

  • Sebastian Pölsterl
  • Sailesh Conjeti
  • Nassir Navab
  • Amin Katouzian

Background: In clinical research, the primary interest is often the time until occurrence of an adverse event, i.e., survival analysis. Its application to electronic health records is challenging for two main reasons: (1) patient records are comprised of high-dimensional feature vectors, and (2) feature vectors are a mix of categorical and real-valued features, which implies varying statistical properties among features. To learn from high-dimensional data, researchers can choose from a wide range of methods in the fields of feature selection and feature extraction. Whereas feature selection is well studied, little work has focused on utilizing feature extraction techniques for survival analysis. Results: We investigate how well feature extraction methods can deal with features having varying statistical properties. In particular, we consider multiview spectral embedding algorithms, which have been developed specifically for these situations. We propose to use random survival forests to accurately determine local neighborhood relations from right-censored survival data. We evaluated 10 combinations of feature extraction methods and 6 survival models, with and without intrinsic feature selection, in the context of survival analysis on 3 clinical datasets. Our results demonstrate that for small sample sizes – less than 500 patients – models with built-in feature selection (Cox model with ℓ1 penalty, random survival forest, and gradient boosted models) outperform feature extraction methods by a median margin of 6.3% in concordance index (inter-quartile range: [−1.2%; 14.6%]). Conclusions: If the number of samples is insufficient, feature extraction methods are unable to reliably identify the underlying manifold, which makes them of limited use in these situations. For large sample sizes – in our experiments, 2500 samples or more – feature extraction methods perform as well as feature selection methods.
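The concordance index used as the evaluation metric above is standard enough to sketch from scratch. Under right censoring, a pair of subjects is comparable only when the one with the shorter observed time actually experienced the event; the pair is concordant when that subject also received the higher predicted risk (this is Harrell's c-index, with ties in risk counted as half):

```python
def concordance_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times:       observed follow-up times
    events:      1 if the event was observed at that time, 0 if censored
    risk_scores: model outputs, higher = predicted to fail sooner
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # comparable pair: i failed strictly before j was last seen
            if times[i] < times[j] and events[i]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied risks count as half
    return concordant / comparable
```

A value of 1.0 means perfect ranking, 0.5 is chance level; the "median margin of 6.3% in concordance index" in the abstract is measured on this scale.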

ICRA Conference 2016 Conference Paper

Toward real-time 3D ultrasound registration-based visual servoing for interventional navigation

  • Oliver Zettinig
  • Bernhard Fuerst
  • Risto Kojcev
  • Marco Esposito
  • Mehrdad Salehi
  • Wolfgang Wein
  • Julia Rackerseder
  • Edoardo Sinibaldi

While intraoperative imaging is commonly used to guide surgical interventions, automatic robotic support for image-guided navigation has not yet been established in clinical routine. In this paper, we propose a novel visual servoing framework that combines, for the first time, full image-based 3D ultrasound registration with a real-time servo-control scheme. Paired with multi-modal fusion to a pre-interventional plan such as an annotated needle insertion path, it thus allows tracking a target anatomy, continuously updating the plan as the target moves, and keeping a needle guide aligned for accurate manual insertion. The presented system includes a motorized 3D ultrasound transducer mounted on a force-controlled robot and a GPU-based image processing toolkit. The tracking accuracy of our framework is validated on a geometric agar/gelatin phantom using a second robot, achieving positioning errors of on average 0.42–0.44 mm. With total compounding and registration runtimes of around 550 ms, real-time performance comes into reach. We also present initial results on a spine phantom, demonstrating the feasibility of our system for lumbar spine injections.

ICRA Conference 2016 Conference Paper

When 2.5D is not enough: Simultaneous reconstruction, segmentation and recognition on dense SLAM

  • Keisuke Tateno
  • Federico Tombari
  • Nassir Navab

While the main trend of 3D object recognition has been to infer object detection from single views of the scene — i.e., 2.5D data — this work explores performing object recognition on 3D data that is reconstructed from multiple viewpoints, under the conjecture that such data can improve the robustness of an object recognition system. To achieve this goal, we propose a framework which is able (i) to carry out incremental real-time segmentation of a 3D scene while it is being reconstructed via Simultaneous Localization And Mapping (SLAM), and (ii) to simultaneously and incrementally carry out 3D object recognition and pose estimation on the reconstructed and segmented 3D representations. Experimental results demonstrate the advantages of our approach with respect to traditional single view-based object recognition and pose estimation approaches, as well as its usefulness in robotic perception and augmented reality applications.

ICRA Conference 2015 Conference Paper

3D ultrasound-guided robotic steering of a flexible needle via visual servoing

  • Pierre Chatelain
  • Alexandre Krupa
  • Nassir Navab

We present a method for the three-dimensional (3D) steering of a flexible needle under 3D ultrasound guidance. The proposed solution is based on a duty-cycling visual servoing strategy we designed in a previous work, and on a new needle tracking algorithm for 3D ultrasound. The flexible needle, modeled as a polynomial curve, is tracked during automatic insertion using particle filtering. This new tracking algorithm enables real-time closed-loop needle control with 3D ultrasound feedback. Experimental results of a targeting task demonstrate the robustness of the proposed tracking algorithm and the feasibility of 3D ultrasound-guided needle steering.

ICRA Conference 2015 Conference Paper

Optimization of ultrasound image quality via visual servoing

  • Pierre Chatelain
  • Alexandre Krupa
  • Nassir Navab

In this paper we propose a new ultrasound-based visual servoing framework that optimizes the positioning of an ultrasound probe manipulated by a robotic arm in order to improve the quality of the acquired ultrasound images. To this end, we use the recent framework of ultrasound confidence maps, which aims at estimating the per-pixel quality of the ultrasound signal based on a model of sound propagation in soft tissues. More specifically, we treat the ultrasound confidence maps as a new modality to design a visual servoing control law for image quality optimization. The proposed framework aims at improving ultrasound imaging techniques, such as robotic tele-echography, target tracking or volume reconstruction. Here we illustrate our approach with the application of robotic tele-echography. Experiments are performed on both an ultrasound examination training phantom and ex vivo tissue samples.

IROS Conference 2015 Conference Paper

Real-time and scalable incremental segmentation on dense SLAM

  • Keisuke Tateno
  • Federico Tombari
  • Nassir Navab

This work proposes a real-time segmentation method for 3D point clouds obtained via Simultaneous Localization And Mapping (SLAM). The proposed method incrementally merges segments obtained from each input depth image into a unified global model using a SLAM framework. Differently from all other approaches, our method is able to yield segmentation of scenes reconstructed from multiple views in real-time, with a complexity that does not depend on the size of the global model. At the same time, it is also general, as it can be deployed with any frame-wise segmentation approach as well as any SLAM algorithm. We validate our proposal by a comparison with the state of the art in terms of computational efficiency and accuracy on a benchmark dataset, as well as by showing how our method can enable real-time segmentation from reconstructions of diverse real indoor environments.
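The "incrementally merge frame-wise segments into a global model, at a cost independent of model size" behavior is the kind of thing a union-find (disjoint-set) structure over segment labels provides: each merge and lookup runs in near-constant amortized time. The sketch below is a generic illustration of that data structure, not the paper's actual merging criterion (which is based on geometric overlap in the reconstruction).

```python
class SegmentMerger:
    """Union-find over segment labels with path halving.

    When a segment in a new frame is found to overlap an existing global
    segment, merge(a, b) unifies their labels; find(label) returns the
    canonical label in near-O(1) amortized time, independent of how many
    segments the global model already contains.
    """
    def __init__(self):
        self.parent = {}

    def find(self, label):
        self.parent.setdefault(label, label)
        while self.parent[label] != label:
            # path halving: point every other node at its grandparent
            self.parent[label] = self.parent[self.parent[label]]
            label = self.parent[label]
        return label

    def merge(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra
```

Per frame, one would run the frame-wise segmenter, compute overlaps against the current global model, and call `merge` for each overlapping pair; the per-frame cost then depends only on the frame, not on the accumulated scene.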

IROS Conference 2012 Conference Paper

Adaptive neighborhood selection for real-time surface normal estimation from organized point cloud data using integral images

  • Stefan Holzer
  • Radu Bogdan Rusu
  • M. Dixon
  • Suat Gedikli
  • Nassir Navab

In this paper we present two real-time methods for estimating surface normals from organized point cloud data. The proposed algorithms use integral images to perform highly efficient border- and depth-dependent smoothing and covariance estimation. We show that this approach makes it possible to obtain robust surface normals from large point clouds at high frame rates, and can therefore be used in real-time computer vision algorithms that make use of Kinect-like data.
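The integral-image trick that makes this constant-time per pixel is standard: precompute a summed-area table once, after which the sum over any axis-aligned window (the building block of the paper's smoothing and covariance estimation) costs four lookups regardless of window size. A minimal pure-Python sketch:

```python
def integral_image(img):
    """Summed-area table with a zero-padded first row/column:
    ii[y][x] = sum of img over all rows < y and columns < x."""
    h, w = len(img), len(img[0])
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def box_sum(ii, x0, y0, x1, y1):
    """Sum of img[y0..y1][x0..x1] (inclusive) in O(1): four lookups."""
    return ii[y1 + 1][x1 + 1] - ii[y0][x1 + 1] - ii[y1 + 1][x0] + ii[y0][x0]
```

For normal estimation, one would build such tables for the point coordinates and their pairwise products, so that the local covariance matrix of any neighborhood window falls out of a handful of `box_sum` calls; the adaptive part of the paper is choosing the window size per pixel based on depth and borders.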

AIIM Journal 2011 Journal Article

Prediction of intraoperative complexity from preoperative patient data for laparoscopic cholecystectomy

  • Loubna Bouarfa
  • Armin Schneider
  • Hubertus Feussner
  • Nassir Navab
  • Heinz U. Lemke
  • Pieter P. Jonker
  • Jenny Dankelman

Objective: Different reasons may cause difficult intraoperative surgical situations. This study aims to predict intraoperative complexity by classifying and evaluating preoperative patient data. The basic prediction problem addressed in this paper involves the classification of preoperative data into two classes: easy (Class 0) and complex (Class 1) surgeries. Methods and material: Preoperative patient data were collected from 337 patients admitted to the Klinikum rechts der Isar hospital in Munich, Germany for laparoscopic cholecystectomy (LAPCHOL) in the period of 2005–2008. The data include the patient's body mass index (BMI), sex, inflammation, wall thickening, age and history of previous surgery, as well as the name and level of experience of the operating surgeon. The operating surgeon was asked to label the intraoperative complexity after the surgery: '0' if the surgery was easy and '1' if it was complex. For the classification task a set of classifiers was evaluated, including linear discriminant classifier (LDC), quadratic discriminant classifier (QDC), Parzen and support vector machine (SVM). Moreover, feature selection was applied to derive the optimal preoperative patient parameters for predicting intraoperative complexity. Results: Classification results indicate a preference for the LDC in terms of classification error, although the SVM classifier is preferred in terms of area under the curve. The trained LDC or SVM classifier can therefore be used in preoperative settings to predict complexity from preoperative patient data with classification error rates below 17%. Moreover, feature-selection results identify bias in the process of labelling surgical complexity, although this bias is irrelevant for patients with inflammation, wall thickening, male sex and high BMI. These patients tend to be at high risk for complex LAPCHOL surgeries, regardless of labelling bias.
Conclusions: Intraoperative complexity can be predicted before surgery from preoperative data with accuracy up to 83% using an LDC or SVM classifier. The set of features that are relevant for predicting complexity includes inflammation, wall thickening, sex and BMI score.
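The linear discriminant classifier (LDC) preferred in this study has a particularly simple special case that is easy to sketch: under a shared spherical covariance assumption, the discriminant reduces to classifying each patient vector to the nearest class mean, which yields a linear decision boundary. The sketch below is that simplified variant, not the study's exact classifier or data.

```python
def fit_nearest_mean(X, y):
    """Fit per-class mean vectors from feature rows X and labels y.

    With a shared spherical covariance, the LDC decision rule is
    equivalent to nearest-class-mean classification (linear boundary).
    """
    means = {}
    for label in set(y):
        pts = [x for x, lab in zip(X, y) if lab == label]
        means[label] = [sum(col) / len(pts) for col in zip(*pts)]
    return means

def predict(means, x):
    """Assign x (one preoperative feature vector) to the nearest class mean."""
    def dist2(m):
        return sum((a - b) ** 2 for a, b in zip(x, m))
    return min(means, key=lambda lab: dist2(means[lab]))
```

In practice, features such as BMI and age would need scaling before a distance-based rule is meaningful, and the full LDC uses the pooled covariance rather than assuming it spherical.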