Arrow Research search

Author name cluster

Karthik Ramani

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

7 papers
2 author rows

Possible papers (7)

NeurIPS 2024 Conference Paper

Estimating Ego-Body Pose from Doubly Sparse Egocentric Video Data

  • Seunggeun Chi
  • Pin-Hao Huang
  • Enna Sachdeva
  • Hengbo Ma
  • Karthik Ramani
  • Kwonjoon Lee

We study the problem of estimating the body movements of a camera wearer from egocentric videos. Current methods for ego-body pose estimation rely on temporally dense sensor data, such as IMU measurements from spatially sparse body parts like the head and hands. However, we propose that even temporally sparse observations, such as hand poses captured intermittently from egocentric videos during natural or periodic hand movements, can effectively constrain overall body motion. Naively applying diffusion models to generate full-body pose from head pose and sparse hand pose leads to suboptimal results. To overcome this, we develop a two-stage approach that decomposes the problem into temporal completion and spatial completion. First, our method employs masked autoencoders to impute hand trajectories by leveraging the spatiotemporal correlations between the head pose sequence and intermittent hand poses, providing uncertainty estimates. Subsequently, we employ conditional diffusion models to generate plausible full-body motions based on these temporally dense trajectories of the head and hands, guided by the uncertainty estimates from the imputation. The effectiveness of our methods was rigorously tested and validated through comprehensive experiments conducted on various HMD setups with the AMASS and Ego-Exo4D datasets. Project page: https://sgchi.github.io/dsposer

IROS 2024 Conference Paper

Multi-Modal Representation Learning with Tactile Data

  • Hyung-Gun Chi
  • Jose A. Barreiros
  • Jean Mercat
  • Karthik Ramani
  • Thomas Kollar

Advancements in embodied language models like PaLM-E and RT-2 have significantly enhanced language-conditioned robotic manipulation. However, these advances remain predominantly focused on vision and language, often overlooking the pivotal role of tactile feedback, which is advantageous in contact-rich interactions. Our research introduces a novel approach that synergizes tactile information with vision and language. We present the Multi-Modal Wand (MMWand) dataset, enriched with linguistic descriptions and tactile data. By integrating tactile feedback, we aim to bridge the divide between human linguistic understanding and robotic sensory interpretation. Our multi-modal representation model is trained on these datasets by employing the multi-modal embedding alignment principle from ImageBind, which has shown promising results and emphasizes the potential of tactile data in robotic applications. The validation of our approach in downstream robotics tasks, such as texture-based object classification, cross-modality retrieval, and dense reward functions for visuomotor control, attests to its effectiveness. Our contributions underscore the importance of tactile feedback in multi-modal robotic learning and its potential to enhance robotic tasks. The MMWand dataset is publicly available at https://hyung-gun.me/mmwand/.

NeurIPS 2023 Conference Paper

AircraftVerse: A Large-Scale Multimodal Dataset of Aerial Vehicle Designs

  • Adam Cobb
  • Anirban Roy
  • Daniel Elenius
  • Frederick Heim
  • Brian Swenson
  • Sydney Whittington
  • James Walker
  • Theodore Bapty

We present AircraftVerse, a publicly available aerial vehicle design dataset. Aircraft design encompasses different physics domains and, hence, multiple modalities of representation. The evaluation of these designs requires scientific analytical and simulation models ranging from computer-aided design tools for structural and manufacturing analysis, to computational fluid dynamics tools for drag and lift computation, to battery models for energy estimation, to simulation models for flight control and dynamics. AircraftVerse contains 27,714 diverse air vehicle designs - the largest corpus of designs with this level of complexity. Each design comprises the following artifacts: a symbolic design tree describing topology, propulsion subsystem, battery subsystem, and other design details; a STandard for the Exchange of Product (STEP) model; a 3D CAD design in stereolithography (STL) file format; a 3D point cloud of the design's shape; and evaluation results from high-fidelity, state-of-the-art physics models that characterize performance metrics such as maximum flight distance and hover time. We also present baseline surrogate models that use different modalities of design representation to predict design performance metrics, which we provide as part of our dataset release. Finally, we discuss the potential impact of this dataset on the use of learning in aircraft design and, more generally, in the emerging field of deep learning for scientific design. AircraftVerse is accompanied by a datasheet as suggested in the recent literature, and it is released under the Creative Commons Attribution-ShareAlike (CC BY-SA) license. The dataset with baseline models is hosted at http://doi.org/10.5281/zenodo.6525446, code at https://github.com/SRI-CSL/AircraftVerse, and the dataset description at https://uavdesignverse.onrender.com/.

ICRA 2023 Conference Paper

Pose Relation Transformer: Refine Occlusions for Human Pose Estimation

  • Hyung-Gun Chi
  • Seung-geun Chi
  • Stanley Chan
  • Karthik Ramani

Accurately estimating the human pose is an essential task for many applications in robotics. However, existing pose estimation methods suffer from poor performance when occlusion occurs. Recent advances in NLP have been very successful in predicting missing words conditioned on visible words, and we draw upon this sentence completion analogy to address occlusions in pose estimation. In an analogous manner, we design our model to reconstruct occluded joints from the visible joints, exploiting joint correlations by capturing the implicit joint connectivity through the attention mechanism. Specifically, we propose a POse Relation Transformer (PORT) that captures the global context of the pose using self-attention and the local context by aggregating adjacent joint features. To supervise PORT in learning joint correlations, we guide it to reconstruct randomly masked joints, which we call Masked Joint Modeling (MJM). PORT trained with MJM complements existing keypoint detection methods and successfully refines occlusions. Notably, PORT is a model-agnostic plug-and-play module for pose refinement under occlusion that can be attached to any keypoint detector with substantially low computational cost. We conducted extensive experiments to demonstrate the advantage of PORT in mitigating occlusion for both hand and body pose estimation: PORT improves the pose estimation accuracy of existing human pose estimation methods by up to 16% with only 5% additional parameters. The code is publicly available at https://github.com/stnoah1/PORT.

ICRA 2016 Conference Paper

Cubimorph: Designing modular interactive devices

  • Anne Roudaut
  • Diana Krusteva
  • Mike McCoy
  • Abhijit Karnik
  • Karthik Ramani
  • Sriram Subramanian

We introduce Cubimorph, a modular interactive device that accommodates touchscreens on each of its six module faces and uses a hinge-mounted turntable mechanism to self-reconfigure in the user's hand. Cubimorph contributes toward the vision of programmable matter, where interactive devices reconfigure into any shape that can be made from a chain of cubes to fit a myriad of functionalities, e.g., a mobile phone shifting into a console when a user launches a game. We present a design rationale that exposes user requirements to consider when designing homogeneous modular interactive devices. We also present the Cubimorph mechanical design, three prototypes demonstrating key aspects (turntable hinges, embedded touchscreens, and miniaturization), and an adaptation of the probabilistic roadmap algorithm for reconfiguration.

NeurIPS 2016 Conference Paper

Deconvolving Feedback Loops in Recommender Systems

  • Ayan Sinha
  • David Gleich
  • Karthik Ramani

Collaborative filtering is a popular technique to infer users' preferences on new content based on the collective information of all users' preferences. Recommender systems then use this information to make personalized suggestions to users. When users accept these recommendations, a feedback loop is created in the recommender system, and these loops iteratively influence the collaborative filtering algorithm's predictions over time. We investigate whether it is possible to identify items affected by these feedback loops. We state sufficient assumptions to deconvolve the feedback loops while keeping the inverse solution tractable. We furthermore develop a metric to unravel the recommender system's influence on the entire user-item rating matrix. We use this metric on synthetic and real-world datasets to (1) identify the extent to which the recommender system affects the final rating matrix, (2) rank frequently recommended items, and (3) distinguish whether a user's rated item was recommended or an intrinsic preference. Our results indicate that it is possible to recover the ratings matrix of intrinsic user preferences using a single snapshot of the ratings matrix without any temporal information.

IROS 2014 Conference Paper

HexaMorph: A reconfigurable and foldable hexapod robot inspired by origami

  • Wei Gao
  • Ke Huo
  • Jasjeet Singh Seehra
  • Karthik Ramani
  • Raymond J. Cipra

Origami affords the creation of diverse 3D objects through explicit folding processes from 2D sheets of material. Originating as a paper craft in the 17th century AD, origami designs reveal the rudimentary characteristics of sheet folding: it is lightweight, inexpensive, compact, and combinatorial. In this paper, we present “HexaMorph”, a novel starfish-like hexapod robot designed for modularity, foldability, and reconfigurability. Our folding scheme uses periodic foldable tetrahedral units, called “Basic Structural Units” (BSUs), to construct a family of closed-loop spatial mechanisms and robotic forms. The proposed hexapod robot is fabricated from single sheets of cardboard. The electronic and battery components for actuation can be preassembled on the flattened crease-cut pattern and are enclosed inside when the tetrahedral modules are folded. We investigate the self-deploying characteristic and the mobility of the robot, and discuss motion planning and control strategies for its squirming locomotion. Our design and folding paradigm provides a novel approach for building reconfigurable robots from a range of lightweight foldable sheets.