Arrow Research search

Author name cluster

Yogesh Kumar

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

6 papers
2 author rows

Possible papers

6

AAAI 2026 · Conference Paper

Temporal Object-Aware Vision Transformer for Few-Shot Video Object Detection

  • Yogesh Kumar
  • Anand Mishra

Few-shot Video Object Detection (FSVOD) addresses the challenge of detecting novel objects in videos with limited labeled examples, overcoming the constraints of traditional detection methods that require extensive training data. This task presents key challenges, including maintaining temporal consistency across frames affected by occlusion and appearance variations, and achieving novel object generalization without relying on complex region proposals, which are often computationally expensive and require task-specific training. Our novel object-aware temporal modeling approach addresses these challenges by incorporating a filtering mechanism that selectively propagates high-confidence object features across frames. This enables efficient feature progression, reduces noise accumulation, and enhances detection accuracy in a few-shot setting. By utilizing few-shot trained detection and classification heads with focused feature propagation, we achieve robust temporal consistency without depending on explicit object tube proposals. Our approach achieves performance gains, with AP improvements of 3.7% (FSVOD-500), 5.3% (FSYTV-40), 4.3% (VidOR), and 4.5% (VidVRD) in the 5-shot setting. Further results demonstrate improvements in 1-shot, 3-shot, and 10-shot configurations.
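The filtering idea in the abstract — propagate only high-confidence object features across frames so noise does not accumulate — can be sketched in a few lines. The threshold value and the exponential-moving-average update below are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of confidence-gated temporal feature propagation.
# Only detections whose confidence clears a threshold are blended into a
# running "memory" feature; low-confidence detections are dropped so their
# noise does not accumulate across frames.

def propagate_features(frames, threshold=0.7, momentum=0.8):
    """frames: list of (confidence, feature_vector) pairs, one per video frame.
    Returns the memory feature after each frame (None until the first
    confident detection)."""
    memory = None
    history = []
    for conf, feat in frames:
        if conf >= threshold:              # keep only high-confidence objects
            if memory is None:
                memory = list(feat)
            else:                          # EMA blend with propagated memory
                memory = [momentum * m + (1 - momentum) * f
                          for m, f in zip(memory, feat)]
        history.append(list(memory) if memory is not None else None)
    return history
```

A noisy frame (say confidence 0.2) leaves the memory untouched, while a confident frame nudges it toward the new observation.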

ICRA 2025 · Conference Paper

Design, Contact Modeling, and Collision-Inclusive Planning of a Dual-Stiffness Aerial RoboT (DART)

  • Yogesh Kumar
  • Karishma Patnaik
  • Wenlong Zhang

Collision-resilient quadrotors have gained significant attention given their potential for operating in cluttered environments and leveraging impacts to perform agile maneuvers. However, existing designs are typically single-mode: either safeguarded by propeller guards that prevent deformation, or deformable but lacking the rigidity that is crucial for stable flight in open environments. This paper introduces DART, a Dual-stiffness Aerial RoboT that adapts its post-collision response by engaging a locking mechanism for a rigid mode or disengaging it for a flexible mode. Comprehensive characterization tests highlight the significant difference in post-collision responses between the two modes, with the rigid mode offering seven times the stiffness of the flexible mode. To understand and harness the collision dynamics, we propose a novel collision response prediction model based on linear complementarity system theory. We demonstrate the accuracy of predicting collision forces for both the rigid and flexible modes of DART. Experimental results confirm the accuracy of the model and underscore its potential to advance collision-inclusive trajectory planning in aerial robotics.
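The linear complementarity view of contact can be illustrated in one dimension: the contact force λ and the separation w satisfy w = mλ + q with w ≥ 0, λ ≥ 0, and wλ = 0, which has a closed-form scalar solution. This toy solver and its inputs are illustrative assumptions only, not DART's actual collision model.

```python
# Minimal 1-D linear complementarity contact sketch (illustrative only).
# Scalar LCP: find lam >= 0 such that w = m*lam + q >= 0 and w*lam == 0.
# If q >= 0 the bodies separate (no force needed, lam = 0); otherwise the
# contact force is exactly the value that closes the gap: lam = -q/m.

def solve_lcp_1d(m, q):
    """Closed-form solution of the scalar LCP for m > 0.
    Returns (lam, w): contact force/impulse and resulting separation."""
    lam = 0.0 if q >= 0 else -q / m
    w = m * lam + q
    return lam, w
```

For example, an approaching contact (negative q) yields a positive force with zero residual gap, while a separating contact yields zero force; a stiffer mode enters through a different effective m.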

AAAI 2024 · Conference Paper

QDETRv: Query-Guided DETR for One-Shot Object Localization in Videos

  • Yogesh Kumar
  • Saswat Mallick
  • Anand Mishra
  • Sowmya Rasipuram
  • Anutosh Maitra
  • Roshni Ramnani

In this work, we study the one-shot video object localization problem, which aims to localize instances of unseen objects in a target video using a single query image of the object. To address this challenging problem, we extend a popular and successful object detection method, DETR (Detection Transformer), and introduce a novel approach: a query-guided detection transformer for videos (QDETRv). A distinctive feature of QDETRv is its capacity to exploit information from the query image and the spatio-temporal context of the target video, which significantly aids in precisely pinpointing the desired object in the video. We incorporate cross-attention mechanisms that capture temporal relationships across adjacent frames to handle the dynamic context in videos effectively. Further, to ensure strong initialization for QDETRv, we also introduce a novel unsupervised pretraining technique tailored to videos. This involves training our model on synthetic object trajectories with an objective analogous to the query-guided localization task. During this pretraining phase, we incorporate recurrent object queries and loss functions that encourage accurate patch feature reconstruction. These additions enable better temporal understanding and robust representation learning. Our experiments show that the proposed model significantly outperforms competitive baselines on two public benchmarks, VidOR and ImageNet-VidVRD, extended for one-shot open-set localization tasks.
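Query-guided cross-attention of the kind the abstract describes can be sketched with plain scaled dot-product attention, where query-image features attend over frame features. The single-head form and the shapes below are simplifying assumptions, not QDETRv's architecture.

```python
import numpy as np

def cross_attention(query_feats, frame_feats):
    """Single-head scaled dot-product cross-attention (illustrative sketch).
    query_feats: (Nq, d) features from the query image.
    frame_feats: (Nf, d) features from one video frame.
    Returns (Nq, d): query features re-expressed as mixtures of frame content."""
    d = query_feats.shape[-1]
    scores = query_feats @ frame_feats.T / np.sqrt(d)   # (Nq, Nf) similarities
    scores -= scores.max(axis=-1, keepdims=True)        # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)            # softmax over frame tokens
    return attn @ frame_feats
```

Each output row is a convex combination of frame features weighted by similarity to the corresponding query feature, which is what lets the query image "point at" matching regions of the frame.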

IROS 2023 · Conference Paper

Design, Characterization and Control of a Whole-body Grasping and Perching (WHOPPEr) Drone

  • Weijia Tao
  • Karishma Patnaik
  • Fuchen Chen
  • Yogesh Kumar
  • Wenlong Zhang

Flying robots can exploit perching abilities to position themselves at strategically chosen locations and monitor areas of interest from a critical vantage point. Moreover, they can significantly extend their battery life by turning off the propulsion system when carrying out a surveillance mission. However, unknown disturbances arise from the physical interactions between the robot and the object, making it challenging to stabilize the robot during perching. In this paper, we present a Whole-body Grasping and Perching (WHOPPEr) Drone, which is capable of fast and robust perching by utilizing its entire body as the grasper in lieu of an add-on grasper. We first present the design concept, parameter selection, and characterization of the novel whole-body grasping drone. Next, we analyze the grasping ability of the morphing chassis and present an aerodynamic analysis of the effect of motor thrust on the compliant arm. We finally demonstrate, via real-time experiments, the performance of WHOPPEr in autonomous perching and payload delivery tasks.

NeurIPS 2022 · Conference Paper

Deconfounded Representation Similarity for Comparison of Neural Networks

  • Tianyu Cui
  • Yogesh Kumar
  • Pekka Marttinen
  • Samuel Kaski

Similarity metrics such as representational similarity analysis (RSA) and centered kernel alignment (CKA) have been used to understand neural networks by comparing their layer-wise representations. However, these metrics are confounded by the population structure of data items in the input space, leading to inconsistent conclusions about the functional similarity between neural networks, such as spuriously high similarity of completely random neural networks and inconsistent domain relations in transfer learning. We introduce a simple and generally applicable fix to adjust for the confounder with covariate adjustment regression, which improves the ability of CKA and RSA to reveal functional similarity and also retains the intuitive invariance properties of the original similarity measures. We show that deconfounding the similarity metrics increases the resolution of detecting functionally similar neural networks across domains. Moreover, in real-world applications, deconfounding improves the consistency between CKA and domain similarity in transfer learning, and increases the correlation between CKA and model out-of-distribution accuracy similarity.
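The covariate-adjustment idea can be sketched as: vectorize each network's centered Gram matrix, regress out the similarity structure induced by a confounder, and correlate the residuals. Using a single OLS slope and plain correlation in place of the paper's full CKA machinery is a simplification for illustration.

```python
import numpy as np

def gram_vec(X):
    """Flattened linear Gram matrix of column-centered features X of shape (n, d)."""
    Xc = X - X.mean(axis=0)
    return (Xc @ Xc.T).ravel()

def deconfounded_similarity(X, Y, C):
    """Correlate the Gram matrices of X and Y after regressing out the
    confounder C's Gram structure from each (illustrative sketch)."""
    kx, ky, kc = gram_vec(X), gram_vec(Y), gram_vec(C)

    def residual(v, c):
        v = v - v.mean()
        c = c - c.mean()
        beta = (v @ c) / (c @ c)      # OLS slope on the confounder similarities
        return v - beta * c           # part of v not explained by the confounder

    rx, ry = residual(kx, kc), residual(ky, kc)
    return (rx @ ry) / (np.linalg.norm(rx) * np.linalg.norm(ry))
```

Without the `residual` step this reduces to an (unadjusted) correlation of representation similarities, which is exactly the quantity the abstract argues is confounded by input-space population structure.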