Author name cluster

Uttaran Bhattacharya

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

9 papers

2 author rows

AAAI Conference 2024 Conference Paper

DanceAnyWay: Synthesizing Beat-Guided 3D Dances with Randomized Temporal Contrastive Learning

Aneesh Bhattacharya
Manas Paranjape
Uttaran Bhattacharya
Aniket Bera

We present DanceAnyWay, a generative learning method to synthesize beat-guided dances of 3D human characters synchronized with music. Our method learns to disentangle the dance movements at the beat frames from the dance movements at all the remaining frames by operating at two hierarchical levels. At the coarser "beat" level, it encodes the rhythm, pitch, and melody information of the input music via dedicated feature representations only at the beat frames, and leverages them to synthesize the beat poses of the target dances using a sequence-to-sequence learning framework. At the finer "repletion" level, our method encodes similar rhythm, pitch, and melody information from all the frames of the input music via dedicated feature representations. It generates the full dance sequences by combining the synthesized beat and repletion poses and enforcing plausibility through an adversarial learning framework. Our training paradigm also enforces fine-grained diversity in the synthesized dances through a randomized temporal contrastive loss, which ensures that different segments of the dance sequences have different movements and avoids motion freezing or collapsing to repetitive movements. We evaluate the performance of our approach through extensive experiments on the benchmark AIST++ dataset and observe improvements of about 7–12% in motion quality metrics and 1.5–4% in motion diversity metrics over the current baselines. We also conduct a user study to evaluate the visual quality of our synthesized dances. On average, the samples generated by our method were about 9–48% more preferred by the participants and received 4–27% better five-point Likert-scale scores than the best available current baseline in terms of motion quality and synchronization. Our source code and project page are available at https://github.com/aneeshbhattacharya/DanceAnyWay.
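
The beat/repletion decomposition above can be illustrated with a minimal sketch, assuming poses are stored as a per-frame list and the beat frame indices are already known (for example, from a separate beat tracker). The functions `split_beat_repletion` and `merge_beat_repletion` are hypothetical helpers. They show only how beat and non-beat frames can be separated and recombined by frame index, not the paper's encoders or adversarial networks.

```python
from typing import Any, List, Sequence, Tuple

def split_beat_repletion(poses: Sequence[Any], beat_frames: Sequence[int]) -> Tuple[List[Any], List[Any]]:
    """Separate poses at beat frames from poses at all remaining ("repletion") frames."""
    beats = set(beat_frames)
    beat_poses = [poses[i] for i in sorted(beats)]
    repletion_poses = [pose for i, pose in enumerate(poses) if i not in beats]
    return beat_poses, repletion_poses

def merge_beat_repletion(beat_poses: Sequence[Any], repletion_poses: Sequence[Any],
                         beat_frames: Sequence[int], num_frames: int) -> List[Any]:
    """Rebuild a full sequence: beat poses go to beat frames, repletion poses fill the rest in order."""
    beats = set(beat_frames)
    if any(not 0 <= i < num_frames for i in beats):
        raise ValueError("beat frame index out of range")
    if len(beat_poses) != len(beats) or len(beat_poses) + len(repletion_poses) != num_frames:
        raise ValueError("pose counts do not match the frame layout")
    beat_iter, rep_iter = iter(beat_poses), iter(repletion_poses)
    # Frames are visited in ascending order, matching the order produced by the split.
    return [next(beat_iter) if i in beats else next(rep_iter) for i in range(num_frames)]
```

Splitting a six-frame sequence with beats at frames 0 and 3 yields two beat poses and four repletion poses, and merging them with the same beat layout restores the original order.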

ICLR Conference 2024 Conference Paper

Large Content And Behavior Models To Understand, Simulate, And Optimize Content And Behavior

Ashmit Khandelwal
Aditya Agrawal
Aanisha Bhattacharyya
Yaman Kumar Singla
Somesh Singh 0003
Uttaran Bhattacharya
Ishita Dasgupta 0002
Stefano Petrangeli

Shannon and Weaver's seminal information theory divides communication into three levels: technical, semantic, and effectiveness. While the technical level deals with the accurate reconstruction of transmitted symbols, the semantic and effectiveness levels deal with the inferred meaning and its effect on the receiver. Large Language Models (LLMs), with their wide generalizability, make some progress towards the second level. However, LLMs and other communication models are not conventionally designed for predicting and optimizing communication for desired receiver behaviors and intents. As a result, the effectiveness level remains largely untouched by modern communication systems. In this paper, we introduce the receivers' "behavior tokens," such as shares, likes, clicks, purchases, and retweets, in the LLM's training corpora to optimize content for the receivers and predict their behaviors. Besides performing comparably to LLMs on content understanding tasks, our trained models show generalization capabilities on the behavior dimension for behavior simulation, content simulation, behavior understanding, and behavior domain adaptation. We show results on all these capabilities using a wide range of tasks on three corpora. We call these models Large Content and Behavior Models (LCBMs). Further, to spur more research on LCBMs, we release our new Content Behavior Corpus (CBC), a repository containing communicator, message, and corresponding receiver behavior (https://behavior-in-the-wild.github.io/LCBM).
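
To make the idea of "behavior tokens" in a training corpus concrete, here is a minimal sketch, assuming each sample pairs a communicator and a message with receiver behavior counts. The `ContentSample` type and the "Communicator/Message/Behavior" text layout are hypothetical; the abstract does not specify how LCBMs serialize their data. Behavior simulation would condition on the prefix and predict the behavior text that follows it.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class ContentSample:
    communicator: str
    message: str
    behavior: Dict[str, int] = field(default_factory=dict)  # e.g. {"likes": 120, "retweets": 4}

def content_prefix(sample: ContentSample) -> str:
    """Text a model would condition on for behavior simulation (content known, behavior to predict)."""
    return f"Communicator: {sample.communicator}\nMessage: {sample.message}\nBehavior:"

def to_training_text(sample: ContentSample) -> str:
    """Content followed by receiver behavior tokens, serialized as one training string (hypothetical format)."""
    behavior = ", ".join(f"{name}={count}" for name, count in sorted(sample.behavior.items()))
    return f"{content_prefix(sample)} {behavior}"
```

Sorting the behavior names keeps the serialization deterministic, so the same sample always produces the same training text.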

AAAI Conference 2022 Conference Paper

Learning Unseen Emotions from Gestures via Semantically-Conditioned Zero-Shot Perception with Adversarial Autoencoders

Abhishek Banerjee
Uttaran Bhattacharya
Aniket Bera

We present a novel generalized zero-shot algorithm to recognize perceived emotions from gestures. Our task is to map gestures to novel emotion categories not encountered in training. We introduce an adversarial autoencoder-based representation learning method that correlates 3D motion-captured gesture sequences with the vectorized representations of natural-language perceived emotion terms, obtained using word2vec embeddings. The language-semantic embedding provides a representation of the emotion label space, and we leverage this underlying distribution to map the gesture sequences to the appropriate categorical emotion labels. We train our method using a combination of gestures annotated with known emotion terms and gestures not annotated with any emotions. We evaluate our method on the MPI Emotional Body Expressions Database (EBEDB) and obtain an accuracy of 58.43%, an absolute improvement of 25–27% over current state-of-the-art algorithms for generalized zero-shot learning. We also demonstrate our approach on publicly available online videos and movie scenes, where the actors' poses have been extracted and mapped to their respective emotive states.
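
The final labeling step of a zero-shot classifier like this one can be sketched as a nearest-neighbor lookup in the label embedding space, assuming the gesture has already been encoded into the same vector space as the emotion-term embeddings (the adversarial autoencoder is not shown). The three-dimensional vectors in the usage note are made-up toy values, not word2vec embeddings, and cosine similarity is our assumption for the matching rule.

```python
import math
from typing import Dict, Sequence

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

def predict_emotion(gesture_embedding: Sequence[float],
                    label_embeddings: Dict[str, Sequence[float]]) -> str:
    """Return the emotion term whose embedding is most similar to the gesture embedding.

    Terms never seen in training can be added to label_embeddings at test time,
    which is what makes the lookup zero-shot.
    """
    return max(label_embeddings, key=lambda term: cosine(gesture_embedding, label_embeddings[term]))
```

With toy labels `{"happy": [1, 0, 0], "sad": [0, 1, 0], "relieved": [0.7, 0, 0.7]}`, a gesture embedding of `[0.6, 0.1, 0.8]` maps to "relieved", even if no training gesture carried that label.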

IROS Conference 2020 Conference Paper

CMetric: A Driving Behavior Measure using Centrality Functions

Rohan Chandra
Uttaran Bhattacharya
Trisha Mittal
Aniket Bera
Dinesh Manocha

We present a new measure, CMetric, to classify driver behaviors using centrality functions. Our formulation combines concepts from computational graph theory and social traffic psychology to quantify and classify the behavior of human drivers. CMetric is used to compute the probability of a vehicle executing a driving style, as well as the intensity used to execute the style. Our approach is designed for realtime autonomous driving applications, where the trajectory of each vehicle or road-agent is extracted from a video. We compute a dynamic geometric graph (DGG) based on the positions and proximity of the road-agents and centrality functions corresponding to closeness and degree. These functions are used to compute the CMetric based on style likelihood and style intensity estimates. Our approach is general and makes no assumption about traffic density, heterogeneity, or how driving behaviors change over time. We present an algorithm to compute CMetric and demonstrate its performance on real-world traffic datasets. To test the accuracy of CMetric, we introduce a new evaluation protocol (called "Time Deviation Error") that measures the difference between human prediction and the prediction made by CMetric.
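
The graph quantities named above (a proximity graph over road-agents, plus degree and closeness centrality) can be sketched for a single video frame as follows. Assumptions: agents are connected when they are within a hypothetical distance `radius`, and closeness uses the simple "reachable agents divided by the sum of hop distances" form. CMetric itself works with how these values evolve across the frames of the dynamic geometric graph, and that part is not reproduced here.

```python
import math
from collections import deque
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Graph = Dict[str, List[str]]

def proximity_graph(positions: Dict[str, Point], radius: float) -> Graph:
    """Connect two road-agents when they are within `radius` of each other (one snapshot)."""
    ids = list(positions)
    adj: Graph = {agent: [] for agent in ids}
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if math.dist(positions[a], positions[b]) <= radius:
                adj[a].append(b)
                adj[b].append(a)
    return adj

def degree_centrality(adj: Graph) -> Dict[str, int]:
    """Number of neighbors of each agent."""
    return {agent: len(nbrs) for agent, nbrs in adj.items()}

def closeness_centrality(adj: Graph) -> Dict[str, float]:
    """(reachable agents) / (sum of hop distances to them); 0.0 for isolated agents."""
    result: Dict[str, float] = {}
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:  # breadth-first search gives hop distances in an unweighted graph
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        result[src] = (len(dist) - 1) / total if total > 0 else 0.0
    return result
```

Tracking how an agent's degree and closeness change from frame to frame (for example, a rising degree as a vehicle weaves past neighbors) is the kind of signal a measure like CMetric can build on.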

ICRA Conference 2020 Conference Paper

GraphRQI: Classifying Driver Behaviors Using Graph Spectrums

Rohan Chandra
Uttaran Bhattacharya
Trisha Mittal
Xiaoyu Li
Aniket Bera
Dinesh Manocha

We present a novel algorithm (GraphRQI) to identify driver behaviors from road-agent trajectories. Our approach assumes that the road-agents exhibit a range of driving traits, such as aggressive or conservative driving. Moreover, these traits affect the trajectories of nearby road-agents as well as the interactions between road-agents. We represent these inter-agent interactions using unweighted and undirected traffic graphs. Our algorithm classifies the driver behavior using a supervised learning algorithm by reducing the computation to the spectral analysis of the traffic graph. Moreover, we present a novel eigenvalue algorithm to compute the spectrum efficiently. We provide theoretical guarantees for the running time complexity of our eigenvalue algorithm and show that it is twice as fast as previous methods. We evaluate the classification accuracy of our approach on traffic videos and autonomous driving datasets corresponding to urban traffic. In practice, GraphRQI achieves an accuracy improvement of up to 25% over prior driver behavior classification algorithms. We also use our classification algorithm to predict the future trajectories of road-agents.
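
Spectral analysis of an unweighted, undirected traffic graph can be sketched as follows. Assumptions: the analyzed matrix is the graph Laplacian L = D - A (the abstract does not say which matrix is used), and the spectrum is computed with the classical cyclic Jacobi eigenvalue method. This is a textbook stand-in, not GraphRQI's faster eigenvalue algorithm.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]

def laplacian(num_nodes: int, edges: Sequence[Tuple[int, int]]) -> Matrix:
    """Graph Laplacian L = D - A of an unweighted, undirected traffic graph."""
    lap = [[0.0] * num_nodes for _ in range(num_nodes)]
    for u, v in edges:
        if u == v or lap[u][v] != 0.0:
            continue  # skip self-loops and duplicate edges
        lap[u][v] = lap[v][u] = -1.0
        lap[u][u] += 1.0
        lap[v][v] += 1.0
    return lap

def jacobi_eigenvalues(sym: Matrix, tol: float = 1e-12, max_sweeps: int = 100) -> List[float]:
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, sorted ascending."""
    n = len(sym)
    a = [row[:] for row in sym]  # work on a copy
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # update columns p and q: a <- a J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # update rows p and q: a <- J^T a
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))
```

For a three-agent chain (edges 0-1 and 1-2) the Laplacian spectrum is 0, 1, 3. The number of zero eigenvalues equals the number of connected groups of agents, which is one example of structure a classifier can read from the spectrum.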

AAAI Conference 2020 Conference Paper

M3ER: Multiplicative Multimodal Emotion Recognition using Facial, Textual, and Speech Cues

Trisha Mittal
Uttaran Bhattacharya
Rohan Chandra
Aniket Bera
Dinesh Manocha

We present M3ER, a learning-based method for emotion recognition from multiple input modalities. Our approach combines cues from multiple co-occurring modalities (such as face, text, and speech) and is also more robust than other methods to sensor noise in any of the individual modalities. M3ER uses a novel, data-driven multiplicative fusion method to combine the modalities, which learns to emphasize the more reliable cues and suppress the others on a per-sample basis. By introducing a check step that uses Canonical Correlation Analysis to differentiate between ineffective and effective modalities, M3ER is robust to sensor noise. M3ER also generates proxy features in place of the ineffectual modalities. We demonstrate the efficiency of our network through experiments on two benchmark datasets, IEMOCAP and CMU-MOSEI. We report a mean accuracy of 82.7% on IEMOCAP and 89.0% on CMU-MOSEI, which, collectively, is an improvement of about 5% over prior work.
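
The idea of per-sample multiplicative fusion can be illustrated with a generic multiplicative combination loss. Here each modality's log-loss is scaled by how poorly the other modalities predicted the correct label on that sample. This formula and the exponent `beta` are our assumptions for illustration, not M3ER's published loss, and the CCA check step and proxy-feature generation are not shown.

```python
import math
from typing import Sequence

def multiplicative_combination_loss(true_class_probs: Sequence[float], beta: float = 1.0) -> float:
    """Sum of per-modality log-losses, each scaled by how poorly the *other* modalities did.

    true_class_probs[i] is modality i's predicted probability (in (0, 1]) for the correct label.
    When another modality is already confident (p_j near 1), the factor (1 - p_j) shrinks
    the weight on modality i, so a weak modality is not forced to fit every sample.
    """
    m = len(true_class_probs)
    if m < 2:
        raise ValueError("need at least two modalities")
    loss = 0.0
    for i, p_i in enumerate(true_class_probs):
        weight = 1.0
        for j, p_j in enumerate(true_class_probs):
            if j != i:
                weight *= (1.0 - p_j) ** (beta / (m - 1))
        loss -= weight * math.log(p_i)
    return loss
```

Setting `beta = 0` makes every weight 1 and recovers the plain sum of per-modality cross-entropy terms, so `beta` controls how strongly confident modalities suppress the others.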

ICRA Conference 2020 Conference Paper

RoadTrack: Realtime Tracking of Road Agents in Dense and Heterogeneous Environments

Rohan Chandra
Uttaran Bhattacharya
Tanmay Randhavane
Aniket Bera
Dinesh Manocha

We present a realtime tracking algorithm, RoadTrack, to track heterogeneous road-agents in dense traffic videos. Our approach is designed for dense traffic scenarios that consist of different road-agents, such as pedestrians, two-wheelers, cars, and buses, sharing the road. We use the tracking-by-detection approach, where we track a road-agent by matching the appearance or bounding box region in the current frame with the predicted bounding box region propagated from the previous frame. RoadTrack uses a novel motion model called the Simultaneous Collision Avoidance and Interaction (SimCAI) model to predict the motion of road-agents for the next frame by modeling collision avoidance and interactions between the road-agents. We demonstrate the advantage of RoadTrack on a dataset of dense traffic videos and observe an accuracy of 75.8% on this dataset, outperforming prior state-of-the-art tracking algorithms by at least 5.2%. RoadTrack operates in realtime at approximately 30 fps and is at least 4× faster than prior tracking algorithms on standard tracking datasets.
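
The bounding-box matching step of tracking-by-detection can be sketched as a greedy intersection-over-union (IoU) assignment between predicted track boxes and current detections. Assumptions: the predicted boxes are already available (RoadTrack obtains them from its SimCAI motion model, which is not shown), appearance cues are ignored, and the threshold `min_iou = 0.3` is a hypothetical value. Greedy matching is a simple stand-in, not necessarily RoadTrack's assignment rule.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), with x2 > x1 and y2 > y1

def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def match_tracks(predicted: Dict[int, Box], detections: List[Box],
                 min_iou: float = 0.3) -> Dict[int, int]:
    """Greedily assign each track's predicted box to at most one detection, best IoU first."""
    candidates = sorted(
        ((iou(box, det), track_id, d)
         for track_id, box in predicted.items()
         for d, det in enumerate(detections)),
        reverse=True,
    )
    assignment: Dict[int, int] = {}
    used = set()
    for score, track_id, d in candidates:
        if score < min_iou:
            break  # all remaining pairs overlap too little
        if track_id in assignment or d in used:
            continue
        assignment[track_id] = d
        used.add(d)
    return assignment
```

Detections left unassigned would typically start new tracks, and tracks left unassigned would be kept alive on their predicted boxes for a few frames. A better motion model makes the predicted boxes overlap their true detections more, which is where a model such as SimCAI matters in dense traffic.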

AAAI Conference 2020 Conference Paper

STEP: Spatial Temporal Graph Convolutional Networks for Emotion Perception from Gaits

Uttaran Bhattacharya
Trisha Mittal
Rohan Chandra
Tanmay Randhavane
Aniket Bera
Dinesh Manocha

We present a novel classifier network called STEP to classify perceived human emotion from gaits, based on a Spatial Temporal Graph Convolutional Network (ST-GCN) architecture. Given an RGB video of an individual walking, our formulation implicitly exploits the gait features to classify the perceived emotion of the human into one of four emotions: happy, sad, angry, or neutral. We train STEP on annotated real-world gait videos, augmented with annotated synthetic gaits generated using a novel generative network called STEP-Gen, built on an ST-GCN-based Conditional Variational Autoencoder (CVAE). We incorporate a novel push-pull regularization loss in the CVAE formulation of STEP-Gen to generate realistic gaits and improve the classification accuracy of STEP. We also release a novel dataset (E-Gait), which consists of 4,227 human gaits annotated with perceived emotions along with thousands of synthetic gaits. In practice, STEP can learn the affective features and achieves a classification accuracy of 88% on E-Gait, which is 14–30% more accurate than prior methods.
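
The spatial half of a graph convolution over a skeleton, the building block behind ST-GCN-style networks, can be sketched in plain Python. Assumptions: joints are graph nodes linked by bones, the adjacency is symmetrically normalized with self-loops, and the weight matrix is a hypothetical input. Real ST-GCN layers also include temporal convolution, adjacency partitioning, and nonlinearities, none of which are shown.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]

def matmul(p: Matrix, q: Matrix) -> Matrix:
    """Plain-Python matrix product p @ q."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q))) for j in range(len(q[0]))]
            for i in range(len(p))]

def normalized_adjacency(num_joints: int, bones: Sequence[Tuple[int, int]]) -> Matrix:
    """Symmetrically normalized skeleton adjacency D^-1/2 (A + I) D^-1/2."""
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)] for i in range(num_joints)]
    for u, v in bones:
        a[u][v] = a[v][u] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_joints)]
            for i in range(num_joints)]

def spatial_graph_conv(frames: Sequence[Matrix], a_hat: Matrix, weight: Matrix) -> List[Matrix]:
    """Apply X_t' = A_hat X_t W to each frame's joint-feature matrix X_t (joints x channels)."""
    return [matmul(matmul(a_hat, x), weight) for x in frames]
```

Each output joint feature mixes the joint's own features with those of its bone-connected neighbors. Stacking such layers, with temporal convolution in between, lets the network pick up coordinated limb movements across a gait cycle.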

IROS Conference 2019 Conference Paper

DensePeds: Pedestrian Tracking in Dense Crowds Using Front-RVO and Sparse Features

Rohan Chandra
Uttaran Bhattacharya
Aniket Bera
Dinesh Manocha

We present a pedestrian tracking algorithm, DensePeds, that tracks individuals in highly dense crowds (>2 pedestrians per square meter). Our approach is designed for videos captured from front-facing or elevated cameras. We present a new motion model called Front-RVO (FRVO) for predicting pedestrian movements in dense situations using collision avoidance constraints and combine it with state-of-the-art Mask R-CNN to compute sparse feature vectors that reduce the loss of pedestrian tracks (false negatives). We evaluate DensePeds on the standard MOT benchmarks as well as a new dense crowd dataset. In practice, our approach is 4.5× faster than prior tracking algorithms on the MOT benchmark, and it outperforms the state of the art on dense crowd videos by over 2.6% on average, on an absolute scale.
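
The density criterion quoted above (>2 pedestrians per square meter) is simple arithmetic. The following minimal sketch assumes the pedestrian count and the ground-plane size of a rectangular region are already known, for example from a calibrated camera. The function names are hypothetical and are not part of DensePeds.

```python
def crowd_density(num_pedestrians: int, width_m: float, depth_m: float) -> float:
    """Pedestrians per square meter over a rectangular ground region."""
    area = width_m * depth_m
    if area <= 0:
        raise ValueError("region area must be positive")
    return num_pedestrians / area

def is_highly_dense(num_pedestrians: int, width_m: float, depth_m: float,
                    threshold: float = 2.0) -> bool:
    """True when density strictly exceeds the threshold (default: 2 pedestrians per square meter)."""
    return crowd_density(num_pedestrians, width_m, depth_m) > threshold
```

For example, 10 pedestrians in a 2 m × 2 m region give 2.5 pedestrians per square meter, which qualifies as highly dense. Eight pedestrians in the same region give exactly 2.0, which does not.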