Author name cluster

Pratyush Kumar

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

8 papers
2 author rows

Possible papers

AAAI Conference 2023 Conference Paper

IndicSUPERB: A Speech Processing Universal Performance Benchmark for Indian Languages

  • Tahir Javed
  • Kaushal Bhogale
  • Abhigyan Raman
  • Pratyush Kumar
  • Anoop Kunchukuttan
  • Mitesh M. Khapra

A cornerstone in AI research has been the creation and adoption of standardized training and test datasets to earmark the progress of state-of-the-art models. A particularly successful example is the GLUE dataset for training and evaluating Natural Language Understanding (NLU) models for English. The large body of research around self-supervised BERT-based language models revolved around performance improvements on NLU tasks in GLUE. To evaluate language models in other languages, several language-specific GLUE datasets were created. The area of speech language understanding (SLU) has followed a similar trajectory. The success of large self-supervised models such as wav2vec2 enables the creation of speech models with relatively easy-to-access unlabelled data. These models can then be evaluated on SLU tasks, such as the SUPERB benchmark. In this work, we extend this to Indic languages by releasing the IndicSUPERB benchmark. Specifically, we make the following three contributions. (i) We collect Kathbath containing 1,684 hours of labelled speech data across 12 Indian languages from 1,218 contributors located in 203 districts in India. (ii) Using Kathbath, we create benchmarks across 6 speech tasks: Automatic Speech Recognition, Speaker Verification, Speaker Identification (mono/multi), Language Identification, Query By Example, and Keyword Spotting for 12 languages. (iii) On the released benchmarks, we train and evaluate different self-supervised models alongside a commonly used baseline, FBANK. We show that language-specific fine-tuned models are more accurate than the baseline on most of the tasks, including a large gap of 76% for the Language Identification task. However, for speaker identification, self-supervised models trained on large datasets demonstrate an advantage. We hope IndicSUPERB contributes to the progress of developing speech language understanding models for Indian languages.

TMLR Journal 2023 Journal Article

IndicTrans2: Towards High-Quality and Accessible Machine Translation Models for all 22 Scheduled Indian Languages

  • Jay Gala
  • Pranjal A Chitale
  • A K Raghavan
  • Varun Gumma
  • Sumanth Doddapaneni
  • Aswanth Kumar M
  • Janki Atul Nawale
  • Anupama Sujatha

India has a rich linguistic landscape, with languages from 4 major language families spoken by over a billion people. 22 of these languages listed in the Constitution of India (referred to as scheduled languages) are the focus of this work. Given the linguistic diversity, high-quality and accessible Machine Translation (MT) systems are essential in a country like India. Before this work, there was (i) no parallel training data spanning all 22 languages, (ii) no robust benchmarks covering all these languages and containing content relevant to India, and (iii) no existing translation models that support all 22 scheduled languages of India. In this work, we aim to address this gap by focusing on the missing pieces required for enabling wide, easy, and open access to good machine translation systems for all 22 scheduled Indian languages. We identify four key areas of improvement: curating and creating larger training datasets, creating diverse and high-quality benchmarks, training multilingual models, and releasing models with open access. Our first contribution is the release of the Bharat Parallel Corpus Collection (BPCC), the largest publicly available parallel corpora for Indic languages. BPCC contains a total of 230M bitext pairs, of which a total of 126M were newly added, including 644K manually translated sentence pairs created as part of this work. Our second contribution is the release of the first $n$-way parallel benchmark covering all 22 Indian languages, featuring diverse domains, Indian-origin content, and conversational test sets. Next, we present IndicTrans2, the first translation model to support all 22 languages, surpassing existing models in performance on multiple existing and new benchmarks created as a part of this work. Lastly, to promote accessibility and collaboration, we release our models and associated data with permissive licenses at https://github.com/AI4Bharat/IndicTrans2.

NeurIPS Conference 2022 Conference Paper

Addressing Resource Scarcity across Sign Languages with Multilingual Pretraining and Unified-Vocabulary Datasets

  • Gokul NC
  • Manideep Ladi
  • Sumit Negi
  • Prem Selvaraj
  • Pratyush Kumar
  • Mitesh Khapra

There are over 300 sign languages in the world, many of which have very limited or no labelled sign-to-text datasets. To address low-resource data scenarios, self-supervised pretraining and multilingual finetuning have been shown to be effective in natural language and speech processing. In this work, we apply these ideas to sign language recognition. We make three contributions. First, we release SignCorpus, a large pretraining dataset on sign languages comprising about 4.6K hours of signing data across 10 sign languages. SignCorpus is curated from sign language videos on the internet, filtered for data quality, and converted into sequences of pose keypoints, thereby removing all personally identifiable information (PII). Second, we release Sign2Vec, a graph-based model with 5.2M parameters that is pretrained on SignCorpus. We envisage Sign2Vec as a multilingual large-scale pretrained model which can be fine-tuned for various sign recognition tasks across languages. Third, we create MultiSign-ISLR, a multilingual and label-aligned dataset of sequences of pose keypoints from 11 labelled datasets across 7 sign languages, and MultiSign-FS, a new finger-spelling training and test set across 7 languages. On these datasets, we fine-tune Sign2Vec to create multilingual isolated sign recognition models. With experiments on multiple benchmarks, we show that pretraining and multilingual transfer are effective, giving significant gains over state-of-the-art results. All datasets, models, and code have been made open-source via the OpenHands toolkit.

AAAI Conference 2022 Conference Paper

Towards Building ASR Systems for the Next Billion Users

  • Tahir Javed
  • Sumanth Doddapaneni
  • Abhigyan Raman
  • Kaushal Santosh Bhogale
  • Gowtham Ramesh
  • Anoop Kunchukuttan
  • Pratyush Kumar
  • Mitesh M. Khapra

Recent methods in speech and language technology pretrain very large models which are fine-tuned for specific tasks. However, the benefits of such large models are often limited to a few resource-rich languages of the world. In this work, we make multiple contributions towards building ASR systems for low-resource languages from the Indian subcontinent. First, we curate 17,000 hours of raw speech data for 40 Indian languages from a wide variety of domains including education, news, technology, and finance. Second, using this raw speech data we pretrain several variants of wav2vec-style models for 40 Indian languages. Third, we analyze the pretrained models to find key features: codebook vectors of similar sounding phonemes are shared across languages, representations across layers are discriminative of the language family, and attention heads often pay attention within small local windows. Fourth, we fine-tune this model for downstream ASR for 9 languages and obtain state-of-the-art results on 3 public datasets, including on very low-resource languages such as Sinhala and Nepali. Our work establishes that multilingual pretraining is an effective strategy for building ASR systems for the linguistically diverse speakers of the Indian subcontinent.

AAAI Conference 2021 Conference Paper

A Systematic Evaluation of Object Detection Networks for Scientific Plots

  • Pritha Ganguly
  • Nitesh S Methani
  • Mitesh M. Khapra
  • Pratyush Kumar

Are existing object detection methods adequate for detecting text and visual elements in scientific plots, which are arguably different from the objects found in natural images? To answer this question, we train and compare the accuracy of Fast/Faster R-CNN, SSD, YOLO and RetinaNet on the PlotQA dataset with over 220,000 scientific plots. At the standard IOU setting of 0.5, most networks perform well with mAP scores greater than 80% in detecting the relatively simple objects in plots. However, the performance drops drastically when evaluated at a stricter IOU of 0.9, with the best model giving a mAP of 35.70%. Note that such a stricter evaluation is essential when dealing with scientific plots where even minor localisation errors can lead to large errors in downstream numerical inferences. Given this poor performance, we propose minor modifications to existing models by combining ideas from different object detection networks. While this significantly improves the performance, there are still two main issues: (i) performance on text objects, which are essential for reasoning, is very poor, and (ii) inference time is unacceptably large considering the simplicity of plots. To solve this open problem, we make a series of contributions: (a) an efficient region proposal method based on Laplacian edge detectors, (b) a feature representation of region proposals that includes neighbouring information, (c) a linking component to join multiple region proposals for detecting longer textual objects, and (d) a custom loss function that combines a smooth L1 loss with an IOU-based loss. Combining these ideas, our final model is very accurate at extreme IOU values, achieving a mAP of 93.44% at 0.9 IOU. Simultaneously, our model is very efficient, with an inference time 16x lower than that of the current models, including one-stage detectors. Our model also achieves a high accuracy on an extrinsic plot-to-table conversion task with an F1 score of 0.77. With these contributions, we make definitive progress in object detection for plots and enable further exploration on automated reasoning of plots.
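
The gap between the 0.5 and 0.9 IOU settings mentioned in this abstract can be illustrated with a minimal sketch. The box coordinates below are made-up values for a wide, short text-like region, not data from PlotQA, and the helper name iou is an assumption chosen for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (empty if the boxes do not overlap).
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical ground-truth box for an axis label and a prediction shifted
# by 3 pixels horizontally and 1 pixel vertically.
ground_truth = (0, 0, 100, 20)
prediction = (3, 1, 103, 21)

score = iou(ground_truth, prediction)
for threshold in (0.5, 0.9):
    verdict = "match" if score >= threshold else "miss"
    print(f"IOU={score:.3f} threshold={threshold}: {verdict}")
```

In this example the shifted box overlaps the ground truth with an IOU of roughly 0.85, so it counts as a correct detection at 0.5 but as a miss at 0.9, which is why thin text boxes are especially sensitive to the stricter setting.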

AAAI Conference 2021 Conference Paper

The Heads Hypothesis: A Unifying Statistical Approach Towards Understanding Multi-Headed Attention in BERT

  • Madhura Pande
  • Aakriti Budhraja
  • Preksha Nema
  • Pratyush Kumar
  • Mitesh M. Khapra

Multi-headed attention heads are a mainstay in transformer-based models. Different methods have been proposed to classify the role of each attention head based on the relations between tokens which have high pair-wise attention. These roles include syntactic (tokens with some syntactic relation), local (nearby tokens), block (tokens in the same sentence) and delimiter (the special [CLS], [SEP] tokens). There are two main challenges with existing methods for classification: (a) there are no standard scores across studies or across functional roles, and (b) these scores are often average quantities measured across sentences without capturing statistical significance. In this work, we formalize a simple yet effective score that generalizes to all the roles of attention heads and employs hypothesis testing on this score for robust inference. This provides us the right lens to systematically analyze attention heads and confidently comment on many commonly posed questions on analyzing the BERT model. In particular, we comment on the co-location of multiple functional roles in the same attention head, the distribution of attention heads across layers, and the effect of fine-tuning for specific NLP tasks on these functional roles. Code is made publicly available at https://github.com/iitmnlp/heads-hypothesis
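
To illustrate why an average score across sentences can mislead while a hypothesis test does not, here is a minimal sketch. It is not the score or the test defined in the paper: the one-sided exact sign test, the function name sign_test_p_value, the threshold, and the per-sentence scores are all assumptions chosen for illustration.

```python
from math import comb
from statistics import mean


def sign_test_p_value(scores, threshold):
    """One-sided exact sign test.

    Null hypothesis: a sentence's score exceeds `threshold` with
    probability at most 0.5. A small p-value is evidence that the head
    exceeds the threshold on most sentences.
    """
    # Scores equal to the threshold carry no sign information.
    informative = [s for s in scores if s != threshold]
    n = len(informative)
    k = sum(1 for s in informative if s > threshold)
    # P(X >= k) for X ~ Binomial(n, 0.5).
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


threshold = 0.25  # hypothetical cut-off for assigning a role to a head

# Hypothetical per-sentence scores for two heads. Both averages exceed the
# threshold, but head A does so only because of two outlier sentences.
head_a = [0.9, 0.9] + [0.1] * 8
head_b = [0.3] * 9 + [0.2]

for name, scores in (("A", head_a), ("B", head_b)):
    p = sign_test_p_value(scores, threshold)
    print(f"head {name}: mean={mean(scores):.2f} p-value={p:.4f}")
```

Under these made-up numbers, an average-based rule would assign the role to both heads, whereas the test supports it only for head B at a 0.05 significance level.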

AAAI Conference 2016 Conference Paper

An Axiomatic Framework for Ex-Ante Dynamic Pricing Mechanisms in Smart Grid

  • Sambaran Bandyopadhyay
  • Ramasuri Narayanam
  • Pratyush Kumar
  • Sarvapali Ramchurn
  • Vijay Arya
  • Iskandarbin Petra

In electricity markets, the choice of the right pricing regime is crucial for the utilities because the price they charge to their consumers, in anticipation of their demand in real-time, is a key determinant of their profits and ultimately their survival in competitive energy markets. Among the existing pricing regimes, in this paper, we consider ex-ante dynamic pricing schemes as (i) they help to address the peak demand problem (a crucial problem in smart grids), and (ii) they are transparent and fair to consumers as the cost of electricity can be calculated before the actual consumption. In particular, we propose an axiomatic framework that establishes the conceptual underpinnings of the class of ex-ante dynamic pricing schemes. We first propose five key axioms that reflect the criteria that are vital for energy utilities and their relationship with consumers. We then prove an impossibility theorem to show that there is no pricing regime that satisfies all five axioms simultaneously. We also study multiple cost functions arising from various pricing regimes to examine the subset of axioms that they satisfy. We believe that our proposed framework in this paper is the first of its kind to evaluate the class of ex-ante dynamic pricing schemes in a manner that can be operationalised by energy utilities.

ICAPS Conference 2016 Conference Paper

Planning Curtailment of Renewable Generation in Power Grids

  • Sambaran Bandyopadhyay
  • Pratyush Kumar
  • Vijay Arya

The increasing penetration of renewable sources like solar energy adds new dimensions to planning power grid operations. We study the problem of curtailing a subset of prosumers generating solar power with the twin goals of being close to a target collection and maintaining fairness across prosumers. The problem is complicated by the uncertainty in the amount of energy fed in by each prosumer and the large problem size in terms of number of prosumers. To meet these challenges, we propose an algorithm based on the Combinatorial Multi-Armed Bandit problem with an approximate Knapsack-based oracle. With real data on solar panel output across multiple prosumers, we are able to demonstrate the effectiveness of the proposed algorithm.