Author name cluster

Pushpak Bhattacharyya

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

18 papers
2 author rows

Possible papers

18

ECAI Conference 2025 Conference Paper

Improving Text Style Transfer Using Masked Diffusion Language Models with Inference-Time Scaling

  • Tejomay Kishor Padole
  • Suyash P. Awate
  • Pushpak Bhattacharyya

Masked diffusion language models (MDMs) have recently gained traction as a viable generative framework for natural language. This can be attributed to their scalability and ease of training compared to other diffusion model paradigms for discrete data, which has established them as the state-of-the-art non-autoregressive generators for discrete data. Diffusion models, in general, have shown an excellent ability to improve generation quality by leveraging inference-time scaling, either by increasing the number of denoising steps or by using external verifiers on top of the outputs of each step to guide the generation. In this work, we propose a verifier-based inference-time scaling method that aids in finding a better candidate generation during the denoising process of the MDM. Our experiments demonstrate the application of MDMs for standard text-style transfer tasks and establish MDMs as a better alternative to autoregressive language models. Additionally, we show that a simple soft-value-based verifier setup for MDMs using off-the-shelf pre-trained embedding models leads to significant gains in generation quality even when used on top of typical classifier-free guidance setups in the existing literature.

NeurIPS Conference 2024 Conference Paper

Beyond Aesthetics: Cultural Competence in Text-to-Image Models

  • Nithish Kannen
  • Arif Ahmad
  • Marco Andreetto
  • Vinodkumar Prabhakaran
  • Utsav Prabhu
  • Adji B. Dieng
  • Pushpak Bhattacharyya
  • Shachi Dave

Text-to-Image (T2I) models are being increasingly adopted in diverse global communities where they create visual representations of their unique cultures. Current T2I benchmarks primarily focus on faithfulness, aesthetics, and realism of generated images, overlooking the critical dimension of cultural competence. In this work, we introduce a framework to evaluate cultural competence of T2I models along two crucial dimensions: cultural awareness and cultural diversity, and present a scalable approach using a combination of structured knowledge bases and large language models to build a large dataset of cultural artifacts to enable this evaluation. In particular, we apply this approach to build CUBE (CUltural BEnchmark for Text-to-Image models), a first-of-its-kind benchmark to evaluate cultural competence of T2I models. CUBE covers cultural artifacts associated with 8 countries across different geo-cultural regions and along 3 concepts: cuisine, landmarks, and art. CUBE consists of 1) CUBE-1K, a set of high-quality prompts that enable the evaluation of cultural awareness, and 2) CUBE-CSpace, a larger dataset of cultural artifacts that serves as grounding to evaluate cultural diversity. We also introduce cultural diversity as a novel T2I evaluation component, leveraging quality-weighted Vendi score. Our evaluations reveal significant gaps in the cultural awareness of existing models across countries and provide valuable insights into the cultural diversity of T2I outputs for underspecified prompts. Our methodology is extendable to other cultural regions and concepts and can facilitate the development of T2I models that better cater to the global population.

AAAI Conference 2024 Conference Paper

IndicCONAN: A Multilingual Dataset for Combating Hate Speech in Indian Context

  • Nihar Ranjan Sahoo
  • Gyana Prakash Beria
  • Pushpak Bhattacharyya

Hate speech (HS) is a growing concern in many parts of the world, including India, where it has led to numerous instances of violence and discrimination. The development of effective counter-narratives (CNs) is a critical step in combating hate speech, but there is a lack of research in this area, especially in non-English languages. In this paper, we introduce a new dataset, IndicCONAN, of counter-narratives against hate speech in Hindi and Indian English. We propose a scalable human-in-the-loop approach for generating counter-narratives with an auto-regressive language model through a machine-generation, human-correction cycle, where the model uses augmented data from previous cycles to generate new training samples. These newly generated samples are then reviewed and edited by annotators, leading to further model refinement. The dataset consists of over 2,500 examples of counter-narratives each in both English and Hindi, corresponding to various hate speeches in the Indian context. We also present a framework for generating CNs conditioned on a specific CN type with a mean perplexity of 3.85 for English and 3.70 for Hindi, a mean toxicity score of 0.04 for English and 0.06 for Hindi, and a mean diversity of 0.08 for English and 0.14 for Hindi. Our dataset and framework provide valuable resources for researchers and practitioners working to combat hate speech in the Indian context.

AAAI Conference 2024 Conference Paper

Well, Now We Know! Unveiling Sarcasm: Initiating and Exploring Multimodal Conversations with Reasoning

  • Gopendra Vikram Singh
  • Mauajama Firdaus
  • Dushyant Singh Chauhan
  • Asif Ekbal
  • Pushpak Bhattacharyya

Sarcasm is a widespread linguistic phenomenon that poses a considerable challenge to explain due to its subjective nature, absence of contextual cues, and rooted personal perspectives. Even though the identification of sarcasm has been extensively studied in dialogue analysis, merely detecting sarcasm falls short of enabling conversational systems to genuinely comprehend the underlying meaning of a conversation and generate fitting responses. It is imperative to not only detect sarcasm but also pinpoint its origination and the rationale behind the sarcastic expressions to capture its authentic essence. In this paper, we delve into the discourse structure of conversations infused with sarcasm and introduce a novel task - Sarcasm Initiation and Reasoning in Conversations (SIRC). Embedded in a multimodal environment and involving a combination of both English and code-mixed interactions, the objective of the task is to discern the trigger or starting point of sarcasm. Additionally, the task involves producing a natural language explanation that rationalizes the satirical dialogues. To this end, we introduce Sarcasm Initiation and Reasoning Dataset (SIRD) to facilitate our task and provide sarcasm initiation annotations and reasoning. We develop a comprehensive model named Sarcasm Initiation and Reasoning Generation (SIRG), which is designed to encompass textual, audio, and visual representations. To achieve this, we introduce a unique shared fusion method that employs cross-attention mechanisms to seamlessly integrate these diverse modalities. Our experimental outcomes, conducted on the SIRC dataset, demonstrate that our proposed framework establishes a new benchmark for both sarcasm initiation and its reasoning generation in the context of multimodal conversations. The code and dataset can be accessed from https://www.iitp.ac.in/~ai-nlp-ml/resources.html#sarcasm-explain and https://github.com/GussailRaat/SIRG-Sarcasm-Initiation-and-Reasoning-Generation.

ECAI Conference 2023 Conference Paper

Local context is not enough! Towards Query Semantic and Knowledge Guided Multi-Span Medical Question Answering

  • Abhisek Tiwari
  • Aman Bhansali
  • Sriparna Saha
  • Pushpak Bhattacharyya
  • Preeti Verma
  • Minakshi Dhar

Medical Question Answering (MedQA) is one of the most popular and significant tasks in developing healthcare assistants. When humans extract an answer to a question from a document, they first (a) understand the question itself in detail and (b) utilize relevant knowledge/experiences to determine the answer segments. In multi-span question answering, it becomes increasingly important to comprehend the query accurately and possess relevant knowledge, as the interrelationship among different answer segments is essential for achieving completeness. Motivated by this, we first propose a transformer-based query semantic and knowledge (QueSemKnow) guided multi-span question-answering model. The proposed QueSemKnow works in a two-phased manner; in the first stage, a multi-task model is proposed to extract query semantics: (i) intent identification and (ii) question type prediction. In the second stage, QueSemKnow selects a relevant subset of the knowledge graph as the underlying context/document and extracts answers depending on the semantic information extracted from the first stage and context. We build a multi-task query semantic extraction model for query intent and query type identification to investigate the correlation among these tasks. Furthermore, we created a semantically aware medical question-answering corpus named QueSeMSpan MedQA wherein each question is annotated with its corresponding semantic information. The proposed model outperforms several baselines and existing state-of-the-art models by a large margin on multiple datasets, which firmly demonstrates the effectiveness of the human-inspired multi-span question-answering methodology.

IJCAI Conference 2022 Conference Paper

Am I No Good? Towards Detecting Perceived Burdensomeness and Thwarted Belongingness from Suicide Notes

  • Soumitra Ghosh
  • Asif Ekbal
  • Pushpak Bhattacharyya

The World Health Organization (WHO) has emphasized the importance of significantly accelerating suicide prevention efforts to fulfill the United Nations' Sustainable Development Goal (SDG) objective of 2030. In this paper, we present an end-to-end multitask system to address a novel task of detection of two interpersonal risk factors of suicide, Perceived Burdensomeness (PB) and Thwarted Belongingness (TB) from suicide notes. We also introduce a manually translated code-mixed suicide notes corpus, CoMCEASE-v2.0, based on the benchmark CEASE-v2.0 dataset, annotated with temporal orientation, PB and TB labels. We exploit the temporal orientation and emotion information in the suicide notes to boost overall performance. For comprehensive evaluation of our proposed method, we compare it to several state-of-the-art approaches on the existing CEASE-v2.0 dataset and the newly announced CoMCEASE-v2.0 dataset. Empirical evaluation suggests that temporal and emotional information can substantially improve the detection of PB and TB.

AAAI Conference 2021 Conference Paper

More the Merrier: Towards Multi-Emotion and Intensity Controllable Response Generation

  • Mauajama Firdaus
  • Hardik Chauhan
  • Asif Ekbal
  • Pushpak Bhattacharyya

The focus on conversational systems has recently shifted towards creating engaging agents by embedding emotions into them. Human emotions are highly complex as humans can express multiple emotions with varying intensity in a single utterance, whereas the conversational agents convey only one emotion in their responses. To infuse human-like behaviour in the agents, we introduce the task of multi-emotion controllable response generation with the ability to express different emotions with varying levels of intensity in an open-domain dialogue system. We introduce a Multiple Emotion Intensity aware Multi-party Dialogue (MEIMD) dataset having 34k conversations taken from 8 different TV Series. We propose a Multiple Emotion with Intensity-based Dialogue Generation (MEI-DG) framework. The system employs two novel mechanisms: (i) Explicit Memory: to determine whether to generate an emotion or generic word, while focusing on the intensity of the desired emotions; and (ii) Implicit Memory: to compute the number of words remaining to express the emotion completely, thereby regulating the generation accordingly. The detailed evaluation shows that our proposed approach attains superior performance compared to the baseline models.

IS Journal 2019 Journal Article

Figure Summarization: A Multiobjective Optimization-Based Approach

  • Naveen Saini
  • Sriparna Saha
  • Vedavikas Potnuru
  • Rahul Grover
  • Pushpak Bhattacharyya

In the biomedical domain, figures in scientific articles contribute significantly to understanding the core concepts. However, these figures are difficult for humans as well as machines to interpret and, thus, associated texts in the article are required to summarize the figures. This article proposes an unsupervised automatic summarization system for individual figures present in a scientific biomedical article, where different quality measures capturing relevance of the sentences to the figure are simultaneously optimized using the search capability of a multiobjective optimization technique to obtain a good set of sentences in the summary. A newly designed self-organizing map based genetic operator helping in new solution generation is also introduced in the multiobjective optimization framework. For evaluation of the proposed technique, 94 and 81 figures over two datasets from the biomedical literature are used. Our proposed system, namely MOOFigSum, obtains 5% and 11% improvements in terms of F1-measure metric over the unsupervised technique for both datasets, respectively, while in comparison to supervised techniques, MOOFigSum obtains 9% and 2% improvements over these datasets, respectively.

AAAI Conference 2017 System Paper

Sarcasm Suite: A Browser-Based Engine for Sarcasm Detection and Generation

  • Aditya Joshi
  • Diptesh Kanojia
  • Pushpak Bhattacharyya
  • Mark Carman

Sarcasm Suite is a browser-based engine that deploys five of our past papers in sarcasm detection and generation. The sarcasm detection modules use four kinds of incongruity: sentiment incongruity, semantic incongruity, historical context incongruity and conversational context incongruity. The sarcasm generation module is a chatbot that responds sarcastically to user input. With a visually appealing interface that indicates predictions using ‘faces’ of our co-authors from our past papers, Sarcasm Suite is our first demonstration of our work in computational sarcasm.

AAAI Conference 2017 Conference Paper

Scanpath Complexity: Modeling Reading Effort Using Gaze Information

  • Abhijit Mishra
  • Diptesh Kanojia
  • Seema Nagar
  • Kuntal Dey
  • Pushpak Bhattacharyya

Measuring reading effort is useful for practical purposes such as designing learning material and personalizing text comprehension environments. We propose a quantification of reading effort by measuring the complexity of eye-movement patterns of readers. We call the measure Scanpath Complexity. Scanpath complexity is modeled as a function of various properties of gaze fixations and saccades, the basic parameters of eye movement behavior. We demonstrate the effectiveness of our scanpath complexity measure by showing that its correlation with different measures of lexical and syntactic complexity as well as standard readability metrics is better than popular baseline measures based on fixation alone.

AAAI Conference 2016 Conference Paper

Predicting Readers’ Sarcasm Understandability by Modeling Gaze Behavior

  • Abhijit Mishra
  • Diptesh Kanojia
  • Pushpak Bhattacharyya

Sarcasm understandability, or the ability to understand textual sarcasm, depends upon readers’ language proficiency, social knowledge, mental state and attentiveness. We introduce a novel method to predict the sarcasm understandability of a reader. Presence of incongruity in textual sarcasm often elicits distinctive eye-movement behavior by human readers. By recording and analyzing the eye-gaze data, we show that eye-movement patterns vary when sarcasm is understood vis-à-vis when it is not. Motivated by our observations, we propose a system for sarcasm understandability prediction using supervised machine learning. Our system relies on readers’ eye-movement parameters and a few textual features, and is thus able to predict sarcasm understandability with an F-score of 93%, which demonstrates its efficacy. The availability of inexpensive embedded eye-trackers on mobile devices creates avenues for applying such research, which benefits web-content creators, review writers and social media analysts alike.

AAAI Conference 2016 Conference Paper

WWDS APIs: Application Programming Interfaces for Efficient Manipulation of World WordNet Database Structure

  • Hanumant Redkar
  • Sudha Bhingardive
  • Kevin Patel
  • Pushpak Bhattacharyya
  • Neha Prabhugaonkar
  • Apurva Nagvenkar
  • Ramdas Karmali

WordNets are useful resources for natural language processing. Various WordNets for different languages have been developed by different groups. Recently, the World WordNet Database Structure (WWDS) was proposed by Redkar et al. (2015) as a common platform for storing different WordNets. However, it is underutilized due to the lack of a programming interface. In this paper, we present WWDS APIs, which are designed to address this shortcoming. WWDS APIs, in conjunction with WWDS, act as a [...] that enables developers to utilize WordNets without worrying about the underlying storage structure. The APIs are developed in PHP, Java, and Python, as these are the preferred programming languages of most developers and researchers working in language technologies. These APIs can help in various applications like machine translation, word sense disambiguation, multilingual information retrieval, etc.

AAAI Conference 2015 Conference Paper

Unsupervised Word Sense Disambiguation Using Markov Random Field and Dependency Parser

  • Devendra Chaplot
  • Pushpak Bhattacharyya
  • Ashwin Paranjape

Word Sense Disambiguation (WSD) is a difficult problem to solve in the unsupervised setting. This is because, in this setting, inference becomes more dependent on the interplay between different senses in the context due to the unavailability of learning resources. Using two basic ideas, sense dependency and selective dependency, we model the WSD problem as a Maximum A Posteriori (MAP) Inference Query on a Markov Random Field (MRF) built using WordNet and Link Parser or Stanford Parser. To the best of our knowledge, this combination of dependency and MRF is novel, and our graph-based unsupervised WSD system beats the state-of-the-art system on the SensEval-2, SensEval-3 and SemEval-2007 English all-words datasets while being over 35 times faster.
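As a rough illustration of what MAP inference over a sense graph means, the sketch below brute-forces the best joint sense assignment on a toy graph. It is not the paper's inference procedure: the words, sense labels, dependency edges and compatibility scores are all invented for the example, and the scores are treated as log-potentials so that maximizing their sum corresponds to maximizing the product of edge potentials.

```python
from itertools import product


def map_sense_assignment(candidates, edges, compatibility):
    """Return the joint sense assignment with the highest total edge score.

    candidates:    word -> list of candidate sense labels (hypothetical labels)
    edges:         (word, word) pairs, standing in for dependency links
    compatibility: frozenset({sense_a, sense_b}) -> log-potential (assumed values)
    """
    words = list(candidates)
    best_assignment, best_score = None, float("-inf")
    # Exhaustive search: exponential in the number of words, toy inputs only.
    for senses in product(*(candidates[w] for w in words)):
        assignment = dict(zip(words, senses))
        score = sum(
            compatibility.get(frozenset((assignment[u], assignment[v])), 0.0)
            for u, v in edges
        )
        if score > best_score:
            best_assignment, best_score = assignment, score
    return best_assignment, best_score


# Toy data, invented for illustration.
candidates = {
    "bank": ["bank.finance", "bank.river"],
    "deposit": ["deposit.money", "deposit.sediment"],
    "money": ["money.currency"],
}
edges = [("deposit", "bank"), ("deposit", "money")]
compatibility = {
    frozenset(("bank.finance", "deposit.money")): 0.9,
    frozenset(("bank.river", "deposit.sediment")): 0.8,
    frozenset(("deposit.money", "money.currency")): 0.9,
    frozenset(("deposit.sediment", "money.currency")): 0.1,
}

best, score = map_sense_assignment(candidates, edges, compatibility)
print(best, score)
```

Restricting the edges to dependency-linked word pairs is what keeps the graph sparse; a practical system would replace the exhaustive loop with an approximate or structured inference method.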

AAAI Conference 2015 Conference Paper

World WordNet Database Structure: An Efficient Schema for Storing Information of WordNets of the World

  • Hanumant Redkar
  • Sudha Bhingardive
  • Diptesh Kanojia
  • Pushpak Bhattacharyya

WordNet is an online lexical resource which expresses unique concepts in a language. English WordNet, developed at Princeton University, was the first WordNet. Over a period of time, many language WordNets were developed by various organizations all over the world. It has always been a challenge to store the WordNet data. Some WordNets are stored using a file system and some are stored using different database models. In this paper, we present the World WordNet Database Structure which can be used to efficiently store the WordNet information of all languages of the World. This design can be adapted by most language WordNets to store information such as synset data, semantic and lexical relations, ontology details, language specific features, linguistic information, etc. An attempt is made to develop Application Programming Interfaces to manipulate the data from these databases. This database structure can help in various Natural Language Processing applications like Multilingual Information Retrieval, Word Sense Disambiguation, Machine Translation, etc.

AAAI Conference 2010 Conference Paper

PR + RQ ≈ PQ: Transliteration Mining Using Bridge Language

  • Mitesh Khapra
  • Raghavendra Udupa
  • A. Kumaran
  • Pushpak Bhattacharyya

We address the problem of mining name transliterations from comparable corpora in languages P and Q in the following resource-poor scenario:

  • Parallel names in PQ are not available for training.
  • Parallel names in PR and RQ are available for training.

We propose a novel solution for the problem by computing a common geometric feature space for P, Q and R where name transliterations are mapped to similar vectors. We employ Canonical Correlation Analysis (CCA) to compute the common geometric feature space using only parallel names in PR and RQ and without requiring parallel names in PQ. We test our algorithm on data sets in several languages and show that it gives results comparable to the state-of-the-art transliteration mining algorithms that use parallel names in PQ for training.

IJCAI Conference 2009 Conference Paper

  • Kamaljeet S. Verma
  • Pushpak Bhattacharyya

We propose a novel approach to context sensitive semantic smoothing by making use of an intermediate, “semantically light” representation for sentences, called Semantically Relatable Sequences (SRS). SRSs of a sentence are tuples of words appearing in the semantic graph of the sentence as linked nodes depicting dependency relations. In contrast to patterns based on consecutive words, SRSs make use of groupings of non-consecutive but semantically related words. Our experiments on the TREC AP89 collection show that the mixture model of the SRS translation model and the Two Stage Language Model (TSLM) of Lafferty and Zhai achieves MAP scores better than the mixture model of the MultiWord Expression (MWE) translation model and TSLM. Furthermore, a system which, for each test query, selects either the SRS or the MWE mixture model based on the better query MAP score shows significant improvements over the individual mixture models.

IJCAI Conference 2007 Conference Paper

  • Srinivas Medimi
  • Pushpak Bhattacharyya

In this paper we revisit the classical NLP problem of prepositional phrase attachment (PP-attachment). Given the pattern V−NP1−P−NP2 in the text, where V is the verb, NP1 is a noun phrase, P is the preposition and NP2 is the other noun phrase, the question asked is where P−NP2 attaches: V or NP1? This question is typically answered using both word and world knowledge. Word Sense Disambiguation (WSD) and Data Sparsity Reduction (DSR) are the two requirements for PP-attachment resolution. Our approach described in this paper makes use of training data extracted from raw text, which makes it an unsupervised approach. The unambiguous V−P−N and N1−P−N2 tuples of the training corpus TEACH the system how to resolve the attachments in the ambiguous V−N1−P−N2 tuples of the test corpus. A graph-based approach to word sense disambiguation (WSD) is used to obtain accurate word knowledge. Further, the data sparsity problem is addressed by (i) detecting synonymy using WordNet and (ii) doing a form of inferencing based on the matching of Vs and Ns in the unambiguous patterns V−P−NP and NP1−P−NP2. For experimentation, the Brown Corpus provides the training data and the Wall Street Journal Corpus the test data. The accuracy obtained for PP-attachment resolution is close to 85%. The novelty of the system lies in the flexible use of WSD and DSR phases.
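The idea that unambiguous tuples can teach a system how to resolve ambiguous ones can be illustrated with a deliberately simplified count-based sketch. It omits the WSD and DSR phases that the abstract describes, so it is not the authors' system: the function names, the tie-breaking rule (ties go to the verb) and the toy tuples are all assumptions made for the example.

```python
from collections import Counter


def train(verb_tuples, noun_tuples):
    """Count (head, preposition) pairs seen in unambiguous tuples.

    verb_tuples: (V, P, N) triples where the PP can only attach to the verb
    noun_tuples: (N1, P, N2) triples where the PP can only attach to the noun
    """
    verb_counts = Counter((v, p) for v, p, _ in verb_tuples)
    noun_counts = Counter((n1, p) for n1, p, _ in noun_tuples)
    return verb_counts, noun_counts


def attach(v, n1, p, n2, verb_counts, noun_counts):
    """Choose the attachment site for P-N2 in an ambiguous V-N1-P-N2 tuple.

    n2 is accepted for completeness but ignored by this simplified sketch.
    Ties are resolved in favour of the verb, an arbitrary assumption.
    """
    verb_score = verb_counts[(v, p)]    # Counter returns 0 for unseen pairs
    noun_score = noun_counts[(n1, p)]
    return "V" if verb_score >= noun_score else "NP1"


# Toy training tuples, invented for illustration.
verb_tuples = [("eat", "with", "fork"), ("eat", "with", "spoon")]
noun_tuples = [("pizza", "with", "cheese"), ("man", "in", "park")]
verb_counts, noun_counts = train(verb_tuples, noun_tuples)

first = attach("eat", "pizza", "with", "fork", verb_counts, noun_counts)
second = attach("buy", "pizza", "with", "anchovies", verb_counts, noun_counts)
print(first, second)
```

Raw counts like these are sparse for unseen verbs and nouns, which is the gap the abstract's synonymy detection and inferencing steps are meant to address.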