Arrow Research search

Author name cluster

Niket Tandon

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

9 papers
2 author rows

Possible papers


AAAI Conference 2025 Conference Paper

Calibrating Large Language Models with Sample Consistency

  • Qing Lyu
  • Kumar Shridhar
  • Chaitanya Malaviya
  • Li Zhang
  • Yanai Elazar
  • Niket Tandon
  • Marianna Apidianaki
  • Mrinmaya Sachan

Accurately gauging the confidence level of Large Language Models' (LLMs) predictions is pivotal for their reliable application. However, LLMs are often inherently uncalibrated and elude conventional calibration techniques due to their proprietary nature and massive scale. In this work, we derive model confidence from the distribution of multiple randomly sampled generations, using three measures of consistency. We extensively evaluate eleven open- and closed-source models on nine reasoning datasets. Results show that consistency-based calibration methods outperform existing post-hoc approaches in terms of calibration error. Meanwhile, we find that factors such as intermediate explanations, model scaling, and larger sample sizes enhance calibration, while instruction-tuning makes calibration more difficult. Moreover, confidence scores obtained from consistency can potentially enhance model performance. Finally, we offer guidance on choosing suitable consistency metrics for calibration, tailored to model characteristics such as exposure to instruction-tuning and RLHF.
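
The abstract does not name its three consistency measures. As a minimal sketch, assuming answers have already been extracted from each sampled generation and normalized to comparable strings, the function below scores the majority answer with three illustrative measures (agreement rate, normalized entropy, and the gap between the two most frequent answers) and computes a standard binned expected calibration error. The function names and the choice of measures are assumptions for illustration, not the paper's code.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def consistency_scores(answers: List[str]) -> Tuple[str, Dict[str, float]]:
    """Confidence in the majority answer, derived from sampled answers (illustrative)."""
    n = len(answers)
    ranked = Counter(answers).most_common()
    top_answer, top_count = ranked[0]
    # Agreement: fraction of samples that match the majority answer.
    agreement = top_count / n
    # Entropy of the empirical answer distribution, normalized by its maximum
    # possible value (log n) and inverted so that higher means more confident.
    entropy = -sum((c / n) * math.log(c / n) for _, c in ranked)
    max_entropy = math.log(n) if n > 1 else 1.0
    entropy_conf = 1.0 - entropy / max_entropy
    # Gap between the most and second-most frequent answers.
    second_count = ranked[1][1] if len(ranked) > 1 else 0
    gap = (top_count - second_count) / n
    return top_answer, {"agreement": agreement, "entropy": entropy_conf, "fsd": gap}


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """Binned ECE: weighted mean |accuracy - confidence| over equal-width bins."""
    n = len(confidences)
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        bins[min(int(c * n_bins), n_bins - 1)].append((c, ok))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        acc = sum(1 for _, ok in b if ok) / len(b)
        ece += len(b) / n * abs(acc - avg_conf)
    return ece
```

For example, `consistency_scores(["4", "4", "4", "5"])` selects "4" with agreement 0.75 and a top-two gap of 0.5.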

ICML Conference 2025 Conference Paper

MOGIC: Metadata-infused Oracle Guidance for Improved Extreme Classification

  • Suchith Chidananda Prabhu
  • Bhavyajeet Singh
  • Anshul Mittal
  • Siddarth Asokan
  • Shikhar Mohan
  • Deepak Saini
  • Yashoteja Prabhu
  • Lakshya Kumar

Retrieval-augmented classification and generation models benefit from early-stage fusion of high-quality text-based metadata, often called memory, but face high latency and noise sensitivity. In extreme classification (XC), where low latency is crucial, existing methods use late-stage fusion for efficiency and robustness. To enhance accuracy while maintaining low latency, we propose MOGIC, a novel approach to metadata-infused oracle guidance for XC. We train an early-fusion oracle classifier with access to both query-side and label-side ground-truth metadata in textual form and subsequently use it to guide existing memory-based XC disciple models via regularization. The MOGIC algorithm improves precision@1 and propensity-scored precision@1 of XC disciple models by 1-2% on six standard datasets, at no additional inference-time cost. We show that MOGIC can be used in a plug-and-play manner to enhance memory-free XC models such as NGAME or DEXA. Lastly, we demonstrate the robustness of the MOGIC algorithm to missing and noisy metadata. The code is publicly available at https://github.com/suchith720/mogic.
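
The abstract reports gains in precision@1 (P@1) and propensity-scored precision@1 (PSP@1). Below is a minimal sketch of how these metrics are commonly computed, assuming dense per-label scores, a 0/1 relevance matrix, and per-label propensities supplied by the caller. Normalizing PSP@1 by its best achievable value is a common convention; the paper's exact variant is not stated in the abstract.

```python
from typing import Sequence


def _top1(scores: Sequence[float]) -> int:
    # Index of the highest-scoring label for one query.
    return max(range(len(scores)), key=lambda j: scores[j])


def precision_at_1(scores: Sequence[Sequence[float]], labels: Sequence[Sequence[int]]) -> float:
    """Fraction of queries whose top-ranked label is relevant."""
    hits = sum(y[_top1(s)] for s, y in zip(scores, labels))
    return hits / len(scores)


def psp_at_1(
    scores: Sequence[Sequence[float]],
    labels: Sequence[Sequence[int]],
    propensities: Sequence[float],
) -> float:
    """Propensity-scored P@1: rare (low-propensity) labels earn more credit.

    Normalized by the best achievable value (assumed convention).
    """
    gain, ideal = 0.0, 0.0
    for s, y in zip(scores, labels):
        top = _top1(s)
        gain += y[top] / propensities[top]
        # Best possible credit: the relevant label with the smallest propensity.
        ideal += max((y[j] / propensities[j] for j in range(len(y))), default=0.0)
    return gain / ideal if ideal > 0 else 0.0
```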

ICML Conference 2024 Conference Paper

In-Context Principle Learning from Mistakes

  • Tianjun Zhang
  • Aman Madaan
  • Luyu Gao
  • Steven Zheng
  • Swaroop Mishra
  • Yiming Yang 0002
  • Niket Tandon
  • Uri Alon 0002

In-context learning (ICL, also known as few-shot prompting) has been the standard method of adapting LLMs to downstream tasks, by learning from a few input-output examples. Nonetheless, all ICL-based approaches only learn from correct input-output pairs. In this paper, we revisit this paradigm, by learning more from the few given input-output examples. We introduce Learning Principles (LEAP): First, we intentionally induce the model to make mistakes on these few examples; then we reflect on these mistakes, and learn explicit task-specific “principles” from them, which help solve similar problems and avoid common mistakes; finally, we prompt the model to answer unseen test questions using the original few-shot examples and these learned general principles. We evaluate LEAP on a wide range of benchmarks, including multi-hop question answering (HotpotQA), textual QA (DROP), Big-Bench Hard reasoning, and math problems (GSM8K and MATH); in all these benchmarks, LEAP improves the strongest available LLMs such as GPT-3.5-turbo, GPT-4, GPT-4-turbo and Claude-2.1. For example, LEAP improves over the standard few-shot prompting using GPT-4 by 7.5% in DROP, and by 3.3% in HotpotQA. Importantly, LEAP does not require any more input or examples than the standard few-shot prompting settings.
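
The three LEAP stages described above can be sketched as follows, assuming a hypothetical `llm(prompt) -> str` callable. The prompt wording, the `n_tries` parameter, and the de-duplication step are illustrative assumptions, not the authors' prompts.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (question, gold answer)


def learn_principles(
    llm: Callable[[str], str], examples: List[Example], n_tries: int = 3
) -> List[str]:
    principles: List[str] = []
    for question, gold in examples:
        for _ in range(n_tries):
            # Stage 1: let the model answer (with sampling, in practice) to surface mistakes.
            answer = llm(f"Q: {question}\nA:")
            if answer.strip() == gold.strip():
                continue  # in this sketch, only mistakes produce principles
            # Stage 2: reflect on the mistake and state a reusable principle.
            principle = llm(
                f"Question: {question}\nIncorrect answer: {answer}\n"
                f"Correct answer: {gold}\n"
                "State one general principle that avoids this mistake:"
            )
            principles.append(principle.strip())
    # Drop exact duplicates while preserving order (an assumption; the abstract
    # does not say how principles are consolidated).
    return list(dict.fromkeys(principles))


def build_prompt(examples: List[Example], principles: List[str], test_question: str) -> str:
    # Stage 3: the original few-shot examples plus the learned principles.
    rules = "\n".join(f"- {p}" for p in principles)
    shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return f"Principles:\n{rules}\n\n{shots}\n\nQ: {test_question}\nA:"
```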

NeurIPS Conference 2023 Conference Paper

Self-Refine: Iterative Refinement with Self-Feedback

  • Aman Madaan
  • Niket Tandon
  • Prakhar Gupta
  • Skyler Hallinan
  • Luyu Gao
  • Sarah Wiegreffe
  • Uri Alon
  • Nouha Dziri

Like humans, large language models (LLMs) do not always generate the best output on their first try. Motivated by how humans refine their written text, we introduce Self-Refine, an approach for improving initial outputs from LLMs through iterative feedback and refinement. The main idea is to generate an initial output using an LLM; then, the same LLM provides *feedback* for its output and uses it to *refine* itself, iteratively. Self-Refine does not require any supervised training data, additional training, or reinforcement learning, and instead uses a single LLM as the generator, refiner and the feedback provider. We evaluate Self-Refine across 7 diverse tasks, ranging from dialog response generation to mathematical reasoning, using state-of-the-art (GPT-3.5, ChatGPT, and GPT-4) LLMs. Across all evaluated tasks, outputs generated with Self-Refine are preferred by humans and automatic metrics over those generated with the same LLM using conventional one-step generation, improving by ~20% absolute on average in task performance. Our work demonstrates that even state-of-the-art LLMs like GPT-4 can be further improved at test-time using our simple, standalone approach.
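
A minimal sketch of the generate, feedback, refine loop, assuming a hypothetical `llm(prompt) -> str` callable. The prompt templates, the `STOP` token used as a stopping signal, and the iteration cap are assumptions; this simplified version passes only the latest output and feedback to each step rather than the full history.

```python
from typing import Callable


def self_refine(
    llm: Callable[[str], str],
    task: str,
    max_iters: int = 3,
    stop_token: str = "STOP",  # hypothetical signal emitted when no changes are needed
) -> str:
    # 1. Initial generation, by the same model that will later critique it.
    output = llm(f"Task: {task}\nAnswer:")
    for _ in range(max_iters):
        # 2. The model gives feedback on its own output.
        feedback = llm(
            f"Task: {task}\nAnswer: {output}\n"
            f"Give feedback, or reply {stop_token} if no changes are needed:"
        )
        if stop_token in feedback:
            break
        # 3. The model refines its output using that feedback.
        output = llm(
            f"Task: {task}\nAnswer: {output}\nFeedback: {feedback}\nImproved answer:"
        )
    return output
```

Capping iterations keeps cost bounded even if the feedback step never emits the stop signal.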

AAAI Conference 2020 Short Paper

I Am Guessing You Can’t Recognize This: Generating Adversarial Images for Object Detection Using Spatial Commonsense (Student Abstract)

  • Anurag Garg
  • Niket Tandon
  • Aparna S. Varde

Can we automatically predict failures of an object detection model on images from a target domain? We characterize errors of a state-of-the-art object detection model in the currently popular smart mobility domain, and find that a large number of errors can be identified using spatial commonsense. We propose CSK-SNIFFER, a system that automatically identifies a large number of such errors based on commonsense knowledge. Our system does not require any new annotations and can still find object detection errors with high accuracy (more than 80% when measured by humans). This work lays the foundation to answer exciting research questions on domain adaptation, including the ability to automatically create adversarial datasets for a target domain.
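
The abstract does not describe which spatial commonsense CSK-SNIFFER uses. As a hypothetical illustration only, the check below flags detections whose relative box sizes contradict a hand-written size prior (a bus is typically larger than a person). The `TYPICALLY_LARGER` pairs and the `ratio` threshold are invented for the example.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import List, Set, Tuple


@dataclass
class Detection:
    label: str
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


# Hypothetical commonsense prior: (typically larger object, typically smaller object).
TYPICALLY_LARGER: Set[Tuple[str, str]] = {("bus", "person"), ("truck", "bicycle"), ("car", "dog")}


def flag_size_violations(dets: List[Detection], ratio: float = 2.0) -> List[Tuple[str, str]]:
    """Flag pairs where the 'smaller' object's box is much larger than the 'larger' one's.

    Ignores perspective and depth, so it is only a crude proxy for spatial commonsense.
    """
    flagged = []
    for big, small in permutations(dets, 2):
        if (big.label, small.label) in TYPICALLY_LARGER and small.area > ratio * big.area:
            flagged.append((big.label, small.label))
    return flagged
```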

AAAI Conference 2016 Conference Paper

Commonsense in Parts: Mining Part-Whole Relations from the Web and Image Tags

  • Niket Tandon
  • Charles Hariman
  • Jacopo Urbani
  • Anna Rohrbach
  • Marcus Rohrbach
  • Gerhard Weikum

Commonsense knowledge about part-whole relations (e.g., screen partOf notebook) is important for interpreting user input in web search and question answering, or for object detection in images. Prior work on knowledge base construction has compiled part-whole assertions, but with substantial limitations: i) semantically different kinds of part-whole relations are conflated into a single generic relation, ii) the arguments of a part-whole assertion are merely words with ambiguous meaning, iii) the assertions lack additional attributes like visibility (e.g., a nose is visible but a kidney is not) and cardinality information (e.g., a bird has two legs while a spider has eight), iv) limited coverage of only tens of thousands of assertions. This paper presents a new method for automatically acquiring part-whole commonsense from Web contents and image tags at an unprecedented scale, yielding many millions of assertions, while specifically addressing the four shortcomings of prior work. Our method combines pattern-based information extraction methods with logical reasoning. We carefully distinguish different relations: physicalPartOf, memberOf, substanceOf. We consistently map the arguments of all assertions onto WordNet senses, eliminating the ambiguity of word-level assertions. We identify whether the parts can be visually perceived, and infer cardinalities for the assertions. The resulting commonsense knowledge base has very high quality and high coverage, with an accuracy of 89% determined by extensive sampling, and is publicly available.
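
One way to represent the assertions the abstract describes (a typed relation, sense-disambiguated arguments, visibility, and cardinality) is a small record type. This is a sketch; the field names, the helper functions, and placeholder sense identifiers such as `nose#n#1` are assumptions, not the format of the published knowledge base.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class PartWholeRelation(Enum):
    PHYSICAL_PART_OF = "physicalPartOf"
    MEMBER_OF = "memberOf"
    SUBSTANCE_OF = "substanceOf"


@dataclass(frozen=True)
class PartWholeAssertion:
    part: str   # sense id of the part (placeholder format, not checked against WordNet)
    whole: str  # sense id of the whole
    relation: PartWholeRelation
    visible: Optional[bool] = None     # can the part be visually perceived?
    cardinality: Optional[int] = None  # typical number of such parts per whole


def visible_parts(kb: List[PartWholeAssertion], whole: str) -> List[str]:
    """Parts of `whole` that are marked as visually perceivable."""
    return [a.part for a in kb if a.whole == whole and a.visible]


def cardinality(kb: List[PartWholeAssertion], part: str, whole: str) -> Optional[int]:
    """Typical count of `part` per `whole`, if the knowledge base records one."""
    for a in kb:
        if a.part == part and a.whole == whole:
            return a.cardinality
    return None
```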

AAAI Conference 2015 Conference Paper

Multimedia Data for the Visually Impaired

  • Niket Tandon
  • Shekhar Sharma
  • Tanima Makkad

The Web contains a large amount of information in the form of videos that remains inaccessible to visually impaired people. We identify a class of videos whose information content can be approximately encoded as audio, thereby increasing the number of accessible videos. We propose a model to automatically identify such videos. Our model jointly relies on the textual metadata and visual content of the video. We use this model to re-rank YouTube video search results based on the accessibility of each video. We present preliminary results from a user study with visually impaired people that measures the effectiveness of our system.
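
The abstract does not specify how textual and visual signals are combined. As a hedged sketch, assuming each video already has a text-based and a visual accessibility score in [0, 1] (both hypothetical inputs), a linear interpolation with weight `alpha` can re-rank results, breaking ties by the original search rank.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Video:
    video_id: str
    original_rank: int   # position in the original search results
    text_score: float    # hypothetical accessibility signal from title/description/tags
    visual_score: float  # hypothetical signal that little information is purely visual


def accessibility(v: Video, alpha: float = 0.5) -> float:
    # Assumed linear combination; not the paper's model.
    return alpha * v.text_score + (1 - alpha) * v.visual_score


def rerank(videos: List[Video], alpha: float = 0.5) -> List[Video]:
    # Most accessible first; ties keep the original search order.
    return sorted(videos, key=lambda v: (-accessibility(v, alpha), v.original_rank))
```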

AAAI Conference 2014 Conference Paper

Acquiring Comparative Commonsense Knowledge from the Web

  • Niket Tandon
  • Gerard de Melo
  • Gerhard Weikum

Applications are increasingly expected to make smart decisions based on what humans consider basic commonsense. An often overlooked but essential form of commonsense involves comparisons, e.g., the fact that bears are typically more dangerous than dogs, that tables are heavier than chairs, or that ice is colder than water. In this paper, we first rely on open information extraction methods to obtain large amounts of comparisons from the Web. We then develop a joint optimization model for cleaning and disambiguating this knowledge with respect to WordNet. This model relies on integer linear programming and semantic coherence scores. Experiments show that our model outperforms strong baselines and allows us to obtain a large knowledge base of disambiguated commonsense assertions.
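
The paper selects comparisons with an integer linear program that also uses semantic coherence scores. The toy sketch below keeps only the consistency idea: for one relation (e.g., heavierThan), it chooses the maximum-weight subset of weighted comparisons that contains no directed cycle. Brute-force enumeration stands in for an ILP solver, so it is only practical for a handful of assertions, and the weights in the usage example are made up.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (a, b) means "a is <relation> than b"


def has_cycle(edges: List[Edge]) -> bool:
    """Depth-first search for a directed cycle."""
    graph: Dict[str, List[str]] = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
    WHITE, GRAY, BLACK = 0, 1, 2
    color: Dict[str, int] = {}

    def visit(u: str) -> bool:
        color[u] = GRAY
        for v in graph.get(u, []):
            c = color.get(v, WHITE)
            if c == GRAY or (c == WHITE and visit(v)):
                return True
        color[u] = BLACK
        return False

    return any(color.get(u, WHITE) == WHITE and visit(u) for u in list(graph))


def best_consistent_subset(weighted: Dict[Edge, float]) -> Tuple[Set[Edge], float]:
    """Exhaustively pick the acyclic subset with the largest total weight."""
    items = list(weighted.items())
    best: Tuple[Tuple[Edge, float], ...] = ()
    best_w = 0.0
    for k in range(len(items), 0, -1):
        for subset in combinations(items, k):
            w = sum(wt for _, wt in subset)
            if w > best_w and not has_cycle([e for e, _ in subset]):
                best, best_w = subset, w
    return {e for e, _ in best}, best_w
```

With heavierThan weights table>chair 0.9, chair>table 0.3, chair>cup 0.8, and cup>table 0.4, the sketch keeps {table>chair, chair>cup}, dropping both the contradictory reverse edge and the edge that closes a three-way cycle.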

AAAI Conference 2011 Conference Paper

Deriving a Web-Scale Common Sense Fact Database

  • Niket Tandon
  • Gerard de Melo
  • Gerhard Weikum

The fact that birds have feathers and ice is cold seems trivially true. Yet, most machine-readable sources of knowledge either lack such common sense facts entirely or have only limited coverage. Prior work on automated knowledge base construction has largely focused on relations between named entities and on taxonomic knowledge, while disregarding common sense properties. In this paper, we show how to gather large amounts of common sense facts from Web n-gram data, using seeds from the ConceptNet collection. Our novel contributions include scalable methods for tapping into Web-scale data and a new scoring model to determine which patterns and facts are most reliable. The experimental results show that this approach extends ConceptNet by many orders of magnitude at comparable levels of precision.
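
A minimal sketch of seed-based pattern induction over n-gram counts, assuming whitespace-tokenized n-grams and (noun, property) seed pairs like those drawn from ConceptNet. Scoring a pattern by the number of distinct seeds it matches is a simplification standing in for the paper's scoring model, and the data in the usage example is invented.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

Fact = Tuple[str, str]  # (noun, property), e.g. ("ice", "cold")


def to_pattern(tokens: List[str], x: str, y: str) -> Optional[str]:
    """Replace one occurrence each of x and y with the slots X and Y."""
    if x == y or tokens.count(x) != 1 or tokens.count(y) != 1:
        return None
    return " ".join("X" if t == x else "Y" if t == y else t for t in tokens)


def score_patterns(ngrams: Dict[str, int], seeds: Set[Fact]) -> Dict[str, int]:
    """Score each induced pattern by how many distinct seed facts it matches."""
    matched: Dict[str, Set[Fact]] = defaultdict(set)
    for text in ngrams:
        tokens = text.split()
        for x, y in seeds:
            p = to_pattern(tokens, x, y)
            if p:
                matched[p].add((x, y))
    return {p: len(s) for p, s in matched.items()}


def extract_facts(
    ngrams: Dict[str, int], pattern_scores: Dict[str, int], min_support: int = 2
) -> Dict[Fact, int]:
    """Apply sufficiently supported patterns; accumulate n-gram frequency per fact."""
    facts: Dict[Fact, int] = defaultdict(int)
    for text, freq in ngrams.items():
        tokens = text.split()
        for p, score in pattern_scores.items():
            ptoks = p.split()
            if score < min_support or len(ptoks) != len(tokens):
                continue
            binding: Dict[str, str] = {}
            if all(pt == t or (pt in ("X", "Y") and not binding.setdefault(pt, t) != t)
                   for pt, t in zip(ptoks, tokens)) and len(binding) == 2:
                facts[(binding["X"], binding["Y"])] += freq
    return dict(facts)
```

On invented counts, the pattern "X is Y" is supported by the seeds (ice, cold) and (fire, hot), so it then extracts the new fact (snow, cold) from the n-gram "snow is cold".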