Arrow Research search

Author name cluster

Rahul Gupta

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

13 papers
2 author rows

Possible papers

NeurIPS Conference 2025 Conference Paper

Establishing Best Practices in Building Rigorous Agentic Benchmarks

  • Yuxuan Zhu
  • Tengjun Jin
  • Yada Pruksachatkun
  • Andy Zhang
  • Shu Liu
  • Sasha Cui
  • Sayash Kapoor
  • Shayne Longpre

Benchmarks are essential for quantitatively tracking progress in AI. As AI agents become increasingly capable, researchers and practitioners have introduced agentic benchmarks to evaluate agents on complex, real-world tasks. These benchmarks typically measure agent capabilities by evaluating task outcomes via specific reward designs. However, we show that many agentic benchmarks have issues in task setup or reward design. For example, SWE-bench-Verified uses insufficient test cases, while $\tau$-bench counts empty responses as successes. Such issues can lead to under- or overestimation of agents’ performance by up to 100% in relative terms. To make agentic evaluation rigorous, we introduce the Agentic Benchmark Checklist (ABC), a set of guidelines that we synthesized from our benchmark-building experience, a survey of best practices, and previously reported issues. When applied to CVE-Bench, a benchmark with a particularly complex evaluation design, ABC reduces performance overestimation by 33%.

AAAI Conference 2025 Conference Paper

HyperDefender: A Robust Framework for Hyperbolic GNNs

  • Nikita Malik
  • Rahul Gupta
  • Sandeep Kumar

Graph neural networks for hyperbolic space have emerged as a powerful tool for embedding datasets exhibiting a highly non-Euclidean latent anatomy, e.g., graphs with hierarchical structures. While several Hyperbolic Graph Neural Networks (Hy-GNNs) have been developed to enhance the representation of hierarchical datasets, they remain susceptible to noise and adversarial attacks, posing serious risks in critical applications. The absence of robust Hy-GNN frameworks underscores a pressing problem. This research addresses this challenge by introducing HyperDefender, a robust and flexible approach designed to fortify Hy-GNNs against adversarial attacks and noise. HyperDefender aims to secure the reliability of applications that depend on the integrity of hierarchical graph-structured data in real-world scenarios. Experimental results demonstrate that HyperDefender significantly improves node classification accuracy across various attacks, effectively mitigating the performance degradation typically observed in Hy-GNNs when the hierarchy in original datasets is compromised.

NeurIPS Conference 2025 Conference Paper

VMDT: Decoding the Trustworthiness of Video Foundation Models

  • Yujin Potter
  • Zhun Wang
  • Nicholas Crispino
  • Kyle Montgomery
  • Alexander Xiong
  • Ethan Chang
  • Francesco Pinto
  • Yuqi Chen

As foundation models become more sophisticated, ensuring their trustworthiness becomes increasingly critical; yet, unlike text and image, the video modality still lacks comprehensive trustworthiness benchmarks. We introduce VMDT (Video-Modal DecodingTrust), the first unified platform for evaluating text-to-video (T2V) and video-to-text (V2T) models across five key trustworthiness dimensions: safety, hallucination, fairness, privacy, and adversarial robustness. Through our extensive evaluation of 7 T2V models and 19 V2T models using VMDT, we uncover several significant insights. For instance, all open-source T2V models evaluated fail to recognize harmful queries and often generate harmful videos, while exhibiting higher levels of unfairness compared to image modality models. In V2T models, unfairness and privacy risks rise with scale, whereas hallucination and adversarial robustness improve, though overall performance remains low. Uniquely, safety shows no correlation with model size, implying that factors other than scale govern current safety levels. Our findings highlight the urgent need for developing more robust and trustworthy video foundation models, and VMDT provides a systematic framework for measuring and tracking progress toward this goal. The code is available at https://sunblaze-ucb.github.io/VMDT-page/.

EAAI Journal 2024 Journal Article

Predicting global horizontal irradiance of north central region of India via machine learning regressor algorithms

  • Rahul Gupta
  • Anil Kumar Yadav
  • S.K. Jha
  • Pawan Kumar Pathak

The ability to predict global horizontal irradiance (GHI) in the wake of changing weather conditions is becoming increasingly important in driving economic growth in the renewable energy industry. The choice of relevant meteorological variables for predicting GHI is crucial because it influences the prediction accuracy due to various geographical conditions. To this end, this study identifies a subset of the relevant input variables for the prediction of GHI by applying two methods: Feature Combination (FC) and Feature Selection (FS). The results reveal that the most significant input variables for predicting GHI, obtained by the FS method, are Solar Zenith Angle, Dew Point, Diffuse Horizontal Irradiance, Direct Normal Irradiance, and Wind Speed. The predictive performance of the selected features is evaluated by feeding them into six Machine Learning (ML) regressor algorithms: Multiple Linear Regressor (MLR), Decision Tree (DT), Random Forest (RF), Gradient Boost (GB), Light Gradient Boost Machine (LGBM), and Extra Tree (ET). The performance of the models is evaluated using statistical measures such as Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and Mean Absolute Percentage Error (MAPE). The ET regressor gives the best prediction performance among all six models. The percentage reductions in error for the FS method compared to the FC method, in terms of MAE, RMSE, and MAPE, are 8.5%, 7.72%, and 25.84%, respectively. This shows that the FS method gives better predictive performance than the FC method for the estimation of GHI.
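The three error measures named in this abstract, and the percentage reduction used to compare two methods, can be sketched as follows. This is a minimal illustration using the standard textbook definitions, not the authors' code; the function names and the sample values in the usage note are hypothetical.

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error: average of |true - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the mean squared difference."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent.

    Assumes no true value is zero (an assumption of this sketch).
    """
    return 100.0 * sum(
        abs((t - p) / t) for t, p in zip(y_true, y_pred)
    ) / len(y_true)


def percent_reduction(baseline_error, new_error):
    """Relative reduction of new_error with respect to baseline_error, in percent."""
    return 100.0 * (baseline_error - new_error) / baseline_error
```

For example, `percent_reduction(mae_fc, mae_fs)` would give the kind of figure the abstract reports for MAE, where `mae_fc` and `mae_fs` are hypothetical names for the errors obtained with the FC and FS feature sets.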

AAAI Conference 2019 Conference Paper

A Task in a Suit and a Tie: Paraphrase Generation with Semantic Augmentation

  • Su Wang
  • Rahul Gupta
  • Nancy Chang
  • Jason Baldridge

Paraphrasing is rooted in semantics. We show the effectiveness of transformers (Vaswani et al. 2017) for paraphrase generation and further improvements by incorporating PropBank labels via a multi-encoder. Evaluating on MSCOCO and WikiAnswers, we find that transformers are fast and effective, and that semantic augmentation for both transformers and LSTMs leads to sizable 2-3 point gains in BLEU, METEOR and TER. More importantly, we find surprisingly large gains on human evaluations compared to previous models. Nevertheless, manual inspection of generated paraphrases reveals ample room for improvement: even our best model produces human-acceptable paraphrases for only 28% of captions from the CHIA dataset (Sharma et al. 2018), and it fails spectacularly on sentences from Wikipedia. Overall, these results point to the potential for incorporating semantics in the task while highlighting the need for stronger evaluation.

AAAI Conference 2019 Conference Paper

Deep Reinforcement Learning for Syntactic Error Repair in Student Programs

  • Rahul Gupta
  • Aditya Kanade
  • Shirish Shevade

Novice programmers often struggle with the formal syntax of programming languages. In the traditional classroom setting, they can make progress with the help of real-time feedback from their instructors, which is often impossible to get in the massive open online course (MOOC) setting. Syntactic error repair techniques have huge potential to assist them at scale. Towards this, we design a novel programming language correction framework amenable to reinforcement learning. The framework allows an agent to mimic human actions for text navigation and editing. We demonstrate that the agent can be trained through self-exploration directly from the raw input, that is, program text itself, without either supervision or any prior knowledge of the formal syntax of the programming language. We evaluate our technique on a publicly available dataset containing 6975 erroneous C programs with typographic errors, written by students during an introductory programming course. Our technique fixes 1699 (24.4%) programs completely and 1310 (18.8%) programs partially, outperforming DeepFix, a state-of-the-art syntactic error repair technique, which uses a fully supervised neural machine translation approach.

NeurIPS Conference 2019 Conference Paper

Neural Attribution for Semantic Bug-Localization in Student Programs

  • Rahul Gupta
  • Aditya Kanade
  • Shirish Shevade

Providing feedback is an integral part of teaching. Most open online courses on programming make use of automated grading systems to support programming assignments and give real-time feedback. These systems usually rely on test results to quantify the programs' functional correctness. They return failing tests to the students as feedback. However, students may find it difficult to debug their programs if they receive no hints about where the bug is and how to fix it. In this work, we present NeuralBugLocator, a deep-learning-based technique that can localize the bugs in a faulty program with respect to a failing test, without even running the program. At the heart of our technique is a novel tree convolutional neural network which is trained to predict whether a program passes or fails a given test. To localize the bugs, we analyze the trained network using a state-of-the-art neural prediction attribution technique and see which lines of the programs make it predict the test outcomes. Our experiments show that NeuralBugLocator is generally more accurate than two state-of-the-art program-spectrum-based baselines and one syntactic-difference-based bug-localization baseline.

LPAR Conference 2018 Conference Paper

Knowledge Compilation meets Uniform Sampling

  • Shubham Sharma
  • Rahul Gupta
  • Subhajit Roy
  • Kuldeep S. Meel

Uniform sampling has drawn diverse applications in programming languages and software engineering, such as constrained-random verification (CRV), constrained fuzzing, and bug synthesis. The effectiveness of these applications depends on the uniformity of test stimuli generated from a given set of constraints. Despite significant progress over the past few years, the performance of the state-of-the-art techniques still falls short of that of heuristic methods employed in the industry, which sacrifice either uniformity or scalability when generating stimuli. In this paper, we propose a new approach to uniform generation that builds on recent progress in knowledge compilation. The primary contribution of this paper is marrying knowledge compilation with uniform sampling: our algorithm, KUS, employs state-of-the-art knowledge compilers to first compile constraints into d-DNNF form, and then generates samples by making two passes over the compiled representation. We show that KUS is able to significantly outperform existing state-of-the-art algorithms, SPUR and UniGen2, by up to 3 orders of magnitude in terms of runtime while achieving a geometric speedup of 1.7× and 8.3× over SPUR and UniGen2, respectively. Also, KUS achieves a lower PAR-2 score, around 0.82× that of SPUR and 0.38× that of UniGen2. Furthermore, KUS achieves speedups of up to 3 orders of magnitude for incremental sampling. The distribution generated by KUS is statistically indistinguishable from that generated by an ideal uniform sampler. Moreover, KUS is almost oblivious to the number of samples requested.
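One plausible reading of "two passes over the compiled representation" is a bottom-up counting pass followed by a top-down sampling pass. The sketch below illustrates that general idea on a tiny hand-built formula; it is not the KUS implementation. The tuple-based node encoding and the function names are hypothetical, and the input is assumed to be a smooth d-DNNF (every child of an OR node mentions the same variables), which is what makes the counts exact.

```python
import random

# Hypothetical node encoding for this sketch:
#   ("lit", v)          a literal; v is a nonzero int, negative means negated
#   ("and", [children]) decomposable AND: children share no variables
#   ("or",  [children]) deterministic OR: children have disjoint model sets


def count_models(node, counts):
    """Pass 1 (bottom-up): store the model count of every node in `counts`."""
    key = id(node)
    if key in counts:
        return counts[key]
    kind = node[0]
    if kind == "lit":
        total = 1
    elif kind == "and":
        total = 1
        for child in node[1]:
            total *= count_models(child, counts)
    else:  # "or"
        total = sum(count_models(child, counts) for child in node[1])
    counts[key] = total
    return total


def sample_model(node, counts, rng):
    """Pass 2 (top-down): draw one model uniformly, as a set of literals."""
    kind = node[0]
    if kind == "lit":
        return {node[1]}
    if kind == "and":
        model = set()
        for child in node[1]:
            model |= sample_model(child, counts, rng)
        return model
    # OR node: choose a child with probability proportional to its model count.
    weights = [counts[id(child)] for child in node[1]]
    chosen = rng.choices(node[1], weights=weights, k=1)[0]
    return sample_model(chosen, counts, rng)


# (x1 OR x2) written as a smooth d-DNNF:
#   (x1 AND (x2 OR NOT x2)) OR (NOT x1 AND x2)
x1, not_x1 = ("lit", 1), ("lit", -1)
x2, not_x2 = ("lit", 2), ("lit", -2)
formula = ("or", [("and", [x1, ("or", [x2, not_x2])]),
                  ("and", [not_x1, x2])])

counts = {}
total_models = count_models(formula, counts)
rng = random.Random(0)
samples = [frozenset(sample_model(formula, counts, rng)) for _ in range(300)]
```

Because the counts are computed once and reused, each additional sample costs only one top-down walk in this sketch.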

AAAI Conference 2017 Conference Paper

DeepFix: Fixing Common C Language Errors by Deep Learning

  • Rahul Gupta
  • Soham Pal
  • Aditya Kanade
  • Shirish Shevade

The problem of automatically fixing programming errors is a very active research topic in software engineering. This is a challenging problem as fixing even a single error may require analysis of the entire program. In practice, a number of errors arise due to programmer’s inexperience with the programming language or lack of attention to detail. We call these common programming errors. These are analogous to grammatical errors in natural languages. Compilers detect such errors, but their error messages are usually inaccurate. In this work, we present an end-to-end solution, called DeepFix, that can fix multiple such errors in a program without relying on any external tool to locate or fix them. At the heart of DeepFix is a multi-layered sequence-to-sequence neural network with attention which is trained to predict erroneous program locations along with the required correct statements. On a set of 6971 erroneous C programs written by students for 93 programming tasks, DeepFix could fix 1881 (27%) programs completely and 1338 (19%) programs partially.

JMLR Journal 2010 Journal Article

Collective Inference for Extraction MRFs Coupled with Symmetric Clique Potentials

  • Rahul Gupta
  • Sunita Sarawagi
  • Ajit A. Diwan

Many structured information extraction tasks employ collective graphical models that capture inter-instance associativity by coupling them with various clique potentials. We propose tractable families of such potentials that are invariant under permutations of their arguments, and call them symmetric clique potentials. We present three families of symmetric potentials: MAX, SUM, and MAJORITY. We propose cluster message passing for collective inference with symmetric clique potentials, and present message computation algorithms tailored to such potentials. Our first message computation algorithm, called α-pass, is sub-quadratic in the clique size, outputs exact messages for MAX, and computes 13/15-approximate messages for Potts, a popular member of the SUM family. Empirically, it is up to two orders of magnitude faster than existing algorithms based on graph-cuts or belief propagation. Our second algorithm, based on Lagrangian relaxation, operates on MAJORITY potentials and provides close to exact solutions while being two orders of magnitude faster. We show that the cluster message passing framework is more principled, accurate and converges faster than competing approaches. We extend our collective inference framework to exploit associativity of more general intra-domain properties of instance labelings, which opens up interesting applications in domain adaptation. Our approach leads to significant error reduction on unseen domains without incurring any overhead of model retraining.

ICML Conference 2007 Conference Paper

Efficient inference with cardinality-based clique potentials

  • Rahul Gupta
  • Ajit A. Diwan
  • Sunita Sarawagi

Many collective labeling tasks require inference on graphical models where the clique potentials depend only on the number of nodes that get a particular label. We design efficient inference algorithms for various families of such potentials. Our algorithms are exact for arbitrary cardinality-based clique potentials on binary labels and for max-like and majority-like clique potentials on multiple labels. Moving towards more complex potentials, we show that inference becomes NP-hard even on cliques with homogeneous Potts potentials. We present a 13/15-approximation algorithm with runtime sub-quadratic in the clique size. In contrast, the best known previous guarantee for graphs with Potts potentials is only 0.5. We perform empirical comparisons on real and synthetic data, and show that our proposed methods are an order of magnitude faster than the well-known Tree-based re-parameterization (TRW) and graph-cut algorithms.
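For the binary-label case mentioned in this abstract, exact MAP inference with a cardinality-based clique potential can be done by sorting. The sketch below shows one standard way to do it and is not necessarily the paper's algorithm: for each possible count k of nodes labeled 1, the best assignment labels the k nodes with the largest gain from switching to 1, so it suffices to scan k from 0 to n. The function names, the score layout `(score_for_0, score_for_1)`, and the example potential are assumptions of this sketch.

```python
from itertools import product


def map_binary_cardinality(node_scores, clique_potential):
    """Maximize sum_i node_scores[i][y_i] + clique_potential(number of 1s).

    node_scores: list of (score_for_label_0, score_for_label_1) pairs.
    clique_potential: function of the count of nodes labeled 1.
    Returns (labels, best_score). Runs in O(n log n) plus n + 1 potential calls.
    """
    n = len(node_scores)
    gains = [s1 - s0 for s0, s1 in node_scores]
    order = sorted(range(n), key=lambda i: gains[i], reverse=True)

    running = sum(s0 for s0, _ in node_scores)  # node score with all labels 0
    best_score, best_k = running + clique_potential(0), 0
    for k in range(1, n + 1):
        running += gains[order[k - 1]]  # flip the next-best node to label 1
        score = running + clique_potential(k)
        if score > best_score:
            best_score, best_k = score, k

    labels = [0] * n
    for i in order[:best_k]:
        labels[i] = 1
    return labels, best_score


def brute_force_map(node_scores, clique_potential):
    """Exhaustive check over all 2^n labelings, for small n only."""
    n = len(node_scores)
    return max(
        sum(node_scores[i][y[i]] for i in range(n)) + clique_potential(sum(y))
        for y in product((0, 1), repeat=n)
    )


# Hypothetical example: four nodes and a majority-like potential.
scores = [(1.0, 0.5), (0.2, 0.9), (0.0, 0.3), (0.7, 0.1)]


def majority_like(k):
    return 2.0 * max(k, 4 - k) / 4


labels, best = map_binary_cardinality(scores, majority_like)
```

The brute-force function is included only so the sorted scan can be checked against exhaustive search on small cliques.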

AAAI Conference 2004 System Paper

SEM-Ether: Semantic Web Based Pervasive Computing Framework — Integrating Web, Devices and People

  • Sushil Puradkar
  • Chintan Patel
  • Rahul Gupta

Pervasive computing aims to build an aggregated environment around a user by knitting diverse computing and communicating devices and software services into a single homogeneous unit. Our work is to develop a Pervasive computing framework that harnesses the power of the Semantic Web and Web Services, facilitating the development of effective and intelligent Pervasive environments. This paper presents a high-level view of the framework and how different Pervasive services can be built on it.