Author name cluster

Shashank Gupta

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

6 papers

2 author rows

TMLR Journal 2026 Journal Article

A Simple and Effective Reinforcement Learning Method for Text-to-Image Diffusion Fine-tuning

Shashank Gupta
Chaitanya Ahuja
Tsung-Yu Lin
Sreya Dutta Roy
Harrie Oosterhuis
Maarten de Rijke
Satya Narayan Shukla

Reinforcement learning (RL)-based fine-tuning has emerged as a powerful approach for aligning diffusion models with black-box objectives. Proximal policy optimization ( PPO) is a popular choice of method for policy optimization. While effective in terms of performance and sample complexity, PPO is highly sensitive to hyper-parameters and involves substantial computational overhead. REINFORCE, on the other hand, mitigates some implementation complexities such as high memory overhead and sensitive hyper-parameter tuning, but has suboptimal performance due to high variance and crucially sample inefficiency, which is the primary notion of efficiency we study in this work. While the variance of the REINFORCE can be reduced by sampling multiple actions per input prompt and using a baseline correction term, it still suffers from sample inefficiency. To address these challenges, we systematically analyze the sample efficiency-effectiveness trade-off between REINFORCE and PPO, and propose leave-one-out PPO ( LOOP), a novel RL for diffusion fine-tuning method. LOOP combines variance reduction techniques from REINFORCE, such as sampling multiple actions per input prompt and a baseline correction term, with the robustness and sample efficiency of PPO via clipping and importance sampling. Our results demonstrate that LOOP effectively improves diffusion models on various black-box objectives, and achieves a better balance between sample efficiency and final performance.

PDF Details

ICLR Conference 2025 Conference Paper

LLM-SR: Scientific Equation Discovery via Programming with Large Language Models

Parshin Shojaee
Kazem Meidani
Shashank Gupta
Amir Barati Farimani
Chandan K. Reddy

Mathematical equations have been unreasonably effective in describing complex natural phenomena across various scientific disciplines. However, discovering such insightful equations from data presents significant challenges due to the necessity of navigating extremely large combinatorial hypothesis spaces. Current methods of equation discovery, commonly known as symbolic regression techniques, largely focus on extracting equations from data alone, often neglecting the domain-specific prior knowledge that scientists typically depend on. They also employ limited representations such as expression trees, constraining the search space and expressiveness of equations. To bridge this gap, we introduce LLM-SR, a novel approach that leverages the extensive scientific knowledge and robust code generation capabilities of Large Language Models (LLMs) to discover scientific equations from data. Specifically, LLM-SR treats equations as programs with mathematical operators and combines LLMs' scientific priors with evolutionary search over equation programs. The LLM iteratively proposes new equation skeleton hypotheses, drawing from its domain knowledge, which are then optimized against data to estimate parameters. We evaluate LLM-SR on four benchmark problems across diverse scientific domains (e.g., physics, biology), which we carefully designed to simulate the discovery process and prevent LLM recitation. Our results demonstrate that LLM-SR discovers physically accurate equations that significantly outperform state-of-the-art symbolic regression baselines, particularly in out-of-domain test settings. We also show that LLM-SR's incorporation of scientific priors enables more efficient equation space exploration than the baselines.

Details

NeurIPS Conference 2025 Conference Paper

Rectified CFG++ for Flow Based Models

Shreshth Saini
Shashank Gupta
Alan Bovik

Classifier‑free guidance (CFG) is the workhorse for steering large diffusion models toward text‑conditioned targets, yet its naïve application to rectified flow (RF) based models provokes severe off–manifold drift, yielding visual artifacts, text misalignment, and brittle behaviour. We present Rectified-CFG++, an adaptive predictor–corrector guidance that couples the deterministic efficiency of rectified flows with a geometry‑aware conditioning rule. Each inference step first executes a conditional RF update that anchors the sample near the learned transport path, then applies a weighted conditional correction that interpolates between conditional and unconditional velocity fields. We prove that the resulting velocity field is marginally consistent and that its trajectories remain within a bounded tubular neighbourhood of the data manifold, ensuring stability across a wide range of guidance strengths. Extensive experiments on large‑scale text‑to‑image models (Flux, Stable Diffusion 3/3. 5, Lumina) show that Rectified-CFG++ consistently outperforms standard CFG on benchmark datasets such as MS‑COCO, LAION‑Aesthetic, and T2I‑CompBench. Project page: https: //rectified-cfgpp. github. io/.

PDF Details

ICLR Conference 2024 Conference Paper

Bias Runs Deep: Implicit Reasoning Biases in Persona-Assigned LLMs

Shashank Gupta
Vaishnavi Shrivastava
Ameet Deshpande
Ashwin Kalyan
Peter Clark
Ashish Sabharwal
Tushar Khot

Recent works have showcased the ability of large-scale language models (LLMs) to embody diverse personas in their responses, exemplified by prompts like ‘_You are Yoda. Explain the Theory of Relativity._’ While this ability allows personalization of LLMs and enables human behavior simulation, its effect on LLMs’ capabilities remains unclear. To fill this gap, we present the first extensive study of the unintended side-effects of persona assignment on the ability of LLMs to perform _basic reasoning tasks_. Our study covers 24 reasoning datasets (spanning mathematics, law, medicine, morals, and more), 4 LLMs (2 versions of ChatGPT-3.5, GPT-4-Turbo, and Llama-2-70b-chat), and 19 diverse personas (e.g., ‘an Asian person’) spanning 5 socio-demographic groups: race, gender, religion, disability, and political affiliation. Our experiments unveil that LLMs harbor deep rooted bias against various socio-demographics underneath a veneer of fairness. While they overtly reject stereotypes when explicitly asked (‘_Are Black people less skilled at mathematics?_’), they manifest stereotypical and often erroneous presumptions when prompted to answer questions while adopting a persona. These can be observed as abstentions in the model’s response, e.g., ‘_As a Black person, I am unable to answer this question as it requires math knowledge_’, and generally result in a substantial drop in performance on reasoning tasks. Our experiments with ChatGPT-3.5 show that this bias is _ubiquitous_—80% of our personas demonstrate bias; it is _significant_—some datasets show performance drops of 70%+; and can be especially _harmful for certain groups_—some personas suffer statistically significant drops on 80%+ of the datasets. Overall, all four LLMs exhibit persona-induced bias to varying extents, with GPT-4-Turbo showing the least but still a problematic amount of bias (evident in 42% of the personas). Further analysis shows that these persona-induced errors can be hard-to-discern as they do not always manifest as explicit abstentions, and can also be hard-to-avoid—we find de-biasing prompts to have minimal to no effect. Our findings serve as a cautionary tale that the practice of assigning personas to LLMs—a trend on the rise—can surface their deep-rooted biases and have unforeseeable and detrimental side-effects.

Details

NeurIPS Conference 2023 Conference Paper

Self-Refine: Iterative Refinement with Self-Feedback

Aman Madaan
Niket Tandon
Prakhar Gupta
Skyler Hallinan
Luyu Gao
Sarah Wiegreffe
Uri Alon
Nouha Dziri

Like humans, large language models (LLMs) do not always generate the best output on their first try. Motivated by how humans refine their written text, we introduce Self-Refine, an approach for improving initial outputs from LLMs through iterative feedback and refinement. The main idea is to generate an initial output using an LLMs; then, the same LLMs provides *feedback* for its output and uses it to *refine* itself, iteratively. Self-Refine does not require any supervised training data, additional training, or reinforcement learning, and instead uses a single LLM as the generator, refiner and the feedback provider. We evaluate Self-Refine across 7 diverse tasks, ranging from dialog response generation to mathematical reasoning, using state-of-the-art (GPT-3. 5, ChatGPT, and GPT-4) LLMs. Across all evaluated tasks, outputs generated with Self-Refine are preferred by humans and automatic metrics over those generated with the same LLM using conventional one-step generation, improving by $\sim$20\% absolute on average in task performance. Our work demonstrates that even state-of-the-art LLMs like GPT-4 can be further improved at test-time using our simple, standalone approach.

PDF Details

ICLR Conference 2022 Conference Paper

Knowledge Infused Decoding

Ruibo Liu
Guoqing Zheng
Shashank Gupta
Radhika Gaonkar
Chongyang Gao
Soroush Vosoughi
Milad Shokouhi
Ahmed Hassan Awadallah

Pre-trained language models (LMs) have been shown to memorize a substantial amount of knowledge from the pre-training corpora; however, they are still limited in recalling factually correct knowledge given a certain context. Hence. they tend to suffer from counterfactual or hallucinatory generation when used in knowledge-intensive natural language generation (NLG) tasks. Recent remedies to this problem focus on modifying either the pre-training or task fine-tuning objectives to incorporate knowledge, which normally require additional costly training or architecture modification of LMs for practical applications. We present Knowledge Infused Decoding (KID)---a novel decoding algorithm for generative LMs, which dynamically infuses external knowledge into each step of the LM decoding. Specifically, we maintain a local knowledge memory based on the current context, interacting with a dynamically created external knowledge trie, and continuously update the local memory as a knowledge-aware constraint to guide decoding via reinforcement learning. On six diverse knowledge-intensive NLG tasks, task-agnostic LMs (e.g., GPT-2 and BART) armed with KID outperform many task-optimized state-of-the-art models, and show particularly strong performance in few-shot scenarios over seven related knowledge-infusion techniques. Human evaluation confirms KID's ability to generate more relevant and factual language for the input context when compared with multiple baselines. Finally, KID also alleviates exposure bias and provides stable generation quality when generating longer sequences.

Details