Arrow Research search

Author name cluster

Shivam Agarwal

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

5 papers
2 author rows

Possible papers (5)

ICLR 2025 · Conference Paper

Scaling Diffusion Language Models via Adaptation from Autoregressive Models

  • Shansan Gong
  • Shivam Agarwal
  • Yizhe Zhang 0002
  • Jiacheng Ye
  • Lin Zheng
  • Mukai Li
  • Chenxin An
  • Peilin Zhao

Diffusion Language Models (DLMs) have emerged as a promising new paradigm for text generative modeling, potentially addressing limitations of autoregressive (AR) models. However, current DLMs have been studied at a smaller scale compared to their AR counterparts and lack fair comparison on language modeling benchmarks. Additionally, training diffusion models from scratch at scale remains challenging. Given the prevalence of open-source AR language models, we propose adapting these models to build text diffusion models. We demonstrate connections between AR and diffusion modeling objectives and introduce a simple continual pre-training approach for training diffusion models. Through systematic evaluation on language modeling, reasoning, and commonsense benchmarks, we show that we can convert AR models ranging from 127M to 7B parameters (GPT2 and LLaMA) into the diffusion models DiffuGPT and DiffuLLaMA, using less than 200B tokens for training. Our experimental results reveal that these models outperform earlier DLMs and are competitive with their AR counterparts. We release a suite of DLMs (127M, 355M, and 7B parameters) capable of generating fluent text, performing in-context learning, filling in the middle without prompt re-ordering, and following instructions.
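As a rough illustration of the kind of objective such continual pre-training can involve, here is a minimal sketch of a simplified absorbing-state discrete-diffusion training step: mask a random fraction of tokens at a sampled noise level and train the network, run with bidirectional attention, to recover them. The MASK_ID value and the HuggingFace-style model interface are placeholder assumptions, and the paper's actual adaptation recipe (attention-mask handling, loss weighting) is more involved than this sketch.

```python
import torch
import torch.nn.functional as F

MASK_ID = 50257  # hypothetical [MASK] id appended to the base vocabulary


def masked_diffusion_loss(model, input_ids):
    """One simplified absorbing-state discrete-diffusion training step.

    A noise level t ~ U(0, 1) is drawn per sequence, that fraction of
    tokens is replaced by [MASK], and the model is trained to recover
    the originals; cross-entropy is computed only at masked positions.
    """
    batch, seq_len = input_ids.shape
    t = torch.rand(batch, 1, device=input_ids.device)            # per-sequence mask ratio
    mask = torch.rand(batch, seq_len, device=input_ids.device) < t
    noisy = torch.where(mask, torch.full_like(input_ids, MASK_ID), input_ids)
    logits = model(noisy).logits                                 # (batch, seq_len, vocab)
    return F.cross_entropy(logits[mask], input_ids[mask])
```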

NeurIPS 2025 · Conference Paper

The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning

  • Shivam Agarwal
  • Zimin Zhang
  • Lifan Yuan
  • Jiawei Han
  • Hao Peng

Entropy minimization (EM) trains the model to concentrate even more probability mass on its most confident outputs. We show that this simple objective alone, without any labeled data, can substantially improve large language models’ (LLMs) performance on challenging math, physics, and coding tasks. We explore three approaches: (1) EM-FT: token-level entropy minimization, similar to instruction finetuning but applied to unlabeled outputs drawn from the model; (2) EM-RL: reinforcement learning with negative entropy as the only reward to maximize; (3) EM-INF: inference-time logit adjustment to reduce entropy without any training data or parameter updates. On Qwen-7B, EM-RL, without any labeled data, achieves performance comparable to or better than strong RL baselines such as GRPO and RLOO that are trained on 60K labeled examples. Furthermore, EM-INF enables Qwen-32B to match or exceed the performance of proprietary models like GPT-4o, Claude 3 Opus, and Gemini 1.5 Pro on the challenging SciCode benchmark, while being 3x more efficient than self-consistency and sequential refinement. Our findings reveal that many pretrained LLMs possess previously underappreciated reasoning capabilities that can be effectively elicited through entropy minimization alone, without any labeled data or even any parameter updates.
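The EM-FT objective is concrete enough to sketch directly. Below is a minimal, hypothetical PyTorch version of token-level entropy minimization on unlabeled sequences sampled from the model itself; the HuggingFace-style `.logits` interface is an assumption, and this illustrates the general idea rather than reproducing the authors' code.

```python
import torch
import torch.nn.functional as F


def token_entropy_loss(logits: torch.Tensor) -> torch.Tensor:
    """Mean per-token entropy of the model's predictive distribution.

    logits: (batch, seq_len, vocab_size). Minimizing this concentrates
    probability mass on the model's most confident tokens -- the EM-FT
    objective described in the abstract above.
    """
    log_probs = F.log_softmax(logits, dim=-1)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1)  # (batch, seq_len)
    return entropy.mean()


def em_ft_step(model, input_ids, optimizer):
    """One finetuning step on unlabeled, model-generated sequences."""
    loss = token_entropy_loss(model(input_ids).logits)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In the same spirit, EM-INF can be pictured as sharpening the logits at decoding time (for example, scaling them by a temperature below 1) so the output distribution's entropy drops without any parameter updates.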

UAI 2021 · Conference Paper

Modeling financial uncertainty with multivariate temporal entropy-based curriculums

  • Ramit Sawhney
  • Arnav Wadhwa
  • Ayush Mangal
  • Vivek Mittal
  • Shivam Agarwal
  • Rajiv Ratn Shah

In the financial realm, profit generation relies heavily on the complicated task of stock prediction. Lately, neural methods have shown success in exploiting stock-affecting signals from textual data across news and tweets to forecast stock performance. However, the dynamic, stochastic, and variably influential nature of text and prices makes it difficult to train neural stock trading models, limiting predictive performance and profits. To transcend this limitation, we propose a novel multi-modal curriculum learning approach, FinCLASS, which evaluates stock-affecting signals via entropy-based heuristics and measures their linguistic and price-based complexities in a time-aware, hierarchical fashion. We show that training financial models benefits from exposing neural networks to easier examples of stock-affecting signals early in the training phase, before introducing samples with more complex linguistic and price-based temporal variations. Through experiments on benchmark English tweets and Chinese financial news spanning two major indexes and four global markets, we show how FinCLASS outperforms the state of the art across the financial tasks of stock movement prediction, volatility regression, and profit generation. Through ablative and qualitative experiments, we make the case for FinCLASS as a generalizable framework for developing natural-language-centric neural models for financial tasks.
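To make the entropy-based heuristic concrete, here is a small, self-contained sketch of easy-to-hard curriculum ordering that scores each training sample by the histogram entropy of its price series; the scoring function, bin count, and the samples' "prices" field are illustrative assumptions, not FinCLASS itself.

```python
import math


def histogram_entropy(values, bins=10):
    """Shannon entropy of a numeric series (e.g. daily returns).

    Higher entropy is treated here as a proxy for a noisier, and
    therefore harder, training sample.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant series
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    n = len(values)
    return -sum(c / n * math.log(c / n) for c in counts if c)


def curriculum_order(samples):
    """Sort training samples easy-to-hard by price-series entropy."""
    return sorted(samples, key=lambda s: histogram_entropy(s["prices"]))
```

A training loop would then draw batches from the front of this ordering first, gradually admitting the higher-entropy samples.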

AAAI 2021 · Conference Paper

Stock Selection via Spatiotemporal Hypergraph Attention Network: A Learning to Rank Approach

  • Ramit Sawhney
  • Shivam Agarwal
  • Arnav Wadhwa
  • Tyler Derr
  • Rajiv Ratn Shah

Quantitative trading and investment decision making are intricate financial tasks that rely on accurate stock selection. Despite significant progress by deep learning on the complex and highly stochastic stock prediction problem, modern solutions face two significant limitations: they do not directly optimize the investment target in terms of profit, and they treat each stock as independent of the others, ignoring the rich signals between related stocks’ temporal price movements. To address these limitations, we reformulate stock prediction as a learning-to-rank problem and propose STHAN-SR, a neural hypergraph architecture for stock selection. The key novelty of our work is modeling the complex relations between stocks through a hypergraph and a temporal Hawkes attention mechanism, tailoring a new spatiotemporal attention hypergraph network architecture that ranks stocks by profit while jointly modeling stock interdependence and the temporal evolution of their prices. Through experiments on three markets spanning over six years of data, we show that STHAN-SR significantly outperforms state-of-the-art neural stock forecasting methods. We validate our design choices through ablative and exploratory analyses over STHAN-SR’s spatial and temporal components and demonstrate its practical applicability.
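The learning-to-rank reformulation can be illustrated with the kind of combined regression-plus-pairwise-ranking objective common in neural stock ranking; the sketch below, including the weighting term, is an assumed simplification rather than STHAN-SR's exact loss.

```python
import torch


def stock_ranking_loss(pred_returns, true_returns, alpha=1.0):
    """Pointwise regression plus pairwise ranking loss for one trading day.

    pred_returns, true_returns: (num_stocks,) tensors. The pairwise term
    penalizes every stock pair whose predicted ordering disagrees with
    the ground-truth return ordering, so the network is optimized for
    rank quality (and hence profit from the top-ranked stocks), not just
    per-stock accuracy.
    """
    mse = torch.mean((pred_returns - true_returns) ** 2)
    pred_diff = pred_returns.unsqueeze(0) - pred_returns.unsqueeze(1)
    true_diff = true_returns.unsqueeze(0) - true_returns.unsqueeze(1)
    pairwise = torch.mean(torch.relu(-pred_diff * true_diff))
    return mse + alpha * pairwise
```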

IJCAI 2021 · Conference Paper

TEC: A Time Evolving Contextual Graph Model for Speaker State Analysis in Political Debates

  • Ramit Sawhney
  • Shivam Agarwal
  • Arnav Wadhwa
  • Rajiv Shah

Political discourses provide a forum for representatives to express their opinions and contribute towards policy making. Analyzing these discussions is crucial for recognizing possible delegates and making better voting choices in an independent nation. A politician's vote on a proposition is usually associated with their past discourses and influenced by cohesion forces within political parties. We focus on predicting a speaker's vote on a bill by augmenting linguistic models with temporal and cohesion contexts. We propose TEC, a time-evolving graph-based model that jointly employs links between motions, speakers, and temporal politician states. TEC outperforms competitive models, illustrating the benefit of temporal and contextual signals for predicting a politician's stance.
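As a toy illustration only, the sketch below updates a speaker's hidden state at each debate step from the current motion embedding and the mean state of party peers, reflecting the temporal and cohesion signals the abstract describes; every name and design choice here is an assumption, not TEC's published architecture.

```python
import torch
import torch.nn as nn


class SpeakerStateTracker(nn.Module):
    """Toy temporal speaker-state update with party-cohesion context."""

    def __init__(self, dim: int):
        super().__init__()
        self.cell = nn.GRUCell(2 * dim, dim)   # input: motion + peer context
        self.vote_head = nn.Linear(dim, 2)     # aye / nay logits

    def forward(self, state, motion_emb, peer_states):
        # Combine the motion's text embedding with the average state of
        # same-party peers, then evolve the speaker's state through time.
        context = torch.cat([motion_emb, peer_states.mean(dim=0)], dim=-1)
        new_state = self.cell(context.unsqueeze(0), state.unsqueeze(0)).squeeze(0)
        return new_state, self.vote_head(new_state)
```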