Arrow Research search

Author name cluster

Varun Kumar

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

6 papers
2 author rows

Possible papers

EAAI Journal 2026 Journal Article

A digital twin for diesel engines: Operator-infused physics-informed neural networks with transfer learning for engine health monitoring

  • Kamaljyoti Nath
  • Varun Kumar
  • Daniel J. Smith
  • George Em Karniadakis

Improving diesel engine efficiency, reducing emissions, and enabling robust health monitoring have been critical research topics in engine modelling. While recent advancements in the use of neural networks for system monitoring have shown promising results, such methods often focus on component-level analysis and lack generalizability and physical interpretability. In this study, we propose a novel hybrid framework that combines physics-informed neural networks (PINNs) with deep operator networks (DeepONet) to enable accurate and computationally efficient parameter identification in mean-value diesel engine models. Our method leverages physics-based system knowledge in combination with data-driven training of neural networks to enhance model applicability. Incorporating offline-trained DeepONets to predict actuator dynamics significantly lowers the online computation cost when compared to the existing PINN framework. To address the re-training burden typical of PINNs under varying input conditions, we propose two transfer learning (TL) strategies: (i) a multi-stage TL scheme offering better runtime efficiency than full online training of the PINN model and (ii) a few-shot TL scheme that freezes a shared multi-head network body and computes physics-based derivatives required for model training outside the training loop. The second strategy offers a computationally inexpensive, physics-based approach to predicting engine dynamics and identifying parameters, improving on the computational efficiency of the existing PINN framework. Compared to existing health monitoring methods, our framework combines the interpretability of physics-based models with the flexibility of deep learning, offering substantial gains in generalization, accuracy, and deployment efficiency for diesel engine diagnostics.

NeurIPS Conference 2025 Conference Paper

CodeAssistBench (CAB): Dataset & Benchmarking for Multi-turn Chat-Based Code Assistance

  • Myeongsoo Kim
  • Shweta Garg
  • Baishakhi Ray
  • Varun Kumar
  • Anoop Deoras

Programming assistants powered by large language models have transformed software development, yet most benchmarks focus narrowly on code generation tasks. Recent efforts like InfiBench and StackEval attempt to address this gap using Stack Overflow data but remain limited to single-turn interactions in isolated contexts, require significant manual curation, and fail to represent complete project environments. We introduce CodeAssistBench (CAB), the first benchmark framework for evaluating multi-turn programming assistance in realistic settings that address questions grounded in actual codebases. Unlike existing programming Q&A benchmarks, CAB automatically generates scalable datasets from GitHub issues tagged with questions using configurable parameters (e.g., repository creation date, star count, programming languages), and includes automatic containerization of codebases for evaluation. It then evaluates models through simulated users in these containerized environments with full codebase access. Using this framework, we constructed a test set of 3,286 real-world programming questions across 214 repositories, spanning seven programming languages and diverse problem domains. Our evaluation of leading LLMs reveals a substantial capability gap: while models perform well on Stack Overflow questions with success rates of 70-83%, they resolve only up to 16.49% of CAB's issues from recent repositories (post-training cutoff). This discrepancy highlights the challenges of providing assistance in complex, project-specific contexts versus answering standalone questions. Our fully automated framework enables continuous benchmark expansion and is available at https://github.com/amazon-science/CodeAssistBench/.

ICML Conference 2024 Conference Paper

Fewer Truncations Improve Language Modeling

  • Hantian Ding
  • Zijian Wang 0002
  • Giovanni Paolini
  • Varun Kumar
  • Anoop Deoras
  • Dan Roth 0001
  • Stefano Soatto

In large language model training, input documents are typically concatenated together and then split into sequences of equal length to avoid padding tokens. Despite its efficiency, the concatenation approach compromises data integrity: it inevitably breaks many documents into incomplete pieces, leading to excessive truncations that hinder the model from learning to compose logically coherent and factually consistent content that is grounded on the complete context. To address the issue, we propose Best-fit Packing, a scalable and efficient method that packs documents into training sequences through length-aware combinatorial optimization. Our method completely eliminates unnecessary truncations while retaining the same training efficiency as concatenation. Empirical results from both text and code pre-training show that our method achieves superior performance (e.g., +4.7% on reading comprehension; +16.8% in context following; and +9.2% on program synthesis), and reduces closed-domain hallucination effectively by up to 58.3%.
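The packing idea in this abstract can be illustrated with a small sketch. The abstract does not spell out the algorithm, so the code below assumes a standard best-fit-decreasing bin-packing heuristic, and it assumes that only documents longer than the sequence length are split into chunks. The function name `best_fit_pack` and its parameters are hypothetical and are not taken from the paper.

```python
from bisect import bisect_left, insort


def best_fit_pack(doc_lengths, max_len):
    """Pack documents (given as token counts) into sequences of capacity max_len.

    Illustrative best-fit-decreasing heuristic, not the paper's implementation.
    Returns a list of sequences; each sequence is a list of (length, doc_id).
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")

    # Assumption: a document longer than max_len is cut into chunks of at
    # most max_len tokens; shorter documents are never split.
    chunks = []
    for doc_id, n in enumerate(doc_lengths):
        for start in range(0, n, max_len):
            chunks.append((min(max_len, n - start), doc_id))

    # Place the longest chunks first.
    chunks.sort(reverse=True)

    bins = []  # bins[i] holds the chunks assigned to sequence i
    free = []  # sorted list of (remaining_capacity, bin_index)

    for length, doc_id in chunks:
        # Tightest fit: the smallest remaining capacity that is >= length.
        i = bisect_left(free, (length, -1))
        if i < len(free):
            remaining, b = free.pop(i)
            bins[b].append((length, doc_id))
        else:
            b = len(bins)
            bins.append([(length, doc_id)])
            remaining = max_len
        remaining -= length
        if remaining > 0:
            insort(free, (remaining, b))

    return bins
```

For example, `best_fit_pack([5, 3, 8, 2, 6], max_len=8)` places the five documents into three sequences without splitting any of them, whereas plain concatenate-and-split would cut documents wherever a sequence boundary happens to fall.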

NeurIPS Conference 2024 Conference Paper

LeDex: Training LLMs to Better Self-Debug and Explain Code

  • Nan Jiang
  • Xiaopeng Li
  • Shiqi Wang
  • Qiang Zhou
  • Soneya B. Hossain
  • Baishakhi Ray
  • Varun Kumar
  • Xiaofei Ma

In the domain of code generation, self-debugging is crucial. It allows LLMs to refine their generated code based on execution feedback. This is particularly important because generating correct solutions in one attempt proves challenging for complex tasks. Prior works on self-debugging mostly focus on prompting methods by providing LLMs with few-shot examples, which work poorly on small open-sourced LLMs. In this work, we propose LeDex, a training framework that significantly improves the self-debugging capability of LLMs. Intuitively, we observe that a chain of explanations on the wrong code followed by code refinement helps LLMs better analyze the wrong code and do refinement. We thus propose an automated pipeline to collect a high-quality dataset for code explanation and refinement by generating a number of explanations and refinement trajectories from the LLM itself or a larger teacher model and filtering via execution verification. We perform supervised fine-tuning (SFT) and further reinforcement learning (RL) on both success and failure trajectories with a novel reward design considering code explanation and refinement quality. SFT improves pass@1 by up to 15.92% and pass@10 by up to 9.30% over four benchmarks. RL training brings an additional improvement of up to 3.54% on pass@1 and 2.55% on pass@10. The trained LLMs show iterative refinement ability and can keep refining code continuously. Lastly, our human evaluation shows that the LLMs trained with our framework generate more useful code explanations and help developers better understand bugs in source code.
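The pass@1 and pass@10 figures above refer to the pass@k metric. The abstract does not say how the metric was estimated; a commonly used choice is the unbiased estimator 1 - C(n - c, k) / C(n, k), where n samples are generated per problem and c of them pass the tests. The sketch below assumes that estimator, and the function name `pass_at_k` is illustrative.

```python
from math import comb


def pass_at_k(n, c, k):
    """Estimate pass@k for one problem from n samples, c of which are correct.

    Computes 1 - C(n - c, k) / C(n, k): the probability that a random
    subset of k of the n samples contains at least one correct sample.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: every subset of size k has a hit.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

A benchmark-level score is then the mean of `pass_at_k` over all problems. With k = 1 the estimator reduces to the fraction of correct samples, c / n.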

ICLR Conference 2023 Conference Paper

Multi-lingual Evaluation of Code Generation Models

  • Ben Athiwaratkun
  • Sanjay Krishna Gouda
  • Zijian Wang 0002
  • Xiaopeng Li 0002
  • Yuchen Tian
  • Ming Tan
  • Wasi Uddin Ahmad
  • Shiqi Wang 0002

We present two new benchmarks, MBXP and Multilingual HumanEval, designed to evaluate code completion models in over 10 programming languages. These datasets are generated using a conversion framework that transpiles prompts and test cases from the original MBPP and HumanEval datasets into the corresponding data in the target language. Using these benchmarks, we assess the performance of code generation models in a multi-lingual fashion and discover the generalization ability of language models on out-of-domain languages, the advantages of multi-lingual models over mono-lingual ones, the ability of few-shot prompting to teach the model new languages, and zero-shot translation abilities. In addition, we use our code generation model to perform large-scale bootstrapping to obtain synthetic canonical solutions in several languages, which can be used for other code-related evaluations such as code insertion, robustness, or summarization tasks.

RLDM Conference 2019 Conference Abstract

Graph-DQN: Fast generalization to novel objects using prior relational knowledge

  • Varun Kumar
  • Hanlin Tang
  • Arjun K Bansal

Humans have a remarkable ability to both generalize known actions to novel objects, and reason about novel objects once their relationship to known objects is understood. For example, on being told a novel object (e.g., bees) is to be avoided, we readily apply our prior experience avoiding known objects without needing to experience a sting. Deep Reinforcement Learning (RL) has achieved many remarkable successes in recent years including results with Atari games and Go that have matched or exceeded human performance. While a human playing Atari games can, with a few sentences of natural language instruction, quickly reach a decent level of performance, modern end-to-end deep reinforcement learning methods still require millions of frames of experience. Past studies have hypothesized a role for prior knowledge in addressing this gap between human performance and Deep RL. However, scalable approaches for combining prior or instructional knowledge with deep reinforcement learning have remained elusive. We introduce a graph convolution based reinforcement learning architecture (Graph-DQN) for combining prior information, structured as a knowledge graph, with the visual scene, and demonstrate that this approach is able to generalize to novel objects whereas the baseline algorithms fail. Ablation experiments show that the agents apply learned self-object relationships to novel objects at test time. In both a Warehouse game and the more complex Pacman environment, Graph-DQN is also more sample efficient, reaching the same performance in fewer episodes compared to the baseline. Once the Graph-DQN is trained, we can manipulate agent behavior by modifying the knowledge graph in semantically meaningful ways. These results suggest that Graph-DQNs provide a framework for agents to reason over structured knowledge graphs while still leveraging gradient based learning approaches.