Arrow Research search

Author name cluster

Sumit Gulwani

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

20 papers
2 author rows

Possible papers

20

ICLR Conference 2025 Conference Paper

Execution-guided within-prompt search for programming-by-example

  • Gust Verbruggen
  • Ashish Tiwari 0001
  • Mukul Singh
  • Vu Le 0002
  • Sumit Gulwani

Large language models (LLMs) can generate code from examples without being limited to a DSL, but they lack search, as sampled programs are independent. In this paper, we use an LLM as a policy that generates lines of code and then join these lines of code to let the LLM implicitly estimate the value of each of these lines in its next iteration. We further guide the policy and value estimation by executing each line and annotating it with its results on the given examples. This allows us to search for programs within a single (expanding) prompt until a sound program is found, by letting the policy reason in both the syntactic (code) and semantic (execution) space. We evaluate within-prompt search on straight-line Python code generation using five benchmarks across different domains (strings, lists, and arbitrary Python programming problems). We show that the model uses the execution results to guide the search and that within-prompt search performs well at low token budgets. We also analyze how the model behaves as a policy and value, show that it can parallelize the search, and that it can implicitly backtrack over earlier generations.

AAAI Conference 2024 System Paper

EmFORE: Learning Email Folder Classification Rules by Demonstration

  • Mukul Singh
  • Gust Verbruggen
  • José Cambronero
  • Vu Le
  • Sumit Gulwani

Tools that help with email folder management are limited, as users have to manually write rules to assign emails to folders. We present EMFORE, an iterative learning system that automatically learns and updates such rules from observations. EMFORE is fast enough to suggest and update rules in real time and suppresses mails with low confidence to reduce the number of false positives. EMFORE can use different rule grammars, and thus be adapted to different clients, without changing the user experience. Previous methods do not learn rules, require complete retraining or multiple new examples after making a mistake, and do not distinguish between inbox and other folders. EMFORE learns rules incrementally and can make the neutral decision of leaving emails in the inbox, making it an ideal candidate for integration in email clients.

AAAI Conference 2024 Conference Paper

FLAME: A Small Language Model for Spreadsheet Formulas

  • Harshit Joshi
  • Abishai Ebenezer
  • José Cambronero Sanchez
  • Sumit Gulwani
  • Aditya Kanade
  • Vu Le
  • Ivan Radiček
  • Gust Verbruggen

Spreadsheets are a vital tool for end-user data management. Using large language models for formula authoring assistance in these environments can be difficult, as these models are expensive to train and challenging to deploy due to their size (up to billions of parameters). We present FLAME, a transformer-based model trained exclusively on Excel formulas that leverages domain insights to achieve competitive performance while being substantially smaller (60M parameters) and training on two orders of magnitude less data. We curate a training dataset using sketch deduplication, introduce an Excel-specific formula tokenizer, and use domain-specific versions of masked span prediction and noisy auto-encoding as pre-training objectives. We evaluate FLAME on formula repair, formula completion, and similarity-based formula retrieval. FLAME can outperform much larger models, such as the Davinci (175B) and Cushman (12B) variants of Codex and CodeT5 (220M), in 10 of 14 evaluation settings for the repair and completion tasks. For formula retrieval, FLAME outperforms CodeT5, CodeBERT, and GraphCodeBERT.

AAAI Conference 2023 Conference Paper

Repair Is Nearly Generation: Multilingual Program Repair with LLMs

  • Harshit Joshi
  • José Cambronero Sanchez
  • Sumit Gulwani
  • Vu Le
  • Gust Verbruggen
  • Ivan Radiček

Most programmers make mistakes when writing code. Some of these mistakes are small and require few edits to the original program – a class of errors recently termed last mile mistakes. These errors break the flow for experienced developers and can stump novice programmers. Existing automated repair techniques targeting this class of errors are language-specific and do not easily carry over to new languages. Transferring symbolic approaches requires substantial engineering and neural approaches require data and retraining. We introduce RING, a multilingual repair engine powered by a large language model trained on code (LLMC) such as Codex. Such a multilingual engine enables a flipped model for programming assistance, one where the programmer writes code and the AI assistance suggests fixes, compared to traditional code suggestion technology. Taking inspiration from the way programmers manually fix bugs, we show that a prompt-based strategy that conceptualizes repair as localization, transformation, and candidate ranking, can successfully repair programs in multiple languages with minimal effort. We present the first results for such a multilingual repair engine by evaluating on 6 different languages and comparing performance to language-specific repair engines. We show that RING can outperform language-specific repair engines for three of these languages.

ICLR Conference 2022 Conference Paper

Synchromesh: Reliable Code Generation from Pre-trained Language Models

  • Gabriel Poesia
  • Alex Polozov
  • Vu Le 0002
  • Ashish Tiwari 0001
  • Gustavo Soares
  • Christopher Meek
  • Sumit Gulwani

Large pre-trained language models have been used to generate code, providing a flexible interface for synthesizing programs from natural language specifications. However, they often violate syntactic and semantic rules of their output language, limiting their practical usability. In this paper, we propose Synchromesh: a framework for substantially improving the reliability of pre-trained models for code generation. Synchromesh comprises two components. First, it retrieves few-shot examples from a training bank using Target Similarity Tuning (TST), a novel method for semantic example selection. TST learns to recognize utterances that describe similar target programs despite of differences in surface natural language features. Then, Synchromesh feeds the examples to a pre-trained language model and samples programs using Constrained Semantic Decoding (CSD): a general framework for constraining the output to a set of valid programs in the target language. CSD leverages constraints on partial outputs to sample complete correct programs, and needs neither re-training nor fine-tuning of the language model. We evaluate our methods by synthesizing code from natural language descriptions using GPT-3 and Codex in three real-world languages: SQL queries, Vega-Lite visualizations and SMCalFlow programs. These domains showcase rich constraints that CSD is able to enforce, including syntax, scoping and typing rules. Across all languages, we observe complementary gains from CSD and TST in prediction accuracy and in effectively preventing parsing, type and run-time errors.

AAAI Conference 2018 Conference Paper

Disjunctive Program Synthesis: A Robust Approach to Programming by Example

  • Mohammad Raza
  • Sumit Gulwani

Programming by example (PBE) systems allow end users to easily create programs by providing a few input-output examples to specify their intended task. The system attempts to generate a program in a domain specific language (DSL) that satisfies the given examples. However, a key challenge faced by existing PBE techniques is to ensure the robustness of the programs that are synthesized from a small number of examples, as these programs often fail when applied to new inputs. This is because there can be many possible programs satisfying a small number of examples, and the PBE system has to somehow rank between these candidates and choose the correct one without any further information from the user. In this work we present a different approach to PBE in which the system avoids making a ranking decision at the synthesis stage, by instead synthesizing a disjunctive program that includes the many possible top-ranked programs as possible alternatives and selects between these different choices upon execution on a new input. This delayed choice brings the important benefit of comparing the possible outputs produced by the different disjuncts on a given input at execution time. We present a generic framework for synthesizing such disjunctive programs in arbitrary DSLs, and describe two concrete implementations of disjunctive synthesis in the practical domains of data extraction from plain text and HTML documents. We present an evaluation showing the significant increase in robustness achieved with our disjunctive approach, as illustrated by an increase from 59% to 93% of tasks for which correct programs can be learnt from a single example.

AAAI Conference 2017 Conference Paper

Automated Data Extraction Using Predictive Program Synthesis

  • Mohammad Raza
  • Sumit Gulwani

In recent years there has been rising interest in the use of programming-by-example techniques to assist users in data manipulation tasks. Such techniques rely on an explicit inputoutput examples specification from the user to automatically synthesize programs. However, in a wide range of data extraction tasks it is easy for a human observer to predict the desired extraction by just observing the input data itself. Such predictive intelligence has not yet been explored in program synthesis research, and is what we address in this work. We describe a predictive program synthesis algorithm that infers programs in a general form of extraction DSLs (domain specific languages) given input-only examples. We describe concrete instantiations of such DSLs and the synthesis algorithm in the two practical application domains of text extraction and web extraction, and present an evaluation of our technique on a range of extraction tasks encountered in practice.

IJCAI Conference 2017 Conference Paper

Learning to Learn Programs from Examples: Going Beyond Program Structure

  • Kevin Ellis
  • Sumit Gulwani

Programming-by-example technologies let end users construct and run new programs by providing examples of the intended program behavior. But, the few provided examples seldom uniquely determine the intended program. Previous approaches to picking a program used a bias toward shorter or more naturally structured programs. Our work here gives a machine learning approach for learning to learn programs that departs from previous work by relying upon features that are independent of the program structure, instead relying upon a learned bias over program behaviors, and more generally over program execution traces. Our approach leverages abundant unlabeled data for semisupervised learning, and incorporates simple kinds of world knowledge for common-sense reasoning during program induction. These techniques are evaluated in two programming-by-example domains, improving the accuracy of program learners.

AAAI Conference 2015 Conference Paper

Automatic Generation of Alternative Starting Positions for Simple Traditional Board Games

  • Umair Ahmed
  • Krishnendu Chatterjee
  • Sumit Gulwani

Simple board games, like Tic-Tac-Toe and CONNECT-4, play an important role not only in the development of mathematical and logical skills, but also in the emotional and social development. In this paper, we address the problem of generating targeted starting positions for such games. This can facilitate new approaches for bringing novice players to mastery, and also leads to discovery of interesting game variants. We present an approach that generates starting states of varying hardness levels for player 1 in a two-player board game, given rules of the board game, the desired number of steps required for player 1 to win, and the expertise levels of the two players. Our approach leverages symbolic methods and iterative simulation to efficiently search the extremely large state space. We present experimental results that include discovery of states of varying hardness levels for several simple gridbased board games. The presence of such states for standard game variants like 4 × 4 Tic-Tac-Toe opens up new games to be played that have never been played as the default start state is heavily biased.

IJCAI Conference 2015 Conference Paper

Compositional Program Synthesis from Natural Language and Examples

  • Mohammad Raza
  • Sumit Gulwani
  • Natasa Milic-Frayling

Compositionality is a fundamental notion in computation whereby complex abstractions can be constructed from simpler ones, yet this property has so far escaped the paradigm of end-user programming from examples or natural language. Existing approaches restrict end users to only give holistic specifications of tasks, which limits the expressivity and scalability of these approaches to relatively simple programs in very restricted domains. In this paper we propose Compositional Program Synthesis (CPS): an approach in which tasks can be specified in a compositional manner through a combination of natural language and examples. We present a domain-agnostic program synthesis algorithm and demonstrate its application to an expressive string manipulation language. We evaluate our approach on complex tasks from online help forums that are beyond the scope of current state-ofthe-art methods.

IJCAI Conference 2015 Conference Paper

FlashNormalize: Programming by Examples for Text Normalization

  • Dileep Kini
  • Sumit Gulwani

Several applications including text-to-speech require some normalized format of non-standard words in various domains such as numbers, dates, and currencies and in various human languages. The traditional approach of manually constructing a program for such a normalization task requires expertise in both programming and target (human) language and further does not scale to a large number of domain, format, and target language combinations. We propose to learn programs for such normalization tasks through examples. We present a domainspecific programming language that offers appropriate abstractions for succinctly describing such normalization tasks, and then present a novel search algorithm that can effectively learn programs in this language from input-output examples. We also briefly describe domain-specific heuristics for guiding users of our system to provide representative examples for normalization tasks related to that domain. Our experiments show that we are able to effectively learn desired programs for a variety of normalization tasks.

IJCAI Conference 2015 Conference Paper

Personalized Mathematical Word Problem Generation

  • Oleksandr Polozov
  • Eleanor O'Rourke
  • Adam M. Smith
  • Luke Zettlemoyer
  • Sumit Gulwani
  • Zoran Popović

Word problems are an established technique for teaching mathematical modeling skills in K-12 education. However, many students find word problems unconnected to their lives, artificial, and uninteresting. Most students find them much more difficult than the corresponding symbolic representations. To account for this phenomenon, an ideal pedagogy might involve an individually crafted progression of unique word problems that form a personalized plot. We propose a novel technique for automatic generation of personalized word problems. In our system, word problems are generated from general specifications using answer-set programming (ASP). The specifications include tutor requirements (properties of a mathematical model), and student requirements (personalization, characters, setting). Our system takes a logical encoding of the specification, synthesizes a word problem narrative and its mathematical model as a labeled logical plot graph, and realizes the problem in natural language. Human judges found our problems as solvable as the textbook problems, with a slightly more artificial language.

AAAI Conference 2014 Conference Paper

Programming by Example Using Least General Generalizations

  • Mohammad Raza
  • Sumit Gulwani
  • Natasa Milic-Frayling

Recent advances in Programming by Example (PBE) have supported new applications to text editing, but existing approaches are limited to simple text strings. In this paper we address transformations in richly formatted documents, using an approach based on the idea of least general generalizations from inductive inference, which avoids the scalability issues faced by stateof-the-art PBE methods. We describe a novel domain specific language (DSL) that expresses transformations over XML structures describing richly formatted content, and a synthesis algorithm that generates a minimal program with respect to a natural subsumption ordering in our DSL. We present experimental results on tasks collected from online help forums, showing an average of 4. 17 examples required for task completion.

AAAI Conference 2014 Conference Paper

Synthesis of Geometry Proof Problems

  • Chris Alvin
  • Sumit Gulwani
  • Rupak Majumdar
  • Supratik Mukhopadhyay

This paper presents a semi-automated methodology for generating geometric proof problems of the kind found in a highschool curriculum. We formalize the notion of a geometry proof problem and describe an algorithm for generating such problems over a user-provided figure. Our experimental results indicate that our problem generation algorithm can effectively generate proof problems in elementary geometry. On a corpus of 110 figures taken from popular geometry textbooks, our system generated an average of about 443 problems per figure in an average time of 4. 7 seconds per figure.

ICML Conference 2013 Conference Paper

A Machine Learning Framework for Programming by Example

  • Aditya Krishna Menon
  • Omer Tamuz
  • Sumit Gulwani
  • Butler W. Lampson
  • Adam Tauman Kalai

Learning programs is a timely and interesting challenge. In Programming by Example (PBE), a system attempts to infer a program from input and output examples alone, by searching for a composition of some set of base functions. We show how machine learning can be used to speed up this seemingly hopeless search problem, by learning weights that relate textual features describing the provided input-output examples to plausible sub-components of a program. This generic learning framework lets us address problems beyond the scope of earlier PBE systems. Experiments on a prototype implementation show that learning improves search and ranking on a variety of text processing tasks found on help forums.

IJCAI Conference 2013 Conference Paper

Automated Grading of DFA Constructions

  • Rajeev Alur
  • Loris D'Antoni
  • Sumit Gulwani
  • Dileep Kini
  • Mahesh Viswanathan

One challenge in making online education more effective is to develop automatic grading software that can provide meaningful feedback. This paper provides a solution to automatic grading of the standard computation-theory problem that asks a student to construct a deterministic finite automaton (DFA) from the given description of its language. We focus on how to assign partial grades for incorrect answers. Each student’s answer is compared to the correct DFA using a hybrid of three techniques devised to capture different classes of errors. First, in an attempt to catch syntactic mistakes, we compute the edit distance between the two DFA descriptions. Second, we consider the entropy of the symmetric difference of the languages of the two DFAs, and compute a score that estimates the fraction of the number of strings on which the student answer is wrong. Our third technique is aimed at capturing mistakes in reading of the problem description. For this purpose, we consider a description language MOSEL, which adds syntactic sugar to the classical Monadic Second Order Logic, and allows defining regular languages in a concise and natural way. We provide algorithms, along with optimizations, for transforming MOSEL descriptions into DFAs and vice-versa. These allow us to compute the syntactic edit distance of the incorrect answer from the correct one in terms of their logical representations. We report an experimental study that evaluates hundreds of answers submitted by (real) students by comparing grades/feedback computed by our tool with human graders. Our conclusion is that the tool is able to assign partial grades in a meaningful way, and should be preferred over the human graders for both scalability and consistency.

Highlights Conference 2013 Conference Abstract

Automated grading of DFA constructions

  • Rajeev Alur
  • Loris D'Antoni
  • Sumit Gulwani
  • Dileep Kini
  • Mahesh Viswanathan

One challenge in making online education more effective is to develop automatic grading software that can provide meaningful feedback. This paper provides a solution to automatic grading of the standard computation-theory problem that asks a student to construct a deterministic finite automaton from the given description of its language. We focus on how to assign partial grades for incorrect answers. We experimentally show that the tool is able to assign partial grades in a meaningful way, and should be preferred over the human graders for both scalability and consistency. 13: 00 14: 00 Lunch

IJCAI Conference 2013 Conference Paper

Automatically Generating Problems and Solutions for Natural Deduction

  • Umair Z. Ahmed
  • Sumit Gulwani
  • Amey Karkare

Natural deduction, which is a method for establishing validity of propositional type arguments, helps develop important reasoning skills and is thus a key ingredient in a course on introductory logic. We present two core components, namely solution generation and practice problem generation, for enabling computer-aided education for this important subject domain. The key enabling technology is use of an offline-computed data-structure called Universal Proof Graph (UPG) that encodes all possible applications of inference rules over all small propositions abstracted using their bitvector-based truth-table representation. This allows an efficient forward search for solution generation. More interestingly, this allows generating fresh practice problems that have given solution characteristics by performing a backward search in UPG. We obtained around 300 natural deduction problems from various textbooks. Our solution generation procedure can solve many more problems than the traditional forward-chaining based procedure, while our problem generation procedure can efficiently generate several variants with desired characteristics.

LPAR Conference 2013 Conference Paper

Solving Geometry Problems Using a Combination of Symbolic and Numerical Reasoning

  • Shachar Itzhaky
  • Sumit Gulwani
  • Neil Immerman
  • Shmuel Sagiv

Abstract We describe a framework that combines deductive, numeric, and inductive reasoning to solve geometric problems. Applications include the generation of geometric models and animations, as well as problem solving in the context of intelligent tutoring systems. Our novel methodology uses (i) deductive reasoning to generate a partial program from logical constraints, (ii) numerical methods to evaluate the partial program, thus creating geometric models which are solutions to the original problem, and (iii) inductive synthesis to read off new constraints that are then applied to one more round of deductive reasoning leading to the desired deterministic program. By the combination of methods we were able to solve problems that each of the methods was not able to solve by itself. The number of nondeterministic choices in a partial program provides a measure of how close a problem is to being solved and can thus be used in the educational context for grading and providing hints. We have successfully evaluated our methodology on 18 Scholastic Aptitude Test geometry problems, and 11 ruler/compass-based geometry construction problems. Our tool solved these problems using an average of a few seconds per problem.

AAAI Conference 2012 Conference Paper

Automatically Generating Algebra Problems

  • Rohit Singh
  • Sumit Gulwani
  • Sriram Rajamani

We propose computer-assisted techniques for helping with pedagogy in Algebra. In particular, given a proof problem p (of the form Left-hand-side-term = Righthand-side-term), we show how to automatically generate problems that are similar to p. We believe that such a tool can be used by teachers in making examinations where they need to test students on problems similar to what they taught in class, and by students in generating practice problems tailored to their specific needs. Our first insight is that we can generalize p syntactically to a query Q that implicitly represents a set of problems [[Q]] (which includes p). Our second insight is that we can explore the space of problems [[Q]] automatically, use classical results from polynomial identity testing to generate only those problems in [[Q]] that are correct, and then use pruning techniques to generate only unique and interesting problems. Our third insight is that with a small amount of manual tuning on the query Q, the user can interactively guide the computer to generate problems of interest to her. We present the technical details of the above mentioned steps, and also describe a tool where these steps have been implemented. We also present an empirical evaluation on a wide variety of problems from various sub-fields of algebra including polynomials, trigonometry, calculus, determinants etc. Our tool is able to generate a rich corpus of similar problems from each given problem; while some of these similar problems were already present in the textbook, several were new!