Arrow Research search

Author name cluster

Federico Mora

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

4 papers
1 author row

Possible papers

4

AAAI Conference 2025 Conference Paper

Online Prompt Selection for Program Synthesis

  • Yixuan Li
  • Lewis Frampton
  • Federico Mora
  • Elizabeth Polgreen

Large Language Models (LLMs) demonstrate impressive capabilities in the domain of program synthesis. This level of performance is not, however, universal across all tasks, all LLMs and all prompting styles. There are many areas where one LLM dominates, one prompting style dominates, or where calling a symbolic solver is a better choice than an LLM. A key challenge for the user then, is to identify not only when an LLM is the right choice of solver, and the appropriate LLM to call for a given synthesis task, but also the right way to call it. A non-expert user who makes the wrong choice, incurs a cost both in terms of results (number of tasks solved, and the time it takes to solve them) and financial cost, if using a closed-source language model via a commercial API. We frame this choice as an online learning problem. We use a multi-armed bandit algorithm to select which symbolic solver, or LLM and prompt combination to deploy in order to maximize a given reward function (which may prioritize solving time, number of synthesis tasks solved, or financial cost of solving). We implement an instance of this approach, called \name, and evaluate it on synthesis queries from the literature in ranking function synthesis, from the syntax-guided synthesis competition, and fresh, unseen queries generated from SMT problems. Cyanea solves 37.2 % more queries than the best single solver and achieves results within 4 % of the virtual best solver.

AAAI Conference 2024 Conference Paper

An Eager Satisfiability Modulo Theories Solver for Algebraic Datatypes

  • Amar Shah
  • Federico Mora
  • Sanjit A. Seshia

Algebraic data types (ADTs) are a construct classically found in functional programming languages that capture data structures like enumerated types, lists, and trees. In recent years, interest in ADTs has increased. For example, popular programming languages, like Python, have added support for ADTs. Automated reasoning about ADTs can be done using satisfiability modulo theories (SMT) solving, an extension of the Boolean satisfiability problem with first-order logic and associated background theories. Unfortunately, SMT solvers that support ADTs do not scale as state-of-the-art approaches all use variations of the same lazy approach. In this paper, we present an SMT solver that takes a fundamentally different approach, an eager approach. Specifically, our solver reduces ADT queries to a simpler logical theory, uninterpreted functions (UF), and then uses an existing solver on the reduced query. We prove the soundness and completeness of our approach and demonstrate that it outperforms the state of the art on existing benchmarks, as well as a new, more challenging benchmark set from the planning domain.

NeurIPS Conference 2024 Conference Paper

Synthetic Programming Elicitation for Text-to-Code in Very Low-Resource Programming and Formal Languages

  • Federico Mora
  • Justin Wong
  • Haley Lepe
  • Sahil Bhatia
  • Karim Elmaaroufi
  • George Varghese
  • Joseph E. González
  • Elizabeth Polgreen

Recent advances in large language models (LLMs) for code applications have demonstrated remarkable zero-shot fluency and instruction following on challenging code related tasks ranging from test case generation to self-repair. Unsurprisingly, however, models struggle to compose syntactically valid programs in programming languages unrepresented in pre-training, referred to as very low-resource Programming Languages (VLPLs). VLPLs appear in crucial settings, including domain-specific languages for internal tools, tool-chains for legacy languages, and formal verification frameworks. Inspired by a technique called natural programming elicitation, we propose designing an intermediate language that LLMs ``naturally'' know how to use and which can be automatically compiled to a target VLPL. When LLMs generate code that lies outside of this intermediate language, we use compiler techniques to repair the code into programs in the intermediate language. Overall, we introduce synthetic programming elicitation and compilation (SPEAC), an approach that enables LLMs to generate syntactically valid code even for VLPLs. We empirically evaluate the performance of SPEAC in a case study for the UCLID5 formal verification language and find that, compared to existing retrieval and fine-tuning baselines, SPEAC produces syntactically correct programs more frequently and without sacrificing semantic correctness.

TCS Journal 2023 Journal Article

Towards more efficient methods for solving regular-expression heavy string constraints

  • Murphy Berzish
  • Joel D. Day
  • Vijay Ganesh
  • Mitja Kulczynski
  • Florin Manea
  • Federico Mora
  • Dirk Nowotka

Widespread use of string solvers in the formal analysis of string-heavy programs has led to a growing demand for more efficient and reliable techniques which can be applied in this context. Designing practical algorithms for the (generally undecidable) satisfiability problem for systems of string constraints requires a deep understanding of the structure of constraints present in the targeted cases. In this paper, we first investigate relevant benchmarks containing regular expression membership predicates, extract a series of first order logic theories, and prove the decidability, respectively undecidability, of their satisfiability problem. Further, building on the theoretical results, we present a novel length-aware solving algorithm for the quantifier-free first-order theory over regular expression membership predicates and linear arithmetic over string length. We implement and evaluate this algorithm and related heuristics in the Z3 theorem prover. Besides the theoretical aspects leading to this algorithm, a crucial insight that underpins it is that real-world regex and string formulas contain a wealth of information about upper and lower bounds on lengths of strings, and such information can be used very effectively to simplify operations on automata representing regular expressions. Additionally, we present several novel general heuristics, such as the prefix/suffix method, that can be used to make a variety of regex solving algorithms more efficient in practice. We showcase the power of our algorithm and heuristics via an extensive empirical evaluation over a large and diverse benchmark of 57256 regex-heavy instances, almost 75% of which are derived from industrial applications or contributed by other solver developers. Our solver outperforms five other state-of-the-art string solvers over this benchmark.