
TMLR 2025

CodeLutra: Boosting LLM Code Generation via Preference-Guided Refinement

Journal Article · Artificial Intelligence · Machine Learning

Abstract

Large Language Models (LLMs) have revolutionized code generation but require significant resources and tend to over-generalize, limiting their task-specific efficiency. Fine-tuning smaller, open-source LLMs is a cost-effective alternative, yet standard supervised approaches rely solely on correct examples, overlooking valuable insights from failures. We introduce CodeLutra, a new framework that leverages both correct and incorrect code attempts. Instead of purely instructing with correct solutions, CodeLutra uses iterative preference-based refinement, comparing successful and failed outputs to better approximate desired results. This process narrows the performance gap with state-of-the-art, larger models, without requiring massive datasets or auxiliary models. For example, on a challenging data science coding task, using only 500 samples improved Llama-3-8B's accuracy from 28.2% to 48.6%, approaching GPT-4's level. By capitalizing on both successes and mistakes, CodeLutra offers a scalable, efficient path to high-quality code generation, making smaller open-source models more competitive with leading closed-source alternatives.
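A minimal sketch of the kind of preference-pair construction the abstract describes: candidate solutions are checked by execution, and passing attempts are paired as "chosen" against failing ones as "rejected" for preference-based fine-tuning (e.g., a DPO-style objective). The function names (`run_tests`, `build_preference_pairs`) and the `solve`-function convention are illustrative assumptions, not the paper's actual API.

```python
def run_tests(code: str, tests: list) -> bool:
    """Execute a candidate solution and check every (input, expected) pair.

    Assumes the task asks for a function named `solve`; any exception
    (including a syntax error in the candidate) counts as a failure.
    """
    env = {}
    try:
        exec(code, env)
        fn = env["solve"]
        return all(fn(x) == y for x, y in tests)
    except Exception:
        return False


def build_preference_pairs(prompt: str, candidates: list, tests: list) -> list:
    """Pair every passing candidate (chosen) with every failing one (rejected)."""
    results = [(c, run_tests(c, tests)) for c in candidates]
    passed = [c for c, ok in results if ok]
    failed = [c for c, ok in results if not ok]
    return [
        {"prompt": prompt, "chosen": p, "rejected": f}
        for p in passed
        for f in failed
    ]


# Toy task: "double the input" with three sampled candidates.
candidates = [
    "def solve(x):\n    return x * 2",   # correct
    "def solve(x):\n    return x + 2",   # wrong logic
    "def solve(x) return x",             # syntax error
]
tests = [(1, 2), (3, 6)]
pairs = build_preference_pairs("Double the input.", candidates, tests)
print(len(pairs))  # 1 passing x 2 failing candidates -> 2 pairs
```

In an actual refinement loop, the model would be fine-tuned on these pairs and then re-sampled, iterating so that each round's failures keep supplying contrastive signal rather than being discarded.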

Authors

Keywords

No keywords are indexed for this paper.

Context

Venue
Transactions on Machine Learning Research
Archive span
2022-2026
Indexed papers
3849
Paper id
733030711686244099