NeurIPS Conference 2025 Conference Paper
- Wei Zhou
- Guoliang Li
- Haoyu Wang
- Yuxing Han
- Xufei Wu
- Fan Wu
- Xuanhe Zhou
Large language models (LLMs) have shown increasing effectiveness in Text-to-SQL tasks. However, a closely related problem, Cross-System SQL Translation (a.k.a. SQL-to-SQL), which adapts a query written for one database system (e.g., MySQL) into an equivalent query for another system (e.g., ClickHouse), is of great practical importance but remains underexplored. Existing SQL benchmarks are not well suited for SQL-to-SQL evaluation because they (1) focus on a limited set of database systems (often just SQLite) and (2) cannot capture many system-specific SQL dialects (e.g., customized functions, data types, and syntax rules). Thus, in this paper, we introduce PARROT, a Practical And Realistic BenchmaRk for CrOss-System SQL Translation. PARROT comprises 598 translation pairs from 38 open-source benchmarks and real-world business services, specifically prepared to challenge system-specific SQL understanding (e.g., LLMs achieve lower than 38.53% accuracy on average). We also provide multiple benchmark variants, including PARROT-Diverse with 28,003 translations (for extensive syntax testing) and PARROT-Simple with 5,306 representative samples (for focused stress testing), covering 22 production-grade database systems. To promote future research, we release a public leaderboard and source code at: https://code4db.github.io/parrot-bench/.