
Generating correct SQL queries is a foundational requirement in data analysis and database-driven systems. I’m writing this because AI-powered SQL generators often perform well on simple prompts but fail when queries involve grouped calculations, aggregation boundaries, or multi-table relationships. These failures are subtle, easy to miss, and costly in production.
This article examines a deceptively simple question:
“What is the percentage of orders that have more than one product?”
Although the intent is clear, answering it correctly requires understanding SQL aggregation rules, grouping semantics, and execution constraints. This post evaluates how two popular open-source models—SQLCoder2 and SQLCoder-34B—approached the problem, why their outputs failed at execution time, and how fine-tuning a model with a targeted dataset produced a correct, executable result. The goal is to show where generic SQL generation breaks down and how fine-tuning improves reliability for structured queries.
Initial Attempts with Open-Source SQL Generators
To evaluate how existing AI SQL generators handle aggregation logic, the same natural-language prompt was tested against two widely used open-source models: SQLCoder2 and SQLCoder-34B. Both are designed to translate natural language into SQL and are commonly used for analytics workflows.
Prompt used:
What is the percentage of orders that have more than one product?
This prompt tests whether the model understands grouping, aggregation order, and percentage calculation across related tables.
SQLCoder2 generated the following SQL query:
SELECT CAST(COUNT(DISTINCT CASE WHEN COUNT(shopify_orderlineitem.id) > 1 THEN shopify_order.id END) AS FLOAT) / NULLIF(COUNT(DISTINCT shopify_order.id), 0) AS proportion
FROM shopify_order
JOIN shopify_orderlineitem ON shopify_order.id = shopify_orderlineitem.order_id;Execution error:
SQL Error [42803]: ERROR: aggregate function calls cannot be nestedThis failure occurred because SQL does not allow aggregate functions such as COUNT() to be nested directly inside another aggregate without a grouping boundary. Identifying orders with multiple line items requires a grouped subquery or HAVING clause, which this query does not provide.
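The same restriction can be reproduced in any mainstream SQL engine. The sketch below uses Python's built-in sqlite3 module with illustrative table names to show both the rejection of a nested aggregate and the grouped-subquery pattern that avoids it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY);
CREATE TABLE line_items (id INTEGER PRIMARY KEY, order_id INTEGER);
INSERT INTO orders VALUES (1), (2);
INSERT INTO line_items (order_id) VALUES (1), (1), (2);
""")

# Nesting one aggregate inside another is rejected, just as in PostgreSQL.
try:
    conn.execute("SELECT COUNT(COUNT(id)) FROM line_items")
    nested_rejected = False
except sqlite3.Error:
    nested_rejected = True

# The fix: establish a grouping boundary in a subquery, then aggregate over it.
fraction = conn.execute("""
    SELECT 1.0 * COUNT(*) / (SELECT COUNT(*) FROM orders)
    FROM (SELECT order_id FROM line_items
          GROUP BY order_id HAVING COUNT(*) > 1)
""").fetchone()[0]
print(fraction)  # order 1 has two line items out of two orders -> 0.5
```

The grouped subquery is the missing ingredient in both model outputs discussed here: it evaluates the per-order count first, then lets the outer query aggregate over the result.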
SQLCoder-34B produced the following query:
SELECT CAST(COUNT(DISTINCT o.id) filter (
WHERE COUNT(DISTINCT oi.product_id) > 1) AS FLOAT) / NULLIF(COUNT(DISTINCT o.id), 0) AS proportion
FROM shopify_order o
JOIN shopify_orderlineitem oi ON o.id = oi.order_id
GROUP BY o.id;
Execution error:
SQL Error [42803]: ERROR: aggregate functions are not allowed in FILTER
This query incorrectly places an aggregate condition inside a FILTER clause. SQL requires aggregation conditions such as COUNT(*) > 1 to be evaluated in a grouped context using HAVING, not inside a filter predicate.
The failures from both SQLCoder2 and SQLCoder-34B highlight common weaknesses in generic SQL generation models: each tried to evaluate an aggregate inside another aggregate or inside a filter predicate, instead of isolating the grouped condition in a subquery or HAVING clause. These limitations explain why syntactically plausible SQL can still fail at runtime and why domain-specific training is often required.
After identifying consistent structural failures in existing models, a custom fine-tuning approach was used to explicitly teach aggregation patterns and execution-safe SQL construction for this class of queries.
A custom dataset was built around the original prompt and variations involving grouped counts, percentage calculations, and multi-table joins. Each entry paired a natural-language question with an executable SQL query that followed correct aggregation rules.
The dataset contained 196 samples, split into training and validation sets to balance learning and generalization.
This split reserves a held-out validation portion alongside the training data, helping to ensure the model can generalize to new, unseen examples.
Fine-tuning completed in 43 minutes, demonstrating the efficiency of adapting an existing model rather than training from scratch. Training and validation loss curves remained aligned, indicating stable learning without overfitting.
To assess training quality, we plotted the training loss against the validation loss over the course of the training process. The two curves remained closely aligned, indicating a good fit: the fine-tuned model learned the target SQL patterns without simply memorizing the dataset.
By taking this custom training approach, we were able to address the limitations of existing models and create a solution tailored to our specific SQL generation needs. The next section will reveal the results of this training and the SQL query our model was able to produce.
After our custom training process, we achieved a significant improvement in the model's ability to generate the correct SQL query for our specific problem. Let's examine the successful query and analyze its structure and functionality.
The fine-tuned model generated the following executable SQL:
select
(COUNT(*) * 100.0 / (
select
COUNT(*)
from
shopify_order))
from
shopify_order
where
id in (
select
order_id
from
shopify_orderlineitem
group by
order_id
having
COUNT(*) > 1);
This query correctly isolates orders with multiple products before calculating the overall percentage.
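The generated query can be verified end to end. The sketch below runs it verbatim against a small in-memory SQLite database with made-up data (the row values are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shopify_order (id INTEGER PRIMARY KEY);
CREATE TABLE shopify_orderlineitem (id INTEGER PRIMARY KEY, order_id INTEGER);
-- Four orders: 1 and 2 have multiple line items, 3 and 4 have one each.
INSERT INTO shopify_order VALUES (1), (2), (3), (4);
INSERT INTO shopify_orderlineitem (order_id) VALUES (1), (1), (2), (2), (2), (3), (4);
""")

# The fine-tuned model's query, unchanged.
pct = conn.execute("""
    SELECT (COUNT(*) * 100.0 / (SELECT COUNT(*) FROM shopify_order))
    FROM shopify_order
    WHERE id IN (
        SELECT order_id FROM shopify_orderlineitem
        GROUP BY order_id HAVING COUNT(*) > 1)
""").fetchone()[0]
print(pct)  # 2 of 4 orders have more than one product -> 50.0
```

Unlike the earlier attempts, this query executes without error because every aggregate runs inside a properly grouped context.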
1. Overall Structure: an outer query computes the percentage, while an IN subquery restricts it to the qualifying orders.
2. Key Features: the subquery groups shopify_orderlineitem rows by order_id and uses HAVING COUNT(*) > 1 to identify multi-product orders.
3. Functionality: multiplying by 100.0 forces floating-point division, and a scalar subquery over shopify_order supplies the total order count as the denominator.
4. Improvements: unlike the earlier attempts, every aggregate is evaluated in a properly grouped context, so no aggregate is nested or placed in a FILTER clause.
5. Potential Optimization: the same logic could be expressed with a JOIN or a common table expression, which may read more clearly and perform better on large tables.
This query successfully solves the problem, showing the model's improved SQL comprehension and ability to handle complex queries without common errors.
A transfer-learning approach was used, combining Hugging Face Transformers, PEFT, and LoRA to adapt the model efficiently. LoRA reduced trainable parameters while preserving the base model’s general knowledge.
We utilized a robust stack of modern machine-learning tools and libraries for this project:
1. PyTorch: As our primary deep learning framework, PyTorch provided the foundation for model manipulation and training.
2. Hugging Face Transformers: This library offered easy access to state-of-the-art pre-trained models and tools for fine-tuning.
3. PEFT (Parameter-Efficient Fine-Tuning): A library from Hugging Face that implements various efficient fine-tuning methods, including LoRA.
4. Datasets: Another Hugging Face library used for efficient data handling and preprocessing.
5. Accelerate: This library was used to enable distributed training and mixed precision, optimizing our training process.
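A minimal sketch of how these libraries fit together is shown below. The hyperparameter values and target module names are illustrative assumptions, not our exact configuration:

```python
# Sketch only: assumes transformers and peft are installed; values are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("defog/sqlcoder2")
tokenizer = AutoTokenizer.from_pretrained("defog/sqlcoder2")

lora_config = LoraConfig(
    r=16,                                 # rank of the low-rank update matrices
    lora_alpha=32,                        # scaling factor for the update
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the LoRA matrices are trainable
```

The wrapped model can then be passed to a standard Transformers training loop; only the LoRA adapter weights receive gradient updates.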
LoRA is an efficient fine-tuning technique introduced in 2021 by Microsoft researchers. It addresses the challenge of adapting large language models to specific tasks without the need to retrain all parameters, which can be computationally expensive and prone to overfitting.
Key Concepts:
1. Low-Rank Matrices: LoRA works by adding pairs of rank-decomposition matrices to the weights of the original model. These matrices are initialized so that their product is zero (one matrix starts at zero, the other with small random values), ensuring the model starts from the pre-trained weights.
2. Trainable Parameters: During fine-tuning, only these low-rank matrices are updated, while the original model weights remain frozen. This significantly reduces the number of trainable parameters.
3. Adaptability: Despite the reduced parameter count, LoRA can adapt the model to new tasks effectively, often matching or exceeding the performance of full fine-tuning.
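To make the parameter savings concrete, here is the arithmetic for a single weight matrix; the dimensions and rank are illustrative, not taken from our training run:

```python
# Full fine-tuning updates every entry of a d x k weight matrix;
# LoRA trains only two low-rank factors B (d x r) and A (r x k).
d, k, r = 4096, 4096, 16  # illustrative hidden size and LoRA rank

full_params = d * k
lora_params = r * (d + k)

print(full_params)                # 16777216 trainable parameters
print(lora_params)                # 131072 trainable parameters
print(lora_params / full_params)  # ~0.0078, i.e. under 1% of the full count
```

This is why LoRA fine-tuning fits in minutes on modest hardware: the trainable parameter count drops by roughly two orders of magnitude per adapted matrix.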
For more information, see the original LoRA paper, "LoRA: Low-Rank Adaptation of Large Language Models" (Hu et al., 2021).
Targeted datasets significantly improve SQL generation accuracy by teaching execution-safe patterns. Fine-tuning allows models to internalize aggregation boundaries that generic models often misapply.
Fine-tuning pre-trained models, rather than training from scratch, offers several advantages:
1. Reduced training time (our process took only 43 minutes)
2. Lower computational resource requirements
3. Ability to leverage the general knowledge of large language models while adapting to specific tasks
4. Improved performance on niche problems that general models might struggle with
These lessons underscore the value of customization in AI development, especially for specialized applications like SQL query generation. By combining the power of existing models with targeted fine-tuning, we can create more effective and efficient AI tools for specific data analysis needs.
As we reflect on our journey of fine-tuning an AI model for SQL query generation, it's clear that while we've made significant progress, there's still room for improvement and expansion. This section explores potential enhancements to our model and discusses how our approach could be applied to other SQL-related tasks.
1. Expanding the training dataset: covering more schemas, query shapes, and edge cases would broaden the range of questions the model can answer reliably.
2. Fine-tuning hyperparameters: adjusting settings such as the learning rate, LoRA rank, and number of epochs could yield further accuracy gains.
3. Incorporating user feedback: folding analyst corrections back into the dataset would let the model improve continuously in production.
1. Query optimization: rewriting existing queries into more efficient forms.
2. Schema design: suggesting table structures and relationships from natural-language requirements.
3. SQL to natural language: explaining existing queries in plain English.
4. Data cleaning and preprocessing: generating queries that detect and repair inconsistent data.
5. Temporal and spatial data queries: handling time-series and geographic operations.
6. Security and access control: producing permission-aware queries and access policies.
By pursuing these improvements and exploring new applications, we can continue to push the boundaries of AI-assisted SQL query generation.
While effective for this query class, the approach remains dataset-specific, requires maintenance as schemas evolve, and depends on the quality of the base model.
1. Dataset Specificity: The model was trained on a custom dataset tailored to this particular problem. This means it may not generalize well to significantly different SQL query tasks without additional training.
2. Time and Resource Intensive: Creating a custom dataset and training the model required significant time and computational resources. This approach may not be feasible for every SQL query challenge, especially in time-sensitive situations.
3. Limited Scope: Our solution addresses a specific type of SQL query. It may not be equipped to handle a wide range of SQL operations or complex database schemas without further training.
4. Maintenance and Updates: As database structures or requirements change, the model may need to be retrained or fine-tuned, which requires ongoing effort and expertise.
5. Dependency on Initial Models: Our approach is built upon existing models (SQLCoder2). The quality and limitations of these base models influence the final output, even after custom training.
This work demonstrates that off-the-shelf SQL generation models often fail on aggregation-heavy queries despite producing syntactically convincing output. Fine-tuning with a focused dataset enables models to learn correct execution patterns and produce reliable SQL for specific use cases.
AI-assisted SQL generation benefits most when paired with domain knowledge, validation, and targeted training, making fine-tuning a practical path to production-ready accuracy rather than relying solely on generic models.