LLM SQL Generation Benchmark Results

We assessed how well popular LLMs generate accurate and efficient SQL from natural-language prompts. Using a 200-million-record dataset from the GH Archive, uploaded to Tinybird, we asked each LLM to generate SQL for 50 prompts. The results are shown below alongside a human baseline for comparison.

Model Results for "top 10 Repositories with the most people who have push access to the main branch"

Column names are inferred from the values and units, since the source lists the values without headers; the Score, Query length, and Tokens labels in particular are assumptions.

| Model | Status | First attempt | Score | Query time (ms) | LLM time (s) | Attempts | Rows read | Query length | Tokens | Data read (MB) |
|---|---|---|---|---|---|---|---|---|---|---|
| human | Success | Yes | -- | 1,191 | 0 | 1 | 82,975,620 | 253 | 0 | 2,528.68 |
| claude-3.5-sonnet | Success | Yes | 0.00 | 57 | 2.59 | 1 | 459,796 | 217 | 5,346 | 0.99 |
| claude-3.7-sonnet | Success | Yes | 0.00 | 297 | 2.058 | 1 | 75,689,153 | 217 | 5,350 | 2,195.14 |
| deepseek-chat-v3-0324 | Success | No | 0.00 | 223 | 3.842 | 3 | 75,689,153 | 186 | 4,849 | 2,195.14 |
| deepseek-chat-v3-0324:free | Success | No | 0.00 | 215 | 3.991 | 2 | 75,689,153 | 224 | 4,716 | 2,195.14 |
| gemini-2.0-flash-001 | Success | Yes | 0.00 | 398 | 0.893 | 1 | 75,689,153 | 206 | 4,724 | 2,195.14 |
| gemini-2.5-flash-preview | Success | Yes | 0.00 | 208 | 1.659 | 1 | 75,689,153 | 224 | 4,726 | 2,195.14 |
| gemini-2.5-pro-preview-05-06 | Success | Yes | 87.18 | 742 | 14.474 | 1 | 75,689,153 | 251 | 6,067 | 2,422.49 |
| llama-4-maverick | Success | Yes | 0.00 | 276 | 1.181 | 1 | 75,689,153 | 213 | 4,226 | 2,195.14 |
| llama-4-scout | Success | Yes | 0.00 | 49 | 1.201 | 1 | 75,689,153 | 256 | 4,228 | 144.35 |
| llama-3.3-70b-instruct | Failed | No | 0.00 | 17 | 1.818 | 3 | 0 | 292 | 4,506 | 0.00 |
| ministral-8b | Success | Yes | 0.00 | 63 | 0.986 | 1 | 75,689,153 | 221 | 4,619 | 144.35 |
| mistral-small-3.1-24b-instruct | Success | Yes | 0.00 | 228 | 1.452 | 1 | 75,689,153 | 207 | 4,606 | 2,143.92 |
| mistral-nemo | Success | Yes | 0.00 | 68 | 2.062 | 1 | 75,689,153 | 240 | 4,617 | 144.35 |
| gpt-4.1 | Success | Yes | 0.00 | 213 | 1.799 | 1 | 75,689,153 | 224 | 4,225 | 2,195.14 |
| gpt-4.1-nano | Success | No | 0.00 | 237 | 2.596 | 2 | 75,689,153 | 207 | 4,334 | 2,195.14 |
| gpt-4o-mini | Success | Yes | 0.00 | 57 | 1.832 | 1 | 427,028 | 200 | 4,214 | 0.95 |
| o3-mini | Success | Yes | 0.00 | 225 | 8.67 | 1 | 75,689,153 | 208 | 4,934 | 2,195.14 |
| o4-mini | Success | Yes | 0.00 | 234 | 7.609 | 1 | 75,689,153 | 236 | 4,948 | 2,195.14 |
| o4-mini-high | Success | Yes | 0.00 | 235 | 11.276 | 1 | 75,689,153 | 206 | 5,198 | 2,195.14 |
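
One way to read the results above against the human baseline is to express each model's data read as a fraction of the baseline's. The Python sketch below does this for a handful of rows copied from the results; it is an illustration, not part of the benchmark's own tooling. The field names (`first_attempt`, `attempts`, `rows_read`, `data_read_mb`) are assumptions about what the unlabeled values mean, and the helper functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    model: str
    first_attempt: bool   # assumed meaning of the Yes/No value
    attempts: int         # assumed meaning of the small integer value
    rows_read: int        # assumed meaning of the large integer value
    data_read_mb: float   # the value reported in MB


# Human baseline and a few model rows copied from the results (not the full set).
HUMAN = Result("human", True, 1, 82_975_620, 2528.68)
RESULTS = [
    Result("claude-3.5-sonnet", True, 1, 459_796, 0.99),
    Result("deepseek-chat-v3-0324", False, 3, 75_689_153, 2195.14),
    Result("gemini-2.5-pro-preview-05-06", True, 1, 75_689_153, 2422.49),
    Result("llama-4-scout", True, 1, 75_689_153, 144.35),
    Result("gpt-4o-mini", True, 1, 427_028, 0.95),
]


def data_read_ratio(result: Result, baseline: Result = HUMAN) -> float:
    """Data read by `result` as a fraction of the baseline's data read."""
    return result.data_read_mb / baseline.data_read_mb


def first_attempt_rate(results: list[Result]) -> float:
    """Share of results that were marked Yes for the first attempt."""
    return sum(r.first_attempt for r in results) / len(results)


if __name__ == "__main__":
    # Print the sample from least to most data read relative to the baseline.
    for r in sorted(RESULTS, key=data_read_ratio):
        print(f"{r.model:32s} {data_read_ratio(r):8.4%} of baseline data read")
    print(f"first-attempt rate in this sample: {first_attempt_rate(RESULTS):.0%}")
```

A low ratio only says the generated query scanned less data than the human query; it says nothing about whether the two queries return the same answer, so it should be read together with the status and score values.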