LLM SQL Generation Benchmark Results

We assessed how well popular LLMs generate accurate, efficient SQL from natural-language prompts. Using a 200-million-record dataset from the GH Archive, loaded into Tinybird, we asked each LLM to generate SQL for 50 prompts. The results are shown below alongside a human baseline for comparison.

Model Results for "top 10 Repositories with the most steady star growth rate over time"

| Model | Status | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| human | Success | Yes | | 260 ms | 0 s | 7,319,235 | 446 | | 48.82 MB |
| claude-3.5-sonnet | Success | Yes | 0.00 | 174 ms | 3.444 s | 7,319,235 | 368 | 5,410 | 48.82 MB |
| claude-3.7-sonnet | Failed | | 0.00 | 21 ms | 4.407 s | | 692 | 6,115 | 0.00 MB |
| deepseek-chat-v3-0324 | Success | Yes | 0.00 | 68 ms | 1.703 s | 7,319,235 | 146 | 4,176 | 20.90 MB |
| deepseek-chat-v3-0324:free | Success | Yes | 0.00 | 99 ms | 6.679 s | 7,319,235 | 244 | 4,578 | 48.82 MB |
| gemini-2.0-flash-001 | Success | Yes | 0.00 | 94 ms | 0.929 s | 7,319,235 | 153 | 4,702 | 20.90 MB |
| gemini-2.5-flash-preview | Success | Yes | 0.00 | 87 ms | 1.717 s | 7,319,235 | 205 | 4,716 | 48.82 MB |
| gemini-2.5-pro-preview-05-06 | Failed | | 0.00 | 13 ms | 142.744 s | | 491 | 14,050 | 0.00 MB |
| llama-4-maverick | Success | Yes | 0.00 | 69 ms | 2.212 s | 7,319,235 | 193 | 4,217 | 27.88 MB |
| llama-4-scout | Success | Yes | 0.00 | 1,625 ms | 1.614 s | 56,050,135 | 419 | 4,280 | 373.56 MB |
| llama-3.3-70b-instruct | Failed | | 0.00 | 19 ms | 2.118 s | | 161 | 4,427 | 0.00 MB |
| ministral-8b | Failed | | 0.00 | 22 ms | 0.938 s | | 197 | 4,841 | 0.00 MB |
| mistral-small-3.1-24b-instruct | Failed | | 0.00 | 17 ms | 3.227 s | | 408 | 5,004 | 0.00 MB |
| mistral-nemo | Success | Yes | 0.00 | 80 ms | 3.66 s | 7,319,235 | 202 | 4,613 | 20.90 MB |
| gpt-4.1 | Success | Yes | 0.00 | 74 ms | 1.676 s | 7,319,235 | 296 | 4,241 | 48.82 MB |
| gpt-4.1-nano | Success | Yes | 0.00 | 75 ms | 1.347 s | 7,319,235 | 327 | 4,247 | 48.82 MB |
| gpt-4o-mini | Success | | 0.00 | 99 ms | 2.187 s | 7,319,235 | 351 | 4,391 | 48.82 MB |
| o3-mini | Success | | 0.00 | 259 ms | 19.099 s | 7,319,235 | 335 | 6,600 | 48.82 MB |
| o4-mini | Success | Yes | 0.00 | 150 ms | 23.248 s | 7,319,235 | 445 | 5,975 | 48.82 MB |
| o4-mini-high | Success | Yes | 0.00 | 292 ms | 47.537 s | 7,319,235 | 369 | 8,960 | 48.82 MB |
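
Because the results are meant to be compared to a human baseline, one way to read them is as ratios against the human row. The following is a minimal sketch of that arithmetic for a few of the models above, not the benchmark's own calculation. It assumes the "ms" figure is query execution time and the "MB" figure is data read, which the results do not state explicitly. The names `HUMAN`, `MODELS`, and `relative_to_human` are hypothetical and chosen for illustration only.

```python
# Values copied from the results above. Treating the "ms" figure as query
# time and the "MB" figure as data read is an assumption: the source does
# not label these values.
HUMAN = {"ms": 260, "mb": 48.82}

MODELS = {
    "claude-3.5-sonnet": {"ms": 174, "mb": 48.82},
    "deepseek-chat-v3-0324": {"ms": 68, "mb": 20.90},
    "llama-4-scout": {"ms": 1625, "mb": 373.56},
    "o4-mini-high": {"ms": 292, "mb": 48.82},
}


def relative_to_human(model: dict, human: dict = HUMAN) -> dict:
    """Return each metric as a ratio of the human value (1.0 = same as human)."""
    return {key: model[key] / human[key] for key in human}


if __name__ == "__main__":
    # Print one line per model with both ratios.
    for name, metrics in MODELS.items():
        ratios = relative_to_human(metrics)
        print(f"{name:<24} time x{ratios['ms']:.2f}  data x{ratios['mb']:.2f}")
```

Under these assumptions, a ratio below 1.0 means the model's query was faster or read less data than the human query. For example, llama-4-scout's 1,625 ms against the human's 260 ms gives a ratio of 6.25.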