
For inquiries or collaboration proposals, contact us at hpai@bsc.es.

Welcome to the TuRTLe Model Leaderboard! TuRTLe is a unified evaluation framework designed to systematically assess Large Language Models (LLMs) on RTL (Register-Transfer Level) generation for hardware design. Evaluation criteria include syntax correctness, functional accuracy, synthesizability, and post-synthesis quality (PPA: Power, Performance, Area). TuRTLe integrates multiple benchmarks to highlight the strengths and weaknesses of available LLMs. Use the filters below to explore different RTL benchmarks, simulators, and models.
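
As a rough illustration of how these four criteria relate, here is a minimal Python sketch that aggregates per-sample results into leaderboard-style scores, assuming the first three columns are pass rates in percent. The `ProblemResult` schema and `aggregate` helper are hypothetical, for illustration only, and are not TuRTLe's actual data model or API.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ProblemResult:
    """Outcome of one generated RTL sample (hypothetical schema)."""
    syntax_ok: bool    # parses/compiles without errors
    func_ok: bool      # passes the benchmark testbench
    synth_ok: bool     # synthesizes to a netlist
    ppa_score: float   # post-synthesis quality score in [0, 100]

def aggregate(results: List[ProblemResult]) -> dict:
    """Turn per-sample outcomes into leaderboard-style scores."""
    n = len(results)
    synthesized = [r for r in results if r.synth_ok]
    return {
        "Syntax": 100 * sum(r.syntax_ok for r in results) / n,
        "Functionality": 100 * sum(r.func_ok for r in results) / n,
        "Synthesis": 100 * sum(r.synth_ok for r in results) / n,
        # PPA is only meaningful for designs that actually synthesized
        "Post-Synthesis": (sum(r.ppa_score for r in synthesized) / len(synthesized)
                           if synthesized else 0.0),
    }

if __name__ == "__main__":
    demo = [
        ProblemResult(True, True, True, 80.0),   # fully correct sample
        ProblemResult(True, False, False, 0.0),  # compiles, fails tests
        ProblemResult(False, False, False, 0.0), # syntax error
    ]
    print(aggregate(demo))
```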

UPDATE (JULY 2025): Our TuRTLe paper has been accepted to MLCAD 2025, which will be held this September in Santa Cruz, California!

UPDATE (JULY 2025): Verilator has been added as an additional simulator alongside Icarus Verilog. You can now filter and compare results by simulator.

UPDATE (JUNE 2025): Our framework is now open-source on GitHub, and we have added 7 new recent models, for a total of 40 base and instruct models and 5 RTL benchmarks!

| Type | Model | Parameters (B) | Syntax | Functionality | Synthesis | Post-Synthesis |
|------|-------|----------------|--------|---------------|-----------|----------------|
| 🟢 | | 32.8 | 93.82 | 77.67 | 77.37 | 76.79 |