
Welcome to the TuRTLe Model Leaderboard! TuRTLe is a unified evaluation framework designed to systematically assess Large Language Models (LLMs) on RTL (Register-Transfer Level) generation for hardware design. Evaluation criteria include syntax correctness, functional accuracy, synthesizability, and post-synthesis quality (PPA: Power, Performance, Area). TuRTLe integrates multiple benchmarks to highlight the strengths and weaknesses of available LLMs. Use the task, benchmark, and model-type filters to explore the different RTL benchmarks and models.

For inquiries or collaboration proposals, contact hpai@bsc.es.

| Type | Model | Params | Aggregated ⬆️ | Avg STX | Avg FNC | Avg SYN | Avg Power | Avg Perf | Avg Area |
|------|-------|--------|---------------|---------|---------|---------|-----------|----------|----------|
| 🟒   |       | 32.5   | 74.84         | 93.99   | 73.75   | 71.12   | 37.29     | 34.88    | 36.63    |
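As a rough illustration of how one might work with a leaderboard row programmatically, the sketch below stores the six per-metric averages in a dict and computes their plain mean. This is an assumption for illustration only: the page does not document TuRTLe's actual aggregation formula, and the field names here are made up.

```python
from statistics import mean

# One leaderboard row as a dict; keys mirror the table columns.
# The real TuRTLe aggregation formula is not documented on this page,
# so the simple mean below is an illustrative assumption.
row = {
    "params_b": 32.5,
    "avg_stx": 93.99,    # syntax correctness
    "avg_fnc": 73.75,    # functional accuracy
    "avg_syn": 71.12,    # synthesizability
    "avg_power": 37.29,  # post-synthesis power
    "avg_perf": 34.88,   # post-synthesis performance
    "avg_area": 36.63,   # post-synthesis area
}

metrics = ["avg_stx", "avg_fnc", "avg_syn",
           "avg_power", "avg_perf", "avg_area"]
naive_aggregate = mean(row[m] for m in metrics)
print(f"naive mean of the six metrics: {naive_aggregate:.2f}")
```

Note that this naive mean (about 57.94) differs from the row's listed Aggregated score of 74.84, so the leaderboard evidently weights or combines the metrics differently.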