benchcad leaderboard

Model performance across four matched programmatic-CAD tasks. Scoring is execution-grounded and objective: geometry is scored by voxel IoU between executed STEP solids, and QA by symmetric ratio accuracy. No LLM judge is involved. All numbers are re-graded from submitted predictions, never self-reported.
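To make the two metrics concrete, here is a minimal sketch rather than the benchmark's actual grader. It assumes both solids have already been voxelized onto the same boolean occupancy grid (the STEP execution and voxelization steps are not shown), and it assumes "symmetric ratio accuracy" means min(pred/gt, gt/pred) for positive values; the source does not define it. The function names `voxel_iou` and `symmetric_ratio` are hypothetical.

```python
import numpy as np


def voxel_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection-over-union of two boolean occupancy grids of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    if pred.shape != gt.shape:
        raise ValueError("occupancy grids must have the same shape")
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Assumption: two empty grids are treated as a perfect match.
        return 1.0
    intersection = np.logical_and(pred, gt).sum()
    return float(intersection / union)


def symmetric_ratio(pred: float, gt: float) -> float:
    """Assumed definition: min(pred/gt, gt/pred), so over- and
    under-estimates by the same factor score identically."""
    if pred <= 0 or gt <= 0:
        # Assumption: non-positive values only score on an exact match.
        return 1.0 if pred == gt else 0.0
    return min(pred / gt, gt / pred)
```

For example, two 4×4×4 grids that each fill 32 voxels and overlap on 16 have an IoU of 16/48, and under the assumed ratio definition a prediction of 8.0 against a reference of 10.0 scores the same as 10.0 against 8.0.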

reproduce this task — full split
Legend: frontier (proprietary), open weights, control / baseline

To add a model to this leaderboard, run the command above and open a PR on the code repo with your results.jsonl. Submissions are re-graded before listing.