about benchcad
Industrial CAD code generation requires more than recognizing the outer shape of a part: it requires understanding the part's 3D structure, inferring engineering parameters, and choosing CAD operations that reflect how the part would actually be designed and manufactured. Models that pass the eye too often fail the caliper — two programs may render to similar envelopes while differing substantially in editability, operation choice, and engineering detail.
BenchCAD is the first public CAD benchmark that combines four properties simultaneously:
- Execution-verified at scale — 17,900 sandbox-executed CadQuery parts across 106 industrial families.
- Standard-anchored — 49% of families (52/106) bound to real ISO / DIN / EN / ASME / IEC specification tables.
- Operation-rich — 46 distinct CadQuery ops including helix, twistExtrude, polarArray, advanced sweeps and lofts.
- Capability-decomposed — four matched tasks (Vision2Code · Vision QA · Code QA · Code Edit) that isolate visual recognition, parametric abstraction, and code synthesis.
three released datasets
Every family is hand-crafted by domain experts from industrial standards. Croissant 1.0 metadata · code MIT · data CC-BY-4.0.
17,900 Verified CadQuery parts — code · STEP · 4 canonical views · parameters · operation traces.
2,400 Paired image / code numeric QA items along a four-level capability hierarchy.
748 Verified before / after edit pairs across five edit types T1–T5.
106 industrial part families — fasteners, transmission, structural, fluid, panels, hardware, enclosures. 49% (52/106) anchored to real specification tables across 47 ISO / DIN / EN / ASME / IEC codes.
capability hierarchy
The same questions are evaluated under Vision QA (renders) and Code QA (source); the matched pair isolates whether a failure stems from visual recognition or from reasoning over the queried attribute.
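The matched-pair logic can be sketched in a few lines of Python. This is an illustrative sketch only, not BenchCAD's evaluation code: the function names and category labels are hypothetical, and the `code_only_failure` case is an added assumption, since the text above only describes the two failure sources.

```python
from collections import Counter

def attribute_failure(vision_correct: bool, code_correct: bool) -> str:
    """Classify one question answered under both Vision QA and Code QA.

    Labels are hypothetical names chosen for this sketch.
    """
    if vision_correct and code_correct:
        return "solved"
    if code_correct:
        # Right answer from source, wrong from renders: the attribute was
        # understood, so the miss points at visual recognition.
        return "visual_recognition"
    if vision_correct:
        # Not discussed in the text; kept as a separate bucket (assumption).
        return "code_only_failure"
    # Wrong under both inputs: the miss points at reasoning over the attribute.
    return "attribute_reasoning"

def summarize(results):
    """Count outcomes over an iterable of (vision_correct, code_correct)."""
    return Counter(attribute_failure(v, c) for v, c in results)

# Toy data: four questions, each answered under both conditions.
paired = [(True, True), (False, True), (False, False), (False, True)]
summary = summarize(paired)
print(dict(summary))
```

Counting outcomes per capability level would show where a model's errors concentrate at each step of the hierarchy.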
L1 Holistic Visual Recognition · L2 CAD Operation Understanding · L3 Industrial Parametric Abstraction · L4 Compositional Spatial / Code Reasoning. Scoring is execution-grounded — voxel IoU for geometry, symmetric ratio accuracy for QA. No LLM judge.
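As a rough illustration of the two scoring ideas, the sketch below computes voxel IoU over sets of occupied voxel indices and a symmetric ratio score for numeric answers. The exact definitions used by BenchCAD are not given here, so this is one plausible reading under stated assumptions: the symmetric ratio is taken as min(pred/gt, gt/pred), the 0.9 threshold is hypothetical, and two empty shapes are scored as 0.

```python
def voxel_iou(pred, gt):
    """Intersection over union of two sets of occupied (i, j, k) voxels."""
    pred, gt = set(pred), set(gt)
    union = pred | gt
    if not union:
        return 0.0  # assumption: no geometry on either side scores zero
    return len(pred & gt) / len(union)

def symmetric_ratio(pred, gt):
    """min(pred/gt, gt/pred): 1.0 is exact; over- and undershoot score alike."""
    if pred <= 0 or gt <= 0:
        # Ratios are undefined for non-positive values; fall back to equality.
        return 1.0 if pred == gt else 0.0
    return min(pred / gt, gt / pred)

def ratio_accuracy(pairs, threshold=0.9):
    """Fraction of (pred, gt) pairs whose symmetric ratio meets the threshold.

    The threshold value is a hypothetical choice for this sketch.
    """
    pairs = list(pairs)
    if not pairs:
        return 0.0
    hits = sum(symmetric_ratio(p, g) >= threshold for p, g in pairs)
    return hits / len(pairs)

def box(nx, ny, nz):
    """Occupied voxels of an axis-aligned box anchored at the origin."""
    return {(i, j, k) for i in range(nx) for j in range(ny) for k in range(nz)}

iou = voxel_iou(box(4, 4, 4), box(4, 4, 5))        # 64 shared / 80 total
acc = ratio_accuracy([(10.0, 10.0), (5.0, 10.0)])  # one of two within tolerance
print(iou, acc)
```

A ratio-based score treats a prediction that is twice the true value the same as one that is half of it, which an absolute-error metric would not.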
team
The researchers behind BenchCAD.
BibTeX
@misc{benchcad2026,
  title         = {BenchCAD: A Comprehensive, Industry-Standard Benchmark for Programmatic CAD},
  author        = {Zhang, Haozhe and Liu, Kaichen and Chen, Miaomiao and Li, Lei
                   and Yang, Shaojie and Peng, Cheng and Chen, Hanjie},
  year          = {2026},
  eprint        = {2605.10865},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
  url           = {https://arxiv.org/abs/2605.10865}
}