Method
V1 is intentionally simple, but the architecture already separates the catalog from the inference heuristics and the recommendation layer. That makes the output easier to improve and easier to explain.
1. Catalog
Curated hardware, runtimes, workloads, and model variants power the first recommendation set.
2. Fit engine
The app estimates memory fit, safe context length, decode throughput, and a rough time to first token (TTFT) from bandwidth, quantization assumptions, and workload shape.
3. Recommendation
Final ranking blends workload match, quality tier, fit safety, and performance instead of preferring the biggest model by default.
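To make the three layers concrete, here is a minimal sketch of how a fit estimate and a blended ranking could be wired together. It is an illustration under stated assumptions, not the app's actual code: the class names, field names, the 90% memory safety margin, the fixed overhead, the target throughput, and the blend weights are all hypothetical. Decode throughput uses the common rule of thumb that each generated token reads the full set of weights once, so tokens per second is roughly bandwidth divided by weight size. TTFT is left out to keep the sketch short.

```python
from dataclasses import dataclass


@dataclass
class Hardware:
    memory_gb: float        # usable accelerator or unified memory
    bandwidth_gb_s: float   # peak memory bandwidth (assumed to bound decode)


@dataclass
class ModelVariant:
    name: str
    params_b: float             # parameters, in billions
    bits_per_weight: float      # quantization assumption
    kv_gb_per_1k_tokens: float  # assumed KV-cache cost per 1,000 tokens (> 0)
    quality_tier: int           # hypothetical scale, 1 (low) to 5 (high)
    workload_match: float       # hypothetical 0..1 match to the workload


def estimate_fit(hw, model, context_tokens, overhead_gb=1.0, safety=0.9):
    """Rough fit estimate; overhead_gb and safety are assumed defaults."""
    weights_gb = model.params_b * model.bits_per_weight / 8
    kv_gb = model.kv_gb_per_1k_tokens * context_tokens / 1000
    budget_gb = hw.memory_gb * safety
    headroom_gb = budget_gb - (weights_gb + kv_gb + overhead_gb)
    # Largest context whose KV cache still fits inside the budget.
    free_for_kv_gb = budget_gb - weights_gb - overhead_gb
    safe_context = max(0, int(free_for_kv_gb / model.kv_gb_per_1k_tokens * 1000))
    return {
        "fits": headroom_gb >= 0,
        "budget_gb": budget_gb,
        "headroom_gb": headroom_gb,
        "safe_context": safe_context,
        # Bandwidth-bound rule of thumb: one full pass over the weights per token.
        "decode_tps": hw.bandwidth_gb_s / weights_gb,
    }


def score(hw, model, context_tokens, target_tps=30.0):
    """Blend workload match, quality, fit safety, and performance (weights assumed)."""
    fit = estimate_fit(hw, model, context_tokens)
    if not fit["fits"]:
        return 0.0  # a model that does not fit is never recommended
    fit_safety = min(1.0, fit["headroom_gb"] / fit["budget_gb"])
    performance = min(1.0, fit["decode_tps"] / target_tps)
    quality = model.quality_tier / 5
    return (0.3 * model.workload_match + 0.3 * quality
            + 0.2 * fit_safety + 0.2 * performance)


def rank(hw, models, context_tokens):
    """Return models ordered from best to worst blended score."""
    return sorted(models, key=lambda m: score(hw, m, context_tokens), reverse=True)
```

With a 24 GB device and an 8,000-token context, a 4-bit 70B variant needs about 35 GB for weights alone and scores zero, so a smaller variant that fits with headroom ranks first even though its quality tier is lower. That is the behavior the recommendation step describes: the biggest model does not win by default.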