Method

Catalog, fit engine, and recommendation logic stay separate.

V1 is intentionally simple, but the architecture already keeps three layers apart: the catalog, the fit engine's inference heuristics, and the recommendation layer. That separation makes the output easier to improve and easier to explain.

1. Catalog

Curated hardware, runtimes, workloads, and model variants power the first recommendation set.
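The catalog can be pictured as a few typed tables keyed by name. The sketch below is a hypothetical schema. The class names, fields, and units (`Hardware`, `ModelVariant`, `bandwidth_gbps`, `supported_bits`, and so on) are assumptions for illustration, not the app's actual data model.

```python
from dataclasses import dataclass, field

# Hypothetical catalog schema: names, fields, and units are illustrative
# assumptions, not the app's real data model.

@dataclass(frozen=True)
class Hardware:
    name: str
    memory_gb: float        # usable accelerator or unified memory, GB
    bandwidth_gbps: float   # peak memory bandwidth, GB/s

@dataclass(frozen=True)
class Runtime:
    name: str
    supported_bits: tuple[float, ...]  # quantization widths it can load

@dataclass(frozen=True)
class Workload:
    name: str
    prompt_tokens: int
    output_tokens: int

@dataclass(frozen=True)
class ModelVariant:
    name: str
    params_b: float         # parameter count, billions
    bits_per_weight: float  # quantization width
    quality_tier: int       # higher is better

@dataclass
class Catalog:
    hardware: dict[str, Hardware] = field(default_factory=dict)
    runtimes: dict[str, Runtime] = field(default_factory=dict)
    workloads: dict[str, Workload] = field(default_factory=dict)
    models: dict[str, ModelVariant] = field(default_factory=dict)

    def add(self, item) -> None:
        # Route each entry to its table by type, keyed by name.
        table = {
            Hardware: self.hardware,
            Runtime: self.runtimes,
            Workload: self.workloads,
            ModelVariant: self.models,
        }[type(item)]
        table[item.name] = item

    def variants_for(self, runtime_name: str) -> list[ModelVariant]:
        # Only variants whose quantization width the runtime can load.
        bits = self.runtimes[runtime_name].supported_bits
        return [m for m in self.models.values() if m.bits_per_weight in bits]
```

Keeping the catalog as plain data lets the fit engine and the ranking layer read it without knowing how it was curated.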

2. Fit engine

The app estimates memory fit, safe context length, decode throughput, and a rough time to first token (TTFT) from memory bandwidth, quantization assumptions, and workload shape.
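The estimates can be approximated with back-of-the-envelope formulas. This is a minimal sketch under stated assumptions, not the app's heuristics: weights take `params × bits / 8` bytes, the KV cache grows linearly with context, decode is treated as memory-bound, and prefill for TTFT is modeled as compute-bound using an assumed `tflops` figure (a swapped-in approximation, since the text does not say how prefill is modeled). The headroom, overhead, and efficiency values are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical fit-engine sketch. Field names, efficiency factors, headroom,
# and overhead values are illustrative assumptions, not the app's constants.

@dataclass(frozen=True)
class Hardware:
    memory_gb: float       # usable accelerator or unified memory, GB
    bandwidth_gbps: float  # peak memory bandwidth, GB/s
    tflops: float          # assumed dense compute, TFLOP/s (prefill only)

@dataclass(frozen=True)
class Model:
    params_b: float         # parameter count, billions
    bits_per_weight: float  # quantization width
    n_layers: int
    n_kv_heads: int
    head_dim: int
    kv_bytes: float = 2.0   # bytes per KV-cache element (fp16 cache assumed)

def weights_gb(m: Model) -> float:
    # 1e9 params * bits / 8 bits-per-byte = bytes; divide by 1e9 for GB.
    return m.params_b * m.bits_per_weight / 8

def kv_gb_per_token(m: Model) -> float:
    # One K and one V vector per layer per KV head.
    return 2 * m.n_layers * m.n_kv_heads * m.head_dim * m.kv_bytes / 1e9

def safe_context(hw: Hardware, m: Model,
                 headroom: float = 0.9, overhead_gb: float = 1.0) -> int:
    # KV-cache tokens that fit in a fraction of memory after the weights
    # and a fixed runtime overhead. 0 means the weights alone do not fit.
    free_gb = hw.memory_gb * headroom - weights_gb(m) - overhead_gb
    return max(0, int(free_gb / kv_gb_per_token(m)))

def fits(hw: Hardware, m: Model, needed_tokens: int) -> bool:
    # The workload's prompt plus output must fit inside the safe context.
    return safe_context(hw, m) >= needed_tokens

def decode_tok_s(hw: Hardware, m: Model, efficiency: float = 0.6) -> float:
    # Memory-bound decode: each token streams all weights once
    # (KV-cache reads are ignored in this rough estimate).
    return hw.bandwidth_gbps * efficiency / weights_gb(m)

def ttft_s(hw: Hardware, m: Model, prompt_tokens: int,
           efficiency: float = 0.4) -> float:
    # Compute-bound prefill at ~2 FLOPs per parameter per prompt token,
    # plus one decode step to emit the first token.
    prefill_flops = 2 * m.params_b * 1e9 * prompt_tokens
    prefill_s = prefill_flops / (hw.tflops * 1e12 * efficiency)
    return prefill_s + 1 / decode_tok_s(hw, m)
```

For an 8B model at 4 bits on a hypothetical 24 GB, 1,000 GB/s device, the weights take 4 GB and the decode estimate is 1000 × 0.6 / 4 = 150 tokens/s.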

3. Recommendation

Final ranking blends workload match, quality tier, fit safety, and performance instead of preferring the biggest model by default.
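One way to picture the blend is a weighted sum over the four signals. The sketch below is hypothetical: it assumes each signal has already been normalized to the range 0 to 1 by earlier stages, and the `WEIGHTS` values, the field names, and the rule that drops candidates with zero fit safety are illustrative assumptions, not the app's actual scoring.

```python
from dataclasses import dataclass

# Hypothetical ranking sketch. Signal names mirror the four factors in the
# text; the 0..1 normalization, weights, and no-fit filter are assumptions.

@dataclass(frozen=True)
class Candidate:
    name: str
    params_b: float        # kept only to show size is not a ranking term
    workload_match: float  # suitability for the chosen workload, 0..1
    quality_tier: float    # quality tier normalized to 0..1
    fit_safety: float      # memory headroom after the fit, 0..1 (0 = no fit)
    performance: float     # speed relative to a workload target, 0..1

WEIGHTS = {
    "workload_match": 0.35,
    "quality_tier": 0.25,
    "fit_safety": 0.20,
    "performance": 0.20,
}

def score(c: Candidate, weights: dict[str, float] = WEIGHTS) -> float:
    # Weighted blend of the four signals; parameter count is not an input.
    return sum(w * getattr(c, signal) for signal, w in weights.items())

def rank(candidates: list[Candidate]) -> list[Candidate]:
    # Drop candidates that do not fit at all, then sort by blended score
    # (highest first), breaking exact ties by name for a stable order.
    viable = [c for c in candidates if c.fit_safety > 0]
    return sorted(viable, key=lambda c: (-score(c), c.name))
```

With these weights, a hypothetical 8B variant that fits comfortably and runs fast (scores 0.9, 0.6, 0.8, 0.9) lands at 0.805, ahead of a 70B variant that barely fits (0.6, 1.0, 0.1, 0.2) at 0.52, even though the larger model has the top quality tier.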