Will It Run AI · Calculator

Tell us what you own and what you want to do. We will rank the local models that make sense.

Start from your hardware and workload, then get a shortlist based on fit, speed, and runtime support instead of guessing from generic model lists or benchmark screenshots.

Live catalog snapshot: 167 hardware profiles, 269 models, 19 runtimes. That keeps the calculator aligned with the current catalog instead of a static benchmark list.

Now evaluating: NVIDIA A16 64GB
Workload: Coding
Runtime: ExLlamaV2

Ranking is a heuristic, not a benchmark. It blends workload match, catalog freshness, artifact support, fit safety, context coverage, memory utilization, throughput, and latency so the ordering stays explainable.
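One way such a blend could work, sketched as a hedged illustration: the component names come from the page, but the weights, the 0–1 normalization, and the 150-point scale are invented here, not the calculator's actual coefficients.

```python
# Hypothetical weighted-blend ranker. Component names match the page copy;
# the weights and scale are ASSUMPTIONS for illustration only.
WEIGHTS = {
    "workload_match": 0.25,
    "catalog_freshness": 0.10,
    "artifact_support": 0.10,
    "fit_safety": 0.20,
    "context_coverage": 0.10,
    "memory_utilization": 0.05,
    "throughput": 0.12,
    "latency": 0.08,
}

def blend_score(components: dict[str, float], scale: float = 150.0) -> float:
    """Blend normalized (0..1) component scores into one explainable number.

    Missing components default to 0, so a model is never rewarded for data
    the catalog does not have.
    """
    total = sum(WEIGHTS[name] * components.get(name, 0.0) for name in WEIGHTS)
    return round(scale * total, 1)
```

Because each component contributes linearly, any ranking can be explained by listing the per-component terms, which is what keeps the ordering auditable rather than opaque.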

Inputs

Pick the hardware, runtime, and workload you want to test.

Use the detected hardware if it is right, override it if it is not, and rerun the ranking to compare realistic local AI options.

Browser detection


Update the hardware or workload and recalculate to refresh the ranking.

1. Fit

Memory fit and headroom decide whether a model is realistic on the selected hardware.

2. Workload

The score rewards models that match the selected task and penalizes stale or legacy families when newer specialist releases exist.

3. Speed

Decode throughput and TTFT keep the shortlist practical for real usage, not just theoretically possible runs.
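The fit check above reduces to simple arithmetic: weights plus KV cache must fit in VRAM with headroom to spare, and whatever memory remains after the weights bounds the safe context. A minimal sketch, assuming standard transformer KV-cache sizing; the headroom factor, dtype width, and model shape below are illustrative assumptions, not the calculator's exact method.

```python
def kv_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                       bytes_per_elem: int = 2) -> int:
    """KV-cache bytes per token: key + value tensors across all layers (fp16 default)."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem

def safe_context_tokens(vram_gb: float, weights_gb: float, per_token_bytes: int,
                        headroom: float = 0.90) -> int:
    """Tokens of context that fit after the weights, keeping a safety headroom."""
    free_bytes = (vram_gb * headroom - weights_gb) * 1024**3
    return max(0, int(free_bytes // per_token_bytes))

# Illustrative GQA shape (ASSUMED, Qwen2.5-32B-like): 64 layers, 8 KV heads, head dim 128.
per_tok = kv_bytes_per_token(64, 8, 128)   # 262144 bytes per token
ctx = safe_context_tokens(64.0, 18.6, per_tok)
```

A real fit check also has to account for activation buffers, quantized KV cache, and runtime overhead, which is why a calculator's "safe ctx" is usually well below this upper bound.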

Qwen

Qwen 2.5 Coder 32B

Why it wins

Qwen 2.5 Coder 32B is a specialized fit for Coding. It sits in the middle of the current generation mix. It fits natively with comfortable headroom. Context coverage stays within the requested workload envelope.

Rank #1

Score: 128.7
Fit status: Runs well

Fit: Runs well with 33K safe context.

Runtime support: native via EXL2 on cuda-local.

Runtime: ExLlamaV2
Artifact: EXL2
Quant: exl2-4bpw
Decode: 23.8 tok/s
Safe ctx: 33K
Official ctx: 131K
Support: native
TTFT: 11011 ms

Weights: 18.6 GB

KV cache: 5.0 GB

Backend: cuda-local

Score 128.7 combines workload match, catalog freshness, fit safety, context coverage, artifact choice, memory utilization, throughput, and latency.
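The card's decode and TTFT figures translate directly into wall-clock estimates with the standard first-token-plus-streaming model (the formula is a common approximation, not something the calculator documents):

```python
def wall_clock_seconds(ttft_ms: float, decode_tok_s: float, output_tokens: int) -> float:
    """Estimate total generation time: time to first token plus streaming time."""
    return ttft_ms / 1000.0 + output_tokens / decode_tok_s

# Using the Qwen 2.5 Coder 32B card figures: TTFT 11011 ms, decode 23.8 tok/s.
total = wall_clock_seconds(11011, 23.8, 500)   # ~32 s for a 500-token completion
```

This is why the ranking weighs latency separately from throughput: on long prompts the 11-second TTFT dominates short completions, while decode speed dominates long ones.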

Qwen

Qwen 2.5 32B

Why it ranks

Qwen 2.5 32B is a strong general fit for Coding. It sits in the middle of the current generation mix. It fits natively with comfortable headroom. Context coverage stays within the requested workload envelope.

Rank #2

Score: 125.0
Fit status: Runs well

Fit: Runs well with 33K safe context.

Runtime support: native via EXL2 on cuda-local.

Runtime: ExLlamaV2
Artifact: EXL2
Quant: exl2-4bpw
Decode: 23.8 tok/s
Safe ctx: 33K
Official ctx: 131K
Support: native
TTFT: 11011 ms

Weights: 18.6 GB

KV cache: 5.0 GB

Backend: cuda-local

Score 125.0 combines workload match, catalog freshness, fit safety, context coverage, artifact choice, memory utilization, throughput, and latency.

Gemma

Gemma 3 27B

Current · Hugging Face · Ollama · LM Studio

Why it ranks

Gemma 3 27B is a strong general fit for Coding. It sits in the middle of the current generation mix. It fits natively with comfortable headroom. Context coverage stays within the requested workload envelope. Known distribution channels: huggingface, ollama, lm-studio.

Rank #3

Score: 121.2
Fit status: Runs well

Fit: Runs well with 38K safe context.

Runtime support: native via EXL2 on cuda-local.

Runtime: ExLlamaV2
Artifact: EXL2
Quant: exl2-4bpw
Decode: 28.2 tok/s
Safe ctx: 38K
Official ctx: 131K
Support: native
TTFT: 9290 ms

Weights: 15.7 GB

KV cache: 4.2 GB

Backend: cuda-local

Score 121.2 combines workload match, catalog freshness, fit safety, context coverage, artifact choice, memory utilization, throughput, and latency.