Qwen
Qwen 2.5 Coder 32B
Why it wins
Qwen 2.5 Coder 32B is a code-specialized model, so it matches the Coding workload directly. It sits in the middle of the current generation mix. It runs natively with comfortable memory headroom, and its 33K safe context covers the context length the workload requests.
Score: 128.7
Fit status: Runs well
Fit: Runs well with 33K safe context.
Runtime support: native via EXL2 on cuda-local.
Runtime: ExLlamaV2
Artifact: EXL2
Quant: exl2-4bpw
Decode: 23.8 tok/s
Safe ctx: 33K
Official ctx: 131K
Support: native
TTFT: 11011 ms
Weights: 18.6 GB
KV cache: 5.0 GB
Backend: cuda-local
Score 128.7 combines workload match, catalog freshness, fit safety, context coverage, artifact choice, memory utilization, throughput, and latency.
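The figures on this card can be combined into a few rough derived numbers. The sketch below is illustrative only and is not how the score is computed. It assumes that "33K" and "131K" mean 33,000 and 131,000 tokens, that weights plus KV cache approximate the memory footprint (runtime overhead is ignored), and that the reported TTFT applies unchanged to every request, although in practice it varies with prompt length. The function names are hypothetical.

```python
# Figures copied from the card; everything derived from them is an estimate.
WEIGHTS_GB = 18.6
KV_CACHE_GB = 5.0
SAFE_CTX_TOKENS = 33_000        # assumption: "33K" read as 33,000 tokens
OFFICIAL_CTX_TOKENS = 131_000   # assumption: "131K" read as 131,000 tokens
DECODE_TOK_PER_S = 23.8
TTFT_MS = 11011


def memory_footprint_gb(weights_gb: float, kv_cache_gb: float) -> float:
    """Weights plus KV cache; ignores activations and runtime overhead."""
    return weights_gb + kv_cache_gb


def context_coverage(safe_tokens: int, official_tokens: int) -> float:
    """Fraction of the official context window usable as safe context."""
    return safe_tokens / official_tokens


def estimated_response_seconds(output_tokens: int,
                               ttft_ms: float = TTFT_MS,
                               decode_tok_per_s: float = DECODE_TOK_PER_S) -> float:
    """Time to first token plus steady-state decode time for the output."""
    return ttft_ms / 1000 + output_tokens / decode_tok_per_s


print(f"Memory footprint: {memory_footprint_gb(WEIGHTS_GB, KV_CACHE_GB):.1f} GB")
print(f"Context coverage: {context_coverage(SAFE_CTX_TOKENS, OFFICIAL_CTX_TOKENS):.1%}")
print(f"500-token reply:  {estimated_response_seconds(500):.1f} s")
```

Under these assumptions the model occupies about 23.6 GB, the safe context is roughly a quarter of the official window, and a 500-token reply takes about 32 seconds, of which about 11 seconds is the wait for the first token.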