Compare

Compare local AI hardware with workload-aware output.

Side A

NVIDIA A16 64GB

Best current pick for coding:

Qwen 2.5 Coder 32B

  • Runtime: ExLlamaV2
  • Decode: 23.8 tok/s
  • TTFT: 11011 ms

Side B

NVIDIA A40 48GB

Best current pick for coding:

Gemma 3 27B

  • Runtime: ExLlamaV2
  • Decode: 32.7 tok/s
  • TTFT: 8009 ms