Z.ai
GLM-5
GLM family, 744B parameters. Q4_K_M is the recommended quantization for first-pass local use in V1.
| Params | Context | License | Best runtime |
|---|---|---|---|
| 744B | 200K | Custom | vLLM |

Recommended hardware
First-pass fit across priority GPUs
| Hardware | Fit | Decode speed | Safe context |
|---|---|---|---|
| NVIDIA A10 24GB | Too heavy | 2 tok/s | 4K |
| NVIDIA A100 40GB | Too heavy | 4.5 tok/s | 4K |
| NVIDIA A100 80GB | Too heavy | 6 tok/s | 4K |
| NVIDIA A16 64GB | Too heavy | 2 tok/s | 4K |
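
A back-of-the-envelope check helps explain why every row above is marked "Too heavy". The sketch below estimates the weight-only footprint of a 744B-parameter model at Q4_K_M and compares it with each card's VRAM. The figure of 4.8 bits per weight is an assumption (an approximate average for Q4_K_M, not a value taken from this page), and the estimate ignores KV cache, activations, and runtime overhead, so real requirements would be higher. The function names are hypothetical.

```python
# Rough weight-only memory estimate. Ignores KV cache, activations,
# and runtime overhead, so actual requirements are higher.

PARAMS = 744e9          # parameter count listed for GLM-5
BITS_PER_WEIGHT = 4.8   # ASSUMPTION: approximate average for Q4_K_M

# VRAM per card in GB, taken from the hardware names in the table.
GPUS_GB = {
    "NVIDIA A10 24GB": 24,
    "NVIDIA A100 40GB": 40,
    "NVIDIA A100 80GB": 80,
    "NVIDIA A16 64GB": 64,
}


def weight_footprint_gb(params: float, bits_per_weight: float) -> float:
    """Return the size of the quantized weights in GB (10^9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


def fits(vram_gb: float, params: float = PARAMS,
         bits_per_weight: float = BITS_PER_WEIGHT) -> bool:
    """True if the weights alone would fit in the given VRAM."""
    return weight_footprint_gb(params, bits_per_weight) <= vram_gb


if __name__ == "__main__":
    need = weight_footprint_gb(PARAMS, BITS_PER_WEIGHT)
    print(f"Estimated weights: {need:.0f} GB")
    for name, vram in GPUS_GB.items():
        shortfall = need / vram
        status = "fits" if fits(vram) else f"needs ~{shortfall:.1f}x the VRAM"
        print(f"{name}: {status}")
```

Under these assumptions the weights alone come to roughly 446 GB, several times the capacity of the largest single card listed, which is consistent with the "Too heavy" rating in every row.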