How much VRAM does Qwen3.5 35B A3B need?

Qwen3.5 35B A3B (35B parameters) needs approximately 32.8 GB of memory with Q4_K_M quantization: 21.3 GB for weights, 5.5 GB for KV cache, 1.2 GB for runtime, and 4.8 GB of headroom.

What is the best quantization for Qwen3.5 35B A3B?

The recommended quantization for Qwen3.5 35B A3B is Q4_K_M, which balances quality and memory efficiency.

Can it run?

Can NVIDIA L40S 48GB run Qwen3.5 35B A3B?

Yes, NVIDIA L40S 48GB can run Qwen3.5 35B A3B with a C grade (Runs well). Expected decode speed: 31.6 tok/s.

Grade: C (Usable)

Runs well using Q4_K_M in Ollama

Fit status: Runs well
Decode: 31.6 tok/s
TTFT: 6133 ms
Safe context: 23K
Memory: 32.8 GB / 48.0 GB

Memory breakdown

Weights: 21.3 GB
KV Cache: 5.5 GB
Runtime: 1.2 GB
Headroom: 4.8 GB
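
The four components add up to the 32.8 GB memory figure. A minimal Python sketch of that arithmetic follows. It assumes that "headroom" is a reserved safety margin rather than memory actively in use; the page does not define the term, and the function name `memory_summary` is hypothetical.

```python
# Memory components reported for Q4_K_M on the L40S, in GB.
breakdown_gb = {"weights": 21.3, "kv_cache": 5.5, "runtime": 1.2, "headroom": 4.8}
total_vram_gb = 48.0


def memory_summary(breakdown, capacity_gb):
    """Sum the components and compare them with the card's capacity."""
    required = round(sum(breakdown.values()), 1)
    # Assumption: headroom is a reserve, so "in use" excludes it.
    in_use = round(required - breakdown["headroom"], 1)
    return {
        "required_gb": required,
        "in_use_gb": in_use,
        "free_gb": round(capacity_gb - required, 1),
        "utilization": round(required / capacity_gb, 3),
    }


summary = memory_summary(breakdown_gb, total_vram_gb)
print(summary)
```

With these inputs the required total is 32.8 GB, which leaves 15.2 GB of the 48.0 GB unallocated.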

Performance by workload

Workload	Grade	Fit	Decode	TTFT	Context
Agentic Coding	C	Runs well	31.6 tok/s	8921 ms	40K
Chat	C	Runs well	31.6 tok/s	3345 ms	13K
Coding	C	Runs well	31.6 tok/s	6133 ms	23K
RAG	C	Runs well	31.6 tok/s	11151 ms	40K
Reasoning	C	Runs well	31.6 tok/s	7248 ms	23K
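
TTFT and decode speed can be combined into a rough end-to-end response time. The sketch below assumes decode speed stays constant for the whole response and that TTFT covers all prompt processing; neither assumption is stated on the page, and `response_time_s` is a hypothetical helper.

```python
# (TTFT in ms, decode speed in tok/s) per workload, copied from the table.
WORKLOADS = {
    "Agentic Coding": (8921, 31.6),
    "Chat": (3345, 31.6),
    "Coding": (6133, 31.6),
    "RAG": (11151, 31.6),
    "Reasoning": (7248, 31.6),
}


def response_time_s(ttft_ms, decode_tok_s, output_tokens):
    """Estimated seconds until the last token: TTFT plus steady-state decode time."""
    return ttft_ms / 1000 + output_tokens / decode_tok_s


for name, (ttft_ms, tok_s) in WORKLOADS.items():
    # 500 output tokens is an arbitrary illustrative response length.
    print(f"{name:15s} {response_time_s(ttft_ms, tok_s, 500):6.1f} s")
```

Because decode speed is identical across workloads, the differences in estimated response time come entirely from TTFT.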

Quantization options

How Qwen3.5 35B A3B (35B params) fits at each quantization level on NVIDIA L40S 48GB (48.0 GB usable).

Quant	Bits	VRAM	Quality	Fit
Q2_K	2	13.7 GB	Low	D (36)
Q3_K_S	3	17.2 GB	Low	D (37)
NVFP4	4	19.6 GB	Medium	D (38)
Q4_K_M	4	21.3 GB	Medium	D (39)
Q5_K_M	5	25.2 GB	High	C (41)
Q6_K	6	28.7 GB	High	C (42)
Q8_0 (best for your GPU)	8	37.5 GB	Very High	C (45)
F16	16	71.8 GB	Maximum	F (0)
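
The VRAM column for Q4_K_M matches the weights figure in the memory breakdown (21.3 GB), so it appears to count weights only. As a rough sketch under that reading, the Python below derives effective bits per weight and checks which quantizations fit once KV cache and runtime are added. It assumes 1 GB = 10^9 bytes, a nominal 35B parameters, and that the 5.5 GB KV cache and 1.2 GB runtime from the Q4_K_M breakdown apply unchanged to every quantization; none of this is confirmed by the page.

```python
# (quant, weight size in GB) copied from the table.
QUANTS = [
    ("Q2_K", 13.7), ("Q3_K_S", 17.2), ("NVFP4", 19.6), ("Q4_K_M", 21.3),
    ("Q5_K_M", 25.2), ("Q6_K", 28.7), ("Q8_0", 37.5), ("F16", 71.8),
]
PARAMS_BILLIONS = 35.0   # nominal parameter count from the page
CAPACITY_GB = 48.0       # usable VRAM on the L40S
OVERHEAD_GB = 5.5 + 1.2  # assumption: KV cache + runtime from the Q4_K_M breakdown


def effective_bits(weights_gb, params_billions=PARAMS_BILLIONS):
    """Average bits stored per parameter, including any quantization metadata."""
    return weights_gb * 8 / params_billions


def fits(weights_gb, overhead_gb=OVERHEAD_GB, capacity_gb=CAPACITY_GB):
    """True if weights plus the assumed overhead stay within capacity."""
    return weights_gb + overhead_gb <= capacity_gb


for name, gb in QUANTS:
    print(f"{name:8s} {effective_bits(gb):5.2f} bits/weight  fits={fits(gb)}")

# Largest quantization that still fits under these assumptions.
largest_fitting = max((q for q in QUANTS if fits(q[1])), key=lambda q: q[1])[0]
print(largest_fitting)
```

Under these assumptions Q8_0 is the largest quantization that fits and F16 does not, which agrees with the "best for your GPU" label and the F grade in the table.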

Get started

HuggingFace

huggingface-cli download hf-unsloth--qwen3-5-35b-a3b-gguf
