How much VRAM does MPT-30B-Instruct need?

MPT-30B-Instruct (30B parameters) requires approximately 32.3 GB of VRAM with Q5_K_M quantization: 21.6 GB for the weights, plus KV cache, runtime overhead, and headroom.

What is the best quantization for MPT-30B-Instruct?

The recommended quantization for MPT-30B-Instruct is Q5_K_M, which balances quality and memory efficiency.

Can it run?

Can NVIDIA L40S 48GB run MPT-30B-Instruct?

Yes, NVIDIA L40S 48GB can run MPT-30B-Instruct with a C grade (Runs well). Expected decode speed: 31.8 tok/s.

Grade: C (Usable)
Fit status: Runs well (using Q5_K_M in Ollama)
Decode: 31.8 tok/s
TTFT: 6083 ms
Memory: 32.3 GB / 48.0 GB

Memory breakdown

Weights: 21.6 GB
KV Cache: 4.7 GB
Runtime: 1.2 GB
Headroom: 4.8 GB
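
The four components above add up to the 32.3 GB total, which suggests the headline figure already includes headroom. A minimal sketch of that arithmetic follows; the variable names are illustrative, and treating headroom as part of the total is an inference from the numbers, not something the page states.

```python
# Memory budget for MPT-30B-Instruct at Q5_K_M, using the figures listed above (GB).
breakdown_gb = {
    "weights": 21.6,
    "kv_cache": 4.7,
    "runtime": 1.2,
    "headroom": 4.8,  # assumed to be counted in the 32.3 GB total
}
gpu_vram_gb = 48.0  # NVIDIA L40S

# Sum the components and compare against the card's capacity.
required_gb = round(sum(breakdown_gb.values()), 1)
unallocated_gb = round(gpu_vram_gb - required_gb, 1)

print(f"Required: {required_gb} GB of {gpu_vram_gb} GB")
print(f"Unallocated: {unallocated_gb} GB")
```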

Performance by workload

Workload	Grade	Fit	Decode	TTFT	Context
Agentic Coding	C	Runs well	31.8 tok/s	8849 ms	8K
Chat	C	Runs well	31.8 tok/s	3318 ms	8K
Coding	C	Runs well	31.8 tok/s	6083 ms	8K
RAG	C	Runs well	31.8 tok/s	11061 ms	8K
Reasoning	C	Runs well	31.8 tok/s	7190 ms	8K
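
To turn the table into a rough end-to-end wait time, one simple model is TTFT plus output length divided by decode speed. The sketch below assumes that model; it ignores any slowdown as the context grows, and the function name and the 256-token output length in the usage note are hypothetical.

```python
# Figures copied from the workload table above.
DECODE_TOK_PER_S = 31.8
TTFT_MS = {
    "Agentic Coding": 8849,
    "Chat": 3318,
    "Coding": 6083,
    "RAG": 11061,
    "Reasoning": 7190,
}

def response_time_s(workload: str, output_tokens: int) -> float:
    """Estimate seconds until a full response: TTFT plus steady-state decode time."""
    return TTFT_MS[workload] / 1000 + output_tokens / DECODE_TOK_PER_S

if __name__ == "__main__":
    for name in TTFT_MS:
        print(f"{name}: {response_time_s(name, 256):.1f} s for 256 output tokens")
```

Under these assumptions a 256-token chat reply would take about 11.4 seconds, most of it decode time.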

Quantization options

How MPT-30B-Instruct (30B params) fits at each quantization level on NVIDIA L40S 48GB (48.0 GB usable).

Quant	Bits	VRAM	Quality	Fit
Q2_K	2	11.7 GB	Low	D (34)
Q3_K_S	3	14.7 GB	Low	D (35)
NVFP4	4	16.8 GB	Medium	D (36)
Q4_K_M	4	18.3 GB	Medium	D (37)
Q5_K_M	5	21.6 GB	High	D (38)
Q6_K	6	24.6 GB	High	D (40)
Q8_0 (best for your GPU)	8	32.1 GB	Very High	C (43)
F16	16	61.5 GB	Maximum	F (0)
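
The VRAM column appears to list weight size only, since the Q5_K_M row matches the 21.6 GB weights figure in the memory breakdown. The sketch below works from that assumption: it derives the effective bits per weight for each row and checks whether weights plus KV cache and runtime overhead stay within 48 GB. Holding the Q5_K_M overhead (4.7 GB + 1.2 GB) constant across quantizations is a simplification, and the helper names are illustrative.

```python
PARAMS_BILLION = 30.0
GPU_VRAM_GB = 48.0
# KV cache + runtime, taken from the Q5_K_M breakdown and assumed constant across quants.
OVERHEAD_GB = 4.7 + 1.2

# Weight sizes (GB) copied from the table above.
weights_gb = {
    "Q2_K": 11.7, "Q3_K_S": 14.7, "NVFP4": 16.8, "Q4_K_M": 18.3,
    "Q5_K_M": 21.6, "Q6_K": 24.6, "Q8_0": 32.1, "F16": 61.5,
}

def effective_bits_per_weight(size_gb: float, params_billion: float = PARAMS_BILLION) -> float:
    """Bits stored per parameter, treating 1 GB as 1e9 bytes."""
    return size_gb * 8 / params_billion

def fits(size_gb: float) -> bool:
    """True if weights plus the assumed overhead fit in GPU memory."""
    return size_gb + OVERHEAD_GB <= GPU_VRAM_GB

for name, size in weights_gb.items():
    print(f"{name}: {effective_bits_per_weight(size):.2f} bits/weight, fits={fits(size)}")
```

The effective bits per weight come out above the nominal "Bits" column in every row, which is consistent with quantization formats storing scales and other metadata alongside the weights.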

Get started

Ollama

ollama run mpt-30b-instruct
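
Once the model is loaded, a local Ollama server can also be queried over HTTP. The sketch below uses only the Python standard library and Ollama's `/api/generate` endpoint on its default port. The model tag is copied from the command above, and whether that tag is published in the Ollama library is not verified here; the function names are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint

def build_generate_request(prompt: str, model: str = "mpt-30b-instruct") -> urllib.request.Request:
    """Build a non-streaming generate request; the model tag is assumed from the command above."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def generate(prompt: str, model: str = "mpt-30b-instruct", timeout_s: float = 120.0) -> str:
    """Send the request to a running Ollama server and return the generated text."""
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read())["response"]
```

Calling `generate("Summarize this paragraph: ...")` requires the server to be running. Given the TTFT figures above, a timeout well over ten seconds is a sensible starting point.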

HuggingFace

huggingface-cli download mosaicml/mpt-30b-instruct

