How much VRAM does embeddinggemma 300M need?

embeddinggemma 300M (0.3B parameters) requires approximately 4.0 GB of memory in total with Q6_K quantization. Only about 0.2 GB of that is model weights; the rest is KV cache, runtime overhead, and headroom.

What is the best quantization for embeddinggemma 300M?

The recommended quantization for embeddinggemma 300M is Q6_K, which balances quality and memory efficiency.

Can MacBook Air M1 16GB run embeddinggemma 300M?

Yes, MacBook Air M1 16GB can run embeddinggemma 300M with a C grade (Runs well). Expected decode speed: 46.4 tok/s.

Grade C (Usable): runs well using Q6_K in Ollama.

Fit status: Runs well

Decode: 46.4 tok/s

TTFT: 4172 ms

Safe context: 46K

Memory: 4.0 GB / 11.5 GB

Weights: 0.2 GB

KV Cache: 0.8 GB

Runtime: 1.2 GB

Headroom: 1.7 GB
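
The memory figures above can be cross-checked with simple arithmetic. The sketch below is only an illustration, not the method used to produce the page: it assumes the four listed components are meant to add up to the reported 4.0 GB total and that 11.5 GB is the usable budget on this machine. All variable names are hypothetical.

```python
# Component sizes in GB, copied from the memory breakdown above.
components = {"weights": 0.2, "kv_cache": 0.8, "runtime": 1.2, "headroom": 1.7}
reported_total_gb = 4.0   # "Memory" figure from the page
usable_gb = 11.5          # usable memory on MacBook Air M1 16GB, per the page

# Sum the components and compare against the reported total.
component_sum_gb = round(sum(components.values()), 1)
rounding_gap_gb = round(reported_total_gb - component_sum_gb, 1)

# Memory left over after loading, and share of the usable budget consumed.
free_after_load_gb = round(usable_gb - reported_total_gb, 1)
utilization_pct = round(100 * reported_total_gb / usable_gb, 1)

print(f"components sum to {component_sum_gb} GB (gap vs. total: {rounding_gap_gb} GB)")
print(f"free after load: {free_after_load_gb} GB, utilization: {utilization_pct}%")
```

The components sum to 3.9 GB, which is 0.1 GB short of the reported total. A gap that small would be consistent with each figure having been rounded to one decimal place, though the page does not say so.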

Workload	Grade	Fit	Decode	TTFT	Context
Agentic Coding	C	Runs well	42.8 tok/s	6573 ms	93K
Chat	C	Runs well	42.8 tok/s	2465 ms	23K
Coding	C	Runs well	46.4 tok/s	4172 ms	46K
RAG	C	Runs well	42.8 tok/s	8217 ms	93K
Reasoning	C	Runs well	42.8 tok/s	5341 ms	46K
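
One way to read the Decode and TTFT columns together is as a rough end-to-end latency: time to first token plus the time to generate the remaining output. The sketch below assumes that simple additive model and a hypothetical output length of 200 tokens; `estimated_latency_s` and `OUTPUT_TOKENS` are illustrative names, not part of any tool referenced on this page.

```python
def estimated_latency_s(ttft_ms: float, decode_tok_s: float, output_tokens: int) -> float:
    """Rough latency in seconds: time to first token plus decode time."""
    return ttft_ms / 1000 + output_tokens / decode_tok_s

# (TTFT in ms, decode speed in tok/s), copied from the workload table above.
workloads = {
    "Agentic Coding": (6573, 42.8),
    "Chat": (2465, 42.8),
    "Coding": (4172, 46.4),
    "RAG": (8217, 42.8),
    "Reasoning": (5341, 42.8),
}

OUTPUT_TOKENS = 200  # hypothetical response length, chosen for illustration

for name, (ttft_ms, decode_tok_s) in workloads.items():
    latency = estimated_latency_s(ttft_ms, decode_tok_s, OUTPUT_TOKENS)
    print(f"{name}: ~{latency:.1f} s for {OUTPUT_TOKENS} tokens")
```

Under this model, TTFT dominates for short outputs, so the workloads with long contexts (RAG, Agentic Coding) feel slowest even though their decode speed matches Chat.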

How embeddinggemma 300M (0.3B params) fits at each quantization level on MacBook Air M1 16GB (11.5 GB usable).

Quant	Bits	VRAM	Quality	Fit
Q2_K	2	0.1 GB	Low	D30
Q3_K_S	3	0.1 GB	Low	D30
NVFP4	4
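
The VRAM values for the low-bit rows are roughly what parameter count times bits per weight predicts. The following is a minimal sketch under stated assumptions: it uses the nominal bit widths from the Bits column (and assumes 6 bits for Q6_K, which is not shown in the table), treats 1 GB as 10^9 bytes, and ignores the per-block scale data that real GGUF quantization formats store, so actual files are somewhat larger. `estimate_weights_gb` is a hypothetical helper.

```python
def estimate_weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10^9 bytes) at a nominal bit width."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9

PARAMS_B = 0.3  # embeddinggemma 300M

# Nominal bit widths; the Q6_K value is an assumption, not taken from the table.
for quant, bits in [("Q2_K", 2), ("Q3_K_S", 3), ("Q6_K", 6)]:
    print(f"{quant}: ~{estimate_weights_gb(PARAMS_B, bits):.3f} GB")
```

These estimates (0.075 GB, 0.1125 GB, and 0.225 GB) round to the 0.1 GB shown for Q2_K and Q3_K_S and to the 0.2 GB weights figure reported for Q6_K.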

HuggingFace

huggingface-cli download hf-ggml-org--embeddinggemma-300m-gguf

Upgrade options

Intel Arc B580 12GB (Budget pick)

Grade C, 229.8 tok/s decode

~$249 MSRP

RTX 3060 12GB (Best value)

Grade C, 249.5 tok/s decode

~$329 MSRP

MacBook Pro M3 Pro 18GB (Biggest leap)

Grade C, 115 tok/s decode

~$1,999 MSRP