How much VRAM does Jina Embeddings v3 need?

Jina Embeddings v3 (0.572B parameters) requires approximately 2.7 GB of memory with F16 quantization.

What is the best quantization for Jina Embeddings v3?

The recommended quantization for Jina Embeddings v3 is F16, which balances quality and memory efficiency.

Can it run?

Can GTX 1650 4GB run Jina Embeddings v3?

Yes, GTX 1650 4GB can run Jina Embeddings v3 with a B grade (Runs well). Expected decode speed: 64.0 tok/s.

Grade B: Good

Runs well

Using F16 in Ollama

Fit status

Runs well

Decode

64.0 tok/s

TTFT

3025 ms

Safe context

Memory

2.7 GB / 4.0 GB

Memory breakdown

Weights: 0.3 GB

KV Cache: 0.8 GB

Runtime: 1.2 GB

Headroom: 0.4 GB
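
The four components above add up to the 2.7 GB total shown for this card. A minimal Python check using only the figures on this page (the variable names are illustrative, not part of any tool):

```python
# Memory components reported on this page, in GB.
breakdown_gb = {"weights": 0.3, "kv_cache": 0.8, "runtime": 1.2, "headroom": 0.4}
vram_gb = 4.0  # usable VRAM listed for the GTX 1650 4GB

# Round to one decimal to absorb floating-point noise in the sums.
total_gb = round(sum(breakdown_gb.values()), 1)
unallocated_gb = round(vram_gb - total_gb, 1)

print(f"total: {total_gb} GB, unallocated: {unallocated_gb} GB")
```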

Performance by workload

Workload	Grade	Fit	Decode	TTFT	Context
Agentic Coding	B	Runs well	64.0 tok/s	4400 ms	8K
Chat	B	Runs well	64.0 tok/s	1650 ms	8K
Coding	B	Runs well	64.0 tok/s	3025 ms	8K
RAG	B	Runs well	64.0 tok/s	5500 ms	8K
Reasoning	B	Runs well	64.0 tok/s	3575 ms	8K

Quantization options

How Jina Embeddings v3 (0.572B params) fits at each quantization level on GTX 1650 4GB (4.0 GB usable).

Quant	Bits	VRAM	Quality	Fit
Q2_K	2	0.2 GB	Low	D30
Q3_K_S	3	0.3 GB	Low	D31
NVFP4	4	0.3 GB	Medium	D31
Q4_K_M	4	0.3 GB	Medium	D31
Q5_K_M	5	0.4 GB	High	D31
Q6_K	6	0.5 GB	High	D32
Q8_0	8	0.6 GB	Very High	D32
F16 (best for your GPU)	16	1.2 GB	Maximum	D36
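
The page does not say how the VRAM column is derived. As a rough sketch, assuming the figure is simply parameters × bits ÷ 8, rounded up to the next 0.1 GB, the following Python reproduces the listed values. The function name and rounding rule are assumptions made for illustration; real quantized files usually carry some extra metadata, so treat the result as a nominal estimate.

```python
import math

PARAMS = 572_000_000  # 0.572B parameters, as stated on this page


def weights_gb(bits_per_weight: int, params: int = PARAMS) -> float:
    """Nominal weight size in GB, rounded up to the next 0.1 GB (assumed rule)."""
    size_gb = params * bits_per_weight / 8 / 1e9
    return math.ceil(size_gb * 10) / 10


# Nominal bit widths taken from the table above.
for name, bits in [("Q2_K", 2), ("Q4_K_M", 4), ("Q8_0", 8), ("F16", 16)]:
    print(name, weights_gb(bits), "GB")
```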

Get started

Ollama

ollama run jina-embeddings-v3
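
An embedding model returns vectors rather than generated text, so it is typically queried through the Ollama HTTP API instead of an interactive session. The sketch below assumes a local Ollama server on its default port (11434) and assumes the model tag shown above is available locally; the helper names `build_embed_request` and `embed` are hypothetical.

```python
import json
import urllib.request

# Assumed default local endpoint for Ollama's embedding API.
OLLAMA_URL = "http://localhost:11434/api/embed"
# Model tag as written on this page; assumed to be pulled locally.
MODEL = "jina-embeddings-v3"


def build_embed_request(text: str) -> urllib.request.Request:
    """Build a POST request asking the server to embed one input string."""
    payload = json.dumps({"model": MODEL, "input": text}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(text: str) -> list[float]:
    """Send the request and return the first embedding vector in the response."""
    with urllib.request.urlopen(build_embed_request(text)) as resp:
        body = json.load(resp)
    return body["embeddings"][0]
```

Calling `embed("hello world")` needs a running server with the model loaded; `build_embed_request` can be inspected without one.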

HuggingFace

huggingface-cli download jinaai/jina-embeddings-v3
