Granite 3.1 8B
NVIDIA
The RTX 3060 12GB is one of the most popular entry points for local AI inference. Its generous 12 GB of GDDR6 VRAM, more than the RTX 3060 Ti offers, allows it to run 7B parameter models at 8-bit precision and 13B models with Q4 quantization (a 7B model at full FP16 precision needs roughly 14 GB of weights alone and would not fit). While its compute throughput is modest, the VRAM capacity makes it a budget-friendly option for getting started with local LLMs.
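A rough back-of-the-envelope check makes the fit claims concrete. The sketch below assumes a simple rule of thumb (weights take params × bits / 8 bytes, plus a hypothetical fixed overhead for KV cache and activations); real usage varies with context length and runtime.

```python
def model_vram_gb(params_billion, bits_per_weight, overhead_gb=1.5):
    """Rough VRAM needed: weight bytes plus a fixed runtime allowance.

    overhead_gb is an assumed placeholder for KV cache and activations;
    actual overhead depends on context length and inference engine.
    """
    weight_gb = params_billion * bits_per_weight / 8  # 1B params at 8 bits ~ 1 GB
    return weight_gb + overhead_gb

# 7B at FP16 (~15.5 GB with overhead) exceeds 12 GB; 8-bit and Q4 variants fit.
for name, params, bits in [("7B FP16", 7, 16), ("7B 8-bit", 7, 8), ("13B Q4", 13, 4)]:
    need = model_vram_gb(params, bits)
    verdict = "fits" if need <= 12 else "does not fit"
    print(f"{name}: ~{need:.1f} GB -> {verdict} in 12 GB")
```

By this estimate, 7B FP16 needs about 15.5 GB, while 7B at 8 bits (~8.5 GB) and 13B at Q4 (~8.0 GB) both land comfortably inside 12 GB.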
Architecture
Ampere is NVIDIA's second-generation RTX architecture, built on Samsung's 8nm process. It introduced 3rd-generation Tensor Cores with support for sparsity-accelerated INT8 operations and improved FP16 throughput over Turing.
AI Relevance
Sparsity-aware Tensor Cores can effectively double throughput for structured sparse workloads. However, the lack of FP8 support means quantized inference is less efficient than on Ada Lovelace or Blackwell GPUs.
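The sparsity acceleration applies only to weights that follow the 2:4 structured pattern: in every contiguous group of four values, at most two are nonzero. A minimal sketch of that constraint check (the function name is illustrative, not an NVIDIA API):

```python
def satisfies_2_4_sparsity(weights):
    """Check the 2:4 structured pattern Ampere Tensor Cores accelerate:
    at most 2 nonzero values in every contiguous group of 4 weights."""
    for i in range(0, len(weights), 4):
        group = weights[i:i + 4]
        if sum(1 for w in group if w != 0) > 2:
            return False
    return True

dense  = [0.5, -0.2, 0.1, 0.3]                    # 4 nonzeros in one group
sparse = [0.5, 0.0, 0.0, 0.3, 0.0, 0.1, 0.2, 0.0]  # 2 nonzeros per group
print(satisfies_2_4_sparsity(dense))   # False
print(satisfies_2_4_sparsity(sparse))  # True
```

In practice, models must be pruned (and usually fine-tuned) to this pattern before the hardware speedup applies; arbitrary dense weights do not qualify.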
The RTX 3060 uses the GA106 GPU die with 28 Streaming Multiprocessors, each containing 128 CUDA cores. While its 112 Tensor Cores are modest, they provide meaningful acceleration for quantized inference workloads.
Notably, the RTX 3060 uses a narrower 192-bit memory bus than the RTX 3060 Ti's 256-bit bus, but compensates with more VRAM chips to reach 12 GB total. For AI workloads, this VRAM advantage is significant: it determines which models can run entirely on-GPU versus requiring slower CPU offloading.
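The bus width and memory speed together set peak bandwidth, which in turn bounds token-generation speed, since decoding one token reads essentially all model weights once. A rough sketch, assuming the RTX 3060's published 192-bit bus and 15 Gbps GDDR6 (and ignoring caches and compute limits, so the result is an upper bound):

```python
def memory_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Peak bandwidth: bus width (bits) x per-pin data rate (Gbps) / 8 bits per byte."""
    return bus_width_bits * data_rate_gbps / 8

def rough_tokens_per_s(model_size_gb, bandwidth_gbs):
    """Decode is roughly bandwidth-bound: each generated token streams all
    weights from VRAM once. A crude upper bound, not a benchmark."""
    return bandwidth_gbs / model_size_gb

bw = memory_bandwidth_gbs(192, 15)  # RTX 3060: 192-bit bus, 15 Gbps GDDR6
print(f"{bw:.0f} GB/s peak bandwidth")
print(f"~{rough_tokens_per_s(7.0, bw):.0f} tok/s upper bound for a ~7 GB model")
```

This yields 360 GB/s, matching the card's official spec, and suggests why a model that spills into system RAM (tens of GB/s over PCIe) slows down so sharply.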
Agentic Coding
This model is still usable for agentic coding, but it is not the most specialized pick. It sits in the middle of the current model mix. It fits natively with comfortable headroom. Known channels: huggingface, ollama.
Chat
This model is a direct match for chat. It belongs to a current frontier family for local AI. It fits natively with comfortable headroom. Known channels: huggingface, ollama, lm-studio.
Coding
This model is still usable for coding, but it is not the most specialized pick. It sits in the middle of the current model mix. It fits natively with comfortable headroom. Known channels: huggingface, ollama.
RAG
This model is a direct match for RAG. It sits in the middle of the current model mix. It fits natively with comfortable headroom.
Reasoning
This model is a direct match for reasoning. It belongs to a current frontier family for local AI. It fits natively with comfortable headroom. Known channels: huggingface, ollama, lm-studio.
Just out of reach
High-quality models that need a bit more memory
Upgrade paths
See what you unlock with more powerful hardware