CUDA · 8GB VRAM

arnav080/qwen3-30b-moe-8gb-cpu-offload

Name: arnav080/qwen3-30b-moe-8gb-cpu-offload
Author: arnav080

Qwen3-30B-A3B (MoE, ~3B active parameters per token) on a single RTX 4060 Ti 8GB. All 41 MoE expert blocks are offloaded to CPU RAM. This configuration reaches 49 tokens/s at the full 262k native context and leaves 3.2 GB of VRAM free. Compared with keeping part of the experts on the GPU, it trades roughly 10% of speed for 4x the usable context.
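In llama.cpp, expert offloading of this kind is commonly expressed with the `--override-tensor` (`-ot`) option, which takes a `regex=buffer` pair such as `.ffn_.*_exps.=CPU`. The recipe's YAML is not shown here, so whether it uses exactly this rule is an assumption. The sketch below only illustrates the routing idea: tensors whose names match the pattern go to CPU memory and everything else stays on the GPU. The tensor names are an illustrative subset based on llama.cpp's GGUF naming, and the escaped regex is a slightly stricter variant of the unescaped pattern above.

```python
import re

# Assumed override rule in the style of llama.cpp's -ot flag:
# any tensor whose name contains ".ffn_<something>_exps." is placed on the CPU.
EXPERT_PATTERN = re.compile(r"\.ffn_.*_exps\.")


def tensor_device(name: str) -> str:
    """Return 'CPU' for MoE expert tensors and 'GPU' for everything else.

    Uses re.search (substring match), assuming the override pattern may
    match anywhere in the tensor name rather than the whole name.
    """
    return "CPU" if EXPERT_PATTERN.search(name) else "GPU"


def layer_tensor_names(layer: int) -> list[str]:
    """Illustrative (hypothetical) subset of tensor names for one MoE block."""
    prefix = f"blk.{layer}."
    return [prefix + suffix for suffix in (
        "attn_q.weight", "attn_k.weight", "attn_v.weight", "attn_output.weight",
        "ffn_gate_inp.weight",   # router: small, stays on the GPU
        "ffn_gate_exps.weight",  # expert weights: the bulk of the parameters
        "ffn_up_exps.weight",
        "ffn_down_exps.weight",
    )]


if __name__ == "__main__":
    for name in layer_tensor_names(0):
        print(f"{name:28s} -> {tensor_device(name)}")
```

Only the three stacked expert matrices per block match, so the attention weights and the router remain in VRAM.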

Configuration Specifications

Base Model	huggingface:Qwen/Qwen3-30B-A3B
Engine	llama.cpp
Quantization	Q4_K_M
Platform	cuda
VRAM Required	8GB minimum
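Once the experts live in CPU RAM, the VRAM requirement is dominated by the weights still on the GPU plus the KV cache, which grows linearly with context length. The sketch below is a rough budget calculator under that assumption. It ignores compute buffers and driver overhead, and every model dimension is a caller-supplied input, not a measured value from this recipe. It shows why freeing weight memory translates directly into more context.

```python
def kv_cache_bytes(n_ctx: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: float = 2.0) -> int:
    """Approximate KV cache size: K and V each hold n_kv_heads * head_dim
    values per token per layer (bytes_per_elem=2.0 assumes an f16 cache)."""
    return int(2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * n_ctx)


def max_context(vram_budget: int, gpu_weight_bytes: int, per_token_kv: int) -> int:
    """Largest context whose KV cache fits in the VRAM left after the
    GPU-resident weights (overheads ignored, so this is an upper bound)."""
    free = max(vram_budget - gpu_weight_bytes, 0)
    return free // per_token_kv


if __name__ == "__main__":
    # Hypothetical dimensions, chosen only to make the arithmetic easy to follow.
    per_token = kv_cache_bytes(1, n_layers=2, n_kv_heads=4, head_dim=128)
    print(per_token)                                     # bytes of KV per token
    print(max_context(8 * 2**30, 4 * 2**30, per_token))  # tokens that fit
```

Halving the GPU-resident weights in a tight budget can multiply the feasible context several times over, which is the trade this recipe makes against the extra CPU work.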

Telemetry Benchmarks

No runs recorded yet

Be the first to benchmark this recipe by running the following command from your terminal:

bloc run arnav080/qwen3-30b-moe-8gb-cpu-offload

qwen3-30b-moe-8gb-cpu-offload.yaml
