CUDA · 192GB VRAM
arnav080/step-3-7-flash-nvfp4
Step 3.7 Flash MoE (198B) quantized to 4-bit (NVFP4), optimized for dual RTX 6000 GPUs
Configuration Specifications
| Base Model | huggingface:stepfun-ai/Step-3.7-Flash-NVFP4 |
| Engine | llama.cpp |
| Quantization | modelopt |
| Platform | cuda |
| VRAM Required | 192GB minimum |
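As a rough sanity check on the 192GB figure, the weight footprint can be estimated from the parameter count and the bit width. The sketch below is a back-of-the-envelope calculation, not the recipe's actual sizing logic. It assumes every one of the 198B parameters is stored in NVFP4 (4-bit values plus one 8-bit scale per 16-element block); in practice some layers usually stay in higher precision, so the real footprint is likely somewhat larger. The function name `nvfp4_weight_gb` is hypothetical.

```python
def nvfp4_weight_gb(params: float, block_size: int = 16, scale_bits: int = 8) -> float:
    """Estimate weight storage in decimal GB for NVFP4-quantized parameters.

    Assumption: each weight takes 4 bits, and each block of `block_size`
    weights shares one `scale_bits`-bit scale factor. Per-tensor scales and
    any unquantized layers are ignored.
    """
    bits_per_weight = 4 + scale_bits / block_size  # 4.5 bits with the defaults
    return params * bits_per_weight / 8 / 1e9


PARAMS = 198e9   # parameter count from the listing (198B)
VRAM_GB = 192    # VRAM requirement from the listing

weights_gb = nvfp4_weight_gb(PARAMS)
headroom_gb = VRAM_GB - weights_gb

print(f"Estimated weights: {weights_gb:.1f} GB")
print(f"Remaining for KV cache, activations, and runtime buffers: {headroom_gb:.1f} GB")
```

Under these assumptions the weights come to roughly 111 GB, which leaves the remainder of the 192GB budget for the KV cache, activations, and any layers kept at higher precision.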
Telemetry Benchmarks
No runs recorded yet
Be the first to benchmark this recipe! Run the CLI command from your terminal:
bloc run arnav080/step-3-7-flash-nvfp4
step-3-7-flash-nvfp4.yaml