CUDA | 192GB VRAM

arnav080/step-3-7-flash-nvfp4

Step 3.7 Flash MoE (198B), quantized to 4-bit NVFP4 and optimized for dual RTX 6000 GPUs

Configuration Specifications

Base Model: huggingface:stepfun-ai/Step-3.7-Flash-NVFP4
Engine: llama.cpp
Quantization: modelopt
Platform: cuda
VRAM Required: 192GB minimum
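
Before running the recipe, it can help to confirm that the host's combined GPU memory meets the 192GB minimum listed above. The following is a minimal sketch, not part of the recipe itself. It sums the per-GPU totals reported by nvidia-smi (in MiB) and compares them with the requirement. The function names are hypothetical, and reading "192GB" as decimal gigabytes is an assumption, made because drivers typically report slightly less than a card's nominal capacity.

```python
import subprocess

# Assumption: the spec's "192GB" is read as decimal gigabytes and converted to MiB,
# because nvidia-smi reports memory.total in MiB.
REQUIRED_MIB = 192 * 10**9 // 2**20


def total_vram_mib(nvidia_smi_output: str) -> int:
    """Sum per-GPU memory.total values (MiB); expects one integer per line."""
    return sum(
        int(line.strip())
        for line in nvidia_smi_output.splitlines()
        if line.strip()
    )


def query_nvidia_smi() -> str:
    """Return raw per-GPU memory totals from nvidia-smi, one MiB value per line."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def meets_requirement(nvidia_smi_output: str, required_mib: int = REQUIRED_MIB) -> bool:
    """True if the summed VRAM across all listed GPUs reaches the requirement."""
    return total_vram_mib(nvidia_smi_output) >= required_mib
```

On the target machine, meets_requirement(query_nvidia_smi()) returns True when the GPUs together reach the threshold. Note that the sum counts every visible GPU, so on a host with more than two cards it may overstate what the recipe can actually use.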

Telemetry Benchmarks

No runs recorded yet

Be the first to benchmark this recipe! Run the CLI command from your terminal:

bloc run arnav080/step-3-7-flash-nvfp4
step-3-7-flash-nvfp4.yaml