CUDA, 8GB VRAM

arnav080/vllm-speculative-decoding

Stress-test recipe for the native vLLM engine, using draft models for speculative decoding.
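For readers new to the technique, here is a minimal, library-free sketch of greedy speculative decoding: a cheap draft model proposes a few tokens, and the target model keeps the longest prefix it agrees with. It is not taken from this recipe or from vLLM. The names `speculative_decode`, `toy_target`, and `toy_draft` are hypothetical, and the "models" are toy functions over integer tokens.

```python
from typing import Callable, List, Tuple

Token = int
NextTokenFn = Callable[[List[Token]], Token]  # greedy next-token function


def speculative_decode(
    target_next: NextTokenFn,
    draft_next: NextTokenFn,
    prompt: List[Token],
    max_new_tokens: int,
    k: int = 4,
) -> Tuple[List[Token], int]:
    """Return (tokens, number of draft tokens accepted by the target)."""
    tokens = list(prompt)
    produced = 0
    accepted = 0
    while produced < max_new_tokens:
        # 1. The draft model proposes up to k tokens autoregressively.
        ctx = list(tokens)
        proposal: List[Token] = []
        for _ in range(min(k, max_new_tokens - produced)):
            tok = draft_next(ctx)
            proposal.append(tok)
            ctx.append(tok)

        # 2. The target model verifies the proposal. A real engine would
        #    score all k positions in one batched forward pass; this sketch
        #    calls the target once per position for clarity.
        for tok in proposal:
            expected = target_next(tokens)
            produced += 1
            if tok == expected:
                tokens.append(tok)  # draft token accepted
                accepted += 1
            else:
                tokens.append(expected)  # rejected: keep the target's token
                break  # discard the rest of the proposal
    return tokens, accepted


# Toy models (assumptions for illustration): the target counts up modulo 10;
# the draft agrees except that it wrongly predicts 0 after seeing 4.
def toy_target(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


if __name__ == "__main__":
    out, n_accepted = speculative_decode(toy_target, toy_draft, [0], 8, k=4)
    print(out, n_accepted)
```

With greedy verification the output matches what the target model alone would produce. The draft model only affects how many target steps can be confirmed per round, which is what a stress test of this kind would exercise.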

Configuration Specifications

Base Model: huggingface:Qwen/Qwen2.5-1.5B-Instruct
Engine: llama.cpp
Quantization: Q4_K_M
Platform: cuda
VRAM Required: 8GB minimum

Telemetry Benchmarks

No runs recorded yet

Be the first to benchmark this recipe! Run the CLI command from your terminal:

bloc run arnav080/vllm-speculative-decoding
vllm-speculative-decoding.yaml