CUDA, 8GB VRAM
arnav080/vllm-speculative-decoding
Stress-test recipe for the native vLLM engine that uses draft models for speculative decoding.
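For readers unfamiliar with the technique, the following is a minimal toy sketch of greedy speculative decoding: a cheap draft model proposes a few tokens, and the target model verifies them, keeping the longest matching prefix plus one corrected token. It is not vLLM's implementation and does not reflect this recipe's YAML. The names `target_model`, `draft_model`, `greedy_decode`, and `speculative_decode` are hypothetical, and the "models" are simple arithmetic stand-ins. Production engines verify all proposed positions in one batched forward pass and, when sampling, use a probabilistic accept/reject rule rather than the exact-match check assumed here.

```python
from typing import Callable, List

# A "model" here is any function mapping a token prefix to its greedy next token.
Model = Callable[[List[int]], int]


def target_model(prefix: List[int]) -> int:
    # Hypothetical stand-in for the large model (assumption, not a real LLM).
    return (sum(prefix) + len(prefix)) % 7


def draft_model(prefix: List[int]) -> int:
    # Hypothetical stand-in for the small draft model: it agrees with the
    # target except when the prefix length is a multiple of 3.
    guess = target_model(prefix)
    return (guess + 1) % 7 if len(prefix) % 3 == 0 else guess


def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    # Baseline: one model call per generated token.
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       n_new: int, k: int = 4) -> List[int]:
    tokens = list(prompt)
    goal = len(prompt) + n_new
    while len(tokens) < goal:
        # 1. The draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))

        # 2. The target checks each proposed position in order.
        for i in range(len(proposal)):
            expected = target(tokens + proposal[:i])
            if expected != proposal[i]:
                # First mismatch: keep the accepted prefix and the target's token.
                tokens.extend(proposal[:i])
                tokens.append(expected)
                break
        else:
            # Every proposal was accepted: keep them all plus one bonus target token.
            tokens.extend(proposal)
            tokens.append(target(tokens))
    # The last round may overshoot; trim to the requested length.
    return tokens[:goal]


if __name__ == "__main__":
    prompt = [1, 2]
    print(speculative_decode(target_model, draft_model, prompt, n_new=10))
    print(greedy_decode(target_model, prompt, n_new=10))
```

Under greedy decoding, the speculative output matches what the target model would produce on its own; the draft model affects only how many target steps can be confirmed per round.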
Configuration Specifications
| Parameter | Value |
| --- | --- |
| Base Model | huggingface:Qwen/Qwen2.5-1.5B-Instruct |
| Engine | llama.cpp |
| Quantization | Q4_K_M |
| Platform | cuda |
| VRAM Required | 8GB minimum |
Telemetry Benchmarks
No runs recorded yet
Be the first to benchmark this recipe! Run the CLI command from your terminal:
bloc run arnav080/vllm-speculative-decoding
vllm-speculative-decoding.yaml