
arnav080/vllm-speculative-decoding

Name: arnav080/vllm-speculative-decoding
Author: arnav080

Stress-test recipe for the native vLLM engine, using a draft model for speculative decoding.
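In speculative decoding, a small draft model cheaply proposes several tokens, and the larger target model verifies them. Every accepted token saves one full target-model decoding step. The sketch below shows the simplest greedy variant, using toy "models" that are plain Python functions. The names `speculative_step`, `generate`, `toy_target`, and `toy_draft` are hypothetical and are not vLLM APIs. Production engines verify all drafted positions in one batched forward pass, and they use a rejection-sampling rule when sampling is not greedy.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for a model: maps the tokens so far to the next
# token id, chosen greedily.
NextToken = Callable[[Sequence[int]], int]


def speculative_step(prefix: Sequence[int], draft: NextToken,
                     target: NextToken, k: int = 4) -> List[int]:
    """One greedy draft-then-verify round; returns the tokens to append."""
    # 1. The cheap draft model proposes k tokens autoregressively.
    proposal: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # 2. The target model checks each proposed position (looped here for
    #    clarity; a real engine scores them all in one forward pass).
    accepted: List[int] = []
    ctx = list(prefix)
    for tok in proposal:
        expected = target(ctx)
        if expected != tok:
            # First mismatch: keep the target's own token and end the round.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3. All k drafts matched, so the target yields one extra "bonus" token.
    accepted.append(target(ctx))
    return accepted


def generate(prompt: Sequence[int], draft: NextToken, target: NextToken,
             max_new: int, k: int = 4) -> List[int]:
    """Repeat speculative rounds until max_new tokens have been produced."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        out.extend(speculative_step(out, draft, target, k))
    return out[: len(prompt) + max_new]


# Toy models: the target counts up mod 10; the draft agrees except after a 5.
def toy_target(ctx: Sequence[int]) -> int:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: Sequence[int]) -> int:
    return 0 if ctx[-1] == 5 else (ctx[-1] + 1) % 10


if __name__ == "__main__":
    print(generate([0], toy_draft, toy_target, max_new=12))
    # -> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2]
```

Under greedy decoding, the output is identical to running the target model alone. Draft quality affects only speed, measured as how many tokens each round accepts. It never affects the result.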

Configuration Specifications

Base Model	huggingface:Qwen/Qwen2.5-1.5B-Instruct
Engine	llama.cpp
Quantization	Q4_K_M
Platform	cuda
VRAM Required	8GB minimum
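The following back-of-the-envelope sketch estimates how much of the VRAM budget the base model's weights and KV cache might occupy. It is a lower bound, not a derivation of the 8GB figure. It ignores the draft model, the CUDA context, activations, and any memory the engine preallocates. All numeric inputs are assumptions. The parameter count is taken from the model name. The 4.85 bits per weight is a rough average for Q4_K_M, which mixes 4-bit and 6-bit blocks. The layer and head counts are assumed values for Qwen2.5-1.5B and should be checked against the model's `config.json`.

```python
def weight_gib(num_params: float, bits_per_weight: float) -> float:
    """Size of the quantized weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 tokens: int, bytes_per_elem: int = 2) -> float:
    """K and V tensors for every layer, KV head and cached token, in GiB."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * tokens / 2**30


if __name__ == "__main__":
    # ASSUMED inputs: ~1.5e9 params, ~4.85 bits/weight for Q4_K_M, and
    # 28 layers / 2 KV heads / head_dim 128 with an fp16 KV cache.
    w = weight_gib(1.5e9, 4.85)
    kv = kv_cache_gib(layers=28, kv_heads=2, head_dim=128, tokens=32_768)
    print(f"weights ~{w:.2f} GiB, KV cache for 32k tokens ~{kv:.3f} GiB")
```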


To benchmark this recipe, run the following CLI command from your terminal:

bloc run arnav080/vllm-speculative-decoding

vllm-speculative-decoding.yaml
