The huggingface For
Local AI Deployments.
Discover and deploy production-ready local AI recipes in minutes. Run identical llama.cpp environments shared by the community with one command. Any model, any hardware. Fully local and reproducible.
What is Bloc?
Bloc Hub is an open platform for sharing and deploying reproducible local AI environments. Developers can publish complete llama.cpp recipes including models, quantization settings, runtimes, GPU optimizations, and configurations, while others can instantly run the exact same setup using a single CLI command. Instead of manually recreating local AI stacks from scattered guides and configs, Bloc Hub makes deployments portable, reproducible, and community-driven.
Build, Share, Run – instantly
Bloc Hub helps teams share reproducible local AI deployments from prototype to production. Discover community-built llama.cpp recipes, deploy identical environments instantly, and continuously improve performance across models, hardware, and runtimes.
Build
Create a reproducible recipe with your models, configs, runtimes, and optimizations.
Share
Publish to Bloc Hub and share with the community.
Run
Anyone can deploy the exact same environment with one command.
Everything needed to run local AI.
Deploy community-built llama.cpp recipes, reproduce exact environments, and optimize performance from prototype to production.
Auto-Hardware Prober
Automatically checks CPU instructions, VRAM thresholds, and GPU capabilities to guide configuration options.
Zero-Dependency Setup
Runtimes, libraries, and optimized model runners are pre-packaged and managed dynamically without manual build config.
OpenAI-Compatible Endpoint
Instantly serves endpoints at localhost:8080/v1. Drop in existing chat UI SDKs, LangChain, or Autogen scripts.
Granular Layer Offloading
Control layers in VRAM vs RAM to balance speed and memory usage.
Telemetry Benchmarks
Opt-in speed profiles verify tokens/sec across specific host chips.
Multi-Format Support
Compatible with GGUF, EXL2, AWQ, and custom quantized manifests.
Offline Sovereignty
Runs 100% offline, guaranteeing no text data leaves the host environment.
Ready-to-deploy local AI setups
Explore curated local AI recipes optimized for different models, GPUs, runtimes, and workloads deployable instantly with a single command.
New recipes are added constantly by developers, researchers, and local AI enthusiasts. Browse the full collection and find the setup that fits your workflow.
Built by the local AI community.
Share your own recipes, improve existing deployments, and collaborate on open infrastructure powering the next generation of local AI.
Local AI Recipes
Browse, customize, and share optimized llama.cpp recipes configured for different host hardware. Fully version-controlled and reproducible.
Next.js Web Hub
A collaborative registry and web interface to explore community-built setups, track model downloads, and manage remote endpoints.
Developer CLI Tool
A lightweight terminal client to deploy, benchmark, and serve local AI environments instantly with a single offline command.
Your next local AI stack is
one command away.
Frequently Asked Questions
The local infrastructure for collaborative team intelligence. Plug-and-play AI that stays in your office.