1-command deployments·100% local and open-source·Community-powered AI infrastructure

The huggingface For Local AI Deployments.

Discover and deploy production-ready local AI recipes in minutes. Run identical llama.cpp environments shared by the community with one command. Any model, any hardware. Fully local and reproducible.

$bloc deploy arnav/qwen-3.5-9b-super
Alibaba Cloud Logo
Meta Logo
Qwen AI Logo
Mistral AI Logo
MiniMax Logo

What is Bloc?

Bloc Hub is an open platform for sharing and deploying reproducible local AI environments. Developers can publish complete llama.cpp recipes including models, quantization settings, runtimes, GPU optimizations, and configurations, while others can instantly run the exact same setup using a single CLI command. Instead of manually recreating local AI stacks from scattered guides and configs, Bloc Hub makes deployments portable, reproducible, and community-driven.

Build, Share, Run – instantly

Bloc Hub helps teams share reproducible local AI deployments from prototype to production. Discover community-built llama.cpp recipes, deploy identical environments instantly, and continuously improve performance across models, hardware, and runtimes.

01

Build

Create a reproducible recipe with your models, configs, runtimes, and optimizations.

02

Share

Publish to Bloc Hub and share with the community.

03

Run

Anyone can deploy the exact same environment with one command.

Everything needed to run local AI.

Deploy community-built llama.cpp recipes, reproduce exact environments, and optimize performance from prototype to production.

Auto-Hardware Prober

Automatically checks CPU instructions, VRAM thresholds, and GPU capabilities to guide configuration options.

Zero-Dependency Setup

Runtimes, libraries, and optimized model runners are pre-packaged and managed dynamically without manual build config.

OpenAI-Compatible Endpoint

Instantly serves endpoints at localhost:8080/v1. Drop in existing chat UI SDKs, LangChain, or Autogen scripts.

Granular Layer Offloading

Control layers in VRAM vs RAM to balance speed and memory usage.

Telemetry Benchmarks

Opt-in speed profiles verify tokens/sec across specific host chips.

Multi-Format Support

Compatible with GGUF, EXL2, AWQ, and custom quantized manifests.

Offline Sovereignty

Runs 100% offline, guaranteeing no text data leaves the host environment.

Ready-to-deploy local AI setups

Explore curated local AI recipes optimized for different models, GPUs, runtimes, and workloads deployable instantly with a single command.

New recipes are added constantly by developers, researchers, and local AI enthusiasts. Browse the full collection and find the setup that fits your workflow.

Your next local AI stack is
one command away.

Frequently Asked Questions

What is a local AI recipe?+
A recipe is a pre-configured, reproducible environment descriptor that defines the model weight quantization, llama.cpp startup parameters, system prompts, and required hardware profile (CPU, Metal, or specific CUDA setups) to run an LLM optimally.
How do I deploy a recipe?+
Install the Bloc CLI using npm or brew, then run 'bloc deploy <recipe-name>'. The CLI pulls the optimized weights and handles dependencies, starting a local API server in seconds.
Is Bloc Hub fully open-source and local?+
Yes. Bloc Hub is built entirely on open infrastructure. All models run completely on your own machine without sending data to external APIs, ensuring absolute privacy.
What hardware is supported?+
We support Apple Silicon (Metal acceleration), Nvidia GPUs (CUDA), AMD GPUs (ROCm), and standard CPU-only configurations. The registry automatically matches recipes to your hardware.
How can I share my own recipe?+
Simply define your hardware targets and run parameters in a 'bloc.yaml' file and use the 'bloc publish' command to share it with the registry.

The local infrastructure for collaborative team intelligence. Plug-and-play AI that stays in your office.

© 2026 Bloc AI Inc. All rights reserved.Designed for local intelligence