How Bloc Works

Learn about the core design principles, execution lifecycle, and architecture of Bloc.

Bloc is a version-agnostic, local-first orchestrator for AI model deployments. Rather than fight the rapid release pace of upstream engines like llama.cpp or vLLM, it decouples metadata indexing from execution logic.

The Sealed Envelope Pattern

A recipe is a YAML document split into two distinct responsibility layers:

┌─────────────────────────────────────────────────────────┐
│                    BLOC RECIPE (.yaml)                  │
├─────────────────────────────────────────────────────────┤
│ LAYER 1: Registry Metadata                              │
│ (Parsed & indexed by the Hub website)                   │
│ - name, tags, base_model, min_vram, target_platform     │
│ - engine.name, engine.runtime, model.hf_repo            │
├─────────────────────────────────────────────────────────┤
│ LAYER 2: Engine Configuration                           │
│ (Opaque payload, executed verbatim by the CLI)          │
│ - engine_config (ctx_size, tensor_parallel_size, etc.)  │
│ - pre_run hooks (env vars, shell commands)              │
└─────────────────────────────────────────────────────────┘

Layer 1: Registry Metadata

This layer contains structured fields that the Bloc Hub parses and indexes. It powers the Hub search engine, tags, stars, and hardware compatibility filters (like VRAM requirements, target platforms, engine names, and runtimes).

Layer 2: Engine Configuration

This layer is treated as an opaque configuration payload by the Hub backend. The Hub stores it verbatim without attempting to validate or run it.

When you run bloc deploy, the Bloc CLI opens this "envelope", resolves the engine and runtime parameters, validates the configuration, downloads the weights, and boots up the model engine.
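
To make the split concrete, here is a minimal Python sketch that separates an already-parsed recipe into the two layers. The field names come from the diagram above; the top-level nesting, the split_recipe helper, and the sample values are assumptions for illustration, not Bloc's actual schema or code.

```python
# Hypothetical sketch: the key layout below is assumed, not Bloc's real schema.
REGISTRY_KEYS = {"name", "tags", "base_model", "min_vram", "target_platform", "engine", "model"}
PAYLOAD_KEYS = {"engine_config", "pre_run"}


def split_recipe(recipe: dict) -> tuple[dict, dict]:
    """Split a parsed recipe into (registry metadata, opaque engine payload)."""
    metadata = {k: v for k, v in recipe.items() if k in REGISTRY_KEYS}
    payload = {k: v for k, v in recipe.items() if k in PAYLOAD_KEYS}
    return metadata, payload


# Illustrative values only.
recipe = {
    "name": "llama-3-8b",
    "tags": ["chat"],
    "min_vram": 8,
    "engine": {"name": "llama.cpp", "runtime": "native"},
    "engine_config": {"ctx_size": 8192},
    "pre_run": [],
}
metadata, payload = split_recipe(recipe)
```

In this sketch the Hub would only ever inspect metadata, while payload travels untouched until the CLI picks it up.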

The Deployment Lifecycle

When a user runs a command like bloc deploy arnav/llama-3-8b, the following sequence executes locally:

Validate

The CLI retrieves the recipe YAML from the Hub API and validates the schema.

Download

The CLI checks your local cache directory (~/.cache/bloc/models or ~/.cache/bloc/repos). If the weights are missing, it downloads either the single GGUF file or the entire Hugging Face repository (safetensors, configurations, tokenizers) using your local credentials.
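
The cache check can be pictured roughly as follows. This is a sketch under assumptions: the two cache roots are the ones named above, but which root holds which kind of download, and the needs_download helper itself, are hypothetical.

```python
from pathlib import Path

# Cache roots named in the text; the mapping to download types is assumed.
MODELS_DIR = Path.home() / ".cache" / "bloc" / "models"  # assumed: single GGUF files
REPOS_DIR = Path.home() / ".cache" / "bloc" / "repos"    # assumed: full repository snapshots


def needs_download(target: Path) -> bool:
    """Return True when the expected file or directory is absent or empty."""
    if target.is_file():
        return target.stat().st_size == 0
    if target.is_dir():
        return not any(target.iterdir())
    return True
```

A real implementation would likely also verify checksums or partial downloads; this sketch only tests for presence.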

Resolve Engine

The CLI determines the designated engine plugin (llama.cpp or vllm) and runtime mode (native or docker).

Probe Capabilities

The CLI probes the local hardware and automatically detects the engine's current capabilities (e.g., by parsing --help or querying the container) to ensure the exact flag syntax matches your installed version.
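
One plausible way to detect capabilities from help output is to collect every long option the engine mentions. The sketch below assumes the help text has already been captured as a string; the supported_flags helper and the sample text are illustrative, not Bloc's actual probe.

```python
import re

# Matches long options such as --ctx-size or --tensor-parallel-size.
LONG_FLAG = re.compile(r"(?<![\w-])--[a-z][a-z0-9-]*")


def supported_flags(help_text: str) -> set[str]:
    """Collect every long option mentioned in an engine's --help output."""
    return set(LONG_FLAG.findall(help_text))


# Illustrative help text, not copied from any real engine release.
sample_help = """
  -c, --ctx-size N        size of the prompt context
      --n-gpu-layers N    number of layers to store in VRAM
"""
flags = supported_flags(sample_help)
```

In practice the help text could come from running the engine binary with --help through subprocess.run with capture_output=True and text=True, checking both stdout and stderr since tools differ in where they print help.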

Map Flags & Pre-Run

The CLI maps the recipe features into exact command-line arguments. It prompts you to confirm any pre-run scripts or custom code execution.
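
The mapping step can be sketched as a lookup from recipe features to the first flag spelling the installed engine supports. The CANDIDATES table and the map_flags helper are hypothetical; a real mapping would be maintained per engine plugin.

```python
# Hypothetical table: candidate flag spellings per recipe feature, in preference order.
CANDIDATES = {
    "ctx_size": ["--ctx-size", "--max-model-len"],
    "tensor_parallel_size": ["--tensor-parallel-size"],
}


def map_flags(engine_config: dict, supported: set[str]) -> list[str]:
    """Translate recipe features into arguments the installed engine accepts."""
    args: list[str] = []
    for feature, value in engine_config.items():
        # Pick the first candidate spelling that the probed engine reports.
        flag = next((f for f in CANDIDATES.get(feature, []) if f in supported), None)
        if flag is None:
            raise ValueError(f"no supported flag for recipe feature {feature!r}")
        args += [flag, str(value)]
    return args


args = map_flags({"ctx_size": 8192}, {"--ctx-size", "--n-gpu-layers"})
```

Failing loudly on an unmapped feature, rather than silently dropping it, keeps a recipe from launching with different settings than its author intended.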

Launch & Await

A shared process.Supervisor launches the engine process, streams logs, and actively polls the /health endpoint until the server is fully ready.
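
The readiness wait amounts to a poll loop with a deadline. The following is a minimal Python sketch of that idea, not the process.Supervisor implementation; the function names, timeouts, and polling interval are assumptions.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False  # not listening yet, or answering with an error status


def wait_until_ready(check: Callable[[], bool], deadline_s: float = 300.0,
                     interval_s: float = 1.0) -> bool:
    """Poll `check` until it succeeds or the deadline passes."""
    end = time.monotonic() + deadline_s
    while time.monotonic() < end:
        if check():
            return True
        time.sleep(interval_s)
    return False
```

A caller would pass something like lambda: http_ok(base_url + "/health"), where base_url is whatever local address the engine was launched on. Taking the check as a callable keeps the loop testable without a running server.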

Present

Once ready, the CLI reports success and presents the local URL to the user.

Telemetry & Compatibility Loop

Upstream inference tools ship updates frequently. To keep compatibility data current without manual maintenance, Bloc uses an automated Telemetry Feedback Loop:

Run Completion: After a deployment succeeds or the engine exits with a failure, the CLI posts a lightweight, anonymous summary to the Hub:
```json
{
  "recipe_id": "arnav/llama-3-8b",
  "engine": "vllm",
  "runtime": "docker",
  "success": true,
  "tokens_per_sec": 78.4
}
```
Crowdsourced Compatibility: The Hub aggregates these summaries into compatibility stats for each recipe. The stats appear directly on the Hub recipe card, so other users can see which environments are working.
Privacy by Design: Telemetry never includes prompt text, outputs, file paths, hostnames, or IP addresses. It can be permanently disabled via bloc telemetry off or by setting the BLOC_NO_TELEMETRY=1 environment variable.
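
The privacy rules above suggest an allow-list approach: copy only known-safe fields into the summary and skip the post entirely when the opt-out is set. The sketch below assumes that design; build_summary is a hypothetical helper, and it models only the BLOC_NO_TELEMETRY variable, not whatever setting bloc telemetry off persists.

```python
import json
import os

# Only the fields shown in the example summary are ever copied into the payload.
ALLOWED_FIELDS = ("recipe_id", "engine", "runtime", "success", "tokens_per_sec")


def build_summary(run: dict, env=os.environ):
    """Return the JSON summary to post, or None when telemetry is disabled."""
    if env.get("BLOC_NO_TELEMETRY") == "1":
        return None
    return json.dumps({k: run[k] for k in ALLOWED_FIELDS if k in run})
```

With an allow-list, a field such as a hostname or file path that accidentally ends up in the run record is dropped by construction rather than by remembering to filter it out.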