Quickstart
Get up and running with your first local AI model deployment in under 5 minutes.
This guide walks you through authenticating, searching the registry, deploying an optimized model, and querying the local server.
Step 1: Verify Installation
Ensure the bloc CLI utility is installed and accessible:
bloc version
This should print the version number (e.g. v0.1.0) and build date.
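If the command is not found, the CLI is probably not on your PATH. A minimal sketch of a guard you could put in a setup script, using only the standard `command -v` builtin; the helper name `have_cmd` is made up for this example:

```shell
# Return 0 if the named command is on PATH, non-zero otherwise.
# have_cmd is a hypothetical helper, not part of the bloc CLI.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd bloc; then
  bloc version
else
  echo "bloc not found on PATH; check your installation" >&2
fi
```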
Step 2: Authenticate with the Hub
Log in to link your terminal session with your Bloc Hub profile. This lets you access your custom recipe configurations and publish new recipes:
bloc login
- The CLI will output a unique verification code and open your browser to the GitHub OAuth authorization page.
- Paste the code into that page and authorize the application.
- Your secure token will be stored locally in ~/.config/bloc/auth.json.
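To confirm that the login completed, you can check that the token file exists. The sketch below assumes only the path given above; `auth_status` is a hypothetical helper, and it does not inspect or validate the token's contents:

```shell
# Print "present" or "missing" for the token file under the given config root.
# The relative path bloc/auth.json comes from this guide; auth_status is a
# hypothetical helper, not a bloc command.
auth_status() {
  if [ -f "$1/bloc/auth.json" ]; then
    echo present
  else
    echo missing
  fi
}

auth_status "$HOME/.config"
```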
Step 3: Find a Model Recipe
Search the public registry for a recipe optimized for your hardware. For example, let's search for the Qwen3 model family:
bloc search qwen3
You can narrow your search by specifying VRAM and platform constraints:
bloc search qwen3 --vram 8GB --platform cuda
Copy the recipe identifier (e.g. arnav/qwen3-7b-budget-beast) for the next step.
Step 4: Deploy and Serve
Deploy the selected recipe. The CLI will download the weights (if they are not already cached in your local ~/.cache/bloc directory), configure the llama.cpp server flags, and boot up the endpoint:
bloc deploy arnav/qwen3-7b-budget-beast
[!NOTE] If the recipe contains pre_run system setup commands (like power-limiting commands), the CLI will display them and ask for your explicit confirmation (y/N) before running them.
Once the engine finishes loading model layers into memory, it will start a local API server (by default at http://127.0.0.1:8080).
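Loading a large model can take a while, so scripts that query the server right after deploying may want to wait until it answers. A minimal sketch of a generic retry helper follows; `retry_until` is a name invented here. The commented usage line assumes that curl is installed and that the server responds to GET /v1/models, which is common for OpenAI-compatible servers but is not confirmed by this guide:

```shell
# Run a command up to $1 times, pausing $2 seconds between attempts.
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
# retry_until is a hypothetical helper, not part of the bloc CLI.
retry_until() {
  tries=$1
  delay=$2
  shift 2
  i=0
  while [ "$i" -lt "$tries" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    # Only pause if another attempt is coming.
    if [ "$i" -lt "$tries" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Example (assumption: the server answers GET /v1/models once it is ready):
# retry_until 30 2 curl -sf http://127.0.0.1:8080/v1/models && echo "server is up"
```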
Step 5: Query Your Local Model
The API server runs an OpenAI-compatible endpoint. You can test it from another terminal window using curl:
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Write a python function to check if a number is prime."}
    ]
  }'
You can also point any OpenAI-compatible front-end UI (like Chatbox, SillyTavern, or local coding assistants) to http://localhost:8080/v1 to interact with the model.
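If you want to send different prompts from a script, hand-editing the JSON body quickly gets error-prone. Here is a rough sketch that builds the same request body from a prompt string. The helper names `json_escape` and `build_payload` are invented for this example, and the escaping only covers backslashes and double quotes, so prompts containing newlines or control characters would need a real JSON tool:

```shell
# Escape backslashes first, then double quotes, so the prompt can sit
# inside a JSON string. Newlines and control characters are NOT handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build a chat request body with a single user message.
build_payload() {
  printf '{"messages":[{"role":"user","content":"%s"}]}' "$(json_escape "$1")"
}

# Example (assumes the server from Step 4 is running and curl is installed):
# curl http://localhost:8080/v1/chat/completions \
#   -H "Content-Type: application/json" \
#   -d "$(build_payload 'Write a python function to check if a number is prime.')"
```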
Step 6: Manage Cached Weights
Model weights are large. You can inspect your cached GGUFs at any time to check disk consumption:
bloc models
To free up space, run the clear command to purge all downloaded model files:
bloc models clear
The command prompts for confirmation before deleting anything.
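To cross-check disk usage with standard tools, you can look at the cache directory directly. This sketch assumes that the cached weights carry a .gguf extension somewhere under ~/.cache/bloc; the exact layout inside that directory is not documented here, and `list_gguf` is a hypothetical helper:

```shell
# List *.gguf files under a directory with their sizes in kilobytes.
# list_gguf is a hypothetical helper, not a bloc command.
list_gguf() {
  find "$1" -type f -name '*.gguf' -exec du -k {} +
}

cache_dir="$HOME/.cache/bloc"
if [ -d "$cache_dir" ]; then
  list_gguf "$cache_dir"
  # Total size of the whole cache directory, human-readable.
  du -sh "$cache_dir"
fi
```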