
DISTRIBUTED AI ORCHESTRATION

CLUSTER YOUR FLOCK. SATURATE YOUR HARDWARE.

ClusterFlock turns every GPU you own into one unified AI backend. Load models, run inference, and launch autonomous missions across your entire fleet - from a single command.

WHAT IT DOES

SMART ALLOCATION
Automatically detects VRAM, profiles your hardware, and bin-packs the best models onto each GPU. No manual configuration needed. Self-adapting and self-healing, even while a mission runs.
MIXTURE OF AGENTS
A lead LLM orchestrates a flock of worker models - dispatching tasks, evaluating results, and iterating until the job is done.
AUTONOMOUS MISSIONS
Just describe a goal. ClusterFlock spins up sandboxed containers, assigns agents, and works toward the solution on its own.
REAL-TIME TELEMETRY
See live GPU utilization, VRAM usage, model status, and tokens/sec across every node in your cluster.
MULTI-BACKEND
Supports llama.cpp, LM Studio, Metal, and CUDA out of the box. Mix and match DGX Spark, consumer GPUs, and Mac - all in one cluster.
OPENAI-COMPATIBLE API
A true drop-in replacement. Point any OpenAI SDK, LangChain app, or curl command at port 1919 and you're up. ClusterFlock routes requests across your network and keeps every GPU busy.
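The smart allocation described above is essentially a bin-packing problem: fit models into the free VRAM of each GPU. A toy greedy sketch of the idea (all model names, sizes, and GPU labels here are illustrative, not ClusterFlock's actual allocator):

```python
# Toy VRAM bin-packing: place the largest models first, each onto the
# GPU that currently has the most free VRAM. Illustrative only.

def bin_pack(models, gpus):
    """models: {name: vram_gb needed}; gpus: {gpu_id: free_vram_gb}."""
    placement = {}
    free = dict(gpus)
    for name, need in sorted(models.items(), key=lambda kv: -kv[1]):
        gpu = max(free, key=free.get)  # GPU with the most free VRAM
        if free[gpu] >= need:
            placement[name] = gpu
            free[gpu] -= need
    return placement

models = {"llama-70b-q4": 40, "qwen-14b": 10, "phi-3": 4}
gpus = {"dgx-0": 48, "rtx4090-0": 24}
print(bin_pack(models, gpus))
```

Largest-first greedy placement keeps big models from being stranded; a real allocator would also weigh profiling data, quantization options, and runtime health.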

ONE ENDPOINT. FULL CLUSTER.

POST http://your-cluster:1919/v1/chat/completions
{
  "model": "clusterflock",
  "messages": [
    {"role": "user", "content": "Explain quantum computing"}
  ]
}
Three routing modes:
FANOUT: broadcast to all endpoints and synthesize the best answer
SPEED: route to the fastest single endpoint
MANUAL: pick a specific model
Works with OpenAI SDK · LangChain · LiteLLM · curl
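Since the endpoint speaks the OpenAI chat-completions schema, any HTTP client works. A minimal standard-library sketch of the request above ("your-cluster" is a placeholder; replace it with a real node before sending):

```python
# Build the same chat-completions request shown above using only the
# Python standard library. The host "your-cluster" is a placeholder.
import json
import urllib.request

payload = {
    "model": "clusterflock",
    "messages": [{"role": "user", "content": "Explain quantum computing"}],
}
req = urllib.request.Request(
    "http://your-cluster:1919/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# Uncomment once "your-cluster" points at a live ClusterFlock node:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Swapping in the OpenAI SDK is the same idea: point `base_url` at `http://your-cluster:1919/v1` and call it as you would the hosted API.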

AVAILABLE NOW.

ClusterFlock is free and open source. Dive in, build with it, and make it your own.

View on GitHub