DISTRIBUTED AI ORCHESTRATION
// OVERVIEW
CLUSTER YOUR FLOCK. SATURATE YOUR HARDWARE.
ClusterFlock turns every GPU you own into one unified AI backend. Load models, run inference, and launch autonomous missions across your entire fleet - from a single command.
// CAPABILITIES
WHAT IT DOES
SMART ALLOCATION
Automatically detects VRAM and profiles your hardware, then bin-packs the best models onto each GPU. No manual config needed. Self-adapting and self-healing, even while a mission runs.
MIXTURE OF AGENTS
A lead LLM orchestrates a flock of worker models - dispatching tasks, evaluating results, and iterating until the job is done.
AUTONOMOUS MISSIONS
Just describe a goal. ClusterFlock spins up sandboxed containers, assigns agents, and works toward the solution on its own.
REAL-TIME TELEMETRY
See live GPU utilization, VRAM usage, model status, and tokens/sec across every node in your cluster.
MULTI-BACKEND
Supports llama.cpp, LM Studio, Metal, and CUDA out of the box. Mix and match DGX Spark, consumer GPUs, and Mac - all in one cluster.
OPENAI-COMPATIBLE API
A true drop-in replacement. Point any OpenAI SDK, LangChain app, or curl command at port 1919 and you're up and running. ClusterFlock routes requests across your network and keeps every GPU busy.
// API
ONE ENDPOINT. FULL CLUSTER.
POST http://your-cluster:1919/v1/chat/completions
{
  "model": "clusterflock",
  "messages": [
    {"role": "user", "content": "Explain quantum computing"}
  ]
}
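Because the endpoint speaks the OpenAI chat-completions wire format, any HTTP client works. A minimal sketch using only Python's standard library (`your-cluster` is a placeholder for your head node's hostname, as in the request above):

```python
import json
import urllib.request

# ClusterFlock's OpenAI-compatible endpoint; "your-cluster" is a placeholder.
URL = "http://your-cluster:1919/v1/chat/completions"

# The same request body shown above, built as a Python dict.
payload = {
    "model": "clusterflock",
    "messages": [
        {"role": "user", "content": "Explain quantum computing"}
    ],
}

def ask(url: str = URL) -> str:
    """POST the payload and return the first choice's message content."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Swap `urllib` for the OpenAI SDK by pointing the client's `base_url` at the same address.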
Three routing modes:
FANOUT
broadcast to all, synthesize best
SPEED
fastest single endpoint
MANUAL
pick your model
Works with OpenAI SDK · LangChain · LiteLLM · curl
// OPEN SOURCE (MIT)
AVAILABLE NOW.
ClusterFlock is free and open source. Dive in, build with it, and make it your own.
View on GitHub