
DISTRIBUTED AI ORCHESTRATION

CLUSTER YOUR FLOCK. SATURATE YOUR HARDWARE.

ClusterFlock turns every GPU you own into one unified AI backend. Load models, run inference, and launch autonomous missions across your entire fleet - from a single command.

WHAT IT DOES

SMART ALLOCATION
Automatically detects VRAM, profiles your hardware, and bin-packs the best models onto each GPU. No manual configuration needed. Self-adapting and self-healing, even while a mission runs.
MIXTURE OF AGENTS
A lead LLM orchestrates a flock of worker models - dispatching tasks, evaluating results, and iterating until the job is done.
AUTONOMOUS MISSIONS
Just describe a goal. ClusterFlock spins up sandboxed containers, assigns agents, and works toward the solution on its own.
REAL-TIME TELEMETRY
See live GPU utilization, VRAM usage, model status, and tokens/sec across every node in your cluster.
MULTI-BACKEND
Supports llama.cpp, LM Studio, Metal, and CUDA out of the box. Mix and match DGX Spark, consumer GPUs, and Mac - all in one cluster.
OPENAI-COMPATIBLE API
A true drop-in replacement. Point any OpenAI SDK, LangChain app, or curl command at port 1919 and you're up. ClusterFlock routes requests across your network and keeps every GPU busy.
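The smart allocation described above is essentially a bin-packing problem: fit models into the free VRAM of each GPU. A toy greedy sketch of the idea (all model names, sizes, and GPU labels here are illustrative, not ClusterFlock's actual allocator):

```python
# Toy VRAM bin-packing: place the largest models first, each onto the
# GPU that currently has the most free VRAM. Illustrative only.

def bin_pack(models, gpus):
    """models: {name: vram_gb needed}; gpus: {gpu_id: free_vram_gb}."""
    placement = {}
    free = dict(gpus)
    for name, need in sorted(models.items(), key=lambda kv: -kv[1]):
        gpu = max(free, key=free.get)  # GPU with the most free VRAM
        if free[gpu] >= need:
            placement[name] = gpu
            free[gpu] -= need
    return placement

models = {"llama-70b-q4": 40, "qwen-14b": 10, "phi-3": 4}
gpus = {"dgx-0": 48, "rtx4090-0": 24}
print(bin_pack(models, gpus))
```

Largest-first greedy placement keeps big models from being stranded; a real allocator would also weigh profiling data, quantization options, and runtime health.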

ONE ENDPOINT. FULL CLUSTER.

POST http://your-cluster:1919/v1/chat/completions
{
  "model": "clusterflock",
  "messages": [
    {"role": "user", "content": "Explain quantum computing"}
  ]
}
Three routing modes:
FANOUT: broadcast to all endpoints and synthesize the best answer
SPEED: route to the fastest single endpoint
MANUAL: pick a specific model
Works with OpenAI SDK · LangChain · LiteLLM · curl
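Since the endpoint speaks the OpenAI chat-completions schema, any HTTP client works. A minimal standard-library sketch of the request above ("your-cluster" is a placeholder; replace it with a real node before sending):

```python
# Build the same chat-completions request shown above using only the
# Python standard library. The host "your-cluster" is a placeholder.
import json
import urllib.request

payload = {
    "model": "clusterflock",
    "messages": [{"role": "user", "content": "Explain quantum computing"}],
}
req = urllib.request.Request(
    "http://your-cluster:1919/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# Uncomment once "your-cluster" points at a live ClusterFlock node:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Swapping in the OpenAI SDK is the same idea: point `base_url` at `http://your-cluster:1919/v1` and call it as you would the hosted API.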

AVAILABLE NOW.

ClusterFlock is free and open source. Dive in, build with it, and make it your own.

View on GitHub