
Decentralized, verifiable knowledge base for LLMs
“What Wikipedia is for humans — but for machines.”
Build your own RAG — or share knowledge with anyone — with no servers in between.
Live demo
Try HIVE
Live demo connected to the public queen. Every answer comes from cryptographically signed fragments by real BEEs.
Public demo · LanceDB queen + Wikipedia & RSS bees on Hetzner · token pre-loaded, zero setup. Wire it into Claude via MCP for verifiable citations in your IDE.
Live nodes
A queen and two bees, running right now
Not a canned demo — three real nodes in production. Open their dashboards and watch them work.
Queen
:8090 · Replicates both bees' cores, indexes signed vectors into LanceDB, answers /api/query.
Open dashboard
Bee · Wikipedia
:8080 · Generalist BEE crawling Wikipedia (drift-ok policy). Embeds, signs and publishes fragments to its Hypercore.
Open dashboard
Bee · News (RSS)
:8081 · Specialist BEE on RSS feeds (BBC, Guardian, NPR, ArsTechnica) with an exclusive policy: different manifest, same protocol.
Open dashboard
Served over HTTPS (Caddy auto-TLS). These are the same nodes the live demo above reads from.
The problem
The problem with AI knowledge
Today's AI models — GPT, Claude, Gemini — are trained once and frozen. Their knowledge has a cutoff date. They hallucinate when they don't know something. Their content is decided by a handful of corporations. And every query goes through servers you don't control.
This is the wrong architecture for a world that runs on AI.
What
What HIVE is
HIVE is a decentralized, verifiable knowledge base built for LLMs — not for humans. It is to AI what Wikipedia is to humans: a living, open, source-traceable repository of knowledge that anyone can read, anyone can contribute to, and no one controls.
Verified source
No fabricated citations. Every fragment has a real origin.
Cryptographic signature
ed25519 + SHA-256. You know who added it and that it hasn't been modified.
Append-only log
Permanent history in Hypercore. Corrections are explicit.
No single point of failure
Designed to run across hundreds of independent nodes, with no central point to censor or take down.
How
How it works
Each participant runs a BEE (producer) or a QUEEN (consumer). BEEs are autonomous agents that:
Declare a source and scope (a Wikipedia category, arXiv categories, feeds…) and split the work with other BEEs via partitions
Extract content verbatim from verified sources: Wikipedia, arXiv, PubMed, RSS, Common Crawl
Sign each fragment with their ed25519 identity and append it to their append-only Hypercore
Replicate to peers over Hyperswarm; QUEENs index into LanceDB and answer the queries
→ Reads its manifest: declared sources + scope/partition
→ Sources: Wikipedia · arXiv · PubMed · RSS · Common Crawl · personal memory (beta) (ForagerRegistry)
→ Seeds its crawl queue from the declared scope
→ Per article: fetch verbatim → all sections → chunk → sign (ed25519) → append
→ Loop ~continuous: extract → sign → store → replicate to peers
→ TTL dedup: skips fresh content (wiki 7d · rss 24h · arXiv 30d)
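The TTL dedup step in the loop above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not HIVE's own code: the table only encodes the three windows quoted above, and `TTL_SECONDS`, `is_fresh`, `items_to_crawl` and the source labels are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple

DAY = 24 * 60 * 60

# TTL windows quoted in the text: wiki 7d, rss 24h, arXiv 30d.
TTL_SECONDS: Dict[str, int] = {
    "wiki": 7 * DAY,
    "rss": 1 * DAY,
    "arxiv": 30 * DAY,
}


def is_fresh(source: str, last_extracted: Optional[float], now: float) -> bool:
    """Return True when the item was extracted within its source's TTL.

    An item that was never extracted (last_extracted is None) is never fresh.
    """
    if last_extracted is None:
        return False
    return (now - last_extracted) < TTL_SECONDS[source]


def items_to_crawl(
    queue: List[Tuple[str, str]], last_seen: Dict[str, float], now: float
) -> List[Tuple[str, str]]:
    """Filter a queue of (source, url) pairs down to new or stale items."""
    return [
        (source, url)
        for source, url in queue
        if not is_fresh(source, last_seen.get(url), now)
    ]
```

Keeping the windows in one table means a new source type only needs one extra entry; how HIVE actually stores the last-extraction time is not described here.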
Why
Why it matters
For AI users
Answers grounded in verifiable, up-to-date sources. Know exactly where every fact came from.
For developers
A decentralized RAG layer that doesn't require building and maintaining your own knowledge pipeline.
For the open web
A commons of machine-readable knowledge that no corporation can take down, edit silently, or monetize.
Use cases
What it's for
Distributed RAG — public and private, specialized and general, for local or cloud LLMs. Run your own knowledge base, or share it peer-to-peer with no server in the middle. The same protocol composes into deployment patterns ranging from public swarms to per-LLM-host integrations — eleven shown below, full catalogue at the bottom.
Join the public swarm with one topic hash
A queen joins the public HIVE by hashing a known string — sha256("hive-network-v0.1") — and calling swarm.join(topic). Hyperswarm's DHT introduces it to every BEE on that topic; native Hypercore replication brings their signed fragments down with no central registry in between. Specialized public meshes are just a different string — "hive-medical-v0.1", "hive-legal-v0.1" — same protocol, narrower swarm.
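The topic derivation described above is a single SHA-256 call. The sketch below shows it in Python for illustration; it assumes the string is hashed as UTF-8 with no salt or prefix, and `swarm_topic` is a hypothetical helper name. The actual `swarm.join(topic)` call belongs to Hyperswarm and is not reproduced here.

```python
import hashlib


def swarm_topic(name: str) -> bytes:
    """Derive a 32-byte swarm topic by hashing a well-known string with SHA-256."""
    return hashlib.sha256(name.encode("utf-8")).digest()


public_topic = swarm_topic("hive-network-v0.1")
medical_topic = swarm_topic("hive-medical-v0.1")

print(len(public_topic))              # SHA-256 digests are always 32 bytes
print(public_topic != medical_topic)  # different strings give different topics
```

Because the topic is a pure function of the string, any node that knows the string can compute the same 32 bytes without asking a registry.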
Run a private swarm for internal use
Same BEEs and queens; three config knobs flip the network private: a random 32-byte swarm topic (2²⁵⁶ search space), Hypercore encryption keys so cores are ciphertext at rest and on the wire, and a peer allowlist by pubkey that drops any unauthorized connection on sight. Internal BEEs index company wikis, tickets, repos and contracts; a queen indexes their fragments and serves /api/query, so no traffic ever leaves the perimeter.
Share a private swarm between companies
Two organisations exchange three values out-of-band — swarm topic, Hypercore encryption key, and each side's queen pubkey for the allowlist. Both queens join the same private swarm and replicate only the BEEs the other party chose to expose. No copy, no third-party broker, no merge of hives: each company keeps its own queen, its own LanceDB index, its own audit trail. Revocation is a key roll or an allowlist edit.
One queen in many swarms — composed coverage
Cases 01–03 compose at the queen layer. A single queen can join as many topics as it has credentials for — public mesh, its own private swarm, every partner swarm — and replicate BEEs from all of them into one LanceDB index. One query, one LLM synthesis, sources drawn from every swarm the queen belongs to. Every fragment keeps its origin pubkey and signature, so provenance survives the merge. Nothing crosses between swarms — the queen is the only place they meet.
Custom connectors as ForagerSource plugins
Anything not already covered — a legacy ERP, an in-house REST API, a proprietary archive — is wired in by implementing the ForagerSource interface (seed / fetch / normalize / owns), publishing it as an npm package and adding its id to the BEE's manifest. On next start the forager picks it up, drains its queue mechanically and signs every emitted fragment. No fork of HIVE core, no central registry to update — the connector lives in the customer's repo.
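To make the shape of a connector concrete, here is a rough Python rendering of the four hooks named above. Only the names seed, fetch, normalize and owns come from the text; the argument and return types, the `InMemoryErpSource` class and the `drain` helper are assumptions for illustration. The real interface ships as an npm package and may look quite different.

```python
from typing import Dict, Iterable, List, Protocol


class ForagerSource(Protocol):
    """The four hooks named in the text; every signature here is assumed."""

    def seed(self, scope: str) -> Iterable[str]: ...
    def owns(self, url: str) -> bool: ...
    def fetch(self, url: str) -> str: ...
    def normalize(self, url: str, raw: str) -> List[Dict[str, str]]: ...


class InMemoryErpSource:
    """Toy connector over a dict standing in for a legacy ERP (hypothetical)."""

    def __init__(self, records: Dict[str, str]) -> None:
        self._records = records

    def seed(self, scope: str) -> Iterable[str]:
        # Initial crawl queue: every record URL inside the declared scope.
        return [url for url in self._records if url.startswith(scope)]

    def owns(self, url: str) -> bool:
        # Assumed meaning: does this connector handle the given URL?
        return url.startswith("erp://")

    def fetch(self, url: str) -> str:
        return self._records[url]

    def normalize(self, url: str, raw: str) -> List[Dict[str, str]]:
        # One fragment per non-empty paragraph, text kept verbatim.
        return [{"url": url, "text": p} for p in raw.split("\n\n") if p.strip()]


def drain(source: ForagerSource, scope: str) -> List[Dict[str, str]]:
    """Run seed -> owns -> fetch -> normalize over every seeded URL."""
    fragments: List[Dict[str, str]] = []
    for url in source.seed(scope):
        if source.owns(url):
            fragments.extend(source.normalize(url, source.fetch(url)))
    return fragments
```

Signing is deliberately left out of the connector: per the text, the forager signs every emitted fragment, so a plugin only has to produce them.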
Queen with a local LLM — full offline stack
The queen's LLM client is pluggable: point it at Ollama (or any local runtime) and the entire stack runs on-prem, with BEEs extracting, LanceDB indexing, and both the embedder and synthesis running locally. No API key is needed and no traffic leaves the box. A small model has narrow parametric memory; the queen's retrieval gives it grounded, signed context at query time, so on domain-bounded tasks the combination can behave like a much larger model while preserving privacy. It is a natural knowledge layer for QVAC-style local agents.
Training corpus with cryptographic provenance
BEEs store extraction verbatim — no LLM in the loop, no paraphrase. Every fragment carries source URL, scope, timestamp and an ed25519 signature. That makes a HIVE an unusually clean training source: stream fragments straight off the queen's replicated Hypercores into a pre-training, SFT or distillation pipeline. Filter by source, scope, language or signing BEE to build a broad generalist corpus or a narrow specialist one. Provenance is per-fragment and verifiable — useful for licence propagation and dataset audit.
Personal memory for your AI
A local-only HIVE queen indexes your own activity — Claude conversations, command history, notes, agent memory files. The MCP server exposes it to any client, so Claude (or Cursor, or any MCP-aware assistant) regains cross-session memory without sending anything to a third party. ForagerSource adapter, ed25519-signed by your own key, lives only in your private swarm — privacy by design.
Verifiable citations for journalism and compliance
BEEs index regulated and official sources (government gazettes, court records, regulator RSS, standards bodies) and sign every extraction. A journalist or compliance officer cites by fragment id; the cite is independently verifiable years later against the bee's ed25519 pubkey, even if the original URL changes or disappears. Per-fragment cryptographic provenance is the differentiator — no other RAG architecture provides it natively.
MCP server — wire HIVE into any LLM host
@capybaralabs/hive-mcp ships HIVE as an MCP server for Claude Desktop, Claude Code, Cursor, Continue, Goose, OpenClaw — and the rest of the MCP ecosystem. One line in your client config and Claude can query a HIVE queen as a native tool. Returns raw signed fragments; the host LLM does the synthesis. No glue code, no fork, no platform-specific build.
Claude Skill — when and how to cite HIVE
The hive-research Claude Skill is behavioural guidance the model loads before answering. It teaches Claude when to consult HIVE versus WebSearch, how to read score + retrieval-gate flag, how to cite each claim by fragment id + URL, and to admit when the queen has no relevant data instead of fabricating. Pure markdown — works in any client that reads ~/.claude/skills/, independent of MCP.
Plug HIVE into Claude, Cursor, OpenClaw — one command
Three pieces, one product story. Spin up a HIVE queen with one command, plug it into Claude / Cursor / OpenClaw via the MCP server, and load the Skill so the model proactively cites HIVE fragments instead of fabricating.
npx @capybaralabs/hive
Interactive wizard → starts a queen / bee / hive node. No Docker required.
npx @capybaralabs/hive-mcp
Add to your client's MCP config, point at the queen, done.
cp -r hive/skills/hive-research ~/.claude/skills/
Reload Claude for proactive citations, no fabrication.
Technology
The technology
Built on battle-tested P2P infrastructure: Hypercore, Hyperbee and Hyperswarm.
No blockchain. No tokens yet. No central server.
Under the hood
How it works underneath
For readers who want the detail — the cryptographic and P2P primitives HIVE is built on.
Hypercore
Append-only signed log
Each node owns a Hypercore: an append-only log where every block is hashed into a Merkle tree and signed by the node's key. Blocks are immutable and verifiable in isolation, so a peer can prove block N belongs to the log without trusting anyone. This is the same core that powers Keet.
Hyperbee
B-tree over Hypercore
Fragments, claims and the bee manifest are stored in a Hyperbee, an ordered key-value B-tree layered on the Hypercore. It gives range queries and history streams while inheriting the log's append-only, signed guarantees. The queen's replication reads a Hyperbee history stream to ingest fragments in order.
ed25519 + SHA-256
Per-fragment provenance
Every fragment is hashed (SHA-256) over its payload and signed (ed25519) by the producing bee. On receive, a queen recomputes the hash and verifies the signature against the bee's published public key before indexing; a tampered or unsigned fragment is dropped. Provenance survives replication and even cross-swarm merges.
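The receive-side check can be sketched as below. This is an illustration under assumptions, not HIVE's code: the `Fragment` fields are hypothetical, the signature is assumed to cover the SHA-256 digest (the text does not say exactly which bytes are signed), and because Python's standard library has no ed25519, the verifier is passed in as a function.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable

# (public_key, message, signature) -> True if the signature is valid.
SignatureVerifier = Callable[[bytes, bytes, bytes], bool]


@dataclass(frozen=True)
class Fragment:
    payload: bytes        # verbatim extracted text
    payload_hash: bytes   # SHA-256 digest claimed by the producing bee
    signature: bytes      # ed25519 signature (assumed here to cover the digest)
    bee_pubkey: bytes     # producing bee's published public key


def accept_fragment(fragment: Fragment, verify: SignatureVerifier) -> bool:
    """Return True only if the hash matches and the signature verifies."""
    recomputed = hashlib.sha256(fragment.payload).digest()
    if recomputed != fragment.payload_hash:
        return False  # payload no longer matches the claimed hash
    if not fragment.signature:
        return False  # unsigned fragment
    return verify(fragment.bee_pubkey, fragment.payload_hash, fragment.signature)
```

In practice `verify` would wrap a real ed25519 implementation, for example `Ed25519PublicKey.from_public_bytes(...).verify(signature, data)` from the `cryptography` package, translating its `InvalidSignature` exception into `False`.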
Hyperswarm DHT
Discovery + NAT traversal
Nodes find each other by joining a topic (a 32-byte key) on the Hyperswarm DHT, with no central registry and no bootstrap server you have to run. The DHT introduces peers and hole-punches through NATs; from there native Hypercore replication takes over the connection.
Encryption & allowlists
Private swarms
A public swarm is a known topic hash. A private one flips three knobs: a random 32-byte topic (2²⁵⁶ search space), Hypercore encryption keys so cores are ciphertext at rest and on the wire, and a pubkey allowlist that drops any unauthorized connection. Same protocol, sealed perimeter.
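The three knobs can be pictured as one small config object. The sketch below is hypothetical: `PrivateSwarmConfig`, `new_private_swarm` and `should_accept` are illustrative names, and the 32-byte encryption key length is an assumption rather than something stated in the text.

```python
import secrets
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class PrivateSwarmConfig:
    topic: bytes                 # random 32-byte swarm topic
    encryption_key: bytes        # symmetric key for the cores (32 bytes assumed)
    allowlist: FrozenSet[bytes]  # public keys of peers allowed to connect


def new_private_swarm(allowed_pubkeys: Iterable[bytes]) -> PrivateSwarmConfig:
    """Generate fresh random values for a private swarm."""
    return PrivateSwarmConfig(
        topic=secrets.token_bytes(32),           # 256 random bits
        encryption_key=secrets.token_bytes(32),
        allowlist=frozenset(allowed_pubkeys),
    )


def should_accept(config: PrivateSwarmConfig, peer_pubkey: bytes) -> bool:
    """Keep a connection only when the peer's public key is on the allowlist."""
    return peer_pubkey in config.allowlist
```

Unlike a public topic, which anyone can recompute from a known string, a random topic can only be learned by being told it, which is why it has to be shared out-of-band.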
LanceDB + e5-base
Vectorization & query
Producer-side vectorization: each BEE embeds its own chunks with multilingual-e5-base (768-d, ONNX int8) and signs the vector inline. The queen never embeds passages; it copies the signed vectors into an in-process LanceDB (the default; the VectorIndex interface is swappable, so a queen can run Qdrant or any backend instead). A query embeds only the question, pulls top-K by cosine similarity, gates them by score + keyword match, and passes the survivors to one LLM call for synthesis. With a hosted LLM, that call is the only non-local, non-deterministic step, and the only place an API key is used.
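The retrieval half of a query (everything before the LLM call) can be sketched in plain Python. This is a toy version under assumptions: a real queen delegates the search to LanceDB, and the gate rule used here (score at or above a threshold and at least one shared word) is only one plausible reading of "score + keyword match". The names `cosine`, `top_k` and `gate` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Hit = Tuple[str, float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Vector, index: List[Tuple[str, Vector]], k: int) -> List[Hit]:
    """index holds (text, vector) pairs; returns the k best (text, score) pairs."""
    scored = [(text, cosine(query_vec, vec)) for text, vec in index]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def gate(question: str, hits: List[Hit], min_score: float) -> List[Hit]:
    """Keep hits above min_score that share at least one word with the question."""
    words = {w.lower() for w in question.split()}
    return [
        (text, score)
        for text, score in hits
        if score >= min_score and words & {w.lower() for w in text.split()}
    ]
```

Only the hits that survive `gate` would be handed to the LLM; if the list is empty, the queen has nothing relevant to synthesize from.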
Status
In production
Current state
HIVE is in production (v1.0) with the BEE/QUEEN architecture: bees extract, embed and sign each fragment (vector inline), while queens index the pre-signed vectors into LanceDB and serve queries. On top of that engine, HIVE has added a Settings UI, public/private topics, auth, and distribution via npm + MCP. A live queen and bees are running right now (links below).
Shipped recently (2026-05)
Roadmap
Run a BEE
Your BEE will start, find a knowledge area nobody is covering, and begin extracting. No configuration needed.
npx @capybaralabs/hive # wizard → starts a node
git clone https://github.com/capybarist/hive && cd hive && docker compose up -d
git clone https://github.com/capybarist/hive && cd hive && bash hive.sh
Business Source License (BUSL-1.1). Free for non-commercial use. Converts to MIT in 4 years.
