Capybara Labs — AI, Software & Open Source

Open Source · P2P · Verified Knowledge

Decentralized, verifiable knowledge base for LLMs

“What Wikipedia is for humans — but for machines.”

Build your own RAG — or share knowledge with anyone — with no servers in between.


Live demo

Try HIVE

The live demo is connected to the public queen. Every answer comes from fragments cryptographically signed by real BEEs.

Public demo · LanceDB queen + Wikipedia & RSS bees on Hetzner · token pre-loaded, zero setup. Wire it into Claude via MCP for verifiable citations in your IDE.

Live nodes

A queen and two bees, running right now

Not a canned demo — three real nodes in production. Open their dashboards and watch them work.

Queen

:8090

Replicates both bees' cores, indexes signed vectors into LanceDB, answers /api/query.


Bee · Wikipedia

:8080

Generalist BEE crawling Wikipedia (drift-ok policy). Embeds, signs and publishes fragments to its Hypercore.


Bee · News (RSS)

:8081

Specialist BEE on RSS feeds (BBC, Guardian, NPR, ArsTechnica), exclusive policy — different manifest, same protocol.


Served over HTTPS (Caddy auto-TLS). These are the same nodes the live demo above reads from.

The problem

The problem with AI knowledge

Today's AI models — GPT, Claude, Gemini — are trained once and frozen. Their knowledge has a cutoff date. They hallucinate when they don't know something. Their content is decided by a handful of corporations. And every query goes through servers you don't control.

This is the wrong architecture for a world that runs on AI.

What

What HIVE is

HIVE is a decentralized, verifiable knowledge base built for LLMs — not for humans. It is to AI what Wikipedia is to humans: a living, open, source-traceable repository of knowledge that anyone can read, anyone can contribute to, and no one controls.

Verified source

No fabricated citations. Every fragment has a real origin.

Cryptographic signature

ed25519 + SHA-256. You know who added it and that it hasn't been modified.

Append-only log

Permanent history in Hypercore. Corrections are explicit.

No single point of failure

Hundreds of independent nodes. No censorship or central point.

How

How it works

Each participant runs a BEE (producer) or a QUEEN (consumer). BEEs are autonomous agents that:

Declare a source and scope (a Wikipedia category, arXiv categories, feeds…) and split the work with other BEEs via partitions

Extract content verbatim from verified sources: Wikipedia, arXiv, PubMed, RSS, Common Crawl

Sign each fragment with their ed25519 identity and append it to their append-only Hypercore

Replicate to peers over Hyperswarm; QUEENs index into LanceDB and answer the queries

BEE starts

→ Reads its manifest: declared sources + scope/partition

→ Sources (via the ForagerRegistry): Wikipedia · arXiv · PubMed · RSS · Common Crawl · personal memory (beta)

→ Seeds its crawl queue from the declared scope

→ Per article: fetch verbatim → all sections → chunk → sign (ed25519) → append

→ Loops continuously: extract → sign → store → replicate to peers

→ TTL dedup: skips content already extracted within its TTL (Wikipedia 7d · RSS 24h · arXiv 30d)
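The TTL dedup step above can be sketched as a per-source freshness check. This is a minimal illustration, not HIVE's implementation: the `TTL_MS` table, the `isFresh` and `markExtracted` helpers and the in-memory map are all hypothetical names, and only the three TTL values come from the text.

```typescript
// Hypothetical TTL table mirroring the values quoted above (in milliseconds).
const DAY_MS = 24 * 60 * 60 * 1000;
const TTL_MS: Record<string, number> = {
  wikipedia: 7 * DAY_MS,
  rss: 1 * DAY_MS,
  arxiv: 30 * DAY_MS,
};

// Last extraction time per item, keyed by "source:itemId" (assumed key format).
const lastExtracted = new Map<string, number>();

// Record that an item was extracted at time `now` (epoch milliseconds).
function markExtracted(source: string, itemId: string, now: number): void {
  lastExtracted.set(`${source}:${itemId}`, now);
}

// True when the item was extracted within its source's TTL and can be skipped.
// Unknown sources and never-seen items are never considered fresh.
function isFresh(source: string, itemId: string, now: number): boolean {
  const ttl = TTL_MS[source];
  const seenAt = lastExtracted.get(`${source}:${itemId}`);
  if (ttl === undefined || seenAt === undefined) return false;
  return now - seenAt < ttl;
}
```

A crawl loop would call `isFresh` before fetching and `markExtracted` after a successful append, so a restarted BEE would need to persist this map rather than keep it in memory.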

Why

Why it matters

For AI users

Answers grounded in verifiable, up-to-date sources. Know exactly where every fact came from.

For developers

A decentralized RAG layer that doesn't require building and maintaining your own knowledge pipeline.

For the open web

A commons of machine-readable knowledge that no corporation can take down, edit silently, or monetize.

Use cases

What it's for

Distributed RAG — public and private, specialized and general, for local or cloud LLMs. Run your own knowledge base, or share it peer-to-peer with no server in the middle. The same protocol composes into deployment patterns ranging from public swarms to per-LLM-host integrations — eleven shown below, full catalogue at the bottom.

Public

Join the public swarm with one topic hash

A queen joins the public HIVE by hashing a known string — sha256("hive-network-v0.1") — and calling swarm.join(topic). Hyperswarm's DHT introduces it to every BEE on that topic; native Hypercore replication brings their signed fragments down with no central registry in between. Specialized public meshes are just a different string — "hive-medical-v0.1", "hive-legal-v0.1" — same protocol, narrower swarm.

hyperswarm topic · DHT discovery · ed25519 fragments · no registry
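The topic derivation in this case can be sketched with Node's built-in crypto module. The `topicFor` helper is a hypothetical name; the topic strings are the ones quoted above, and the commented-out join call assumes the `hyperswarm` package is installed.

```typescript
import { createHash } from "node:crypto";

// Derive a 32-byte swarm topic by hashing a well-known string with SHA-256.
function topicFor(name: string): Buffer {
  return createHash("sha256").update(name, "utf8").digest();
}

const publicTopic = topicFor("hive-network-v0.1");
const medicalTopic = topicFor("hive-medical-v0.1");

console.log(publicTopic.length);               // 32
console.log(publicTopic.equals(medicalTopic)); // false

// With the hyperswarm package available, joining would look roughly like:
//   const swarm = new Hyperswarm();
//   swarm.join(publicTopic);
```

Because the topic is only a hash of a public string, anyone who knows the string can find the swarm; that is the intended behaviour for a public mesh.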

Private

Run a private swarm for internal use

Same BEEs and queens, three config knobs flip the network private: a random 32-byte swarm topic (2²⁵⁶ search space), Hypercore encryption keys so cores are ciphertext at rest and on the wire, and a peer allowlist by pubkey that drops any unauthorized connection on sight. Internal BEEs index company wikis, tickets, repos, contracts; a queen indexes them and serves /api/query — no traffic ever leaves the perimeter.

random topic · encrypted cores · pubkey allowlist · air-gapped
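The three knobs in this case can be sketched as a small config object. The `PrivateSwarmConfig` shape and both helper names are hypothetical and do not reflect HIVE's actual settings format; only the ideas (random 32-byte topic, an encryption key, a pubkey allowlist) come from the text, and the 32-byte key length is an assumption.

```typescript
import { randomBytes } from "node:crypto";

// Hypothetical config shape; field names are illustrative only.
interface PrivateSwarmConfig {
  topic: Buffer;         // random 32-byte swarm topic, not derived from any string
  encryptionKey: Buffer; // key used to encrypt cores (32 bytes assumed here)
  allowlist: string[];   // hex-encoded public keys of peers allowed to connect
}

function createPrivateSwarmConfig(allowedPeers: string[]): PrivateSwarmConfig {
  return {
    topic: randomBytes(32),
    encryptionKey: randomBytes(32),
    allowlist: allowedPeers.map((key) => key.toLowerCase()),
  };
}

// True when the connecting peer's public key is on the allowlist.
// A connection handler would drop the peer when this returns false.
function isAllowed(config: PrivateSwarmConfig, remotePublicKey: Buffer): boolean {
  return config.allowlist.indexOf(remotePublicKey.toString("hex")) !== -1;
}
```

Unlike the public case, the topic here cannot be recomputed from a known string, so it has to be distributed to members out-of-band together with the encryption key.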

B2B

Share a private swarm between companies

Two organisations exchange three values out-of-band — swarm topic, Hypercore encryption key, and each side's queen pubkey for the allowlist. Both queens join the same private swarm and replicate only the BEEs the other party chose to expose. No copy, no third-party broker, no merge of hives: each company keeps its own queen, its own LanceDB index, its own audit trail. Revocation is a key roll or an allowlist edit.

shared swarm · encryption key · selective exposure · revocable

Hybrid

One queen in many swarms — composed coverage

The public, private and B2B cases above compose at the queen layer. A single queen can join as many topics as it has credentials for (the public mesh, its own private swarm, every partner swarm) and replicate BEEs from all of them into one LanceDB index. One query, one LLM synthesis, sources drawn from every swarm the queen belongs to. Every fragment keeps its origin pubkey and signature, so provenance survives the merge. Nothing crosses between swarms: the queen is the only place they meet.

multi-swarm queen · single index · provenance preserved · no cross-leak

Extensibility

Custom connectors as ForagerSource plugins

Anything not already covered — a legacy ERP, an in-house REST API, a proprietary archive — is wired in by implementing the ForagerSource interface (seed / fetch / normalize / owns), publishing it as an npm package and adding its id to the BEE's manifest. On next start the forager picks it up, drains its queue mechanically and signs every emitted fragment. No fork of HIVE core, no central registry to update — the connector lives in the customer's repo.

ForagerSource · npm package · BeeManifest.sources · no fork
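A connector of this kind might look like the sketch below. The text names the four methods (seed / fetch / normalize / owns) but not their signatures, so every type, parameter and return value here is an assumption, and the in-memory "ERP" exists only so the example runs without a network.

```typescript
// Hypothetical shapes; the real ForagerSource signatures may differ.
interface RawDocument { url: string; body: string; }
interface FragmentDraft { url: string; text: string; }

interface ForagerSource {
  id: string;
  seed(scope: string): string[];                // initial crawl queue for a declared scope
  fetch(url: string): Promise<RawDocument>;     // retrieve the content verbatim
  normalize(doc: RawDocument): FragmentDraft[]; // split into chunks ready for signing
  owns(url: string): boolean;                   // does this connector handle the URL?
}

// Toy data standing in for a legacy system.
const ERP_RECORDS: Record<string, string> = {
  "erp://orders/1": "Order 1 shipped.\n\nInvoice 1 paid.",
};

const erpSource: ForagerSource = {
  id: "example-erp",
  seed: (scope) => Object.keys(ERP_RECORDS).filter((u) => u.startsWith(`erp://${scope}/`)),
  fetch: async (url) => ({ url, body: ERP_RECORDS[url] ?? "" }),
  normalize: (doc) =>
    doc.body
      .split("\n\n")
      .filter((text) => text.length > 0)
      .map((text) => ({ url: doc.url, text })),
  owns: (url) => url.startsWith("erp://"),
};

// Drain the seeded queue in order; signing and appending would follow each draft.
async function drain(source: ForagerSource, scope: string): Promise<FragmentDraft[]> {
  const drafts: FragmentDraft[] = [];
  for (const url of source.seed(scope)) {
    if (!source.owns(url)) continue;
    drafts.push(...source.normalize(await source.fetch(url)));
  }
  return drafts;
}
```

Keeping `normalize` free of any LLM call matches the verbatim-extraction rule described elsewhere on this page: the connector only splits text, it never rewrites it.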

Local AI

Queen with a local LLM — full offline stack

The queen's LLM client is pluggable; point it at Ollama (or any local runtime) and the entire stack runs on-prem — BEEs extracting, LanceDB indexing, embedder local, synthesis local. No API key, no traffic leaves the box. A small model has narrow parametric memory; the queen's retrieval gives it grounded, signed context at query time — the combination behaves like a much larger model on domain-bounded tasks while preserving privacy. The natural knowledge layer for QVAC-style local agents.

ollama / local LLM · on-prem · zero cloud · grounded small model

Training

Training corpus with cryptographic provenance

BEEs store extraction verbatim — no LLM in the loop, no paraphrase. Every fragment carries source URL, scope, timestamp and an ed25519 signature. That makes a HIVE an unusually clean training source: stream fragments straight off the queen's replicated Hypercores into a pre-training, SFT or distillation pipeline. Filter by source, scope, language or signing BEE to build a broad generalist corpus or a narrow specialist one. Provenance is per-fragment and verifiable — useful for licence propagation and dataset audit.

verbatim, signed · filter by scope · pre-train / SFT · distillation
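Filtering fragments into a training corpus, as described in this case, could be sketched as follows. The `HiveFragment` shape is hypothetical: the text lists source URL, scope, timestamp and signature as per-fragment fields, while the `language` and `beePublicKey` field names are assumptions made for the filter.

```typescript
// Hypothetical fragment shape based on the fields listed above.
interface HiveFragment {
  sourceUrl: string;
  scope: string;
  language: string;     // assumed field name
  beePublicKey: string; // hex pubkey of the signing BEE (assumed field name)
  timestamp: number;
  signature: string;
  text: string;
}

interface CorpusFilter {
  scopes?: string[];
  languages?: string[];
  bees?: string[];
}

// An unset filter field matches everything; a set field must contain the value.
function allows(allowed: string[] | undefined, value: string): boolean {
  return allowed === undefined || allowed.indexOf(value) !== -1;
}

function selectFragments(all: HiveFragment[], filter: CorpusFilter): HiveFragment[] {
  return all.filter(
    (f) =>
      allows(filter.scopes, f.scope) &&
      allows(filter.languages, f.language) &&
      allows(filter.bees, f.beePublicKey),
  );
}

// Emit one JSON line per fragment so provenance travels with the training text.
function toJsonl(fragments: HiveFragment[]): string {
  return fragments.map((f) => JSON.stringify(f)).join("\n");
}
```

Writing the signature and source URL into each JSONL record is what would let a later dataset audit trace any training example back to the BEE that produced it.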

Personal

Personal memory for your AI

A local-only HIVE queen indexes your own activity: Claude conversations, command history, notes, agent memory files. The MCP server exposes it to any client, so Claude (or Cursor, or any MCP-aware assistant) regains cross-session memory without sending anything to a third party. It is built as a ForagerSource adapter: fragments are ed25519-signed by your own key and live only in your private swarm, so the setup is private by design.

personal RAG · cross-session memory · local-only · user-signed

Audit

Verifiable citations for journalism and compliance

BEEs index regulated and official sources (government gazettes, court records, regulator RSS, standards bodies) and sign every extraction. A journalist or compliance officer cites by fragment id; the cite is independently verifiable years later against the bee's ed25519 pubkey, even if the original URL changes or disappears. Per-fragment cryptographic provenance is the differentiator — no other RAG architecture provides it natively.

signed provenance · verifiable citation · cold-archive trustworthy

MCP

MCP server — wire HIVE into any LLM host

@capybaralabs/hive-mcp ships HIVE as an MCP server for Claude Desktop, Claude Code, Cursor, Continue, Goose, OpenClaw — and the rest of the MCP ecosystem. One line in your client config and Claude can query a HIVE queen as a native tool. Returns raw signed fragments; the host LLM does the synthesis. No glue code, no fork, no platform-specific build.

MCP server · Claude / Cursor / OpenClaw · signed fragments · zero glue code
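As a rough illustration of the client config entry mentioned in this case, the snippet below builds the `mcpServers` block that Claude Desktop-style clients read and prints it as JSON. The server label `hive` is arbitrary, and how the server is pointed at a particular queen is not specified here, so that part is deliberately left out.

```typescript
// Shape of one entry in an MCP client's `mcpServers` config block.
interface McpServerEntry {
  command: string;
  args: string[];
}

const mcpConfig: { mcpServers: Record<string, McpServerEntry> } = {
  mcpServers: {
    // "hive" is an arbitrary label; the command mirrors `npx @capybaralabs/hive-mcp`.
    hive: { command: "npx", args: ["@capybaralabs/hive-mcp"] },
  },
};

// Paste the printed JSON into the client's MCP configuration file.
console.log(JSON.stringify(mcpConfig, null, 2));
```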

Skill

Claude Skill — when and how to cite HIVE

The hive-research Claude Skill is behavioural guidance the model loads before answering. It teaches Claude when to consult HIVE versus WebSearch, how to read score + retrieval-gate flag, how to cite each claim by fragment id + URL, and to admit when the queen has no relevant data instead of fabricating. Pure markdown — works in any client that reads ~/.claude/skills/, independent of MCP.

SKILL.md · behavioural guidance · no API key · MCP-independent

Shipped

Plug HIVE into Claude, Cursor, OpenClaw — one command

Three pieces, one product story. Spin up a HIVE queen with one command, plug it into Claude / Cursor / OpenClaw via the MCP server, and load the Skill so the model proactively cites HIVE fragments instead of fabricating.

@capybaralabs/hive

npx @capybaralabs/hive

Interactive wizard → starts a queen / bee / hive node. No Docker required.

@capybaralabs/hive-mcp

npx @capybaralabs/hive-mcp

Add to your client's MCP config, point at the queen, done.

hive-research Skill

cp -r hive/skills/hive-research \
  ~/.claude/skills/

Reload Claude — proactive citations, no fabrication.

Technology

The technology

Built on battle-tested P2P infrastructure:

→ Hypercore

Append-only cryptographic log (same tech as Keet)

→ Hyperswarm

P2P DHT for node discovery and NAT hole-punching

→ ForagerRegistry

One registry for every connector — Wikipedia, arXiv, PubMed, RSS, Common Crawl, and a personal-memory umbrella (beta: Claude/notes/ChatGPT). Third-party connectors load from npm. Each BEE publishes a signed BeeManifest declaring its sources.

→ transformers.js (ONNX)

In-process embeddings — multilingual-e5-base, 768-d, int8. BEEs embed passages; the queen embeds only the query. No Python.

→ LanceDB

Default vector index on the queen — embedded, in-process, no separate service. Sits behind a swappable VectorIndex interface.

→ MCP + npm

Consume from Claude/Cursor/Goose via @capybaralabs/hive-mcp, or run a node in one command with npx @capybaralabs/hive. Ships a Claude Skill too.

→ Ollama / Groq / Gemini / Claude / OpenAI

Query synthesis only. Extraction is LLM-free — verbatim from source APIs, signed with ed25519.

No blockchain. No tokens yet. No central server.

Under the hood

How it works underneath

For readers who want the detail — the cryptographic and P2P primitives HIVE is built on.

Hypercore

Append-only signed log

Each node owns a Hypercore: an append-only log where every block is hashed into a Merkle tree and signed by the node's key. Blocks are immutable and verifiable in isolation — a peer can prove block N belongs to the log without trusting anyone. This is the same core that powers Keet.
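The idea that a peer can prove block N belongs to the log can be illustrated with a generic Merkle inclusion proof. This is a simplified stand-in, not Hypercore's code: it uses SHA-256 over a plain binary tree, whereas Hypercore's real tree layout and hash function differ, and it omits the signature over the root. All function names are hypothetical.

```typescript
import { createHash } from "node:crypto";

function sha256(...parts: Buffer[]): Buffer {
  const h = createHash("sha256");
  for (const p of parts) h.update(p);
  return h.digest();
}
// Distinct prefixes keep leaf hashes and inner-node hashes from colliding.
const leafHash = (block: Buffer): Buffer => sha256(Buffer.from([0]), block);
const nodeHash = (left: Buffer, right: Buffer): Buffer => sha256(Buffer.from([1]), left, right);

// Build every level bottom-up; an unpaired node is promoted unchanged.
function buildLevels(blocks: Buffer[]): Buffer[][] {
  let level = blocks.map((b) => leafHash(b));
  const levels = [level];
  while (level.length > 1) {
    const next: Buffer[] = [];
    for (let i = 0; i < level.length; i += 2) {
      next.push(i + 1 < level.length ? nodeHash(level[i], level[i + 1]) : level[i]);
    }
    levels.push(next);
    level = next;
  }
  return levels;
}

interface ProofStep { sibling: Buffer; siblingOnLeft: boolean; }

// Collect the sibling hash at each level on the path from leaf `index` to the root.
function prove(levels: Buffer[][], index: number): ProofStep[] {
  const proof: ProofStep[] = [];
  let i = index;
  for (let depth = 0; depth < levels.length - 1; depth++) {
    const sib = i ^ 1;
    if (sib < levels[depth].length) {
      proof.push({ sibling: levels[depth][sib], siblingOnLeft: sib < i });
    }
    i = i >> 1;
  }
  return proof;
}

// Recompute the root from one block plus its proof; no other blocks are needed.
function verifyInclusion(block: Buffer, proof: ProofStep[], root: Buffer): boolean {
  let hash = leafHash(block);
  for (const step of proof) {
    hash = step.siblingOnLeft ? nodeHash(step.sibling, hash) : nodeHash(hash, step.sibling);
  }
  return hash.equals(root);
}
```

The verifier needs only the block, a proof of logarithmic size, and a root it trusts; in a signed log that trust would come from checking the node's signature over the root.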

Hyperbee

B-tree over Hypercore

Fragments, claims and the bee manifest are stored in a Hyperbee — an ordered key-value B-tree layered on the Hypercore. It gives range queries and history streams while inheriting the log's append-only, signed guarantees. The queen's replication reads a Hyperbee history stream to ingest fragments in order.

ed25519 + SHA-256

Per-fragment provenance

Every fragment is hashed (SHA-256) over its payload and signed (ed25519) by the producing bee. On receive, a queen recomputes the hash and verifies the signature against the bee's published public key before indexing — a tampered or unsigned fragment is dropped. Provenance survives replication and even cross-swarm merges.
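The hash-then-sign check described above can be sketched with Node's built-in crypto module. The `SignedFragment` shape, the encodings, and the choice to sign the SHA-256 digest rather than the raw payload are assumptions made for illustration; HIVE's actual wire format is not given here.

```typescript
import { createHash, generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// Hypothetical fragment envelope: payload, its SHA-256 hash, and an ed25519 signature.
interface SignedFragment {
  payload: string;
  hash: string;      // hex
  signature: string; // base64
}

// BEE side: hash the payload, then sign the hash with the bee's private key.
function signFragment(payload: string, privateKey: KeyObject): SignedFragment {
  const hash = createHash("sha256").update(payload, "utf8").digest();
  const signature = sign(null, hash, privateKey); // ed25519 takes no digest algorithm
  return { payload, hash: hash.toString("hex"), signature: signature.toString("base64") };
}

// Queen side: recompute the hash, then verify the signature against the bee's public key.
function verifyFragment(fragment: SignedFragment, publicKey: KeyObject): boolean {
  const hash = createHash("sha256").update(fragment.payload, "utf8").digest();
  if (hash.toString("hex") !== fragment.hash) return false;
  return verify(null, hash, publicKey, Buffer.from(fragment.signature, "base64"));
}

const { publicKey, privateKey } = generateKeyPairSync("ed25519");
const fragment = signFragment("Capybaras are the largest living rodents.", privateKey);

console.log(verifyFragment(fragment, publicKey));                             // true
console.log(verifyFragment({ ...fragment, payload: "tampered" }, publicKey)); // false
```

A queen following this pattern would index a fragment only when `verifyFragment` returns true and drop it otherwise.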

Hyperswarm DHT

Discovery + NAT traversal

Nodes find each other by joining a topic (a 32-byte key) on the Hyperswarm DHT — no central registry, no bootstrap server you have to run. The DHT introduces peers and hole-punches through NATs; from there native Hypercore replication takes over the connection.

Encryption & allowlists

Private swarms

A public swarm is a known topic hash. A private one flips three knobs: a random 32-byte topic (2²⁵⁶ search space), Hypercore encryption keys so cores are ciphertext at rest and on the wire, and a pubkey allowlist that drops any unauthorized connection. Same protocol, sealed perimeter.

LanceDB + e5-base

Vectorization & query

Producer-side vectorization: each BEE embeds its own chunks with multilingual-e5-base (768-d, ONNX int8) and signs the vector inline. The queen never embeds passages; it copies the signed vectors into an in-process LanceDB (the default; the VectorIndex interface is swappable, so a queen can run Qdrant or any other backend instead). A query embeds only the question, pulls the top-K fragments by cosine similarity, gates them by score and keyword match, and passes the survivors to a single LLM call for synthesis. With a cloud provider, the LLM is the only non-local, non-deterministic step, and the only place an API key is used.
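The query path described above (top-K by cosine similarity, then a score and keyword gate) can be sketched over plain arrays. This is an illustration under assumptions, not the queen's code: the gate is assumed to require both conditions, the keyword rule (query words longer than three characters) is invented for the example, and the 0.82 threshold is the RELEVANT_SCORE value listed under Current state. A real queen would query LanceDB instead of scanning an array.

```typescript
interface IndexedFragment { id: string; text: string; vector: number[]; }
interface Hit { id: string; score: number; }

const RELEVANT_SCORE = 0.82; // threshold quoted on this page for e5 embeddings

// Cosine similarity between two vectors of equal length; 0 if either is all zeros.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Rank by similarity, keep the top K, then gate by score AND keyword overlap (assumed rule).
function retrieve(queryVec: number[], queryText: string, index: IndexedFragment[], k: number): Hit[] {
  const keywords = queryText.toLowerCase().split(/\W+/).filter((w) => w.length > 3);
  return index
    .map((f) => ({ f, score: cosine(queryVec, f.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .filter(
      ({ f, score }) =>
        score >= RELEVANT_SCORE && keywords.some((w) => f.text.toLowerCase().indexOf(w) !== -1),
    )
    .map(({ f, score }) => ({ id: f.id, score }));
}
```

When the gate leaves no survivors, the returned list is empty, which is the signal a caller could use to report that the queen has no relevant data rather than passing weak matches to the LLM.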

Status

In production

Current state

HIVE is in production (v1.0) with the BEE/QUEEN architecture: bees extract, embed and sign each fragment (vector inline), and queens index the pre-signed vectors into LanceDB and serve queries. On top of that engine, it adds a Settings UI, public/private topics, auth, and distribution via npm + MCP. A live queen and two bees are running right now (see Live nodes above).

BEE (producer) / QUEEN (consumer) role split — the bee uses no LLM; it embeds its own chunks in-process with multilingual-e5-base (ONNX int8, 768-d)

Source-driven extractor via the ForagerRegistry: Wikipedia, arXiv, PubMed, RSS, Common Crawl — plus third-party npm connectors and a personal-memory umbrella (beta: Claude, notes)

KnowledgeStore on Hypercore + Hyperbee — ed25519-signed append-only log (the vector is covered by the signature), native P2P replication

P2P network — Hyperswarm DHT + Hypercore replication with persistent cursor

Queen with LanceDB (default backend behind a swappable VectorIndex interface) — receives pre-signed vectors from bees, never re-embeds passages; 100% Node stack, no Python

Scope partitions (v0.7.6) — multiple bees split one source without overlap

Retrieval gating recalibrated for e5 (RELEVANT_SCORE 0.82) — only cites sources that genuinely match; local (Ollama) or cloud LLM (Groq/Gemini/Claude/OpenAI)

Distribution: npm package (npx @capybaralabs/hive), MCP server (@capybaralabs/hive-mcp) for Claude/Cursor/Goose, and a Claude Skill

Settings UI (manifest builder), public/private topics with a discovery registry, and bearer-token auth on the queen API

Shipped recently (2026-05)

MCP server (@capybaralabs/hive-mcp)

Claude Skill bundle

npm CLI (npx @capybaralabs/hive)

HTTPS auto-TLS via sslip.io

Bearer-token auth on /api/*

Periodic LanceDB compaction

Direct mode (v1.1): bee→queen over HTTP, no P2P, for closed/enterprise deploys

Roadmap

Personal memory → turnkey private queen + more sources (Gemini, Cursor)

Multi-tenant API tokens + audit log

Selective replication · Bloom routing

One-click self-host (Umbrel / CasaOS)

Score-by-corroboration

Built on HIVE

⚖️ Acquis — verified EU law inside Claude

The first commercial product on HIVE's direct mode: a remote MCP connector that quotes 11 EU digital regulations — the AI Act, GDPR, DSA, DMA, NIS2, Data Act, Cyber Resilience Act, DORA, eIDAS 2, Product Liability Directive and Data Governance Act — verbatim, with exact citations, EUR-Lex links and cryptographically verifiable fragments. The full corpus is free to evaluate. Same engine you can self-host — running a closed, signed legal corpus.

acquislaw.com

Run a BEE

Your BEE will start, find a knowledge area nobody is covering, and begin extracting. No configuration needed.

1. One command (recommended)

$ npx @capybaralabs/hive # wizard → starts a node

2. Docker

$ git clone https://github.com/capybarist/hive && cd hive && docker compose up -d

3. From source

$ git clone https://github.com/capybarist/hive && cd hive && bash hive.sh

Business Source License (BUSL-1.1). Free for non-commercial use. Converts to MIT in 4 years.
