Distillary
Turn any knowledge source into a navigable, shareable brain.
Distillary is an open-source tool that takes books, YouTube videos, podcasts, and articles — anything with ideas — and distills them into atomic claims organized as a navigable knowledge graph. Each source gets decomposed into a pyramid of arguments, connected by entities and cross-source bridges, and published as an Obsidian vault or static website that both humans and AI agents can explore.
The entire pipeline runs through Claude agents — haiku for fast parallel extraction, opus for deep reasoning about how ideas connect. A 300-page book becomes a browsable brain in about 15 minutes.
Built with Claude agents. Published with Quartz.
Browse the demo brain
Explore a live brain built from The Art of Money Getting by P.T. Barnum (1880, public domain): brain.distillary.xyz
One book, 79 claims, 23 entities, a 4-layer argument pyramid. Click the root thesis, follow wikilinks into clusters and claims, check entity pages — their backlinks show everything the brain knows about that concept. This is what your brain looks like when published.
Or let an agent query it — 10 questions answered from the live brain
Any AI agent with web access can query a published brain. Get the skill, paste it, ask a question:
```
curl https://brain.distillary.xyz/static/skill.txt
```

| # | Question | Strategy | Fetches |
|---|---|---|---|
| 1 | What books are in this brain? | Read manifest | 1 |
| 2 | What are the main ideas? | Fetch clusters | 2 |
| 3 | What does Barnum say about debt? | Concept lookup | 2 |
| 4 | Does luck play a role in success? | Concept lookup | 2 |
| 5 | How important is integrity? | Concept lookup | 2 |
| 6 | Career advice? | Concept lookup | 2 |
| 7 | Role of health in wealth? | Concept lookup | 2 |
| 8 | How to advertise? | Concept lookup | 2 |
| 9 | Who is P.T. Barnum? | Entity lookup | 2 |
| 10 | Root thesis? | Fetch root note | 2 |
Average: 1.9 fetches per question. All answers from brain.distillary.xyz. Read the full answers →
Works with Claude Code, Codex, Gemini CLI, Cursor, and any agent with HTTP access. Setup guide →
How the pipeline works
When you add a source to your brain, it flows through a series of agent-powered steps. First, the text gets split into chunks and processed by 16 parallel haiku agents that extract individual claims. Those claims are deduplicated, entities (people, concepts, companies) are identified, and then opus agents do the deep work: clustering related claims into argumentative groups and building a hierarchy from atoms up to a single root thesis.
After the pyramid is built, haiku agents find lateral connections — tensions between claims, shared patterns, and evidence chains. Finally, a doctor agent fixes any orphaned notes and suggests concepts worth exploring further.
```mermaid
graph LR
    A["Any Source"] --> B["Extract + Dedupe"]
    B --> C["Group + Pyramid"]
    C --> D["Connect + Doctor"]
    D --> E["Brain Vault"]
```
The result is a structured vault where every claim links to its parent argument, every entity links to every claim that mentions it, and every source bridges to related sources through shared concepts.
| Step | What happens | Time |
|---|---|---|
| Extract | 16 parallel haiku agents pull atomic claims from text | ~2 min |
| Dedupe + Entities | Remove duplicates, identify people/concepts/companies | ~2 min |
| Group | Opus agents cluster claims by argumentative cohesion | ~5 min |
| Pyramid | Build root thesis → chapters → arguments → evidence | ~3 min |
| Connect | Find tensions, patterns, and evidence between claims | ~2 min |
| Doctor | Fix orphans, discover ghost concepts, suggest explorations | ~1 min |
The full pipeline for a 300-page book takes ~15 minutes.
Most of that time is the opus grouping step, which requires deep reasoning about how claims relate to each other. Extraction is fast because 16 haiku agents work in parallel.
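The steps above can be sketched in Python. This is an illustrative shape, not Distillary's actual API: the function names, chunk size, and worker count are assumptions, and the agent-powered stages are reduced to placeholders.

```python
from concurrent.futures import ThreadPoolExecutor

def extract_claims(chunk: str) -> list[str]:
    # Stand-in for one haiku extraction agent: one claim per sentence.
    return [s.strip() for s in chunk.split(".") if s.strip()]

def run_pipeline(text: str, n_workers: int = 16) -> dict:
    # 1. Extract: split into chunks, fan out to parallel workers.
    chunks = [text[i:i + 2000] for i in range(0, len(text), 2000)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        claim_lists = list(pool.map(extract_claims, chunks))
    claims = [c for lst in claim_lists for c in lst]

    # 2. Dedupe: exact-match here; the real step is semantic.
    claims = list(dict.fromkeys(claims))

    # 3-6. Group, pyramid, connect, and doctor call opus/haiku agents
    # in the real tool; listed here only as stage names.
    stages = ["extract", "dedupe", "group", "pyramid", "connect", "doctor"]
    return {"claims": claims, "stages": stages}
```

The fan-out/fan-in shape is the point: extraction parallelizes cleanly, while the later stages need the full claim set and run sequentially.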
What you get
A pyramid of claims
Every source gets decomposed into a 4-layer hierarchy. The root thesis summarizes the entire source in one paragraph with wikilinks to chapter-level clusters. Each cluster links to mid-level structure notes, which link to individual atomic claims — the leaves of the tree, each traceable back to a specific chapter or section via source_ref.
This means you can read the book at any zoom level: the root for a 30-second summary, the clusters for chapter themes, or the atoms for specific evidence.
```mermaid
graph TD
    R["Root Thesis"] --> C1["Cluster: Validation"]
    R --> C2["Cluster: Metrics"]
    R --> C3["Cluster: Pivoting"]
    C1 --> S1["MVP tests assumptions"]
    C1 --> S2["Early adopters first"]
    S1 --> A1["Zappos tested demand with photos"]
    S1 --> A2["Dropbox used a video MVP"]
```
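Reading at a chosen zoom level amounts to walking wikilinks from the root downward. A minimal sketch, assuming standard `[[wikilink]]` syntax and using a toy in-memory vault (the real vault is markdown files on disk):

```python
import re

WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")

# Toy vault: note name -> body.
vault = {
    "Root Thesis": "See [[Cluster: Validation]] and [[Cluster: Metrics]].",
    "Cluster: Validation": "Key claim: [[MVP tests assumptions]].",
    "Cluster: Metrics": "No child links here.",
    "MVP tests assumptions": "Evidence: Zappos photo test.",
}

def links(note: str) -> list[str]:
    # All wikilink targets in a note body (alias part, if any, dropped).
    return WIKILINK.findall(vault.get(note, ""))

def walk(note: str, depth: int = 0) -> list[tuple[int, str]]:
    # Depth-first traversal from the root thesis down to atomic claims.
    out = [(depth, note)]
    for child in links(note):
        out.extend(walk(child, depth + 1))
    return out
```

`walk("Root Thesis")` yields the root at depth 0, clusters at depth 1, and atoms at depth 2; stopping the walk at a given depth is exactly the zoom-level reading described above.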
Entities as knowledge hubs
Every person, concept, company, and work mentioned across your sources gets its own page. The real power is the backlinks — every claim that references an entity shows up on its page, grouped by source. This turns entity pages into question-answering hubs.
When you want to know what your brain knows about “customer validation,” you don’t search — you open the entity page and read its backlinks. Each backlink is a claim from a specific source, with its own wikilinks to related concepts.
Answering questions through backlinks
The “Real Signals” entity page has 36 backlinks from The Lean Startup and 26 from The Mom Test. Each one is a claim about detecting genuine customer interest — from two different perspectives, already organized by source. One page, complete answer.
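The grouping behind an entity page's backlink section is simple to picture. A sketch under assumed data shapes (claim records and field names here are illustrative, not Distillary's schema):

```python
from collections import defaultdict

# Illustrative claim records as the brain might store them.
claims = [
    {"source": "The Lean Startup",
     "text": "Actionable metrics reveal [[Real Signals]]."},
    {"source": "The Mom Test",
     "text": "Commitment is a [[Real Signals]] indicator."},
    {"source": "The Lean Startup",
     "text": "Cohort data beats gross totals."},
]

def backlinks(entity: str, claims: list[dict]) -> dict[str, list[str]]:
    # An entity page's backlink section: every claim mentioning the
    # entity, grouped by the source it came from.
    grouped = defaultdict(list)
    for c in claims:
        if f"[[{entity}]]" in c["text"]:
            grouped[c["source"]].append(c["text"])
    return dict(grouped)
```

One pass over the claims produces the per-source grouping that makes an entity page a complete answer.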
Bridge concepts across sources
When two sources discuss the same idea under different names, Distillary creates bridge entities that unify them. A bridge page has aliases from both sources, descriptions of each perspective, and backlinks from both — making it the fastest way to get a cross-source answer.
For example, The Lean Startup calls unreliable indicators “Vanity Metrics” (quantitative: gross numbers that look good). The Mom Test calls them “Compliments” (qualitative: praise that costs nothing). Both describe the same problem — misleading signals that create false confidence. The bridge concept “False Signals” captures both perspectives.
| Bridge concept | Lean Startup calls it | Mom Test calls it |
|---|---|---|
| Real Signals | Actionable Metrics | Commitment |
| False Signals | Vanity Metrics | Compliments |
| Direct Customer Contact | Genchi Gembutsu | Customer conversation |
| Critical Assumptions | Hypothesis | Important questions |
| Demand Uncertainty | Leap of Faith | Market risk |
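A bridge page's frontmatter might look like the following. This is a hypothetical example: the field names are illustrative, not Distillary's exact note schema.

```yaml
# Hypothetical frontmatter for a bridge entity note.
type: entity
kind: bridge
name: False Signals
aliases:
  - Vanity Metrics   # The Lean Startup (quantitative)
  - Compliments      # The Mom Test (qualitative)
sources:
  - The Lean Startup
  - The Mom Test
```

Because both sources' aliases resolve to the same note, backlinks from both accumulate on one page.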
Your annotations are part of the graph
The brain isn’t read-only. You add your own reactions, questions, and insights to brain/personal/. Each annotation links to the claim it responds to, carries its own tags (status/agree, insight/aha), and appears in the graph as a connected node. Your voice is first-class data — queryable, filterable, visible in Obsidian’s graph view alongside the distilled sources.
For AI agents
Published brains expose an agent.json manifest — a lightweight entry point that tells any agent what sources exist, what bridge concepts connect them, and how to navigate the content. The agent doesn’t need to download everything; it follows links by relevance, the same way a human clicks through Obsidian.
Entity backlinks are the key mechanism. An agent looking up “validated learning” fetches the entity page, reads the backlinks grouped by source, and follows 2-3 claims for specific evidence. Multi-source answer with citations in under 2,500 tokens — compared to ~50,000 if it tried to read all claims.
```mermaid
graph TD
    A["Fetch agent.json"] --> B{"Question type?"}
    B -->|"What is X?"| C["Entity page → backlinks"]
    B -->|"Summary"| D["Thesis in manifest → root note"]
    B -->|"Do sources agree?"| E["Bridge concept or comparison page"]
    B -->|"Show me evidence"| F["Root → cluster → structure → atom"]
    B -->|"What's related?"| G["Any entity → follow backlinks + wikilinks"]
```
No MCP server. No authentication. No setup.
Published brains are static websites. The agent makes 2-3 HTTP GET requests to structured markdown pages. The link graph built during distillation IS the search engine — no keyword matching needed.
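Since a published brain is just static pages, the whole retrieval client fits in a few lines of stdlib Python. The slug scheme and the `shared/entities/` path below are guesses for illustration; the actual URL layout may differ.

```python
import re
import urllib.request

BASE = "https://brain.distillary.xyz"  # the demo brain

def slug(name: str) -> str:
    # Assumed slug scheme: lowercase, non-alphanumerics collapsed to "-".
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")

def entity_url(name: str) -> str:
    # Hypothetical path layout for entity pages.
    return f"{BASE}/shared/entities/{slug(name)}"

def fetch(url: str) -> str:
    # Plain HTTP GET: no auth, no MCP server, just static markdown.
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")

# A typical lookup: manifest first, then one entity page.
# print(fetch(f"{BASE}/agent.json"))
# print(fetch(entity_url("validated learning")))
```

Two or three GETs like these are the entire "API" an agent needs.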
See the agent retrieval skill for setup instructions by tool, or the demo with full answers.
Source types
Distillary works with any source that contains ideas. The extraction step differs by format, but everything after that — deduplication, entity extraction, grouping, pyramid building, connection finding — is the same pipeline regardless of whether the input was a book, video, or article.
| Type | Input | How text is extracted |
|---|---|---|
| Book | EPUB, PDF, TXT | Parsed directly |
| YouTube video | URL | Transcript via yt-dlp |
| Podcast | Audio file | Transcribed via Whisper |
| Article | URL | Web fetch + clean HTML |
| Research paper | | Parsed directly |
| Lecture notes | Markdown, PDF | Read or parsed |
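The per-format extraction in the table above is a dispatch on source type. A minimal sketch with stand-in extractors (the real tool shells out to yt-dlp, Whisper, and a web fetcher; these function results are placeholders):

```python
def extract_text(source_type: str, payload: str) -> str:
    # Map each source type to its extraction strategy.
    extractors = {
        "book": lambda p: p,                       # EPUB/PDF/TXT parsed directly
        "youtube": lambda p: f"transcript({p})",   # transcript via yt-dlp
        "podcast": lambda p: f"whisper({p})",      # transcribed via Whisper
        "article": lambda p: f"readability({p})",  # web fetch + clean HTML
    }
    try:
        return extractors[source_type](payload)
    except KeyError:
        raise ValueError(f"unsupported source type: {source_type}")
```

Everything downstream of this function is format-agnostic, which is why the rest of the pipeline is identical for books, videos, and articles.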
Community
Distilled knowledge becomes more valuable when it’s shared. When you publish your brain, others can browse it as a website, clone it into their Obsidian, or have their AI agents query it via the API. When multiple people distill books on the same topic, their brains can be compared and combined into field-level understanding.
- Process a source → it joins your brain
- Publish to GitHub Pages → anyone can browse it
- Share the URL → other people’s agents can query it
- Combine multiple brains → cross-source synthesis and meta-analysis
Tag your repos `distillary-brain` on GitHub. Join the Discord to share your brains, request distillations, and discuss cross-source insights.
Getting started
- Quick start — from zero to brain in 15 minutes
Concepts
- How it works — the pipeline, agents, and processing steps
- Note format — why YAML, what each field does, why `proposition` is the semantic key
- Tag taxonomy — the 6 dimensions, each value explained with when and why
- Pyramid design — why 4 layers, why atomic claims, the grouping invariant
- Entities and bridges — realized vs ghost, backlinks as search, bridges
- Brain structure — why `sources/`, why `shared/`, why `personal/`
- Cross-source analytics — entity mapping, comparison essays, graph analytics, statistical profiles
Deployment
- Architecture — agents, skills, Python utilities
- Publishing — deploy your brain as a website with agent API
For agents
- Agent retrieval — get the skill, setup by tool (Claude Code, Codex, Gemini CLI, Cursor, and more)
- Demo: 10 questions — live answers from the demo brain, showing each retrieval strategy
- README — full project overview with code structure
Acknowledgments
Documentation and brain publishing are powered by Quartz — an excellent open-source static site generator for Obsidian vaults by Jacky Zhao. Quartz renders wikilinks, graph view, backlinks, Mermaid diagrams, and callouts out of the box. We’re grateful for the project and recommend it for anyone publishing Obsidian content.