Documentation
Everything you need to install, configure, and use brain-mcp.
Getting Started
Prerequisites
- Python 3.11+
- Apple Silicon recommended – fast local embeddings via MPS acceleration (works on Intel/Linux too, just slower)
- An MCP client – Claude Desktop, Cursor, Windsurf, or any MCP-compatible client
- ~2GB disk space for data + vectors (varies with conversation volume)
Installation
git clone https://github.com/mordechaipotash/brain-mcp.git
cd brain-mcp
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
First Run
python -m cli init # discover data sources, set up directories
python -m cli doctor # system health check – verifies all dependencies
python -m cli setup claude # auto-configure Claude Desktop MCP integration
After setup, open your MCP client and type "use brain". That's it.
Data Sources
brain-mcp can ingest conversations from multiple sources. All data is stored locally as Parquet files.
ChatGPT Export
Go to Settings → Data Controls → Export Data in ChatGPT. You'll receive a conversations.json file. Place it in the brain-mcp data directory and run the import.
Claude Desktop
brain-mcp reads Claude Desktop conversation logs directly from their default location on your machine. No export step needed.
Claude Code
Session transcripts from Claude Code are synced automatically via the sync pipeline.
Custom Parquet
Bring any data source. Just provide a Parquet file with these columns:
Required columns:
message_id (string) – unique identifier
conversation_id (string) – groups messages into conversations
role (string) – "user" or "assistant"
content (string) – message text
created (datetime) – timestamp
source (string) – source label (e.g., "custom")
Embeddings
brain-mcp generates embeddings locally using nomic-embed-text-v1.5 β a 768-dimensional embedding model. No API key required. No data leaves your machine.
- Apple Silicon (M1+): Uses MPS acceleration – fast and efficient
- Intel Mac / Linux: Falls back to CPU – slower but works
- GPU (CUDA): Supported if available
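The fallback order above amounts to a small helper. This is an illustrative sketch, not brain-mcp's actual device selection; it takes availability flags rather than probing hardware so it runs anywhere:

```python
def pick_device(mps_available: bool, cuda_available: bool) -> str:
    """Choose an embedding device: MPS first, then CUDA, then CPU."""
    if mps_available:
        return "mps"   # Apple Silicon acceleration
    if cuda_available:
        return "cuda"  # NVIDIA GPU
    return "cpu"       # universal fallback, just slower

# In practice the flags would come from torch.backends.mps.is_available()
# and torch.cuda.is_available().
```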
Only user messages are embedded (not assistant responses). Short messages (<10 chars) and noise patterns are automatically filtered.
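The filtering rule can be sketched as follows. The specific noise patterns here are invented for illustration; brain-mcp's real pattern list lives in its source.

```python
import re

# Hypothetical noise patterns -- placeholders for brain-mcp's actual list.
NOISE_PATTERNS = [re.compile(r"^(ok|thanks|yes|no)[.!?]*$", re.IGNORECASE)]

def should_embed(role: str, content: str) -> bool:
    """Embed only substantive user messages."""
    if role != "user":
        return False                      # assistant responses are skipped
    text = content.strip()
    if len(text) < 10:
        return False                      # too short to be meaningful
    return not any(p.match(text) for p in NOISE_PATTERNS)
```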
Embeddings are stored in LanceDB, a local vector database (no server, just files).
python -m cli embed # embed new messages since last run
python -m cli embed --full # re-embed everything (slow, rarely needed)
Summaries
brain-mcp generates structured v6 summaries from your conversations. Each summary extracts:
- Summary – concise overview of the conversation
- Decisions – explicit choices made during the conversation
- Open Questions – unresolved threads and pending items
- Breakthroughs – key insights or realizations
- Domain – what cognitive domain the conversation belongs to
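Concretely, a generated summary might look like the dict below. The field names and values are illustrative, assumed from the sections above rather than taken from brain-mcp's exact schema:

```python
# Illustrative v6 summary -- field names assumed, not brain-mcp's exact schema.
example_summary = {
    "summary": "Debugged Parquet ingestion; settled on incremental embedding runs.",
    "decisions": ["Embed only user messages"],
    "open_questions": ["Should summaries also cover assistant messages?"],
    "breakthroughs": ["Noise filtering sharply cut embedding volume"],
    "domain": "data-engineering",
}
```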
Summary generation is the one step that optionally calls an LLM API (for the summarization itself). Typical cost: ~$0.05/day for active use.
MCP Configuration
brain-mcp runs as an MCP server. Configure your client to connect to it. The easiest way is python -m cli setup claude, but you can also configure manually.
Claude Desktop
Add to your Claude Desktop MCP configuration (~/Library/Application Support/Claude/claude_desktop_config.json):
{
"mcpServers": {
"brain": {
"command": "python",
"args": ["mcp_brain_server.py"],
"cwd": "/path/to/brain-mcp"
}
}
}
Cursor
Add to your Cursor MCP settings (.cursor/mcp.json in your project or global config):
{
"mcpServers": {
"brain": {
"command": "python",
"args": ["mcp_brain_server.py"],
"cwd": "/path/to/brain-mcp"
}
}
}
Windsurf
Add to your Windsurf MCP configuration:
{
"mcpServers": {
"brain": {
"command": "python",
"args": ["mcp_brain_server.py"],
"cwd": "/path/to/brain-mcp"
}
}
}
Replace /path/to/brain-mcp with the actual path where you cloned the repository.
Tools Reference
All 25 tools exposed via MCP, organized by category.
Cognitive Prosthetic (8)
Tools designed for how attention actually works – context recovery, domain switching, thread tracking.
tunnel_state(domain, limit?)
Reconstruct your save-state for any cognitive domain. Returns current stage, open questions, recent decisions, and tone.
Parameters: domain (string) – the domain to query; limit (int, optional, default 10)
Example: tunnel_state("frontend-dev")
context_recovery(domain, summary_count?)
Full re-entry brief when returning to a dormant topic. Like a "previously on..." for your thinking.
Parameters: domain (string) – the domain to recover; summary_count (int, optional, default 5)
Example: context_recovery("data-engineering")
switching_cost(current_domain, target_domain)
Quantified cost of switching between two domains. Shows what you'd leave behind and what shared concepts exist.
Parameters: current_domain (string); target_domain (string)
Example: switching_cost("backend-dev", "frontend-dev")
open_threads(limit_per_domain?, max_domains?)
Global unfinished business across all your domains. Surfaces questions you haven't resolved.
Parameters: limit_per_domain (int, optional); max_domains (int, optional)
Example: open_threads()
dormant_contexts(min_importance?, limit?)
Alarm for abandoned topics with unresolved questions. Catches work you forgot about.
Parameters: min_importance (float, optional); limit (int, optional)
Example: dormant_contexts()
trust_dashboard()
System-wide proof the safety net works. Shows sync status, data freshness, coverage stats.
Parameters: (none)
Example: trust_dashboard()
cognitive_patterns(domain?)
Analyzes when and how you think best, backed by conversation data. "When do I do my best work?"
Parameters: domain (string, optional) – limit to a specific domain
Example: cognitive_patterns("ai-dev")
tunnel_history(domain)
Engagement meta-view for any domain over time. Shows activity patterns and intensity.
Parameters: domain (string) – the domain to inspect
Example: tunnel_history("torah")
Search (6)
semantic_search(query)
Vector similarity search across all your conversations using local embeddings. Find by concept, not keyword.
Parameters: query (string) – natural language query
Example: semantic_search("embedding chunking strategies")
search_conversations(term, role?, limit?)
Keyword search across conversations. Optionally filter by role (user/assistant).
Parameters: term (string); role (string, optional); limit (int, optional)
Example: search_conversations("deployment pipeline", role="user")
unified_search(query)
Search across conversations, GitHub repos/commits, and markdown corpus simultaneously.
Parameters: query (string)
Example: unified_search("authentication flow")
search_summaries(query, extract?)
Search structured v6 summaries. Extract specific sections: "summary", "questions", "decisions", "quotes".
Parameters: query (string); extract (string, optional) – one of: summary, questions, decisions, quotes
Example: search_summaries("database migration", extract="decisions")
search_docs(query, filter?)
Search the markdown corpus. Optional filter: None, "ip", "breakthrough", "deep", "project", "todos".
Parameters: query (string); filter (string, optional)
Example: search_docs("monotropism", filter="breakthrough")
unfinished_threads(domain?)
Find conversation threads with open/unresolved questions.
Parameters: domain (string, optional)
Example: unfinished_threads("backend-dev")
Synthesis (4)
what_do_i_think(topic, mode?)
Synthesize your views on a topic from real conversations. Mode: "synthesize" (default) or "precedent".
Parameters: topic (string); mode (string, optional) – "synthesize" or "precedent"
Example: what_do_i_think("microservices vs monolith")
alignment_check(decision)
Check whether a decision aligns with your established principles and past decisions.
Parameters: decision (string) – the decision to evaluate
Example: alignment_check("switch from REST to GraphQL")
thinking_trajectory(topic, view?)
Track how your thinking about a concept has evolved over time. View: "full", "velocity", or "first".
Parameters: topic (string); view (string, optional) – "full", "velocity", or "first"
Example: thinking_trajectory("testing strategy")
what_was_i_thinking(month)
Snapshot of what you were working on and thinking about during a specific month.
Parameters: month (string) – format "YYYY-MM"
Example: what_was_i_thinking("2025-01")
Conversation (3)
get_conversation(id)
Retrieve a full conversation by its ID.
Parameters: id (string) – conversation ID
Example: get_conversation("abc-123-def")
conversations_by_date(date)
List all conversations that occurred on a specific date.
Parameters: date (string) – format "YYYY-MM-DD"
Example: conversations_by_date("2025-06-15")
brain_stats(view?)
System statistics. Views: "overview", "domains", "pulse", "conversations", "embeddings", "github", "markdown".
Parameters: view (string, optional) – defaults to "overview"
Example: brain_stats(view="domains")
GitHub (1)
github_search(project?, query?, mode?)
Search GitHub repos and commits. Modes: "timeline", "conversations", "code", "validate".
Parameters: project (string, optional); query (string, optional); mode (string, optional)
Example: github_search(project="brain-mcp", mode="timeline")
Analytics (1)
query_analytics(view, date?)
Analytics views: "timeline", "stacks", "problems", "spend", "summary".
Parameters: view (string); date (string, optional)
Example: query_analytics(view="summary")
Meta (2)
list_principles()
List all defined principles (SEED framework).
Parameters: (none)
Example: list_principles()
get_principle(id)
Get details of a specific principle by ID.
Parameters: id (string) – principle identifier
Example: get_principle("sovereignty")
Architecture
brain-mcp follows a layered architecture where raw conversation data flows through processing pipelines into queryable stores.
Your Conversations
(ChatGPT, Claude Desktop, Claude Code, Custom)
                 │
           sync pipelines
          (Python scripts)
                 │
                 ▼
     all_conversations.parquet
  (unified format, immutable source)
                 │
       ┌─────────┼─────────┐
       ▼         ▼         ▼
    LanceDB    DuckDB   Structured
    vectors   queries   summaries (v6)
   (768-dim)   (SQL)     (JSON)
       │         │         │
       └─────────┼─────────┘
                 │
             MCP Server
             (25 tools)
                 │
       ┌─────────┼─────────┐
       ▼         ▼         ▼
     Claude    Cursor   Windsurf
    Desktop   (any MCP client)
Key Design Principles
100% Local
Embeddings run on your hardware (Apple Silicon MPS, CUDA, or CPU). LanceDB is a local file β no database server. No cloud dependencies for core operations.
Immutable Source Layer
Raw conversation data in Parquet is never modified. All derived layers (embeddings, summaries) can be deleted and regenerated from source.
Standard Protocol
MCP (Model Context Protocol) means any compatible client works out of the box. No custom integrations, no API wrappers.
Sync Pipeline
brain-mcp uses a two-tier sync pipeline to keep your data fresh without impacting performance.
Hourly Sync (Quick)
Syncs new conversations from active sources (e.g., recent Claude/Clawdbot sessions) into the Parquet file. Fast, lightweight, catches new conversations.
Nightly Sync (Full)
Full pipeline: syncs all sources, generates embeddings for new messages, updates summaries. Runs during low-activity hours.
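Conceptually, the quick sync is an incremental append: only messages newer than the latest timestamp already in the Parquet file are added. A simplified sketch of that logic (not the actual pipeline code):

```python
from datetime import datetime

def incremental_sync(existing: list[dict], incoming: list[dict]) -> list[dict]:
    """Append only messages newer than the latest already-synced timestamp."""
    # Latest "created" timestamp already in the store (datetime.min if empty).
    last_synced = max((m["created"] for m in existing), default=datetime.min)
    new = [m for m in incoming if m["created"] > last_synced]
    return existing + new
```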
# Hourly quick sync (just conversations → parquet)
5 * * * * /path/to/brain-mcp/scripts/sync_quick.sh
# Nightly full sync (all sources + embeddings + summaries)
0 3 * * * /path/to/brain-mcp/scripts/sync_full.sh
# Daily GitHub sync
30 2 * * * /path/to/brain-mcp/scripts/sync_github.sh
What Syncs When
| Step | Hourly | Nightly |
|---|---|---|
| Conversation sync | ✓ | ✓ |
| Embedding generation | ✗ | ✓ |
| Summary generation | ✗ | ✓ |
| GitHub sync | ✗ | ✗ (separate daily cron) |
You can also run any sync step manually:
python -m cli sync # sync all conversation sources
python -m cli embed # generate embeddings for new messages
python -m cli summarize # generate summaries for new conversations