Data Sources
What data goes into Brain MCP and how it gets there.
Supported Sources
Clawdbot Sessions
HourlyAI agent conversations via Clawdbot gateway
Claude Code
Nightly 3amClaude Code project conversations
Claude Desktop
Nightly 3amClaude Desktop chat history
ChatGPT Exports
On importExported ChatGPT conversations (JSON)
GitHub
Daily 2:30amRepos, commits, and code via gh CLI
Data Flow
Sources (sessions, exports, repos)
│
▼
sync scripts (Python)
│
▼
all_conversations.parquet (377K messages)
│
▼
embed_new_messages.py
│
▼
vectors/brain.lance (82K embeddings)
│
▼
Brain MCP Server → 25 toolsWhat Gets Indexed
- • User messages — your actual words and questions
- • Decisions — extracted from conversation context
- • Open questions — unresolved threads and queries
- • Domain tags — auto-classified into 25 domains
Only user messages are embedded (not assistant responses). Messages under 10 characters are skipped as noise.