Data Sources

What data goes into Brain MCP and how it gets there.

Supported Sources

Clawdbot Sessions

Hourly

AI agent conversations via Clawdbot gateway

Claude Code

Nightly 3am

Claude Code project conversations

Claude Desktop

Nightly 3am

Claude Desktop chat history

ChatGPT Exports

On import

Exported ChatGPT conversations (JSON)

GitHub

Daily 2:30am

Repos, commits, and code via gh CLI

Data Flow

Sources (sessions, exports, repos)
    │
    ▼
sync scripts (Python)
    │
    ▼
all_conversations.parquet (377K messages)
    │
    ▼
embed_new_messages.py
    │
    ▼
vectors/brain.lance (82K embeddings)
    │
    ▼
Brain MCP Server → 25 tools

What Gets Indexed

  • User messages — your actual words and questions
  • Decisions — extracted from conversation context
  • Open questions — unresolved threads and queries
  • Domain tags — auto-classified into 25 domains

Only user messages are embedded (not assistant responses). Messages under 10 characters are skipped as noise.