Data Sources

What data goes into Brain MCP and how it gets there.

Supported Sources

Claude Code

Automatic

Claude Code project conversations (auto-detected)

Claude Desktop

Automatic

Claude Desktop chat history (auto-detected)

Cursor

Automatic

Cursor AI conversations (auto-detected)

Windsurf

Automatic

Windsurf AI conversations (auto-detected)

Gemini CLI

Automatic

Gemini CLI conversations (auto-detected)

ChatGPT Exports

On import

Exported ChatGPT conversations (JSON)

Clawdbot Sessions

On import

AI agent conversations via Clawdbot gateway

Generic JSONL

On import

Custom sources via JSONL format

Data Flow

Sources (Claude Code, Claude Desktop, Cursor, Windsurf, Gemini CLI, ChatGPT exports)
    │
    ▼
brain-mcp sync (automatic)
    │
    ▼
conversations.parquet (unified format)
    │
    ▼
local embedding (nomic-embed-text-v1.5)
    │
    ▼
vectors/brain.lance (semantic index)
    │
    ▼
brain-mcp serve → 25 MCP tools

What Gets Indexed

  • User messages — your actual words and questions
  • Decisions — extracted from conversation context
  • Open questions — unresolved threads and queries
  • Domain tags — auto-classified into 25 domains

Only user messages are embedded (not assistant responses). Messages under 10 characters are skipped as noise.