Data Schema

The data structures behind Brain MCP.

Parquet Schema

The core conversation data stored in all_conversations.parquet:

message_id    : string   # Unique message ID
conversation_id: string   # Conversation this belongs to
role           : string   # "user" | "assistant"
content        : string   # Message text
timestamp      : datetime # When the message was sent
source         : string   # "clawdbot" | "claude-code" | "chatgpt" | "claude_desktop"
domain         : string   # Auto-classified domain (25 categories)

LanceDB Vector Schema

Embeddings stored in vectors/brain.lance:

vector         : float32[768]  # nomic-embed-text-v1.5 embedding
message_id     : string        # Links to parquet record
text           : string        # Original message text
conversation_id: string        # Parent conversation
timestamp      : datetime      # Message timestamp
source         : string        # Data source

Structured Summaries (v6)

9,979 conversation summaries with extracted metadata:

conversation_id : string
summary         : string    # 2-3 sentence summary
questions       : string[]  # Open questions extracted
decisions       : string[]  # Decisions made
quotes          : string[]  # Notable quotes
domain          : string    # Primary domain
thinking_stage  : string    # exploration | building | executing | reflecting
importance      : float     # 0-1 relevance score

Current Stats

377K

Parquet messages

82K

LanceDB vectors

9,979

Structured summaries

25

Tracked domains