Data Schema
The data structures behind Brain MCP.
Parquet Schema
The core conversation data is stored in all_conversations.parquet with the following columns:
message_id      : string    # Unique message ID
conversation_id : string    # Conversation this belongs to
role            : string    # "user" | "assistant"
content         : string    # Message text
timestamp       : datetime  # When the message was sent
source          : string    # "clawdbot" | "claude-code" | "chatgpt" | "claude_desktop"
domain          : string    # Auto-classified domain (25 categories)
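As a minimal sketch, one Parquet row can be mirrored as a Python dataclass that checks the enumerated columns when it is constructed. The names `MessageRecord` and `group_by_conversation` are hypothetical, and treating the role and source values from the schema comments as a closed set is an assumption; this is not Brain MCP's own code, and the real pipeline may accept other values.

```python
from dataclasses import dataclass
from datetime import datetime

# Allowed values copied from the schema comments; treating them as a
# closed set is an assumption made for this sketch.
ALLOWED_ROLES = {"user", "assistant"}
ALLOWED_SOURCES = {"clawdbot", "claude-code", "chatgpt", "claude_desktop"}


@dataclass(frozen=True)
class MessageRecord:
    """Hypothetical in-memory mirror of one row of all_conversations.parquet."""

    message_id: str
    conversation_id: str
    role: str
    content: str
    timestamp: datetime
    source: str
    domain: str

    def __post_init__(self) -> None:
        # Reject values outside the documented enumerations.
        if self.role not in ALLOWED_ROLES:
            raise ValueError(f"unknown role: {self.role!r}")
        if self.source not in ALLOWED_SOURCES:
            raise ValueError(f"unknown source: {self.source!r}")


def group_by_conversation(records):
    """Group messages by conversation_id, each thread ordered by timestamp."""
    threads = {}
    for rec in sorted(records, key=lambda r: r.timestamp):
        threads.setdefault(rec.conversation_id, []).append(rec)
    return threads
```

Because conversation_id is repeated on every message row, a full thread can be rebuilt by grouping on that column and sorting by timestamp.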
LanceDB Vector Schema
Embeddings are stored in vectors/brain.lance:
vector          : float32[768]  # nomic-embed-text-v1.5 embedding
message_id      : string        # Links to parquet record
text            : string        # Original message text
conversation_id : string        # Parent conversation
timestamp       : datetime      # Message timestamp
source          : string        # Data source
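The message_id column is what ties a vector hit back to its full Parquet row. The sketch below illustrates that link with a brute-force cosine-similarity search in pure Python; it stands in for LanceDB's own indexed search and does not use the LanceDB API. The names `VectorRow`, `cosine`, `search` and `join_to_messages` are hypothetical, and the 768-dimension check simply mirrors the float32[768] type in the schema.

```python
import math
from dataclasses import dataclass
from datetime import datetime

EMBED_DIM = 768  # matches the float32[768] vector column in the schema


@dataclass
class VectorRow:
    """Hypothetical mirror of one row in vectors/brain.lance."""

    vector: list  # EMBED_DIM floats
    message_id: str
    text: str
    conversation_id: str
    timestamp: datetime
    source: str


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query, rows, k=5):
    """Return the k rows most similar to query (brute force, no index)."""
    if len(query) != EMBED_DIM:
        raise ValueError(f"expected {EMBED_DIM} dims, got {len(query)}")
    scored = sorted(rows, key=lambda r: cosine(query, r.vector), reverse=True)
    return scored[:k]


def join_to_messages(hits, messages_by_id):
    """Pair each hit with its Parquet record via message_id (None if absent)."""
    return [(hit, messages_by_id.get(hit.message_id)) for hit in hits]
```

The lookup uses `dict.get` so that a vector whose message_id has no matching Parquet row yields None instead of raising.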
Structured Summaries (v6)
9,979 conversation summaries, each with extracted metadata:
conversation_id : string
summary         : string    # 2-3 sentence summary
questions       : string[]  # Open questions extracted
decisions       : string[]  # Decisions made
quotes          : string[]  # Notable quotes
domain          : string    # Primary domain
thinking_stage  : string    # exploration | building | executing | reflecting
importance      : float     # 0-1 relevance score
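A minimal sketch of one summary record, assuming the thinking_stage values listed above form a closed set and that importance must lie in the range 0 to 1. The class and helper names are hypothetical, not part of Brain MCP. The list fields are moved to the end only because Python dataclass fields with defaults must follow those without.

```python
from dataclasses import dataclass, field

# Stages listed in the schema; treating them as a closed set is an assumption.
THINKING_STAGES = ("exploration", "building", "executing", "reflecting")


@dataclass
class ConversationSummary:
    """Hypothetical mirror of one v6 structured summary."""

    conversation_id: str
    summary: str
    domain: str
    thinking_stage: str
    importance: float
    questions: list[str] = field(default_factory=list)
    decisions: list[str] = field(default_factory=list)
    quotes: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.thinking_stage not in THINKING_STAGES:
            raise ValueError(f"unknown thinking_stage: {self.thinking_stage!r}")
        if not 0.0 <= self.importance <= 1.0:
            raise ValueError(f"importance out of range: {self.importance}")


def top_in_domain(summaries, domain, n=3):
    """Highest-importance summaries for one domain, most important first."""
    matches = [s for s in summaries if s.domain == domain]
    return sorted(matches, key=lambda s: s.importance, reverse=True)[:n]


def open_questions(summaries):
    """Flatten the extracted open questions, tagged with their conversation_id."""
    return [(s.conversation_id, q) for s in summaries for q in s.questions]
```

Ranking by importance within a domain is one plausible use of these fields; how Brain MCP itself consumes them is not specified here.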
Current Stats
Parquet messages: 377K
LanceDB vectors: 82K
Structured summaries: 9,979
Tracked domains: 25