Data Sync Pipeline

Set up and maintain the Brain MCP data pipeline.

Sync Schedule

ScheduleScriptPurpose
Every hour (:05)brain_sync_quick.shClawdbot sessions → parquet
Nightly 3ambrain_sync_unified.shAll sources + embedding
Daily 2:30amsync_github.pyGitHub repos + commits

Manual Sync

Quick sync (parquet only):

brain_sync_quick.sh

Full sync with embedding:

brain_sync_unified.sh

Check embedding stats:

python pipelines/embed_new_messages.py stats

Embedding Details

  • Model: nomic-ai/nomic-embed-text-v1.5 (768 dimensions)
  • Only user messages are embedded (not assistant responses)
  • Noise filtered: messages under 10 chars skipped
  • Batch size: 10 (for stability)
  • Storage: LanceDB (local, no cloud dependency)