Data Sync Pipeline
Set up and maintain the Brain MCP data pipeline.
Sync Schedule
| Schedule | Script | Purpose |
|---|---|---|
| Every hour (:05) | brain_sync_quick.sh | Clawdbot sessions → parquet |
| Nightly 3am | brain_sync_unified.sh | All sources + embedding |
| Daily 2:30am | sync_github.py | GitHub repos + commits |
Manual Sync
Quick sync (parquet only):
brain_sync_quick.shFull sync with embedding:
brain_sync_unified.shCheck embedding stats:
python pipelines/embed_new_messages.py statsEmbedding Details
- • Model: nomic-ai/nomic-embed-text-v1.5 (768 dimensions)
- • Only user messages are embedded (not assistant responses)
- • Noise filtered: messages under 10 chars skipped
- • Batch size: 10 (for stability)
- • Storage: LanceDB (local, no cloud dependency)