vac:bi:rag:2025q4-rag-context-improvement
Description
Extract the transcript from Youtube video to use it for RAG context and other possibility
Task List
Add Code Chunking to the RAG
- fully qualified name:
vac:bi:rag:2025q4-rag-context-improvement:add-code-chunking - owner: nickninov
- status: in progress (35%)
- start-date: 2025/10/01
- end-date: 2025/12/31
Description
https://github.com/status-im/data-docs/issues/82
Schedule note: Dates reflect quarter bounds; update when actual timing is known.
Deliverables
- Updated the RAG upload prefix logic so freshly ingested chunks no longer collide, and backfilled the newest data through the pipeline.
- Patched the Quadrant HTTPS ingestion bug and added monitoring for data freshness as the Quadrant DB grows.
- Expanded the sources dashboard with chunk counts to make it easier to audit what has been loaded.
- Add task to dagster ETL to include code repository to the RAG context
- Write documentation in Data-docs.
Google Meeting transcript
- fully qualified name:
vac:bi:rag:2025q4-rag-context-improvement:google-meeting-transcript - owner: nickninov
- status: in progress (20%)
- start-date: 2025/10/01
- end-date: 2025/12/31
Description
Include transcript from Google Meeting to the RAG context. https://github.com/status-im/data-docs/issues/68
Schedule note: Dates reflect quarter bounds; update when actual timing is known.
Deliverables
- Debugged the meeting transcript ingestion (YouTube metadata & evaluation pipeline) and documented the outstanding edge cases.
- Add task to dagster ETL to include meeting transcript to the RAG context.
- Write documentation in Data-docs.