Private Knowledge Hub
Complete
Local / Private (no live demo)
Local document retrieval system for private project files. Indexes CLAUDE.md, README and SKILL.md files across 10+ projects and answers natural-language questions with grounded source references and rendered markdown output.
Technologies
Problem
More than 65 personal project documents (CLAUDE.md, README and SKILL.md files, plus code files) are scattered across multiple repos, with no unified interface to query knowledge across projects or retrieve context on demand.
Approach
A FolderIngester walks folders recursively, supports 18 file types, deduplicates files by SHA-256 hash and enforces a 50 MB cap. A MarkdownLoader runs a regex stripping pipeline (removing ##, ** and table syntax) before embedding to improve similarity scores. The similarity threshold is tuned to 0.25 because markdown docs score lower than prose PDFs. All data is scoped via app_name="private_hub" in the shared pgvector database on AWS RDS.
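The ingestion step described above can be sketched roughly as follows. This is a minimal illustration, not the project's FolderIngester: the function names, the suffix set (the 18 supported types are not listed here, so this is a hypothetical subset) and the reading of the 50 MB cap as a per-file limit are all assumptions.

```python
import hashlib
from pathlib import Path
from typing import List, Optional, Set

# Hypothetical subset: the 18 supported file types are not enumerated in this write-up.
SUPPORTED_SUFFIXES = {".md", ".txt", ".py", ".json", ".yaml"}
MAX_BYTES = 50 * 1024 * 1024  # 50 MB, assumed here to be a per-file limit


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB blocks so large files are never fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(block)
    return digest.hexdigest()


def collect_files(root: Path, seen_hashes: Optional[Set[str]] = None) -> List[Path]:
    """Walk root recursively; return supported, size-capped files not seen before."""
    seen = set() if seen_hashes is None else seen_hashes
    accepted = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in SUPPORTED_SUFFIXES:
            continue  # skip directories and unsupported types
        if path.stat().st_size > MAX_BYTES:
            continue  # skip files over the size cap
        file_hash = sha256_of(path)
        if file_hash in seen:
            continue  # identical content was already ingested
        seen.add(file_hash)
        accepted.append(path)
    return accepted
```

Passing the same `seen_hashes` set across runs makes re-ingestion idempotent: a second walk over an unchanged folder returns nothing new.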
Result
65+ personal project documents indexed. The FastAPI app runs locally on localhost:8001, completely separate from the public RAG demo. Per-document delete and de-index cascade to the document's chunks and embeddings. Answers are rendered as formatted markdown via marked.js.
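One way to get cascade cleanup on per-document delete is to let foreign keys do the work. The sketch below uses an in-memory SQLite database as a stand-in for the shared Postgres/pgvector store so it runs without any setup; the table and column names are hypothetical, and the project may well handle the cascade in application code instead.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory store with documents -> chunks -> embeddings."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(
        """
        CREATE TABLE documents (
            id INTEGER PRIMARY KEY,
            app_name TEXT NOT NULL,
            path TEXT NOT NULL
        );
        CREATE TABLE chunks (
            id INTEGER PRIMARY KEY,
            document_id INTEGER NOT NULL
                REFERENCES documents(id) ON DELETE CASCADE,
            text TEXT NOT NULL
        );
        CREATE TABLE embeddings (
            chunk_id INTEGER PRIMARY KEY
                REFERENCES chunks(id) ON DELETE CASCADE,
            vector BLOB NOT NULL
        );
        """
    )
    return conn


def delete_document(conn: sqlite3.Connection, app_name: str, document_id: int) -> int:
    """Delete one document within an app scope; chunks and embeddings cascade."""
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(
            "DELETE FROM documents WHERE id = ? AND app_name = ?",
            (document_id, app_name),
        )
    return cursor.rowcount  # number of document rows removed (0 or 1)
```

Including `app_name` in the `WHERE` clause means a delete request scoped to one app cannot remove another app's document, even when both share the same database.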
Learnings
Similarity threshold is document-type dependent: markdown docs with structural syntax score lower than prose PDFs — stripping before embedding measurably improves retrieval quality. app_name scoping enables multi-tenant reuse of a shared pgvector database without data mixing.
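Both learnings can be illustrated with a short sketch. The regexes below are one plausible stripping pipeline for heading markers, bold markers and table syntax, not the project's actual MarkdownLoader; `Hit`, `strip_markdown` and `select_hits` are hypothetical names, and the filter assumes that a higher score means a closer match.

```python
import re
from dataclasses import dataclass
from typing import List


def strip_markdown(text: str) -> str:
    """Remove structural markdown syntax so only the prose gets embedded."""
    # Drop leading heading markers such as "## ".
    text = re.sub(r"^[ ]{0,3}#{1,6}[ \t]+", "", text, flags=re.MULTILINE)
    # Drop table separator rows such as "| --- | --- |" (also plain "---" rules).
    text = re.sub(r"^[ \t]*\|?[ \t:|-]*-{3,}[ \t:|-]*$", "", text, flags=re.MULTILINE)
    # Unwrap bold spans: "**word**" becomes "word".
    text = re.sub(r"\*\*(.+?)\*\*", r"\1", text)
    # Turn remaining table pipes into spaces, then collapse runs of whitespace.
    text = text.replace("|", " ")
    text = re.sub(r"[ \t]+", " ", text)
    lines = [line.strip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line)


@dataclass
class Hit:
    app_name: str
    source: str
    score: float  # assumed: higher means more similar


def select_hits(hits: List[Hit], app_name: str = "private_hub",
                threshold: float = 0.25) -> List[Hit]:
    """Keep hits from one app scope at or above the threshold, best first."""
    scoped = [h for h in hits if h.app_name == app_name and h.score >= threshold]
    return sorted(scoped, key=lambda h: h.score, reverse=True)
```

In a real pgvector setup the scope and threshold would more likely be applied in the SQL query itself; filtering in Python here just keeps the example self-contained.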
Relevance
Demonstrates privacy-first AI tooling, practical reuse of shared infrastructure, and building personal productivity tools for knowledge workers — without external dependencies or cloud deployments for private content.