kostadindev/knowledge-base-builder

Python package that constructs a structured markdown knowledge base from external sources such as PDFs, websites, and GitHub repos with LLM summarization. Ideal for RAG, search-friendly LLM contexts (/llms.txt), and chatbots.

Score: 37 / 100 (Emerging)

Supports ingestion of 10+ content types, including YouTube transcripts, arXiv papers, RSS feeds, Jupyter notebooks, and PowerPoint slides, with specialized extractors that auto-detect each source by URL pattern or file extension. The three-phase pipeline starts with concurrent text extraction, bounded by configurable semaphores and backed by automatic retry logic, and then runs LLM-driven summarization that writes output in one of several formats: markdown, the `/llms.txt` spec, or vector-store chunks. Optional incremental caching skips unchanged sources on rebuild.
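The extraction stage described above can be illustrated with a minimal sketch. This is not the package's actual code: `detect_source_type`, `extract_with_retry`, `extract_all`, the detection rules, and the content-hash cache are all hypothetical names and assumptions chosen to show the general technique (source auto-detection, semaphore-bounded concurrency, retry with backoff, and skipping unchanged sources).

```python
import asyncio
import hashlib
from urllib.parse import urlparse


def detect_source_type(source: str) -> str:
    """Guess a source type from a URL pattern or file extension.

    Hypothetical rules for illustration; the package's real rules may differ.
    """
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        host = parsed.netloc.lower()
        if "youtube.com" in host or "youtu.be" in host:
            return "youtube"
        if "arxiv.org" in host:
            return "arxiv"
        if "github.com" in host:
            return "github"
        if parsed.path.lower().endswith(".pdf"):
            return "pdf"
        return "website"
    lower = source.lower()
    for ext, kind in ((".pdf", "pdf"), (".ipynb", "notebook"), (".pptx", "powerpoint")):
        if lower.endswith(ext):
            return kind
    return "text"


async def extract_with_retry(source, extractor, semaphore, retries=3, base_delay=0.01):
    """Run one extractor call under the semaphore, retrying with exponential backoff."""
    async with semaphore:
        for attempt in range(retries):
            try:
                return await extractor(source)
            except Exception:
                if attempt == retries - 1:
                    raise  # out of attempts: surface the last error
                await asyncio.sleep(base_delay * 2 ** attempt)


def fingerprint(text: str) -> str:
    """Content hash used as the cache key value (an assumed caching strategy)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


async def extract_all(sources, extractor, cache, max_concurrency=4):
    """Extract all sources concurrently and return only those whose content changed."""
    semaphore = asyncio.Semaphore(max_concurrency)
    texts = await asyncio.gather(
        *(extract_with_retry(s, extractor, semaphore) for s in sources)
    )
    changed = {}
    for source, text in zip(sources, texts):
        digest = fingerprint(text)
        if cache.get(source) != digest:
            cache[source] = digest  # remember the new content hash
            changed[source] = text  # only changed sources go on to summarization
    return changed
```

In this sketch, `extractor` is any async callable that takes a source string and returns text, and `cache` is a plain dict mapping each source to the hash of its last extracted text. A real implementation would persist the cache between runs.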

No commits in the last 6 months. Available on PyPI.

Maintenance 2 / 25
Adoption 4 / 25
Maturity 18 / 25
Community 13 / 25


Stars: 8
Forks: 2
Language: Python
License: MIT
Last pushed: Jun 16, 2025
Commits (30d): 0
Dependencies: 16

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/rag/kostadindev/knowledge-base-builder"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
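The curl request above can also be made from Python using only the standard library. The following is a rough sketch: the helper names `quality_url` and `fetch_quality` are hypothetical, the reading of the `rag` path segment as a category is an assumption, and the response is returned as raw text because its format is not documented here.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base URL taken from the curl example; the path layout beyond it is assumed.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL; treating 'rag' as a category is an assumption."""
    parts = (quote(p, safe="") for p in (category, owner, repo))
    return API_BASE + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0) -> str:
    """Return the raw response body as text (the response format is not assumed)."""
    with urlopen(quality_url(category, owner, repo), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Example call (performs a network request):
# body = fetch_quality("rag", "kostadindev", "knowledge-base-builder")
```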