kostadindev/knowledge-base-builder
Python package that constructs a structured markdown knowledge base from external sources such as PDFs, websites, and GitHub repos with LLM summarization. Ideal for RAG, search-friendly LLM contexts (/llms.txt), and chatbots.
Supports ingestion of 10+ content types, including YouTube transcripts, arXiv papers, RSS feeds, Jupyter notebooks, and PowerPoint slides, with specialized extractors that auto-detect each source by URL pattern or file extension. The three-phase pipeline runs concurrent text extraction, with configurable semaphores and automatic retry logic, followed by LLM-driven summarization that generates output in multiple formats: markdown, the `/llms.txt` spec, or vector-store chunks. Optional incremental caching skips unchanged sources on rebuild.
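To make the description above concrete, here is a minimal sketch of two of the ideas it names: detecting a source type from a URL pattern or file extension, and running extraction concurrently behind a semaphore with retries. This is not the package's actual API. The function names (`detect_source_type`, `extract_with_retry`, `extract_all`), the type labels, and the retry parameters are all hypothetical and chosen only for illustration.

```python
import asyncio
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical extension-to-type mapping; the real package's extractor
# names are not given in the description.
TYPE_BY_EXTENSION = {".pdf": "pdf", ".ipynb": "notebook", ".pptx": "slides"}


def detect_source_type(source: str) -> str:
    """Guess a content type from a URL pattern or a file extension."""
    parsed = urlparse(source)
    host = parsed.netloc.lower()
    if "youtube.com" in host or "youtu.be" in host:
        return "youtube"
    if "arxiv.org" in host:
        return "arxiv"
    if "github.com" in host:
        return "github"
    suffix = PurePosixPath(parsed.path).suffix.lower()
    if suffix in TYPE_BY_EXTENSION:
        return TYPE_BY_EXTENSION[suffix]
    return "website" if parsed.scheme in ("http", "https") else "unknown"


async def extract_with_retry(source, extract, semaphore, retries=3, base_delay=0.01):
    """Run one extraction under the semaphore, retrying with exponential backoff."""
    async with semaphore:  # limits how many extractions run at once
        for attempt in range(retries):
            try:
                return await extract(source)
            except Exception:
                if attempt == retries - 1:
                    raise  # out of attempts: surface the last error
                await asyncio.sleep(base_delay * 2 ** attempt)


async def extract_all(sources, extract, max_concurrency=4):
    """Extract every source concurrently; results keep the input order."""
    semaphore = asyncio.Semaphore(max_concurrency)
    tasks = [extract_with_retry(s, extract, semaphore) for s in sources]
    return await asyncio.gather(*tasks)
```

In this sketch a task keeps its semaphore slot while it waits to retry, so a failing source cannot push the number of in-flight extractions above `max_concurrency`. Releasing the slot during backoff would be an equally reasonable design.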
No commits in the last 6 months. Available on PyPI.
Stars: 8
Forks: 2
Language: Python
License: MIT
Category:
Last pushed: Jun 16, 2025
Commits (30d): 0
Dependencies: 16
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/kostadindev/knowledge-base-builder"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
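The same request can be made from Python with only the standard library. The sketch below rests on two assumptions: that the endpoint returns JSON, and that the path follows a category/owner/repo pattern, which is inferred from the single URL shown above. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, repo: str) -> str:
    """Build the endpoint URL; the category/owner/repo layout is an assumption."""
    return f"{BASE_URL}/{category}/{repo}"


def fetch_quality(category: str, repo: str, timeout: float = 10.0):
    """Fetch and decode the response, assuming the body is JSON."""
    with urllib.request.urlopen(quality_url(category, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("rag", "kostadindev/knowledge-base-builder")` would issue the same GET request as the curl command. The response fields are not documented here, so the sketch returns the parsed body unchanged.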
Higher-rated alternatives
ConardLi/easy-dataset
A powerful tool for creating datasets for LLM fine-tuning, RAG, and Eval
ItzCrazyKns/Vane
Vane is an AI-powered answering engine.
DS4SD/deepsearch-toolkit
Interact with the Deep Search platform for new knowledge explorations and discoveries
xuwei95/ezdata
A data processing and task scheduling system built on Python and large language models (LLMs)...
ModelEngine-Group/DataMate
DataMate is an enterprise-level data processing platform designed for model fine-tuning and RAG...