AnkitNayak-eth/CrawlAI-RAG
CrawlAI RAG is an AI-powered website intelligence platform that allows users to crawl entire websites, index their content, and ask natural-language questions using Retrieval-Augmented Generation (RAG). It transforms static websites into queryable knowledge bases.
It uses Playwright for browser-based crawling and BeautifulSoup4 for HTML parsing, with LangChain orchestrating the RAG pipeline across ChromaDB vector storage and Groq's LLaMA 3.3 70B for inference. It supports indexing multiple websites in a shared vector database, which enables semantic search across sites. The backend is built on FastAPI and the frontend on Streamlit, for rapid deployment and extensibility.
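The multi-site indexing and retrieval flow described above can be sketched in a few lines. This is not the project's code: it is a minimal illustration that swaps in a toy bag-of-words cosine similarity where the project uses ChromaDB embeddings, and it stops at building the prompt rather than calling Groq. The names `SharedIndex`, `add_page`, `search`, and `build_prompt`, and the example URLs, are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk(text, size=40):
    """Split text into chunks of at most `size` whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class SharedIndex:
    """Toy stand-in for a shared vector store holding chunks from many sites."""

    def __init__(self):
        self.entries = []  # list of (source_url, chunk_text, term_vector)

    def add_page(self, url, text):
        # Every chunk keeps its source URL so results can be traced to a site.
        for piece in chunk(text):
            self.entries.append((url, piece, Counter(tokenize(piece))))

    def search(self, question, k=2):
        # Score every chunk against the question and keep the top k matches.
        q = Counter(tokenize(question))
        scored = [(cosine(q, vec), url, piece) for url, piece, vec in self.entries]
        scored.sort(key=lambda item: item[0], reverse=True)
        return [(url, piece) for score, url, piece in scored[:k] if score > 0]


def build_prompt(question, hits):
    """Assemble the retrieved chunks into a prompt for an LLM (call omitted)."""
    context = "\n".join(f"[{url}] {piece}" for url, piece in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Two hypothetical sites indexed into the same store.
index = SharedIndex()
index.add_page("https://site-a.example/pricing", "The pro plan costs 20 dollars per month.")
index.add_page("https://site-b.example/docs", "Install the client with pip and set an API key.")

question = "How much does the pro plan cost?"
hits = index.search(question, k=1)
prompt = build_prompt(question, hits)
```

Because each chunk carries its source URL, a single query can return passages from any indexed site, which is the idea behind cross-site search in a shared store.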
Stars: 93
Forks: 18
Language: Python
License: MIT
Category
Last pushed: Feb 15, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/AnkitNayak-eth/CrawlAI-RAG"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
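The same request can be made from Python with only the standard library. This is a sketch under one assumption that the page does not confirm: that the endpoint returns a JSON body. The helper names `build_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base path taken from the curl example on this page.
BASE = "https://pt-edge.onrender.com/api/v1/quality/rag"


def build_url(repo):
    """Return the endpoint URL for an 'owner/name' repository string."""
    return f"{BASE}/{repo}"


def fetch_quality(repo, timeout=10):
    """Fetch the record for a repository.

    Assumption: the response body is JSON. If it is not, json.loads
    raises a ValueError and the caller should inspect the raw body.
    """
    with urllib.request.urlopen(build_url(repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (performs a network request, counted against the daily limit):
# data = fetch_quality("AnkitNayak-eth/CrawlAI-RAG")
# print(data)
```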
Higher-rated alternatives
any4ai/AnyCrawl
AnyCrawl 🚀: A Node.js/TypeScript crawler that turns websites into LLM-ready data and extracts...
ScrapeGraphAI/Scrapegraph-ai
Python scraper based on AI
adbar/trafilatura
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping,...
kreuzberg-dev/html-to-markdown
High performance and CommonMark compliant HTML to Markdown converter. Maintained by the...
lightfeed/extractor
Using LLMs and AI browser automation to robustly extract web data