Aavache/LLMWebCrawler

A Web Crawler based on LLMs implemented with Ray and Huggingface. The embeddings are saved into a vector database for fast clustering and retrieval. Use it for your RAG.

32 / 100 (Emerging)

Implements recursive web crawling with configurable depth limits and stores both raw text and BERT embeddings in a Milvus vector database for semantic similarity search. Distributes crawling workloads across Ray workers in a master-worker architecture and exposes a FastAPI interface for querying crawled content by vector proximity rather than keyword matching.
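The two ideas in this description, depth-limited crawling and retrieval by vector proximity, can be sketched in a few lines. This is a minimal illustration and not the repository's code: the in-memory `PAGES` graph, the `crawl`, `embed`, `cosine`, and `search` helpers are all hypothetical, the bag-of-words vector is only a stand-in for a BERT embedding, and a plain list scan replaces Milvus. Ray and FastAPI are left out entirely.

```python
from collections import deque
import math

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
PAGES = {
    "a": ("vector database search", ["b", "c"]),
    "b": ("ray distributed workers", ["d"]),
    "c": ("semantic similarity search with embeddings", ["a"]),
    "d": ("fastapi query interface", []),
}

def crawl(start, max_depth):
    """Breadth-first crawl that does not follow links past max_depth."""
    seen = {start}
    queue = deque([(start, 0)])
    visited = []
    while queue:
        url, depth = queue.popleft()
        visited.append(url)
        if depth == max_depth:
            continue  # depth limit reached: keep the page, skip its links
        for link in PAGES[url][1]:
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited

def embed(text, vocab):
    """Toy bag-of-words vector (assumption: stands in for a BERT embedding)."""
    words = text.split()
    return [float(words.count(w)) for w in vocab]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def search(query, urls, top_k=1):
    """Rank crawled pages by similarity to the query vector."""
    vocab = sorted({w for u in urls for w in PAGES[u][0].split()})
    q = embed(query, vocab)
    scored = [(cosine(q, embed(PAGES[u][0], vocab)), u) for u in urls]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [u for _, u in scored[:top_k]]
```

Calling `crawl("a", 1)` visits the start page and its direct links only, and `search("similarity search", crawl("a", 1))` ranks those pages against the query. A real embedding model would also match pages that share meaning but no words with the query, which this toy vector cannot do.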

No commits in the last 6 months.

No License, Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 9 / 25
Maturity 8 / 25
Community 15 / 25

Stars: 98
Forks: 13
Language: Python
License: None
Category: local-rag-stacks
Last pushed: Oct 15, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/vector-db/Aavache/LLMWebCrawler"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
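The same request can be made from Python with the standard library. This is a sketch under assumptions: the `quality_url` and `fetch_quality` helpers are hypothetical, the path layout (a domain segment followed by owner and repository) is inferred from the single URL shown above, and the response is assumed to be JSON, which the page does not state.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; the segment layout is an assumption.
BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(domain, owner, repo):
    """Build the endpoint URL, percent-encoding each path segment."""
    parts = [quote(p, safe="") for p in (domain, owner, repo)]
    return "/".join([BASE] + parts)

def fetch_quality(domain, owner, repo, timeout=10):
    """Fetch the record and decode it, assuming the body is JSON."""
    url = quality_url(domain, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

`quality_url("vector-db", "Aavache", "LLMWebCrawler")` reproduces the URL in the curl command. `fetch_quality` with the same arguments performs the request and counts against the daily limit.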