hazemabdelkawy/SunnahGPT
SunnahGPT is a natural language processing (NLP) project that scrapes hadith data from sunnah.com and applies OpenAI's GPT-3.5 model to generate a text embedding for each hadith.
The scraper parses sunnah.com's HTML with BeautifulSoup and fetches pages with the requests library, applying rate limiting between requests. OpenAI's text-embedding API then generates a vector representation of each hadith, which supports semantic search and similarity analysis across the corpus. The pipeline outputs structured JSON datasets containing bilingual text (Arabic/English), chain-of-transmission metadata, and 1536-dimensional embeddings for downstream NLP tasks. Pre-computed embeddings are available via Google Drive, so they can be used directly in retrieval-augmented generation or Islamic studies research without re-scraping.
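As a rough illustration of the semantic search step described above, the sketch below ranks records by cosine similarity to a query embedding. It is not taken from the repository: the field names `id`, `english`, and `embedding` are hypothetical, the sample records are placeholders, and the vectors are 3-dimensional toys standing in for the 1536-dimensional embeddings.

```python
import json
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query_embedding, records, k=3):
    """Return the k (score, record) pairs most similar to the query."""
    scored = [
        (cosine_similarity(query_embedding, record["embedding"]), record)
        for record in records
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Placeholder records; the schema is an assumption, not the project's format.
SAMPLE_JSON = """
[
  {"id": "a", "english": "placeholder text A", "embedding": [0.9, 0.1, 0.0]},
  {"id": "b", "english": "placeholder text B", "embedding": [0.0, 1.0, 0.0]},
  {"id": "c", "english": "placeholder text C", "embedding": [0.7, 0.0, 0.7]}
]
"""

records = json.loads(SAMPLE_JSON)
results = top_k([1.0, 0.0, 0.0], records, k=2)
for score, record in results:
    print(f"{record['id']}: {score:.3f}")
```

In practice the query vector would come from the same embedding model as the stored vectors, since similarity scores are only meaningful within one embedding space.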
No commits in the last 6 months.
Stars: 86
Forks: 12
Language: HTML
License: none listed
Category: none listed
Last pushed: Mar 26, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/hazemabdelkawy/SunnahGPT"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
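The same request can be made from Python with the standard library. This is a minimal sketch: the helper names `quality_url` and `fetch_quality` are hypothetical, the URL pattern is inferred from the single curl example above, and the response is returned as raw text because its format is not documented here.

```python
import urllib.parse
import urllib.request

# Base path copied from the curl example; the owner/repo layout is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/embeddings"


def quality_url(owner, repo):
    """Build the request URL for one repository."""
    return f"{BASE_URL}/{urllib.parse.quote(owner)}/{urllib.parse.quote(repo)}"


def fetch_quality(owner, repo, timeout=10.0):
    """Fetch the response body as text (assumes a UTF-8 encoded body)."""
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Example call (performs a network request, so it is left commented out):
# print(fetch_quality("hazemabdelkawy", "SunnahGPT"))
```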
Higher-rated alternatives
colonelwatch/abstracts-search
Semantic search engine indexing 110 million academic publications
ahr9n/quranic-search-v2
Quranic Lexical/Semantic Search
VIGINUM-FR/D3lta
A Python implementation of the D3lta algorithm for duplicated textual content detection
geetanjaliapp/geetanjali
RAG-powered ethical decision guidance from Bhagavad Geeta. Analyze dilemmas, get structured...
mufaizz/FAIZ-AI
FAIZ AI 🔍 – The search bot that finds what others miss. Searches HTTP, FTP, IPFS & Torrent with...