hazemabdelkawy/SunnahGPT

SunnahGPT is a natural language processing (NLP) project that scrapes hadith data from the popular website sunnah.com and applies OpenAI's GPT-3.5 model to generate textual embeddings for each hadith.

32 / 100 (Emerging)

The scraper uses BeautifulSoup to parse sunnah.com's HTML structure and the requests library to handle rate-limited HTTP fetching. OpenAI's text-embedding API then generates vector representations for semantic search and similarity analysis across the hadith corpus. The pipeline outputs structured JSON datasets containing bilingual text (Arabic/English), chain-of-transmission metadata, and 1536-dimensional embeddings for downstream NLP tasks. Pre-computed embeddings are available via Google Drive, so they can be used immediately in retrieval-augmented generation or Islamic studies research without re-scraping.
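As a rough illustration of the semantic search use case described above, the following sketch ranks records by cosine similarity to a query embedding. The record layout (`id`, `text_en`, `embedding`), the helper names, and the 3-dimensional toy vectors are assumptions made for this example; the project's actual JSON schema is not shown here, and its real embeddings are 1536-dimensional.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat it as dissimilar
    return dot / (norm_a * norm_b)


def top_k(query_embedding, records, k=2):
    """Return the k (score, record) pairs most similar to the query."""
    scored = [
        (cosine_similarity(query_embedding, r["embedding"]), r) for r in records
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical records: placeholder text and toy 3-dimensional embeddings.
records = [
    {"id": "h1", "text_en": "placeholder text A", "embedding": [1.0, 0.0, 0.0]},
    {"id": "h2", "text_en": "placeholder text B", "embedding": [0.0, 1.0, 0.0]},
    {"id": "h3", "text_en": "placeholder text C", "embedding": [0.9, 0.1, 0.0]},
]

# In practice the query vector would come from the same embedding model
# that produced the stored vectors.
query = [1.0, 0.0, 0.0]
results = top_k(query, records, k=2)
```

For a corpus of realistic size, a vectorized library or a vector index would likely replace the pure-Python loop; the ranking logic stays the same.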

No commits in the last 6 months.

No License, Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 9 / 25
Maturity 8 / 25
Community 15 / 25


Stars: 86
Forks: 12
Language: HTML
License: None
Last pushed: Mar 26, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/hazemabdelkawy/SunnahGPT"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
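The same request can be made from Python with only the standard library. This is a minimal sketch: the path pattern `/quality/<category>/<owner>/<repo>` is inferred from the single URL above, the function names are hypothetical, and the response is assumed (not confirmed) to be JSON.

```python
import json
import urllib.request

# Base path taken from the curl example; the segment layout is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL for one repository (hypothetical helper)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10):
    """Fetch the quality data and parse it, assuming a JSON response body."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("embeddings", "hazemabdelkawy", "SunnahGPT")` would request the same URL as the curl command. Each call counts against the daily request limit.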