davidsvy/Neural-Scam-Artist

Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset.

25 / 100 (Experimental)

The pipeline deduplicates the scraped documents with MinHash/LSH, then runs connected component analysis over the candidate pairs so that duplicates LSH misses (false negatives) still end up in the same group. From each group it keeps one representative document chosen by readability score, rather than an arbitrary member. Fine-tuning uses Hugging Face's pretrained GPT-2 model on the cleaned dataset. The full workflow runs through modular Python scripts, with a YAML configuration for each stage (scraping, deduplication, training, and generation).
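The deduplication step described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the repository's code: the shingle size, the signature length, the band count, and especially the `readability` function (a crude average-sentence-length proxy) are hypothetical stand-ins, since the listing names only "MinHash/LSH", "connected component analysis", and "readability score".

```python
import hashlib
import re
from collections import defaultdict

NUM_PERM = 64  # hash functions per MinHash signature (assumed value)
BANDS = 16     # LSH bands; each band covers NUM_PERM // BANDS rows (assumed value)


def shingles(text, k=3):
    """Return the set of k-word shingles of a document (punctuation ignored)."""
    words = re.findall(r"\w+", text.lower())
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def _hash(seed, token):
    """Seeded 64-bit hash built from MD5; one seed per simulated permutation."""
    digest = hashlib.md5(f"{seed}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little")


def minhash(tokens):
    """MinHash signature: the minimum hash of the token set under each seed."""
    return tuple(min(_hash(seed, t) for t in tokens) for seed in range(NUM_PERM))


def lsh_candidate_pairs(signatures):
    """Pair up documents whose signatures agree on every row of at least one band."""
    rows = NUM_PERM // BANDS
    buckets = defaultdict(list)
    for doc_id, sig in enumerate(signatures):
        for b in range(BANDS):
            buckets[(b, sig[b * rows:(b + 1) * rows])].append(doc_id)
    pairs = set()
    for ids in buckets.values():
        for i in range(len(ids)):
            for j in range(i + 1, len(ids)):
                pairs.add((ids[i], ids[j]))
    return pairs


def connected_components(n, pairs):
    """Union-find over candidate pairs: A~B and B~C group A with C even if LSH missed A~C."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)
    groups = defaultdict(list)
    for i in range(n):
        groups[find(i)].append(i)
    return list(groups.values())


def readability(text):
    """Hypothetical proxy: fewer words per sentence scores higher."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text)
    return -len(words) / max(1, len(sentences))


def deduplicate(docs):
    """Keep the most readable document from each near-duplicate group."""
    sigs = [minhash(shingles(d)) for d in docs]
    comps = connected_components(len(docs), lsh_candidate_pairs(sigs))
    return [max((docs[i] for i in comp), key=readability) for comp in comps]
```

Calling `deduplicate` on a list of strings returns one string per duplicate group. A real pipeline would likely use an established readability metric and tune the band and row counts against a target similarity threshold.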

No commits in the last 6 months.

Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 7 / 25
Maturity 9 / 25
Community 9 / 25

Stars: 28
Forks: 3
Language: Python
License: MIT
Last pushed: Oct 30, 2021
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/davidsvy/Neural-Scam-Artist"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
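The same request can be made from Python with only the standard library. This is a sketch under assumptions: the `category/owner/repo` path pattern is generalized from the single URL shown above, the helper names `quality_url` and `fetch_quality` are hypothetical, and the endpoint is assumed (not confirmed here) to return JSON.

```python
import json
import urllib.request

# Base path taken from the curl example; the trailing segments are assumed
# to follow a category/owner/repo pattern.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the quality endpoint URL for one repository."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10):
    """Fetch and decode the response, assuming the body is JSON."""
    with urllib.request.urlopen(quality_url(category, owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For this repository the call would be `fetch_quality("nlp", "davidsvy", "Neural-Scam-Artist")`. Unauthenticated use counts against the 100 requests/day limit.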