davidsvy/Neural-Scam-Artist
Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset.
The pipeline deduplicates the scraped documents with MinHash/LSH, then applies connected component analysis to the resulting duplicate graph to handle LSH false negatives. Each group of duplicates is reduced to one representative document, chosen by readability score. Fine-tuning uses Hugging Face's pretrained GPT-2 model on the cleaned dataset. The full workflow is orchestrated through modular Python scripts that support YAML configuration for each stage (scraping, deduplication, training, and generation).
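The deduplication idea described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the repository's code: the function names, the shingle size, `NUM_PERM`, `BANDS`, the similarity threshold, and the `readability` heuristic (a crude stand-in for whatever readability score the project uses) are all hypothetical.

```python
import hashlib
import re
from collections import defaultdict

NUM_PERM = 64  # hash functions per signature (assumed value)
BANDS = 16     # LSH bands; 64 / 16 = 4 rows per band (assumed value)


def shingles(text, k=3):
    """Return the set of lowercase word k-grams in a document."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def minhash(shingle_set):
    """Signature = minimum 64-bit hash of the shingles under each seeded hash."""
    sig = []
    for seed in range(NUM_PERM):
        salt = seed.to_bytes(8, "big")
        sig.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode(), digest_size=8, salt=salt).digest(),
                "big",
            )
            for s in shingle_set
        ))
    return sig


def candidate_pairs(signatures):
    """LSH banding: documents sharing any full band become candidate pairs."""
    rows = NUM_PERM // BANDS
    pairs = set()
    for b in range(BANDS):
        buckets = defaultdict(list)
        for idx, sig in enumerate(signatures):
            buckets[tuple(sig[b * rows:(b + 1) * rows])].append(idx)
        for members in buckets.values():
            for i in range(len(members)):
                for j in range(i + 1, len(members)):
                    pairs.add((members[i], members[j]))
    return pairs


def connected_components(n, pairs):
    """Union-find over duplicate pairs: A~B and B~C puts A, B, C in one group
    even if LSH never proposed the pair (A, C)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)
    groups = defaultdict(list)
    for i in range(n):
        groups[find(i)].append(i)
    return list(groups.values())


def readability(text):
    """Hypothetical proxy: shorter sentences and shorter words score higher."""
    words = re.findall(r"\w+", text)
    if not words:
        return float("-inf")
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    avg_word_len = sum(len(w) for w in words) / len(words)
    return -(len(words) / sentences) - avg_word_len


def deduplicate(docs, threshold=0.8):
    """Return sorted indices of the documents kept, one per duplicate group."""
    sigs = [minhash(shingles(d)) for d in docs]
    pairs = {
        (a, b) for a, b in candidate_pairs(sigs)
        # Estimated Jaccard similarity = fraction of matching signature slots.
        if sum(x == y for x, y in zip(sigs[a], sigs[b])) / NUM_PERM >= threshold
    }
    groups = connected_components(len(docs), pairs)
    return sorted(max(g, key=lambda i: readability(docs[i])) for g in groups)
```

Calling `deduplicate(docs)` on a list of strings returns the indices of the documents to keep. Grouping by connected component rather than by direct pairwise matches is what lets transitively linked duplicates collapse into a single group.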
No commits in the last 6 months.
Stars: 28
Forks: 3
Language: Python
License: MIT
Category:
Last pushed: Oct 30, 2021
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/davidsvy/Neural-Scam-Artist"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
jklu-jaipur/Political-Biasness-Detection
Our ML model calculates the biasness of a political article based on linguistic features and...
yamanalab/why-darkpattern
[Proc of IEEE BigData 2023] Why is the User Interface a Dark Pattern? : Explainable...
sdarjunwadkar/Political-Idealogies-Prediction-in-News-Articles
Media diversity shapes perspectives, yet biased news distorts reality, fostering misinformation....
nerdimite/bert-web-app
Code for the FullStack AI Live Coding Series- Part 2 (CellStrat AI Lab)
adriansprk/unbiased
🤖 Unbias: AI-powered tool to instantly decode news articles, revealing bias, factual claims,...