davidsvy/Neural-Scam-Artist

Web Scraping, Document Deduplication & GPT-2 Fine-tuning with a newly created scam dataset.


Experimental

The pipeline combines MinHash/LSH-based deduplication with connected-component analysis to handle false negatives. Within each group of near-duplicates, it selects a representative document by readability score rather than keeping an arbitrary copy. Fine-tuning starts from Hugging Face's pretrained GPT-2 model and uses the cleaned dataset. The full workflow is orchestrated through modular Python scripts, one per stage (scraping, deduplication, training, and generation), each supporting YAML configuration.
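The deduplication stage described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the repository's code: the function names, the `NUM_PERM` and `BANDS` values, and the `readability` scorer (a crude average-sentence-length proxy) are all hypothetical, and the project itself may rely on a dedicated library and a different readability metric.

```python
import hashlib
import re
from collections import defaultdict

NUM_PERM = 64  # hash functions per signature (assumed value)
BANDS = 16     # LSH bands; each band covers NUM_PERM // BANDS rows (assumed value)

def shingles(text, k=3):
    """Return the set of lowercase word k-grams in a document."""
    words = re.findall(r"\w+", text.lower())
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}

def minhash(text):
    """MinHash signature: for each salted hash, the minimum over all shingles."""
    grams = [g.encode() for g in shingles(text)]
    signature = []
    for seed in range(NUM_PERM):
        salt = seed.to_bytes(2, "big")
        signature.append(min(
            int.from_bytes(hashlib.blake2b(g, digest_size=8, salt=salt).digest(), "big")
            for g in grams
        ))
    return signature

def candidate_pairs(signatures):
    """LSH banding: documents that agree on any whole band become a candidate pair."""
    rows = NUM_PERM // BANDS
    buckets = defaultdict(list)
    for doc_id, sig in enumerate(signatures):
        for band in range(BANDS):
            key = (band, tuple(sig[band * rows:(band + 1) * rows]))
            buckets[key].append(doc_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update((i, j) for i in ids for j in ids if i < j)
    return pairs

def connected_components(n, pairs):
    """Union-find over candidate pairs; links A-B and B-C place A and C together
    even if LSH never matched A with C directly."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)
    groups = defaultdict(list)
    for i in range(n):
        groups[find(i)].append(i)
    return list(groups.values())

def readability(text):
    """Hypothetical stand-in score: fewer words per sentence scores higher."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text)
    return -len(words) / max(1, len(sentences))

def deduplicate(docs):
    """Keep one document per near-duplicate component, chosen by readability."""
    signatures = [minhash(d) for d in docs]
    components = connected_components(len(docs), candidate_pairs(signatures))
    return [max((docs[i] for i in comp), key=readability) for comp in components]
```

Grouping by connected components rather than by individual LSH matches is what compensates for missed pairs: two copies that never share a band still end up in the same group if a third document links them.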

No commits in the last 6 months.

Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 7 / 25

Maturity 9 / 25

Community 9 / 25


Language: Python

License: MIT

Higher-rated alternatives

jklu-jaipur/Political-Biasness-Detection

Our ML model calculates the biasness of a political article based on linguistic features and...

yamanalab/why-darkpattern

[Proc of IEEE BigData 2023] Why is the User Interface a Dark Pattern? : Explainable...

sdarjunwadkar/Political-Idealogies-Prediction-in-News-Articles

Media diversity shapes perspectives, yet biased news distorts reality, fostering misinformation....

nerdimite/bert-web-app

Code for the FullStack AI Live Coding Series- Part 2 (CellStrat AI Lab)

adriansprk/unbiased

🤖 Unbias: AI-powered tool to instantly decode news articles, revealing bias, factual claims,...
