Hasnat-Aarif-Aslam/NLP-Foundation-Tokens-Ngrams-BoW-TF-IDF-TFIDF
Comprehensive guide to text preprocessing and vectorization techniques for NLP, covering tokenization, n-grams, Bag-of-Words, TF-IDF, and related feature-engineering methods.
No commits in the last 6 months.
Stars: —
Forks: —
Language: —
License: MIT
Category:
Last pushed: Jun 28, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/Hasnat-Aarif-Aslam/NLP-Foundation-Tokens-Ngrams-BoW-TF-IDF-TFIDF"
Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
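For scripted access, the same request can be sketched in Python using only the standard library. This is a minimal sketch under stated assumptions: the function names `build_quality_url` and `fetch_quality` are hypothetical, the `nlp` path segment is assumed to be a category slot, and the response is assumed to be JSON (the page does not document the response format).

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(owner: str, repo: str, category: str = "nlp") -> str:
    """Build the endpoint URL using the path layout seen in the curl command.

    Treating "nlp" as a category segment is an assumption, not documented behavior.
    """
    # Percent-encode each path segment so unusual characters cannot break the URL.
    segments = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(segments)


def fetch_quality(owner: str, repo: str, category: str = "nlp", timeout: float = 10.0):
    """Send the GET request and parse the body as JSON (assumed response format)."""
    url = build_quality_url(owner, repo, category)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("Hasnat-Aarif-Aslam", "NLP-Foundation-Tokens-Ngrams-BoW-TF-IDF-TFIDF")` performs the network request, so each call counts against the daily request limit stated above.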
Higher-rated alternatives
textvec/textvec
Text vectorization tool to outperform TFIDF for classification tasks
DigitalPebble/behemoth
Behemoth is an open source platform for large scale document analysis based on Apache Hadoop.
nasa-jpl-memex/memex-gate
General Architecture for Text Engineering
NISH1001/tag-generator
A simple tool to generate tags for the given text (document) using TF-IDF.
cooperability/BMX-bookmark-extractor
Better brain. Knowledge management tool. Stop saving things you'll never read. Work in progress.