pytorch/text
Models, data loaders and abstractions for language processing, powered by PyTorch
Archived
Provides pre-built datasets (WikiText, SQuAD, Multi30k, AG_NEWS, etc.), scriptable tokenizers (SentencePiece, GPT-2 BPE, BERT), and pre-trained transformer models (RoBERTa, T5, XLM-R) with TorchData integration for efficient data pipelines. Supports vectorized text transformations and vocabulary management, designed to streamline end-to-end NLP workflows within the PyTorch ecosystem.
3,565 stars. No commits in the last 6 months.
Stars: 3,565
Forks: 813
Language: Python
License: BSD-3-Clause
Last pushed: Sep 10, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/pytorch/text"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
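The same request can be made from Python. The following is a minimal sketch using only the standard library, not an official client. The helper names `build_quality_url` and `fetch_quality` are hypothetical, the `/<category>/<owner>/<repo>` path layout is inferred from the single curl example, and the response is assumed to be JSON with fields that this page does not document. How an API key would be passed is not stated here, so the sketch uses keyless access only.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository.

    Assumption: the path is /<category>/<owner>/<repo>, inferred from the
    single example URL; the listing does not document this layout.
    """
    parts = (quote(p, safe="") for p in (category, owner, repo))
    return f"{BASE_URL}/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record without an API key and decode it.

    Assumption: the body is UTF-8 JSON. Its fields are not documented here,
    so the decoded value is returned as-is.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (performs a network request, so it is left commented out):
# data = fetch_quality("nlp", "pytorch", "text")
# print(data)
```

Keeping URL construction separate from the network call makes it possible to check the request target without spending any of the daily request quota.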
Related tools
facebookresearch/stopes
A library for preparing data for machine translation research (monolingual preprocessing,...
rkcosmos/deepcut
A Thai word tokenization library using Deep Neural Network
Droidtown/ArticutAPI
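API for Articut Chinese word segmentation (with semantic part-of-speech tagging): word segmentation (斷詞, also called 分詞) is the foundation of Chinese information processing. Articut uses no machine learning and requires no data model; relying only on the grammar rules of modern vernacular Chinese, it can achieve...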
fukuball/jieba-php
"Jieba" Chinese word segmentation: built to be the best PHP Chinese word segmentation component. / "Jieba" (Chinese for "to stutter") Chinese text segmentation:...
jiesutd/NCRFpp
NCRF++, a Neural Sequence Labeling Toolkit. Easy to use for any sequence labeling task (e.g. NER,...