facebookresearch/stopes

A library for preparing data for machine translation research (monolingual preprocessing, bitext mining, etc.) built by the FAIR NLLB team.
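Bitext mining, one of the tasks named above, is usually done by embedding sentences from two languages in a shared vector space. Sentence pairs are then kept when their similarity stands out from the similarity of each sentence's nearest neighbours. The sketch below shows the ratio-margin criterion often used for this. It is a generic pure-Python illustration on toy vectors, not stopes' own API or pipeline. The function names (`cosine`, `knn_mean`, `mine_pairs`) and the `k`/`threshold` defaults are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn_mean(query: Vector, candidates: List[Vector], k: int) -> float:
    """Mean cosine similarity between `query` and its k most similar candidates."""
    sims = sorted((cosine(query, c) for c in candidates), reverse=True)
    top = sims[:k]
    return sum(top) / len(top)


def mine_pairs(
    src: List[Vector], tgt: List[Vector], k: int = 2, threshold: float = 1.0
) -> List[Tuple[int, int, float]]:
    """Hypothetical helper: return (src_idx, tgt_idx, margin) for each source
    sentence whose best-scoring target clears `threshold`."""
    # Neighbourhood density of every sentence, measured against the other side.
    src_density = [knn_mean(x, tgt, k) for x in src]
    tgt_density = [knn_mean(y, src, k) for y in tgt]
    pairs = []
    for i, x in enumerate(src):
        best_j, best_score = -1, float("-inf")
        for j, y in enumerate(tgt):
            # Ratio margin: raw similarity divided by the averaged densities.
            denom = (src_density[i] + tgt_density[j]) / 2
            score = cosine(x, y) / denom if denom else 0.0
            if score > best_score:
                best_j, best_score = j, score
        if best_score >= threshold:
            pairs.append((i, best_j, best_score))
    return pairs


# Toy "embeddings": each source vector has one target pointing the same way.
src = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.1], [0.1, 0.0, 1.0]]
tgt = [[0.0, 0.9, 0.2], [0.1, 0.1, 1.0], [0.9, 0.2, 0.0]]
print([(i, j) for i, j, _ in mine_pairs(src, tgt)])  # → [(0, 2), (1, 0), (2, 1)]
```

Dividing by neighbourhood density penalises "hub" sentences that are similar to everything, so a pair is kept only when it is distinctly closer than the alternatives. In a real pipeline the vectors would come from a multilingual sentence encoder, and the k-nearest-neighbour search would use an approximate index (for example FAISS) rather than this quadratic loop.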

/ 100

Verified

297 stars and 286 monthly downloads. Available on PyPI.

Maintenance 13 / 25

Adoption 16 / 25

Maturity 25 / 25

Community 19 / 25

Stars: 297
Language: Python
License: MIT
Last pushed: Mar 12, 2026
Monthly downloads: 286

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/facebookresearch/stopes"

Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.

Related tools

rkcosmos/deepcut

A Thai word tokenization library using a deep neural network

Droidtown/ArticutAPI

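API of Articut Chinese word segmentation (with semantic part-of-speech tagging): "word segmentation" (斷詞, also called 分詞) is the foundation of Chinese information processing. Articut uses no machine learning and needs no data model; using only the grammar rules of modern vernacular Chinese, it can achieve...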

fukuball/jieba-php

"Jieba" Chinese word segmentation: building the best PHP component for Chinese word segmentation. / "Jieba" (Chinese for "to stutter") Chinese text segmentation:...

jiesutd/NCRFpp

NCRF++, a Neural Sequence Labeling Toolkit. Easy to use for any sequence labeling task (e.g. NER,...

pytorch/text

Models, data loaders and abstractions for language processing, powered by PyTorch