ucto and python-ucto

python-ucto is a Python binding to ucto, which makes the two ecosystem siblings: the Python package serves as a client library for the core C++ tokenizer.

ucto

Established

Maintenance 13/25
Adoption 9/25
Maturity 16/25
Community 18/25

Stars: 70
Forks: 14
Downloads: —
Commits (30d): 0
Language: C++
License: GPL-3.0

No Package, No Dependents

python-ucto

Established

Maintenance 10/25
Adoption 13/25
Maturity 17/25
Community 13/25

Stars: 31
Forks: 5
Downloads: 487
Commits (30d): 0
Language: Cython
License: —

No License

About ucto

LanguageMachines/ucto

Unicode tokeniser. Ucto tokenizes text files: it separates words from punctuation and splits sentences. It also offers several other basic preprocessing steps, such as changing case, all of which you can use to make your text suitable for further processing such as indexing, part-of-speech tagging, or machine translation. Ucto comes with tokenisation rules for several languages and can be easily extended to suit other languages. It has been incorporated for tokenizing Dutch text in Frog, our Dutch morpho-syntactic processor. http://ilk.uvt.nl/ucto
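To make the description above concrete, here is a toy sketch in Python of what separating words from punctuation and splitting sentences can look like. It is not Ucto's code or its rule set: the pattern `TOKEN_RE`, the set `SENTENCE_END`, and the function `toy_tokenize` are all hypothetical names invented for this illustration.

```python
import re
from typing import List

# Hypothetical toy pattern (not Ucto's rules): a run of word characters,
# optionally joined by an internal apostrophe or hyphen, or else any
# single non-space, non-word symbol such as a punctuation mark.
TOKEN_RE = re.compile(r"\w+(?:['\-]\w+)*|[^\w\s]")

# Assumption for this sketch: these three marks always end a sentence.
SENTENCE_END = {".", "!", "?"}


def toy_tokenize(text: str, lowercase: bool = False) -> List[List[str]]:
    """Return a list of sentences, each a list of token strings."""
    sentences: List[List[str]] = []
    current: List[str] = []
    for match in TOKEN_RE.finditer(text):
        token = match.group(0)
        # Optional case change, one of the basic preprocessing steps.
        current.append(token.lower() if lowercase else token)
        if token in SENTENCE_END:
            sentences.append(current)
            current = []
    if current:
        # Keep trailing tokens that had no sentence-ending mark.
        sentences.append(current)
    return sentences
```

A sketch this simple splits on every full stop, so an abbreviation such as "Mr." would wrongly end a sentence. Handling cases like that is where per-language tokenisation rules come in.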

About python-ucto

proycon/python-ucto

This is a Python binding to the tokenizer Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet it is not always as trivial a task as it appears to be. This binding makes the power of the ucto tokeniser available to Python. Ucto itself is a regular-expression-based, extensible, and advanced tokeniser written in C++ (http://ilk.uvt.nl/ucto).
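As a rough illustration of using the binding from Python, the sketch below follows the usage pattern shown in the python-ucto README as best recalled: `ucto.Tokenizer(settingsfile)`, `tokenizer.process(text)`, then iteration over tokens exposing `isendofsentence()` and `nospace()`. Treat those calls as assumptions to check against the installed version. The helper names `sentences_from_tokens` and `tokenize_with_ucto`, and the default settings file `tokconfig-eng`, are chosen for this example only.

```python
from typing import Iterable, List


def sentences_from_tokens(tokens: Iterable) -> List[str]:
    """Rebuild one string per sentence from token objects.

    Each token is assumed to support str(token), token.isendofsentence()
    and token.nospace(), as python-ucto tokens are assumed to do here.
    """
    sentences: List[str] = []
    current = ""
    for token in tokens:
        current += str(token)
        if token.isendofsentence():
            sentences.append(current)
            current = ""
        elif not token.nospace():
            # The token was followed by whitespace in the input text.
            current += " "
    if current.strip():
        # Keep trailing tokens that were not closed by a sentence end.
        sentences.append(current.strip())
    return sentences


def tokenize_with_ucto(text: str, settingsfile: str = "tokconfig-eng") -> List[str]:
    """Tokenise text with Ucto and return one string per sentence."""
    # Imported lazily: needs ucto and python-ucto installed, plus the
    # named settings file (assumed here to be the English rule set).
    import ucto

    tokenizer = ucto.Tokenizer(settingsfile)
    tokenizer.process(text)
    return sentences_from_tokens(tokenizer)
```

With both packages installed, `tokenize_with_ucto("To be or not to be. That is the question.")` would be expected to return one string per detected sentence. The exact segmentation depends on the rule set in use.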

Scores updated daily from GitHub, PyPI, and npm data.