tokenizers and libtokenizers
                 tokenizers      libtokenizers
Maintenance      20/25           10/25
Adoption         25/25            0/25
Maturity         25/25           11/25
Community        20/25            0/25

Stars            10,520          —
Forks            1,051           —
Downloads        1,504,044       —
Commits (30d)    45              0
Language         Rust            Rust
License          Apache-2.0      Apache-2.0

Risk flags       None            No Package, No Dependents
About tokenizers
huggingface/tokenizers
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
When working with large volumes of text for natural language processing, this library converts raw text into a format that machine learning models can consume. It takes raw text documents as input and produces a vocabulary and tokens: numerical representations of words or sub-word units. This step is essential for AI researchers and machine learning engineers building or fine-tuning language models.
Tags: natural-language-processing, machine-learning-engineering, text-pre-processing, AI-model-training
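To make the vocabulary-and-tokens idea concrete, here is a minimal toy sketch in plain Python. It is not the tokenizers library's API (which implements subword algorithms such as BPE in Rust); it only illustrates the core concept of mapping text to integer ids via a learned vocabulary. All names (`build_vocab`, `encode`) are hypothetical helpers for this illustration.

```python
# Toy illustration of tokenization: build a vocabulary from a corpus,
# then encode text as the integer ids of its words. Real tokenizers
# operate on sub-word units, but the input/output shape is the same.

def build_vocab(corpus):
    """Assign each distinct whitespace-separated word an integer id."""
    vocab = {}
    for text in corpus:
        for word in text.split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab

def encode(text, vocab, unk=-1):
    """Map a text to ids; unknown words get the `unk` id."""
    return [vocab.get(word, unk) for word in text.split()]

corpus = ["hello world", "hello tokenizers"]
vocab = build_vocab(corpus)
print(vocab)                          # {'hello': 0, 'world': 1, 'tokenizers': 2}
print(encode("hello world", vocab))   # [0, 1]
```

A production tokenizer differs mainly in how the vocabulary is learned (merging frequent character pairs rather than splitting on whitespace), which is what lets it handle words it has never seen.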
About libtokenizers
muna-ai/libtokenizers
C/C++ bindings for Hugging Face Tokenizers.
Scores updated daily from GitHub, PyPI, and npm data.