tokenizers and azerbaijani-tokenizer

                 tokenizers         azerbaijani-tokenizer
Score            90 (Verified)      20 (Experimental)
Maintenance      20/25              2/25
Adoption         25/25              3/25
Maturity         25/25              15/25
Community        20/25              0/25
Stars            10,520             4
Forks            1,051
Downloads        1,504,044
Commits (30d)    45                 0
Language         Rust               Jupyter Notebook
License          Apache-2.0         Apache-2.0
Risk flags       None               Stale 6m, No Package, No Dependents

About tokenizers

huggingface/tokenizers

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

When working with large volumes of text for natural language processing, this library converts raw text into a format that machine learning models can understand. It takes raw text documents as input and produces a vocabulary and a sequence of tokens: words or sub-word units mapped to integer IDs. This is essential for AI researchers and machine learning engineers building or fine-tuning language models.
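To make the vocabulary-and-tokens idea concrete, here is a minimal pure-Python sketch of the concept. It is a toy whitespace tokenizer for illustration only, not the tokenizers library's actual API; the function names `build_vocab` and `encode` are invented for this example.

```python
# Toy illustration of what a tokenizer produces: a vocabulary mapping
# text units to integer IDs, and an encoding of raw text into those IDs.
# Real tokenizers (e.g. BPE) split into sub-word units; here we simply
# split on whitespace to keep the sketch short.

def build_vocab(corpus):
    """Assign a unique integer ID to each distinct whitespace-separated token."""
    vocab = {}
    for text in corpus:
        for token in text.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab

def encode(text, vocab, unk_id=None):
    """Map raw text to a list of token IDs; unknown tokens get unk_id."""
    return [vocab.get(token, unk_id) for token in text.lower().split()]

corpus = ["the cat sat", "the dog sat"]
vocab = build_vocab(corpus)   # {'the': 0, 'cat': 1, 'sat': 2, 'dog': 3}
ids = encode("the dog", vocab)  # [0, 3]
```

A production tokenizer does the same two jobs, building a vocabulary from a training corpus and mapping text to IDs, but learns sub-word merges so that unseen words decompose into known pieces instead of falling back to an unknown ID.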

natural-language-processing machine-learning-engineering text-pre-processing AI-model-training

About azerbaijani-tokenizer

hikmatazimzade/azerbaijani-tokenizer

High-Performance Azerbaijani Tokenizers (30% fewer tokens, 40% faster than multilingual alternatives)

Scores updated daily from GitHub, PyPI, and npm data.