sentencepiece and sentencepiece-jni

sentencepiece-jni is a Java JNI binding that gives Java code direct access to the core SentencePiece tokenizer library. The two projects are complements meant to be used together, not alternatives.

                 sentencepiece    sentencepiece-jni
Score            84 (Verified)    41 (Emerging)
Maintenance      13/25            0/25
Adoption         25/25            7/25
Maturity         25/25            16/25
Community        21/25            18/25
Stars            11,697           38
Forks            1,333            14
Downloads        33,078,873       (not listed)
Commits (30d)    2                0
Language         C++              C++
License          Apache-2.0       MIT
Risk flags       None             Stale 6m, No Package, No Dependents

About sentencepiece

google/sentencepiece

Unsupervised text tokenizer for Neural Network-based text generation.

Implements both byte-pair-encoding (BPE) and unigram language model algorithms with subword regularization techniques to improve model robustness. Operates directly on raw Unicode text without requiring language-specific preprocessing, and provides end-to-end vocabulary-to-ID mapping with NFKC normalization. Available as self-contained C++ and Python libraries that achieve ~50k sentences/sec throughput while maintaining consistent tokenization across deployments.
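To make the BPE idea above concrete, here is a minimal sketch in plain Python. It is a toy illustration and not SentencePiece's implementation: the function names (`learn_bpe`, `merge_pair`, `encode`) are hypothetical, the corpus is a hand-made word list, and normalization, whitespace handling, and the unigram model are all left out. It only shows the core loop: repeatedly merge the most frequent adjacent symbol pair, then replay those merges to segment a word.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (toy version)."""
    # Each word starts as a sequence of single characters, weighted by frequency.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs across the whole corpus.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair wins; ties are broken lexicographically
        # (an assumption made here only to keep the sketch deterministic).
        best = min(pairs, key=lambda p: (-pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[tuple(merge_pair(list(symbols), best))] += freq
        vocab = new_vocab
    return merges


def encode(word, merges):
    """Segment `word` by replaying the learned merges in order."""
    symbols = list(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols


corpus = ["low", "low", "lower", "lowest", "newer", "newer"]
merges = learn_bpe(corpus, num_merges=3)
pieces = encode("lower", merges)
# Map each distinct piece to an integer ID, mirroring the vocabulary-to-ID step.
piece_to_id = {p: i for i, p in enumerate(sorted(set(pieces)))}
```

In real use, the `sentencepiece` library trains and applies the model on raw sentences, so a hand-rolled loop like this is useful only for building intuition about what the learned subword pieces represent.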

About sentencepiece-jni

levyfan/sentencepiece-jni

Java JNI wrapper for SentencePiece: unsupervised text tokenizer for Neural Network-based text generation.

Scores updated daily from GitHub, PyPI, and npm data.