CarolinElsner/Speech-Tokenization
Tokenisation of spoken text. Speech is received by Watson STT (speech-to-text) and the resulting transcript is sent to Apache OpenNLP. Additional code then creates individual tokens, depending on the recorded sentences.
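The description outlines a pipeline: a transcript arrives from speech-to-text, is split into sentences, and each sentence is broken into tokens. The sketch below illustrates only that last part, under loud assumptions: it does not call Watson STT or Apache OpenNLP, it swaps in plain regular expressions for the tokenizer, and the names `split_sentences`, `tokenize`, and `tokenize_transcript` are hypothetical and not taken from the repository.

```python
import re
from typing import List

# Assumption: the transcript is already plain text with sentence punctuation.
# This regex-based approach is a stand-in, not the repository's OpenNLP code.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
TOKEN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def split_sentences(transcript: str) -> List[str]:
    """Split a transcript on whitespace that follows '.', '!' or '?'."""
    return [s for s in SENTENCE_END.split(transcript.strip()) if s]


def tokenize(sentence: str) -> List[str]:
    """Return words (keeping simple contractions) and single punctuation marks."""
    return TOKEN.findall(sentence)


def tokenize_transcript(transcript: str) -> List[List[str]]:
    """Tokenize each sentence separately, giving one token list per sentence."""
    return [tokenize(s) for s in split_sentences(transcript)]


if __name__ == "__main__":
    for tokens in tokenize_transcript("Hello world. How are you?"):
        print(tokens)
```

Keeping one token list per sentence mirrors the idea that tokens depend on the recorded sentences; a real setup would likely replace both helper functions with a trained sentence detector and tokenizer.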
No commits in the last 6 months.
Stars: 4
Forks: n/a
Language: Java
License: MIT
Category:
Last pushed: Jul 16, 2018
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/CarolinElsner/Speech-Tokenization"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
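The same request can be made from a script. The following is a minimal sketch using only the Python standard library. It assumes the path segments after `/quality/` are category, owner, and repository (inferred from the single URL above, not from any API documentation), and it returns the raw response body because the response format is not described here. The names `quality_url` and `fetch_quality` are hypothetical.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base path copied from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL; the segment order is an assumption."""
    parts = [quote(p, safe="") for p in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0) -> str:
    """Fetch the raw response body as text (the format is not assumed)."""
    with urlopen(quality_url(category, owner, repo), timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


if __name__ == "__main__":
    # Counts against the keyless daily request limit.
    print(fetch_quality("nlp", "CarolinElsner", "Speech-Tokenization"))
```

The network call runs only when the file is executed directly, so importing the helpers does not consume any of the daily request quota.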
Higher-rated alternatives
google/sentencepiece
Unsupervised text tokenizer for Neural Network-based text generation.
soaxelbrooke/python-bpe
Byte Pair Encoding for Python!
OpenNMT/Tokenizer
Fast and customizable text tokenization library with BPE and SentencePiece support
Systemcluster/kitoken
Fast and versatile tokenizer for language models, compatible with SentencePiece, Tokenizers,...
daac-tools/vibrato
🎤 vibrato: Viterbi-based accelerated tokenizer