CarolinElsner/Speech-Tokenization

Tokenisation of spoken text. Transcripts are received from the IBM Watson Speech to Text (STT) service and passed to Apache OpenNLP. Additional code creates individual tokens depending on the recorded sentences.
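The following is a minimal sketch of the tokenisation step. It assumes the Watson STT transcript has already been received as a plain Java string. It uses the JDK's `java.text.BreakIterator` as a stand-in for Apache OpenNLP's sentence detector and tokenizer, so it is not the repository's code. The class and method names (`TranscriptTokenizer`, `tokenize`) are hypothetical.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: a stdlib stand-in for OpenNLP sentence detection + tokenization.
class TranscriptTokenizer {

    // Splits a transcript into sentences using the JDK's rule-based sentence BreakIterator.
    static List<String> sentences(String transcript, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(transcript);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String s = transcript.substring(start, end).trim();
            if (!s.isEmpty()) out.add(s);
        }
        return out;
    }

    // Splits one sentence into word and punctuation tokens, dropping whitespace-only segments.
    static List<String> tokens(String sentence, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(sentence);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String t = sentence.substring(start, end);
            if (!t.isBlank()) out.add(t);
        }
        return out;
    }

    // Tokenises every sentence of a transcript, keeping one token list per sentence.
    static List<List<String>> tokenize(String transcript, Locale locale) {
        List<List<String>> result = new ArrayList<>();
        for (String s : sentences(transcript, locale)) {
            result.add(tokens(s, locale));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical transcript text; no Watson STT call is made here.
        String transcript = "Hello world. How are you today?";
        System.out.println(tokenize(transcript, Locale.ENGLISH));
    }
}
```

Running `main` prints `[[Hello, world, .], [How, are, you, today, ?]]`. A rule-based splitter like this relies on capitalisation and punctuation in the transcript. When STT output lacks them, a trained model such as OpenNLP's is likely the better choice.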

/ 100

Experimental

No commits in the last 6 months.

Stale 6m · No Package · No Dependents

Maintenance 0 / 25

Adoption 3 / 25

Maturity 9 / 25

Community 0 / 25


Stars

Forks

—

Language: Java

License: MIT

Category: tokenization-algorithms

Last pushed: Jul 16, 2018

Commits (30d)


Tokenization Algorithms · 57 tools

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/CarolinElsner/Speech-Tokenization"

Open to everyone: 100 requests/day with no key needed, or 1,000 requests/day with a free key.

Higher-rated alternatives

google/sentencepiece

Unsupervised text tokenizer for Neural Network-based text generation.

soaxelbrooke/python-bpe

Byte Pair Encoding for Python!

OpenNMT/Tokenizer

Fast and customizable text tokenization library with BPE and SentencePiece support

Systemcluster/kitoken

Fast and versatile tokenizer for language models, compatible with SentencePiece, Tokenizers,...

daac-tools/vibrato

🎤 vibrato: Viterbi-based accelerated tokenizer
