koomri/text-segmentation
Implementation of the paper: Text Segmentation as a Supervised Learning Task
Implements supervised neural text segmentation using sentence embeddings (max-pooling strategy), trained on the wiki-727K and Choi datasets with word2vec representations. Built on PyTorch with configurable model architectures, it includes dataset preprocessing pipelines for Wikipedia dumps and evaluation metrics via segeval. It also provides CLI tools for training, evaluation, and custom dataset generation, with TensorBoard logging support.
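As a rough illustration of the max-pooling strategy mentioned above, the sketch below collapses the word vectors of one sentence into a single fixed-size sentence embedding by taking the element-wise maximum. It is a minimal, dependency-free sketch, not the repository's code: the function name `max_pool_sentence` and the toy 3-dimensional vectors are hypothetical, and the actual project works on PyTorch tensors rather than Python lists.

```python
from typing import List, Sequence


def max_pool_sentence(word_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Return the element-wise maximum over the word vectors of one sentence."""
    if not word_vectors:
        raise ValueError("sentence has no word vectors")
    dim = len(word_vectors[0])
    if any(len(vec) != dim for vec in word_vectors):
        raise ValueError("all word vectors must share one dimension")
    # zip(*...) regroups the values by dimension, so each `column`
    # holds one coordinate across every word in the sentence.
    return [max(column) for column in zip(*word_vectors)]


# Toy vectors standing in for word2vec rows (hypothetical values).
sentence = [
    [0.1, -0.4, 0.7],
    [0.5, 0.2, -0.3],
    [-0.2, 0.9, 0.0],
]
embedding = max_pool_sentence(sentence)
print(embedding)  # [0.5, 0.9, 0.7]
```

The result has the same dimension as a single word vector regardless of sentence length, which is what lets sentences of different lengths feed a downstream segment-boundary classifier.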
265 stars. No commits in the last 6 months.
Stars: 265
Forks: 57
Language: Python
License: —
Category:
Last pushed: Oct 02, 2019
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/koomri/text-segmentation"
Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
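The same request can be sketched in Python with only the standard library. This is a hedged example: the helper names `quality_url` and `fetch_quality` are hypothetical, the `category/owner/repo` path pattern is inferred from the single URL shown above, and the response is assumed (not confirmed) to be JSON.

```python
import json
import urllib.request

# Base path taken from the curl example; everything after it is
# assumed to follow a category/owner/repo pattern.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch and decode the response, assuming the body is JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Only the URL is built here; no network request is made at import time.
print(quality_url("nlp", "koomri", "text-segmentation"))
```

Calling `fetch_quality("nlp", "koomri", "text-segmentation")` would issue the actual request and count against the daily limit.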
Higher-rated alternatives
PyThaiNLP/attacut: A Fast and Accurate Neural Thai Word Segmenter
UlugbekSalaev/UzTransliterator: UzTransliterator | State-of-the-art machine transliteration tool for Uzbek language
VietHoang1512/khmer-nltk: Khmer language processing toolkit
seanghay/KhmerOCR: A Fast Khmer Optical Character Recognition (KhmerOCR)
seanghay/khmernormalizer: A missing toolkit for Khmer Natural Language Processing.