hyeonsangjeon/computing-Korean-STT-error-rates
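A Python package of functions for computing the character error rate (CER) and word error rate (WER) of transcripts produced by Korean speech-to-text (STT) sentence recognizers.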
Computes Levenshtein edit distance with dynamic programming to calculate character-level (CER) and word-level (WER) error rates of Korean STT output against reference transcripts. It handles Korean-specific preprocessing, including whitespace normalization and optional punctuation filtering. It also provides a regex-based keyword pattern matcher that detects named entities despite morphological variation (attached particles and endings) and the inconsistent spacing common in speech recognition results.
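The description above can be illustrated with a minimal sketch. This is not the repository's code: the function names `levenshtein`, `normalize`, `cer`, `wer` and `keyword_matches` are hypothetical. The specific choices are also assumptions: spaces are ignored for CER, any character that is neither a word character nor whitespace counts as punctuation, and keyword syllables are joined with optional whitespace. Both error rates are the edit distance divided by the length of the reference.

```python
import re


def levenshtein(ref, hyp):
    """Edit distance between two sequences (characters or word lists).

    Classic dynamic programming over a (len(ref)+1) x (len(hyp)+1) table,
    keeping only the previous row. Substitution, insertion and deletion
    each cost 1.
    """
    prev = list(range(len(hyp) + 1))  # distances from the empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
                prev[j - 1] + (r != h),  # substitution (costs 0 if equal)
            )
        prev = curr
    return prev[-1]


# Assumption: anything that is neither a word character (Hangul, Latin,
# digits, "_") nor whitespace is treated as punctuation.
_PUNCT = re.compile(r"[^\w\s]")


def normalize(text, remove_punct=False):
    """Collapse runs of whitespace to single spaces; optionally drop punctuation."""
    if remove_punct:
        text = _PUNCT.sub("", text)
    return " ".join(text.split())


def cer(ref, hyp, remove_punct=False):
    # Assumption: spaces are ignored at the character level, so spacing
    # mistakes alone do not count as character errors.
    r = normalize(ref, remove_punct).replace(" ", "")
    h = normalize(hyp, remove_punct).replace(" ", "")
    if not r:
        raise ValueError("reference is empty")
    return levenshtein(r, h) / len(r)


def wer(ref, hyp, remove_punct=False):
    # Words are whitespace-separated tokens (eojeol in Korean).
    r = normalize(ref, remove_punct).split()
    h = normalize(hyp, remove_punct).split()
    if not r:
        raise ValueError("reference is empty")
    return levenshtein(r, h) / len(r)


def keyword_matches(keyword, text):
    """True if `keyword` occurs in `text`, tolerating spaces between its
    syllables and any particle or ending attached after it."""
    chars = "".join(keyword.split())  # ignore spacing inside the keyword
    pattern = r"\s*".join(re.escape(c) for c in chars)
    # re.search needs no boundary after the match, so a trailing particle
    # such as "에서" in "서울역에서" does not prevent a match.
    return re.search(pattern, text) is not None
```

For example, `cer("안녕하세요", "안녕하세여")` returns 0.2 (one substituted syllable out of five), and `keyword_matches("서울역", "서울 역에서 만나요")` is True despite the inserted space and the attached particle.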
No commits in the last 6 months.
Stars: 68
Forks: 10
Language: Python
License: MIT
Last pushed: Jun 18, 2025
Commits (30d): 0
Higher-rated alternatives
fgnt/meeteval: MeetEval, a meeting transcription evaluation toolkit
kahne/fastwer: A PyPI package for fast word/character error rate (WER/CER) calculation
tabahi/bournemouth-forced-aligner: Extract phoneme-level timestamps from speech audio.
readbeyond/aeneas: aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka...
wq2012/SimpleDER: A lightweight library to compute Diarization Error Rate (DER).