Korean Language Models

This category covers pretrained transformer models designed specifically for Korean language processing, including BERT, ELECTRA, and specialized variants. It does not include general multilingual models, non-Korean language models, or downstream task-specific applications, unless they primarily showcase the Korean model architecture itself.

There are 33 Korean language models tracked. The highest-rated is SKTBrain/KoBERT, at 46/100 with 1,407 stars.

Get the projects in this category as JSON (the example request sets limit=20; 33 projects are tracked)

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=korean-language-models&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
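
The same request can be made from a script. The following is a minimal Python sketch using only the standard library; the helper names `build_url` and `fetch_projects` are hypothetical, and the structure of the returned JSON is not documented on this page, so the code simply decodes whatever the endpoint returns. Whether the endpoint accepts a `limit` above 20 is also an assumption to verify.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(limit=20):
    """Build the query URL for the korean-language-models subcategory.

    `limit` defaults to 20 to match the curl example; larger values are
    an untested assumption about the API.
    """
    params = {
        "domain": "transformers",
        "subcategory": "korean-language-models",
        "limit": limit,
    }
    return BASE_URL + "?" + urllib.parse.urlencode(params)


def fetch_projects(limit=20, timeout=30):
    """Send the GET request and decode the JSON body.

    The response schema is not specified here, so the decoded object
    is returned as-is for the caller to inspect.
    """
    with urllib.request.urlopen(build_url(limit), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_projects()` performs the network request and counts against the daily quota; `build_url()` alone can be used to check the query string before sending anything.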

#	Model	Score	Tier	Stars	Language
1	SKTBrain/KoBERT Korean BERT pre-trained cased (KoBERT)	46	Emerging	1,407	Python
2	monologg/KoELECTRA Pretrained ELECTRA Model for Korean	44	Emerging	630	Python
3	monologg/KoBERT-Transformers KoBERT on 🤗 Huggingface Transformers 🤗 (with Bug Fixed)	40	Emerging	212	Python
4	VinAIResearch/PhoBERT PhoBERT: Pre-trained language models for Vietnamese (EMNLP-2020 Findings)	40	Emerging	775	—
5	KB-AI-Research/KB-ALBERT Korean ALBERT model specialized for the economics/finance domain, provided by KB Kookmin Bank	39	Emerging	241	Python
6	monologg/KoBERT-KorQuAD Korean MRC (KorQuAD) with KoBERT	37	Emerging	65	Python
7	ymcui/MacBERT Revisiting Pre-trained Models for Chinese Natural Language Processing (MacBERT)	37	Emerging	702	—
8	monologg/DistilKoBERT Distillation of KoBERT from SKTBrain (Lightweight KoBERT)	35	Emerging	198	Python
9	Beomi/KcELECTRA 🤗 Korean Comments ELECTRA: ELECTRA model trained on Korean comments	34	Emerging	261	—
10	thevasudevgupta/bigbird Google's BigBird (Jax/Flax & PyTorch) @ 🤗Transformers	33	Emerging	49	Jupyter Notebook
11	monologg/korean-hate-speech-koelectra Bias, Hate classification with KoELECTRA 👿	33	Emerging	27	Python
12	monologg/KoBigBird 🦅 Pretrained BigBird Model for Korean (up to 4096 tokens)	33	Emerging	201	Python
13	monologg/KoCharELECTRA Character-level Korean ELECTRA Model (syllable-level Korean ELECTRA)	33	Emerging	54	Python
14	toriving/text-classification-transformers Easy text classification for everyone : Bert based models via Huggingface...	30	Emerging	39	Python
15	monologg/KoELECTRA-Pipeline Transformers Pipeline with KoELECTRA	30	Emerging	40	Python
16	monologg/HanBert-Transformers HanBert on 🤗 Huggingface Transformers 🤗	29	Experimental	87	Python
17	bayartsogt-ya/albert-mongolian ALBERT trained on Mongolian text corpus	27	Experimental	18	Jupyter Notebook
18	sajjjadayobi/ParsBigBird Persian Bert For Long-Range Sequences	26	Experimental	63	Jupyter Notebook
19	Anshler/vietnamese-poem-classifier Classify genre and score Vietnamese poems 📜🔍	26	Experimental	5	Python
20	SciCrunch/bio_electra Bio-Electra - Small and efficient discriminatively pre-trained language...	24	Experimental	4	Python
21	oneonlee/KoAirBERT 🤗 Korean BERT model specialized for the aviation safety domain ✈️	23	Experimental	2	Jupyter Notebook
22	svn05/vietnamese-sentiment-phobert a fine tuned PhoBERT model to classify product reviews across a range of...	20	Experimental	1	Python
23	Nikki-oo7/pos-tagger Part-of-Speech Tagger implemented in PyTorch using BiLSTM and Transformer models.	19	Experimental	—	Python
24	yejoon-lee/kr3 KR3: Korean Restaurant Review with Ratings / Experiments on...	18	Experimental	6	Jupyter Notebook
25	qanastek/French-Part-Of-Speech-Tagging Repository for the source code of the HuggingFace Space named...	16	Experimental	4	Python
26	codegram/calbert Catalan ALBERT (A Lite BERT for self-supervised learning of language representations)	16	Experimental	14	Python
27	Pirata-Codex/Tag-Persian-Entities-Using-Bert Using the fa-bert model to tag persian entities in a sentence	15	Experimental	2	Jupyter Notebook
28	edoost/pert Persian Ezafe Recognition Using Transformers and Its Role in Part-Of-Speech Tagging	13	Experimental	10	Jupyter Notebook
29	qanastek/ANTILLES ANTILLES : An Open French Linguistically Enriched Part-of-Speech Corpus	13	Experimental	7	Python
30	HRSadeghi/Joint_Comma_and_Kasreh_Recognizer In this repository, we provide a joint neural model based on BERT and two...	12	Experimental	3	Python
31	ilos-vigil/bigbird-small-indonesian Lightweight Indonesian language model for long sequences.	12	Experimental	4	Python
32	phanxuanphucnd/CoBERTa CoBERTa is a set of pre-trained language models for...	12	Experimental	3	Python
33	amanaser/BabyLM-ELECTRA-Pre-training BabyLM ELECTRA Pre-training on NVIDIA L40 GPU Cluster.	11	Experimental	—	Jupyter Notebook

Comparisons in this category

KoBERT and KoBERT-Transformers (46 vs 40)
KoELECTRA and KoCharELECTRA (44 vs 33)
KoELECTRA and KoELECTRA-Pipeline (44 vs 30)