VinAIResearch/PhoBERT
PhoBERT: Pre-trained language models for Vietnamese (EMNLP-2020 Findings)
Provides base (135M parameters) and large (370M parameters) transformer models pre-trained with RoBERTa's procedure on 20GB of Vietnamese text (Wikipedia plus news). Integrates with the Hugging Face `transformers` and `fairseq` frameworks, with models available on the Hub; raw input must be word-segmented upstream (e.g., with VnCoreNLP's RDRSegmenter), since Vietnamese words often span multiple space-separated syllables. Achieves state-of-the-art results on four downstream tasks: POS tagging, dependency parsing, NER, and natural language inference.
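A minimal usage sketch following the repo's documented `transformers` workflow (`vinai/phobert-base` is the base checkpoint on the Hub; the example sentence is already word-segmented, with underscores joining multi-syllable words as RDRSegmenter emits them):

import torch
from transformers import AutoModel, AutoTokenizer

# Load the base checkpoint and its tokenizer from the Hugging Face Hub.
phobert = AutoModel.from_pretrained("vinai/phobert-base")
tokenizer = AutoTokenizer.from_pretrained("vinai/phobert-base")

# Input must be word-segmented beforehand (e.g., by VnCoreNLP's RDRSegmenter);
# underscores join the syllables of each multi-syllable word.
line = "Chúng_tôi là những nghiên_cứu_viên ."

input_ids = torch.tensor([tokenizer.encode(line)])
with torch.no_grad():
    output = phobert(input_ids)

# Contextual embeddings for each token in the sentence.
print(output.last_hidden_state.shape)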
775 stars. No commits in the last 6 months.
Stars: 775
Forks: 112
Language: —
License: MIT
Category: —
Last pushed: Jul 23, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/VinAIResearch/PhoBERT"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
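The same data can be fetched programmatically; a minimal Python sketch using only the standard library (the response is assumed to be a JSON object, and its schema is not documented here):

import json
import urllib.request

# Same endpoint as the curl example above.
URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/VinAIResearch/PhoBERT"

with urllib.request.urlopen(URL, timeout=10) as resp:
    data = json.load(resp)  # assumes a JSON body

# Pretty-print whatever the API returned.
print(json.dumps(data, indent=2, ensure_ascii=False))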
Higher-rated alternatives
SKTBrain/KoBERT
Korean BERT pre-trained cased (KoBERT)
monologg/KoELECTRA
Pretrained ELECTRA Model for Korean
monologg/KoBERT-Transformers
KoBERT on 🤗 Hugging Face Transformers 🤗 (with bug fixes)
KB-AI-Research/KB-ALBERT
A Korean ALBERT model specialized for the economics/finance domain, provided by KB Kookmin Bank
ymcui/MacBERT
Revisiting Pre-trained Models for Chinese Natural Language Processing (MacBERT)