Morizeyao/GPT2-Chinese
Chinese version of GPT2 training code, using BERT tokenizer.
Supports multiple tokenization strategies (character-level, word-level, and BPE) and integrates with HuggingFace Transformers, enabling training on diverse Chinese text domains from classical poetry to novels with configurable model depth and batch sizes. Includes pre-trained models for specialized domains (ancient Chinese, lyrics, couplets) available on Hugging Face Model Hub, alongside utilities for perplexity evaluation and batch text generation with customizable sampling parameters.
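The character-level strategy mentioned above can be illustrated with a minimal sketch. This is not the project's actual tokenizer (which uses HuggingFace's BERT tokenizer); it is an assumed, simplified illustration of how character-level tokenization treats mixed Chinese/English input.

```python
# Illustrative sketch of character-level tokenization for Chinese text.
# The real project relies on a BERT tokenizer from HuggingFace Transformers;
# this simplified function is an assumption for demonstration only.

def char_tokenize(text: str) -> list[str]:
    """Split text into single-character tokens for CJK characters,
    keeping runs of ASCII letters/digits together (roughly how
    BERT-style tokenizers handle mixed Chinese/English input)."""
    tokens, buf = [], []
    for ch in text:
        if ch.isascii() and ch.isalnum():
            buf.append(ch)          # accumulate an ASCII word/number
        else:
            if buf:
                tokens.append("".join(buf))
                buf = []
            if not ch.isspace():
                tokens.append(ch)   # each CJK character is its own token
    if buf:
        tokens.append("".join(buf))
    return tokens

print(char_tokenize("GPT2 中文训练"))  # → ['GPT2', '中', '文', '训', '练']
```

Character-level vocabularies stay small and avoid out-of-vocabulary issues, which is why they are a common default for Chinese GPT-2 training.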
7,598 stars. No commits in the last 6 months.
Stars
7,598
Forks
1,694
Language
Python
License
MIT
Last pushed
Apr 25, 2024
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/Morizeyao/GPT2-Chinese"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000/day.
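The same endpoint can be called from Python with only the standard library. The response's JSON field names are not documented here, so this sketch just pretty-prints whatever comes back; the `build_url` helper is an assumed convenience, not part of the API.

```python
# Hedged sketch: fetch the quality endpoint shown above using only
# the Python standard library. Response schema is undocumented here,
# so the payload is pretty-printed rather than parsed into fields.
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def build_url(category: str, owner: str, repo: str) -> str:
    """Assemble the endpoint URL in the pattern shown by the curl example."""
    return f"{BASE}/{category}/{owner}/{repo}"

def fetch(category: str, owner: str, repo: str) -> dict:
    """GET the endpoint and decode the JSON body."""
    url = build_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)

if __name__ == "__main__":
    data = fetch("nlp", "Morizeyao", "GPT2-Chinese")
    print(json.dumps(data, indent=2, ensure_ascii=False))
```

Unauthenticated calls are rate-limited to 100 requests/day, so cache responses locally if you poll many repositories.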
Related tools
graykode/gpt-2-Pytorch
Simple text generator using a PyTorch implementation of OpenAI GPT-2.
imcaspar/gpt2-ml
GPT2 for Multiple Languages, including pretrained models. Multilingual GPT-2 support, with a 1.5-billion-parameter Chinese pretrained model.
gyunggyung/KoGPT2-FineTuning
🔥 Korean GPT-2 (KoGPT2) fine-tuning, trained on Korean lyrics data 🔥
liucongg/GPT2-NewsTitle
Chinese news-title generation project using GPT2, with extremely detailed code comments.
lipiji/Guyu
Chinese GPT2: pre-training and fine-tuning framework for text generation