Morizeyao/GPT2-Chinese

Chinese version of GPT2 training code, using BERT tokenizer.

Quality score: 51 / 100 (Established)

Supports multiple tokenization strategies (character-level, word-level, and BPE) and integrates with HuggingFace Transformers, enabling training on diverse Chinese text domains from classical poetry to novels with configurable model depth and batch sizes. Includes pre-trained models for specialized domains (ancient Chinese, lyrics, couplets) available on Hugging Face Model Hub, alongside utilities for perplexity evaluation and batch text generation with customizable sampling parameters.
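The character-level strategy mentioned above is the simplest of the three: each Chinese character becomes one token. The sketch below only illustrates the idea; the helper names are hypothetical, and the repository itself delegates tokenization to a BERT tokenizer and its vocabulary files.

```python
# Illustrative character-level tokenization for Chinese text.
# (Hypothetical helpers, not the repository's actual API.)

def build_vocab(texts, specials=("[PAD]", "[UNK]")):
    """Map each distinct character to an integer id, specials first."""
    vocab = {tok: i for i, tok in enumerate(specials)}
    for text in texts:
        for ch in text:
            vocab.setdefault(ch, len(vocab))
    return vocab

def encode(text, vocab):
    """Character-level encoding; unknown characters map to [UNK]."""
    unk = vocab["[UNK]"]
    return [vocab.get(ch, unk) for ch in text]

corpus = ["床前明月光", "疑是地上霜"]
vocab = build_vocab(corpus)       # 2 specials + 10 distinct characters
ids = encode("床前明月", vocab)
```

Word-level and BPE tokenization follow the same encode/decode interface but split on multi-character units, which shrinks sequence length at the cost of a larger vocabulary.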

7,598 stars. No commits in the last 6 months.

Flags: Stale (6 months) · No Package · No Dependents
Maintenance: 0 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 25 / 25


Stars: 7,598
Forks: 1,694
Language: Python
License: MIT
Last pushed: Apr 25, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/Morizeyao/GPT2-Chinese"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
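The same endpoint can be called from any HTTP client. A minimal Python sketch, assuming the response is JSON (the actual response schema is not shown on this page):

```python
import json
from urllib.request import urlopen

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, repo: str) -> str:
    """Build the quality-endpoint URL for a '<owner>/<name>' repository."""
    return f"{API_BASE}/{category}/{repo}"

url = quality_url("nlp", "Morizeyao/GPT2-Chinese")

# Uncomment to fetch live data (requires network; JSON schema assumed):
# data = json.load(urlopen(url))
```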