imcaspar/gpt2-ml

GPT2 for Multiple Languages, including pretrained models: multilingual GPT-2 support with a 1.5B-parameter Chinese pretrained model.

Score: 51 / 100 (Established)

Implements simplified training scripts based on the Grover architecture with TPU support, and adapts BERT's tokenizer for multilingual corpus compatibility using the CLUE vocabulary. Provides two 1.5B-parameter Chinese pretrained checkpoints trained on 15-30 GB corpora with different tokenization schemes (BERT vs. CLUE tokens), trained on a Cloud TPU Pod for production-ready text generation tasks.

1,703 stars. No commits in the last 6 months.

Stale (6 months) · No package · No dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 25 / 25


Stars: 1,703
Forks: 330
Language: Python
License: Apache-2.0
Last pushed: May 22, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/imcaspar/gpt2-ml"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
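For programmatic use, the same endpoint can be called from Python. A minimal sketch using only the standard library; the helper names are illustrative, and the shape of the JSON response is not documented here, so only the URL construction and a raw fetch are shown:

```python
import json
from urllib.request import urlopen

# Base path taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the quality endpoint URL for a repository (helper name is hypothetical)."""
    return f"{API_BASE}/{category}/{owner}/{repo}"

def fetch_quality(category: str, owner: str, repo: str) -> dict:
    """Fetch and decode the quality report; subject to the 100 requests/day limit without a key."""
    with urlopen(quality_url(category, owner, repo)) as resp:
        return json.load(resp)

url = quality_url("nlp", "imcaspar", "gpt2-ml")
# report = fetch_quality("nlp", "imcaspar", "gpt2-ml")  # performs a live request
```

The live call is left commented out so the snippet can be read without triggering a network request against the daily quota.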