GPT2-Chinese and gpt2-ml

These tools are competitors addressing similar NLP tasks involving GPT2 in Chinese or multilingual contexts: GPT2-Chinese provides Chinese GPT2 training code built on a BERT tokenizer, while gpt2-ml offers GPT2 support for multiple languages, including pretrained models.

|                | GPT2-Chinese                         | gpt2-ml                              |
|----------------|--------------------------------------|--------------------------------------|
| Score          | 51 (Established)                     | 51 (Established)                     |
| Maintenance    | 0/25                                 | 0/25                                 |
| Adoption       | 10/25                                | 10/25                                |
| Maturity       | 16/25                                | 16/25                                |
| Community      | 25/25                                | 25/25                                |
| Stars          | 7,598                                | 1,703                                |
| Forks          | 1,694                                | 330                                  |
| Downloads      |                                      |                                      |
| Commits (30d)  | 0                                    | 0                                    |
| Language       | Python                               | Python                               |
| License        | MIT                                  | Apache-2.0                           |
| Flags          | Stale 6m, No Package, No Dependents  | Stale 6m, No Package, No Dependents  |

About GPT2-Chinese

Morizeyao/GPT2-Chinese

Chinese version of GPT2 training code, using BERT tokenizer.

Supports multiple tokenization strategies (character-level, word-level, and BPE) and integrates with HuggingFace Transformers, enabling training on diverse Chinese text domains from classical poetry to novels with configurable model depth and batch sizes. Includes pre-trained models for specialized domains (ancient Chinese, lyrics, couplets) available on Hugging Face Model Hub, alongside utilities for perplexity evaluation and batch text generation with customizable sampling parameters.
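The project's use of a BERT tokenizer matters because Chinese text has no word boundaries, so BERT-style tokenization falls back to splitting text into individual characters. A minimal sketch of that idea, with an illustrative vocabulary that is not the project's actual vocab file:

```python
# Sketch of BERT-style character-level tokenization for Chinese.
# Splitting into single characters sidesteps word segmentation;
# runs of ASCII letters/digits are kept together, as BERT does.

def char_tokenize(text):
    """Split a Chinese string into single-character tokens,
    keeping runs of ASCII alphanumerics together."""
    tokens, buf = [], []
    for ch in text:
        if ch.isascii() and ch.isalnum():
            buf.append(ch)
        else:
            if buf:
                tokens.append("".join(buf))
                buf = []
            if not ch.isspace():
                tokens.append(ch)
    if buf:
        tokens.append("".join(buf))
    return tokens

def encode(tokens, vocab, unk_id=0):
    """Map tokens to ids, falling back to [UNK] for unseen tokens."""
    return [vocab.get(t, unk_id) for t in tokens]

# Toy vocabulary for illustration only.
vocab = {"[UNK]": 0, "你": 1, "好": 2, "世": 3, "界": 4, "GPT2": 5}
tokens = char_tokenize("你好 世界 GPT2")   # ["你", "好", "世", "界", "GPT2"]
ids = encode(tokens, vocab)                # [1, 2, 3, 4, 5]
```

In the real project, the vocab file and tokenizer come from the HuggingFace Transformers integration rather than a hand-built dictionary.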

About gpt2-ml

imcaspar/gpt2-ml

GPT2 for Multiple Languages, including pretrained models (repo tagline: "GPT2 multilingual support; 1.5-billion-parameter Chinese pretrained model").

Implements simplified training scripts based on the Grover architecture with TPU support, and adapts BERT's tokenizer for multilingual corpus compatibility using the CLUE vocabulary. Provides two 1.5B-parameter Chinese pretrained checkpoints trained on 15-30GB corpora with different tokenization schemes (BERT vs. CLUE tokens), optimized via Cloud TPU Pod for production-ready text generation tasks.
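Both projects expose configurable decoding for generation (the "customizable sampling parameters" above). A hedged sketch of top-k sampling with temperature, the standard scheme behind such knobs; the toy logits are made up, and real use would feed the model's output logits:

```python
import math
import random

def top_k_sample(logits, k=2, temperature=1.0, rng=None):
    """Sample a token index from the k highest-logit entries,
    after temperature scaling and softmax renormalization."""
    rng = rng or random.Random()
    # Scale logits by temperature; lower temperature sharpens the choice.
    scaled = [(logit / temperature, i) for i, logit in enumerate(logits)]
    # Keep only the k most likely candidates.
    top = sorted(scaled, reverse=True)[:k]
    # Softmax over the survivors (subtract max for numerical stability).
    m = max(l for l, _ in top)
    weights = [math.exp(l - m) for l, _ in top]
    # Draw proportionally to the renormalized probabilities.
    r = rng.random() * sum(weights)
    for w, (_, i) in zip(weights, top):
        r -= w
        if r <= 0:
            return i
    return top[-1][1]

# With k=1 this reduces to greedy decoding (always the argmax).
choice = top_k_sample([0.1, 2.0, 0.5], k=1)  # 1
```

This is an illustration of the decoding technique, not code from either repository; in practice the equivalent parameters are passed to the projects' generation scripts.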

Scores are updated daily from GitHub, PyPI, and npm data.