ymcui/Chinese-BERT-wwm

Pre-Training with Whole Word Masking for Chinese BERT (Chinese BERT-wwm model series)

50 / 100 (Established)

Implements whole word masking at the Chinese word level rather than the character level, using the LTP segmentation tool from the Harbin Institute of Technology to mask complete words during pretraining instead of randomly masking individual characters. Provides multiple model variants, including BERT-wwm, RoBERTa-wwm-ext, and compressed versions (RBT3-6, RBTL3), trained on 5.4B tokens of extended Chinese corpora, all compatible with Hugging Face Transformers and PaddleHub for direct integration into downstream NLP tasks.
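A minimal sketch (not taken from the repository) of loading one of these checkpoints through Hugging Face Transformers. The Hub model ID "hfl/chinese-roberta-wwm-ext" is assumed to correspond to the RoBERTa-wwm-ext variant; the project documentation recommends loading its checkpoints with the BertTokenizer/BertModel classes.

# Sketch: load a Chinese-BERT-wwm-family checkpoint and run a forward pass.
import torch
from transformers import BertTokenizer, BertModel

model_id = "hfl/chinese-roberta-wwm-ext"  # assumed Hub ID; swap for another variant if needed
tokenizer = BertTokenizer.from_pretrained(model_id)
model = BertModel.from_pretrained(model_id)

inputs = tokenizer("使用语言模型来预测下一个词的概率。", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # [1, seq_len, 768] for the base-size models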

10,184 stars. No commits in the last 6 months.

Stale (6m) · No Package · No Dependents
Maintenance 2 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 22 / 25


Stars: 10,184
Forks: 1,393
Language: Python
License: Apache-2.0
Last pushed: Jul 15, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/ymcui/Chinese-BERT-wwm"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
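For scripted access, a small sketch of the same request from Python using the requests library; the response body is assumed to be JSON, and its field names are not documented here.

# Sketch: fetch the quality data for this repository from the API above.
import requests

url = "https://pt-edge.onrender.com/api/v1/quality/nlp/ymcui/Chinese-BERT-wwm"
resp = requests.get(url, timeout=10)
resp.raise_for_status()
data = resp.json()  # assumed JSON payload containing the metrics shown on this page
print(data)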