ymcui/Chinese-BERT-wwm

Pre-Training with Whole Word Masking for Chinese BERT (Chinese BERT-wwm model series)

50 / 100 (Established)

Implements whole word masking at the Chinese word level rather than the character level, using the LTP segmentation tool from the Harbin Institute of Technology to mask complete words during pretraining instead of randomly masking individual characters. Provides multiple model variants, including BERT-wwm, RoBERTa-wwm-ext, and compressed versions (RBT3-6, RBTL3), trained on 5.4B tokens of extended Chinese corpora, all compatible with Hugging Face Transformers and PaddleHub for direct integration into downstream NLP tasks.
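A minimal sketch (not taken from the repository) of loading one of these checkpoints through Hugging Face Transformers. The Hub model ID "hfl/chinese-roberta-wwm-ext" is assumed to correspond to the RoBERTa-wwm-ext variant; the project documentation recommends loading its checkpoints with the BertTokenizer/BertModel classes.

# Sketch: load a Chinese-BERT-wwm-family checkpoint and run a forward pass.
import torch
from transformers import BertTokenizer, BertModel

model_id = "hfl/chinese-roberta-wwm-ext"  # assumed Hub ID; swap for another variant if needed
tokenizer = BertTokenizer.from_pretrained(model_id)
model = BertModel.from_pretrained(model_id)

inputs = tokenizer("使用语言模型来预测下一个词的概率。", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # [1, seq_len, 768] for the base-size models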

10,184 stars. No commits in the last 6 months.

Stale (6m) · No Package · No Dependents
Maintenance 2 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 22 / 25


Stars: 10,184
Forks: 1,393
Language: Python
License: Apache-2.0
Last pushed: Jul 15, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/ymcui/Chinese-BERT-wwm"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
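For scripted access, a small sketch of the same request from Python using the requests library; the response body is assumed to be JSON, and its field names are not documented here.

# Sketch: fetch the quality data for this repository from the API above.
import requests

url = "https://pt-edge.onrender.com/api/v1/quality/nlp/ymcui/Chinese-BERT-wwm"
resp = requests.get(url, timeout=10)
resp.raise_for_status()
data = resp.json()  # assumed JSON payload containing the metrics shown on this page
print(data)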