CLUECorpus2020 and CLUEPretrainedModels

CLUECorpus2020
Score: 53 (Established)
Maintenance 10/25, Adoption 10/25, Maturity 16/25, Community 17/25
Stars: 1,002
Forks: 83
Downloads:
Commits (30d): 0
Language:
License: MIT
No Package, No Dependents

CLUEPretrainedModels
Score: 38 (Emerging)
Maintenance 0/25, Adoption 10/25, Maturity 8/25, Community 20/25
Stars: 816
Forks: 95
Downloads:
Commits (30d): 0
Language: Python
License:
No License, Stale 6m, No Package, No Dependents
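
The overall scores shown above (53 and 38) equal the sum of each project's four category scores, each out of 25. The short Python sketch below checks that arithmetic. The summing rule is inferred from the numbers on this page rather than taken from a documented formula, and the names `scores` and `total_score` are illustrative only.

```python
# Category scores copied from the comparison above (each out of 25).
scores = {
    "CLUECorpus2020": {
        "Maintenance": 10, "Adoption": 10, "Maturity": 16, "Community": 17,
    },
    "CLUEPretrainedModels": {
        "Maintenance": 0, "Adoption": 10, "Maturity": 8, "Community": 20,
    },
}


def total_score(categories):
    """Sum the four 25-point category scores (assumed formula, inferred from the page)."""
    return sum(categories.values())


totals = {name: total_score(cats) for name, cats in scores.items()}

for name, total in totals.items():
    print(f"{name}: {total}/100")
```

Under this assumption, the gap between the two projects comes mostly from Maintenance (10 vs 0) and Maturity (16 vs 8), while CLUEPretrainedModels is slightly ahead on Community (20 vs 17).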

About CLUECorpus2020

CLUEbenchmark/CLUECorpus2020

Large-scale pre-training corpus for Chinese (100 GB)

This project offers a large, cleaned collection of Chinese text for training language models or text generation models. It refines raw Chinese web content into a high-quality corpus that is ready for use in natural language processing applications. It is aimed at data scientists, AI researchers, and developers working on Chinese language technologies.

Chinese NLP, Language Model Training, Text Generation, Data Science, AI Research

About CLUEPretrainedModels

CLUEbenchmark/CLUEPretrainedModels

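A collection of high-quality Chinese pre-trained models: state-of-the-art large models, the fastest small models, and dedicated similarity models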

This project provides pre-trained models designed for understanding Chinese text. Given raw Chinese text as input, the models help classify content, determine relationships between sentences, and find semantic similarities. It is aimed at developers and data scientists building applications that need to process and understand Chinese language data.

Chinese language processing, natural language understanding, text classification, semantic similarity, information retrieval

Scores updated daily from GitHub, PyPI, and npm data.