425776024/nlpcda
One-click Chinese data augmentation package; NLP data augmentation, BERT data augmentation, EDA: pip install nlpcda
Provides nine augmentation strategies for Chinese NLP, including entity, synonym, and homophone replacement, character deletion and transposition, and generative augmentation via SimBERT. Targeted filtering helps preserve meaning (e.g., dates and numbers are left unchanged). Also supports NER data in BIO format, back-translation via the Baidu or Google translation APIs, and custom lexicons loaded through the jieba tokenizer. Intended to improve model generalization and robustness on classification, NER, and retrieval tasks while keeping labels valid.
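To make the character-level strategies concrete, here is a minimal standard-library sketch of random character deletion and adjacent-character transposition that leaves dates and numbers untouched. It is not nlpcda's implementation or API: the function names, the `rate` parameter, and the `PROTECTED` pattern are all assumptions made for illustration.

```python
import random
import re

# Assumed definition of "dates/numbers": runs of ASCII digits, optionally
# joined by ./:- separators, plus a trailing 年/月/日 date character.
PROTECTED = re.compile(r"\d+(?:[./:-]\d+)*[年月日]?")


def protected_mask(text):
    """Return one boolean per character: True where it must not change."""
    mask = [False] * len(text)
    for match in PROTECTED.finditer(text):
        for i in range(match.start(), match.end()):
            mask[i] = True
    return mask


def delete_chars(text, rate=0.1, rng=None):
    """Drop each unprotected character with probability `rate`."""
    rng = rng or random.Random()
    mask = protected_mask(text)
    return "".join(
        ch for ch, keep in zip(text, mask) if keep or rng.random() >= rate
    )


def swap_adjacent(text, rate=0.1, rng=None):
    """Swap neighbouring unprotected characters with probability `rate`."""
    rng = rng or random.Random()
    mask = protected_mask(text)
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if not mask[i] and not mask[i + 1] and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip past the pair so a character moves at most once
        else:
            i += 1
    return "".join(chars)


if __name__ == "__main__":
    sample = "今天是2021年3月18日,股价上涨了3.5元"
    rng = random.Random(0)  # fixed seed so runs are repeatable
    print(delete_chars(sample, rate=0.3, rng=rng))
    print(swap_adjacent(sample, rate=0.3, rng=rng))
```

Passing an explicit `random.Random` instance keeps the augmentation reproducible, which matters when augmented sets need to be regenerated for an experiment.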
1,878 stars and 405 monthly downloads. Used by 1 other package. No commits in the last 6 months. Available on PyPI.
Stars: 1,878
Forks: 172
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 18, 2025
Monthly downloads: 405
Commits (30d): 0
Reverse dependents: 1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/425776024/nlpcda"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
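The same request can be made from Python with the standard library. This is a sketch under two assumptions the page does not confirm: that the URL follows a `category/owner/repo` pattern (inferred from the single example above) and that the response body is JSON. The helper names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category, owner, repo):
    """Build the endpoint URL; the path layout is inferred from one example."""
    parts = (quote(part, safe="") for part in (category, owner, repo))
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category, owner, repo, timeout=10):
    """Fetch and decode the response, assuming the body is JSON."""
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Performs a real network request against the URL shown above.
    print(fetch_quality("nlp", "425776024", "nlpcda"))
```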
Related tools
dsfsi/textaugment
TextAugment: Text Augmentation Library
searchableai/KitanaQA
KitanaQA: Adversarial training and data augmentation for neural question-answering models
SanghunYun/UDA_pytorch
UDA (Unsupervised Data Augmentation) implemented in PyTorch
google-research/uda
Unsupervised Data Augmentation (UDA)
toriving/KoEDA
Korean Easy Data Augmentation