425776024/nlpcda

One-click Chinese data augmentation package; NLP data augmentation, BERT data augmentation, EDA: pip install nlpcda

62 / 100

Established

Provides nine augmentation strategies for Chinese NLP, including entity, synonym, and homophone replacement, character deletion and transposition, and generative augmentation via SimBERT. Targeted filtering is used to preserve meaning (for example, dates and numbers are left unchanged). It also supports NER data in BIO format, back-translation through the Baidu and Google APIs, and custom lexicons loaded through the jieba tokenizer. The package is intended to improve model generalization and robustness on classification, NER, and retrieval tasks without altering labels.
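To make the "dates and numbers remain unchanged" idea concrete, here is a minimal standard-library sketch of character deletion and adjacent-character transposition that skips numeric spans. It is not nlpcda's implementation or API: the `PROTECTED` pattern, the function names, and the `rate`, `create_num`, and `seed` parameters are all assumptions made for illustration.

```python
import random
import re

# Assumption: a "protected" span is a run of digits, optionally joined by
# common date/number punctuation or the Chinese date units 年/月/日.
PROTECTED = re.compile(r"\d[\d\-/:.年月日]*")


def split_protected(text):
    """Split text into ordered (segment, is_protected) pairs."""
    parts, pos = [], 0
    for m in PROTECTED.finditer(text):
        if m.start() > pos:
            parts.append((text[pos:m.start()], False))
        parts.append((m.group(), True))
        pos = m.end()
    if pos < len(text):
        parts.append((text[pos:], False))
    return parts


def delete_chars(segment, rate, rng):
    """Drop each character with probability `rate`, keeping at least one."""
    kept = [ch for ch in segment if rng.random() >= rate]
    return "".join(kept) if kept else segment[:1]


def swap_adjacent(segment, rate, rng):
    """Swap neighbouring characters with probability `rate` per position."""
    chars = list(segment)
    i = 0
    while i < len(chars) - 1:
        if rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # do not move a character that was just swapped
        else:
            i += 1
    return "".join(chars)


def augment(text, op, rate=0.3, create_num=3, seed=0):
    """Return up to `create_num` distinct strings, the original first.

    `op` is applied only to unprotected segments, so numeric spans
    come through unchanged.
    """
    rng = random.Random(seed)
    parts = split_protected(text)
    results, seen = [text], {text}
    for _ in range(create_num * 10):  # bounded retries to avoid looping forever
        if len(results) >= create_num:
            break
        candidate = "".join(
            seg if protected else op(seg, rate, rng) for seg, protected in parts
        )
        if candidate not in seen:
            seen.add(candidate)
            results.append(candidate)
    return results


if __name__ == "__main__":
    sentence = "今天是2023年5月1日，股价上涨了3.5%"
    for line in augment(sentence, swap_adjacent, rate=0.5):
        print(line)
    for line in augment(sentence, delete_chars, rate=0.2):
        print(line)
```

Because the perturbation runs per segment, no swap or deletion can cross into a protected span. A real pipeline would likely tokenize with jieba first and protect whole tokens rather than relying on a regular expression.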

1,878 stars and 405 monthly downloads. Used by 1 other package. No commits in the last 6 months. Available on PyPI.

Stale 6m

Maintenance 0 / 25

Adoption 17 / 25

Maturity 25 / 25

Community 20 / 25

Stars

1,878

Forks

172

Language

Python

License

Apache-2.0

Compare

nlpcda and EDA_NLP_for_Chinese

nlpcda and nlp-data-augmentation

Related tools

dsfsi/textaugment

TextAugment: Text Augmentation Library

searchableai/KitanaQA

KitanaQA: Adversarial training and data augmentation for neural question-answering models

SanghunYun/UDA_pytorch

UDA (Unsupervised Data Augmentation) implemented in PyTorch

google-research/uda

Unsupervised Data Augmentation (UDA)

toriving/KoEDA

Korean Easy Data Augmentation
