nlpcda and nlp-data-augmentation
These projects are competitors: both provide Chinese NLP data augmentation with overlapping techniques (EDA, BERT-based augmentation). nlpcda is the more mature and widely adopted of the two, with roughly 6x as many stars and active downloads, while nlp-data-augmentation appears to be abandoned.
About nlpcda
425776024/nlpcda
One-click Chinese data augmentation package; NLP data augmentation, BERT data augmentation, EDA: pip install nlpcda
Provides nine augmentation strategies for Chinese NLP, including entity, synonym, and homophone replacement, character deletion and transposition, and generative methods via SimBERT, while preserving semantic meaning through targeted filtering (e.g., dates and numbers remain unchanged). It also supports NER data in BIO format, back-translation via the Baidu and Google APIs, and custom lexicons loaded through the jieba tokenizer. The package is designed to improve model generalization and robustness across classification, NER, and retrieval tasks without sacrificing label integrity.
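To make the idea of character-level augmentation with targeted filtering concrete, here is a minimal standard-library sketch. It is not nlpcda's implementation or API: the function names `random_delete_chars` and `swap_adjacent_chars` and their parameters are hypothetical, chosen only to illustrate random deletion and adjacent transposition that leave digits (and therefore numbers and dates) untouched.

```python
import random


def random_delete_chars(text, delete_rate=0.3, seed=None):
    """Randomly drop characters, but never digits (hypothetical helper)."""
    rng = random.Random(seed)
    kept = []
    for ch in text:
        # Digits are protected; other characters are dropped
        # with probability delete_rate.
        if ch.isdigit() or rng.random() >= delete_rate:
            kept.append(ch)
    result = "".join(kept)
    # Fall back to the original text if every character was deleted.
    return result if result else text


def swap_adjacent_chars(text, swap_rate=0.3, seed=None):
    """Randomly swap neighbouring characters, skipping pairs that contain a digit."""
    rng = random.Random(seed)
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        a, b = chars[i], chars[i + 1]
        if not a.isdigit() and not b.isdigit() and rng.random() < swap_rate:
            chars[i], chars[i + 1] = b, a
            i += 2  # skip past the swapped pair so it is not swapped back
        else:
            i += 1
    return "".join(chars)


if __name__ == "__main__":
    sentence = "今天是2024年5月1日，天气很好"
    print(random_delete_chars(sentence, delete_rate=0.3, seed=42))
    print(swap_adjacent_chars(sentence, swap_rate=0.3, seed=42))
```

Because digits are never deleted or moved, a date such as 2024年5月1日 keeps its numeric parts in every augmented variant, which is the kind of label-safe behaviour the description attributes to the package. For real use, consult the nlpcda README for its actual class names and options.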
About nlp-data-augmentation
quincyliang/nlp-data-augmentation
Data augmentation for NLP.
Scores updated daily from GitHub, PyPI, and npm data.