nlpcda and nlp-data-augmentation

These are competitors: both provide Chinese NLP data augmentation with overlapping techniques (EDA, BERT-based augmentation). nlpcda is the more mature and widely adopted of the two, with roughly 6x the stars and a published package with recorded downloads. Neither repository shows commits in the last 30 days, and both are flagged as stale.

nlpcda: 62 (Established)
Maintenance 0/25 | Adoption 17/25 | Maturity 25/25 | Community 20/25
Stars: 1,878 | Forks: 172 | Downloads: 405 | Commits (30d): 0 | Language: Python | License: Apache-2.0
Flags: Stale 6m

nlp-data-augmentation: 36 (Emerging)
Maintenance 0/25 | Adoption 10/25 | Maturity 8/25 | Community 18/25
Stars: 294 | Forks: 41 | Downloads: (none listed) | Commits (30d): 0 | Language: (none listed) | License: (none listed)
Flags: No License, Stale 6m, No Package, No Dependents
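
The figures above can be cross-checked with a few lines of Python. This is only a sketch: the page does not say how the headline score is computed, so treating it as the plain sum of the four category scores is an assumption inferred from the numbers shown. The dictionary layout and variable names are illustrative.

```python
# Category scores as listed on this page (each out of 25).
category_scores = {
    "nlpcda": {"maintenance": 0, "adoption": 17, "maturity": 25, "community": 20},
    "nlp-data-augmentation": {"maintenance": 0, "adoption": 10, "maturity": 8, "community": 18},
}

# Star counts as listed on this page.
stars = {"nlpcda": 1878, "nlp-data-augmentation": 294}

# Assumption: the headline score is the sum of the four categories.
totals = {name: sum(parts.values()) for name, parts in category_scores.items()}

# Ratio behind the "roughly 6x the stars" statement.
star_ratio = stars["nlpcda"] / stars["nlp-data-augmentation"]

if __name__ == "__main__":
    print(totals)                # {'nlpcda': 62, 'nlp-data-augmentation': 36}
    print(round(star_ratio, 1))  # 6.4
```

Under that assumption the sums match the headline scores of 62 and 36, and the zero Maintenance score applies to both projects.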

About nlpcda

425776024/nlpcda

One-click Chinese data augmentation package; NLP data augmentation, BERT data augmentation, EDA: pip install nlpcda

Provides nine distinct augmentation strategies for Chinese NLP, including entity, synonym, and homophone replacement, character deletion and transposition, and generative methods via SimBERT. Targeted filtering (for example, leaving dates and numbers unchanged) is meant to preserve semantic meaning. It also supports NER data in BIO format, back-translation via the Baidu and Google APIs, and custom lexicons injected through the jieba tokenizer. The goal is to improve model generalization and robustness across classification, NER, and retrieval tasks without sacrificing label integrity.
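
To make two of these strategies concrete, here is a minimal standalone sketch of random character deletion and adjacent-character transposition that leaves digits untouched. It does not use nlpcda and is not its implementation. The function names (`random_delete`, `swap_adjacent`, `augment`) and the digits-only protection rule are assumptions chosen for illustration.

```python
import random


def random_delete(text, rate=0.1, rng=None):
    """Drop each non-digit character with probability `rate`; digits are always kept."""
    rng = rng or random.Random()
    return "".join(ch for ch in text if ch.isdigit() or rng.random() >= rate)


def swap_adjacent(text, swaps=1, rng=None):
    """Swap `swaps` randomly chosen pairs of neighbouring non-digit characters."""
    rng = rng or random.Random()
    chars = list(text)
    # Only positions where both neighbours are non-digits are eligible,
    # so numbers and the digits inside dates never move.
    candidates = [
        i for i in range(len(chars) - 1)
        if not chars[i].isdigit() and not chars[i + 1].isdigit()
    ]
    for _ in range(swaps):
        if not candidates:
            break
        i = rng.choice(candidates)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def augment(text, n=3, seed=0):
    """Return `n` variants, each produced by one randomly chosen operation."""
    rng = random.Random(seed)
    operations = [random_delete, swap_adjacent]
    return [rng.choice(operations)(text, rng=rng) for _ in range(n)]


if __name__ == "__main__":
    for variant in augment("这是2024年5月1日发布的增强版本"):
        print(variant)
```

Because both operations skip digits, the label-relevant numbers in a sentence survive every variant, which is the point of the filtering described above. A real pipeline would likely protect more than digits, for example entity spans in BIO-tagged data.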

About nlp-data-augmentation

quincyliang/nlp-data-augmentation

Data Augmentation for NLP.

Scores updated daily from GitHub, PyPI, and npm data.