425776024/nlpcda

One-click Chinese data augmentation package; NLP data augmentation, BERT data augmentation, EDA: pip install nlpcda

Score: 62 / 100 (Established)

Provides nine augmentation strategies for Chinese NLP, including entity, synonym, and homophone replacement, character deletion and transposition, and generative paraphrasing via SimBERT. Targeted filtering keeps augmented text close to the original meaning (for example, dates and numbers are left unchanged). Also supports NER data in BIO format, back-translation through the Baidu or Google translation APIs, and custom lexicons loaded through the jieba tokenizer. Intended to improve model generalization and robustness on classification, NER, and retrieval tasks while keeping labels intact.
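To make two of the strategies above concrete, here is a minimal, self-contained sketch of synonym replacement and adjacent-character transposition that leaves digit-bearing tokens (dates, numbers) untouched. It is not nlpcda's implementation or API: the `SYNONYMS` table, the function names, and the `change_rate` parameter are hypothetical and chosen only for illustration.

```python
import random
import re

# Hypothetical toy lexicon; a real tool would load a much larger synonym table.
SYNONYMS = {
    "今天": ["今日"],
    "漂亮": ["美丽", "好看"],
}

# Assumption for this sketch: any token containing a digit is protected.
PROTECTED = re.compile(r"\d")


def synonym_replace(tokens, change_rate=0.3, rng=None):
    """Return a copy of tokens with some words swapped for synonyms."""
    rng = rng or random.Random()
    out = []
    for tok in tokens:
        candidates = SYNONYMS.get(tok)
        replaceable = candidates and not PROTECTED.search(tok)
        if replaceable and rng.random() < change_rate:
            out.append(rng.choice(candidates))
        else:
            out.append(tok)
    return out


def swap_adjacent_chars(text, rng=None):
    """Transpose one random pair of adjacent characters, skipping digits."""
    rng = rng or random.Random()
    # Only positions where neither character of the pair is a digit qualify.
    positions = [
        i for i in range(len(text) - 1)
        if not text[i].isdigit() and not text[i + 1].isdigit()
    ]
    if not positions:
        return text
    i = rng.choice(positions)
    chars = list(text)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)
```

Passing a seeded `random.Random` instance as `rng` makes the augmentation reproducible, which helps when comparing training runs.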

1,878 stars and 405 monthly downloads. Used by 1 other package. No commits in the last 6 months. Available on PyPI.

Stale 6m
Maintenance 0 / 25
Adoption 17 / 25
Maturity 25 / 25
Community 20 / 25

Stars: 1,878
Forks: 172
Language: Python
License: Apache-2.0
Last pushed: Mar 18, 2025
Monthly downloads: 405
Commits (30d): 0
Reverse dependents: 1

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/425776024/nlpcda"

Open to everyone: 100 requests/day without a key, or 1,000/day with a free key.
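The same request can be made from Python with only the standard library. This is a rough sketch under two assumptions not stated on this page: that the path segments after `/quality/` are category, owner, and repository (inferred from the curl example), and that the endpoint returns a JSON body. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL; the segment meanings are an assumption."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10):
    """Fetch the record for one repository and parse it as JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # Assumption: the response body is UTF-8 encoded JSON.
        return json.loads(resp.read().decode("utf-8"))
```

For the repository on this page, the call would be `fetch_quality("nlp", "425776024", "nlpcda")`. With the keyless limit of 100 requests/day, caching the result locally is a sensible default.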