Data Augmentation NLP Tools

Tools and frameworks for generating synthetic training data, augmenting existing datasets, and applying transformation techniques to improve NLP model performance. Does NOT include general data preprocessing, cleaning, or annotation tools.

There are 25 NLP data augmentation tools tracked. Two score above 50 (Established tier). The highest-rated is dsfsi/textaugment at 65/100 with 433 stars and 2,436 monthly downloads.

Get all 25 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=nlp&subcategory=data-augmentation-nlp&limit=20"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
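
The same request can be made from a script. The following is a minimal Python sketch using only the standard library. It rebuilds the query string from the curl command above and returns whatever JSON the server sends. The helper names `build_url`, `fetch_projects`, and `main` are hypothetical. The response schema is not documented on this page, so the sketch does not assume any field names. The default `limit=20` is copied from the URL above, and whether the API accepts a larger value is an assumption to verify.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken verbatim from the curl command in this listing.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="nlp", subcategory="data-augmentation-nlp", limit=20):
    """Build the query URL with the same parameters as the curl command."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=30):
    """Fetch the URL and return the decoded JSON payload unchanged."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def main():
    """Print the raw payload; the schema is unknown, so nothing is parsed."""
    payload = fetch_projects(build_url())
    print(json.dumps(payload, indent=2, ensure_ascii=False))
```

Calling `main()` performs one request, which counts against the daily quota described above.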

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | dsfsi/textaugment | TextAugment: Text Augmentation Library | 65 | Established |
| 2 | 425776024/nlpcda | One-click Chinese data augmentation package; NLP data augmentation, BERT data augmentation, EDA: pip install nlpcda | 55 | Established |
| 3 | searchableai/KitanaQA | KitanaQA: Adversarial training and data augmentation for neural... | 43 | Emerging |
| 4 | google-research/uda | Unsupervised Data Augmentation (UDA) | 41 | Emerging |
| 5 | SanghunYun/UDA_pytorch | UDA (Unsupervised Data Augmentation) implemented in PyTorch | 41 | Emerging |
| 6 | toriving/KoEDA | Korean Easy Data Augmentation | 41 | Emerging |
| 7 | AlexKay28/zarnitsa | Zarnitsa package for data augmentation ops | 39 | Emerging |
| 8 | KennethEnevoldsen/augmenty | Augmenty is an augmentation library based on spaCy for augmenting texts. | 39 | Emerging |
| 9 | zhanlaoban/EDA_NLP_for_Chinese | An implementation of the EDA paper for Chinese corpora. EDA data augmentation tool for Chinese corpora. NLP data augmentation. Paper reading notes. | 35 | Emerging |
| 10 | lancopku/text-autoaugment | [EMNLP 2021] Text AutoAugment: Learning Compositional Augmentation Policy... | 33 | Emerging |
| 11 | quincyliang/nlp-data-augmentation | Data Augmentation for NLP. | 29 | Experimental |
| 12 | patrick-batman/Unsupervised-Hypothesis-Creation | Unsupervised creation of contradictory, entailing sentences from a given... | 26 | Experimental |
| 13 | chck/AugLy-jp | Data Augmentation for Japanese Text on AugLy | 26 | Experimental |
| 14 | remydecoupes/GeoNLPlify | An NLP library for data augmentation focusing on... | 25 | Experimental |
| 15 | k4black/fast-aug | Fast Augmentation library for NLP | 25 | Experimental |
| 16 | zhaominyiz/EPiDA | Official Code for 'EPiDA: An Easy Plug-in Data Augmentation Framework for... | 22 | Experimental |
| 17 | kajyuuen/daaja | This repository has implementations of data augmentation for NLP for Japanese. | 18 | Experimental |
| 18 | ChetanMJ/NL2SQL-Data-Augmentation | Data augmentation techniques help improve performance by generating data of... | 15 | Experimental |
| 19 | pemagrg1/nlp-data-augmentation | Augmenting Textual Data Using NLP Libraries. | 15 | Experimental |
| 20 | Ritvik19/Text-Data-Augmentation | State of the Art Text Data Augmentation for Natural Language Processing Applications | 14 | Experimental |
| 21 | aryashah2k/NLP-Data-Augmentation | Implementing 5 Different Approaches To Augmenting Data For Natural Language... | 14 | Experimental |
| 22 | dheeraj7596/CONDA | Generate synthetic training data using small LMs. | 13 | Experimental |
| 23 | masoudMZB/Text-Wizard-Fatsapi-NLP-project | NLP visualization and augmentation techniques implemented with FastAPI. | 13 | Experimental |
| 24 | dextergui/NLarge | NLarge - Dataset Augmentation Tool | 11 | Experimental |
| 25 | sminerport/TextAugmentor | This repo offers a Python script using NLPAug library & RTT to augment text... | 11 | Experimental |