Indonesian NLP Resources: NLP Tools

Curated collections, datasets, and resource lists specifically for Indonesian/Malay language NLP. Includes benchmark datasets, resource compilations, and toolkit libraries for Bahasa Indonesia. Does NOT include general NLP courses, application-specific projects (like sentiment analysis tools), or non-Indonesian language resources.

There are 25 Indonesian NLP resource tools tracked. Only 1 scores above 50 (Established tier). The highest-rated is malaysia-ai/malaya at 64/100 with 521 stars. Only 1 of the top 10 is actively maintained.
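The headline numbers above can be checked against the listing below. The following Python sketch copies each project's name, score, and tier from the listing and recomputes the totals. It also reports the observed score range per tier. Those ranges only describe this listing; apart from the "above 50" rule for Established, the site's actual tier cut-offs are not stated here, so treat them as observations, not thresholds.

```python
from collections import Counter

# (name, score, tier) copied from the ranked listing on this page.
PROJECTS = [
    ("malaysia-ai/malaya", 64, "Established"),
    ("louisowen6/NLP_bahasa_resources", 44, "Emerging"),
    ("IndoNLP/indonlu", 44, "Emerging"),
    ("kirralabs/indonesian-NLP-resources", 41, "Emerging"),
    ("wongnai/wongnai-corpus", 38, "Emerging"),
    ("rizalespe/Dataset-Sentimen-Analisis-Bahasa-Indonesia", 35, "Emerging"),
    ("kmkurn/id-pos-tagging", 32, "Emerging"),
    ("kmkurn/id-nlp-resource", 31, "Emerging"),
    ("IndoNLP/nusa-catalogue", 31, "Emerging"),
    ("IndoNLP/nusax", 30, "Emerging"),
    ("ariya/tebakmasa", 30, "Emerging"),
    ("yohanesgultom/nlp-experiments", 28, "Experimental"),
    ("feryandi/Dataset-Artikel", 27, "Experimental"),
    ("Wikidepia/indonesian_datasets", 27, "Experimental"),
    ("Hyuto/indo-nlp", 26, "Experimental"),
    ("ailabtelkom/id-NLP-resources", 25, "Experimental"),
    ("LazarusNLP/indonesian-sentence-embeddings", 25, "Experimental"),
    ("datascienceid/nlp-resources", 25, "Experimental"),
    ("danieldanuega/spacyndo", 20, "Experimental"),
    ("rrayhka/indonesian-ner-spacy", 19, "Experimental"),
    ("irfandythalib/python-indonesia-stopwords-remover", 17, "Experimental"),
    ("nandanovenia/resource-nlp-indonesia", 15, "Experimental"),
    ("matbahasa/MALINDO_BLiMP", 15, "Experimental"),
    ("Cortana-Coders/NutriSense", 11, "Experimental"),
    ("HantuGur/NUSANTAARA-LEARN-LANGUAGE", 11, "Experimental"),
]


def summarize(projects, threshold=50):
    """Recompute the summary: total count, how many score above `threshold`,
    the top-scoring project, projects per tier, and (min, max) score per tier."""
    above = [p for p in projects if p[1] > threshold]
    top = max(projects, key=lambda p: p[1])
    tiers = Counter(tier for _, _, tier in projects)

    # Observed score range per tier (not an official cut-off).
    ranges = {}
    for _, score, tier in projects:
        lo, hi = ranges.get(tier, (score, score))
        ranges[tier] = (min(lo, score), max(hi, score))

    return {
        "total": len(projects),
        "above_threshold": len(above),
        "top": top,
        "tiers": dict(tiers),
        "score_ranges": ranges,
    }


summary = summarize(PROJECTS)
print(summary["total"], summary["above_threshold"], summary["top"][0])
# -> 25 1 malaysia-ai/malaya
```

In this listing, Emerging entries span 30 to 44 and Experimental entries span 11 to 28.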

Get all 25 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=nlp&subcategory=indonesian-nlp-resources&limit=20"

Open to everyone: 100 requests/day with no key needed, or 1,000 requests/day with a free key.
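For scripted access, the same query can be issued from Python using only the standard library. This is a minimal sketch. It mirrors the parameters of the curl command above. It does not send an API key, because how keys are passed is not described here. It also returns the decoded JSON as-is without assuming any response schema. The curl example uses `limit=20`. Fetching all 25 projects presumably needs a larger `limit`, assuming the API accepts one.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="nlp", subcategory="indonesian-nlp-resources", limit=20):
    """Build the query URL with the same parameters as the curl example."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_projects(limit=20, timeout=10):
    """GET the endpoint and decode the JSON body.

    No API key is sent (anonymous tier). The response structure is not
    documented on this page, so the parsed JSON is returned unchanged.
    """
    with urllib.request.urlopen(build_url(limit=limit), timeout=timeout) as resp:
        return json.load(resp)


# Usage (performs a network request):
#   data = fetch_projects(limit=25)
#   print(json.dumps(data, indent=2)[:500])
```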

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | malaysia-ai/malaya | Natural Language Toolkit for Malaysian language, https://malaya.readthedocs.io/ | 64 | Established |
| 2 | louisowen6/NLP_bahasa_resources | A Curated List of Dataset and Usable Library Resources for NLP in Bahasa Indonesia | 44 | Emerging |
| 3 | IndoNLP/indonlu | The first-ever vast natural language processing benchmark for Indonesian... | 44 | Emerging |
| 4 | kirralabs/indonesian-NLP-resources | Data resources for Indonesian-language NLP | 41 | Emerging |
| 5 | wongnai/wongnai-corpus | Collection of Wongnai's datasets | 38 | Emerging |
| 6 | rizalespe/Dataset-Sentimen-Analisis-Bahasa-Indonesia | This repository is a collection of datasets related to sentiment analysis... | 35 | Emerging |
| 7 | kmkurn/id-pos-tagging | Indonesian part-of-speech (POS) tagging | 32 | Emerging |
| 8 | kmkurn/id-nlp-resource | A list of Indonesian NLP resources. | 31 | Emerging |
| 9 | IndoNLP/nusa-catalogue | Dataset Catalogue Homepage for Indonesian Languages | 31 | Emerging |
| 10 | IndoNLP/nusax | High-quality parallel resource on sentiment analysis for 10 low-resource... | 30 | Emerging |
| 11 | ariya/tebakmasa | Infer the date and time from the general description in Bahasa Indonesia | 30 | Emerging |
| 12 | yohanesgultom/nlp-experiments | Indonesian NLP experiments | 28 | Experimental |
| 13 | feryandi/Dataset-Artikel | This repository contains a collection of raw data in the form of articles from various... | 27 | Experimental |
| 14 | Wikidepia/indonesian_datasets | NLP Datasets for Indonesian | 27 | Experimental |
| 15 | Hyuto/indo-nlp | A simple Python library with no additional dependencies that aims to... | 26 | Experimental |
| 16 | ailabtelkom/id-NLP-resources | A collection of resources for Indonesian natural language processing. All... | 25 | Experimental |
| 17 | LazarusNLP/indonesian-sentence-embeddings | Embedding Representation for Indonesian Sentences! | 25 | Experimental |
| 18 | datascienceid/nlp-resources | A curated list of natural language processing courses, video lectures,... | 25 | Experimental |
| 19 | danieldanuega/spacyndo | Dependency Parser and NER model for Bahasa Indonesia (spaCy 2.1) | 20 | Experimental |
| 20 | rrayhka/indonesian-ner-spacy | Fine-tuning spaCy for Indonesian Named Entity Recognition (NER) with a custom dataset. | 19 | Experimental |
| 21 | irfandythalib/python-indonesia-stopwords-remover | This code is used to remove stopwords using the Tala stopwords library for... | 17 | Experimental |
| 22 | nandanovenia/resource-nlp-indonesia | Natural Language Processing Resource for Bahasa Indonesia | 15 | Experimental |
| 23 | matbahasa/MALINDO_BLiMP | MALINDO BLiMP (Malay/Indonesian Benchmark of Linguistic Minimal Pairs) | 15 | Experimental |
| 24 | Cortana-Coders/NutriSense | NutriSense: A Nutrition Measurement Platform with Natural Language Processing | 11 | Experimental |
| 25 | HantuGur/NUSANTAARA-LEARN-LANGUAGE | 🌿 NusaLingua is an educational web platform for Indonesia's regional languages, based on... | 11 | Experimental |
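Several entries, such as irfandythalib/python-indonesia-stopwords-remover, center on stopword removal. This step drops high-frequency function words before further processing. Below is a minimal Python sketch of the idea. The stopword set is a small hypothetical sample chosen for illustration, not the Tala list or any listed library's API. The tokenizer is a plain regular-expression split.

```python
import re

# Hypothetical, hand-picked sample of common Indonesian function words,
# for illustration only. This is NOT the Tala stopword list; a real
# pipeline would load a complete list from a file or library.
SAMPLE_STOPWORDS = {
    "yang", "dan", "di", "ke", "dari", "ini", "itu", "untuk", "dengan", "adalah",
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def remove_stopwords(text, stopwords=SAMPLE_STOPWORDS):
    """Return the tokens of `text` that are not stopwords, in original order."""
    return [tok for tok in tokenize(text) if tok not in stopwords]


# "This is a dataset for sentiment analysis and text classification"
print(remove_stopwords("Ini adalah dataset untuk analisis sentimen dan klasifikasi teks"))
# -> ['dataset', 'analisis', 'sentimen', 'klasifikasi', 'teks']
```

Lowercasing before lookup keeps sentence-initial words such as "Ini" from slipping past a lowercase stopword set.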