All NLP Tools

11,856 tools ranked by quality score

Showing 1–100 of 11,856
# Tool Score Tier
1 explosion/spaCy

💫 Industrial-strength Natural Language Processing (NLP) in Python

95
Verified
2 PyThaiNLP/pythainlp

Thai natural language processing in Python

93
Verified
3 urchade/GLiNER

Generalist and Lightweight Model for Named Entity Recognition (Extract any...

92
Verified
4 sloria/TextBlob

Simple, Pythonic, text processing--Sentiment analysis, part-of-speech...

91
Verified
5 nltk/nltk

NLTK Source

90
Verified
6 chrismattmann/tika-python

Tika-Python is a Python binding to the Apache Tika™ REST services allowing...

87
Verified
7 textlint/textlint

textlint is the pluggable linter for natural language text.

86
Verified
8 deepdoctection/deepdoctection

A Repo For Document AI

85
Verified
9 stanfordnlp/stanza

Stanford NLP Python library for tokenization, sentence segmentation, NER,...

84
Verified
10 google/sentencepiece

Unsupervised text tokenizer for Neural Network-based text generation.

84
Verified
11 miso-belica/sumy

Module for automatic summarization of text documents and HTML pages.

83
Verified
12 robocorp/rpaframework

Collection of open-source libraries and tools for Robotic Process Automation...

82
Verified
13 google/langextract

A Python library for extracting structured information from unstructured...

79
Verified
14 flairNLP/flair

A very simple framework for state-of-the-art Natural Language Processing (NLP)

78
Verified
15 deanmalmgren/textract

extract text from any document. no muss. no fuss.

78
Verified
16 spencermountain/compromise

modest natural-language processing

78
Verified
17 jxmorris12/language_tool_python

a free python grammar checker 📝✅

77
Verified
18 hankcs/HanLP

Natural Language Processing for the next decade. Tokenization,...

77
Verified
19 CAMeL-Lab/camel_tools

A suite of Arabic natural language processing tools developed by the CAMeL...

77
Verified
20 NPC-Worldwide/npcpy

The python library for research and development in NLP, multimodal LLMs,...

77
Verified
21 unitaryai/detoxify

Trained models & code to predict toxic comments on all 3 Jigsaw Toxic...

76
Verified
22 EmilStenstrom/conllu

A CoNLL-U parser that takes a CoNLL-U formatted string and turns it into a...

76
Verified
23 gunthercox/chatterbot-corpus

A multilingual dialog corpus

76
Verified
24 chatopera/Synonyms

🌿 Chinese synonyms: a toolkit for chatbots and intelligent question answering

76
Verified
25 lovit/soynlp

A Python library for Korean natural language processing. Provides word extraction, tokenization, part-of-speech tagging, and preprocessing.

75
Verified
26 isaacus-dev/semchunk

A fast, lightweight and easy-to-use Python library for splitting text into...

74
Verified
27 huggingface/setfit

Efficient few-shot learning with Sentence Transformers

74
Verified
28 flairNLP/fundus

A very simple news crawler with a funny name

73
Verified
29 vi3k6i5/flashtext

Extract Keywords from sentence or Replace keywords in sentences.

72
Verified
30 estnltk/estnltk

Open source tools for Estonian natural language processing

72
Verified
31 JoeanAmier/XHS-Downloader

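XiaoHongShu (RedNote) link extraction and post collection tool: extracts links to an account's published, favorited, liked, and album posts; extracts post and user links from search results; collects XiaoHongShu posts...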

72
Verified
32 cltk/cltk

The Classical Language Toolkit

72
Verified
33 google/langfun

OO for LLMs

71
Verified
34 kenlimmj/rouge

A Javascript implementation of the Recall-Oriented Understudy for Gisting...

71
Verified
35 hplt-project/sacremoses

Python port of Moses tokenizer, truecaser and normalizer

71
Verified
36 languagetool-org/languagetool

Style and Grammar Checker for 25+ Languages

71
Verified
37 dkpro/dkpro-cassis

UIMA CAS processing library written in Python

70
Verified
38 JohnSnowLabs/spark-nlp

State of the Art Natural Language Processing

70
Verified
39 grobidOrg/grobid

A machine learning software for extracting information from scholarly documents

70
Verified
40 forzagreen/n2words

Convert numerical numbers to written numbers, in 52+ languages.

70
Verified
41 bab2min/kiwipiepy

Python API for Kiwi

70
Verified
42 dongrixinyu/JioNLP

An accurate, efficient, and easy-to-use Chinese NLP preprocessing and parsing toolkit. A Chinese NLP Preprocessing & Parsing Package...

70
Verified
43 666ghj/BettaFish

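微舆: a multi-agent public opinion analysis assistant anyone can use. It breaks through information cocoons, restores the true picture of public opinion, predicts future trends, and supports decision-making. Built from scratch, with no framework dependencies.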

70
Verified
44 HIT-SCIR/ltp

Language Technology Platform

69
Established
45 hellohaptik/chatbot_ner

chatbot_ner: Named Entity Recognition for chatbots.

69
Established
46 aphp/edsnlp

Modular, fast NLP framework, compatible with Pytorch and spaCy, offering...

69
Established
47 ChenghaoMou/text-dedup

All-in-one text de-duplication

69
Established
48 acl-org/acl-anthology

Data and software for building the ACL Anthology.

69
Established
49 chatopera/efaqa-corpus-zh

❤️ Emotional First Aid Dataset: a psychological counseling Q&A and chatbot corpus

68
Established
50 zjunlp/DeepKE

[EMNLP 2022] An Open Toolkit for Knowledge Graph Extraction and Construction

68
Established
51 discopy/discopy

The Python toolkit for computing with string diagrams.

68
Established
52 MantisAI/nervaluate

Full named-entity (i.e., not tag/token) evaluation metrics based on SemEval’13

68
Established
53 Alir3z4/python-stop-words

Get list of common stop words in various languages in Python

68
Established
54 hankcs/pyhanlp

Chinese word segmentation

68
Established
55 thisandagain/sentiment

AFINN-based sentiment analysis for Node.js.

68
Established
56 OpenPecha/Botok

🏷 བོད་ཏོག [pʰøtɔk̚] Tibetan word tokenizer in Python

68
Established
57 goodmami/wn

A modern, interlingual wordnet interface for Python

68
Established
58 adbar/htmldate

Fast and robust date extraction from web pages, with Python or on the command-line

67
Established
59 CUNY-CL/wikipron

Massively multilingual pronunciation mining

67
Established
60 jacksonllee/pycantonese

Cantonese Linguistics and NLP

67
Established
61 huggingface/neuralcoref

✨Fast Coreference Resolution in spaCy with Neural Networks

67
Established
62 anoopkunchukuttan/indic_nlp_library

Resources and tools for Indian language Natural Language Processing

67
Established
63 allenai/scispacy

A full spaCy pipeline and models for scientific/biomedical documents.

67
Established
64 apache/opennlp

Apache OpenNLP

67
Established
65 MIND-Lab/OCTIS

OCTIS: Comparing Topic Models is Simple! A python package to optimize and...

67
Established
66 gunthercox/mathparse

A Python library for evaluating natural language mathematical equations

66
Established
67 DataFog/datafog-python

Python SDK for PII detection and redaction in text and images, combining...

66
Established
68 i-dot-ai/themefinder

A topic modelling Python package for analysing one-to-many question-answer data.

66
Established
69 undertheseanlp/underthesea

Underthesea - Vietnamese NLP Toolkit

66
Established
70 ziqizhang/jate

JATE - Just Automatic Term Extraction (in Python)

66
Established
71 facebookresearch/stopes

A library for preparing data for machine translation research (monolingual...

66
Established
72 codertimo/BERT-pytorch

Google AI 2018 BERT pytorch implementation

65
Established
73 blmoistawinde/HarvestText

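Text mining and preprocessing toolkit (text cleaning, new word discovery, sentiment analysis, entity recognition and linking, keyword extraction, knowledge extraction, syntactic parsing, etc.) using unsupervised or weakly supervised methods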

65
Established
74 fastnlp/fastNLP

fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.

65
Established
75 go-ego/gse

Go efficient multilingual NLP and text segmentation; support English,...

65
Established
76 rmovva/HypotheSAEs

HypotheSAEs: hypothesizing interpretable relationships in text datasets...

65
Established
77 segment-any-text/wtpsplit

Toolkit to segment text into sentences or other semantic units in a robust,...

65
Established
78 baidu/lac

Baidu NLP: word segmentation, part-of-speech tagging, named entity recognition, word importance

65
Established
79 dsfsi/textaugment

TextAugment: Text Augmentation Library

65
Established
80 ownthink/Jiagu

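Jiagu deep learning NLP toolkit: knowledge graph relation extraction, Chinese word segmentation, part-of-speech tagging, named entity recognition, sentiment analysis, new word discovery, keyword extraction, text summarization, text clustering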

64
Established
81 vmenger/deduce

Deduce: de-identification method for Dutch medical text

64
Established
82 quanteda/quanteda

An R package for the Quantitative Analysis of Textual Data

64
Established
83 angelosalatino/cso-classifier

Python library that classifies content from scientific papers with the...

64
Established
84 Tiiiger/bert_score

BERT score for text generation

64
Established
85 fhamborg/news-please

news-please - an integrated web crawler and information extractor for news...

64
Established
86 NatLibFi/Annif

Annif is a multi-algorithm automated subject indexing tool for libraries,...

64
Established
87 Helsinki-NLP/OpusFilter

OpusFilter - Parallel corpus processing toolkit

64
Established
88 titipata/pubmed_parser

📋 A Python Parser for PubMed Open-Access XML Subset and MEDLINE XML Dataset

64
Established
89 malaysia-ai/malaya

Natural Language Toolkit for Malaysian language, https://malaya.readthedocs.io/

64
Established
90 MAIF/melusine

📧 Melusine: Use python to automatize your email processing workflow

64
Established
91 taishi-i/nagisa

A Japanese tokenizer based on recurrent neural networks

63
Established
92 chartbeat-labs/textacy

NLP, before and after spaCy

63
Established
93 wooorm/franc

Natural language detection

63
Established
94 hyunwoongko/kss

KSS: Korean String processing Suite

63
Established
95 princeton-nlp/SimCSE

[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings...

63
Established
96 stair-lab/kg-gen

[NeurIPS '25] Knowledge Graph Generation from Any Text

63
Established
97 alvations/pywsd

Python Implementations of Word Sense Disambiguation (WSD) Technologies.

63
Established
98 davidsbatista/BREDS

"Bootstrapping Relationship Extractors with Distributional Semantics"...

62
Established
99 OmkarPathak/pyresparser

A simple resume parser used for extracting information from resumes

62
Established
100 hunspell/hunspell

The most popular spellchecking library.

62
Established