emiruz/textextract

textextract is a tiny library (87 lines of Go) that identifies where the article content is in an HTML page (as opposed to navigation, headers, footers, ads, etc.), extracts it, and returns it as a string. It is similar to Boilerpipe, but written in Go for Go.

/ 100

Experimental

No commits in the last 6 months.

Stale 6m · No Package · No Dependents

Maintenance 0 / 25

Adoption 5 / 25

Maturity 9 / 25

Community 12 / 25


Language

Go

License

MIT

Category

go-nlp-libraries

Last pushed

Oct 15, 2018

Go NLP Libraries · 77 tools

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/emiruz/textextract"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000/day.
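For Go callers, a minimal equivalent of the `curl` request might look like the sketch below. The page does not document the response format, so the sketch assumes the endpoint returns a JSON object and decodes the body into a generic `map[string]any`. `qualityURL` and `fetchQuality` are hypothetical helper names. The page also does not say how an API key is supplied, so only the keyless request is shown.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"time"
)

// baseURL is copied from the curl example above.
const baseURL = "https://pt-edge.onrender.com/api/v1/quality/nlp/"

// qualityURL builds the endpoint URL for an owner/repo pair.
func qualityURL(owner, repo string) string {
	return baseURL + url.PathEscape(owner) + "/" + url.PathEscape(repo)
}

// fetchQuality performs an unauthenticated GET and decodes the body generically,
// assuming (not confirmed by the page) that the response is a JSON object.
func fetchQuality(client *http.Client, owner, repo string) (map[string]any, error) {
	resp, err := client.Get(qualityURL(owner, repo))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	var out map[string]any
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	client := &http.Client{Timeout: 10 * time.Second}
	data, err := fetchQuality(client, "emiruz", "textextract")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	for k, v := range data {
		fmt.Printf("%s: %v\n", k, v)
	}
}
```

Decoding into a map avoids guessing field names. Once the actual response shape is known, a typed struct would be the more idiomatic choice.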

Higher-rated alternatives

codingpot/kiwigo

https://github.com/bab2min/Kiwi for Go

aaaton/golem

A lemmatizer implemented in Go

habeanf/yap

Yet Another (natural language) Parser

abadojack/whatlanggo

Natural language detection library for Go

ikawaha/kagome-dict

Dictionary Library for Kagome v2
