emiruz/textextract

textextract is a tiny library (87 lines of Go) that identifies the article content in an HTML page (as opposed to navigation, headers, footers, ads, etc.), extracts it, and returns it as a string. It is similar to Boilerpipe, but written in Go for use from Go.

26 / 100 (Experimental)

No commits in the last 6 months.

Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 5 / 25
Maturity 9 / 25
Community 12 / 25

Stars 11
Forks 2
Language Go
License MIT
Category go-nlp-libraries
Last pushed Oct 15, 2018
Commits (30d) 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/emiruz/textextract"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
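
The same request can be made from Go. The following is a minimal sketch using only the standard library. The helper names `qualityURL` and `fetchQuality` are hypothetical, and the `category/owner/repo` path layout is an assumption inferred from the single URL in the curl command above. The response format is not documented here, so the sketch prints the raw body rather than decoding it into a struct.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"time"
)

// baseURL is taken from the curl example; the path layout below it
// (category/owner/repo) is an assumption based on that one URL.
const baseURL = "https://pt-edge.onrender.com/api/v1/quality"

// qualityURL (hypothetical helper) builds the endpoint for one repository.
func qualityURL(category, owner, repo string) string {
	return fmt.Sprintf("%s/%s/%s/%s",
		baseURL,
		url.PathEscape(category),
		url.PathEscape(owner),
		url.PathEscape(repo),
	)
}

// fetchQuality (hypothetical helper) performs the GET request and returns
// the raw response body, capped at 1 MiB.
func fetchQuality(client *http.Client, endpoint string) ([]byte, error) {
	resp, err := client.Get(endpoint)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	return io.ReadAll(io.LimitReader(resp.Body, 1<<20))
}

func main() {
	endpoint := qualityURL("nlp", "emiruz", "textextract")

	// A timeout keeps the call from hanging if the service is slow to respond.
	client := &http.Client{Timeout: 10 * time.Second}

	body, err := fetchQuality(client, endpoint)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println(string(body))
}
```

With the unauthenticated limit of 100 requests/day, it is worth caching the response locally rather than calling the endpoint on every run.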