emschwartz/html-to-text-comparison
Comparing Rust crates for extracting text from HTML
This tool helps developers evaluate Rust libraries that extract plain text from HTML. Given a URL, it downloads the page and runs its HTML through several text-extraction libraries. For each library it reports performance metrics (time and memory usage) along with the extracted text, so you can choose the one best suited to your application.
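The comparison described above (run each library on the same HTML, record time and memory, keep the extracted text) can be sketched in std-only Rust. This is a minimal illustration under stated assumptions, not the repository's code: the network fetch is omitted, `strip_tags` and `strip_and_collapse` are hypothetical toy extractors standing in for the crates being compared, and peak heap usage is tracked with a counting global allocator. That is one common way to measure per-call memory, but it is an assumption about how the tool itself does it.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::time::{Duration, Instant};

// Counting allocator (assumption: one way to measure memory, not
// necessarily the tool's). It wraps the system allocator and tracks
// current and peak live heap bytes.
struct CountingAlloc;

static CURRENT: AtomicUsize = AtomicUsize::new(0);
static PEAK: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            // Add this allocation and raise the peak if we passed it.
            let now = CURRENT.fetch_add(layout.size(), Ordering::Relaxed) + layout.size();
            PEAK.fetch_max(now, Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        CURRENT.fetch_sub(layout.size(), Ordering::Relaxed);
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Result of running one extractor over one HTML document.
struct Report {
    name: &'static str,
    elapsed: Duration,
    peak_bytes: usize,
    text: String,
}

/// Time one extractor and record the peak heap growth during the call.
fn measure(name: &'static str, html: &str, extract: fn(&str) -> String) -> Report {
    // Reset the peak to the current baseline so earlier runs don't leak in.
    let baseline = CURRENT.load(Ordering::Relaxed);
    PEAK.store(baseline, Ordering::Relaxed);

    let start = Instant::now();
    let text = extract(html);
    let elapsed = start.elapsed();

    let peak_bytes = PEAK.load(Ordering::Relaxed).saturating_sub(baseline);
    Report { name, elapsed, peak_bytes, text }
}

/// Hypothetical stand-in extractor: drops everything between '<' and '>'.
fn strip_tags(html: &str) -> String {
    let mut out = String::with_capacity(html.len());
    let mut in_tag = false;
    for c in html.chars() {
        match c {
            '<' => in_tag = true,
            '>' => in_tag = false,
            _ if !in_tag => out.push(c),
            _ => {}
        }
    }
    out
}

/// Hypothetical second extractor: strips tags, then collapses whitespace.
fn strip_and_collapse(html: &str) -> String {
    strip_tags(html).split_whitespace().collect::<Vec<_>>().join(" ")
}
```

Calling `measure` once per extractor on the same HTML yields one `Report` per library; printing `elapsed`, `peak_bytes`, and `text` side by side gives the kind of comparison the tool produces. Resetting the peak before each run keeps one extractor's allocations from inflating the next one's figure, although the counts are only meaningful when nothing else allocates concurrently.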
No commits in the last 6 months.
Use this if you are a Rust developer building an application that needs to reliably convert HTML web pages into clean, readable plain text, such as for search indexing or LLM processing.
Not ideal if you are looking for a ready-to-use, end-user application to extract text from a single webpage without programming.
Stars: 11
Forks: —
Language: Rust
License: —
Category: —
Last pushed: Jan 22, 2025
Commits (30d): 0
Higher-rated alternatives
Anush008/fastembed-rs
Rust library for vector embeddings and reranking.
huggingface/text-embeddings-inference
A blazing fast inference solution for text embeddings models
MinishLab/model2vec-rs
Official Rust Implementation of Model2Vec
finalfusion/finalfusion-rust
finalfusion embeddings in Rust
finalfusion/finalfusion-python
Finalfusion embeddings in Python