saidsef/tika-document-to-text

Apache Tika: extract text and metadata from almost any document format with this pre-built, containerised solution. It is a Kubernetes-ready deployment with a UI, an API, and text-to-speech capabilities, suited to content indexing, analysis, and document-processing workflows.

Score: 40 / 100 (Emerging)

Built on Apache Tika's multi-format parser, this containerised implementation provides a unified REST API and web UI for extracting text and metadata from more than 1,000 file types. Deployment targets Kubernetes through Kustomize manifests, with optional ArgoCD GitOps integration, and functionality is exposed through JSON-based HTTP endpoints for programmatic access. The solution is aimed at content pipelines that need scalable document processing, such as search indexing, OCR workflows, and downstream NLP or translation tasks.
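This listing does not name the project's own endpoint paths. The snippet below is therefore a rough sketch of programmatic access against a stock Apache Tika Server (default port 9998), not this repository's API. On that server, `PUT /tika` with `Accept: text/plain` returns the extracted text, and `PUT /meta` with `Accept: application/json` returns metadata. `buildTikaRequest` and `extractFromTika` are hypothetical helper names, and the code assumes Node.js 18+ for the global `fetch`.

```javascript
// Sketch against a stock Apache Tika Server, NOT this repo's own endpoints.
// Assumes Node.js 18+ (global fetch) and Tika Server on its default port 9998.
const TIKA_DEFAULT_URL = "http://localhost:9998";

// Hypothetical helper: build the URL and fetch options for a Tika Server call.
// mode "text" -> PUT /tika (plain text); mode "meta" -> PUT /meta (JSON metadata).
function buildTikaRequest(bytes, mode = "text", baseUrl = TIKA_DEFAULT_URL) {
  const routes = {
    text: { path: "/tika", accept: "text/plain" },
    meta: { path: "/meta", accept: "application/json" },
  };
  const route = routes[mode];
  if (!route) {
    throw new Error(`unknown mode: ${mode}`);
  }
  return {
    url: new URL(route.path, baseUrl).toString(),
    init: {
      method: "PUT",
      headers: { Accept: route.accept },
      body: bytes, // raw document bytes; no Content-Type, so Tika auto-detects
    },
  };
}

// Hypothetical helper: send the document and return text or parsed JSON.
async function extractFromTika(bytes, mode = "text", baseUrl = TIKA_DEFAULT_URL) {
  const { url, init } = buildTikaRequest(bytes, mode, baseUrl);
  const res = await fetch(url, init);
  if (!res.ok) {
    throw new Error(`Tika returned HTTP ${res.status}`);
  }
  return mode === "meta" ? res.json() : res.text();
}
```

For example, `extractFromTika(require("node:fs").readFileSync("report.pdf")).then(console.log)` would print the text of a hypothetical `report.pdf`. The wrapper service described here may use different routes or response shapes.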

No package, no dependents
Maintenance 13 / 25
Adoption 4 / 25
Maturity 9 / 25
Community 14 / 25
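Each of the four component scores is out of 25. For this repository they add up exactly to the headline 40 / 100. The check below runs that arithmetic. It assumes the overall score is the plain, unweighted sum of the components: the numbers are consistent with that, but the page does not state it. `scores` and `overallScore` are illustrative names.

```javascript
// Component scores as listed on this page, each out of 25.
const scores = { maintenance: 13, adoption: 4, maturity: 9, community: 14 };

// Assumption: the overall score is the unweighted sum of the four components.
function overallScore(components) {
  return Object.values(components).reduce((sum, s) => sum + s, 0);
}

const maxScore = Object.keys(scores).length * 25; // 4 components x 25 = 100
console.log(`${overallScore(scores)} / ${maxScore}`); // prints "40 / 100"
```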

Stars: 5
Forks: 3
Language: JavaScript
License: MIT
Last pushed: Mar 15, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/saidsef/tika-document-to-text"

Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
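The same request can be made from Node.js 18+ instead of curl. The sketch below infers the URL pattern `/api/v1/quality/<category>/<owner>/<repo>` from the single example above. It also assumes the endpoint returns JSON, since the response schema is not documented here. How a free API key is supplied is not described either, so the sketch covers only the keyless tier. `qualityUrl` and `fetchQuality` are hypothetical names.

```javascript
// Sketch: call the quality API from Node.js 18+ (global fetch).
// The URL pattern is inferred from one example; the JSON response is an assumption.
const API_BASE = "https://pt-edge.onrender.com/api/v1/quality";

// Build the endpoint URL, escaping each path segment.
function qualityUrl(category, owner, repo) {
  const parts = [category, owner, repo].map(encodeURIComponent);
  return `${API_BASE}/${parts.join("/")}`;
}

// Keyless request (100/day tier); inspect the result before relying on fields.
async function fetchQuality(category, owner, repo) {
  const res = await fetch(qualityUrl(category, owner, repo));
  if (!res.ok) {
    throw new Error(`HTTP ${res.status}`);
  }
  return res.json();
}
```

For this repository, `fetchQuality("voice-ai", "saidsef", "tika-document-to-text").then(console.log)` requests the same URL as the curl command.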