saidsef/tika-document-to-text

Apache Tika: extract text and metadata from any document format with this pre-built containerised solution. Kubernetes-ready deployment with an intuitive UI, API, and text-to-speech capabilities, suited to content indexing, analysis, and document processing workflows.

/ 100

Emerging

Built on Apache Tika's multi-format parser, this containerized implementation provides a unified REST API and web UI for extracting text and metadata from 1000+ file types through a single interface. Deployment leverages Kubernetes with Kustomize manifests and optional ArgoCD GitOps integration, exposing functionality via JSON-based HTTP endpoints for programmatic access. The solution targets content pipelines requiring scalable document processing, including search indexing, OCR workflows, and downstream NLP/translation tasks.
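
As a rough illustration of what programmatic access over a JSON-based HTTP endpoint might look like, the sketch below builds a request and parses a response without touching the network. The base URL, the `/api/extract` path, the `url` request field, and the `text` / `metadata` response fields are all hypothetical placeholders, not the project's documented API; check the repository's README for the real routes and payload shape.

```python
import json
import urllib.request

# Hypothetical values: the listing does not give the real host, path, or
# field names. Replace them with what the project's documentation specifies.
BASE_URL = "http://localhost:8080"
EXTRACT_PATH = "/api/extract"


def build_extract_request(document_url: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST asking the service to process a document."""
    payload = json.dumps({"url": document_url}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + EXTRACT_PATH,
        data=payload,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def parse_extract_response(body: bytes) -> tuple[str, dict]:
    """Split an assumed JSON response into extracted text and a metadata dict."""
    doc = json.loads(body.decode("utf-8"))
    # Missing keys fall back to empty values rather than raising.
    return doc.get("text", ""), doc.get("metadata", {})
```

To send the request against a running instance, pass the result of `build_extract_request(...)` to `urllib.request.urlopen` and feed the response body to `parse_extract_response`. Keeping request construction and response parsing separate from the network call makes both easy to adjust once the actual endpoint contract is known.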

No Package No Dependents

Maintenance 13 / 25

Adoption 4 / 25

Maturity 9 / 25

Community 14 / 25

Language: JavaScript

License: MIT

Higher-rated alternatives

whitphx/streamlit-stt-app

Real time web based Speech-to-Text app with Streamlit

open-mmlab/Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to...

declare-lab/jamify

JAM: A Tiny Flow-based Song Generator with Fine-grained Controllability and Aesthetic Alignment

hipnologo/EchoForge_Studio

Multi-LLM writing and voice production workspace built with Streamlit.

SiddhantSadangi/st_deepgram_playground

API playground for Deepgram built with Streamlit
