saidsef/tika-document-to-text
Extract text and metadata from any document format with this pre-built, containerised Apache Tika solution. The Kubernetes-ready deployment includes an intuitive UI, an API, and text-to-speech capabilities, making it suitable for content indexing, analysis, and document-processing workflows.
Built on Apache Tika's multi-format parser, this containerized implementation provides a unified REST API and web UI for extracting text and metadata from 1000+ file types through a single interface. Deployment leverages Kubernetes with Kustomize manifests and optional ArgoCD GitOps integration, exposing functionality via JSON-based HTTP endpoints for programmatic access. The solution targets content pipelines requiring scalable document processing, including search indexing, OCR workflows, and downstream NLP/translation tasks.
Stars: 5
Forks: 3
Language: JavaScript
License: MIT
Category:
Last pushed: Mar 15, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/saidsef/tika-document-to-text"
Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
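The same request can be made from a script. The following is a minimal Python sketch using only the standard library. It assumes that the path segments after `/quality/` are category, owner, and repository (inferred from the single URL above) and that the endpoint returns JSON. The helper names `quality_url` and `fetch_quality` are hypothetical, and the shape of the response is not assumed.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository.

    Assumption: the three path segments are category/owner/repo,
    as inferred from the single example URL on this page.
    """
    parts = (quote(part, safe="") for part in (category, owner, repo))
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Send a keyless GET request and decode the body as JSON.

    Assumption: the response body is JSON; its fields are not known here,
    so the decoded object is returned unchanged.
    """
    request = urllib.request.Request(
        quality_url(category, owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (performs a network request, so it is left commented out):
# data = fetch_quality("voice-ai", "saidsef", "tika-document-to-text")
# print(json.dumps(data, indent=2))
```

Because keyless access is limited to 100 requests/day, a script that polls many repositories would likely want to cache responses locally.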
Higher-rated alternatives
whitphx/streamlit-stt-app
Real time web based Speech-to-Text app with Streamlit
open-mmlab/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to...
declare-lab/jamify
JAM: A Tiny Flow-based Song Generator with Fine-grained Controllability and Aesthetic Alignment
hipnologo/EchoForge_Studio
Multi-LLM writing and voice production workspace built with Streamlit.
SiddhantSadangi/st_deepgram_playground
API playground for Deepgram built with Streamlit