rririanto/unstructured-demo-streamlit
Extract your docs (CSV, PDF, JSON, HTML, DOCS, Sheets, and more) for your own GPT and LLM projects using Unstructured.io via Streamlit
This tool helps you convert complex documents like PDFs, HTML, or spreadsheets into a clean, organized format for use with AI systems. You upload your files, and it processes them, making the information easily usable for building your own custom AI assistants or data analysis tools. Anyone working with diverse document types who wants to leverage them for AI-driven insights will find this useful.
No commits in the last 6 months.
Use this if you need to extract text and data from a variety of document types to prepare them for use in large language models or other AI applications.
Not ideal if you primarily need to extract data from highly structured databases or require complex, rules-based data transformations beyond simple text extraction.
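The upload-and-extract flow described above can be sketched roughly as follows. This is not the repository's own code, which is not shown here; it is a minimal sketch that assumes the open-source `unstructured` Python package is installed and uses its `partition` function. The helper names `elements_to_records` and `extract_records` are hypothetical.

```python
from typing import Any, Dict, Iterable, List


def elements_to_records(elements: Iterable[Any]) -> List[Dict[str, str]]:
    """Flatten parsed document elements into plain dicts, skipping empty text.

    Hypothetical helper: it only assumes each element exposes a `text`
    attribute and, optionally, a `category` attribute.
    """
    records: List[Dict[str, str]] = []
    for el in elements:
        text = (getattr(el, "text", "") or "").strip()
        if not text:
            continue  # drop elements that carry no usable text
        # Fall back to the class name when no category is present.
        kind = getattr(el, "category", type(el).__name__)
        records.append({"type": str(kind), "text": text})
    return records


def extract_records(path: str) -> List[Dict[str, str]]:
    """Parse one file with `unstructured` and return flat records.

    Assumption: the `unstructured` package is installed. It is imported
    lazily so that `elements_to_records` stays usable without it.
    """
    from unstructured.partition.auto import partition

    return elements_to_records(partition(filename=path))
```

Records in this shape (a type label plus cleaned text) are one plausible input for chunking or embedding in an LLM pipeline; the actual output format of the demo app may differ.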
Stars: 8
Forks: not listed
Language: Python
License: not listed
Category:
Last pushed: Aug 01, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/rririanto/unstructured-demo-streamlit"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
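The same request can be made from Python using only the standard library. This is a sketch under assumptions: the endpoint path is taken from the curl command above, but the response is assumed to be JSON, which the listing does not confirm. The function names `build_quality_url` and `fetch_quality` are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def build_quality_url(owner: str, repo: str) -> str:
    """Build the per-repository endpoint URL, percent-encoding each path segment."""
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the listing data for one repository.

    Assumption: the endpoint returns a JSON body. No API key is sent,
    so the keyless daily request limit applies.
    """
    with urlopen(build_quality_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    print(fetch_quality("rririanto", "unstructured-demo-streamlit"))
```

Because the keyless tier is capped at 100 requests per day, it may be worth caching responses locally when polling many repositories.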
Higher-rated alternatives
NanoNets/docstrange
Extract and convert data from any document, images, pdfs, word doc, ppt or URL into multiple...
th1nhhdk/local_ai_ocr
A local, offline (after initial setup), portable OCR tool that can process images and PDF...
Dicklesworthstone/llm_aided_ocr
Enhances Tesseract OCR output using LLMs (local or API) for error correction, smart chunking,...
emcf/thepipe
Get clean data from tricky documents, powered by vision-language models ⚡
langstruct-ai/langstruct
Extract structured data from any content using LLMs.