aws-samples/sample-for-multi-modal-document-to-json-with-sagemaker-ai
This open-source project delivers a complete pipeline for converting multi-page documents (PDFs/images) into structured JSON using Vision LLMs on Amazon SageMaker. The solution leverages the SWIFT Framework to fine-tune models specifically for document understanding tasks.
No commits in the last 6 months.
Stars
15
Forks
2
Language
Jupyter Notebook
License
MIT-0
Category
Last pushed
Aug 04, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/aws-samples/sample-for-multi-modal-document-to-json-with-sagemaker-ai"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
dorarad/gansformer
Generative Adversarial Transformers
j-min/VL-T5
PyTorch code for "Unifying Vision-and-Language Tasks via Text Generation" (ICML 2021)
invictus717/MetaTransformer
Meta-Transformer for Unified Multimodal Learning
Yachay-AI/byt5-geotagging
Confidence and Byt5 - based geotagging model predicting coordinates from text alone.
zinengtang/TVLT
PyTorch code for “TVLT: Textless Vision-Language Transformer” (NeurIPS 2022 Oral)