aws-samples/sample-for-multi-modal-document-to-json-with-sagemaker-ai

This open-source project delivers a complete pipeline for converting multi-page documents (PDFs/images) into structured JSON using Vision LLMs on Amazon SageMaker. The solution leverages the SWIFT Framework to fine-tune models specifically for document understanding tasks.

/ 100

Experimental

No commits in the last 6 months.

Stale 6m · No Package · No Dependents

Maintenance 2 / 25

Adoption 6 / 25

Maturity 9 / 25

Community 10 / 25

Language

Jupyter Notebook

License

MIT-0

Category

multimodal-fusion-transformers

Last pushed

Aug 04, 2025

Multimodal Fusion Transformers · 37 models

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/aws-samples/sample-for-multi-modal-document-to-json-with-sagemaker-ai"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
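The same request can be made from Python with only the standard library. This is a minimal sketch under stated assumptions: the URL pattern `/api/v1/quality/<category>/<owner>/<repo>` is inferred from the single curl example above, the response is assumed to be JSON, and the helper names `quality_url` and `fetch_quality` are hypothetical. It makes unauthenticated requests only, so it stays within the keyless 100 requests/day tier.

```python
import json
import urllib.request

# Base URL taken from the curl example; the path layout
# /api/v1/quality/<category>/<owner>/<repo> is an assumption inferred from it.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for a repository (hypothetical helper)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the endpoint and decode the body, assuming it is JSON (hypothetical helper)."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `fetch_quality("transformers", "aws-samples", "sample-for-multi-modal-document-to-json-with-sagemaker-ai")` requests the same URL as the curl command. The URL builder is kept separate from the network call so it can be checked without making a request.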

Higher-rated alternatives

dorarad/gansformer

Generative Adversarial Transformers

j-min/VL-T5

PyTorch code for "Unifying Vision-and-Language Tasks via Text Generation" (ICML 2021)

invictus717/MetaTransformer

Meta-Transformer for Unified Multimodal Learning

Yachay-AI/byt5-geotagging

Confidence- and ByT5-based geotagging model that predicts coordinates from text alone.

zinengtang/TVLT

PyTorch code for “TVLT: Textless Vision-Language Transformer” (NeurIPS 2022 Oral)