clovaai/donut
Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
Donut pairs a Swin Transformer vision encoder with an autoregressive text decoder, so a single end-to-end model reads a document image and generates structured text without a separate OCR step. SynthDoG generates synthetic training documents in English, Chinese, Japanese, and Korean, which allows pre-training across document types and languages without real annotated datasets. The model is usable through Hugging Face's transformers library and is reported to achieve state-of-the-art results on document parsing, classification, and visual question answering (VQA) with sub-second inference.
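As a rough illustration of the transformers integration, the sketch below parses one document image into a nested dictionary. It assumes the `transformers`, `torch`, and `Pillow` packages are installed, and it uses the `naver-clova-ix/donut-base-finetuned-cord-v2` checkpoint with its `<s_cord-v2>` task prompt, neither of which is named on this page. `parse_document` and `strip_task_token` are hypothetical helper names chosen for this example.

```python
import re


def strip_task_token(sequence: str) -> str:
    """Remove the first tag (the task start token) from a decoded sequence."""
    return re.sub(r"<.*?>", "", sequence, count=1).strip()


def parse_document(
    image_path: str,
    checkpoint: str = "naver-clova-ix/donut-base-finetuned-cord-v2",  # assumed checkpoint
    task_prompt: str = "<s_cord-v2>",  # assumed prompt matching that checkpoint
) -> dict:
    """Run Donut on one image and return the decoded structure as a dict."""
    # Heavy dependencies are imported lazily so strip_task_token works without them.
    from PIL import Image
    from transformers import DonutProcessor, VisionEncoderDecoderModel

    processor = DonutProcessor.from_pretrained(checkpoint)
    model = VisionEncoderDecoderModel.from_pretrained(checkpoint)
    model.eval()

    # Encode the image and the task prompt that starts the decoder.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(image, return_tensors="pt").pixel_values
    decoder_input_ids = processor.tokenizer(
        task_prompt, add_special_tokens=False, return_tensors="pt"
    ).input_ids

    # Generate the output token sequence autoregressively.
    outputs = model.generate(
        pixel_values,
        decoder_input_ids=decoder_input_ids,
        max_length=model.decoder.config.max_position_embeddings,
        pad_token_id=processor.tokenizer.pad_token_id,
        eos_token_id=processor.tokenizer.eos_token_id,
        use_cache=True,
        bad_words_ids=[[processor.tokenizer.unk_token_id]],
        return_dict_in_generate=True,
    )

    # Drop padding and end-of-sequence tokens, then the leading task token.
    sequence = processor.batch_decode(outputs.sequences)[0]
    sequence = sequence.replace(processor.tokenizer.eos_token, "")
    sequence = sequence.replace(processor.tokenizer.pad_token, "")
    return processor.token2json(strip_task_token(sequence))
```

Calling `parse_document("receipt.png")` with a local image (the file name is a placeholder) downloads the checkpoint on first use. Other tasks such as classification or VQA would need a different checkpoint and task prompt.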
6,815 stars. No commits in the last 6 months.
Stars: 6,815
Forks: 554
Language: Python
License: MIT
Category:
Last pushed: Jul 11, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/clovaai/donut"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
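The same request can be made from Python. The sketch below uses only the standard library and rests on two assumptions not confirmed by this page: that the path segments after `quality/` are category, owner, and repository, and that the endpoint returns JSON. `quality_url` and `fetch_quality` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL; the segment meanings are an assumption."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """GET the endpoint and decode the body, assuming it is JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("nlp", "clovaai", "donut")` requests the same URL as the curl command. With the keyless limit of 100 requests/day, it is worth caching the result rather than polling.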
Higher-rated alternatives
deepdoctection/deepdoctection: A Repo For Document AI
deanmalmgren/textract: extract text from any document. no muss. no fuss.
eikek/docspell: Assist in organizing your piles of documents, resulting from scanners, e-mails and other sources...
zzzDavid/ICDAR-2019-SROIE: ICDAR 2019 Robust Reading Challenge on Scanned Receipts OCR and Information Extraction
axa-group/Parsr: Transforms PDF, Documents and Images into Enriched Structured Data