clovaai/donut

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022

/ 100

Emerging

Donut combines a vision encoder (Swin Transformer backbone) with an autoregressive text decoder, so it reads a document image and generates structured text end-to-end, with no separate OCR step. SynthDoG generates multilingual synthetic training data (English, Chinese, Japanese, Korean), which allows pre-training across document types and languages without real annotated datasets. The model is available through Hugging Face's transformers library, and the authors report state-of-the-art results on document parsing, classification, and visual question answering (VQA), with reported sub-second inference.
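To make "structured text generation" concrete: the decoder emits a flat string in which fields are wrapped in paired tags of the form `<s_key>...</s_key>`, and that string is then converted into nested JSON. The sketch below is a simplified, standalone illustration of that conversion step, not the repository's own implementation. The function name `parse_donut_sequence` and the sample field names (`menu`, `nm`, `price`, `total`) are hypothetical, and the sketch assumes well-formed input without repeated groups or a key nested inside itself.

```python
import re

# Matches an opening field tag such as <s_total>; closing tags start with "</".
OPEN_TAG = re.compile(r"<s_(?P<key>[^>]+)>")


def parse_donut_sequence(seq: str) -> dict:
    """Convert a Donut-style tagged string into a nested dict (simplified sketch)."""
    result = {}
    pos = 0
    while True:
        match = OPEN_TAG.search(seq, pos)
        if match is None:
            break
        key = match.group("key")
        close_tag = f"</s_{key}>"
        end = seq.find(close_tag, match.end())
        if end == -1:
            # Unclosed tag: skip it rather than guess where the field ends.
            pos = match.end()
            continue
        inner = seq[match.end():end]
        # Recurse when the field contains further tagged fields,
        # otherwise keep the stripped text as a leaf value.
        result[key] = parse_donut_sequence(inner) if "<s_" in inner else inner.strip()
        pos = end + len(close_tag)
    return result


# Hypothetical decoder output for a receipt image.
sample = "<s_menu><s_nm>Latte</s_nm><s_price>4.50</s_price></s_menu><s_total>4.50</s_total>"
parsed = parse_donut_sequence(sample)
print(parsed)
```

In practice the tagged string would come from decoding the model's generated token IDs; here it is hard-coded so the example runs without downloading a checkpoint.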

6,815 stars. No commits in the last 6 months.

Stale 6m No Package No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 19 / 25

Stars: 6,815

Forks: 554

Language: Python

License: MIT

Higher-rated alternatives

deepdoctection/deepdoctection

A Repo For Document AI

deanmalmgren/textract

extract text from any document. no muss. no fuss.

eikek/docspell

Assist in organizing your piles of documents, resulting from scanners, e-mails and other sources...

zzzDavid/ICDAR-2019-SROIE

ICDAR 2019 Robust Reading Challenge on Scanned Receipts OCR and Information Extraction

axa-group/Parsr

Transforms PDF, Documents and Images into Enriched Structured Data
