clovaai/donut

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022

Score: 45 / 100 (Emerging)

Donut pairs a Swin Transformer vision encoder with an autoregressive text decoder, so a single model reads a document image and generates structured text end-to-end, with no separate OCR step. SynthDoG generates synthetic multilingual training documents (English, Chinese, Japanese, Korean), which allows pre-training across document types and languages without real annotated data. The model is available through Hugging Face's transformers library, and the authors report state-of-the-art results on document parsing, classification, and visual question answering (VQA) with sub-second inference.
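As an illustration of the encoder-decoder flow described above, here is a minimal sketch of running a Donut checkpoint through the transformers library. It assumes `transformers`, `torch`, and Pillow are installed and that the `naver-clova-ix/donut-base-finetuned-cord-v2` checkpoint is the one you want; the helper names `build_task_prompt` and `parse_document` are hypothetical and chosen only for this example. The prompt formats follow the usage shown in the Hugging Face Donut documentation rather than anything stated on this page.

```python
import re


def build_task_prompt(task, question=None):
    """Build the decoder start prompt that selects a Donut task.

    Parsing and classification checkpoints take a single task token such as
    "<s_cord-v2>"; the DocVQA checkpoint also embeds the question.
    """
    if question is None:
        return f"<s_{task}>"
    return f"<s_{task}><s_question>{question}</s_question><s_answer>"


def parse_document(image, checkpoint="naver-clova-ix/donut-base-finetuned-cord-v2",
                   task="cord-v2", question=None):
    """Run one PIL image through Donut and return the decoded output as a dict.

    Heavy imports live inside the function so the prompt helper above can be
    used without loading torch. The checkpoint name is an assumption.
    """
    import torch
    from transformers import DonutProcessor, VisionEncoderDecoderModel

    processor = DonutProcessor.from_pretrained(checkpoint)
    model = VisionEncoderDecoderModel.from_pretrained(checkpoint)
    model.eval()

    # The Swin encoder consumes pixels; the decoder is primed with the task prompt.
    pixel_values = processor(image, return_tensors="pt").pixel_values
    decoder_input_ids = processor.tokenizer(
        build_task_prompt(task, question),
        add_special_tokens=False,
        return_tensors="pt",
    ).input_ids

    with torch.no_grad():
        outputs = model.generate(
            pixel_values,
            decoder_input_ids=decoder_input_ids,
            max_length=model.decoder.config.max_position_embeddings,
            pad_token_id=processor.tokenizer.pad_token_id,
            eos_token_id=processor.tokenizer.eos_token_id,
            use_cache=True,
            bad_words_ids=[[processor.tokenizer.unk_token_id]],
            return_dict_in_generate=True,
        )

    # Drop padding/end tokens and the leading task token, then convert the
    # remaining tag sequence into nested JSON-like data.
    sequence = processor.batch_decode(outputs.sequences)[0]
    sequence = sequence.replace(processor.tokenizer.eos_token, "")
    sequence = sequence.replace(processor.tokenizer.pad_token, "")
    sequence = re.sub(r"<.*?>", "", sequence, count=1).strip()
    return processor.token2json(sequence)
```

With a receipt image loaded via `PIL.Image.open(path).convert("RGB")`, calling `parse_document(image)` should return a nested dictionary of the fields the checkpoint was fine-tuned to extract. The first call downloads the model weights, so it is noticeably slower than later ones.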

6,815 stars. No commits in the last 6 months.

Stale 6m | No Package | No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 19 / 25

Stars: 6,815
Forks: 554
Language: Python
License: MIT
Last pushed: Jul 11, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/clovaai/donut"

Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000 requests/day.
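For scripted access, the same request can be sketched in Python using only the standard library. This is a rough example under two assumptions that the page does not confirm: that the URL path segments are category, owner, and repository (inferred from the single URL shown), and that the endpoint returns JSON. The names `build_quality_url` and `fetch_quality` are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Root taken from the curl example; the segment layout below is an assumption.
API_ROOT = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category, owner, repo):
    """Assemble the endpoint URL, percent-encoding each path segment."""
    segments = (category, owner, repo)
    return API_ROOT + "/" + "/".join(quote(s, safe="") for s in segments)


def fetch_quality(category, owner, repo, timeout=10.0):
    """Fetch the record for one repository, assuming a JSON response body."""
    url = build_quality_url(category, owner, repo)
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_quality("nlp", "clovaai", "donut")` issues the same GET request as the curl command. Unauthenticated use counts against the 100 requests/day limit, so cache results if you poll many repositories.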