joanrod/ocr-vqgan

OCR-VQGAN is a discrete image encoder (tokenizer and detokenizer) for figure images in the Paper2Fig100k dataset. It implements an OCR perceptual loss for generating clear text within images. Forked from the VQGAN implementation in CompVis/taming-transformers.

/ 100

Experimental

No commits in the last 6 months.

No License · Stale 6m · No Package · No Dependents

Maintenance 0 / 25

Adoption 9 / 25

Maturity 1 / 25

Community 4 / 25

Language: Python

License: none listed

Category: clip-vision-language

Last pushed: Jan 30, 2023

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/computer-vision/joanrod/ocr-vqgan"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
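The same request can be made from Python. The sketch below is a minimal illustration, not an official client: the helper names `build_quality_url` and `fetch_quality` are hypothetical, the path layout (domain, owner, repository) is inferred from the curl example above, and it assumes the endpoint returns JSON, which this page does not state.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(domain: str, owner: str, repo: str) -> str:
    """Assemble the endpoint URL (hypothetical helper).

    The domain/owner/repo layout is inferred from the curl example.
    """
    # Percent-encode each path segment so unusual characters cannot break the URL.
    parts = [quote(part, safe="") for part in (domain, owner, repo)]
    return f"{BASE_URL}/{'/'.join(parts)}"


def fetch_quality(domain: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record (hypothetical helper).

    Assumption: the response body is JSON. If it is not, json.loads raises
    a ValueError and the caller should inspect the raw body instead.
    """
    url = build_quality_url(domain, owner, repo)
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only builds and prints the URL; call fetch_quality() to hit the network.
    print(build_quality_url("computer-vision", "joanrod", "ocr-vqgan"))
```

Since keyless access is limited to 100 requests/day, it may be worth caching the result of `fetch_quality` rather than calling it repeatedly for the same repository.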

Higher-rated alternatives

OBA-Research/VAAS

VAAS is an inference-first, research-driven library for image integrity analysis. It integrates...

deepmancer/clip-object-detection

Zero-shot object detection with CLIP, utilizing Faster R-CNN for region proposals.

ABaldrati/CLIP4Cir

[ACM TOMM 2023] - Composed Image Retrieval using Contrastive Learning and Task-oriented...

IvanAer/G-Universal-CLIP

4th place solution for the Google Universal Image Embedding Kaggle Challenge. Instance-Level...
