joanrod/ocr-vqgan
OCR-VQGAN is a discrete image encoder (tokenizer and detokenizer) for figure images in the Paper2Fig100k dataset. It implements an OCR perceptual loss for clear text-within-image generation. Forked from VQGAN in CompVis/taming-transformers.
No commits in the last 6 months.
Stars: 83
Forks: 2
Language: Python
License: —
Category:
Last pushed: Jan 30, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/computer-vision/joanrod/ocr-vqgan"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
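The same request can be made from Python. The sketch below is a minimal, standard-library-only version of the curl call above. Two things in it are assumptions rather than documented behavior: the `category/owner/repo` path pattern is inferred from the single URL shown here, and the response is assumed to be JSON. The names `quality_url` and `fetch_quality` are hypothetical helpers, not part of any published client.

```python
import json
import urllib.request

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL.

    Assumption: the path is <category>/<owner>/<repo>, as inferred
    from the one example URL on this page.
    """
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Send an unauthenticated GET and decode the body.

    Assumption: the endpoint returns JSON; this is not confirmed here.
    """
    request = urllib.request.Request(
        quality_url(category, owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (performs a network request, counts toward the daily limit):
# data = fetch_quality("computer-vision", "joanrod", "ocr-vqgan")
```

Because the keyless tier is limited to 100 requests/day, it is worth caching the result locally rather than calling `fetch_quality` repeatedly for the same repository.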
Higher-rated alternatives
OBA-Research/VAAS
VAAS is an inference-first, research-driven library for image integrity analysis. It integrates...
deepmancer/clip-object-detection
Zero-shot object detection with CLIP, utilizing Faster R-CNN for region proposals.
ABaldrati/CLIP4Cir
[ACM TOMM 2023] - Composed Image Retrieval using Contrastive Learning and Task-oriented...
IvanAer/G-Universal-CLIP
4th place solution for the Google Universal Image Embedding Kaggle Challenge. Instance-Level...