zai-org/CogView2
Official code repository for the paper "CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers".
Implements a three-stage hierarchical transformer (6B-9B-9B parameters) with custom local attention kernels for efficient token generation, featuring LoPAR acceleration and bidirectional completion via CogLM. Supports both text-to-image generation and text-guided inpainting with style control (photo, sketch, watercolor, etc.), optimized for A100 GPUs but scalable via batch size tuning. Built on SwissArmyTransformer framework with model hosting on Hugging Face Spaces and Replicate, primarily trained for Chinese/English text inputs.
955 stars. No commits in the last 6 months.
Stars: 955
Forks: 86
Language: Python
License: Apache-2.0
Category: diffusion
Last pushed: Aug 03, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/zai-org/CogView2"
Open to everyone: 100 requests/day with no key. A free key raises the limit to 1,000/day.
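A minimal Python sketch of calling the same endpoint from code, using only the standard library. The URL pattern is taken from the curl example above; the `X-API-Key` header name used for keyed access is an assumption, not documented on this page.

```python
# Sketch of querying the quality API shown above.
# Assumptions: the "X-API-Key" header name is hypothetical; only the URL
# pattern (base + category + owner + repo) comes from the page's curl example.
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the per-repo quality endpoint URL."""
    return f"{BASE}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, api_key: str = "") -> dict:
    """Fetch the quality record; anonymous access is limited to 100 requests/day."""
    req = urllib.request.Request(quality_url(category, owner, repo))
    if api_key:  # a free key raises the limit to 1,000/day (header name assumed)
        req.add_header("X-API-Key", api_key)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Anonymous example for this repo:
print(quality_url("diffusion", "zai-org", "CogView2"))
# fetch_quality("diffusion", "zai-org", "CogView2") would perform the request.
```

The fetch is kept separate from URL construction so the endpoint can be inspected or cached without making a network call.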
Higher-rated alternatives
Vchitect/VBench
[CVPR2024 Highlight] VBench - We Evaluate Video Generation
VectorSpaceLab/OmniGen
OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340
EndlessSora/focal-frequency-loss
[ICCV 2021] Focal Frequency Loss for Image Reconstruction and Synthesis
JIA-Lab-research/DreamOmni2
This project is the official implementation of 'DreamOmni2: Multimodal Instruction-based Editing...
PKU-YuanGroup/ChronoMagic-Bench
[NeurIPS 2024 D&B Spotlight] ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of...