zai-org/CogView2
Official code repository for the paper "CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers".
Implements a three-stage hierarchical transformer (6B-9B-9B parameters) with custom local attention kernels for efficient token generation, featuring LoPAR acceleration and bidirectional completion via CogLM. Supports both text-to-image generation and text-guided inpainting with style control (photo, sketch, watercolor, etc.), optimized for A100 GPUs but scalable via batch size tuning. Built on SwissArmyTransformer framework with model hosting on Hugging Face Spaces and Replicate, primarily trained for Chinese/English text inputs.
955 stars. No commits in the last 6 months.
Stars: 955
Forks: 86
Language: Python
License: Apache-2.0
Category: diffusion
Last pushed: Aug 03, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/zai-org/CogView2"
Open to everyone: 100 requests/day with no key. A free key raises the limit to 1,000/day.
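A minimal Python sketch of calling the same endpoint from code, using only the standard library. The URL pattern is taken from the curl example above; the `X-API-Key` header name used for keyed access is an assumption, not documented on this page.

```python
# Sketch of querying the quality API shown above.
# Assumptions: the "X-API-Key" header name is hypothetical; only the URL
# pattern (base + category + owner + repo) comes from the page's curl example.
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the per-repo quality endpoint URL."""
    return f"{BASE}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, api_key: str = "") -> dict:
    """Fetch the quality record; anonymous access is limited to 100 requests/day."""
    req = urllib.request.Request(quality_url(category, owner, repo))
    if api_key:  # a free key raises the limit to 1,000/day (header name assumed)
        req.add_header("X-API-Key", api_key)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Anonymous example for this repo:
print(quality_url("diffusion", "zai-org", "CogView2"))
# fetch_quality("diffusion", "zai-org", "CogView2") would perform the request.
```

The fetch is kept separate from URL construction so the endpoint can be inspected or cached without making a network call.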
Higher-rated alternatives
Vchitect/VBench
[CVPR2024 Highlight] VBench - We Evaluate Video Generation
VectorSpaceLab/OmniGen
OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340
EndlessSora/focal-frequency-loss
[ICCV 2021] Focal Frequency Loss for Image Reconstruction and Synthesis
JIA-Lab-research/DreamOmni2
This project is the official implementation of 'DreamOmni2: Multimodal Instruction-based Editing...
PKU-YuanGroup/ChronoMagic-Bench
[NeurIPS 2024 D&B Spotlight] ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of...