google-research/pix2seq

Pix2Seq codebase: multi-tasks with generative modeling (autoregressive and diffusion)

Archived

/ 100

Emerging

Unifies vision tasks (detection, segmentation, captioning, keypoint detection) in a single sequence-generation framework built on encoder-decoder transformers with pluggable autoregressive or diffusion decoders. Offers FitTransformer as an optional backbone, runs on TPUs and GPUs through TensorFlow 2, and provides pretrained checkpoints for ResNet and ViT architectures on Google Cloud Storage.
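To make the sequence-generation framing concrete, here is a minimal sketch of how a detection target could be written as discrete tokens, in the spirit of the Pix2Seq idea of quantizing box coordinates. It is not the repository's code: the function names, the 1000-bin default, and the vocabulary layout (coordinate tokens first, class tokens after them) are all assumptions made for illustration.

```python
import numpy as np


def quantize_box(box, num_bins=1000):
    """Map a normalized [ymin, xmin, ymax, xmax] box in [0, 1] to integer bins."""
    box = np.clip(np.asarray(box, dtype=np.float64), 0.0, 1.0)
    return np.round(box * (num_bins - 1)).astype(np.int64)


def box_to_tokens(box, class_id, num_bins=1000):
    """Encode one object as five tokens: four coordinate bins, then a class token.

    Hypothetical vocabulary layout: ids [0, num_bins) are coordinates and
    ids from num_bins upward are classes.
    """
    coords = quantize_box(box, num_bins).tolist()
    return coords + [num_bins + int(class_id)]


def tokens_to_box(tokens, num_bins=1000):
    """Invert box_to_tokens, up to quantization error."""
    coords = np.asarray(tokens[:4], dtype=np.float64) / (num_bins - 1)
    class_id = int(tokens[4]) - num_bins
    return coords.tolist(), class_id


if __name__ == "__main__":
    tokens = box_to_tokens([0.0, 0.25, 0.75, 1.0], class_id=3)
    print(tokens)  # [0, 250, 749, 999, 1003]
    print(tokens_to_box(tokens))
```

Under this framing, a decoder that emits such tokens one at a time can serve detection with the same machinery used for captioning; the recovered coordinates differ from the originals by at most half a bin width.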

939 stars. No commits in the last 6 months.

Archived, Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 17 / 25


Stars: 939

Forks:

Language: Jupyter Notebook

License: Apache-2.0

Higher-rated alternatives

NVIDIA/pix2pixHD

Synthesizing and manipulating 2048x1024 images with conditional GANs

GaParmar/clean-fid

PyTorch - FID calculation with proper image resizing and quantization steps [CVPR 2022]

albertpumarola/GANimation

GANimation: Anatomically-aware Facial Animation from a Single Image (ECCV'18 Oral) [PyTorch]

yuanming-hu/exposure

Learning infinite-resolution image processing with GAN and RL from unpaired image datasets,...

yiranran/APDrawingGAN

Code for APDrawingGAN: Generating Artistic Portrait Drawings from Face Photos with Hierarchical...
