eps696/aphantasia
CLIP + FFT/DWT/RGB = text to image/video
Parameterizes image generation with FFT, DWT (wavelet), or direct RGB optimization instead of a GAN, enabling high-resolution output (full HD, 4K and beyond) with stable, controllable synthesis. Supports multi-modal queries that combine text prompts, image references, style descriptions, and negative prompts with weighted syntax, plus continuous video generation via frame interpolation with optional depth-based 3D effects. Integrates multiple CLIP vision models (ViT and ResNet variants) and includes experimental aesthetic loss and progressive learning-rate strategies for compositional control.
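The FFT parameterization described above can be sketched as follows. This is a minimal illustration of the decode path (learnable spectrum → inverse FFT → sigmoid → pixels), not the repository's actual code; the function name and the 1/f frequency scaling are assumptions, and the real tool would backpropagate a CLIP similarity loss through this decode step to update the spectrum.

```python
import numpy as np

def decode_spectrum(spectrum, h, w):
    """Turn a complex rFFT spectrum (the learnable parameters) into an image."""
    # 1/f scaling emphasizes low frequencies so optimization starts smooth
    # (an assumption here, modeled on common Fourier-feature parameterizations)
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.rfftfreq(w)[None, :]
    freqs = np.sqrt(fx**2 + fy**2)
    scale = 1.0 / np.maximum(freqs, 1.0 / max(h, w))
    pixels = np.fft.irfft2(spectrum * scale, s=(h, w))
    # sigmoid squashes to a valid [0, 1] pixel range
    return 1.0 / (1.0 + np.exp(-pixels))

h, w = 64, 64
rng = np.random.default_rng(0)
shape = (h, w // 2 + 1)  # rFFT of a real h x w image
params = (rng.normal(size=shape) + 1j * rng.normal(size=shape)) * 0.01
img = decode_spectrum(params, h, w)
print(img.shape)  # (64, 64), values bounded in [0, 1]
```

Because the image lives in frequency space, the resolution of the decoded output is set by `s=(h, w)` rather than by a generator network, which is what lets this approach scale to arbitrary sizes.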
789 stars. No commits in the last 6 months.
Stars
789
Forks
104
Language
Python
License
MIT
Category
Diffusion
Last pushed
Feb 13, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/eps696/aphantasia"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
NVlabs/Sana
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
FoundationVision/VAR
[NeurIPS 2024 Best Paper Award][GPT beats diffusion🔥] [scaling laws in visual generation📈]...
nerdyrodent/VQGAN-CLIP
Just playing with getting VQGAN+CLIP running locally, rather than having to use colab.
huggingface/finetrainers
Scalable and memory-optimized training of diffusion models
AssemblyAI-Community/MinImagen
MinImagen: A minimal implementation of the Imagen text-to-image model