eps696/aphantasia
CLIP + FFT/DWT/RGB = text to image/video
Parameterizes image generation with FFT, DWT (wavelet), or direct RGB optimization instead of a GAN, enabling high-resolution output (full HD, 4K and beyond) with stable, controllable synthesis. Supports multi-modal queries that combine text prompts, image references, style descriptions, and negative prompts with weighted syntax, plus continuous video generation via frame interpolation with optional depth-based 3D effects. Integrates multiple CLIP vision models (ViT and ResNet variants) and includes experimental aesthetic loss and progressive learning-rate strategies for compositional control.
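The FFT parameterization described above can be sketched as follows. This is a minimal illustration of the decode path (learnable spectrum → inverse FFT → sigmoid → pixels), not the repository's actual code; the function name and the 1/f frequency scaling are assumptions, and the real tool would backpropagate a CLIP similarity loss through this decode step to update the spectrum.

```python
import numpy as np

def decode_spectrum(spectrum, h, w):
    """Turn a complex rFFT spectrum (the learnable parameters) into an image."""
    # 1/f scaling emphasizes low frequencies so optimization starts smooth
    # (an assumption here, modeled on common Fourier-feature parameterizations)
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.rfftfreq(w)[None, :]
    freqs = np.sqrt(fx**2 + fy**2)
    scale = 1.0 / np.maximum(freqs, 1.0 / max(h, w))
    pixels = np.fft.irfft2(spectrum * scale, s=(h, w))
    # sigmoid squashes to a valid [0, 1] pixel range
    return 1.0 / (1.0 + np.exp(-pixels))

h, w = 64, 64
rng = np.random.default_rng(0)
shape = (h, w // 2 + 1)  # rFFT of a real h x w image
params = (rng.normal(size=shape) + 1j * rng.normal(size=shape)) * 0.01
img = decode_spectrum(params, h, w)
print(img.shape)  # (64, 64), values bounded in [0, 1]
```

Because the image lives in frequency space, the resolution of the decoded output is set by `s=(h, w)` rather than by a generator network, which is what lets this approach scale to arbitrary sizes.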
789 stars. No commits in the last 6 months.
Stars
789
Forks
104
Language
Python
License
MIT
Category
Diffusion
Last pushed
Feb 13, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/eps696/aphantasia"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
NVlabs/Sana
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
FoundationVision/VAR
[NeurIPS 2024 Best Paper Award][GPT beats diffusion🔥] [scaling laws in visual generation📈]...
nerdyrodent/VQGAN-CLIP
Just playing with getting VQGAN+CLIP running locally, rather than having to use colab.
huggingface/finetrainers
Scalable and memory-optimized training of diffusion models
AssemblyAI-Community/MinImagen
MinImagen: A minimal implementation of the Imagen text-to-image model