mohammadasim98/scenetok
[CVPR '26] SceneTok: A Compressed, Diffusable Token Space for 3D Scenes
Encodes multi-view 3D scenes into compressed, unstructured 1D tokens using a VA-VAE image compressor chained with a Perceiver module. The resulting token space supports novel-view synthesis and scene generation through rectified flow diffusion. Works with multiple VAE backends (VideoDCAE, Wan 2.2), uses PyTorch Lightning for distributed training on the RealEstate10K and DL3DV datasets, and uses Flash Attention 2 for efficient inference on GPUs with 24GB–96GB of memory.
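For readers unfamiliar with rectified flow, the idea can be illustrated with a toy sketch. This is a generic illustration, not SceneTok's code. The straight-line path convention (t=0 is noise, t=1 is data), the function names, and the three-value "token" vector are all assumptions made for the example. A trained model would replace the oracle velocity used here.

```python
import random


def interpolate(noise, data, t):
    """Point on the straight path x_t = (1 - t) * noise + t * data."""
    return [(1.0 - t) * n + t * d for n, d in zip(noise, data)]


def target_velocity(noise, data):
    """Regression target along the straight path: data - noise (constant in t)."""
    return [d - n for n, d in zip(noise, data)]


def euler_sample(velocity_fn, noise, num_steps=8):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = list(noise)
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    data = [0.5, -1.0, 2.0]  # hypothetical stand-in for a few token values
    noise = [rng.gauss(0.0, 1.0) for _ in data]

    # Oracle velocity (assumption): a learned network would predict this.
    def oracle(x, t):
        return target_velocity(noise, data)

    print(euler_sample(oracle, noise))
```

With the oracle velocity the path is exactly straight, so Euler integration recovers the data vector up to floating-point error regardless of the number of steps.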
Stars: 124
Forks: 4
Language: Python
License: MIT
Last pushed: Mar 18, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/mohammadasim98/scenetok"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000 requests/day.
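The same request can be made from Python. The sketch below uses only the standard library. It assumes the endpoint returns JSON and that the URL follows a category/owner/repo pattern. Both are inferred from the single curl example above and are not confirmed by any API documentation. The helper names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the URL, assuming a category/owner/repo path layout."""
    parts = [quote(p, safe="") for p in (category, owner, repo)]
    return f"{BASE_URL}/{'/'.join(parts)}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record; assumes the response body is JSON."""
    url = build_quality_url(category, owner, repo)
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Prints the URL only; call fetch_quality(...) to make the request.
    print(build_quality_url("diffusion", "mohammadasim98", "scenetok"))
```

Without a key, requests count against the 100 requests/day limit, so caching the response locally is a reasonable precaution when polling.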
Higher-rated alternatives
jayin92/Skyfall-GS
Skyfall-GS: Synthesizing Immersive 3D Urban Scenes from Satellite Imagery
ActiveVisionLab/gaussctrl
[ECCV 2024] GaussCtrl: Multi-View Consistent Text-Driven 3D Gaussian Splatting Editing
Tencent-Hunyuan/Hunyuan3D-2
High-Resolution 3D Assets Generation with Large Scale Hunyuan3D Diffusion Models.
deepseek-ai/DreamCraft3D
[ICLR 2024] Official implementation of DreamCraft3D: Hierarchical 3D Generation with...
caiyuanhao1998/Open-DiffusionGS
Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D...