mohammadasim98/scenetok

[CVPR '26] SceneTok: A Compressed, Diffusable Token Space for 3D Scenes

/ 100

Emerging

Encodes multi-view 3D scenes into compressed, unstructured 1D tokens by chaining a VA-VAE image compressor with a Perceiver module. The resulting token space supports novel-view synthesis and scene generation through rectified flow diffusion. The repository supports multiple VAE backends (VideoDCAE, Wan 2.2), uses PyTorch Lightning for distributed training on the RealEstate10K and DL3DV datasets, and uses Flash Attention 2 for efficient inference on high-memory GPUs (24GB–96GB).
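To make the "unstructured 1D tokens" idea concrete, here is a minimal NumPy sketch of a Perceiver-style cross-attention step: a fixed set of query vectors attends over the flattened per-view latents and returns a fixed number of tokens regardless of how many views go in. It is an illustration under assumptions, not the repository's implementation: the function name `perceiver_compress`, the projection matrices `w_k` and `w_v`, the random stand-in values and all shapes are hypothetical, and camera conditioning, multi-head attention and the VAE itself are omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def perceiver_compress(view_latents, queries, w_k, w_v):
    """Hypothetical single-head cross-attention compressor.

    view_latents: (V, N, D) latents for V views with N patches each
                  (stand-in for the image compressor's output).
    queries:      (T, D) query vectors, one per output token
                  (learned in a real model, random here).
    Returns:      (T, D) scene tokens; T does not depend on V or N.
    """
    num_views, num_patches, dim = view_latents.shape
    # Flatten all views into one unordered set of context vectors.
    context = view_latents.reshape(num_views * num_patches, dim)
    keys = context @ w_k
    values = context @ w_v
    # Each query distributes attention over every patch of every view.
    attn = softmax(queries @ keys.T / np.sqrt(dim), axis=-1)  # (T, V*N)
    return attn @ values


# Assumed toy sizes: 4 views, 64 patches per view, 32 channels, 16 tokens.
rng = np.random.default_rng(0)
V, N, D, T = 4, 64, 32, 16
view_latents = rng.normal(size=(V, N, D))
queries = rng.normal(size=(T, D))
w_k = rng.normal(size=(D, D)) / np.sqrt(D)
w_v = rng.normal(size=(D, D)) / np.sqrt(D)

tokens = perceiver_compress(view_latents, queries, w_k, w_v)
print(tokens.shape)  # (16, 32)
```

Because this sketch adds no positional or camera encoding, reordering the input views leaves the output tokens unchanged, which is one sense in which such a token set is unstructured. A real model would need pose information to tell the views apart.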


No Package No Dependents

Maintenance 13 / 25

Adoption 10 / 25

Maturity 11 / 25

Community 7 / 25


Stars

124

Forks

Language

Python

License

MIT

Higher-rated alternatives

jayin92/Skyfall-GS

Skyfall-GS: Synthesizing Immersive 3D Urban Scenes from Satellite Imagery

ActiveVisionLab/gaussctrl

[ECCV 2024] GaussCtrl: Multi-View Consistent Text-Driven 3D Gaussian Splatting Editing

Tencent-Hunyuan/Hunyuan3D-2

High-Resolution 3D Assets Generation with Large Scale Hunyuan3D Diffusion Models.

deepseek-ai/DreamCraft3D

[ICLR 2024] Official implementation of DreamCraft3D: Hierarchical 3D Generation with...

caiyuanhao1998/Open-DiffusionGS

Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D...
