mohammadasim98/scenetok

[CVPR '26] SceneTok: A Compressed, Diffusable Token Space for 3D Scenes

41 / 100 (Emerging)

SceneTok encodes multi-view 3D scenes into compressed, unstructured 1D tokens by chaining a VA-VAE image compressor with a Perceiver module. The resulting token space supports novel-view synthesis and scene generation through rectified flow diffusion. The code supports multiple VAE backends (VideoDCAE, Wan 2.2), uses PyTorch Lightning for distributed training on the RealEstate10K and DL3DV datasets, and uses Flash Attention 2 for efficient inference on high-memory GPUs (24 GB to 96 GB).
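To make the "rectified flow" part of the description concrete, here is a minimal, generic sketch of the idea in plain Python. It is not taken from the SceneTok code. It assumes the common convention where `x0` is noise, `x1` is data, the path between them is a straight line, and the model regresses the constant velocity `x1 - x0`. The function names and the toy token vector are hypothetical, and an oracle velocity stands in for a trained network.

```python
import random


def interpolate(x0, x1, t):
    """Point on the straight path from x0 (noise) to x1 (data) at time t in [0, 1]."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def velocity_target(x0, x1):
    """Regression target for the velocity model; constant along a straight path."""
    return [b - a for a, b in zip(x0, x1)]


def euler_sample(x0, velocity_fn, num_steps=8):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy check: with the exact (oracle) velocity, sampling lands on the data point.
rng = random.Random(0)
data_tokens = [0.5, -1.0, 2.0, 0.25]  # hypothetical stand-in for a flattened token vector
noise = [rng.gauss(0.0, 1.0) for _ in data_tokens]


def oracle_velocity(x, t):
    # A trained network would predict this from (x, t); here it is given exactly.
    return velocity_target(noise, data_tokens)


sample = euler_sample(noise, oracle_velocity, num_steps=8)
```

In a real system the velocity would come from a network trained on pairs `(interpolate(x0, x1, t), velocity_target(x0, x1))`, and the number of Euler steps trades sampling speed against accuracy.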


No package, no dependents
Maintenance 13 / 25
Adoption 10 / 25
Maturity 11 / 25
Community 7 / 25


Stars: 124
Forks: 4
Language: Python
License: MIT
Last pushed: Mar 18, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/mohammadasim98/scenetok"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
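The same request can be made from Python using only the standard library. The sketch below rests on two assumptions not stated on this page: that the URL path follows a category/owner/repo pattern (inferred from the single curl example), and that the endpoint returns JSON. The helper names `quality_url` and `fetch_quality` are hypothetical, and the response schema is not documented here, so the code does not rely on any particular field.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL; the category/owner/repo layout is an assumption."""
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category, owner, repo, timeout=10.0):
    """GET the endpoint and parse the body as JSON (assumed response format)."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the URL needs no network access and matches the curl example.
url = quality_url("diffusion", "mohammadasim98", "scenetok")
```

Calling `fetch_quality("diffusion", "mohammadasim98", "scenetok")` performs the actual request. With the keyless limit of 100 requests/day, caching the result locally is a sensible precaution for scripts that run often.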