Santosh-Gupta/SpeedTorch
Library for faster pinned CPU <-> GPU transfer in PyTorch
Leverages CuPy pinned CPU tensors for 3.1x faster CPU→GPU and 410x faster GPU→CPU transfers compared to PyTorch pinned tensors, with speed scaling based on data size and CPU core count. Includes factory classes for model/optimizer management with flexible tensor placement (GPU CUDA, pinned CPU, or standard CPU) and specialized support for sparse embedding training by hosting idle parameters on CPU RAM. Integrates directly with PyTorch's optimizer ecosystem, enabling previously incompatible optimizers (Adam, RMSprop, AdamW) for sparse gradient operations.
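The speedup described above rests on page-locked (pinned) host memory, which allows faster and asynchronous host-to-device copies. A minimal sketch of that underlying idea, using only standard PyTorch APIs (`pin_memory`, `non_blocking`) rather than SpeedTorch's own factory classes, with a CPU fallback when no GPU is present:

```python
# Sketch of the pinned-memory transfer idea SpeedTorch builds on.
# Uses plain PyTorch, not SpeedTorch's API; CPU fallback if CUDA is absent.
import torch

def to_gpu_via_pinned(t: torch.Tensor) -> torch.Tensor:
    """Copy a CPU tensor to the GPU through pinned staging memory."""
    if torch.cuda.is_available():
        # Pinning page-locks the host buffer, enabling a fast async copy.
        return t.pin_memory().to("cuda", non_blocking=True)
    return t  # no GPU available: return the CPU tensor unchanged

x = torch.arange(4, dtype=torch.float32)
y = to_gpu_via_pinned(x)
print(y.sum().item())
```

SpeedTorch's claimed gain comes from routing such staging buffers through CuPy's pinned-memory allocator instead of PyTorch's, but the transfer pattern is the same.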
683 stars and 103 monthly downloads. No commits in the last 6 months. Available on PyPI.
Stars
683
Forks
40
Language
Python
License
MIT
Category
Last pushed
Feb 21, 2020
Monthly downloads
103
Commits (30d)
0
Dependencies
2
Related tools
MinishLab/model2vec
Fast State-of-the-Art Static Embeddings
AnswerDotAI/ModernBERT
Bringing BERT into modernity via both architecture changes and scaling
twang2218/vocab-coverage
Analysis of Chinese-language cognitive ability in language models
Embedding/Chinese-Word-Vectors
100+ Chinese Word Vectors: over a hundred pretrained Chinese word embeddings
tensorflow/hub
A library for transfer learning by reusing parts of TensorFlow models.