Santosh-Gupta/SpeedTorch
Library for faster pinned CPU <-> GPU transfer in PyTorch
Leverages CuPy pinned CPU tensors for 3.1x faster CPU→GPU and 410x faster GPU→CPU transfers compared to PyTorch pinned tensors, with speed scaling based on data size and CPU core count. Includes factory classes for model/optimizer management with flexible tensor placement (GPU CUDA, pinned CPU, or standard CPU) and specialized support for sparse embedding training by hosting idle parameters on CPU RAM. Integrates directly with PyTorch's optimizer ecosystem, enabling previously incompatible optimizers (Adam, RMSprop, AdamW) for sparse gradient operations.
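The speedup described above rests on page-locked (pinned) host memory, which allows faster and asynchronous host-to-device copies. A minimal sketch of that underlying idea, using only standard PyTorch APIs (`pin_memory`, `non_blocking`) rather than SpeedTorch's own factory classes, with a CPU fallback when no GPU is present:

```python
# Sketch of the pinned-memory transfer idea SpeedTorch builds on.
# Uses plain PyTorch, not SpeedTorch's API; CPU fallback if CUDA is absent.
import torch

def to_gpu_via_pinned(t: torch.Tensor) -> torch.Tensor:
    """Copy a CPU tensor to the GPU through pinned staging memory."""
    if torch.cuda.is_available():
        # Pinning page-locks the host buffer, enabling a fast async copy.
        return t.pin_memory().to("cuda", non_blocking=True)
    return t  # no GPU available: return the CPU tensor unchanged

x = torch.arange(4, dtype=torch.float32)
y = to_gpu_via_pinned(x)
print(y.sum().item())
```

SpeedTorch's claimed gain comes from routing such staging buffers through CuPy's pinned-memory allocator instead of PyTorch's, but the transfer pattern is the same.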
683 stars and 103 monthly downloads. No commits in the last 6 months. Available on PyPI.
Stars
683
Forks
40
Language
Python
License
MIT
Category
Last pushed
Feb 21, 2020
Monthly downloads
103
Commits (30d)
0
Dependencies
2
Related tools
MinishLab/model2vec
Fast State-of-the-Art Static Embeddings
AnswerDotAI/ModernBERT
Bringing BERT into modernity via both architecture changes and scaling
twang2218/vocab-coverage
Analysis of Chinese-language cognitive ability in language models
Embedding/Chinese-Word-Vectors
100+ Chinese Word Vectors: over a hundred pretrained Chinese word embeddings
tensorflow/hub
A library for transfer learning by reusing parts of TensorFlow models.