nomic-ai/contrastors
Train Models Contrastively in PyTorch
Leverages Flash Attention custom kernels and GradCache to enable memory-efficient contrastive training at scale, supporting multi-GPU distributed training with large batch sizes. Integrates with Hugging Face models (BERT, Pythia, ViT via timm) and implements advanced techniques including Matryoshka Representation Learning, CLIP/LiT-style objectives, and MLM pretraining. Includes streaming data ingestion from cloud storage (Cloudflare R2) with byte-offset indexing for datasets exceeding available memory, as demonstrated by production embedding models like Nomic Embed.
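At its core, contrastive training of the kind described above optimizes a symmetric InfoNCE objective over paired embeddings with in-batch negatives. Below is a minimal sketch in plain PyTorch; the function name, temperature, and dimensions are illustrative assumptions, not the repository's actual API.

# Minimal sketch of a symmetric InfoNCE contrastive step in plain PyTorch.
# All names and hyperparameters here are illustrative assumptions, not
# contrastors' actual API.
import torch
import torch.nn.functional as F

def contrastive_loss(query_emb: torch.Tensor,
                     doc_emb: torch.Tensor,
                     temperature: float = 0.05) -> torch.Tensor:
    # L2-normalize so dot products become cosine similarities.
    q = F.normalize(query_emb, dim=-1)
    d = F.normalize(doc_emb, dim=-1)
    # Similarity matrix: matched pairs on the diagonal,
    # in-batch negatives off the diagonal.
    logits = q @ d.T / temperature
    labels = torch.arange(logits.size(0), device=logits.device)
    # Symmetric loss: queries-to-docs plus docs-to-queries.
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.T, labels)) / 2

# Usage with random tensors standing in for encoder outputs:
q = torch.randn(32, 768, requires_grad=True)
d = torch.randn(32, 768, requires_grad=True)
loss = contrastive_loss(q, d)
loss.backward()

Techniques like GradCache decouple the effective contrastive batch size from what fits in GPU memory by caching embedding gradients and re-running the encoder in chunks; the loss itself stays as above.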
777 stars. No commits in the last 6 months.
Stars: 777
Forks: 65
Language: Python
License: Apache-2.0
Category: embeddings
Last pushed: Mar 26, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/nomic-ai/contrastors"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
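The same data can be fetched from Python; here is a minimal sketch using the requests library, assuming the endpoint returns JSON (the response schema is not documented on this page):

# Minimal sketch of querying the quality API from Python.
# Assumes a JSON response; the schema is not shown on this page.
import requests

url = "https://pt-edge.onrender.com/api/v1/quality/embeddings/nomic-ai/contrastors"
resp = requests.get(url, timeout=10)
resp.raise_for_status()  # surface HTTP errors (e.g., rate limiting)
print(resp.json())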
Higher-rated alternatives
mims-harvard/ClinVec
ClinVec: Unified Embeddings of Clinical Codes Enable Knowledge-Grounded AI in Medicine
NYUMedML/DeepEHR
Chronic Disease Prediction Using Medical Notes
mims-harvard/SHEPHERD
SHEPHERD: Few shot learning for phenotype-driven diagnosis of patients with rare genetic diseases
JohnGiorgi/DeCLUTR
The corresponding code from our paper "DeCLUTR: Deep Contrastive Learning for Unsupervised...
biocentral/biocentral_server
Compute functionality for biocentral.