nomic-ai/contrastors
Train Models Contrastively in PyTorch
Leverages Flash Attention custom kernels and GradCache to enable memory-efficient contrastive training at scale, supporting multi-GPU distributed training with large batch sizes. Integrates with Hugging Face models (BERT, Pythia, ViT via timm) and implements advanced techniques including Matryoshka Representation Learning, CLIP/LiT-style objectives, and MLM pretraining. Includes streaming data ingestion from cloud storage (Cloudflare R2) with byte-offset indexing for datasets exceeding available memory, as demonstrated by production embedding models like Nomic Embed.
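At its core, contrastive training of the kind described above optimizes a symmetric InfoNCE objective over paired embeddings with in-batch negatives. Below is a minimal sketch in plain PyTorch; the function name, temperature, and dimensions are illustrative assumptions, not the repository's actual API.

# Minimal sketch of a symmetric InfoNCE contrastive step in plain PyTorch.
# All names and hyperparameters here are illustrative assumptions, not
# contrastors' actual API.
import torch
import torch.nn.functional as F

def contrastive_loss(query_emb: torch.Tensor,
                     doc_emb: torch.Tensor,
                     temperature: float = 0.05) -> torch.Tensor:
    # L2-normalize so dot products become cosine similarities.
    q = F.normalize(query_emb, dim=-1)
    d = F.normalize(doc_emb, dim=-1)
    # Similarity matrix: matched pairs on the diagonal,
    # in-batch negatives off the diagonal.
    logits = q @ d.T / temperature
    labels = torch.arange(logits.size(0), device=logits.device)
    # Symmetric loss: queries-to-docs plus docs-to-queries.
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.T, labels)) / 2

# Usage with random tensors standing in for encoder outputs:
q = torch.randn(32, 768, requires_grad=True)
d = torch.randn(32, 768, requires_grad=True)
loss = contrastive_loss(q, d)
loss.backward()

Techniques like GradCache decouple the effective contrastive batch size from what fits in GPU memory by caching embedding gradients and re-running the encoder in chunks; the loss itself stays as above.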
777 stars. No commits in the last 6 months.
Stars: 777
Forks: 65
Language: Python
License: Apache-2.0
Category: embeddings
Last pushed: Mar 26, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/nomic-ai/contrastors"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
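The same data can be fetched from Python; here is a minimal sketch using the requests library, assuming the endpoint returns JSON (the response schema is not documented on this page):

# Minimal sketch of querying the quality API from Python.
# Assumes a JSON response; the schema is not shown on this page.
import requests

url = "https://pt-edge.onrender.com/api/v1/quality/embeddings/nomic-ai/contrastors"
resp = requests.get(url, timeout=10)
resp.raise_for_status()  # surface HTTP errors (e.g., rate limiting)
print(resp.json())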
Higher-rated alternatives
mims-harvard/ClinVec
ClinVec: Unified Embeddings of Clinical Codes Enable Knowledge-Grounded AI in Medicine
NYUMedML/DeepEHR
Chronic Disease Prediction Using Medical Notes
mims-harvard/SHEPHERD
SHEPHERD: Few shot learning for phenotype-driven diagnosis of patients with rare genetic diseases
JohnGiorgi/DeCLUTR
The corresponding code from our paper "DeCLUTR: Deep Contrastive Learning for Unsupervised...
biocentral/biocentral_server
Compute functionality for biocentral.