sacdallago/bio_embeddings
Get protein embeddings from protein sequences
Supports multiple state-of-the-art language models (SeqVec, ProtTrans, ESM, UniRep) with a unified Python interface, enabling transfer learning for downstream tasks like structure/function prediction. The pipeline generates per-amino-acid and per-sequence embeddings, applies dimensionality reduction (UMAP/t-SNE), and enables both supervised and unsupervised annotation extraction. Includes a distributed webserver API and Docker deployment for reproducible, scalable workflows with GPU acceleration and automatic out-of-memory handling.
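To make the difference between per-amino-acid and per-sequence embeddings concrete, here is a minimal sketch in plain Python. It assumes the per-sequence vector is obtained by averaging the per-residue vectors (mean pooling), a common reduction; whether each supported model uses exactly this reduction is an assumption, and `mean_pool` is a hypothetical helper, not part of the bio_embeddings API.

```python
from typing import List


def mean_pool(per_residue: List[List[float]]) -> List[float]:
    """Average per-residue vectors (L x D) into one per-sequence vector (D).

    Hypothetical helper for illustration; mean pooling is assumed here and
    is not confirmed as the reduction used by every model in the library.
    """
    if not per_residue:
        raise ValueError("need at least one residue embedding")
    dim = len(per_residue[0])
    if any(len(vec) != dim for vec in per_residue):
        raise ValueError("all residue embeddings must have the same dimension")
    length = len(per_residue)
    # Average each embedding dimension across all residues.
    return [sum(vec[i] for vec in per_residue) / length for i in range(dim)]


# Toy per-residue embeddings: a 3-residue sequence with 2 dimensions each.
toy_per_residue = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
per_sequence = mean_pool(toy_per_residue)
```

A real per-residue embedding has one row per amino acid and hundreds or thousands of columns; the fixed-length per-sequence vector is what makes proteins of different lengths directly comparable in downstream tasks.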
507 stars. No commits in the last 6 months.
Stars: 507
Forks: 70
Language: HTML
License: MIT
Category: (not shown)
Last pushed: Apr 28, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/sacdallago/bio_embeddings"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
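The same request can be sketched in Python using only the standard library. The URL layout (`/quality/<category>/<owner>/<repo>`) is inferred from the single curl example above and is an assumption, as are the helper names `build_quality_url` and `fetch_quality`. The response format is not documented here, so the sketch returns the raw body as text instead of parsing it.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base path taken from the curl example; the segment layout is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, percent-encoding each path segment."""
    segments = (category, owner, repo)
    return BASE_URL + "/" + "/".join(quote(s, safe="") for s in segments)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0) -> str:
    """Fetch the raw response body as text (format unknown, so not parsed)."""
    with urlopen(build_quality_url(category, owner, repo), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Building the URL needs no network access; fetch_quality() performs the request.
url = build_quality_url("embeddings", "sacdallago", "bio_embeddings")
```

Calling `fetch_quality("embeddings", "sacdallago", "bio_embeddings")` issues the request; with the keyless limit of 100 requests/day, caching the result locally is advisable for repeated use.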
Higher-rated alternatives
BernhoferM/TMbed
Transmembrane proteins predicted through Language Model embeddings
tbepler/prose
Multi-task and masked language model-based protein sequence embedding models.
DeepRank/DeepRank-GNN-esm
Graph Network for protein-protein interface including language model features
Rostlab/VESPA
VESPA is a simple, yet powerful Single Amino Acid Variant (SAV) effect predictor based on...