pnpnpn/dna2vec
dna2vec: Consistent vector representations of variable-length k-mers
Trains embeddings for variable-length k-mers (k=3 to k=8) simultaneously using a multi-k approach, capturing sequence patterns at multiple scales. Provides Word2Vec-style distributed representations through a `MultiKModel` interface that supports vector lookup and cosine similarity queries. Includes models pretrained on the human genome (hg38) and YAML-based configuration for training on custom genomic FASTA datasets.
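The description rests on two ideas: cutting a sequence into k-mers of mixed length, and comparing embedding vectors by cosine similarity. Here is a minimal standard-library sketch of both. It is not dna2vec's own code. The function names `split_variable_kmers` and `cosine_similarity`, the uniform random choice of k, the non-overlapping fragmentation, and the toy vectors are all assumptions made for illustration.

```python
import math
import random


def split_variable_kmers(seq, k_low=3, k_high=8, seed=0):
    """Cut seq into consecutive, non-overlapping k-mers.

    Hypothetical helper: each k is drawn uniformly from [k_low, k_high].
    The real library may fragment sequences differently.
    """
    rng = random.Random(seed)
    kmers = []
    pos = 0
    while pos < len(seq):
        k = rng.randint(k_low, k_high)
        fragment = seq[pos:pos + k]
        # A trailing fragment shorter than k_low is dropped.
        if len(fragment) >= k_low:
            kmers.append(fragment)
        pos += k
    return kmers


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy 3-dimensional "embeddings" (made-up values, not from a pretrained model).
toy_vectors = {
    "AAA": [1.0, 0.0, 1.0],
    "GCT": [0.0, 1.0, 1.0],
}

if __name__ == "__main__":
    print(split_variable_kmers("ACGTTGCAAGCTTAGGCTAACGT"))
    print(cosine_similarity(toy_vectors["AAA"], toy_vectors["GCT"]))
```

In a trained model, the same cosine comparison would be applied to learned vectors, so k-mers of different lengths can be compared in one shared space.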
192 stars. No commits in the last 6 months.
Stars: 192
Forks: 62
Language: Python
License: MIT
Category:
Last pushed: Jun 21, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/pnpnpn/dna2vec"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
Higher-rated alternatives
shibing624/text2vec
text2vec, text to vector....
ddangelov/Top2Vec
Top2Vec learns jointly embedded topic, document and word vectors.
predict-idlab/pyRDF2Vec
Python Implementation and Extension of RDF2Vec
IntuitionEngineeringTeam/chars2vec
Character-based word embeddings model based on RNN for handling real world texts
IITH-Compilers/IR2Vec
Implementation of IR2Vec, LLVM IR Based Scalable Program Embeddings