pnpnpn/dna2vec

dna2vec: Consistent vector representations of variable-length k-mers

48 / 100 (Emerging)

Employs a multi-k approach that trains embeddings for variable k-mer lengths (k=3 to k=8) simultaneously, capturing sequence patterns at multiple scales. Implements Word2Vec-style distributed representations, exposed through a `MultiKModel` interface that supports vector lookup and cosine similarity queries. Includes models pretrained on the human genome (hg38) and supports YAML-based configuration for training on custom genomic FASTA datasets.
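To make the multi-k idea concrete, here is a minimal standard-library sketch. It is not the repository's code and does not use `MultiKModel`: the helper names `variable_kmers` and `cosine_similarity`, the non-overlapping fragmentation strategy, and the toy vectors are all assumptions chosen for illustration. It shows one plausible way to cut a sequence into k-mers of mixed length (3 to 8) and how a cosine similarity query between two embedding vectors works.

```python
import math
import random


def variable_kmers(seq, k_low=3, k_high=8, rng=None):
    """Cut seq into consecutive, non-overlapping k-mers of mixed length.

    Hypothetical helper: each k is drawn uniformly from [k_low, k_high].
    A trailing piece shorter than k_low is dropped.
    """
    rng = rng or random.Random(0)  # fixed seed keeps the sketch reproducible
    kmers = []
    i = 0
    while i < len(seq):
        k = rng.randint(k_low, k_high)
        piece = seq[i:i + k]  # may be shorter than k at the end of seq
        if len(piece) < k_low:
            break
        kmers.append(piece)
        i += k
    return kmers


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy usage: the vectors below are made up, not taken from a trained model.
tokens = variable_kmers("ACGTACGTTAGCCGATAGCTAGGCTA")
toy_vectors = {"AAA": [0.9, 0.1, 0.0], "AAAT": [0.8, 0.2, 0.1]}
similarity = cosine_similarity(toy_vectors["AAA"], toy_vectors["AAAT"])
```

Because k-mers of every length share one vector space, a 3-mer can be compared directly with a 4-mer, which is what the last line does with the toy vectors.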

192 stars. No commits in the last 6 months.

Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 22 / 25


Stars: 192
Forks: 62
Language: Python
License: MIT
Last pushed: Jun 21, 2022
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/pnpnpn/dna2vec"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
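The same request can be issued from Python with only the standard library. This is a sketch under stated assumptions: the helper names `build_request` and `fetch_text` are hypothetical, and because the response format is not documented here, the body is returned as raw text rather than parsed.

```python
import urllib.request

# URL copied verbatim from the curl example.
API_URL = "https://pt-edge.onrender.com/api/v1/quality/embeddings/pnpnpn/dna2vec"


def build_request(url=API_URL):
    """Build a plain GET request; no API key is attached (keyless tier)."""
    return urllib.request.Request(url)


def fetch_text(url=API_URL, timeout=10):
    """Send the request and return the response body as text.

    The body is not parsed because its format is an unknown here.
    """
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


# Usage (performs a network call, so it is left commented out):
# print(fetch_text())
```

Each call to `fetch_text` counts against the daily request limit, so cache the result if you poll it from a script.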