LLM Domain Datasets Embedding Tools

There are 6 LLM domain dataset tools tracked. The highest-rated is itrummer/thalamusdb, which scores 29/100 and has 114 stars.

Get all 6 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=embeddings&subcategory=llm-domain-datasets&limit=20"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
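The same request can be made from a script. The following is a minimal sketch in Python using only the standard library. It assumes the endpoint returns a JSON body, as the "as JSON" label suggests. The response schema is not documented on this page, so the sketch returns the decoded value unchanged rather than guessing at field names. The helper names `build_url` and `fetch_projects` are hypothetical, and how an API key is supplied is not covered because it is not described here.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl command on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="embeddings", subcategory="llm-domain-datasets", limit=20):
    """Build the query URL with the same parameters as the curl command."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=10.0):
    """Send a GET request and decode the body as JSON.

    Assumption: the body is UTF-8 encoded JSON. The structure of the
    decoded value is unknown here, so it is returned as-is.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_projects(build_url())` performs the network request and counts against the daily request limit. Inspect the returned value (for example with `json.dumps(data, indent=2)`) before relying on any particular field.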

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | itrummer/thalamusdb | ThalamusDB: semantic query processing on multimodal data | 29 | Experimental |
| 2 | texttron/hyde | HyDE: Precise Zero-Shot Dense Retrieval without Relevance Labels | 26 | Experimental |
| 3 | ArslanKAS/Large-Language-Models-with-Semantic-Search | Explore from keyword search to dense retrieval and reranking, which injects... | 24 | Experimental |
| 4 | Ahren09/SciEvo | A longitudinal dataset for academic literature, including papers, metadata,... | 22 | Experimental |
| 5 | jzhoubu/vsearch | An Extensible Framework for Retrieval-Augmented LLM Applications: Learning... | 19 | Experimental |
| 6 | KRR-Oxford/LM-ontology-concept-placement | Language Model based ontology concept placement | 14 | Experimental |