0xku/information-retrieval
Neural information retrieval / Semantic search / Bi-encoders
Archived
Comprehensive tutorial collection covering the full IR pipeline: from classical inverted index methods through modern dense retrieval architectures including bi-encoders, cross-encoders, and multilingual variants. Covers evaluation metrics (MRR, MAP, nDCG), dense representation learning from LSA to transformer fine-tuning, and unsupervised training approaches (TSDAE, SimCSE, GPL) that reduce labeled data requirements. Integrates BERT and Sentence-BERT frameworks with approximate nearest neighbor indexing techniques for scalable vector search across millions of documents.
174 stars. No commits in the last 6 months.
Stars: 174
Forks: 21
Language: Jupyter Notebook
License: —
Category:
Last pushed: Aug 05, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/0xku/information-retrieval"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
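The same request can be sketched in Python using only the standard library. This is a minimal sketch, not a documented client: the base URL is copied from the curl example above, while the helper names `quality_url` and `fetch_quality` are hypothetical, reading the path as category/owner/repository is an assumption based on that one URL, and the page does not state the response format, so parsing it as JSON is also an assumption.

```python
import json
import urllib.request

# Base URL taken from the curl example on this page.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository.

    Assumption: the path is <category>/<owner>/<repo>, inferred from the
    single example URL shown on this page.
    """
    return f"{API_BASE}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Send a plain GET request without an API key and decode the body.

    Assumption: the endpoint returns JSON; the page does not say so.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("embeddings", "0xku", "information-retrieval")` would issue the same request as the curl command. Each call counts against the 100 requests/day keyless limit mentioned above, so a script that checks many repositories may want to cache results locally.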
Higher-rated alternatives
aryn-ai/sycamore
🍁 Sycamore is an LLM-powered search and analytics platform for unstructured data.
deepset-ai/haystack-tutorials
Here you can find all the Tutorials for Haystack 📓
MaartenGr/PolyFuzz
Fuzzy string matching, grouping, and evaluation.
unum-cloud/USearch
Fast Open-Source Search & Clustering engine × for Vectors & Arbitrary Objects × in C++, C,...
pinecone-io/pinecone-datasets
An open-source dataset library for pre-embedded dataset: create your own data catalog, or use...