dcarpintero/wikisearch
Multilingual semantic search with reranking over a prebuilt, vectorized dataset of 10 million Wikipedia documents. It supports dense retrieval, keyword search, and hybrid search.
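To make the idea of hybrid search concrete, here is a minimal sketch of one common way to blend keyword and dense scores: normalize each score set to [0, 1], then take a weighted sum. The function names (`min_max`, `hybrid_rank`) and the `alpha` parameter are illustrative assumptions, not code from the repository, and this is not necessarily how Weaviate fuses scores internally.

```python
def min_max(scores):
    """Rescale a {doc_id: score} map to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal, so treat every document as a top hit.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(keyword_scores, dense_scores, alpha=0.5):
    """Blend keyword and dense scores (hypothetical helper).

    alpha=1.0 uses only the dense scores, alpha=0.0 only the keyword scores.
    A document missing from one result set contributes 0.0 for that side.
    """
    kw = min_max(keyword_scores)
    dn = min_max(dense_scores)
    fused = {
        doc: alpha * dn.get(doc, 0.0) + (1 - alpha) * kw.get(doc, 0.0)
        for doc in set(kw) | set(dn)
    }
    # Highest fused score first.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)
```

Calling `hybrid_rank(bm25_scores, vector_scores, alpha=0.5)` with two `{doc_id: score}` dictionaries returns `(doc_id, fused_score)` pairs in descending order. Normalizing first matters because BM25 scores and vector similarities live on different scales.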
Implements a three-stage retrieval pipeline combining Weaviate vector database queries (BM25, dense, hybrid) with Cohere's rerank and generation APIs to progressively refine search results and synthesize answers. The architecture supports multiple languages through language-filtered embeddings and includes exponential backoff retry logic for API resilience, deployed as a Streamlit web application.
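The exponential backoff retry logic mentioned above could look roughly like the following sketch. It is an assumption about the general technique, not the repository's actual implementation: `retry_with_backoff` and its parameters are hypothetical names, and the injectable `sleep` argument exists only to make the behavior easy to test.

```python
import random
import time
from functools import wraps


def retry_with_backoff(max_attempts=5, base_delay=1.0, max_delay=30.0,
                       exceptions=(Exception,), sleep=time.sleep):
    """Retry the wrapped call, doubling the wait after each failure."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts - 1:
                        raise  # out of attempts: surface the last error
                    # Wait 1x, 2x, 4x, ... the base delay, capped at max_delay.
                    delay = min(base_delay * 2 ** attempt, max_delay)
                    # Add up to 10% random jitter so clients do not retry in lockstep.
                    sleep(delay + random.uniform(0, delay * 0.1))
        return wrapper
    return decorator
```

A call to an external API (for example a rerank or generation request) would be wrapped with `@retry_with_backoff(exceptions=(ConnectionError,))` so that transient failures are retried while other errors propagate immediately.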
No commits in the last 6 months.
Stars: 15
Forks: 1
Language: Python
License: MIT
Category
Last pushed: Nov 07, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/dcarpintero/wikisearch"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
Higher-rated alternatives
AmenRa/retriv
A Python Search Engine for Humans 🥸
raphaelsty/cherche
Neural Search
gnes-ai/gnes
GNES is Generic Neural Elastic Search, a cloud-native semantic search system based on deep...
AKSW/sante
The Ontology, Dataset and Knowledge Search Engine
eswar-7116/wiki-semantic-crawler
A Semantic A* Pathfinding agent that navigates Wikipedia using high-dimensional vector space....