DiceTechJobs/VectorsInSearch
Dice.com repo to accompany the dice.com 'Vectors in Search' talk by Simon Hughes from the Activate 2018 search conference, and the 'Searching with Vectors' talk from Haystack 2019 (US). Builds upon my conceptual search and semantic search work from 2015.
Implements three approximate nearest neighbor search algorithms (LSH with Sim Hash, K-Means Tree, and Vector Thresholding) that encode dense vectors as inverted index queries for efficient large-scale similarity search. The approach generates boolean OR queries optimized by Lucene's Block Max WAND algorithm, with Python utilities for vector indexing and a custom Solr similarity plugin to score results based on vector proximity. Targets Solr 7.5+ and provides both the algorithmic implementations and index configuration needed to integrate semantic search into existing search infrastructure.
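To make the idea of encoding a dense vector as an inverted index query concrete, here is a minimal sketch of the SimHash flavor of LSH. It is an illustration only, not the repository's code: the function names, the `position_bit` token format, and the `vector_hash` field name are all assumptions made for this example. Each random hyperplane contributes one token recording which side of the plane the vector falls on, and the query is simply an OR over those tokens, so documents sharing more tokens with the query tend to be closer in angle.

```python
import numpy as np


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes (one per row) in a dim-dimensional space."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((n_bits, dim))


def simhash_tokens(vector, hyperplanes):
    """Encode a dense vector as one token per hyperplane, e.g. '3_1'.

    The token format (bit position, underscore, bit value) is a
    hypothetical choice so each bit can be indexed as an ordinary term.
    """
    bits = (hyperplanes @ np.asarray(vector, dtype=float)) >= 0
    return [f"{i}_{int(b)}" for i, b in enumerate(bits)]


def or_query(tokens, field="vector_hash"):
    """Build a boolean OR query string; the field name is hypothetical."""
    return f"{field}:(" + " OR ".join(tokens) + ")"


def shared_tokens(tokens_a, tokens_b):
    """Count positions where two token lists agree (more agreement, smaller angle)."""
    return sum(a == b for a, b in zip(tokens_a, tokens_b))


# Example: hash a document vector and turn a query vector into an OR query.
planes = make_hyperplanes(dim=4, n_bits=8)
doc_tokens = simhash_tokens([0.2, -0.1, 0.7, 0.3], planes)
query_tokens = simhash_tokens([0.25, -0.05, 0.6, 0.35], planes)
print(or_query(query_tokens))
print(shared_tokens(doc_tokens, query_tokens), "of", len(planes), "tokens match")
```

In a real index the document tokens would be stored in a multi-valued field at index time, and the number of matching OR clauses would act as a rough proxy for cosine similarity before any exact re-scoring.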
No commits in the last 6 months.
Stars: 86
Forks: 15
Language: Python
License: Apache-2.0
Category:
Last pushed: May 12, 2021
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/DiceTechJobs/VectorsInSearch"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
Related tools
unmonoqueteclea/voilib
🎧 Podcast Search Engine. Try it now for free or run your own instance.
damiandelmas/flexvec
Composable vector search with SQL
IuriiD/pinecone-faiss-pgvector
Comparing vector DBs Pinecone, FAISS & pgvector in combination with OpenAI Embeddings for semantic search
omni-front/ConstructIQ
Semantic search API for building permits using vector embeddings, FastAPI & Pinecone
RubenGarrod/ClinicCloud
Advanced semantic search system for medical and scientific documentation using BioBERT and pgvector.