DiceTechJobs/ConceptualSearch

Train a Word2Vec or LSA model and implement conceptual (semantic) search in Solr/Lucene. By Simon Hughes, Dice.com (Dice Tech Jobs).

Score: 48 / 100 (Emerging)

Trains Word2Vec or LSA models on domain-specific documents using gensim, then generates Solr synonym files (with optional payload weighting) or vector clusters so that queries match on meaning rather than exact keywords. The pipeline covers preprocessing, optional keyphrase extraction, model training, and a choice of output formats: synonyms for query expansion, or clustered vectors for field-based conceptual search that needs no custom plugins. It integrates with Apache Solr through synonym filters, with optional custom plugins (PayloadEdismax, PayloadAwareDefaultSimilarity) for weighted semantic retrieval, and works with any search engine that supports synonym files.
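The synonym-generation step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's own code: the toy `VECTORS` table stands in for a trained model (with gensim, neighbours would more likely come from something like `model.wv.most_similar(term, topn=k)`), the helper names `nearest_terms` and `synonym_lines` are hypothetical, and the `term|weight` payload notation is an assumed convention based on the `|` delimiter commonly used for Solr payloads.

```python
import math

# Toy vectors standing in for a trained Word2Vec model (hypothetical values).
VECTORS = {
    "java": [0.9, 0.1, 0.0],
    "j2ee": [0.8, 0.2, 0.1],
    "spring": [0.7, 0.3, 0.0],
    "nurse": [0.0, 0.1, 0.9],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_terms(term, vectors, topn=2, min_sim=0.5):
    """Return up to `topn` (term, similarity) pairs above `min_sim`, best first."""
    scored = [
        (other, cosine(vectors[term], vec))
        for other, vec in vectors.items()
        if other != term
    ]
    scored = [(t, s) for t, s in scored if s >= min_sim]
    # Sort by descending similarity, breaking ties alphabetically.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))[:topn]


def synonym_lines(vectors, topn=2, min_sim=0.5, with_payloads=False):
    """Build explicit-mapping lines ("term => a, b") for a synonym file.

    With `with_payloads=True`, each right-hand term carries an assumed
    "term|weight" payload, where the weight is the cosine similarity.
    """
    lines = []
    for term in sorted(vectors):
        neighbours = nearest_terms(term, vectors, topn, min_sim)
        if not neighbours:
            continue  # no sufficiently similar terms: emit no mapping
        if with_payloads:
            rhs = [f"{term}|1.0"] + [f"{t}|{s:.2f}" for t, s in neighbours]
        else:
            rhs = [term] + [t for t, _ in neighbours]
        lines.append(f"{term} => {', '.join(rhs)}")
    return lines


if __name__ == "__main__":
    print("\n".join(synonym_lines(VECTORS)))
    print("\n".join(synonym_lines(VECTORS, with_payloads=True)))
```

Keeping the original term on the right-hand side means a query still matches documents containing the literal keyword, while the `min_sim` cutoff keeps unrelated terms (here, `nurse`) out of the expansion. The threshold and neighbour count are arbitrary choices for the sketch and would need tuning on real data.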

259 stars. No commits in the last 6 months.

Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 22 / 25


Stars: 259
Forks: 54
Language: Jupyter Notebook
License: Apache-2.0
Last pushed: Apr 26, 2019
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/DiceTechJobs/ConceptualSearch"
