rustyneuron01/Conversation-Genome-Project
Structured data & semantic tagging pipeline. Turns raw text (conversations, web pages, surveys) into tagged data for AI and search. Coordinators set ground truth; workers run LLM inference on windows. Scoring via cosine similarity. Python, FastAPI, OpenAI/Anthropic/OpenRouter, embeddings, Docker.
Implements a distributed coordinator-worker architecture: coordinators establish semantic ground truth from full-document embeddings, then score worker outputs with multi-method cosine similarity (weighted mean/median/max of the top-3 matches, with tag-overlap penalties). Supports pluggable LLM backends (OpenAI, Anthropic, OpenRouter, Groq, Chutes) with PyTorch embeddings. Integrates with Weights & Biases for experiment tracking and with custom FastAPI conversation servers for private data pipelines.
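The scoring step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names, the weights (0.4/0.3/0.3), the penalty size, and the reading of "tag overlap penalty" as a deduction for worker tags absent from the ground-truth tag set are all assumptions made for the example.

```python
import math
from statistics import mean, median


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_worker(worker, truth, weights=(0.4, 0.3, 0.3), penalty=0.2, top_k=3):
    """Score a worker's tags against ground truth (hypothetical scheme).

    worker, truth: dicts mapping tag text -> embedding vector.
    weights: assumed weights for the (mean, median, max) of the top matches.
    penalty: assumed deduction scaled by the share of worker tags
             that do not appear in the ground-truth tag set.
    """
    if not worker or not truth:
        return 0.0
    # Best ground-truth match for each worker tag, keeping the top_k highest.
    best = sorted(
        (max(cosine_similarity(vec, t) for t in truth.values())
         for vec in worker.values()),
        reverse=True,
    )[:top_k]
    w_mean, w_median, w_max = weights
    combined = w_mean * mean(best) + w_median * median(best) + w_max * max(best)
    # Assumed penalty: fraction of worker tags with no literal overlap.
    missing = sum(1 for tag in worker if tag not in truth) / len(worker)
    return max(0.0, combined - penalty * missing)
```

With two-dimensional toy embeddings, a worker that reproduces the ground-truth tags exactly scores the maximum, while a worker whose single tag points in the right direction but uses a different label loses the assumed overlap penalty.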
Stars: 23
Forks: 8
Language: Python
License: MIT
Category:
Last pushed: Mar 13, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/vector-db/rustyneuron01/Conversation-Genome-Project"
Open to everyone: 100 requests/day with no key needed, or 1,000 requests/day with a free key.
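The same request can be made from Python using only the standard library. This is a sketch under assumptions: the helper names are hypothetical, the URL pattern (category slug followed by owner/repo) is inferred from the single curl example above, and the response is assumed to be JSON, which the listing does not confirm.

```python
import json
import urllib.request

# Base path taken from the curl example; the general pattern is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category, repo):
    """Build the endpoint URL for a category slug and an owner/name repo."""
    return f"{BASE_URL}/{category}/{repo}"


def fetch_quality(category, repo, timeout=10.0):
    """Fetch and decode the quality record (assumes a JSON response body)."""
    url = build_quality_url(category, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("vector-db", "rustyneuron01/Conversation-Genome-Project")` would issue the same GET request as the curl command; unauthenticated calls count against the 100 requests/day limit.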
Higher-rated alternatives
ob-labs/ChatBot
ChatBot, show how to implement a RAG based on OceanBase or OceanBase seekdb AI capabilities...
pmbstyle/Alice
Alice is a voice-first desktop AI assistant application built with Vue.js, Vite, and Electron....
stackitcloud/rag-template
Template for AI chatbots & document management using Retrieval-Augmented Generation with vector...
GGyll/condo_gpt
An intelligent assistant for querying and analyzing real estate condo data in Miami.
zaldivards/ContextQA
ContextQA - The open-source tool for data-driven conversations