dgarnitz/vectorflow
VectorFlow is a high-volume vector embedding pipeline that ingests raw data, transforms it into vectors, and writes them to a vector DB of your choice.
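The ingest, embed, write flow with fault-tolerant batching can be sketched in a few lines. This is a minimal, hypothetical illustration, not VectorFlow's code. An in-process `queue.Queue` stands in for RabbitMQ, and a `Job` dataclass stands in for a PostgreSQL job row. The names `run_job`, `embed`, `write`, and the status strings are invented for this example.

```python
import queue
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Job:
    # Hypothetical stand-in for a job-tracking row (VectorFlow uses PostgreSQL).
    job_id: int
    total_batches: int
    completed: int = 0
    failed: int = 0
    status: str = "IN_PROGRESS"

def run_job(job_id: int, chunks: List[str],
            embed: Callable[[List[str]], List[List[float]]],
            write: Callable[[List[List[float]]], None],
            batch_size: int = 2, max_retries: int = 2) -> Job:
    """Split chunks into batches, embed and write each one, retrying failed batches."""
    batches = [chunks[i:i + batch_size] for i in range(0, len(chunks), batch_size)]
    job = Job(job_id, total_batches=len(batches))
    work: queue.Queue = queue.Queue()  # in-process stand-in for a message queue
    for batch in batches:
        work.put((batch, 0))  # (batch, attempts so far)
    while not work.empty():
        batch, attempts = work.get()
        try:
            write(embed(batch))  # embed the batch, then upsert the vectors
            job.completed += 1
        except Exception:
            if attempts < max_retries:
                work.put((batch, attempts + 1))  # re-queue for another attempt
            else:
                job.failed += 1  # give up on this batch, keep the rest going
    if job.failed == 0:
        job.status = "COMPLETED"
    elif job.completed == 0:
        job.status = "FAILED"
    else:
        job.status = "PARTIALLY_COMPLETED"
    return job
```

Re-queuing a failed batch, instead of aborting the whole job, lets one transient embedding or vector-DB error cost a single retry rather than the full run. The per-job counters are what a job-tracking table would persist.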
Implements fault-tolerant batch processing with RabbitMQ queuing and PostgreSQL job tracking, and supports pluggable embedding models (OpenAI, HuggingFace Sentence Transformers) and vector databases (Pinecone, Qdrant, Weaviate). Provides flexible document chunking strategies (exact, paragraph, sentence, custom) with configurable overlap, and includes a Python client library for programmatic access. Designed for Kubernetes deployment, with a Docker Compose setup that includes MinIO object storage and automatic database schema initialization.
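The chunking strategies with overlap can be illustrated with a small standalone sketch. It assumes character windows for "exact" chunking, a blank-line split for paragraphs, and a naive regex sentence splitter. A "custom" strategy is modeled as any callable that returns a list of chunks. All function names and defaults here are hypothetical and are not taken from VectorFlow's client or API.

```python
import re
from typing import Callable, List, Union

def _window(units: List[str], size: int, overlap: int, sep: str) -> List[str]:
    """Group units into windows of `size`; consecutive windows share `overlap` units."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(units), step):
        chunks.append(sep.join(units[start:start + size]))
        if start + size >= len(units):  # this window already reaches the end
            break
    return chunks

def chunk_exact(text: str, chunk_size: int = 256, overlap: int = 32) -> List[str]:
    # Hypothetical "exact" strategy: fixed-size character windows.
    return _window(list(text), chunk_size, overlap, "")

def chunk_paragraphs(text: str, paragraphs_per_chunk: int = 1, overlap: int = 0) -> List[str]:
    # Assumption: paragraphs are separated by one or more blank lines.
    paras = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return _window(paras, paragraphs_per_chunk, overlap, "\n\n")

def chunk_sentences(text: str, sentences_per_chunk: int = 3, overlap: int = 1) -> List[str]:
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    sents = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return _window(sents, sentences_per_chunk, overlap, " ")

def chunk(text: str, strategy: Union[str, Callable[[str], List[str]]] = "exact", **kwargs) -> List[str]:
    """Dispatch to a named strategy, or call a user-supplied "custom" function."""
    if callable(strategy):
        return list(strategy(text))
    strategies = {"exact": chunk_exact, "paragraph": chunk_paragraphs, "sentence": chunk_sentences}
    return strategies[strategy](text, **kwargs)
```

For example, `chunk("abcdefghij", "exact", chunk_size=4, overlap=2)` produces four windows, each repeating the last two characters of the previous one. Overlap like this keeps text that straddles a chunk boundary fully present in at least one chunk, at the cost of embedding some text twice.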
698 stars. No commits in the last 6 months.
Stars: 698
Forks: 51
Language: Python
License: Apache-2.0
Last pushed: May 16, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/dgarnitz/vectorflow"
Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
Higher-rated alternatives
curiosity-ai/catalyst
🚀 Catalyst is a C# Natural Language Processing library built for speed. Inspired by spaCy's...
Azure/azure-search-vector-samples
A repository of code samples for Vector search capabilities in Azure AI Search.
supabase/embeddings-generator
GitHub Action to generate embeddings from the markdown files in your repository.
vector-ai/vectorai
Vector AI — A platform for building vector based applications. Encode, query and analyse data...
yusufhilmi/client-vector-search
A client side vector search library that can embed, store, search, and cache vectors. Works on...