curiosity-ai/catalyst
🚀 Catalyst is a C# Natural Language Processing library built for speed. Inspired by spaCy's design, it brings pre-trained models, out-of-the-box support for training word and document embeddings, and flexible entity recognition models.
Written in pure C# with .NET Standard 2.0 support, it achieves >1M tokens/second through regex-free tokenization and supports multiple NER approaches (gazetteer, rule-based patterns, and perceptron models). Models serialize efficiently via MessagePack and integrate with FastText/StarSpace for embedding training. Companion libraries provide HNSW similarity search and UMAP dimensionality reduction. Language-specific models, trained on Universal Dependencies data, are distributed as modular NuGet packages.
836 stars. Actively maintained with 7 commits in the last 30 days.
Stars: 836
Forks: 83
Language: C#
License: MIT
Category:
Last pushed: Mar 09, 2026
Commits (30d): 7
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/curiosity-ai/catalyst"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
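The same request can be made from a script. The following is a minimal Python sketch, not an official client: the helper names `quality_url` and `fetch_quality` are hypothetical, the `category/owner/repo` path layout is an assumption generalized from the single URL shown above, and the response is returned as raw text because its format is not documented here.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

# Base path copied from the curl example; everything after it is assumed
# to follow the pattern <category>/<owner>/<repo>.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository (hypothetical helper)."""
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return f"{BASE_URL}/{'/'.join(parts)}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0) -> str:
    """Fetch the endpoint and return the response body as text.

    No API key is sent, so this counts against the keyless daily limit.
    The body is not parsed because its schema is not described in this page.
    """
    request = Request(quality_url(category, owner, repo))
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Prints the URL only; call fetch_quality(...) to perform the request.
    print(quality_url("embeddings", "curiosity-ai", "catalyst"))
```

Keeping URL construction separate from the network call makes it easy to check the target address before spending one of the daily requests.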
Related tools
Azure/azure-search-vector-samples
A repository of code samples for Vector search capabilities in Azure AI Search.
supabase/embeddings-generator
GitHub Action to generate embeddings from the markdown files in your repository.
vector-ai/vectorai
Vector AI — A platform for building vector based applications. Encode, query and analyse data...
wagtail/wagtail-vector-index
Store Wagtail pages & Django models as embeddings in vector databases
kelindar/search
Go library for embedded vector search and semantic embeddings using llama.cpp