artitw/text2text

Text2Text Language Modeling Toolkit


Established

Provides unified APIs for multilingual text processing tasks including LLM inference (with streaming and structured output support), embeddings, semantic search via TF-IDF/BM25 indexing, machine translation, and data augmentation through back-translation. Built on transformer-based models with pluggable pretrained backends (defaulting to Facebook's M2M-100), operating efficiently on commodity hardware and free Colab resources. Exposes sub-word tokenization, language identification, and edit-distance calculations alongside a conversational assistant interface compatible with OpenAI's chat completion schema.
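TF-IDF/BM25 indexing ranks documents by how well their terms match a query. The sketch below is a generic, self-contained Okapi BM25 scorer in plain Python. It is not text2text's own indexing API. The class name `BM25Index`, the regex tokenizer, and the defaults `k1=1.5` and `b=0.75` are illustrative assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on non-word characters.
    # A simplistic stand-in for real sub-word tokenization.
    return re.findall(r"\w+", text.lower())


class BM25Index:
    """Minimal Okapi BM25 index over a non-empty in-memory list of documents.

    Hypothetical illustration only; not the text2text API.
    """

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1 = k1  # term-frequency saturation
        self.b = b    # document-length normalization strength
        self.docs = [tokenize(d) for d in docs]
        self.doc_lens = [len(d) for d in self.docs]
        self.avgdl = sum(self.doc_lens) / len(self.docs)
        self.tfs = [Counter(d) for d in self.docs]
        # Document frequency: how many documents contain each term.
        df = Counter()
        for tf in self.tfs:
            df.update(tf.keys())
        n = len(self.docs)
        # BM25 IDF with +1 inside the log so weights stay non-negative.
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1) for t, f in df.items()}

    def score(self, query, i):
        """BM25 score of document i for the given query string."""
        tf = self.tfs[i]
        dl = self.doc_lens[i]
        total = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            f = tf[term]
            denom = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / denom
        return total

    def search(self, query, k=3):
        """Return up to k (doc_index, score) pairs with a positive score."""
        scored = [(self.score(query, i), i) for i in range(len(self.docs))]
        # Highest score first; ties broken by original document order.
        scored.sort(key=lambda x: (-x[0], x[1]))
        return [(i, s) for s, i in scored[:k] if s > 0]


docs = [
    "Machine translation with M2M-100",
    "Back-translation for data augmentation",
    "BM25 ranking for semantic search",
]
index = BM25Index(docs)
print(index.search("translation"))
```

Both translation-related documents match the query with equal scores, so they come back in their original order. Note that BM25 matches surface terms rather than meaning; dense embeddings are the usual complement when true semantic matching is needed.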

303 stars and 606 monthly downloads. No commits in the last 6 months. Available on PyPI.

Stale (6 months)

Maintenance 0 / 25

Adoption 16 / 25

Maturity 18 / 25

Community 18 / 25


Stars: 303

Forks:

Language: Python

License: —

Related tools

Azure-Samples/azure-ai-document-processing-samples

A collection of samples demonstrating techniques for processing documents with Azure AI...

build-on-aws/langchain-embeddings

This repository demonstrates the construction of a state-of-the-art multimodal search engine,...

aiplanethub/beyondllm

Build, evaluate and observe LLM apps

cofin/mogemma

🔥 Python / Mojo Interface for Google Gemma 3

qianniuspace/llm_notebooks

A collection of AI application examples
