pathwaycom/pathway
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
Pathway runs Python code on a scalable Rust engine based on Differential Dataflow. Computation is incremental, and the engine supports multithreading and distributed processing beyond Python's usual constraints. A single codebase serves both batch and streaming workflows, and late or out-of-order data is handled through automatic consistency management. Dedicated LLM tooling covers vector indexing, embeddings, and integrations with LangChain and LlamaIndex for RAG applications. Connectors span Kafka, PostgreSQL, Google Drive, SharePoint, and 300+ sources via Airbyte, and state persistence provides fault-tolerant pipeline recovery.
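The idea of incremental computation that gives the same answer for batch and streaming input can be illustrated with a minimal plain-Python sketch. It does not use Pathway's API or its Rust engine; `IncrementalSum`, `apply`, `snapshot`, and `batch_sum` are hypothetical names invented for this illustration. Each change is a (key, value, diff) triple, with diff = +1 for an insertion and diff = -1 for a retraction, so the aggregate is adjusted per change instead of being recomputed from scratch.

```python
from collections import defaultdict


class IncrementalSum:
    """Keeps a per-key running sum that is updated change by change.

    Hypothetical illustration only; this is not Pathway's API.
    """

    def __init__(self):
        self.totals = defaultdict(int)   # key -> current sum
        self.counts = defaultdict(int)   # key -> number of live rows

    def apply(self, key, value, diff):
        """Apply one change: diff=+1 inserts a row, diff=-1 retracts it."""
        self.totals[key] += diff * value
        self.counts[key] += diff
        if self.counts[key] == 0:
            # No live rows remain for this key, so drop it from the state.
            del self.totals[key]
            del self.counts[key]

    def snapshot(self):
        """Return the current aggregate as a plain dict."""
        return dict(self.totals)


def batch_sum(rows):
    """Recompute the same aggregate from scratch over a full batch."""
    result = defaultdict(int)
    for key, value in rows:
        result[key] += value
    return dict(result)


# The same three rows, seen as a batch and as an out-of-order stream.
rows = [("a", 1), ("b", 5), ("a", 2)]

stream = IncrementalSum()
for key, value in reversed(rows):        # rows arrive in a different order
    stream.apply(key, value, +1)

batch_result = batch_sum(rows)
stream_result = stream.snapshot()

# A late correction: the row ("b", 5) is retracted and replaced by ("b", 7).
stream.apply("b", 5, -1)
stream.apply("b", 7, +1)
corrected_result = stream.snapshot()
```

Because each change only adds or subtracts its own contribution, the streamed result matches the batch result regardless of arrival order, and a late correction touches only the affected key. A real engine tracks far more (timestamps, joins, persistence); this sketch covers only the core update rule.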
60,697 stars and 9,939 monthly downloads. Used by 2 other packages. Actively maintained with 52 commits in the last 30 days. Available on PyPI.
Stars: 60,697
Forks: 1,610
Language: Python
License: —
Category: —
Last pushed: Mar 19, 2026
Monthly downloads: 9,939
Commits (30d): 52
Dependencies: 38
Reverse dependents: 2
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/data-engineering/pathwaycom/pathway"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
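The same request can be made from Python with the standard library. This is a minimal sketch under two assumptions that the page does not confirm: that the URL path follows a category/owner/repo layout (inferred from the single curl example above), and that the response body is JSON. `build_quality_url` and `fetch_quality` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category, owner, repo):
    """Build the endpoint URL; the path layout is inferred from one example."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10):
    """Fetch the endpoint and decode the body, assuming it is JSON."""
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Reproduces the URL used in the curl command; no request is sent here.
url = build_quality_url("data-engineering", "pathwaycom", "pathway")
```

Calling `fetch_quality("data-engineering", "pathwaycom", "pathway")` sends the request. Under the stated limit of 100 requests/day without a key, it is worth caching the result rather than fetching on every run.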
Related tools
PrefectHQ/prefect
Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
dagster-io/dagster
An orchestration platform for the development, production, and observation of data assets.
dlt-hub/dlt
data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
growthbook/growthbook
Open Source Feature Flags, Experimentation, and Product Analytics
bruin-data/ingestr
ingestr is a CLI tool to copy data between any databases with a single command seamlessly.