The Data Engineering Directory
Quality-scored directory of 517 data engineering tools, updated daily. Every tool scored on maintenance, adoption, maturity, and community signals.
Tools for building data pipelines and ETL workflows, and for managing data quality and data infrastructure.
| Score range | Tools |
|---|---|
| 70–100 | 34 |
| 50–69 | 111 |
| 30–49 | 259 |
| 10–29 | 113 |
Top tools by quality score
| # | Tool | Description |
|---|---|---|
| 1 | PrefectHQ/prefect | Prefect is a workflow orchestration framework for building resilient data... |
| 2 | dagster-io/dagster | An orchestration platform for the development, production, and observation... |
| 3 | dlt-hub/dlt | data load tool (dlt) is an open source Python library that makes data... |
| 4 | growthbook/growthbook | Open Source Feature Flags, Experimentation, and Product Analytics |
| 5 | pathwaycom/pathway | Python ETL framework for stream processing, real-time analytics, LLM... |
| 6 | supabase/supabase-py | Python Client for Supabase. Query Postgres from Flask, Django, FastAPI.... |
| 7 | Unstructured-IO/unstructured | Convert documents to structured data effortlessly. Unstructured is... |
| 8 | bruin-data/ingestr | ingestr is a CLI tool to copy data between any databases with a single... |
| 9 | koopjs/koop | Transform, query, and download geospatial data on the web. |
| 10 | mage-ai/mage-ai | 🧙 Build, run, and manage data pipelines for integrating and transforming data. |
| 11 | meltano/meltano | Meltano: the declarative code-first data integration engine that powers your... |
| 12 | pyjanitor-devs/pyjanitor | Clean APIs for data cleaning. Python implementation of R package Janitor |
| 13 | quiltdata/quilt | Quilt is a Scientific Data Management Platform on AWS that helps teams and... |
| 14 | databricks/dbt-databricks | A dbt adapter for Databricks. |
| 15 | debezium/debezium | Change data capture for a variety of databases. Please log issues at... |
| 16 | apache/flink-cdc | Flink CDC is a streaming data integration tool |
| 17 | airbytehq/airbyte | The leading data integration platform for ETL / ELT data pipelines from... |
| 18 | apache/superset | Apache Superset is a Data Visualization and Data Exploration Platform |
| 19 | apache/shardingsphere | Empowering Data Intelligence with Distributed SQL for Sharding, Scalability,... |
| 20 | apache/seatunnel | SeaTunnel is a multimodal, high-performance, distributed, massive data... |
Browse by category
| Category | Tools |
|---|---|
| Data Pipeline Frameworks | 264 |
| SQL Query Adapters | 106 |
| Spark Hadoop ML Pipelines | 14 |
| ML Experiment Tracking | 8 |
| Natural Language SQL Builders | 7 |
| MLOps Workflow Orchestration | 5 |
| ML API Deployment | 4 |
| Distributed Training Frameworks | 4 |
| CSV Data Chat | 4 |
| Data Quality Preprocessing | 3 |
| Model Inference Serving | 3 |
| Rust Tensor Frameworks | 3 |
| Go ML Bindings | 3 |
| Twitter Sentiment Pipelines | 3 |
| Business Intelligence Dashboards | 3 |
| Semantic Search Applications | 2 |
| LLM Data Labeling | 2 |
| Natural Language Database Agents | 2 |
| Open Source Contribution Guides | 2 |
| Open Dataset Collections | 2 |
| Scala ML Frameworks | 2 |
| Real Time Threat Detection | 2 |
| Scikit-learn Pipelines | 2 |
| Personal Portfolio Showcases | 2 |
| Data Analytics Platforms | 2 |
| Document Data Extraction | 1 |
| Agentic AI Frameworks | 1 |
| Rust Native Vector DBs | 1 |
| AI Test Automation | 1 |
| SQL Database MCP | 1 |
| SaaS AI Platforms | 1 |
| Obsidian AI Plugins | 1 |
| AutoML Frameworks | 1 |
| Production RAG Pipelines | 1 |
| Slack MCP Servers | 1 |
| LLM Data Visualization | 1 |
| LangChain Tool Integrations | 1 |
| Local Semantic Search | 1 |
| Regional Fiscal Data | 1 |
| Code Context Packaging | 1 |
| Text Visualization Graphs | 1 |
| AI Workflow Automation | 1 |
| Document Intelligence Extraction | 1 |
| Self Hosted RAG Platforms | 1 |
| Chatbot Frameworks | 1 |
| Postgres Vector RAG | 1 |
| Julia ML Frameworks | 1 |
| Data Warehouse MCP | 1 |
| Social Media Trends | 1 |
| AWS Bedrock Applications | 1 |
| LangChain Starter Projects | 1 |
| MLOps Framework Directories | 1 |
| Go Agent Frameworks | 1 |
| LangChain Application Tutorials | 1 |
| Disk Imaging Tools | 1 |
| Edge Device ML Frameworks | 1 |
| MCP Client Configuration | 1 |
| Mojo ML Frameworks | 1 |
| React Speech Recognition | 1 |
| ICU Patient Risk Prediction | 1 |
| Fullstack AI Monorepos | 1 |
| AWS Cloud Services | 1 |
| LLM JSON Streaming | 1 |
| Network Traffic Classification | 1 |
| Anomaly Detection Systems | 1 |
| File Content Extraction | 1 |
| Vector DB Benchmarking | 1 |
| OpenAPI MCP Generation | 1 |
| Bayesian Inference Frameworks | 1 |
| NLP Dataset Collections | 1 |
| AI Business Analytics | 1 |
| Geospatial ML Tools | 1 |
| mlr3 Ecosystem | 1 |
| Election Sentiment Forecasting | 1 |
| Portuguese NLP Tools | 1 |
| n8n Workflow Automation | 1 |
| Government Procurement Docs | 1 |
| ChatGPT Web Automation | 1 |
| Temporal Expression Parsing | 1 |
| AI Search Optimization | 1 |
| Personal Blogs Portfolios | 1 |
| Local LLM Deployment | 1 |
| Web3 Contract Security | 1 |
| JavaScript ML Libraries | 1 |
| Kubernetes AI Dashboards | 1 |
| Rust ML Libraries | 1 |
| Algorithmic Trading Bots | 1 |
| DNA Sequence ML | 1 |