The Data Engineering Directory

A quality-scored directory of 517 data engineering tools, updated daily. Every tool is scored on maintenance, adoption, maturity, and community signals.

The directory covers data engineering tools for building data pipelines, ETL workflows, data quality checks, and data infrastructure.

Verified: 34 tools (score 70–100)
Established: 111 tools (score 50–69)
Emerging: 259 tools (score 30–49)
Experimental: 113 tools (score 10–29)
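
The four score ranges above amount to a simple lookup from a quality score to a tier. The following is a minimal sketch of that mapping, not the directory's actual implementation: the names `TIERS`, `TIER_COUNTS`, and `tier_for_score` are hypothetical, and returning `None` for scores below 10 is an assumption, since the page lists no tier for that range.

```python
from typing import Optional

# Tier boundaries as listed in the directory: (lowest score, highest score, tier name).
TIERS = [
    (70, 100, "Verified"),
    (50, 69, "Established"),
    (30, 49, "Emerging"),
    (10, 29, "Experimental"),
]

# Tool counts per tier, copied from the directory summary.
TIER_COUNTS = {
    "Verified": 34,
    "Established": 111,
    "Emerging": 259,
    "Experimental": 113,
}


def tier_for_score(score: int) -> Optional[str]:
    """Return the tier name for a quality score, or None if no tier covers it.

    Assumption: scores outside 10-100 have no tier, because the page
    does not list one.
    """
    for low, high, name in TIERS:
        if low <= score <= high:
            return name
    return None
```

For example, `tier_for_score(95)` falls in the 70–100 range and so maps to "Verified", and the four counts in `TIER_COUNTS` add up to the 517 tools stated in the directory summary.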

Top tools by quality score

1. PrefectHQ/prefect (score: 95)
Prefect is a workflow orchestration framework for building resilient data...

2. dagster-io/dagster (score: 94)
An orchestration platform for the development, production, and observation...

3. dlt-hub/dlt (score: 90)
data load tool (dlt) is an open source Python library that makes data...

4. growthbook/growthbook (score: 90)
Open Source Feature Flags, Experimentation, and Product Analytics

5. pathwaycom/pathway (score: 87)
Python ETL framework for stream processing, real-time analytics, LLM...

6. supabase/supabase-py (score: 87)
Python Client for Supabase. Query Postgres from Flask, Django, FastAPI....

7. Unstructured-IO/unstructured (score: 86)
Convert documents to structured data effortlessly. Unstructured is...

8. bruin-data/ingestr (score: 84)
ingestr is a CLI tool to copy data between any databases with a single...

9. koopjs/koop (score: 82)
Transform, query, and download geospatial data on the web.

10. mage-ai/mage-ai (score: 81)
🧙 Build, run, and manage data pipelines for integrating and transforming data.

11. meltano/meltano (score: 80)
Meltano: the declarative code-first data integration engine that powers your...

12. pyjanitor-devs/pyjanitor (score: 79)
Clean APIs for data cleaning. Python implementation of R package Janitor

13. quiltdata/quilt (score: 77)
Quilt is a Scientific Data Management Platform on AWS that helps teams and...

14. databricks/dbt-databricks (score: 77)
A dbt adapter for Databricks.

15. debezium/debezium (score: 76)
Change data capture for a variety of databases. Please log issues at...

16. apache/flink-cdc (score: 76)
Flink CDC is a streaming data integration tool

17. airbytehq/airbyte (score: 76)
The leading data integration platform for ETL / ELT data pipelines from...

18. apache/superset (score: 76)
Apache Superset is a Data Visualization and Data Exploration Platform

19. apache/shardingsphere (score: 76)
Empowering Data Intelligence with Distributed SQL for Sharding, Scalability,...

20. apache/seatunnel (score: 76)
SeaTunnel is a multimodal, high-performance, distributed, massive data...

Browse by category

Data Pipeline Frameworks: 264 tools
SQL Query Adapters: 106 tools
Spark Hadoop ML Pipelines: 14 tools
ML Experiment Tracking: 8 tools
Natural Language SQL Builders: 7 tools
MLOps Workflow Orchestration: 5 tools
ML API Deployment: 4 tools
Distributed Training Frameworks: 4 tools
CSV Data Chat: 4 tools
Data Quality Preprocessing: 3 tools
Model Inference Serving: 3 tools
Rust Tensor Frameworks: 3 tools
Go ML Bindings: 3 tools
Twitter Sentiment Pipelines: 3 tools
Business Intelligence Dashboards: 3 tools
Semantic Search Applications: 2 tools
LLM Data Labeling: 2 tools
Natural Language Database Agents: 2 tools
Open Source Contribution Guides: 2 tools
Open Dataset Collections: 2 tools
Scala ML Frameworks: 2 tools
Real Time Threat Detection: 2 tools
Scikit-learn Pipelines: 2 tools
Personal Portfolio Showcases: 2 tools
Data Analytics Platforms: 2 tools
Document Data Extraction: 1 tool
Agentic AI Frameworks: 1 tool
Rust Native Vector DBs: 1 tool
AI Test Automation: 1 tool
SQL Database MCP: 1 tool
SaaS AI Platforms: 1 tool
Obsidian AI Plugins: 1 tool
AutoML Frameworks: 1 tool
Production RAG Pipelines: 1 tool
Slack MCP Servers: 1 tool
LLM Data Visualization: 1 tool
LangChain Tool Integrations: 1 tool
Local Semantic Search: 1 tool
Regional Fiscal Data: 1 tool
Code Context Packaging: 1 tool
Text Visualization Graphs: 1 tool
AI Workflow Automation: 1 tool
Document Intelligence Extraction: 1 tool
Self Hosted RAG Platforms: 1 tool
Chatbot Frameworks: 1 tool
Postgres Vector RAG: 1 tool
Julia ML Frameworks: 1 tool
Data Warehouse MCP: 1 tool
Social Media Trends: 1 tool
AWS Bedrock Applications: 1 tool
LangChain Starter Projects: 1 tool
MLOps Framework Directories: 1 tool
Go Agent Frameworks: 1 tool
LangChain Application Tutorials: 1 tool
Disk Imaging Tools: 1 tool
Edge Device ML Frameworks: 1 tool
MCP Client Configuration: 1 tool
Mojo ML Frameworks: 1 tool
React Speech Recognition: 1 tool
ICU Patient Risk Prediction: 1 tool
Fullstack AI Monorepos: 1 tool
AWS Cloud Services: 1 tool
LLM JSON Streaming: 1 tool
Network Traffic Classification: 1 tool
Anomaly Detection Systems: 1 tool
File Content Extraction: 1 tool
Vector DB Benchmarking: 1 tool
OpenAPI MCP Generation: 1 tool
Bayesian Inference Frameworks: 1 tool
NLP Dataset Collections: 1 tool
AI Business Analytics: 1 tool
Geospatial ML Tools: 1 tool
mlr3 Ecosystem: 1 tool
Election Sentiment Forecasting: 1 tool
Portuguese NLP Tools: 1 tool
n8n Workflow Automation: 1 tool
Government Procurement Docs: 1 tool
ChatGPT Web Automation: 1 tool
Temporal Expression Parsing: 1 tool
AI Search Optimization: 1 tool
Personal Blogs Portfolios: 1 tool
Local LLM Deployment: 1 tool
Web3 Contract Security: 1 tool
JavaScript ML Libraries: 1 tool
Kubernetes AI Dashboards: 1 tool
Rust ML Libraries: 1 tool
Algorithmic Trading Bots: 1 tool
DNA Sequence ML: 1 tool