Trending Data Engineering Tools

Tools with the biggest quality score improvements over the last 6 days.

| # | Tool | Description | Change | Score | Tier |
|---|------|-------------|--------|-------|------|
| 1 | datajoint/datajoint-python | Relational data pipelines for the science lab | +10 | 75 | Verified |
| 2 | biglocalnews/warn-scraper | Command-line interface for downloading WARN Act notices of qualified plant... | +10 | 67 | Established |
| 3 | pathwaycom/pathway | Python ETL framework for stream processing, real-time analytics, LLM... | +9 | 87 | Verified |
| 4 | aws/aws-sdk-pandas | pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream,... | +7 | 66 | Established |
| 5 | fkie-cad/Logprep | log data pre processing, generation and shipping in python | +7 | 62 | Established |
| 6 | apache/incubator-devlake | Apache DevLake is an open-source dev data platform to ingest, analyze, and... | +7 | 74 | Verified |
| 7 | SQLMesh/sqlmesh | Scalable and efficient data transformation framework - backwards compatible with dbt. | +7 | 67 | Established |
| 8 | rudderlabs/rudder-server | Privacy and Security focused Segment-alternative, in Golang and React | +7 | 58 | Established |
| 9 | airbytehq/airbyte | The leading data integration platform for ETL / ELT data pipelines from... | +7 | 76 | Verified |
| 10 | dagster-io/dagster | An orchestration platform for the development, production, and observation... | +7 | 94 | Verified |
| 11 | risingwavelabs/risingwave | Event streaming platform for agents, apps, and analytics. Continuously... | +7 | 70 | Verified |
| 12 | redpanda-data/connect | Fancy stream processing made operationally mundane | +7 | 63 | Established |
| 13 | cloudquery/cloudquery | Data pipelines for cloud config and security data. Build cloud asset... | +7 | 70 | Verified |
| 14 | crate/crate | CrateDB is a distributed and scalable SQL database for storing and analyzing... | +7 | 73 | Verified |
| 15 | dathere/qsv | Blazing-fast Data-Wrangling toolkit | +7 | 73 | Verified |
| 16 | bruin-data/ingestr | ingestr is a CLI tool to copy data between any databases with a single... | +7 | 84 | Verified |
| 17 | dagu-org/dagu | A local-first workflow engine built the way it should be: declarative,... | +7 | 70 | Verified |
| 18 | heavyai/heavydb | HeavyDB (formerly MapD/OmniSciDB) | +7 | 58 | Established |
| 19 | snowplow/snowplow | The leader in Customer Data Infrastructure | +7 | 69 | Established |
| 20 | ucbepic/docetl | A system for agentic LLM-powered data processing and ETL | +7 | 62 | Established |
| 21 | sql-machine-learning/sqlflow | Brings SQL and AI together. | +7 | 48 | Emerging |
| 22 | weld-project/weld | High-performance runtime for data analytics applications | +7 | 45 | Emerging |
| 23 | turbot/steampipe | Zero-ETL, infinite possibilities. Live query APIs, code & more with SQL. No... | +7 | 59 | Established |
| 24 | PeerDB-io/peerdb | Fast, Simple and a cost effective tool to replicate data from Postgres to... | +7 | 69 | Established |
| 25 | growthbook/growthbook | Open Source Feature Flags, Experimentation, and Product Analytics | +7 | 90 | Verified |
| 26 | mayneyao/eidos | An extensible framework for Personal Data Management. | +7 | 68 | Established |
| 27 | treeverse/lakeFS | lakeFS - Data version control for your data lake \| Git for data | +7 | 70 | Verified |
| 28 | apache/superset | Apache Superset is a Data Visualization and Data Exploration Platform | +7 | 76 | Verified |
| 29 | vectordotdev/vector | A high-performance observability data pipeline. | +7 | 71 | Verified |
| 30 | apache/shardingsphere | Empowering Data Intelligence with Distributed SQL for Sharding, Scalability,... | +7 | 76 | Verified |
| 31 | debezium/debezium | Change data capture for a variety of databases. Please log issues at... | +7 | 76 | Verified |
| 32 | apache/flink-cdc | Flink CDC is a streaming data integration tool | +7 | 76 | Verified |
| 33 | open-metadata/OpenMetadata | OpenMetadata is a unified metadata platform for data discovery, data... | +7 | 75 | Verified |
| 34 | StructuredLabs/preswald | Preswald is a WASM packager for Python-based interactive data apps: bundle... | +7 | 51 | Established |
| 35 | mage-ai/mage-ai | 🧙 Build, run, and manage data pipelines for integrating and transforming data. | +7 | 81 | Verified |
| 36 | orchest/orchest | Build data pipelines, the easy way 🛠️ | +7 | 44 | Emerging |
| 37 | apache/seatunnel | SeaTunnel is a multimodal, high-performance, distributed, massive data... | +7 | 76 | Verified |
| 38 | dbeaver/dbeaver | Free universal database tool and SQL client | +7 | 70 | Verified |
| 39 | PrefectHQ/prefect | Prefect is a workflow orchestration framework for building resilient data... | +7 | 95 | Verified |
| 40 | scribe-org/Scribe-Data | Wikidata and Wiktionary language data extraction | +5 | 68 | Established |
| 41 | dlt-hub/dlt | data load tool (dlt) is an open source Python library that makes data... | +5 | 90 | Verified |
| 42 | pyjanitor-devs/pyjanitor | Clean APIs for data cleaning. Python implementation of R package Janitor | +4 | 79 | Verified |
| 43 | Unstructured-IO/unstructured | Convert documents to structured data effortlessly. Unstructured is... | +4 | 86 | Verified |
| 44 | vaexio/vaex | Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML,... | +4 | 75 | Verified |
| 45 | dataform-co/dataform | Dataform is a framework for managing SQL based data operations in BigQuery | +3 | 67 | Established |
| 46 | sodadata/soda-core | Data Contracts engine for the modern data stack. https://www.soda.io | +3 | 63 | Established |
| 47 | koopjs/koop | Transform, query, and download geospatial data on the web. | +3 | 82 | Verified |
| 48 | Data-Centric-AI-Community/ydata-profiling | 1 Line of code data quality profiling & exploratory data analysis for Pandas... | +1 | 58 | Established |
| 49 | jtablesaw/tablesaw | Java dataframe and visualization library | +1 | 60 | Established |