Trending Data Engineering Tools
Tools with the biggest quality score improvements over the last 6 days.
| # | Tool | Description | Change | Score | Tier |
|---|---|---|---|---|---|
| 1 | datajoint/datajoint-python | Relational data pipelines for the science lab | +10 | 75 | Verified |
| 2 | biglocalnews/warn-scraper | Command-line interface for downloading WARN Act notices of qualified plant... | +10 | 67 | Established |
| 3 | pathwaycom/pathway | Python ETL framework for stream processing, real-time analytics, LLM... | +9 | 87 | Verified |
| 4 | aws/aws-sdk-pandas | pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream,... | +7 | 66 | Established |
| 5 | fkie-cad/Logprep | log data pre processing, generation and shipping in python | +7 | 62 | Established |
| 6 | apache/incubator-devlake | Apache DevLake is an open-source dev data platform to ingest, analyze, and... | +7 | 74 | Verified |
| 7 | SQLMesh/sqlmesh | Scalable and efficient data transformation framework - backwards compatible with dbt. | +7 | 67 | Established |
| 8 | rudderlabs/rudder-server | Privacy and Security focused Segment-alternative, in Golang and React | +7 | 58 | Established |
| 9 | airbytehq/airbyte | The leading data integration platform for ETL / ELT data pipelines from... | +7 | 76 | Verified |
| 10 | dagster-io/dagster | An orchestration platform for the development, production, and observation... | +7 | 94 | Verified |
| 11 | risingwavelabs/risingwave | Event streaming platform for agents, apps, and analytics. Continuously... | +7 | 70 | Verified |
| 12 | redpanda-data/connect | Fancy stream processing made operationally mundane | +7 | 63 | Established |
| 13 | cloudquery/cloudquery | Data pipelines for cloud config and security data. Build cloud asset... | +7 | 70 | Verified |
| 14 | crate/crate | CrateDB is a distributed and scalable SQL database for storing and analyzing... | +7 | 73 | Verified |
| 15 | dathere/qsv | Blazing-fast Data-Wrangling toolkit | +7 | 73 | Verified |
| 16 | bruin-data/ingestr | ingestr is a CLI tool to copy data between any databases with a single... | +7 | 84 | Verified |
| 17 | dagu-org/dagu | A local-first workflow engine built the way it should be: declarative,... | +7 | 70 | Verified |
| 18 | heavyai/heavydb | HeavyDB (formerly MapD/OmniSciDB) | +7 | 58 | Established |
| 19 | snowplow/snowplow | The leader in Customer Data Infrastructure | +7 | 69 | Established |
| 20 | ucbepic/docetl | A system for agentic LLM-powered data processing and ETL | +7 | 62 | Established |
| 21 | sql-machine-learning/sqlflow | Brings SQL and AI together. | +7 | 48 | Emerging |
| 22 | weld-project/weld | High-performance runtime for data analytics applications | +7 | 45 | Emerging |
| 23 | turbot/steampipe | Zero-ETL, infinite possibilities. Live query APIs, code & more with SQL. No... | +7 | 59 | Established |
| 24 | PeerDB-io/peerdb | Fast, Simple and a cost effective tool to replicate data from Postgres to... | +7 | 69 | Established |
| 25 | growthbook/growthbook | Open Source Feature Flags, Experimentation, and Product Analytics | +7 | 90 | Verified |
| 26 | mayneyao/eidos | An extensible framework for Personal Data Management. | +7 | 68 | Established |
| 27 | treeverse/lakeFS | lakeFS - Data version control for your data lake \| Git for data | +7 | 70 | Verified |
| 28 | apache/superset | Apache Superset is a Data Visualization and Data Exploration Platform | +7 | 76 | Verified |
| 29 | vectordotdev/vector | A high-performance observability data pipeline. | +7 | 71 | Verified |
| 30 | apache/shardingsphere | Empowering Data Intelligence with Distributed SQL for Sharding, Scalability,... | +7 | 76 | Verified |
| 31 | debezium/debezium | Change data capture for a variety of databases. Please log issues at... | +7 | 76 | Verified |
| 32 | apache/flink-cdc | Flink CDC is a streaming data integration tool | +7 | 76 | Verified |
| 33 | open-metadata/OpenMetadata | OpenMetadata is a unified metadata platform for data discovery, data... | +7 | 75 | Verified |
| 34 | StructuredLabs/preswald | Preswald is a WASM packager for Python-based interactive data apps: bundle... | +7 | 51 | Established |
| 35 | mage-ai/mage-ai | 🧙 Build, run, and manage data pipelines for integrating and transforming data. | +7 | 81 | Verified |
| 36 | orchest/orchest | Build data pipelines, the easy way 🛠️ | +7 | 44 | Emerging |
| 37 | apache/seatunnel | SeaTunnel is a multimodal, high-performance, distributed, massive data... | +7 | 76 | Verified |
| 38 | dbeaver/dbeaver | Free universal database tool and SQL client | +7 | 70 | Verified |
| 39 | PrefectHQ/prefect | Prefect is a workflow orchestration framework for building resilient data... | +7 | 95 | Verified |
| 40 | scribe-org/Scribe-Data | Wikidata and Wiktionary language data extraction | +5 | 68 | Established |
| 41 | dlt-hub/dlt | data load tool (dlt) is an open source Python library that makes data... | +5 | 90 | Verified |
| 42 | pyjanitor-devs/pyjanitor | Clean APIs for data cleaning. Python implementation of R package Janitor | +4 | 79 | Verified |
| 43 | Unstructured-IO/unstructured | Convert documents to structured data effortlessly. Unstructured is... | +4 | 86 | Verified |
| 44 | vaexio/vaex | Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML,... | +4 | 75 | Verified |
| 45 | dataform-co/dataform | Dataform is a framework for managing SQL based data operations in BigQuery | +3 | 67 | Established |
| 46 | sodadata/soda-core | Data Contracts engine for the modern data stack. https://www.soda.io | +3 | 63 | Established |
| 47 | koopjs/koop | Transform, query, and download geospatial data on the web. | +3 | 82 | Verified |
| 48 | Data-Centric-AI-Community/ydata-profiling | 1 Line of code data quality profiling & exploratory data analysis for Pandas... | +1 | 58 | Established |
| 49 | jtablesaw/tablesaw | Java dataframe and visualization library | +1 | 60 | Established |