Data Pipeline Frameworks: Data Engineering Tools
Tools for building, deploying, and orchestrating end-to-end data workflows (ETL/ELT, transformations, ingestion). Does NOT include SQL learning resources, individual data connectors, or general-purpose query engines.
264 data pipeline framework tools are tracked. 25 score above 70 (verified tier). The highest-rated is PrefectHQ/prefect at 95/100, with 21,898 stars and 9,764,465 monthly downloads. All of the top 10 are actively maintained.
Get all 264 projects as JSON (note that the sample request sets limit=20):
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=data-engineering&subcategory=data-pipeline-frameworks&limit=20"
Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000 requests/day.
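The same request can be issued from a script. The following is a minimal Python sketch using only the standard library; the endpoint and query parameters are copied from the curl command above, while the helper names `build_url` and `fetch_json` are hypothetical. The response schema is not documented on this page, so the sketch returns the parsed JSON without assuming any field names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain, subcategory, limit=20):
    """Build the dataset URL with the same query parameters as the curl example."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_json(url, timeout=30):
    """Fetch the URL and parse the body as JSON.

    The response layout is not described on this page, so the caller
    should inspect the returned object before relying on any keys.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

A call such as `fetch_json(build_url("data-engineering", "data-pipeline-frameworks"))` counts against the 100 requests/day anonymous quota, so it is worth caching the result locally while exploring the data.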
| # | Tool | Score | Tier |
|---|---|---|---|
| 1 | PrefectHQ/prefect: Prefect is a workflow orchestration framework for building resilient data... | | Verified |
| 2 | dagster-io/dagster: An orchestration platform for the development, production, and observation... | | Verified |
| 3 | dlt-hub/dlt: data load tool (dlt) is an open source Python library that makes data... | | Verified |
| 4 | growthbook/growthbook: Open Source Feature Flags, Experimentation, and Product Analytics | | Verified |
| 5 | pathwaycom/pathway: Python ETL framework for stream processing, real-time analytics, LLM... | | Verified |
| 6 | bruin-data/ingestr: ingestr is a CLI tool to copy data between any databases with a single... | | Verified |
| 7 | koopjs/koop: Transform, query, and download geospatial data on the web. | | Verified |
| 8 | meltano/meltano: Meltano: the declarative code-first data integration engine that powers your... | | Verified |
| 9 | pyjanitor-devs/pyjanitor: Clean APIs for data cleaning. Python implementation of R package Janitor | | Verified |
| 10 | quiltdata/quilt: Quilt is a Scientific Data Management Platform on AWS that helps teams and... | | Verified |
| 11 | databricks/dbt-databricks: A dbt adapter for Databricks. | | Verified |
| 12 | debezium/debezium: Change data capture for a variety of databases. Please log issues at... | | Verified |
| 13 | apache/flink-cdc: Flink CDC is a streaming data integration tool | | Verified |
| 14 | airbytehq/airbyte: The leading data integration platform for ETL / ELT data pipelines from... | | Verified |
| 15 | apache/superset: Apache Superset is a Data Visualization and Data Exploration Platform | | Verified |
| 16 | apache/shardingsphere: Empowering Data Intelligence with Distributed SQL for Sharding, Scalability,... | | Verified |
| 17 | datajoint/datajoint-python: Relational data pipelines for the science lab | | Verified |
| 18 | apache/incubator-devlake: Apache DevLake is an open-source dev data platform to ingest, analyze, and... | | Verified |
| 19 | dathere/qsv: Blazing-fast Data-Wrangling toolkit | | Verified |
| 20 | capitalone/locopy: locopy: Loading/Unloading to Redshift and Snowflake using Python. | | Verified |
| 21 | vectordotdev/vector: A high-performance observability data pipeline. | | Verified |
| 22 | treeverse/lakeFS: lakeFS - Data version control for your data lake \| Git for data | | Verified |
| 23 | dagu-org/dagu: A local-first workflow engine built the way it should be: declarative,... | | Verified |
| 24 | cloudquery/cloudquery: Data pipelines for cloud config and security data. Build cloud asset... | | Verified |
| 25 | risingwavelabs/risingwave: Event streaming platform for agents, apps, and analytics. Continuously... | | Verified |
| 26 | PeerDB-io/peerdb: Fast, Simple and a cost effective tool to replicate data from Postgres to... | | Established |
| 27 | apache/hop: Hop Orchestration Platform | | Established |
| 28 | catalyst-cooperative/pudl: The Public Utility Data Liberation Project provides analysis-ready energy... | | Established |
| 29 | networktocode/diffsync: A utility library for comparing and synchronizing different datasets. | | Established |
| 30 | snowplow/snowplow: The leader in Customer Data Infrastructure | | Established |
| 31 | scribe-org/Scribe-Data: Wikidata and Wiktionary language data extraction | | Established |
| 32 | biglocalnews/warn-scraper: Command-line interface for downloading WARN Act notices of qualified plant... | | Established |
| 33 | SQLMesh/sqlmesh: Scalable and efficient data transformation framework - backwards compatible with dbt. | | Established |
| 34 | dataform-co/dataform: Dataform is a framework for managing SQL based data operations in BigQuery | | Established |
| 35 | odpi/egeria: Egeria core | | Established |
| 36 | laminlabs/lamindb: Open-source data framework for biology. Context and memory for datasets and... | | Established |
| 37 | aws/aws-sdk-pandas: pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream,... | | Established |
| 38 | astronomer/airflow-provider-fivetran-async: A new Airflow Provider for Fivetran, maintained by Astronomer and Fivetran | | Established |
| 39 | datazip-inc/olake: OLake - Fastest Databases, Kafka & S3 Replication to Apache Iceberg or Plain... | | Established |
| 40 | datavane/tis: Support agile DataOps Based on Flink, DataX and Flink-CDC, Chunjun with Web-UI | | Established |
| 41 | nordquant/complete-dbt-bootcamp-zero-to-hero: Supplementary Materials for The Complete dbt (Data Build Tool) Bootcamp... | | Established |
| 42 | wgzhao/Addax: A fast and versatile ETL tool that can transfer data between RDBMS and NoSQL... | | Established |
| 43 | ariacom/Seal-Report: Database Reporting Tool and Tasks (.Net) | | Established |
| 44 | datavane/datavines: Know your data better! Datavines is Next-gen Data Observability Platform,... | | Established |
| 45 | nightscape/spark-excel: A Spark plugin for reading and writing Excel files | | Established |
| 46 | vietvudanh/vietlott-data: Automation fetching data for Vietlott. Just for fun. | | Established |
| 47 | datagouv/csv-detective: Inspection of tabular (csv, xls-like) files to guess the columns' content | | Established |
| 48 | sodadata/soda-core: Data Contracts engine for the modern data stack. https://www.soda.io | | Established |
| 49 | redpanda-data/connect: Fancy stream processing made operationally mundane | | Established |
| 50 | elementary-data/elementary: The dbt-native data observability solution for data & analytics engineers.... | | Established |
| 51 | fkie-cad/Logprep: log data pre processing, generation and shipping in python | | Established |
| 52 | iBridges-for-iRODS/iBridges: A wrapper around the python-irodsclient to allow for easy interaction with... | | Established |
| 53 | apecloud/ape-dts: ApeCloud's Data Transfer Suite, written in Rust. Provides ultra-fast data... | | Established |
| 54 | datacleaner/DataCleaner: The premier open source Data Quality solution | | Established |
| 55 | bitpicky/dbt-sugar: dbt-sugar is a CLI tool that allows users of dbt to have fun and ease... | | Established |
| 56 | cre-dev/xml2db: A Python package to load complex XML files into a relational database | | Established |
| 57 | turbot/steampipe: Zero-ETL, infinite possibilities. Live query APIs, code & more with SQL. No... | | Established |
| 58 | DataTalksClub/data-engineering-zoomcamp: Data Engineering Zoomcamp is a free 9-week course on building... | | Established |
| 59 | slingdata-io/sling-cli: Sling is a CLI tool that extracts data from a source storage/database and... | | Established |
| 60 | rudderlabs/rudder-server: Privacy and Security focused Segment-alternative, in Golang and React | | Established |
| 61 | biglocalnews/warn-transformer: Consolidate, enrich and republish the data gathered by warn-scraper | | Established |
| 62 | timeplus-io/proton: ⚡ Fastest SQL ETL pipeline in a single C++ binary, built for stream... | | Established |
| 63 | fedspendingtransparency/usaspending-api: Server application to serve U.S. federal spending data via a RESTful API | | Established |
| 64 | amphi-ai/amphi-etl: visual data prep powered by python | | Established |
| 65 | snowflakedb/snowpark-python: Snowflake Snowpark Python API | | Established |
| 66 | dotflow-io/dotflow: 🎲 Business Logic Code in a flow! | | Established |
| 67 | Desbordante/desbordante-core: Desbordante is a high-performance data profiler that is capable of... | | Established |
| 68 | osalvador/ReplicaDB: ReplicaDB is open source tool for database replication, designed for... | | Established |
| 69 | ohs-foundation/fhir-data-pipes: A collection of tools for extracting FHIR resources and analytics services... | | Established |
| 70 | ConduitIO/conduit: Conduit streams data between data stores. Kafka Connect replacement. No JVM required. | | Established |
| 71 | data-engineering-community/data-engineering-wiki: The best place to learn data engineering. Built and maintained by the data... | | Established |
| 72 | opendatadiscovery/odd-platform: First open-source data discovery and observability platform. We make a life... | | Established |
| 73 | Multiwoven/multiwoven: 🔥🔥🔥 Open source Reverse ETL - alternative to hightouch and census. | | Established |
| 74 | TianLangStudio/DataXServer: Provides remote multi-language invocation (ThriftServer, HttpServer) for DataX (https://github.com/alibaba/DataX)... | | Established |
| 75 | edkreuk/FMD_FRAMEWORK: The Fabric Metadata-Driven Framework (FMD) is a cutting-edge accelerator... | | Established |
| 76 | stn1slv/awesome-integration: A curated list of awesome system integration software and resources. | | Established |
| 77 | airbytehq/PyAirbyte: PyAirbyte brings the power of Airbyte to every Python developer. | | Established |
| 78 | AbsaOSS/cobrix: A COBOL parser and Mainframe/EBCDIC data source for Apache Spark | | Established |
| 79 | tower/tower-cli: Next generation compute platform for the post-modern data stack | | Established |
| 80 | neo4j/neo4j-jdbc: Official Neo4j JDBC Driver | | Established |
| 81 | DataKitchen/dataops-testgen: DataOps Data Quality TestGen is part of DataKitchen's Open Source Data... | | Established |
| 82 | Breeze0806/go-etl: go-etl is a toolset for data extraction, transformation and loading. | | Established |
| 83 | dlt-hub/verified-sources: Contribute to dlt verified sources 🔥 | | Established |
| 84 | Guepard-Corp/qwery-core: The Boring query platform - Connect and query anything | | Established |
| 85 | HTTP-RPC/Kilo: Lightweight REST for Java | | Established |
| 86 | fdmorison/tiozin: Tiozin, your friendly ETL framework | | Established |
| 87 | benjamin-awd/monopoly: Monopoly is a Python library & CLI that converts bank statement PDFs to CSV. | | Established |
| 88 | bitol-io/open-data-contract-standard: Home of the Open Data Contract Standard (ODCS). | | Established |
| 89 | bacalhau-project/bacalhau: Community-driven, simple, yet powerful framework for fast, cost-effective... | | Established |
| 90 | metafacture/metafacture-core: Core package of the Metafacture tool suite for metadata processing. | | Established |
| 91 | linkedpipes/etl: LinkedPipes ETL is an RDF based, lightweight ETL tool | | Established |
| 92 | vmware/versatile-data-kit: One framework to develop, deploy and operate data workflows with Python and SQL. | | Established |
| 93 | dalenewman/Transformalize: Configurable Extract, Transform, and Load | | Established |
| 94 | dashmug/glue-utils: glue-utils makes AWS Glue jobs less repetitive, more type-safe, and easier... | | Established |
| 95 | AndreaBozzo/dataprof: Library and CLI for profiling tabular data | | Established |
| 96 | robert-koch-institut/mex-common: RKI Metadata Exchange \| Software development toolkit for the MEx project... | | Established |
| 97 | dagster-io/community-integrations: Community supported integrations for the Dagster platform. | | Established |
| 98 | datazip-inc/olake-ui: Frontend & BFF (Backend for frontend) for Olake. This includes the UI code... | | Established |
| 99 | dataflint/spark: Drop-in replacement for Apache Spark UI | | Established |
| 100 | dfpc-coe/CloudTAK: TAK Compatible, browser based Common Operation Picture & Situational Awareness tool | | Established |
| 101 | flowsynx/flowsynx: A deterministic orchestrator for composable micro-workflows with reusable modules | | Emerging |
| 102 | kay-ou/SimTradeData: SimTradeData is a utility library supporting SimTradeDesk, SimTradeLab and... | | Emerging |
| 103 | starlake-ai/starlake: Declarative text based tool for data analysts and engineers to extract,... | | Emerging |
| 104 | dbt-labs/jaffle-shop: 🥪🦘 An open source sandbox project exploring dbt workflows via a fictional... | | Emerging |
| 105 | reductstore/reductstore: High Performance Storage and Streaming Solution for Data Acquisition Systems | | Emerging |
| 106 | DataSQRL/sqrl: Data Pipeline Automation Framework to build MCP servers, data APIs, and data... | | Emerging |
| 107 | odpi/egeria-docs: Documentation repository for the Egeria project. | | Emerging |
| 108 | microsoft/unified-data-foundation-with-fabric-solution-accelerator: Unified Data Foundation with Microsoft Fabric with Options to Integrate with... | | Emerging |
| 109 | edrewitz/WxData: A Python library that acts as a client to download, pre-process and... | | Emerging |
| 110 | kanton-bern/hellodata-be: The Open-Source Enterprise Data Platform in a single Portal | | Emerging |
| 111 | MilkMp/CIA-World-Factbooks-Archive-1990-2025: Complete structured archive of every CIA World Factbook edition from... | | Emerging |
| 112 | weifuwan/seatunnel-web: SeaTunnel Web is a visual platform for building, managing, and monitoring... | | Emerging |
| 113 | GitBrincie212/ChronoGrapher: Powerful, developer-experience centric, blazingly fast and extensible job... | | Emerging |
| 114 | akmalsoliev/Validoopsie: A simple and easy to use Data Validation library for Python. | | Emerging |
| 115 | OHDSI/ETL-Synthea: A package supporting the conversion from Synthea CSV to OMOP CDM | | Emerging |
| 116 | dflib/dflib: In-memory Java DataFrame library | | Emerging |
| 117 | GovHub-br/gov-hub: GovHub - Transforming Data into Value for Public Management | | Emerging |
| 118 | trustedshops-public/schema2pyarrow: Converts AsyncApi and JsonSchema to PyArrow schema | | Emerging |
| 119 | rpsft/etlbox: A lightweight ETL (extract, transform, load) library and data integration... | | Emerging |
| 120 | mprove-io/mprove: Open Source Business Intelligence with Malloy Semantic Layer :tada: | | Emerging |
| 121 | ara3d/bim-open-schema: Representing BIM Data as Parquet | | Emerging |
| 122 | ogbinar/DataEngineeringPilipinas: Data Engineering Pilipinas is a community for data engineers, data analysts,... | | Emerging |
| 123 | GoPlasmatic/dataflow-rs: A high-performance rules engine for IFTTT-style automation in Rust with... | | Emerging |
| 124 | opensnowcat/opensnowcat-enrich: OpenSnowcat Enricher (Apache 2.0 License) | | Emerging |
| 125 | NeaByteLab/IDX-API: Indonesian Stock Exchange API wrapper for trading data integration. | | Emerging |
| 126 | halestudio/hale: (Spatial) data harmonisation with hale»studio (formerly HUMBOLDT Alignment Editor) | | Emerging |
| 127 | Edwardvaneechoud/Flowfile: Flowfile is a visual ETL tool and Python library combining drag-and-drop... | | Emerging |
| 128 | Bruno-Furtado/cloud-cnpj: Ingestion, preparation, and free provision of CNPJ data from... | | Emerging |
| 129 | DataRecce/recce: The data-validation toolkit for enhanced dbt (data build tool) PR review | | Emerging |
| 130 | thadhutch/sports-quant: End-to-end NFL data pipeline that scrapes PFF grades and Pro Football... | | Emerging |
| 131 | treeverse/charts: Helm charts | | Emerging |
| 132 | DataKitchen/data-observability-installer: Installer for DataKitchen's Open Source Data Observability Products. Data... | | Emerging |
| 133 | scribe-org/Scribe-Server: Backend service for Scribe data downloads | | Emerging |
| 134 | leftkats/awesome-greek-tech-jobs: A comprehensive map of companies that hire for tech jobs in Greece. | | Emerging |
| 135 | xxh/xxh-shell-xonsh: Use @xonsh wherever you go through the SSH without installation on the host. | | Emerging |
| 136 | wgzhao/addax-admin: Addax Admin is a web-based management console for Addax ETL jobs, offering... | | Emerging |
| 137 | GovHub-br/data-application-gov-hub: Gov-Hub data pipeline | | Emerging |
| 138 | bitol-io/open-data-product-standard: Home of the Open Data Product Standard (ODPS). | | Emerging |
| 139 | mehd-io/pypi-duck-flow: end-to-end data engineering project to get insights from PyPi using python,... | | Emerging |
| 140 | GregoryKogan/yt-framework: Build scalable data pipelines on YTsaurus with automatic stage management,... | | Emerging |
| 141 | nightmarewalker/D-MemFS: In-process virtual filesystem with hard quota for Python | | Emerging |
| 142 | ashish10alex/vscode-dataform-tools: Dataform Tools - VS Code extension to run and visualise Dataform data... | | Emerging |
| 143 | hiero-hackers/analytics: Stay up to date with hiero organisation activity and contributor diversity | | Emerging |
| 144 | hbz/lobid-resources: Transformation, web frontend, and API for the hbz catalog as LOD | | Emerging |
| 145 | bbossgroups/bboss-elastic-tran: bboss-datatran is an open-source data collection and unified stream-batch tool from bboss, providing data collection, cleaning and transformation, and unified stream-batch computation;... | | Emerging |
| 146 | ludovicschmetz-stack/datavow: Open-source data contract enforcement — define, sync dbt, validate, block,... | | Emerging |
| 147 | opensnowcat/opensnowcat-collector: OpenSnowcat Collector, an open source fork of Snowplow (Apache 2.0 License) | | Emerging |
| 148 | mikevan666/opendataworks: opendataworks... | | Emerging |
| 149 | SpareCores/sc-crawler: Pull and standardize data on cloud compute resources. | | Emerging |
| 150 | wp-labs/warp-parse: Focusing on building industry-leading ETL engines. | | Emerging |
| 151 | koralium/flowtide: High-performance streaming SQL query engine designed for real-time data... | | Emerging |
| 152 | irajhedayati/data-engineering: A set of Data Engineering tools online for public use | | Emerging |
| 153 | databricks-industry-solutions/python-data-sources: Quality python data sources for pyspark 4.x | | Emerging |
| 154 | AMPATH/etl-rest-server: This project hosts scripts to generate flat tables used for reporting purposes. | | Emerging |
| 155 | MTSWebServices/onetl: One ETL tool to rule them all | | Emerging |
| 156 | AndreaBozzo/Ceres: Harvesting & Semantic search for open data portals | | Emerging |
| 157 | ottogroup/koality: Library for data quality monitoring based on duckdb. | | Emerging |
| 158 | markusbegerow/data-analytics-exercises: End-to-end data warehouse exercises for students - build a modern ELT... | | Emerging |
| 159 | Edwardvaneechoud/pyfloe: A minimal zero dependency dataframe library | | Emerging |
| 160 | catalyst-cooperative/ferc-xbrl-extractor: A tool for converting FERC filings published in XBRL into SQLite databases | | Emerging |
| 161 | datacompose/datacompose: Data Cleaning for Pyspark | | Emerging |
| 162 | Indexical-Metrics-Measure-Advisory/watchmen: Watchmen Platform is a low code data platform for data pipeline, meta data... | | Emerging |
| 163 | DawnbrandBots/yaml-yugi: A machine-readable, human-editable database of the Yu-Gi-Oh! Trading Card... | | Emerging |
| 164 | continuous-dems/fetchez: Fetchez is a lightweight, modular, and highly extendable Python framework... | | Emerging |
| 165 | colliery-io/cloacina: Embedded workflow orchestration library for Rust and Python. Build... | | Emerging |
| 166 | apache/incubator-devlake-playground: Apache DevLake is an open-source dev data platform to ingest, analyze, and... | | Emerging |
| 167 | dagster-io/dagster-open-platform: Dagster Labs' open-source data platform, built with Dagster. | | Emerging |
| 168 | jordilin/gitar: Git all remotes. git cli tool that targets both Github and Gitlab | | Emerging |
| 169 | elastiflow/pipelines: A lightweight Go framework for building stateful, real-time data pipelines.... | | Emerging |
| 170 | rannd1nt/phaethon: Dimensional Data Pipeline & Semantic Data Engineering Framework | | Emerging |
| 171 | ChrisDevRepo/vscode_data_lineage: VS Code extension for visualizing SQL Server database object dependencies... | | Emerging |
| 172 | B1AAB/EBA: An ML-first temporal graph of Bitcoin's on-chain fund flows. | | Emerging |
| 173 | DevDizzle/gammarips-engine: An end-to-end, serverless AI platform built on Google Cloud that... | | Emerging |
| 174 | wilson-mok/demo: In this repository, you will find various demos and presentations I have... | | Emerging |
| 175 | AbsaOSS/pramen: Resilient data pipeline framework running on Apache Spark | | Emerging |
| 176 | monarch-initiative/koza: Data transformation framework for LinkML data models | | Emerging |
| 177 | terrylica/exness-data-preprocess: Professional forex tick data preprocessing with unified DuckDB storage,... | | Emerging |
| 178 | mattlianje/etl4s: Powerful, whiteboard-style ETL | | Emerging |
| 179 | ivszhuravlev/spark-tuning-handbook: Hands-on Spark internals and performance engineering. | | Emerging |
| 180 | wherobots/airflow-providers-wherobots: Airflow extensions for communicating with Wherobots Cloud | | Emerging |
| 181 | Data-Research-Analysis/data-research-analysis-platform: Stop Guessing. Start Dominating Your Market. The only data platform built... | | Emerging |
| 182 | PeopleForBikes/brokenspoke: A collection of tools for the BNA. | | Emerging |
| 183 | CategoricalData/CQL: Categorical Query Language IDE | | Emerging |
| 184 | netxs2000/devops: DevOps Data Application Platform... | | Emerging |
| 185 | yanghaiji/JsonCleanseETL: JSONCleanseETL is a professional data cleaning and transformation tool that aims to give users an efficient solution for processing JSON-format data... | | Emerging |
| 186 | moj-analytical-services/etl_manager: A python package to create a database on the platform using our moj data... | | Emerging |
| 187 | ineelhere/forex-connect: Streamlit Connection to Explore Foreign Currency Exchange rates 💰 in real-time | | Emerging |
| 188 | exasol/exasol-personal: The High-Performance Analytics Engine — Free for Personal Use | | Emerging |
| 189 | realdatadriven/etlx: ETL / ELT Framework powered by DuckDB, designed to seamlessly integrate and... | | Emerging |
| 190 | nationalarchives/ds-caselaw-ingester: Parse judgements from the Transformation Engine and load them into MarkLogic... | | Emerging |
| 191 | DataKitchen/dataops-observability: DataOps Observability is part of DataKitchen's Open Source Data... | | Emerging |
| 192 | MTSWebServices/syncmaster-ui: Frontend for Syncmaster, no-code ETL tool. WIP | | Emerging |
| 193 | gopidesupavan/qualink: Data quality validation, profiling, anomaly detection, and YAML-driven... | | Emerging |
| 194 | Zipstack/visitran: Modern, AI-native and agentic Pythonic data transformation platform. | | Emerging |
| 195 | BEKO2210/World_report: A self-updating global dashboard that aggregates 40+ open data sources... | | Emerging |
| 196 | DawnbrandBots/yaml-yugipedia: An automatically-updated collection of wikitexts from Yugipedia. Part of YAML Yugi. | | Emerging |
| 197 | Beyond-Finance/dataeng-de-technical-assessment: Public repo of Beyond Finance's technical assessment for Data Engineering candidates | | Emerging |
| 198 | ankiano/etl: Extract transform load CLI tool for extracting small and middle data volume... | | Emerging |
| 199 | chalk-ai/chalk-go: Go client for Chalk | | Emerging |
| 200 | MTSWebServices/syncmaster: No-code ETL tool, based on onETL + PySpark | | Emerging |
| 201 | ccao-data/data-architecture: Codebase for CCAO data infrastructure construction and management | | Emerging |
| 202 | RustedBytes/audios-to-dataset: Convert your audio files into DuckDB or Parquet files | | Emerging |
| 203 | DataKitchen/dataops-observability-agents: DataOps Observability Integration Agents are part of DataKitchen's Open... | | Emerging |
| 204 | equitusai/arcxa: Mapping intelligence for enterprise data migrations: schema mapping,... | | Emerging |
| 205 | bitroot/coflux: Open-source workflow engine. Orchestrate and observe computational workflows... | | Emerging |
| 206 | mahmoudparsian/data-warehousing: This repository is a place for the Data Warehousing course at the... | | Emerging |
| 207 | betoalien/PardoX: PardoX: The Hyper-Fast Data Engine | | Emerging |
| 208 | tenzir/library: Packages for the Tenzir ecosystem. | | Emerging |
| 209 | prefeitura-rio/pipelines_rj_smtr: Code for capturing and processing SMTR data | | Emerging |
| 210 | thinkall/featcopilot: Next-generation LLM-powered auto feature engineering framework | | Emerging |
| 211 | rush-db/rushdb: RushDB is an Instant Database for Modern Apps & AI. Built on top of Neo4j. | | Emerging |
| 212 | mbari-org/aidata: (ETL) Extract, transform, load/download and augment images and annotations... | | Emerging |
| 213 | vedanthv/data-engineering-portfolio: Cool DE Projects | | Emerging |
| 214 | jtakish/airflow-provider-sap-hana: Airflow provider package for SAP HANA | | Emerging |
| 215 | moj-analytical-services/iam_builder: Little helper to write IAM policies | | Emerging |
| 216 | IgorNatann/project_e_commerce_dw: E-commerce DW (Kimball/Star Schema) on SQL Server, with scripts, data... | | Experimental |
| 217 | TJAdryan/astro_blog: This site uses the amazing Astro.build project. I added **Google Docs** ... | | Experimental |
| 218 | bruin-data/setup-bruin: Official action to install Bruin CLI in Github Actions. | | Experimental |
| 219 | cderickson/Mox-Data.com: Mox-Data.com is a cloud-based data ingestion tool used to process raw data... | | Experimental |
| 220 | richban/opendata-stack-platform: Open Data Stack Platform: a collection of projects and pipelines built with... | | Experimental |
| 221 | vnvo/deltaforge: A versatile, high-performance Change Data Capture (CDC) engine built in... | | Experimental |
| 222 | peter115342/soccer-tracker-DE-project: End-To-End Data Engineering Project. Made to learn some common data... | | Experimental |
| 223 | tvs-sde/oxford-omop-data-mapper: A documentation-centric DuckDB based ETL tool, implementing transformations... | | Experimental |
| 224 | eventvisor/eventvisor: Fine-grained control over analytics events and logs via remote configuration | | Experimental |
| 225 | sopho-tech/sopho: Open Source Business Intelligence | | Experimental |
| 226 | lezwon/CatalystOps: Semantic cost-linting and performance warnings extension for Databricks in VS Code | | Experimental |
| 227 | TheCocoTeam/source-watcher-core: PHP ETL engine for building extract–transform–load pipelines with pluggable... | | Experimental |
| 228 | MTSWebServices/horizon: Simple HWM Store backend | | Experimental |
| 229 | Hyperwindmill/morphql: Transform data with queries | | Experimental |
| 230 | SourceWatcher/source-watcher-core: PHP ETL engine with pluggable steps: extractors, transformers, loaders | | Experimental |
| 231 | illuin-tech/data-pipeline: Library for describing data transformation pipelines by compositing simple... | | Experimental |
| 232 | lyrasis/kiba-extend: Extensions to Kiba ETL | | Experimental |
| 233 | sul-dlss/libsys-airflow: Airflow DAGS for migrating and managing ILS data into FOLIO along with other... | | Experimental |
| 234 | tracebloc/data-ingestors: tracebloc data pipeline for training/test dataset setup | | Experimental |
| 235 | MTSWebServices/etl-entities: Basic ETL Entity classes for onETL | | Experimental |
| 236 | tarek-clarke/resilient-rap-framework: A resilient, fault‑tolerant telemetry analytics pipeline designed to... | | Experimental |
| 237 | everycure-org/kedro-argo: argo-kedro is a kedro-plugin for executing Kedro pipelines on Argo Workflows. | | Experimental |
| 238 | edwinweber/dbt_duckdb_demo_public: Data engineering demo project for Danish Parliament (Folketing) open data —... | | Experimental |
| 239 | neo-technology-field/python-etl-lib: simple lib of ETL building blocks | | Experimental |
| 240 | adhamhaithameid/Classroom-Quick-Downloader: A sophisticated cross-browser extension for bulk Google Classroom downloads,... | | Experimental |
| 241 | pablo-reyes8/colombia-tourism-ml-forecasting: ML project forecasting monthly foreign tourist arrivals in Colombian cities... | | Experimental |
| 242 | elevata-labs/elevata: elevata is an Architecture Runtime for modern data platforms —... | | Experimental |
| 243 | calbergs/spotify-api: Pipeline that extracts data from the Spotify API to build a more detailed... | | Experimental |
| 244 | Galaticos-API/API-3: API project for the first semester of 2026 | | Experimental |
| 245 | MTSWebServices/horizon-hwm-store: Horizon HWM Store for onETL | | Experimental |
| 246 | tbrus/smartjoin: Deterministic key and join discovery for structured datasets | | Experimental |
| 247 | qweliant/ankaa: POC for real-time monitoring and alert system for home hemodialysis,... | | Experimental |
| 248 | nicopon/dtpipe: A simple, self-contained CLI for performance-focused data streaming & anonymization. | | Experimental |
| 249 | faltz009/Closure-SDK: A hash you can do algebra on — composable verification for ordered data over... | | Experimental |
| 250 | vishnuvardhanaan/equity-fundamental-engine: Production-style financial data engineering pipeline that standardizes NSE... | | Experimental |
| 251 | vishnuvardhanaan/equity-fundamental-analytics: Macro-aware, explainable equity analytics system using Bronze–Silver–Gold... | | Experimental |
| 252 | RaySatish/Market-Surveillance-System: Big-data pipeline detecting wash trading, pump & dump, and spoofing in trade... | | Experimental |
| 253 | nvisycom/runtime: Enterprise-grade multimodal redaction runtime that detects and removes... | | Experimental |
| 254 | zovchik0v/task-management: 🛠️ Streamline task management with this full-stack solution featuring... | | Experimental |
| 255 | raphaelberly/journal: A movie journal coupled with open IMDb data, and a Flask web-app for easy... | | Experimental |
| 256 | salimt/Transfermarkt-ETL-and-LIVE-Scores: asyncIO, Github Actions, GCP, dbt, Terraform, Docker | | Experimental |
| 257 | anwitars/grab: High-performance, declarative stream processor for delimited text data. | | Experimental |
| 258 | turki-alajmi/8-Week-TSQL-Challenge: My Solutions to Danny Ma's 8 Week SQL Challenge — built in T-SQL on SQL Server | | Experimental |
| 259 | pandabear-neil/microsoft_fabric_mods: Code Snippets, Designs, and other things about building a Data Analytics... | | Experimental |
| 260 | arnienemeth/industry-intel-generator: Automated weekly tech trend reports — built with Claude Code + Claude Cowork | | Experimental |
| 261 | belajarqywok/cryptocurrency_prediction: Cryptocurrency prediction using LSTM (Long Short Term Memory) [ Hugging... | | Experimental |
| 262 | turki-alajmi/8-week-sql-challenge-tsql: My Solutions to Danny Ma's 8 Week SQL Challenge — built in T-SQL on SQL Server | | Experimental |
| 263 | turki-alhumaid/8-week-sql-challenge-tsql: My Solutions to Danny Ma's 8 Week SQL Challenge — built in T-SQL on SQL Server | | Experimental |
| 264 | tosh2230/stairlight: A data lineage tool detects table dependencies from rendered SQL statements. | | Experimental |