Data Pipeline Frameworks: Data Engineering Tools

Tools for building, deploying, and orchestrating end-to-end data workflows (ETL/ELT, transformations, ingestion). Does NOT include SQL learning resources, individual data connectors, or general-purpose query engines.

There are 264 data pipeline framework tools tracked. 25 score 70 or above (Verified tier). The highest-rated is PrefectHQ/prefect at 95/100 with 21,898 stars and 9,764,465 monthly downloads. All of the top 10 are actively maintained.

Get the 264 projects as JSON (the sample request below sets limit=20):

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=data-engineering&subcategory=data-pipeline-frameworks&limit=20"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
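The same request can be issued from a script. The following is a minimal sketch using only the Python standard library; the helper names `build_url` and `fetch_projects` are hypothetical, and because the response schema is not described on this page, the decoded JSON is returned as is rather than unpacked into fields. Whether the API accepts a `limit` larger than 20 is an assumption to verify.

```python
import json
import urllib.parse
import urllib.request

# Endpoint and query parameters copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(limit=20):
    """Build the request URL; only `limit` is varied (hypothetical helper)."""
    params = {
        "domain": "data-engineering",
        "subcategory": "data-pipeline-frameworks",
        "limit": limit,
    }
    return f"{BASE_URL}?{urllib.parse.urlencode(params)}"


def fetch_projects(limit=20, timeout=30):
    """Send the GET request and decode the JSON body.

    No assumption is made about the payload's structure; inspect the
    returned object before relying on any particular key.
    """
    with urllib.request.urlopen(build_url(limit), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_projects()` performs one unauthenticated request, which counts against the 100 requests/day allowance mentioned above.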

# Tool Score Tier
1 PrefectHQ/prefect

Prefect is a workflow orchestration framework for building resilient data...

95
Verified
2 dagster-io/dagster

An orchestration platform for the development, production, and observation...

94
Verified
3 dlt-hub/dlt

data load tool (dlt) is an open source Python library that makes data...

90
Verified
4 growthbook/growthbook

Open Source Feature Flags, Experimentation, and Product Analytics

90
Verified
5 pathwaycom/pathway

Python ETL framework for stream processing, real-time analytics, LLM...

87
Verified
6 bruin-data/ingestr

ingestr is a CLI tool to copy data between any databases with a single...

84
Verified
7 koopjs/koop

Transform, query, and download geospatial data on the web.

82
Verified
8 meltano/meltano

Meltano: the declarative code-first data integration engine that powers your...

80
Verified
9 pyjanitor-devs/pyjanitor

Clean APIs for data cleaning. Python implementation of R package Janitor

79
Verified
10 quiltdata/quilt

Quilt is a Scientific Data Management Platform on AWS that helps teams and...

77
Verified
11 databricks/dbt-databricks

A dbt adapter for Databricks.

77
Verified
12 debezium/debezium

Change data capture for a variety of databases. Please log issues at...

76
Verified
13 apache/flink-cdc

Flink CDC is a streaming data integration tool

76
Verified
14 airbytehq/airbyte

The leading data integration platform for ETL / ELT data pipelines from...

76
Verified
15 apache/superset

Apache Superset is a Data Visualization and Data Exploration Platform

76
Verified
16 apache/shardingsphere

Empowering Data Intelligence with Distributed SQL for Sharding, Scalability,...

76
Verified
17 datajoint/datajoint-python

Relational data pipelines for the science lab

75
Verified
18 apache/incubator-devlake

Apache DevLake is an open-source dev data platform to ingest, analyze, and...

74
Verified
19 dathere/qsv

Blazing-fast Data-Wrangling toolkit

73
Verified
20 capitalone/locopy

locopy: Loading/Unloading to Redshift and Snowflake using Python.

72
Verified
21 vectordotdev/vector

A high-performance observability data pipeline.

71
Verified
22 treeverse/lakeFS

lakeFS - Data version control for your data lake | Git for data

70
Verified
23 dagu-org/dagu

A local-first workflow engine built the way it should be: declarative,...

70
Verified
24 cloudquery/cloudquery

Data pipelines for cloud config and security data. Build cloud asset...

70
Verified
25 risingwavelabs/risingwave

Event streaming platform for agents, apps, and analytics. Continuously...

70
Verified
26 PeerDB-io/peerdb

Fast, Simple and a cost effective tool to replicate data from Postgres to...

69
Established
27 apache/hop

Hop Orchestration Platform

69
Established
28 catalyst-cooperative/pudl

The Public Utility Data Liberation Project provides analysis-ready energy...

69
Established
29 networktocode/diffsync

A utility library for comparing and synchronizing different datasets.

69
Established
30 snowplow/snowplow

The leader in Customer Data Infrastructure

69
Established
31 scribe-org/Scribe-Data

Wikidata and Wiktionary language data extraction

68
Established
32 biglocalnews/warn-scraper

Command-line interface for downloading WARN Act notices of qualified plant...

67
Established
33 SQLMesh/sqlmesh

Scalable and efficient data transformation framework - backwards compatible with dbt.

67
Established
34 dataform-co/dataform

Dataform is a framework for managing SQL based data operations in BigQuery

67
Established
35 odpi/egeria

Egeria core

67
Established
36 laminlabs/lamindb

Open-source data framework for biology. Context and memory for datasets and...

66
Established
37 aws/aws-sdk-pandas

pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream,...

66
Established
38 astronomer/airflow-provider-fivetran-async

A new Airflow Provider for Fivetran, maintained by Astronomer and Fivetran

65
Established
39 datazip-inc/olake

OLake - Fastest Databases, Kafka & S3 Replication to Apache Iceberg or Plain...

65
Established
40 datavane/tis

Support agile DataOps Based on Flink, DataX and Flink-CDC, Chunjun with Web-UI

64
Established
41 nordquant/complete-dbt-bootcamp-zero-to-hero

Supplementary Materials for The Complete dbt (Data Build Tool) Bootcamp...

64
Established
42 wgzhao/Addax

A fast and versatile ETL tool that can transfer data between RDBMS and NoSQL...

64
Established
43 ariacom/Seal-Report

Database Reporting Tool and Tasks (.Net)

64
Established
44 datavane/datavines

Know your data better! Datavines is a next-gen Data Observability Platform,...

64
Established
45 nightscape/spark-excel

A Spark plugin for reading and writing Excel files

64
Established
46 vietvudanh/vietlott-data

Automation fetching data for Vietlott. Just for fun.

64
Established
47 datagouv/csv-detective

Inspection of tabular (csv, xls-like) files to guess the columns' content

64
Established
48 sodadata/soda-core

Data Contracts engine for the modern data stack. https://www.soda.io

63
Established
49 redpanda-data/connect

Fancy stream processing made operationally mundane

63
Established
50 elementary-data/elementary

The dbt-native data observability solution for data & analytics engineers....

62
Established
51 fkie-cad/Logprep

log data pre processing, generation and shipping in python

62
Established
52 iBridges-for-iRODS/iBridges

A wrapper around the python-irodsclient to allow for easy interaction with...

61
Established
53 apecloud/ape-dts

ApeCloud's Data Transfer Suite, written in Rust. Provides ultra-fast data...

61
Established
54 datacleaner/DataCleaner

The premier open source Data Quality solution

60
Established
55 bitpicky/dbt-sugar

dbt-sugar is a CLI tool that allows users of dbt to have fun and ease...

60
Established
56 cre-dev/xml2db

A Python package to load complex XML files into a relational database

59
Established
57 turbot/steampipe

Zero-ETL, infinite possibilities. Live query APIs, code & more with SQL. No...

59
Established
58 DataTalksClub/data-engineering-zoomcamp

Data Engineering Zoomcamp is a free 9-week course on building...

59
Established
59 slingdata-io/sling-cli

Sling is a CLI tool that extracts data from a source storage/database and...

58
Established
60 rudderlabs/rudder-server

Privacy and Security focused Segment-alternative, in Golang and React

58
Established
61 biglocalnews/warn-transformer

Consolidate, enrich and republish the data gathered by warn-scraper

58
Established
62 timeplus-io/proton

⚡ Fastest SQL ETL pipeline in a single C++ binary, built for stream...

58
Established
63 fedspendingtransparency/usaspending-api

Server application to serve U.S. federal spending data via a RESTful API

57
Established
64 amphi-ai/amphi-etl

visual data prep powered by python

57
Established
65 snowflakedb/snowpark-python

Snowflake Snowpark Python API

57
Established
66 dotflow-io/dotflow

🎲 Business Logic Code in a flow!

57
Established
67 Desbordante/desbordante-core

Desbordante is a high-performance data profiler that is capable of...

56
Established
68 osalvador/ReplicaDB

ReplicaDB is open source tool for database replication, designed for...

56
Established
69 ohs-foundation/fhir-data-pipes

A collection of tools for extracting FHIR resources and analytics services...

56
Established
70 ConduitIO/conduit

Conduit streams data between data stores. Kafka Connect replacement. No JVM required.

56
Established
71 data-engineering-community/data-engineering-wiki

The best place to learn data engineering. Built and maintained by the data...

56
Established
72 opendatadiscovery/odd-platform

First open-source data discovery and observability platform. We make a life...

55
Established
73 Multiwoven/multiwoven

🔥🔥🔥 Open source Reverse ETL - alternative to hightouch and census.

55
Established
74 TianLangStudio/DataXServer

Provides remote, multi-language invocation (ThriftServer, HttpServer) for DataX (https://github.com/alibaba/DataX)...

55
Established
75 edkreuk/FMD_FRAMEWORK

The Fabric Metadata-Driven Framework (FMD) is a cutting-edge accelerator...

55
Established
76 stn1slv/awesome-integration

A curated list of awesome system integration software and resources.

55
Established
77 airbytehq/PyAirbyte

PyAirbyte brings the power of Airbyte to every Python developer.

55
Established
78 AbsaOSS/cobrix

A COBOL parser and Mainframe/EBCDIC data source for Apache Spark

55
Established
79 tower/tower-cli

Next generation compute platform for the post-modern data stack

54
Established
80 neo4j/neo4j-jdbc

Official Neo4j JDBC Driver

54
Established
81 DataKitchen/dataops-testgen

DataOps Data Quality TestGen is part of DataKitchen's Open Source Data...

54
Established
82 Breeze0806/go-etl

go-etl is a toolset for data extraction, transformation and loading.

54
Established
83 dlt-hub/verified-sources

Contribute to dlt verified sources 🔥

54
Established
84 Guepard-Corp/qwery-core

The Boring query platform - Connect and query anything

54
Established
85 HTTP-RPC/Kilo

Lightweight REST for Java

53
Established
86 fdmorison/tiozin

Tiozin, your friendly ETL framework

53
Established
87 benjamin-awd/monopoly

Monopoly is a Python library & CLI that converts bank statement PDFs to CSV.

53
Established
88 bitol-io/open-data-contract-standard

Home of the Open Data Contract Standard (ODCS).

53
Established
89 bacalhau-project/bacalhau

Community-driven, simple, yet powerful framework for fast, cost-effective...

52
Established
90 metafacture/metafacture-core

Core package of the Metafacture tool suite for metadata processing.

52
Established
91 linkedpipes/etl

LinkedPipes ETL is an RDF based, lightweight ETL tool

52
Established
92 vmware/versatile-data-kit

One framework to develop, deploy and operate data workflows with Python and SQL.

52
Established
93 dalenewman/Transformalize

Configurable Extract, Transform, and Load

52
Established
94 dashmug/glue-utils

glue-utils makes AWS Glue jobs less repetitive, more type-safe, and easier...

51
Established
95 AndreaBozzo/dataprof

Library and CLI for profiling tabular data

51
Established
96 robert-koch-institut/mex-common

RKI Metadata Exchange | Software development toolkit for the MEx project...

51
Established
97 dagster-io/community-integrations

Community supported integrations for the Dagster platform.

51
Established
98 datazip-inc/olake-ui

Frontend & BFF (Backend for frontend) for Olake. This includes the UI code...

50
Established
99 dataflint/spark

Drop-in replacement for Apache Spark UI

50
Established
100 dfpc-coe/CloudTAK

TAK Compatible, browser based Common Operation Picture & Situational Awareness tool

50
Established
101 flowsynx/flowsynx

A deterministic orchestrator for composable micro-workflows with reusable modules

49
Emerging
102 kay-ou/SimTradeData

SimTradeData is a utility library supporting SimTradeDesk, SimTradeLab and...

49
Emerging
103 starlake-ai/starlake

Declarative text based tool for data analysts and engineers to extract,...

49
Emerging
104 dbt-labs/jaffle-shop

🥪🦘 An open source sandbox project exploring dbt workflows via a fictional...

49
Emerging
105 reductstore/reductstore

High Performance Storage and Streaming Solution for Data Acquisition Systems

48
Emerging
106 DataSQRL/sqrl

Data Pipeline Automation Framework to build MCP servers, data APIs, and data...

48
Emerging
107 odpi/egeria-docs

Documentation repository for the Egeria project.

48
Emerging
108 microsoft/unified-data-foundation-with-fabric-solution-accelerator

Unified Data Foundation with Microsoft Fabric with Options to Integrate with...

48
Emerging
109 edrewitz/WxData

A Python library that acts as a client to download, pre-process and...

48
Emerging
110 kanton-bern/hellodata-be

The Open-Source Enterprise Data Platform in a single Portal

47
Emerging
111 MilkMp/CIA-World-Factbooks-Archive-1990-2025

Complete structured archive of every CIA World Factbook edition from...

47
Emerging
112 weifuwan/seatunnel-web

SeaTunnel Web is a visual platform for building, managing, and monitoring...

47
Emerging
113 GitBrincie212/ChronoGrapher

Powerful, developer-experience centric, blazingly fast and extensible job...

47
Emerging
114 akmalsoliev/Validoopsie

A simple and easy to use Data Validation library for Python.

47
Emerging
115 OHDSI/ETL-Synthea

A package supporting the conversion from Synthea CSV to OMOP CDM

47
Emerging
116 dflib/dflib

In-memory Java DataFrame library

47
Emerging
117 GovHub-br/gov-hub

GovHub - Transforming Data into Value for Public Administration

46
Emerging
118 trustedshops-public/schema2pyarrow

Converts AsyncApi and JsonSchema to PyArrow schema

46
Emerging
119 rpsft/etlbox

A lightweight ETL (extract, transform, load) library and data integration...

46
Emerging
120 mprove-io/mprove

Open Source Business Intelligence with Malloy Semantic Layer

46
Emerging
121 ara3d/bim-open-schema

Representing BIM Data as Parquet

46
Emerging
122 ogbinar/DataEngineeringPilipinas

Data Engineering Pilipinas is a community for data engineers, data analysts,...

46
Emerging
123 GoPlasmatic/dataflow-rs

A high-performance rules engine for IFTTT-style automation in Rust with...

46
Emerging
124 opensnowcat/opensnowcat-enrich

OpenSnowcat Enricher (Apache 2.0 License)

46
Emerging
125 NeaByteLab/IDX-API

Indonesian Stock Exchange API wrapper for trading data integration.

46
Emerging
126 halestudio/hale

(Spatial) data harmonisation with hale»studio (formerly HUMBOLDT Alignment Editor)

45
Emerging
127 Edwardvaneechoud/Flowfile

Flowfile is a visual ETL tool and Python library combining drag-and-drop...

45
Emerging
128 Bruno-Furtado/cloud-cnpj

Ingestion, preparation, and free publication of CNPJ data of...

45
Emerging
129 DataRecce/recce

The data-validation toolkit for enhanced dbt (data build tool) PR review

45
Emerging
130 thadhutch/sports-quant

End-to-end NFL data pipeline that scrapes PFF grades and Pro Football...

45
Emerging
131 treeverse/charts

Helm charts

45
Emerging
132 DataKitchen/data-observability-installer

Installer for DataKitchen's Open Source Data Observability Products. Data...

45
Emerging
133 scribe-org/Scribe-Server

Backend service for Scribe data downloads

44
Emerging
134 leftkats/awesome-greek-tech-jobs

A comprehensive map of companies that hire for tech jobs in Greece.

44
Emerging
135 xxh/xxh-shell-xonsh

Use @xonsh wherever you go through the SSH without installation on the host.

44
Emerging
136 wgzhao/addax-admin

Addax Admin is a web-based management console for Addax ETL jobs, offering...

44
Emerging
137 GovHub-br/data-application-gov-hub

Gov-Hub data pipeline

44
Emerging
138 bitol-io/open-data-product-standard

Home of the Open Data Product Standard (ODPS).

44
Emerging
139 mehd-io/pypi-duck-flow

end-to-end data engineering project to get insights from PyPi using python,...

43
Emerging
140 GregoryKogan/yt-framework

Build scalable data pipelines on YTsaurus with automatic stage management,...

43
Emerging
141 nightmarewalker/D-MemFS

In-process virtual filesystem with hard quota for Python

43
Emerging
142 ashish10alex/vscode-dataform-tools

Dataform Tools - VS Code extension to run and visualise Dataform data...

43
Emerging
143 hiero-hackers/analytics

Stay up to date with hiero organisation activity and contributor diversity

43
Emerging
144 hbz/lobid-resources

Transformation, web frontend, and API for the hbz catalog as LOD

43
Emerging
145 bbossgroups/bboss-elastic-tran

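bboss-datatran is a data collection and unified stream-batch tool open-sourced by bboss, providing data collection, cleaning and transformation, and unified stream-batch computation;...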

42
Emerging
146 ludovicschmetz-stack/datavow

Open-source data contract enforcement — define, sync dbt, validate, block,...

42
Emerging
147 opensnowcat/opensnowcat-collector

OpenSnowcat Collector, an open source fork of Snowplow (Apache 2.0 License)

42
Emerging
148 mikevan666/opendataworks

opendataworks...

42
Emerging
149 SpareCores/sc-crawler

Pull and standardize data on cloud compute resources.

42
Emerging
150 wp-labs/warp-parse

Focusing on building industry-leading ETL engines.

42
Emerging
151 koralium/flowtide

High-performance streaming SQL query engine designed for real-time data...

42
Emerging
152 irajhedayati/data-engineering

A set of Data Engineering tools online for public use

42
Emerging
153 databricks-industry-solutions/python-data-sources

Quality python data sources for pyspark 4.x

42
Emerging
154 AMPATH/etl-rest-server

This project hosts scripts to generate flat tables used for reporting purposes.

41
Emerging
155 MTSWebServices/onetl

One ETL tool to rule them all

41
Emerging
156 AndreaBozzo/Ceres

Harvesting & Semantic search for open data portals

41
Emerging
157 ottogroup/koality

Library for data quality monitoring based on duckdb.

41
Emerging
158 markusbegerow/data-analytics-exercises

End-to-end data warehouse exercises for students - build a modern ELT...

41
Emerging
159 Edwardvaneechoud/pyfloe

A minimal zero dependency dataframe library

41
Emerging
160 catalyst-cooperative/ferc-xbrl-extractor

A tool for converting FERC filings published in XBRL into SQLite databases

41
Emerging
161 datacompose/datacompose

Data Cleaning for Pyspark

41
Emerging
162 Indexical-Metrics-Measure-Advisory/watchmen

Watchmen Platform is a low code data platform for data pipeline, meta data...

40
Emerging
163 DawnbrandBots/yaml-yugi

A machine-readable, human-editable database of the Yu-Gi-Oh! Trading Card...

40
Emerging
164 continuous-dems/fetchez

Fetchez is a lightweight, modular, and highly extendable Python framework...

40
Emerging
165 colliery-io/cloacina

Embedded workflow orchestration library for Rust and Python. Build...

40
Emerging
166 apache/incubator-devlake-playground

Apache DevLake is an open-source dev data platform to ingest, analyze, and...

40
Emerging
167 dagster-io/dagster-open-platform

Dagster Labs' open-source data platform, built with Dagster.

40
Emerging
168 jordilin/gitar

Git all remotes. git cli tool that targets both Github and Gitlab

40
Emerging
169 elastiflow/pipelines

A lightweight Go framework for building stateful, real-time data pipelines....

40
Emerging
170 rannd1nt/phaethon

Dimensional Data Pipeline & Semantic Data Engineering Framework

40
Emerging
171 ChrisDevRepo/vscode_data_lineage

VS Code extension for visualizing SQL Server database object dependencies...

40
Emerging
172 B1AAB/EBA

An ML-first temporal graph of Bitcoin's on-chain fund flows.

39
Emerging
173 DevDizzle/gammarips-engine

An end-to-end, serverless AI platform built on Google Cloud that...

39
Emerging
174 wilson-mok/demo

In this repository, you will find various demos and presentations I have...

39
Emerging
175 AbsaOSS/pramen

Resilient data pipeline framework running on Apache Spark

39
Emerging
176 monarch-initiative/koza

Data transformation framework for LinkML data models

39
Emerging
177 terrylica/exness-data-preprocess

Professional forex tick data preprocessing with unified DuckDB storage,...

39
Emerging
178 mattlianje/etl4s

Powerful, whiteboard-style ETL

39
Emerging
179 ivszhuravlev/spark-tuning-handbook

Hands-on Spark internals and performance engineering.

39
Emerging
180 wherobots/airflow-providers-wherobots

Airflow extensions for communicating with Wherobots Cloud

39
Emerging
181 Data-Research-Analysis/data-research-analysis-platform

Stop Guessing. Start Dominating Your Market. The only data platform built...

39
Emerging
182 PeopleForBikes/brokenspoke

A collection of tools for the BNA.

39
Emerging
183 CategoricalData/CQL

Categorical Query Language IDE

38
Emerging
184 netxs2000/devops

DevOps Data Application Platform...

37
Emerging
185 yanghaiji/JsonCleanseETL

JSONCleanseETL is a professional data cleaning and transformation tool that aims to give users an efficient solution for processing JSON-formatted data. ...

37
Emerging
186 moj-analytical-services/etl_manager

A python package to create a database on the platform using our moj data...

37
Emerging
187 ineelhere/forex-connect

Streamlit Connection to Explore Foreign Currency Exchange rates 💰 in real-time

37
Emerging
188 exasol/exasol-personal

The High-Performance Analytics Engine — Free for Personal Use

37
Emerging
189 realdatadriven/etlx

ETL / ELT Framework powered by DuckDB, designed to seamlessly integrate and...

37
Emerging
190 nationalarchives/ds-caselaw-ingester

Parse judgements from the Transformation Engine and load them into MarkLogic...

37
Emerging
191 DataKitchen/dataops-observability

DataOps Observability is part of DataKitchen's Open Source Data...

37
Emerging
192 MTSWebServices/syncmaster-ui

Frontend for Syncmaster, no-code ETL tool. WIP

37
Emerging
193 gopidesupavan/qualink

Data quality validation, profiling, anomaly detection, and YAML-driven...

36
Emerging
194 Zipstack/visitran

Modern, AI-native and agentic Pythonic data transformation platform.

36
Emerging
195 BEKO2210/World_report

A self-updating global dashboard that aggregates 40+ open data sources...

36
Emerging
196 DawnbrandBots/yaml-yugipedia

An automatically-updated collection of wikitexts from Yugipedia. Part of YAML Yugi.

35
Emerging
197 Beyond-Finance/dataeng-de-technical-assessment

Public repo of Beyond Finance's technical assessment for Data Engineering candidates

35
Emerging
198 ankiano/etl

Extract transform load CLI tool for extracting small and middle data volume...

34
Emerging
199 chalk-ai/chalk-go

Go client for Chalk

34
Emerging
200 MTSWebServices/syncmaster

No-code ETL tool, based on onETL + PySpark

34
Emerging
201 ccao-data/data-architecture

Codebase for CCAO data infrastructure construction and management

34
Emerging
202 RustedBytes/audios-to-dataset

Convert your audio files into DuckDB or Parquet files

34
Emerging
203 DataKitchen/dataops-observability-agents

DataOps Observability Integration Agents are part of DataKitchen's Open...

33
Emerging
204 equitusai/arcxa

Mapping intelligence for enterprise data migrations: schema mapping,...

33
Emerging
205 bitroot/coflux

Open-source workflow engine. Orchestrate and observe computational workflows...

33
Emerging
206 mahmoudparsian/data-warehousing

This repository is a place for the Data Warehousing course at the...

33
Emerging
207 betoalien/PardoX

PardoX: The Hyper-Fast Data Engine

33
Emerging
208 tenzir/library

Packages for the Tenzir ecosystem.

32
Emerging
209 prefeitura-rio/pipelines_rj_smtr

Code for capturing and processing SMTR data

32
Emerging
210 thinkall/featcopilot

Next-generation LLM-powered auto feature engineering framework

31
Emerging
211 rush-db/rushdb

RushDB is an Instant Database for Modern Apps & AI. Built on top of Neo4j.

31
Emerging
212 mbari-org/aidata

(ETL) Extract, transform, load/download and augment images and annotations...

31
Emerging
213 vedanthv/data-engineering-portfolio

Cool DE Projects

31
Emerging
214 jtakish/airflow-provider-sap-hana

Airflow provider package for SAP HANA

30
Emerging
215 moj-analytical-services/iam_builder

Little helper to write IAM policies

30
Emerging
216 IgorNatann/project_e_commerce_dw

E-commerce DW (Kimball/Star Schema) on SQL Server, with scripts, data...

29
Experimental
217 TJAdryan/astro_blog

This site uses the amazing Astro.build project. I added Google Docs ...

29
Experimental
218 bruin-data/setup-bruin

Official action to install Bruin CLI in Github Actions.

29
Experimental
219 cderickson/Mox-Data.com

Mox-Data.com is a cloud-based data ingestion tool used to process raw data...

29
Experimental
220 richban/opendata-stack-platform

Open Data Stack Platform: a collection of projects and pipelines built with...

28
Experimental
221 vnvo/deltaforge

A versatile, high-performance Change Data Capture (CDC) engine built in...

28
Experimental
222 peter115342/soccer-tracker-DE-project

End-To-End Data Engineering Project. Made to learn some common data...

28
Experimental
223 tvs-sde/oxford-omop-data-mapper

A documentation-centric DuckDB based ETL tool, implementing transformations...

27
Experimental
224 eventvisor/eventvisor

Fine-grained control over analytics events and logs via remote configuration

27
Experimental
225 sopho-tech/sopho

Open Source Business Intelligence

27
Experimental
226 lezwon/CatalystOps

Semantic cost-linting and performance warnings extension for Databricks in VS Code

27
Experimental
227 TheCocoTeam/source-watcher-core

PHP ETL engine for building extract–transform–load pipelines with pluggable...

27
Experimental
228 MTSWebServices/horizon

Simple HWM Store backend

27
Experimental
229 Hyperwindmill/morphql

Transform data with queries

27
Experimental
230 SourceWatcher/source-watcher-core

PHP ETL engine with pluggable steps: extractors, transformers, loaders

27
Experimental
231 illuin-tech/data-pipeline

Library for describing data transformation pipelines by compositing simple...

26
Experimental
232 lyrasis/kiba-extend

Extensions to Kiba ETL

26
Experimental
233 sul-dlss/libsys-airflow

Airflow DAGS for migrating and managing ILS data into FOLIO along with other...

26
Experimental
234 tracebloc/data-ingestors

tracebloc data pipeline for training/test dataset setup

26
Experimental
235 MTSWebServices/etl-entities

Basic ETL Entity classes for onETL

26
Experimental
236 tarek-clarke/resilient-rap-framework

A resilient, fault‑tolerant telemetry analytics pipeline designed to...

26
Experimental
237 everycure-org/kedro-argo

argo-kedro is a kedro-plugin for executing Kedro pipelines on Argo Workflows.

26
Experimental
238 edwinweber/dbt_duckdb_demo_public

Data engineering demo project for Danish Parliament (Folketing) open data —...

26
Experimental
239 neo-technology-field/python-etl-lib

simple lib of ETL building blocks

26
Experimental
240 adhamhaithameid/Classroom-Quick-Downloader

A sophisticated cross-browser extension for bulk Google Classroom downloads,...

25
Experimental
241 pablo-reyes8/colombia-tourism-ml-forecasting

ML project forecasting monthly foreign tourist arrivals in Colombian cities...

25
Experimental
242 elevata-labs/elevata

elevata is an Architecture Runtime for modern data platforms —...

25
Experimental
243 calbergs/spotify-api

Pipeline that extracts data from the Spotify API to build a more detailed...

25
Experimental
244 Galaticos-API/API-3

API project for the first semester of 2026

25
Experimental
245 MTSWebServices/horizon-hwm-store

Horizon HWM Store for onETL

25
Experimental
246 tbrus/smartjoin

Deterministic key and join discovery for structured datasets

25
Experimental
247 qweliant/ankaa

POC for real-time monitoring and alert system for home hemodialysis,...

25
Experimental
248 nicopon/dtpipe

A simple, self-contained CLI for performance-focused data streaming & anonymization.

25
Experimental
249 faltz009/Closure-SDK

A hash you can do algebra on — composable verification for ordered data over...

25
Experimental
250 vishnuvardhanaan/equity-fundamental-engine

Production-style financial data engineering pipeline that standardizes NSE...

25
Experimental
251 vishnuvardhanaan/equity-fundamental-analytics

Macro-aware, explainable equity analytics system using Bronze–Silver–Gold...

25
Experimental
252 RaySatish/Market-Surveillance-System

Big-data pipeline detecting wash trading, pump & dump, and spoofing in trade...

25
Experimental
253 nvisycom/runtime

Enterprise-grade multimodal redaction runtime that detects and removes...

25
Experimental
254 zovchik0v/task-management

🛠️ Streamline task management with this full-stack solution featuring...

25
Experimental
255 raphaelberly/journal

A movie journal coupled with open IMDb data, and a Flask web-app for easy...

19
Experimental
256 salimt/Transfermarkt-ETL-and-LIVE-Scores

asyncIO, Github Actions, GCP, dbt, Terraform, Docker

18
Experimental
257 anwitars/grab

High-performance, declarative stream processor for delimited text data.

18
Experimental
258 turki-alajmi/8-Week-TSQL-Challenge

My Solutions to Danny Ma's 8 Week SQL Challenge — built in T-SQL on SQL Server

17
Experimental
259 pandabear-neil/microsoft_fabric_mods

Code Snippets, Designs, and other things about building a Data Analytics...

17
Experimental
260 arnienemeth/industry-intel-generator

Automated weekly tech trend reports — built with Claude Code + Claude Cowork

17
Experimental
261 belajarqywok/cryptocurrency_prediction

Cryptocurrency prediction using LSTM (Long Short Term Memory) [ Hugging...

17
Experimental
262 turki-alajmi/8-week-sql-challenge-tsql

My Solutions to Danny Ma's 8 Week SQL Challenge — built in T-SQL on SQL Server

17
Experimental
263 turki-alhumaid/8-week-sql-challenge-tsql

My Solutions to Danny Ma's 8 Week SQL Challenge — built in T-SQL on SQL Server

17
Experimental
264 tosh2230/stairlight

A data lineage tool that detects table dependencies from rendered SQL statements.

16
Experimental
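
The tier labels in the listing above appear to follow fixed score bands. The sketch below encodes the bands as they can be read off the listed rows (70 and up is Verified, 50 to 69 Established, 30 to 49 Emerging, below 30 Experimental). Only the 70 cutoff for the Verified tier is stated on this page; the other boundaries are inferred from the rows, and the function name `tier_for_score` is hypothetical.

```python
def tier_for_score(score: int) -> str:
    """Return the tier label that a score carries in the listing above.

    Boundaries are inferred from the listed rows, not from a documented
    rule, so treat them as an assumption.
    """
    if score >= 70:
        return "Verified"
    if score >= 50:
        return "Established"
    if score >= 30:
        return "Emerging"
    return "Experimental"
```

For example, `tier_for_score(95)` gives the label shown for PrefectHQ/prefect, and `tier_for_score(16)` gives the label shown for the last entry.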