All Data Engineering Tools

517 tools ranked by quality score · Page 5 of 6

Showing 401–500 of 517
Rank · Tool · Score · Tier
401 calbergs/spotify-api

Pipeline that extracts data from the Spotify API to build a more detailed...

32
Emerging
402 turbot/steampipe-plugin-virustotal

Use SQL to instantly query file, domain, URL and IP scanning results from VirusTotal.

32
Emerging
403 prefeitura-rio/pipelines_rj_smtr

Code for capturing and processing SMTR data

32
Emerging
404 tenzir/library

Packages for the Tenzir ecosystem.

32
Emerging
405 SentryPeer/SentryPeerHQ

Fraud Detection for VoIP. Use SentryPeer® HQ to help prevent VoIP...

32
Emerging
406 mlr-org/mlr3db

Data backends that let mlr3 work transparently with (remote) databases

32
Emerging
407 bytehub-ai/bytehub

ByteHub: making feature stores simple

32
Emerging
408 tushar2704/SQL-Portfolio

Collection of personal SQL projects and queries I've worked on, showcasing...

31
Emerging
409 mbari-org/aidata

(ETL) Extract, transform, load/download and augment images and annotations...

31
Emerging
410 BigData-Ananlysiser/UGC-Analysiser

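An open-source full-stack big data project, mainly covering real-time data collection, machine learning, big data processing, and front-end visualization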

31
Emerging
411 Paulescu/bytewax-hopsworks-example

Compute and store real-time features for crypto trading using Bytewax (stream...

31
Emerging
412 thinkall/featcopilot

Next-generation LLM-powered auto feature engineering framework

31
Emerging
413 pkochanowicz/n8n-setup-docker

Fast, safe and smart setup for self-hosted n8n placed in a Docker container,...

30
Emerging
414 moj-analytical-services/iam_builder

Little helper to write IAM policies

30
Emerging
415 jtakish/airflow-provider-sap-hana

Airflow provider package for SAP HANA

30
Emerging
416 GSA/coe-hud-acquisitions

A repository that contains links and information for acquisitions and...

29
Experimental
417 AmirhosseinHonardoust/Data-Storytelling-Dashboard

A fully interactive data storytelling dashboard for e-commerce analytics....

29
Experimental
418 IgorNatann/project_e_commerce_dw

E-commerce DW (Kimball/star schema) in SQL Server, with scripts, data...

29
Experimental
419 runprism/prism

Prism is the easiest way to develop, orchestrate, and execute data pipelines...

29
Experimental
420 apache/seatunnel-tools

SeaTunnel is a multimodal, high-performance, distributed, massive data...

29
Experimental
421 bruin-data/setup-bruin

Official action to install the Bruin CLI in GitHub Actions.

29
Experimental
422 cderickson/Mox-Data.com

Mox-Data.com is a cloud-based data ingestion tool used to process raw data...

29
Experimental
423 TJAdryan/astro_blog

This site uses the amazing Astro.build project. I added Google Docs ...

29
Experimental
424 peter115342/soccer-tracker-DE-project

End-To-End Data Engineering Project. Made to learn some common data...

28
Experimental
425 richban/opendata-stack-platform

Open Data Stack Platform: a collection of projects and pipelines built with...

28
Experimental
426 vnvo/deltaforge

A versatile, high-performance Change Data Capture (CDC) engine built in...

28
Experimental
427 turbot/steampipe-plugin-imap

Use SQL to instantly query mailboxes, messages and more using IMAP. Open...

27
Experimental
428 eventvisor/eventvisor

Fine-grained control over analytics events and logs via remote configuration

27
Experimental
429 lezwon/CatalystOps

Semantic cost-linting and performance warnings extension for Databricks in VS Code

27
Experimental
430 turbot/steampipe-plugin-openapi

Use SQL to instantly query resources from OpenAPI. Open source CLI. No DB required.

27
Experimental
431 Hyperwindmill/morphql

Transform data with queries

27
Experimental
432 Mindbaz/python-gpostmaster-domains-datas

Downloads and flattens data from Google Postmaster Tools (GPT)

27
Experimental
433 turbot/steampipe-plugin-digitalocean

Use SQL to instantly query droplets, VPCs, users and more from DigitalOcean....

27
Experimental
434 SourceWatcher/source-watcher-core

PHP ETL engine with pluggable steps: extractors, transformers, loaders

27
Experimental
435 TheCocoTeam/source-watcher-core

PHP ETL engine for building extract–transform–load pipelines with pluggable...

27
Experimental
436 tvs-sde/oxford-omop-data-mapper

A documentation-centric DuckDB based ETL tool, implementing transformations...

27
Experimental
437 sopho-tech/sopho

Open Source Business Intelligence

27
Experimental
438 MTSWebServices/horizon

Simple HWM Store backend

27
Experimental
439 turbot/steampipe-plugin-supabase

Use SQL to instantly query Supabase resources. Open source CLI. No DB required.

26
Experimental
440 turbot/steampipe-plugin-docker

Use SQL to instantly query Dockerfile commands and more from Docker. Open...

26
Experimental
441 turbot/steampipe-plugin-namecheap

Use SQL to instantly query Namecheap for domains, DNS host records & more....

26
Experimental
442 turbot/steampipe-plugin-ibm

Use SQL to instantly query instances, networks, users and more from IBM...

26
Experimental
443 turbot/steampipe-plugin-jumpcloud

Use SQL to instantly query resources from JumpCloud. Open source CLI. No DB required.

26
Experimental
444 turbot/steampipe-plugin-linode

Use SQL to instantly query instances, domains and more from Linode. Open...

26
Experimental
445 turbot/steampipe-plugin-onepassword

Use SQL to instantly query 1Password vaults, items, files & more. Open...

26
Experimental
446 everycure-org/kedro-argo

argo-kedro is a kedro-plugin for executing Kedro pipelines on Argo Workflows.

26
Experimental
447 MTSWebServices/etl-entities

Basic ETL Entity classes for onETL

26
Experimental
448 sul-dlss/libsys-airflow

Airflow DAGs for migrating and managing ILS data into FOLIO along with other...

26
Experimental
449 lyrasis/kiba-extend

Extensions to Kiba ETL

26
Experimental
450 illuin-tech/data-pipeline

Library for describing data transformation pipelines by compositing simple...

26
Experimental
451 tracebloc/data-ingestors

tracebloc data pipeline for training/test dataset setup

26
Experimental
452 tarek-clarke/resilient-rap-framework

A resilient, fault‑tolerant telemetry analytics pipeline designed to...

26
Experimental
453 edwinweber/dbt_duckdb_demo_public

Data engineering demo project for Danish Parliament (Folketing) open data —...

26
Experimental
454 neo-technology-field/python-etl-lib

Simple library of ETL building blocks

26
Experimental
455 chayansraj/Python-ETL-pipeline-using-Airflow-on-AWS

This project demonstrates how to build and automate an ETL pipeline written...

25
Experimental
456 nvisycom/runtime

Enterprise-grade multimodal redaction runtime that detects and removes...

25
Experimental
457 zovchik0v/task-management

🛠️ Streamline task management with this full-stack solution featuring...

25
Experimental
458 KasperOmsK/pipefn

pipefn is a Go library for building lazy, functional, and composable...

25
Experimental
459 turbot/steampipe-plugin-aiven

Use SQL to instantly query Aiven accounts, projects, teams, users & more....

25
Experimental
460 turbot/steampipe-plugin-trello

Use SQL to instantly query Trello organizations, boards, members,...

25
Experimental
461 turbot/steampipe-plugin-env0

Use SQL to instantly query env0 resources. Open source CLI. No DB required.

25
Experimental
462 turbot/steampipe-plugin-heroku

Use SQL to instantly query apps, dynos and more from Heroku. Open source...

25
Experimental
463 turbot/steampipe-plugin-fly

Use SQL to instantly query fly.io resources. Open source CLI. No DB required.

25
Experimental
464 turbot/steampipe-plugin-fastly

Use SQL to instantly query services, ACLs and more from Fastly. Open source...

25
Experimental
465 turbot/steampipe-plugin-urlscan

Use SQL to instantly query urlscan.io. Open source CLI. No DB required.

25
Experimental
466 turbot/steampipe-plugin-updown

Use SQL to instantly query status (e.g. checks, downtimes) from updown.io....

25
Experimental
467 turbot/steampipe-plugin-awscfn

Use SQL to instantly query resources, data sources and more from AWS...

25
Experimental
468 tbrus/smartjoin

Deterministic key and join discovery for structured datasets

25
Experimental
469 qweliant/ankaa

POC for real-time monitoring and alert system for home hemodialysis,...

25
Experimental
470 turbot/steampipe-plugin-panos

Use SQL to instantly query PAN-OS firewalls, security policies & more. Open...

25
Experimental
471 turbot/steampipe-plugin-newrelic

Use SQL to instantly query alerts, events, and more from New Relic. Open...

25
Experimental
472 turbot/steampipe-plugin-planetscale

Use SQL to instantly query PlanetScale databases, branches and more. Open...

25
Experimental
473 turbot/steampipe-plugin-mailchimp

Use SQL to instantly query Mailchimp marketing data. Open source CLI. No DB required.

25
Experimental
474 turbot/steampipe-plugin-vercel

Use SQL to instantly query projects, teams, domains and more from Vercel....

25
Experimental
475 turbot/steampipe-plugin-splunk

Use SQL to instantly query logs, indexes, apps and more from Splunk. Open source...

25
Experimental
476 turbot/steampipe-plugin-pipes

Use SQL to instantly query Turbot Pipes resources across workspaces. Open...

25
Experimental
477 nicopon/dtpipe

A simple, self-contained CLI for performance-focused data streaming & anonymization.

25
Experimental
478 faltz009/Closure-SDK

A hash you can do algebra on — composable verification for ordered data over...

25
Experimental
479 vishnuvardhanaan/equity-fundamental-engine

Production-style financial data engineering pipeline that standardizes NSE...

25
Experimental
480 alireza-heidarii/Real-Time-Data-Cleaning-Pipeline-for-Medical-and-Healthcare-Data

A real-time data cleaning pipeline for medical and healthcare data using...

25
Experimental
481 vishnuvardhanaan/equity-fundamental-analytics

Macro-aware, explainable equity analytics system using Bronze–Silver–Gold...

25
Experimental
482 RaySatish/Market-Surveillance-System

Big-data pipeline detecting wash trading, pump & dump, and spoofing in trade...

25
Experimental
483 pablo-reyes8/colombia-tourism-ml-forecasting

ML project forecasting monthly foreign tourist arrivals in Colombian cities...

25
Experimental
484 elevata-labs/elevata

elevata is an Architecture Runtime for modern data platforms —...

25
Experimental
485 adhamhaithameid/Classroom-Quick-Downloader

A sophisticated cross-browser extension for bulk Google Classroom downloads,...

25
Experimental
486 Galaticos-API/API-3

API project for the first semester of 2026

25
Experimental
487 MTSWebServices/horizon-hwm-store

Horizon HWM Store for onETL

25
Experimental
488 idlab-discover/RustiFlow

Flow feature extraction tool built in Rust using eBPF

24
Experimental
489 tosh2230/stairlight

A data lineage tool that detects table dependencies from rendered SQL statements.

23
Experimental
490 fishstormX/fishmaple

Personal website https://www.fishmaple.cn

22
Experimental
491 BirdiD/BirdiDQ

BirdiDQ leverages the power of the Python Great Expectations open-source...

22
Experimental
492 Wazzabeee/pyspark-etl-twitter

Implementation of an ETL process for real-time sentiment analysis of tweets...

22
Experimental
493 AmirhosseinHonardoust/Market-IQ

MarketIQ is a full-stack Streamlit + SQL + Prophet dashboard for real-time...

21
Experimental
494 NileDB/com.niledb.core

Open-source Data Backend written in Java and based on PostgreSQL & GraphQL.

21
Experimental
495 pmutua/drf_csv_xlsx_file_upload

Demo Django REST Framework API that uploads .csv/.xlsx files for bulk data,...

21
Experimental
496 AmirhosseinHonardoust/Beyond-Charts-Interactive-Storytelling

A comprehensive guide and codebase for building interactive storytelling...

21
Experimental
497 contriboss/no_fly_list

A flexible, high-performance tagging system for Rails applications with...

21
Experimental
498 MaxHalford/tuna

🐟 A streaming ETL for fish

20
Experimental
499 ThinkThinkAI/ThinkDB

ThinkDB is an easy-to-use SQL client that makes working with your databases...

20
Experimental
500 aymane-maghouti/Big-Data-Project

This project aims to predict smartphone prices using a combination of batch...

20
Experimental