All Data Engineering Tools

517 tools ranked by quality score · Page 3 of 6

Showing 201–300 of 517
# Tool Score Tier
201 microsoft/unified-data-foundation-with-fabric-solution-accelerator

Unified Data Foundation with Microsoft Fabric with Options to Integrate with...

48
Emerging
202 odpi/egeria-docs

Documentation repository for the Egeria project.

48
Emerging
203 tuanx18/data-engineer-portfolio

This is a repository to demonstrate my details, skills, projects and to keep...

48
Emerging
204 sql-machine-learning/sqlflow

Brings SQL and AI together.

48
Emerging
205 19-84/redd-archiver

A PostgreSQL-backed archive generator that creates browsable HTML archives...

48
Emerging
206 MTSWebServices/onetl

One ETL tool to rule them all

48
Emerging
207 turbot/steampipe-sqlite

Steampipe SQLite is a zero-ETL engine for SQLite. Virtual tables translate...

48
Emerging
208 J0SAL/Decentralized-Expense-Tracker

Tracking Expenses Securely

47
Emerging
209 DawnbrandBots/yaml-yugi

A machine-readable, human-editable database of the Yu-Gi-Oh! Trading Card...

47
Emerging
210 buildersoftio/cortex

Cortex | Data Framework—a cutting-edge SDK that simplifies real-time data...

47
Emerging
211 feathr-ai/feathr

Feathr – A scalable, unified data and AI engineering platform for enterprise

47
Emerging
212 GitBrincie212/ChronoGrapher

Powerful, developer-experience centric, blazingly fast and extensible job...

47
Emerging
213 turbot/steampipe-plugin-jira

Use SQL to instantly query Jira. Open source CLI. No DB required.

47
Emerging
214 dagster-io/dagster-open-platform

Dagster Labs' open-source data platform, built with Dagster.

47
Emerging
215 NeaByteLab/IDX-API

Indonesian Stock Exchange API wrapper for trading data integration.

46
Emerging
216 wp-labs/warp-parse

Focusing on building industry-leading ETL engines.

46
Emerging
217 monarch-initiative/koza

Data transformation framework for LinkML data models

46
Emerging
218 rpsft/etlbox

A lightweight ETL (extract, transform, load) library and data integration...

46
Emerging
219 FrigadeHQ/trench

Trench — Open-Source Analytics Infrastructure. A single production-ready...

46
Emerging
220 leftkats/awesome-greek-tech-jobs

A comprehensive map of companies that hire for tech jobs in Greece.

46
Emerging
221 mattlianje/etl4s

Powerful, whiteboard-style ETL

46
Emerging
222 GovHub-br/gov-hub

GovHub - Transforming Data into Value for Public Management

46
Emerging
223 BlazingDB/blazingsql

BlazingSQL is a lightweight, GPU accelerated, SQL engine for Python. Built...

46
Emerging
224 MLD3/FIDDLE

FlexIble Data-Driven pipeLinE – a preprocessing pipeline that transforms...

46
Emerging
225 alexei-led/spotinfo

CLI for exploring AWS EC2 Spot inventory. Inspect AWS Spot instance types,...

46
Emerging
226 trustedshops-public/schema2pyarrow

Converts AsyncApi and JsonSchema to PyArrow schema

46
Emerging
227 opensnowcat/opensnowcat-enrich

OpenSnowcat Enricher (Apache 2.0 License)

46
Emerging
228 treeverse/charts

Helm charts

45
Emerging
229 CategoricalData/CQL

Categorical Query Language IDE

45
Emerging
230 turbot/steampipe-plugin-slack

Use SQL to instantly query users, channels, emoji and more from your Slack...

45
Emerging
231 pretzelai/pretzelai

The modern replacement for Jupyter Notebooks

45
Emerging
232 turbot/steampipe-plugin-azuread

Use SQL to instantly query groups, service principals, users and more from...

45
Emerging
233 weld-project/weld

High-performance runtime for data analytics applications

45
Emerging
234 thadhutch/sports-quant

End-to-end NFL data pipeline that scrapes PFF grades and Pro Football...

45
Emerging
235 Bruno-Furtado/cloud-cnpj

Ingestion, preparation, and free provision of CNPJ data of...

45
Emerging
236 skale-me/skale

High performance distributed data processing engine

45
Emerging
237 scribe-org/Scribe-Server

Backend service for Scribe data downloads

44
Emerging
238 melvynator/ELK_twitter

This is a data pipeline for Twitter (ETL) using the elastic stack...

44
Emerging
239 rocketlaunchr/dataframe-go

DataFrames for Go: For statistics, machine-learning, and data...

44
Emerging
240 turbot/steampipe-plugin-cloudflare

Use SQL to instantly query accounts, zones and more from Cloudflare. Open...

44
Emerging
241 turbot/steampipe-plugin-net

Use SQL to instantly query DNS records, certificates and other network...

44
Emerging
242 GovHub-br/data-application-gov-hub

Gov-Hub Data Pipeline

44
Emerging
243 turbot/steampipe-plugin-googleworkspace

Use SQL to instantly query calendar events, drive files, gmail messages, and...

44
Emerging
244 AltimateAI/altimate-code

Open-source agentic data engineering harness for dbt, SQL, and cloud...

44
Emerging
245 wgzhao/addax-admin

Addax Admin is a web-based management console for Addax ETL jobs, offering...

44
Emerging
246 orchest/orchest

Build data pipelines, the easy way 🛠️

44
Emerging
247 alexhraber/flowhawk

Real-time eBPF-powered network security monitor with AI-driven threat...

44
Emerging
248 realdatadriven/etlx

ETL / ELT Framework powered by DuckDB, designed to seamlessly integrate and...

44
Emerging
249 DataKitchen/dataops-observability

DataOps Observability is part of DataKitchen's Open Source Data...

44
Emerging
250 fal-ai/dbt-fal

Do more with dbt. dbt-fal helps you run Python alongside dbt, so you can...

43
Emerging
251 bywwcnll/StreamPanel

Stream Panel is a Chrome DevTools extension that lets developers monitor and inspect streaming requests in real time. It supports Server-Sent Events (SSE) and ones based on...

43
Emerging
252 ContextData/VectorETL

Build super simple end-to-end data & ETL pipelines for your vector databases...

43
Emerging
253 hiero-hackers/analytics

Stay up to date with Hiero organisation activity and contributor diversity

43
Emerging
254 turbot/steampipe-plugin-salesforce

Use SQL to instantly query Salesforce resources. Open source CLI. No DB required.

43
Emerging
255 turbot/steampipe-plugin-stripe

Use SQL to instantly query customers, products, invoices and more from...

43
Emerging
256 turbot/steampipe-plugin-zendesk

Use SQL to instantly query Zendesk. Open source CLI. No DB required.

43
Emerging
257 turbot/steampipe-plugin-datadog

Use SQL to instantly query Datadog resources across accounts. Open source...

43
Emerging
258 hbz/lobid-resources

Transformation, web frontend, and API for the hbz catalog as LOD

43
Emerging
259 nightmarewalker/D-MemFS

In-process virtual filesystem with hard quota for Python

43
Emerging
260 GregoryKogan/yt-framework

Build scalable data pipelines on YTsaurus with automatic stage management,...

43
Emerging
261 turbot/steampipe-plugin-oci

Use SQL to instantly query Oracle Cloud resources across regions and...

43
Emerging
262 turbot/steampipe-plugin-prometheus

Use SQL to instantly query Prometheus metrics, alerts, labels and more. Open...

43
Emerging
263 yobix-ai/extractous

Fast and efficient unstructured data extraction. Written in Rust with...

43
Emerging
264 turbot/steampipe-plugin-okta

Use SQL to instantly query users, groups, applications and more from Okta....

42
Emerging
265 SpareCores/sc-crawler

Pull and standardize data on cloud compute resources.

42
Emerging
266 ludovicschmetz-stack/datavow

Open-source data contract enforcement — define, sync dbt, validate, block,...

42
Emerging
267 DataZooDE/flapi

API Framework heavily relying on the power of DuckDB and DuckDB extensions....

42
Emerging
268 probcomp/bayeslite

BayesDB on SQLite. A Bayesian database table for querying the probable...

42
Emerging
269 bbossgroups/bboss-elastic-tran

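bboss-datatran is an open-source data collection and unified stream-batch processing tool from bboss, providing data collection, cleansing and transformation, and unified stream-batch computation;...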

42
Emerging
270 digitalghost-dev/poke-cli

A hybrid CLI/TUI tool written in Go for viewing Pokémon data from the...

42
Emerging
271 turbot/steampipe-plugin-csv

Use SQL to instantly query data from CSV files. Open source CLI. No DB required.

42
Emerging
272 turbot/steampipe-plugin-microsoft365

Use SQL to instantly query calendars, contacts, drives, mailboxes and more...

42
Emerging
273 databricks-industry-solutions/python-data-sources

Quality Python data sources for PySpark 4.x

42
Emerging
274 mikevan666/opendataworks

opendataworks...

42
Emerging
275 irajhedayati/data-engineering

A set of Data Engineering tools online for public use

42
Emerging
276 turbot/steampipe-export

Steampipe Export is a zero-ETL CLI to fetch data from cloud services and...

42
Emerging
277 turbot/steampipe-plugin-rss

Use SQL to instantly query RSS channels and Atom Feeds. Open source CLI. No...

41
Emerging
278 synmetrix/synmetrix

Synmetrix – production-ready open source semantic layer on Cube

41
Emerging
279 turbot/steampipe-plugin-shodan

Use SQL to instantly query host, DNS and exploit information using Shodan....

41
Emerging
280 AndreaBozzo/Ceres

Harvesting & Semantic search for open data portals

41
Emerging
281 zero-one-group/geni

A Clojure dataframe library that runs on Spark

41
Emerging
282 Edwardvaneechoud/pyfloe

A minimal zero dependency dataframe library

41
Emerging
283 markusbegerow/data-analytics-exercises

End-to-end data warehouse exercises for students - build a modern ELT...

41
Emerging
284 intel/hdk

A low-level execution library for analytic data processing.

41
Emerging
285 turbot/steampipe-plugin-mastodon

Use SQL to instantly query Mastodon resources. Open source CLI. No DB required.

41
Emerging
286 turbot/steampipe-plugin-reddit

Use SQL to instantly query Reddit posts, comments & more. Open source CLI....

41
Emerging
287 AMPATH/etl-rest-server

This project hosts scripts to generate flat tables used for reporting purposes.

41
Emerging
288 datacompose/datacompose

Data Cleaning for PySpark

41
Emerging
289 catalyst-cooperative/ferc-xbrl-extractor

A tool for converting FERC filings published in XBRL into SQLite databases

41
Emerging
290 turbot/steampipe-plugin-jenkins

Use SQL to instantly query Jenkins resources. Open source CLI. No DB required.

41
Emerging
291 turbot/steampipe-plugin-config

Use SQL to instantly query data from various types of config files. Open...

41
Emerging
292 ottogroup/koality

Library for data quality monitoring based on DuckDB.

41
Emerging
293 ChrisDevRepo/vscode_data_lineage

VS Code extension for visualizing SQL Server database object dependencies...

40
Emerging
294 jordilin/gitar

Git all remotes. Git CLI tool that targets both GitHub and GitLab

40
Emerging
295 Vetdatahub/VetDataHub

VetDataHub is an open-source veterinary dataset repository dedicated to...

40
Emerging
296 turbot/steampipe-plugin-googlesheets

Use SQL to instantly query spreadsheets, sheets, and cell data from Google...

40
Emerging
297 turbot/steampipe-plugin-hypothesis

Use SQL to instantly query Hypothesis resources. Open source CLI. No DB required.

40
Emerging
298 elastiflow/pipelines

A lightweight Go framework for building stateful, real-time data pipelines....

40
Emerging
299 turbot/steampipe-plugin-circleci

Use SQL to instantly query projects, pipelines, builds and more from...

40
Emerging
300 continuous-dems/fetchez

Fetchez is a lightweight, modular, and highly extendable Python framework...

40
Emerging