All Data Engineering Tools
517 tools ranked by quality score · Page 3 of 6
| # | Tool | Score | Tier |
|---|---|---|---|
| 201 | microsoft/unified-data-foundation-with-fabric-solution-accelerator: Unified Data Foundation with Microsoft Fabric with Options to Integrate with... |  | Emerging |
| 202 | odpi/egeria-docs: Documentation repository for the Egeria project. |  | Emerging |
| 203 | tuanx18/data-engineer-portfolio: This is a repository to demonstrate my details, skills, projects and to keep... |  | Emerging |
| 204 | sql-machine-learning/sqlflow: Brings SQL and AI together. |  | Emerging |
| 205 | 19-84/redd-archiver: A PostgreSQL-backed archive generator that creates browsable HTML archives... |  | Emerging |
| 206 | MTSWebServices/onetl: One ETL tool to rule them all |  | Emerging |
| 207 | turbot/steampipe-sqlite: Steampipe SQLite is a zero-ETL engine for SQLite. Virtual tables translate... |  | Emerging |
| 208 | J0SAL/Decentralized-Expense-Tracker: Tracking Expenses Securely |  | Emerging |
| 209 | DawnbrandBots/yaml-yugi: A machine-readable, human-editable database of the Yu-Gi-Oh! Trading Card... |  | Emerging |
| 210 | buildersoftio/cortex: Cortex \| Data Framework—a cutting-edge SDK that simplifies real-time data... |  | Emerging |
| 211 | feathr-ai/feathr: Feathr – A scalable, unified data and AI engineering platform for enterprise |  | Emerging |
| 212 | GitBrincie212/ChronoGrapher: Powerful, developer-experience centric, blazingly fast and extensible job... |  | Emerging |
| 213 | turbot/steampipe-plugin-jira: Use SQL to instantly query Jira. Open source CLI. No DB required. |  | Emerging |
| 214 | dagster-io/dagster-open-platform: Dagster Labs' open-source data platform, built with Dagster. |  | Emerging |
| 215 | NeaByteLab/IDX-API: Indonesian Stock Exchange API wrapper for trading data integration. |  | Emerging |
| 216 | wp-labs/warp-parse: Focusing on building industry-leading ETL engines. |  | Emerging |
| 217 | monarch-initiative/koza: Data transformation framework for LinkML data models |  | Emerging |
| 218 | rpsft/etlbox: A lightweight ETL (extract, transform, load) library and data integration... |  | Emerging |
| 219 | FrigadeHQ/trench: Trench — Open-Source Analytics Infrastructure. A single production-ready... |  | Emerging |
| 220 | leftkats/awesome-greek-tech-jobs: A comprehensive map of companies that hire for tech jobs in Greece. |  | Emerging |
| 221 | mattlianje/etl4s: Powerful, whiteboard-style ETL |  | Emerging |
| 222 | GovHub-br/gov-hub: GovHub - Transforming Data into Value for Public Administration |  | Emerging |
| 223 | BlazingDB/blazingsql: BlazingSQL is a lightweight, GPU accelerated, SQL engine for Python. Built... |  | Emerging |
| 224 | MLD3/FIDDLE: FlexIble Data-Driven pipeLinE – a preprocessing pipeline that transforms... |  | Emerging |
| 225 | alexei-led/spotinfo: CLI for exploring AWS EC2 Spot inventory. Inspect AWS Spot instance types,... |  | Emerging |
| 226 | trustedshops-public/schema2pyarrow: Converts AsyncApi and JsonSchema to PyArrow schema |  | Emerging |
| 227 | opensnowcat/opensnowcat-enrich: OpenSnowcat Enricher (Apache 2.0 License) |  | Emerging |
| 228 | treeverse/charts: Helm charts |  | Emerging |
| 229 | CategoricalData/CQL: Categorical Query Language IDE |  | Emerging |
| 230 | turbot/steampipe-plugin-slack: Use SQL to instantly query users, channels, emoji and more from your Slack... |  | Emerging |
| 231 | pretzelai/pretzelai: The modern replacement for Jupyter Notebooks |  | Emerging |
| 232 | turbot/steampipe-plugin-azuread: Use SQL to instantly query groups, service principals, users and more from... |  | Emerging |
| 233 | weld-project/weld: High-performance runtime for data analytics applications |  | Emerging |
| 234 | thadhutch/sports-quant: End-to-end NFL data pipeline that scrapes PFF grades and Pro Football... |  | Emerging |
| 235 | Bruno-Furtado/cloud-cnpj: Ingestion, preparation, and free provision of CNPJ data from... |  | Emerging |
| 236 | skale-me/skale: High performance distributed data processing engine |  | Emerging |
| 237 | scribe-org/Scribe-Server: Backend service for Scribe data downloads |  | Emerging |
| 238 | melvynator/ELK_twitter: This is a data pipeline for Twitter (ETL) using the elastic stack... |  | Emerging |
| 239 | rocketlaunchr/dataframe-go: DataFrames for Go: For statistics, machine-learning, and data... |  | Emerging |
| 240 | turbot/steampipe-plugin-cloudflare: Use SQL to instantly query accounts, zones and more from Cloudflare. Open... |  | Emerging |
| 241 | turbot/steampipe-plugin-net: Use SQL to instantly query DNS records, certificates and other network... |  | Emerging |
| 242 | GovHub-br/data-application-gov-hub: Gov-Hub data pipeline |  | Emerging |
| 243 | turbot/steampipe-plugin-googleworkspace: Use SQL to instantly query calendar events, drive files, gmail messages, and... |  | Emerging |
| 244 | AltimateAI/altimate-code: Opensource agentic data engineering harness for dbt, SQL, and cloud... |  | Emerging |
| 245 | wgzhao/addax-admin: Addax Admin is a web-based management console for Addax ETL jobs, offering... |  | Emerging |
| 246 | orchest/orchest: Build data pipelines, the easy way 🛠️ |  | Emerging |
| 247 | alexhraber/flowhawk: Real-time eBPF-powered network security monitor with AI-driven threat... |  | Emerging |
| 248 | realdatadriven/etlx: ETL / ELT Framework powered by DuckDB, designed to seamlessly integrate and... |  | Emerging |
| 249 | DataKitchen/dataops-observability: DataOps Observability is part of DataKitchen's Open Source Data... |  | Emerging |
| 250 | fal-ai/dbt-fal: do more with dbt. dbt-fal helps you run Python alongside dbt, so you can... |  | Emerging |
| 251 | bywwcnll/StreamPanel: Stream Panel is a Chrome DevTools extension that lets developers monitor and inspect streaming requests in real time. It supports Server-Sent Events (SSE) and... |  | Emerging |
| 252 | ContextData/VectorETL: Build super simple end-to-end data & ETL pipelines for your vector databases... |  | Emerging |
| 253 | hiero-hackers/analytics: Stay up to date with hiero organisation activity and contributor diversity |  | Emerging |
| 254 | turbot/steampipe-plugin-salesforce: Use SQL to instantly query Salesforce resources. Open source CLI. No DB required. |  | Emerging |
| 255 | turbot/steampipe-plugin-stripe: Use SQL to instantly query customers, products, invoices and more from... |  | Emerging |
| 256 | turbot/steampipe-plugin-zendesk: Use SQL to instantly query Zendesk. Open source CLI. No DB required. |  | Emerging |
| 257 | turbot/steampipe-plugin-datadog: Use SQL to instantly query Datadog resources across accounts. Open source... |  | Emerging |
| 258 | hbz/lobid-resources: Transformation, web frontend, and API for the hbz catalog as LOD |  | Emerging |
| 259 | nightmarewalker/D-MemFS: In-process virtual filesystem with hard quota for Python |  | Emerging |
| 260 | GregoryKogan/yt-framework: Build scalable data pipelines on YTsaurus with automatic stage management,... |  | Emerging |
| 261 | turbot/steampipe-plugin-oci: Use SQL to instantly query Oracle Cloud resources across regions and... |  | Emerging |
| 262 | turbot/steampipe-plugin-prometheus: Use SQL to instantly query Prometheus metrics, alerts, labels and more. Open... |  | Emerging |
| 263 | yobix-ai/extractous: Fast and efficient unstructured data extraction. Written in Rust with... |  | Emerging |
| 264 | turbot/steampipe-plugin-okta: Use SQL to instantly query users, groups, applications and more from Okta.... |  | Emerging |
| 265 | SpareCores/sc-crawler: Pull and standardize data on cloud compute resources. |  | Emerging |
| 266 | ludovicschmetz-stack/datavow: Open-source data contract enforcement — define, sync dbt, validate, block,... |  | Emerging |
| 267 | DataZooDE/flapi: API Framework heavily relying on the power of DuckDB and DuckDB extensions.... |  | Emerging |
| 268 | probcomp/bayeslite: BayesDB on SQLite. A Bayesian database table for querying the probable... |  | Emerging |
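| 269 | bbossgroups/bboss-elastic-tran: bboss-datatran is an open-source data collection and unified stream/batch processing tool from bboss, providing data collection, cleaning and transformation, and unified stream/batch computation;... |  | Emerging |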
| 270 | digitalghost-dev/poke-cli: A hybrid CLI/TUI tool written in Go for viewing Pokémon data from the... |  | Emerging |
| 271 | turbot/steampipe-plugin-csv: Use SQL to instantly query data from CSV files. Open source CLI. No DB required. |  | Emerging |
| 272 | turbot/steampipe-plugin-microsoft365: Use SQL to instantly query calendars, contacts, drives, mailboxes and more... |  | Emerging |
| 273 | databricks-industry-solutions/python-data-sources: Quality python data sources for pyspark 4.x |  | Emerging |
| 274 | mikevan666/opendataworks: opendataworks... |  | Emerging |
| 275 | irajhedayati/data-engineering: A set of Data Engineering tools online for public use |  | Emerging |
| 276 | turbot/steampipe-export: Steampipe Export is a zero-ETL CLI to fetch data from cloud services and... |  | Emerging |
| 277 | turbot/steampipe-plugin-rss: Use SQL to instantly query RSS channels and Atom Feeds. Open source CLI. No... |  | Emerging |
| 278 | synmetrix/synmetrix: Synmetrix – production-ready open source semantic layer on Cube |  | Emerging |
| 279 | turbot/steampipe-plugin-shodan: Use SQL to instantly query host, DNS and exploit information using Shodan.... |  | Emerging |
| 280 | AndreaBozzo/Ceres: Harvesting & Semantic search for open data portals |  | Emerging |
| 281 | zero-one-group/geni: A Clojure dataframe library that runs on Spark |  | Emerging |
| 282 | Edwardvaneechoud/pyfloe: A minimal zero dependency dataframe library |  | Emerging |
| 283 | markusbegerow/data-analytics-exercises: End-to-end data warehouse exercises for students - build a modern ELT... |  | Emerging |
| 284 | intel/hdk: A low-level execution library for analytic data processing. |  | Emerging |
| 285 | turbot/steampipe-plugin-mastodon: Use SQL to instantly query Mastodon resources. Open source CLI. No DB required. |  | Emerging |
| 286 | turbot/steampipe-plugin-reddit: Use SQL to instantly query Reddit posts, comments & more. Open source CLI.... |  | Emerging |
| 287 | AMPATH/etl-rest-server: This project hosts scripts to generate flat tables used for reporting purposes. |  | Emerging |
| 288 | datacompose/datacompose: Data Cleaning for Pyspark |  | Emerging |
| 289 | catalyst-cooperative/ferc-xbrl-extractor: A tool for converting FERC filings published in XBRL into SQLite databases |  | Emerging |
| 290 | turbot/steampipe-plugin-jenkins: Use SQL to instantly query Jenkins resources. Open source CLI. No DB required. |  | Emerging |
| 291 | turbot/steampipe-plugin-config: Use SQL to instantly query data from various types of config files. Open... |  | Emerging |
| 292 | ottogroup/koality: Library for data quality monitoring based on duckdb. |  | Emerging |
| 293 | ChrisDevRepo/vscode_data_lineage: VS Code extension for visualizing SQL Server database object dependencies... |  | Emerging |
| 294 | jordilin/gitar: Git all remotes. git cli tool that targets both Github and Gitlab |  | Emerging |
| 295 | Vetdatahub/VetDataHub: VetDataHub is an opensource veterinary datasets repository dedicated to... |  | Emerging |
| 296 | turbot/steampipe-plugin-googlesheets: Use SQL to instantly query spreadsheets, sheets, and cell data from Google... |  | Emerging |
| 297 | turbot/steampipe-plugin-hypothesis: Use SQL to instantly query Hypothesis resources. Open source CLI. No DB required. |  | Emerging |
| 298 | elastiflow/pipelines: A lightweight Go framework for building stateful, real-time data pipelines.... |  | Emerging |
| 299 | turbot/steampipe-plugin-circleci: Use SQL to instantly query projects, pipelines, builds and more from... |  | Emerging |
| 300 | continuous-dems/fetchez: Fetchez is a lightweight, modular, and highly extendable Python framework... |  | Emerging |