Spark Hadoop ML Pipelines Data Engineering Tools

This category tracks 14 Spark, Hadoop, and ML pipeline tools. Nine score 50 or higher (Established tier). The highest-rated is knime/knime-core at 64/100 with 772 stars. Two of the top 10 are actively maintained.

Get all 14 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=data-engineering&subcategory=spark-hadoop-ml-pipelines&limit=20"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
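The same request can be made from a script. The following is a minimal sketch using only the Python standard library; the endpoint and query parameters are copied from the curl command above, while the function names `build_url` and `fetch_projects` are illustrative, and the shape of the returned JSON is not documented here, so the code returns the decoded body without assuming any particular fields.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint taken from the curl command in this listing.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="data-engineering",
              subcategory="spark-hadoop-ml-pipelines",
              limit=20):
    """Build the dataset URL with the same query parameters as the curl example."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=10):
    """Fetch the URL and decode the JSON body.

    The response structure is an unknown here, so the decoded object
    is returned as-is for the caller to inspect.
    """
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_projects(build_url())` performs the network request and counts against the daily request limit, so it is worth caching the result locally while exploring the data.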

| # | Tool | Description | Score (/100) | Tier |
|---|------|-------------|--------------|------|
| 1 | knime/knime-core | KNIME Analytics Platform | 64 | Established |
| 2 | jtablesaw/tablesaw | Java dataframe and visualization library | 60 | Established |
| 3 | evinism/mistql | A query / expression language for performing computations on JSON-like... | 59 | Established |
| 4 | apache/wayang | Apache Wayang is the first cross-platform data processing system. | 57 | Established |
| 5 | RumbleDB/rumble | Quick start: pip install jsoniq ⛈️ RumbleDB 2.0.0 "Lemon Ironwood" 🌳 for... | 55 | Established |
| 6 | sparklyr/sparklyr | R interface for Apache Spark | 54 | Established |
| 7 | quixio/quix-streams | Python Streaming DataFrames for Kafka | 52 | Established |
| 8 | dotnet/spark | .NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers. | 52 | Established |
| 9 | h2oai/sparkling-water | Sparkling Water provides H2O functionality inside Spark cluster | 50 | Established |
| 10 | byzer-org/byzer-lang | Byzer (former MLSQL): A low-code open-source programming language for data... | 44 | Emerging |
| 11 | mc2-project/opaque-sql | An encrypted data analytics platform | 42 | Emerging |
| 12 | viadee/camunda-kafka-polling-client | Stream your process history to Kafka | 29 | Experimental |
| 13 | Smart-Shaped/chaM3Leon | By Smart Shaped s.r.l. (https://www.smartshaped.com/) | 28 | Experimental |
| 14 | aymane-maghouti/Big-Data-Project | This project aims to predict smartphone prices using a combination of batch... | 20 | Experimental |
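The headline figures for this category can be cross-checked against the ranking itself. The sketch below hard-codes the tool names, scores, and tiers exactly as listed above and tallies them; the variable names are illustrative, and no tier thresholds are assumed beyond the labels the listing already assigns.

```python
from collections import Counter

# (tool, score, tier) rows transcribed from the ranking above.
TOOLS = [
    ("knime/knime-core", 64, "Established"),
    ("jtablesaw/tablesaw", 60, "Established"),
    ("evinism/mistql", 59, "Established"),
    ("apache/wayang", 57, "Established"),
    ("RumbleDB/rumble", 55, "Established"),
    ("sparklyr/sparklyr", 54, "Established"),
    ("quixio/quix-streams", 52, "Established"),
    ("dotnet/spark", 52, "Established"),
    ("h2oai/sparkling-water", 50, "Established"),
    ("byzer-org/byzer-lang", 44, "Emerging"),
    ("mc2-project/opaque-sql", 42, "Emerging"),
    ("viadee/camunda-kafka-polling-client", 29, "Experimental"),
    ("Smart-Shaped/chaM3Leon", 28, "Experimental"),
    ("aymane-maghouti/Big-Data-Project", 20, "Experimental"),
]

# Number of tools carrying each tier label.
tier_counts = Counter(tier for _, _, tier in TOOLS)

# Highest-scoring tool in the listing.
top_name, top_score, _ = max(TOOLS, key=lambda row: row[1])

# Tools scoring 50 or higher, and strictly above 50.
at_least_50 = sum(1 for _, score, _ in TOOLS if score >= 50)
above_50 = sum(1 for _, score, _ in TOOLS if score > 50)

print(dict(tier_counts))
print(top_name, top_score)
print(at_least_50, above_50)
```

The tally gives 9 Established, 2 Emerging, and 3 Experimental tools. Nine tools score 50 or higher, but only eight score strictly above 50, because h2oai/sparkling-water sits exactly at 50 and is still labelled Established.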