Spark, Hadoop, and ML Pipelines: Data Engineering Tools
Fourteen Spark, Hadoop, and ML pipeline tools are tracked. Nine of them score above 50, which places them in the Established tier. The highest-rated is knime/knime-core at 64/100 with 772 stars. Two of the top 10 are actively maintained.
Get all 14 projects as JSON
```shell
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=data-engineering&subcategory=spark-hadoop-ml-pipelines&limit=20"
```
The API is open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
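The same query can be issued from a script. The sketch below uses only the Python standard library and rebuilds the URL from the three query parameters in the curl example (`domain`, `subcategory`, `limit`). The helper names `build_quality_url` and `fetch_quality` are hypothetical. The sketch assumes only that the endpoint returns JSON, as stated above. It makes no assumption about the payload's field names, and because the page does not say how the free key is passed, it sends no key and stays within the 100 requests/day anonymous limit.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl example; the helper names below are hypothetical.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_quality_url(domain: str, subcategory: str, limit: int = 20) -> str:
    """Rebuild the curl example's URL from its three query parameters."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_quality(domain: str, subcategory: str, limit: int = 20):
    """Fetch the endpoint and decode its JSON body.

    The payload schema is not documented here, so the decoded object is
    returned as-is. No API key is sent (anonymous tier, 100 requests/day).
    """
    url = build_quality_url(domain, subcategory, limit)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


# Usage (performs a network request):
#   data = fetch_quality("data-engineering", "spark-hadoop-ml-pipelines")
#   print(json.dumps(data, indent=2)[:500])
```

A `limit` of 20 exceeds the 14 tracked projects, so the default call should return the full list. Lowering `limit` trims the result without changing the other parameters.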
| # | Tool | Description | Score | Tier |
|---|---|---|---|---|
| 1 | knime/knime-core | KNIME Analytics Platform | 64 | Established |
| 2 | jtablesaw/tablesaw | Java dataframe and visualization library | | Established |
| 3 | evinism/mistql | A query / expression language for performing computations on JSON-like... | | Established |
| 4 | apache/wayang | Apache Wayang is the first cross-platform data processing system. | | Established |
| 5 | RumbleDB/rumble | Quick start: pip install jsoniq ⛈️ RumbleDB 2.0.0 "Lemon Ironwood" 🌳 for... | | Established |
| 6 | sparklyr/sparklyr | R interface for Apache Spark | | Established |
| 7 | quixio/quix-streams | Python Streaming DataFrames for Kafka | | Established |
| 8 | dotnet/spark | .NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers. | | Established |
| 9 | h2oai/sparkling-water | Sparkling Water provides H2O functionality inside Spark cluster | | Established |
| 10 | byzer-org/byzer-lang | Byzer (former MLSQL): A low-code open-source programming language for data... | | Emerging |
| 11 | mc2-project/opaque-sql | An encrypted data analytics platform | | Emerging |
| 12 | viadee/camunda-kafka-polling-client | Stream your process history to Kafka | | Experimental |
| 13 | Smart-Shaped/chaM3Leon | By Smart Shaped s.r.l. (https://www.smartshaped.com/) | | Experimental |
| 14 | aymane-maghouti/Big-Data-Project | This project aims to predict smartphone prices using a combination of batch... | | Experimental |