Spark Hadoop ML Pipelines ML Frameworks
Distributed machine learning frameworks and tools built on Apache Spark, Hadoop, or similar big data processing systems for large-scale data processing. Does NOT include standalone ML libraries, REST API wrappers without distributed computation, or Spring Boot microservices without core data processing components.
This category tracks 83 Spark/Hadoop ML pipeline frameworks. Three score above 50 (Established tier). The highest-rated is Angel-ML/angel at 57/100 with 6,785 stars.
Get all 83 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=ml-frameworks&subcategory=spark-hadoop-ml-pipelines&limit=20"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
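The curl request above can also be issued from Python. The following is a minimal sketch using only the standard library. The endpoint and the `domain`, `subcategory`, and `limit` query parameters are taken from the curl command. The helper names `build_url` and `fetch_quality_dataset` are hypothetical, and the shape of the returned JSON is not documented here, so the sketch returns the decoded payload without interpreting it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint copied from the curl example; everything else is illustrative.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="ml-frameworks",
              subcategory="spark-hadoop-ml-pipelines",
              limit=20):
    """Build the request URL with the same query parameters as the curl call."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_quality_dataset(url, timeout=10):
    """Perform the GET request and decode the JSON body.

    The response structure is an assumption left to the caller to inspect;
    this function only parses whatever JSON the server returns.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_quality_dataset(build_url())` performs the network request. Keeping URL construction separate makes it easy to change `limit` or `subcategory` without touching the request code.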
| # | Framework | Description | Score | Tier |
|---|---|---|---|---|
| 1 | Angel-ML/angel | A Flexible and Powerful Parameter Server for large-scale machine learning | 57 | Established |
| 2 | lensacom/sparkit-learn | PySpark + Scikit-learn = Sparkit-learn | | Established |
| 3 | alibaba/Alink | Alink is the Machine Learning algorithm platform based on Flink, developed... | | Established |
| 4 | databricks/spark-sklearn | (Deprecated) Scikit-learn integration package for Apache Spark | | Emerging |
| 5 | OryxProject/oryx | Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time... | | Emerging |
| 6 | mahmoudparsian/data-algorithms-book | MapReduce, Spark, Java, and Scala for Data Algorithms Book | | Emerging |
| 7 | kaiwaehner/kafka-streams-machine-learning-examples | This project contains examples which demonstrate how to deploy analytic... | | Emerging |
| 8 | jadianes/spark-py-notebooks | Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine... | | Emerging |
| 9 | tirthajyoti/Spark-with-Python | Fundamentals of Spark with Python (using PySpark), code examples | | Emerging |
| 10 | endymecy/spark-ml-source-analysis | An analysis of Spark ML algorithm principles and their concrete source-code implementations | | Emerging |
| 11 | MingChen0919/learning-apache-spark | Notes on Apache Spark (pyspark) | | Emerging |
| 12 | flink-extended/dl-on-flink | Deep Learning on Flink aims to integrate Flink and deep learning frameworks... | | Emerging |
| 13 | ShifuML/shifu | An end-to-end machine learning and data mining framework on Hadoop | | Emerging |
| 14 | kaiwaehner/ksql-udf-deep-learning-mqtt-iot | Deep Learning UDF for KSQL for Streaming Anomaly Detection of MQTT IoT Sensor Data | | Emerging |
| 15 | apache/flink-ml | Machine learning library of Apache Flink | | Emerging |
| 16 | romain-e-lacoste/sparklen | A statistical learning toolkit for high-dimensional Hawkes processes in Python | | Emerging |
| 17 | TodoEconometria/ejercicios-bigdata | Complete Big Data course with Python (230h): SQLite to Kafka to TensorFlow.... | | Emerging |
| 18 | kanyun-inc/ytk-learn | Ytk-learn is a distributed machine learning library which implements most of... | | Emerging |
| 19 | kaiwaehner/tensorflow-serving-java-grpc-kafka-streams | Kafka Streams + Java + gRPC + TensorFlow Serving => Stream Processing... | | Emerging |
| 20 | sparkling-graph/sparkling-graph | SparklingGraph provides easy to use set of features that will give you... | | Emerging |
| 21 | ShifuML/guagua | An iterative computing framework for both Hadoop MapReduce and Hadoop YARN. | | Emerging |
| 22 | kaiwaehner/ksql-fork-with-deep-learning-function | Deep Learning UDF for KSQL, the Streaming SQL Engine for Apache Kafka with... | | Emerging |
| 23 | siddhi-io/siddhi-execution-streamingml | Extension that performs streaming machine learning on event streams | | Emerging |
| 24 | XYWENJIE/spring-ai-extension | An extension of Spring AI that supports Alibaba Cloud’s dashscope... | | Emerging |
| 25 | SAP-samples/hana-apl-apis-runtimes | Code examples for SAP HANA Automated Predictive Library (APL). It provides... | | Emerging |
| 26 | sbl-sdsc/mmtf-spark | Methods for the parallel and distributed analysis and mining of the Protein... | | Emerging |
| 27 | viadee/bpmn.ai | Machine learning around business processes | | Emerging |
| 28 | shalini0528/big-data-weather-analysis | Big Data weather analysis using Hadoop MapReduce, Apache Hive, Apache Spark,... | | Emerging |
| 29 | feedzai/feedzai-openml | API for Feedzai's Open Machine Learning that allows to integrate ML... | | Emerging |
| 30 | microsoft/masc | Microsoft's contributions for Spark with Apache Accumulo | | Emerging |
| 31 | arminmoin/ML-Quadrat | ML-Quadrat (ML2) is a Model-Driven Software Engineering (MDSE) tool with... | | Emerging |
| 32 | siddhi-io/siddhi-execution-tensorflow | Extension that adds support for inferences from pre-built TensorFlow SavedModels | | Emerging |
| 33 | adventure-island/springboot-deepar-template | A Java (Spring Boot) template for Java and AWS SageMaker DeepAR model endpoint... | | Experimental |
| 34 | jiumao-org/we-mall | A lightweight mall, simple and easy. | | Experimental |
| 35 | comet-ml/comet-java-sdk | Comet Java SDK | | Experimental |
| 36 | predictiveworks/cdap-spark | A wrapper for Apache Spark to make machine & deep learning available in... | | Experimental |
| 37 | AlanBinu007/AI_Big-Data_Data-Engineering_and_Distributions | Here we created some projects using Kafka, AI, Data virtualization and... | | Experimental |
| 38 | iaja/scalaLDAvis | Scala-Spark port of https://github.com/bmabey/pyLDAvis for Apache Spark LDA... | | Experimental |
| 39 | mikeroyal/Apache-Spark-Guide | Apache Spark Guide | | Experimental |
| 40 | alipay/jpmml-sparkml-lightgbm | JPMML-SparkML plugin for converting LightGBM-Spark models to PMML | | Experimental |
| 41 | rhinempi/sparkhit | sparkhit - analyzing large scale genomic data on the cloud | | Experimental |
| 42 | IPVS-AS/MMP-Backend | A Model Management Platform (MMP) for Industry 4.0 Environments (Backend) | | Experimental |
| 43 | almo/Machine-Learning | Machine Learning snippets and use cases. | | Experimental |
| 44 | manuparra/taller_SparkR | SparkR workshop for the R Users Conference (Jornadas de Usuarios de R) | | Experimental |
| 45 | chen0040/java-machine-learning-web-api | A simple machine learning web server that caters for small datasets | | Experimental |
| 46 | AxaFrance/spring-ai-workshop | Exploring interactions with LLMs: Practical insights with Spring AI | | Experimental |
| 47 | perguard/pg-streaming-performance-data | Data collection, feature engineering and machine learning of performance traces | | Experimental |
| 48 | nicolaskrier/spring-ai-examples | Spring AI Examples | | Experimental |
| 49 | AvaAvarai/Java-Parallel-Coordinates-Vis | Java Parallel Coordinates Visualization Tool, to visualize... | | Experimental |
| 50 | senx/warp10-ext-pmml | WarpScript™ PMML Extension | | Experimental |
| 51 | AmrrSalem/Pyspark-Local | Portable self-contained PySpark 3.5 environment for Big Data coursework,... | | Experimental |
| 52 | galafis/spark-kafka-ml-training-pipeline | Distributed ML training pipeline with Spark processing, Kafka ingestion and... | | Experimental |
| 53 | dhchenx/Catla-HS | Catla for Hadoop and Spark (Catla-HS): An open-source system to support... | | Experimental |
| 54 | zzzzz1st/predictorML | Machine learning and prediction service for Niagara NX platform. | | Experimental |
| 55 | maengsanha/bigdata | KMU CS Hot Topics in Big Data | | Experimental |
| 56 | DeathReaper0965/distributed-deeplearning | End to End Distributed Deep Learning Engine, works both with Streaming and... | | Experimental |
| 57 | pneff93/Kafka-R-Realtime-Prediction | This tutorial explains how a machine learning model is applied on real-time data | | Experimental |
| 58 | siddhi-io/siddhi-gpl-execution-pmml | Siddhi extension to evaluate Predictive Model Markup Language (PMML). | | Experimental |
| 59 | nickozoulis/thunderstorm | Investigating the trade-offs of low latency responses over quality when... | | Experimental |
| 60 | kriss024/Spark | Spark for Data Science and ETL process. | | Experimental |
| 61 | neerajkesav/SparkMLJavaExamples | Apache Spark Machine Learning - Java Examples | | Experimental |
| 62 | Mazennaji/ai-intelligence-platform-java-ml | An all-in-one Java Machine Learning platform integrating fraud detection,... | | Experimental |
| 63 | iamirmasoud/pyspark_tutorials | Machine Learning for Big Data using PySpark with real-world projects | | Experimental |
| 64 | Sowdeshwar-99/noise-aware-ml-pipeline | Noise-aware ML pipeline for large-scale agricultural yield prediction using... | | Experimental |
| 65 | TravelXML/APACHE-SPARK-PYSPARK-DATABRICKS-MACHINE-LEARNING-MLIB | Apache Spark Machine Learning project using MLlib and Linear Regression on... | | Experimental |
| 66 | hevc15hamza/pyspark-airfoil-noise-prediction | Predict airfoil self-noise using PySpark with an end-to-end machine learning... | | Experimental |
| 67 | Sishant123/scala-m9k | 🚀 Streamline big data processing with Scala and M9K, enhancing performance... | | Experimental |
| 68 | Swapnil-2596/scala-aba | 🚀 Transform Scala code into efficient, scalable applications with scala-aba,... | | Experimental |
| 69 | aengusmartindonaire/pyspark-ml-pipeline | PySpark ML classification pipelines for NLP, clinical prediction, and census... | | Experimental |
| 70 | mn-cs/fineweb-spark | FineWeb-Edu dataset analysis using Apache Spark - DSC 232R group project | | Experimental |
| 71 | agoda-com/spark-hpopt | Bayesian hyperparameter tuning for Spark MLlib | | Experimental |
| 72 | MinLee0210/kafka-learning | Learning how to use Kafka | | Experimental |
| 73 | rtybase/pmml-microservice | A toy Phishing Classification Service using PMML for demo purposes | | Experimental |
| 74 | adil-faiyaz98/accelerated-spark-gpu | This repository demonstrates how to significantly accelerate Apache Spark 3... | | Experimental |
| 75 | shakha-de/mnist-java-microservice | Spring Boot Microservice for MNIST | | Experimental |
| 76 | alikemalocalan/Spark-API | Apache Spark Recommendation/Machine Learning Api Service | | Experimental |
| 77 | MehdiBukhari/oak | A Scalable Concurrent Key-Value Map for Big Data Analytics | | Experimental |
| 78 | Chih-Ling-Hsu/Spark-Machine-Learning-Modules | Machine Learning Modules of Spark MLlib | | Experimental |
| 79 | sivasurya681/PySpark | PySpark-Roadmap is an 18-day structured learning journey that takes you from... | | Experimental |
| 80 | hinzy97/spark-dynamic-executor-time-prediction | Neural Network Models for Predicting Execution Time with Dynamic Executor... | | Experimental |
| 81 | FadilAdz/praktikumBigData | This repository contains a series of Big Data practicums covering storage... | | Experimental |
| 82 | daugraph/ParameterServer | Parameter Server using Java | | Experimental |
| 83 | GPalfy/socialnetworkcomments | :memo: Text Data Analysis & Machine Learning on supermarket's Social... | | Experimental |