learning-apache-spark and spark-py-notebooks

                learning-apache-spark                 spark-py-notebooks
Score           51 (Established)                      51 (Established)
Maintenance     0/25                                  0/25
Adoption        10/25                                 10/25
Maturity        16/25                                 16/25
Community       25/25                                 25/25
Stars           299                                   1,663
Forks           186                                   911
Downloads
Commits (30d)   0                                     0
Language        HTML                                  Jupyter Notebook
License         MIT
Status          Stale 6m, No Package, No Dependents   Stale 6m, No Package, No Dependents

About learning-apache-spark

MingChen0919/learning-apache-spark

Notes on Apache Spark (pyspark)

These notes help data professionals understand how to process and analyze very large datasets efficiently using Apache Spark. They cover common data manipulation and analysis tasks, showing how to transform raw data into actionable insights or cleaned datasets ready for further use. Data engineers, data scientists, and analysts working with big data will find this resource useful.

big-data-processing data-engineering data-analysis data-science large-scale-etl
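The kind of task these notes describe, turning raw data into a cleaned, analysis-ready table, can be illustrated with a minimal PySpark DataFrame sketch. This is not code from the notes: the CSV path, the column names (`id`, `category`, `amount`) and the cleaning rules are hypothetical assumptions chosen for illustration. `pyspark` is imported inside the function so the module can be loaded without it.

```python
# Minimal PySpark cleanup sketch. The column names ("id", "category", "amount")
# and the CSV path passed in are hypothetical, chosen only for illustration.


def clean_and_summarize(spark, csv_path):
    """Load a raw CSV, apply basic cleaning, and return per-category totals."""
    # Imported here so this module can be loaded without pyspark installed.
    from pyspark.sql import functions as F

    # With header=True and no schema, every column is read as a string.
    raw = spark.read.csv(csv_path, header=True)

    cleaned = (
        raw
        # Normalize the grouping key: strip whitespace and lower-case it.
        .withColumn("category", F.lower(F.trim(F.col("category"))))
        # Cast the numeric column. With ANSI SQL mode off (the default before
        # Spark 4.0) invalid strings become null; with ANSI mode on, the cast
        # raises an error instead.
        .withColumn("amount", F.col("amount").cast("double"))
        # Drop rows missing any field used below.
        .dropna(subset=["id", "category", "amount"])
        # Keep one row per id.
        .dropDuplicates(["id"])
    )

    # Aggregate into a small, analysis-ready table.
    return cleaned.groupBy("category").agg(
        F.count("*").alias("n_rows"),
        F.round(F.sum("amount"), 2).alias("total_amount"),
    )
```

To try it locally, create a session with `SparkSession.builder.master("local[*]").getOrCreate()` (from `pyspark.sql`) and call `clean_and_summarize(spark, "raw.csv")`. The returned DataFrame can be inspected with `.show()` or saved with `.write.parquet(path)`.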

About spark-py-notebooks

jadianes/spark-py-notebooks

Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

This project provides step-by-step Jupyter notebooks that help data scientists and big data engineers learn to analyze large datasets and build machine learning models with Apache Spark and Python. Starting from raw data such as network interaction logs, the notebooks walk through processing, exploring, and modeling it for tasks such as anomaly detection and recommendation engines. It is aimed at professionals who need to work with massive datasets and take advantage of Spark's distributed computing.

Big Data Analysis Machine Learning Data Science Training Distributed Computing Predictive Modeling
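A minimal sketch of the kind of RDD processing such tutorials walk through is shown below: parsing raw log lines and counting interactions per label. It is not code from the notebooks. The record format (comma-separated fields whose last field is a label such as `normal.`) is a hypothetical assumption, as are the helper names `parse_record`, `label_of` and `label_counts`.

```python
# Sketch of an RDD-style pass over network interaction logs.
# Assumed (hypothetical) record format: comma-separated fields whose last
# field is a connection label such as "normal." or an attack name.

from operator import add


def parse_record(line):
    """Split one raw CSV line into a list of whitespace-trimmed fields."""
    return [field.strip() for field in line.split(",")]


def label_of(fields):
    """Return the label (last field) with any trailing period removed."""
    return fields[-1].rstrip(".")


def label_counts(sc, path):
    """Count interactions per label using Spark RDD transformations.

    `sc` is a live SparkContext; nothing here runs until this is called.
    """
    return (
        sc.textFile(path)                        # one RDD element per line
        .map(parse_record)                       # line -> list of fields
        .map(lambda fields: (label_of(fields), 1))  # pair RDD: (label, 1)
        .reduceByKey(add)                        # sum the 1s per label
        .collectAsMap()                          # small result -> driver dict
    )
```

Keeping the per-line parsing in plain functions means it can be checked without a cluster; for example, `label_of(parse_record("0,tcp,http,SF,normal."))` returns `"normal"`. `label_counts` itself needs a running `SparkContext`, such as `spark.sparkContext` from a local session.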

Scores updated daily from GitHub, PyPI, and npm data.