DataTalksClub/data-engineering-zoomcamp

Data Engineering Zoomcamp is a free 9-week course on building production-ready data pipelines. The next cohort starts in January 2026.

/ 100

Established

The curriculum covers the complete data engineering stack: containerization and infrastructure-as-code (Docker, Terraform, GCP), workflow orchestration (Kestra), data warehousing (BigQuery), analytics engineering (dbt), and streaming systems (Kafka, KSQL). Hands-on modules use industry tools such as Apache Spark, dlt for data ingestion, and Bruin for end-to-end pipelines. Students build a real-world final project with peer review, reinforcing concepts across batch processing, partitioning strategies, schema management, and deployment to cloud platforms. The course assumes only basic coding and SQL knowledge, so it stays accessible while still working with the platforms used in production.

39,193 stars. Actively maintained with 4 commits in the last 30 days.

No license, no package, no dependents.

Maintenance 16 / 25

Adoption 10 / 25

Maturity 8 / 25

Community 25 / 25


Stars: 39,193

Forks: 7,884

Language: Jupyter Notebook

License: none

Related tools

PrefectHQ/prefect

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.

dagster-io/dagster

An orchestration platform for the development, production, and observation of data assets.

dlt-hub/dlt

data load tool (dlt) is an open source Python library that makes data loading easy 🛠️

growthbook/growthbook

Open Source Feature Flags, Experimentation, and Product Analytics

pathwaycom/pathway

Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
