DataTalksClub/data-engineering-zoomcamp
Data Engineering Zoomcamp is a free 9-week course on building production-ready data pipelines. The next cohort starts in January 2026.
The curriculum covers the complete data engineering stack, from containerization and infrastructure-as-code (Docker, Terraform, GCP) through workflow orchestration (Kestra), data warehousing (BigQuery), analytics engineering (dbt), and streaming systems (Kafka, KSQL). Hands-on modules use industry tools such as Apache Spark, dlt for data ingestion, and Bruin for end-to-end pipelines. Students build a real-world final project that is peer reviewed and reinforces concepts across batch processing, partitioning strategies, schema management, and deployment to cloud platforms. The course assumes only basic coding and SQL knowledge, so it is accessible to newcomers while still working with production data platforms.
39,193 stars. Actively maintained with 4 commits in the last 30 days.
Stars: 39,193
Forks: 7,884
Language: Jupyter Notebook
License: —
Category:
Last pushed: Mar 19, 2026
Commits (30d): 4
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/data-engineering/DataTalksClub/data-engineering-zoomcamp"
Open to everyone: 100 requests/day with no key needed, or 1,000 requests/day with a free key.
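The same request can be made from Python. The sketch below is a minimal illustration, not a documented client: the path layout (category, then owner, then repository) is inferred from the single curl example above, the response is assumed to be JSON, and the names `build_quality_url` and `fetch_quality` are hypothetical helpers introduced here.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL.

    Assumption: the path is <category>/<owner>/<repo>, inferred from
    the one example URL shown on this page.
    """
    # Percent-encode each segment so unusual characters cannot break the path.
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return f"{BASE_URL}/{'/'.join(parts)}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the data for one repository without an API key.

    Assumption: the endpoint returns a JSON body; the schema is not
    described on this page, so the parsed value is returned unchanged.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("data-engineering", "DataTalksClub", "data-engineering-zoomcamp")` issues the same GET request as the curl command. Each call counts against the daily quota, so caching results locally is a reasonable choice when polling several repositories.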
Related tools
PrefectHQ/prefect
Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
dagster-io/dagster
An orchestration platform for the development, production, and observation of data assets.
dlt-hub/dlt
data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
growthbook/growthbook
Open Source Feature Flags, Experimentation, and Product Analytics
pathwaycom/pathway
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.