End-to-end Data Lakehouse project built on Databricks, following the Medallion Architecture (Bronze, Silver, Gold). Covers real-world data engineering and analytics workflows using Spark, PySpark, SQL ...
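To make the three Medallion layers concrete, here is a minimal plain-Python sketch of the Bronze, Silver and Gold flow. It is not the project's code. On Databricks these steps would normally be PySpark DataFrame transformations writing to tables, but here lists of dicts stand in for them. The sample rows, field names and function names (`to_bronze`, `to_silver`, `to_gold`) are hypothetical.

```python
from collections import defaultdict

# Hypothetical raw rows as they might land from a source system (all strings).
RAW_ROWS = [
    {"order_id": "1", "country": "UK", "amount": "9.99"},
    {"order_id": "2", "country": "uk ", "amount": "20.00"},
    {"order_id": "3", "country": "FR", "amount": "not a number"},
    {"order_id": "2", "country": "uk ", "amount": "20.00"},  # duplicate row
]


def to_bronze(rows):
    """Bronze: keep the raw data unchanged, only tagging each row with its source."""
    return [dict(row, _source="orders.csv") for row in rows]


def to_silver(bronze):
    """Silver: cast types, normalise values, drop unparseable rows and duplicates."""
    seen = set()
    silver = []
    for row in bronze:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # a real pipeline might quarantine these instead of dropping them
        order_id = int(row["order_id"])
        if order_id in seen:
            continue  # keep only the first copy of each order
        seen.add(order_id)
        silver.append({
            "order_id": order_id,
            "country": row["country"].strip().upper(),
            "amount": amount,
        })
    return silver


def to_gold(silver):
    """Gold: a business-level aggregate, here revenue per country."""
    totals = defaultdict(float)
    for row in silver:
        totals[row["country"]] += row["amount"]
    return {country: round(total, 2) for country, total in sorted(totals.items())}


bronze = to_bronze(RAW_ROWS)
silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)  # {'UK': 29.99}
```

The design point is that Bronze never loses information, so Silver and Gold can always be rebuilt from it if the cleaning rules change.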
Every data team has the same nightmare. You get a CSV from a stakeholder. It has:
- Dates in 3 different formats
- Missing customer IDs on 20% of rows
- A price of £999.99 that should be £9.99
- Column names ...
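A Silver-layer cleaning step might handle the first three problems roughly as follows. This is a minimal plain-Python sketch under stated assumptions, not the article's implementation. The three date formats, the day-first reading of `05/03/2024` (assumed because the data is priced in £), the field names, and the `outlier_factor` of 50 times the median price are all hypothetical choices. Rows missing a customer ID or an unparseable date are quarantined rather than dropped. Suspicious prices are flagged, not "fixed", because only a human can confirm that £999.99 really meant £9.99.

```python
from datetime import date, datetime
from statistics import median
from typing import Optional

# Assumed input formats; day-first ordering is an assumption for UK data.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d %b %Y")


def parse_date(text: str) -> Optional[date]:
    """Try each known format in turn; return None if none of them match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            pass
    return None


def clean(rows, outlier_factor=50):
    """Split rows into clean and quarantined lists, flagging suspicious prices."""
    typical = median(row["price"] for row in rows)
    clean_rows, quarantined = [], []
    for row in rows:
        parsed = parse_date(row["order_date"])
        if not row.get("customer_id") or parsed is None:
            quarantined.append(row)  # keep for investigation, don't silently drop
            continue
        clean_rows.append({
            **row,
            "order_date": parsed,
            # Flag, don't rewrite: a person decides whether 999.99 meant 9.99.
            "price_suspect": row["price"] > outlier_factor * typical,
        })
    return clean_rows, quarantined


# Hypothetical sample rows showing each problem.
rows = [
    {"customer_id": "C1", "order_date": "2024-03-05", "price": 9.99},
    {"customer_id": "C2", "order_date": "05/03/2024", "price": 12.50},
    {"customer_id": "", "order_date": "5 Mar 2024", "price": 9.99},
    {"customer_id": "C4", "order_date": "5 Mar 2024", "price": 999.99},
]
clean_rows, quarantined = clean(rows)
print(len(clean_rows), len(quarantined))  # 3 1
```

Here all three date spellings resolve to 5 March 2024, the row without a customer ID lands in `quarantined`, and only the £999.99 row has `price_suspect` set.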