Medallion architecture: Data platform strategy and best practices for managing Bronze, Silver and Gold
Bronze layer:
The Bronze layer is usually a reservoir that stores data in its natural and original state:
· Maintains the raw state of the data source in the structure “as-is”.
· Data is immutable (read-only).
· Can be any combination of streaming and batch transactions.
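The Bronze characteristics above can be sketched in a few lines of plain Python. This is only an illustration, not a prescribed implementation: the `to_bronze` helper, the `source` and `ingested_at` metadata fields, and the in-memory `bronze` list standing in for a table are all assumptions made for the example.

```python
from datetime import datetime, timezone
from types import MappingProxyType

def to_bronze(raw_payload: str, source: str) -> MappingProxyType:
    """Wrap a raw payload without parsing or altering it."""
    record = {
        "payload": raw_payload,  # stored exactly as received ("as-is")
        "source": source,        # hypothetical metadata field
        "ingested_at": datetime.now(timezone.utc).isoformat(),  # hypothetical
    }
    return MappingProxyType(record)  # read-only view: assignment raises TypeError

bronze = []  # append-only list standing in for a Bronze table

# Batch and streaming arrivals land in the same table, untouched.
batch = ['{"id": 1, "name": " Alice "}', '{"id": 2, "name": null}']
for line in batch:
    bronze.append(to_bronze(line, source="crm_batch"))
bronze.append(to_bronze('{"id": 3, "name": "Bob"}', source="crm_stream"))
```

Note that the payload is kept as an unparsed string, so malformed or unexpected input is preserved rather than rejected; validation is deferred to Silver.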
Silver layer:
The Silver layer provides a refined structure over data that has been ingested. It represents a validated, enriched version of our data that can be trusted for downstream workloads, both operational and analytical. Silver layer characteristics:
· Uses data quality rules for validating and processing data.
· Typically contains only functional data; technical or irrelevant data from Bronze is filtered out.
· Historization is usually applied by merging all data. Data is processed using slowly changing dimensions (SCD).
· Data is stored in an efficient storage format, preferably Delta, alternatively Parquet.
· Handles missing data and standardizes blank or empty fields.
· Data is often clustered around certain subject areas.
· Data is often still source-system aligned and organized.
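To make the validation, standardization, and historization steps concrete, here is a minimal sketch in plain Python. It assumes SCD type 2 (the text does not say which SCD type is used), and the `clean` and `scd2_merge` helpers, the `valid_from`/`valid_to` columns, and the `"UNKNOWN"` placeholder are hypothetical choices for illustration. In practice this logic would more likely run as a merge on Delta tables.

```python
from datetime import date

def clean(row: dict):
    """Apply simple quality rules; return None to reject the row."""
    if row.get("id") is None:               # rule: business key is required
        return None
    name = (row.get("name") or "").strip()  # standardize missing/blank values
    return {"id": row["id"], "name": name or "UNKNOWN"}

def scd2_merge(silver: list, incoming: list, as_of: date) -> list:
    """Merge incoming rows into silver, keeping history (SCD type 2)."""
    # Index the currently valid version of each key.
    current = {r["id"]: r for r in silver if r["valid_to"] is None}
    for raw in incoming:
        row = clean(raw)
        if row is None:
            continue                        # failed a quality rule
        old = current.get(row["id"])
        if old is not None and old["name"] == row["name"]:
            continue                        # unchanged: nothing to do
        if old is not None:
            old["valid_to"] = as_of         # close the previous version
        new = {**row, "valid_from": as_of, "valid_to": None}
        silver.append(new)
        current[row["id"]] = new
    return silver

silver = []
scd2_merge(silver, [{"id": 1, "name": " Alice "}, {"id": None, "name": "x"}],
           date(2024, 1, 1))
scd2_merge(silver, [{"id": 1, "name": "Alice Smith"}, {"id": 2, "name": None}],
           date(2024, 2, 1))
```

After the second merge, `silver` holds three rows: the closed original version of key 1, its new current version, and a first version of key 2 with the placeholder name. The row without a key is dropped.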
Gold layer:
In a Lakehouse architecture, the Gold layer houses data that is structured in “project-specific” databases, making it readily available for consumption. It uses a denormalized, read-optimized data model with fewer joins, such as a Kimball-style star schema, depending on the specific use case. Gold layer characteristics:
· Gold tables represent data that has been transformed for consumption or use cases.
· Data is stored in an efficient storage format, preferably Delta.
· Gold can be a selection or aggregation of data that’s found in Silver.
· Complex business rules are applied in Gold, so it involves many post-processing activities, calculations, enrichments, use-case-specific optimizations, etc.
· Data is highly governed and well-documented.
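As a rough illustration of Gold being a selection or aggregation of Silver data, the sketch below joins two small Silver tables once and writes a denormalized, read-optimized result. The table contents, column names, and the `build_gold_revenue_by_country` function are invented for the example and do not come from any particular platform.

```python
from collections import defaultdict

# Hypothetical Silver tables, still aligned to the source system.
silver_customers = [
    {"customer_id": 1, "name": "Alice", "country": "NL"},
    {"customer_id": 2, "name": "Bob", "country": "DE"},
]
silver_orders = [
    {"order_id": 10, "customer_id": 1, "amount": 120.0},
    {"order_id": 11, "customer_id": 1, "amount": 30.0},
    {"order_id": 12, "customer_id": 2, "amount": 75.0},
]

def build_gold_revenue_by_country(customers, orders):
    """Join once, aggregate, and return a flat table that needs no joins to read."""
    country_of = {c["customer_id"]: c["country"] for c in customers}
    totals = defaultdict(lambda: {"order_count": 0, "revenue": 0.0})
    for o in orders:
        agg = totals[country_of[o["customer_id"]]]
        agg["order_count"] += 1
        agg["revenue"] += o["amount"]
    # Sort by country so the output order is deterministic.
    return [{"country": k, **v} for k, v in sorted(totals.items())]

gold = build_gold_revenue_by_country(silver_customers, silver_orders)
```

Consumers of `gold` read one row per country without touching the customer or order tables, which is the point of the denormalized model described above.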