Dimensional modeling

Dimensional modeling, also known as the Star Schema approach, is a set of techniques and concepts used in data warehouse design proposed by Ralph Kimball.

Curved lines on a blue background.

A dimensional model is more of a conceptual model than a logical, or physical data model, because it is applicable to any physical form, such as a relational or multidimensional database. Star and snowflake schemas are the most commonly used schema types for implementing a dimensional model in a relational database. The Snowflake schema normalizes the dimension tables, while the star schema denormalizes the dimension tables. The transition from dimensional to logical, or physical data model is straightforward and should follow well-defined guidelines.

The architecture contains the following main layers:

  • Operational Data Store (ODS): for each data source, the needed data is stored without transformation. The data warehouse is then based on the operational data stores.
  • Data warehouse: based on the source data (source system or ODS based on source system) and transforms the data in a snowflake schema (core: facts and entities) and a star schema (mart: fact tables and dimensions)
  • Data mart: a structure / access pattern specific to data warehouse environments, used to retrieve client-facing data.
  • Test project: used to create test cases on your other projects ODS, data warehouse, data vault etc.

Modeling strategy

A dimensional model is driven by well-defined and known analytical requirements. Simplification is a valid approach when designing a star schema, but it reduces information and requires certain steps in planning. One drawback is that it cannot answer previously unconsidered analytical questions. Designing a dimensional model has always been a very difficult trade-off between simplicity and flexibility of analysis As a result, the gap between the model's analytical capabilities and future analytical requirements increases as the system becomes more complex.

ETL process

The complexity of the ETL process to the (dimensional) core model depends on several factors, including the number and complexity of source systems, the complexity of integration and business transformation procedures. Loading dimension tables is usually easier than loading fact tables because there are fewer source tables to consider.

ETL complexity for fact tables is often very high, typically requiring transformations and consolidations of many source tables. Star schemas are completely different from typical OLTP data models, so restructuring the data model can be very complex and often requires multiple steps and intermediate tables are required.

Historization

Different SCD types can be implemented to meet different historization requirements. The most common types are:

  • SCD1 (old data being overwritten with new data), and
  • SCD2 (new record is created for each change)

The SCD2 dimension table has two additional timestamps that indicate record validity. While the relational and data vault model approaches assign a surrogate key to each entity, the star/snowflake schema assigns a surrogate key to each new version of the entity. Surrogate keys also contain information about validity information, so you can join fact and dimension tables without considering the timestamp attribute. As a result, join operations are more intuitive to users and handled more efficiently by the database. Having different primary keys for each version has the drawback of the so-called "ripple effect". Each new version of the parent "triggers" a new version in all children, potentially leading to an "explosion" of the children.

Further reading

What is data modeling?

Future-proof your data with biGENIUS-X today.

Accelerate and automate your analytical data workflow with comprehensive features that biGENIUS-X offers.