Facts and Dimensions in Data Warehousing

When developing Data Warehouses -and in extend Business Intelligence solutions- there are several standards which we can follow to make some tasks “easier” -or if you prefer, streamline some processes. To be more specific when we want to model our Data Warehouse, Dimensional modeling would be our go-to technique.

Dimensional modeling is widely accepted as the preferred way to present analytic data because it addresses two simultaneous requirements:

  • Deliver data that’s understandable to the business users
  • Deliver fast query performance

With this approach we capture how a business process is measured. The elements that capture measurements are called facts and the elements that provide context for those measurements are called dimensions. Facts and dimensions will have their respective tables in our database schema and if we put this in relational context this is called a Star Schema. Please note that a Star Schema is the most simple version we can find in this context. Typically we will have a Constellation Schema (several Star Schema in the same database, sharing dimensions). In this post we will examine how we can identify measurements and context and how we can transition them to a dimensional schema.

Measurements and Context

Both measurements and context, or if you rather facts and dimensions, can be identified via several ways. The most common are statements, questions and report specifications where the two first can be found in verbal speech as well. Let us check some examples on those.

  • What are gross margins by product category for January?
  • What is the average account balance by education level?
  • How many sick days were taken by marketing employees last year?

Each of these questions centers on a business process: sales, account management, attendance, etc. What is important here is that these process-centric questions do not focus on individual activities or transactions. To be able to answer them we need to access a group of transactions. On the top of that each of this questions reveals how a business process is measured. Studying sales for instance, requires the measurement of gross margin. Financial institutions require measurements of account balances. Measuring the number of absences is common practice in human resources.

In the previous questions, we can also identify the context in which each business process is measured. Gross margin is viewed in the context of product categories and time (month).  Sick days are viewed in the context of a department (marketing) and time (last year).

Regarding the report specifications, the identification of measurements and contexts should be easier. We should receive something like: “I want to be able to view -drill up/down- quantity sold, cost, order totals in euro, per product and category”. This report will include as measurements the quantity, the cost and the order totals in euro, and as context the category and the product.

Measurement and context, are the foundation of dimensional design. Every dimensional solution describes a process by capturing what is measured and the context in which the measurements are evaluated.

Facts and Dimensions

Let us take the previously mentioned report specification and work with it from now. In the end the deliverable will look something like:

Report sample

In this order report, we have several measurements for each product and category. Product and category are dimensions -and we can use those to sort the data as well.

Elements that are aggregated, summarized, or subtotaled are facts. It is obvious in the report image above that the facts cost and order total are contained. These numbers are being specified at various levels, including products, categories, and a grand total. Check again and you can notice another fact. Although it is not being subtotaled, quantity is being aggregated. In this case, it looks like quantity ordered by product is being aggregated from individual orders.

Assume now that for those orders the recipients -the customers- are companies and there is a sales man that does the business between the client and the Lorem Ipsum Ltd. We should be able to roll up the mentioned facts according to a lot of other dimensions. Overall this roll up map should look something like:

  • Facts: Quantity, Cost, Order totals
  • Dimensions: Product, Category, Date (Month, Year, Fiscal Year etc.), Customer (Zip Code, Area, City etc.) Sales Man and probably a lot more.

So we should be able for example to ask for quantity for a specific product sold in a specific city. Or for the order totals per sales man per quarter.

FactsDimensions
Quantity SoldProduct
Cost Category
Order TotalProduct Description
Month of Order
Quarter of Order
Year of Order
Sales person
Customer
Region
...
Each of the facts listed in the column on the left may be rolled up according to the dimension values in the right-hand column

Grouping Facts and Dimensions

When we conclude on the above we should be able now to proceed with grouping the relevant dimensions together. This will also be an early draft for our database schema.
Again, using the earlier example we would conclude on a grouping that will look something like:

Grouping Dimensions and Facts

As you can see we have added some more dimensions here. The list of dimensions can be expanded a lot more depending on the requirements. Having this grouping it is easy to transition this logic to a database schema.

Early Database Schema Design

The above is a Star Schema. Examining this schema design -even in this premature stage- we can identify some a couple of advantages.

  • Simplicity: Small number of tables and joins
  • Process oriented: It is the result of asking questions to discover which business processes needs to be measured, how are they described and how are they measured 

That was it for an introduction to Facts and Dimensions -or Facts and Dimensions 101 if you may- and in an extend dimensional design-modelling. Overall the concept can be pretty overwhelming for newcomers and the goals of this post are limited to understand what is a fact, what is a dimensions and how we can identify them.