One of the biggest challenges, if not the biggest, when building ETL pipelines, whether for BI projects or for any project with integration requirements, is the unification of data. For the most part this is a domain-specific task. Handling dates, however, which arrive in our system from external or internal sources (e.g. an API), can be tackled in a more "unified" way across all our projects. Dates will typically be the backbone of our reporting and aggregations, so it is best to address them at the beginning of the project.
In this post, we will show how to do this in PostgreSQL; note that the implementation for other relational databases is similar.
Date table creation
What we can do to make both the import (load) processes and the reporting easier is to create a table that breaks dates down as much as possible, or as far as our reporting requirements demand. One way to create such a table is shown below.
```sql
CREATE TABLE code_examples.dim_date (
    dim_date_id            INT PRIMARY KEY,
    date_actual            DATE NOT NULL,
    epoch                  BIGINT NOT NULL,
    day_suffix             VARCHAR(4) NOT NULL,
    day_name               VARCHAR(9) NOT NULL,
    day_of_week            INT NOT NULL,
    day_of_week_iso        INT NOT NULL,
    day_of_month           INT NOT NULL,
    day_of_quarter         INT NOT NULL,
    day_of_year            INT NOT NULL,
    week_of_month          INT NOT NULL,
    week_of_year           INT NOT NULL,
    week_of_year_iso       CHAR(10) NOT NULL,
    month_actual           INT NOT NULL,
    month_name             VARCHAR(9) NOT NULL,
    month_name_abbreviated CHAR(3) NOT NULL,
    quarter_actual         INT NOT NULL,
    quarter_name           VARCHAR(9) NOT NULL,
    year_actual            INT NOT NULL,
    year_actual_iso        INT NOT NULL,
    first_day_of_week      DATE NOT NULL,
    last_day_of_week       DATE NOT NULL,
    first_day_of_month     DATE NOT NULL,
    last_day_of_month      DATE NOT NULL,
    first_day_of_quarter   DATE NOT NULL,
    last_day_of_quarter    DATE NOT NULL,
    first_day_of_year      DATE NOT NULL,
    last_day_of_year       DATE NOT NULL,
    mmyyyy                 CHAR(6) NOT NULL,
    mmddyyyy               CHAR(10) NOT NULL,
    weekend_indr           BOOLEAN NOT NULL
);
```
The columns are self-explanatory; just note that the primary key is the date in yyyymmdd format cast to an integer. We have used the BI naming convention for the table in this example (dim is short for dimension).
Filling this table will allow us, in the long term, to aggregate on several aspects of a date (e.g. weeks, quarters, etc.).
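To make the key convention concrete, here is a quick sketch of the conversion for an arbitrary date:

```sql
-- The surrogate key is simply the date rendered as yyyymmdd and cast to an integer.
SELECT TO_CHAR(DATE '2018-03-15', 'yyyymmdd')::INT AS dim_date_id;  -- returns 20180315
```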
Loading the date table
One of the advantages of this approach is that we can populate the table automatically for any given date range. This range will always depend on our requirements, but for our example we will use a ten-year range for the initial load. We can always perform a similar insertion if we need more records to describe dates in the future (or in the past).
```sql
INSERT INTO code_examples.dim_date
SELECT TO_CHAR(datum, 'yyyymmdd')::INT AS dim_date_id,
       datum AS date_actual,
       EXTRACT(epoch FROM datum) AS epoch,
       TO_CHAR(datum, 'fmDDth') AS day_suffix,
       TO_CHAR(datum, 'Day') AS day_name,
       EXTRACT(dow FROM datum) + 1 AS day_of_week,          -- 1 = Sunday .. 7 = Saturday
       EXTRACT(isodow FROM datum) AS day_of_week_iso,       -- 1 = Monday .. 7 = Sunday
       EXTRACT(DAY FROM datum) AS day_of_month,
       datum - DATE_TRUNC('quarter', datum)::DATE + 1 AS day_of_quarter,
       EXTRACT(doy FROM datum) AS day_of_year,
       TO_CHAR(datum, 'W')::INT AS week_of_month,
       EXTRACT(week FROM datum) AS week_of_year,
       TO_CHAR(datum, 'YYYY"-W"IW') AS week_of_year_iso,
       EXTRACT(MONTH FROM datum) AS month_actual,
       TO_CHAR(datum, 'Month') AS month_name,
       TO_CHAR(datum, 'Mon') AS month_name_abbreviated,
       EXTRACT(quarter FROM datum) AS quarter_actual,
       CASE
           WHEN EXTRACT(quarter FROM datum) = 1 THEN 'First'
           WHEN EXTRACT(quarter FROM datum) = 2 THEN 'Second'
           WHEN EXTRACT(quarter FROM datum) = 3 THEN 'Third'
           WHEN EXTRACT(quarter FROM datum) = 4 THEN 'Fourth'
       END AS quarter_name,
       EXTRACT(year FROM datum) AS year_actual,
       EXTRACT(isoyear FROM datum) AS year_actual_iso,
       datum + (1 - EXTRACT(isodow FROM datum))::INT AS first_day_of_week,
       datum + (7 - EXTRACT(isodow FROM datum))::INT AS last_day_of_week,
       datum + (1 - EXTRACT(DAY FROM datum))::INT AS first_day_of_month,
       (DATE_TRUNC('MONTH', datum) + INTERVAL '1 MONTH - 1 day')::DATE AS last_day_of_month,
       DATE_TRUNC('quarter', datum)::DATE AS first_day_of_quarter,
       (DATE_TRUNC('quarter', datum) + INTERVAL '3 MONTH - 1 day')::DATE AS last_day_of_quarter,
       -- calendar-year boundaries (ISO-year-based expressions would drift at year edges)
       DATE_TRUNC('year', datum)::DATE AS first_day_of_year,
       (DATE_TRUNC('year', datum) + INTERVAL '1 year - 1 day')::DATE AS last_day_of_year,
       TO_CHAR(datum, 'mmyyyy') AS mmyyyy,
       TO_CHAR(datum, 'mmddyyyy') AS mmddyyyy,
       CASE WHEN EXTRACT(isodow FROM datum) IN (6, 7) THEN TRUE ELSE FALSE END AS weekend_indr
FROM (SELECT '2018-01-01'::DATE + sequence.day AS datum
      FROM GENERATE_SERIES(0, 3650) AS sequence(day)) dq
ORDER BY 1;
```
We are utilizing many of PostgreSQL's built-in date functions to fill our columns with what we want. Again, keep in mind that this could work with any relational database, since they all offer similar functions (the syntax will differ slightly). In the FROM clause, we generate our series of dates: starting from 2018-01-01, we instruct SQL to produce 3650 daily increments on top of the start date, i.e. roughly ten years.
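To see what the inner query produces on its own, here is the series for the first week, a quick sketch you can run in isolation:

```sql
-- Preview the generated calendar spine: one row per day, starting 2018-01-01.
SELECT '2018-01-01'::DATE + s.day AS datum
FROM GENERATE_SERIES(0, 6) AS s(day);
```

Extending the table later is just a matter of changing the start date and the number of increments in this subquery and re-running the INSERT for the new range.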
Outcome
Now let us check what the table looks like after executing the previous script. Below you can find the first ten records of the created dim_date table.
dim_date_id | date_actual | epoch | day_suffix | day_name | day_of_week | day_of_week_iso | day_of_month | day_of_quarter | day_of_year | week_of_month | week_of_year | week_of_year_iso | month_actual | month_name | month_name_abbreviated | quarter_actual | quarter_name | year_actual | year_actual_iso | first_day_of_week | last_day_of_week | first_day_of_month | last_day_of_month | first_day_of_quarter | last_day_of_quarter | first_day_of_year | last_day_of_year | mmyyyy | mmddyyyy | weekend_indr |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
20180101 | 2018-01-01 | 1514764800 | 1st | Monday | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 2018-W01 | 1 | January | Jan | 1 | First | 2018 | 2018 | 2018-01-01 | 2018-01-07 | 2018-01-01 | 2018-01-31 | 2018-01-01 | 2018-03-31 | 2018-01-01 | 2018-12-31 | 012018 | 01012018 | false |
20180102 | 2018-01-02 | 1514851200 | 2nd | Tuesday | 3 | 2 | 2 | 2 | 2 | 1 | 1 | 2018-W01 | 1 | January | Jan | 1 | First | 2018 | 2018 | 2018-01-01 | 2018-01-07 | 2018-01-01 | 2018-01-31 | 2018-01-01 | 2018-03-31 | 2018-01-01 | 2018-12-31 | 012018 | 01022018 | false |
20180103 | 2018-01-03 | 1514937600 | 3rd | Wednesday | 4 | 3 | 3 | 3 | 3 | 1 | 1 | 2018-W01 | 1 | January | Jan | 1 | First | 2018 | 2018 | 2018-01-01 | 2018-01-07 | 2018-01-01 | 2018-01-31 | 2018-01-01 | 2018-03-31 | 2018-01-01 | 2018-12-31 | 012018 | 01032018 | false |
20180104 | 2018-01-04 | 1515024000 | 4th | Thursday | 5 | 4 | 4 | 4 | 4 | 1 | 1 | 2018-W01 | 1 | January | Jan | 1 | First | 2018 | 2018 | 2018-01-01 | 2018-01-07 | 2018-01-01 | 2018-01-31 | 2018-01-01 | 2018-03-31 | 2018-01-01 | 2018-12-31 | 012018 | 01042018 | false |
20180105 | 2018-01-05 | 1515110400 | 5th | Friday | 6 | 5 | 5 | 5 | 5 | 1 | 1 | 2018-W01 | 1 | January | Jan | 1 | First | 2018 | 2018 | 2018-01-01 | 2018-01-07 | 2018-01-01 | 2018-01-31 | 2018-01-01 | 2018-03-31 | 2018-01-01 | 2018-12-31 | 012018 | 01052018 | false |
20180106 | 2018-01-06 | 1515196800 | 6th | Saturday | 7 | 6 | 6 | 6 | 6 | 1 | 1 | 2018-W01 | 1 | January | Jan | 1 | First | 2018 | 2018 | 2018-01-01 | 2018-01-07 | 2018-01-01 | 2018-01-31 | 2018-01-01 | 2018-03-31 | 2018-01-01 | 2018-12-31 | 012018 | 01062018 | true |
20180107 | 2018-01-07 | 1515283200 | 7th | Sunday | 1 | 7 | 7 | 7 | 7 | 1 | 1 | 2018-W01 | 1 | January | Jan | 1 | First | 2018 | 2018 | 2018-01-01 | 2018-01-07 | 2018-01-01 | 2018-01-31 | 2018-01-01 | 2018-03-31 | 2018-01-01 | 2018-12-31 | 012018 | 01072018 | true |
20180108 | 2018-01-08 | 1515369600 | 8th | Monday | 2 | 1 | 8 | 8 | 8 | 2 | 2 | 2018-W02 | 1 | January | Jan | 1 | First | 2018 | 2018 | 2018-01-08 | 2018-01-14 | 2018-01-01 | 2018-01-31 | 2018-01-01 | 2018-03-31 | 2018-01-01 | 2018-12-31 | 012018 | 01082018 | false |
20180109 | 2018-01-09 | 1515456000 | 9th | Tuesday | 3 | 2 | 9 | 9 | 9 | 2 | 2 | 2018-W02 | 1 | January | Jan | 1 | First | 2018 | 2018 | 2018-01-08 | 2018-01-14 | 2018-01-01 | 2018-01-31 | 2018-01-01 | 2018-03-31 | 2018-01-01 | 2018-12-31 | 012018 | 01092018 | false |
20180110 | 2018-01-10 | 1515542400 | 10th | Wednesday | 4 | 3 | 10 | 10 | 10 | 2 | 2 | 2018-W02 | 1 | January | Jan | 1 | First | 2018 | 2018 | 2018-01-08 | 2018-01-14 | 2018-01-01 | 2018-01-31 | 2018-01-01 | 2018-03-31 | 2018-01-01 | 2018-12-31 | 012018 | 01102018 | false |
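Before building anything on top of the table, a quick sanity check of the loaded range costs nothing; a sketch:

```sql
-- Verify the calendar spine is complete: the day span must equal the row count.
SELECT MIN(date_actual) AS first_date,
       MAX(date_actual) AS last_date,
       COUNT(*) AS days_loaded,
       (MAX(date_actual) - MIN(date_actual) + 1) = COUNT(*) AS no_gaps
FROM code_examples.dim_date;
```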
Usage Examples
Let us assume we have utilized this approach while building a data warehouse for sales reporting.
![](http://karolos.me/wp-content/uploads/2020/11/dw-959x1024.png)
This is a typical star schema where we use, amongst others, the dimdate table as a dimension. We would like to create some data marts on top of it for our end users, so they can build their reports or do their data analysis.
```sql
CREATE VIEW dw.salesmart AS
SELECT s.order_id,
       s.quantity,
       s.extended_price_amount,
       s.discount_amount,
       s.sold_amount,
       c.company_name,
       c.contact_name,
       c.contact_title,
       c.customer_city,
       c.customer_country,
       c.customer_region,
       c.customer_postal_code,
       e.employee_name,
       e.employee_title,
       p.product_name,
       p.discontinued,
       p.category_name,
       d.*
FROM dw.factsales s
JOIN dw.dimcustomer c USING (customer_key)
JOIN dw.dimemployee e USING (employee_key)
JOIN dw.dimproduct p USING (product_key)
JOIN dw.dimdate d ON d.date_dim_id = s.order_date_key;
```
Notice how we take all the columns from the date dimension (the d.* in the SELECT list). With this, we have created a view that is ready for the users to work on, and we have also extended what they can do with the dates. Below you can see some sample records from this data mart.
order_id | quantity | extended_price_amount | discount_amount | sold_amount | company_name | contact_name | contact_title | customer_city | customer_country | customer_region | customer_postal_code | employee_name | employee_title | product_name | discontinued | category_name | date_dim_id | date_actual | epoch | day_suffix | day_name | day_of_week | day_of_week_iso | day_of_month | day_of_quarter | day_of_year | week_of_month | week_of_year | week_of_year_iso | month_actual | month_name | month_name_abbreviated | quarter_actual | quarter_name | year_actual | year_actual_iso | first_day_of_week | last_day_of_week | first_day_of_month | last_day_of_month | first_day_of_quarter | last_day_of_quarter | first_day_of_year | last_day_of_year | mmyyyy | mmddyyyy | weekend_indr |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
10248 | 12 | 168 | 0 | 168 | Vins et alcools Chevalier | Paul Henriot | Accounting Manager | Reims | France | n/a | 51100 | StevenBuchanan | Sales Manager | Queso Cabrales | N | Dairy Products | 19960704 | 1996-07-04 | 836438400 | 4th | Thursday | 5 | 4 | 4 | 4 | 186 | 1 | 27 | 1996-W27-4 | 7 | July | Jul | 3 | Third | 1996 | 1996 | 1996-07-01 | 1996-07-07 | 1996-07-01 | 1996-07-31 | 1996-07-01 | 1996-09-30 | 1996-01-01 | 1996-12-31 | 071996 | 07041996 | false |
10248 | 10 | 98.00000190734863 | 0 | 98.00000190734863 | Vins et alcools Chevalier | Paul Henriot | Accounting Manager | Reims | France | n/a | 51100 | StevenBuchanan | Sales Manager | Singaporean Hokkien Fried Mee | Y | Grains/Cereals | 19960704 | 1996-07-04 | 836438400 | 4th | Thursday | 5 | 4 | 4 | 4 | 186 | 1 | 27 | 1996-W27-4 | 7 | July | Jul | 3 | Third | 1996 | 1996 | 1996-07-01 | 1996-07-07 | 1996-07-01 | 1996-07-31 | 1996-07-01 | 1996-09-30 | 1996-01-01 | 1996-12-31 | 071996 | 07041996 | false |
10248 | 5 | 173.99999618530273 | 0 | 173.99999618530273 | Vins et alcools Chevalier | Paul Henriot | Accounting Manager | Reims | France | n/a | 51100 | StevenBuchanan | Sales Manager | Mozzarella di Giovanni | N | Dairy Products | 19960704 | 1996-07-04 | 836438400 | 4th | Thursday | 5 | 4 | 4 | 4 | 186 | 1 | 27 | 1996-W27-4 | 7 | July | Jul | 3 | Third | 1996 | 1996 | 1996-07-01 | 1996-07-07 | 1996-07-01 | 1996-07-31 | 1996-07-01 | 1996-09-30 | 1996-01-01 | 1996-12-31 | 071996 | 07041996 | false |
10249 | 9 | 167.40000343322754 | 0 | 167.40000343322754 | Toms Spezialitäten | Karin Josephs | Marketing Manager | Münster | Germany | n/a | 44087 | MichaelSuyama | Sales Representative | Tofu | N | Produce | 19960705 | 1996-07-05 | 836524800 | 5th | Friday | 6 | 5 | 5 | 5 | 187 | 1 | 27 | 1996-W27-5 | 7 | July | Jul | 3 | Third | 1996 | 1996 | 1996-07-01 | 1996-07-07 | 1996-07-01 | 1996-07-31 | 1996-07-01 | 1996-09-30 | 1996-01-01 | 1996-12-31 | 071996 | 07051996 | false |
10249 | 40 | 1696.0000610351562 | 0 | 1696.0000610351562 | Toms Spezialitäten | Karin Josephs | Marketing Manager | Münster | Germany | n/a | 44087 | MichaelSuyama | Sales Representative | Manjimup Dried Apples | N | Produce | 19960705 | 1996-07-05 | 836524800 | 5th | Friday | 6 | 5 | 5 | 5 | 187 | 1 | 27 | 1996-W27-5 | 7 | July | Jul | 3 | Third | 1996 | 1996 | 1996-07-01 | 1996-07-07 | 1996-07-01 | 1996-07-31 | 1996-07-01 | 1996-09-30 | 1996-01-01 | 1996-12-31 | 071996 | 07051996 | false |
10250 | 10 | 76.99999809265137 | 0 | 76.99999809265137 | Hanari Carnes | Mario Pontes | Accounting Manager | Rio de Janeiro | Brazil | RJ | 05454-876 | MargaretPeacock | Sales Representative | Jack's New England Clam Chowder | N | Seafood | 19960708 | 1996-07-08 | 836784000 | 8th | Monday | 2 | 1 | 8 | 8 | 190 | 2 | 28 | 1996-W28-1 | 7 | July | Jul | 3 | Third | 1996 | 1996 | 1996-07-08 | 1996-07-14 | 1996-07-01 | 1996-07-31 | 1996-07-01 | 1996-09-30 | 1996-01-01 | 1996-12-31 | 071996 | 07081996 | false |
10250 | 35 | 1484.0000534057617 | 222.60001685619386 | 1261.4000365495679 | Hanari Carnes | Mario Pontes | Accounting Manager | Rio de Janeiro | Brazil | RJ | 05454-876 | MargaretPeacock | Sales Representative | Manjimup Dried Apples | N | Produce | 19960708 | 1996-07-08 | 836784000 | 8th | Monday | 2 | 1 | 8 | 8 | 190 | 2 | 28 | 1996-W28-1 | 7 | July | Jul | 3 | Third | 1996 | 1996 | 1996-07-08 | 1996-07-14 | 1996-07-01 | 1996-07-31 | 1996-07-01 | 1996-09-30 | 1996-01-01 | 1996-12-31 | 071996 | 07081996 | false |
10250 | 15 | 251.9999885559082 | 37.79999978542321 | 214.199988770485 | Hanari Carnes | Mario Pontes | Accounting Manager | Rio de Janeiro | Brazil | RJ | 05454-876 | MargaretPeacock | Sales Representative | Louisiana Fiery Hot Pepper Sauce | N | Condiments | 19960708 | 1996-07-08 | 836784000 | 8th | Monday | 2 | 1 | 8 | 8 | 190 | 2 | 28 | 1996-W28-1 | 7 | July | Jul | 3 | Third | 1996 | 1996 | 1996-07-08 | 1996-07-14 | 1996-07-01 | 1996-07-31 | 1996-07-01 | 1996-09-30 | 1996-01-01 | 1996-12-31 | 071996 | 07081996 | false |
10251 | 6 | 100.79999542236328 | 5.039999846220013 | 95.75999557614327 | Victuailles en stock | Mary Saveley | Sales Agent | Lyon | France | n/a | 69004 | JanetLeverling | Sales Representative | Gustaf's Knäckebröd | N | Grains/Cereals | 19960708 | 1996-07-08 | 836784000 | 8th | Monday | 2 | 1 | 8 | 8 | 190 | 2 | 28 | 1996-W28-1 | 7 | July | Jul | 3 | Third | 1996 | 1996 | 1996-07-08 | 1996-07-14 | 1996-07-01 | 1996-07-31 | 1996-07-01 | 1996-09-30 | 1996-01-01 | 1996-12-31 | 071996 | 07081996 | false |
10251 | 15 | 234.0000057220459 | 11.700000460445885 | 222.3000052616 | Victuailles en stock | Mary Saveley | Sales Agent | Lyon | France | n/a | 69004 | JanetLeverling | Sales Representative | Ravioli Angelo | N | Grains/Cereals | 19960708 | 1996-07-08 | 836784000 | 8th | Monday | 2 | 1 | 8 | 8 | 190 | 2 | 28 | 1996-W28-1 | 7 | July | Jul | 3 | Third | 1996 | 1996 | 1996-07-08 | 1996-07-14 | 1996-07-01 | 1996-07-31 | 1996-07-01 | 1996-09-30 | 1996-01-01 | 1996-12-31 | 071996 | 07081996 | false |
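The join in the view works because the fact table carries the same yyyymmdd integer key. During the fact load, that key can be derived directly from the transaction date; a minimal sketch, assuming a hypothetical staging table stage.orders with an order_date column:

```sql
-- Derive the date foreign key on the fact side with the same yyyymmdd convention,
-- so joining to the date dimension is plain integer equality.
SELECT o.order_id,
       TO_CHAR(o.order_date, 'yyyymmdd')::INT AS order_date_key
FROM stage.orders AS o;  -- stage.orders is an assumed staging table, not from the schema above
```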
Now the end-user will be able to perform aggregations and filtering based on anything that the date dimension includes. You can find some examples below.
Group by quarter name
```sql
SELECT quarter_name,
       COUNT(order_id) AS orders,
       SUM(quantity) AS quantity,
       SUM(ROUND(sold_amount::numeric, 2)) AS sold_amount
FROM dw.salesmart
GROUP BY quarter_name;
```
quarter_name | orders | quantity | sold_amount |
---|---|---|---|
Fourth | 529 | 12890 | 310036.83 |
Second | 419 | 10114 | 259371.92 |
First | 693 | 16949 | 436780.46 |
Third | 441 | 10166 | 233666.3 |
Group by day name
```sql
SELECT day_name,
       COUNT(order_id) AS orders,
       SUM(quantity) AS quantity,
       SUM(ROUND(sold_amount::numeric, 2)) AS sold_amount
FROM dw.salesmart
GROUP BY day_name;
```
day_name | orders | quantity | sold_amount |
---|---|---|---|
Monday | 411 | 9851 | 256526.81 |
Thursday | 431 | 9907 | 234587.76 |
Tuesday | 407 | 9569 | 246522.28 |
Friday | 426 | 10517 | 268888.19 |
Wednesday | 407 | 10275 | 233330.47 |
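The weekend_indr flag makes a weekend versus weekday split equally trivial; one more sketch against the same mart (output omitted):

```sql
-- Compare weekend and weekday sales using the precomputed boolean flag.
SELECT weekend_indr,
       COUNT(order_id) AS orders,
       SUM(quantity) AS quantity,
       SUM(ROUND(sold_amount::numeric, 2)) AS sold_amount
FROM dw.salesmart
GROUP BY weekend_indr;
```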
Group by week of year for Mondays only
```sql
SELECT COUNT(order_id) AS orders,
       SUM(quantity) AS quantity,
       SUM(ROUND(sold_amount::numeric, 2)) AS sold_amount,
       week_of_year
FROM dw.salesmart
WHERE day_of_week = 2  -- 2 = Monday in this table's dow + 1 numbering
GROUP BY week_of_year
ORDER BY week_of_year;
```
orders | quantity | sold_amount | week_of_year |
---|---|---|---|
8 | 330 | 8475.26 | 1 |
8 | 191 | 2830.85 | 2 |
5 | 106 | 1907.8 | 3 |
9 | 220 | 5074.73 | 4 |
10 | 185 | 6576.2 | 5 |
9 | 243 | 18282.97 | 6 |
9 | 354 | 9551.26 | 7 |
9 | 183 | 13288.5 | 8 |
8 | 221 | 4481.93 | 9 |
9 | 186 | 3092.19 | 10 |
8 | 160 | 2532.17 | 11 |
13 | 264 | 6219.67 | 12 |
11 | 217 | 3481.4 | 13 |
15 | 412 | 9475.83 | 14 |
17 | 409 | 6384.85 | 15 |
7 | 203 | 8472.9 | 16 |
13 | 217 | 4511.2 | 17 |
9 | 236 | 7511.2 | 18 |
7 | 157 | 3101.25 | 19 |
6 | 128 | 2687.9 | 20 |
8 | 249 | 12138.22 | 21 |
2 | 24 | 240.1 | 22 |
5 | 143 | 2944.4 | 23 |
2 | 30 | 488.7 | 24 |
2 | 65 | 890 | 25 |
6 | 99 | 1407.45 | 26 |
4 | 94 | 2831.25 | 27 |
11 | 255 | 4120.16 | 28 |
4 | 86 | 2906.3 | 29 |
7 | 164 | 4693.7 | 30 |
8 | 398 | 9911.6 | 31 |
7 | 234 | 3297.28 | 32 |
9 | 166 | 4178.15 | 33 |
7 | 126 | 2496.6 | 34 |
8 | 122 | 2665.7 | 35 |
5 | 58 | 838.23 | 36 |
5 | 101 | 3064 | 37 |
7 | 82 | 1693.77 | 38 |
9 | 126 | 2963.01 | 39 |
4 | 59 | 1089.1 | 40 |
10 | 289 | 7060.6 | 41 |
8 | 177 | 4362.5 | 42 |
4 | 28 | 238.3 | 43 |
10 | 279 | 7661.25 | 44 |
8 | 229 | 5204.3 | 45 |
8 | 175 | 6577.18 | 46 |
5 | 140 | 2161.6 | 47 |
9 | 182 | 3433.08 | 48 |
5 | 120 | 5307.4 | 49 |
11 | 243 | 4423.87 | 50 |
10 | 293 | 9980.67 | 51 |
13 | 393 | 7318.28 | 52 |
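One small, optional addition: reports join on the integer key (already covered by the primary key), but load processes often look dates up by the actual date, so a unique index on date_actual can be worthwhile:

```sql
-- Optional: speed up lookups by calendar date; the PK already covers dim_date_id.
CREATE UNIQUE INDEX idx_dim_date_date_actual
    ON code_examples.dim_date (date_actual);
```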
As mentioned already, these examples are for PostgreSQL, but you can do the same in any relational database you use as a data warehouse in a BI project. It is highly likely that you will need to adjust the CREATE and INSERT scripts a bit, but overall this should give you a good starting point. Lastly, keep in mind that this approach can work for simpler/smaller projects as well, though there it may be over-engineering; discuss within the team whether it fits your needs.