Handling dates in BI projects and Data Warehouses

One of the biggest challenges, if not the biggest, when we are building ETL pipelines, whether for BI projects or for any project with integration requirements, is the unification of data. Overall this is a domain-specific task. Handling dates that arrive in our system from external or internal sources (e.g. an API), however, can be done in a more “unified” way across all our projects. Dates will typically be the backbone of our reporting and aggregations, so it is best to tackle this issue at the beginning of the project.

In this post, we will explain how this can be done in PostgreSQL, but note that the implementation for other relational databases is similar.

Date table creation

What we can do to make both the import (load) processes and the reporting easier is to create a table which breaks dates down as much as possible, or as much as our reporting requirements call for. One way to create such a table can be found below.

CREATE TABLE code_examples.dim_date
(
  dim_date_id              INT PRIMARY KEY ,
  date_actual              DATE NOT NULL,
  epoch                    BIGINT NOT NULL,
  day_suffix               VARCHAR(4) NOT NULL,
  day_name                 VARCHAR(9) NOT NULL,
  day_of_week              INT NOT NULL,
  day_of_week_iso          INT NOT NULL,
  day_of_month             INT NOT NULL,
  day_of_quarter           INT NOT NULL,
  day_of_year              INT NOT NULL,
  week_of_month            INT NOT NULL,
  week_of_year             INT NOT NULL,
  week_of_year_iso         CHAR(10) NOT NULL,
  month_actual             INT NOT NULL,
  month_name               VARCHAR(9) NOT NULL,
  month_name_abbreviated   CHAR(3) NOT NULL,
  quarter_actual           INT NOT NULL,
  quarter_name             VARCHAR(9) NOT NULL,
  year_actual              INT NOT NULL,
  year_actual_iso          INT NOT NULL,
  first_day_of_week        DATE NOT NULL,
  last_day_of_week         DATE NOT NULL,
  first_day_of_month       DATE NOT NULL,
  last_day_of_month        DATE NOT NULL,
  first_day_of_quarter     DATE NOT NULL,
  last_day_of_quarter      DATE NOT NULL,
  first_day_of_year        DATE NOT NULL,
  last_day_of_year         DATE NOT NULL,
  mmyyyy                   CHAR(6) NOT NULL,
  mmddyyyy                 CHAR(10) NOT NULL,
  weekend_indr             BOOLEAN NOT NULL
);

The columns are self-explanatory; just note that as the primary key we use the date in yyyymmdd format cast to an integer. We have followed the usual BI naming convention for the table in this example (dim for dimension).
Filling this table will allow us, in the long term, to make aggregations based on several aspects of the date (e.g. weeks, quarters, etc.).
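
For example, during the load of a fact table we can derive this key directly from any DATE column, so the fact rows join to the dimension without further casting. A minimal sketch follows; staging.orders and its order_date column are just illustrative names.

-- Hypothetical load step: derive the yyyymmdd surrogate key from a DATE column
SELECT TO_CHAR(o.order_date,'yyyymmdd')::INT AS dim_date_id,
       o.*
FROM staging.orders o;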

Loading the date table

One of the advantages of this approach is that we can insert the data into the table automatically for any given date range. This range will always depend on our requirements, but for our example we will use a ten-year range for the initial load. We can always perform a similar insertion if we need more records to describe dates in the future (or in the past).

INSERT INTO code_examples.dim_date
SELECT TO_CHAR(datum,'yyyymmdd')::INT AS dim_date_id,
       datum AS date_actual,
       EXTRACT(epoch FROM datum) AS epoch,
       TO_CHAR(datum,'fmDDth') AS day_suffix,
       TO_CHAR(datum,'Day') AS day_name,
       EXTRACT(dow FROM datum) + 1 AS day_of_week,
       EXTRACT(isodow FROM datum) AS day_of_week_iso,
       EXTRACT(DAY FROM datum) AS day_of_month,
       datum - DATE_TRUNC('quarter',datum)::DATE +1 AS day_of_quarter,
       EXTRACT(doy FROM datum) AS day_of_year,
       TO_CHAR(datum,'W')::INT AS week_of_month,
       EXTRACT(week FROM datum) AS week_of_year,
       TO_CHAR(datum,'YYYY"-W"IW') AS week_of_year_iso,
       EXTRACT(MONTH FROM datum) AS month_actual,
       TO_CHAR(datum,'Month') AS month_name,
       TO_CHAR(datum,'Mon') AS month_name_abbreviated,
       EXTRACT(quarter FROM datum) AS quarter_actual,
       CASE
         WHEN EXTRACT(quarter FROM datum) = 1 THEN 'First'
         WHEN EXTRACT(quarter FROM datum) = 2 THEN 'Second'
         WHEN EXTRACT(quarter FROM datum) = 3 THEN 'Third'
         WHEN EXTRACT(quarter FROM datum) = 4 THEN 'Fourth'
       END AS quarter_name,
       EXTRACT(year FROM datum) AS year_actual,
       EXTRACT(isoyear FROM datum) AS year_actual_iso,
       datum +(1 -EXTRACT(isodow FROM datum))::INT AS first_day_of_week,
       datum +(7 -EXTRACT(isodow FROM datum))::INT AS last_day_of_week,
       datum +(1 -EXTRACT(DAY FROM datum))::INT AS first_day_of_month,
       (DATE_TRUNC('MONTH',datum) +INTERVAL '1 MONTH - 1 day')::DATE AS last_day_of_month,
       DATE_TRUNC('quarter',datum)::DATE AS first_day_of_quarter,
       (DATE_TRUNC('quarter',datum) +INTERVAL '3 MONTH - 1 day')::DATE AS last_day_of_quarter,
       TO_DATE(EXTRACT(year FROM datum) || '-01-01','YYYY-MM-DD') AS first_day_of_year,
       TO_DATE(EXTRACT(year FROM datum) || '-12-31','YYYY-MM-DD') AS last_day_of_year,
       TO_CHAR(datum,'mmyyyy') AS mmyyyy,
       TO_CHAR(datum,'mmddyyyy') AS mmddyyyy,
       CASE
         WHEN EXTRACT(isodow FROM datum) IN (6,7) THEN TRUE
         ELSE FALSE
       END AS weekend_indr
FROM (SELECT '2018-01-01'::DATE + SEQUENCE.DAY AS datum
      FROM GENERATE_SERIES (0,3650) AS SEQUENCE (DAY)) DQ
ORDER BY 1;

We are utilizing many of PostgreSQL's built-in date functions in order to fill our columns with what we want. Again, keep in mind this can work in any relational database, since they all support similar functions. In the FROM clause we generate our series: the starting point is 2018-01-01, and we instruct SQL to produce day offsets from 0 up to 3650, which added to that starting date give us roughly ten years of consecutive dates.
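
Should we later need to extend the range, only the generating subquery has to change; the SELECT list of the INSERT stays exactly the same. A possible sketch that produces the 365 days following whatever is already loaded:

-- Sketch: generate the 365 days that come after the last date already in the table
SELECT (SELECT MAX(date_actual) FROM code_examples.dim_date) + SEQUENCE.DAY AS datum
FROM GENERATE_SERIES (1,365) AS SEQUENCE (DAY);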

Outcome

Now let us check what the table looks like after executing the previous script. Below you can find the first ten records of the dim_date table.
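
They can be fetched with a simple query such as:

SELECT *
FROM code_examples.dim_date
ORDER BY dim_date_id
LIMIT 10;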

dim_date_id | date_actual | epoch | day_suffix | day_name | day_of_week | day_of_week_iso | day_of_month | day_of_quarter | day_of_year | week_of_month | week_of_year | week_of_year_iso | month_actual | month_name | month_name_abbreviated | quarter_actual | quarter_name | year_actual | year_actual_iso | first_day_of_week | last_day_of_week | first_day_of_month | last_day_of_month | first_day_of_quarter | last_day_of_quarter | first_day_of_year | last_day_of_year | mmyyyy | mmddyyyy | weekend_indr
20180101 | 2018-01-01 | 1514764800 | 1st | Monday | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 2018-W01 | 1 | January | Jan | 1 | First | 2018 | 2018 | 2018-01-01 | 2018-01-07 | 2018-01-01 | 2018-01-31 | 2018-01-01 | 2018-03-31 | 2018-01-01 | 2018-12-31 | 012018 | 01012018 | false
20180102 | 2018-01-02 | 1514851200 | 2nd | Tuesday | 3 | 2 | 2 | 2 | 2 | 1 | 1 | 2018-W01 | 1 | January | Jan | 1 | First | 2018 | 2018 | 2018-01-01 | 2018-01-07 | 2018-01-01 | 2018-01-31 | 2018-01-01 | 2018-03-31 | 2018-01-01 | 2018-12-31 | 012018 | 01022018 | false
20180103 | 2018-01-03 | 1514937600 | 3rd | Wednesday | 4 | 3 | 3 | 3 | 3 | 1 | 1 | 2018-W01 | 1 | January | Jan | 1 | First | 2018 | 2018 | 2018-01-01 | 2018-01-07 | 2018-01-01 | 2018-01-31 | 2018-01-01 | 2018-03-31 | 2018-01-01 | 2018-12-31 | 012018 | 01032018 | false
20180104 | 2018-01-04 | 1515024000 | 4th | Thursday | 5 | 4 | 4 | 4 | 4 | 1 | 1 | 2018-W01 | 1 | January | Jan | 1 | First | 2018 | 2018 | 2018-01-01 | 2018-01-07 | 2018-01-01 | 2018-01-31 | 2018-01-01 | 2018-03-31 | 2018-01-01 | 2018-12-31 | 012018 | 01042018 | false
20180105 | 2018-01-05 | 1515110400 | 5th | Friday | 6 | 5 | 5 | 5 | 5 | 1 | 1 | 2018-W01 | 1 | January | Jan | 1 | First | 2018 | 2018 | 2018-01-01 | 2018-01-07 | 2018-01-01 | 2018-01-31 | 2018-01-01 | 2018-03-31 | 2018-01-01 | 2018-12-31 | 012018 | 01052018 | false
20180106 | 2018-01-06 | 1515196800 | 6th | Saturday | 7 | 6 | 6 | 6 | 6 | 1 | 1 | 2018-W01 | 1 | January | Jan | 1 | First | 2018 | 2018 | 2018-01-01 | 2018-01-07 | 2018-01-01 | 2018-01-31 | 2018-01-01 | 2018-03-31 | 2018-01-01 | 2018-12-31 | 012018 | 01062018 | true
20180107 | 2018-01-07 | 1515283200 | 7th | Sunday | 1 | 7 | 7 | 7 | 7 | 1 | 1 | 2018-W01 | 1 | January | Jan | 1 | First | 2018 | 2018 | 2018-01-01 | 2018-01-07 | 2018-01-01 | 2018-01-31 | 2018-01-01 | 2018-03-31 | 2018-01-01 | 2018-12-31 | 012018 | 01072018 | true
20180108 | 2018-01-08 | 1515369600 | 8th | Monday | 2 | 1 | 8 | 8 | 8 | 2 | 2 | 2018-W02 | 1 | January | Jan | 1 | First | 2018 | 2018 | 2018-01-08 | 2018-01-14 | 2018-01-01 | 2018-01-31 | 2018-01-01 | 2018-03-31 | 2018-01-01 | 2018-12-31 | 012018 | 01082018 | false
20180109 | 2018-01-09 | 1515456000 | 9th | Tuesday | 3 | 2 | 9 | 9 | 9 | 2 | 2 | 2018-W02 | 1 | January | Jan | 1 | First | 2018 | 2018 | 2018-01-08 | 2018-01-14 | 2018-01-01 | 2018-01-31 | 2018-01-01 | 2018-03-31 | 2018-01-01 | 2018-12-31 | 012018 | 01092018 | false
20180110 | 2018-01-10 | 1515542400 | 10th | Wednesday | 4 | 3 | 10 | 10 | 10 | 2 | 2 | 2018-W02 | 1 | January | Jan | 1 | First | 2018 | 2018 | 2018-01-08 | 2018-01-14 | 2018-01-01 | 2018-01-31 | 2018-01-01 | 2018-03-31 | 2018-01-01 | 2018-12-31 | 012018 | 01102018 | false

Usage Examples

Let us assume we have utilized this approach while building a data warehouse for Sales Reporting.

This is a typical star schema where we are using, amongst other dimensions, dimdate. We would like to create some data marts on top of it for our end users, so they will be able to create their reports or do their data analysis.

CREATE VIEW dw.salesmart AS 
  SELECT s.order_id, s.quantity, s.extended_price_amount, s.discount_amount, s.sold_amount,
          c.company_name, c.contact_name, c.contact_title, c.customer_city, c.customer_country,
          c.customer_region, c.customer_postal_code, e.employee_name, e.employee_title,
          p.product_name, p.discontinued, p.category_name,
          d.*
  FROM dw.factsales s
  JOIN dw.dimcustomer c USING (customer_key)
  JOIN dw.dimemployee e USING (employee_key)
  JOIN dw.dimproduct p USING (product_key)
  JOIN dw.dimdate d ON d.date_dim_id = s.order_date_key
;

Notice how we are taking all the columns from the date dimension (the d.* in the SELECT list). With this, we have created a view that is ready for the users to work with, and we have also given them extended functionality for what they can do with dates. Below you can see some sample records from this data mart.

order_id | quantity | extended_price_amount | discount_amount | sold_amount | company_name | contact_name | contact_title | customer_city | customer_country | customer_region | customer_postal_code | employee_name | employee_title | product_name | discontinued | category_name | date_dim_id | date_actual | epoch | day_suffix | day_name | day_of_week | day_of_week_iso | day_of_month | day_of_quarter | day_of_year | week_of_month | week_of_year | week_of_year_iso | month_actual | month_name | month_name_abbreviated | quarter_actual | quarter_name | year_actual | year_actual_iso | first_day_of_week | last_day_of_week | first_day_of_month | last_day_of_month | first_day_of_quarter | last_day_of_quarter | first_day_of_year | last_day_of_year | mmyyyy | mmddyyyy | weekend_indr
10248 | 12 | 168 | 0 | 168 | Vins et alcools Chevalier | Paul Henriot | Accounting Manager | Reims | France | n/a | 51100 | StevenBuchanan | Sales Manager | Queso Cabrales | N | Dairy Products | 19960704 | 1996-07-04 | 836438400 | 4th | Thursday | 5 | 4 | 4 | 4 | 186 | 1 | 27 | 1996-W27-4 | 7 | July | Jul | 3 | Third | 1996 | 1996 | 1996-07-01 | 1996-07-07 | 1996-07-01 | 1996-07-31 | 1996-07-01 | 1996-09-30 | 1996-01-01 | 1996-12-31 | 071996 | 07041996 | false
10248 | 10 | 98.00000190734863 | 0 | 98.00000190734863 | Vins et alcools Chevalier | Paul Henriot | Accounting Manager | Reims | France | n/a | 51100 | StevenBuchanan | Sales Manager | Singaporean Hokkien Fried Mee | Y | Grains/Cereals | 19960704 | 1996-07-04 | 836438400 | 4th | Thursday | 5 | 4 | 4 | 4 | 186 | 1 | 27 | 1996-W27-4 | 7 | July | Jul | 3 | Third | 1996 | 1996 | 1996-07-01 | 1996-07-07 | 1996-07-01 | 1996-07-31 | 1996-07-01 | 1996-09-30 | 1996-01-01 | 1996-12-31 | 071996 | 07041996 | false
10248 | 5 | 173.99999618530273 | 0 | 173.99999618530273 | Vins et alcools Chevalier | Paul Henriot | Accounting Manager | Reims | France | n/a | 51100 | StevenBuchanan | Sales Manager | Mozzarella di Giovanni | N | Dairy Products | 19960704 | 1996-07-04 | 836438400 | 4th | Thursday | 5 | 4 | 4 | 4 | 186 | 1 | 27 | 1996-W27-4 | 7 | July | Jul | 3 | Third | 1996 | 1996 | 1996-07-01 | 1996-07-07 | 1996-07-01 | 1996-07-31 | 1996-07-01 | 1996-09-30 | 1996-01-01 | 1996-12-31 | 071996 | 07041996 | false
10249 | 9 | 167.40000343322754 | 0 | 167.40000343322754 | Toms Spezialitäten | Karin Josephs | Marketing Manager | Münster | Germany | n/a | 44087 | MichaelSuyama | Sales Representative | Tofu | N | Produce | 19960705 | 1996-07-05 | 836524800 | 5th | Friday | 6 | 5 | 5 | 5 | 187 | 1 | 27 | 1996-W27-5 | 7 | July | Jul | 3 | Third | 1996 | 1996 | 1996-07-01 | 1996-07-07 | 1996-07-01 | 1996-07-31 | 1996-07-01 | 1996-09-30 | 1996-01-01 | 1996-12-31 | 071996 | 07051996 | false
10249 | 40 | 1696.0000610351562 | 0 | 1696.0000610351562 | Toms Spezialitäten | Karin Josephs | Marketing Manager | Münster | Germany | n/a | 44087 | MichaelSuyama | Sales Representative | Manjimup Dried Apples | N | Produce | 19960705 | 1996-07-05 | 836524800 | 5th | Friday | 6 | 5 | 5 | 5 | 187 | 1 | 27 | 1996-W27-5 | 7 | July | Jul | 3 | Third | 1996 | 1996 | 1996-07-01 | 1996-07-07 | 1996-07-01 | 1996-07-31 | 1996-07-01 | 1996-09-30 | 1996-01-01 | 1996-12-31 | 071996 | 07051996 | false
10250 | 10 | 76.99999809265137 | 0 | 76.99999809265137 | Hanari Carnes | Mario Pontes | Accounting Manager | Rio de Janeiro | Brazil | RJ | 05454-876 | MargaretPeacock | Sales Representative | Jack's New England Clam Chowder | N | Seafood | 19960708 | 1996-07-08 | 836784000 | 8th | Monday | 2 | 1 | 8 | 8 | 190 | 2 | 28 | 1996-W28-1 | 7 | July | Jul | 3 | Third | 1996 | 1996 | 1996-07-08 | 1996-07-14 | 1996-07-01 | 1996-07-31 | 1996-07-01 | 1996-09-30 | 1996-01-01 | 1996-12-31 | 071996 | 07081996 | false
10250 | 35 | 1484.0000534057617 | 222.60001685619386 | 1261.4000365495679 | Hanari Carnes | Mario Pontes | Accounting Manager | Rio de Janeiro | Brazil | RJ | 05454-876 | MargaretPeacock | Sales Representative | Manjimup Dried Apples | N | Produce | 19960708 | 1996-07-08 | 836784000 | 8th | Monday | 2 | 1 | 8 | 8 | 190 | 2 | 28 | 1996-W28-1 | 7 | July | Jul | 3 | Third | 1996 | 1996 | 1996-07-08 | 1996-07-14 | 1996-07-01 | 1996-07-31 | 1996-07-01 | 1996-09-30 | 1996-01-01 | 1996-12-31 | 071996 | 07081996 | false
10250 | 15 | 251.9999885559082 | 37.79999978542321 | 214.199988770485 | Hanari Carnes | Mario Pontes | Accounting Manager | Rio de Janeiro | Brazil | RJ | 05454-876 | MargaretPeacock | Sales Representative | Louisiana Fiery Hot Pepper Sauce | N | Condiments | 19960708 | 1996-07-08 | 836784000 | 8th | Monday | 2 | 1 | 8 | 8 | 190 | 2 | 28 | 1996-W28-1 | 7 | July | Jul | 3 | Third | 1996 | 1996 | 1996-07-08 | 1996-07-14 | 1996-07-01 | 1996-07-31 | 1996-07-01 | 1996-09-30 | 1996-01-01 | 1996-12-31 | 071996 | 07081996 | false
10251 | 6 | 100.79999542236328 | 5.039999846220013 | 95.75999557614327 | Victuailles en stock | Mary Saveley | Sales Agent | Lyon | France | n/a | 69004 | JanetLeverling | Sales Representative | Gustaf's Knäckebröd | N | Grains/Cereals | 19960708 | 1996-07-08 | 836784000 | 8th | Monday | 2 | 1 | 8 | 8 | 190 | 2 | 28 | 1996-W28-1 | 7 | July | Jul | 3 | Third | 1996 | 1996 | 1996-07-08 | 1996-07-14 | 1996-07-01 | 1996-07-31 | 1996-07-01 | 1996-09-30 | 1996-01-01 | 1996-12-31 | 071996 | 07081996 | false
10251 | 15 | 234.0000057220459 | 11.700000460445885 | 222.3000052616 | Victuailles en stock | Mary Saveley | Sales Agent | Lyon | France | n/a | 69004 | JanetLeverling | Sales Representative | Ravioli Angelo | N | Grains/Cereals | 19960708 | 1996-07-08 | 836784000 | 8th | Monday | 2 | 1 | 8 | 8 | 190 | 2 | 28 | 1996-W28-1 | 7 | July | Jul | 3 | Third | 1996 | 1996 | 1996-07-08 | 1996-07-14 | 1996-07-01 | 1996-07-31 | 1996-07-01 | 1996-09-30 | 1996-01-01 | 1996-12-31 | 071996 | 07081996 | false

Now the end-user will be able to perform aggregations and filtering based on anything that the date dimension includes. You can find some examples below.

Group by quarter name

SELECT quarter_name, count(order_id) as orders, sum(quantity) as quantity, 
       sum(round(sold_amount::numeric,2)) as sold_amount
FROM dw.salesmart
GROUP BY quarter_name;
quarter_name | orders | quantity | sold_amount
Fourth       | 529    | 12890    | 310036.83
Second       | 419    | 10114    | 259371.92
First        | 693    | 16949    | 436780.46
Third        | 441    | 10166    | 233666.3

Group by day name

SELECT day_name, count(order_id) as orders, sum(quantity) as quantity,
  	   sum(round(sold_amount::numeric,2)) as sold_amount
FROM dw.salesmart
GROUP BY day_name;
day_name  | orders | quantity | sold_amount
Monday    | 411    | 9851     | 256526.81
Thursday  | 431    | 9907     | 234587.76
Tuesday   | 407    | 9569     | 246522.28
Friday    | 426    | 10517    | 268888.19
Wednesday | 407    | 10275    | 233330.47
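
Note that grouping only by day_name brings the days back in no particular order. Since the dimension also carries day_of_week, a small variation such as the one sketched below returns them in calendar order.

SELECT day_name, count(order_id) as orders, sum(quantity) as quantity,
       sum(round(sold_amount::numeric,2)) as sold_amount
FROM dw.salesmart
GROUP BY day_name, day_of_week
ORDER BY day_of_week;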

Group by week of the year, for Mondays only

SELECT count(order_id) as orders, sum(quantity) as quantity, 
       sum(round(sold_amount::numeric,2)) as sold_amount,
       week_of_year
FROM dw.salesmart
WHERE day_of_week = 2
GROUP BY week_of_year;
orders | quantity | sold_amount | week_of_year
8      | 330      | 8475.26     | 1
8      | 191      | 2830.85     | 2
5      | 106      | 1907.8      | 3
9      | 220      | 5074.73     | 4
10     | 185      | 6576.2      | 5
9      | 243      | 18282.97    | 6
9      | 354      | 9551.26     | 7
9      | 183      | 13288.5     | 8
8      | 221      | 4481.93     | 9
9      | 186      | 3092.19     | 10
8      | 160      | 2532.17     | 11
13     | 264      | 6219.67     | 12
11     | 217      | 3481.4      | 13
15     | 412      | 9475.83     | 14
17     | 409      | 6384.85     | 15
7      | 203      | 8472.9      | 16
13     | 217      | 4511.2      | 17
9      | 236      | 7511.2      | 18
7      | 157      | 3101.25     | 19
6      | 128      | 2687.9      | 20
8      | 249      | 12138.22    | 21
2      | 24       | 240.1       | 22
5      | 143      | 2944.4      | 23
2      | 30       | 488.7       | 24
2      | 65       | 890         | 25
6      | 99       | 1407.45     | 26
4      | 94       | 2831.25     | 27
11     | 255      | 4120.16     | 28
4      | 86       | 2906.3      | 29
7      | 164      | 4693.7      | 30
8      | 398      | 9911.6      | 31
7      | 234      | 3297.28     | 32
9      | 166      | 4178.15     | 33
7      | 126      | 2496.6      | 34
8      | 122      | 2665.7      | 35
5      | 58       | 838.23      | 36
5      | 101      | 3064        | 37
7      | 82       | 1693.77     | 38
9      | 126      | 2963.01     | 39
4      | 59       | 1089.1      | 40
10     | 289      | 7060.6      | 41
8      | 177      | 4362.5      | 42
4      | 28       | 238.3       | 43
10     | 279      | 7661.25     | 44
8      | 229      | 5204.3      | 45
8      | 175      | 6577.18     | 46
5      | 140      | 2161.6      | 47
9      | 182      | 3433.08     | 48
5      | 120      | 5307.4      | 49
11     | 243      | 4423.87     | 50
10     | 293      | 9980.67     | 51
13     | 393      | 7318.28     | 52
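
The dimension also exposes simple flags that are handy for filtering; for instance, the weekend_indr column can restrict a report to weekdays only. A possible query against the same salesmart view is sketched below.

SELECT week_of_year, count(order_id) as orders,
       sum(round(sold_amount::numeric,2)) as sold_amount
FROM dw.salesmart
WHERE NOT weekend_indr
GROUP BY week_of_year
ORDER BY week_of_year;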

As mentioned already, these examples are for PostgreSQL, but you can apply the same approach in any relational database that you use as a data warehouse in a BI project. It is highly likely that you will need to adjust the CREATE and INSERT scripts a bit, but overall this should give you a good starting point. Last, keep in mind that this approach could work for simpler/smaller projects as well, although it may be over-engineering there; it should be discussed within the team whether it fits your needs.
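
As an illustration of the kind of adjustment that may be needed, GENERATE_SERIES is PostgreSQL-specific; in databases that lack it, the series of dates can usually be produced with a recursive CTE instead. A rough sketch is shown below (it runs as-is on PostgreSQL, while the RECURSIVE keyword and the interval arithmetic may need small changes elsewhere).

-- Alternative to GENERATE_SERIES: build the list of dates with a recursive CTE
WITH RECURSIVE date_series (datum) AS (
  SELECT DATE '2018-01-01'                         -- start of the range
  UNION ALL
  SELECT CAST(datum + INTERVAL '1' DAY AS DATE)    -- next day
  FROM date_series
  WHERE datum < DATE '2027-12-31'                  -- end of the range
)
SELECT datum
FROM date_series;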