. visual analytics agile business intelligence analytics bi big data bi vendors bob becker business analyst business intelligence business performance charts chip felton cio client success cloud bi conforming dashboards data governance data science data warehouse dimensional modelling etl fact tables functional requirements gartner hadoop health hr dashboards hris human resources information visualisation master data management microsoft mobile bi montage pervasive bi ralph kimball rational decision making process roadmap self service bi stephen frew tableau tableau software tableau version 9.0 trends visualisation whitepaper yellowfin
Browse by Date
"Montage was fantastic in terms of adjusting as we went by. It was a shared risk, shared reward kind of project"
ETL in Depth with Ralph Kimball and Bob Becker
A thoughtful observer’s point of view: ETL in Depth with Ralph Kimball and Bob Becker
Spending four days on a course with the father of dimensional modelling and a thought leader in Business Intelligence is, as you can imagine, any BI consultant’s dream!
The course started with Ralph’s overview of his career back to his involvement with PARC in the days when the icon and mouse were designed, through to Red Brick and then the Kimball Group. His fascination with how people and computers interact has driven the passion and has always been at the heart of his career.
He summed up the personality requirements for a BI professional as:
- Someone who is interested in the business world, how businesses work and discovering business processes.
- Someone who is pretty good with technology. Not necessarily a computer scientist or a programmer; but more a thoughtful observer.
- Someone who is interested in people. Why people interact with computers, what they need from it and how they prefer to interact with it.
Thinking about his summary of the ideal candidate, it reminded me of one of Montage’s principal methodologies around the BI process and what we aim to achieve with our clients: bringing together people, process and technology to deliver successful BI solutions.
This being an ‘in depth ETL course’, Ralph continued delving into the depths of ETL highlighting the ultimate goal of ETL as “speed and simplicity to the end user”. This was a short and sweet one liner, but says so much about the process. What we want to achieve from a business perspective is ease of use, and fast query retrieval. Thus if the actual process we as ETL professionals have to go through to achieve this is hard, that’s ok. It’s almost “supposed to be that way”. Else there would be no need for the professionals and all users could spend their days and night in Excel to answer their questions. But alas, we soldier on.
I’ve experienced a few scenarios where someone (client and/or project manager) throws a requirement at me as to exactly what they would like in a fact table, and 9 times out of 10 that is due to their reporting requirements. During the course it was great to get the reconfirmation that that ‘fact tables have nothing to do with how a client expects to see the output of data”. Fact tables serve one purpose only, ‘to represent a single measureable event and therefore belong in the back room’.
Having said that Ralph introduced a 5th type of fact table which filled in a gap where I’ve always kind of experienced one: a Consolidated Fact table which aggregates the underlying Fact and joins Dimensions as required to simplify reporting. Thus you’re able to keep your underlying dimensional model clean (i.e. only properly structured Transaction, Periodic Snapshot and Accumulating Snapshot Fact tables in your star schemas) and stick to the ultimate goal of ETL: “speed and simplicity to the end user” without making the report writers sweat too much. If you’re wondering about the 4th type of Fact table, the answer is “Factless Fact”.
Another new idea around dimensional design was the concept of a Fact Surrogate Key. Up to this point I’ve always included Surrogate Keys in the Dimension tables only and used a composite key in the Fact table. Ralph introduced the idea of adding in an (auto-increment) Fact Surrogate Key into Fact tables. Some advantages which helped convince me of the usefulness:
- Easy identification of a single fact record. Use a single column value instead of a combination of a number of Dimension Surrogate Keys to identify a single fact record. Mostly for back room and testing purposes.
- Allows you to easily resume or back out of a failed job.
- Using a Fact Surrogate Key you can also now insert and update records in a Fact table without any primary key violations
- Use the Fact Surrogate Key to serve as the parent in a parent-child relationship scheme.
After 2 days with Ralph, Bob Becker took over and stepped through Dimension and Fact tables in even more detail. This was the really in depth part and we got into the technicalities without getting too technical, a perfect mix. The course is very detailed without focussing on any specific software vendor. Bob included some examples of different software vendors which was great to see. I mainly use the Microsoft BI stack and WhereScape for ETL, so seeing other vendors’ products was great exposure for a consultant.
One of the most important things I took from the days spent on dimension architecture is the importance of conforming. You can only really have success in your design if you conform on all levels. Thus not only conforming Dimensions, but also Hierarchies, Attributes and Relationships. One version of the truth can only be achieved if that’s what the underlying data portrays.
At the end of the 4 day course I felt inspired, my ETL passion re-ignited and ready to take on the BI world and transform all Data Warehouses into fast and efficient data cleansing machines.
It’s refreshing to realise the Kimball Group comprises only 6 people in the US, yet their influence spans across the globe into so many companies, data models and warehouses. Meeting these people was a truly inspirational experience.
ETL Architecture in Depth
17 – 20 September 2012