I'm into Qlik Sense for about a year now, as a consultant, and there are still some aspects about the tool that yields me some suspicion. I'm evangelized to say that Qlik Sense works as an ETL tool, but I hear a lot that just because it can work as one, it doesn't mean it should be used like one.
I never really understood what exactly should be done upstream QS. Which business logic are we talking about?
The architecture of QS says to extract all tables into QVDs, then perform all the transformations and associations and apply all the business rules upon those QVDs. So, if I have lots of relationships between tables in the database, like:
t1.id = t2.id
and t3.categoryId = 10
and t3. = t1.Id
and (t1.statusId = 2 or t1.statusId = 3)
and t1.id = t4.id;
In QS, according with the best practices, I should extract tables 1, 2, 3 e 4 each one to a QVD (one QVD for each table), and then perform all the WHERE clauses within the QS script, though joins and whatever.
But, all the WHERE clauses are business rules, right? Shouldn't they be done in the database? If I'd need this information in other system or tool, I would need to replicate these rules in that new system.
And in the case of validation, isn't this schema worse if I want to perform data validation? If I need to trace the origin of a problem, I'd need to validate each business rule applied in QS script, and that could be cumbersome.
I'm really just looking for some testimonies from professionals who have great experience working with QS as an ETL tool. What should and shouldn't be done in the QS script?
So, database views: is that normally the best approach even when we are dealing with millions and millions of records on a daily basis?
What are the practical differences between creating a view in the database and extracting the tables one by one and then apply all the required associations in Qlik script? And whose decision should this be? The Qlik developer or the client/stakeholder DBA? (database administrator)
Looking forward for more testimonies of Qlik professionals regarding this topic!
If the query tables will only by used in that specific query
No other model can reuse the underlying table data
Structure of the final result won't change
Then pushing the calculation to the database server will likely yield the best overall performance.
If however you are still in the early development phase (where your table structures change regularly) or the raw tables will be reused within your environment, then dumping the raw tables to QVDs first is best. This saves resources (cpu/memory/network) on the database server side, while allowing you flexibility to transform and reuse existing data as needed.
You will find that you end up with a mixture of using database views and dumping the raw tables to qvd's before doing your own transformations. It is about optimizing the use of your resources. Eliminate duplication and performing the calculations where you have spare cpu/memory.
Sticking to the normal 3 stage process is a good rule of thumb, but there will always be exceptions.
Sometimes you have competent people on the data source side who knows exactly how to structure the queries to give you reliable data. In that case you are lucky and put reliance on the DBA to generate the necessary views, or to provide info on the transformations required. In other cases the people on the data source side aren't willing or able to provide your with the necessary views/queries. In that case you end up having to figure things out yourself which means you end up doing almost all the transformations within qlik script.
That was a very clear explanation. Thank you for sharing your experience 🙂
So far I've concluded this: It is very rare to start a project with some business rules defined, because typically the client wants to see EVERYTHING, or at least wants the possibility to see everything, so he won't filter out anything at a database level. Which means we always end up dumping the raw tables to QVDs first and then we figure things out for ourselves, as you say.
This is what prompt me to ask what normally is the approach taken by a consultant. I can see now that is not a right/wrong answer.