profilejamesbond

Data sources, data models and testing

Hi,

When you get the requirements, how do you map or connect the data sources? How do you know which fields need to be included in the Key_Field?

 

For example: you get 10 tables without any description and you have to find the relations yourself and derive the KPIs the stakeholder described from that data. How do you do it? Do you have any magical tool to analyze the data and its granularity level (keys)?

 

What about testing? How do you do the testing? Do you do A/B testing, or is there a way to perform automated testing? For example, the extraction has the base data, then you apply transformation after transformation, the data changes, and then it goes to the GUI. How do you verify that data? Say Sales in the extraction is 100€ and you then apply transformations with business logic, such as: 10 products belong to category 1 and category 2, category 1 has a 10% commission and category 2 has a 5% commission. When transformations with such logic are applied, the result is no longer 100, it is something different. The question is how to verify the results in the most critical applications, because there are millions of products with different scenarios and different results.
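To make the commission example concrete, a transformation of that kind might look roughly like the following load-script sketch (the category rates, field names and file path are purely illustrative, not from any real model):

// Hypothetical commission rates per category (illustrative values only)
CommissionMap:
MAPPING LOAD * INLINE [
Category, Rate
1, 0.10
2, 0.05
];

// Sales is still 100€ in the extraction, but the derived Commission field
// depends on the category logic, so downstream totals no longer match the
// raw extraction figure.
Facts:
LOAD
    ProductID,
    Category,
    Sales,
    Sales * ApplyMap('CommissionMap', Category, 0) as Commission
FROM [lib://MyConnection/sales_extract.qvd] (qvd);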

 

Thanks


7 Replies
profilejamesbond
Author

Still waiting for an expert answer...

henrikalmen
Specialist II

There's not one simple answer; every case is different. Finding the key fields when column names seem random can be a matter of visually examining column contents to see where there are equal values, and then making tests. Don't take all tables at once: start by trying to link two tables, and once you have figured out the key field (or key fields) and have linked those tables, try to find another table that can be linked to one of the tables you have already linked.
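As a rough sketch of that step-by-step linking (purely hypothetical field names, assuming you have spotted that Customers.ID and Orders.CustNo carry the same values):

// Rename the suspected key on one side so both tables share a field name
// and therefore associate in the data model.
Customers:
LOAD
    ID as CustNo,    // suspected key, renamed to match Orders
    CustomerName
FROM [lib://Data/Customers.qvd] (qvd);

Orders:
LOAD
    OrderID,
    CustNo,
    OrderDate,
    Amount
FROM [lib://Data/Orders.qvd] (qvd);

// If the link is correct, selecting a customer in the front end should
// filter the orders plausibly; if not, undo the rename and test the next
// candidate field.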

Verifying data also depends on the specific situation. Perhaps the source system has some kind of exportable report with aggregations that you can load into Qlik, where you do the same aggregations and create a flag field that is 1 if the result is equal and 0 if it differs. If you have any zeroes, you have done something wrong.
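A minimal sketch of that check, assuming your fact table is called Facts, the source system exports a per-category report, and all file and field names are made up:

// Aggregate in Qlik the same measure the source report shows
QlikTotals:
LOAD
    Category,
    Sum(Sales) as QlikSales
RESIDENT Facts
GROUP BY Category;

// Load the exported reference report from the source system
SourceTotals:
LOAD
    Category,
    SourceSales
FROM [lib://Data/source_report.csv]
(txt, utf8, embedded labels, delimiter is ',');

LEFT JOIN (QlikTotals)
LOAD Category, SourceSales RESIDENT SourceTotals;
DROP TABLE SourceTotals;

// Flag field: 1 = match, 0 = mismatch (rounded to cents)
Reconciliation:
LOAD
    Category,
    QlikSales,
    SourceSales,
    If(Round(QlikSales, 0.01) = Round(SourceSales, 0.01), 1, 0) as MatchFlag
RESIDENT QlikTotals;
DROP TABLE QlikTotals;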

marcus_sommer (Accepted Solution)

No magical tool will replace the experience of a (BI) developer. I think in many cases such approaches wouldn't save effort, even if the pattern recognition delivers some results, because they too easily point in the wrong direction. The usual reflex is to keep going and try to fix everything, instead of questioning whether the approach makes sense, and then later on to discard the current state and start again from the beginning.

If no description is available, it's a kind of reverse engineering: of course possible, but with limitations, especially regarding the quality of the results and the ability to say where problems might remain. If no description also means there is (nearly) no existing reporting against which your own results could be compared, the entire testing would have to be done by the end users, because there would be no way to check whether the raw data is complete and correct, or which parts of it should be regarded as such.

Nevertheless, it needn't be so hard. Simply loading 1,000 records from each of the n tables with qualified field names and looking at them in a few table/list boxes will quickly show which table contains which kind of data and how they might be merged/associated. It's quite likely that within one or two days and maybe 100 loads you will have the majority of the knowledge about the data set and the needed data model.
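A quick sketch of that kind of exploratory load (connection and table names are placeholders):

// QUALIFY prefixes every field with its table name, so nothing associates
// by accident while you are still exploring the raw tables.
QUALIFY *;

Table1:
FIRST 1000 LOAD * FROM [lib://MyConnection/Table1.qvd] (qvd);

Table2:
FIRST 1000 LOAD * FROM [lib://MyConnection/Table2.qvd] (qvd);

// ...repeat for the remaining tables...

UNQUALIFY *;

// Afterwards, put the fields into a few table/list boxes in the front end
// and look for columns that share the same values across tables; those are
// the candidate keys for the later data model.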

profilejamesbond
Author

Hi @marcus_sommer,

That is the point: reverse engineering with a trade-off in data quality.

Thanks

profilejamesbond
Author

Hi @henrikalmen,

For now, I am doing exactly the same thing, spending 80% of the time finding out what is what.

Thanks

henrikalmen
Specialist II

Reaching out to the system owner to ask for guidance often helps as well.

profilejamesbond
Author

Hi @henrikalmen,

That is the first thing I would like to do. Unfortunately, the high employee turnover rate has led me nowhere.