I'm currently developing a data model for a new dashboard and wanted to pick the community's brain on an approach for modeling multiple versions of similar data. The issue I'm running into is that these multiple versions of the same type of data will need to be compared within the dashboard, so I can't really create a separate dashboard for each version.

Each month we take a snapshot from our transactional DB and store it in a data warehouse with a maximum retention of 12 months. What we end up with is a DB with one version of all sales that were booked in October but extracted from the transactional DB in November, a second version of sales that were booked in October but extracted in December, and so on. The reasoning for keeping these versions is that past months' data is revisable, so multiple copies of a particular month are kept for comparison. The dashboard is supposed to give the user the option to select a specific month and see the data that was extracted in that month.
What I've come up with so far is to concatenate the data extracted each month into corresponding tables, then use a field within each table to distinguish the version (so Sales would have multiple versions of October's data). However, to do this I have to create unique fields on each table (e.g. the Sales table would have a field ExtractedIn that contains the month name, with similar unique fields on all other tables). There has to be a better way to do this, though. Does anyone have any suggestions?
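For what it's worth, the single shared version field can be sketched in plain Python (outside QlikView, just to show the shape of the data); the field and table names (ExtractedIn, Sales, BookedMonth) are only illustrative:

```python
# Sketch: concatenate each month's extract into ONE fact table,
# tagging every row with a single shared version field, instead of
# a differently named field per table. All names here are made up.

def concatenate_extracts(extracts):
    """Merge {extract_month: rows} into one list, tagging each row."""
    fact_table = []
    for month, rows in extracts.items():
        for row in rows:
            tagged = dict(row)
            tagged["ExtractedIn"] = month  # one shared version field
            fact_table.append(tagged)
    return fact_table

extracts = {
    "November": [{"BookedMonth": "October", "Sales": 100}],
    "December": [{"BookedMonth": "October", "Sales": 95}],  # revised figure
}
fact = concatenate_extracts(extracts)

# Selecting a version in the dashboard then reduces to a simple filter:
november_view = [r for r in fact if r["ExtractedIn"] == "November"]
```

Because every table uses the same ExtractedIn field, a single month selection filters all versioned tables at once.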
Kevin,
My thoughts...
I think your approach is correct, but I didn't follow the last bit. I'd concatenate all the tables into one long table with a single field acting as a flag that marks the version. If you have large volumes of data, then I'd recommend using both Set Analysis and the flag approach with one large fact table, because in QlikView it's more about the number of columns/fields than the number of tables or rows. If you store the same data together as one field, then each repeated value is stored just once instead of multiple times in different fields. QlikView's storage works based on the cardinality of the data. Here is my understanding of the internal storage structure:
The data records are read into memory, so that all processing of data can be done in memory. I am sure you know this bit. QlikView treats all data as Data Element Types (columns/fields) and Data Element Values (values/records). Each distinct data element value of each data element type is assigned a binary code, and the data records are stored in binary-coded form and sorted. By using the binary coding, very quick searches can be done on the tables. QlikView also removes redundant information and reduces the amount of data: the redundant information is stored separately, with the frequencies of each unique data element value within each data element type. When the user makes a selection on data element values, the implied selection (possible values) is tracked separately to present it to the user. By this process QlikView can perform rapid linear searches.
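To make the cardinality point concrete, here is a toy symbol-table encoding in Python. This is only a rough model of the idea described above (each distinct value stored once, rows holding small integer codes), not QlikView's actual implementation:

```python
# Toy model of columnar, cardinality-based storage: each distinct
# value of a field goes into a symbol table once; the rows store
# only compact integer codes pointing into that table.

def encode_column(values):
    symbols = []   # each distinct value stored exactly once
    codes = []     # per-row integer codes into the symbol table
    index = {}
    for v in values:
        if v not in index:
            index[v] = len(symbols)
            symbols.append(v)
        codes.append(index[v])
    return symbols, codes

region = ["EMEA", "APAC", "EMEA", "EMEA", "APAC"]
symbols, codes = encode_column(region)
# symbols -> ["EMEA", "APAC"]; codes -> [0, 1, 0, 0, 1]
```

This is why a low-cardinality flag field is cheap to add to even a very large fact table: the flag's storage cost is driven by its number of distinct values, not its number of rows.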
Good luck!
Cheers,
DV
Deepak, I really appreciate the response. Is there a reason that concatenating into one table would be more beneficial here? Would I lose some of the associative power that I'd typically have under a normal format? I'm worried that if I try to put everything into one table, either I'll be concatenating data in a way that leads to problems with how the data is linked up, or I'll kill my load script performance by trying to join data that wouldn't typically be contained within the same table. Is there a downside to using identifiers on each table that are qualified with the table name to distinguish them from other identifiers? I realize that when performing set analysis it would force me to get the set from multiple places, but I don't see this as a big enough issue to have to go to a one-table structure.
You're welcome Kevin.
My understanding about the data is you have multiple versions of similar data. Each version is used to compare with the previous version. Here are my thoughts...
Example Expressions:
Current Version = SUM({<MyFlag = {$(=Max(MyFlag))}>} Sales)
Previous Version = SUM({<MyFlag = {$(=Max(MyFlag)-1)}>} Sales)
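For anyone reading along, here is roughly what those two set-analysis expressions compute, rendered in plain Python with made-up data (MyFlag and Sales follow the expressions; the values are invented):

```python
# Rough equivalent of the two set-analysis expressions:
# "current version"  = sum of Sales where MyFlag equals its maximum,
# "previous version" = sum of Sales where MyFlag equals max - 1.

rows = [
    {"MyFlag": 1, "Sales": 100},   # earlier extract
    {"MyFlag": 2, "Sales": 95},    # latest extract (revised)
    {"MyFlag": 2, "Sales": 10},    # latest extract, another line
]

max_flag = max(r["MyFlag"] for r in rows)
current = sum(r["Sales"] for r in rows if r["MyFlag"] == max_flag)
previous = sum(r["Sales"] for r in rows if r["MyFlag"] == max_flag - 1)
# current -> 105, previous -> 100
```

Putting both expressions in the same chart is what lets the user compare versions side by side without separate dashboards.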
I hope this all makes sense! Please feel free to add anything and we can discuss this further.
Cheers,
DV
Deepak, sorry, that was a misunderstanding on my part. I thought you were saying to create one big table, but now I understand that you meant storing anything that could be versioned in a centralized table. I'm already working towards a centralized fact table solution. The issue I'm running into is that I essentially have two levels of detail; see the simplified example below. The problem is that if I want to create a central fact table where I select a ProjectID and it gives me the associated SalesAmts for that project, I would have to join the two tables (both are big tables, so it would eat a ton of script time). If I concatenate here, then I lose that association, because selecting a ProjectID will only show associated EntryIDs, not the next level of SalesAmt. So what I've ended up doing is creating two flags, which I agree is pretty clunky. Do you have any ideas for an alternative?
Project:
Load
    ProjectID,
    EntryID,
    OtherProjectLevelInfo,
    ProjectFlag          // version flag at the project level
From ProjectExtract;

LineItems:
Load
    EntryID,
    SalesAmt,
    LineItemFlag         // version flag at the line-item level
From LineItemsExtract;
Kevin, no problem at all. Thinking about your problem, can you post some sample data? It will be easier to work with actual data while doing the data modelling. How about creating a hash or AutoNumber key that is a combination of ProjectID and the unique identifier (EntryID) in the LineItems table? That way you can still have one table, and if you select a ProjectID you will see all the values for SalesAmt in the same table, because the combination of the two unique keys represents the data at the most granular level. So you are meeting both requirements.

I hope this makes sense. Please let me know your thoughts or post some sample data.
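A rough Python sketch of that composite-key idea follows. The %ProjectLineKey name and the sample rows are made up; in QlikView script the equivalent would be something like AutoNumber(ProjectID & '|' & EntryID):

```python
# Sketch of a composite surrogate key: one compact integer per
# distinct ProjectID/EntryID pair, mimicking QlikView's AutoNumber().
# All field names and data here are hypothetical.

def make_autonumber():
    seen = {}
    def autonumber(value):
        # Same input always maps to the same sequential integer.
        if value not in seen:
            seen[value] = len(seen) + 1
        return seen[value]
    return autonumber

autonumber = make_autonumber()

line_items = [
    {"ProjectID": "P1", "EntryID": "E1", "SalesAmt": 50},
    {"ProjectID": "P1", "EntryID": "E2", "SalesAmt": 25},
]
for row in line_items:
    row["%ProjectLineKey"] = autonumber(f"{row['ProjectID']}|{row['EntryID']}")
# Each distinct ProjectID/EntryID pair now carries one integer key.
```

Because the key encodes both levels of detail, selecting a ProjectID still resolves down to the individual SalesAmt rows without a full join in the load script.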
Cheers,
DV