Do not input private or sensitive data. View Qlik Privacy & Cookie Policy.
Skip to main content

Announcements
Save an extra $150 Dec 1–7 with code CYBERWEEK - stackable with early bird savings: Register
cancel
Showing results for 
Search instead for 
Did you mean: 
bkar81
Contributor III
Contributor III

[resolved] How to work on Accumulating Snapshot Fact table in TIS

My Fact table rely on a SCD Type 2 Dimension table. And my Fact table is in accumulating snapshot model.
Please let me know how to update the fact table in this situation. (Update existing row and insert a new row) - similar to SCD Type 2.
Is there a component for that?
I am using Jaspersoft ETL 5.4.1 Ent version
Labels (2)
1 Solution

Accepted Solutions
willm1
Creator
Creator

Let me take a step back and describe how your job could be set up to do this purely in Talend.
+ First update all your dimension tables with data from your source system (SCD1, SCD2 etc...)
+ Select data from your source system that includes all your business keys (so every row has business keys and measures (for the fact table)
+ Using tMap, look up the current row from each Dimension table; since they're SCD2, you'll have an active flag that shows what row is the current.
+ From the row for each dimension table, get the corresponding Surrogate Key; for example, CustID MAG7983 could be mapped to Surrogate Key 1123. Get that key and add to your output.
+ In that tMap, look up all your Dimensions, and make sure you use 'Left Join' to that main data flow and select 'Unique' records from the input 'table'.
+ Coming out of the tMap, you'll have all your Surrogate Keys and your measures, ready to be inserted / updated into the Fact Table.
+ Write this to a Staging Table or output file.
See screenshots... You will notice that the initial load of the Staging table uses a Process_Type field, which we set to 1. This will be the default for all new records from the source system.
Now when you have the data in the staging table, you need to determine what records in the Fact Table need to be updated. We do this by joining the Staging table to the Fact Table (identical schemas at this point) and using an Inner Join. We only join on as few keys as we expect to always have the first load. In this case, I don't expect us to have ShipDate, Receipt Date etc... So we don't join on them. Any row that matches completely based on SK matches is assigned a Process_Type of 2. The output from this step is used to update the same Staging Table. At this point, rows that should be used for update will have Process_Type 2, and those for Insert, Process_Type 1.
Finally, read the Staging table and update or insert (see screenshots for details).
0683p000009MBxM.png 0683p000009MCur.png 0683p000009MD3V.png 0683p000009MD9g.png 0683p000009MD9l.png 0683p000009MD9q.png 0683p000009MD9N.png 0683p000009MD8o.png 0683p000009MD9v.png 0683p000009MDA5.png

View solution in original post

11 Replies
willm1
Creator
Creator

Hello bkar81 - depending on database you're using, one clean and easy way to accomplish this would be a MERGE statement in a tDBRow (tOracleRow for Oracle etc...). Here's an example of what the statement would look like - http://psoug.org/reference/merge.html . With the MERGE, you can easily define how to insert or update rows.
bkar81
Contributor III
Contributor III
Author

Thanks for the suggestion Willm.
My fact table has lot of foreign keys pointing to dimension tables and most of them are of SCD Type 2. In this case how can I achieve it?
Unfortunately, this is the requirement
willm1
Creator
Creator

Let me take a step back and describe how your job could be set up to do this purely in Talend.
+ First update all your dimension tables with data from your source system (SCD1, SCD2 etc...)
+ Select data from your source system that includes all your business keys (so every row has business keys and measures (for the fact table)
+ Using tMap, look up the current row from each Dimension table; since they're SCD2, you'll have an active flag that shows what row is the current.
+ From the row for each dimension table, get the corresponding Surrogate Key; for example, CustID MAG7983 could be mapped to Surrogate Key 1123. Get that key and add to your output.
+ In that tMap, look up all your Dimensions, and make sure you use 'Left Join' to that main data flow and select 'Unique' records from the input 'table'.
+ Coming out of the tMap, you'll have all your Surrogate Keys and your measures, ready to be inserted / updated into the Fact Table.
+ Write this to a Staging Table or output file.
See screenshots... You will notice that the initial load of the Staging table uses a Process_Type field, which we set to 1. This will be the default for all new records from the source system.
Now when you have the data in the staging table, you need to determine what records in the Fact Table need to be updated. We do this by joining the Staging table to the Fact Table (identical schemas at this point) and using an Inner Join. We only join on as few keys as we expect to always have the first load. In this case, I don't expect us to have ShipDate, Receipt Date etc... So we don't join on them. Any row that matches completely based on SK matches is assigned a Process_Type of 2. The output from this step is used to update the same Staging Table. At this point, rows that should be used for update will have Process_Type 2, and those for Insert, Process_Type 1.
Finally, read the Staging table and update or insert (see screenshots for details).
0683p000009MBxM.png 0683p000009MCur.png 0683p000009MD3V.png 0683p000009MD9g.png 0683p000009MD9l.png 0683p000009MD9q.png 0683p000009MD9N.png 0683p000009MD8o.png 0683p000009MD9v.png 0683p000009MDA5.png
bkar81
Contributor III
Contributor III
Author

Wow... What a detailed explanation... Lemme try and get back to you
Thanks a lot
bkar81
Contributor III
Contributor III
Author

Hi,
I have a small doubt..
Can I implement this through tMysqlSCD component instead of manually setting a flag to 1 or 2 (I know SCD is meant for Dimensions and not for fact)? I want to know which one would be faster and error free...
willm1
Creator
Creator

You can certainly try... The issue with that approach is that you'd be constrained to the way the SCD component works - you have to define SCD Start Date, SCD End Date, Active flag, plus a new Surrogate key. These would all be unnecessary and complicated for a Fact table - plus, you'd lose control on what fields you want to check on and how exactly you want to check...
The only twist to the approach I outlined here is having to check whether a fact had previously been inserted into the fact table. Otherwise, we normally just insert into fact tables.
Let us know how it goes...
willm1
Creator
Creator

And as far as faster and error free, I've used this simple design pattern (1 or 2...) many times across different projects. But don't take my word for it 0683p000009MACn.png. Put it to the test and let us know...
bkar81
Contributor III
Contributor III
Author

Yep sure...
I need to design both and do a load test before commenting...
Thanks anyways.
bkar81
Contributor III
Contributor III
Author

Is there a way to calculate the checksum inside talend (means either with tMap or any specific component), if so we can easily use the checksum field in tMap to determine the existence of a record in the Fact table