Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hello All,
I have a fact table that is being loaded every day in full load manner meaning truncate and reload every day. (5 million records every day)
I have like 14 look up dimension tables which are loading the end fact table.
Now the client asked me to implement incremental logic on this fact table. I was asked not to use the CDC component as the client said they can not make any changes to source tables or on source database because of their policies.
Can some one please help me how to implement this logic as I have never implemented incremental logic on a fact table.
Below is how my job looks like and FCT_SN_FINAID_OPERATIONS id my final table that is being loaded.
Thank You
Hi,
When you are doing incremental logic, you need to check whether the table is already having the data based on key columns. If the data is present, pick the keys from the records and any change is an update to this fact table. If the data is not present, you can do straight insert.
Now your query might be how to differentiate the records to insert and update based on key columns. It can be done with following steps.
a) Do an inner join between your stage table (where you will have full data from source) and fact table based on key columns. This will give you the records which are already present. Now use this dataset as a lookup for a flow where you need to read the input stage table as the main flow again.
b) Do the inner join with both datasets using inner join. But in the output section, select the option to pick only inner join reject records. This means we are picking only those records which are not present in the DB. These records can be send directly for insert.
Warm Regards,
Nikhil Thampi
Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂
Hi,
When you are doing incremental logic, you need to check whether the table is already having the data based on key columns. If the data is present, pick the keys from the records and any change is an update to this fact table. If the data is not present, you can do straight insert.
Now your query might be how to differentiate the records to insert and update based on key columns. It can be done with following steps.
a) Do an inner join between your stage table (where you will have full data from source) and fact table based on key columns. This will give you the records which are already present. Now use this dataset as a lookup for a flow where you need to read the input stage table as the main flow again.
b) Do the inner join with both datasets using inner join. But in the output section, select the option to pick only inner join reject records. This means we are picking only those records which are not present in the DB. These records can be send directly for insert.
Warm Regards,
Nikhil Thampi
Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂
Thank You very much. I was able to implement what I needed based on your inputs.