Hi Team,
Need your assistance in identifying the right approach for the data mart load.
Issue: Currently, when we run the initial load with the full load option selected, performance is good. However, after the full load, when we change the setting to incremental load, Compose throws a duplicate error: it tries to insert data that already exists in the DM.
When we do the initial load with the incremental option selected, performance is very poor. Below are our findings; please let us know the best approach and correct us if we missed anything. Thanks.
Scenario 1, DM full load: created 2 data marts with the same network id filter, one DM with the incremental option and the other with the full load option selected.
Initial DM1 load with full load option selected: 1,250 records loaded in 15.72 secs.
Initial DM2 load with incremental load option selected: 1,250 records loaded in 11.75 secs.
Scenario 2, DM incremental load: updated the full load option to incremental in DM1 and generated new scripts, so now both data marts have the incremental option selected. Ran the DWH CDC tasks and the data load was successful.
Ran the DM1 task --> this task failed due to the duplicate issue.
Ran the DM2 task --> this task was successful.
Here both DMs processed the same data; one was successful and the other failed, and this happens consistently. I reviewed both queries and everything was the same except the DWH runhandled parameter name. We assume the parameter value is the same, as we did not process any DWH runs in between.
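As a sanity check, a query like the one below can confirm whether the mart really contains duplicate keys before the incremental run; the table and key names (dm.fact_invoice, lineitem_id) are placeholders for our actual DM objects:

    -- Placeholder names: substitute the failing DM table and its key column.
    SELECT lineitem_id, COUNT(*) AS copies
    FROM dm.fact_invoice
    GROUP BY lineitem_id
    HAVING COUNT(*) > 1;  -- any rows returned are true duplicates in the mart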
Scenario 3, incremental changes not picked up during the DWH run: made a change in the invoice_line_item table, updating lineitem_units and lineitem_rate for lineitem 5807. The change was captured by the Replicate task and we saw an entry in the CT table. We then started the DWH incremental workflow; the CT table record disappeared and the change was not captured by Compose.
We tried the same after a few hours and Compose captured the change, meaning the DWH record got updated. Inconsistent behavior by Compose was noticed in this case as well.
Ran DM1 and DM2 --> DM1 failed due to the duplicate issue and DM2 captured the changes.
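To confirm whether a change is still waiting in the change table before the DWH run, the CT table can be queried directly. A rough sketch, assuming Replicate's default Store Changes naming (a __ct suffix and header__ columns); actual names may differ:

    -- header__change_oper: I = insert, U = update, D = delete.
    SELECT header__change_seq, header__change_oper, lineitem_units, lineitem_rate
    FROM invoice_line_item__ct
    WHERE lineitem = 5807  -- key column as named in our source table
    ORDER BY header__change_seq;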
Scenario 4, perform full load on an existing DM: we tried a couple of options for this problem; none of them worked.
Option 1, truncating all tables: dropped foreign key references, truncated the data, recreated foreign key references on all the tables, and then updated the incremental flag to full load. After making all these changes we started the DM task and it failed with an error about a missing foreign key. I tried to create the missing foreign key, but it appears to already exist in the DB, so SQL Server did not allow it; not sure if this is a cache issue (see the SQL sketch after option 2).
Option 2, dropping and recreating all the tables: this option worked for the full load, but after changing the DM settings back to incremental, the DM task started to fail due to the duplicate issue.
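For reference, the "already existing" foreign key from option 1 can be checked in SQL Server's system views before recreating anything, and the drop/truncate/recreate can be guarded so a leftover constraint cannot collide. A sketch with placeholder constraint, table, and column names:

    -- 1) List the foreign keys that still exist in the mart.
    SELECT fk.name AS constraint_name,
           OBJECT_NAME(fk.parent_object_id)     AS child_table,
           OBJECT_NAME(fk.referenced_object_id) AS referenced_table
    FROM sys.foreign_keys AS fk;

    -- 2) Guarded drop / truncate / recreate (placeholder names).
    IF EXISTS (SELECT 1 FROM sys.foreign_keys WHERE name = 'FK_fact_invoice_dim_date')
        ALTER TABLE dm.fact_invoice DROP CONSTRAINT FK_fact_invoice_dim_date;
    TRUNCATE TABLE dm.fact_invoice;
    TRUNCATE TABLE dm.dim_date;  -- allowed only once no FK references this table
    ALTER TABLE dm.fact_invoice
        ADD CONSTRAINT FK_fact_invoice_dim_date
        FOREIGN KEY (date_key) REFERENCES dm.dim_date (date_key);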
Apart from this, we also noticed that CT changes are being lost when the DM task is executed first: if there is a change in the source, Replicate captures that change in the CT tables, and if the DM tasks are triggered prior to the DWH incremental run, the CT changes are lost.
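To verify this, the CT backlog can be snapshotted before and after triggering the DM task; a minimal sketch, again assuming the default __ct naming:

    -- Run before and after the DM task; a drop in count without a DWH run
    -- would confirm that CT rows are being consumed or discarded.
    SELECT COUNT(*) AS pending_changes
    FROM invoice_line_item__ct;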
Hi Gary,
Thanks for the detailed explanation. The purpose of Full Load is to get all available records every time we run it, and I don't see an issue with it causing the duplicates error. It's a best practice to create separate tasks for full and incremental load, and to use Full Load only when we have truncated the data in the tables and want to reload.
Looks like scenario 3 qualifies for a Support case.
Scenario 4:
Option 1: After making the changes, did we clear the metadata cache and validate the Data Warehouse? I think these two are mandatory steps, since we made changes directly on the DB and Compose needs to sync the changes into its metadata.
Hope this helps, but I feel a Support case would be more beneficial to provide more detailed info for all the mentioned scenarios.
I reviewed the scenario with the client's team. In their case an incremental data mart run is updating about a third of the records already in the mart (mostly type 1 dims and facts), which is known to be a very expensive operation on most RDBMSs, while the full rebuild completes in minutes. Given this data change profile, the best option is to do a full rebuild of the data mart on every run. There is no concern about losing history, since the history records themselves live in the DW and are processed incrementally every time; any needed history (for type 2) will be recreated correctly on every rebuild from the DW.
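To make the cost profile concrete, these are the two load patterns in rough SQL form; the names are illustrative and not the Compose-generated statements:

    -- Incremental pattern: a large keyed MERGE, paying for index
    -- maintenance and logging on roughly a third of the fact rows.
    MERGE dm.fact_invoice AS tgt
    USING dwh.invoice_current AS src
        ON tgt.lineitem_id = src.lineitem_id
    WHEN MATCHED THEN
        UPDATE SET tgt.lineitem_units = src.lineitem_units,
                   tgt.lineitem_rate  = src.lineitem_rate
    WHEN NOT MATCHED THEN
        INSERT (lineitem_id, lineitem_units, lineitem_rate)
        VALUES (src.lineitem_id, src.lineitem_units, src.lineitem_rate);

    -- Rebuild pattern: bulk truncate-and-insert, typically far cheaper
    -- at this rate of change.
    TRUNCATE TABLE dm.fact_invoice;
    INSERT INTO dm.fact_invoice (lineitem_id, lineitem_units, lineitem_rate)
    SELECT lineitem_id, lineitem_units, lineitem_rate
    FROM dwh.invoice_current;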
They will attempt to reproduce scenario 3 (Compose seemingly skipping CDC records) and open a case. We did confirm, however, that running the data mart has no impact on CDC records in the change tables.
Yes, data marts do have drawbacks: though they offer some benefits over a DW, they also come with downsides.
@garyclark, did any of the advice above resolve your question?
Please note that the actual solution may depend on the details of the problem you're encountering.
Review Error Messages: note the exact table and constraint named in the duplicate or foreign key failure.
Check Data Integrity: look for duplicate keys or orphaned rows in the affected tables (see the sketch after this list).
Examine Load Process Logs: the Compose task log shows which statement failed and how many rows were processed.
Validate Data Sources: confirm the Replicate task is healthy and the CT tables contain the expected changes.
Check Load Dependencies: run the DWH CDC tasks before the data mart tasks so changes are consumed in order.
Review Transformation Logic: compare the generated scripts of the two marts, as was done with the runhandled parameter.
Verify Database Connection: rule out dropped or timed-out connections during long-running loads.
Check Storage Availability: make sure the database and log files have room to grow during a full load.
Data Type Compatibility: confirm source and target column types still match after any schema change.
Update ETL Jobs or Scripts: regenerate the Compose scripts after every settings change, such as switching between full and incremental load.
Engage with Support: open a case with the task logs and the scenario details above.
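As a starting point for the integrity and source-validation items above, a simple row-count reconciliation; the schema and table names are placeholders for your own DWH and DM objects:

    -- Compare row counts between the DWH source of a mart table and the mart itself.
    SELECT 'dwh' AS layer, COUNT(*) AS row_count FROM dwh.invoice_line_item
    UNION ALL
    SELECT 'dm', COUNT(*) FROM dm.fact_invoice;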
Remember to make changes cautiously, and preferably in a testing environment, before applying fixes in a production setting. If you have specific error messages or details about the issue, feel free to provide more information for a more targeted solution.