Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
garyclark
Contributor
Contributor

Issue with compose datamart load

Hi Team,

Need your assistance in identifying the right approach for datamart load.
Issue : Currently when we initial load with full load option selected performance is good however after the full load when we change the setting to incremental load compose is throwing duplicate issue. It is trying to insert the data that is already existing in DM.
When we do initial load with incremental option selected performance is very poor. Below are our findings please let us know the best approach and correct if we missed anything. Thanks

Scenario 1 DM full load: created 2 data marts with same network id filter. One DM with incremental option and other with full load option selected.

Initial DM1 load with full load option selected: 1,250 record loaded 15.72 secs
Initial DM2 load with Incremental load option selected: 1,250 record loaded 11.75 secs

Scenario 2 DM incremental load: updated the full load option to incremental in DM1 and generated new scripts so now both the data marts have incremental option selected. Ran DWH CDC tasks and data load was successful.

Ran DM1 task --> this task has failed due to duplicate issue.

Ran DM2 task --> this task was successful.

Here both the DM's processed same data one was successful and other failed and this is consistency.I reviewed both the queries and everything was same except DWH runhandled parameter name. we are assuming parameter value is same as we did not process any DWH runs in between.

Scenario 3 incremental changes not picked during DWH run: Made a change in invoice_line_item table updated lineitem_units and lineitem_rate for lineitem 5807 . Noticed change got captured by replicate task and also we see an entry in ct table and then we started DWH incremental workflow CT table record was disappeared and change was not captured by compose.

Tried the same after few hours compose captured the change meaning DWH record got updated. Inconsistent performance by compose noticed in this case as well.

Ran DM1 and DM2 --> DM1 failed due to duplicate issue and DM2 captured the changes.

Scenario 4 perform full load on existing DM: We tried couple of options for this problems, none of the options worked.

option 1 by truncating all tables: Dropped foreign key references, truncated data , recreated foreign key references on all the tables and then updated incremental flag to full load. After making all these changes started DM task and it failed with the error message related to missing foreign key. I tried to create the missing foreign key but looks like this is already existing in the DB so SQL did not allow not sure if this is a cache issue.

option 2 by dropping and recreating all the tables: this options worked for full load but after changing the DM settings back to incremental ,DM task started to fail due to duplicate issue.

Apart from this we also noticed that CT changes are being lost even if the DM task is executed meaning if there is a changes in source , replicate captures that changes in CT tables and If DM tasks are trigged prior to DWH incremental CT changes are being lost.

Labels (4)
5 Replies
shashi_holla
Support
Support

Hi Gary,

Thanks for the detailed explanation. The purpose of Full Load is to get all available records every time we run it and I don't see an issue with it causing the Duplicates error. It's a best practice to create separate tasks for Full and incremental load and use Full Load only when we have truncated the data in the tables and want to reload.

Looks like scenario 3 qualifies for a Support case. 

Scenario 4:

Option 1: After making the changes, did we clear the metadata cache and Validate the DataWarehouse? I think these 2 are mandatory steps since we made changes directly on the DB and Compose need to sync the changes within the metadata.

Hope this helps but I feel a Support case would be more beneficial to provide more detailed info for all the mentioned scenarios.

 

QDI_Manuel
Employee
Employee

I reviewed the scenario with the client's team.  In their case they have a situation where an incremental data mart run is updating 1/3 of the records in the mart already (most type 1 dims and facts).  This is known to be a vary expensive operation on most RDBMS.  The full rebuild completes in minutes.  Given this data change profile the best option is to do full rebuilds of the data mart in every run.  There's no concern with loosing history, since the history records themselves are in the DW and processed incrementally every time.  Any needed history (for type 2) will be recreated correctly every rebuild from the DW.  

They will attempt to reproduce scenario 3 (Compose seemingly skipping CDC records) and open a case.  We did confirm however that running the Datamart has no impact on CDC records in change table.

Todd66
Contributor III
Contributor III

Yes Datamart has many Drawbacks:

Though data marts have some benefits over DW they also have some drawbacks as explained below:

  • Unwanted data marts that have been created are tough to maintain.
  • Data marts are meant for small business needs. Increasing the size of data marts will decrease its performance.
  • If you are creating more number of data marts then the management should properly take care of their versioning, security, and performance.
  • Data marts may contain historical (or) summarized (or) detailed data. However, updates to DW data and data mart data may not happen at the same time due to data inconsistency issues.
  • PMP Payslip Registration
Steve_Nguyen
Support
Support

@garyclark , any of the advise resolve your question.

Help users find answers! Don't forget to mark a solution that worked for you! If already marked, give it a thumbs up!
rickyshell
Contributor
Contributor

lease note that the actual solution may depend on the details of the problem you're encountering.

Troubleshooting and Fixing Datamart Load Issues:

  1. Review Error Messages:

    • If there are error messages during the datamart load, carefully review them. The error messages can provide valuable insights into what went wrong.
  2. Check Data Integrity:

    • Ensure that the data you're trying to load into the datamart is accurate, complete, and follows the defined data schema. Any inconsistencies or errors in the data can lead to load failures.
  3. Examine Load Process Logs:

    • Check the logs generated during the datamart load process. Look for any warnings, errors, or exceptions that might point to the root cause of the issue.
  4. Validate Data Sources:

    • Confirm that the data sources you're extracting data from are accessible and providing the expected data. Connectivity issues or changes in data source structures can impact the datamart load.
  5. Check Load Dependencies:

    • If your datamart load involves dependencies on other processes or data, ensure that those dependencies are met. For example, if there are upstream ETL (Extract, Transform, Load) processes, verify their successful execution.
  6. Review Transformation Logic:

    • If there are transformations applied during the datamart load, review the logic to ensure it's correct. Incorrect transformation logic can lead to data format issues or load failures.
  7. Verify Database Connection:

    • Ensure that the connection to the target database where the datamart is being loaded is established and valid. Check connection strings, credentials, and network configurations.
  8. Check Storage Availability:

    • Ensure that there is sufficient storage space available in the target database or datamart. Running out of storage can cause the load process to fail.
  9. Data Type Compatibility:

    • Confirm that the data types in the source data match the expected data types in the datamart schema. Mismatched data types can lead to conversion errors.
  10. Update ETL Jobs or Scripts:

    • If your datamart load is part of an ETL process, review and update the ETL jobs or scripts as needed. Ensure that they are aligned with any changes in data sources or business requirements.
  11. Engage with Support:

    • If you are using a specific datamart or ETL tool, consider reaching out to the support team or community forums associated with that tool. They may provide insights or solutions to common issues.

Remember to make changes cautiously and preferably in a testing environment before applying fixes in a production setting. If you have specific error messages or details about the issue, feel free to provide more information for a more targeted solution. Caregiver Connect