Skip to main content
Announcements
See what Drew Clarke has to say about the Qlik Talend Cloud launch! READ THE BLOG
cancel
Showing results for 
Search instead for 
Did you mean: 
bclstu
Contributor
Contributor

SCD with audit date

Fairly new to Talend.  I am working on developing a standard process to maintain our SCD tables.  All of our dimension tables have standard audit columns, specifically CreationDate and LastUpdateDate, as well as the columns needed for SCD Type 2 (start, end, version and active).  I'm having trouble setting the LastUpdateDate appropriately.

 

The desired behavior is to set the LastUpdateDate to the current date on initial creation of a new row and update it on any Type 1 or Type 2 column updates in the row.  I would think that this is a fairly common use case but I'm not seeing a solution when searching the forum.

 

The general flow is:

 

tFileInputDelimited --> tMap --> tDBSCD

 

In the tMap component, I am mapping the current date to the SCD table LastUpdateDate.  

 

I have tried a couple different things in the tDBSCD component without success.  

Attempts:

1. LastUpdateDate as Type 0 column - the LastUpdateDate does not get updated.  Makes sense.

2. LastUpdateDate as Type 1 column - LastUpdateDate is updated to the current date for every row in the table.  Again, this makes sense as the it the LastUpdateData is just like any other Type 1 column.

3. LastUpdate as an Unused column - the LastUpdateDate is set to null (which violates our NOT NULL constraint on the audit columns. Also makes sense.

 

While the results of my different attempts make sense, they do not result in the desired behavior. it seems that there should be away of setting the LastUpdateDate only if other columns in the table have been updated.  Is this not a fairy typical use case? What is the common or best practice for dimension table audit columns when using the Talend SCD component.

 

I do not see how to set the LastUpdateDate per our requirements. Can anyone help me with this?

 

Thanks.

 

 

 

 

 

 

 

 

 

 

 

 

 

Labels (2)
5 Replies
Anonymous
Not applicable

Hello,

We are supposing that you have already checked this online documentation about:TalendHelpCenter:Tracking data changes using Slowly Changing Dimensions (type 0 through type 3) 

Could you please give us your LastUpdateDate setting screenshots on community?

Best regards

Sabrina

bclstu
Contributor
Contributor
Author

Thank you for the follow up, Sabrina. Yes, I did check the link you provided.  It does not seem to address the issue I am having.

 

I have attached screen shots of the overall flow, the tMap and SCD settings.

 

tJava_1 just gets and saves the current date.

product_raw is the source input file.  It does not have a LastUpdateDate field.

tMap_1 does some basic mapping to send to the SCD component.  It maps the current date to the LastUpdateDate column.

tProductSCD manages the slowly changing dimension.  In this example, the LastUpdateDate is in the Type 0 area. Using LastUpdateDate as a Type 1 or Type 2 change does not give the desired results either.

 

This works pretty much as expected.  The problem I am running into is setting our standard audit column, LastUpdateDate, to the current (or job) date for updated rows.  I do not see how to update this date correctly for SCD Type 1 or SCD Type 2 updates.

 

 

For SCD Type 1 updates,  we want to set the LastUpdateDate equal to the current (job) date for all rows in the table that have the same natural key as the row(s) being updated by the tDBSCD component. There may be multiple rows in the table with the same natural key due to prior SCD Type 2 changes to the row.

 

For SCD Type 2 updates, we want to set the LastUpdateDate equal to the current (job) date for only the row(s) being updated (that is, the row that is being expired).  For Type 2 changes, only the one row is updated by the tDBSCD component.   

 

I have come up with a very convoluted job flow that provides the required changes for the LastUpdateDate column.  It includes replicating the tMap_1 output to the SCD component and another tMap.  This tMap compares the input with the Dimension table, column by column and creates 2 tFileOutputDelimited componenents, one for Type 1 changes and one for Type 2 changes.  They are then processed (by subjobs) by tDBOutput components.

 

This seems way too complex and I'm hoping there is a better way of accomplishing this, as we expect dozens of dimension tables over time.

 

Thank you.

 


TalendSCD.png
TalendSCDFlow.png
TalendSCDTmap1.png
dominik4
Contributor
Contributor

Hi bclstu,

 

did you find a solution? I have the same question. At the moment my solution would be a Tabletrigger but i hope there is a better one.

 

Thank you

bclstu
Contributor
Contributor
Author

Hi dominik,

No, I did not find a solution using the SCD component.  Talend support confirmed that it cannot be done with the SCD component and recommended entering a new  feature/enhancement request to support it.

 

I ended up recreating the SCD functionality using standard components.  The ugly part is the column-by-column comparison in tMap to check if the column values have changed.  This will be especially cumbersome and very error prone for wide dimension tables. But the flow is not too bad and it does give you complete control.

Vlatko
Contributor
Contributor

Hello!

I am also looking for this functionality, which is actually quite basic for a datawarehouse. From my experience, every table in a datawarehouse should have the following columns:

ETL_INSERT_DATE (= first time the record was created)

ETL_MODIFIED_DATE (= your LastUpdatedDate)

ETL_MODIFIED_BY (=jobName or some kind of comment if the record was inserted/updated manually of some reason)

 

In addition, the dimensions that keep track of history, should have these columns:

ETL_FROM_DATE

ETL_TO_DATE

ETL_ACTIVE_FLAG

ETL_SCD_VERSION

 

Any ETL tool (at least good ones) should support easy maintaining of these "standard" columns.

Talend SCD component is almost perfect, but does not support the possibility of changing some columns only if other columns are changed.

 

Solution:

The SCD component editor needs one more "department": "The fields to be compared".

Because... Under "Type 0", "Type 1" and "Type 2" you are listing how each field is treated AFTER a change is discovered. At the same time "Type 1" and "Type 2" list the columns which should be compared. But it is not right!

I do want my ETL_MODIFIED_DATE (= LastUpdateDate) to be changed as if it was an Type 1 column, but I don't want it to be compared with the old value.

 

Alternatively, Talend should in a way give control to the user between the data set is made ready for writing to the database and the writing itself.

I have worked with SAP BusinessObjects Data Services for many years before I started using Talend in 2021. SAP BODS does not have a component which "does all the job", like the tDBSCD component in Talend. Instead it has Table_Comparison and History_Preserving, which give you more control.

But I think tDBSCD is a better solution... if it supported the needs described in this case. As said, it can be perfect if Talend does introduce a department "The fields to be compared".

 

Best regards

Vlatko