Soft delete when there are duplicates in source (Talend Studio)

Aami — Sat, 16 Nov 2024 03:34:54 GMT

Hi,

I am reading data from a CSV file that has no primary key and contains duplicate entries. This data is loaded into a database. The requirement is to implement a soft delete when a record is deleted from the source.

For example, if a particular record occurs 4 times in the source and one occurrence is deleted, one of the corresponding target records should be marked N during the next load and the other three should remain Y. If two occurrences are deleted, two records should be marked N and the remaining two should remain Y.
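The rule in this example can be stated per record: the number of target rows to flip to N is the number of active target rows minus the number of source rows still present, never less than zero. Below is a minimal plain-Java sketch of that arithmetic, assuming each record is reduced to a single string key (the keys and the class name are hypothetical). It only counts how many rows per key must change, not which physical rows.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SoftDeleteCounts {

    // Counts how many times each record key occurs.
    static Map<String, Integer> countByKey(List<String> rows) {
        Map<String, Integer> counts = new HashMap<>();
        for (String row : rows) {
            counts.merge(row, 1, Integer::sum);
        }
        return counts;
    }

    // For each key: (active rows in target) - (rows still in source), floored at zero.
    // Keys that need no change are left out of the result.
    static Map<String, Integer> rowsToDeactivate(List<String> source, List<String> activeTarget) {
        Map<String, Integer> sourceCounts = countByKey(source);
        Map<String, Integer> result = new HashMap<>();
        for (Map.Entry<String, Integer> e : countByKey(activeTarget).entrySet()) {
            int stillInSource = sourceCounts.getOrDefault(e.getKey(), 0);
            int toDeactivate = Math.max(0, e.getValue() - stillInSource);
            if (toDeactivate > 0) {
                result.put(e.getKey(), toDeactivate);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> target = List.of("A", "A", "A", "A", "B");
        System.out.println(rowsToDeactivate(List.of("A", "A", "A", "B"), target)); // {A=1}
        System.out.println(rowsToDeactivate(List.of("A", "A", "B"), target));      // {A=2}
    }
}
```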

I am thinking of numbering each group of duplicates sequentially and using that sequence column to tell the duplicates apart. However, I'm not sure how to implement this in practice.

Can anyone help with some ideas?

Thanks in advance,

Re: Soft delete when there are duplicates in source

Anonymous — Wed, 15 Jan 2020 13:06:40 GMT

To rank the duplicates, you need a tSortRow and a tMap.

You can omit the tSortRow component if the duplicate rows are exactly the same; if some fields differ, sort by those fields. Then, in the tMap, create a variable of type Integer with the expression Numeric.sequence(fields you want to group by,1,1). The grouping fields must be passed as a single string, so you can write something like: Numeric.sequence(field1.toString() + field2.toString(),1,1)
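The ranking works because Numeric.sequence keeps a separate counter per sequence name, so using the group key as the name numbers each group's rows 1, 2, 3, and so on. The plain-Java sketch below imitates that behaviour outside Talend; `sequence()` is a hypothetical stand-in written for illustration, not the Talend routine itself, and the sample rows are made up. It also joins the grouping fields with a `|` separator (an assumption that the values never contain `|`), since plain concatenation would give `"ab" + "c"` and `"a" + "bc"` the same key.

```java
import java.util.HashMap;
import java.util.Map;

public class DuplicateRank {

    // Hypothetical stand-in for Numeric.sequence(name, start, step):
    // one counter per name; the first call returns start, later calls add step.
    private static final Map<String, Integer> SEQUENCES = new HashMap<>();

    static int sequence(String name, int start, int step) {
        Integer current = SEQUENCES.get(name);
        int next = (current == null) ? start : current + step;
        SEQUENCES.put(name, next);
        return next;
    }

    // Joins the grouping fields into one string. The "|" separator keeps
    // ("ab", "c") and ("a", "bc") apart; plain concatenation would merge them.
    static String groupKey(String field1, String field2) {
        return field1 + "|" + field2;
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"Smith", "London"},
            {"Jones", "Paris"},
            {"Smith", "London"},
            {"Smith", "London"},
            {"Jones", "Paris"},
        };
        for (String[] row : rows) {
            int rank = sequence(groupKey(row[0], row[1]), 1, 1);
            System.out.println(row[0] + "," + row[1] + " -> " + rank);
        }
    }
}
```

Running it prints ranks 1, 1, 2, 3, 2 for the five rows in order. Inside a tMap the same idea would be a variable expression along the lines of Numeric.sequence(row1.field1 + "|" + row1.field2, 1, 1), where `row1`, `field1` and `field2` are placeholder names.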

Re: Soft delete when there are duplicates in source

Aami — Fri, 24 Jan 2020 09:48:58 GMT

Any idea how to achieve this?

Since the source doesn't have a primary key and has duplicate rows, marking a deleted record as inactive in the target looks like a challenge.

Re: Soft delete when there are duplicates in source

Aami — Mon, 02 Nov 2020 07:42:34 GMT

1. Select all active records from the target and the source and store them in a buffer (tHashOutput component).
2. While fetching records from the source, create a hash value (Hash1) of all the applicable columns in tMap1, using DataMasking.createMD5 or any similar hashing technique. In tMap2, create a numeric sequence per hash value with Numeric.sequence(Hash1,1,1). Use this sequence as the ID for the target table, and create another hash key (Hash2) from all the applicable columns + Hash1 + the numeric sequence (ID).
3. Insert new records: in a tMap, use the source data as input 1 and the target data as input 2, and join them on Hash2, the ID created in the previous step, and all the applicable columns. Insert the row if ID == null (no matching target row). Set the following columns to the default values shown:

ROW_EFFECTIVE_DATE -> TalendDate.getCurrentDate()

ROW_EXPIRY_DATE -> TalendDate.parseDate("yyyy-MM-dd","9999-12-31")

ROW_CREATED_DATE -> TalendDate.getCurrentDate()

ROW_UPDATED_DATE -> TalendDate.getCurrentDate()

ACTIVE_IND -> "Y"

4. Logical delete of records deleted in the source: in a tMap, use the target as input 1 and the source as input 2, and join them on Hash2, the ID created in step 2, and all the applicable columns. If ID == null (no matching source row), update the active indicator to N and set the columns as follows:

ROW_EFFECTIVE_DATE -> keep the existing target value

ROW_EXPIRY_DATE -> TalendDate.addDate(TalendDate.getCurrentDate(),-1,"dd")

ROW_CREATED_DATE -> keep the existing target value

ROW_UPDATED_DATE -> TalendDate.getCurrentDate()

ACTIVE_IND -> "N"
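The key-matching logic of steps 2 to 4 can be sketched in plain Java outside Talend. This is a minimal illustration under several assumptions: rows are reduced to hypothetical strings of the applicable columns joined with `;`, MD5 from `java.security.MessageDigest` is swapped in for DataMasking.createMD5, a `HashMap` counter plays the role of Numeric.sequence, and the target's Hash2 values are recomputed from its rows, whereas the job would read the stored Hash2/ID from the target table. The two set differences correspond to the "ID == null" lookups in the step 3 and step 4 tMaps.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SoftDeleteByHash {

    // MD5 as lowercase hex; used here in place of DataMasking.createMD5.
    static String md5(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Step 2: Hash1 = hash of the applicable columns, ID = occurrence number
    // within that Hash1 (1, 2, 3, ...), Hash2 = hash of columns + Hash1 + ID.
    static List<String> hash2Keys(List<String> rows) {
        Map<String, Integer> sequences = new HashMap<>();
        List<String> keys = new ArrayList<>();
        for (String row : rows) {
            String hash1 = md5(row);
            int id = sequences.merge(hash1, 1, Integer::sum);
            keys.add(md5(row + "|" + hash1 + "|" + id));
        }
        return keys;
    }

    // Keys in `left` with no match in `right` (the "ID == null" case of a tMap lookup).
    static Set<String> minus(Set<String> left, Set<String> right) {
        Set<String> result = new HashSet<>(left);
        result.removeAll(right);
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical rows: applicable columns already joined with ';'.
        List<String> activeTarget = List.of("A;1", "A;1", "A;1", "A;1", "B;2");
        List<String> source = List.of("A;1", "A;1", "B;2", "C;3");

        Set<String> targetKeys = new HashSet<>(hash2Keys(activeTarget));
        Set<String> sourceKeys = new HashSet<>(hash2Keys(source));

        // Step 3: in source, not in target -> insert with ACTIVE_IND = "Y".
        System.out.println("Insert as Y: " + minus(sourceKeys, targetKeys).size());
        // Step 4: active in target, gone from source -> update to ACTIVE_IND = "N".
        System.out.println("Mark as N: " + minus(targetKeys, sourceKeys).size());
    }
}
```

With this sample data, "A;1" drops from four occurrences to two, so the target rows with IDs 3 and 4 are the ones left unmatched and marked N (output `Mark as N: 2`), while the new "C;3" row is the only insert (`Insert as Y: 1`). Because the ID is an occurrence number within each Hash1 group, it is always the highest-numbered duplicates that get deactivated.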