gt0731
Contributor III

tMap - Single transaction delete before insert

We are trying to figure out how to do the following within a tMap mapping.
Scenario:
For each record read
Start transaction
     Identify existing records
     Delete the existing records
     Insert new records from source
End transaction
Job flow:
                                                     DeleteFlow
                                                        |
read from SQL Server (source) --> write into HDFS --> tMap --> Insert flow --> tGreenplumOutput
                                                        |
                                                Lookup: target file

Source-to-target lookup model screenshot:
Source table left join target table --> output flows --> Delete flow: delete the complete record; otherwise the Insert flow sends all the data again (when the matched ID is null).
0683p000009MDEv.png  
Tested:
When I pull 6 records from the source, the job inserts 6 records as expected. But if I then add 1 more record in the source, the job goes into an infinite loop and does not even throw an error. It should first delete the 6 already-inserted records and then insert all 7 again, but it is not working as expected.
Any help on this would be much appreciated.
Job flow screenshot:
0683p000009MDES.png

If we did not have the requirement of a single transaction, I would build two separate jobs, one to delete and the other to insert. But the requirement for a single transaction is making this hard. We also need to guarantee that we delete before we insert.
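For reference, the behaviour we are after looks roughly like this in plain JDBC. This is only a sketch; the table name, key column and row layout are placeholders, not our real schema:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.List;

    public class DeleteThenInsert {
        // One batch: delete the existing rows by key, then insert every source row,
        // all inside a single transaction so nothing is ever half-applied.
        public static void load(Connection conn, List<Object[]> sourceRows) throws SQLException {
            conn.setAutoCommit(false); // start transaction
            try (PreparedStatement del = conn.prepareStatement(
                     "DELETE FROM target_table WHERE id = ?");
                 PreparedStatement ins = conn.prepareStatement(
                     "INSERT INTO target_table (id, payload) VALUES (?, ?)")) {
                for (Object[] row : sourceRows) {      // identify and delete existing records
                    del.setObject(1, row[0]);
                    del.executeUpdate();
                }
                for (Object[] row : sourceRows) {      // then insert the new records from source
                    ins.setObject(1, row[0]);
                    ins.setObject(2, row[1]);
                    ins.executeUpdate();
                }
                conn.commit();                         // end transaction
            } catch (SQLException e) {
                conn.rollback();                       // undo everything on any failure
                throw e;
            }
        }
    }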
9 Replies
Anonymous
Not applicable

Hi,
Do an inner join to get the existing records and the new records, and cache the new records (or the old records) in memory for use in the next subjob, e.g.:
t...Connection
   |
source data --...--> tMap --- existing records --> tGreenplumOutput (for deleting)
                         \--- new records --> tHashOutput
   |
   OnSubjobOk
   |
tHashInput --main--> tGreenplumOutput (for inserting)
   --OnSubjobOk--> t...Commit
   --OnSubjobError--> t...Rollback
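Conceptually, the split that the tMap does in that layout looks like the sketch below (plain Java; the key position, row type and variable names are just examples, not taken from the real job):

    import java.util.*;

    class SplitSketch {
        // Rows whose key already exists in the target go to the delete flow;
        // every source row is also cached for the insert flow, like tHashOutput -> tHashInput.
        static void split(List<Object[]> sourceRows, Set<Object> targetIds,
                          List<Object[]> deleteFlow, List<Object[]> insertFlow) {
            for (Object[] row : sourceRows) {
                if (targetIds.contains(row[0])) {   // inner join on the key column
                    deleteFlow.add(row);            // existing record -> tGreenplumOutput (delete)
                }
                insertFlow.add(row);                // all rows -> tHashOutput (insert in the next subjob)
            }
        }
    }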

Regards
Shong
gt0731
Contributor III
Author

Hi Shong,
First of all, thanks for the data flow. It worked for me.
However, when I executed it exactly as you described for ~90k records, the job execution time was 18 minutes.
Goal:
How do I get an incremental load into the Greenplum data warehouse? Please guide me, because the current two-flow approach is not improving performance as expected.
It is our protocol to write the source data into Hadoop first: source data ---> HDFS (for archive purposes) --> tGreenplumOutput.
Any help on this would be much appreciated.

 
Anonymous
Not applicable

Do you really need to check whether the same records already exist in the target table, or can you just delete all the data in the target table and insert the source data? Cache small lookup data in memory; if the lookup data set is a little large, set tMap to cache the data on disk instead of in memory.
gt0731
Contributor III
Author

Thanks in advance, Shong!
I would like to just delete everything from the target and then insert it all again. However, the lookup contains a huge amount of data. When I turned on "Store data on disk" = TRUE, I still got the same execution time, because the job visuals clearly show that the batched output into tGreenplumOutput is slow; it uses the default batch size of 10000.
Anonymous
Not applicable

If you want to delete all the data in the target table before inserting, you can use a tGreenplumRow to execute a delete statement that removes all the data.

t...Connection
   |
tGreenplumRow (delete statement) --OnComponentOk--> source data --...--> write into HDFS --OnComponentOk--> tGreenplumOutput (for inserting)
   --OnSubjobOk--> t...Commit
   --OnSubjobError--> t...Rollback
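The query in the tGreenplumRow is just a Java string, for example (the schema and table names below are placeholders):

    // Query field of the tGreenplumRow: wipe the target before the insert subjob runs.
    "DELETE FROM my_schema.target_table"

    // If permissions allow it, TRUNCATE is usually faster for emptying a whole table.
    "TRUNCATE TABLE my_schema.target_table"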
gt0731
Contributor III
Author

Thanks, Shong!
I have tried this flow. It looks better, but again it is a full load.
I am also trying for an incremental load. For that, I am not able to identify changed/new rows, and I cannot perform CDC on the source system, as I only have read access for extracting the data. Is it possible to perform an incremental load with the given pipeline? It is difficult for me to track new/updated rows and output only those into Greenplum instead of doing a full load.
I'm hoping you can guide me with this.
Anonymous
Not applicable

Without CDC, you have to compare the source data and the target data to work out which rows need to be updated and which need to be inserted; that is, do a join between the source data and the target data like you did. Obviously, the performance of this approach is much lower than CDC if the target data volume is big.
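One way to sketch that comparison in plain Java is to cache a key-to-checksum map of the target rows and classify each source row against it. The key, the checksum function and the row layout below are only illustrative assumptions, not a fixed recipe:

    import java.util.*;

    class DiffSketch {
        // Without CDC: classify each source row as insert, update or unchanged by
        // comparing it against a key -> checksum map built from the target table.
        static void classify(List<Object[]> sourceRows, Map<Object, Integer> targetChecksums,
                             List<Object[]> inserts, List<Object[]> updates) {
            for (Object[] row : sourceRows) {
                Object key = row[0];
                int checksum = Arrays.deepHashCode(row);   // cheap stand-in for a real CRC/MD5
                Integer existing = targetChecksums.get(key);
                if (existing == null) {
                    inserts.add(row);                       // key not in target -> insert
                } else if (existing.intValue() != checksum) {
                    updates.add(row);                       // key present but data changed -> update
                }                                           // equal checksum -> row unchanged, skip it
            }
        }
    }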
Anonymous
Not applicable

Hi Shong,
I was not able to find how to start a new topic, so I have to ask this way. I am currently working on a CDC job that should run in one of two sequences, depending on the timestamp of the changes or logs. The sequences are jobs that run either in the order insert-update-delete or in the order delete-update-insert, chosen on two conditions. This is to avoid constraint errors, because a row being deleted and reinserted with the same primary key would otherwise throw an error or lose records. I am also passing many context variables to those jobs. I am not able to get a conditional run working using tJavaRow or tJava with if statements on a string. Is there a way to do this with an if/else condition on tRunJob and also pass values to the tMap from those conditions?
Thanks,
kmv
Soumya_M
Contributor II

In the delete scenario, firstly I need to compare the date of the source and the target (by extracting the date from the timestamp) and delete the matching records; then I have to insert the new records (for the deleted date) into the target. How can I achieve this?

Source: .csv file; target: PostgreSQL database.