Hello everyone,
I'm facing an issue which I thought was caused by the sort component, but it seems it's not.
My scenario is the following: read a CSV file -> simple tMap -> unique rows by 2 columns -> tMap to calculate a math formula for the distance between two coordinates -> 3 branches from this tMap (for inserts and updates in 2 Snowflake SCD II dimensions).
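A minimal sketch of what that distance step looks like, assuming a haversine-style great-circle formula and illustrative names (this is not the exact expression from my job):

// Talend routine sketch (hypothetical names): haversine distance in km.
public class GeoDistance {
    private static final double EARTH_RADIUS_KM = 6371.0;

    // Inputs in decimal degrees. Called from the tMap expression, e.g.
    // GeoDistance.haversineKm(row1.lat1, row1.lon1, row1.lat2, row1.lon2)
    public static double haversineKm(double lat1, double lon1,
                                     double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                   * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return EARTH_RADIUS_KM * 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
    }
}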
Right now, it's only reading 3.25 rows/sec! It doesn't throw any memory error, but the job took almost 12 hours, so something's not right. There's no difference between running it in TOS and on the server either.
Can someone please help me with a hint? Maybe some option in the tFileInputDelimited...
@Shicong Hong, I know you usually have some pretty good hints on this; maybe you could help me? Thank you in advance 🙂
1) Try to select only the related columns from each lookup table.
For example, for lookup_dim_site: if you need only one column value, then select just the foreign-key column and the column you require in the output, instead of every column.
Do this for all the lookup tables.
2) Also, why did you use the first tMap? You are not doing any lookup there. Only use a tMap when it is required.
Hi, you have to partition your job. You did extraction, transformation, and load all in one flow, which can cause poor performance.
As a first test, run your tFileInputDelimited into just a tLogRow and you will see how long the job really takes to read the file.
Yes, the bottleneck is not the tMap or even the sorts. It's really the row-by-row insert and update for SCD2. It doesn't work with bulk insert and update, because that way I lose the status of the latest record to be modified, and so on.
So the idea is to speed things up for this SCD2 implementation after the tMap. I don't know, would separating each branch into its own child job be better?
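One alternative I'm considering, to keep the load set-based without losing the current-record flag: bulk-load each batch into a staging table, then expire the old versions and insert the new ones in two set-based statements (from a tDBRow, or plain JDBC). A minimal sketch, with hypothetical table and column names (stg_site, dim_site, site_code, is_current) and change detection omitted for brevity:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Hypothetical sketch: assumes the Snowflake JDBC driver is on the
// classpath and stg_site has already been bulk-loaded with the batch.
public class Scd2SetBasedSketch {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:snowflake://<account>.snowflakecomputing.com/", "user", "pass");
             Statement st = con.createStatement()) {

            // 1) Expire the current version of every business key in the batch.
            st.executeUpdate(
                "UPDATE dim_site SET valid_to = CURRENT_TIMESTAMP, is_current = FALSE " +
                "WHERE is_current = TRUE AND EXISTS " +
                "(SELECT 1 FROM stg_site s WHERE s.site_code = dim_site.site_code)");

            // 2) Insert the new versions for the whole batch in one statement.
            st.executeUpdate(
                "INSERT INTO dim_site (site_code, site_name, valid_from, valid_to, is_current) " +
                "SELECT s.site_code, s.site_name, CURRENT_TIMESTAMP, NULL, TRUE " +
                "FROM stg_site s");
        }
    }
}

Because both statements operate on the whole batch at once, the current flag would stay consistent without touching rows one by one.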