anna_t1
Contributor III

tFileInputDelimited reading very, very slow (less than 120,000 rows) before tMap and unique

Hello everyone,

I'm facing an issue which I thought was due to the sort component, but it seems it's not.

My scenario is the following: read CSV file -> simple tMap -> unique rows by 2 columns -> tMap to calculate a math formula for the distance between two coordinates -> 3 branches from this tMap (for inserts and updates in 2 Snowflake SCD II dimensions).
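For reference, a distance step like this is typically a small Java routine called from a tMap expression. A minimal haversine sketch (the class name and the lat/lon field names are hypothetical, not from my actual job):

// Hypothetical Talend routine for the distance step.
public class GeoRoutines {

    private static final double EARTH_RADIUS_KM = 6371.0;

    // Haversine great-circle distance between two lat/lon points, in km.
    public static double distanceKm(double lat1, double lon1,
                                    double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                   * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return EARTH_RADIUS_KM * 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
    }
}

In the tMap, the expression would then be something like GeoRoutines.distanceKm(row1.lat1, row1.lon1, row1.lat2, row1.lon2).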

Right now, it is only reading 3.25 rows/sec! It doesn't throw any memory error, but it took almost 12 hours, so something's not right. There is no difference running in TOS or on the server either.

Can someone please help me with any hint? Maybe some option in the tFileInputDelimited...

(screenshot of the job design attached: 0695b00000F6iAMAAZ.png)

5 Replies
anna_t1
Contributor III
Author

@Shicong Hong I know you usually have some pretty good hints on this, maybe you could help me? Thank you in advance 🙂

Prakhar1
Creator III

1) Try to select only the related columns from the lookup tables (see the sketch after this list).

For example, for lookup_dim_site: if you need only one column's value, then select just the foreign key column and the other column you require in the output.

Do this for all the lookup tables.

 

2) Also, why did you use the first tMap? You are not doing any lookup there. Only use a tMap when it is required.
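To illustrate point 1), the query in the input component feeding the lookup could be trimmed to just the join key and the columns the output actually needs (table and column names here are hypothetical):

// Hypothetical query string for the input component feeding the
// lookup_dim_site lookup: fetch only the foreign key and the one
// column the output needs, instead of SELECT * over the dimension.
String lookupQuery =
    "SELECT site_key, site_name "   // join key + needed output column only
  + "FROM lookup_dim_site";

Fewer lookup columns means less data held in memory while the main flow streams through the tMap.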

gjeremy1617088143

Hi, you have to partition your job: you did extraction, transformation, and load in one flow, and that can cause poor performance.

gjeremy1617088143

Just try, as a first step, running your tFileInputDelimited with just a tLogRow, and you will see how long it really takes the job to read the file.
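If you also want a baseline outside Talend, a plain Java read of the same file shows how fast the disk and line-reading side can go, and you can compare whatever tFileInputDelimited + tLogRow reports against it (the file path is a placeholder):

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ReadBenchmark {
    public static void main(String[] args) throws IOException {
        long start = System.nanoTime();
        long rows = 0;
        // Placeholder path; point it at the actual CSV file.
        try (BufferedReader reader = Files.newBufferedReader(Paths.get("input.csv"))) {
            while (reader.readLine() != null) {
                rows++;
            }
        }
        double seconds = (System.nanoTime() - start) / 1_000_000_000.0;
        System.out.printf("%d rows in %.2f s (%.0f rows/s)%n",
                          rows, seconds, rows / seconds);
    }
}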

anna_t1
Contributor III
Author

Yes, the bottleneck is not even the tMap or the sorts. It's really the row-by-row insert and update for SCD2. It doesn't work with bulk insert and update, because that way I lose the status of the latest record to be modified, and so on.

So, the idea is to speed things up for this SCD2 implementation after the tMap. I don't know, would separating each branch into its own child job be better?
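For what it's worth, one common way to avoid the per-row round trips while keeping the SCD2 semantics is to bulk-load the changed rows into a staging table and then close and open versions with two set-based statements. A rough sketch of that idea over JDBC (all table and column names are assumptions, not from my actual dimensions):

import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

public class Scd2SetBased {
    // Sketch only: assumes the changed rows were already bulk-loaded
    // into stg_dim_site, and that dim_site tracks history with
    // valid_from / valid_to / is_current columns (hypothetical names).
    public static void applyScd2(Connection conn) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            // 1) Close the current version of every key that changed.
            stmt.executeUpdate(
                "UPDATE dim_site d "
              + "SET valid_to = CURRENT_TIMESTAMP(), is_current = FALSE "
              + "FROM stg_dim_site s "
              + "WHERE d.site_key = s.site_key AND d.is_current = TRUE");
            // 2) Open the new versions in one set-based insert.
            stmt.executeUpdate(
                "INSERT INTO dim_site (site_key, site_name, distance_km, "
              + "                      valid_from, valid_to, is_current) "
              + "SELECT s.site_key, s.site_name, s.distance_km, "
              + "       CURRENT_TIMESTAMP(), NULL, TRUE "
              + "FROM stg_dim_site s");
        }
    }
}

The close-then-insert order preserves the "latest record" status that plain bulk loading loses, while paying for two statements instead of one round trip per row.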