I'm doing some work optimising a fairly complex script that I run (a transform that touches 80+ tables). In that script I have a case of a drop table command that seems to be running slowly.
Is there anyone out there with some expertise who might be able to comment on possible causes for this command to be slow?
Here is a comparison of the two cases:
I have two separate commands that run many times through the script, where the contents of the tables are a similar size.
drop table Columns.tmp.2 - runs 215x in the script with a total time spent of 11 minutes
drop table Columns.tmp,Dependencies.tmp,Tables.tmp - runs 109x in the script with a total time spent of 5s
For further information: the table Columns.tmp.2 is always a subset of the table Columns.tmp (this is why I say the data sizes are similar).
Given similar data sizes I would have expected each instance of the first command to take about the same amount of time as the second, which would put it at roughly 10s total, rather than 11 minutes.
As a note, this one drop table command accounts for about 9% of the script's run time and is my largest single statement by time spent (which is why I'm interested in it from an optimisation perspective).
To measure the time spent on the statement I have this code:
set cmd=drop table Columns.tmp.2
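The timing code above appears truncated, so here is a hedged sketch of how such a measurement could be done in a Qlik load script. The variable names (vStart, vElapsed) and the TRACE message are illustrative assumptions, not the original poster's actual code:

```qlik
// Hypothetical timing wrapper around the drop statement.
// Variable names and the log format are illustrative only.
LET vStart = Now();

SET cmd = drop table Columns.tmp.2;
$(cmd);

LET vElapsed = Interval(Now() - vStart, 'hh:mm:ss');
TRACE Statement [$(cmd)] took $(vElapsed);
```

Summing these per-statement timings across all 215 executions is presumably how the 11-minute total was arrived at.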
Thanks in advance for any thoughts on the subject/comments on things that may affect the performance of this command.
Qlik needs to perform multiple actions to drop a table. For example: dropping the data table itself (just flagging the occupied RAM as free to be overwritten by other data), adjusting the internal table index, dropping the system tables unique to this table, removing the field values from shared system tables, recalculating the pointer-index (relating to KEY fields), and maybe some more things. The last point will probably be the heaviest one.
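The pointer-index point can be illustrated with a minimal sketch. When two tables share a field name, Qlik associates them through that field, and dropping either table forces the engine to rebuild its internal pointers for the remaining one. The table and field names below are invented for illustration:

```qlik
// Two tables sharing the field ID become associated in the data
// model; dropping either one forces Qlik to rework the internal
// pointer-index for ID on the table that remains.
[Columns.tmp]:
LOAD * INLINE [
ID, ColumnName
1, CustomerId
2, OrderDate
];

[Columns.tmp.2]:
LOAD * INLINE [
ID, Flag
1, Y
];

DROP TABLE [Columns.tmp.2];  // ID's pointer-index is recalculated here
```

This is only a sketch of the mechanism being described, not a claim about the exact internal steps the engine takes.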
I don't know whether those actions are fully or partly multi-threaded, but either way it means that multiple actions need to be coordinated between Qlik and the OS, and each communication causes some latency.
From your description I assume that you use various loops around the load statements which also include the drop statements. I suggest re-thinking this approach, because the same actions mentioned above are also needed to create tables - meaning it can never be fast, at least not compared with "native" Qlik loads.
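One practical consequence is already visible in the question's own numbers: the fast command drops three tables in a single statement. A hedged sketch of consolidating drops, assuming the tables exist at that point in the script:

```qlik
// Instead of dropping one table per statement inside a loop:
DROP TABLE Columns.tmp.2;
DROP TABLE Columns.tmp;

// ...a single statement can drop several tables at once, as the
// second (fast) command in the question already does:
DROP TABLES Columns.tmp, Dependencies.tmp, Tables.tmp;
```

Whether this helps depends on whether the per-statement bookkeeping described above is amortised across the listed tables; treat it as something to test rather than a guaranteed win.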
Thanks to both of you for your inputs! I will do some investigation based on your guidance.
It seems to me like the pointer-index recalculation that @marcus_sommer mentions may be the issue. I say this because the .tmp & .tmp.2 tables have identical column names, and .tmp.2 is always removed before the .tmp table, so there won't be the same number of KEY fields involved when .tmp is removed.
I think the volume issue that @Karthi mentions is probably not a problem, as it's going to be fewer than 7000 rows (Columns.tmp is a subset of 7000 rows, probably fewer than 100 most of the time, and .tmp.2 is a subset of .tmp, so even smaller).
I now think that the issue was unrelated to the drop table itself. When I reviewed this case today, all of the drop table commands in the entire app (now 867 of them) take a total of 49s in what is a 3.5hr process.
As a result I'm not investigating this any further.
My guess is that it was related to some custom logging that I was doing at the time. I did some work last year to optimise that logging, which saved a significant amount of time in my process; since that was the only significant optimisation I made, I think it was probably the cause.