datanibbler

Improve the performance of a report

Hi,

I have a report that the users run locally. It works fine; now I'd like to improve its performance. There is one table which is quite big - currently approx. 19 million records. This table used to be loaded from a qvd several times during the report - in some places only a few records are loaded, but the entire table has to be parsed ... in places, this is even done inside a LOOP.

Now I have already replaced all those LOADs and made them RESIDENT - the table is loaded into RAM once at the beginning of the report, and all subsequent LOADs are RESIDENT. That should improve the performance quite a bit.
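For illustration, roughly what that refactoring looks like (table, field and file names here are just placeholders, not the real ones from my script):

BigTable:
LOAD * FROM [BigTable.qvd] (qvd); // read from disk only once, at the start

// all later passes read from RAM instead of going back to the qvd
OpenTasks:
LOAD KEY, Status
RESIDENT BigTable
WHERE Match(Status, 'OPEN');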

I'd like to do one more thing: sometimes the users have to run the report several times in a row, if they have several instances of that task, because the report cannot yet do batch processing.

In that case, I'd like to offer the user the possibility to just let that one big table remain in the data model and reuse it instead of loading it again.

I guess that will come down to a partial run of the script, but I can also think of an alternative using a second qvw that is loaded BINARY ...

Does anyone have any ideas? If I were to do the partial run, every other table except that one would need a REPLACE keyword, no?
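If I understand partial reloads correctly, it would look something like this - only statements with an ADD or REPLACE prefix are executed during a partial reload, so the big table's plain LOAD would be skipped and the table kept in RAM (file and field names again just placeholders):

BigTable: // no prefix: skipped during a partial reload, the table stays in RAM
LOAD * FROM [BigTable.qvd] (qvd);

SmallTable: // REPLACE: rebuilt on every partial (and full) reload
REPLACE LOAD KEY, Box FROM [SmallTable.qvd] (qvd);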

Thanks a lot for your suggestions!

Best regards,

DataNibbler


8 Replies
marcus_sommer

Hi DataNibbler,

Resident loads aren't necessarily the most efficient way to load the data - it will depend, for example, on how much RAM is available (you will need more RAM than when storing the data as a qvd and reloading it from there again) and on how fast your (network) storage is. Probably there are some more points which could affect this, and the best way to find out is to compare the load times of both methods.
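A simple way to compare them is to time the loads with a few helper variables, roughly like this:

Let vStart = Num(Now());

BigTable:
LOAD * FROM [BigTable.qvd] (qvd);

Let vSeconds = Round((Num(Now()) - vStart) * 86400);
Trace qvd load took $(vSeconds) seconds;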

Besides the approaches of running a partial load and/or a binary load, I suggest checking whether the load itself couldn't be improved - you are speaking of loading the same data several times and also in loops, which indicates that there could be potential to optimize it (especially if the qvd loads run optimized).
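The binary variant would mean keeping the big table in a second qvw (the name below is just an example) and starting the report script with it - Binary must be the very first statement:

// Binary must be the very first statement of the script
Binary [BigModel.qvw];

// the whole data model of BigModel.qvw is now in RAM;
// the rest of the script loads the smaller tables around it
SmallTable:
LOAD KEY, Box FROM [SmallTable.qvd] (qvd);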

- Marcus

datanibbler

Hi Marcus,

Well, our network connection is definitely not fast 😉 So RESIDENT LOADs are the most efficient way, I guess.

There might be more potential to optimize, I will have another look.

One idea I had yesterday: the user could, at the beginning, before reloading the script, provide a date or date range to look in, so only the relevant part of that big transaction table would need to be loaded.
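Something like this, maybe - vFrom and vTo (made-up names) would be filled via input boxes before the reload, and I would generate the allowed dates first so the qvd load can stay optimized with a single Exists() instead of a where clause on the date:

SET vFrom = '01.01.2014'; // would be filled by the user via an input box
SET vTo   = '31.01.2014';

DateRange:
LOAD Date(Date#('$(vFrom)', 'DD.MM.YYYY') + IterNo() - 1) as BookingDate
AutoGenerate 1
While IterNo() <= Date#('$(vTo)', 'DD.MM.YYYY') - Date#('$(vFrom)', 'DD.MM.YYYY') + 1;

BigTable:
LOAD * FROM [BigTable.qvd] (qvd)
WHERE Exists(BookingDate); // assumes the qvd holds a matching BookingDate field

DROP TABLE DateRange;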

Thanks a lot!

Best regards,

DataNibbler

datanibbler

Hi Marcus,

Say, is there a difference in the "direction"? Let's say I have a small table with 20 records and a big one with 20,000 - out of this big table I want only those records that match any in the small table.

Usually, I would parse the small table in a FOR NEXT loop and, upon every iteration, parse the big one for matches. Could I do it the other way round and loop through the big table, checking upon every iteration whether the record Match()es any one I have in the small table? Would that make a difference?

(I remember that in ACL, another tool I have worked with, this did make a difference.)
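For reference, this is roughly the pattern I have now (names simplified) - every iteration means another full, unoptimized pass over the big qvd:

For i = 0 To NoOfRows('SmallTable') - 1
    Let vKey = Peek('KEY', $(i), 'SmallTable');

    Matches:
    LOAD * FROM [BigTable.qvd] (qvd)
    WHERE KEY = '$(vKey)'; // unoptimized, and repeated once per key
Next i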

Best regards,

DataNibbler

marcus_sommer

Hi DataNibbler,

I think there are other ways to do it. I would try the following:

SmallTable:
LOAD KEY FROM [SmallTable.qvd] (qvd);

BigTable:
LOAD * FROM [BigTable.qvd] (qvd)
WHERE Exists(KEY); // this would remain an optimized load from a qvd

- Marcus

datanibbler

Hi Marcus,

That sounds good - that way I would need only one pass, no loop.

The only thing is - here I must go into a little detail - the big table has transactions of any kind (any bookings made to our database system) and the small table holds the Tracks (like "boxes").

Further on in the script - and in the chart on the GUI - I need the number of bookings that were made to each box - so I need the number of records from the big table that I got for every record from the small table.

I don't quite see how I could get that number if I were to do it all in one pass - that is one of the reasons I have the loop in that place.

P.S.: Hmmm - I am not quite certain - I cannot yet see where that figure is actually displayed in the end, or used in another loop or anything - I need to have another look at that. If it isn't used, I'll scrap it, and then it might be possible to do it in one pass as you suggest.

marcus_sommer

Hi DataNibbler,

I think you could get this information with an additional load step (as an associated table, or maybe for a further join or mapping step), like:

SmallTable:
LOAD KEY FROM [SmallTable.qvd] (qvd);

BigTable:
LOAD * FROM [BigTable.qvd] (qvd)
WHERE Exists(KEY);

NumberOfKey:
LOAD KEY, Count(KEY) as CountKEY
Resident BigTable
Group By KEY;
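And if you'd rather have the count directly within the SmallTable, you could join it instead of keeping a separate table, like:

Left Join (SmallTable)
LOAD KEY, Count(KEY) as CountKEY
Resident BigTable
Group By KEY;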

- Marcus

datanibbler

Hi Marcus,

It seems like I don't really need this information at all. The issue is only that the report has several possible "routes" and outcomes to allow for different scenarios, and I have to look through all of them before I take it out.

datanibbler

Hi Marcus,

I have tested both methods and - bearing in mind the distorting effects of the cache - the old-fashioned method takes 1 min 49 s while the alternative method takes 6 s. That is quite a difference; I think it's worthwhile pursuing.

Coming back to that other idea I had: maybe I can implement another feedback option for the user and store away that one big table that I load at the beginning of the process; if the user says he has to run the report several times, the next iteration would just load that smaller qvd again. Well, I guess the difference would not be great that way, plus the user would have to make sure to look for the delivery that came earliest first - no, I won't go that way; the users usually won't have this information, and I have to look it up in the database before I prepare that one big table.
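Just to sketch what I had in mind there (the file name and the variable are made up):

// at the end of the first run, after the big table has been prepared:
STORE BigTable INTO [BigTable_prepared.qvd] (qvd);

// at the start of the next run - vReuse would come from the user:
If '$(vReuse)' = 'Y' Then
    BigTable:
    LOAD * FROM [BigTable_prepared.qvd] (qvd); // fast, optimized qvd load
Else
    // ... the full preparation from the database, as before ...
End If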

Thanks a lot! I guess that's it for now.