Solved: Peek function is taking too long. - Qlik Community

Report Inappropriate Content · ‎2010-04-02

I have an issue where a value is not populating on a field and I need to get the value of the previous record. In the instance below, I would need the populate the null value with 'MAC'.

Key Group Value Status
1 A MAC P
2 A null R
3 A MAC P

I've coded the peek function to handle what I'm looking for, and it works great....except....I've got to do this for 40 million rows and it's taking forever. Is there a quicker way to load and populate this value??

Below is my current process:

Load *,
if(ApprvCostType=' ',peek(ApprvCostType),ApprvCostType) as ApprvCostType_final;
Load [Clm Num] as ClmNum,
[Clm Auth Num] as Key,
SUBMIT_DT_KEY as SubDt,
num([Clm Submit Timestamp]) as SubTime,
[Clm Calc Cost Basis] as ApprvCostType
Resident CLMFACT
Order By [Clm Num],
SUBMIT_DT_KEY,
[Clm Submit Timestamp],
[Clm Auth Num];

Any ideas???

johnw · ‎2010-04-05

Hmmm, it really shouldn't take 2 hours to sort 40 million records. Let's see. Generated a random data set. Sorting 1 million rows in the script seemed like a second or two. Sorting 10 million rows took 42 seconds. Had to switch to a beefier computer to sort 40 million due to memory constraints on my PC, but that took nearly 20 minutes. That's slower than it should be. Even an O(n^2) sort, which they can't possibly be using, should only be 16 times slower for 40 million than for 10 million. It must have been swapping to disk.

Anyway, OK, maybe for a large key and big rows, maybe it could take 2 hours.

To make sure, you could remove the order by. That'll give you the wrong data, obviously, but if the performance problem goes away, then it probably IS the sort. If the performance problem remains, it isn't the sort.

Still, a join might be a better solution if you can make it work. The sort certainly isn't fast, even if it isn't ultimately the main problem. To get around the wall you've hit with the join approach, go ahead and do the join, but create a NEW field like [Substitute Clm Calc Cost Basis]. Then do another join of the table to itself, and add your final field something like this:

if([Clm Calc Cost Basis]=' ',[Substitute Clm Calc Cost Basis],[Clm Calc Cost Basis]) as ApprvCostType_final

As a side note, while I didn't expect it, I was hoping that QlikView would recognize that my sort key was a short number, and that I had a whole lot of rows, and would therefore switch to a linear time sorting algorithm. It didn't. No suprise. Just was hoping to be impressed.

View solution in original post

Anonymous · ‎2010-04-05

Do all 40 million rows change between each reload? Otherwise I would suggest using a QVD file and incremental reload (more info on this can be found in the QV reference manual). That way you only load the new records every time while all the previous records are already properly formatted in the QVD file.

johnw · ‎2010-04-05

It might be slightly faster to avoid the preceeding load:

Load [Clm Num] as ClmNum,
[Clm Auth Num] as Key,
SUBMIT_DT_KEY as SubDt,
num([Clm Submit Timestamp]) as SubTime,
if([Clm Calc Cost Basis]=' ',peek(ApprvCostType_final),[Clm Calc Cost Basis]) as ApprvCostType_final
Resident CLMFACT
Order By [Clm Num],
SUBMIT_DT_KEY,
[Clm Submit Timestamp],
[Clm Auth Num];

Do you know that it's the peek() that's the problem, though? Sorting 40 million records might take quite a while, even without the peek(). Also, if by "taking forever" you mean "it never finishes", it's possible that you have join problems.

Report Inappropriate Content · ‎2010-04-05

Great point John, and it does seem that the sort without the peek() takes about the same amt of time as including the peek. However, I've still got the same issue, as the two go hand in hand. Usually I can reload my qvw in about 15 minutes. With this new functionality, it's taking almost 2 hours to sort and apply the peek to all the records.

I am doing a incremental load to a QVD currently, but I don't see how to use that to my advantage as I still have to sort all the records in order to find the previosly loaded row that applies to that record.

I'm currently trying to develop a way through joins to apply the [Clm Calc Cost Basis] to records that have a ' ' in the field, and it seems to be working, but I've hit another wall. I'm not sure how to apply the values to only the records that have ' ' while leaving the other values in there for other records.

Would you have any suggestions on how to better accomplish this?

johnw · ‎2010-04-05

Hmmm, it really shouldn't take 2 hours to sort 40 million records. Let's see. Generated a random data set. Sorting 1 million rows in the script seemed like a second or two. Sorting 10 million rows took 42 seconds. Had to switch to a beefier computer to sort 40 million due to memory constraints on my PC, but that took nearly 20 minutes. That's slower than it should be. Even an O(n^2) sort, which they can't possibly be using, should only be 16 times slower for 40 million than for 10 million. It must have been swapping to disk.

Anyway, OK, maybe for a large key and big rows, maybe it could take 2 hours.

To make sure, you could remove the order by. That'll give you the wrong data, obviously, but if the performance problem goes away, then it probably IS the sort. If the performance problem remains, it isn't the sort.

Still, a join might be a better solution if you can make it work. The sort certainly isn't fast, even if it isn't ultimately the main problem. To get around the wall you've hit with the join approach, go ahead and do the join, but create a NEW field like [Substitute Clm Calc Cost Basis]. Then do another join of the table to itself, and add your final field something like this:

if([Clm Calc Cost Basis]=' ',[Substitute Clm Calc Cost Basis],[Clm Calc Cost Basis]) as ApprvCostType_final

As a side note, while I didn't expect it, I was hoping that QlikView would recognize that my sort key was a short number, and that I had a whole lot of rows, and would therefore switch to a linear time sorting algorithm. It didn't. No suprise. Just was hoping to be impressed.

rwunderlich · ‎2010-04-05


rob.cleek wrote:I'm currently trying to develop a way through joins

What key fields are you joining on?

-Rob

Report Inappropriate Content · ‎2010-04-06

Sorry for the delayed response guys. I'm trying to overcome some challenges with QV Support. I was able to get it to work through joins. At first glance this looks like it's doing what I need it to. I've just got to test it more.

As for the timings, I took off the order by, and it seemed to take the same amount of time to process the peek() as it did to keep the order by in. (about 2 minutes for 1 mill records). The clm auth num is about 15 characters long, so it's large. Using the join only brought on an extra 10 minutes to the load, so that's not too unreasonable.

Thanks John for your help. I greatly appreciate it.