Skip to main content
Announcements
Live today at 11 AM ET. Get your questions about Qlik Connect answered, or just listen in. SIGN UP NOW
cancel
Showing results for 
Search instead for 
Did you mean: 
ziabobaz
Creator III
Creator III

Peek() function slows my script - is there a workaround?

hi!

I have 2.5m rows and it takes around 5 minutes to load the following script. Is it a normal situation with peek() ?

Load Distinct
      Client,
      Order,
      date,
      if(Peek('Client')='Client',numsum(peek('PurchaseNo',-1),1),1) as PurchaseNo,//Присваиваю номера покупкам в разрезе Партнеров
      if(peek('Client')='Client',date-peek('date'),0) as DaysBetweenPurchases, //рассчитываю кол-во дней между повторными покупками в разрезе Партнеров,
      if(peek('Client')='Client',(date-peek('date'))/30,0) as MonthsBetweenPurchases

Resident MainTable
Order By Client asc, date asc;
Labels (1)
  • Peek

6 Replies
sunny_talwar

I think it might be Order by and/or DISTINCT which might be slowing your reload... have you tried this

Load Distinct
      Client,
      Order,
      date
Resident MainTable
Order By Client asc, date asc;

 and see if it still takes 5 minutes. I would test without Order By and Without Distinct and see what reload times you get.

But I think these operations combined, can take time to process.

ziabobaz
Creator III
Creator III
Author

THanks Sunny

Just checked - this is peek() which slows the data load

 

 

stevedark
Partner Ambassador/MVP
Partner Ambassador/MVP

Hi,

Peek will be slow. To speed things up you might look at chunking up the data so that you don't have to re-parse all of it every time and then do optimised loads from QVD. The challenge is always working out what you can chunk on.

ziabobaz
Creator III
Creator III
Author

thank you for all your input, guys

marcus_sommer

In my experience peek() itself isn't so slow that it couldn't be applied on larger datasets. Therefore I go with Sunny to check if the applied combination of statements (distinct, order by, if, peek) may delay the execution significantly. This means not only just removing one or two of them and noticing that the run-time is lesser and therefore the cause of the delaying must be the remaing statements else only the combination of all the transformations may cause this effect - especially if the load didn't run multi-threaded anymore else it's forced to a single-threaded execution it will increase the run-times quite heavily.

To check this you may try to do the distinct load a step before maybe even with the necessary ordering - which might be even applied another step before and/or the ordering is reduced to a single field maybe by any rowno() approach or by combining the needed fields preferably not as strings else as a number, for example with Client as the integer-value and date becomes the frac-part.

To insert one or two inbetween steps within the load-order may not necessary lead to a longer run-times at all epecially if it was intentionally designed in this way. This includes especially the suggestion from Steve to slice your data by applying any incremental approaches.

Of course to re-think and (partly) re-design the whole load could be become quite expensive and not appropriate for this case but you would also learn a lot so that the real benefits come on later projects.

Nevertheless here a few direct suggestions which may have also an impact on the run-time:

- replace all peek() unless peek('PurchaseNo') with previous()
- replace numsum() with rangesum()
- making the Client check only once:
  - maybe within a preceeding approach
  - first: -(Client = previous(Client)) as Flag
  - second: rangemin(rangesum(peek('PurchaseNo'), 1) * Flag, 1) as PurchaseNo
  - third: DaysBetweenPurchases / 30 as ... here might be also a mapping possible

- Marcus

ziabobaz
Creator III
Creator III
Author

Marcus

Thank you for thoughtful answer

I tried all you suggestions 

It saved me around 5% - not much

The problem is with the previous / peek functions 

I think i can live with it as for now

 

thank you!