Re: Peek() function slows my script - is there a w... - Qlik Community

ziabobaz · ‎2020-08-13

hi!

I have 2.5m rows and it takes around 5 minutes to load the following script. Is it a normal situation with peek() ?

Load Distinct
      Client,
      Order,
      date,
      if(Peek('Client')='Client',numsum(peek('PurchaseNo',-1),1),1) as PurchaseNo,//Присваиваю номера покупкам в разрезе Партнеров
      if(peek('Client')='Client',date-peek('date'),0) as DaysBetweenPurchases, //рассчитываю кол-во дней между повторными покупками в разрезе Партнеров,
      if(peek('Client')='Client',(date-peek('date'))/30,0) as MonthsBetweenPurchases

Resident MainTable
Order By Client asc, date asc;

sunny_talwar · ‎2020-08-13

I think it might be Order by and/or DISTINCT which might be slowing your reload... have you tried this

Load Distinct
      Client,
      Order,
      date
Resident MainTable
Order By Client asc, date asc;

and see if it still takes 5 minutes. I would test without Order By and Without Distinct and see what reload times you get.

But I think these operations combined, can take time to process.

ziabobaz · ‎2020-08-13

THanks Sunny

Just checked - this is peek() which slows the data load

stevedark · ‎2020-08-13

Hi,

Peek will be slow. To speed things up you might look at chunking up the data so that you don't have to re-parse all of it every time and then do optimised loads from QVD. The challenge is always working out what you can chunk on.

ziabobaz · ‎2020-08-14

thank you for all your input, guys

marcus_sommer · ‎2020-08-14

In my experience peek() itself isn't so slow that it couldn't be applied on larger datasets. Therefore I go with Sunny to check if the applied combination of statements (distinct, order by, if, peek) may delay the execution significantly. This means not only just removing one or two of them and noticing that the run-time is lesser and therefore the cause of the delaying must be the remaing statements else only the combination of all the transformations may cause this effect - especially if the load didn't run multi-threaded anymore else it's forced to a single-threaded execution it will increase the run-times quite heavily.

To check this you may try to do the distinct load a step before maybe even with the necessary ordering - which might be even applied another step before and/or the ordering is reduced to a single field maybe by any rowno() approach or by combining the needed fields preferably not as strings else as a number, for example with Client as the integer-value and date becomes the frac-part.

To insert one or two inbetween steps within the load-order may not necessary lead to a longer run-times at all epecially if it was intentionally designed in this way. This includes especially the suggestion from Steve to slice your data by applying any incremental approaches.

Of course to re-think and (partly) re-design the whole load could be become quite expensive and not appropriate for this case but you would also learn a lot so that the real benefits come on later projects.

Nevertheless here a few direct suggestions which may have also an impact on the run-time:

- replace all peek() unless peek('PurchaseNo') with previous()
- replace numsum() with rangesum()
- making the Client check only once:
- maybe within a preceeding approach
- first: -(Client = previous(Client)) as Flag
- second: rangemin(rangesum(peek('PurchaseNo'), 1) * Flag, 1) as PurchaseNo
- third: DaysBetweenPurchases / 30 as ... here might be also a mapping possible

- Marcus

ziabobaz · ‎2020-08-14

Marcus

Thank you for thoughtful answer

I tried all you suggestions

It saved me around 5% - not much

The problem is with the previous / peek functions

I think i can live with it as for now

thank you!

Peek() function slows my script - is there a workaround?

Peek