

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Peek() function slows my script - is there a workaround?
hi!
I have 2.5m rows and it takes around 5 minutes to load the following script. Is it a normal situation with peek() ?
Load Distinct
Client,
Order,
date,
if(Peek('Client')='Client',numsum(peek('PurchaseNo',-1),1),1) as PurchaseNo,//Присваиваю номера покупкам в разрезе Партнеров
if(peek('Client')='Client',date-peek('date'),0) as DaysBetweenPurchases, //рассчитываю кол-во дней между повторными покупками в разрезе Партнеров,
if(peek('Client')='Client',(date-peek('date'))/30,0) as MonthsBetweenPurchases
Resident MainTable
Order By Client asc, date asc;

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I think it might be Order by and/or DISTINCT which might be slowing your reload... have you tried this
Load Distinct
Client,
Order,
date
Resident MainTable
Order By Client asc, date asc;
and see if it still takes 5 minutes. I would test without Order By and Without Distinct and see what reload times you get.
But I think these operations combined, can take time to process.


- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
THanks Sunny
Just checked - this is peek() which slows the data load


- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
Peek will be slow. To speed things up you might look at chunking up the data so that you don't have to re-parse all of it every time and then do optimised loads from QVD. The challenge is always working out what you can chunk on.


- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
thank you for all your input, guys


- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
In my experience peek() itself isn't so slow that it couldn't be applied on larger datasets. Therefore I go with Sunny to check if the applied combination of statements (distinct, order by, if, peek) may delay the execution significantly. This means not only just removing one or two of them and noticing that the run-time is lesser and therefore the cause of the delaying must be the remaing statements else only the combination of all the transformations may cause this effect - especially if the load didn't run multi-threaded anymore else it's forced to a single-threaded execution it will increase the run-times quite heavily.
To check this you may try to do the distinct load a step before maybe even with the necessary ordering - which might be even applied another step before and/or the ordering is reduced to a single field maybe by any rowno() approach or by combining the needed fields preferably not as strings else as a number, for example with Client as the integer-value and date becomes the frac-part.
To insert one or two inbetween steps within the load-order may not necessary lead to a longer run-times at all epecially if it was intentionally designed in this way. This includes especially the suggestion from Steve to slice your data by applying any incremental approaches.
Of course to re-think and (partly) re-design the whole load could be become quite expensive and not appropriate for this case but you would also learn a lot so that the real benefits come on later projects.
Nevertheless here a few direct suggestions which may have also an impact on the run-time:
- replace all peek() unless peek('PurchaseNo') with previous()
- replace numsum() with rangesum()
- making the Client check only once:
- maybe within a preceeding approach
- first: -(Client = previous(Client)) as Flag
- second: rangemin(rangesum(peek('PurchaseNo'), 1) * Flag, 1) as PurchaseNo
- third: DaysBetweenPurchases / 30 as ... here might be also a mapping possible
- Marcus


- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Marcus
Thank you for thoughtful answer
I tried all you suggestions
It saved me around 5% - not much
The problem is with the previous / peek functions
I think i can live with it as for now
thank you!
