Skip to main content
Announcements
Qlik Connect 2024! Seize endless possibilities! LEARN MORE
cancel
Showing results for 
Search instead for 
Did you mean: 
Wen
Partner - Contributor III
Partner - Contributor III

Optimize script without using previous() and peek() in FOR loop

Hi everyone, im having some performance issue using Previous() and Peek().

I'm trying to optimize a pice of code i've made, hope you guys can help me.

The main idea here is to compare all the records of a single table (thousands of records) to calculate the distance between them (every record is a car movement event with a specific latitude and longitude).

My first approach was to make 2 FORs, 1 inside the other to compare all the records between them.

After I noticed the performance was horrible, I move to this other one. The idea is the same, but with only 1 FOR and the conditions in the where clause of the load statement.

The performance for small amount of records (5k) is ok, but when I load a whole month (300k records) it takes forever to make the calculations.

Does anyone know how to optimize this witouht using peek / previous? Any thoughts?

Thanks in advance!

The code here:

DetencionesCompartidas_2:

NoConcatenate Load

Null() as IdEventoDetencionExtendida_2

AutoGenerate 0;

// Ciclo que recorre los registros de la tabla de eventos detenidos.

For i = 0 to NoOfRows('DetencionesCompartidas_1')-1

Let S_LatitudExtendida = Peek('LatitudDetencionExtendida_1',i,'DetencionesCompartidas_1');

Let S_LongitudExtendida = Peek('LongitudDetencionExtendida_1',i,'DetencionesCompartidas_1');

    Let S_IdEvento = Peek('IdEventoDetencionExtendida_1',i,'DetencionesCompartidas_1');

    Let S_Dominio = Peek('%_Dominio_1',i,'DetencionesCompartidas_1');

    Let S_FechaHoraInicio = Num(Peek('FechaHoraInicioDetencionExtendida_1',i,'DetencionesCompartidas_1'));

    Let S_FechaHoraFin = Num(Peek('FechaHoraFinDetencionExtendida_1',i,'DetencionesCompartidas_1'));

    Let S_PuntoGeografico = Peek('PuntoGeograficoDetencionExtendido_1',i,'DetencionesCompartidas_1');

Let S_TiempoDetencion = Peek('TiempoDetencionExtendido_1',i,'DetencionesCompartidas_1');

    Let S_LegajoConductor = Peek('LegajoDetencionExtendida_1',i,'DetencionesCompartidas_1');

   

   

    Concatenate(DetencionesCompartidas_2)

    Load

     %_Dominio_1 as %_Dominio_2,

        IdEventoDetencionExtendida_1 as IdEventoDetencionExtendida_2,

        FechaHoraInicioDetencionExtendida_1 as FechaHoraInicioDetencionExtendida_2,

        FechaHoraFinDetencionExtendida_1 as FechaHoraFinDetencionExtendida_2,

        PuntoGeograficoDetencionExtendido_1 as PuntoGeograficoDetencionExtendido_2,

        TiempoDetencionExtendido_1 as TiempoDetencionExtendido_2,

        LegajoDetencionExtendida_1 as LegajoDetencionExtendido_2,

            1 as EsDetencionCompartida_2,

            $(i) as IdDetencionCompartida_2,

            '$(S_IdEvento)' as IdEventoCompartido_2,

            '$(S_Dominio)' as DominioCompartido_2,

            '$(S_FechaHoraInicio)' as FechaHoraInicioEventoCompartido_2,

            '$(S_FechaHoraFin)' as FechaHoraFinEventoCompartido_2,

            '$(S_PuntoGeografico)' as PuntoGeograficoCompartido_2,

'$(S_TiempoDetencion)' as TiempoDetencionCompartido_2,

            '$(S_LegajoConductor)' as LegajoDetencionCompartido_2

        Resident DetencionesCompartidas_1

        Where Not Exists(IdEventoCompartido_2, IdEventoDetencionExtendida_1) And %_Dominio_1<>'$(S_Dominio)' And

        (Acos(Sin($(S_LatitudExtendida)*Pi()/180)*Sin(LatitudDetencionExtendida_1*Pi()/180)+

            Cos($(S_LatitudExtendida)*Pi()/180)*Cos(LatitudDetencionExtendida_1*Pi()/180)*Cos((LongitudDetencionExtendida_1*Pi()/180)-

            ($(S_LongitudExtendida)*Pi()/180)))*$(S_RadioTerrestre)) <= $(S_DistanciaDetencionCompartida) And

        (((FechaHoraInicioDetencionExtendida_1>='$(S_FechaHoraInicio)' And FechaHoraInicioDetencionExtendida_1<='$(S_FechaHoraFin)')

         Or (FechaHoraFinDetencionExtendida_1>='$(S_FechaHoraInicio)' And FechaHoraFinDetencionExtendida_1<='$(S_FechaHoraFin)'))

            Or (FechaHoraInicioDetencionExtendida_1<='$(S_FechaHoraInicio)' And FechaHoraFinDetencionExtendida_1>='$(S_FechaHoraFin)'));

        // [(2i>=1i And 2i<=1f) Or (2f>=1i And 2f<=1f)] Or (2i<=1i And 2f>=1f)

Next i;

Drop Table DetencionesCompartidas_1

5 Replies
marcus_sommer

Your main-problem by the performance is the use of a for loop and not the peek/previous functions. In which way you ever might it to implement it will be quite slow compared to normal loadings.

The reason is that each iteration has an own overhead - and now you called in there a load multiple times (even if it wouldn't be a whole table - the (rather complex) where clause is applied to each record else just an autogenerate 1) it would cause an overhead of adding one table to another.

Quite often you could replace such a construct with inside-loops like autogenerate, while or subfield within a preceeding load-approache like this (here simplified for demonstrating the logic):

load *, iterno() as i2 while iterno() >= Y;

load recno() as R, iterno() as i1 autogenerate noofrows('table') while iterno() <= X;

This will be much faster than an outside-loop. Of course you will need some efforts to adapt your logic to this method especially to catch all your variable-approaches.

Even more faster would it be if you could run (I'm not sure from your description and code if it's applicable in this case but in general it's true) your transformation not over wholes tables else over fieldvalues because they are stored as distinct values and quite often are their numbers much lesser than the number of records of their tables.

Beside this I have the impression that you create a kind of cartesian product. In this case it might be an easier alternatively to create the cartesian product with an outer join and applying your filtering/transformations/calculations on this result.

- Marcus

Wen
Partner - Contributor III
Partner - Contributor III
Author

marcus_sommer, thank you for the answer!

The big where clause is because I need to compare each and every record with all of the rest for:

* Being a different car license plate AND

* To have less than 30 meters from other vehicle AND

* Be at the same event time.

This is to calculate and create groups of cars being in the same place at the same time.

I can't pre calculate this things beforehand because I need to go row by row for it. Can I compare records using a load-loop iterno() and while?

Thank you for all!

marcus_sommer

For me means each against everyone a cartesian product. And the easiest and fastest way to create one is to use an outer join. Of course this is a heavy transformation and will need beside the calulation-time a lot of RAM - BUT this might be reducible by grouping your records in beforehand in time-periods, countries, cities or similar to join only records which fit in the right pattern.

In general I think a splitting of the task would be useful especially if it's after all not fit within your needed time-frame for the load and you must apply an incremental load-approach on your data.

I'm not absolutely sure that you could use inside-loops to compare records against eachother but before using outside-loops I would give it a try - whereby before that I would go with the above mentioned join-approach and applying the calculations on them.

- Marcus

swuehl
MVP
MVP

Wen
Partner - Contributor III
Partner - Contributor III
Author

marcus_sommer and swuehl, hi. Sorry for the delay on the answer.

The why for me to use a FOR instead a cartesian product was because of the number of records i'm managing. Trying to process a full month gives me a 82000 records table, which, if I join it to itself, gives me a 6.724.000.000 records table in memory which, for my server infrastructure, is impossible to handle.

The workaround I found is to process the data through 2 consecutives days per time (I need to compare today-1 with today) with a do while and inside that loop, I still use the FOR with the big where clause. I noticed that this last "solution" is slightly faster than the outer join one.

Doing it this way I finally manage to "finish" and process 1 month that was a thing I never could do by running a whole month.

Thank you very much both for your replies!