prayner
Contributor III

Smooth values based on neighbouring bins

I have a gridded map table produced by the GeoOperations binning function. Some of the values in the map are outliers relative to their neighbours, so I have written a smoothing routine in the Qlik load script editor that replaces each outlier with the average value of its neighbouring bins. However, I have to run this routine for hundreds of thousands of bins, multiplied by every combination of the other dimensions (product, date). How can I optimise the load script? It would be good to avoid the `for each` loop, but I am not sure how.

[Screenshot 2024-11-13 125002.png: gridded map in which outlier bins stand out as dark and light spots against their neighbours; the smoothing function eliminates these spots.]

// Split each bin_id (format 'R|X|Y') into numeric x and y grid coordinates
binning_table_t2:
load *,
SubField(bin_id, '|', 2) as x,
SubField(bin_id, '|', 3) as y
resident binning_table_t
;
drop table binning_table_t;

// Distinct bin ids to loop over
[t_distinct_bin_ids]:
load distinct bin_id as distinct_bin_ids resident binning_table_t2;


// Empty staging table; each loop iteration appends its neighbour averages here
neighbour_check:
LOAD * INLINE [
tmp_field
];

for each vBinId in FieldValueList('distinct_bin_ids')

// Identify the eight surrounding neighbours of a bin id, which has the format 'R|X|Y'

LET vX=SubField('$(vBinId)', '|', 2);
LET vY=SubField('$(vBinId)', '|', 3);
LET vYPlus1 = $(vY) + 1;
LET vYMinus1 = $(vY) - 1;
LET vXMinus1 = $(vX) - 1;
LET vXPlus1 = $(vX) + 1;

// All rows belonging to the eight bins surrounding (vX, vY)
[binning_table_t2_bin_id]:
NoConcatenate
load date, product, avg_index_value resident binning_table_t2
WHERE
((x=$(vX) and y=$(vYPlus1)) or
(x=$(vX) and y=$(vYMinus1)) or
(x=$(vXMinus1) and y=$(vY)) or
(x=$(vXPlus1) and y=$(vY)) or
(x=$(vXMinus1) and y=$(vYPlus1)) or //top left
(x=$(vXPlus1) and y=$(vYPlus1)) or //top right
(x=$(vXMinus1) and y=$(vYMinus1)) or //bottom left
(x=$(vXPlus1) and y=$(vYMinus1))) //bottom right
;

trace $(vBinId);

// Average the neighbour values per product/date and append the result for this bin
[neighbour_check]:
Concatenate(neighbour_check)
LOAD '$(vBinId)' as bin_id, product, date, avg(avg_index_value) as neighbouring_avg_index_value
resident binning_table_t2_bin_id
group by product, date
;

drop table binning_table_t2_bin_id;

next vBinId

 

Sample data:

[binning_table_t]:

load * inline [
bin_id , avg_index_value, date, product
R|114|838 , 0.60129568 , 30/10/2024 , apple
R|114|838 , 0.60129568 , 12/11/2024 , apple
R|114|839 , 0.47632898 , 12/11/2024 , apple
R|114|839 , 0.66667490883028 , 30/10/2024 , apple
R|114|843 , 0.55975765889839 , 12/11/2024 , apple
R|114|843 , 0.78789984277157 , 30/10/2024 , apple
R|114|844 , 0.67144721805877 , 12/11/2024 , apple
R|114|844 , 0.83291251451518 , 30/10/2024 , apple
R|114|846 , 0.5460232 , 12/11/2024 , apple
R|114|846 , 0.7981770082408 , 30/10/2024 , apple
R|114|847 , 0.63275596 , 12/11/2024 , apple
R|114|847 , 0.83393975481718 , 30/10/2024 , apple
R|115|838 , 0.50620004195143 , 12/11/2024 , apple
R|115|838 , 0.61589052 , 30/10/2024 , apple
R|115|839 , 0.54598076193291 , 12/11/2024 , apple
R|115|839 , 0.73205414 , 30/10/2024 , apple
R|115|842 , 0.70264421 , 12/11/2024 , apple
R|115|842 , 0.95325742113421 , 30/10/2024 , apple
R|115|843 , 0.65527531952943 , 12/11/2024 , apple
R|115|843 , 0.90651484 , 30/10/2024 , apple
R|115|844 , 0.61245206101525 , 12/11/2024 , apple
R|115|844 , 0.83911441747261 , 30/10/2024 , apple
R|115|846 , 0.75535312 , 12/11/2024 , apple
R|115|846 , 0.88929316987812 , 30/10/2024 , apple
R|115|847 , 0.6306660543884 , 12/11/2024 , apple
R|115|847 , 0.82988369243856 , 30/10/2024 , apple
]
;

1 Solution

Accepted Solutions
marcus_sommer

Such an approach is bound to be slow because every loop iteration initializes a table load that scans a large resident table with a heavy where-clause, so the total work grows with the number of bins multiplied by the size of the table.

Personally, I would try a different approach: accumulate the relevant neighbours with the help of inter-record functions within an appropriately sorted resident load. Here is a simplified example:

load Key, Value,
     // previous() reads the prior input record and peek('Value', -2) reads
     // two records back in the output table, so the average spans the
     // current value and the two values before it in sort order
     rangeavg(Value, previous(Value), peek('Value', -2)) as ValueAVG
resident Source order by Key;

This means the essential part is getting the sort order right across one or several fields. Depending on the scenario, it might be necessary to:

- embed the accumulation part within n (nested) loops to ensure the inter-record logic stays restricted to the wanted areas,
- run the load twice, forwards and backwards (see the sketch below),
- create n different avg-fields,
- calculate the offset of the current value against the avg,
- fetch outliers with further rangemin()/rangemax() checks,
- add further grouping/flagging fields, or similar.
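To make that concrete, here is a minimal sketch of how the forwards/backwards idea could look for the horizontal neighbours of the grid in the original question. The names binning_table_t2, bin_id, x, y and avg_index_value come from the post; tmp_left, left_value and neighbour_avg are hypothetical helper names, and the sketch assumes x and y evaluate numerically (SubField on digits yields dual values that sort and compare numerically). Vertical and diagonal neighbours would need analogous passes sorted along the other axis:

// Pass 1 (forwards): sort ascending by x so previous() sees the left-hand bin.
// The if() guards keep the inter-record logic inside one product/date/grid row.
tmp_left:
NoConcatenate
LOAD
    bin_id, product, date, x, y, avg_index_value,
    If(Previous(product) = product and Previous(date) = date
       and Previous(y) = y and Previous(x) = x - 1,
       Previous(avg_index_value)) as left_value
RESIDENT binning_table_t2
ORDER BY product, date, y, x;

// Pass 2 (backwards): resort descending by x so previous() now sees the
// right-hand bin; rangeavg() ignores nulls, so edge bins simply average
// over fewer neighbours.
smoothed:
NoConcatenate
LOAD
    bin_id, product, date, x, y, avg_index_value,
    RangeAvg(left_value,
             If(Previous(product) = product and Previous(date) = date
                and Previous(y) = y and Previous(x) = x + 1,
                Previous(avg_index_value))) as neighbour_avg
RESIDENT tmp_left
ORDER BY product, date, y, x desc;

DROP TABLE tmp_left;

Because the table is scanned a fixed number of times instead of once per bin, this scales roughly linearly with the number of rows rather than quadratically.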

