Duplicates using Applymap()

jakeng · ‎2023-01-22

Hello All,

I am stuck in a situation where I have a large table A, which stores store locations. Its primary key is ID, other important keys include Category (Low, Medium, High).

Then I have created another table/mapping B, which sums boolean values for each store ID & category. So I have 3 rows for each ID and 3 columns (ID, Category, Sum).

Now what I want to do so to assign these sums to table A. Using a left join will ultimately lead to having a lot of duplicates, and since I will be doing other calculations using these sums, only the first entry must be appended for each ID & Category.

Therefore, I have tried using Applymap(), but it leaves me with the same result as in the case with left join - with a lot of duplicates. My quick fix so far has been to use Distinct for my calculations, but this is suboptimal as two categories may share the same value as their sum in table B.

Is there any reason why Applymap() would append the sum from table B to table A for each matching row as if it were a regular left join? I thought the whole meaning of this function was that it was triggered at the first matching entry only.

Jakob

Vegar · ‎2023-01-23

ApplyMap() will not give you the first value of your mapping table in return. It will not add rows to your A table.

Maybe your issue lies before or after the applymap in your script?

You write that your applymap table has three columns, but it can only have two . Read more about mapping tables and about ApplyMap() on the qlik help pages.

Vegar
Qlik Community MVP

jakeng · ‎2023-01-23

Hei Vegar og takk for svar.

I shall continue in English so others facing the same issue may follow.

The code goes like this:

Mapping_Orig:
Load ID, Category, Sum(Boolean_Key) as Sum1
resident Another_Table
where Boolean_Key > 0
group by ID, Category
;

Mapping:
Mapping Load ID & Category as Key, Sum1 as Sum2
Resident Mapping_Orig
;

Table_A_Temp:
Load *, ID & Category as Key
resident Table_A
;

Drop Table Table_A;

Table_A:
Load *, ApplyMap('Mapping', Key, 0) as Sum2
resident Table_A_Temp
;

Drop Table Table_A_Temp;

where Table_A is the original table which approx. 20 fields that I am trying to map the sums to. It also relies on Another_Table as my mapping.

The mapping looks ok until the step where I use ApplyMap(), i.e. my problem is that when I filter by category in Table_A after the mapping is done, it contains duplicates so the final sums are a lot higher than the actual ones. In the Mapping_Orig table, this problem is not present.

Can you please advise what I am doing wrong?

Mange takk.

Jakob

Vegar · ‎2023-01-23

Hei Jakob

Not easy to pinpoint your issue by just looking at the (pseudo-)script, but when you filter by category in table_A after the mapping you have potentionally more than one value per category as your mapping key is per ID and Category. What happens if you select an ID?

To check if you do generate more rows than expected in your script try this.

Mapping_Orig:
Load ID, Category, Sum(Boolean_Key) as Sum1
resident Another_Table
where Boolean_Key > 0
group by ID, Category
;

Mapping:
Mapping Load ID & Category as Key, Sum1 as Sum2
Resident Mapping_Orig
;

DROP TABLE Mapping_Orig; //you don't need it anymore

Table_A_Temp:
NoConcatenate Load *, ID & Category as Key
resident Table_A
;

LET vNoOfRowsTable_A_Temp = noOfRows('Table_A_Temp');
TRACE $(vNoOfRowsTable_A_Temp);

Drop Table Table_A;

Table_A:
NoConcatenate
Load *, ApplyMap('Mapping', Key, 0) as Sum2
resident Table_A_Temp
;

Drop Table Table_A_Temp;

LET vNoOfRowsTable_A = noOfRows('Table_A');
TRACE $(vNoOfRowsTable_A);

Now you can compare the variables vNoOfRowsTable_A and vNoOfRowsTable_A_Temp. If they are the same then you have proven that the no of rows are the same before and after the applymap().

/Vegar

Vegar
Qlik Community MVP

jakeng · ‎2023-01-23

Hei igjen,

I have done as instructed, and the number of rows is identical in both tables. So it should work, but then again when I use my Aggr Sum expression, it outputs an inflated number as if duplicates were present.

I use this expression:

=Aggr(Sum(Sum2),ID)

For example, for a ID with two active categories, two sums are 2 and 11, but the expression above returns 260. I guess since I have 20 rows for the key pair ID & Category, but this leads to my initial question why does ApplyMap() not assign the sum to the first matching row of those 20 candidates but to all of them? How else do I get this to work? Do you need me to share the full code?

My quick fix so far has been to count distinct values only for each ID, but this approach features the obvious weakness that it wont work if two or more categories have the same sum.

=Aggr(Sum(Distinct Sum2),ID)

Question 2: How can I sum over aggregated sums? Something like this, but this doesnt work:

=Sum(Aggr(Sum(Aggr(Max(Sum2),ID,Category),ID))

Basically the inner sum should return the sum for each ID & category combination (since duplicates are present, I use Max()), and the outer sum should sum these three numbers (since I got three categories) to one single number for each ID.

Set Analysis