Solved: Re: Aggregating and keeping only the prevalent rec... - Qlik Community

SerSwagster · ‎2022-03-15

Hi everyone. I have loaded this input data:

Users:

Name City

Tom Toronto

Tom New York

Nick Philadelphia

Nick Los Angeles

Nick Philadelphia

Sam New York

Sam Chicago

Sam New York

Sam Chicago

John Phoenix

Vincent San Antonio

Bob Miami

Bob Boston

I want to reload these data in another table, aggregating by Name field, keeping only the record with most City occurrences, and otherwise, in case of the same number of occurrences for both Cities, to keep the distinct records, without repetition. I show you the result I expect:

Name City

Tom Toronto

Tom New York

Nick Philadelphia

Sam New York

Sam Chicago

John Phoenix

Vincent San Antonio

Bob Miami

Bob Boston

What is the code to achieve this result? I thought something like this:

Load Name,

if(not isnull(mode(City)) -> mode(City)

if(isnull(mode(City)) -> all the distinct records

Resident Users

Group by Name;

Any suggestions?

Or · ‎2022-03-15

This is a mockup and probably won't work as is, but the general gist is:

Table1:

Load City, Name, count(1) as Count

From YourTable

Group by City, Name;

Map:

Mapping Load Name, Count

Where RowNumber=1;

Load City, Name, Count, Autonumber(City, Name) as RowNumber

Order by Count desc;

Resident Table1;

Final:

Load Name, City

Resident Table1

Where Count = ApplyMap('Map', Name);

Drop Table Table1;

The added complexity with the mapping load and additional final load is to deal with ties, if you just wanted the single biggest (with ties being broken arbitrarily or using another field) you could just use the row=1 approach directly.

View solution in original post

Or · ‎2022-03-15

There are probably many ways to do this, and this might not be the most efficient, but...

1) Load the city, name, and count of lines for each combination

2) For each name load the maximum number of lines using group by name, might be easiest to use a mapping load here

3) From the first table, load only name/city combinations where the count matches the maximum number from the previous load (2)

I think this approach should get you what you're after without having to use loops, which would likely cause poor performance (depending on the data volume). As I said, there may be a simpler approach with fewer steps, but this is what comes to mind.

SerSwagster · ‎2022-03-15

This script will fit within a huge script that already takes more than 5 minutes to be executed. So I'm searching for, if it is possible, the best solution in order to not put additional burdens on my actual script. Do you think that your solution is the best in terms of impact on loading times?

Or · ‎2022-03-15

I doubt it's the best, but it shouldn't be a problem unless you're iterating over many, many millions of lines for City and Name. Everything would be done using Resident loads and it's only two fields, so I wouldn't expect much of a negative impact under normal circumstances.

SerSwagster · ‎2022-03-15

Can you write here, briefly, the script of your solution? It takes into account also the cases in which there are two cities with the same number of records for a certain name?

Or · ‎2022-03-15

This is a mockup and probably won't work as is, but the general gist is:

Table1:

Load City, Name, count(1) as Count

From YourTable

Group by City, Name;

Map:

Mapping Load Name, Count

Where RowNumber=1;

Load City, Name, Count, Autonumber(City, Name) as RowNumber

Order by Count desc;

Resident Table1;

Final:

Load Name, City

Resident Table1

Where Count = ApplyMap('Map', Name);

Drop Table Table1;

The added complexity with the mapping load and additional final load is to deal with ties, if you just wanted the single biggest (with ties being broken arbitrarily or using another field) you could just use the row=1 approach directly.

SerSwagster · ‎2022-03-16

It works, thank you!

One last question, to understand better your procedure: what is the specific step in which are kept only the combinations with most occurrences? Because the usage of RowNumber is not so clear for me, since I see sometimes the same combination, with the same count, with RowNumber = 1 and other times = 2.

Anyway, the problem is solved.

Or · ‎2022-03-16

This is the condition in the final load. We previously created a table that contains each name and the matching greatest count. It doesn't care which counties are associated with number, just what the number is. We then look for any name/country combination that matches that number and keep all of them. A bit tricky, and possibly not the simplest way to achieve this, but I was thinking it'd work.

Aggregating and keeping only the prevalent records

Data Load Editor

filter

Script