Solved: Load Script - Filter results by summary data - Qlik Community

WeePecky · ‎2020-08-12

Greetings,

I have a set of data that is summarised to show only those items that have 11 or more different 'types'.

e.g.:

ID	Year-Quarter	Count of 'Type'
1	2017-01	17
2	2018-02	12
3	2019-03	11

I am wanting to use this summary data to filter out IDs from the detailed results so only the details of those results with 11 or more 'Types' per quarter are included in the result set.

Any assistance appreciated.

Kindest regards
Wee

stevedark · ‎2020-08-13

You will want to use the WHERE EXISTS clause. This is a simple way of doing the equivalent of an INNER JOIN between tables in Qlik - provided there is only one field in common between the tables.

Summary:
LOAD
    ID,
    [Year-Quarter],
    [Count of Type]
FROM [lib://QVD/Summary.qvd] (qvd)
WHERE [Count of Type] >= 11;

Detail:
LOAD
   *
FROM [lib://QVD/Detail.qvd] (qvd)
WHERE EXISTS(ID);

If the field that you are matching on has a different name in the Detail table to the Summary table you need to specify both field names in the EXISTS statement.

So it would be:

WHERE EXISTS (ID, DetailID)

Hope that works out for you.

Steve

View solution in original post

RsQK · ‎2020-08-13

tmpDetailed:

LOAD *,

ID & '|' & "Year-Quarter" AS IDPeriod.#key;

LOAD

ID,

"Year-Quarter",

Type,

Datafield1,

Datafield2,

Datafield3

FROM [$(g_data)/data.qvd] (qvd);

Summary:

LOAD *,

ID & '|' & "Year-Quarter" AS LookupIDPeriod.#key

WHERE TypeCount>10;

LOAD

ID,

"Year-Quarter",

Count(Type) as TypeCount

Resident tmpDetailed

Group by ID,"Year-Quarter";

Detailed:

NoConcatenate

LOAD * RESIDENT tmpDetailed

WHERE EXISTS(LookupIDPeriod.#key,IDPeriod.#key);

DROP TABLE tmpDetailed;

DROP FIELDS LookupIDPeriod.#key,IDPeriod.#key;

stevedark · ‎2020-08-13

You will want to use the WHERE EXISTS clause. This is a simple way of doing the equivalent of an INNER JOIN between tables in Qlik - provided there is only one field in common between the tables.

Summary:
LOAD
    ID,
    [Year-Quarter],
    [Count of Type]
FROM [lib://QVD/Summary.qvd] (qvd)
WHERE [Count of Type] >= 11;

Detail:
LOAD
   *
FROM [lib://QVD/Detail.qvd] (qvd)
WHERE EXISTS(ID);

If the field that you are matching on has a different name in the Detail table to the Summary table you need to specify both field names in the EXISTS statement.

So it would be:

WHERE EXISTS (ID, DetailID)

Hope that works out for you.

Steve

WeePecky · ‎2020-08-13

@stevedark Thanks.
Sadly that did not work.

I still get results back that are under the count of 11.

[EDIT] I believe I failed to mention that the Type column can contain many values for the same ID, eg 15 of type Black and 12 of Type Purple, and only the unique count of this field should be considered when counting the Types. My apologies I should have been more diligent in the original post. <homer>doh!</homer>

So where there are 2000 types, if all 2000 are the same type then the count is 1.

The form of the detail includes the quarter as well as the ID an the match needs to include both columns.

The detail table looks like this:

ID	Quarter	Type	Data 1	Data 2	Data 3	Data 4
1	201701	Hot	some data	some data	some data	some data
1	201701	Warm	some data	some data	some data	some data
1	201701	Cold	some data	some data	some data	some data
...

I have tried matching on two columns but the results are the same.

11 different types are required for an ID for Quarter to be reported.

Thanks again.

stevedark · ‎2020-08-14

@WeePecky So you don't have a table with the Count Of Type? You need to create this in order to do the rest of the analysis?

Is the data in QVDs as I assumed? If so, you are looking for something like:

Summary:
LOAD
    *
    WHERE [Count of Type] >= 11
    ;
LOAD
    ID & ':' & Quarter as IDQtr,
    count(DISTINCT Type) as [Count of Type]
FROM [lib://QVD/Detail.qvd] (qvd)
GROUP BY ID, [Year-Quarter];

Detail:
LOAD
   *
FROM [lib://QVD/Detail.qvd] (qvd)
WHERE EXISTS(IDQtr, ID & ':' & Quarter);

If you are using QVDs, you will find that the process runs much quicker if you create the IDQtr field when you create the QVD rather than when you load from it.

WeePecky · ‎2020-08-14

Hi @stevedark

Yes, I do have a summary dataset, which is in a qvd file. My apologies for not being clear.

I have millions of rows across multiple quarters, which renders hundreds of thousands of matching types per quarter.

I am perplexed as to why your solution does not work. I am working through this, and will post the solution and reason for this not working when we have discovered it.

Thanks for your help!!

Wee

stevedark · ‎2020-08-15

Given the large number of rows I would definitely look at chunking up the load into various parts.

Create an aggregated QVD using the GROUP BY as a separate part of the process to the main load script.

When loading from the aggregated QVD, a WHERE EXISTS on possible counts would be quicker, or better still create a field called 11 Plus in the QVD and use that in a WHERE EXISTS.

Creating the composite key (ID & ':' & Quarter) when creating the QVD rather than on the fly goes from being a 'good idea' to being essential - due to optimised QVD loads.

Steve

WeePecky · ‎2020-08-16

@RsQK & @stevedark thank you for your input.

Both of your solutions are correct for the problem as described.

I had already created the summary with the count of "type" and was joining the two qvd files to each other, and tried to get the Exists clause to work as you advised .

The real problem was that there was an age field that (obviously) changed for some ID's (people are just the worst!) during the quarter. Once I used min on the age fields all my dreams came true!

My apologies to you good gents, and thank you for your time.

I tossed a coin, and have marked @stevedark as the solution. Sorry @RsQK .

Kindest regards

Wee

Load Script - Filter results by summary data

filtering

Script