Stromberg
Contributor II

Set Analysis - Wildmatch with separate field

Hey all,

I am trying to optimize an app to make it work faster. I want to convert a wildmatch statement into a set analysis statement. The data roughly looks like this:

Table1:

ID Number1 Number2 Number3
AX123 12345 56789 98765
AY987 23456 12345 76543
AZ456 65432 87654 45678
.... ... .... ....

 

Table2 (already filtered to one row):

LIST
12345,43215,98756,23453,65432

 

Now I want to count the IDs where none of the three numbers appears in the LIST field. For the if-statement I roughly had:

count(distinct if(not(wildmatch(LIST,'*'&Number1&'*') or wildmatch(LIST,'*'&Number2&'*') or wildmatch(LIST,'*'&Number3&'*')),ID))

For the set analysis I thought something like:

count(distinct {$<LIST-={"*Number1*"}>+<LIST-={"*Number2*"}>+<LIST-={"*Number3*"}>},

which didn't work, probably because 'NumberX' is a field name and the statement does not evaluate its content. I tried $(=Number1) and a few other variations as well, still no change, and I don't know what else to try.

So in short: in the set analysis condition I want to access the actual value in a field (e.g. 12345), not the field name (Number1).

Is there a way to do that?

I'd be very thankful for any help.

Greetings,
JK

P.S.: In my original statement there are more conditions, but I already figured out that this part is the one causing problems. I'd use the if-statement if set analysis weren't so much faster.

1 Solution

Accepted Solutions
marcus_sommer

I'm not sure if it's possible, or sensible, to solve such a scenario with a "classical" set analysis without touching the data model. One reason is that the check needs to be done against multiple fields, and any call of a field within a set expression needs an aggregation - without one, only() is applied implicitly, which means that as soon as more than a single value is possible within the used context, the result will be NULL. Further, a classical set analysis evaluates its conditions at column level - it can't be applied to any use case that needs a row-level evaluation (the required context wasn't explicitly mentioned).

General possibilities would be an if-loop like you already used. Another method would be something like the suggestion from @MarcoWedel, but that is in reality an if-loop, too - just within a set analysis syntax. Nevertheless this method should be more performant, because it's logically a change from:

count(if(condition, Field))

to

if(condition, count(Field))

which is usually significantly faster.


By keeping your data structure you may consider a reverse query - not checking the numbers against the list, but the list against the numbers. This may look like:

count({< Number1 -= {"$(=only(LIST))"},
         Number2 -= {"$(=only(LIST))"},
         Number3 -= {"$(=only(LIST))"} >} distinct ID)

But even if this approach may work in this case, I suggest the above-mentioned change to the data structures in the data model. Reasons are, for example:

  • it wouldn't work if a row-level evaluation is needed
  • the number of crosstable fields may be unknown or vary with the context (of course that could be handled, too, but at least then you will need advanced if-loops)
  • the complexity within the data model and UI when developing and maintaining the applications
  • more or less pronounced performance disadvantages


I never did a performance comparison of crosstable structures against stream-data structures for dimension fields, but I really doubt there would be any difference - at least not within the UI - beside the multiple disadvantages of handling the crosstable fields within objects and expressions and/or providing them as selection fields.

In regard to measure fields I made such a comparison with simple dummy data and two or three measures: the short but wide fact table was slightly smaller and opened a bit faster than the long but narrow fact table. But in reality, with partly several dozen measures in certain applications, it performed very well, especially compared with our older business releases which worked with crosstable measures. It's not a direct comparison, because we changed a lot of other things between the releases too, but beside these uncertainties I'm sure that the data-stream structure is a real benefit. Therefore give it a try.

- Marcus

 


5 Replies
marcus_sommer

I suggest not doing the essential work within the UI, but changing the data model instead - for example, transforming the crosstable into a data stream, maybe with:

t: crosstable(NumberType, Value, 1) load * from Table1;

and then you may use an approach like:

count({< Value -= {"$(=only(LIST))"}>} ID)

whereby LIST seems to contain loaded values, so they could be matched - for example with a mapping against Value to create a 0/1 flag, which is then used as a selection or within a set analysis.
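The 0/1-flag mapping mentioned here could be sketched in the script like this (a sketch only - names like FlagMap, InList, and NumberType are illustrative, and it assumes LIST is a single comma-separated string as shown in the question):

// Map each number from the comma-separated LIST to the flag 1
FlagMap:
mapping load
    subfield(LIST, ',') as ListValue,
    1 as Flag
resident Table2;

// Long/narrow version of Table1: one row per ID and number
tmp:
crosstable(NumberType, Value, 1)
load ID, Number1, Number2, Number3
resident Table1;

// Attach the 0/1 flag to each value, default 0 for unmatched
Numbers:
load *, applymap('FlagMap', Value, 0) as InList
resident tmp;
drop table tmp;

In the UI you could then count the IDs without any listed number, e.g. with count({< ID = {"=sum(InList) = 0"} >} distinct ID).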

- Marcus

Stromberg
Contributor II
Author

Thanks for the quick reply!

I'll try that, but since the result depends on multiple variable inputs, the data model is very complex and this method creates a lot of rows.

Do you know if it is possible to access the field value as I described above? 

- JK

MarcoWedel

One way to make your expression work as expected could be:

=Count({$<ID={"=not Min(wildmatch(LIST,'*'&Number1&'*') or wildmatch(LIST,'*'&Number2&'*') or wildmatch(LIST,'*'&Number3&'*'))"}>} DISTINCT ID)

 

hope this helps
Marco


Stromberg
Contributor II
Author

Thank you very much!!

I tried to combine your and Marco's input, and in fact I was able to improve the backend data model so that the frontend calculations are much faster.

So it's not the solution I was expecting but still the result I was hoping for. 😄 

Thanks again for your professional and extensive explanations!

- JK