Qlik Community

Ask a Question

QlikView App Dev

Discussion Board for collaboration related to QlikView App Development.

Announcements
Support Cases coming to Qlik Community Oct. 4! Start chats, open cases, explore resources: READ DETAILS
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Performance implications of using dual values in set analysis

Hi

Assume I have model where transaction table has field OperationType: Sale, Purchase, Transfer and so on.

In general I have three choice:

  • fill that field with plain text values ("Sale", "Purchase", "Transfer")
  • fill that field with numeric (integer) values - make that field a key to small dimension table: LOAD INLINE [key, value.. 1, Sale .. 2, Purchase .. 3, Transfer]
  • fill that field with dual values ('Sales',1), ('Purchase',2), ('Transfer',3)

My understanding is that first variant would be the least performant: expression Sum({<OperaionType={'Sale'}>} Amount) woud filter dataset by text value.

In secod variant I may use numeric values in set analysis as in Sum({<OperaionType={1}>} Amount) which woud be as performat as it can be.

In third variant on the surface I have filtering by text in the set analysys formula, Sum({<OperaionType={'Sale'}>} Amount), same as in first variant. But theoretically QlikView engine my under the hood use in filter operations numeric part of the dual values.


So, how dual values perform in set analysis on large datasets? Comparable with plain text values? Comparable with numeric values?

Somewhere in between?

Thank in andance

1 Solution

Accepted Solutions
tresesco
MVP
MVP

Dual is basically text representation which has a numerical value internally (which could be useful in some scenarios - like, soritng the text in a customized order). So if not necessary, it's not a good idea to overhead a numeric field with some text (as done in dual). And yes, performance wise a numeric field would be better than a dual (which is overheaded with a text).

View solution in original post

4 Replies
tresesco
MVP
MVP

My understanding is, set analysis uses some sort of indexing. If that is the case, searching time for stored vaules would not be different on type(numeric/string) of stored values. But this could effectively matter in a large database in terms of calculation/cpu efficiency in expressions without set analysis where search/sort is required because text sort is always costlier than numeric sort. Apart from these,numeric flagging would be better in terms of memory usage as well because of qlikview's Symbol Tables and Bit Stuffed Pointers mechanism.

Not applicable
Author

My understanding is, set analysis uses some sort of indexing. If that is the case, searching time stored vaules would not be different on type(numeric/string) of stored values.

So actually I was wrong assuming that set analysis on plain text values would imply some performance penalty?

Apart from these,numeric flagging would be better in terms of memory usage as well because of qlikview's Symbol Tables and Bit Stuffed Pointers mechanism.

I would probably not consider on practice using plain text values in such situation anyway.

My main interest is to understand is there any meaningfull differences between plain numeric and dual values in terms of memory usage, performance and so on. Did for instance plain numeric values make better with respect to Symbol Tables and Bit Stuffed Pointers then dual values on large datasets?

tresesco
MVP
MVP

Dual is basically text representation which has a numerical value internally (which could be useful in some scenarios - like, soritng the text in a customized order). So if not necessary, it's not a good idea to overhead a numeric field with some text (as done in dual). And yes, performance wise a numeric field would be better than a dual (which is overheaded with a text).

View solution in original post

suresh_rawat
Creator II
Creator II

Hi Vadim,

As per your concerned point :

Plain Numeric is better than the dual function.

In dual function we first convert our text field to numeric value, which is beneficial for sorting our data accordingly.

But when it comes to performance Plain numeric is better than the dual function.

Regards

Suresh Rawat