At what stage should we apply number formatting?

mmarchese · ‎2020-05-12

Which of the following two methods is considered best practice? In reality, the data would be loaded from a database, file, or qvd.

1) Leaving the field unformatted and applying formatting to the measure.

LOAD
    m_quantity,
INLINE [
    m_quantity
    1000
    2000
];
Set vm_quantity = Num(Sum(m_quantity), '#,##0');

2) Applying formatting to the field and letting it propagate to the measure.

LOAD
    Num(m_quantity, '#,##0') AS m_quantityF
INLINE [
    m_quantity
    1000
    2000
];
Set vm_quantity = Sum(m_quantityF);

I've been doing method 1, but now that I'm aiming to build up a data warehouse of qvds, I'm considering using method 2 to push more code from user apps into qvd-loading apps and to encourage more standardized number formatting.

Then again, the nice thing about method 1 is that it always works, whereas applying formatting to fields is fragile. I experimented to see how field formatting propagates and found that if you do anything to the field other than wrapping it in an aggregation, you lose the formatting. For instance, all of the trivial formulas below remove that field's formatting. And the legitimate scenario of calculating one field based on others also kills the formatting, even if every source column had the exact same formatting. So what's best? Try for method two and supplement it with method one as needed?

Set vm_quantity = Sum(m_quantityF * 1);
Set vm_quantity = Sum(m_quantityF + 0);
Set vm_quantity = Sum(m_quantityF) * 1;
Set vm_quantity = Sum(m_quantityF) + 0;

rwunderlich · ‎2020-05-12

I tend to favor approach #1, delay formatting until the measure. Primary reason is storage. Numbers formatted in the script will usually be stored as duals -- storing both numeric and string -- using more RAM and disk space.

-Rob

mmarchese · ‎2020-05-12

@rwunderlich Thanks for the reply. Interesting. I'm trying to understand this fully:

So if you leave a numeric field unformatted, it consumes less memory? I thought I read somewhere that everything is a dual in Qlik, but I guess not? Maybe I misinterpreted the statement, "There are no data types in Qlikview" from this article about duals. Anyway, in light of what you said about memory, I'm trying to picture how it works. Does Qlik do something like this?

Store both the number and the string if a function was used that returns a dual, such as Dual(), Date(), Num(), etc.
Otherwise, if the field is numeric, store only a number.
Otherwise, if the field is non-numeric, store only a string.

Also, the memory impact would only be in the symbol table, not the data table (borrowing terminology from here), right? So if you didn't have many unique values, the difference in memory consumption would be negligible.

rwunderlich · ‎2020-05-12

Yes, I would agree with all of your statements. I would make a slight tweak to your #1, in that Dates, even though "Dual", may be stored slightly more optimized. See here for more https://qlikviewcookbook.com/2017/03/dual-storage-vs-dual-behavior/. This detail is not really significant to your original question.

Yes, it impacts only symbol table storage and in many cases the increase is negligible. So if pre-formatting makes you more efficient in some way, the trade off may be worth it.

The increase in symbol size is also reflected in QVDs, something of more concern to some folks.

-Rob

mmarchese · ‎2020-05-13

@rwunderlich

Ok, I suppose I'll stick to approach 1 for now (formatting measures, not fields).

I just read the article of yours, as well as your other one here. Unfortunately, my main takeaway is that Qlik is a mysterious black box. I say that because I have no idea what the rules are for when Date() and Num() fields are stored as true duals (one number and one string per row) vs "implied duals" (one number per row + a single formatting string). The only rule I know (based on your articles) is that if a date field is referenced by a WHERE clause, it will be forced to a true dual and that using Date(Num(DateField)) will force it back to an implied dual. Neither of those behaviors makes much intuitive sense to me, but I trust that you are correct. Is there anywhere else I could learn more about these nuances?

General Question