I must admit that I never really thought about how Qlik handles colors internally and how they are stored. Inspired by the following discussion: https://community.qlik.com/thread/297454 I looked a bit more closely and was not really surprised to find that colors are dual values, meaning something like this:
dual(rgb(0,0,0), 4278190080)
Of course I could have known this, because it is clearly stated in the help, but usually I only look at the examples and don't read the descriptions …
This means colors are logically handled like date values, in which each date gets a certain number assigned and a display of 2018/04/10 is just a formatting of that numerical value. Further, because of the statement that Qlik uses the color logic of Visual Basic, I assume that Qlik simply uses Microsoft *.dll's for it without implementing its own logic.
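To make the analogy concrete, here is a small sketch of my own (the serial number 43200 is simply the Qlik date serial for 2018/04/10 and is only used for illustration):
// a date is a dual of a display text and an underlying serial number
dual(date(43200, 'YYYY/MM/DD'), 43200)   // displays 2018/04/10, calculates as 43200
// a color is a dual of a display text and an underlying ARGB number
dual(rgb(0, 0, 0), 4278190080)           // displays RGB(0,0,0), calculates as 4278190080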
How this works can be comprehended with the following script - the attached qvw contains only a reduced dataset, because most of the fields produce distinct values and with 16.8 M records the file size would be quite heavy:
set oRGB = 4278190080;
set eRGB = "rgb(floor(($1 - $(oRGB)) / 256 / 256),
floor(($1 - $(oRGB) - (256 * (floor(($1 - $(oRGB)) / 256 / 256) * 256))) / 256),
mod($1 - $(oRGB), 256))";
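As a small illustration of what the expansion does (example values of my own, not from the attached qvw), $(eRGB(4278190081)) expands (with $(oRGB) also resolved) to:
// rgb(floor((4278190081 - 4278190080) / 256 / 256),
//     floor((4278190081 - 4278190080 - (256 * (floor((4278190081 - 4278190080) / 256 / 256) * 256))) / 256),
//     mod(4278190081 - 4278190080, 256))
// = rgb(0, 0, 1)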
Colors:
first 165536 // the size of the dataset could be adjusted here
load *,
    rowno() as RowNo,
    ceil(rowno() / 256) as ColorClusterSmall,
    ceil(rowno() / 256 / 256) as ColorClusterBig,
    color(rowno()) as ColorRowNoText,   // record number used as color-index
    num(color(rowno())) as ColorRowNoNum,
    (256 * 256 * 256 * 255) + rowno() - 1 as ColorIndex,
    $(eRGB(ColorNum)) as ColorIndexRGB; // reverse calculation from the numeric color value back to the rgb-notation
load
    R, G, B,
    rgb(R, G, B) as ColorText,
    num(rgb(R, G, B)) as ColorNum,
    dual(rgb(R, G, B), num(rgb(R, G, B))) as ColorDual,
    argb(0, R, G, B) as ColorTextARGB,
    num(argb(0, R, G, B)) as ColorNumARGB;
load *, iterno() - 1 as B while iterno() - 1 <= 255;    // adds B = 0..255 for each R/G combination
load recno() - 1 as R, iterno() - 1 as G
autogenerate 256 while iterno() - 1 <= 255;             // R = 0..255 (recno) x G = 0..255 (iterno)
This script creates, with a nested loop (1 x autogenerate + 2 while), all possible combinations of the rgb colors = 256 * 256 * 256 = 16,777,216 records (if you remove the first statement) and builds from them the colors as string, numeric and dual values. On top of that there are additional clusters, color functions which use the record number as color-index, and a function (with the help of the above variables) which calculates the rgb-notation back from the color-index.
And instead of rgb() or a color function like white() you could just use the color-index to specify your wanted color. Further included is a similar (reduced) logic for argb() to show why the rgb-index starts with an offset of 4,278,190,080, because the color-index for argb() starts at 1 and needs all these numbers. You should comment out or drop fields which have no real use for you to keep the application handy.
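For example, in a chart's color expression the following three variants should all return the same red - the last one is just the numeric color-index taken from the table above:
// three ways to return red in a color expression
red()
rgb(255, 0, 0)
4294901760   // = 4278190080 + 255 * 256 * 256, the color-index of rgb(255, 0, 0)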
The clusters are just intended to provide a selection possibility for certain areas, with which the size of a table in the UI could be reduced, because without any selection it might take some time to calculate and render the objects.
I hope this helps you to get a better understanding of what happens when using colors in Qlik, and here you can find a practical approach to implementing colors within charts: https://community.qlik.com/blogs/qlikviewdesignblog/2012/12/04/colors-in-charts.
Happy Qliking!
I think there may be something wrong in your demo document. The value 4,278,190,080, which you are calling "an offset", is actually a value of full opacity (corresponding to the leftmost FF in your hexadecimal RGB values). Indeed, the Color Functions help states that the functions RGB(), HSL() and syscolor() all return an (ARGB) color value with Alpha = 255, and that should make the underlying numerical RGB and ARGB values identical. That is not the case in your Records table (numerical RGB(0,0,0) = 4278190080 while numerical ARGB(0,0,0,0) = 0). Maybe change the fixed 0 value in your ARGB() calls into a 255 value to get full opacity. Alpha = 0 means full transparency, and that produces 256*256*256 identical (invisible) colors.
[Edit] Corrected the total color count for invisible (Alpha=0) colors. Thanks Marcus.
Many thanks Peter for the hints. My explanation of the numerical start of the color-index of the rgb colors was related to this, and therefore I called it an offset, because it doesn't start at 1 but at 4278190080. Maybe it's a bit confusing, but I didn't know how to explain it better, and therefore I included argb(0,0,0,0) to show that the index range starting at 1 belongs to them.
Further, you are right that argb(255,0,0,0) is identical to rgb(0,0,0) respectively black(), and not only by the result but also by the color-index. Like you hinted, all argb() with an alpha parameter of 255 are equal to rgb() and share the same color-index. This means there is no separate color-index range for this 256th transparency (alpha) value, and therefore rgb() starts at 256 * 256 * 256 * 255 = 4278190080.
I think the fact that there is a full range of color-indexes for the 0 alpha parameter of argb() has technical reasons, for example that a color must always be specified and that the alpha parameter might later be changed dynamically - like certain objects/windows reacting to certain settings/actions, or maybe like the whole Windows screen becoming a bit transparent when a Windows UAC message pops up.
- Marcus
Excellent in-depth treatment of the topic Marcus, thank you!
Something else to consider. When using this technique, ApplyMap() is often the vehicle for performing the {string}+{numeric} pairings. However, there may be times when the same color is to be reused in a single pass. Since Dual() accepts only the first string paired with a given numeric and discards any further strings that want to reuse that same numeric, some distinction is needed to reuse the same color multiple times in a single pass.
[COLORS]:
LOAD * INLINE [
DIMENSION, STRING, NUMERIC
Product_Category, Shoes, "RGB(100,255,100)"
Product_Category, Hats, "RGB(0,255,255)"
Product_Category, Shirts, "RGB(100,255,100)"
Product_Category, Pants, "RGB(75,75,255)"
];
[MAP_COLORS]:
MAPPING LOAD STRING, evaluate(NUMERIC) RESIDENT COLORS;
[FACT]:
LOAD Dual([STRING], ApplyMap('MAP_COLORS',[STRING],'*default color when map missed?')) AS [Product_Category]
RESIDENT COLORS;
/*
An additional consideration when attempting to assign the same color to more than one string in a single color-mapping pass:
Once dual() pairs a numeric to a string, that numeric ignores any further attempts to pair new strings to the same value.
The mapping table cannot complete 'Shirts' + RGB(100,255,100) because the string 'Shoes' has already "reserved" that number.
In order to re-use the same color for multiple possible string matches, a small fraction can create distinction in the numerical value while preserving the essential color-index.
*/
[FACT]:
LOAD Dual([STRING], ApplyMap('MAP_COLORS',[STRING],'*default color when map missed?') + Rand() ) AS [Product_Category]
RESIDENT COLORS;
Caveats:
Desktop - this technique will consistently work on all desktop installations.
Desktop w/ web view - this has both worked and failed in desktop web view.
Server via AccessPoint - this technique has typically not worked when migrating to the server (and the resultant workaround usually involves moving the numeric values into a new stand-alone leaf table; a minimal sketch of that workaround follows below).
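A minimal sketch of that leaf-table workaround, assuming the COLORS table from above (the field and table names Product_Category_Color etc. are only illustrative, not from the original post):
// small attribute (leaf) table, associated to the fact table via Product_Category
[PRODUCT_CATEGORY_ATTRIBUTES]:
LOAD
    STRING AS [Product_Category],
    evaluate(NUMERIC) AS [Product_Category_Color]
RESIDENT COLORS;

// the fact table keeps the plain string key ...
[FACT]:
LOAD STRING AS [Product_Category]
RESIDENT COLORS;
// ... and the UI reads the color through the association, e.g. Only([Product_Category_Color])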
Just a try - what happens by making the mapping values unique with something like:
[MAP_COLORS]:
MAPPING LOAD
STRING,
evaluate(NUMERIC) + rand() // (1 / rowno()) - to be sure
RESIDENT COLORS;
[FACT]:
LOAD
Dual([STRING],
floor(ApplyMap('MAP_COLORS',[STRING],
'*default color when map missed?'))) AS [Product_Category]
RESIDENT COLORS;
with or without the floor() in the dual().
- Marcus
Marcus Sommer wrote:
Just a try - what happens by making the mapping values unique with something like:
[MAP_COLORS]:
MAPPING LOAD
STRING,
evaluate(NUMERIC) + rand() // (1 / rowno()) - to be sure
RESIDENT COLORS;
[FACT]:
LOAD
Dual([STRING],
floor(ApplyMap('MAP_COLORS',[STRING],
'*default color when map missed?'))) AS [Product_Category]
RESIDENT COLORS;
with or without the floor() in the dual().
- Marcus
Hello Marcus. Trying your example in desktop: When floor() is present, values 'Shirts' & 'Shoes' collapse and only 3 distinct values remain in [Product_Category] ('Shirts' is lost). When floor() is removed, the 4 original values remain, and color assignments are all completed.
After all, I'm not sure if a dual() in this color context really adds value, or if it just adds a complexity which could be resolved more easily by using two fields. Especially as I doubt that there is a significant benefit in regard to RAM usage (you might check it with the mem files), and maybe it even has further disadvantages if the category fields need to be applied in aggr/total statements (not checked, but I could imagine that there are side-effects there as well).
- Marcus
Yeah, I think the original intent was to compress the number of fields & tables, but the approach requires the pointer for the numeric value at each row, and that could be a lot bigger in RAM than a distinct list of dimension values as a leaf table with the corresponding sort, color, etc. (i.e. 10 distinct dimension values, assigned 10 colors, repeating across 5 million row pointers, vs. a leaf table of merely 10 rows associated to the main fact, with breakout fields for sort/color).
The leaf table approach is often the one I finally end up with (especially when the dual() evaluations cease functioning once arriving on the server).
The only other reasoning was the assertion that performance is impacted when calculations have to cross multiple table associations to get all the necessary components. This means the leaf tables holding the colors, sorts, etc. now involve one additional hop (and each dimension field requires its own leaf table, since these attributes are assigned per dimension field).
An alternative could be to put dedicated attribute fields in the main fact table (i.e. a single table data model with a column for PRODUCT and a separate field for PRODUCT_COLOR & PRODUCT_SORT). This would use up more RAM, repeating color & sort info pointers for each fact row, but would absorb the leaf tables, meaning UI expressions don't have to traverse associations. Would this be exchanging RAM for CPU?
//Leaf tables
FOR EACH Dimn in 'A','B','C','D','E','F','G'
[DIMN_$(Dimn)_ATTRIBUTES]:
NOCONCATENATE LOAD
1 AS DIMN_$(Dimn),
1 AS DIMN_SORT_$(Dimn),
1 AS DIMN_COLOR_$(Dimn)
AUTOGENERATE(1);
NEXT
[FACT]:
LOAD
1 AS DIMN_A,
1 AS DIMN_B,
1 AS DIMN_C,
1 AS DIMN_D,
1 AS DIMN_E,
1 AS DIMN_F,
1 AS DIMN_G,
1 AS FACT_1,
1 AS FACT_2,
1 AS [FACT_ETC...]
AUTOGENERATE(1);
//Dimension attributes stored in fact table
[FACT]:
LOAD
1 AS DIMN_A,
1 AS DIMN_B,
1 AS DIMN_C,
1 AS DIMN_D,
1 AS DIMN_E,
1 AS DIMN_F,
1 AS DIMN_G,
1 AS FACT_1,
1 AS FACT_2,
1 AS [FACT_ETC...]
AUTOGENERATE(1);
FOR EACH Dimn in 'A','B','C','D','E','F','G'
JOIN(FACT)
LOAD
1 AS DIMN_$(Dimn),
1 AS DIMN_SORT_$(Dimn),
1 AS DIMN_COLOR_$(Dimn)
AUTOGENERATE(1);
NEXT
//Dual encapsulation of 1 set of dimensional attributes
[FACT]:
LOAD
Dual(1,1) AS DIMN_A,
Dual(1,1) AS DIMN_B,
Dual(1,1) AS DIMN_C,
Dual(1,1) AS DIMN_D,
Dual(1,1) AS DIMN_E,
Dual(1,1) AS DIMN_F,
Dual(1,1) AS DIMN_G,
1 AS FACT_1,
1 AS FACT_2,
1 AS [FACT_ETC...]
AUTOGENERATE(1);
~E
Indeed, it's a question of in which direction an application could/should be optimized, and this depends mostly on the requirements for UI response times and, of course, on which area (CPU, RAM, ...) is the biggest bottleneck.
I think, without a critical performance issue in such an application, I would tend towards a normal fact table and putting the needed colors, sorting and other attributes into dimension tables - whereby I would check if I could merge them into as few tables as possible. This means I believe that your example above with 7 dimension tables is rather unusual; often there will be just 1 - 3 hierarchical dimensions, so that a single dimension table might be sufficient.
Another point which could often be saved is the sorting topic, which could be handled by pre-loading the distinct dimension values in the wished order, then loading all script parts normally, and in the end dropping these dummy tables again. With this approach the load order can be used for all sorting; no additional fields and RAM are needed, and there is also no processing for applying the sorting within sort expressions.
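A minimal sketch of this load-order approach, assuming the Product_Category example from above (the table name and values are only illustrative):
// 1. pre-load the distinct dimension values in the wished order
[SortOrder_Product_Category]:
LOAD * INLINE [
Product_Category
Shoes
Hats
Shirts
Pants
];

// 2. ... load all the regular script parts here - the values above are already registered in the wished order ...

// 3. drop the dummy table again; the load order of the values remains
DROP TABLE [SortOrder_Product_Category];

// in the UI, set the sort order of Product_Category to "Load Order" - no extra sort fields or sort expressions are needed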
Of course, if anything is critical in an application, it would be necessary to dive deeply into the various designs and measure their differences.
- Marcus