2013

(un)originality

Posted by Michael Anthony Dec 27, 2013

Who you are is the product of all of the experiences you have had, and not had, throughout your lifetime. Nobody operates in complete isolation. Everyone is influenced by sources outside of themselves. We take those experiences and internalize them with our other memories in our own way but ultimately everything you come in contact with serves as material for the future you. So it stands to reason that new ideas & creativity are also the result of taking existing ideas and transforming them.

 

People frequently talk about ideas/people as being "totally original," but the truth is that originality is rather unoriginal. People with seemingly totally new ideas are really just the result of taking existing concepts and bringing them together in new ways. Perhaps you can identify the original source material, perhaps you can not, but everyone is influenced by ideas outside of themselves and nobody creates something entirely new.

 

The 4 part video series Everything is a Remix is a fantastic exploration of this in action. From music, to film, to mechanical invention everyone is influenced by the work of others.

You wake up in the morning and head down to the living room; faintly you hear a lingering “hohoho” from the chimney. Santa was here – and he left us something wonderful!


Santa is early this year, not only did he bring us a new and shiny service release for QlikView he also included a free to use mapping extension!

 

When you install the latest version of QlikView, http://www.qlik.com/download/, we now ship an Example Extension Object that makes use of OpenLayers and MapQuest.

So how do I make use of this sweet nectar you say?
Let me take you on a journey and explore some mapping possibilities!

  1. Head out and install the latest version of QlikView, make sure you install the examples.
  2. Navigate to C:\Program Files\QlikView\Examples\Extensions and double click the “Extensions Examples.qar” file. This will install all of the extension examples.
  3. Open the QlikView “Extension Examples that you can find in C:\Program Files\QlikView\Examples\Documents\
  4. In the Mapping tabs you will find examples on how to plot either dots/points, lines or polygons.

 

map.png

 

Attached to this post you will also find a dataset and an app that contains all of the high speed cameras in Sweden with corresponding latitude and longitude points if you want to play around with the extension, make sure you install the extension first, on your own.

 

Keep on Qliking!

 

Keep in mind

Extensions are generally built upon web technologies such as HTML and JavaScript and for QlikView to be able to render these objects on the screen you will need to run QlikView Desktop with WebView mode enabled or access the document through the AJAX-client over AccessPoint. The IE-plugin does not support extensions.

 

Disclaimer

The QlikView Mapping Example Extension can be configured to use many different map tile sources.

Each map tiles source has its own terms and conditions and the user assume all responsibility for the selection of a source for map tiles and for compliance with the terms and conditions of the selected source. Any and all liability associated with the selection of a tile source and the compliance with the terms and conditions of the selected source is hereby disclaimed.

Henric Cronström

Rounding Errors

Posted by Henric Cronström Dec 17, 2013

If you use equality as a condition when comparing floats, I will flunk you!

 

I can still hear the words of the Professor in my first programming class when studying for my engineering degree. The threat was very real – he meant it – and the reason was of course the fact that you cannot (always) represent decimal numbers in an exact binary form.

 

For example, we would never dream of writing a condition

 

  If( x = 0.3333333 , … )

 

when we want to test if x equals a third. Never. Because we know that a third cannot be represented exactly as a decimal number. No matter how many threes we add to the number, it will still not be exact.

 

But it is not uncommon that people make comparisons with an exact decimal number, similar to

 

      If( x = 0.01 , … )

 

thinking that it is a valid comparison, although it leads to exactly the same problem as the previous comparison! This becomes obvious if you look at the hexadecimal representation of 0.01:

 

      0.01 (decimal) = 0.028F5C28F5C28F…. (hex)

 

The sequence …28F5C… is repeated an infinite number of times, but since QlikView uses a finite number of binary digits (all according to the IEEE standard), QlikView will internally use a “rounded” number.

 

So what are the consequences? Well, QlikView will sometimes deliver the “wrong” number as result. Examples:

 

      Ceil( 0.15, 0.01 ) will return 0.16

      Floor( 0.34, 0.01 ) will return 0.33

      0.175*1000 = 175 will return FALSE

      Time( Floor( Time#( '04:00:00' ),1/24/60/60 )) will return 03:59:59

 

What you see are not errors in QlikView. And they are not errors in IEEE 754. Rather, they represent errors in the expectation and usage of binary floating point numbers. Once you understand what binary floating point numbers really are, it makes perfect sense. It's simply that some values cannot be exactly represented as binary numbers, so you get rounding errors. There's no way around it.

 

Should you want to investigate this yourself, I suggest you start with the following script that generates 100 numbers and their rounded counterparts. In five cases the Ceil() function rounds "incorrectly" and generates a "Diff" different from zero:

 

Load

   Num(Rounded,'(HEX) 0.000000000000000','.',' ') as RoundedHEX,

   (Round(100*Rounded) - PartsPer100)/100 as Diff,

   *;

Load

   Ceil(PartsPer100/100, 0.01) as Rounded,

   *;

Load
   RecNo() as PartsPer100

   Autogenerate 100 ;

 

So, what should you do?

 

First of all, you should realize that the rounding errors are small and usually insignificant. In most cases they will not affect the result of the analysis.

 

Further, you could avoid rounding with Floor() and Ceil() to sub-integer fractions.

 

Also, you could convert the numbers to integers, because the errors will only appear if the numbers can have sub-integer components. For instance, if you know that you always deal with dollars and cents, you could convert the numbers to (integer) cents:

 

  Round( 100*Amount ) as Cents

 

Or if you know that you never deal with time units smaller than seconds:

 

  Round( 24*60*60*Time#( Time, 'hh:mm:ss' ) ) as Seconds

 

And finally, you should never use equality as a condition when comparing floats. Use greater than or less than. My professor isn’t here to flunk you, but rest assured: In his absence, QlikView will do it for him.

 

HIC

I often use some sort of mapping in the QlikView applications I create to manipulate the data.  Mapping functions and statements provide developers with a way to replace or modify field values when the script is run.  By simply adding a mapping table to the script, field values can be modified when the script is run using functions and statements such as the ApplyMap() function, the MapSubstring() function and the Map … using statement.

 

Let’s take a look at how easy it is to use mapping in a QlikView application.  Assume our raw data looks like this:

Data.png

 

You can see the country United States of America was entered in various ways.  If I wanted to modify the country values so that US was used to indicate the United States of America, I could add a mapping table like this to map all the variations of the United States of America to be US.

CountryMap.png

 

Once I have a mapping table, I can start using it.  I usually use the ApplyMap() function when I am mapping.  The script below will map the Country field when this table is loaded.

ApplyMap.png

 

The results are a table like the one below where all the Country values are consistent, even the one that was misspelled (Country field for ID 4).  The mapping handled all the variations that were entered in the data source and when the mapping value was not found the default ‘US’ was used.

ApplyMap results.png

 

Now I could have also used the Map … using statement to handle the mapping.  Personally, I have never used this statement but if you had many tables that loaded the Country field and you wanted to map each of them, Map … using provides an easier way of doing it with fewer changes to the script.  After loading the mapping table, you can say:

map using.png

     ...

     load data

     ...

unmap.png

 

This will map the Country field using the CountryMap until it reached the Unmap statement or the end of the script.  The main difference between this and the ApplyMap() function is with the Map … using statement, the map is applied when the field is stored to the internal table versus when the field is encountered.

 

One last mapping function that is available in QlikView is the MapSubstring() function that allow you to map parts of a field.  Using the mapping table below, the numeric data in the Code field is replace with the text value.

 

Before MapSubstring() function is used:

Before.png

CodeMap.png

Mapsubstring.png

After MapSubstring() function is used:

After Mapsubstring.png

 

The numeric values in the Code field were replaced with the text values.

 

Mapping is a powerful feature of QlikView that I use in just about every application.  It allows me to “clean up” the data and format it in a consistent manner.  I often use it to help scramble data when I have many values that I need to replace with dummy data.  So the next time you are editing or “fixing” the data in your data source, consider mapping.  Check out the technical brief I wrote on this topic.

 

Thanks,

Jennell

“Which products contribute to the first 80% of our turnover?”

 

This type of question is common in all types of business intelligence. I say “type of question” since it appears in many different forms: Sometimes it concerns products, but it can just as well concern customers, suppliers or sales people. It can really be any dimension. Further, here the question was about turnover, but it can just as well be number of support cases, or number of defect deliveries, etc. It can in principle be any additive measure.

 

It is called Pareto analysis. Sometimes also known as 80/20 analysis or ABC analysis.

 

The logic is that you first sort the products according to size, then accumulate the numbers, and finally calculate the accumulated measure as a percentage of the total. The products contributing to the first 80% are your best products; your “A” products. The next 10% are your “B” products, and the last 10% are your “C” products.

 

Pareto chart.png

 

And here’s how you do it in QlikView:

  1. Create a pivot table and choose your dimension and your basic measure. In my example, I use Product and Sum(Sales).
  2. Sort the chart descending by using the measure Sum(Sales) as sort expression. It is not enough just to check “Sort by Y-value”.
  3. Add a second expression to calculate the accumulated sales value:
         RangeSum(Above(Sum(Sales), 0, RowNo()))
    Call this expression Accumulated Sales. The Above() function will return an array of values – all above values in the chart – and the RangeSum() function will sum these numbers.
  4. Create a third expression from the previous one; one that calculates the accumulated sales in percent:
         RangeSum(Above(Sum(Sales), 0, RowNo())) / Sum(total Sales)
    Format it as a percentage and call it Inclusive Percentage.
  5. Create a fourth expression from the previous one; one that calculates the accumulated sales in percent, but this time excluding the current row:
          RangeSum(Above(Sum(Sales), 1, RowNo())) / Sum(total Sales)
    Format it as a percentage and call it Exclusive Percentage.
  6. Create a fifth expression for the ABC classification:
          If([Exclusive Percentage] <= 0.8, 'A', If([Exclusive Percentage] <= 0.9, 'B', 'C'))
    Call this expression Pareto Class. The reason why the Exclusive Percentage is used, is that the classification should be determined by the lower bound of a product’s segment, not the upper.
  7. Create a conditional background color, e.g.
          If([Pareto Class] = 'C', LightRed(), If([Pareto Class] = 'B', Yellow()))

 

You should now have a table similar to the following. In it you can clearly see the classification of different products.

 

Table.png

 

In this table, there are five different expressions that you can use for Pareto analysis. The graph in the beginning of this post uses Sales and Inclusive Percentage for the bars and the line, respectively; and Pareto Class for the coloring of the bars.

 

Further, you may want to combine the Pareto Class and the Exclusive Percentage into one expression:

 

     Pareto Class =
         If(RangeSum(Above(Sum(Sales),1,RowNo())) / Sum(total Sales) <= 0.8, 'A',
         If(RangeSum(Above(Sum(Sales),1,RowNo())) / Sum(total Sales) <= 0.9, 'B', 'C'))

 

Good luck in creating your Pareto chart.

 

HIC

 

Further reading related to this topic:

Recipe for an ABC Analysis

Recipe for a Pareto Analysis – Revisited

The search functionality is central to QlikView. You enter a string, and QlikView immediately searches in the active list box and displays the matches. But what really defines a match? For example, should you find strings containing ‘Š’ when your search string contains an ‘S’? Or ‘Ä’ when you search for ‘A’?

 

These may be odd questions for people with English as first language, but for the rest of us who use “strange” characters daily, these questions are important as the answers affect not just search results, but also sort orders.

 

It is called Collation.

 

A collation algorithm defines a process of how to compare two given character strings and decide if they match and also which string should come before the other. So, the collation affects everything from which search result you get in a query, to how the phone directory is sorted.

 

Basically the collation is defined differently in different languages. Examples:

 

  • The English collation considers A, Å and Ä to be variants of the same letter (matching in searches and sorted together), but the Swedish collation does the opposite: it considers them to be different letters.
  • The English collation considers V and W to be different letters (not matching, and not sorted together), but the Swedish collation does the opposite: it considers them to be variants of the same letter.
  • Most Slavic languages consider S and Š to be different letters, whereas most other languages consider them to be variants of the same letter.
  • In German, Ö is considered to be a variant of O, but in Nordic and Turkish languages it is considered a separate letter.
  • In most western languages I is the upper case version of i, but in Turkish languages, I is the upper case of dotless ı, and İ (dotted) is the upper case of dotted i.

 

An example of how these differences affect sort orders and search results can be seen in the pictures below:

 

English.png   Swedish.png

 

The search string is the same in both cases, and should match all field values that have words beginning with ‘a’ or ‘v’. Note that sort orders as well as search results differ.

 

Hence: A number of differences exist between languages that have special characters or characters with diacritic marks, e.g. Å, Ä Ö, Æ, Ø, Þ, Ś, Ł, Î, Č. Sometimes these characters are considered as separate letters, sometimes not. Some languages even have collation rules for letter combinations and for where in the word an accent is found. An overview can be found on Wikipedia.

 

So, how does QlikView handle this?

 

When QlikView is started, the collation information is fetched from the regional settings of the operating system. This information is then stored into the qvw file when the script is run.

 

Locale.png

 

 

Usually you don’t need to think about this, but should you want to test it yourself, just change the regional settings in the control panel (the Formats tab – not the Location tab), restart QlikView, and run the script of your application.

 

Bottom line – should you need to change the collation, you should do it on the computer where the script is run.

 

HIC

 

Further reading related to this topic:

Text searches

The Search String

The Expression Search

Filter Blog

By date:
By tag: