Pretty fast Pattern Collector

    Hi all,

     

    inspired by Barry’s great post “A very elegant link table script” about speeding up link table generation in Qlik by using the FieldValue function I looked for other use cases. Operations on the symbols of loaded tables in Qlik are blazing fast compared to record based operations (join, group by, etc.)…

     

    Considering an ETL chain with Data Quality in mind one typical task is to check the data pre-load to discover issues (eg. unallowed formats or patterns in data, value ranges, thresholds, missing dimensional values etc.) to have an ability to stop loading processes in an early stage. Data Profiling is a usual task to solve these kind of problems.

     

    This sparked the idea in me to create a Pretty fast Pattern Collector based on the above principle:

    Looping over all fields and symbols of a loaded table to collect all formats and patterns from the data values

    Pseudocode Pattern Collector

    The performance of this simplified Data Profiling approach does not depend on the amount of records but on the amount of symbols: distinct values per field

     

    The trade off is, you do not have a relation from the symbol to the record and so you cannot do aggregations like count, avg or sum, although min/max will work.

    Pattern Collector App

    I was surprised about the load speed during my testings! Just try yourself and download the QlikView app here. The Load Script code will work in Qlik Sense also.

     

    I’m curious what other use cases would exist for using the speed of Qlik’s symbol processing.

     

    Happy profiling!

    Ralf