Qlik Sense App: Generic Data Profiler

stevedark · Oct 11, 2017 5:36:31 PM

This app has a number of objects that work together to allow you to see the frequency of values on any field in your own apps. The field is selected from a Filter Pane and then a Bar Chart object shows the frequency of values within the selected field. There is also a Filter Pane on the selected field, allowing searching on that field.

The objects can be copied and pasted into any Sense document, allowing this functionality on any data set.

A full tutorial on how to build the components in this app, using only standard Sense features, is given in a blog post which you can find here:

https://www.quickintelligence.co.uk/qlik-sense-data-profiler/

This app is a Sense port of our popular QlikView app, which can be found here:

QlikView App - Generic Data Profiler

I hope that you find the application useful. You will find other applications that I have uploaded under my profile on QlikCommunity or on our Downloads page..

Steve

http://www.quickintelligence.co.uk/qlikview-blog/

sergeyko · ‎2021-07-02

Hi @stevedark ,

This is an awesome application - very useful! Thank you for sharing.

I see you use Minstring() to fetch the first of selected dimensions and use it as an input for your functions that compute the descriptive statistics for the target field.

To echo @ajunaidm, it would be amazing to enable showing the same key stats for multiple / all fields in a given table. Let's say, I select 5 dimensions and see all of them in a table that shows Min, Max, # of Nulls, etc. stats for each field.

I see your replay above but was wondering if you came across a more universal solution in the last couple of years.

I saw a solution in the latest tips and tricks app that leverages the advanced Trellis container option on top of your framework to get us VERY close (see "Field overview and distribution" example).

@Patric_Nordstrom- FYI and thanks for the great tip! Can you think of any way to make multiple it work for a straight table (i.e. shows multiple dimensions with descriptive stats)?

https://community.qlik.com/t5/Documents/Top-10-Viz-tips-QlikWorld-2021-part-V/ta-p/1787570

stevedark · ‎2021-07-03

Hi @sergeyko

There is more and more profiling being baked into Qlik Sense, which is great, it may eventually make my Profiler sheet redundant.

I still believe trying to do the stats over all fields at once would be too heavy to have in real time on the front end.

It would be possible to write load script that loops through all of the fields in the data model and then creates the metadata for it, it could go as far as working out all of the min and max values and even inserting all unique values with a count into a profile table.

Perhaps a more sensible way of getting these sort of data is to look at the stats that are included in the header of the QVD files that make up the application. I talk about this approach here:
https://www.quickintelligence.co.uk/read-meta-data-qlikview-qvd/

Other things you might want to look at include CatWalk from Qlik:
https://catwalk.core.qlik.com/

And QSDA by Rob Wunderlich:
https://easyqlik.com/qsda/

Hope that helps.

Steve

sergeyko · ‎2021-07-05

Hi @stevedark ,

Thank you for weighing in and sharing some useful resources.

I am certainly using your awesome QVD Meta Data Viewer app for reviewing high-level stats around our QVDs and their fields. I wish they expose more information in the QVD header (nulls, min/max, avg, etc.).

My current task is to explore the data model (flat table) created after merging and cleaning the data from a few QVD files. Doing it one field at a time won't be feasible as I am dealing with a few dozens of fields and I am trying to understand which ones are more and less suitable for predictive analytics (e.g. a barely populated field is not a good fit). Essentially, looking to implement an equivalent of Pandas.DataFrame.Describe() within Qlik. The Python package seems to handle pretty large data sets well.

If there is a hack to make it cycle through multiple fields on the front-end, I'd be willing to take my chances.

Otherwise, I am very curious to learn your thoughts on how to build the required logic that will calculate the descriptive statistics for each field in a given table as part of the loading script.

I really appreciate your help.

stevedark · ‎2021-07-05

Hi @sergeyko

What I was thinking (and I've not tried it so no idea if it will work) was enumerating around each field in each QVD and put it into a simple table, then use the Fieldname as the dimension:

let sTable = 'TableName';

for iField = 0 to NoOfFields(sTable)
let sField = FieldName(iField, sTable);

Fields:
LOAD
'$(sTable)' as Table,
'$(sField)' as Field,
[$(sField)] as Value,
sum(1) as [No Of Rows]
RESIDENT [$(sTable)]
GROUP BY [$(sField)]
;
next

You could, if you wanted, do a similar loop for all of the tables in your data model also.

Calculations over that in the front end should then be relatively quick.

Let me know how you get on!

Cheers,

Steve

stevedark · ‎2021-07-06

Hi @sergeyko

Script looks good. I did consider grouping to one row per field, but thought that having all the field values would give more flexibility in the front end. You could generate the bar chart of frequencies, for instance, and the reduced number of columns and rows would mean it would still function well.

You could expand that to include the length and type etc. of each field, to make the average calculations more efficient in the front end (you would need to remember to do sum(Length*[No Of Rows]) / sum([No Of Rows]) when doing the average length though, to avoid skew.

Bizarrely you may find that loading from QVD many times (from a fast local disk) is as quick as the resident.

You can combine this code with the code from the QVD header app to profile every field in every QVD you have in a folder.

Glad you got something working, hope it gives you all you need.

Steve

sergeyko · ‎2021-07-06

Hey @stevedark - thanks for the additional feedback,

I've updated the code above based on some additional experiments with processing larger data sets. The version with the temporary table showed the best performance on my file with 500k rows and 500 columns (loaded in ~25 min).

Love the recommendations about adding new data points and pulling data from the respective QVD headers. While it is not on my critical path at this time, I will definitely look into my descriptive stats logic in the future.

Oh, and by the way, I found an answer to my initial question on how to implement some of these descriptive stats for multiple fields on the front-end. Refer to the solution in the following post (the ingenuity of Qlik experts never ceases to amaze me!) but proceed at your own risk. These are heavy computations that can every impact your app's performance at scale.

https://community.qlik.com/t5/QlikView-App-Dev/Chart-with-count-of-Field-values/td-p/503655