Skip to main content
Announcements
See why Qlik was named a Leader in the 2024 Gartner® Magic Quadrant™ for Data Integration Tools for the ninth year in a row: Get the report

QlikView App - Generic Data Profiler

cancel
Showing results for 
Search instead for 
Did you mean: 
stevedark
Partner Ambassador/MVP
Partner Ambassador/MVP

QlikView App - Generic Data Profiler

Last Update:

Apr 8, 2023 9:14:45 AM

Updated By:

stevedark

Created date:

Oct 16, 2013 1:07:31 PM

Attachments

Often when building QlikView applications, or picking up applications which have been built by someone else I want to have a quick and easy way of viewing the data that is in that application.  To enable me to do this I have put together a page of objects that lists all tables and fields in the data model and then gives outline information about any selected field.  These objects can be copied and pasted into any QlikView application to view the data model of that document.

 

I have documented how this document works and the reasons why you might use it in a blog post here:

 

https://www.quickintelligence.co.uk/qlikview-data-profiler/

 

Please see the blog post for further details on using this document.

 

There is now a Qlik Sense version of this app, which you can find on Qlik Community here:

Qlik Sense App: Generic Data Profiler

 

I hope that you find it useful.  You will find other applications that I have uploaded under my profile on QlikCommunity, or on our Downloads Page.

 

Steve

https://www.quickintelligence.co.uk/blog/

 

PLEASE NOTE: The 'With Mask' version of the file includes an experimental tab that may or may not work well on large data sets.  Please see comment below for details.  If you are not sure which to download go for DataProfiler.qvw.  Thanks!

Comments
Not applicable

thank you.  I really like this.  was very simple to copy over to my applications.  One recommendation to add to the stats box. the actual count,  there is only the distinct count,  I added the count above the distinct count.  Helped me to see if a value in one field was in many records or just a few so I could quickly determine if it is a one-off entry error or a bigger problem.

stevedark
Partner Ambassador/MVP
Partner Ambassador/MVP

Hi Debbie, I usually show a row count in the caption bar of objects like that.  Typically by summing a counter I add to each table.  That was not done here as I wanted it generic and not dependant on anything done in the data model.  Adding it in as a new value is a good idea.

DavidFoster1
Specialist
Specialist

Hi Steve

this is a great app. One thing I have seen in commercial data profiling tools are data-masks. e.g. 2345.55 would have a mask of 999.99 while Steve123 would have a mask of Aaaaa123. Having masks provide a great method for spotting potentially bad data when data is expected to conform to a certain mask or set of masks (e.g. UK postcodes, US ZipCodes, Email addresses, etc)

I was thinking that you could achieve the desired effect with a replace function, but then realised you would need at least 66 replace layers to replace both cases of the alphabet (just basic western characters and numerics).

Can you think of a cleverer approach?

mikkeltaylor
Creator II
Creator II

This would be a great addition.  We have a DQ tool from Informatica which we use to show 'masking', but would love to see this in the data profiler so that we can use it more widely.

stevedark
Partner Ambassador/MVP
Partner Ambassador/MVP

Hi David / Mike,

I couldn't get to quite the mask you were after, in the time I had available, but I have come up with something that I think is equally if not more useful.

The mask is built from counts of each type of character, Upper Case, Lower Case, Number and Symbol (spaces are ignored).

So for example;

28.5% would be n3 s2.

Hello World would be A2 a8.

AbC 23*&6 would be A2 a1 n3 s2

You get the picture.  I'm deriving this by setting up variables (in an overcomplicated way - I should have typed them) for each of the types.  A number of subtractions of a purgechar version of the LEN vs. the full LEN are then performed to derive the code.

Being short snappy masks there will be far fewer distinct masks and therefore strange values will stand out.  For instance, UK postcodes would all be A4 n3 or thereabouts.

Rather than posting all the code I will post the app to this thread.  It will invariably be moderated for a while though...

If I think of a way to come up with Aaaa111, but I can't think of a way without a loop - which would not be good news (or very easy to do).  I'm not even sure how performant this solution will be on a large data set.

Steve

stevedark
Partner Ambassador/MVP
Partner Ambassador/MVP

Hi Debbie,

When adding in the mask for the above comment I also tweaked the distinct count code to read thus:

num(count(DISTINCT [$(=minstring($Field))]), '#,##0') & ' of ' &

num(count([$(=minstring($Field))]), '#,##0') & '  ('

& num(count(DISTINCT [$(=minstring($Field))]) / count([$(=minstring($Field))]), '#,##0.0%') &

' unique)'

This gives the current distinct count, plus the count of rows, and how distinct the values in that column are.

Hope that's helpful.

Steve

DavidFoster1
Specialist
Specialist

I think you are right about not being able to do this on the front end. It would need character-by-character parsing in the script.

stevedark
Partner Ambassador/MVP
Partner Ambassador/MVP

In the example I put together, the translation of values into masks is done in the front end (as a calculated dimension).  This is why I suspect it would be very slow over a larger set of data.

stevedark
Partner Ambassador/MVP
Partner Ambassador/MVP

Hi Saran,

Glad you have found it useful. I tend to copy the objects onto just about every app I create.

Cheers,

Steve

rebelfox
Creator
Creator

Rather lovely!  Thanks.

Contributors
Version history
Last update:
‎2023-04-08 09:14 AM
Updated by: