CSV Data Integrity - Table Analyser



    During my time working with QlikView I'm often asked to imports large CSV's. Unfortunately they're not always exported from databases with trusted data integrity and before work can begin on dashboard design the process of checking fields has to come first.


    Unless you have access to tools to do this it can become a long process and to aid this I put together a QVW which loops through each column and field performing some simple test of the data.


    Date Check: The code checks each column where the name includes the word “Date”. This can be modified in the code. Each field value is checked against possible date formats and identified as either a good or bad date. Bad date distinct values are recorded for further analysis


    Value Frequency: The code loads each distinct value for each column and counts the frequency they're encountered.


    Distinct Values:  Counts the number of distinct values for each field, again after converting the value to lower case and again after trimming leading & trailing spaces from the value.


    Distinct N/A: Uses a mapping table to convert possible N/A entries into a consistent value and counts the number found for each column.


    Null Values: Counts the number of empty records for each column


    Data Types: Checks to see which data type is found for each value of a column. This could be either a number or text. If all fields are returned as Number or Text the field is assigned this property. If a mixture is returned this is recorded.


    This is a prototype QVW and was written with a single project in mind. If you think this is useful please feel free to adapt to your project requirements and I would appreciate if you let me know about your revisions / experience with using this tool.


    Note - For large datasets it can take up to 30 mins to complete, either make a cup of tea or use the new 'limited load' option





    qlikcentral | Create / Inform


    V1.0 - CSV's only

    V1.1 - Extended to include QVD's, added limited load option, and fixed bug when analyzing fields with spaces