2 Replies Latest reply: Jan 1, 2014 7:59 PM by Srikanth P RSS

    Clean up script or functionality

      Is there a known clean-up / data quality functionality that exists in QLikView ? would you have a reference to example clean-up scripts for the data ?

       

      Clean-up is for sales / revenue data, incl.

       

      * remove duplicates

      * find and handle quasi-duplicates

      ...

       

      Would you more generally have a hint at macros (Excel, others), or algos that perform such clean-up ?

       

      Thanks

        • Re: Clean up script or functionality
          Peter Cammaert

          The best way to get better quality data is to perform cleanups in the source system. Always.

          Otherwise you'll build BI solutions full of exception handling code. Which is not a pretty sight and may rapidly become a nightmare to maintain.

           

          However, sometimes you're stuck with data that cannot be changed anymore (general ledgers, transaction tables, historical exchange rates and such), especially in ERP systems. IMHO such situations require ad-hoc solutions, tailored to keep the corrective actions efficient and almost invisible.

           

          Also, some code you add to your scripts will not necessarily correct the data, but will bring bad data to the attention of the people responsible. When they correct their mistakes, the code will automatically become "inactive" (because of no more bad values).

           

          Some examples:

          • Impossible dates end up somehow in transactions, causing your calendar to extend into 3014 or something: make an inventory of these dates and if there aren't too many, filter them out using a mapping table.
          • Many table fields should refer to master data, but sometimes have a NULL or empty value: always capture these by putting an out-of-range value (e.g. -1) in these fields and by creating corresponding master data records.
          • Some product codes are used in orders, but there aren't any corresponding definitions in the Products table: add artificial product definitions to the Products table and give them a name like 'UNDEFINED (ProductID)' so that end-users know that something was missing here.
          • ...

           

          Best,

           

          Peter

          • Re: Clean up script or functionality
            Srikanth P

            Please post some sample data so community will help you more.