Skip to main content
Announcements
Qlik Connect 2024! Seize endless possibilities! LEARN MORE
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Clean up script or functionality

Is there a known clean-up / data quality functionality that exists in QLikView ? would you have a reference to example clean-up scripts for the data ?

Clean-up is for sales / revenue data, incl.

* remove duplicates

* find and handle quasi-duplicates

...

Would you more generally have a hint at macros (Excel, others), or algos that perform such clean-up ?

Thanks

2 Replies
Peter_Cammaert
Partner - Champion III
Partner - Champion III

The best way to get better quality data is to perform cleanups in the source system. Always.

Otherwise you'll build BI solutions full of exception handling code. Which is not a pretty sight and may rapidly become a nightmare to maintain.

However, sometimes you're stuck with data that cannot be changed anymore (general ledgers, transaction tables, historical exchange rates and such), especially in ERP systems. IMHO such situations require ad-hoc solutions, tailored to keep the corrective actions efficient and almost invisible.

Also, some code you add to your scripts will not necessarily correct the data, but will bring bad data to the attention of the people responsible. When they correct their mistakes, the code will automatically become "inactive" (because of no more bad values).

Some examples:

  • Impossible dates end up somehow in transactions, causing your calendar to extend into 3014 or something: make an inventory of these dates and if there aren't too many, filter them out using a mapping table.
  • Many table fields should refer to master data, but sometimes have a NULL or empty value: always capture these by putting an out-of-range value (e.g. -1) in these fields and by creating corresponding master data records.
  • Some product codes are used in orders, but there aren't any corresponding definitions in the Products table: add artificial product definitions to the Products table and give them a name like 'UNDEFINED (ProductID)' so that end-users know that something was missing here.
  • ...

Best,

Peter

Not applicable
Author

Please post some sample data so community will help you more.