Hi All,
Could you please help me understand the concept of "data cleansing"?
How can we do data cleansing in Qlik Sense?
Is it the same thing as data modelling best practices, or is "data cleansing" a different concept?
Thanks.
"Data cleansing" is the process of improving the quality of the data received from the source systems, to make it more suitable for analytics. For example, suppose you load addresses and some of them are missing a value in the field State. You could populate State based on the ZIP code. This way, you "cleanse" data that arrived "dirty", or imperfect.
Qlik's recent acquisition of Talend adds more robust automated options for data cleansing - look them up if you are interested.
Do you have an example? An example would help me understand better.
If we have millions of records and some values are missing in some of the fields, how should we handle that?
Every time you have missing values in some of the fields, you should find out how these values can be replaced, based on the business rules that apply in your specific scenario. Once you establish the business rule, the syntax could be something like this (I'll continue with my example of missing State codes):
State_Map:
MAPPING LOAD
ZIP,
State
FROM ... ;
Addresses:
LOAD
....
IF(len(trim(State)) > 0, State, ApplyMap('State_Map', ZIP)) as State,
....
Something along these lines.
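To make the pattern above concrete, here is a minimal self-contained sketch that uses inline data instead of real sources. The sample names, ZIP codes, and the 'Unknown' default are assumptions made up for illustration. In a real app both tables would come from your own FROM clauses, and the fallback value would follow your business rule.

```qlik
// Lookup table: ZIP -> State (illustrative sample values)
State_Map:
MAPPING LOAD * INLINE [
ZIP, State
10001, NY
94105, CA
];

// Address data, where some rows arrive without a State
Addresses:
LOAD
    Name,
    ZIP,
    // Keep State when it is populated; otherwise look it up by ZIP.
    // The third ApplyMap argument is the value returned when the ZIP
    // is not found in the map ('Unknown' is an assumed placeholder).
    If(Len(Trim(State)) > 0, State, ApplyMap('State_Map', ZIP, 'Unknown')) AS State
INLINE [
Name, ZIP, State
Alice, 10001, NY
Bob, 94105,
Carol, 60601,
];
```

Without the third argument, ApplyMap returns the lookup value itself (here, the ZIP) when no match is found, so an explicit default makes the unresolved rows easier to spot. Because the lookup is done row by row during the load, the same approach applies whether the table has a few records or millions.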
Data cleaning is the process of fixing or removing incorrect, corrupted, incorrectly formatted, duplicate, or incomplete data within a dataset. When combining multiple data sources, there are many opportunities for data to be duplicated or mislabeled. If data is incorrect, outcomes and algorithms are unreliable, even though they may look correct. There is no one absolute way to prescribe the exact steps in the data cleaning process because the processes will vary from dataset to dataset. But it is crucial to establish a template for your data cleaning process so you know you are doing it the right way every time.
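As a rough illustration of the "incorrectly formatted" and "duplicate" cases in Qlik script, the sketch below standardizes a few fields and then drops rows that become identical after standardization. The table, field names, and sample values are hypothetical. The right rules depend on your own data.

```qlik
// DISTINCT keeps only one copy of rows that are identical
// after the expressions below have been applied.
Customers:
LOAD DISTINCT
    Capitalize(Trim(Name))           AS Name,   // fix spacing and casing
    Upper(Trim(State))               AS State,  // consistent state codes
    KeepChar(Phone, '0123456789')    AS Phone   // strip phone punctuation
INLINE [
Name, State, Phone
alice smith, ny, (212) 555-0100
Alice Smith, NY, 212-555-0100
bob jones, ca, 415.555.0101
];
```

Here the first two rows are intended to collapse into a single record once name, state, and phone are written in the same format.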