Ranjanac
Contributor III

Data cleansing in Qlik Sense

Hi All,

Could you please help me understand the concept of "data cleansing"?

How can we do data cleansing in Qlik Sense?

Is it something like data modelling best practices, or is "data cleansing" a different concept?

Thanks.

4 Replies
Oleg_Troyansky
Partner Ambassador/MVP

"Data cleansing" is the process of improving the quality of the data received from the source systems, to make it better suited for analytics. For example, suppose you load addresses and some of them are missing a value in the State field. You could populate State based on the ZIP code. This way, you "cleanse" data that arrived "dirty", or imperfect.

Qlik's recent acquisition of Talend adds more robust automated options for data cleansing - look them up if you are interested.

Ranjanac
Contributor III
Author

Do you have an example? An example would help me understand better.

If we have millions of records and some values are missing in some fields, how do we handle that?

Oleg_Troyansky
Partner Ambassador/MVP

Every time you have missing values in some fields, you should find out how these values can be replaced, based on the business rules that apply in your specific scenario. Once you establish the business rule, the syntax could be something like this (I'll continue with my example of missing State codes):

State_Map:
MAPPING LOAD
     ZIP,
     State
FROM ... ;

Addresses:
LOAD
     ....
     // Keep State when it is populated; otherwise look it up by ZIP
     IF(len(trim(State)) > 0, State, ApplyMap('State_Map', ZIP)) as State,
     ....

Something along these lines.
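
To make the pattern concrete, here is a minimal, self-contained sketch that can be pasted into the Data load editor. The inline sample rows, the AddressID field, and the 'Unknown' fallback are assumptions added for illustration; in a real app the mapping table would come from your own ZIP reference source.

```qlik
// Hypothetical ZIP-to-State reference data (illustration only)
State_Map:
MAPPING LOAD * INLINE [
ZIP,State
10001,NY
60601,IL
];

// Hypothetical address rows; rows 2 and 3 arrive without a State
Addresses:
LOAD
    AddressID,
    ZIP,
    // Keep State when populated; otherwise look it up by ZIP,
    // and fall back to 'Unknown' when the ZIP is not in the map
    If(Len(Trim(State)) > 0, State, ApplyMap('State_Map', ZIP, 'Unknown')) as State
INLINE [
AddressID,ZIP,State
1,10001,NY
2,60601,
3,99999,
];
```

The third argument of ApplyMap is optional. Without it, ApplyMap returns the lookup value itself (here, the ZIP) when no match is found, so an explicit default makes unmatched rows easier to spot.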

anat
Master

Data cleaning is the process of fixing or removing incorrect, corrupted, incorrectly formatted, duplicate, or incomplete data within a dataset. When combining multiple data sources, there are many opportunities for data to be duplicated or mislabeled. If data is incorrect, outcomes and algorithms are unreliable, even though they may look correct. There is no one absolute way to prescribe the exact steps in the data cleaning process because the processes will vary from dataset to dataset. But it is crucial to establish a template for your data cleaning process so you know you are doing it the right way every time.

https://help.qlik.com/en-US/cloud-services/Subsystems/Hub/Content/Sense_Hub/Scripting/data-cleansing...
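
As a rough illustration of two of the fixes mentioned above (inconsistent formatting and duplicate rows), the sketch below normalizes text fields and then removes exact duplicates. The table name, field names, and sample rows are hypothetical, and the right rules will depend on your own dataset.

```qlik
// Hypothetical raw data with stray spaces, mixed case, and a duplicate row
Customers_Raw:
LOAD * INLINE [
CustomerID,Name,Country
1,  alice smith ,us
1,  alice smith ,us
2,BOB JONES,Us
];

// NoConcatenate prevents Qlik from auto-appending these rows to
// Customers_Raw, which it would otherwise do because the field names match
Customers:
NoConcatenate
LOAD DISTINCT
    CustomerID,
    Capitalize(Trim(Name)) as Name,      // strip spaces, capitalize each word
    Upper(Trim(Country))   as Country    // strip spaces, force upper case
RESIDENT Customers_Raw;

// The raw table is no longer needed once the cleansed copy exists
DROP TABLE Customers_Raw;
```

DISTINCT is applied after the expressions are evaluated, so rows that differ only in spacing or case collapse into one once they have been normalized.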