Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Ever found yourself stuck with a messy pile of data that seems more like a labyrinth than a pathway to clean insights? You're not alone. Today, we're diving into the world of data cleaning in Qlik Sense to help you uncover the analytical potential hiding behind your data.
Imagine you're baking a cake. Would you eyeball the measurements of your ingredients? Probably not, unless you fancy a disaster cake. Just like one poorly measured cup of flour can ruin your entire recipe, a small data error can throw off your whole analysis. That's why, before you dive into the fun part—data analysis—you've got to make sure your key ingredient (data) is as clean and precise as possible.
It's not just about tidying up; it's about quality control. Skipped steps or overlooked errors can lead to inaccurate results that could misinform your business decisions.
These types of tables behave differently than other tables in that they are stored in a separate area of the memory and are strictly used as mapping tables when the script is run, they are then automatically dropped.
Let’s take a look at how to do this and the different statements and functions that can be used:
CountryMap:
MAPPING LOAD * INLINE [
Country, NewCountry
U.S.A., US
U.S., US
United States, US
United States of America, US
];
Keep in mind that a mapping table must have two columns, the first containing the comparison values and the second contains the desired mapping values.
The ApplyMap function is used to replace data in a field based on a previously created Mapping Table.
CountryMap:
MAPPING LOAD * INLINE [
Country, NewCountry
U.S.A., US
U.S., US
United States, US
United States of America, US
];
Data:
LOAD
ID,
Name,
ApplyMap('CountryMap', Country) as Country,
Code
FROM [lib://DataFiles/Data.xlsx]
(ooxml, embedded labels, table is Sheet1);
The first parameter in ApplyMap is the Mapping Table name in quotes. The second parameter is the field containing the data that needs to be mapped.
You can add a third parameter to the ApplyMap function that serves as a default to handle cases when the value doesn’t match one in the Mapping Table.
For instance:
ApplyMap('CountryMap', Country, 'Rest of the world') As Country
after mapping:
ReplaceMap:
MAPPING LOAD * INLINE [
char, replace
")", ""
"(", ""
"\"", ""
"/", ""
"-", ""
] (delimiter is ',');
TestData:
LOAD
DataField as data,
MapSubString('ReplaceMap', DataField) as ReplacedString
INLINE [
DataField
"(415)555-1234",
"(415)543,4321",
"“510”123-4567",
"/925/999/4567"
] (delimiter is ',');
after cleaning:
Map Country Using CountryMap;
Data1:
LOAD
ID,
Name,
Country
FROM [lib://DataFiles/Data.xlsx]
(ooxml, embedded labels, table is Sheet1);
Data2:
LOAD
ID,
Country as Country2
FROM [lib://DataFiles/Data.xlsx]
(ooxml, embedded labels, table is Sheet1);
UNMAP;
UserData:
LOAD * INLINE [
UserID, FullName
1, "John,Doe"
2, "Jane,Doe"
3, "Alice,Wonderland"
4, "Bob,Builder"
];
CleanedData:
LOAD
UserID,
SubField(FullName, ',', 1) as FirstName,
SubField(FullName, ',', 2) as LastName
RESIDENT UserData;
Drop Table UserData;
Example 1:
Using a combination of the functions above to clean up a field. Let’s take a more complex field and try to extract the first name and last name.
UserData:
LOAD * INLINE [
UserID, Object
1, "37642UI101John.Doe"
2, "98322UI101Jane.Doe"
3, "45432UI101Alice.Wonderland"
4, "32642UI101Bob.Builder"
];
CleanedData:
LOAD
UserID,
SubField(Right(Object, Len(Object) - Index(Object, 'UI101') - 4), '.', 1) as FirstName,
SubField(Right(Object, Len(Object) - Index(Object, 'UI101') - 4), '.', 2) as LastName
RESIDENT UserData;
Drop Table UserData;
after cleaning:
Example 2:
Cleaning HTML in a field
Paragraphs:
LOAD * INLINE [
Paragraph_ID, Paragraph
1, "<p>This is a <strong>paragraph</strong>.</p><br><p>This is another <em>paragraph</em>.</p>"
];
// Loop through each paragrpah in the Paragraphs table
For vRow = 1 to NoOfRows('Paragraphs')
Let vID = Peek('Paragraph_ID', vRow-1, 'Paragraphs'); // Get the ID of the next record to parse
Let vtext = Peek('Paragraph', vRow-1, 'Paragraphs'); // Get the original paragraph of the next record
// Loop through each paragraph in place
Do While len(TextBetween(vtext, '<', '>')) > 0
vtext = Replace(vtext, '<br>', chr(10)); // Replace line breaks with carriage returns - improves legibility
vtext = Replace(vtext, '<' & TextBetween(vtext, '<', '>') & '>', ''); // Find groups with <> and replace them with ''
Loop;
// Store the cleaned paragraphs into a temporary table
Temp:
Load
$(vID) as Paragraph_ID,
'$(vtext)' as cleanParagraph
AutoGenerate 1;
Next vRow;
// Join the cleaned paragraphs back into the original Paragraphs table
Left Join (Paragraphs)
Load *
Resident Temp;
// Drop the temporary table
Drop Table Temp;
after cleaning:
I hope you found this post helpful!
Attached you can find a QVD that contains the scripts used in the post.
Happy data cleaning!
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.