Hi,
I'm having trouble finding the best way to check the csv/excel files we receive for errors. They have advised this is the best they can provide, so we are left fixing any data quality issues manually. Please note the CSVs use double quotes as a text qualifier, since some fields such as Address contain commas, e.g. "190 Fuller Road, Summer City, USA","",
Is there an easy way to check each large CSV for errors? Some examples of what I would like to check for are below:
- Street_Number should only have numeric values, no text, and vice versa for fields like Street_name
- Identify extra double quotes, as these sometimes put details in the wrong fields when loaded
- Any fields with nulls, etc.
I would like a table showing which records have errors or potential errors to fix. I searched, and people have recommended something like data quality rules, but I have no idea where to start.
Any suggestions would be great. Sorry, I'm unable to upload data.
Thank you
Hi,
I can't answer your query completely, but I can offer the settings required to read in your CSV using double quotes as the qualifier.
You'll need to read the file in as UTF-8, as below:
LOAD * FROM [lib://DataFile/YourFile.csv]
(txt, utf8, embedded labels, delimiter is ',', msq);
The msq (modern style quoting) option treats everything between double quotes as a single value, so the commas inside the quotes are not read as field delimiters.
If you have fields you expect to be numeric or text, you can create a table after loading and add the functions IsNum() and IsText(), which return the boolean result -1 (true) or 0 (false). You could also experiment with the Len() function to sort out any fields that are unexpectedly long.
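To put that into a script, here is a minimal sketch in Qlik load script. The file path and the field names (Street_Number, Street_Name) are placeholders; adjust them to match your file:

```
// Load the file and add a boolean quality flag per check.
// msq respects the double-quote text qualifier, so quoted commas stay intact.
QualityCheck:
LOAD
    *,
    If(Not IsNum(Street_Number), 1, 0)   as Flag_StreetNumber_NotNumeric,
    If(Not IsText(Street_Name), 1, 0)    as Flag_StreetName_NotText,
    If(Len(Trim(Street_Name)) = 0, 1, 0) as Flag_StreetName_Empty
FROM [lib://DataFile/YourFile.csv]
(txt, utf8, embedded labels, delimiter is ',', msq);
```

A straight table with the Flag_* fields added then shows you which records have errors or potential errors to fix.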
I hope this helps.
Thanks
Anthony
Thank you for your response, Anthony. Unfortunately I have tried UTF-8 and still have issues, as the data is quite messy. It was recommended in another thread that it is best to clean the CSVs first. Do you have any suggestions on how to clean these large CSV files? Notepad, Excel, or can we use Qlik to clean the files? I'm just unsure how to clean them, and I'm a bit limited in which tools I have. Sorry, this is all new to me.
Regarding your suggestions, do I just load the table as usual and add a column, e.g. IsNum(ID)?
Thanks again for your help
Hi,
Sorry that didn't help. If the data is as messy as you say, the load probably only got so far before the double quotes became unbalanced and threw out the alignment. I was expecting it to create the columns, but with some of the data aligned to the incorrect columns. Once it is loaded, you could create a table in Qlik, and the headings and the values in that table would show you where the inconsistencies are in your CSV.
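If you want Qlik itself to point at the problem lines, one option is to read the raw file a second time without delimiter splitting and flag any line with an odd number of double quotes. A sketch, with the file path again a placeholder:

```
// Read each raw line as one wide field (fixed-record format, no splitting),
// then flag lines whose double quotes are unbalanced.
RawLines:
LOAD
    RecNo() as LineNo,
    If(Mod(SubStringCount([@1:n], '"'), 2) <> 0, 1, 0) as Flag_UnbalancedQuotes
FROM [lib://DataFile/YourFile.csv]
(fix, utf8, no labels);
```

Listing LineNo where Flag_UnbalancedQuotes = 1 tells you exactly which lines to inspect and fix in a text editor.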
Depending on the size of the file, you could also try opening it in Excel and seeing where the columns are split there. Excel is still a good data-wrangling tool.
Regards
Anthony