Hi,
I'm new to Qlik and having issues when importing large CSV files that we receive monthly (we get these on different dates each month, so I have to load the app manually). The CSV files use a double quote (") as the text qualifier. In some cases, values (addresses) contain an extra double quote. This appears to be causing issues, and some data ends up in the incorrect fields. Is there a way to combat this issue? Is it to do with the load script delimiter? The codepage? msq?
I have created a sample below, as I am unable to upload the dataset.
ID | Full_Address | Street_no | Street_Nme | City | State | Postcode |
1 | 1411 Oakmound Road, Burr Ridge, Illinois, 60527 | 1411 | Oakmond Road | Burr Ridge | Illinois | 60527 |
2 | 3948 Rose Street, Lucinda, Pennsylvania, 16235 | 3948 | Rose Street | Lucinda | Pennsylvania | 16235 |
3 | mail" | Echo Lane | Bellevue | 49021" | 12 | Michigan |
My other issue is that I only picked up these problems because I do manual checks, and there are multiple CSVs. This takes quite a long time, and the majority of the time the issue is only found in 2 out of 10 files. Using Qlik Sense, is there a way to check for " or other symbols that are causing my issue via the data load editor? Any suggestions on how I should handle this, please? I'm not too sure where to start.
This is an example of what I have used to load the table
LOAD
*all the necessary fields*
FROM [lib://DataFiles/table.csv]
(txt, codepage is 28591, embedded labels, delimiter is ',', msq);
I also tried (txt, codepage is 1252, embedded labels, delimiter is ',', msq);
Thank you very much!
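One way to check for stray qualifiers without opening each file by hand: a stray double quote always leaves an odd number of " characters on the affected line, so the files can be pre-scanned with a short script before loading. This is a minimal Python sketch, not a Qlik solution; the `datafiles` folder name and the `latin-1` encoding (matching codepage 28591) are assumptions to adjust:

```python
from pathlib import Path

# Hypothetical folder name; point this at wherever the monthly CSVs land.
DATA_DIR = Path("datafiles")

def suspicious_rows(path):
    """Yield (line_no, line) for lines whose double-quote count is odd.

    An odd number of '"' characters on a physical line usually means a
    stray, unescaped qualifier that will shift fields during the load.
    """
    with open(path, encoding="latin-1", newline="") as f:
        for line_no, line in enumerate(f, start=1):
            if line.count('"') % 2 == 1:
                yield line_no, line.rstrip("\n")

if DATA_DIR.is_dir():
    for csv_path in sorted(DATA_DIR.glob("*.csv")):
        bad = list(suspicious_rows(csv_path))
        if bad:
            print(f"{csv_path.name}: {len(bad)} suspicious line(s)")
            for line_no, line in bad:
                print(f"  line {line_no}: {line}")
```

Run against a folder of ten files, this would immediately narrow the manual checks down to the two files that actually contain stray quotes.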
I would try changing the MSQ parameter. It controls how quoting is handled, i.e. whether quotes or delimiters take precedence.
Thank you for your response, Henric. I have tried removing msq and get the same result. Could you please recommend an alternative? Or would it be best to clean the CSV files before loading?
Thanks,
Penny
It is most likely best to clean the CSV files before loading. There is, however, one more trick you can try: instead of loading a delimited file
LOAD
*all the necessary fields*
FROM [File.txt] (txt, utf8, embedded labels, delimiter is ',', msq);
you could load it as a fixed-record file and try to create the fields yourself:
Load
Subfield(EntireLine,',',1) as Field1,
Subfield(EntireLine,',',2) as Field2,
Subfield(EntireLine,',',3) as Field3,
... ;
LOAD
[@1:n] as EntireLine
FROM [File.txt] (fix, utf8, no labels, header is 1 lines);
This will leave quotes untouched, so you may need to remove them, as well as leading/trailing blanks, by wrapping the Subfield() in a PurgeChar() and a Trim().
But as I said, it is probably best to clean the original text files instead.
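For what it's worth, that pre-cleaning step can itself be sketched in a few lines of Python. This is only one possible rule, and an assumption on my part: strip every double quote from any line whose qualifiers are unbalanced, and leave balanced lines alone so that legitimate "" empty fields survive. The file paths and `latin-1` encoding are placeholders:

```python
def clean_file(src, dst):
    """Copy src to dst, stripping all double quotes from any line whose
    quote count is odd (i.e. a qualifier is unbalanced).

    Balanced lines are written through untouched, so legitimate ""
    empty-field markers survive. This is a blunt rule -- review the
    flagged lines before trusting the output.
    """
    fixed = 0
    with open(src, encoding="latin-1", newline="") as fin, \
         open(dst, "w", encoding="latin-1", newline="") as fout:
        for line in fin:
            if line.count('"') % 2 == 1:
                line = line.replace('"', "")
                fixed += 1
            fout.write(line)
    return fixed

# Hypothetical paths -- point these at the real monthly files:
# n = clean_file("table.csv", "table_clean.csv")
```

The cleaned copy can then be loaded with the ordinary delimited-file syntax, since no unbalanced qualifiers remain.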
Thank you again. I will have a go, but I will definitely need to clean all files before loading.
Apologies for my late response. In terms of cleaning the data outside of Qlik, what would be the best way to handle the different types of data errors? Below is another CSV where I have issues; sometimes "" indicates a data gap, so it should be left alone. Because of that, I wasn't too sure where to start.
NOTEPAD:
ID,T_No,T_Type,T_Date,T_Name,T_TNo
2,256,House,30/05/2021,Airport,75.1,
3,268,Hotel,31/05/2021,Hotel Antel""",76.1
4,269,House,31/05/2021,Bank of USA,"LA Branch""""",77.1
Thank you
It should be possible to load the table you show using the "no quotes" clause. You can also remove unwanted characters inside the Load statement, e.g.
Load
PurgeChar(T_Name,'"') as New_Name,
*
Inline
[ID,T_No,T_Type,T_Date,T_Name,T_TNo
2,256,House,30/05/2021,Airport,75.1,
3,268,Hotel,31/05/2021,Hotel Antel""",76.1
4,269,House,31/05/2021,Bank of USA,"LA Branch""""",77.1] (no quotes);
Thank you for your help.