Skip to main content

New to Qlik Sense

If you’re new to Qlik Sense, start with this Discussion Board and get up-to-speed quickly.

Announcements
QlikWorld 2023, a live, in-person thrill ride. April 17 - 20, 2023, in Las Vegas! REGISTER TODAY
cancel
Showing results for 
Search instead for 
Did you mean: 
PP_12
Contributor III
Contributor III

How to handle extra double quotes (") in csv files? Data in incorrect fields after load

Hi, 

I'm new to qlik and having issues when importing large csv files that we receive monthly (*we get this on different dates each month so have to manually load app) . The csv files have a double quote (") qualifier as well. There some are cases where some values( addresses) contain an extra double quote. This appears to be causing issues and some data is now in the incorrect fields. Is there a way to combat this issue? Is it to do with the load script delimiter? codepage? msq? 
I have created a sample below as I am unable to upload dataset

ID Full_Address Street_no Street_Nme City State Postcode
1 1411 Oakmound Road, Burr Ridge, Illinois, 60527 1411 Oakmond Road Burr Ridge Illinois 60527
2 3948 Rose Street, Lucinda, Pennsylvania, 16235 3948 Rose Street Lucinda Pennsylvania 16235
3 mail" Echo Lane Bellevue 49021" 12 Michigan


My other issue is that I only picked up these issues as I'm doing manual checks and there are multiple csv's. This takes quite a long time and majority of the time the issue would be only found in 2 out of 10 files. Using qliksense is there a way to check for " or other symbols that are causing my issue via data load editor??  Any suggestions in how I would handle this please. Not too sure where to start


This is an example of what I have used to load the table

LOAD
         *all the necessary fields*
FROM [lib://DataFiles/table.csv]
(txt, codepage is 28591, embedded labels, delimiter is ',', msq);

  also tried (txt, codepage is 1252, embedded labels, delimiter is ''", msq);

Thank you very much!

Labels (4)
7 Replies
hic
Employee
Employee

I would try changing the MSQ parameter. This controls how the quoting is made - whether quotes or delimiters have precedence.

PP_12
Contributor III
Contributor III
Author

Thank you for your response Henric. I have tried removing msq and get the same result. Could you please recommend an alternative? Or would it be best to clean the csv files before loading?

Thanks,
Penny

 

hic
Employee
Employee

It is most likely best to clean the csv files before loading. There is however one more trick you can try: Instead of loading a delimited file

LOAD
*all the necessary fields*
FROM [File.txt] (txt, utf8, embedded labels, delimiter is ',', msq);

you could load it as a fixed-record file and try to create the fields yourself:

Load
Subfield(EntireLine,',',1) as Field1,
Subfield(EntireLine,',',2) as Field2,
Subfield(EntireLine,',',3) as Field3,
... ;
LOAD
[@1:n] as EntireLine
FROM [File.txt] (fix, utf8, no labels, header is 1 lines);

This will leave quotes untouched, so you may need to remove these - as well as leading/trailing blanks - by wrapping the Subfield() in a PurgeChar() and a Trim().

But as I said, it is probably best to clean the original text files instead.

PP_12
Contributor III
Contributor III
Author

Thank you again. Will have a go but definitely will need to clean all files before loading.

PP_12
Contributor III
Contributor III
Author

Apologies for my late response. In terms of cleaning the data outside of qlik what would be the best way to handle the different types of data errors? Below is another csv where I have issues and sometimes "" shows data gaps so should be left. Due to that I wasn't too sure where to start. 

NOTEPAD:
ID,T_No,T_Type,T_Date,T_Name,T_TNo
2,256,House,30/05/2021,Airport,75.1,
3,268,Hotel,31/05/2021,Hotel Antel""",76.1
4,269,House,31/05/2021,Bank of USA,"LA Branch""""",77.1

Thank you

hic
Employee
Employee

It should be possible to load the table you show using the "no quotes" clause. Also, you can inside the Load statement remove unwanted characters. E.g.

Load
PurgeChar(T_Name,'"') as New_Name,
*
Inline
[ID,T_No,T_Type,T_Date,T_Name,T_TNo
2,256,House,30/05/2021,Airport,75.1,
3,268,Hotel,31/05/2021,Hotel Antel""",76.1
4,269,House,31/05/2021,Bank of USA,"LA Branch""""",77.1] (no quotes);

 

 

PP_12
Contributor III
Contributor III
Author

Thank you for your help.