Do not input private or sensitive data. View Qlik Privacy & Cookie Policy.
Skip to main content

Announcements
Qlik Open Lakehouse is Now Generally Available! Discover the key highlights and partner resources here.
cancel
Showing results for 
Search instead for 
Did you mean: 
Anonymous
Not applicable

Parsing CSV - Commas within double quotes

I'm evaluating TOS. I created a new project and added a delimited file under Metadata -> File Delimited.
Going thru the wizard I get to step 3 where you define the encoding, file separator, and a few other parameters. This screen reads the file you specified in step 2 and parses it according to the delimiter. In my case it is a CSV file. But the file has double quotes around all the field values and some of those contain commas within them.
Is it possible to tell TOS to ignore commas that are between double quotes? This is a very common situation and is handled by default in spreadsheet programs such as Open Office or Excel.
This seems like very elementary functionality and the reason I started looking for an ETL tool in the first place is because the COPY command in Postgres cannot handle this situation.
My other alternative is to use dbf file downloads of the same data. Is it possible in TOS to define the dbf delimiiter?
Labels (2)
12 Replies
Anonymous
Not applicable
Author

Never mind. I figured it out with the Text Enclosure setting.
Anonymous
Not applicable
Author

I would be interest in how you over came this problem as I experienced a similar problem with my delimiter files.
I didn't spend too much time to find a solution, i simply changed my delimiter character.
Anonymous
Not applicable
Author

Hello,
This is the difference between tFileInputDelimited (no text enclosure) and tFileInputCSV (text enclosure).

Regards,
Anonymous
Not applicable
Author

JPN - When you are adding File Delimited Metadata the wizard wiill give you the opportunity to set the "text enclosure". Select "\"" and it reads the file correctly.
Anonymous
Not applicable
Author

Hello! thank you for your hint. with this option (using Text Enclosure "\"") it is possible to import google-contact exports (CSV).
--Robert
Anonymous
Not applicable
Author

hi all,
a few more explanation about enclosure
the back slash "\" is the way to escape any character following it which can be a reserved one for the code (\', \\ , ..)
I tell the program : "don't use it like a code character but like any other one "
like in a regex "\\d" : d is reserved pattern so put "\" before , but '\' is reserved too ..so also reserved it => \\d
++
_AnonymousUser
Specialist III
Specialist III

My Text Enclosure character is ~... I can edit this property on the tFileDelimited properties but I can't edit it under the Schema definition in the Repository - I have to choose from the drop list and none of them apply.
Anonymous
Not applicable
Author

Currently running 4.0.2 r43696 and it looks like a bug has crept in. (There isn't a tFileInputCSV for this version and appears deprecated)
When setting up the metadata/file_delimited and select a CSV similar to vclark's file. (EX)
"type","location","setting","complexity",'prod_type","category","stock","website"
"admin","base","default","simple","hardware","56,51","0","www.example.com"
"admin","base","default","simple","hardware","32,61","1","www.example.com"
should read in the following values with Set Heading Row as Column Names checked.
Type Location Setting Complexity Prod_Type Category Stock Website
admin base default simple hardware 56,51 0 www.example.com
admin base default simple hardware 32,61 1 www.example.com
However upon setting the CSV (Escape Char Settings) - Text Enclosure to "\"" and Comma - Field Separator as "," yields the following incorrect results:
"Type" "Location" "Setting" "Complexity" "Prod_Type" "Category" "Stock" "Website"
"admin" "base" "default" "simple" "hardware" "56 51" "0"
"admin" "base" "default" "simple" "hardware" "32 61" "1"
Changing the Field Separator to Semicolon - ";" will read the data fields in correctly but will read the column names as single column and therefore drop all of the fields into a single column (quotes and all).
As you can see the Category field is not parsed correctly.
Anthony
Talend Certified
_AnonymousUser
Specialist III
Specialist III

I think this bug is still in the 4.0.3 release: I've been evaluating talend with some very simple CSV transforms and it turns out this is the thing that's been frustrating me! I guess I can preprocess my CSVs before they hit talend. 😕
Steve

Currently running 4.0.2 r43696 and it looks like a bug has crept in. (There isn't a tFileInputCSV for this version and appears deprecated)
...
Changing the Field Separator to Semicolon - ";" will read the data fields in correctly but will read the column names as single column and therefore drop all of the fields into a single column (quotes and all).
As you can see the Category field is not parsed correctly.
Anthony
Talend Certified