JackStrong
Contributor II

CSV files (double quotes) and Parquet files

Hi All.

I have 3 questions. If someone can help, I will be grateful 🙂.

1. [tFileOutputDelimited] Is there any way to force Talend to add double quotes as the text enclosure, but only for values that contain the field separator inside the string?

Desired output:

e.g. name;age;comment

Agent Smith;30;convalescent

Neo;29;"convalescent;12/04/21"

(only the field containing a semicolon is double quoted)

Whenever I tried to achieve the above, I received:

e.g. name;age;comment

"Agent Smith";"30";"convalescent"

"Neo";"29";"convalescent;12/04/21"

2. Is it possible to write a Parquet file using only standard jobs? If so, how?

3. Is it possible to call a Big Data Batch child job from a standard job? If so, is it safe/stable and recommended?

Best regards,

Jack Strong


Accepted Solutions
Anonymous
Not applicable

There isn't anything "out of the box" that will do this BUT this is one of the big advantages of Talend Studio. You can build the functionality to do this in a routine.

 

Here is a very quick example I have just knocked up.....

 

package routines;

public class ExtraCSVOptions {

    public static String wrapSeparatorStrings(String data, String separator, String wrapCharacter) {
        String returnVal = null;

        // wrap the value in the enclosure character when it contains the separator
        if (data != null && separator != null && wrapCharacter != null && data.indexOf(separator) > -1) {
            returnVal = wrapCharacter + data + wrapCharacter;
        }

        return returnVal;
    }
}

 

You would use this in a tMap or similar, applying it to every column. So if you have 3 String columns like this.....

 

myColumn1

myColumn2

myColumn3

 

....you would use the above routine like this (assuming the separator is ";" and the wrap character is a double quote)...

 

routines.ExtraCSVOptions.wrapSeparatorStrings(row1.myColumn1, ";", "\"")

routines.ExtraCSVOptions.wrapSeparatorStrings(row1.myColumn2, ";", "\"")

routines.ExtraCSVOptions.wrapSeparatorStrings(row1.myColumn3, ";", "\"")

 

For Parquet files we have the tFileInputParquet and tFileOutputParquet components.
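If you ever needed to do the same thing in code inside a standard job (for example from a tJava), a rough sketch using the Apache parquet-avro library might look like the one below. This is only an illustration, not what those components do internally; it assumes the parquet-avro and Hadoop client jars are on the job's classpath, and the class and method names (ParquetSketch, writeSample) are made up for the example.

package routines;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;

public class ParquetSketch {

    // Illustrative only: writes one record to a local Parquet file.
    public static void writeSample(String targetFile) throws Exception {
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"person\",\"fields\":["
            + "{\"name\":\"name\",\"type\":\"string\"},"
            + "{\"name\":\"age\",\"type\":\"int\"}]}");

        try (ParquetWriter<GenericRecord> writer = AvroParquetWriter
                .<GenericRecord>builder(new Path(targetFile))
                .withSchema(schema)
                .build()) {
            GenericRecord record = new GenericData.Record(schema);
            record.put("name", "Neo");
            record.put("age", 29);
            writer.write(record);
        }
    }
}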

 

You can start jobs via the TMC or TAC using the APIs.

 


3 Replies

JackStrong
Contributor II
Author

Sorry for the late reply.

 

@rhall, thank you very much. We added an "else if" and it works fine!

 

 

Below is the whole routine:

 

package routines;

public class textWrapper {

    public static String wrapSeparatorStrings(String data, String separator, String wrapCharacter) {

        String returnVal = null;

        // wrap the value in the enclosure character when it contains the separator
        if (data != null && separator != null && wrapCharacter != null && data.indexOf(separator) > -1) {
            returnVal = wrapCharacter + data + wrapCharacter;
        }
        // otherwise pass the value through unchanged
        else if (data != null && separator != null && wrapCharacter != null && !(data.indexOf(separator) > -1)) {
            returnVal = data;
        }

        return returnVal;
    }
}
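For reference, a quick sanity check of the corrected routine (for example from a tJava); the expected values match the desired output from the original question:

// Illustrative check of textWrapper.wrapSeparatorStrings
System.out.println(routines.textWrapper.wrapSeparatorStrings("convalescent;12/04/21", ";", "\"")); // prints "convalescent;12/04/21"
System.out.println(routines.textWrapper.wrapSeparatorStrings("convalescent", ";", "\""));          // prints convalescent (unchanged)
System.out.println(routines.textWrapper.wrapSeparatorStrings(null, ";", "\""));                    // prints null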

 

 

Regarding the Parquet file, I'm not sure I understand. Are you saying that I can trigger the standard job in TMC and then somehow call the Big Data job?

Anonymous
Not applicable

Oops, I missed the "else" condition there. I think I was focused on the affected records and overlooked the unaffected ones. Well spotted.

 

Regarding reading and writing Parquet files, the components I mentioned are documented here....

 

https://help.talend.com/r/en-US/7.3/parquet/parquet

 

You can make use of the TMC API to run jobs. The Swagger API documentation is here...

 

https://api.us.cloud.talend.com/tmc/swagger/swagger-ui.html
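As a rough illustration of the kind of call involved, triggering a task execution from plain Java (11+) might look something like the sketch below. The endpoint path, request body and token handling are assumptions for the example; please verify them against the Swagger documentation above.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class TmcExecutionSketch {

    public static void main(String[] args) throws Exception {
        // Assumed values: the executions endpoint, task id and personal access token
        // are placeholders; check them against the TMC Swagger documentation.
        String endpoint = "https://api.us.cloud.talend.com/tcmp/executions";
        String taskId = "<your-task-id>";
        String token = "<your-personal-access-token>";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(endpoint))
                .header("Authorization", "Bearer " + token)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString("{\"executable\": \"" + taskId + "\"}"))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        System.out.println(response.statusCode() + " " + response.body());
    }
}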