JackStrong
Contributor II

CSV files (double quotes) and Parquet files

Hi All.

I have 3 questions. If someone can help, I will be grateful 🙂.

1. [tFileOutputDelimited] Is there any way to force Talend to add double quotes as the text enclosure, but only for values that contain the field separator inside the string?

Desired output:

e.g. name;age;comment

Agent Smith;30;convalescent

Neo;29;"convalescent;12/04/21"

(only the field containing a semicolon is double quoted)

Whenever I tried to achieve the above, I received:

e.g. name;age;comment

"Agent Smith";"30";"convalescent"

"Neo";"29";"convalescent;12/04/21"

2. Is it possible to write a Parquet file using only standard jobs? If so, how?

3. Is it possible to call a Big Data Batch child job from a standard job? If so, is it safe/stable and recommended?

Best regards,

Jack Strong


Accepted Solutions
Anonymous
Not applicable

There isn't anything "out of the box" that will do this BUT this is one of the big advantages of Talend Studio. You can build the functionality to do this in a routine.

 

Here is a very quick example I have just knocked up.....

 

package routines;

public class ExtraCSVOptions {

    public static String wrapSeparatorStrings(String data, String separator, String wrapCharacter) {
        String returnVal = null;

        // wrap the value in the enclosure character when it contains the separator
        if (data != null && separator != null && wrapCharacter != null && data.indexOf(separator) > -1) {
            returnVal = wrapCharacter + data + wrapCharacter;
        }

        return returnVal;
    }
}

 

You would use this in a tMap or similar, applying it to every column. So if you have 3 String columns like this.....

 

myColumn1

myColumn2

myColumn3

 

....you would use the above routine like this (assuming the separator is ";" and the wrap character is a double quote)...

 

routines.ExtraCSVOptions.wrapSeparatorStrings(row1.myColumn1, ";", "\"")

routines.ExtraCSVOptions.wrapSeparatorStrings(row1.myColumn2, ";", "\"")

routines.ExtraCSVOptions.wrapSeparatorStrings(row1.myColumn3, ";", "\"")

 

For Parquet files we have the tFileInputParquet and tFileOutputParquet components.
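If you ever needed to do the same thing in code inside a standard job (for example from a tJava), a rough sketch using the Apache parquet-avro library might look like the one below. This is only an illustration, not what those components do internally; it assumes the parquet-avro and Hadoop client jars are on the job's classpath, and the class and method names (ParquetSketch, writeSample) are made up for the example.

package routines;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;

public class ParquetSketch {

    // Illustrative only: writes one record to a local Parquet file.
    public static void writeSample(String targetFile) throws Exception {
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"person\",\"fields\":["
            + "{\"name\":\"name\",\"type\":\"string\"},"
            + "{\"name\":\"age\",\"type\":\"int\"}]}");

        try (ParquetWriter<GenericRecord> writer = AvroParquetWriter
                .<GenericRecord>builder(new Path(targetFile))
                .withSchema(schema)
                .build()) {
            GenericRecord record = new GenericData.Record(schema);
            record.put("name", "Neo");
            record.put("age", 29);
            writer.write(record);
        }
    }
}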

 

You can start jobs via the TMC or TAC using the APIs.

 


3 Replies

JackStrong
Contributor II
Author

Sorry for the late reply.

 

@rhall, thank you very much. We added an "else if" and it works fine!

 

 

Below is the whole routine:

 

package routines;

public class textWrapper {

    public static String wrapSeparatorStrings(String data, String separator, String wrapCharacter) {

        String returnVal = null;

        // wrap the value in the enclosure character when it contains the separator
        if (data != null && separator != null && wrapCharacter != null && data.indexOf(separator) > -1) {
            returnVal = wrapCharacter + data + wrapCharacter;
        }
        // otherwise pass the value through unchanged
        else if (data != null && separator != null && wrapCharacter != null && !(data.indexOf(separator) > -1)) {
            returnVal = data;
        }

        return returnVal;
    }
}
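For reference, a quick sanity check of the corrected routine (for example from a tJava); the expected values match the desired output from the original question:

// Illustrative check of textWrapper.wrapSeparatorStrings
System.out.println(routines.textWrapper.wrapSeparatorStrings("convalescent;12/04/21", ";", "\"")); // prints "convalescent;12/04/21"
System.out.println(routines.textWrapper.wrapSeparatorStrings("convalescent", ";", "\""));          // prints convalescent (unchanged)
System.out.println(routines.textWrapper.wrapSeparatorStrings(null, ";", "\""));                    // prints null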

 

 

Regarding the Parquet file, I'm not sure I understand. Are you saying that I can trigger the standard job in TMC and then somehow call the Big Data job?

Anonymous
Not applicable

Oops, I missed the "else" condition there. I think I was focused on the affected records and overlooked the unaffected ones. Well spotted.

 

Regarding reading and writing Parquet files, the components I mentioned are documented here....

 

https://help.talend.com/r/en-US/7.3/parquet/parquet

 

You can make use of the TMC API to run jobs. The Swagger API documentation is here...

 

https://api.us.cloud.talend.com/tmc/swagger/swagger-ui.html
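As a rough illustration of the kind of call involved, triggering a task execution from plain Java (11+) might look something like the sketch below. The endpoint path, request body and token handling are assumptions for the example; please verify them against the Swagger documentation above.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class TmcExecutionSketch {

    public static void main(String[] args) throws Exception {
        // Assumed values: the executions endpoint, task id and personal access token
        // are placeholders; check them against the TMC Swagger documentation.
        String endpoint = "https://api.us.cloud.talend.com/tcmp/executions";
        String taskId = "<your-task-id>";
        String token = "<your-personal-access-token>";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(endpoint))
                .header("Authorization", "Bearer " + token)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString("{\"executable\": \"" + taskId + "\"}"))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        System.out.println(response.statusCode() + " " + response.body());
    }
}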