J_Ruiz
Contributor II

java.lang.IllegalArgumentException: Illegal pattern component: XXX

Greetings.

 

I'm getting the following error on an Apache Spark batch job:

 

java.lang.IllegalArgumentException: Illegal pattern component: XXX
    at org.apache.commons.lang3.time.FastDatePrinter.parsePattern(FastDatePrinter.java:282)
    at org.apache.commons.lang3.time.FastDatePrinter.init(FastDatePrinter.java:149)
    at org.apache.commons.lang3.time.FastDatePrinter.<init>(FastDatePrinter.java:142)
    at org.apache.commons.lang3.time.FastDateFormat.<init>(FastDateFormat.java:384)
    at org.apache.commons.lang3.time.FastDateFormat.<init>(FastDateFormat.java:369)
    at org.apache.commons.lang3.time.FastDateFormat$1.createInstance(FastDateFormat.java:91)
    at org.apache.commons.lang3.time.FastDateFormat$1.createInstance(FastDateFormat.java:88)
    at org.apache.commons.lang3.time.FormatCache.getInstance(FormatCache.java:82)
    at org.apache.commons.lang3.time.FastDateFormat.getInstance(FastDateFormat.java:165)
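For context, the trace shows commons-lang3's FastDatePrinter rejecting the XXX pattern component (the ISO-8601 time-zone offset), which older commons-lang3 releases don't support; one common cause is an old commons-lang3 jar on the cluster classpath shadowing the one Spark expects. A minimal sketch that should reproduce the same exception against such an old version (the class name is made up for illustration):

    import org.apache.commons.lang3.time.FastDateFormat;

    public class ReproIllegalPattern {
        public static void main(String[] args) {
            // The trailing XXX (ISO-8601 offset) is the component that
            // commons-lang3 releases predating "X" pattern support reject
            // with "IllegalArgumentException: Illegal pattern component: XXX".
            FastDateFormat fmt =
                    FastDateFormat.getInstance("yyyy-MM-dd'T'HH:mm:ss.SSSXXX");
            System.out.println(fmt.format(new java.util.Date()));
        }
    }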

 

Here are my system and Talend product specs:

 

  • OS: Windows 10 Pro
  • Talend version: Talend Big Data (R2020-09-7.3.1)
  • Using CDH 6.3.2 to run Spark batch jobs on a YARN cluster with Spark 2.4.0
  • Working on a remote Bitbucket repository

 

Any help is appreciated. Thanks.


3 Replies
Anonymous
Not applicable

Hello,

Does this job work fine in Spark local mode with Spark 2.4?

It seems to be a known Jira issue: it was reported and fixed in R2020-08, but there is a regression from R2020-09 until R2020-12.

We'd suggest raising a support case on the Talend support portal so that our colleagues on the support team can check your issue and see whether a new patch can be delivered to you, with priority, through the support cycle.

Best regards

Sabrina

J_Ruiz
Contributor II
Author

Thank you for the reply.

 

Upgrading to a newer patch is unfortunately not an option for us at the moment. However, I managed to find a workaround!

 

The components causing this issue were the two tFileInputDelimited components in the job. I discovered you can replace each of them with a combination of tFileInputFullRow and tJavaRow.

 

IMPORTANT NOTE: Somehow, merely deactivating the tFileInputDelimited still causes the error to appear; you need to delete the component from the job entirely.

 

The procedure:

  1. Use the same configuration for the tFileInputFullRow as you would for the tFileInputDelimited.
  2. Connect it to a tJavaRow and set the output schema to the schema of the tFileInputDelimited.
  3. Optional: if the CSV file(s) enclose their fields in an escape/enclosure character, remove the first and last character of input.line (I used two substring calls for this).
  4. Use Java's split() method to split input.line on your field separator. If the fields are enclosed, split on the sequence "enclosure character + field separator + enclosure character", for example "\"\\|\"".
  5. Assign each element of the resulting array to the corresponding output column, casting where needed, such as this (see the full sketch after this list):
    1. output.id = fields[0];
    2. output.name = fields[1];
    3. output.date = TalendDate.parseDate("yyyy-MM-dd", fields[2]);
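To make steps 3-5 concrete, here is a sketch of what the tJavaRow code could look like. It assumes the incoming connection is named input with a single line column, the output schema is (id, name, date), the field separator is a pipe, and fields are enclosed in double quotes; adjust the names and separator to match your own job:

    // Hypothetical tJavaRow body; "input"/"output" are the assumed
    // connection names and (id, name, date) the assumed output schema.

    // Step 3 (optional): strip the leading and trailing enclosure characters.
    String line = input.line.substring(1, input.line.length() - 1);

    // Step 4: split on enclosure + separator + enclosure. The pipe must be
    // escaped because String.split() takes a regular expression.
    String[] fields = line.split("\"\\|\"");

    // Step 5: assign each element to an output column, casting as needed.
    output.id = fields[0];
    output.name = fields[1];
    output.date = TalendDate.parseDate("yyyy-MM-dd", fields[2]);

Note that split() on a plain separator won't handle separators that appear inside an enclosed field; if your data can contain those, a CSV-aware parser (or the tExtractDelimitedFields idea below) would be safer.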

 

This is the workaround I came up with; it might also work with a tExtractDelimitedFields instead of a tJavaRow. I will edit this post in the future with an answer to that.

Anonymous
Not applicable

Hello,

Thanks for sharing your workaround with us on Community.

Best regards

Sabrina