J_Ruiz
Contributor II

java.lang.IllegalArgumentException: Illegal pattern component: XXX

Greetings.

 

I'm getting the following error on an Apache Spark batch job:

 

java.lang.IllegalArgumentException: Illegal pattern component: XXX
    at org.apache.commons.lang3.time.FastDatePrinter.parsePattern(FastDatePrinter.java:282)
    at org.apache.commons.lang3.time.FastDatePrinter.init(FastDatePrinter.java:149)
    at org.apache.commons.lang3.time.FastDatePrinter.<init>(FastDatePrinter.java:142)
    at org.apache.commons.lang3.time.FastDateFormat.<init>(FastDateFormat.java:384)
    at org.apache.commons.lang3.time.FastDateFormat.<init>(FastDateFormat.java:369)
    at org.apache.commons.lang3.time.FastDateFormat$1.createInstance(FastDateFormat.java:91)
    at org.apache.commons.lang3.time.FastDateFormat$1.createInstance(FastDateFormat.java:88)
    at org.apache.commons.lang3.time.FormatCache.getInstance(FormatCache.java:82)
    at org.apache.commons.lang3.time.FastDateFormat.getInstance(FastDateFormat.java:165)
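For context, the trace shows commons-lang3's FastDatePrinter rejecting the XXX pattern component (the ISO-8601 time-zone offset), which older commons-lang3 releases don't support; one common cause is an old commons-lang3 jar on the cluster classpath shadowing the one Spark expects. A minimal sketch that should reproduce the same exception against such an old version (the class name is made up for illustration):

    import org.apache.commons.lang3.time.FastDateFormat;

    public class ReproIllegalPattern {
        public static void main(String[] args) {
            // The trailing XXX (ISO-8601 offset) is the component that
            // commons-lang3 releases predating "X" pattern support reject
            // with "IllegalArgumentException: Illegal pattern component: XXX".
            FastDateFormat fmt =
                    FastDateFormat.getInstance("yyyy-MM-dd'T'HH:mm:ss.SSSXXX");
            System.out.println(fmt.format(new java.util.Date()));
        }
    }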

 

Here are my system and Talend product specs:

 

  • OS: Windows 10 Pro
  • Talend version: Talend Big Data (R2020-09-7.3.1)
  • Using CDH 6.3.2 to run Spark batch jobs on a YARN cluster with Spark 2.4.0
  • Working on a remote Bitbucket repository

 

Any help is appreciated. Thanks.


3 Replies
Anonymous
Not applicable

Hello,

Does this job work fine in Spark local mode with Spark 2.4?

It seems to be a known Jira issue: it was reported and fixed in R2020-08, but there is a regression from R2020-09 until R2020-12.

We'd suggest raising a support case on the Talend support portal so that our colleagues on the support team can check your issue and see whether a new patch can be delivered to you, with priority, through the support cycle.

Best regards

Sabrina

J_Ruiz
Contributor II
Author

Thank you for the reply.

 

Upgrading to a newer patch is unfortunately not an option for us at the moment. However, I managed to find a workaround!

 

The components causing this issue were the two tFileInputDelimited components in the job. I discovered you can replace each of them with a combination of tFileInputFullRow and tJavaRow.

 

IMPORTANT NOTE: Somehow, merely deactivating the tFileInputDelimited still causes the error to appear; you need to delete the component from the job entirely.

 

The procedure:

  1. Use the same configuration for the tFileInputFullRow as you would for the tFileInputDelimited.
  2. Connect it to a tJavaRow and set the output schema to the schema of the tFileInputDelimited.
  3. Optional: if the CSV file(s) enclose their fields in an escape/enclosure character, remove the first and last character of input.line (I used two substring calls for this).
  4. Use Java's split() method to split input.line on your field separator. If the fields are enclosed, split on the sequence "enclosure character + field separator + enclosure character", for example "\"\\|\"".
  5. Assign each element of the resulting array to the corresponding output column, casting where needed, such as this (see the full sketch after this list):
    1. output.id = fields[0];
    2. output.name = fields[1];
    3. output.date = TalendDate.parseDate("yyyy-MM-dd", fields[2]);
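To make steps 3-5 concrete, here is a sketch of what the tJavaRow code could look like. It assumes the incoming connection is named input with a single line column, the output schema is (id, name, date), the field separator is a pipe, and fields are enclosed in double quotes; adjust the names and separator to match your own job:

    // Hypothetical tJavaRow body; "input"/"output" are the assumed
    // connection names and (id, name, date) the assumed output schema.

    // Step 3 (optional): strip the leading and trailing enclosure characters.
    String line = input.line.substring(1, input.line.length() - 1);

    // Step 4: split on enclosure + separator + enclosure. The pipe must be
    // escaped because String.split() takes a regular expression.
    String[] fields = line.split("\"\\|\"");

    // Step 5: assign each element to an output column, casting as needed.
    output.id = fields[0];
    output.name = fields[1];
    output.date = TalendDate.parseDate("yyyy-MM-dd", fields[2]);

Note that split() on a plain separator won't handle separators that appear inside an enclosed field; if your data can contain those, a CSV-aware parser (or the tExtractDelimitedFields idea below) would be safer.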

 

This is the workaround I came up with; it might also work with a tExtractDelimitedFields instead of a tJavaRow. I will edit this post in the future with an answer to that.

Anonymous
Not applicable

Hello,

Thanks for sharing your workaround with us on Community.

Best regards

Sabrina