Anonymous
Not applicable

Error handling in a Big Data job

Hi Team,

 

What are the ways to handle errors in a Talend Big Data Batch job?

 

Some of the Big Data components, e.g. tHiveInput, do not have an "OnSubjobError" trigger. How should we enable error handling in Spark?

7 Replies
Anonymous
Not applicable
Author

Hello,

The OnSubjobError connector is not supported by most Spark components.

Best regards

Sabrina

 

Anonymous
Not applicable
Author

Hi Sabrina,

Then how should we catch errors? Are there any best practices around that?
Anonymous
Not applicable
Author

Hello,

What kind of error do you want to handle? Do you want to capture an exception in Spark jobs?

Best regards

Sabrina

Anonymous
Not applicable
Author

Hi Sabrina,

 

Let's say I am performing the operation below:

tHiveInput --> tMap (some transformation) --> tHiveOutput

Even when there is an issue in the transformation, the YARN application completes successfully. I am not able to record or track the issue unless I go and check the logs.
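One common way to make bad rows visible in this kind of flow is the reject-row pattern: catch the per-row exception inside the transformation, divert the row to a reject output, and fail the job loudly when rejects occur. Below is a minimal plain-Java sketch of that idea (no Spark or Talend dependency; all class, method, and variable names are illustrative, not Talend-generated code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of the reject-row pattern: instead of letting a bad record be
// swallowed inside the transformation, catch the per-row exception and
// route the row to a reject collection that can be counted and persisted.
public class RejectRowSketch {
    static List<String> rejects = new ArrayList<>();

    // Stand-in for the tMap expression: parse a numeric field and double it.
    static Integer transform(String row) {
        try {
            return Integer.parseInt(row.trim()) * 2;
        } catch (NumberFormatException e) {
            rejects.add(row);   // would be a reject output flow in tMap
            return null;        // filtered out of the main flow
        }
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("10", "oops", "7");
        List<Integer> output = new ArrayList<>();
        for (String row : input) {
            Integer result = transform(row);
            if (result != null) output.add(result);
        }
        System.out.println("good=" + output + " rejects=" + rejects);
        // Fail explicitly when rejects occur, so the job does not
        // report success despite bad data (tDie territory in Talend):
        if (!rejects.isEmpty()) {
            System.err.println("Rejected " + rejects.size() + " row(s)");
        }
    }
}
```

In a real job the reject collection would be written to a table or file and the job made to fail (or warn) based on the reject count, so a "successful" YARN application actually means clean data.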

 

prawanth
Contributor
Contributor

Hi Guys, 

 

We are facing a similar situation building a Big Data Batch job. Do you know the resolution or best practices around handling errors in Big Data Batch jobs?

 

 

Thanks,

KP

prawanth
Contributor
Contributor

Below is Talend's response to the same question:

 

"You wouldn't find components such as tLogCatcher and tStatCatcher in BD jobs. It is always recommended to have a DI orchestrator job for your Spark jobs and then trigger the Spark jobs from the DI job. You can pass statistics, errors, and other information via context from the orchestrator DI job to the BD job. Orchestration should always happen through DI; the purpose of BD jobs is to process huge volumes of data. Error handling is explained in detail in this blog: https://www.talend.com/blog/2016/10/05/talend-job-design-patterns-best-practices-part-3/"
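The orchestration pattern in that response can be sketched roughly as follows. This is a hypothetical, simplified Java illustration (not Talend-generated code): `runBigDataJob` is a stand-in for a tRunJob call or for launching the exported Spark job's run script, and the parent branches on the child's exit code:

```java
// Sketch of the DI-orchestrator pattern: a parent job triggers the
// Spark (Big Data) job, inspects its exit code, and handles logging
// and failure itself. All names below are illustrative only.
public class Orchestrator {
    // Stand-in for launching an exported Big Data Batch job (e.g. via
    // tRunJob, or ProcessBuilder running the job's generated shell
    // script) and returning its exit code. Here we simulate: the known
    // job succeeds, anything else fails.
    static int runBigDataJob(String jobName) {
        return "nightly_load".equals(jobName) ? 0 : 1;
    }

    public static void main(String[] args) {
        int rc = runBigDataJob("nightly_load");
        if (rc != 0) {
            // In a real DI job this branch would be tDie / tLogCatcher:
            System.err.println("Spark job failed with exit code " + rc);
            System.exit(rc);
        }
        System.out.println("Spark job finished, exit code " + rc);
    }
}
```

The point of the pattern is that the DI layer, which does have tLogCatcher, tStatCatcher, tDie, and OnSubjobError, wraps the Spark job, so failures surface through the orchestrator even though the Spark job itself offers no catcher components.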

 

cheers !!

KP