Anonymous
Not applicable

Big Data Spark Job - Subjob and Logs creation

Hi,

 

I am creating a Big Data Spark job. I want to create one subjob and reuse it in another job. In a standard job we can use tBufferOutput for this; which component can we use to create a subjob in a Spark job?

 

I also want to maintain logs for the Big Data Spark job. In a standard job I use tWarn -> tLogCatcher -> tLogRow -> tFileOutputDelimited. Which components can I use in a Big Data Spark job?

 

Thanks.

13 Replies
Anonymous
Not applicable
Author

Hi,

The DI tRunJob component can work with a Spark Batch Job.

There is a "log4jLevel" option in the Advanced settings tab of the Run view, which outputs component-related logging information at runtime. Let us know if it works for you.
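
As a rough illustration of what that option controls (a minimal standalone sketch using the log4j 1.x API that the jobs bundle, not the code the Studio actually generates), setting the level to WARN suppresses the INFO messages from the component loggers:

import org.apache.log4j.ConsoleAppender;
import org.apache.log4j.Level;
import org.apache.log4j.Logger;
import org.apache.log4j.SimpleLayout;

public class Log4jLevelSketch {
    public static void main(String[] args) {
        // A console appender stands in for whatever appenders the job configures.
        Logger root = Logger.getRootLogger();
        root.addAppender(new ConsoleAppender(new SimpleLayout()));

        // The "log4jLevel" option effectively changes this threshold at runtime.
        root.setLevel(Level.WARN);

        // Component loggers are named after the components; this name is made up.
        Logger component = Logger.getLogger("tFileInputDelimited_1");
        component.info("starting to read input");   // filtered out at WARN
        component.warn("3 rows rejected");          // printed
        component.error("could not open the file"); // printed
    }
}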

Best regards

Sabrina

(screenshot attached)

Anonymous
Not applicable
Author

 

 

Thanks for the reply. Actually, I want customised messages printed to the log files. Is that possible with a Big Data Spark job? If yes, how?

 

I am trying to create a subjob using tCacheOutput and tCacheInput, but I am getting an error message like:

 

17/04/18 15:10:38 INFO SparkContext: Successfully stopped SparkContext
java.lang.NullPointerException
	at org.talend.bigdata.dataflow.spark.batch.hmap.SparkHMapTransform.build(SparkHMapTransform.java:52)
java.io.FileNotFoundException: /etc/spark/conf/fairscheduler.xml (No such file or directory)
	at java.io.FileInputStream.open(Native Method)

but that config file is present at that location. PFA the job.

(job screenshot attached)

 

I also connected tCacheOutput -> (OnComponentOk) -> tCacheInput, but it didn't generate any output, not even the output folder.

 

There are no nulls in the data set, and I sync the columns with every component. I am really confused about what went wrong; there is not much help available for tCacheInput/tCacheOutput.

 

Please let me know: is there any issue with the workflow? Am I using these components correctly?

Anonymous
Not applicable
Author

Hi,

For your issue, could you please use the connection type "OnSubjobOk" instead, to see if the issue still reproduces?

Let us know if it works for you.

Best regards

Sabrina

Anonymous
Not applicable
Author

Hi,

 

I tried OnComponentOk; the job ran successfully but it does not generate any output. Have you tried any simple workflow with tCacheInput/tCacheOutput?

 

(job screenshot attached)

 

 

OnSubjobOk is not available to connect tCacheOutput to tCacheInput.

 

Please help.

Anonymous
Not applicable
Author

Hi,

Please design your workflow like this, so that the first subjob finishes writing the cache before the second one reads it:

tFileInput --> tCacheOutput

| OnSubjobOk

tCacheInput --> tFileOutput

Best regards

Sabrina

Anonymous
Not applicable
Author

Thanks, that worked for me. How do we decide whether to use OnSubjobOk or OnComponentOk?

 

Can we create a customised log with a Big Data job?

 

 

Anonymous
Not applicable
Author

Hi,

Please refer to this article about the difference between OnSubjobOk and OnComponentOk; in short, OnSubjobOk triggers the target only after the whole source subjob has finished, while OnComponentOk triggers it as soon as the single source component has finished.

https://help.talend.com/pages/viewpage.action?pageId=190513190

What does your customised log look like? Does the log4jLevel option not meet your needs?

Best regards

Sabrina

Anonymous
Not applicable
Author

I do not know much about log4j. If I enable this option, where can I see the logs? How can I store them in a log or output file?

 

As far as customised messages are concerned, they would look like:

Execution started at <datetime>

File loaded at <datetime>

Filtration done, <n> rows flowed to the next level at <datetime>

and so on....

 

Thanks...

Anonymous
Not applicable
Author

Hi,

log4jLevel: this feature allows you to change the output level at runtime for the log4j loggers activated in the components of the Job.

For more information, please see: https://help.talend.com/display/TalendDataFabricStudioUserGuide63EN/7.9.5+How+to+customize+log4j+out...
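
If you also need your own messages (like "Execution started at ..."), one rough option, only a sketch and not something taken from the documentation above, is to log them yourself through the log4j 1.x API that the job already bundles, for example from a tJava component. The logger name, file path and row count below are placeholders:

import java.text.SimpleDateFormat;
import java.util.Date;
import org.apache.log4j.FileAppender;
import org.apache.log4j.Logger;
import org.apache.log4j.PatternLayout;

public class CustomJobLogSketch {
    public static void main(String[] args) throws Exception {
        // Dedicated logger for the job; in a tJava only the getLogger/info calls
        // would be needed, the rest is wiring so this sketch runs on its own.
        Logger jobLog = Logger.getLogger("my_spark_job");

        // Also write the messages to a file (placeholder path), which covers
        // "how can I store this in a log or output file".
        jobLog.addAppender(new FileAppender(
                new PatternLayout("%d{yyyy-MM-dd HH:mm:ss} %-5p %m%n"),
                "/tmp/my_spark_job.log", true));

        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        jobLog.info("Execution started at " + fmt.format(new Date()));

        // ... placed after the load step:
        jobLog.info("File loaded at " + fmt.format(new Date()));

        // ... placed after the filter step; the row count must come from your own
        // counter or global variable, it is not computed here.
        int filteredRows = 42; // placeholder value
        jobLog.info("Filtration done, " + filteredRows
                + " rows flowed to the next level at " + fmt.format(new Date()));
    }
}

Whether a tJava is available and where these calls actually execute (driver or executors) depends on your job type and Studio version, so treat this purely as an illustration of the log4j calls.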

Best regards

Sabrina