Hi,
Is it possible to write custom Java code in tJavaRow for Big Data jobs? I am using tJavaRow inside a Spark job, but the custom code is not executed. The following is the custom code:

context.Flag = "YES";
System.out.println("###################### This output is from tJavaRow ###################");

The value is not assigned, and the message is not printed.
Thanks
I'm not sure about anything more complex, but as described, yes, it works.
Variables, however, are not assigned.
Hi ,
Is that a Talend Big Data Spark job?
If so, instead of running it locally, run it on a Spark cluster.
I am running my Spark job on the cluster.
If I put the custom code in tJava instead of tJavaRow, it gets executed, and I can see the output in the Spark application logs.
But the custom code in tJavaRow is not executed.
Thanks
Yes, I will test, but you may be right and it will not work.
Let's wait and see what the Talend staff answer 🙂
Yeah, let's wait.
Hello,
Custom code components (tJava and tJavaRow) behave differently and have to be used differently depending on the type of job you are building. For instance, in Spark batch jobs you need to write code with the Spark Java API syntax to work with the input and output RDDs (read the comments shown in the component when you first add it for help on how to do a test print of your input RDD; try that instead of your System.out call). In a Spark Streaming job, you'll be working with the RDDs inside a DStream. tJava and tJavaRow also behave differently from each other: tJavaRow uses the Spark DataFrames API, while tJava works purely with RDDs.
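To make the per-record "map" pattern concrete without needing a Spark cluster, here is a plain-Java sketch of what an RDD map conceptually does: a function object applied to each input record to produce an output record. The record type and field names here are made up for the example; in a real Talend Spark job the input/output classes are generated by Studio.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class MapPatternSketch {
    // Stand-in for one input record (in a real Spark job this would be the
    // row class Talend generates for the component).
    static class RecordIn {
        final String name;
        RecordIn(String name) { this.name = name; }
    }

    // The per-record function, analogous to implementing
    // org.apache.spark.api.java.function.Function<In, Out> in Spark.
    static Function<RecordIn, String> toUpper = rec -> rec.name.toUpperCase();

    public static void main(String[] args) {
        List<RecordIn> input = Arrays.asList(new RecordIn("foo"), new RecordIn("bar"));
        // rdd.map(fn) applies fn to every record and yields a new dataset;
        // a stream map over a list is the single-machine analogue.
        List<String> output = input.stream().map(toUpper).collect(Collectors.toList());
        System.out.println(output); // prints [FOO, BAR]
    }
}
```

The same shape carries over to Spark: you hand the function object to `map(...)` on the input RDD, and Spark applies it to every record across the cluster.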
See the documentation for the differences between them across the various job types:
Hope that helps.
Hi jpmauss,
Could you please provide an example, such as a word count, of how to write custom Spark code in tJava/tJavaRow? I tried doing the same by reading the description in the component but was not successful, and I could not find any example in the knowledge base.
Any help would be much appreciated.
Best Regards,
Ojasvi Gambhir
Hi all,
If you find any materials on writing custom Java code in tJava for the Big Data version, please let me know.
Thanks
I can look to share some examples; however, you'd need to be familiar with the Spark Java API, which is different from plain Java as used in the standard jobs. Also, with tJavaRow you'd need to be familiar with Spark SQL and the DataFrames API.
See this link for an intro to programming in Spark; click the Java tab to see how to work with the data using RDDs:
https://spark.apache.org/docs/1.6.2/programming-guide.html
See this link for an intro to the Spark SQL and DataFrames API:
https://spark.apache.org/docs/1.6.2/sql-programming-guide.html
When working in Talend, the tInput(whatever) component creates an RDD that is then used in tJava. See the 'Code' tab in Studio to see how it initializes the Spark context and loads the data into an RDD.
Hi,
Here is an example of working code for tJava in a Spark job; the code sample shipped with the component is wrong (Talend 6.4.1).
In the Basic settings:
outputrdd_tJava_1 = rdd_tJava_1.map(new mapInToOut(job));
In the Advanced settings, in the Classes field:
public static class mapInToOut implements
        org.apache.spark.api.java.function.Function<inputStruct, RecordOut_tJava_1> {

    private ContextProperties context = null;
    private java.util.List<org.apache.avro.Schema.Field> fieldsList;

    public mapInToOut(JobConf job) {
        this.context = new ContextProperties(job);
    }

    @Override
    public RecordOut_tJava_1 call(inputStruct origStruct) {
        if (fieldsList == null) {
            this.fieldsList = (new inputStruct()).getSchema().getFields();
        }
        RecordOut_tJava_1 value = new RecordOut_tJava_1();
        for (org.apache.avro.Schema.Field field : fieldsList) {
            value.put(field.pos(), origStruct.get(field.pos()));
        }
        return value;
    }
}
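For readers who don't have the Talend-generated classes (inputStruct, RecordOut_tJava_1) to hand, the core of mapInToOut.call() is just a positional field copy from the input record to the output record. Here is a plain-Java sketch of that loop with no Avro or Spark dependency; the Object[] records stand in for Avro rows whose fields are addressed by position, and the values are made up:

```java
import java.util.Arrays;

public class FieldCopySketch {
    public static void main(String[] args) {
        // Stand-in for the Avro-backed input row: fields addressed by position,
        // like origStruct.get(field.pos()) in the component code above.
        Object[] origStruct = { 42, "hello", true }; // hypothetical field values
        Object[] value = new Object[origStruct.length];

        // Same idea as the loop in mapInToOut.call(): copy every field by position.
        for (int pos = 0; pos < origStruct.length; pos++) {
            value[pos] = origStruct[pos];
        }
        System.out.println(Arrays.toString(value)); // prints [42, hello, true]
    }
}
```

In the real component, the field positions come from the Avro schema (`getSchema().getFields()` and `field.pos()`), which is why the schema lookup is cached in fieldsList the first time call() runs.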