Anonymous
Not applicable

Custom Spark code through tJava

Hi,

 

Is there a word-count-style example available showing how to integrate already-written custom Spark code into Talend? I checked the knowledge base and saw that tJava for Spark supports this, but I cannot find an example that helps me understand the syntax described in the component comments.

Any help will be much appreciated.

 

Best Regards,

Ojasvi Gambhir

5 Replies
Irshad1
Contributor II

Hi Ojasvi

 

You can use the code below for reference and customize it to your requirements:

 

package PackageDemo;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop... // truncated in the original post


To see the whole post, download it here: OriginalPost.pdf
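
For comparison, note that the snippet above is classic Hadoop MapReduce rather than Spark. A word count in Spark's plain Java API (standalone, outside Talend) looks roughly like the sketch below. It assumes Spark 2.x, where flatMap takes a function returning an Iterator; the class name and the input/output path arguments are placeholders:

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class WordCount {
	public static void main(String[] args) {
		SparkConf conf = new SparkConf().setAppName("WordCount");
		JavaSparkContext sc = new JavaSparkContext(conf);

		// Read the input file and split each line into words
		JavaRDD<String> words = sc.textFile(args[0])
				.flatMap(line -> Arrays.asList(line.split("\\s+")).iterator());

		// Pair each word with 1, then sum the counts per word
		JavaPairRDD<String, Integer> counts = words
				.mapToPair(word -> new Tuple2<>(word, 1))
				.reduceByKey((a, b) -> a + b);

		counts.saveAsTextFile(args[1]);
		sc.stop();
	}
}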
Anonymous
Not applicable
Author

Thank you, fuzzyedy, for your quick response.

Will this code work in the tJava component of a Spark job?

My requirement, basically, is to understand how to write custom Spark code in the tJava component using the Talend-specific syntax given as comments in the tJava component for Spark. I do have word-count code in Java Spark, but I am not able to adapt it to the tJava nomenclature.

 

thanks and regards,

Ojasvi Gambhir

Irshad1
Contributor II
Contributor II

Yes, this should work. Go ahead and give it a try!

 

Good luck!

Anonymous
Not applicable
Author

Hi,

Here is example code for tJava in a Spark job; the code sample shipped with the component is wrong (Talend 6.4.1).

 

In the Basic settings:

 

outputrdd_tJava_1 = rdd_tJava_1.map(new mapInToOut(job));

In the Advanced settings, in the Classes Java field:

	public static class mapInToOut
			implements
			org.apache.spark.api.java.function.Function<inputStruct, RecordOut_tJava_1> {

		private ContextProperties context = null;
		private java.util.List<org.apache.avro.Schema.Field> fieldsList;

		public mapInToOut(JobConf job) {
			this.context = new ContextProperties(job);
		}

		@Override
		public RecordOut_tJava_1 call(inputStruct origStruct) {

			// Cache the Avro schema fields of the input record on first call
			if (fieldsList == null) {
				this.fieldsList = (new inputStruct()).getSchema()
						.getFields();
			}

			RecordOut_tJava_1 value = new RecordOut_tJava_1();

			// Copy every field from the input record to the output record
			for (org.apache.avro.Schema.Field field : fieldsList) {
				value.put(field.pos(), origStruct.get(field.pos()));
			}

			return value;
		}
	}
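
In this pattern, the Basic settings line wires the component's generated input RDD (rdd_tJava_1) through the function and assigns the result to the generated output RDD (outputrdd_tJava_1), while the class in the Advanced settings does the per-record work. To transform data rather than just copy it, you would change the loop body; for example (a hypothetical sketch, where position 0 holding a String is purely an assumption about the schema):

			// Hypothetical variation: upper-case the first column while
			// copying the rest; position 0 holding a String is assumed.
			for (org.apache.avro.Schema.Field field : fieldsList) {
				Object v = origStruct.get(field.pos());
				if (field.pos() == 0 && v != null) {
					v = v.toString().toUpperCase();
				}
				value.put(field.pos(), v);
			}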

 

Anonymous
Not applicable
Author

Hi emenuet,

 

Have you tried this with Spark 2.x as well?

 

Regards,

Dhaval