<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to load data into HDFS using Spark streaming or Batch job with reference to data in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/How-to-load-data-into-HDFS-using-Spark-streaming-or-Batch-job/m-p/2372464#M135300</link>
    <description>&lt;P&gt;Hello,&lt;/P&gt; 
&lt;P&gt;So far, tFlowToIterate is available in standard ETL jobs only.&lt;/P&gt; 
&lt;P&gt;Here is a KB article about Spark dynamic context: &lt;A title="https://community.talend.com/t5/Architecture-Best-Practices-and/Spark-Dynamic-Context/ta-p/33038" href="https://community.qlik.com/s/article/ka03p0000006EJjAAM" target="_self"&gt;https://community.talend.com/t5/Architecture-Best-Practices-and/Spark-Dynamic-Context/ta-p/33038&lt;/A&gt;.&lt;/P&gt; 
&lt;P&gt;Hope it helps.&lt;/P&gt; 
&lt;P&gt;Best regards,&lt;/P&gt; 
&lt;P&gt;Sabrina&lt;/P&gt;</description>
    <pubDate>Wed, 26 Sep 2018 09:01:26 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2018-09-26T09:01:26Z</dc:date>
    <item>
      <title>How to load data into HDFS using Spark streaming or Batch job with reference to data</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-load-data-into-HDFS-using-Spark-streaming-or-Batch-job/m-p/2372463#M135299</link>
      <description>&lt;P&gt;Hi all,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Use case:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have data coming in from a file, for example:&lt;/P&gt;&lt;P&gt;RawDataA, A&lt;/P&gt;&lt;P&gt;RawDataB, B&lt;/P&gt;&lt;P&gt;RawDataC, C&lt;/P&gt;&lt;P&gt;.&lt;/P&gt;&lt;P&gt;.&lt;/P&gt;&lt;P&gt;RawDataZ, Z&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Now I want to store "RawDataX" in the location corresponding to its X value:&lt;/P&gt;&lt;P&gt;/X/RawDataX&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Note: I don't want to create 26 tFileOutputDelimited components in the job.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Is there any way I can use a single tFileOutputDelimited for all records?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Heads up: in DI, we can use tFlowToIterate and a context variable in tFileOutputDelimited to meet this requirement.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Can anyone give some ideas on how to implement the same thing in a Spark or MapReduce job?&lt;/P&gt;</description>
      <pubDate>Sat, 16 Nov 2024 08:08:14 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-load-data-into-HDFS-using-Spark-streaming-or-Batch-job/m-p/2372463#M135299</guid>
      <dc:creator>Bluemoon</dc:creator>
      <dc:date>2024-11-16T08:08:14Z</dc:date>
    </item>
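The routing the question describes (one writer, one output directory per key) is what Spark's own `DataFrameWriter.partitionBy("key")` does natively; outside Talend components, the logic can be sketched in plain Python. This is an illustrative sketch only, not a Talend answer: `write_per_key`, the sample `records`, and the `part-00000.csv` file name are all made-up names for this example.

```python
# Illustrative sketch (plain Python, standing in for the Spark job): route each
# "RawDataX, X" record into its own <base>/X/ directory with a single writer
# loop, instead of 26 separate tFileOutputDelimited components.
import os
from collections import defaultdict

records = [("RawDataA", "A"), ("RawDataB", "B"), ("RawDataC", "C")]

def write_per_key(records, base):
    # Group rows by their key column, mirroring what partitionBy does in Spark.
    groups = defaultdict(list)
    for raw, key in records:
        groups[key].append(raw)
    # Write one file per key under <base>/<key>/.
    for key, rows in groups.items():
        out_dir = os.path.join(base, key)
        os.makedirs(out_dir, exist_ok=True)
        with open(os.path.join(out_dir, "part-00000.csv"), "w") as f:
            f.write("\n".join(rows) + "\n")
    return sorted(groups)

# write_per_key(records, "/tmp/out") creates /tmp/out/A/, /tmp/out/B/, ...
```

In a native Spark batch job the equivalent is a single `df.write.partitionBy("key").csv(basePath)` call, which creates `basePath/key=A/`, `basePath/key=B/`, and so on in one pass.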
    <item>
      <title>Re: How to load data into HDFS using Spark streaming or Batch job with reference to data</title>
      <link>https://community.qlik.com/t5/Talend-Studio/How-to-load-data-into-HDFS-using-Spark-streaming-or-Batch-job/m-p/2372464#M135300</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt; 
&lt;P&gt;So far, tFlowToIterate is available in standard ETL jobs only.&lt;/P&gt; 
&lt;P&gt;Here is a KB article about Spark dynamic context: &lt;A title="https://community.talend.com/t5/Architecture-Best-Practices-and/Spark-Dynamic-Context/ta-p/33038" href="https://community.qlik.com/s/article/ka03p0000006EJjAAM" target="_self"&gt;https://community.talend.com/t5/Architecture-Best-Practices-and/Spark-Dynamic-Context/ta-p/33038&lt;/A&gt;.&lt;/P&gt; 
&lt;P&gt;Hope it helps.&lt;/P&gt; 
&lt;P&gt;Best regards,&lt;/P&gt; 
&lt;P&gt;Sabrina&lt;/P&gt;</description>
      <pubDate>Wed, 26 Sep 2018 09:01:26 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/How-to-load-data-into-HDFS-using-Spark-streaming-or-Batch-job/m-p/2372464#M135300</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2018-09-26T09:01:26Z</dc:date>
    </item>
  </channel>
</rss>