Anonymous

Spark Streaming creating multiple HDFS directories

Hi All,

 

I have a Spark Streaming job consuming messages from a MapR stream. I am writing the messages to an HDFS location, and a separate batch process then picks them up from there.

The problem is that every micro-batch (I have set a 2-minute batch interval) in the streaming job creates a separate directory in HDFS named with a timestamp value. I am not sure how to merge all the files for a particular day so that the merged output can be fed to my end-of-day batch job for processing.

 

Can anybody please help here?
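One workaround (a sketch, not from the thread): since each micro-batch writes a directory named with its timestamp, an end-of-day step can collect all batch directories whose timestamp falls on the target day and concatenate their part files into one input for the batch job. The directory layout (`<root>/<epoch_millis>/part-*`) and the helper name `merge_day` below are assumptions for illustration:

```python
import shutil
from datetime import datetime
from pathlib import Path


def merge_day(stream_root: str, day: str, out_file: str) -> int:
    """Concatenate part files from every per-batch directory whose
    epoch-millis name falls on `day` (YYYY-MM-DD, local time).
    Assumed layout: <stream_root>/<epoch_millis>/part-*.
    Returns the number of part files merged."""
    merged = 0
    with open(out_file, "wb") as out:
        for batch_dir in sorted(Path(stream_root).iterdir()):
            if not batch_dir.is_dir():
                continue
            batch_day = datetime.fromtimestamp(
                int(batch_dir.name) / 1000).strftime("%Y-%m-%d")
            if batch_day != day:
                continue
            for part in sorted(batch_dir.glob("part-*")):
                with open(part, "rb") as src:
                    shutil.copyfileobj(src, out)
                merged += 1
    return merged
```

This local sketch only illustrates the grouping logic; on a real cluster you would do the same thing through the HDFS client, e.g. `hdfs dfs -getmerge <dirs> <local_file>` or a loop over `hdfs dfs -cat`.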

 

1 Reply
Anonymous
Author

You can use the tHiveOutput component to append to the same directory; under the hood it uses the DataFrame append write mode in a partitioned manner. That's one way I can think of.