Anonymous
Not applicable

Spark Streaming creating multiple HDFS directories

Hi All,

 

I have a Spark Streaming job that consumes messages from MapR Streams. I am writing the messages to an HDFS location and then processing them from there with a batch job.

The problem is that every batch (I have set a 2-minute batch interval) in the streaming job creates a separate directory in HDFS named with a timestamp value. I am not sure how to merge all the files for a particular day and feed the merged data to my end-of-day batch job.
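For context on the layout described above: when a DStream is saved with `saveAsTextFiles(prefix, suffix)`, Spark creates one directory per batch named `<prefix>-<time in ms>`. A minimal, local sketch (plain Python, no Spark required; the `events` prefix is an assumed example) of selecting all of one calendar day's batch directories from such names:

```python
from datetime import datetime, timezone

def dirs_for_day(dir_names, day, prefix="events"):
    """Return the streaming output directories whose batch timestamp
    falls on the given calendar day (UTC, 'YYYY-MM-DD').

    Assumes directories are named '<prefix>-<epoch millis>', the layout
    DStream.saveAsTextFiles produces; adjust the parsing if your job
    names its output differently.
    """
    selected = []
    for name in dir_names:
        if not name.startswith(prefix + "-"):
            continue
        try:
            millis = int(name[len(prefix) + 1:])
        except ValueError:
            continue  # not a batch directory
        ts = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
        if ts.date().isoformat() == day:
            selected.append(name)
    return sorted(selected)
```

Note that a physical merge may not even be necessary: `SparkContext.textFile` accepts a comma-separated list of paths, so the end-of-day job could read the selected directories directly, e.g. `sc.textFile(",".join(paths))`.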

 

Can anybody please help here?

 

1 Reply
Anonymous
Not applicable
Author

One way I can think of: you can use the tHiveOutput component to append to the same directory; it invokes the DataFrame write in append mode, in a partitioned manner.