Anonymous
Not applicable

Kafka input file store into HDFS.

I am trying to store Kafka input into HDFS, but the job runs for a very long time. The file is created in HDFS, but no content is written to it. Can you please help me find what I am doing wrong?

 

 


3 Replies
vapukov
Master II

Hi,

 

First of all, I suggest you delete the connection between Kafka Connection and KafkaInput.

These are two independent parts:

  • everything that runs before the job
  • and the job itself

Second, many parameters affect overall performance, but inserting each record directly into HDFS non-stop might not be the best idea.

Try testing this in the main part of the job.

 

Create an infinite loop:

  • fetch a limited number of records, for example 10,000 messages
  • store them all in a local CSV file
  • append the file to HDFS

It could be faster overall.
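The loop described above can be sketched outside Talend as follows. This is only an illustration of the batching idea, not the job's actual implementation: `batches`, `run_batched`, and the `append_to_hdfs` callback are hypothetical names, the message source is any iterable standing in for the Kafka input, and the HDFS step is left to the caller.

```python
import csv
import itertools
from pathlib import Path
from typing import Callable, Iterable, Iterator, List


def batches(messages: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield lists of at most batch_size messages until the source is exhausted."""
    iterator = iter(messages)
    while True:
        batch = list(itertools.islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def run_batched(
    messages: Iterable[str],
    append_to_hdfs: Callable[[Path], None],  # hypothetical callback, supplied by the caller
    local_file: Path,
    batch_size: int = 10000,
) -> int:
    """Write each batch to a local CSV file, then hand that file to append_to_hdfs."""
    total = 0
    for batch in batches(messages, batch_size):
        # Overwrite the local file so it holds only the current batch.
        with local_file.open("w", newline="") as handle:
            writer = csv.writer(handle)
            for message in batch:
                writer.writerow([message])
        append_to_hdfs(local_file)  # one HDFS operation per batch, not per record
        total += len(batch)
    return total
```

With an endless message stream the loop never returns, which matches the "infinite loop" above. One possible `append_to_hdfs` would call `hdfs dfs -appendToFile` with the local file and a target path, but that choice is an assumption and depends on the cluster setup.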

 

P.S.

As I mentioned above, real performance depends on many factors, and network latency is one of them.

For example:

  • zipping 100,000 files of 2 KB each
  • transferring the archive to the remote server and unzipping it there, over a 1 Gb network

will be 50+ times faster than copying the files directly. The same applies to databases: importing from CSV or using batch inserts can be 100+ times faster than inserting row by row.
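A rough back-of-the-envelope model shows why the per-item overhead dominates. The numbers below are assumptions for illustration only: "1 Gb" is read as 1 Gbit/s (about 125 MB/s), each transfer call is assumed to cost 1 ms of fixed overhead, and compression gains and zip/unzip CPU time are ignored.

```python
def transfer_seconds(
    items: int,
    item_bytes: int,
    per_call_overhead_s: float,
    bandwidth_bytes_per_s: float,
    calls: int,
) -> float:
    """Estimated time: fixed overhead per call plus raw transfer time for all bytes."""
    return calls * per_call_overhead_s + (items * item_bytes) / bandwidth_bytes_per_s


ITEMS = 100_000            # number of files, from the example above
ITEM_BYTES = 2_000         # 2 KB per file
BANDWIDTH = 125_000_000    # assumed: 1 Gbit/s expressed in bytes per second
OVERHEAD_S = 0.001         # assumed: 1 ms of latency/setup per transfer call

# One call per file versus a single call for one archive of the same total size.
one_by_one = transfer_seconds(ITEMS, ITEM_BYTES, OVERHEAD_S, BANDWIDTH, calls=ITEMS)
as_archive = transfer_seconds(ITEMS, ITEM_BYTES, OVERHEAD_S, BANDWIDTH, calls=1)
speedup = one_by_one / as_archive
```

Under these assumed values the file-by-file copy is estimated at about 101.6 seconds against about 1.6 seconds for the single archive, a ratio of roughly 60. Real figures depend on the actual per-call latency, so treat this as an order-of-magnitude estimate only.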

 

Anonymous
Not applicable
Author

Following your suggestion, I tried storing the Kafka topic data in CSV files and in tLogRow, since this topic has only 280 records. Both cases still take a long time.

vapukov
Master II

If it is slow even with a local file or logRow, you need to seriously investigate your network architecture.

Kafka is extremely fast, and there are no visible bottlenecks in this job.

Have you tested your Kafka connection with any other tools, such as the command-line client?
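Before reaching for a Kafka client, a quick first check is whether the broker port is reachable at all and how long a plain TCP connection takes. The sketch below does only that: it is not a Kafka client and does not replace a test with the command-line tools. The function name `tcp_connect_seconds` is hypothetical, and the host and port to pass in are whatever the job's broker list contains.

```python
import socket
import time
from typing import Optional


def tcp_connect_seconds(host: str, port: int, timeout: float = 5.0) -> Optional[float]:
    """Return the time taken to open a TCP connection, or None if it fails."""
    start = time.perf_counter()
    try:
        # create_connection resolves the host name and connects within the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return time.perf_counter() - start
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return None
```

A result of `None`, or a connect time of several seconds, would point to a network or name-resolution problem rather than to Kafka or the job design.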