I am trying to store the data from a Kafka input into a file in HDFS, but the job runs for a very long time. The file is created in HDFS, but no content is written to it. Could you please help me find out what I am doing wrong?
Hi,
First of all, I suggest you delete the connection between the Kafka connection component and KafkaInput.
These are 2 independent parts:
Second, many parameters affect the total performance, but inserting each record directly into HDFS non-stop might not be the best idea.
Try to test this in the main part of the job:
create an infinite loop:
It could be faster overall.
P.S.
But as I mentioned above, real performance depends on many factors, and network latency is one of them.
For example:
will be 50+ times faster than copying the files directly. The same applies to databases: importing from CSV or using batch inserts can be 100+ times faster than inserting row by row.
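To illustrate the batching idea in plain Java, here is a minimal sketch. It is not Talend-generated code and it does not talk to Kafka or HDFS: the class name BatchedFileWriter, the batch size of 100, and the local temp file standing in for the HDFS target are all assumptions made for the example. It only shows the pattern of collecting records in memory and writing them in chunks instead of one write per record.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: buffers records and appends them to a file in batches. */
class BatchedFileWriter implements AutoCloseable {
    private final Path target;      // local file used here as a stand-in for the real sink
    private final int batchSize;    // number of records collected before each write
    private final List<String> buffer = new ArrayList<>();
    private int flushCount = 0;     // how many physical writes were performed

    BatchedFileWriter(Path target, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.target = target;
        this.batchSize = batchSize;
    }

    /** Adds one record; writes to the file only when the buffer is full. */
    void add(String record) throws IOException {
        buffer.add(record);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Appends all buffered records to the file in a single write call. */
    void flush() throws IOException {
        if (buffer.isEmpty()) {
            return;
        }
        Files.write(target, buffer, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        buffer.clear();
        flushCount++;
    }

    int getFlushCount() {
        return flushCount;
    }

    /** Writes whatever is still buffered, so no records are lost. */
    @Override
    public void close() throws IOException {
        flush();
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempFile("kafka-batch", ".txt");
        try (BatchedFileWriter writer = new BatchedFileWriter(out, 100)) {
            for (int i = 0; i < 280; i++) {
                writer.add("record-" + i);  // placeholder for a consumed message
            }
        }
        System.out.println(Files.readAllLines(out).size() + " lines written to " + out);
    }
}
```

With 280 records and a batch size of 100, this performs three writes instead of 280. How much that helps in a real job depends on the sink and the network, so treat the numbers as an illustration only.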
Following your suggestion, I tried to store the data from the Kafka topic in CSV files and in tLogRow, since this topic has only 280 records. But in both cases it takes a long time.
If it is slow even with a local file or tLogRow, you need to seriously investigate your network architecture.
Kafka is extremely fast, and there are no visible bottlenecks in this job.
Have you tested your Kafka connection with any other tools, such as the command-line client?