Anonymous

Real-Time Big Data - Storm job - help with configuration

I've completed a job that uses Storm components by following the tutorial 'Getting started with a Storm Job' on the Talend Help website.
I'm using Talend Fabric 6.2.1
When I run the job I get the following error:
java.lang.RuntimeException: java.io.FileNotFoundException: stormtestfromstandard_0_1.jar (The system cannot find the file specified)
at backtype.storm.StormSubmitter.submitJar(StormSubmitter.java:164)
at org.talend.libs.tbd.ee.libstorm.ClusterStormJobRunHelper.submitJob(ClusterStormJobRunHelper.java:66)
at org.talend.libs.tbd.ee.libstorm.StormJobRunHelper.runStorm(StormJobRunHelper.java:96)
at bigdata_project.stormtestfromstandard_0_1.StormTestFromStandard.runJobInTOS(StormTestFromStandard.java:627)
at bigdata_project.stormtestfromstandard_0_1.StormTestFromStandard.main(StormTestFromStandard.java:572)
Caused by: java.io.FileNotFoundException: stormtestfromstandard_0_1.jar (The system cannot find the file specified)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at java.io.FileInputStream.<init>(FileInputStream.java:93)
at backtype.storm.utils.BufferFileInputStream.<init>(BufferFileInputStream.java:31)

What I noticed is that when I convert a standard job to a Big Data Streaming job, some components are marked as missing, for example tHDFSConnection and tKafkaConnection, which are critical for configuring Hadoop and Kafka. In the Run tab there is a section called 'Storm Configuration' where you are supposed to enter parameters in order to connect to the Hadoop/Storm cluster, but I couldn't find any documentation or tutorial that explains which parameters are required and how the 'Name' field should be formatted.
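For what it's worth, my best guess is that the Name/Value rows map onto plain Storm 0.9.x configuration keys. The programmatic equivalent would be something like this (the hostnames are placeholders for my cluster, and I'm not sure these are the keys Talend actually expects):

// My guess at the equivalent settings, assuming the table takes plain
// Storm 0.9.x configuration keys; the hostnames are placeholders.
import backtype.storm.Config;
import java.util.Arrays;

public class StormConfGuess {
    public static void main(String[] args) {
        Config conf = new Config();
        conf.put(Config.NIMBUS_HOST, "nimbus.example.com");                          // Name: nimbus.host
        conf.put(Config.NIMBUS_THRIFT_PORT, 6627);                                   // Name: nimbus.thrift.port (Storm default)
        conf.put(Config.STORM_ZOOKEEPER_SERVERS, Arrays.asList("zk1.example.com"));  // Name: storm.zookeeper.servers
        conf.put(Config.STORM_ZOOKEEPER_PORT, 2181);                                 // Name: storm.zookeeper.port (ZooKeeper default)
        System.out.println(conf);
    }
}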
Below is my job (it's very basic, but quite frustrating):



[Screenshot of the job design]


The main components I'm using:
- tKafkaInput: to ingest a stream of data
- tJavaStorm (for the Storm job) and tJavaRow (for the Spark job) to convert the incoming string to an array (see the sketch after this list)
- tAggregateRow to count the number of elements after applying a grouping
- tLogRow to display the result.
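The conversion step itself is trivial. As a standalone illustration (the sample message and field layout are placeholders; inside tJavaStorm/tJavaRow the same logic reads from input_row and writes to output_row):

// Standalone sketch of the string-to-array conversion; the sample
// message is a placeholder for what tKafkaInput delivers.
public class SplitSketch {
    public static void main(String[] args) {
        String line = "user42,click,2016-08-01";
        String[] fields = line.split(",");
        System.out.println(java.util.Arrays.toString(fields));
    }
}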

Any help or direction is highly appreciated.

Thanks!

2 Replies
Anonymous
Author

Some additional elements for troubleshooting:
- the file that is claimed to be missing is actually present; I can see it in the folder C:/Users/user/workspace/.Java/target
- I can also see the JAR file on the remote Hadoop cluster, under /hadoop/storm/nimbus/inbox/
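One thing worth ruling out: the stack trace shows a bare file name (stormtestfromstandard_0_1.jar, with no path), and FileInputStream resolves a relative name against the JVM's working directory rather than the target folder. A quick check along these lines (just a sketch of the diagnostic) would make any mismatch visible:

// Sketch of a quick check: a relative file name is resolved against the
// JVM working directory (user.dir), not against the Talend target folder.
import java.io.File;

public class JarPathCheck {
    public static void main(String[] args) {
        File jar = new File("stormtestfromstandard_0_1.jar");
        System.out.println("working dir: " + System.getProperty("user.dir"));
        System.out.println("resolved to: " + jar.getAbsolutePath());
        System.out.println("exists:      " + jar.exists());
    }
}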

I guess it's an issue with permissions, but as I said in my previous post, there is no flexibility to configure HDFS or Kafka in Talend when selecting a Big Data Streaming job; the only hope is the 'Storm Configuration' parameters, and I'm going in blind without any documentation that explains how to use them.

Thanks.
Anonymous
Author

My conclusion at this point:
- Talend doesn't actively support Storm; they recommend using Spark Streaming instead (so why not just remove the Storm connector?)
- Buying Talend Fabric (the Talend enterprise edition) is a waste of money without getting support; at that point it's better to work with Talend Open Studio.