I am trying to upload my JSON data to HDFS before storing it in Hive, using Talend Data Fabric. My workflow is:
**** > tMap / tLogRow (structuring the JSON data) > tHDFSPut > tHiveCreateTable > tHiveLoad.
Workflow: (screenshot attached)
Result from tLogRow: (screenshot attached)
tHDFSPut configuration: (screenshot attached)
I already managed to structure the data into a proper table (with a schema). However, when I try to upload the data into HDFS using the tHDFSPut component, it throws an error (see the detailed error in the attachment), as if I didn't fetch any data from tLogRow, e.g. (String)globalMap.get("tLogRow_1_CURRENT_FILEPATH") (no idea how to use this in this case):
"java.lang.NullPointerException: null"
My question is: how do I pass the structured data coming out of tLogRow into tHDFSPut?
Note:
Another trial was with tHDFSOutput, where I synced the schema from tMap_1, but the same error persists.
Hi
You are using an existing connection on the HDFS component, so make sure the connection is created before it is used. We usually create the connection at the beginning of the job:
PreJob--oncomponentok--tHDFSConnection.
You are not using tHDFSPut correctly: this component loads a local file into the HDFS system, but I see the Local directory field is empty. Change your job design as below:
PreJob--oncomponentok--tHDFSConnection
....tMap / tLogRow (structuring JSON data)
|onsubjobok
tHDFSPut
|onsubjobok
tHiveCreateTable--oncomponentok-->tHiveLoad
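For intuition, tHDFSPut essentially performs a local-to-HDFS file copy, roughly like this sketch against the plain Hadoop client API (the NameNode address and both paths are placeholder values):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsPutSketch {
    public static void main(String[] args) throws Exception {
        // Point the client at the NameNode; host and port are placeholders.
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020");

        try (FileSystem fs = FileSystem.get(conf)) {
            // Equivalent of tHDFSPut: copy a file from local disk into HDFS.
            // tHDFSPut needs its Local directory to point at a real file
            // on the job server's disk, which is why an empty field fails.
            fs.copyFromLocalFile(
                new Path("/tmp/local/data.csv"),   // local source
                new Path("/user/hive/staging/"));  // HDFS target directory
        }
    }
}
```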
Regards
Shong
Hi Shong,
Yes, I actually already created the HDFS connection in the workflow, and I tried the suggested workflow you mentioned above.
Settings in tHDFSPut: (screenshot attached)
However, I am quite confused: how do I pass the result from tLogRow to tHDFSPut? Here I only see the Local directory (where the source file is stored) and the HDFS directory (the destination directory to store the file).
Store the result in a local file using tFileOutputDelimited first, then select that local file in tHDFSPut.
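Conceptually, the staging step writes the rows coming out of tMap as delimited text on local disk before the HDFS copy. A rough plain-Java equivalent (the rows, delimiter, and path below are made up):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class LocalCsvStageSketch {
    public static void main(String[] args) throws IOException {
        // Rough equivalent of tFileOutputDelimited: flatten each row into
        // one delimited line and write the file to local disk.
        List<String> rows = List.of(
                "1;alice;2024-01-01",
                "2;bob;2024-01-02");
        Path out = Path.of("/tmp/staging/data.csv");
        Files.createDirectories(out.getParent());
        Files.write(out, rows);
        // tHDFSPut's Local directory would then be "/tmp/staging" and its
        // Files table would list "data.csv" (or a matching Filemask).
    }
}
```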
Hi Shong,
I already put the data into a local CSV file. But when connecting it to tHDFSPut, I still receive the following error:
java.lang.NullPointerException: null
Why do I see the 'Use an existing connection' box unchecked? In the Local directory field, set the path to the directory, not to the file. In the Files table, set the New name.
The 'Use an existing connection' box is checked (connected to tHDFSConnection); the highlight you see is just a Studio feature outlining the layout of the box.
I already changed the local directory, and I will leave New name blank, as I don't want to rename the file and will just keep the existing name. However, it still gives the null value error. Did I miss any configuration in that component?
Check the 'Use Perl5 Regex...' box if you set a regular expression in FileMask, or write the full file name if there is only one file in the folder.
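As a made-up example: for a single file, write "data.csv" in the Files table; with the regex option checked, a glob like *.csv has to be written as .*\.csv instead. A quick check of that pattern (Java's regex dialect is close to, but not exactly, Perl5):

```java
import java.util.regex.Pattern;

public class FilemaskRegexSketch {
    public static void main(String[] args) {
        // With 'Use Perl5 Regex...' checked, the Filemask is treated as a
        // regular expression instead of a glob, so "*.csv" must be written
        // as ".*\.csv". The file names below are made up.
        Pattern mask = Pattern.compile(".*\\.csv");
        System.out.println(mask.matcher("data_0001.csv").matches());  // true
        System.out.println(mask.matcher("data_0001.json").matches()); // false
    }
}
```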