Do not input private or sensitive data. View Qlik Privacy & Cookie Policy.
Skip to main content

Announcements
Discover how organizations are unlocking new revenue streams: Watch here
cancel
Showing results for 
Search instead for 
Did you mean: 
Anonymous
Not applicable

Hadoop to Oracle with Dynamic schema

Hi All,

 

I have the following scenario at hands :

  • I receive a flat file on a daily basis that is imported into HDFS.
  • This file has a variable amount of columns
  • I need to design a job that takes the data from this HDFS file, and populates an Oracle table.
  • This Oracle table will be dropped and recreated at every run due to the variable amount of columns
  • All datatypes can be defined as String (VARCHAR2) in the Oracle table. (As I think there's no way to 'guess' the datatypes at runtime from a CSV.

I've been breaking my head around the design, and have come up with the following design limitations at the moment.

 

  • Scenario 1
    • tHDFSInput -> tOracleOutput
    • I would be using the a 1 column schema of type 'Dynamic' to achieve this
    • Issue : Type Dynamic is not supported by tHDFSInput
  • Scenario 2
    • Script a Hive Create Table statement based on the header row of the hdfs file
    • Create the external table pointing to the HDFS file
    • tHiveInput -> tOracleOutput
    • Issue : Type Dynamic is not supported by tHiveInput

Is there a trick to be able to use a Dynamic schema in Hive/HDFS components?

Or I there a better way all together to approach this design?

 

Thanks a lot for your input,

Kr,

Michael 

Labels (1)
2 Replies
Anonymous
Not applicable
Author

In Scenario 2 , Build the select query dynamically using Hive Metastore for that table. 

Anonymous
Not applicable
Author

Hello,

Here is a new feature jira issue about "Support Hive Metastore HA".

https://jira.talendforge.org/browse/TBD-4059

Best regards

Sabrina