Anonymous
Not applicable

How to load data from PostgreSQL to Hive

I am new to Talend, and for a database course project I've been tasked with loading data from a database (PostgreSQL) into a data warehouse (Hive) via ETL. We were advised to use Talend. However, I'm not sure how to transfer data from PostgreSQL to Hive, since there is no tHiveInput component to map the data directly from PostgreSQL to Hive. I've also tried exporting the data from PostgreSQL to a .csv file and loading it with a tHiveLoad component, but this didn't work either because I'm unable to connect the tFileOutput component to a tHiveLoad component.

 

So I'm unsure what to do. TL;DR: I'm not sure how to load data from PostgreSQL to Hive via Talend.

9 Replies
Anonymous
Not applicable
Author

Hi,

 

One option is to read the data from PostgreSQL and push it to the HDFS layer using the tHDFSOutput component.

Then read the file using a tFileInputDelimited component in a Big Data Batch job and push it into the Hive layer with tHiveOutput.

There are other methods too, but this one is simple and straightforward, since you are using Talend for the first time.
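
To make the two steps concrete outside of Talend, here is a minimal Python sketch of what they amount to: serialize the rows into a delimited text file (what would land in HDFS), then describe a Hive table over that format and load the file into it. This is only an illustration, not what the Talend components generate. The table name `students`, its columns, and the path `/tmp/students.csv` are made up for the example, and the code only builds the HiveQL strings; it does not connect to PostgreSQL, HDFS, or Hive.

```python
import csv
import io


def rows_to_delimited(rows, delimiter=","):
    """Serialize rows into delimited text, one record per line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


def hive_load_statements(table, columns, hdfs_path, delimiter=","):
    """Build HiveQL that declares a delimited text table and loads an HDFS file into it."""
    cols = ", ".join(f"{name} {hive_type}" for name, hive_type in columns)
    create = (
        f"CREATE TABLE IF NOT EXISTS {table} ({cols}) "
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}' "
        "STORED AS TEXTFILE"
    )
    load = f"LOAD DATA INPATH '{hdfs_path}' INTO TABLE {table}"
    return [create, load]


# Hypothetical sample data standing in for rows read from PostgreSQL.
rows = [(1, "Ada"), (2, "Grace")]
text = rows_to_delimited(rows)

# Hypothetical table definition and HDFS path.
statements = hive_load_statements(
    "students", [("id", "INT"), ("name", "STRING")], "/tmp/students.csv"
)

print(text)
for statement in statements:
    print(statement)
```

One caveat with this sketch: `csv.writer` quotes fields that contain the delimiter, while a plain `ROW FORMAT DELIMITED` table does not interpret quotes, so the data should not contain the delimiter character.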

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂

Anonymous
Not applicable
Author

Hi Nikhil,

I'm trying to implement your solution. What I have right now is simply my Postgres connection -> tHDFSOutput. However, I'm a bit confused about what to do next: should I connect my tHDFSOutput component to a tFileInput? Or is there another step between HDFS and tFileInput?

 

Also, I was browsing my Hive components, and I don't have a tHiveOutput, not sure why.

Anonymous
Not applicable
Author

Hi,

 

    You will have to create a separate Big Data Batch job to do the rest. The HDFS file can be read by the tFileInputDelimited component in the Big Data Batch job. Once you have created both jobs separately, it's a matter of orchestrating them by calling one after the other from a parent Talend standard job.

 

Warm Regards,
Nikhil Thampi


Anonymous
Not applicable
Author

Do you know if this is possible with Open Studio for Big Data, or only with the Big Data Platform?

 

I'm trying it from Open Studio for Big Data but there is no combo box to convert a job into Big Data Batch.

 

Thanks in advance.

Anonymous
Not applicable
Author

Hi,

 

     I was referring to a flow like the one below. You do not have to convert any standard job to a Big Data Batch job.

[Screenshots: 0683p000009M5Fm.png, 0683p000009M5Fr.png, 0683p000009M5Fw.png]

 

Here you are calling the Big Data Batch job after loading the data into the HDFS layer through the standard job, and the jobs are called sequentially.
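
The sequencing idea can be sketched in a few lines of Python, purely as an illustration of the control flow and not of how Talend implements it. The function names `load_postgres_to_hdfs` and `load_hdfs_to_hive` are hypothetical placeholders for the two jobs, and the sketch assumes each job reports success as `True`; the second job runs only if the first one succeeds.

```python
def run_in_sequence(jobs):
    """Run (name, callable) pairs in order and stop at the first failure.

    Returns the names of the jobs that completed and the name of the job
    that failed (or None if every job succeeded).
    """
    completed = []
    for name, job in jobs:
        if not job():
            return completed, name
        completed.append(name)
    return completed, None


def load_postgres_to_hdfs():
    # Placeholder for the standard job that writes the PostgreSQL data to HDFS.
    return True


def load_hdfs_to_hive():
    # Placeholder for the job that reads the HDFS file and writes to Hive.
    return True


completed, failed = run_in_sequence([
    ("postgres_to_hdfs", load_postgres_to_hdfs),
    ("hdfs_to_hive", load_hdfs_to_hive),
])
print(completed, failed)
```

Stopping at the first failure matters here because loading into Hive makes no sense if the HDFS file was never written.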

 

Warm Regards,
Nikhil Thampi


Anonymous
Not applicable
Author

Hi Nikhil,

I'm trying to set up my job like yours, but the problem is that we don't have a tHiveOutput component. I've attached the options that Talend provides when I drag the connection in my tFileInputDelimited -> tHiveOutput job. As you can see, there's no tHiveOutput component.


output.PNG
Anonymous
Not applicable
Author

Hi,

 

   Could you please go to File -> Edit Project Properties and check whether the component has been added to the Palette under Big Data Spark jobs?

 

0683p000009M5GL.png

 

Warm Regards,
Nikhil Thampi


Anonymous
Not applicable
Author

Hi,

I don't have a Big Data Batch Job component anywhere. I'm using Talend Open Studio for Big Data 7.11. Is Big Data Batch Job a premium feature or something?

Anonymous
Not applicable
Author

Hi,

 

     Apologies. This feature is only available in the subscription version.

0683p000009M5CG.png

 

Warm Regards,
Nikhil Thampi
