Hi,
I am trying to fetch from / write to an SSO Redshift cluster in a Spark job. For this I have used the JDBC components, because the native Redshift components in the Spark framework do not support an SSO Redshift cluster.
But I am facing an issue while performing a join: the query does not recognize the table names if I provide more than one table name in the tJDBCInput component query.
Is there any parameter I am missing here? Please help.
Thanks,
Bhushan
Hello,
We would appreciate it if you could post a screenshot of your tJDBCInput component settings on the forum.
Please mask your sensitive data.
Best regards
Sabrina
Hi Sabrina,
Please find the tJDBCInput component screenshot below.
In the SQL query section of the screenshot: if I provide more than one table name (which is required for a join), the job fails with the error "table or view doesn't exist". So ultimately I have to pass only the table name given in the "Table Name" field. If I leave the Table Name field empty, the job also fails.
Hello,
Are you performing a left join on multiple tables? Spark SQL requires all tables referenced in a query to be registered as DataFrames beforehand in order to execute it.
Something like:
myOrdersTableDataframe.registerTempTable("Orders")
Could you please try two tJDBCInput components, each loading one of the tables involved in the join (Customers and Orders here), and then perform the left outer join in a tSqlRow?
Spark will merge the three steps into a single stage at execution time.
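Under the hood, the two tJDBCInput components plus the tSqlRow roughly translate to the Spark pattern below. This is only a minimal Scala sketch, assuming a SQLContext named sqlContext is in scope: the JDBC URL, credentials, table names (Customers, Orders) and join columns are placeholder assumptions, not your actual schema. The point is that each source is loaded into its own DataFrame and registered as a temporary table, and only then can a single SQL statement reference both names.

// Minimal sketch (Spark 1.x DataFrame API, Scala); all connection
// details, table names and column names are placeholders.
val orders = sqlContext.read.format("jdbc")
  .option("url", "jdbc:redshift://<host>:<port>/<database>")
  .option("dbtable", "Orders")      // each tJDBCInput loads exactly one table
  .option("user", "<user>")
  .option("password", "<password>")
  .load()

val customers = sqlContext.read.format("jdbc")
  .option("url", "jdbc:redshift://<host>:<port>/<database>")
  .option("dbtable", "Customers")
  .option("user", "<user>")
  .option("password", "<password>")
  .load()

// Register both DataFrames so Spark SQL can resolve their names;
// a query that refers to a name Spark SQL does not know about
// cannot be resolved.
orders.registerTempTable("Orders")
customers.registerTempTable("Customers")

// The tSqlRow step corresponds to a query over the registered tables.
val joined = sqlContext.sql(
  """SELECT c.customer_id, c.name, o.order_id
    |FROM Customers c
    |LEFT OUTER JOIN Orders o
    |  ON c.customer_id = o.customer_id""".stripMargin)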
There is an existing JIRA work item for this:
https://jira.talendforge.org/browse/TBD-5707
Let us know if this works for you.
Best regards
Sabrina