Anonymous
Not applicable

Update database from CSV files when each table has different keys

Hi there,

 

I'm trying to update a database with multiple tables (100+) from CSV files (in a directory) using the following job design and components.

[screenshot: input]

[screenshot: output]

The issue here is that all my tables (in the DB) have different schemas and different primary keys (most of my primary keys are composite, made up of 1-5 columns).

For example:

[screenshots: mysql_pk, mysql_pk1]

So I'm not sure how I could dynamically set the PK in the schema.

Does anyone have an idea how to make this work?

A screenshot of the job design/components would be very much appreciated!

 

Currently, I only have a dynamic column in my schema since all my keys are different.

[screenshot: tMap]

 

When I tried to run the job, it failed stating 'schema must have a key':

 

Exception in component tDBOutput_2 (test02)
java.lang.RuntimeException: For update, Schema must have a key

 

 

 

4 Replies
lennelei
Creator III

Hello,

 

I'm not sure you'll be able to achieve this without using some tricks.

 

I think you should first load the CSV data into "loading tables" (tables you'll only truncate and insert into) and then update your final tables from that data.

 

You can either write all the queries manually or build them on the fly from the MySQL catalog.
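
For the catalog route, something like this lists every table's primary-key columns (a sketch against MySQL's information_schema; 'mydb' is a placeholder for your schema name):

-- Primary-key columns of every table in the schema, in key order
SELECT table_name,
       GROUP_CONCAT(column_name ORDER BY ordinal_position) AS pk_columns
FROM information_schema.key_column_usage
WHERE table_schema = 'mydb'
  AND constraint_name = 'PRIMARY'
GROUP BY table_name;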

 

Regards

Anonymous
Not applicable
Author

@lennelei
Thank you for your response.
Can you explain more about the job design/components used, or maybe share a job design screenshot?
(I'm totally new to Talend and not strong with databases either.)

Thank you so much
Anonymous
Not applicable
Author


@crotmn wrote: (see original post above)

Right now, I'm trying a lengthy method below.

 

Job1: Load the CSVs and insert into a temporary database (dynamic schema)

Job2: Inject the key constraints into the temporary DB's tables

Job3: Update the real database from the temporary DB's tables
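
For example, Job2 would add each composite key with something like this (made-up table and column names):

-- Hypothetical staging table: give it the same composite PK as its target
ALTER TABLE tmp_orders ADD PRIMARY KEY (order_id, line_no);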

 

If there's a precise and better way to do this, please let me know.

 

Thank you

lennelei
Creator III

Hi,

 

First, load all the data into empty tables used only as intermediate storage, either with Talend (easy with a dynamic schema) or directly from the database: Talend BulkExec components, MSSQL BULK INSERT, Oracle SQL*Loader, etc.
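
In MySQL, for example, the bulk load into a staging table could look like this (a sketch; the file path and table name are placeholders):

-- Empty the staging table, then bulk-load the CSV file into it
TRUNCATE TABLE stage_orders;
LOAD DATA LOCAL INFILE '/data/csv/orders.csv'
INTO TABLE stage_orders
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;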

Then, update the final tables from the intermediate tables; two solutions:

 

1) Create as many update_[tableName].sql files as needed; in each file, manually enter the required update query, such as:

/*update_tableTarget1.sql*/
UPDATE tableTarget1 t, tableSource1 s SET t.data1 = s.data1, t.data2 = s.data2 WHERE t.id = s.id
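
Since most of your keys are composite, the WHERE clause just lists every key column, e.g. (illustrative names):

/*update_tableTarget2.sql -- composite-key variant*/
UPDATE tableTarget2 t, tableSource2 s SET t.data1 = s.data1, t.data2 = s.data2 WHERE t.key1 = s.key1 AND t.key2 = s.key2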

Then, within your Talend job, loop over the SQL files and use a tDatabaseRow to execute each update query (you can even parallelize the iterate link after the tFileList):

tFileList (*.sql) -iterate-> tFileInputFullRow -row-> tFlowToIterate -iterate-> t<DataBase>Row

The tFileInput* reads the SQL file and the tFlowToIterate puts the query in the globalMap, but you can use other methods as well.

 

Or

2) Build the queries dynamically from the database catalog.

You can get the list of tables to update (this heavily depends on your database):

select table_name, ... from <system_tables> where ...

You then loop over the tables, retrieve the list of columns, identify which ones are keys and so on, build all the queries dynamically, and use a tDatabaseRow to run all the updates.
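
On MySQL you could even generate the full statements in one pass over the catalog (a sketch; assumes staging tables named stage_<table> in the same schema, 'mydb' is a placeholder, and tables without a primary key come out NULL):

-- Build one UPDATE per table: key columns drive the WHERE,
-- non-key columns drive the SET
SELECT CONCAT(
    'UPDATE ', c.table_name, ' t, stage_', c.table_name, ' s SET ',
    GROUP_CONCAT(CASE WHEN k.column_name IS NULL
                      THEN CONCAT('t.', c.column_name, ' = s.', c.column_name)
                 END SEPARATOR ', '),
    ' WHERE ',
    GROUP_CONCAT(CASE WHEN k.column_name IS NOT NULL
                      THEN CONCAT('t.', c.column_name, ' = s.', c.column_name)
                 END SEPARATOR ' AND ')
) AS update_sql
FROM information_schema.columns c
LEFT JOIN information_schema.key_column_usage k
       ON k.table_schema = c.table_schema
      AND k.table_name = c.table_name
      AND k.column_name = c.column_name
      AND k.constraint_name = 'PRIMARY'
WHERE c.table_schema = 'mydb'
GROUP BY c.table_name;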

 

The first method is easier but harder to maintain. The second method requires more work up front, but afterwards you'll be able to add tables or change formats much faster.