Anonymous
Not applicable

Update database from CSV files when each table has different keys

Hi there,

 

I'm trying to update a database with multiple tables (100+) from CSV files (in a directory) using the following job design and components.

[screenshot: input]

[screenshot: output]

The issue here is that all my tables (in the DB) have different schemas and different primary keys (most of my primary keys are composite, made up of 1-5 columns).

For example:

[screenshots: mysql_pk, mysql_pk1]

So I'm not sure how I could dynamically set the PK in the schema.

Does anyone have an idea how to make this work?

A screenshot of the job design/components would be very much appreciated!

 

Currently, I only have a dynamic column in my schema since all my keys are different.

[screenshot: tMap]

 

When I tried to run the job, it failed stating 'schema must have a key':

 

Exception in component tDBOutput_2 (test02)
java.lang.RuntimeException: For update, Schema must have a key

 

 

 

4 Replies
lennelei
Creator III

Hello,

 

I'm not sure you'll be able to achieve this without using some tricks.

 

I think you should first load the CSV data into "loading tables" (tables you'll only truncate and insert into) and then update your final tables from that data.

 

You can either write all the queries manually or build them on the fly from the MySQL catalog.
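
For the catalog route, something like this lists every table's primary-key columns (a sketch against MySQL's information_schema; 'mydb' is a placeholder for your schema name):

-- Primary-key columns of every table in the schema, in key order
SELECT table_name,
       GROUP_CONCAT(column_name ORDER BY ordinal_position) AS pk_columns
FROM information_schema.key_column_usage
WHERE table_schema = 'mydb'
  AND constraint_name = 'PRIMARY'
GROUP BY table_name;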

 

Regards

Anonymous
Not applicable
Author

@lennelei
Thank you for your response.
Can you explain more about the job design/components used, or maybe share a job design screenshot?
(I'm totally new to Talend and not strong with databases either.)

Thank you so much
Anonymous
Not applicable
Author


@crotmn wrote: (see original post above)

Right now, I'm trying a lengthy method below.

 

Job1: Load the CSVs and insert into a temporary database (dynamic schema)

Job2: Inject the key constraints into the temporary DB's tables

Job3: Update the real database from the temporary DB's tables
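
For example, Job2 would add each composite key with something like this (made-up table and column names):

-- Hypothetical staging table: give it the same composite PK as its target
ALTER TABLE tmp_orders ADD PRIMARY KEY (order_id, line_no);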

 

If there's a precise and better way to do this, please let me know.

 

Thank you

lennelei
Creator III

Hi,

 

First, load all the data into empty tables used only as intermediate storage, either with Talend (easy with a dynamic schema) or directly from the database: Talend BulkExec components, MSSQL BULK INSERT, Oracle SQL*Loader, etc.
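
In MySQL, for example, the bulk load into a staging table could look like this (a sketch; the file path and table name are placeholders):

-- Empty the staging table, then bulk-load the CSV file into it
TRUNCATE TABLE stage_orders;
LOAD DATA LOCAL INFILE '/data/csv/orders.csv'
INTO TABLE stage_orders
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;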

Then, update the final tables from the intermediate tables; two solutions:

 

1) Create as many update_[tableName].sql files as needed; in each file, manually enter the required update query, such as:

/*update_tableTarget1.sql*/
UPDATE tableTarget1 t, tableSource1 s SET t.data1 = s.data1, t.data2 = s.data2 WHERE t.id = s.id
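
Since most of your keys are composite, the WHERE clause just lists every key column, e.g. (illustrative names):

/*update_tableTarget2.sql -- composite-key variant*/
UPDATE tableTarget2 t, tableSource2 s SET t.data1 = s.data1, t.data2 = s.data2 WHERE t.key1 = s.key1 AND t.key2 = s.key2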

Then, within your Talend job, loop over the SQL files and use a tDatabaseRow to execute each update query (you can even parallelize the iterate link after the tFileList):

tFileList (*.sql) -iterate-> tFileInputFullRow -row-> tFlowToIterate -iterate-> t<DataBase>Row

The tFileInput* reads the SQL file and the tFlowToIterate puts the query in the globalMap, but you can use other methods as well.

 

Or

2) Build the queries dynamically from the database catalog.

You can get the list of tables to update (this heavily depends on your database):

select table_name, ... from <system_tables> where ...

You then loop over the tables, retrieve the list of columns, identify which ones are keys and so on, build all the queries dynamically, and use a tDatabaseRow to run all the updates.
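
On MySQL you could even generate the full statements in one pass over the catalog (a sketch; assumes staging tables named stage_<table> in the same schema, 'mydb' is a placeholder, and tables without a primary key come out NULL):

-- Build one UPDATE per table: key columns drive the WHERE,
-- non-key columns drive the SET
SELECT CONCAT(
    'UPDATE ', c.table_name, ' t, stage_', c.table_name, ' s SET ',
    GROUP_CONCAT(CASE WHEN k.column_name IS NULL
                      THEN CONCAT('t.', c.column_name, ' = s.', c.column_name)
                 END SEPARATOR ', '),
    ' WHERE ',
    GROUP_CONCAT(CASE WHEN k.column_name IS NOT NULL
                      THEN CONCAT('t.', c.column_name, ' = s.', c.column_name)
                 END SEPARATOR ' AND ')
) AS update_sql
FROM information_schema.columns c
LEFT JOIN information_schema.key_column_usage k
       ON k.table_schema = c.table_schema
      AND k.table_name = c.table_name
      AND k.column_name = c.column_name
      AND k.constraint_name = 'PRIMARY'
WHERE c.table_schema = 'mydb'
GROUP BY c.table_name;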

 

The first method is easier but harder to maintain. The second method requires more work up front, but afterwards you'll be able to add tables or change formats much faster.