Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi
I want to insert rows of data in a Postgresql table via Talend, how can i make sure that if a row already exist in the database it won't get inserted ? I don't have a specific key to differ each row .
Here's the job im working on
of course, you can use a set of columns as key
you do not need to define all these columns in a database as a primary key (it good but not mandatory)
but definitely, you need to create indexes for these columns (could be 1 combined)
after, you could use "insert or update" in tPostgreSQLOutput component
add. if you need "insert-ignore" which is not available, you have a choice from 3:
- insert all data into temp table on target server and use tSQLRow for run INSERT SELECT WHERE NOT EXISTS
- use tSQLRow with prepared statements instead of tDBOutput - and run INSERT INTO TABLE (a,b,c) values (?,?,?) - check docs or forum about prepared statements in Talend
- if data relativly small - use tMap with catch rejected by INNER JOIN (tMap also can join by many columns)
Hi,
not depending from the database and used tools (not only for PostgreSQL + Talend)
for prevent duplicates - you MUST have a unique key or combination of columns
in other case you will always have a risk of duplicates
what possible to do, always depend from environment and you real goals and needs
hello vapukov and thank you for your answer ,
The dataset im working on are tickets extracted from JIRA. The table basically have the name of the ticket, date of creation, status of the ticket ( open, in progress, in review , closed...) . Now my new question is can i have a set of column as a key? for exemple those three columns at once as a key ( because obviously the same ticket can have different updated status in one day )
Thank you for your time
of course, you can use a set of columns as key
you do not need to define all these columns in a database as a primary key (it good but not mandatory)
but definitely, you need to create indexes for these columns (could be 1 combined)
after, you could use "insert or update" in tPostgreSQLOutput component
add. if you need "insert-ignore" which is not available, you have a choice from 3:
- insert all data into temp table on target server and use tSQLRow for run INSERT SELECT WHERE NOT EXISTS
- use tSQLRow with prepared statements instead of tDBOutput - and run INSERT INTO TABLE (a,b,c) values (?,?,?) - check docs or forum about prepared statements in Talend
- if data relativly small - use tMap with catch rejected by INNER JOIN (tMap also can join by many columns)