Skip to main content
Announcements
See what Drew Clarke has to say about the Qlik Talend Cloud launch! READ THE BLOG
cancel
Showing results for 
Search instead for 
Did you mean: 
Anonymous
Not applicable

How to perform data insertion in an already filled postgresql table without duplication ?

Hi 

I want to insert rows of data in a Postgresql table via Talend, how can i make sure that if a row already exist in the database it won't get inserted ? I don't have a specific key to differ each row .

 

Here's the job im working on 

Labels (1)
1 Solution

Accepted Solutions
vapukov
Master II
Master II

of course, you can use a set of columns as key

 

you do not need to define all these columns in a database as a primary key (it good but not mandatory)

but definitely, you need to create indexes for these columns (could be 1 combined)

 

after, you could use "insert or update" in tPostgreSQLOutput component

 

add. if you need "insert-ignore" which is not available, you have a choice from 3:

- insert all data into temp table on target server and use tSQLRow for run INSERT SELECT WHERE NOT EXISTS

- use tSQLRow with prepared statements instead of tDBOutput - and run INSERT INTO TABLE (a,b,c) values (?,?,?) - check docs or forum about prepared statements in Talend

- if data relativly small - use tMap with catch rejected by INNER JOIN (tMap also can join by many columns) 

View solution in original post

4 Replies
vapukov
Master II
Master II

Hi,

 

not depending from the database and used tools (not only for PostgreSQL + Talend)

 

for prevent duplicates - you MUST have a unique key or combination of columns

 

in other case you will always have a risk of duplicates

 

what possible to do, always depend from environment and you real goals and needs

Anonymous
Not applicable
Author

hello vapukov and thank you for your answer ,

The dataset im working on are tickets extracted from JIRA. The table basically have the name of the ticket, date of creation, status of the ticket ( open, in progress, in review , closed...) . Now my new question is can i have a set of column as a key? for exemple those three columns at once as a key ( because obviously the same ticket can have different updated status in one day ) 

Thank you for your time 

vapukov
Master II
Master II

of course, you can use a set of columns as key

 

you do not need to define all these columns in a database as a primary key (it good but not mandatory)

but definitely, you need to create indexes for these columns (could be 1 combined)

 

after, you could use "insert or update" in tPostgreSQLOutput component

 

add. if you need "insert-ignore" which is not available, you have a choice from 3:

- insert all data into temp table on target server and use tSQLRow for run INSERT SELECT WHERE NOT EXISTS

- use tSQLRow with prepared statements instead of tDBOutput - and run INSERT INTO TABLE (a,b,c) values (?,?,?) - check docs or forum about prepared statements in Talend

- if data relativly small - use tMap with catch rejected by INNER JOIN (tMap also can join by many columns) 

Anonymous
Not applicable
Author

Thank you!