Skip to main content
Announcements
Accelerate Your Success: Fuel your data and AI journey with the right services, delivered by our experts. Learn More
cancel
Showing results for 
Search instead for 
Did you mean: 
Anonymous
Not applicable

How to perform data insertion in an already filled postgresql table without duplication ?

Hi 

I want to insert rows of data in a Postgresql table via Talend, how can i make sure that if a row already exist in the database it won't get inserted ? I don't have a specific key to differ each row .

 

Here's the job im working on 

Labels (1)
1 Solution

Accepted Solutions
vapukov
Master II
Master II

of course, you can use a set of columns as key

 

you do not need to define all these columns in a database as a primary key (it good but not mandatory)

but definitely, you need to create indexes for these columns (could be 1 combined)

 

after, you could use "insert or update" in tPostgreSQLOutput component

 

add. if you need "insert-ignore" which is not available, you have a choice from 3:

- insert all data into temp table on target server and use tSQLRow for run INSERT SELECT WHERE NOT EXISTS

- use tSQLRow with prepared statements instead of tDBOutput - and run INSERT INTO TABLE (a,b,c) values (?,?,?) - check docs or forum about prepared statements in Talend

- if data relativly small - use tMap with catch rejected by INNER JOIN (tMap also can join by many columns) 

View solution in original post

4 Replies
vapukov
Master II
Master II

Hi,

 

not depending from the database and used tools (not only for PostgreSQL + Talend)

 

for prevent duplicates - you MUST have a unique key or combination of columns

 

in other case you will always have a risk of duplicates

 

what possible to do, always depend from environment and you real goals and needs

Anonymous
Not applicable
Author

hello vapukov and thank you for your answer ,

The dataset im working on are tickets extracted from JIRA. The table basically have the name of the ticket, date of creation, status of the ticket ( open, in progress, in review , closed...) . Now my new question is can i have a set of column as a key? for exemple those three columns at once as a key ( because obviously the same ticket can have different updated status in one day ) 

Thank you for your time 

vapukov
Master II
Master II

of course, you can use a set of columns as key

 

you do not need to define all these columns in a database as a primary key (it good but not mandatory)

but definitely, you need to create indexes for these columns (could be 1 combined)

 

after, you could use "insert or update" in tPostgreSQLOutput component

 

add. if you need "insert-ignore" which is not available, you have a choice from 3:

- insert all data into temp table on target server and use tSQLRow for run INSERT SELECT WHERE NOT EXISTS

- use tSQLRow with prepared statements instead of tDBOutput - and run INSERT INTO TABLE (a,b,c) values (?,?,?) - check docs or forum about prepared statements in Talend

- if data relativly small - use tMap with catch rejected by INNER JOIN (tMap also can join by many columns) 

Anonymous
Not applicable
Author

Thank you!