Do not input private or sensitive data. View Qlik Privacy & Cookie Policy.
Skip to main content

Announcements
Join us in NYC Sept 4th for Qlik's AI Reality Tour! Register Now
cancel
Showing results for 
Search instead for 
Did you mean: 
Anonymous
Not applicable

Talend CDC question in PostgresQL Database

Hi,

 

I tried Talend CDC with a sample table called Account.  It created the following view.

 

CREATE OR REPLACE VIEW "sample"."TCDC_VIEW_account" AS
SELECT
SRC."TALEND_CDC_TYPE" AS "TALEND_CDC_TYPE",
SRC."TALEND_CDC_SUBSCRIBERS_NAME" AS "TALEND_CDC_SUBSCRIBERS_NAME",
max(SRC."TALEND_CDC_CREATION_DATE") AS "TALEND_CDC_CREATION_DATE",
SRC."user_id" AS "user_id",
TARG."username" AS "username",
TARG."password" AS "password",
TARG."email" AS "email",
TARG."created_on" AS "created_on",
TARG."last_login" AS "last_login"
FROM "sample"."TCDC_account" AS SRC LEFT JOIN "sample"."account" AS TARG
ON SRC."user_id"=TARG."user_id"
WHERE SRC."TALEND_CDC_STATE"='1'
GROUP BY SRC."TALEND_CDC_SUBSCRIBERS_NAME", SRC."user_id", TARG."username", TARG."password", TARG."email", TARG."created_on", TARG."last_login",
SRC."TALEND_CDC_TYPE" ;

 

If we do 2 updates to a row in account table before subscriber pulls the changes, according to the above view it will only pull the most recent update out of those. If we want to pull both the updates for maintaining history records, How can we do that?

 

Does Talend CDC using Trigger mode in Postgres Database supports that?

 

Labels (2)
4 Replies
Anonymous
Not applicable
Author

Hi,

 

    The reason it is picking only latest record is due to the Group by clause you have in your Talend view.

 

GROUP BY SRC."TALEND_CDC_SUBSCRIBERS_NAME", SRC."user_id", TARG."username", TARG."password", TARG."email", TARG."created_on", TARG."last_login",
SRC."TALEND_CDC_TYPE" ;

  If you are removing the group by, all the individual transactions will be picked for processing. Could you please try it and let us know the results?

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂

Anonymous
Not applicable
Author

Even if we remove the Group by, it will pick 2 rows for 2 updates but, those 2 rows are going to contain same values because SRC tbale is same (Account) foo both the rows. So, it will be pointing to same values in that. Do you agree? For those 2 updates, primary key will be same and CDC_TYPE_IND will be 'U'.

Anonymous
Not applicable
Author

Hi,

 

    I agree and in this case, you will have to capture the entire table snapshot in your interim table rather than just primary key.

 

Warm Regards,
Nikhil Thampi

Please appreciate our Talend community members by giving Kudos for sharing their time for your query. If your query is answered, please mark the topic as resolved 🙂

Anonymous
Not applicable
Author

This is the structure of the intermediate table which is automatically created by Talend.

 

CREATE TABLE "sample"."TCDC_account"
(
"TALEND_CDC_SUBSCRIBERS_NAME" VARCHAR(50) NOT NULL,
"TALEND_CDC_STATE" VARCHAR(1),
"TALEND_CDC_TYPE" VARCHAR(1),
"TALEND_CDC_CREATION_DATE" TIMESTAMP,
"user_id" SERIAL NOT NULL
) ;

 

In the above, user_id is the primary key of the main table. That's why, it only has user_id in the above table.

 

Can we modify the above table structure and  view to accommodate all the needs? Will the CDC program going to work if this structure is different than what is expected?

 

 

Also, Does Talend provides any better solution for this scenario (Real-time data Integration) from Oracle to Oracle, Oracle to Postgres, Postgres to Postgres and Postgres to Oracle?