Creating a delta from two sources

gurn · ‎2022-11-21

Hi

I'd like some thoughts from the community on the best approach to creating a delta table. We are using snowflake and have one master table containing up to 9m rows and the data supplier can only provide the full file (9m) which will be provided every 3 months, without a change date or flag which means we need to compare both data sets, identify what has changed and then only load the new / changed records.

We are thinking of converting both tables (all columns) into a JSON string and comparing but before we proceed I wanted to gather any other ideas people may have.

Thanks

Dave

Anonymous · ‎2022-12-05

Hi

It is a bad option to compare full data from two tables for big data volumes using tMap component, maybe you can try the way to compare two tables in SQL, refer to this page.

Regards

Shong

jlolling · ‎2022-12-06

On better way would be to create 2 hash-keys. One hash-key build from all key columns and one hash key build from all relevant attributes.

Now you only have to compare 2 columns and not all and this could potentially be done within a tMap.

But keep in mind, you always need to different flows (with their own tMap and exchanged main and lookup flow) for finding what is missing from a->b and b->a.

Talend Data Integration

Talend Pipeline Designer

Talend Studio

v8.x