Do not input private or sensitive data. View Qlik Privacy & Cookie Policy.
Skip to main content

Announcements
Qlik Open Lakehouse is Now Generally Available! Discover the key highlights and partner resources here.
cancel
Showing results for 
Search instead for 
Did you mean: 
Anonymous
Not applicable

Which to choose - SCD component or manual coding of SCD?

Hi, I am loading 25 tables, from mssql to postgres. Each table approximately contains 1 million rows. Now, I load it first using regular components.

 

Once, the full load is done, I want to run the delta load, meaning I want to use cdc on sql server and use the changed data to either update or insert data in the tables of postgres

 

I can do this using SCD components of talend or make my own SCD flow.

 

I prefer using the SCD component as it takes care of a lot manual update/insert code I have to write otherwise

 

I see the performance with SCD is not as good as the manual update/insert code

 

What can I do to enhance the performance of SCD? I have read in a few places, that SCD is slow, if yes, why would this component be exposed when SCD coding could be done manually?

 

I would prefer to go with SCD as maintaining type1,type2, type3 and hybrid type is managed by the SCD component. This would also prevent a lot of other bug prone code in the manual update/delete route

 

Please suggest as to how I can go about this?

 

Thanks

Labels (2)
4 Replies
vapukov
Master II
Master II

Hi,

 

all depends from number of daily changes

 

yes, SCD is slow, but for example:

 

You have 500 000 customers, for an average company it could be up to 2000 customers changed per day

and this with 5 SCD operations per seconds give us 400sec, 15min - not dramatical

 

but if You try to use the same technics for 100k daily updates - it is another case

Anonymous
Not applicable
Author

Hi vapukov,

 

Thanks for the reply. Let's say I decide to run my job every 15 minutes and the change in data is in terms of 10-15 rows every run. 

 

And rarely , I might get a bulk change say around 5000 rows. 

 

Will using the SCD under such circumstances, be still okay?

 

Thanks
Rathi

Anonymous
Not applicable
Author

Hi vapukov,

 

Any suggestions please?

 

Thanks

Anonymous
Not applicable
Author

Hi Rathi,

 

      There is no standard benchmarkign number for the scenario you have explained. There are multiple factors which will decide the performance in this case. Some of the basic factors are CPU utilization, memory utilization, network bandwidth. Also we need to keep in mind that you need to factor the system resources for both Talend and the database you are using for the SCD. 

 

      A performance bottleneck in one single area can create problems in your plan. 

 

       Please refer the Talend reference architecture to get some high level system resources to be allocated while installing Talend.

 

https://talendpnp.github.io/architecture-center/reference-architecture/summer-2018-7.0/index

 

         But you need to talk to your DBAs and see where you can tune at DB level also. Even after all these precautions, sometimes there will be bottle necks. In those cases, you will have to see how you can parallelize your tasks, the areas where you can fine tune the flow etc.

 

         It is a continuous exercise but create some PoCs and verify in which stage you are facing the bottle necks and then you can plan the tuning exercise from that stage.

 

Warm Regards,

 

Nikhil Thampi