Anonymous
Not applicable

Why is the same document being removed so many times?

I have an update job that updates data for the "Bonds" entity in my "SecurityMaster" model.
Below is the output from exist.log. Why is it trying to remove the same Id, 28646, so many times? It takes about 10 seconds to update each record. EXTREMELY SLOW!
I really hope I'm doing something wrong here. Any ideas?
2011-03-08 02:41:42,158 INFO (NativeBroker.java :2222) - Removing document SecurityMaster.Bonds.28646 (38216) ...
2011-03-08 02:41:42,159 INFO (RpcConnection.java :307) - query took 0ms.
2011-03-08 02:41:43,098 INFO (NativeBroker.java :2222) - Removing document SecurityMaster.Bonds.28646 (38216) ...
2011-03-08 02:41:44,537 INFO (RpcConnection.java :307) - query took 0ms.
2011-03-08 02:41:45,449 INFO (NativeBroker.java :2222) - Removing document SecurityMaster.Bonds.28646 (38216) ...
2011-03-08 02:41:46,990 INFO (NativeBroker.java :2222) - Removing document SecurityMaster.Bonds.28646 (38216) ...
2011-03-08 02:41:47,037 INFO (RpcConnection.java :307) - query took 0ms.
2011-03-08 02:41:48,042 INFO (NativeBroker.java :2222) - Removing document SecurityMaster.Bonds.28646 (38216) ...
2011-03-08 02:41:49,538 INFO (RpcConnection.java :307) - query took 0ms.
2011-03-08 02:41:50,280 INFO (NativeBroker.java :2222) - Removing document SecurityMaster.Bonds.28646 (38216) ...
2011-03-08 02:41:51,139 INFO (NativeBroker.java :2222) - Removing document SecurityMaster.Bonds.28646 (38216) ...
2011-03-08 02:41:52,040 INFO (RpcConnection.java :307) - query took 0ms.
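To quantify what the excerpt above shows, here is a minimal Python sketch that counts the "Removing document" entries per document and measures the time span they cover. The regular expressions are assumptions based only on the line layout visible in this excerpt, not on any documented eXist log format, and `summarize` is a hypothetical helper name.

```python
import re
from collections import Counter
from datetime import datetime

# Sample copied from the exist.log excerpt in this post.
LOG = """\
2011-03-08 02:41:42,158 INFO (NativeBroker.java :2222) - Removing document SecurityMaster.Bonds.28646 (38216) ...
2011-03-08 02:41:42,159 INFO (RpcConnection.java :307) - query took 0ms.
2011-03-08 02:41:43,098 INFO (NativeBroker.java :2222) - Removing document SecurityMaster.Bonds.28646 (38216) ...
2011-03-08 02:41:44,537 INFO (RpcConnection.java :307) - query took 0ms.
2011-03-08 02:41:45,449 INFO (NativeBroker.java :2222) - Removing document SecurityMaster.Bonds.28646 (38216) ...
2011-03-08 02:41:46,990 INFO (NativeBroker.java :2222) - Removing document SecurityMaster.Bonds.28646 (38216) ...
2011-03-08 02:41:47,037 INFO (RpcConnection.java :307) - query took 0ms.
2011-03-08 02:41:48,042 INFO (NativeBroker.java :2222) - Removing document SecurityMaster.Bonds.28646 (38216) ...
2011-03-08 02:41:49,538 INFO (RpcConnection.java :307) - query took 0ms.
2011-03-08 02:41:50,280 INFO (NativeBroker.java :2222) - Removing document SecurityMaster.Bonds.28646 (38216) ...
2011-03-08 02:41:51,139 INFO (NativeBroker.java :2222) - Removing document SecurityMaster.Bonds.28646 (38216) ...
2011-03-08 02:41:52,040 INFO (RpcConnection.java :307) - query took 0ms.
"""

# Assumed layout: "<timestamp> <level> (<source>) - <message>".
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*? - (.*)$")
REMOVE_RE = re.compile(r"Removing document (\S+)")


def summarize(log_text):
    """Return (removal count per document, seconds between first and last entry)."""
    removals = Counter()
    stamps = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines that do not follow the assumed layout
        stamps.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f"))
        removed = REMOVE_RE.search(match.group(2))
        if removed:
            removals[removed.group(1)] += 1
    elapsed = (max(stamps) - min(stamps)).total_seconds() if stamps else 0.0
    return removals, elapsed


if __name__ == "__main__":
    counts, seconds = summarize(LOG)
    for document, count in counts.items():
        print(f"{document}: removed {count} times over {seconds:.3f} s")
```

For this excerpt, that is 7 removals of SecurityMaster.Bonds.28646 within roughly 9.9 seconds.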

14 Replies
Anonymous
Not applicable
Author

Hello Talend Support,
Do you have any idea what is happening above? I have been updating 50K records for 2 days now. This is very bad performance.

Please help/suggest. What more information do you need?
Anonymous
Not applicable
Author

When I use bulk mode, it does not improve the performance either. Am I doing something wrong?

Anonymous
Not applicable
Author

Hi muraliv,
To improve performance with tMDMBulkLoad, have a look at the component's help page (select the component in the designer and press F1).
A scenario recently added (in 4.2.1) explains how to chunk the XML that you send to tMDMBulkLoad.
With medium to high data volumes, tWriteXMLField slows the process down. One workaround is to create small temporary XML chunks to speed up the data integration.
It also lets you parallelize your job.
At a customer site, I reached an average of 90-100 rows/sec on 300k rows, using Talend MDM EE 4.1.2 (eXist) with this technique.
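The chunking idea itself can be illustrated outside of Talend with a short Python sketch. This is not the scenario from the component help page: the `chunk` wrapper element, the field names, the file naming, and the chunk size are all assumptions made for illustration, and the real job would build the chunks with Talend components.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def chunked(records, size):
    """Yield successive lists of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def write_chunks(records, out_dir, size=500, entity="Bonds"):
    """Write records (dicts of field -> value) as small XML files, one per chunk."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, chunk in enumerate(chunked(records, size)):
        root = ET.Element("chunk")  # hypothetical wrapper element
        for record in chunk:
            item = ET.SubElement(root, entity)
            for field, value in record.items():
                ET.SubElement(item, field).text = str(value)
        path = out_dir / f"{entity}_{index:05d}.xml"
        ET.ElementTree(root).write(str(path), encoding="utf-8", xml_declaration=True)
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Hypothetical sample data: 1201 records with made-up field names.
    sample = [{"Id": i, "Name": f"bond-{i}"} for i in range(1201)]
    with tempfile.TemporaryDirectory() as tmp:
        files = write_chunks(sample, tmp, size=500)
        print(f"wrote {len(files)} chunk files")
```

Each chunk file can then be loaded independently, which is what makes it possible to run several loads in parallel.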
Hope that helps,
Cyril.
Anonymous
Not applicable
Author

Cyril,
I created the job as per the documentation, and I still see the same poor performance on updates. Did you reach 90-100 rows/sec with inserts or with updates? I get good speed with inserts but not with updates.
To take it up a notch, I created the XML files in chunks of 500 records, distributed them to 3 machines, and kicked off the process last night. It's still running! It actually got worse. My Talend MDM test server is CE 4.1.2 with 4 CPUs and 8 GB on Linux. I obviously did not try this update on my PRODUCTION server, which is Talend MDM EE 4.1.2.
- Would EE perform better than CE?
- Do I need to do something with the eXist logging to improve performance?
- Also, I do not see any entries in the journal for all the updates that occurred. No documentation states that the journal will be skipped, at least from what I have read.

Please suggest.
Anonymous
Not applicable
Author

My mistake, I thought it was about inserting data. Indeed, I reached 100 rows/sec on inserts with 4.1.2 eXist, but not on updates (actually, I didn't benchmark this kind of job on updates).
Do you have an index set on the primary key of your entity for this update?
If the underlying database is still eXist, performance should remain the same between CE & EE.
But since 4.2.x, EE lets you use Qizx as the database, which greatly enhances performance. Moreover, since 4.2.x, tMDMBulkLoad has been deeply reworked to achieve better insert performance with Qizx (1500 to 2000+ rows/sec). I haven't tested it yet on big updates, but I'm pretty sure performance is much higher than before, as data is indexed on the fly with Qizx.
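To put the figures mentioned in this thread side by side, here is a rough Python estimate of how long 50K rows would take at each rate. The 100 and 1500 rows/sec figures were measured on inserts, so applying them to updates is an assumption made purely to show the scale of the difference. The function names are hypothetical.

```python
def estimated_seconds(rows, rows_per_second):
    """Time needed to process `rows` at a constant rate."""
    if rows_per_second <= 0:
        raise ValueError("rows_per_second must be positive")
    return rows / rows_per_second


def format_duration(seconds):
    """Render a duration as days, hours, minutes, and seconds."""
    total = int(round(seconds))
    days, rest = divmod(total, 86_400)
    hours, rest = divmod(rest, 3_600)
    minutes, secs = divmod(rest, 60)
    return f"{days}d {hours}h {minutes}m {secs}s"


if __name__ == "__main__":
    rows = 50_000  # volume quoted earlier in the thread
    rates = {
        "observed update (10 s per record)": 0.1,
        "eXist insert figure (100 rows/sec)": 100,
        "Qizx insert figure (1500 rows/sec)": 1500,
    }
    for label, rate in rates.items():
        print(f"{label}: {format_duration(estimated_seconds(rows, rate))}")
```

At 10 seconds per record, 50K rows would need almost 6 days, against a few minutes at the insert rates.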
I personally customize the default log4j configuration to produce fewer logs. That might improve performance a little.
Finally, tMDMBulkLoad doesn't write to the journal. Once again, sorry, I thought it was about inserting, not updating...
Anonymous
Not applicable
Author

Cyril,
Thanks for the confirmation. Does Qizx come with 4.2.x, or does it need to be purchased separately? I'm already tired of eXist.
Anonymous
Not applicable
Author

Qizx has been the default XML database since Talend MDM EE 4.2.x, although you can still choose eXist.
Regards,
Cyril.
Anonymous
Not applicable
Author

How do I get Talend MDM EE 4.2.x? Who do I contact? I currently have MDM EE 4.1.2.
Thanks!
Anonymous
Not applicable
Author

Don't hesitate to contact support by opening a new ticket to ask for the upgrade, or ask your sales representative.
Regards,
Cyril.