Anonymous
Not applicable

Slow SQL Server updates

Is Talend making any effort to improve Talend performance with MS SQL Server?

 

Having searched this community and old talendforge forum entries, it appears there are performance issues with Talend using SQL Server. I found this article (Data loading from MS SQL Server DB to other MS SQL Server DB is very low compared to SSIS) which confirms my own experience comparing Talend to Datastage (2005 Server Edition -- an old, no longer supported version).

 

I built a Talend job to mirror what the old Datastage job is doing -- syncing content between two tables: a source table on one SQL Server and a target table on another SQL Server. The data is about 220k rows. The Datastage job does this (with the MSOLEDB stage) in under 2 minutes, at 2238 rows/sec. Talend is poking along (with the tMSSqlOutput component), barely managing 30 rows/sec! Really?? Almost 100x slower? What the heck is it doing?

 

I tried changing the batch size, switching to a component connection, and changing the Commit Every setting; I also tried using SELECT TOP to limit rows (just to see the impact). None of these made a significant difference in performance -- it still achieves < 30 rows/sec.
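For comparison, the fast path that bulk-capable tools take is a single prepared statement executed in JDBC batches. Here is a minimal sketch of that pattern -- the table and column names are invented for illustration, and the count-summing helper mirrors how JDBC reports batch results (drivers may return sentinel values instead of real row counts):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class BatchInsertSketch {

    // Sums the per-row counts returned by executeBatch(). A driver may return
    // Statement.SUCCESS_NO_INFO (-2) or Statement.EXECUTE_FAILED (-3) instead
    // of a real row count; stop summing at the first sentinel.
    static int sumBatchCounts(int[] counts) {
        int total = 0;
        for (int c : counts) {
            if (c == Statement.SUCCESS_NO_INFO || c == Statement.EXECUTE_FAILED) {
                break;
            }
            total += c;
        }
        return total;
    }

    // Hypothetical target table "target(id INT, name VARCHAR)".
    // One prepared INSERT, flushed every batchSize rows.
    static int batchInsert(Connection conn, ResultSet src, int batchSize)
            throws SQLException {
        int inserted = 0;
        int pending = 0;
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO target (id, name) VALUES (?, ?)")) {
            while (src.next()) {
                ps.setInt(1, src.getInt(1));
                ps.setString(2, src.getString(2));
                ps.addBatch();
                if (++pending == batchSize) {
                    inserted += sumBatchCounts(ps.executeBatch());
                    pending = 0;
                }
            }
            if (pending > 0) {
                inserted += sumBatchCounts(ps.executeBatch());
            }
        }
        return inserted;
    }
}
```

Recent versions of the Microsoft JDBC driver can additionally convert such batches into bulk-copy operations via the `useBulkCopyForBatchInsert` connection property, which is where the SSIS-class throughput comes from.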

43 Replies
Anonymous
Not applicable
Author

Your Jira looks fine.

If you find anything further that is relevant, feel free to add it in comments.

 

By the way, have you taken part in the survey (advertised in the sidebar at the top of all posts)? I'd value your input. It only takes a few minutes (less than 5).

rmartin2
Creator II

Hello @rhall ,

 

There's still no assignee on my issue, and it's slowing me down a lot!

I took a peek at the component's generated code and found the part responsible for not building the sp_prepare statement. The main reason seems to be that preparing would mess with the stats!

 

Basically, for an INSERT/UPDATE operation the generated code does:

try {
    sp_execute INSERT;
    NB_INSERT++;
} catch (fail) {
    try {
        sp_execute UPDATE;
        NB_UPDATE++;
    }
}

 

Meanwhile, a pure INSERT does:

batch_size = 10000
sp_prepare INSERT;
try {
    executeBatch(INSERT);
    NB_INSERT += batch_size;
}

 

So, to keep the stats accurate, you need to know both the inserted volume and the updated volume. I think it's possible with the following code:

batch_size = 10000
sp_prepare INSERT;
sp_prepare UPDATE;

try {
    sp_execute INSERT;
    NB_INSERT++;
} catch (fail) {
    try {
        sp_execute UPDATE;
        NB_UPDATE++;
    }
}

 

It will not use the "batch" function, but at least the statements will be prepared, which will hopefully resolve the issue. The component uses a generated helper class to do the counting (with 2 similar functions, I don't know why):

 

 

 

// ////////batch execute by batch size///////
class LimitBytesHelper_tDBOutput_2 {
	public int limitBytePart1(
			int counter,
			java.sql.PreparedStatement pstmt_tDBOutput_2)
			throws Exception {
		try {

			for (int countEach_tDBOutput_2 : pstmt_tDBOutput_2
					.executeBatch()) {
				if (countEach_tDBOutput_2 == -2
						|| countEach_tDBOutput_2 == -3) {
					break;
				}
				counter += countEach_tDBOutput_2;
			}

		} catch (java.sql.BatchUpdateException e) {

			int countSum_tDBOutput_2 = 0;
			for (int countEach_tDBOutput_2 : e
					.getUpdateCounts()) {
				counter += (countEach_tDBOutput_2 < 0 ? 0
						: countEach_tDBOutput_2);
			}

			System.err.println(e
					.getMessage());

		}
		return counter;
	}

	public int limitBytePart2(
			int counter,
			java.sql.PreparedStatement pstmt_tDBOutput_2)
			throws Exception {
		try {

			for (int countEach_tDBOutput_2 : pstmt_tDBOutput_2
					.executeBatch()) {
				if (countEach_tDBOutput_2 == -2
						|| countEach_tDBOutput_2 == -3) {
					break;
				}
				counter += countEach_tDBOutput_2;
			}

		} catch (java.sql.BatchUpdateException e) {

			for (int countEach_tDBOutput_2 : e
					.getUpdateCounts()) {
				counter += (countEach_tDBOutput_2 < 0 ? 0
						: countEach_tDBOutput_2);
			}

			System.err.println(e
					.getMessage());

		}
		return counter;
	}
}

The second method, public int limitBytePart2(int counter, java.sql.PreparedStatement pstmt_tDBOutput_1), doesn't seem to be used anywhere ...
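For what it's worth, the prepare-then-fallback approach described above could be sketched in plain JDBC like this. The table, column, and key names are invented for illustration; the duplicate-key check uses SQL Server's documented error numbers (2627 for unique-constraint violations, 2601 for unique-index violations):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class PreparedUpsertSketch {

    // SQL Server reports unique-constraint violations as error 2627
    // and unique-index violations as error 2601.
    static boolean isDuplicateKey(int vendorErrorCode) {
        return vendorErrorCode == 2627 || vendorErrorCode == 2601;
    }

    int nbInsert = 0;   // NB_INSERT in the pseudocode above
    int nbUpdate = 0;   // NB_UPDATE in the pseudocode above

    private final PreparedStatement insert;
    private final PreparedStatement update;

    // Hypothetical target table "target(id INT PRIMARY KEY, name VARCHAR)".
    // Both statements are prepared once (sp_prepare) and re-executed per row.
    PreparedUpsertSketch(Connection conn) throws SQLException {
        insert = conn.prepareStatement(
                "INSERT INTO target (id, name) VALUES (?, ?)");
        update = conn.prepareStatement(
                "UPDATE target SET name = ? WHERE id = ?");
    }

    void upsertRow(int id, String name) throws SQLException {
        try {
            insert.setInt(1, id);
            insert.setString(2, name);
            insert.executeUpdate();             // sp_execute on the prepared INSERT
            nbInsert++;                         // row was new
        } catch (SQLException e) {
            if (!isDuplicateKey(e.getErrorCode())) {
                throw e;                        // real failure, not a key clash
            }
            update.setString(1, name);
            update.setInt(2, id);
            nbUpdate += update.executeUpdate(); // row existed, update instead
        }
    }
}
```

It gives up batching, as noted above, but both statements stay prepared, and the insert/update counts are still tracked per row.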

 

The part that prepares the statement (in the component's code generation) is fairly big, so I don't know how to update it properly, and I'm not familiar with the framework used (is it still JavaJet in 7.x, or pure Maven?).

 

Thank you in advance for your help.

 

Sincerely,

 

 

Anonymous
Not applicable
Author

Hi @mhodent,

 

I will investigate this immediately

 

Regards

 

Richard

Samridhi
Contributor II

Hi Herb_Appirio

 

Any new insight from your testing? I have the same problem with

 

tOracleInput -> tMSSqlOutput

but it is very slow.