topic Re: Fragmentation PK on UUID for SQL Server in Talend Data Catalog

Fragmentation PK on UUID for SQL Server

Anonymous — Sat, 16 Nov 2024 10:42:11 GMT

Hi everybody.
in a customer MDM project, we set Talend MDM on SQL Server 2008 R2 with talend 5.6.1.
After using autoIncrement, we decide to move on UUID generated in the job. So field's type for PrimaryKey are nvarchar(255).
I encounter the following problem: fragmentation are too important on each PK (99%)
the first big load records around 15 millions data in each entities.
weekly charges will records 200 000 lines.
I know that we can launch a reorganization but i should be too much 1 per week. . .

i try to change type in one entity setting in uniqueidentifier but same result

Re: Fragmentation PK on UUID for SQL Server

Anonymous — Fri, 08 Apr 2016 13:29:10 GMT

Hello,
the right datatype for UUID is either char(36) (this is a UUID: 0862164E-235D-4DD3-9964-905F471EFD06) or a databaseinternal type the dbms vendor recommends for this.
if you have problems with a corrupted PK i think you are in the wrong Forum, you should ask in a SQL-Server Forum

read here: stackoverflow.com/questions/4503501/microsoft-sql-server-2008-99-fragmentation-on-non-clustered-non-unique-index
or here: www.sqlskills.com/blogs/paul/can-guid-cluster-keys-cause-non-clustered-index-fragmentation/

anyway, AFAIK you first load data to a table, then create the index.

regards
dj

Re: Fragmentation PK on UUID for SQL Server

Anonymous — Tue, 12 Apr 2016 08:57:06 GMT

Hi,
maybe i have to try char(36).
but Sorry, i think you don't understand my problem.
we launched the MDM project with SQL Server recommanded by Talend itself.
we were in autoIncrement (best practise from my customer) but in MDM, autoincrement is managed by an unique table, so to add data in parallel, we had bottleneck.
So to try with UUID, we change all PK type and now it's much better.
The last and not the least is when we generate UUID to inject, data is directly fragmented.
I have no problem with my PK, just bad performance due to fragmentation.
Maybe you have right, it's a SQL Server topic, but it is a typical case to use MDM with SQL server when we want to use UUID.
the goal is to find the best type or to organize UUID.
in the futur we will inject 100 000 new data a week, we can't sort UUID before injection or reorganize index every day. . . too heavy

Re: Fragmentation PK on UUID for SQL Server

Anonymous — Tue, 12 Apr 2016 10:53:01 GMT

you can forget immediatly type char(36).
too much time too load datas and SQL server doesn't like it fot a PK type.
Moreover fragmentation stay the same.

Re: Fragmentation PK on UUID for SQL Server

Anonymous — Tue, 12 Apr 2016 12:51:00 GMT

please Post the create table Statement with the pk ceration here

Re: Fragmentation PK on UUID for SQL Server

Anonymous — Tue, 12 Apr 2016 15:31:07 GMT

why not but i don't understand the interest, Talend MDM generate itself the model and so the table.
you just have to choose string in the MDM model and it creates the table in SQL Server
by default: nvarchar(255) CLUSTERED for a pk in string

Re: Fragmentation PK on UUID for SQL Server

Anonymous — Wed, 13 Apr 2016 18:02:45 GMT

First: i dont run MDM, so if you ask for help ...

Second: UUID is random data, randomdata is not good ( IMHO the worst) for clustered indicies with heavy upload.
A Clustered index is ONLY good when you append data, otherwise the leafs of the table are reorged all the time.
Change this, if it is MDM internal open a case with talend.

Re: Fragmentation PK on UUID for SQL Server

Anonymous — Wed, 20 Apr 2016 09:29:30 GMT

Hi,
one more information, Talend MDM generate itself PK named "x_talend_id" when a sub table (complexetype in an entity) is created.
this x_talend_id is by default in UUID and it's amazing, this UUID is sorted so no fragmentation!

To conclude i know that there is a solution known by Talend. . . i try to open an issue