TomG1
Creator

Numeric Sequence Generation function giving duplicate numbers

Hi,

I have created a Big Data Spark job that reads rows from a file.

I generate an ID for each record in a tMap, using the following expression:

Numeric.sequence("IDGen", 1000000, 1)

When I checked the output file, I found duplicate IDs.

Why is this happening?

Please note that this is a Big Data Spark job and I am running it on a Spark cluster.

Is there a workaround for this issue?

Thanks

TomG1
Creator
Author

No, the other fields are not duplicated. I compared two rows that have the same ID.

Is this a limitation of Big Data Spark jobs?

TRF
Champion II

I've just tried the same job with 50,000 records and got no duplicates.

I use TDI 6.3.1.

TomG1
Creator
Author

Did you create a Standard job or a Big Data job?

We have to create a Big Data Spark job and run it on a Spark cluster.

I am using Talend Big Data Platform 6.2.

 

TRF
Champion II

No, I just use a Standard job.
Maybe the problem is due to cluster mode.
I don't know exactly how it works, but if the job execution is distributed over many nodes, I suppose that is a problem for the sequence calculation, which is an in-memory operation (probably not shared between nodes).
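To illustrate this hypothesis, here is a minimal Java sketch. It assumes (this is not confirmed in the thread) that the sequence state lives in a plain in-memory map inside each JVM. `LocalSequence` and `assignIds` are hypothetical names, and each outer loop iteration stands in for a partition processed on a separate executor. Because every executor starts with an empty map, each partition restarts the count at the start value.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PerExecutorSequenceDemo {

    // Hypothetical stand-in for a sequence routine that keeps its state in a
    // plain in-memory map. On a cluster, each executor JVM holds its own copy.
    static class LocalSequence {
        private final Map<String, Integer> counters = new HashMap<>();

        int next(String name, int start, int step) {
            Integer current = counters.get(name);
            int value = (current == null) ? start : current + step;
            counters.put(name, value);
            return value;
        }
    }

    // Simulates partitions processed on separate executors: each one gets a
    // fresh LocalSequence, so every partition counts from the same start value.
    static List<Integer> assignIds(int partitions, int rowsPerPartition) {
        List<Integer> ids = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            LocalSequence executorState = new LocalSequence();
            for (int r = 0; r < rowsPerPartition; r++) {
                ids.add(executorState.next("IDGen", 1000000, 1));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Integer> ids = assignIds(2, 3);
        System.out.println(ids); // [1000000, 1000001, 1000002, 1000000, 1000001, 1000002]
    }
}
```

With a single partition (as in a Standard job running in one JVM) the same code produces no duplicates, which would be consistent with the 50,000-record test above.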
cterenzi
Specialist

Agreed, this is probably a threading issue: the Numeric routines likely aren't thread-safe. You may have to create your own sequence.
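If you do write your own sequence, a minimal thread-safe version in Java could look like the sketch below. `ThreadSafeSequence` is a hypothetical class, not a Talend routine. It uses `ConcurrentHashMap.computeIfAbsent` and `AtomicLong.addAndGet` so that concurrent calls inside one JVM never return the same value. Note that this only fixes concurrent access within a single JVM: on a Spark cluster each executor still holds its own static map, so IDs can still collide across executors.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class ThreadSafeSequence {

    // One atomic counter per sequence name, shared by all threads of this JVM.
    private static final ConcurrentHashMap<String, AtomicLong> COUNTERS = new ConcurrentHashMap<>();

    // Returns start on the first call for a name, then start + step, start + 2*step, ...
    public static long next(String name, long start, long step) {
        // computeIfAbsent is atomic, so only one counter is ever created per name.
        AtomicLong counter = COUNTERS.computeIfAbsent(name, k -> new AtomicLong(start - step));
        return counter.addAndGet(step);
    }

    // Calls next() from several threads and counts how many distinct values came back.
    static int distinctIdsFromThreads(int threads, int callsPerThread) {
        Set<Long> seen = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < callsPerThread; i++) {
                    seen.add(next("IDGen", 1_000_000L, 1L));
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // 4 threads x 1000 calls: every returned value should be distinct.
        System.out.println(distinctIdsFromThreads(4, 1000));
    }
}
```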
Anonymous

Hi Tom,

First of all, sorry for my English.

The same thing also happened to me in a Talend Data Integration job, in a tMap that assigns row numbers.

It is very strange: the job had been running for many days and had worked many times.

When counting 500 rows, it produced 20 duplicate row numbers.

What workaround did you find?


TomG1
Creator
Author

For me, the error occurred in a Talend Big Data job.

I found a workaround using tSqlRow.

tSqlRow is connected to two input rows. The first row carries the incoming records.

The second row contains the last (maximum) generated sequence value.

In tSqlRow, I used row_number() to generate a unique number for each incoming row and added that last value to it.

Example query:

 

select (row_number() over()) + row2.lastseq from row1,row2
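The query numbers each incoming row and shifts it by the stored maximum, so IDs continue from the previous run. The plain-Java sketch below reproduces only that arithmetic (it is not Talend or Spark code; `assignIds` and `newLastSeq` are hypothetical names) and also shows saving the new maximum as `lastseq` for the next run, which the reply implies but does not show. Two hedged caveats about the SQL itself: some Spark SQL versions reject row_number() with an empty OVER () and ask for an ORDER BY clause, and in Spark a window without PARTITION BY moves all rows into a single partition, which is what makes the numbers globally unique but can be slow on large inputs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OffsetRowNumber {

    // A record paired with its generated surrogate key (hypothetical shape).
    static final class Keyed {
        final long id;
        final String value;

        Keyed(long id, String value) {
            this.id = id;
            this.value = value;
        }

        @Override
        public String toString() {
            return id + "=" + value;
        }
    }

    // Same arithmetic as row_number() + lastseq: rows are numbered 1..n in
    // list order and shifted by the last sequence value of the previous run.
    static List<Keyed> assignIds(List<String> rows, long lastSeq) {
        List<Keyed> out = new ArrayList<>(rows.size());
        long rowNumber = 0;
        for (String row : rows) {
            rowNumber++;
            out.add(new Keyed(lastSeq + rowNumber, row));
        }
        return out;
    }

    // Value to persist as "lastseq" for the next run (unchanged if no rows arrived).
    static long newLastSeq(List<Keyed> keyed, long lastSeq) {
        return keyed.isEmpty() ? lastSeq : keyed.get(keyed.size() - 1).id;
    }

    public static void main(String[] args) {
        List<Keyed> run1 = assignIds(Arrays.asList("a", "b", "c"), 1000000L);
        long last = newLastSeq(run1, 1000000L);
        List<Keyed> run2 = assignIds(Arrays.asList("d", "e"), last);
        System.out.println(run1); // [1000001=a, 1000002=b, 1000003=c]
        System.out.println(run2); // [1000004=d, 1000005=e]
    }
}
```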

TomG1
Creator
Author

If your Talend installation is licensed and this is happening in a Data Integration job, raise a bug with Talend.

vimal_kumar
Contributor

Hi Tom/All,

 

Is there a way to generate a fixed-length/cyclic sequence in Talend, like we can with an Oracle sequence?

For example, say I use something like this to generate a sequence, and I want every value to always be 8 characters long:

Numeric.sequence("s1",1,1)

I want this output:

00000001

00000002

00000003
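A minimal Java sketch of a fixed-length, cyclic sequence, loosely modelled on Oracle's MINVALUE/MAXVALUE/CYCLE options. `CyclicPaddedSequence` is a hypothetical helper, not a Talend API: it zero-pads each value with `String.format` and wraps back to the minimum after the maximum. Like the Numeric routines, its state is per JVM, so it would not give unique values across the executors of a Spark job.

```java
public class CyclicPaddedSequence {

    private final long minValue;
    private final long maxValue;
    private final long step;
    private final int width;
    private long next;

    // Hypothetical helper: after maxValue it wraps back to minValue, similar in
    // spirit to an Oracle sequence created with MINVALUE, MAXVALUE and CYCLE.
    public CyclicPaddedSequence(long minValue, long maxValue, long step, int width) {
        if (minValue < 0 || minValue > maxValue || step <= 0) {
            throw new IllegalArgumentException("need 0 <= minValue <= maxValue and step > 0");
        }
        if (Long.toString(maxValue).length() > width) {
            throw new IllegalArgumentException("maxValue does not fit in " + width + " digits");
        }
        this.minValue = minValue;
        this.maxValue = maxValue;
        this.step = step;
        this.width = width;
        this.next = minValue;
    }

    // Returns the current value zero-padded to `width` digits and advances the
    // counter; synchronized so concurrent callers in one JVM never read the same state.
    public synchronized String nextValue() {
        long value = next;
        // Wrap when the next step would pass maxValue (written this way to avoid overflow).
        next = (value > maxValue - step) ? minValue : value + step;
        return String.format("%0" + width + "d", value);
    }

    public static void main(String[] args) {
        CyclicPaddedSequence seq = new CyclicPaddedSequence(1, 3, 1, 8);
        for (int i = 0; i < 5; i++) {
            System.out.println(seq.nextValue());
        }
        // prints 00000001, 00000002, 00000003, 00000001, 00000002 (one per line)
    }
}
```

For the exact case in the question (8 digits, no wrap-around in practice), `new CyclicPaddedSequence(1, 99999999L, 1, 8)` would produce 00000001, 00000002, 00000003, and so on.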

TomG1
Creator
Author

Hi,

 

You can generate sequence numbers using tSqlRow, as I described in my earlier reply.

Then connect the output of tSqlRow to a tMap and pad the number with leading zeros in Java. Java has no built-in lpad(), but String.format("%08d", number) gives the same result.
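If the padding is done in a tMap expression, a small custom routine keeps the expression short. The sketch assumes a hypothetical routine class `SeqFormat` with a static `lpad(Long, int)` method (not a built-in Talend function). It returns null for a null input and throws when the value has more digits than the requested width, instead of silently producing a longer string.

```java
public class SeqFormat {

    // Hypothetical custom routine: left-pads a non-negative sequence value with zeros.
    // Returns null for a null input so a tMap expression does not throw on missing values.
    public static String lpad(Long value, int width) {
        if (value == null) {
            return null;
        }
        if (value < 0) {
            throw new IllegalArgumentException("negative sequence value: " + value);
        }
        String digits = Long.toString(value);
        if (digits.length() > width) {
            // Fail loudly rather than break the fixed-length contract.
            throw new IllegalArgumentException(value + " does not fit in " + width + " digits");
        }
        StringBuilder sb = new StringBuilder(width);
        for (int i = digits.length(); i < width; i++) {
            sb.append('0');
        }
        return sb.append(digits).toString();
    }

    public static void main(String[] args) {
        System.out.println(lpad(1L, 8));    // 00000001
        System.out.println(lpad(1234L, 8)); // 00001234
    }
}
```

In a tMap output column the expression would then be something like `SeqFormat.lpad(row1.seq, 8)`, where `row1.seq` is a hypothetical Long column coming from tSqlRow (an Integer column would need converting first, since Java does not widen Integer to Long automatically).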