TomG1
Creator

Numeric Sequence Generation function giving duplicate numbers

Hi,

 

I have created a Big Data Spark job.

I am reading rows from a file.

In a tMap, I generate an ID for each record in the file using the following code:

Numeric.sequence("IDGen", 1000000, 1)

I checked the file and found duplicate IDs generated.

 

Why is this happening?

 

Please note that this is a Big Data Spark job and I am running it on a Spark cluster.

 

Is there a workaround for this issue?

 

Thanks 

22 Replies
Anonymous
Not applicable

hi,
I already solved this. The IDs can be duplicated because the job runs as a Big Data Batch job with more than one executor, and each executor generates the sequence on its own. Try setting the number of executors to 1 in the Spark configuration menu.
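
To illustrate why more than one executor leads to duplicates, here is a minimal sketch in plain Java (no Spark involved). It assumes the sequence state is held in each executor's own JVM memory, so every executor starts counting from the same start value. `LocalSequence` is a hypothetical stand-in written for this example, not the actual Talend routine.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

class PerExecutorSequenceDemo {

    // Hypothetical stand-in for a JVM-local sequence routine:
    // each instance models the sequence state of one executor.
    static final class LocalSequence {
        private final Map<String, Integer> counters = new HashMap<>();

        int next(String name, int start, int step) {
            Integer current = counters.get(name);
            int value = (current == null) ? start : current + step;
            counters.put(name, value);
            return value;
        }
    }

    // Simulates `executors` independent JVMs, each assigning IDs to its own rows.
    static List<Integer> generateIds(int executors, int rowsPerExecutor) {
        List<Integer> ids = new ArrayList<>();
        for (int e = 0; e < executors; e++) {
            LocalSequence seq = new LocalSequence(); // fresh state per executor
            for (int r = 0; r < rowsPerExecutor; r++) {
                ids.add(seq.next("IDGen", 1000000, 1));
            }
        }
        return ids;
    }

    // Number of distinct IDs; smaller than ids.size() when duplicates exist.
    static int countDistinct(List<Integer> ids) {
        return new HashSet<>(ids).size();
    }

    public static void main(String[] args) {
        List<Integer> ids = generateIds(2, 3);
        System.out.println(ids);
        System.out.println("distinct: " + countDistinct(ids));
    }
}
```

With two simulated executors and three rows each, both executors hand out 1000000, 1000001 and 1000002, so six rows share only three distinct IDs. With a single executor there is only one counter, which is why limiting the job to one executor avoids the collision (at the cost of parallelism).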
Anonymous
Not applicable

Yes, this is a good solution for generating a sequence in Talend Spark. Just add this configuration on Spark 2.2:

sparkConf.set("spark.sql.crossJoin.enabled", "true")

- Step 1: select max(A.XX_ID) as max_id from XXXXXX A

- Step 2:

select (row_number() over (PARTITION BY D.xxxxxx... ORDER BY D.xxx)) + R.max_id as id_max, D.xxx, D.xx, D.xx, D.descr_, .........
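
The idea behind the two steps can be sketched in plain Java: take the current maximum ID, then add a 1-based row number to it. This is only an illustration under assumptions, not the Spark job itself. The `Row` type and the method names are hypothetical, existing IDs are assumed to be non-negative, and the rows are numbered over a single global ordering. Note that `row_number()` restarts at 1 inside each `PARTITION BY` group, so the partition column has to be chosen so that numbers do not repeat across groups.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class MaxIdOffsetDemo {

    // Hypothetical row type for illustration: a sort key and the assigned ID.
    static final class Row {
        final String key;
        long id;

        Row(String key) {
            this.key = key;
        }
    }

    // Step 1 equivalent: highest existing ID (0 when there are none;
    // assumes IDs are non-negative).
    static long maxId(List<Long> existingIds) {
        long max = 0L;
        for (long id : existingIds) {
            max = Math.max(max, id);
        }
        return max;
    }

    // Step 2 equivalent: 1-based row number over one global ordering, plus maxId.
    static List<Row> assignIds(List<String> keys, long maxId) {
        List<Row> rows = new ArrayList<>();
        for (String k : keys) {
            rows.add(new Row(k));
        }
        rows.sort(Comparator.comparing((Row r) -> r.key));
        long rowNumber = 1;
        for (Row r : rows) {
            r.id = maxId + rowNumber++;
        }
        return rows;
    }

    public static void main(String[] args) {
        long max = maxId(List.of(5L, 42L, 7L));
        for (Row r : assignIds(List.of("b", "a", "c"), max)) {
            System.out.println(r.key + " -> " + r.id);
        }
    }
}
```

Because every new ID is derived from one ordering and one offset, the result does not depend on per-executor counter state.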

 

You can forget Numeric.sequence("max_id_seq", Var.var1, 1); it doesn't work on Spark!

 

thx Malek

Anonymous
Not applicable

This is an example, thanks.

 


[Attached image: sparkseq.PNG]