TomG1
Creator

Numeric Sequence Generation function giving duplicate numbers

Hi,

 

I have created a Big Data Spark job.

I am reading rows from a file.

Basically, I am generating an ID for each record in the file in a tMap using the following code:

Numeric.sequence("IDGen", 1000000, 1)

I checked the file and found duplicate IDs generated.

 

Why is this happening?

Please note that this is a Big Data Spark job and I am running it on a Spark cluster.

Is there a workaround for this issue?

 

Thanks 

22 Replies
Anonymous
Not applicable

Hi,
I have already solved this. Duplicates can appear because the job runs as a Big Data Batch job with more than one executor, and each executor apparently keeps its own sequence counter. Try setting the number of executors to 1 in the Spark configuration menu.
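To illustrate why several executors can hand out the same IDs, here is a minimal plain-Java sketch. It assumes the sequence is a per-JVM counter keyed by name; LocalSequence and DuplicateIdDemo are hypothetical names made up for this example and are not Talend's actual source. Each simulated executor gets its own counter, just as separate executor JVMs would.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a per-JVM sequence routine (assumption, not Talend code).
class LocalSequence {
    private final Map<String, Integer> counters = new HashMap<>();

    // Returns the current value of the named sequence, then advances it by step.
    int next(String name, int start, int step) {
        int current = counters.getOrDefault(name, start);
        counters.put(name, current + step);
        return current;
    }
}

class DuplicateIdDemo {
    // Every simulated executor owns a separate LocalSequence, so each one
    // starts counting from the same start value.
    static List<Integer> idsFromExecutors(int executors, int rowsPerExecutor) {
        List<Integer> ids = new ArrayList<>();
        for (int e = 0; e < executors; e++) {
            LocalSequence seq = new LocalSequence();
            for (int r = 0; r < rowsPerExecutor; r++) {
                ids.add(seq.next("IDGen", 1000000, 1));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // prints [1000000, 1000001, 1000002, 1000000, 1000001, 1000002]
        System.out.println(idsFromExecutors(2, 3));
    }
}
```

With a single executor there is only one counter, which is why limiting the job to one executor avoids the duplicates, at the cost of parallelism.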
Anonymous
Not applicable

Yes, this is a good way to generate a sequence in a Talend Spark job. Just add this configuration on Spark 2.2:

sparkConf.set("spark.sql.crossJoin.enabled", "true")

Step 1:

select max(A.XX_ID) as max_id from XXXXXX A

Step 2:

select (row_number() over (PARTITION BY D.xxxxxx... ORDER BY D.xxx)) + R.max_id as id_max, D.xxx, D.xx, D.xx, D.descr_, .........
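The two steps can be sketched in plain Java, outside Spark, to show the arithmetic: take the current maximum ID, then add a 1-based row number to it. OffsetRowNumberDemo, maxId and assignIds are hypothetical names used only for this illustration, and the sketch assumes one global ordering over all new rows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class OffsetRowNumberDemo {
    // Step 1: largest ID already in use (0 if there are none yet).
    static long maxId(List<Long> existingIds) {
        long max = 0;
        for (long id : existingIds) {
            max = Math.max(max, id);
        }
        return max;
    }

    // Step 2: row number (1, 2, 3, ...) in the given order, plus the offset.
    static List<Long> assignIds(long maxId, List<String> orderedRows) {
        List<Long> ids = new ArrayList<>();
        long rowNumber = 0;
        for (String row : orderedRows) {
            rowNumber++;
            ids.add(maxId + rowNumber);
        }
        return ids;
    }

    public static void main(String[] args) {
        long max = maxId(Arrays.asList(5L, 9L, 7L));
        // prints [10, 11, 12]
        System.out.println(assignIds(max, Arrays.asList("a", "b", "c")));
    }
}
```

One thing to watch in the SQL version: row_number() restarts at 1 inside each PARTITION BY group, so the resulting IDs are unique across the whole table only if the window spans all rows or the partition columns are part of the key.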

 

You can forget about Numeric.sequence("max_id_seq", Var.var1, 1): it does not work on Spark!

Thanks, Malek

Anonymous
Not applicable

Here is an example, thanks.

Attachment: sparkseq.PNG