Hi,
I have created a Big Data Spark job.
I am reading rows from a file.
Basically, I am generating an ID for each record in the file in tMap, using the following code:
Numeric.sequence("IDGen", 1000000, 1)
I checked the output file and found duplicate IDs.
Why is this happening?
Please note that this is a Big Data Spark job and I am running it on a Spark cluster.
Is there a workaround for this issue?
Thanks
Yes, here is a good solution for generating a sequence in a Talend Spark job. First, add this configuration on Spark 2.2:
sparkConf.set("spark.sql.crossJoin.enabled", "true")
- Step 1: get the current maximum ID: select max(A.XX_ID) as max_id from XXXXXX A
- Step 2: number the new rows and offset them by that maximum:
select (row_number() over (PARTITION BY D.xxxxxx... ORDER BY D.xxx)) + R.max_id as id_max, D.xxx, D.xx, D.xx, D.descr_, .........
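The two steps above can be sketched in plain Python (a simulation, not Spark code; the table data and column names are hypothetical). One caution: row_number() restarts at 1 inside each PARTITION BY group, so for globally unique IDs the window should span the whole result set; the sketch below uses a single global ordering for that reason.

```python
# Simulation of the two-step approach from the answer above.
# Step 1: find the current maximum ID already in the target table.
# Step 2: assign row_number() over a global ordering, offset by that maximum,
#         so new rows continue the existing ID range without collisions.

existing_ids = [101, 102, 103]          # hypothetical IDs already in the table
max_id = max(existing_ids)              # step 1: select max(XX_ID) as max_id

new_rows = [                            # hypothetical incoming records
    {"key": "A", "descr": "first"},
    {"key": "A", "descr": "second"},
    {"key": "B", "descr": "third"},
]

# Step 2: (row_number() over (order by descr)) + max_id as id_max.
# A single global ordering here; Spark computes the same thing distributed.
for row_number, row in enumerate(sorted(new_rows, key=lambda r: r["descr"]),
                                 start=1):
    row["id_max"] = row_number + max_id

ids = sorted(r["id_max"] for r in new_rows)
# ids → [104, 105, 106]: new IDs continue past max_id with no duplicates
```

Every new ID is strictly greater than max_id, so the new batch cannot collide with rows already in the table.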
You can forget Numeric.sequence("max_id_seq", Var.var1, 1); it doesn't work on Spark!
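To see why Numeric.sequence duplicates IDs on a cluster, here is a minimal simulation (plain Python; the partition layout and helper names are hypothetical). Each Spark executor runs its own copy of the sequence state, all starting from the same seed, so records in different partitions receive the same numbers.

```python
# Simulation of an executor-local counter, like Numeric.sequence("IDGen", 1000000, 1).
# The counter state lives in each executor's JVM, not in any shared place.

def make_sequence(start, step):
    """One executor-local counter: returns the next value on each call."""
    state = {"next": start}
    def next_id():
        value = state["next"]
        state["next"] += step
        return value
    return next_id

# Two partitions processed on two executors: each executor re-creates
# the sequence from the same starting value.
partitions = [["rec_a", "rec_b"], ["rec_c", "rec_d"]]

all_ids = []
for partition in partitions:
    seq = make_sequence(1000000, 1)     # fresh counter per executor
    for record in partition:
        all_ids.append(seq())

# all_ids → [1000000, 1000001, 1000000, 1000001]: duplicates across partitions
has_duplicates = len(all_ids) != len(set(all_ids))
```

This is exactly the behavior reported in the question: the sequence is only unique within one executor, not across the cluster, which is why the max-id + row_number() approach (or a cluster-safe function) is needed instead.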
Thanks, Malek