Hi,
I have created a Big Data Spark job.
I am reading rows from a file.
Basically, I am generating an ID for each record in the file in tMap, using the following code:
Numeric.sequence("IDGen", 1000000, 1)
I checked the output file and found duplicate IDs.
Why is this happening?
Please note that this is a Big Data Spark job and I am running it on a Spark cluster.
Is there a workaround for this issue?
Thanks
Yes, here is a good solution for generating a sequence in a Talend Spark job. First, add this configuration on Spark 2.2:
sparkConf.set("spark.sql.crossJoin.enabled", "true")
- Step 1: get the current maximum ID: select max(A.XX_ID) as max_id from XXXXXX A
- Step 2: number the new rows and offset them by that maximum:
select (row_number() over (PARTITION BY D.xxxxxx... ORDER BY D.xxx)) + R.max_id as id_max, D.xxx, D.xx, D.xx, D.descr_, .........
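The two steps above can be sketched in plain Python (a simulation, not Spark code; the table data and column names are hypothetical). One caution: row_number() restarts at 1 inside each PARTITION BY group, so for globally unique IDs the window should span the whole result set; the sketch below uses a single global ordering for that reason.

```python
# Simulation of the two-step approach from the answer above.
# Step 1: find the current maximum ID already in the target table.
# Step 2: assign row_number() over a global ordering, offset by that maximum,
#         so new rows continue the existing ID range without collisions.

existing_ids = [101, 102, 103]          # hypothetical IDs already in the table
max_id = max(existing_ids)              # step 1: select max(XX_ID) as max_id

new_rows = [                            # hypothetical incoming records
    {"key": "A", "descr": "first"},
    {"key": "A", "descr": "second"},
    {"key": "B", "descr": "third"},
]

# Step 2: (row_number() over (order by descr)) + max_id as id_max.
# A single global ordering here; Spark computes the same thing distributed.
for row_number, row in enumerate(sorted(new_rows, key=lambda r: r["descr"]),
                                 start=1):
    row["id_max"] = row_number + max_id

ids = sorted(r["id_max"] for r in new_rows)
# ids → [104, 105, 106]: new IDs continue past max_id with no duplicates
```

Every new ID is strictly greater than max_id, so the new batch cannot collide with rows already in the table.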
You can forget Numeric.sequence("max_id_seq", Var.var1, 1); it doesn't work on Spark!
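To see why Numeric.sequence duplicates IDs on a cluster, here is a minimal simulation (plain Python; the partition layout and helper names are hypothetical). Each Spark executor runs its own copy of the sequence state, all starting from the same seed, so records in different partitions receive the same numbers.

```python
# Simulation of an executor-local counter, like Numeric.sequence("IDGen", 1000000, 1).
# The counter state lives in each executor's JVM, not in any shared place.

def make_sequence(start, step):
    """One executor-local counter: returns the next value on each call."""
    state = {"next": start}
    def next_id():
        value = state["next"]
        state["next"] += step
        return value
    return next_id

# Two partitions processed on two executors: each executor re-creates
# the sequence from the same starting value.
partitions = [["rec_a", "rec_b"], ["rec_c", "rec_d"]]

all_ids = []
for partition in partitions:
    seq = make_sequence(1000000, 1)     # fresh counter per executor
    for record in partition:
        all_ids.append(seq())

# all_ids → [1000000, 1000001, 1000000, 1000001]: duplicates across partitions
has_duplicates = len(all_ids) != len(set(all_ids))
```

This is exactly the behavior reported in the question: the sequence is only unique within one executor, not across the cluster, which is why the max-id + row_number() approach (or a cluster-safe function) is needed instead.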
Thanks, Malek