Hi,
I have created a Big Data Spark job.
It reads rows from a file.
I generate an ID for each record in a tMap using the following code:
Numeric.sequence("IDGen", 1000000, 1)
I checked the output file and found duplicate IDs.
Why is this happening?
Please note that this is a Big Data Spark job and I am running it on a Spark cluster.
Is there a workaround for this issue?
Thanks
No, the other fields are not duplicated. I just compared two rows whose IDs are the same.
Is this a limitation of Big Data Spark jobs?
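One possible explanation, offered as an assumption rather than something confirmed in this thread: if the counter behind Numeric.sequence lives in the memory of a single JVM, then each Spark executor process keeps its own copy, and every copy starts again from 1000000. The sketch below is plain Java, not Talend's actual code. The class name PerExecutorSequence and the idea of passing an executor index and an executor count are hypothetical. It shows the collision, and one way to avoid it by giving each executor its own interleaved slice of numbers.

```java
// Hypothetical illustration only: each instance stands in for one executor's private counter.
class PerExecutorSequence {
    private long next;
    private final long step;

    PerExecutorSequence(long start, long step) {
        this.next = start;
        this.step = step;
    }

    // Returns the current value and advances the counter by the step.
    long nextId() {
        long id = next;
        next += step;
        return id;
    }

    // Assumed fix: executor k of n emits start + k, start + k + n, start + k + 2n, ...
    static PerExecutorSequence forExecutor(long start, int executorIndex, int executorCount) {
        return new PerExecutorSequence(start + executorIndex, executorCount);
    }

    public static void main(String[] args) {
        // Two independent counters with the same start value hand out the same first ID.
        PerExecutorSequence a = new PerExecutorSequence(1000000L, 1);
        PerExecutorSequence b = new PerExecutorSequence(1000000L, 1);
        System.out.println(a.nextId() == b.nextId());

        // Interleaved counters never overlap.
        PerExecutorSequence c = forExecutor(1000000L, 0, 2);
        PerExecutorSequence d = forExecutor(1000000L, 1, 2);
        System.out.println(c.nextId() + " " + d.nextId() + " " + c.nextId() + " " + d.nextId());
    }
}
```

The interleaved IDs are unique but not contiguous per executor, which is usually acceptable for a surrogate key.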
I've just tried the same job with 50,000 records and got no duplicates.
I use TDI 6.3.1.
Did you create a Standard job or a Big Data job?
We have to create a Big Data Spark job and then run it on a Spark cluster.
I am using Talend Big Data Platform 6.2.
Hi Tom,
First of all, sorry for my English.
This also happened to me in a Talend Data Integration job, in a tMap that assigns row numbers.
It is very strange: the job has been running for many days and has worked many times.
When numbering 500 rows it produced 20 duplicated row numbers.
What workaround did you find?
For me the error occurred in a Talend Big Data job.
I found a workaround using tSQLRow.
The tSQLRow is connected to two input rows. The first row carries the incoming records.
The other row contains the last generated (maximum) sequence value.
In the tSQLRow I used row_number() to generate a unique number for each incoming row.
Example query:
select (row_number() over()) + row2.lastseq from row1,row2
If your Talend is licensed and this is happening in a Data Integration job, then raise a bug with Talend.
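For readers who want to see the arithmetic behind the query above, here is a minimal plain-Java sketch. It is not Talend or Spark code, and the class and method names (OffsetRowNumber, assignIds) are made up for illustration. Each incoming row gets its 1-based row number added to the last sequence value already used.

```java
// Hypothetical illustration of: (row_number() over()) + row2.lastseq
class OffsetRowNumber {
    // Returns lastSeq + 1, lastSeq + 2, ..., one ID per incoming row.
    static long[] assignIds(long lastSeq, int rowCount) {
        long[] ids = new long[rowCount];
        for (int i = 0; i < rowCount; i++) {
            ids[i] = lastSeq + (i + 1); // row_number() starts at 1, not 0
        }
        return ids;
    }

    public static void main(String[] args) {
        long[] ids = assignIds(1000000L, 3);
        System.out.println(java.util.Arrays.toString(ids)); // [1000001, 1000002, 1000003]
    }
}
```

Because the numbering happens in one place over the whole input, the result does not depend on how many executors run the job. The trade-off is that an unpartitioned window generally forces all rows through a single partition, so this may be slow on very large inputs.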
Hi Tom/All,
Is there a way to generate a fixed-length/cyclic sequence in Talend, the way we can with an Oracle sequence?
Example: say I generate a sequence like this and want each value to always be 8 characters long.
Numeric.sequence("s1",1,1)
I want the output as:
00000001
00000002
00000003
Hi,
You can generate sequence numbers using tSQLRow as I described in my reply above.
Then connect the output of the tSQLRow to a tMap and left-pad the number with zeros in Java. Java has no built-in lpad(), but String.format("%08d", n) gives the same result.
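As a rough sketch of the padding step, assuming the sequence value arrives as a long: the helper names pad and cycle below are hypothetical, and only String.format is standard Java. The cycle helper is one way to make the sequence wrap around once it would no longer fit in 8 digits, similar in spirit to a cycling Oracle sequence.

```java
// Hypothetical helpers for a fixed-width, cyclic sequence value.
class PaddedSequence {
    // Left-pads value with zeros to the given width, e.g. pad(1, 8) gives "00000001".
    static String pad(long value, int width) {
        return String.format("%0" + width + "d", value);
    }

    // Wraps n (assumed >= 1) into the range 1..max, so max + 1 becomes 1 again.
    static long cycle(long n, long max) {
        return ((n - 1) % max) + 1;
    }

    public static void main(String[] args) {
        for (long n = 1; n <= 3; n++) {
            System.out.println(pad(cycle(n, 99999999L), 8));
        }
    }
}
```

In a tMap, an expression along the lines of String.format("%08d", row1.id) would do the padding inline, where row1.id is a placeholder for whatever column holds the generated number.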