
Anonymous
Not applicable

Multiple issues with Talend 6.1.1 when the number of rows increases

I upgraded Talend 5.1.2 (r90681) to 6.1.1.20151214_1327, because I needed to convert UTC to GMT and vice versa, which was not possible in 5.1.2.
But in this version I have multiple issues, all depending on how many records have to be processed.
In general all jobs work perfectly until the throughput rises above a few hundred or a few thousand rows. Then I get driver errors, or, as in this case, wrong sorting and garbage in the .csv output.
Source files:
http://www.filedropper.com/statussenentijdennaarchainware02
If component cwReturnJobLogs1 has this where clause:
WHERE ID > 161670059
  and ID < 161760492
All goes smoothly and I get the result I want, in the right order. (Result_1.csv)
But if the where clause is changed to:
WHERE ID > 161640059
  and ID < 161763492
The result isn't in the correct order. (Result_2.csv)
If you look in the .csv files, you can search for 25050690. In the correct output, the second line for that ID says LO START, and later in the file LO GEREED. But in the second result file, you first see LO GEREED and then LO START.
The WHERE clause is normally dynamic. Every 15 minutes the job is started; it retrieves the MAX_ID of the last session, processes all IDs since then, and saves the new MAX_ID in the database. So the number of rows processed can be anywhere between 0 and thousands.
In 5.1.2 we never had these issues.
If something is not clear, ask!
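The incremental pattern described above (save MAX_ID, fetch everything newer on the next run) can be sketched in plain Java. All names here (buildWhereClause, Row, processBatch) are illustrative, not taken from the actual job:

```java
import java.util.List;

public class IncrementalExtract {

    // Illustrative row shape: an ID plus the status text that ends up in the CSV.
    public record Row(long id, String status) {}

    // Build the dynamic WHERE clause from the MAX_ID saved by the previous run.
    public static String buildWhereClause(long lastMaxId) {
        return "WHERE ID > " + lastMaxId;
    }

    // Process one 15-minute batch and return the new MAX_ID to persist.
    // An empty batch simply keeps the old MAX_ID.
    public static long processBatch(List<Row> rows, long lastMaxId) {
        long maxId = lastMaxId;
        for (Row r : rows) {
            // ...write r to the .csv output here...
            maxId = Math.max(maxId, r.id());
        }
        return maxId;
    }

    public static void main(String[] args) {
        long lastMaxId = 161670059L;
        System.out.println(buildWhereClause(lastMaxId));
    }
}
```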
Anonymous
Not applicable
Author

I just realized I uploaded the work-around version (with an iterate, so a .csv file is generated for every row).
I suspect that your issue is related to either this logic, or a similar type of issue.

Maybe in the original version, but not in this work-around version.
The output is complete and in the right order.

This is the faulty version, where one .csv file is generated with all rows. The output is not always complete, and comes out in a random order when the number of rows is high.
StatussenEnTijdenNaarChainware2.zip.zip
Anonymous
Not applicable
Author

Although I didn't see anything unusual, I do respect your feedback. Do I use the tFlowToIterate in this way?
0683p000009MDmu.png
Anonymous
Not applicable
Author

OK, there are a couple of issues with this job that I have spotted. I will list them below....
1) You are trying to write to the same file, context.LogDir+"temp.csv", from multiple tFileOutputDelimited components. This is not safe, and may only have worked so far by luck. Try opening two copies of the same .txt file and add text to both. Then save one, then save the other. Open the file and you will not have all of the text you wrote. That is a simplified example of what I expect is happening.
2) You are relying on a combination of your SQL query's data order AND (maybe you do not realise this) the order of preference of the tMap outputs (seen below)
0683p000009MDlW.png
This is not ideal at all and seems to tally with what you are experiencing: as the row count goes up, the ordering starts to get thrown out.
What I suggest is that you rebuild the job to output to tHashOutput components instead of your file. Link the tHashOutputs so they save to the same location. Then use a tHashInput followed by a tSortRow component to order your data as required (this may need a bit of massaging to get the order you wish). Then write the file once you have controlled the ordering.
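The suggested rebuild — buffer everything first, impose the order explicitly, then write once — looks roughly like this in plain Java (the list stands in for the linked tHashOutput components, the sort for tSortRow; the sort key is an assumption):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortThenWrite {

    public record LogRow(long id, String timestamp, String status) {}

    // Stand-in for the linked tHashOutputs: every tMap branch appends here
    // instead of writing its own copy of the file.
    public static final List<LogRow> buffer = new ArrayList<>();

    // Stand-in for tHashInput -> tSortRow: read the buffer back and impose
    // the required order explicitly before anything touches the file.
    public static List<LogRow> sortedRows() {
        List<LogRow> out = new ArrayList<>(buffer);
        out.sort(Comparator.comparingLong(LogRow::id)
                           .thenComparing(LogRow::timestamp));
        return out;
    }

    public static void main(String[] args) {
        buffer.add(new LogRow(25050690, "08:15", "LO GEREED"));
        buffer.add(new LogRow(25050689, "08:00", "LO START"));
        // Only now write the single .csv, in a guaranteed order.
        for (LogRow r : sortedRows()) {
            System.out.println(r.id() + ";" + r.timestamp() + ";" + r.status());
        }
    }
}
```

The point of the design is that no ordering from the SQL query or the tMap output preference survives into the file; the one explicit sort is the only thing that decides the final order.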
Anonymous
Not applicable
Author

The tFlowToIterate should go straight after the SQL component. If the row connector between the SQL component and the tFlowToIterate is called "row1" and you have a column that you want to use called "ID", then it will create a globalMap variable called "row1.ID". You then access this using the following code....
((Integer)globalMap.get("row1.ID"))

It needs to be cast to an Integer (assuming the type of ID is Integer) before it is used.
You do not need the tJavaRow for this.
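A minimal stand-in for that pattern, with a plain HashMap playing the role of Talend's generated globalMap:

```java
import java.util.HashMap;
import java.util.Map;

public class GlobalMapDemo {

    // The cast is required because globalMap stores plain Objects;
    // this mirrors the ((Integer)globalMap.get("row1.ID")) expression above.
    public static Integer readId(Map<String, Object> globalMap) {
        return (Integer) globalMap.get("row1.ID");
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        // What tFlowToIterate does for each incoming row: publish every
        // column of connection "row1" under the key "<connection>.<column>".
        globalMap.put("row1.ID", 161670059);

        System.out.println("WHERE ID > " + readId(globalMap));
    }
}
```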
Anonymous
Not applicable
Author

OK, there are a couple of issues with this job that I have spotted. I will list them below....
1) You are trying to write to the same file, context.LogDir+"temp.csv", from multiple tFileOutputDelimited components. This is not safe, and may only have worked so far by luck. Try opening two copies of the same .txt file and add text to both. Then save one, then save the other. Open the file and you will not have all of the text you wrote. That is a simplified example of what I expect is happening.

Okay, I understand this. Although I wonder what has changed since Talend 6, because in Talend 5 we never experienced this in years of use.
So either something changed in Talend 6, or we had tons of luck with Talend 5 🙂
The job itself was built around 6 years ago. I changed it a few weeks ago to include the tJavaRow_1 component, so I could convert UTC time to GMT time. That was not possible in Talend 5, which is the reason I started using Talend 6.
(Don't fix things if they ain't broke. Well, this change to Talend 6 broke some things.)
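The hazard in point 1 can actually be reproduced outside Talend: two writers opened on the same path (without append mode) each start at offset 0, so whichever closes last silently wins, just like two tFileOutputDelimited components pointed at the same temp.csv. A small sketch:

```java
import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ClobberDemo {

    // Returns what actually survives in the file after both writers finish.
    public static String demo() throws IOException {
        Path tmp = Files.createTempFile("temp", ".csv");

        FileWriter a = new FileWriter(tmp.toFile()); // truncates the file
        FileWriter b = new FileWriter(tmp.toFile()); // truncates it again

        a.write("rows from branch A\n");
        a.close();                                   // file now holds A's rows
        b.write("rows from branch B\n");
        b.close();                                   // B writes from offset 0 over them

        String result = Files.readString(tmp);
        Files.delete(tmp);
        return result;
    }

    public static void main(String[] args) throws IOException {
        System.out.print(demo());                    // branch A's rows are gone
    }
}
```

Whether this bites depends on timing and buffer flushes, which would also explain why it behaved differently between Talend versions.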

2) You are relying on a combination of your SQL query's data order AND (maybe you do not realise this) the order of preference of the tMap outputs (seen below)
295954/mini_blob_20160425-0535.png
This is not ideal at all and seems to tally with what you are experiencing: as the row count goes up, the ordering starts to get thrown out.

The tMap order is deliberately set this way, so yes, I did realize this.

What I suggest is that you rebuild the job to output to tHashOutput components instead of your file. Link the tHashOutputs so they are saving to the same location. Then use a tHashInput followed by a tSortRow component to order your data as required (this may need a bit of massaging to get the order you wish). Then write the file once you have controlled the ordering.

I know how to use the tHashOutput component; I use it in several other jobs.
Anonymous
Not applicable
Author

I have to say that, since I wasn't able to actually run it, I had to make an educated guess as to what was likely to be causing your ordering issue. There may be another cause in there that I did not spot (....I have a day job, so was only able to look at it briefly 🙂). With regard to the tMap ordering, lots of people are not aware of it, and it is often just left as the order in which the output connections were made. I didn't spend a great deal of time on the internals of the tMap, but I did notice that the filtering there seemed to be driven by values computed by the tMap variables, using the values in the data. Are you sure that the data is correct?
I suspect that if you take control of your data ordering (as suggested) you will get rid of this issue. There will have been changes between v5 and v6, and there are also quite substantial changes between Java 7 and 8. Although the data was ordered by your input query, I find it is not a good idea to depend on that order being maintained throughout a job. So if data order is important, you should always take steps to guarantee it.
Good luck with getting this sorted 🙂