Lorenzo5
Creator

How to iterate on tFileInputFullRow rows?

Hi,

 

I have a tFileInputFullRow followed by a tHashOutput component. 

If they are linked by a row link, the tHashOutput component only makes its data available once all the rows have been read.

I need to iterate over the rows, so that the tHashOutput component makes its data available row by row.

 

I have tried in this way:

tFileInputFullRow ---iterate---> tIterateToFlow ---row---> tHashOutput 

but I'm not able to configure the tIterateToFlow, because I don't know which global variable contains the current row coming from the tFileInputFullRow component.
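For what it's worth, the usual pattern is to put a tFlowToIterate between the file and the iteration: for each incoming row it publishes every column into the globalMap under a key named after the flow and the column (e.g. `row1.line` for tFileInputFullRow's single `line` column), and that is the variable a downstream tIterateToFlow can map. A rough plain-Java sketch of that per-iteration behavior — the `globalMap` name mirrors Talend's, and the key `row1.line` assumes the incoming flow is called `row1`:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FlowToIterateSketch {
    // Talend exposes a job-wide Map called globalMap; simulated here.
    static Map<String, Object> globalMap = new HashMap<>();

    public static void main(String[] args) {
        // Rows as tFileInputFullRow would deliver them: one "line" column per row.
        List<String> rows = List.of("first row", "second row");

        List<String> seenPerIteration = new ArrayList<>();
        for (String line : rows) {
            // What tFlowToIterate does: publish the current row into globalMap...
            globalMap.put("row1.line", line);
            // ...then trigger the iterated part of the job, which reads it back,
            // e.g. via ((String) globalMap.get("row1.line")) in a component field.
            seenPerIteration.add((String) globalMap.get("row1.line"));
        }

        // Each iteration saw exactly one row, not the accumulated set.
        System.out.println(seenPerIteration); // [first row, second row]
    }
}
```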

 

Can you help, please?

 

Regards,

Lorenzo

 

 

23 Replies
Lorenzo5
Creator
Author

Well, 

I took a look at tJavaFlex, and as far as I understand, you are suggesting I put all the rows together into the globalMap, and then loop over them afterwards, wherever I want.

Is that right?

 

Please pardon me if I insist, but I still can't believe that what I'm looking for is so hard. Let's simplify and say:

tFileInputFullRow --- .... ---> tPostgresqlOutput

with the (mandatory) constraint that the two components above must be in two different subjobs.

 

Question: what do I have to put in place of "..." to insert into the DB row by row?

Anonymous
Not applicable


@lorenzolucioni wrote:


Please pardon me if I insist, but I still can't believe that what I'm looking for is so hard. Let's simplify and say:

tFileInputFullRow --- .... ---> tPostgresqlOutput

with the (mandatory) constraint that the two components above must be in two different subjobs.

 

Question: what do I have to put in place of "..." to insert into the DB row by row?


But you say it is not that simple. I feel like the requirements keep changing. Can you give a data-flow example of exactly why the data from tFileInputFullRow must be iterated through instead of being processed on a row link? My assumption was that it is due to a complex timing requirement (hence my convoluted suggestions), but it appears you think it should be simpler.

What exactly do you need to do to the data, one row at a time, between reading it and writing it?

Lorenzo5
Creator
Author

My Job needs to:

- read rows from a text file

- parse (regex) row by row, extracting fields: some of them need to be inserted separately into specific tables, getting back the ID of the row; others need to be processed (all the logic is already in place) to generate different fields

- the IDs and newly generated fields (from the previous step) need to be collected and joined together (I already have a unique key that comes from the original row and lets me join all the data later), and the resulting row has to be inserted.

 

All of the above needs to be done row by row, for lookup reasons, and also because I would like the flow to be as flexible as possible, so that in the future I can add other steps that may require a row-by-row flow.
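The per-row parsing step can be sketched in plain Java; inside a Talend Job this logic would typically live in a tJavaRow or tMap. The regex, the field layout, and the sample line below are all made up for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RowParseSketch {
    // Hypothetical line format "key;lookupValue;payload" -- the real regex
    // and field layout come from the actual file, which isn't shown here.
    static final Pattern LINE = Pattern.compile("^(\\w+);([^;]+);(.*)$");

    public static void main(String[] args) {
        String line = "K001;customerA;some payload";
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("unparseable row: " + line);
        }
        String key = m.group(1);          // unique key used to join everything back later
        String lookupValue = m.group(2);  // would go to the stored procedure, which returns an ID
        String payload = m.group(3);      // would feed the field-generation logic

        System.out.println(key + " -> " + lookupValue); // K001 -> customerA
    }
}
```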

 

So, my design is:

 

read file --...--> tHashOutput (it should be row by row) --OnComponentOK--> tHashInput_1 (row by row) ----> a lot of processing / lookup inserts ----> tHashOutput (one for each ID or generated field)

and, from tHashInput_1 --OnSubjobOk--> here I collect all the tHashInputs for the single IDs and fields, join them, and insert into the DB.

 

So, yes, if you could help me get real row-by-row data passing between two different subjobs, that would be great (and enough, I think).

Thank you for your help.

Lorenzo5
Creator
Author

Here is an extract of my Job.

As you can see, the "ok" on OnComponentOK (the output link of tHashOutput) fires only after all the rows (in this case 2) have been read.

That means tHashInput will receive ALL the rows at the same time, which does not allow me to do "per row" processing.

 

(screenshot: 0683p000009LujD.png)

 

So, the question is simple, I think: 

How do I pass rows one by one from a tFileInputFullRow to the tHashOutput component?

Of course, tFlowToIterate and tIterateToFlow are already "data-linked" thanks to a global variable defined in the first one and mapped to the output by the second one.

 

 

 

Anonymous
Not applicable

OK, the tHash components will not work that way. They must be completely written to, or read from, before the "OnComponentOK" link will fire; that is the same for all components using the OnComponentOK link. You *could* use a tFlowToIterate after the tHashOutput (the data will flow through it) and then iterate over the next subjob using a tFixedFlowInput.
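The suggested pattern can be sketched in plain Java: the loop plays the role of the iterate link, `globalMap.put` is what tFlowToIterate does for each row, and the method stands in for the second subjob whose tFixedFlowInput reads the variable back. The key name `row2.line` and all the wiring are assumptions for illustration, not the actual Job:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IteratedSubjobSketch {
    static Map<String, Object> globalMap = new HashMap<>();
    // Stand-in for the Postgres table written by tPostgresqlOutput.
    static List<String> dbTable = new ArrayList<>();

    // The iterated second subjob: tFixedFlowInput reads the globalMap
    // variable and hands exactly one row to the output component.
    static void runSecondSubjob() {
        String line = (String) globalMap.get("row2.line"); // key name is an assumption
        dbTable.add(line); // tPostgresqlOutput would INSERT here, one row per iteration
    }

    public static void main(String[] args) {
        List<String> rows = List.of("alpha", "beta", "gamma");
        for (String line : rows) {              // tFlowToIterate: one iteration per row
            globalMap.put("row2.line", line);
            runSecondSubjob();                  // triggered via the iterate link
        }
        System.out.println(dbTable); // [alpha, beta, gamma]
    }
}
```

The important point is that the second subjob never reads the whole buffer; it only ever sees the single row published for the current iteration.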

Lorenzo5
Creator
Author

It does not work. 

tHashInput reads the hash filled by tHashOutput twice (in the case of two records), which is fine, but the first time it finds only the first record, and the second time it finds the first and second records together.

I could try to always read only the last record, but... how? And when handling thousands of records, wouldn't that take too much effort?
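That cumulative behavior is what any shared buffer does when it is read while still being written to: each read returns everything appended so far, not just the newest record. A minimal plain-Java illustration (not Talend API):

```java
import java.util.ArrayList;
import java.util.List;

public class CumulativeReadSketch {
    public static void main(String[] args) {
        // Stand-in for the in-memory hash buffer tHashOutput appends to.
        List<String> hashBuffer = new ArrayList<>();
        List<List<String>> snapshots = new ArrayList<>();

        for (String record : List.of("rec1", "rec2")) {
            hashBuffer.add(record);                     // tHashOutput writes one record...
            snapshots.add(new ArrayList<>(hashBuffer)); // ...tHashInput sees everything so far
        }

        System.out.println(snapshots); // [[rec1], [rec1, rec2]]
    }
}
```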

Anonymous
Not applicable

You shouldn't need the tFlowToIterate or tIterateToFlow at the beginning. You also shouldn't need the tSleeps (unless you want to slow it down for some reason).

 

Do not read from the tHashInput (in the second subjob). Read from the tFixedFlowInput (after setting the column values using the globalMap variables your tFlowToIterate will create). I told you to leave the tHashOutput in the first subjob so that you could keep an in-memory store of your initial data, but it will have nothing to do with the second subjob at all. You have been trying to use the tHash components in a way they are not meant to work: they will not let you read from them while you are writing to them in the way that you want.

 

 

Lorenzo5
Creator
Author

Thanks, but this is getting rather confusing.

I haven't been able to follow your suggestions so far.

If you could replicate the job yourself, make sure it works, and send a screenshot or a clear explanation of which components to use and where and how to link them, it would be appreciated. Otherwise it is just a waste of time.

 

(Of course, the tSleep components are only there so I can watch the statistics and see whether the rows are handled one by one or not.)

 

Regards,

Lorenzo

cterenzi
Specialist

Can you read in the file, parse out the values you need to insert into separate tables, then take a second pass on the file and pull in the new IDs as lookup values from the database?

The tasks you're describing are abstract, but they don't sound like they require Talend gymnastics. You may not be able to do everything for each row one at a time, but you can probably do everything you need for the entire file.

Lorenzo5
Creator
Author

I already have a stored procedure that inserts a new value (if needed, because sometimes the values already exist, since they are lookups) and gives back the ID.
So there is no need to perform two passes on the file.

I'm not asking for a full design of my Job. I would just like help finding a way to get row-by-row execution of a second subjob for rows read from a file, with separation (two or more subjobs) between the file reading and the data processing. I cannot believe it is not possible.