Lorenzo5
Creator

How to iterate on tFileInputFullRow rows?

Hi,

 

I have a tFileInputFullRow followed by a tHashOutput component. 

If they are linked by a row link, the tHashOutput component only makes its data available after all the rows have been read.

I need to iterate on the rows, so that the tHashOutput component makes data available row by row.

 

I have tried it this way:

tFileInputFullRow ---iterate---> tIterateToFlow ---row---> tHashOutput 

but I'm not able to configure the tIterateToFlow, because I don't know which global variable contains the current row from the tFileInputFullRow component.

 

Can you help, please?

 

Regards,

Lorenzo


23 Replies
Anonymous
Not applicable

This component will output your rows with one column containing the full row. The simple way of achieving this is to connect your tFileInputFullRow to a tFlowToIterate (via a row link) and then use the "iterate" link from there. The globalMap will hold the value under the key:

 

globalMap.get("{row}.{column}")
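To illustrate the mechanism: the following is a minimal plain-Java sketch (not Talend-generated code) of what tFlowToIterate does on each incoming row. The row name "row1" and column name "line" are assumptions for the example; in a real Job they match your row link name and schema column.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: tFlowToIterate copies each column of the incoming row
// into the job-wide globalMap under the key "<rowName>.<columnName>",
// then fires one iteration of the downstream subjob.
public class FlowToIterateSketch {
    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();

        // Pretend these lines come from tFileInputFullRow, which emits
        // a single column (assumed here to be named "line").
        String[] lines = {"first row", "second row"};

        for (String line : lines) {
            // What tFlowToIterate does at the start of each iteration:
            globalMap.put("row1.line", line);

            // Inside the iterate branch, any expression (e.g. in a
            // tIterateToFlow mapping) can read the current row:
            String current = (String) globalMap.get("row1.line");
            System.out.println(current);
        }
    }
}
```

The key point is that the globalMap entry is overwritten on every iteration, so it always holds exactly the row currently being processed.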
Lorenzo5
Creator
Author

Ok, but my next step has to be a tHashOutput, which only accepts a row input link.
And accessing the data via globalMap is not useful for the tHashOutput.
Should I put a tIterateToFlow after the tFlowToIterate, to be able to connect to the tHashOutput?
It sounds weird.
Anonymous
Not applicable

Yes, use a tIterateToFlow to do this.

 

However, this raises the question as to why you are iterating at all. Why not just use a normal row link from the file to the tHashOutput? What does iterating gain you? 

 

(I'll take another look at the original post once I have sent this..... @talend not being able to see all of the posts while responding is a bit of a pain).

Anonymous
Not applicable

OK, I have just looked at the original post. I am still struggling to see what iterating will gain you here. Why do you need access to the tHashOutput data in between rows arriving? What is your requirement here? There is possibly a better way of achieving this.

Lorenzo5
Creator
Author

I have to parse the rows with regexes, row by row, process the results, and insert them row by row.
My Job is, indeed, quite complex, and I'll look for a better and simpler design later (not now).
I parse each row with several regexes in parallel (well, not truly parallel) using a tReplicate. I extract some fields from the row: some of them I have to insert into lookup tables, getting the generated sequence ids (which I store in more tHashOutput components), and then insert the main row (collecting them via tHashInput) together with other fields from the original row.
The "parallel" solution with tReplicate was my first idea for performing separate processing, field by field.
If instead I cascade several tExtractRegexFields components (quoting the name from memory) with an INSERT at the end of the chain, it becomes complicated to handle each field differently (i.e. inserting into lookup tables, getting the id, and continuing along the regex chain).
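For readers following along, the per-row extraction Lorenzo describes is, at its core, regex group capture. This is a hypothetical plain-Java sketch of that step; the row format, pattern, and field names are invented for illustration and are not from the actual Job.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: extract fields from one raw line the way a
// tExtractRegexFields component would, using regex capture groups.
public class RegexRowParseSketch {
    // Assume each raw line looks like "2024-01-15|alice|login"
    private static final Pattern ROW =
        Pattern.compile("^(\\d{4}-\\d{2}-\\d{2})\\|(\\w+)\\|(\\w+)$");

    public static void main(String[] args) {
        String line = "2024-01-15|alice|login";
        Matcher m = ROW.matcher(line);
        if (m.matches()) {
            String date   = m.group(1);  // could feed the main table
            String user   = m.group(2);  // could be inserted into a lookup table
            String action = m.group(3);  // could be processed separately
            System.out.println(date + " " + user + " " + action);
        }
    }
}
```

Each capture group becomes one output field, which is why handling one field differently (lookup insert, id retrieval) mid-chain gets awkward in a cascade.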
Anonymous
Not applicable

OK, I am not claiming to fully understand what you want from that description (and it does sound quite complex), but I *may* have a solution that will simplify it (but not necessarily make it any quicker). All of the complicated parallel logic you have described could be relocated to a child job. If you passed the row into the child job via a context variable, you could then carry out all of the logic you have described in that child job one row at a time. The child job could be connected to your tFile component with either the row link or via a tFlowToIterate component using the iterate link. 

 

Lorenzo5
Creator
Author

Hi @rhall

 

if possible, I would prefer not to use a child job now.

I tried the solution you suggested (tFlowToIterate/tIterateToFlow), but it does not work:

 

tFileInputFullRow --row1--> tFlowToIterate --iterate--> tIterateToFlow --row2--> tHashOutput --OnComponentOK--> ...

 

I have configured the tIterateToFlow to map (String)globalMap.get("row1.<key>"), BUT the tHashOutput component still waits for all the rows before putting data in its output.

Does this depend on the fact that I mapped "row1", i.e. the output of a component (tFileInputFullRow) designed to supply all the rows at once?

Should I, in some way, use the "iterate" output link of tFileInputFullRow instead?

 

It seems so strange, to me, that I'm not simply able to process row by row, really iterating on the rows, instead of receiving them all together.

 

I hope you can help, without a child job.


Anonymous
Not applicable

The row link does process "row by row" but the iterate method allows you to complete whole subjobs per row. That is the key difference.

 

You are suffering from a timing issue here. I *think* you are relying on parallel processing within a subjob, which you simply won't get without hacking a solution together. It sounds like it would be much better to break this problem down into several subjobs, storing your intermediary steps in tHash components. However, I am not fully aware of the requirement, so maybe there is a good reason for approaching it like this.


However, to approach this problem and get free access to your data without a complete load (the issue you have with the tHash components), you might be able to get round this with a tJavaFlex and a HashMap and/or ArrayList. If you load your data into one of these data types and save it to the globalMap every row, you will have access to that data anywhere else in the same Job immediately (allowing for timing constraints).
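The tJavaFlex idea above can be sketched in plain Java. This is not Talend-generated code; the globalMap key "rowsSoFar" is made up for the example, and in a real Job the three sections below would go in the tJavaFlex start/main/end code boxes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: instead of waiting for a tHashOutput to finish loading,
// append each row to an ArrayList kept in the globalMap, so later
// code in the same Job can read what has arrived so far.
public class JavaFlexSketch {
    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();

        // tJavaFlex "start" section: create the list once
        globalMap.put("rowsSoFar", new ArrayList<String>());

        // tJavaFlex "main" section: runs once per incoming row
        for (String line : new String[]{"a", "b", "c"}) {
            @SuppressWarnings("unchecked")
            List<String> soFar = (List<String>) globalMap.get("rowsSoFar");
            soFar.add(line);
            // At this point, other components could already read soFar,
            // without waiting for the whole file to be loaded.
            System.out.println("rows available so far: " + soFar.size());
        }
    }
}
```

The difference from tHashOutput is that the list is visible mid-load, which is exactly the "row by row" access Lorenzo is after.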

 

Lorenzo5
Creator
Author

So, before approaching tJavaFlex (which I don't know yet), let me better understand, and better explain my Job.

 

You said that the row link does process "row by row", but the iterate method allows you to complete whole subjobs per row.

It sounds to me like the row and iterate links should be enough for me.

Maybe the "problem" is that tFileInputFullRow does not let subsequent components process row by row?

 

All (and only) I need is two subjobs: the first to load rows from the file, row by row, and the second to process the data. The connection between them should be tHashOutput --OnComponentOK--> tHashInput.

This is because I need to set up an "OnSubjobOK" output link from the second subjob (to a third one), to let me complete the processing and insert the results into the DB.

 

Well, the only way I found to connect two subjobs (file reading / a bunch of row-by-row processing steps) and pass data from the first to the second is tHashOutput / tHashInput.

But I'm not able to make the tHashOutput emit its output row by row.

 

Do I really need, then, tJavaFlex and a HashMap to send row-by-row data to a tHashOutput (and make its output available row by row)? Aren't the "row" and "iterate" links somehow enough?

 

Let me investigate tJavaFlex.