Need help to retrieve first matched rows for multiple filter - Qlik Community

Anonymous · ‎2020-01-30

I have 100 files, and each contains rows like the ones below (there are thousands of rows in each file).
There is no guarantee that both "A" rows and "B" rows exist. In most cases there are "A" rows, but sometimes there are no "B" rows.

20200129 900102 A
20200129 000103 A
20200129 000105 A
20200129 008202 B
20200129 009302 B
20200129 010345 B
20200129 010111 C
20200129 010222 D
...
How could I get the first "A" row and the first "B" row?
The expected result from above file should be:
20200129 900102 A
20200129 008202 B
(They are retrieved only because one is the first row with "A" and the other is the first row with "B".)
I need to get both at almost the same time because of this logic:
if there is no "A" row, use the first "B" row's value; otherwise use the "A" row's value.

If I ONLY needed to process one file, I know how to do it: use tFixedFlowInput to pass the values "A" and "B" into tMap as the main flow, use the file as the lookup, and join (match) on "A" and "B" with First Match and Inner Join. That should give me the first "A" row and/or the first "B" row. But since I have 100 files that I want to process in the same way, I should not pass the hardcoded values "A" and "B" as "main".

Could someone please help me out?

Thanks!

TRF · ‎2020-01-30

What if you use this within a tJavaRow:
output_row.line = input_row.line;
// A ternary cannot stand alone as a Java statement, so each one feeds globalMap.put().
if (input_row.line.contains("A"))
    globalMap.put("A", globalMap.get("A") == null ? input_row.line : globalMap.get("A"));
else if (input_row.line.contains("B"))
    globalMap.put("B", globalMap.get("B") == null ? input_row.line : globalMap.get("B"));

At the end, both the A and B global variables should contain what you want.

Anonymous · ‎2020-01-30

Thank you for the quick feedback!

I don't see how your code would get the FIRST "A" row or the FIRST "B" row.
Did I miss something?

TRF · ‎2020-01-30

This piece of code sets a global variable "A" to the first line that contains A.
It also sets a global variable "B" to the first line that contains B.
If you want the first A or B, whichever comes first, just use the same variable for both.
After the tJavaRow, you can use a tFixedFlowInput to start a new flow that populates the desired fields from these variables.
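
For readers who want to try the idea outside Talend, here is a minimal stand-alone Java sketch of it. It is not the job itself: a plain HashMap stands in for globalMap, and the class and method names (FirstMatchPerFile, onRow, processFile) are made up for illustration. It also assumes the two keys have to be cleared before each file, since with 100 files processed in one run a value stored for an earlier file would otherwise still be there.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone class; none of these names come from Talend.
class FirstMatchPerFile {

    // Plain map standing in for Talend's job-wide globalMap.
    static final Map<String, Object> globalMap = new HashMap<>();

    // What the tJavaRow would do for each incoming row.
    static void onRow(String line) {
        if (line.contains("A")) {
            if (globalMap.get("A") == null) globalMap.put("A", line);
        } else if (line.contains("B")) {
            if (globalMap.get("B") == null) globalMap.put("B", line);
        }
    }

    // Processes one file and applies the "A, otherwise B" rule.
    static String processFile(List<String> lines) {
        // Assumption: keys are cleared per file so an earlier file's match cannot leak in.
        globalMap.remove("A");
        globalMap.remove("B");
        for (String line : lines) {
            onRow(line);
        }
        Object a = globalMap.get("A");
        return (String) (a != null ? a : globalMap.get("B"));
    }

    public static void main(String[] args) {
        List<String> file1 = Arrays.asList(
            "20200129 900102 A", "20200129 000103 A",
            "20200129 008202 B", "20200129 010111 C");
        List<String> file2 = Arrays.asList(
            "20200129 008202 B", "20200129 009302 B");
        System.out.println(processFile(file1)); // first "A" row of file1
        System.out.println(processFile(file2)); // no "A" rows, so first "B" row of file2
    }
}
```

The fallback ("use B only when there is no A") is applied once per file, after all rows have been seen, which is why both values need to be available at the same time.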

Anonymous · ‎2020-01-30

Thanks!

I think I have figured out how to accomplish this with a bunch of components.
But I will try your code tomorrow.
I still have a little difficulty understanding how the code would give the first "A" and "B".
I don't have much knowledge of Java and am not too familiar with tJavaRow either -- the only times I used tJavaRow were to create global variables when there was only ONE row in the input flow. But for my files there would be thousands of rows in the input flow, so how does it produce only one output?

Thanks!

TRF · ‎2020-01-30

Here is the trick:

if (input_row.line.contains("A")) globalMap.put("A", globalMap.get("A") == null ? input_row.line : globalMap.get("A"));

Read it like this:

if current line contains A
  if global variable A is null put current line to A else (because it is mandatory) put current value of A to A

Got it?
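
To see why thousands of rows still leave a single value, the rule can be traced by hand. The sketch below is only an illustration: TernaryStep, next and run are hypothetical names, and the String parameter "stored" plays the role of global variable A. Every row still passes through; only the stored value stops changing after the first match.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that isolates the nested ternary for the "A" variable.
class TernaryStep {

    // stored = current value of global variable A (null until the first match).
    static String next(String stored, String line) {
        return line.contains("A")
            ? (stored == null ? line : stored) // first match is stored, later ones keep it
            : stored;                          // rows without "A" leave it untouched
    }

    // Applies the rule to every row in order and returns the final stored value.
    static String run(List<String> rows) {
        String a = null;
        for (String row : rows) {
            a = next(a, row);
        }
        return a;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
            "20200129 900102 A", "20200129 000103 A",
            "20200129 000105 A", "20200129 008202 B");
        String a = null;
        for (String row : rows) {
            a = next(a, row);
            // The right-hand side stays the same after the first "A" row.
            System.out.println(row + "  ->  A = " + a);
        }
    }
}
```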

akumar2301 · ‎2020-01-30

Hello, did you try tAggregateRow with First as the aggregate function?

Anonymous · ‎2020-01-30

I used tSampleRow and Range value "1" to get the first row.

Now I realize tAggregateRow-> First probably would be easier for other developers to understand for future support.

But I don't know which one would perform better: tSampleRow or tAggregateRow?

akumar2301 · ‎2020-01-30

I am not sure how you got the correct result with tSampleRow, but if your logic works,

tSampleRow will give better performance.

Anonymous · ‎2020-01-30

Here is my design:

When reading the files, I used a StringHandling function to get the "A" or "B" value into a Row_Code field. Then, inside tMap, I join Row_Code with Code (hardcoded as "A" and "B" in tFixedFlowInput).

My real logic should be: get the first A row or the first B row, whichever is fine.
Because of the "Inner Join" I used in tMap and tSampleRow (range "1"), I was able to accomplish that.

I am still doing the test, but it seems to me, the job would not fail if the file doesn't have A rows or B rows at all.
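
As a rough analogy only (this is not how tMap, tSampleRow or tAggregateRow work internally), the same "inner join on the code, then keep the first row per code" idea can be sketched in plain Java. FirstPerCode and codeOf are hypothetical names, and the sketch assumes the code is the last space-separated token on each line.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical stand-alone class illustrating "join on code, keep first per code".
class FirstPerCode {

    // Assumption: the code is the last space-separated token of the line.
    static String codeOf(String line) {
        return line.substring(line.lastIndexOf(' ') + 1);
    }

    static Map<String, String> firstPerCode(List<String> lines, Set<String> wanted) {
        return lines.stream()
            // Like the inner join: rows whose code is not in the lookup are dropped.
            .filter(line -> wanted.contains(codeOf(line)))
            // Like a "first" aggregate: on a duplicate code, keep the earlier row.
            .collect(Collectors.toMap(
                FirstPerCode::codeOf,
                Function.identity(),
                (first, later) -> first,
                LinkedHashMap::new));
    }

    public static void main(String[] args) {
        Set<String> wanted = new HashSet<>(Arrays.asList("A", "B"));
        List<String> file = Arrays.asList(
            "20200129 900102 A", "20200129 000103 A",
            "20200129 008202 B", "20200129 009302 B",
            "20200129 010111 C");
        System.out.println(firstPerCode(file, wanted));
        // A file with no "A" or "B" rows yields an empty map instead of an error.
        System.out.println(firstPerCode(Arrays.asList("20200129 010111 C"), wanted));
    }
}
```

Because the codes to keep are passed in as a set, the same method can be reused for every file without hardcoding "A" and "B" inside the row logic.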

Talend Data Integration

v7.x