Re: Compare count of particular strings between tw... - Qlik Community

Anonymous · ‎2019-06-04

Hi ,

I need to compare the count particular data's in two different file in my job

Eg : File 1
A 2
B 5

File 2
MM23 A1996
MM23 A1997
MM23 B1998
MM23 B1999
MM23 B2000

Here what I have to check:
Since in File 1 - A 2 is there , count of records in File 2 starting with A should be two , similarly for B there should be 5 records in File 2. If not then the job should exit.

fdenis · ‎2019-06-05

ok and where is your problem?

Anonymous · ‎2019-06-05

I have knocked up a quick example of how you can do this. My assumption is that you might have just A and B in File 1, but also might have A through to Z. So this needs to be dynamic.

This screenshot shows the layout of the job. I am using tFixedFlowInput components to emulate your files and am simply splitting the output at the end to be "Pass" or "Fail". You can figure out what you want to do with the result.

I will describe the flow with screenshots. First I'll look at the tMap_1. This is where the count of words starting with letters takes place....

Your data file is "row1" and your letters file is "row2". These inputs are not joined. The letter count is carried out by the following code in the tMap variables.....

row1.Word.indexOf(row2.LeadLetter)==0 ? routines.Numeric.sequence(row2.LeadLetter, 1, 1) : -1

This code uses the Numeric.sequence method to count each instance of lead letter and returns the current count with each row. If the lead letter does not exist in the Word record, then -1 is returned. This is used for filtering. Each row from your data file will be processed for each letter in your "File 1". So if your data file has 10 rows and your "File 1" has 3 rows, the code above will be fired 30 times. The data returned by the tMap will be filtered using the LetterCount value. If it is anything but -1, the row will be used. This gets rid of the row multiplication described above.

After the tMap_1, the tAggregateRow_1 is used to return the last row for each LeadLetter. This will give the max count for each letter. This is shown below...

The LetterCount is the max count of the letter and the RowCount holds the number of instances expected for that letter. These numbers are compared in the next component, the tMap_2.

The flow of this is quite simple. It simply compares the LetterCount and RowCount and if they match, the data leaves via the "Pass" output. If they do not match the data leaves via the "Fail" output. This way you can tell precisely which count failed.

Compare count of particular strings between two files

Talend Data Integration

v7.x