Skip to main content
Announcements
Introducing Qlik Answers: A plug-and-play, Generative AI powered RAG solution. READ ALL ABOUT IT!
cancel
Showing results for 
Search instead for 
Did you mean: 
Anonymous
Not applicable

Compare count of particular strings between two files

Hi ,

I need to compare the count particular data's in two different file in my job

Eg : File 1
A 2
B 5

File 2
MM23 A1996
MM23 A1997
MM23 B1998
MM23 B1999
MM23 B2000

Here what I have to check:
Since in File 1 - A 2 is there , count of records in File 2 starting with A should be two , similarly for B there should be 5 records in File 2. If not then the job should exit.

Labels (2)
2 Replies
fdenis
Master
Master

ok and where is your problem?
Anonymous
Not applicable
Author

I have knocked up a quick example of how you can do this. My assumption is that you might have just A and B in File 1, but also might have A through to Z. So this needs to be dynamic.

 

This screenshot shows the layout of the job. I am using tFixedFlowInput components to emulate your files and am simply splitting the output at the end to be "Pass" or "Fail". You can figure out what you want to do with the result.

 

0683p000009M5Rs.png

 

I will describe the flow with screenshots. First I'll look at the tMap_1. This is where the count of words starting with letters takes place....

0683p000009M5S2.png

Your data file is "row1" and your letters file is "row2". These inputs are not joined. The letter count is carried out by the following code in the tMap variables.....

0683p000009M5S7.png

row1.Word.indexOf(row2.LeadLetter)==0 ? routines.Numeric.sequence(row2.LeadLetter, 1, 1) : -1 

This code uses the Numeric.sequence method to count each instance of lead letter and returns the current count with each row. If the lead letter does not exist in the Word record, then -1 is returned. This is used for filtering. Each row from your data file will be processed for each letter in your "File 1". So if your data file has 10 rows and your "File 1" has 3 rows, the code above will be fired 30 times. The data returned by the tMap will be filtered using the LetterCount value. If it is anything but -1, the row will be used. This gets rid of the row multiplication described above.

 

After the tMap_1, the tAggregateRow_1 is used to return the last row for each LeadLetter. This will give the max count for each letter. This is shown below...

0683p000009M5Jz.png

The LetterCount is the max count of the letter and the RowCount holds the number of instances expected for that letter. These numbers are compared in the next component, the tMap_2.

 

0683p000009M5SC.png

The flow of this is quite simple. It simply compares the LetterCount and RowCount and if they match, the data leaves via the "Pass" output. If they do not match the data leaves via the "Fail" output. This way you can tell precisely which count failed.