Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Combining Data During Load

Hello,

QlikView Novice here seeking some guidance.  I am currently creating a dashboard that is meant to make it easy for people to find potential duplicate invoices.  Originally, I thought that I would create a field during the data load that just concatenates the fields I need: Vendor Number, and Invoice Amount. Then I could just look for where that new field is the same.  I quickly realized that these fields reside in different tables. Is there an easy way for me concatenate 2 fields from 2 different tables into a new field during load? 

It may also be that my initial approach is flawed and I need to consider another way of finding potential duplicates.  The criteria for a duplicate is same invoice amount and same vendor with a similar invoice number (which I don't think QlikView has a way of finding, so I planned to leave this third criteria to human eyes scanning the dashboard)

Any guidance is appreciated.  I realize that the question is a little bit vague, but I'm pretty stuck on how I should make this. 

Thanks in advance,

Steve

1 Solution

Accepted Solutions
rwunderlich
Partner Ambassador/MVP
Partner Ambassador/MVP

As a simple starter approach, you can create a Straight Table with a calculated Dimension of "Vendor & Amount" and an expression like "count(InvoiceNumber)". Those with 2 or more are potential duplicates. You can further refine this with Set Analysis to show only those where Count(InvoiceNumber)>1.

You could do something similar in script. It's hard to give specific script without seeing your data model. Can you post a screenshot of your table viewer?

-Rob

View solution in original post

4 Replies
swuehl
MVP
MVP

As a first step, you could create a scatter chart with amount of invoice and invoice number on the axes. then similar amounts and numbers should be clustered. you can filter your chart by selecting your vendors or use vendor as dimension.

rwunderlich
Partner Ambassador/MVP
Partner Ambassador/MVP

As a simple starter approach, you can create a Straight Table with a calculated Dimension of "Vendor & Amount" and an expression like "count(InvoiceNumber)". Those with 2 or more are potential duplicates. You can further refine this with Set Analysis to show only those where Count(InvoiceNumber)>1.

You could do something similar in script. It's hard to give specific script without seeing your data model. Can you post a screenshot of your table viewer?

-Rob

Not applicable
Author

I appreciate both your quick replies and will check into what you said.  Rob, below is my data model and the fields I mentioned are highlighted - what's tricky about the invoice number is there will never be an exact duplicate since the system doesn't let invoice numbers be duplicated, but users get around that by changing one character or appending a character on the end.  I don't think that Qlikview has any type of fuzzy searching capability like that, which is why I'm leaving that piece out.

  Data Model.jpg

swuehl
MVP
MVP

If you want to search the invoices with exact same amount and same Vendor, I think Rob's solution should work perfectly fine to identify the candidates.

You would then need to inspect the invoice numbers or other attributes to check if they are real duplicates.

I have somehow misread your request in a way that amounts could also slightly differ, hence I suggested above more graphical approach. To get the invoice number on the scatter graph axis, it needs to be a number (or you may use another appropriate attribute, like the invoice timestamp). As said, I think Rob's solution is better if you are searching same amounts.

If you are interested in string distance functions to compare the invoice numbers, you can start with

String metric - Wikipedia, the free encyclopedia

AFAIR, there are some threads coping with the Levenshtein algorithm,

Levenshtein distance - Wikipedia, the free encyclopedia

e.g. here

Levenshtein Distance VBScript

If users only append characters, there might be easier solutions (match the smaller string with the beginning of larger string).