5 Replies Latest reply: Feb 7, 2017 9:08 AM by Clint Ficula RSS

    Entity Matching

    Clint Ficula

      What is the best way using Qlikview to match entities across two tables?

       

      For example, I want to load a qvd file with all entities in a database (~500,000 records of names, emails, addresses) and then load a separate ad hoc file (20,000 records in excel, txt, etc.) and use Qlik to show me any matches across different types (Example: First, Last, Email or First, Last, DOB).

       

      This could be accomplished using joins and case statements, but in an ideal world, I would like the user to be able to select the type of match (First, Last, Email or First, Last, Street1 or First, Last, DOB) or create some sort of hierarchy that displays all matches along with the match type (and if a higher match type is found, ignore subsequent match types).

       

      The question is, what is the most efficient way to accomplish this in Qlik? Is there a way to use a user created variable (based on check boxes) that then dynamically matches or joins tables? I am planning to use replaces and functions to match data so that it looks similar in the load files, but I am stuck as to how to optimize matching.

        • Re: Entity Matching
          Colin Albert

          It may be worth looking at the match functions in QlikView - Match(), MixMatch() and WildMatch(), depending on whether you are looking for an exact match including case, a match ignoring case, or a partial match.

          The concat() function can build a list of values to match against, but running this across 500,000 rows with a match list of 20,000 values could need some significant hardware. I would create a variable that holds the list you are matching against.


          Try loading your entities into a 3 column table that has an ID field to identify the entity, an entity name column containing First, Last, DoB etc, and the third field containing the value. This table will become large - if you have 500,000 rows and 4 entities, then there are potentially 2M rows in this table.


          The result of the match function will be an integer showing which field in the list matches the entity value.


          I would suggest testing and developing on smaller datasets first! Then expand the rows of both the entities and match list in a controlled way to see the effect on speed and resources.

          • Re: Entity Matching
            Clint Ficula

            After some time working this out in SQL, I have a matching solution between two tables. However I cannot figure out how to replicate a count() over (partition by) in Qlik. My matching script is below and gives me matches based on various matching strings (FLEA - FirstLastEmailAddress). So my question is, how can you count the number of matches across two tables where the ID number is distinct? For example, if we match FL - First Last from list to Advance, how can we then display the number of counts for matching where a field in Advance in unique (in this case the Entity ID)?

             

            Matcher:

            Load

                lLISTID, lFirstName, lLastName, lEmail, lAddress,

                LFLEA,

                LFLE,

                LFL,

             

              alt(

              if(not(isnull(trim(lFirstName))) and not(isnull(trim(lLastName))) and not(isnull(trim(lEmail))) and not(isnull(trim(lAddress))),

              lookup('Entity Id', 'AFLEA', LFLEA,'Advance')),

              if(not(isnull(trim(lFirstName))) and not(isnull(trim(lLastName))) and not(isnull(trim(lEmail))),

              lookup('Entity Id', 'AFLE', LFLE,'Advance')),

              if(not(isnull(trim(lFirstName))) and not(isnull(trim(lLastName))),

              lookup('Entity Id', 'AFL', LFL,'Advance'))

              ) as IdMatch,

             

              if(

              (not(isnull(lAddress)) and not(isnull(lEmail)) and not(isnull(lookup('Entity Id', 'AFLEA', LFLEA,'Advance')))),

              'FLEA',

              if(

              (not(isnull(lEmail)) and not(isnull(lookup('Entity Id', 'AFLE', LFLE,'Advance')))),

              'FLE',

              if(

              (not(isnull(lFirstName&lLastName)) and not(isnull(lookup('Entity Id', 'AFL', LFL,'Advance')))),

              'FL',

             

              ))) as MatchType

             

            resident List;

              • Re: Entity Matching
                Colin Albert

                Create a chart with Match Type as the dimension and count(IdMatch) or count(id number) as the expression.

                You do not need to create the count in the script.

                  • Re: Entity Matching
                    Clint Ficula

                    Thanks for the reply. I did exactly this while I was awaiting response. Counted the number of matching strings in the larger dataset, grouped by matching string type, then I could simply use lookup from the matching table to pull in the ID number, type of lookup, and count for matches. Works as expected and allows me to nest additional lookup strings when needed.