8 Replies Latest reply: Aug 22, 2017 11:40 AM by Eduardo DImperio RSS

    Remove duplicate data

    Dominique Sitter

      Hi everyone,

      I am a new user of Qliksense and I have a little problem with my data.

      I have a first database with material associate to vendors, but the problem is that I do not have a vendor associate to each material. SO I decided to add a second datebase, for each dimension of material I have a vendor associated. I take the dimension of the material to find the associate vendor.

      The problem is that when I have a different vendor between the first Database et the second, I want to keep only the vendor of the first database. So yet I have duplicate values for the same material.

      Mat_Dim is my first datebase and Dimension_Lancement for my second database.

       

      My script:

       

      Mat_Dim:

      LOAD distinct

          MTRL_NBR AS "Material #",

          MTRL_LENGTH As "Longueur",

          MTRL_WIDTH AS "Largeur",

          MTRL_HEIGHT AS "Hauteur",

          IF(round(MTRL_LENGTH)=MTRL_LENGTH,Num(MTRL_LENGTH,'####0','.'),MTRL_LENGTH)

            & 'x' & IF(round(MTRL_WIDTH)=MTRL_WIDTH,Num(MTRL_WIDTH,'###0','.'),MTRL_WIDTH)

            & IF( IsNull(MTRL_HEIGHT)

                 ,''

                 ,'x' & IF(round(MTRL_HEIGHT)=MTRL_HEIGHT, Num(MTRL_HEIGHT,'###0','.'),MTRL_HEIGHT)) AS "Dimensions",

          MTRL_BUNDLE_PKG AS "Bundle Pkg",

          MTRL_PKG_PALLET AS "Pkg Pallet"

      FROM [lib://Extractions/Dimensions et Palétisation.xlsx]

      (ooxml, embedded labels, table is [Dim&Pal]);

       

      Dimensions_Lancement :
      LOAD Distinct
          Mat_root,
          Fournisseur_ID_Lancement,
          Vendor_Name_Lancement,
          Dimensions_Lancement
      FROM [lib://Documents/Mes fichiers reçus\Dimensions.xlsx]
      (ooxml, embedded labels, table is Feuil1);

      Mat_root:
      Load Distinct "Material #"
      ,left("Material #",2) AS Mat_root
          ,Dimensions
      Resident Mat_Dim;
      Join Load Distinct
      Mat_root
      ,Dimensions_Lancement AS Dimensions
      ,Fournisseur_ID_Lancement AS "Vendor ID"
      Resident Dimensions_Lancement;

      Concatenate (Mat_Vend)
      Load "Material #"
      ,"Vendor ID"
      Resident Mat_root;

       

      Thank you for your time and help !

      Dominique

        • Re: Remove duplicate data
          Felip Drechsler

          Hi Dominique,

           

          Without the data it's quite difficult to analyse it.

          My guess would be that instead of using only join (which would be outter join and keeping both entries from both tables) use left join here:

           

          // Instead of:

          Join Load Distinct

           

          // use:

          left Join Load Distinct


          See if it helps with your problem.


          Felipe.

            • Re: Remove duplicate data
              Dominique Sitter

              Hi Felip,

              Thanks for your help,

              I tried to do this but it does not works.

              My first database is:

              Material            Vendor ID      Dimension  

              Gk...                  133003              55x32

               

              My second database is like:

              Dimension      Vendor ID

              55x32              133003

               

              If I have not the Vendoir ID in the first I take the data in the second database. But yet I have duplicate values.

            • Re: Remove duplicate data
              Eduardo DImperio

              Hi,

               

              To complete Felip's answer, try not to use Concatenate, this could generate duplicate values.

               

              Join Load Distinct
              Mat_root
              ,Dimensions_Lancement AS Dimensions
              ,Fournisseur_ID_Lancement AS "Vendor ID"
              Resident Dimensions_Lancement;

              Concatenate (Mat_Vend)
              Load "Material #"
              ,"Vendor ID"
              Resident Mat_root;

               

              ***********************************************************

               

              Left Join(something)

              Load

              Distinct
              Mat_root
              ,Dimensions_Lancement AS Dimensions
              ,Fournisseur_ID_Lancement AS "Vendor ID"
              Resident Dimensions_Lancement;

               

              Left Join (Mat_Vend)
              Load

              "Material #"
              ,"Vendor ID"
              Resident Mat_root;

              • Re: Remove duplicate data
                Andrea Gigliotti

                or you can use:

                Left Keep Load Distinct

                ...

                ...


                to keep both tables with only the vendor of the first database.

                • Re: Remove duplicate data
                  Michael Solomovich

                  I'm ignoring here your script (most of it is not relevant to the problem description), and answering the question in a more generic way.

                  Say, I have Database1 with material and vendor, and some vendors are missing.

                  Next, there is a Database2 with all material and vendors.

                  If there is vendor in the Database1, I want to keep it, otherwise I want to use a vendor from the Database2.

                  Here is what I'd do:

                   

                  // Mapping table based on the Database2

                  MaterialVendorMap:

                  MAPPING LOAD DISTINCT

                       Material,

                       Vendor

                  FROM <Database2>;

                   

                  // Loading Database 1, replacing empty vendor using map

                  DB1:

                  LOAD

                  Material,

                       if(len(trim(Vendor))>0, Vendor,

                            applymap('MaterialVendorMap',Material), 'Unknown') as Vendor

                  FROM <Database1>;