8 Replies Latest reply: Aug 20, 2009 4:27 PM by Shumail Hussain RSS

    Data Cleansing

    Shumail Hussain

      Hi,

      Is there any way to perform data cleansing on QlikView, Below attached is my data for your reference,

      For Example i want to replace "ADNOC XXXX/XX" with 'ADNOC' only. there are large number of garbage in data and i cannot perform cleansing without doing it mannualy.

      The solution i have found is to daily download the data and identify new garbage cases and update with the valid value.

       

      Regards,
      Shumail Hussain

        • Data Cleansing
          Shumail Hussain

          Moreover, If you look into the data I am exactly looking for some intelligent solution, which automatically select valid value without doing this exercise manually. Previously i was doing it manually and perform cleansing using excel where I have developed customized formulas i.e. put valid values in front of garbage field column and utilize it on the main sheet using vlookup.

          Regards,
          Shumail Hussain

           

            • Data Cleansing
              Oleg Troyansky

              You can use mapping functionality for data cleansing, however, there is no "automatic" algorithm for data cleansing. Once the name is mapped, it will stay mapped for the following reloads.

              cheers,

              Oleg

               

                • Data Cleansing
                  John Witherspoon

                  If I understood your specific example, here's one solution:

                  if(left("Merchant Name",5)='ADNOC','ADNOC',"Merchant Name") as "Merchant Name"

                    • Data Cleansing
                      Shumail Hussain

                      As there are numerious values, I think I have to go for manual option just like create a new table of Merchants with garbage field and updated field link it with the table.

                      Regards,
                      Shumail Hussain

                        • Data Cleansing

                          hi Shumail ..

                           

                          if you trying to create a listbox . inside field list box select expression .............type following code

                          it gives you dimension with clean data

                          =if(Len(<column name>)='4',<column name>)

                          this really defn will reduce manually work :)

                            • Data Cleansing
                              Shumail Hussain

                              Hi Archanah,

                              Length of the value's varies so, i can't specific to any number of length like ADNOC, Carrifor etc...

                              What i have identified is just like in Excel I can do it through customize formula using VlookUp in excel OR using Join by creating new table for Garbbage values so, the correct values can only be obtain through manual data cleansing.

                              Regards,
                              Shumail Hussain

                    • Data Cleansing
                      Christian_Henke

                      This may help as a starting point as long as you have no similarities in the beginning of the names like ADNOC and ADNOCIS...

                      CleanNames:
                      LOAD * INLINE [
                      CleanName,
                      ADNOC,
                      TRY2,
                      ];

                      Data:
                      LOAD * INLINE [
                      Crap, CrapValue
                      ADNOC-xr, 1
                      ADNOC vm, 2
                      ADNOC_xy, 3
                      TRY2 crap, 4
                      TRY2_crap, 5
                      TRY2-crap, 6
                      ];

                      LET NoOfCleanRows = NoOfRows('CleanNames');

                      FOR i = 0 TO NoOfCleanRows - 1
                      CrapNames:
                      LOAD
                      left(Crap,len(peek('CleanName',$(i),'CleanNames'))) as CrapNameCleaned,
                      CrapValue
                      RESIDENT Data
                      WHERE left(Crap,len(peek('CleanName',$(i),'CleanNames'))) = peek('CleanName',$(i),'CleanNames');
                      ;
                      NEXT i

                        • Data Cleansing
                          Shumail Hussain

                          Thanks Christian,

                          Your code would support me in this, I need to create a cleanname table with all dependencies and use the for loop to generate the final merchant table.but manual dependency is still thier because of human error's which cant be rectified without manual update.

                          Regards,
                          Shumail Hussain