12 Replies Latest reply: Sep 1, 2014 5:30 AM by Marco Wedel RSS

    Please help me get the string pattern of each field.

      Hi!  Please let me know the formula to get the string pattern of each field .

        • Re: Please help me get the string pattern of each field.
          Manish Kachhia

          What you exactly want?

          Can you provide some sample data ?

            • Re: Please help me get the string pattern of each field.

              It's part of the data profiling project Im working on. I want to determine the typical string pattern of each column. So that after getting the typical pattern, I can compare and check each value  as valid or not dependending on the pattern defined for that column.

              Ex.  Product_detail_code, Modified_date, etc..

               

              PURCHASE_DATEPRODUCT_DETAIL_CODE
              14-Nov-07Others
              Dec 201010128738
              26-Feb-0811197529
              14-Dec-0711438953
              18-Nov-0511425786
              11-Sep-1311208766
              30-Aug-1311209023
              2-Sep-1311208856
              23-Oct-1311208959
              10-Sep-1311209074
              12-Feb-1311208907
              8-Nov-1311198348
              Nov 201043005270
              9-Sep-1311170032
            • Re: Please help me get the string pattern of each field.
              Manish Kachhia

              Something Like Below?

               

              Load

              RowNo() as NO,

              PRODUCT_DETAIL_CODE,

              PURCHASE_DATE as Original_PURCHASE_DATE,

              Date(Alt(

                Date#(PURCHASE_DATE,'DD-MMM-YY'),

                Date#(PURCHASE_DATE,'MMM YYYY'),

                Date#(PURCHASE_DATE,'YYYYMMDD'),

                Date#(PURCHASE_DATE,'D-MM-YY'),

              ),'DD-MM-YYYY') as Modified_PURCHASE_DATE

               

               

              Inline

              [

                PURCHASE_DATE, PRODUCT_DETAIL_CODE

                14-Nov-07, Others

                Dec 2010, 10128738

                26-Feb-08, 11197529

                14-Dec-07, 11438953

                18-Nov-05, 11425786

                11-Sep-13, 11208766

                30-Aug-13, 11209023

                2-Sep-13, 11208856

                23-Oct-13, 11208959

                10-Sep-13, 11209074

                12-Feb-13, 11208907

                8-Nov-13, 11198348

                Nov 2010, 43005270

                9-Sep-13, 11170032

              ];

                • Re: Please help me get the string pattern of each field.

                  Thanks Manish for quick response but thats not what I want. I want to determine the   typical pattern of  each field

                  Step 1 : Determine the typical pattern of the column. What is that formula to get the pattern

                                                                                   Output

                  formulaXYZ(PURCHASE_DATE )  ==>  "dd-mm-yy"

                  formulaXYZ(Product_detail_code) ==> "NNNNNNNN"  where n = 0 to 9  , 8 digits

                   

                  Step 2: Validate each values in that column follows the right pattern

                   

                  Purchase_date                    Remarks

                       Dec 2010           ==>      Invalid Format

                       14-Nov-07          ==>     Valid

                   

                   

                  Product_Detail_Code

                       Others               ==>      Invalid

                  10128738               ==>      Valid (8 digits)

                • Re: Please help me get the string pattern of each field.
                  Manish Kachhia

                  Got it...

                  Sorry but I dont have idea.. May be someone can suggest...

                  • Re: Please help me get the string pattern of each field.
                    Marco Wedel

                    Hi Rhodora,

                     

                    you are asking for a very complex function that most likely does not exist "off the shelve" in QlikView.

                    If you realy intend to implement such a thing (and you would surely be welcome to post the result) then using regular expressions might be the most promising approach.

                    Here are some links on using RegEx in QlikView:

                     

                     

                    hope this helps

                     

                    regards

                     

                    Marco

                    • Re: Please help me get the string pattern of each field.
                      Marco Wedel

                      Hi,

                       

                      although it's been some time since you opened this thread I would like to propose another solution:

                       

                      QlikCommunity_Thread_111162_Pic5.JPG.jpg

                       

                      QlikCommunity_Thread_111162_Pic4.JPG.jpg

                      QlikCommunity_Thread_111162_Pic1.JPG.jpg

                       

                      QlikCommunity_Thread_111162_Pic2.JPG.jpg

                      QlikCommunity_Thread_111162_Pic3.JPG.jpg

                       

                      table1:
                      LOAD RecNo() as RecID,
                           *
                      FROM [http://community.qlik.com/thread/111162]
                      (html, codepage is 1252, embedded labels, table is @1);
                      
                      FOR i = 1 to NoOfFields('table1')
                        tabFields:
                        LOAD $(i) as FieldNumber,
                            FieldName($(i), 'table1') as FieldName
                        AutoGenerate 1;
                      NEXT i;
                      
                      tabFieldValues:
                      CrossTable (FieldName, FieldValue)
                      LOAD * Resident table1;
                      
                      Left Join (tabFieldValues)
                      LOAD Distinct
                           FieldValue,   
                           Text(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(
                           Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(
                           Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(
                           Replace(Replace(Replace(Replace(Replace(Upper(FieldValue),
                           '1','0'),'2','0'),'3','0'),'4','0'),'5','0'),'6','0'),'7','0'),'8','0'),'9','0')
                           ,'B','A'),'C','A'),'D','A'),'E','A'),'F','A'),'G','A'),'H','A'),'I','A'),'J','A'),'K','A')
                           ,'L','A'),'M','A'),'N','A'),'O','A'),'P','A'),'Q','A'),'R','A'),'S','A'),'T','A'),'U','A')
                           ,'V','A'),'W','A'),'X','A'),'Y','A'),'Z','A')) as FieldFormat
                      Resident tabFieldValues;
                      
                      Left Join (tabFieldValues)
                      LOAD Distinct
                        FieldName,
                        FieldFormat,
                        AutoNumberHash128(FieldName, FieldFormat) as FieldFormatID
                      Resident tabFieldValues;
                      
                      tabFieldFormats:
                      LOAD *,
                           Text(Replace(PurgeChar(Capitalize(Replace(PurgeChar(Capitalize(Lower(FieldFormat)),'a'),'0','b')),'b'),'B','0')) as FieldFormat2;
                      LOAD FieldName,
                           FieldFormat,
                           AutoNumberHash128(FieldName, FieldFormat) as FieldFormatID,
                           Count(FieldFormat) as FieldFormatCount
                      Resident tabFieldValues
                      Group By FieldName, FieldFormat;
                      
                      DROP Fields FieldName, FieldFormat From tabFieldValues;
                      
                      Left Join (tabFieldFormats)
                      LOAD FieldName,
                           FieldFormat2,
                           Sum(FieldFormatCount) as FieldFormat2Count
                      Resident tabFieldFormats
                      Group By FieldName, FieldFormat2;
                      
                      Left Join (tabFieldFormats)
                      LOAD FieldName,
                           Max(FieldFormatCount) as MaxFieldFormatCount,
                           Max(FieldFormat2Count) as MaxFieldFormat2Count
                      Resident tabFieldFormats
                      Group By FieldName;
                      
                      Left Join (tabFieldFormats)
                      LOAD FieldName,
                           FieldFormat,
                           FieldFormat2,
                           Dual(If(IsValidFormat, 'Yes', 'No'), IsValidFormat) as IsValidFormat,
                           Dual(If(IsValidFormat2, 'Yes', 'No'), IsValidFormat2) as IsValidFormat2;
                      LOAD Distinct
                        FieldName,
                        FieldFormat,
                        FieldFormat2,
                        FieldFormatCount=MaxFieldFormatCount as IsValidFormat,
                        FieldFormat2Count=MaxFieldFormat2Count as IsValidFormat2
                      Resident tabFieldFormats;
                      
                      
                      

                       

                      hope this helps

                       

                      regards

                       

                      Marco