7 Replies Latest reply: Oct 9, 2012 12:18 PM by Aylin Bisiren RSS

    Extract values with a definded pattern

      Hello together,
      i have a question.
      In my sample file, i have different values. So i only want to extract the columns, which have this pattern:
      X= character, n=number
      Xnn
      XnnXnn
      XnnXnnXnn
      I used the regular expression:

      [A-Z]\d{2}[A-Z]\d{2}[A-Z]\d{2}[^A-Z]

      [A-Z]\d{2}[A-Z]\d{2}

      [A-Z]\d{2}
      Please look at my file. Maybe someone can help me. Thanks!
      Best regards,
      Ayliln
        • Re: Extract values with a definded pattern
          Henric Cronström

          It might be possible to do this in a VB function, as you tried. I would however do it in the QlikView script, using several steps. Put the following code after your Sample table:

           

          IncorrectNames:

          Load * where not ( IsText(Left(SubName,1)) and IsNum(mid(SubName,2)) and Len(Trim(SubName))=3 );

          Load

                    Name as IncorrectName,

                    mid(Name,iterno()*3-2,3) as SubName resident Sample

                    While iterno()<= 1+(Len(Name)-1)/3;

           

          CorrectNames:

          Noconcatenate Load * resident Sample Where not Exists(IncorrectName,Name);

          Drop Tables Sample, IncorrectNames;

           

          HIC

            • Re: Extract values with a definded pattern

              Thank you very much HIC!

              It works!

                • Re: Extract values with a definded pattern
                  Dave Riley

                  An alternative (but not as elegant as Henric's!) solution is as follows ...

                   

                  Set validChars = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
                  Set validNums = '0123456789';

                   

                  Pattern: // 4 levels
                  LOAD *
                  WHERE candidate1 = 123
                  and (candidate2 = 456 or candidate2 = 000)
                  and (candidate3 = 789 or candidate3 = 000)
                  and (candidate4 = 101112 or candidate4 = 000)
                  and (candidate5 = 000)
                  and len(Name) = len(NameExclInvalidChars);
                  LOAD
                  Name,
                  FINDONEOF(Name,'$(validChars)',1) & FINDONEOF(Name,'$(validNums)',1) & FINDONEOF(Name,'$(validNums)',2) as candidate1,
                  FINDONEOF(Name,'$(validChars)',2) & FINDONEOF(Name,'$(validNums)',3) & FINDONEOF(Name,'$(validNums)',4) as candidate2,
                  FINDONEOF(Name,'$(validChars)',3) & FINDONEOF(Name,'$(validNums)',5) & FINDONEOF(Name,'$(validNums)',6) as candidate3,
                  FINDONEOF(Name,'$(validChars)',4) & FINDONEOF(Name,'$(validNums)',7) & FINDONEOF(Name,'$(validNums)',8) as candidate4,
                  FINDONEOF(Name,'$(validChars)',5) & FINDONEOF(Name,'$(validNums)',9) & FINDONEOF(Name,'$(validNums)',10) as candidate5,
                  KeepChar(Name,'$(validChars)' & '$(validNums)') as NameExclInvalidChars
                  resident Sample;

                   

                  This might however offer a solution if the pattern is a bit less repetitive such as XnnXnXnX etc.

                   

                  flipside

                • Re: Extract values with a definded pattern

                  Hello Henric Cronström,

                   

                  one question. I want to modify your expression for the patterns

                  Xnn

                  XnnXnn

                  XnnXnnXn  (my first example was XnnXnnXnn).

                   

                  How can i do this in your code?

                   

                  Thanks for your help!

                   

                  Best Regards,

                  Aylin

                    • Re: Extract values with a definded pattern
                      Henric Cronström

                      When I filter out the incorrect SubNames, I used

                         Load * where not ( IsText(Left(SubName,1)) and IsNum(mid(SubName,2)) and Len(Trim(SubName))=3 );

                      i.e. I demand that a SubName must have

                      1. the first character as text:  IsText(Left(SubName,1)), and
                      2. the second and third character as number: IsNum(mid(SubName,2)), and
                      3. the sumbname must have three characters: Len(Trim(SubName))=3 )

                       

                      If you also want to accept 'XnnXnnXn', you could just modify the third demand to Len(Trim(SubName))>=2. But then you will also accept 'XnnXn Xnn', i.e. a string where you have a space instead of a digit in the third position. But maybe this is not a problem? If so,

                         Load * where not ( IsText(Left(SubName,1)) and IsNum(mid(SubName,2)) and Len(Trim(SubName))>=2 );

                       

                      If it is a problem, you need to modify your Load statement further, and perhaps use PurgeChar(Name,' ') in the mid() function and in the while loop to remove internal blanks.

                       

                      HIC