16 Replies Latest reply: Feb 20, 2012 4:31 AM by qliksus RSS

Substring from text field from variable start position

qlikvuser

Hi all,

In my script I need to extract from an excel field a substring that begins with the letters "PT" and is followed by 8 numeric characters. The source text field from which to extract the substring can have multiple occurrences of substring "PT....." but only one followed by 8 numeric characters.

Let me explain with an example:

source text string: "Support Automation    PT7aW. Resolution  PT8. Related Issues Activities PT63004910 Date% Event Log      11. Attachments"

The only string to extract is "PT63004910" and not "PT7aW" or "PT8".

Thanks for the help.

  • Re: Substring from text field from variable start position
    swuehl

    I would suggest using a regular expression here. Unfortunately, QV doesn't support regular expressions itself. But we could easily use a VBScript Macro for that and call that from script.

     

    I just reused the Macro code posted here

    http://www.qlikfix.com/2010/10/18/regular-expressions-in-the-load-script/

    which just encapsulates the VB code for a regular expression quite nicely.

     

     

    Then, the resulting script code just looks like

     

    LOAD Source,

    RegExFind(Source, 'PT[0-9]{8}',',',false()) as Result

    INLINE [

    Source

    "Support Automation    PT7aW. Resolution  PT8. Related Issues Activities PT63004910 Date% Event Log      11. Attachments"

    ];

    • Re: Substring from text field from variable start position
      qlikvuser

      thank you so much!

    • Re: Substring from text field from variable start position
      cartierr

      Hi Stefan,

       

      Is it possible to apply that same function if you just want to extract one specific string form multi rows?

      Let's say I have text file where I just want to extract the string 'CIP', 'CPT' or 'EXWORKS' and name it as Terms,

      The problem is that text is stored in serveral rows instead of one line or at one fixed position. Would it still be possible to apply the RegExFind function? Or should I look for another function/statement?

       

       

       

      Hope you can help.

       

      Carter

      • Re: Substring from text field from variable start position
        swuehl

        Maybe you don't even need the regular expression for this, simple searches can also be done with QV functions, e.g. Wildmatch(). Maybe like this:

         

        INPUT:

        LOAD

        mid(filename(),Index(filename(),'PR_')+3,9) as OrderNo,

        date(FileTime()) As DateCreated,

        @1

        FROM

        [.\Commercial_ED_REL_GCPR_*.txt]  // <<<<<< I've removed the rest of the filename in order to load multiple textfiles without adding them manualy.

        (txt, codepage is 1252, no labels, delimiter is ';', msq);

         

        TERMS:

        LOAD OrderNo, DateCreated,

        concat(@1) as Text,

        pick(wildmatch(concat(@1),'*CIP*','*CPT*','*EXWORKS*'),'CIP','CPT','EXWORKS') as Terms

        Resident INPUT group by OrderNo,DateCreated;

         

        //drop table INPUT;

        //drop field Text;

        • Re: Substring from text field from variable start position
          cartierr

          Hi Stefan,

           

          Firstly my apologizies for not thank you for your help! I thought I replied to your last message.

           

          I'm trying to extract a City name from an invoice. Unfortunatelly the invoice contains alot of gibberich. The city name happens to be between 'Delivery To: and ', 'Customer No:'

           

          I first tried to extract the destintion between a specific code and two letters by using the following expression:

           

          Trim(Capitalize(TextBetween(Text, pick(WildMatch(Text, '*CIP*'), 'CIP'), pick(wildmatch(Text, '*HS*', '*84*', '*TARIFF*'), 'HS', '84', 'TARIFF', ))))

           

          Unfortunatelly not all the destination are mention between this two strings, but they are however between 'Delivery To' and 'Customer NO:'

           

          So I tried the following expression:

           

          Trim(Capitalize(TextBetween(Text, pick(WildMatch(Text, '*Delivery To:*', City), City), pick(wildmatch(Text, '*Customer No:*'), City ))))

           

          But that loaded all of the Text :S

           

          Do you know how to tackle this issue?

           

          Please find my qvw file.

           

          Thanks in advance

           

          Carter

          • Re: Substring from text field from variable start position
            swuehl

            Could you point me to one of those OrderNo, please? As you said, there is lot of gibberich around...

            And could you highlight / post the part of the OrderNo Text you are interested in?

            • Re: Substring from text field from variable start position
              cartierr

              Hi stefan,

               

              I analysed the data before loading it in to qlikview. I've noticed that the original textfile, after ' %%[ ProductName: GPL Ghostscript ]%%' , starts with ' Commercial Invoice'.

              When I load the text file in Qlikview, the file also starts with '%%[ ProductName: GPL Ghostscript ]%%',' but not followed by 'Commercial Invoice'.

               

              Deliver To:// from this area

              ell Eval/Demo AccountELL SAS CASANEARSHORE1100 BD EL QODS-sidi maarouf

              CASABLANCA20150Morocco // in this case I need to extract 'CASABLANCA'

              mehdi zbaidiCustomer No:/*till this area */

               

               

               

              I've attached the original files, and converted them to html  in order to point out in what text range i'm interessed in.

              Since there are multiple 'Deliver to:' in the text file, i'm to take a part of the text that I pointed out and 'dump' everything else.

               

              Hope you can help me.

               

              Regards,

               

              Carter

               

              Message was edited by: Carter James

              • Re: Substring from text field from variable start position
                swuehl

                I've also noticed that the Text field shows not the same content is the original file.

                The lines are out of order, since concat without a sort weight will sort the lines alphabetically.

                But you can create a sort weight by creating a LineID field in your INPUT load

                 

                INPUT:

                LOAD

                @1,

                recno() as LineID,

                mid(filename(),Index(filename(),'PR_')+3,9) as OrderNo,

                subfield(filename(), '_', 5) as Order,

                mid(filename(),Index(filename(),'PR_')+3,2) as OrderNot,

                //pick(wildmatch(concat(@1),'*CIP*','*CPT*','*EXWORKS*'),'CIP','CPT','EXWORKS') as Terms,

                if(mid(filename(),index(filename(),'PR_')+3,2),'EDB','ROW') as Service,

                date(FileTime()) As DateCreated

                ...

                 

                and then concat using the weight:

                 

                TERMS:

                LOAD

                OrderNo,

                concat(@1,' ',LineID) as Text,

                pick(WildMatch(Concat(@1,' ',LineID),'*CIP*','*CPT*','*EXWORKS*','*CIF*','*DDP*','*DAP*','*DDU*', '*FOC*', '*free of charge*', '*FCA*', '*CFR*'),

                                          'CIP','CPT','EXWORKS','CIF','DDP','DAP','DDU', 'FOC', 'free of charge', 'FCA', 'CFR') as Term   

                Resident INPUT group by OrderNo;


                Then I changed your third test expression to

                =Trim(Capitalize(TextBetween(Text, 'Deliver To', 'Customer',1)))

                 

                (You don't really need the pick / match combinations here)

                I think this will give you the text portions you were asking for, but I think it is still going to be pretty hard to extract the something like 'Casablanca' out of

                 

                : Dell Eval/Demo Accountdell Sas Casanearshore1100 Bd El Qods-Sidi Maarouf Casablanca20150morocco Mehdi Zbaidi

                 

                Please see also attached.

                 

                Regards,

                Stefan

                 

                P.S: I assume that you really need to parse in the postscript files, it would be much easier and better to directly access the original data in a DB. Isn't there any possible way?

                • Re: Substring from text field from variable start position
                  Ed Hilsinger

                  Can you see if this sample code works for you?

                   

                  [Master Data]:

                  LOAD

                            RowNo ()          as master_row,

                            F1

                  INLINE [

                      F1

                      Support Automation    PT7aW. Resolution  PT8. Related Issues Activities PT63004910 Date% Event Log      11. Attachments

                            This is a test

                            Second PT test

                            PT12345678 third test

                            PT1234567 fourth PTABCDEFGH

                  ];

                   

                   

                  [Child Data]:

                  NoConcatenate

                  LOAD

                            master_row                              as rowid,

                            F1                                                  as temp_F1,

                            SubField (F1, ' ')          as temp_F2

                  Resident

                            [Master Data];

                   

                   

                   

                  Left Join ([Master Data])

                  LOAD

                            rowid          as master_row,

                            temp_F2          as F2

                  Resident

                            [Child Data]

                  Where

                            (Mid (temp_F2, 1, 2) = 'PT')          and

                            (Len (temp_F2) = 10)                              and

                            (IsNum (Mid (temp_F2, 3, 4)));

                   

                   

                   

                   

                  DROP TAble [Child Data];

                  DROP Field master_row;

                • Substring from text field from variable start position
                  cartierr

                  Hi Stefan,

                   

                  Thanks for you help stefan.

                  You are right! I think this is to difficult to extract a city name. What I wanted to do is first select text between to words in this case:

                  Trim(Capitalize(TextBetween(Text, 'Deliver To', 'Customer',1))) as DestinationTest,

                   

                  Then use something like:

                   

                  =Pick(Wildmatch(DestinationTest, *City*), City)

                   

                  City in this case in not a string but a fieldname that contains all the cities. I wanted to use the wildmatch function to extract the City that matches the field City.


                  So if the field City, contains ''Casablcana" that matches in the DestinationTest, it would pick that City name.

                  I tried it but it doesn't work.

                   

                  But thanks for all your help!!! I really appriciate it.

                  I wish they had the city names in the database, it would have been much easier..

                  • Re: Substring from text field from variable start position
                    swuehl

                    I don't think that you can use a field in Wildmatch (well, no, you probably could using a concat technique, but let's keep that for a later time).

                    You could probably use a mapping table together with mapsubstring function to search and map your city names in case you have all citynames available:

                     

                    CITYMAP:

                    MAPPING LOAD * INLINE [

                    F1, F2

                    ALMATY, §1Almaty§2

                    CASABLANCA, §1Casablanca§2

                    ];

                     

                    ...

                     

                    Destination:

                    Load *,

                    TextBetween(MapSubString('CITYMAP',DestTest1),'§1','§2') as DestTest1Result;

                    Load

                    *,

                    Trim(Capitalize(TextBetween(Text,pick(WildMatch(Text, '*CIP*'), 'CIP'), pick(wildmatch(Text, '*HS*', '*84*', '*TARIFF*'), 'HS', '84', 'TARIFF', )))) as Destination,

                    Trim(UPPER(TextBetween(Text, 'Deliver To', 'Customer',1))) as DestTest1,

                    TextBetween(Text, pick(WildMatch(Text, '*Deliver To:*'), 'Deliver To:'), pick(WildMatch(Text,'*Page 1*'), 'Page 1')) as TestDest

                    ...

                     

                    The Citymap mapping table could be loaded from any source, read the city field one time in in UPPER case, one time concatenated with the two delimiters (which are arbitrary).

                     

                    See also attached.

                     

                    Regards,

                    Stefan

  • Substring from text field from variable start position
    qliksus

    Hi ,

     

    Try this approach if you dont want to use macro

     

    test1:
    load SubStringCount( replace(KeepChar(a,'PT0123456789')&'-','PT','-PT'),'-') as new1 ;
    LOAD * INLINE [
    a
    Support Automation    PT7aW. Resolution  PT8. Related Issues Activities PT63004910 Date% Event Log      11. Attachments ];

     

    let loopval = peek('new1',0,'test1') ;

     

    test:
    LOAD * INLINE [
    a
    Support Automation    PT7aW. Resolution  PT8. Related Issues Activities PT63004910 Date% Event Log      11. Attachments ];


    for i = 0 to $(loopval)

    load

    left(if ( len(textbetween( replace(KeepChar(a,'PT0123456789')&'-','PT','-PT'),'-','-' ,$(i)))>=10 ,

    textbetween( replace(KeepChar(a,'PT0123456789')&'-','PT','-PT'),'-','-' ,$(i)) ),10) as new
     
    resident test ;

    NEXT