9 Replies Latest reply: Jul 21, 2009 7:16 PM by John Witherspoon RSS

    Help with Importing Text File With Mulitple Delimiters

    Simon Kok

      Hi there.

      Newbie trying to transform a text file that looks like the following:

      Request: Login | Date: 200812160750 | Elapsed Time: 367 | User Identifier: 4709
      Request: Profile Search | Date: 200812160905 | Elapsed Time: 969 | User Identifier: Mann123
      Request: View Profile | Date: 200812160905 | Elapsed Time: 54 | User Identifier: 145992

      There's essentially two delimiters. The ":" colon and the "|" pipe.

      My script looks like this for the *FIRST* delimiter ":".

       

      LOAD
      @1,
      @2,
      @3,
      @4,
      @5
      FROM
      [C:\test.txt]
      (txt, utf8, explicit labels, delimiter is ':', no quotes);




      How do I transfer the second delimiter "|" pipe ?

      Thank-you in advance!

        • Help with Importing Text File With Mulitple Delimiters
          Rakesh Mehta

          Would this work for you?

          LOAD
          recno() as RecordNumber,
          subfield(@1, ':',1) as v1, subfield(@1, ':',2) as v1Val,
          subfield(@2, ':',1) as v2, subfield(@2, ':',2) as v2Val,
          subfield(@3, ':',1) as v3, subfield(@3, ':',2) as v3Val,
          subfield(@4, ':',1) as v4, subfield(@4, ':',2) as v4Val,
          subfield(@5, ':',1) as v5, subfield(@5, ':',2) as v5Val
          FROM
          [c:\test.txt]
          (txt, utf8, explicit labels, delimiter is '|', no quotes);

            • Help with Importing Text File With Mulitple Delimiters
              Simon Kok

              Thank-you - your script suggestion worked!

              That did the trick. I now have data in this tool to play with.

               

                • Help with Importing Text File With Mulitple Delimiters
                  John Witherspoon

                  Or if the fields are always in the same order, and including a little formatting, I'm guessing this would work and give you more meaningful field names rather than legend/value pairs:

                  LOAD
                  subfield(@1,': ',2) as "Request"
                  ,date(date#(left (subfield(@2,': ',2),8),'YYYYMMDD'),'MM/DD/YY') as "Date"
                  ,time(time#(right(subfield(@2,': ',2),4),'hhmm'),'hh:mm') as "Time"
                  ,interval(subfield(@3,': ',2)/100,'hh:mm:ss.ff'), as "Elapsed time"
                  ,text(subfield(@4,': ',2)) as "User Identifier"
                  FROM [C:\test.txt] (txt, utf8, explicit labels, delimiter is '|', no quotes);

                    • Help with Importing Text File With Mulitple Delimiters
                      Simon Kok

                      Your timing is impeccable.

                      I was actually going to start tackling the wacky date/time format and was just about to post a question on that. :)

                      I was originally using an open source log analysis tool called Splunk (http://www.splunk.com) - I had to do some regular expressions to parse the log file - but I found the reporting feature to be quite weak.

                      I think the QlikView tool will allow me to monitor, report and possibly trigger email alerts.

                      Thank-you again! I'll let you know how your script suggestion works out...

                       

                      • Help with Importing Text File With Mulitple Delimiters
                        Simon Kok

                        Hi John,

                        I got the following error with your script suggestion:

                        Syntax error, missing/misplaced FROM:
                        LOAD
                        subfield(@1,': ',2) as "Request",
                        date(date#(left (subfield(@2,': ',2),8),'YYYYMMDD'),'MM/DD/YY') as "Date",
                        time(time#(right(subfield(@2,': ',2),4),'hhmm'),'hh:mm') as "Time",
                        interval(subfield(@3,': ',2)/100,'hh:mm:ss.ff'), as "Elapsed time",
                        text(subfield(@4,': ',2)) as "User Identifier"
                        FROM
                        [C:\test.txt] (txt, utf8, explicit labels, delimiter is '|', no quotes)
                        LOAD
                        subfield(@1,': ',2) as "Request",
                        date(date#(left (subfield(@2,': ',2),8),'YYYYMMDD'),'MM/DD/YY') as "Date",
                        time(time#(right(subfield(@2,': ',2),4),'hhmm'),'hh:mm') as "Time",
                        interval(subfield(@3,': ',2)/100,'hh:mm:ss.ff'), as "Elapsed time",
                        text(subfield(@4,': ',2)) as "User Identifier"
                        FROM
                        [C:\test.txt] (txt, utf8, explicit labels, delimiter is '|', no quotes)

                        -----

                          • Help with Importing Text File With Mulitple Delimiters
                            John Witherspoon

                            Well, I probably missed a comma, or have misbalanced parentheses or something. QlikView isn't very good at telling you the real problem, I've found. "Missing/Misplaced FROM", more often than not, means "syntax error in your list of fields".

                            Ah, looks like I put an EXTRA comma after the last parenthesis in the Elapsed Time line. Remove that comma, or just copy the below.

                            LOAD
                            subfield(@1,': ',2) as "Request"
                            ,date(date#(left (subfield(@2,': ',2),8),'YYYYMMDD'),'MM/DD/YY') as "Date"
                            ,time(time#(right(subfield(@2,': ',2),4),'hhmm'),'hh:mm') as "Time"
                            ,interval(subfield(@3,': ',2)/(24*60*60*100),'hh:mm:ss.ff') as "Elapsed time"
                            ,text(subfield(@4,': ',2)) as "User Identifier"
                            FROM [C:\test.txt] (txt, utf8, explicit labels, delimiter is '|', no quotes)
                            ;

                            Oh, and perhaps I should explain what this is doing. Hard to learn otherwise.

                            The subfield() function breaks a field apart into more than one field based on a delimiter, so that's how we're handling the multiple delimiters here - one in subfield, the other in the FROM. We're telling it we want the second subfield and that ': ' is our delimiter, so that should return the value portion.

                            Date#() and time#() functions build a date or time based on a particular format of the input, so that first string is the format of the input.

                            Date() and time() functions tell the system how you WANT the date and time to be formatted, so the second string is the format we want.

                            Interval() tells the system how to format an interval of time. However, now that I look at it, I messed it up. I was assuming that your numbers were hundredths of a second. But an interval is the fraction of an entire day, not of a second, so I should have divided by (24*60*60*100). Fixed.

                            I explicitly tell it that the User Identifier is text() because we've had some issues in the past with IDs like 043E07 being interpreted as scientific notation, and coming out really weird. So when I see a mixture of text and things that could be interpreted as a number, I wrap it in text() just in case.

                    • Help with Importing Text File With Mulitple Delimiters
                      Rob Wunderlich

                      The TextBetween function can also be very useful for this kind of parsing. I like it because it's self documenting and is not dependent on field order. Read the record without a delimiter and then you can do functions like:

                      trim(TextBetween(@1, 'Request:', '|')) as Request
                      trim(TextBetween(@1, 'Elapsed Time:', '|')) as ElapsedTIme

                      etc. You'll still need to do date/time interpretation as before.

                      -Rob