9 Replies Latest reply: Apr 17, 2014 1:03 PM by Kenneth Madsen

    parse tab delimited text file with read custom.

      I have a piece of code that reads a comma delimited text file and writes it out to a schema using the custom read operator.

      I want to do the very same thing for a set of tab-delimited .txt files.

      I don't know how to specify a tab rather than a comma in the following two lines of code.

       

      Thanks Traci

       

      lines to modify.

      line=string.concatenate(line, 'HT')

          

           --parse line using comma assert field delimiter

           for value in string.iterate(line,"(.-)HT") do

             log.notice(value)

             --values[#values+1] = value

           end


      complete code for comma delimited text file


      line=string.concatenate(line, 'HT')

          

           --parse line using comma assert field delimiter

           for value in string.iterate(line,"(.-)HT") do

             log.notice(value)

             --values[#values+1] = value

           end

        • Re: parse tab delimited text file with read custom.

          Can't really follow what you are trying to do here as both code segments are the same.

           

          But there should be no reason to use the Read Custom operator to process a comma- or tab-delimited record, as the schema used with the Read File operator can handle either delimiter.

          • Re: parse tab delimited text file with read custom.

            Hi Traci,

             

            If I understand correctly, you want the script to read a tab-delimited text file? If that is your question, check this out.

             

            Separator.png

            • Re: parse tab delimited text file with read custom.
              Mangal Kamble

              Hello,

               

              For a tab you can use the character value:

              chr(09)
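The script snippets in this thread look Lua-based, and in stock Lua the same tab character (ASCII 9) can be spelled in a few equivalent ways. The following is a small sketch using only standard Lua; whether a `chr` function is callable from inside a Read Custom script is not confirmed here.

```lua
-- Three stock-Lua spellings of the horizontal tab (ASCII code 9).
local tab_escape  = "\t"            -- C-style escape
local tab_decimal = "\9"            -- decimal escape
local tab_char    = string.char(9)  -- built from the character code

-- All three are the same one-byte string.
local same = (tab_escape == tab_decimal) and (tab_decimal == tab_char)
print(same, string.byte(tab_escape))
```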

              • Re: parse tab delimited text file with read custom.

                This works, thanks everyone.


                for value in string.iterate(line,"(.-)%c") do
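One caveat: in Lua patterns `%c` matches any control character, not only a tab, so a stray carriage return or other control byte inside a field would also act as a delimiter. A minimal sketch that splits on tab only is shown here, with stock `string.gmatch` standing in for `string.iterate` (assuming the two behave alike, which this thread does not confirm); `split_tabs` is a hypothetical helper name.

```lua
-- Split one line on tab characters only, using stock Lua.
-- split_tabs is a hypothetical helper name, not part of the product.
function split_tabs(line)
  local values = {}
  -- Append a trailing tab so the last field is also terminated by one.
  for value in (line .. "\t"):gmatch("(.-)\t") do
    values[#values + 1] = value
  end
  return values
end

local fields = split_tabs("id\tname\t\tcity")
for i, v in ipairs(fields) do
  print(i, "[" .. v .. "]")
end
```

Empty fields between two consecutive tabs are kept as empty strings, so field positions stay aligned with the columns in the file.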

                • Re: parse tab delimited text file with read custom.
                  Kenneth Madsen

                  While you have the answer for your situation, I thought I'd share the approach I took in a Read Custom. It differs, perhaps, because of what I needed to do with the parsed data: I needed to map the fields to downstream attributes.

                   

                  function read()
                    output = {}
                    local aLine = proc:read()

                    if not aLine then
                      -- no more records
                      log.information("EOF")
                      return true
                    else
                      -- positions of the nine pipe delimiters that separate the ten fields
                      local f1 = ustring.find(aLine, "|")
                      local f2 = ustring.find(aLine, "|", f1+1)
                      local f3 = ustring.find(aLine, "|", f2+1)
                      local f4 = ustring.find(aLine, "|", f3+1)
                      local f5 = ustring.find(aLine, "|", f4+1)
                      local f6 = ustring.find(aLine, "|", f5+1)
                      local f7 = ustring.find(aLine, "|", f6+1)
                      local f8 = ustring.find(aLine, "|", f7+1)
                      local f9 = ustring.find(aLine, "|", f8+1)

                      -- map each field to its downstream attribute
                      output.SLId       = ustring.substring(aLine, 1, f1-1)
                      output.SLGen      = ustring.substring(aLine, f1+1, f2-1)
                      output.SLVer      = ustring.substring(aLine, f2+1, f3-1)
                      output.SLRId      = ustring.substring(aLine, f3+1, f4-1)
                      output.SLRVer     = ustring.substring(aLine, f4+1, f5-1)
                      output.SLRPLRowid = ustring.substring(aLine, f5+1, f6-1)
                      output.SLSeq      = ustring.substring(aLine, f6+1, f7-1)
                      output.LVComp     = ustring.substring(aLine, f7+1, f8-1)
                      output.SLCalc     = ustring.substring(aLine, f8+1, f9-1)
                      output.LVCalc     = ustring.substring(aLine, f9+1)

                      return output
                    end  -- if
                  end; -- function read()

                   

                  Not as elegant as iterating across fields, but at the time I couldn't figure out how to iterate across the output attributes, short of a 9 branch if/then/else, which would have resulted in more lines of code anyway.
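For readers facing the same problem, one possible way to iterate across the output attributes is to keep the attribute names in an ordered table and index the output record by name. This is only a sketch in stock Lua, using `string.gmatch` rather than the `ustring` functions above; the field names come from the post, while `FIELD_NAMES` and `parse_record` are hypothetical names introduced for illustration.

```lua
-- Attribute names in the order the fields appear on each line (from the post).
local FIELD_NAMES = {
  "SLId", "SLGen", "SLVer", "SLRId", "SLRVer",
  "SLRPLRowid", "SLSeq", "LVComp", "SLCalc", "LVCalc",
}

-- parse_record is a hypothetical helper: it splits a pipe-delimited line
-- and assigns each field to the attribute name at the same position.
function parse_record(line)
  local output = {}
  local i = 0
  -- "|" is not a magic character in Lua patterns, so no escaping is needed.
  for value in (line .. "|"):gmatch("(.-)|") do
    i = i + 1
    local name = FIELD_NAMES[i]
    if name then            -- ignore any extra trailing fields
      output[name] = value
    end
  end
  return output
end

local rec = parse_record("1|A|2|77|3|r9|10|x|c1|c2")
print(rec.SLId, rec.SLRPLRowid, rec.LVCalc)
```

Because `output[name]` and `output.name` refer to the same table entry in Lua, the result has the same shape as the record built field by field above.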

                   

                  I left out the initialize and finalize. The initialize uses io.popen to run the program that generates the data (the program sends its data to stdout and the read function reads it in).
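The omitted pieces might look roughly like the sketch below. It assumes the functions are named `initialize` and `finalize` with no arguments, which the post implies but does not show, and it uses a placeholder `printf` command in place of the real data-generating program. `io.popen` is stock Lua, though its availability depends on the platform.

```lua
-- Placeholder command standing in for the real data-generating program.
local COMMAND = "printf 'a|b\\nc|d\\n'"

function initialize()
  -- Start the program and keep a handle on its stdout.
  -- proc is left global here so that a read() function can reach it.
  proc = assert(io.popen(COMMAND, "r"))
end

function finalize()
  -- Close the pipe once all records have been read.
  if proc then
    proc:close()
    proc = nil
  end
end

initialize()
local first  = proc:read()   -- one line per call, newline stripped
local second = proc:read()
local done   = proc:read()   -- nil once the program's output is exhausted
finalize()
print(first, second, done)
```

The `nil` returned at end of output is what the `if not aLine` check in `read()` relies on to detect that there are no more records.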

                   

                  Here we chose the pipe char as a delimiter.  I am sure we could have used \t instead for tab delimiters.  (Pipe was already in the library code for the program, so we just used that). We also use tab, comma and \0 (null byte) as delimiters in file schemas elsewhere.  Read Custom allowed us to save a step (otherwise the program would write a file and the next step would start with a Read File operator)