5 Replies Latest reply: Feb 7, 2013 8:53 PM by John Lifter RSS

    Read File with Quotes

      Hi,

       

      I have a csv file that;

       

      a) Has some of the fields in Quotes (")

      b) Also has some quotes inside a field as part of a string

       

      Whats the best approach to handle this in QVE, I've noticed the option for Read File >> May have quotes or does not have quotes. I assume this is referring to the fact some strings inside the CSV may be wrapped in quotes?

       

      Some CSV readers are able to distinguish between quotes at the start and end of a string inside the comma seperated fields, like excel.

       

      I am getting an error on a file because of the above, which leads me to believe that I will need to parse the file and clean out the quotes first in QVE before I can

       

      Thanks

      James

        • Re: Read File with Quotes
          Nicole Smith

          I use .csv files all of the time, and what you're trying to do shouldn't be a problem.  I am attaching a sample .csv file that I created in Excel and also a .qvw where I have load script to load in that .csv file.

          • Re: Read File with Quotes
            Hugo Sheng

            The issue is the quote within the quote.  So if you have something like:

             

            123,"abc",456,"this is an "embedded" quote!!!"

             

            it will cause the read file operator get confused as it's looking for matching quote pairs and looking for 4 columns in this case.   What you can do to address this issue is to read the data file in with each record as a single variable length string first, and replace the embedded quotes with single quotes, and then in a subsequent step, read the data in as you're trying to do.

             

            You can use the following function in a transform to do this:

             

            function ParseQuote(WholeLine)

            x = string.replace(WholeLine,'","',"','")

            x = string.replace(x, '""',"''")

            x = string.replace(x,'"',"'")       --change embedded quotes to single quotes

            x = string.replace(x,"','",'","')   --replace quotes

            x = string.replace(x,"^'",'"')     --fix 1st quote

            x = string.replace(x,"'$",'"')     --fix last quote

            return x

            end

             

            You'll wind up with an output like this afterwards:

             

            123,"abc",456,"this is an 'embedded' quote!!!"

              • Re: Read File with Quotes

                Hi Hugo,

                 

                thanks for the quick response.

                 

                I couldn't get this to work, the output was that all double quotes were single quotes. Then I realise that the sting search was looking for neighbouring quotes, (",")  so I added two additional lines to help check for quotes without neighbours and it works;

                 

                However I have spotted a flaw still, if we get a line like;

                 

                123,"abc","test,456,Test,"this is an "embedded" quote!!!"

                 

                Then it still doesn't correct the 3rd column becuase my neighbour fix assumes that no field will contain one double quote at the start or end of the field. So theres still a little work to do, I'm wondering pattern matching might be way forward, or whether logic will need to be applied by created an array table or somethuing.

                 

                My other concern is, what happens if a comma appears in the field as well but is inside quotes...the joys of csv files....

                 

                 

                 

                function transform(input)

                 

                x = string.replace(input.WholeLine,'","',"','")

                 

                 

                 

                x = string.replace(x, '""',"''")

                 

                x = string.replace(x,'"',"'")       --change embedded quotes to single quotes

                 

                x = string.replace(x,"','",'","')   --replace quotes

                 

                x = string.replace(x,"^'",'"')     --fix 1st quote

                 

                x = string.replace(x,"'$",'"')        --fix last quote

                 

                x = string.replace(x, "',",'",')        --Restore quote without neighbour

                 

                x = string.replace(x, ",'",',"')        --Restore quote without neighbour

                 

                output.WholeLine = x

                 

                return output

                 

                end

                 

                 

                 

              • Re: Read File with Quotes

                You may find this approach workable.

                 

                The following representative data includes many possible examples of embedded quotes.

                 

                abc,""Oh, No" he said","an "embedded" quotes"

                ""a,bc"",""Oh, No" he said","an "embedded quotes""

                "abc",""Oh, No" he said","an "embedded" quotes"

                 

                Read the file using a schema that describes a single attribute.  Use CR+LF as the record delimiter and some character combination that cannot be in your data as the field delimiter; for example the Vertical Bar.

                 

                Then use the following code in a Transform operator.  Note that the output is still a single line (so you can use the same schema) but that the embedded quotations have been changed from " to '.  Note that the order of the string.replace statements is important.

                 

                quotes.png

                 

                function transform(input)

                  output = {}

                  x = string.replace(input.line,'""$','"|')

                  x = string.replace(x,'"$','|')

                  x = string.replace(x,'^""','|"')

                  x = string.replace(x,'^"','|')

                  x = string.replace(x,'",','|,')

                  x = string.replace(x,',"',',|')

                  x = string.replace(x,'"',"'")

                  x = string.replace(x,'|','"')

                     

                  output.line = x

                  

                  return output;

                end;