9 Replies Latest reply: Nov 30, 2010 5:44 AM by jbeleza RSS

    Help importing ASCII file with returns


      I'm trying to import an ASCII file from an old database, where I have data that should be on the same line separated in two or more lines with returns, somtehing like this:








      So when I import the data using "|" as the separator I get some problems, in this case I would have 4 lines instead of the real 3, just because the second line got separet in two lines, with the fields divided.


      Can you think of a way to solve the problem?

        • Help importing ASCII file with returns


          Is the extra line feed always after the record separator ? If yes, look for command line tool sed (there is Windows version available) . Replace the regular expression at the end of the line "|CRLF$" with "|" -> that will merge the lines.

          Something like

          sed "/|\n$/|/" file.txt > fixed.txt

          If the extra line feeds can be in random places, you need more sofisticated algorithm. Ex a Pyton / Perl script that reads one line, counts the separator characters. If those are not enough, concatenate with next line.


            • Help importing ASCII file with returns

              Hi Alex,

              Thanks for the reply.

              Actually I was trying to do this when I run the script on import, because it will happen to several files and I didn't want to create a new version of the file. And yes, the extra line feeds can be in random places...

              I already tried to purge and replace the chars on the load script, but I haven't managed to solve the problem. I guess the only way is to read the file creating a "purged" resident file that will later be imported as a new file without the CR and LF, dropping the first.

              If I can't find a similar solution I will have to choose a solution like the one you are proposing, which seems to solve my problem directly on the file.

                • Help importing ASCII file with returns


                  You can use sed to drop all CR (or CRLF) . Than have a Perl / Python script that reads exactly as many delimiters as there should be fields, insert a CRLF, write to fixed.txt, and repeat until EOF.

                  I am not sure you can do this with QlikView script. It does not handle well output to text files, nor reading N fields instead of complete line.


                    • Help importing ASCII file with returns


                      QlikView is very good at reading at line level, but will got confused by line breaks in this case.


                      Command line bellow will remove all CR LF characters:

                      tr -s "\r\n" "|" < bad.txt > temp.txt

                      This results in


                      Couldn't find one-liner to replace every 3rd separator with separator and CRLF . So try a Python script that produces 4 column of data


                      FNAME = 'temp.txt'
                      NR_FIELDS = 3

                      with open(FNAME) as f:
                      raw = f.readlines()[0].split('|')
                      res = []
                      for i in xrange(len(raw)):
                      if (i % NR_FIELDS == 0):
                      res.append('\n' + rawIdea)

                      print '|'.join(res)


                    • Help importing ASCII file with returns
                      Miguel Angel Baeyens de Arce


                      You may use as well something like the following in your load script


                      LOAD Replace(F1, chr(10), '|') AS F1FROM File;

                      Depending on the charmap you may need to add the chr(13) char too.

                      Hope this helps.

                        • Help importing ASCII file with returns

                          Hi Miguel,

                          I already tried several solutions with purgechar and replace. They don't work because I'm importing the file and when you do that, from what I understood, qv script just reads the separators for the fields, creating the structure of the table, and then it replaces the chars inside the fields created. I also thought on breaking the fields on two temp tables and then joining them, but that isn't either a great solution, because I don't have a 'clean' keyfield to do this (because of the separators and artificial lines).

                          Also removing all CR and LF, like Alex also tried, creates a single structure that is hard to break after, creating all the single lines with the correct structure.

                          I'll keep trying when I have some time available, for now I think I'll just have to change the original files.