Solved: Help importing ASCII file with returns - Qlik Community

Report Inappropriate Content · ‎2010-11-29

Hi,

I'm trying to import an ASCII file from an old database, where I have data that should be on the same line separated in two or more lines with returns, somtehing like this:

field1|field2|field3

right|right|right

right|

wrong|wrong

right|right|right

So when I import the data using "|" as the separator I get some problems, in this case I would have 4 lines instead of the real 3, just because the second line got separet in two lines, with the fields divided.

Can you think of a way to solve the problem?

Report Inappropriate Content · ‎2010-11-30

Hi,

QlikView is very good at reading at line level, but will got confused by line breaks in this case.

----------------------------------------------------------

Command line bellow will remove all CR LF characters:

tr -s "\r\n" "|" < bad.txt > temp.txt

This results in

Couldn't find one-liner to replace every 3rd separator with separator and CRLF . So try a Python script that produces 4 column of data



FNAME = 'temp.txt'
NR_FIELDS = 3

with open(FNAME) as f:
 raw = f.readlines()[0].split('|')
 res = []
 for i in xrange(len(raw)):
 if (i % NR_FIELDS == 0):
 res.append('\n' + raw)
 else:
 res.append(raw)

 print '|'.join(res)

-Alex

View solution in original post

Report Inappropriate Content · ‎2010-11-29

Hi,

Is the extra line feed always after the record separator ? If yes, look for command line tool sed (there is Windows version available) . Replace the regular expression at the end of the line "|CRLF$" with "|" -> that will merge the lines.

Something like

sed "/|\n$/|/" file.txt > fixed.txt

If the extra line feeds can be in random places, you need more sofisticated algorithm. Ex a Pyton / Perl script that reads one line, counts the separator characters. If those are not enough, concatenate with next line.

-Alex

Report Inappropriate Content · ‎2010-11-29

Hi Alex,

Thanks for the reply.

Actually I was trying to do this when I run the script on import, because it will happen to several files and I didn't want to create a new version of the file. And yes, the extra line feeds can be in random places...

I already tried to purge and replace the chars on the load script, but I haven't managed to solve the problem. I guess the only way is to read the file creating a "purged" resident file that will later be imported as a new file without the CR and LF, dropping the first.

If I can't find a similar solution I will have to choose a solution like the one you are proposing, which seems to solve my problem directly on the file.

Report Inappropriate Content · ‎2010-11-30

Hi,

You can use sed to drop all CR (or CRLF) . Than have a Perl / Python script that reads exactly as many delimiters as there should be fields, insert a CRLF, write to fixed.txt, and repeat until EOF.

I am not sure you can do this with QlikView script. It does not handle well output to text files, nor reading N fields instead of complete line.

-Alex

Miguel_Angel_Baeyens · ‎2010-11-30

Hello,

You may use as well something like the following in your load script

LOAD Replace(F1, chr(10), '|') AS F1FROM File;

Depending on the charmap you may need to add the chr(13) char too.

Hope this helps.

Report Inappropriate Content · ‎2010-11-30

Hi,

QlikView is very good at reading at line level, but will got confused by line breaks in this case.

----------------------------------------------------------

Command line bellow will remove all CR LF characters:

tr -s "\r\n" "|" < bad.txt > temp.txt

This results in

Couldn't find one-liner to replace every 3rd separator with separator and CRLF . So try a Python script that produces 4 column of data



FNAME = 'temp.txt'
NR_FIELDS = 3

with open(FNAME) as f:
 raw = f.readlines()[0].split('|')
 res = []
 for i in xrange(len(raw)):
 if (i % NR_FIELDS == 0):
 res.append('\n' + raw)
 else:
 res.append(raw)

 print '|'.join(res)

-Alex

Report Inappropriate Content · ‎2010-11-30

Replace the lightbulb with characters [ i ] . Too bad that text between [ c o d e] .. [ / c o d e ] still gets messed up

-Alex

Report Inappropriate Content · ‎2010-11-30

Hi Miguel,

I already tried several solutions with purgechar and replace. They don't work because I'm importing the file and when you do that, from what I understood, qv script just reads the separators for the fields, creating the structure of the table, and then it replaces the chars inside the fields created. I also thought on breaking the fields on two temp tables and then joining them, but that isn't either a great solution, because I don't have a 'clean' keyfield to do this (because of the separators and artificial lines).

Also removing all CR and LF, like Alex also tried, creates a single structure that is hard to break after, creating all the single lines with the correct structure.

I'll keep trying when I have some time available, for now I think I'll just have to change the original files.

Thanks!

Joaquim

Report Inappropriate Content · ‎2010-11-30

The Python script does just that: reading a file without linebreaks, and producing one with linebreaks

-Alex

Report Inappropriate Content · ‎2010-11-30

Alex,

Yeah, I guess I'll have to use something like this on the original file. Like I was telling Miguel I hope to find a solution later to do this directly on qv which will save me some time and make a clear structure to when we have to treat info this way. Counting the 'pipes' seems to be the right way (and then breaking the lines).

Thank you, you were really helpful!

Joaquim