Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Help importing ASCII file with returns

Hi,

I'm trying to import an ASCII file from an old database, where I have data that should be on the same line separated in two or more lines with returns, somtehing like this:

field1|field2|field3

right|right|right

right|

wrong|wrong

right|right|right

So when I import the data using "|" as the separator I get some problems, in this case I would have 4 lines instead of the real 3, just because the second line got separet in two lines, with the fields divided.

Can you think of a way to solve the problem?

1 Solution

Accepted Solutions
Not applicable
Author

Hi,

QlikView is very good at reading at line level, but will got confused by line breaks in this case.

----------------------------------------------------------

Command line bellow will remove all CR LF characters:

tr -s "\r\n" "|" < bad.txt > temp.txt

This results in

field1|field2|field3|right|right|right|right|wrong|wrong|right|right|right

Couldn't find one-liner to replace every 3rd separator with separator and CRLF . So try a Python script that produces 4 column of data



FNAME = 'temp.txt'
NR_FIELDS = 3

with open(FNAME) as f:
raw = f.readlines()[0].split('|')
res = []
for i in xrange(len(raw)):
if (i % NR_FIELDS == 0):
res.append('\n' + rawIdea)
else:
res.append(rawIdea)

print '|'.join(res)


-Alex

View solution in original post

9 Replies
Not applicable
Author

Hi,

Is the extra line feed always after the record separator ? If yes, look for command line tool sed (there is Windows version available) . Replace the regular expression at the end of the line "|CRLF$" with "|" -> that will merge the lines.

Something like

sed "/|\n$/|/" file.txt > fixed.txt

If the extra line feeds can be in random places, you need more sofisticated algorithm. Ex a Pyton / Perl script that reads one line, counts the separator characters. If those are not enough, concatenate with next line.

-Alex

Not applicable
Author

Hi Alex,

Thanks for the reply.

Actually I was trying to do this when I run the script on import, because it will happen to several files and I didn't want to create a new version of the file. And yes, the extra line feeds can be in random places...

I already tried to purge and replace the chars on the load script, but I haven't managed to solve the problem. I guess the only way is to read the file creating a "purged" resident file that will later be imported as a new file without the CR and LF, dropping the first.

If I can't find a similar solution I will have to choose a solution like the one you are proposing, which seems to solve my problem directly on the file.

Not applicable
Author

Hi,

You can use sed to drop all CR (or CRLF) . Than have a Perl / Python script that reads exactly as many delimiters as there should be fields, insert a CRLF, write to fixed.txt, and repeat until EOF.

I am not sure you can do this with QlikView script. It does not handle well output to text files, nor reading N fields instead of complete line.

-Alex

Miguel_Angel_Baeyens

Hello,

You may use as well something like the following in your load script

LOAD Replace(F1, chr(10), '|') AS F1FROM File;


Depending on the charmap you may need to add the chr(13) char too.

Hope this helps.

Not applicable
Author

Hi,

QlikView is very good at reading at line level, but will got confused by line breaks in this case.

----------------------------------------------------------

Command line bellow will remove all CR LF characters:

tr -s "\r\n" "|" < bad.txt > temp.txt

This results in

field1|field2|field3|right|right|right|right|wrong|wrong|right|right|right

Couldn't find one-liner to replace every 3rd separator with separator and CRLF . So try a Python script that produces 4 column of data



FNAME = 'temp.txt'
NR_FIELDS = 3

with open(FNAME) as f:
raw = f.readlines()[0].split('|')
res = []
for i in xrange(len(raw)):
if (i % NR_FIELDS == 0):
res.append('\n' + rawIdea)
else:
res.append(rawIdea)

print '|'.join(res)


-Alex

Not applicable
Author

Replace the lightbulb with characters [ i ] . Too bad that text between [ c o d e] .. [ / c o d e ] still gets messed up

-Alex

Not applicable
Author

Hi Miguel,

I already tried several solutions with purgechar and replace. They don't work because I'm importing the file and when you do that, from what I understood, qv script just reads the separators for the fields, creating the structure of the table, and then it replaces the chars inside the fields created. I also thought on breaking the fields on two temp tables and then joining them, but that isn't either a great solution, because I don't have a 'clean' keyfield to do this (because of the separators and artificial lines).

Also removing all CR and LF, like Alex also tried, creates a single structure that is hard to break after, creating all the single lines with the correct structure.

I'll keep trying when I have some time available, for now I think I'll just have to change the original files.

Thanks!

Joaquim

Not applicable
Author

The Python script does just that: reading a file without linebreaks, and producing one with linebreaks

-Alex

Not applicable
Author

Alex,

Yeah, I guess I'll have to use something like this on the original file. Like I was telling Miguel I hope to find a solution later to do this directly on qv which will save me some time and make a clear structure to when we have to treat info this way. Counting the 'pipes' seems to be the right way (and then breaking the lines).

Thank you, you were really helpful!

Joaquim