Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hi,
I'm trying to import an ASCII file from an old database, where I have data that should be on the same line separated in two or more lines with returns, somtehing like this:
field1|field2|field3
right|right|right
right|
wrong|wrong
right|right|right
So when I import the data using "|" as the separator I get some problems, in this case I would have 4 lines instead of the real 3, just because the second line got separet in two lines, with the fields divided.
Can you think of a way to solve the problem?
Hi,
QlikView is very good at reading at line level, but will got confused by line breaks in this case.
----------------------------------------------------------
Command line bellow will remove all CR LF characters:
tr -s "\r\n" "|" < bad.txt > temp.txt
This results in
field1|field2|field3|right|right|right|right|wrong|wrong|right|right|right
Couldn't find one-liner to replace every 3rd separator with separator and CRLF . So try a Python script that produces 4 column of data
FNAME = 'temp.txt'
NR_FIELDS = 3
with open(FNAME) as f:
raw = f.readlines()[0].split('|')
res = []
for i in xrange(len(raw)):
if (i % NR_FIELDS == 0):
res.append('\n' + raw)
else:
res.append(raw)
print '|'.join(res)
-Alex
Hi,
Is the extra line feed always after the record separator ? If yes, look for command line tool sed (there is Windows version available) . Replace the regular expression at the end of the line "|CRLF$" with "|" -> that will merge the lines.
Something like
sed "/|\n$/|/" file.txt > fixed.txt
If the extra line feeds can be in random places, you need more sofisticated algorithm. Ex a Pyton / Perl script that reads one line, counts the separator characters. If those are not enough, concatenate with next line.
-Alex
Hi Alex,
Thanks for the reply.
Actually I was trying to do this when I run the script on import, because it will happen to several files and I didn't want to create a new version of the file. And yes, the extra line feeds can be in random places...
I already tried to purge and replace the chars on the load script, but I haven't managed to solve the problem. I guess the only way is to read the file creating a "purged" resident file that will later be imported as a new file without the CR and LF, dropping the first.
If I can't find a similar solution I will have to choose a solution like the one you are proposing, which seems to solve my problem directly on the file.
Hi,
You can use sed to drop all CR (or CRLF) . Than have a Perl / Python script that reads exactly as many delimiters as there should be fields, insert a CRLF, write to fixed.txt, and repeat until EOF.
I am not sure you can do this with QlikView script. It does not handle well output to text files, nor reading N fields instead of complete line.
-Alex
Hello,
You may use as well something like the following in your load script
LOAD Replace(F1, chr(10), '|') AS F1FROM File;
Depending on the charmap you may need to add the chr(13) char too.
Hope this helps.
Hi,
QlikView is very good at reading at line level, but will got confused by line breaks in this case.
----------------------------------------------------------
Command line bellow will remove all CR LF characters:
tr -s "\r\n" "|" < bad.txt > temp.txt
This results in
field1|field2|field3|right|right|right|right|wrong|wrong|right|right|right
Couldn't find one-liner to replace every 3rd separator with separator and CRLF . So try a Python script that produces 4 column of data
FNAME = 'temp.txt'
NR_FIELDS = 3
with open(FNAME) as f:
raw = f.readlines()[0].split('|')
res = []
for i in xrange(len(raw)):
if (i % NR_FIELDS == 0):
res.append('\n' + raw)
else:
res.append(raw)
print '|'.join(res)
-Alex
Replace the lightbulb with characters [ i ] . Too bad that text between [ c o d e] .. [ / c o d e ] still gets messed up
-Alex
Hi Miguel,
I already tried several solutions with purgechar and replace. They don't work because I'm importing the file and when you do that, from what I understood, qv script just reads the separators for the fields, creating the structure of the table, and then it replaces the chars inside the fields created. I also thought on breaking the fields on two temp tables and then joining them, but that isn't either a great solution, because I don't have a 'clean' keyfield to do this (because of the separators and artificial lines).
Also removing all CR and LF, like Alex also tried, creates a single structure that is hard to break after, creating all the single lines with the correct structure.
I'll keep trying when I have some time available, for now I think I'll just have to change the original files.
Thanks!
Joaquim
The Python script does just that: reading a file without linebreaks, and producing one with linebreaks
-Alex
Alex,
Yeah, I guess I'll have to use something like this on the original file. Like I was telling Miguel I hope to find a solution later to do this directly on qv which will save me some time and make a clear structure to when we have to treat info this way. Counting the 'pipes' seems to be the right way (and then breaking the lines).
Thank you, you were really helpful!
Joaquim