I'm trying to import an ASCII file from an old database, where I have data that should be on the same line separated in two or more lines with returns, somtehing like this:
So when I import the data using "|" as the separator I get some problems, in this case I would have 4 lines instead of the real 3, just because the second line got separet in two lines, with the fields divided.
Is the extra line feed always after the record separator ? If yes, look for command line tool sed (there is Windows version available) . Replace the regular expression at the end of the line "|CRLF$" with "|" -> that will merge the lines.
sed "/|\n$/|/" file.txt > fixed.txt
If the extra line feeds can be in random places, you need more sofisticated algorithm. Ex a Pyton / Perl script that reads one line, counts the separator characters. If those are not enough, concatenate with next line.
Actually I was trying to do this when I run the script on import, because it will happen to several files and I didn't want to create a new version of the file. And yes, the extra line feeds can be in random places...
I already tried to purge and replace the chars on the load script, but I haven't managed to solve the problem. I guess the only way is to read the file creating a "purged" resident file that will later be imported as a new file without the CR and LF, dropping the first.
If I can't find a similar solution I will have to choose a solution like the one you are proposing, which seems to solve my problem directly on the file.
I already tried several solutions with purgechar and replace. They don't work because I'm importing the file and when you do that, from what I understood, qv script just reads the separators for the fields, creating the structure of the table, and then it replaces the chars inside the fields created. I also thought on breaking the fields on two temp tables and then joining them, but that isn't either a great solution, because I don't have a 'clean' keyfield to do this (because of the separators and artificial lines).
Also removing all CR and LF, like Alex also tried, creates a single structure that is hard to break after, creating all the single lines with the correct structure.
I'll keep trying when I have some time available, for now I think I'll just have to change the original files.
Yeah, I guess I'll have to use something like this on the original file. Like I was telling Miguel I hope to find a solution later to do this directly on qv which will save me some time and make a clear structure to when we have to treat info this way. Counting the 'pipes' seems to be the right way (and then breaking the lines).