Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
Anonymous
Not applicable

Import delimited text file that spills over to new line

Hi all!

I'm trying to import a text file that is semicolon delimited. The only problem is that some lines spill over to the next line which creates a problem that the end of the line gets registered on a new column. For example like this:

Id;Dateline;UserId;FullName;Email;

AAA;BBB;CCC;DDD;EEE;

VVV;WWW;XXX;YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY

YYYYY;ZZZ;

In the "Id" column that gives the values:

AAA

VVV

YYYYY

But I only want it to get:

AAA

VVV

Is there any way of doing this or do we need to restructure our data?

4 Replies
rwunderlich
Partner Ambassador/MVP
Partner Ambassador/MVP

I usually deal with this by making two passes over the file. First pass, combine the multi-lines into a single line. Than a second pass to read the individual fields.

-Rob

Anonymous
Not applicable
Author

Thanks for your answer Rob!

How do you make the first pass where you combine the multi lines into a single line?

/Hampus

Ralf-Narfeldt
Employee
Employee

I would probably do like this:

- Read the file with each record in a single field by specifying a delimiter not present in your data

- Pass through the records and count semicolons in each field. If it's not 4, concatenate to previous record

- Split up the fields with SubField()

rwunderlich
Partner Ambassador/MVP
Partner Ambassador/MVP

If the file is not too big I might approach it by combining all the data into one big string and then dividing into new rows by every fifth ";". The individual fields could then be extracted as Ralf suggested with SubField() or LOAD FROM_FIELD. Here's a script coded as one preceding load. Final table is just one field, "OneRow". This would handle wrapping at any point on any field.

FixedData:

LOAD

  mid(BigField, start, (end-start)) as OneRow

;

LOAD

  BigField,

  Index(BigField, ';',(IterNo()-1)*5)+1 as start,

  Index(BigField, ';',IterNo()*5) as end

WHILE (SubStringCount(BigField, ';') / 5) >= IterNo()

;

LOAD

  concat(Raw,'',RecNo()) as BigField

;

LOAD * INLINE [

Raw

AAA;BBB;CCC;

DDD;EEE;

VVV;WWW;XXX;YYYYYYYYYYYYYY

YYYYY;ZZZ;

AAA;BBBBBBBB

BBB;CCC;DDD;EEE;

AAA;BBB;CCC;DDD;ZZZ;

] (delimiter is '|')

;

-Rob

http://masterssummit.com

http://robwunderlich.com