Skip to main content
Announcements
Happy New Year! Cheers to another year of collaboration, connections and success.
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Semicolon issue in source file

I try to laod a semicolon delimited file with several thousands rows and get in trouble with a filed containing the file path and file name. Unfortunately do some file have semicolons in their filenames what mixes up whole imported list.

Source File (Example)

c:\temp\Test\test\this is a regular entry.txt 015 ; 25 MB; 0; 01.01.2014; 04.01.2014

c:\temp\Test\test\016; 26 MB; 1; 01.01.2014; 04.01.2014

c:\temp\Test\test\this; is a ;regular entry.txt 017; 27 MB; 0; 01.01.2014; 04.01.2014

c:\temp\Test\test\this is; a; regular; entry.txt 018; 28 MB; 0; 01.01.2014; 04.01.2014

I've  tryed to fix this issue in the load script but need some help.

The folloing "if statements" can handle one semicolon in the file name. But this is static. It fails with only one additional semicolon in the file name. The source file can contain file paths with up to ten semicolons and it cannot be requested with another delimiter.

LOAD

RowNo() as Row,

    IF (Right(@3,3) = ' MB', @1 & @2, 'xxx') as [Full Path],

    IF (Right(@3,3) = ' MB', @3, 'xxx') as [Size MB],

    @4 as [Folder],

    @5 as [Date 1],

    @6 as [Date 2]

Thank for any ideas.

6 Replies
maxgro
MVP
MVP

try this: load the row as one field and then split using ; as separator starting from the end of the row

RESULT

1.png

SCRIPT

source:

load * inline [

row

c:\temp\Test\test\this is a regular entry.txt 015 ; 25 MB; 0; 01.01.2014; 04.01.2014

c:\temp\Test\test\016; 26 MB; 1; 01.01.2014; 04.01.2014

c:\temp\Test\test\this; is a ;regular entry.txt 017; 27 MB; 0; 01.01.2014; 04.01.2014

c:\temp\Test\test\this is; a; regular; entry.txt 018; 28 MB; 0; 01.01.2014; 04.01.2014

];

final:

load

*,

SubField(newrow, ';', 1) as field1,

SubField(newrow, ';', 2) as field2,

SubField(newrow, ';', 3) as field3,

SubField(newrow, ';', 4) as field4,

SubField(newrow, ';', 5) as field5

;

load

index(row, ';',-4),

replace(left(row, index(row, ';',-4)-1), ';', '')

& Right(row, len(row) - index(row, ';',-4) +1) as newrow,

*

resident source;

DROP table source;

Not applicable
Author

Hi Massimo

Thanks, It is a big step forward. The list contains some more fileds so I cannot count directly from the last semicolon backwards as their number is not always equal.

The best trigger is the filed with the MB. This devides the file path form the rest. The next 5 or 6 fields are well formated and do not contain any semicolons. After field 6 can be any number of addtional fileds which may be different from line to line. These fileds are only "trash" and do not to be loaded.

c:\temp\Test\test\this is a regular entry.txt 015 ; 25 MB; 0; 01.01.2014; 04.01.2014; txt; 2013; Hello; no

c:\temp\Test\test\016; 26 MB; 1; 01.01.2014; 04.01.2014; csv; ;2013; Add; info; test, no; difficult

c:\temp\Test\test\this; is a ;regular entry.txt 017; 27 MB; 0; 01.01.2014; 04.01.2014; csv; 2013;

c:\temp\Test\test\this is; a; regular; entry.txt 018; 28 MB; 0; 01.01.2014; 04.01.2014; docx; 2013; this is; additional supprt which I apprecaite, Year;  Soluiton

I' ll use your tipp to go on and figure out the final solution.

rwunderlich
Partner Ambassador/MVP
Partner Ambassador/MVP

How about:

source:

LOAD

  FilePath

  ,SubField(Group2, ';', 1) as Filesize

  ,SubField(Group2, ';', 2) as WhatIsThis

  ,SubField(Group2, ';', 3) as StartDate

  ,SubField(Group2, ';', 4) as EndDate

;

LOAD

  *,

  mid(row,1,Field1Split-1) as FilePath

  ,mid(row,Field1Split+1) as Group2

  ;

load *

  ,index(TextBetween(row,'','MB'),';',-1) as Field1Split

inline [

row

c:\temp\Test\test\this is a regular entry.txt 015 ; 25 MB; 0; 01.01.2014; 04.01.2014

c:\temp\Test\test\016; 26 MB; 1; 01.01.2014; 04.01.2014

c:\temp\Test\test\this; is a ;regular entry.txt 017; 27 MB; 0; 01.01.2014; 04.01.2014

c:\temp\Test\test\this is; a; regular; entry.txt 018; 28 MB; 0; 01.01.2014; 04.01.2014

];


-Rob

http://robwunderlich.com

Not applicable
Author

Hi Rob

Thnanks. One step closer. Unfortunately does the following  line also affect corret MB strings in folder or file names. 

  ,index(TextBetween(row,'',' MB;'),';',-1) as Field1Split


I added a "[Blank] " in front and a ";" at the end to the index statement.

All works well with 1.1Mio rows loaded.

rwunderlich
Partner Ambassador/MVP
Partner Ambassador/MVP

Do you mean if the path contains ' MB;' as part of it's name? Can you post a sample?

-Rob

Not applicable
Author

Some addtional rows:

Row 1 (Import incorrect)

c:\temp\Test\test\this information contains MB; a; regular; entry.txt 019; 28 MB; 0; 01.01.2014; 04.01.2014;;Test;

Row 2

c:\temp\Test\test\this is; a; MB information regular; entry.txt 020; 28 MB; 0; 01.01.2014; 04.01.2014;;Test;

Row 3

c:\temp\Test\test\this is; alls ifnroamtion in MB\File_010; entry.txt 021; 28 MB; 0; 01.01.2014; 04.01.2014;;Test;

Row 1 Filter ' MB;' --> NOK

Row 2 Filter ' MB;' --> OK

Row 3 Filter ' MB;' --> OK