Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
I try to laod a semicolon delimited file with several thousands rows and get in trouble with a filed containing the file path and file name. Unfortunately do some file have semicolons in their filenames what mixes up whole imported list.
Source File (Example)
c:\temp\Test\test\this is a regular entry.txt 015 ; 25 MB; 0; 01.01.2014; 04.01.2014
c:\temp\Test\test\016; 26 MB; 1; 01.01.2014; 04.01.2014
c:\temp\Test\test\this; is a ;regular entry.txt 017; 27 MB; 0; 01.01.2014; 04.01.2014
c:\temp\Test\test\this is; a; regular; entry.txt 018; 28 MB; 0; 01.01.2014; 04.01.2014
I've tryed to fix this issue in the load script but need some help.
The folloing "if statements" can handle one semicolon in the file name. But this is static. It fails with only one additional semicolon in the file name. The source file can contain file paths with up to ten semicolons and it cannot be requested with another delimiter.
LOAD
RowNo() as Row, |
IF (Right(@3,3) = ' MB', @1 & @2, 'xxx') as [Full Path],
IF (Right(@3,3) = ' MB', @3, 'xxx') as [Size MB],
@4 as [Folder],
@5 as [Date 1],
@6 as [Date 2]
Thank for any ideas.
try this: load the row as one field and then split using ; as separator starting from the end of the row
RESULT
SCRIPT
source:
load * inline [
row
c:\temp\Test\test\this is a regular entry.txt 015 ; 25 MB; 0; 01.01.2014; 04.01.2014
c:\temp\Test\test\016; 26 MB; 1; 01.01.2014; 04.01.2014
c:\temp\Test\test\this; is a ;regular entry.txt 017; 27 MB; 0; 01.01.2014; 04.01.2014
c:\temp\Test\test\this is; a; regular; entry.txt 018; 28 MB; 0; 01.01.2014; 04.01.2014
];
final:
load
*,
SubField(newrow, ';', 1) as field1,
SubField(newrow, ';', 2) as field2,
SubField(newrow, ';', 3) as field3,
SubField(newrow, ';', 4) as field4,
SubField(newrow, ';', 5) as field5
;
load
index(row, ';',-4),
replace(left(row, index(row, ';',-4)-1), ';', '')
& Right(row, len(row) - index(row, ';',-4) +1) as newrow,
*
resident source;
DROP table source;
Hi Massimo
Thanks, It is a big step forward. The list contains some more fileds so I cannot count directly from the last semicolon backwards as their number is not always equal.
The best trigger is the filed with the MB. This devides the file path form the rest. The next 5 or 6 fields are well formated and do not contain any semicolons. After field 6 can be any number of addtional fileds which may be different from line to line. These fileds are only "trash" and do not to be loaded.
c:\temp\Test\test\this is a regular entry.txt 015 ; 25 MB; 0; 01.01.2014; 04.01.2014; txt; 2013; Hello; no
c:\temp\Test\test\016; 26 MB; 1; 01.01.2014; 04.01.2014; csv; ;2013; Add; info; test, no; difficult
c:\temp\Test\test\this; is a ;regular entry.txt 017; 27 MB; 0; 01.01.2014; 04.01.2014; csv; 2013;
c:\temp\Test\test\this is; a; regular; entry.txt 018; 28 MB; 0; 01.01.2014; 04.01.2014; docx; 2013; this is; additional supprt which I apprecaite, Year; Soluiton
I' ll use your tipp to go on and figure out the final solution.
How about:
source:
LOAD
FilePath
,SubField(Group2, ';', 1) as Filesize
,SubField(Group2, ';', 2) as WhatIsThis
,SubField(Group2, ';', 3) as StartDate
,SubField(Group2, ';', 4) as EndDate
;
LOAD
*,
mid(row,1,Field1Split-1) as FilePath
,mid(row,Field1Split+1) as Group2
;
load *
,index(TextBetween(row,'','MB'),';',-1) as Field1Split
inline [
row
c:\temp\Test\test\this is a regular entry.txt 015 ; 25 MB; 0; 01.01.2014; 04.01.2014
c:\temp\Test\test\016; 26 MB; 1; 01.01.2014; 04.01.2014
c:\temp\Test\test\this; is a ;regular entry.txt 017; 27 MB; 0; 01.01.2014; 04.01.2014
c:\temp\Test\test\this is; a; regular; entry.txt 018; 28 MB; 0; 01.01.2014; 04.01.2014
];
-Rob
Hi Rob
Thnanks. One step closer. Unfortunately does the following line also affect corret MB strings in folder or file names.
,index(TextBetween(row,'',' MB;'),';',-1) as Field1Split
I added a "[Blank] " in front and a ";" at the end to the index statement.
All works well with 1.1Mio rows loaded.
Do you mean if the path contains ' MB;' as part of it's name? Can you post a sample?
-Rob
Some addtional rows:
Row 1 (Import incorrect)
c:\temp\Test\test\this information contains MB; a; regular; entry.txt 019; 28 MB; 0; 01.01.2014; 04.01.2014;;Test;
Row 2
c:\temp\Test\test\this is; a; MB information regular; entry.txt 020; 28 MB; 0; 01.01.2014; 04.01.2014;;Test;
Row 3
c:\temp\Test\test\this is; alls ifnroamtion in MB\File_010; entry.txt 021; 28 MB; 0; 01.01.2014; 04.01.2014;;Test;
Row 1 Filter ' MB;' --> NOK
Row 2 Filter ' MB;' --> OK
Row 3 Filter ' MB;' --> OK