Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hello community,
I haven't done much with the Excel Transformation Wizard and now I got a file that needs a little clean up before loading.
In below screenshot on the left side is an example of my input file. The tricky part for me is how can I split column A into three different columns as shown in my desired output table on the right?
Description:
A GroupContainer can contain one or more Groups. This is done for each customer.
If it's possible to split the column into three different ones I could then delete the rows where column B is empty.
I hope you can help me out with this problem.
Attached I added the csv-version of the Excel file.
UPDATE: The words "Group" and "GroupContainer" are just placeholders for random text.
Thank you very much
You could do something like this:
t0:
load *, recno() as RecNo inline [
SplitTo3Columns Description Value1 Value2 Customer
GroupContainer1 16 17 A
Group1 16 17 A
X001 text1 16 17 A
X002 text2 0 0 A
X003 text3 0 0 A
X001 text1 0 0 A
GroupContainer2 1.565 344 A
Group2 1.565 344 A
X005 text 23 343 A
X006 text 1542 1 A
GroupContainer3 451 34.356 A
Group3 451 34.356 A
X005 text 232 123 A
X006 text 215 2 A
X007 text 4 34.231 A
Total 2.032 34.717 A
GroupContainer1 155 557 B
Group5 155 557 B
X001 text1 11 113 B
X002 text2 23 21 B
X003 text3 121 423 B
X001 text1 0 0 B
GroupContainer5 244 34 B
Group7 244 34 B
X005 text 243 11 B
X006 text 1 23 B
GroupContainer12 470 250 B
Group18 145 141 B
X005 text 128 32 B
X006 text 13 78 B
X007 text 4 31 B
Group19 325 109 B
X008 text 12 35 B
X010 text 313 74 B
Total 869 841 B
] (txt, delimiter is \t);
t1:
load
*,
if(len(Description) = 0 and wildmatch(SplitTo3Columns, '*Container*'), SplitTo3Columns, peek('GroupContainer')) as GroupContainer,
if(len(Description) = 0 and not wildmatch(SplitTo3Columns, '*Container*') and wildmatch(SplitTo3Columns, '*Group*'), SplitTo3Columns, peek('Group')) as Group,
if(len(Description) = 0, '', SplitTo3Columns) as Material
Resident t0 where SplitTo3Columns <> 'Total';
t2:
load *, rowno() as RowNo resident t1 where len(Material) > 0;
drop tables t0, t1;
- Marcus
I should have added that the names for GroupContainer and Group can be any name. So an if-stmt on the name won't work.
Are there any other characteristics which could be used to identify the data-type of this column - maybe Material is always numeric and the Container/Groups are strings or at least parts from them or something similar?
- Marcus
unfortunately the Material is also a String. It also does not always start with X0...
I only get as far as shown in below screenshot:
If I could grab the GroupContainer values and put them into a new column the empty lines based on column 2 could be deleted.
Then we need a load-step more by generating a datatype:
t0:
load *, recno() as RecNo inline [
SplitTo3Columns Description Value1 Value2 Customer
GroupContainer1 16 17 A
Group1 16 17 A
X001 text1 16 17 A
X002 text2 0 0 A
X003 text3 0 0 A
X001 text1 0 0 A
GroupContainer2 1.565 344 A
Group2 1.565 344 A
X005 text 23 343 A
X006 text 1542 1 A
GroupContainer3 451 34.356 A
Group3 451 34.356 A
X005 text 232 123 A
X006 text 215 2 A
X007 text 4 34.231 A
Total 2.032 34.717 A
GroupContainer1 155 557 B
Group5 155 557 B
X001 text1 11 113 B
X002 text2 23 21 B
X003 text3 121 423 B
X001 text1 0 0 B
GroupContainer5 244 34 B
Group7 244 34 B
X005 text 243 11 B
X006 text 1 23 B
GroupContainer12 470 250 B
Group18 145 141 B
X005 text 128 32 B
X006 text 13 78 B
X007 text 4 31 B
Group19 325 109 B
X008 text 12 35 B
X010 text 313 74 B
Total 869 841 B
] (txt, delimiter is \t);
t1:
load
*,
if(rowno() = 1, 1, if(len(Description) = 0 and len(peek('Description')) = 0, 2, if(len(Description) = 0, 1, 3))) as DataType,
if(len(Description) = 0, '', SplitTo3Columns) as Material
Resident t0 where SplitTo3Columns <> 'Total';
t2:
load
*,
if(DataType = 1, SplitTo3Columns, peek('GroupContainer')) as GroupContainer,
if(DataType = 2, SplitTo3Columns, peek('Group')) as Group
resident t1;
t3:
load *, rowno() as RowNo resident t2 where len(Material) > 0;
drop tables t0, t1, t2;
The red marked record is wrong in this logic because there is no Container above which caused wrong results - but I assume this way a copy & paste error for this example.
- Marcus
May be like this
Directory;
T1:
LOAD F6 as GroupContainer,
F7 as Group,
SplitTo3Columns as Material,
Description,
Value1,
Value2,
Customer
FROM
example.csv
(txt, codepage is 1252, embedded labels, delimiter is '\t', msq, filters(
ColXtr(1, RowCnd(CellValue, 1, StrCnd(contain, 'Container')), 0),
Replace(6, top, StrCnd(null)),
ColXtr(1, RowCnd(CellValue, 1, StrCnd(start, 'Group')), 0),
Replace(7, top, StrCnd(null)),
Remove(Row, RowCnd(CellValue, 1, StrCnd(contain, 'Group'))),
Remove(Row, RowCnd(CellValue, 1, StrCnd(contain, 'Total')))
));
Unfortunately "Group19" is no copy and paste error. See my initial post: "A GroupContainer can contain one or more Groups."
thanks for this but unfortunately the words "Group" and "GroupContianer" are just placeholders for any random text.
In this case there must be something which could identify this record. Above I mentioned it could be the datatype or some special char/word or maybe the length from container, group and material could be determined.
- Marcus