Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Transform Excel (split columns)

Hello community,

I haven't done much with the Excel Transformation Wizard and now I got a file that needs a little clean up before loading.

In below screenshot on the left side is an example of my input file. The tricky part for me is how can I split column A into three different columns as shown in my desired output table on the right?

Description:

A GroupContainer can contain one or more Groups. This is done for each customer.

If it's possible to split the column into three different ones I could then delete the rows where column B is empty.

example.PNG

I hope you can help me out with this problem.

Attached I added the csv-version of the Excel file.

UPDATE: The words "Group" and "GroupContainer" are just placeholders for random text.

Thank you very much

9 Replies
marcus_sommer

You could do something like this:

t0:

load *, recno() as RecNo inline [

SplitTo3Columns    Description    Value1    Value2    Customer

GroupContainer1        16    17    A

Group1        16    17    A

X001    text1    16    17    A

X002    text2    0    0    A

X003    text3    0    0    A

X001    text1    0    0    A

GroupContainer2        1.565    344    A

Group2        1.565    344    A

X005    text    23    343    A

X006    text    1542    1    A

GroupContainer3        451    34.356    A

Group3        451    34.356    A

X005    text    232    123    A

X006    text    215    2    A

X007    text    4    34.231    A

Total        2.032    34.717    A

GroupContainer1        155    557    B

Group5        155    557    B

X001    text1    11    113    B

X002    text2    23    21    B

X003    text3    121    423    B

X001    text1    0    0    B

GroupContainer5        244    34    B

Group7        244    34    B

X005    text    243    11    B

X006    text    1    23    B

GroupContainer12        470    250    B

Group18        145    141    B

X005    text    128    32    B

X006    text    13    78    B

X007    text    4    31    B

Group19        325    109    B

X008    text    12    35    B

X010    text    313    74    B

Total        869    841    B

] (txt, delimiter is \t);

t1:

load

    *,

    if(len(Description) = 0 and wildmatch(SplitTo3Columns, '*Container*'), SplitTo3Columns, peek('GroupContainer')) as GroupContainer,

    if(len(Description) = 0 and not wildmatch(SplitTo3Columns, '*Container*') and wildmatch(SplitTo3Columns, '*Group*'), SplitTo3Columns, peek('Group')) as Group,

    if(len(Description) = 0, '', SplitTo3Columns) as Material

Resident t0 where SplitTo3Columns <> 'Total';

t2:

load *, rowno() as RowNo resident t1 where len(Material) > 0;

drop tables t0, t1;

- Marcus

Not applicable
Author

I should have added that the names for GroupContainer and Group can be any name. So an if-stmt on the name won't work.

marcus_sommer

Are there any other characteristics which could be used to identify the data-type of this column - maybe Material is always numeric and the Container/Groups are strings or at least parts from them or something similar?

- Marcus

Not applicable
Author

unfortunately the Material is also a String. It also does not always start with X0...

I only get as far as shown in below screenshot:

x.PNG

If I could grab the GroupContainer values and put them into a new column the empty lines based on column 2 could be deleted.

marcus_sommer

Then we need a load-step more by generating a datatype:

t0:

load *, recno() as RecNo inline [

SplitTo3Columns    Description    Value1    Value2    Customer

GroupContainer1        16    17    A

Group1        16    17    A

X001    text1    16    17    A

X002    text2    0    0    A

X003    text3    0    0    A

X001    text1    0    0    A

GroupContainer2        1.565    344    A

Group2        1.565    344    A

X005    text    23    343    A

X006    text    1542    1    A

GroupContainer3        451    34.356    A

Group3        451    34.356    A

X005    text    232    123    A

X006    text    215    2    A

X007    text    4    34.231    A

Total        2.032    34.717    A

GroupContainer1        155    557    B

Group5        155    557    B

X001    text1    11    113    B

X002    text2    23    21    B

X003    text3    121    423    B

X001    text1    0    0    B

GroupContainer5        244    34    B

Group7        244    34    B

X005    text    243    11    B

X006    text    1    23    B

GroupContainer12        470    250    B

Group18        145    141    B

X005    text    128    32    B

X006    text    13    78    B

X007    text    4    31    B

Group19        325    109    B

X008    text    12    35    B

X010    text    313    74    B

Total        869    841    B

] (txt, delimiter is \t);

t1:

load

    *,

    if(rowno() = 1, 1, if(len(Description) = 0 and len(peek('Description')) = 0, 2, if(len(Description) = 0, 1, 3))) as DataType,

    if(len(Description) = 0, '', SplitTo3Columns) as Material

Resident t0 where SplitTo3Columns <> 'Total';

t2:

load

    *,

    if(DataType = 1, SplitTo3Columns,  peek('GroupContainer')) as GroupContainer,

    if(DataType = 2, SplitTo3Columns,  peek('Group')) as Group

resident t1;

t3:

load *, rowno() as RowNo resident t2 where len(Material) > 0;

drop tables t0, t1, t2;

The red marked record is wrong in this logic because there is no Container above which caused wrong results - but I assume this way a copy & paste error for this example.

- Marcus

settu_periasamy
Master III
Master III

May be like this

Directory;
T1:
LOAD F6 as GroupContainer,
F7 as Group,
SplitTo3Columns as Material,
Description,
Value1,
Value2,
Customer    
FROM
example.csv
(
txt, codepage is 1252, embedded labels, delimiter is '\t', msq, filters(
ColXtr(1, RowCnd(CellValue, 1, StrCnd(contain, 'Container')), 0),
Replace(6, top, StrCnd(null)),
ColXtr(1, RowCnd(CellValue, 1, StrCnd(start, 'Group')), 0),
Replace(7, top, StrCnd(null)),
Remove(Row, RowCnd(CellValue, 1, StrCnd(contain, 'Group'))),
Remove(Row, RowCnd(CellValue, 1, StrCnd(contain, 'Total')))
));

Capture.JPG

Not applicable
Author

Unfortunately "Group19" is no copy and paste error. See my initial post: "A GroupContainer can contain one or more Groups."

Not applicable
Author

thanks for this but unfortunately the words "Group" and "GroupContianer" are just placeholders for any random text.

marcus_sommer

In this case there must be something which could identify this record. Above I mentioned it could be the datatype or some special char/word or maybe the length from container, group and material could be determined.

- Marcus