Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Dear all,
I have an Excel File with two sheets.
Sheet1 contains person names and attributes (a person can have 0-n attributes)
Name | Attribute |
Marc | male |
Marc | blond |
Jenny | female |
John | blond |
Alex | |
Jeff |
Sheet2 contains all available attributes:
Attributes |
male |
female |
blond |
red |
I now want to create a table that tells me the attribute distribution. If the attribute is not named for a person, it can be assumed that the person does not have it. The result should look like this:
Attribute | Distribution |
male | 0.2 |
female | 0.2 |
blond | 0.4 |
red | 0 |
I already tried a lot with joins and count(total Name) etc., but just can't get it right. Could somebody help me with it?
Thanks a lot
Philipp
How about this?
Load script
LOAD
Namae,
"Attribute"
FROM [lib://????/Demo.xlsx]
(ooxml, embedded labels, table is Tabelle1);
LOAD
Attributes as "Attribute"
FROM [lib://????/Demo.xlsx]
(ooxml, embedded labels, table is Tabelle2);
Then make a table as below.
Dimension : Attribute
Measure: count(distinct Namae)/count(distinct total Namae)
Dear Yuz,
thanks a lot for this option. Is there also a chance to calculate the proposed table within the Load Script?
Thanks a lot
Philipp
Philipp,
Yes you can but it will be a little complicated procedure. First, find the number of Unique Namae and the number of loads per attribute. Finally, create a table with calculating using them.
Tabelle1:
Load
Namae,
Attribute
FROM [lib://downloads/Demo.xlsx]
(ooxml, embedded labels, table is Tabelle1);
//How many Unique Namae as max Rowno
Count1:
Load
RowNo() as No,
Namae,
count(Namae) as dummy
Resident Tabelle1 group by Namae;
Let vNamaeCount = Num(Peek('No', -1, 'Count1')); //Set the last No of Count1 to vNameCount
//How many times each attribute is attached
Count2:
Load
Attribute,
count(Attribute) as AttrCount
Resident Tabelle1 group by Attribute;
//Make a table with calculations
Tabelle2:
LOAD
Attributes as Attribute
FROM [lib://downloads/Demo.xlsx]
(ooxml, embedded labels, table is Tabelle2);
outer join (Tabelle2) LOAD
Attribute,
AttrCount,
AttrCount / $(vNamaeCount) as Distribution
Resident Count2;
//Delete unnecessary tables
drop tables Count1, Count2;
I hope this will help.
Regards,