Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
daumkep
Contributor II
Contributor II

Join and calculate on Multiple Excel Sheets

Dear all,

I have an Excel File with two sheets.

Sheet1 contains person names and attributes (a person can have 0-n attributes)

NameAttribute
Marcmale
Marcblond
Jennyfemale
Johnblond
Alex 
Jeff 

 

Sheet2 contains all available attributes:

Attributes
male
female
blond
red

 

I now want to create a table that tells me the attribute distribution. If the attribute is not named for a person, it can be assumed that the person does not have it. The result should look like this:

AttributeDistribution
male0.2
female0.2
blond0.4
red0

 

I already tried a lot with joins and count(total Name) etc., but just can't get it right. Could somebody help me with it?

Thanks a lot

Philipp

3 Replies
Yuki_Suzuki
Employee
Employee

How about this?

 

Load script

LOAD
Namae,
"Attribute"
FROM [lib://????/Demo.xlsx]
(ooxml, embedded labels, table is Tabelle1);

LOAD
Attributes as "Attribute"
FROM [lib://????/Demo.xlsx]
(ooxml, embedded labels, table is Tabelle2);

 

Then make a table as below.

Dimension : Attribute

Measure: count(distinct Namae)/count(distinct total Namae)

 

daumkep
Contributor II
Contributor II
Author

Dear Yuz,

thanks a lot for this option. Is there also a chance to calculate the proposed table within the Load Script?

Thanks a lot

Philipp

Yuki_Suzuki
Employee
Employee

Philipp,

Yes you can but it will be a little complicated procedure. First, find the number of Unique Namae and the number of loads per attribute. Finally, create a table with calculating using them.

Tabelle1:
Load
Namae,
Attribute
FROM [lib://downloads/Demo.xlsx]
(ooxml, embedded labels, table is Tabelle1);

//How many Unique Namae as max Rowno
Count1:
Load
RowNo() as No,
Namae,
count(Namae) as dummy
Resident Tabelle1 group by Namae;

Let vNamaeCount = Num(Peek('No', -1, 'Count1')); //Set the last No of Count1 to vNameCount

//How many times each attribute is attached
Count2:
Load
Attribute,
count(Attribute) as AttrCount
Resident Tabelle1 group by Attribute;

//Make a table with calculations
Tabelle2:
LOAD
Attributes as Attribute
FROM [lib://downloads/Demo.xlsx]
(ooxml, embedded labels, table is Tabelle2);

outer join (Tabelle2) LOAD
Attribute,
AttrCount,
AttrCount / $(vNamaeCount) as Distribution
Resident Count2;

//Delete unnecessary tables
drop tables Count1, Count2;

I hope this will help. 

Regards,