Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
datanibbler
Champion
Champion

General How-to_question: Link table on speed...


Hi,

in one of my apps - the HR_app - there are quite some tables - data loaded from several tables in the database and for reasons of clarity - to keep the different charts in the app somehow separate - I have it organized in a nr. of tables

BUT:

=> All of those tables share 4 or 5 common fields:

- A date (usually) to link with the master_calendar

- a personell_ID (which is unique)

- A plant_ID (loaded from a mapping_table)

- A super_area (mapping)

- A sub_area (mapping).

=> Up to now, I have avoided synthetic keys by always renaming all of those fields. However, that does not seem to be ideal.

=> I would thus need a link_table. That would not be awfully big - well, about 300k records I guess - to link all of those tables, approx. 8 or 10.

I generally know how to create a link table, just haven't done it before - so I wanted to ask first, that often helps to avoid common pitfalls.

=> Is there anything different from creating a link_table between just two tables to keep in mind?

Thanks a lot!

Best regards,

DataNibbler

1 Solution

Accepted Solutions
Not applicable

To explain deeper the purpose:

It is better to link the different tables with a field that is an integer (like SQL)

It is complicated to determine by yourself a unique ID number.

If you use AutoNumberHash256 (or 128) that will do it for you. You give an input of several dimensions, it returns to you a unique integer. For the second table, whatever the order, the same value of the dimensions will return you the same integer => you can do the link. See the order in the XLS file that I joined.

 

Table1:
LOAD AutonumberHash256(Month, Country, Product) as ID,
Units
FROM [100161.xlsx](ooxml, embedded labels, table is Feuil1);

Table2:
LOAD AutonumberHash256(Month, Country, Product) as ID,

Amount
FROM

[100161.xlsx](ooxml, embedded labels, table is Feuil2);

 

For each Feuille in 'Feuil1', 'Feuil2'

TableDim:
LOAD DISTINCT

AutonumberHash256(Month, Country, Product) as ID,
Month,
Country,
Product
FROM [100161.xlsx](ooxml, embedded labels, table is $(Feuille));
Next Feuille

To get this model:

Answer_100161.JPG

And the ID are integer.

See the qvw and the XLSX files.

Fabrice

View solution in original post

6 Replies
Not applicable

I would use one of the Autonumber() functions with the different fields you have.

The data table would have only the id number and the link table would have this id and the different keys.

Fabrice

datanibbler
Champion
Champion
Author


Hi Fabrice,

that sounds very good.

I'll definitely put that on my TODO-list - as it will make my app more elegant and probably smaller, but not add any new functionality, however, it will have to wait.

I don't quite understand just now, but as I don't think I will find time to do this so soon, I'll just be back here at the time.

Thanks a lot!

Best regards,

DataNibbler

Not applicable

To explain deeper the purpose:

It is better to link the different tables with a field that is an integer (like SQL)

It is complicated to determine by yourself a unique ID number.

If you use AutoNumberHash256 (or 128) that will do it for you. You give an input of several dimensions, it returns to you a unique integer. For the second table, whatever the order, the same value of the dimensions will return you the same integer => you can do the link. See the order in the XLS file that I joined.

 

Table1:
LOAD AutonumberHash256(Month, Country, Product) as ID,
Units
FROM [100161.xlsx](ooxml, embedded labels, table is Feuil1);

Table2:
LOAD AutonumberHash256(Month, Country, Product) as ID,

Amount
FROM

[100161.xlsx](ooxml, embedded labels, table is Feuil2);

 

For each Feuille in 'Feuil1', 'Feuil2'

TableDim:
LOAD DISTINCT

AutonumberHash256(Month, Country, Product) as ID,
Month,
Country,
Product
FROM [100161.xlsx](ooxml, embedded labels, table is $(Feuille));
Next Feuille

To get this model:

Answer_100161.JPG

And the ID are integer.

See the qvw and the XLSX files.

Fabrice

Not applicable

First of all, remove unused fields from data model. Concatenate the fact tables and create one big fact tables and it helps a lot in the calculation times.

Please post your application with some data so our experts will help you a lot

datanibbler
Champion
Champion
Author

Hi Fabrice,

now I get the picture. So that ID generated by AutoNumber will simply take the place of the compound_key that I have to use to link the different data_tables to my linking_table.

I just thought the AutoNumber() functions were dependent on the load order - so I would need to use a RESIDENT LOAD of every table so I can sort them and make sure the emps come in the same order.

I will read up on that once more. Generally, that's just perfect.

Thanks a lot!

Best regards,

DataNibbler

datanibbler
Champion
Champion
Author

Hi dathu.qv,

you're right. Concatenating it all and having only one big table in the background makes the whole thing a lot faster, of course.

I will consider both methods - a linking_table and making it all into one - when I find the time.

For now, I have to do it like I have it. There are too many other things waiting.