Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
I am doing a inner join for multiple tables.
One table is having 21000 records and second table is having 22700 records
Both tables have one common column name. While doing inner join the resulting table is having 599k records. Ideally it should have <=22700 records but in reality it's cross multiplying. don't know where is the issue.
inner join([<table>])
Load
fields..
from <table1>
inner join([<table>])
Load
fields..
from <table2>
inner join([<table>])
Load
fields..
from <table3>
Since you didn't include the actual code, it's hard to guess where the issue is. I do note that you have three tables, not two, but your original scenario only describes two tables. Perhaps the third table is the issue? Alternatively, perhaps the tables have a relationship that results in multiple instances of the key field value, e.g.
Table1:
Load * INline [
Field1, Field2
A, 1
A, 2
B, 3
C, 4];
INNER JOIN
Load * INLINE [
Field1, Field3
A, 10
A, 11
A, 12
B, 20
C, 30];
This will result in 6 lines for Field1 = A, one for each combination of Field2 and Field3.
Hello,
As a general rule, when a join does not work correctly, this is due to the name of the field which is not strictly the same or when the join is done on several fields without that being wanted. To answer more precisely, it will take the contents of the script.
As Or indicates, you also have to take into account the cardinality of your tables and the "duplicates" row that there may be.
My code. Please advise
[REPORT]:
LOAD
plan,
plan_name,
plan_desc,
plan_location
FROM [plan.xlsx]
(ooxml, embedded labels, table is [plan_detail])
where Match(plan,204243, 15138, 204850, 15572, 15589, 201403, 204829, 210395);
inner Join ([REPORT])
LOAD
plan,
plan_title,
plan_staff,
plann_detail
FROM [plan.xlsx]
(ooxml, embedded labels, table is [plan_detail])
inner Join ([REPORT])
LOAD
plan,
plan_emp,
plan_emp_loc,
plan_emp_code,
plan_emp_age
FROM [plan_worker.xlsx]
(ooxml, embedded labels, table is Workers);
Am I doing right way or missing something here. If I read all tables separately, getting right number of records but when joining getting a very high number of records
If I had to guess, you have one line per plan in the first table, multiple lines per plan in the second table, and multiple lines per plan in the third table. Since you are joining both the second and third tables together with the first, this results in a many to many relationship. Of course, that's just a guess based on the field names and without access to the underlying data.
To be sure of the cardinality of your tables, you should load them without a join (thanks to a qualify or by loading only one) and look in the Data model viewer if your tables have duplicate key values.
Yes. You are right
Table has duplicate key value but the following values are different and making it unique record. Now it's becoming many-to-many relationship and so leading to a larger set of records. I can see them when I do a concatenate rather than a join. So to avoid many-to-many relationship, what is the best solution here?
In this scenario, typically what you want to do is simply Keep instead of Join. Unlike SQL, you don't have to merge everything into one "query" here.