Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

LEFT JOIN and exponential growing of records

Hello there

I am trying to achieve the following: link to a weekly-updated selection of items some static datamaster information (think about price, weight etcetc), fetched from TWO different tables.

The script goes as following

BO:

LOAD

*,

today() as DayStamp

RESIDENT Lines_temp WHERE QtyUnAll>0 and DateInterpret<today();

DROP TABLE Lines_temp;

LEFT JOIN(BO)

LOAD

*

RESIDENT PartMaster;

LEFT JOIN(BO)

LOAD

*

RESIDENT PDMCTA;

The script goes well, but the outcomes leaves me baffled a bit. When I reaload the QVW, you can see what happens

BOreport.JPG

the weekly selection is 3k lines, however by LEFT JOINING the two datamaster tables, lines increase exponentially.

From what I have been reading and re-reading, with LEFT JOIN the outcome of lines should be equal to the amount of lines in the first table.

Can you help me out here?

thank you very much!

1 Solution

Accepted Solutions
Not applicable
Author

Ok, simply enough the "solution" to this problem is as follows:

BOreport.JPG

QV is fetching all the lines from the joinee tables, regardless of how many of those match the keys in the joiner table.

On the base of Jonathan Dienst example, I created the following

LOAD * Inline

[

          Key, Value1

          A, 1

          B, 3

          C, 2

];

Left Join

LOAD * Inline

[

          Key, Value2

          A, 5

          A, 7

          A, 9

          B, 3

          D, 5

];

Left Join

LOAD * Inline

[

          Key, Value3

          A, a

          A, a1

          A, a2

          A, a3

          B, b

          C, c

          D, d

          E, e

          F, f

          G, g

          H, h

          I, i

          J, l

          K, l1

          L, l2

          M, l4

          N, l5

          O, l6

          P, p1

          P, p2

          P, p3

];

as you see, the inline tables are made originally of 3, 5 and 21 lines

the script execution progress shows the following:

left join QC fetch.JPG

so the script is "fetching" at first everything regardless of duplicates, keys and so forth.

Then only the lines matching the Keys are kept; and duplicated in case the joinee tables have duplicates!

BOreportt4.JPG

My thought was that QV was fetching and keeping those 600k and more lines.

Meanwhile with "fetched" it is meant barely "read".

thank you all for your support

View solution in original post

6 Replies
jonathandienst
Partner - Champion III
Partner - Champion III

Hi

If you have duplicates in the key fields used for the right side of join, then there will be one copy for each of those duplicate values in the result.Here you have duplicates in both Partmaster and PDMCTA (of the fields used as the key for the join.

A SQL left join has the same behaviour.

Regards

Jonathan

Logic will get you from a to b. Imagination will take you everywhere. - A Einstein
Not applicable
Author

Please check, it seems that your left table's key field is not unique.

Regards,

Amit

jonathandienst
Partner - Champion III
Partner - Champion III

By way of example - the following script:

LOAD * Inline

[

          Key, Value1

          A, 1

          B, 3

          C, 2

];

Left Join

LOAD * Inline

[

          Key, Value2

          A, 5

          A, 7

          A, 9

          B, 3

          D, 5

];

Will have the following result table:

          Key             Value1          Value2

          A                    1                    5

          A                    1                    7

          A                    1                    9

          B                    3                    3

          C                    2                  null

Regards

Jonathan

Logic will get you from a to b. Imagination will take you everywhere. - A Einstein
Not applicable
Author

is there an easy way to check for duplicates?

I had the certainty that both PartMaster and PDMCTA had a huge amoun of UNIQUE keys in the key_field....but apprarently is not..!

==

never mind, I managed to find a way.

This a test of PartMaster, apparently there are just TWO duplicated keys, out of +500k keys available.

I still don't understand then why so many records are generated with the LEFT JOIN...

duplicates.JPG

Not applicable
Author

I checked also PDMCTA and there is a small extent of duplications, but we are talking about 0.5% or so of the total.

Still the jump from 3k lines to  680k lines I think is not explained

Not applicable
Author

Ok, simply enough the "solution" to this problem is as follows:

BOreport.JPG

QV is fetching all the lines from the joinee tables, regardless of how many of those match the keys in the joiner table.

On the base of Jonathan Dienst example, I created the following

LOAD * Inline

[

          Key, Value1

          A, 1

          B, 3

          C, 2

];

Left Join

LOAD * Inline

[

          Key, Value2

          A, 5

          A, 7

          A, 9

          B, 3

          D, 5

];

Left Join

LOAD * Inline

[

          Key, Value3

          A, a

          A, a1

          A, a2

          A, a3

          B, b

          C, c

          D, d

          E, e

          F, f

          G, g

          H, h

          I, i

          J, l

          K, l1

          L, l2

          M, l4

          N, l5

          O, l6

          P, p1

          P, p2

          P, p3

];

as you see, the inline tables are made originally of 3, 5 and 21 lines

the script execution progress shows the following:

left join QC fetch.JPG

so the script is "fetching" at first everything regardless of duplicates, keys and so forth.

Then only the lines matching the Keys are kept; and duplicated in case the joinee tables have duplicates!

BOreportt4.JPG

My thought was that QV was fetching and keeping those 600k and more lines.

Meanwhile with "fetched" it is meant barely "read".

thank you all for your support