Skip to main content
Announcements
Qlik Connect 2024! Seize endless possibilities! LEARN MORE
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Duplication Issue when joining tables

Hi,

I am relatively new to QlikView and am having an issue when joining two tables.   In my load script, I have an orders table and a shipment table.  From orders, I am creating a sum of an order quantity field and receipt quantity field.  In my shipment table, I am creating a sum of a shipped quantity field.  Seperate, both work perfect.  However, when I join the tables, I am getting a duplication.  I have been able to identify why this is happening, from a data perspective, but have not been able to resolove it in QV. 

The reason behind the duplication is that on the shipment table, I have the following data:

ORDER NO

LINE NO

INVOICE NO

ENTRY NO

I am joining the two tables on ORDER NO and LINE NO, which in 99% of the orders, it works fine.  However, there are times when an order line # can have multiple shipment records in the shipment table, with distinct entry/invoice #'s.  For example:

SHIPMENT:

ORDER_NO     LINE_NO     ENTRY_NO     INVOICE_NO     SHIP QTY

69589              2                 1719820-4        SSL228679        150

69589              2                 1719821-2        SSL228678          50

69589              1                 1720225-3        SSL228669        150

69589              1                  1720224-6       SSL228677         250

For that ORDER #, the orders table looks like this:

ORDER:

ORDER_NO     LINE_NO     ORDER QTY    RECEIPT QTY

69589              2                  200                 100

69589              1                  500                 400

When I left join orders to shipment, the results I am getting are duplicating on the order side.  This is happening because the LINE NO values in shipment do not match exactly with order, as there can be multiple shipments from an order line.  Post join, there is a duplicate row of order.line_no created for line 1 and 2 and it duplicates the order qty and receipt qty.

ORDER_NO     LINE_NO     ORDER QTY    RECEIPT QTY

69589              2                  200                 100

69589              2                  200                 100

69589              1                  500                 400

69589              1                  500                 400

Essentially, on the order side I am getting a doubled sum total for order and receipt qty because of the multiple shiment lines with distict entry and invoice values. 

Can anyone offer any insight into a fix in my scripting?

1 Solution

Accepted Solutions
Not applicable
Author

brownoae wrote:

Entry no and invoice no need added for an additional join to an invoice table, which I have built seperately.  The dashboard contains a pivot that shows summed order qty, receipt qty, shipped qty, and entry amount for a fiscal month year.  In order to get this all included in the table there are a number of tables that need to be included, invioce being one which is the reason for need of entry and invoice no's to be added.

Biggest mistake in your script is a concatination of Orders and Shipments. You are loosing relationship between orders and shipments. Download the fixed script and reload it.

To avoid synthetic key you can concatinate columns

PO & '-' & [PO Row] as OrderLineId

in OrderPrep and ShipPrep tables. But this is optional in your situation.

View solution in original post

10 Replies
Not applicable
Author

Hi there, joining is not the path you should take for this particular problem. I always use the concatenation approach when I found my data model having multiple fact tables, for instance:

Transactions:

//Load of Shipment table

Load 'Shipments' as TransactionsFlagDescription, 1 as TransactionsFlagID, ORDER_NO ,    LINE_NO ,    ENTRY_NO ,    INVOICE_NO ,    SHIP QTY

from Shipment;

Orders:

concatenate(Transactions)

Load 'Orders' as TransactionsFlagDescription, 2 as TransactionsFlagID, ORDER_NO,     LINE_NO,     ORDER QTY,    RECEIPT QTY

from Orders;

Notice you are creating a single fact table, containing in the first portion the shipments and in the second portion the orders. Also, I added a TransactionFlag to each table, so you can easily filter through set analysis in your expressions, like this:

sum({$<TransactionFlagID = {1}>} [SHIP QTY])

or

sum({$<TransactionFlagID = {2}>} [ORDER QTY])

In this case, filtering with set analysis is not necessary, however I always use it for sanity

Regards

Not applicable
Author

There is no any duplications here. There is a granularity level to separate shipment entries. If you don't need this granularity you don't need to load shipment at all or load it as aggregated data for combination of ORDER_NO  and   LINE_NO

Otherwise you need to use data in your charts in proper way like:

Orders:

ORDER_NO     LINE_NO     ORDER QTY    RECEIPT QTY

69589              2                  200                 100

69589              1                  500                 400

Shipments:

ORDER_NO    ENTRY_NO     INVOICE_NO     SHIP QTY

69589              1719820-4        SSL228679        150

69589              1719821-2        SSL228678          50

69589              1720225-3        SSL228669        150

69589              1720224-6       SSL228677         250

Orders vs. Shipments:

ORDER_NO     LINE_NO     ORDER QTY    RECEIPT QTY SHIP QTY

69589              2                  200                 100                         as sum(SHIP QTY)

69589              1                  500                 400

See attachment

Not applicable
Author

Nick,

I can see your point, however when I add entry no and invoice no to the "orders vs shipments" table, I am getting duplicated order qty and receipt qty values because of the additional line no's in shipment.

Not applicable
Author

Right. This is drill down level for shipment detalisation. There is nothing wrong with it. It tells you that the invoice-no/entry-no belongs to the order/line combination where you can see a quantity for whole line in the order. It's not a duplication.

Not applicable
Author

Will I then be ok to add in the distinct invoice and entry no's by creating a join them to the new fact table?

Not applicable
Author

What for? First of all you need to understand what is a goal of this dashboard. What do you need to do? Can you tell us what kind of analysis you are going to do?

Not applicable
Author

Ivan,

The fact table looks good, however when I try to join the remainder of the shipments table (entry no, invoice no), which I will need later, I still get the additional order rows.  Essentially, there are two order detail rows for order 69589 with two order qty value and two receipt qty values.  Those need to be summed as "order qty" and "receipt qty".  Then on the shipment record, there are two shipment records for order row 1 and two for order row 2.  Those need to be summed as "shipped qty".  Due to the mismatch in rows between the two tables, it is adding an additional row for each line no on order detail and copying the qty values from the original row no, which is doubling my sums on order detail.  the shipment sum is correct however.

Not applicable
Author

Entry no and invoice no need added for an additional join to an invoice table, which I have built seperately.  The dashboard contains a pivot that shows summed order qty, receipt qty, shipped qty, and entry amount for a fiscal month year.  In order to get this all included in the table there are a number of tables that need to be included, invioce being one which is the reason for need of entry and invoice no's to be added.

Not applicable
Author

brownoae wrote:

Ivan,

The fact table looks good, however when I try to join the remainder of the shipments table (entry no, invoice no), which I will need later, I still get the additional order rows.  Essentially, there are two order detail rows for order 69589 with two order qty value and two receipt qty values.  Those need to be summed as "order qty" and "receipt qty".  Then on the shipment record, there are two shipment records for order row 1 and two for order row 2.  Those need to be summed as "shipped qty".  Due to the mismatch in rows between the two tables, it is adding an additional row for each line no on order detail and copying the qty values from the original row no, which is doubling my sums on order detail.  the shipment sum is correct however.

You need to understand difference between data duplication in a data structure and duplication on presentation level.

orders.png

On "Order vs. Shipment" chart you can see some duplications for "Order Qty" and "Receipt Qty". But total numbers are correct. It happens because QV just shows aggregated numbers for each combination of dimension values. All what you need is properly interprete what you want to see or understand what you see. For example, here, it will be more correct to rename columns "Order Qty" and "Receipt Qty" as "Total Qty for Order Line" and "Total Qty for Receipt Line"

See the example in attachment