Skip to main content
Announcements
Join us at Qlik Connect for 3 magical days of learning, networking,and inspiration! REGISTER TODAY and save!
cancel
Showing results for 
Search instead for 
Did you mean: 
t-gilchrist
Contributor III
Contributor III

Removing duplicate rows from an internal table based on one column

Hello all,

I have two separate tables I'm loading from an SQL database in which my only option is to associate the data by using IntervalMatch with the start and end order times from one table with the downtime start in the other.

My issues is there are a few instances where two orders can be running at the same time. So IntervalMatch correctly associates a single downtime start to two Orders.

I want to remove every second instance of a downtime start leaving only the association with the first order.

Below is an example of my script so far...

[Orders]:

Select

    Order,

    SpecName,

    ActualOrderStartTime,

    ActualOrderEndTime,

    Equipment,

    EquipmentNo

FROM MYPlant.dbo."Orders";

[Downtime]:

Select

    Equipment,

    EquipmentNo,

    CategoryName,

    ReasonName,

    DowntimeStart,

    DowntimeEnd

FROM MYPlant.dbo."Downtime";

X1Temp:

Left Join (Downtime)

IntervalMatch(DowntimeStart, EquipmentNo)

Load ActualOrderStartTime, ActualOrderEndTime, EquipmentNo

Resident [Orders];

Thanks for any help in advance

1 Solution

Accepted Solutions
jensmunnichs
Creator III
Creator III

Alright this is breaking my brain a little bit, because IntervalMatch always does and the fact that it's a friday probably isn't helping. So forgive me if I make a mistake in this.

Correct me if I'm wrong, but the table you create with this script is something like this (I left some fields out):

OrderActualOrderStartTimeActualOrderEndTimeDowntimeStartEquipment
A01:00:0001:30:0001:15:00a
B01:14:0001:31:0001:15:00a
C02:30:0002:32:0002:31:00b
D03:00:0004:00:0003:34:00c
D03:00:0004:00:0003:57:00d
E03:50:0004:10:0003:57:00d

And from this you would like:

OrderActualOrderStartTimeActualOrderEndTimeDowntimeStartEquipment
A01:00:0001:30:0001:15:00a
C02:30:0002:32:0002:31:00b
D03:00:0004:00:0003:34:00c
D03:00:0004:00:0003:57:00d

Assuming that the order that started first has to stay, and one order can be associated with multiple DowntimeStarts but not vice versa.

If you haven't joined the 2 tables resulting from the intervalmatch together yet, just do so by adding this to your script from the OP:

Inner join (Downtime) //'Inner' assumes you want to drop all orders/downtimes that have no match

LOAD *

Resident Orders;

Drop table Orders;

After that, add:

Right join (Downtime)

LOAD

     Equipment,

     Min(ActualOrderStartTime) as ActualOrderStartTime

Resident Downtime

Group by Equipment;

Note: If one Equipment can have multiple DowntimeStarts, and you want 1 order per DowntimeStart/Equipment combination, add DowntimeStart to the Right Join Load, and to the group by statement as well.

I hope this does what you want it to, sorry for the late reply and I apologise if I misinterpreted something or made a mistake somewhere. If you need more help let me know

View solution in original post

4 Replies
jensmunnichs
Creator III
Creator III

This might help:

Load data without duplicate records - multiple columns

Keep in mind, if you want to keep the one that started first, you'll have to use Min() instead of Max().

t-gilchrist
Contributor III
Contributor III
Author

Thanks for the reply

Through my googling I've seen similar posts solving issues with "group by" but I've not been able to apply it correctly for my situation.

How and where do you believe it should be used?

jensmunnichs
Creator III
Creator III

Alright this is breaking my brain a little bit, because IntervalMatch always does and the fact that it's a friday probably isn't helping. So forgive me if I make a mistake in this.

Correct me if I'm wrong, but the table you create with this script is something like this (I left some fields out):

OrderActualOrderStartTimeActualOrderEndTimeDowntimeStartEquipment
A01:00:0001:30:0001:15:00a
B01:14:0001:31:0001:15:00a
C02:30:0002:32:0002:31:00b
D03:00:0004:00:0003:34:00c
D03:00:0004:00:0003:57:00d
E03:50:0004:10:0003:57:00d

And from this you would like:

OrderActualOrderStartTimeActualOrderEndTimeDowntimeStartEquipment
A01:00:0001:30:0001:15:00a
C02:30:0002:32:0002:31:00b
D03:00:0004:00:0003:34:00c
D03:00:0004:00:0003:57:00d

Assuming that the order that started first has to stay, and one order can be associated with multiple DowntimeStarts but not vice versa.

If you haven't joined the 2 tables resulting from the intervalmatch together yet, just do so by adding this to your script from the OP:

Inner join (Downtime) //'Inner' assumes you want to drop all orders/downtimes that have no match

LOAD *

Resident Orders;

Drop table Orders;

After that, add:

Right join (Downtime)

LOAD

     Equipment,

     Min(ActualOrderStartTime) as ActualOrderStartTime

Resident Downtime

Group by Equipment;

Note: If one Equipment can have multiple DowntimeStarts, and you want 1 order per DowntimeStart/Equipment combination, add DowntimeStart to the Right Join Load, and to the group by statement as well.

I hope this does what you want it to, sorry for the late reply and I apologise if I misinterpreted something or made a mistake somewhere. If you need more help let me know

t-gilchrist
Contributor III
Contributor III
Author

Thank you for your help

Between the link you provided and your examples I was able to solve my issue and understand "group by" a little better.

I ended up uses a left join to merge the two tables after my interval match because I also have downtime starts that fall in between orders as well. Then I used your example to with the right join (with a few tweaks) to remove the duplicate instances like so...


Right Join (Downtime)

Load

    Equipment,

    DowntimeStart,

    DowntimeEnd,

    CategoryName,

    SubCategoryName,

    ReasonName,

    Comment,

    min(ActualOrderStartTime) as ActualOrderStartTime

    Resident Downtime

    Group by DowntimeStart,Equipment,DowntimeEnd,CategoryName,SubCategoryName,ReasonName,Comment;