Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
Hello,
I am still face up a problem while removing overlapping intervals.
The data:
Client | Product | From | To |
A | A | 2015.01.01 | 2015.01.15 |
A | A | 2015.01.15 | 2015.01.15 |
A | A | 2015.01.10 | 2015.01.20 |
A | A | 2015.02.01 | 2015.02.10 |
B | B | 2015.01.01 | 2015.01.10 |
B | B | 2015.01.15 | 2015.01.20 |
B | B | 2015.01.13 | 2015.01.13 |
B | B | 2015.01.13 | 2015.01.15 |
B | C | 2015.01.01 | 2015.01.30 |
B | C | 2015.01.10 | 2015.02.15 |
B | C | 2015.02.20 | 2015.02.25 |
B | C | 2015.02.19 | 2015.02.21 |
B | C | 2015.02.23 | 2015.02.26 |
The result should be:
Client | Product | From | To |
A | A | 2015.01.01 | 2015.01.20 |
A | A | 2015.02.01 | 2015.02.10 |
B | B | 2015.01.01 | 2015.01.10 |
B | B | 2015.01.13 | 2015.01.20 |
B | C | 2015.01.01 | 2015.02.15 |
B | C | 2015.02.19 | 2015.02.26 |
I need to remove overlapping intervals depending on Client and Product fields.
Thank you for your help!
May be with these modifications
tmp:
LOAD Product,
From,
To,
Client
FROM Data.xls
(biff, embedded labels, table is Sheet1$);
tmp2:
LOAD *,
if(Product = Previous(Product) and Client = Previous(Client) and From = Peek('LastFrom') and To >= Peek('LastFrom') and To <= Peek('LastTo'), 'Remove', 'Leave') as Status,
if(Product = Previous(Product) and Client = Previous(Client) and From = Peek('LastFrom') and To >= Peek('LastFrom') and To <= Peek('LastTo'), Peek('LastFrom'), From) as LastFrom,
if(Product = Previous(Product) and Client = Previous(Client) and From = Peek('LastFrom') and To >= Peek('LastFrom') and To <= Peek('LastTo'), Peek('LastTo'), To) as LastTo
Resident tmp
Order By Client, Product, From desc, To desc;
DROP Table tmp;
tmp3:
NoConcatenate
LOAD Product, From, To, Client,
if(Product = Previous(Product) and Client = Previous(Client) and From >= Peek('LastFrom') and From <= Peek('LastTo') or To >= Peek('LastFrom') and To <= Peek('LastTo') or Status='Remove', 'Remove', 'Leave') as Status,
if(Product = Previous(Product) and Client = Previous(Client) and From >= Peek('LastFrom') and From <= Peek('LastTo') or To >= Peek('LastFrom') and To <= Peek('LastTo') or Status='Remove', Peek('LastFrom'), From) as LastFrom,
if(Product = Previous(Product) and Client = Previous(Client) and From >= Peek('LastFrom') and From <= Peek('LastTo') or To >= Peek('LastFrom') and To <= Peek('LastTo') or Status='Remove', Peek('LastTo'), To) as LastTo
Resident tmp2
Order By Client, Product, From, To;
DROP Table tmp2;
Have you tried this?
Unfortunately this do not work :
Check this, still i have doubt on this.
Not working.
Hi Mindaugas,
try following:
// 1. Dissolving intervals into discrete rows
TEMP:
LOAD
Min(From) as MinFrom,
Max(To) as MaxTo
Resident TEST;
let vMinFrom=FieldValue('MinFrom',1);
let vMaxTo=FieldValue('MaxTo',1);
DROP Table TEMP;
DATES:
LOAD
RecNo()+$(vMinFrom)-1 as Date
AutoGenerate $(vMaxTo)-$(vMinFrom)+1;
TEMP:
IntervalMatch(Date)
LOAD
From,
To
Resident TEST;
Left Join (TEST)
LOAD Distinct * Resident TEMP;
DROP Fields From,To;
DROP Tables TEMP,DATES;
// 2. Creating new intervals
TEMP:
LOAD
*,
if(Client=Previous(Client) and Product=Previous(Product) and Date-Previous(Date)<2,Peek('No'),if(IsNull(Peek('No')),1,Peek('No')+1)) as No
Resident TEST
Order By Client,Product,Date;
DROP Table TEST;
Left Join(TEMP)
LOAD
No,
Date(Min(Date)) as From,
Date(Max(Date)) as To
Resident TEMP
Group by No;
DROP Field Date;
TEST:
NoConcatenate
LOAD Distinct * Resident TEMP;
DROP Table TEMP;
It works great on small scale.
But if I have period of 5 years and 1.000.000 distinct intervals the method struggles to do it's job.
Maybe you have any other ideas?
The script can be optimized a little bit. See attachment.
Actually the problem appears in this part of the script as there are creating millions of rows.
TEMP:
IntervalMatch(Date)
LOAD
From,
To
Resident TEST;
Left Join (TEST)
LOAD Distinct * Resident TEMP;
Another (although similar) option
Table:
LOAD DISTINCT Client,
Product,
Date(From + IterNo() - 1) as Date
While From + IterNo() - 1 <= To;
LOAD * INLINE [
Client, Product, From, To
A, A, 2015.01.01, 2015.01.15
A, A, 2015.01.15, 2015.01.15
A, A, 2015.01.10, 2015.01.20
A, A, 2015.02.01, 2015.02.10
B, B, 2015.01.01, 2015.01.10
B, B, 2015.01.15, 2015.01.20
B, B, 2015.01.13, 2015.01.13
B, B, 2015.01.13, 2015.01.15
B, C, 2015.01.01, 2015.01.30
B, C, 2015.01.10, 2015.02.15
B, C, 2015.02.20, 2015.02.25
B, C, 2015.02.19, 2015.02.21
B, C, 2015.02.23, 2015.02.26
];
TempTable:
LOAD *,
If(Client = Previous(Client) and Product = Previous(Product) and Date = Previous(Date + 1), Alt(Peek('SNo'), 1), RangeSum(Peek('SNo'), 1)) as SNo
Resident Table
Order By Client, Product, Date;
FinalTable:
LOAD Client,
Product,
SNo,
Date(Min(Date)) as From,
Date(Max(Date)) as To
Resident TempTable
Group By Client, Product, SNo;
DROP Table Table, TempTable;