Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 
Anonymous
Not applicable

Merging dates during Load

Hi,

I have a table with CustomerID, Contract Start Date and Contract End Date fields.

One Customer can have multiple following contracts.

I need to merge rows where the gap between an older and a newer contract is <= 3 days. Contracts with longer gap should not be merged. Here is an example:

Rows 2-4 and 5-6 should be merged. The result should look like this:

I've already tried to merge rows using resident load and Min/Max functions, but I have no idea how and where to define the condition of <=3 days. Any suggestions?

Thank You!

1 Solution

Accepted Solutions
sunny_talwar

For this specific data the below script might work. But I think this might not work with your actual data. Would you be able to add more data to your sample?

Table:

LOAD * Inline [

RowNo, CustomerID, StartDate, EndDate

1, 1, 01.01.2008, 31.12.2008

2, 1, 05.01.2009, 31.12.2009

3, 1, 02.01.2010, 31.12.2010

4, 1, 01.01.2011, 21.12.2011

5, 2, 01.01.2012, 31.12.2012

6, 2, 01.01.2013, 31.12.2013

];

Temp:

LOAD *,

  If(CustomerID = Peek('CustomerID'), Peek('StartDate')) - EndDate as Diff

Resident Table

Order By CustomerID, StartDate desc;

FinalTable:

LOAD CustomerID,

  Date(Min(StartDate)) as StartDate,

  Date(Max(EndDate)) as EndDate

Resident Temp

Where Diff <= 3

Group By CustomerID;

Concatenate(FinalTable)

LOAD CustomerID,

  StartDate,

  EndDate

Resident Temp

Where Diff > 3;

FinalFinalTable:

NoConcatenate

LOAD RowNo() as RowNo,

  *

Resident FinalTable;

DROP Tables Table, Temp, FinalTable;

View solution in original post

2 Replies
sunny_talwar

For this specific data the below script might work. But I think this might not work with your actual data. Would you be able to add more data to your sample?

Table:

LOAD * Inline [

RowNo, CustomerID, StartDate, EndDate

1, 1, 01.01.2008, 31.12.2008

2, 1, 05.01.2009, 31.12.2009

3, 1, 02.01.2010, 31.12.2010

4, 1, 01.01.2011, 21.12.2011

5, 2, 01.01.2012, 31.12.2012

6, 2, 01.01.2013, 31.12.2013

];

Temp:

LOAD *,

  If(CustomerID = Peek('CustomerID'), Peek('StartDate')) - EndDate as Diff

Resident Table

Order By CustomerID, StartDate desc;

FinalTable:

LOAD CustomerID,

  Date(Min(StartDate)) as StartDate,

  Date(Max(EndDate)) as EndDate

Resident Temp

Where Diff <= 3

Group By CustomerID;

Concatenate(FinalTable)

LOAD CustomerID,

  StartDate,

  EndDate

Resident Temp

Where Diff > 3;

FinalFinalTable:

NoConcatenate

LOAD RowNo() as RowNo,

  *

Resident FinalTable;

DROP Tables Table, Temp, FinalTable;

Anonymous
Not applicable
Author

That's great! I've already figured out a similar solution, but mine has more steps.

Thank you Sunny T!