Skip to main content
Announcements
Customer Spotlight: Discover what’s possible with embedded analytics Oct. 16 at 10:00 AM ET: REGISTER NOW
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

intervalmatch duplication of rows

Hi everybody,

here is an extremely simplified version of my script:

agende:

LOAD * INLINE [

    COD_EROG, DATAJ_INIZIO, DATAJ_FINE

    E1, 03/01/2011, 10/01/2011

    E2, 03/01/2011, 10/01/2011   

];

calendar:

LOAD * INLINE [

          DAY

          07/01/2011

];

match:

left join(agende)

IntervalMatch (DAY) LOAD DATAJ_INIZIO, DATAJ_FINE RESIDENT agende;

The problem is that doing this way agenda rows are duplicated. I have already read other post discussing alternative ways to do the same job and to avoid duplications. But I still haven't understood why rows are duplicated (sorry, maybe it is simple, forgive me if I'm a little dumb...).

Can someone explain me why intervalmatch creates the duplication (or point me to an already made explanation)?

Thanks in advance,

Fil

1 Solution

Accepted Solutions
erichshiino
Partner - Master
Partner - Master

hI,

I used this now:

agende:

LOAD * INLINE [

    COD_EROG, DATAJ_INIZIO, DATAJ_FINE

    E1, 03/01/2011, 10/01/2011

    E2, 03/01/2011, 10/01/2011   

];

calendar:

LOAD * INLINE [

          DAY

          07/01/2011

];

match:

left join(agende)

IntervalMatch (DAY) LOAD DISTINCT  DATAJ_INIZIO, DATAJ_FINE RESIDENT agende;

I included distinct because during interval match cod_erog was not considered. Your date was associated with the same interval and you got two lines. Then when you use the left join (with two identical lines) you duplicated it again and get two for each code.

Hope it helps,

Erich

View solution in original post

4 Replies
erichshiino
Partner - Master
Partner - Master

Hi, Fil

I believe the duplication of your records is expected because the date 7/01 is in the middle of the line with E1 and E2. The date on calendar is associated with both intervals because they are the same.

I don't know what your data mean but the intervals should not overlap if you don't want duplicates.

Hope it helps,

Erich

Not applicable
Author

Hi Erich,

I would have expected to get a table like:

COD_EROG     DATAJ_INIZIO     DATAJ_FINE     DAY

E1                       03/01/2011           10/01/2011         07/01/2011

E2                       03/01/2011           10/01/2011         07/01/2011

while I get:

COD_EROG     DATAJ_INIZIO     DATAJ_FINE     DAY

E1                       03/01/2011           10/01/2011         07/01/2011

E1                       03/01/2011           10/01/2011         07/01/2011

E2                       03/01/2011           10/01/2011         07/01/2011

E2                       03/01/2011           10/01/2011         07/01/2011

Why is like that?

Thanks

erichshiino
Partner - Master
Partner - Master

hI,

I used this now:

agende:

LOAD * INLINE [

    COD_EROG, DATAJ_INIZIO, DATAJ_FINE

    E1, 03/01/2011, 10/01/2011

    E2, 03/01/2011, 10/01/2011   

];

calendar:

LOAD * INLINE [

          DAY

          07/01/2011

];

match:

left join(agende)

IntervalMatch (DAY) LOAD DISTINCT  DATAJ_INIZIO, DATAJ_FINE RESIDENT agende;

I included distinct because during interval match cod_erog was not considered. Your date was associated with the same interval and you got two lines. Then when you use the left join (with two identical lines) you duplicated it again and get two for each code.

Hope it helps,

Erich

Not applicable
Author

Thank you, I had already discovered that using distinct the problem would be solved. But I hadn't understood exactly why, without, distinct, the rows were duplicated.

Now your reply has enlighted me! Sometimes, just speaking with some else helps to understand...

Thanks,

Fil