15 Replies Latest reply: Jan 6, 2017 5:22 PM by Marco Wedel RSS

    Overlapping intervals

    Mindaugas Bacius

      Hello,

       

      I am still face up a problem while removing overlapping intervals.

      The data:

      ClientProductFromTo
      AA2015.01.012015.01.15
      AA2015.01.152015.01.15
      AA2015.01.102015.01.20
      AA2015.02.012015.02.10
      BB2015.01.012015.01.10
      BB2015.01.152015.01.20
      BB2015.01.132015.01.13
      BB2015.01.132015.01.15
      BC2015.01.012015.01.30
      BC2015.01.102015.02.15
      BC2015.02.202015.02.25
      BC2015.02.192015.02.21
      BC2015.02.232015.02.26

       

      The result should be:

      ClientProductFromTo
      AA2015.01.012015.01.20
      AA2015.02.012015.02.10
      BB2015.01.012015.01.10
      BB2015.01.132015.01.20
      BC2015.01.012015.02.15
      BC2015.02.192015.02.26

       

      I need to remove overlapping intervals depending on Client and Product fields.

       

      Thank you for your help!

        • Re: Overlapping intervals
          Aswin raaj

          Have you tried this?

          • Re: Overlapping intervals
            Christian Wolf

            Hi Mindaugas,

             

            try following:

             

            // 1. Dissolving intervals into discrete rows
            
            
            TEMP:
            LOAD 
            Min(From) as MinFrom,
            Max(To) as MaxTo 
            Resident TEST; 
            let vMinFrom=FieldValue('MinFrom',1);
            let vMaxTo=FieldValue('MaxTo',1);
            
            
            DROP Table TEMP;
            
            
            DATES:
            LOAD
            RecNo()+$(vMinFrom)-1 as Date
            AutoGenerate $(vMaxTo)-$(vMinFrom)+1;
            
            
            TEMP:
            IntervalMatch(Date)
            LOAD
            From,
            To
            Resident TEST;
            
            
            Left Join (TEST)
            LOAD Distinct * Resident TEMP;
            
            
            DROP Fields From,To;
            DROP Tables TEMP,DATES;
            
            
            // 2. Creating new intervals
            
            
            TEMP:
            LOAD
            *,
            if(Client=Previous(Client) and Product=Previous(Product) and Date-Previous(Date)<2,Peek('No'),if(IsNull(Peek('No')),1,Peek('No')+1)) as No
            Resident TEST
            Order By Client,Product,Date;
            
            
            DROP Table TEST;
            
            
            Left Join(TEMP)
            LOAD
            No,
            Date(Min(Date)) as From,
            Date(Max(Date)) as To
            Resident TEMP
            Group by No;
            
            
            DROP Field Date;
            
            
            TEST:
            NoConcatenate
            LOAD Distinct * Resident TEMP;
            
            
            DROP Table TEMP;
            
              • Re: Overlapping intervals
                Mindaugas Bacius

                It works great on small scale.

                But if I have period of 5 years and 1.000.000 distinct intervals the method struggles to do it's job.

                 

                Maybe you have any other ideas?

                  • Re: Overlapping intervals
                    Christian Wolf

                    The script can be optimized a little bit. See attachment.

                      • Re: Overlapping intervals
                        Mindaugas Bacius

                        Actually the problem appears in this part of the script as there are creating millions of rows.

                        TEMP:
                        IntervalMatch(Date)
                        LOAD
                        From,
                        To
                        Resident TEST;
                        
                        
                        Left Join (TEST)
                        LOAD Distinct * Resident TEMP;
                        
                          • Re: Overlapping intervals
                            Christian Wolf

                            An other way is it to loop over the distinct intervals until they are clean. So it will not produce millions of discrete rows, but it will needs 4 up to 6 loops to get the clean intervals. But try it.

                             

                            INPUTTAB:
                            LOAD
                            Hash128(Client,Product,From,To) as Key0,
                            *
                            ;
                            LOAD * INLINE [
                            Client    Product    From    To
                            A    A    2015.01.01    2015.01.15    
                            ...
                            ](delimiter is '\t');
                            
                            LOOPTAB:
                            LOAD Distinct
                            Key0,
                            AutoNumber(Hash128(Client,Product),'Key') as Key,
                            num(From) as From#,
                            num(To) as To#
                            Resident INPUTTAB;
                            
                            DROP Fields From,To;
                            
                            let nor1=NoOfRows('LOOPTAB');
                            
                            for i= 1 to 100
                            
                                Trace *** Loop $(i) ***;
                                
                                let h=i-1;
                            
                                TEMP:
                                NoConcatenate
                                LOAD
                                Hash128(Key,From#,To#) as Key$(i),
                                *
                                ;
                                LOAD
                                Key$(h),
                                Key,
                                If(Key=Previous(Key) and From#>=Previous(From#) and From#-1<=Previous(To#),Previous(From#),From#) as From#, 
                                If(Key=Previous(Key) and To#<=Previous(To#) and To#>=Previous(From#),Previous(To#),To#) as To#
                                Resident LOOPTAB
                                Order By Key,From#,To# desc;
                                
                                DROP Table LOOPTAB; 
                                
                                LOOPTAB:
                                LOAD Distinct
                                Key$(i),
                                Key,
                                From#,
                                To#
                                Resident TEMP;
                                
                                let nor2=NoOfRows('LOOPTAB');
                                
                                if nor1=nor2 then
                                    Left Join(INPUTTAB)
                                    LOAD
                                    Key$(h),
                                    Date(From#) as From,
                                    Date(To#) as To
                                    Resident TEMP;
                                    
                                    DROP Field Key$(h);
                                    DROP Tables TEMP,LOOPTAB;
                                    EXIT For;
                                else
                                    Left Join(INPUTTAB)
                                    LOAD
                                    Key$(h),
                                    Key$(i)
                                    Resident TEMP;
                                    
                                    DROP Field Key$(h);
                                    DROP Table TEMP;
                            
                                    set nor1=$(nor2);
                                end if;
                            
                            next
                            
                            OUTPUTTAB:
                            NoConcatenate
                            LOAD Distinct * Resident INPUTTAB;
                            
                            DROP Table INPUTTAB;
                            
                    • Re: Overlapping intervals
                      Sunny Talwar

                      Another (although similar) option

                       

                      Table:

                      LOAD DISTINCT Client,

                        Product,

                        Date(From + IterNo() - 1) as Date

                      While From + IterNo() - 1 <= To;

                      LOAD * INLINE [

                          Client, Product, From, To

                          A, A, 2015.01.01, 2015.01.15

                          A, A, 2015.01.15, 2015.01.15

                          A, A, 2015.01.10, 2015.01.20

                          A, A, 2015.02.01, 2015.02.10

                          B, B, 2015.01.01, 2015.01.10

                          B, B, 2015.01.15, 2015.01.20

                          B, B, 2015.01.13, 2015.01.13

                          B, B, 2015.01.13, 2015.01.15

                          B, C, 2015.01.01, 2015.01.30

                          B, C, 2015.01.10, 2015.02.15

                          B, C, 2015.02.20, 2015.02.25

                          B, C, 2015.02.19, 2015.02.21

                          B, C, 2015.02.23, 2015.02.26

                      ];

                       

                      TempTable:

                      LOAD *,

                        If(Client = Previous(Client) and Product = Previous(Product) and Date = Previous(Date + 1), Alt(Peek('SNo'), 1), RangeSum(Peek('SNo'), 1)) as SNo

                      Resident Table

                      Order By Client, Product, Date;

                       

                      FinalTable:

                      LOAD Client,

                        Product,

                        SNo,

                        Date(Min(Date)) as From,

                        Date(Max(Date)) as To

                      Resident TempTable

                      Group By Client, Product, SNo;

                       

                      DROP Table Table, TempTable;

                       

                      Capture.PNG

                      • Re: Overlapping intervals
                        Sunny Talwar

                        May be with these modifications

                         

                        tmp:

                        LOAD Product,

                            From,

                            To,

                            Client

                        FROM Data.xls

                        (biff, embedded labels, table is Sheet1$);

                         

                        tmp2:

                        LOAD *,

                          if(Product = Previous(Product) and Client = Previous(Client) and From = Peek('LastFrom') and To >= Peek('LastFrom') and To <= Peek('LastTo'), 'Remove', 'Leave') as Status,

                          if(Product = Previous(Product) and Client = Previous(Client) and From = Peek('LastFrom') and To >= Peek('LastFrom') and To <= Peek('LastTo'), Peek('LastFrom'), From) as LastFrom,

                          if(Product = Previous(Product) and Client = Previous(Client) and From = Peek('LastFrom') and To >= Peek('LastFrom') and To <= Peek('LastTo'), Peek('LastTo'), To) as LastTo

                        Resident tmp

                        Order By Client, Product, From desc, To desc;

                        DROP Table tmp;

                         

                        tmp3:

                        NoConcatenate

                        LOAD Product, From, To, Client,

                          if(Product = Previous(Product) and Client = Previous(Client) and From >= Peek('LastFrom') and From <= Peek('LastTo') or To >= Peek('LastFrom') and To <= Peek('LastTo') or Status='Remove', 'Remove', 'Leave') as Status,

                          if(Product = Previous(Product) and Client = Previous(Client) and From >= Peek('LastFrom') and From <= Peek('LastTo') or To >= Peek('LastFrom') and To <= Peek('LastTo') or Status='Remove', Peek('LastFrom'), From) as LastFrom,

                          if(Product = Previous(Product) and Client = Previous(Client) and From >= Peek('LastFrom') and From <= Peek('LastTo') or To >= Peek('LastFrom') and To <= Peek('LastTo') or Status='Remove', Peek('LastTo'), To) as LastTo

                        Resident tmp2

                        Order By Client, Product, From, To;

                         

                        DROP Table tmp2;

                        • Re: Overlapping intervals
                          Marco Wedel

                          Hi,

                           

                          I tried for a different method by identifying overlapping intervals using the intervalmatch prefix and then clustering those intervals using the hierarchy prefix.

                          Although the result is a bit lengthy and not performing as well as hoped, I want to post it nevertheless:

                           

                           

                          QlikCommunity_Thread_244866_Pic1.JPG

                          QlikCommunity_Thread_244866_Pic2.JPG

                           

                          tabInput:
                          LOAD RecNo() as ID,
                              AutoNumberHash128(Client,Product) as ClntProdID,
                              From as Date, *  
                          FROM [https://community.qlik.com/thread/244866] (html, codepage is 1252, embedded labels, table is @1);
                          
                          tabIntervals:
                          IntervalMatch (Date, ClntProdID)
                          LOAD From,
                              To,
                              ClntProdID
                          Resident tabInput;
                            
                          Left Join (tabIntervals)
                          LOAD From,
                              To,
                              ClntProdID,
                              ID as ParentID
                          Resident tabInput;
                          
                          Left Join (tabIntervals)
                          LOAD Date,
                              ClntProdID,
                              ID
                          Resident tabInput;
                          
                          Left Join (tabIntervals)
                          LOAD ID,
                              Min(From) as MinFrom
                          Resident tabIntervals
                          Group By ID;
                          
                          Left Join (tabInput)
                          LOAD ID,
                              Min(ParentID) as ParentID
                          Resident tabIntervals
                          Where From = MinFrom
                          Group By ID;
                          
                          DROP Table tabIntervals;
                          
                          tabOutput:
                          Hierarchy (ChildID, ParentID, ID)
                          LOAD Client,
                              Product,
                              From,
                              To,
                              ID,
                              ID as ChildID,
                              ParentID
                          Resident tabInput;
                          
                          DROP Table tabInput;
                          
                          Left Join (tabOutput)
                          LOAD Distinct
                              ID1,
                              AutoNumber(ID1) as ClusterID
                          Resident tabOutput;
                          
                          tabClusters:
                          LOAD ClusterID,
                              Date(Min(From)) as ClusterFrom,
                              Date(Max(To)) as ClusterTo
                          Resident tabOutput
                          Group By ClusterID;
                          
                          

                           

                          hope this helps

                           

                          regards

                           

                          Marco