8 Replies Latest reply: Feb 14, 2013 9:14 AM by Dave Riley RSS

    Extracting Min(Date) and Max(Date) from large QVD

      Hi

       

      I have large QVD of 200m records.

       

      How do I extract the min and max dates from this large table in the most efficient way possible. Is there a way to keep it an optimized load?

       

      Thanks

        • Re: Extracting Min(Date) and Max(Date) from large QVD
          Fernando Suzuki

          Hi,

           

          I'd use this:

          temp:
          LOAD Date
          FROM [YourQVD.qvd] (qvd);
          
          MinMaxDate:
          Load min(Date) as MinDate,
                  max(Date) as MaxDate;
          Load FieldValue('Date',IterNo()) as Date
          autogenerate(1)
          while not IsNull(FieldValue('Date',Iterno()));
          
          Drop Table temp;
          

           

          Not 100% if this is the best way, but it is the fastest I've seen in some of my tests.

           

           

          Hope this helps

          Fernando

          • Re: Extracting Min(Date) and Max(Date) from large QVD
            Vinay kumar Bangari

            To add one more point to Daniel: Since it is a large table reading the whole table will take time while loading the script the best bet is when querying from your database. Create a separate QVD for the dates. Lets say like below:

             

             

            temp:
            LOAD Date;

            Select Date
            FROM [Database_table] group by Date ;// this will give you distinct and less number of Dates so you dont need to read the complete QVD which is of 200 million records//

             

            below script would be the same..

             

            temp2:

            load

                max(Date) as MaxDate,

                min(Date) as MinDate

            resident temp; 

             

            drop table temp;

              • Re: Extracting Min(Date) and Max(Date) from large QVD
                Daniel Rozental

                I disagree, it will depend on weather the database has an index on that field or not, the optimized load of a single field into memory will be faster in my opinion.

                 

                I did a test with 50 million rows.

                 

                This code took 2 mins to complete

                 

                LOAD max(OrderDate) as MaxDate,      
                          min(OrderDate) as MinDate
                FROM
                [R00.QVD]
                (qvd);
                

                 

                This code took slightly under a minute

                temp:
                LOAD distinct OrderDate
                FROM
                [R00.QVD]
                (qvd);
                
                
                temp2:
                LOAD max(OrderDate) as MaxDate,      
                          min(OrderDate) as MinDate
                resident temp;
                
                
                drop table temp;
                

                 

                Fernando's solution took  6 seconds (Fernando you're the man!) and there is no way to beat that time since just loading the field takes 5 secs.

                 

                temp:
                LOAD OrderDate
                FROM
                [R00.QVD]
                (qvd);
                
                
                MinMaxDate:
                Load min(OrderDate) as MinDate,
                        max(OrderDate) as MaxDate;
                Load FieldValue('OrderDate',IterNo()) as OrderDate
                autogenerate(1)
                while not IsNull(FieldValue('OrderDate',Iterno()));
                
                Drop Table temp;
                

                 

                This took 13 secs, kinda makes you wonder how qlikview does things

                 

                temp:
                LOAD OrderDate
                FROM
                [R00.QVD]
                (qvd);
                
                store temp into orderdate.qvd;
                drop table temp;
                  
                Directory;
                LOAD max(OrderDate) as MaxDate,
                min(OrderDate) as MinDate
                FROM
                orderdate.qvd
                (qvd);
                
                
              • Re: Extracting Min(Date) and Max(Date) from large QVD
                Dave Riley

                Although you have resolved your problem, there may be a quicker way of storing QVD info and retrieving it through the use of field comments (also table comments).

                 

                e.g. If you identify the range values in the QVD creation routine and comment the field as this ...

                 

                fName:

                Load min(OrderDate) & '|' & max(OrderDate) as dRange resident SampleData;

                 

                let v = peek('dRange');

                 

                Comment OrderDate with '$(v)';


                store SampleData into SampleData.qvd (qvd);

                 

                ... the comment is stored in the QVD and can be retrieved via XML without the need to load any data - you just might need to parse the comment in the QVD reading document ...

                 

                QvdFieldHeader:

                Load date(subfield(Comment,'|',1)) as minOrderDate,

                    date(subfield(Comment,'|',-1)) as maxOrderDate;

                LOAD Comment

                FROM [SampleData.qvd] (XmlSimple, Table is [QvdTableHeader/Fields/QvdFieldHeader])

                where FieldName = 'OrderDate';

                 

                flipside