Skip to main content
Announcements
Have questions about Qlik Connect? Join us live on April 10th, at 11 AM ET: SIGN UP NOW
cancel
Showing results for 
Search instead for 
Did you mean: 
Not applicable

Extracting Min(Date) and Max(Date) from large QVD

Hi

I have large QVD of 200m records.

How do I extract the min and max dates from this large table in the most efficient way possible. Is there a way to keep it an optimized load?

Thanks

1 Solution

Accepted Solutions
fosuzuki
Partner - Specialist III
Partner - Specialist III

Hi,

I'd use this:

temp:

LOAD Date

FROM [YourQVD.qvd] (qvd);

MinMaxDate:

Load min(Date) as MinDate,

        max(Date) as MaxDate;

Load FieldValue('Date',IterNo()) as Date

autogenerate(1)

while not IsNull(FieldValue('Date',Iterno()));

Drop Table temp;

Not 100% if this is the best way, but it is the fastest I've seen in some of my tests.

Hope this helps

Fernando

View solution in original post

8 Replies
fosuzuki
Partner - Specialist III
Partner - Specialist III

Hi,

I'd use this:

temp:

LOAD Date

FROM [YourQVD.qvd] (qvd);

MinMaxDate:

Load min(Date) as MinDate,

        max(Date) as MaxDate;

Load FieldValue('Date',IterNo()) as Date

autogenerate(1)

while not IsNull(FieldValue('Date',Iterno()));

Drop Table temp;

Not 100% if this is the best way, but it is the fastest I've seen in some of my tests.

Hope this helps

Fernando

danielrozental
Master II
Master II

Haven't tested it but taking Fernando's suggestions into consideration I would try

temp:
LOAD Date
FROM [YourQVD.qvd] (qvd);

temp2:

load

    max(Date) as MaxDate,

    min(Date) as MinDate

resident temp;  

drop table temp;

Anonymous
Not applicable
Author

To add one more point to Daniel: Since it is a large table reading the whole table will take time while loading the script the best bet is when querying from your database. Create a separate QVD for the dates. Lets say like below:

temp:
LOAD Date;

Select Date
FROM [Database_table] group by Date ;// this will give you distinct and less number of Dates so you dont need to read the complete QVD which is of 200 million records//

below script would be the same..

temp2:

load

    max(Date) as MaxDate,

    min(Date) as MinDate

resident temp; 

drop table temp;

danielrozental
Master II
Master II

I disagree, it will depend on weather the database has an index on that field or not, the optimized load of a single field into memory will be faster in my opinion.

I did a test with 50 million rows.

This code took 2 mins to complete

LOAD max(OrderDate) as MaxDate,     

          min(OrderDate) as MinDate

FROM

[R00.QVD]

(qvd);

This code took slightly under a minute

temp:

LOAD distinct OrderDate

FROM

[R00.QVD]

(qvd);

temp2:

LOAD max(OrderDate) as MaxDate,     

          min(OrderDate) as MinDate

resident temp;

drop table temp;

Fernando's solution took  6 seconds (Fernando you're the man!) and there is no way to beat that time since just loading the field takes 5 secs.

temp:

LOAD OrderDate

FROM

[R00.QVD]

(qvd);

MinMaxDate:

Load min(OrderDate) as MinDate,

        max(OrderDate) as MaxDate;

Load FieldValue('OrderDate',IterNo()) as OrderDate

autogenerate(1)

while not IsNull(FieldValue('OrderDate',Iterno()));

Drop Table temp;

This took 13 secs, kinda makes you wonder how qlikview does things

temp:

LOAD OrderDate

FROM

[R00.QVD]

(qvd);

store temp into orderdate.qvd;

drop table temp;

Directory;

LOAD max(OrderDate) as MaxDate,

min(OrderDate) as MinDate

FROM

orderdate.qvd

(qvd);

Not applicable
Author

Hi Fernando

Is there any benefit in using IterNo in your statement? The MinMaxDate table only has 1 record, so FieldValue would have worked on its own. Or, is there a performance beneifit of using IterNo?

Thanks

-----------

MinMaxDate:

Load min(Date) as MinDate,

        max(Date) as MaxDate;

Load FieldValue('OrderDate',IterNo()) as OrderDate

autogenerate(1)

while not IsNull(FieldValue('OrderDate',IterNo()));

Drop Table temp;

Not applicable
Author

Hi Daniel

Are the times you specify including the "Load OrderDate" statement too, as this alone (on my machine) takes about 55 seconds using Fernando's approach. How do you achieve 6 seconds - I assume you are running this directly on the server?

Thanks

Andrew

danielrozental
Master II
Master II

Times include the entire script execution time, I'm not using the same data as you are and my machine specifications may defer from yours.

Make sure the date load from the QVD is optimized.

Please mark Fernando's answer as the correct one.

flipside
Partner - Specialist II
Partner - Specialist II

Although you have resolved your problem, there may be a quicker way of storing QVD info and retrieving it through the use of field comments (also table comments).

e.g. If you identify the range values in the QVD creation routine and comment the field as this ...

fName:

Load min(OrderDate) & '|' & max(OrderDate) as dRange resident SampleData;

let v = peek('dRange');

Comment OrderDate with '$(v)';


store SampleData into SampleData.qvd (qvd);

... the comment is stored in the QVD and can be retrieved via XML without the need to load any data - you just might need to parse the comment in the QVD reading document ...

QvdFieldHeader:

Load date(subfield(Comment,'|',1)) as minOrderDate,

    date(subfield(Comment,'|',-1)) as maxOrderDate;

LOAD Comment

FROM [SampleData.qvd] (XmlSimple, Table is [QvdTableHeader/Fields/QvdFieldHeader])

where FieldName = 'OrderDate';

flipside