Qlik Community

App Development

Discussion board where members can learn more about Qlik Sense App Development and Usage.

Announcements
Customer & Partners, DEC. 9, 11 AM ET: Qlik Product & Strategy Roadmap Session: Data Analytics REGISTER NOW
cancel
Showing results for 
Search instead for 
Did you mean: 
quriouss
Creator III
Creator III

Speed up splitting large QVD by month?

I have a QVD of 500 million rows which is unwieldy.  Therefore I split it by month based on a timestamp.

TempTable:

LOAD

    *

FROM [lib://Qlik Data Files/LargeFile.qvd] (qvd)

For vYear = 2010 to YEAR (today()-1)

For vMonth = 1 to 12

Let vMonth2 = Num (vMonth , '00');

Let vDate = $(vYear) & '-' & $(vMonth2) &'-01';

Let vShortDate = $(vYear) & '-' & $(vMonth2);

SplitTable:

LOAD

    *

RESIDENT TempTable

WHERE timestamp >= MonthStart(DATE#('$(vDate)' , 'YYYY-MM-DD')) and timestamp <= MonthEnd(DATE#('$(vDate)' , 'YYYY-MM-DD'));

Store * from SplitTable into 'lib://Qlik Data Files/Month-$(vShortDate).qvd';

Drop Table SplitTable;

Next vMonth

Next vYear;

The trouble is that even though it's a RESIDENT table, it is still quite slow (it runs through the same table multiple times, each time for the next month). Is there anyway to speed it up by either;

  1. Ignoring the dates already processed (in essence dropping the rows) so the table will shrink in size by a month with every pass, or
  2. If I load the table (or store the QVD) pre-sorted on timestamp, will this increase the search speed?

(Unfortunately I can't assume that 'old' timestamps won't be inserted into the data so I must process the whole file every time - I can't just do the current/past month.)

3 Replies
mkelemen
Creator III
Creator III

Hi,

you can limit the date directly in the load.

Be careful that to keep the qvd load optimized you can only use "where exists". That means you will have to load a temp table with all dates with the same field name.

If you use a real timestamp you will need to create a new field with date - timestamp has too much unique values to be used in where exists.

This is the code - never tested it for speed but I believe it should be faster.


For vYear = 2010 to YEAR (today()-1)

For vMonth = 1 to 12

  vDays = MonthEnd(MakeDate($(vYear), $(vMonth),1)) - MonthStart(MakeDate($(vYear), $(vMonth),1)) + 1;

  tmpDates:

  Load

  MakeDate($(vYear), $(vMonth), RecNo())  As timestamp

  AutoGenerate (vDays);

  TempTable:

  LOAD

  *

  FROM [lib://Qlik Data Files/LargeFile.qvd] (qvd)

  Where Exists(timestamp);


  //Store table

  //Drop both tables

Next vMonth;

Next vYear;

BR,

Matus

rwunderlich

I think you may need to add floor() to Matus' solution.

vDays = floor(MonthEnd(MakeDate($(vYear), $(vMonth),1))) - MonthStart(MakeDate($(vYear), $(vMonth),1)) + 1;

marcus_sommer
MVP & Luminary
MVP & Luminary

I suggest that you replaced your timestamp within the origin qvd with a date and a time like:

date(floor(timestamp)) as date,

time(frac(timestamp)) as time

to reduce the number of distinct fieldvalues - for the background, see: The Importance Of Being Distinct and it will reduce the size of the qvd quite significantly and all following loadings should be a lot faster. Also it makes it easier to create such exists() temp-table like suggest from mkelemen and I'm not sure that the date-approach there will be really work with your timestamps.

- Marcus