Unlock a world of possibilities! Login now and discover the exclusive benefits awaiting you.
I have a QVD of 500 million rows which is unwieldy. Therefore I split it by month based on a timestamp.
TempTable:
LOAD
*
FROM [lib://Qlik Data Files/LargeFile.qvd] (qvd)
For vYear = 2010 to YEAR (today()-1)
For vMonth = 1 to 12
Let vMonth2 = Num (vMonth , '00');
Let vDate = $(vYear) & '-' & $(vMonth2) &'-01';
Let vShortDate = $(vYear) & '-' & $(vMonth2);
SplitTable:
LOAD
*
RESIDENT TempTable
WHERE timestamp >= MonthStart(DATE#('$(vDate)' , 'YYYY-MM-DD')) and timestamp <= MonthEnd(DATE#('$(vDate)' , 'YYYY-MM-DD'));
Store * from SplitTable into 'lib://Qlik Data Files/Month-$(vShortDate).qvd';
Drop Table SplitTable;
Next vMonth
Next vYear;
The trouble is that even though it's a RESIDENT table, it is still quite slow (it runs through the same table multiple times, each time for the next month). Is there anyway to speed it up by either;
(Unfortunately I can't assume that 'old' timestamps won't be inserted into the data so I must process the whole file every time - I can't just do the current/past month.)
Hi,
you can limit the date directly in the load.
Be careful that to keep the qvd load optimized you can only use "where exists". That means you will have to load a temp table with all dates with the same field name.
If you use a real timestamp you will need to create a new field with date - timestamp has too much unique values to be used in where exists.
This is the code - never tested it for speed but I believe it should be faster.
For vYear = 2010 to YEAR (today()-1)
For vMonth = 1 to 12
vDays = MonthEnd(MakeDate($(vYear), $(vMonth),1)) - MonthStart(MakeDate($(vYear), $(vMonth),1)) + 1;
tmpDates:
Load
MakeDate($(vYear), $(vMonth), RecNo()) As timestamp
AutoGenerate (vDays);
TempTable:
LOAD
*
FROM [lib://Qlik Data Files/LargeFile.qvd] (qvd)
Where Exists(timestamp);
//Store table
//Drop both tables
Next vMonth;
Next vYear;
BR,
Matus
I think you may need to add floor() to Matus' solution.
vDays = floor(MonthEnd(MakeDate($(vYear), $(vMonth),1))) - MonthStart(MakeDate($(vYear), $(vMonth),1)) + 1;
I suggest that you replaced your timestamp within the origin qvd with a date and a time like:
date(floor(timestamp)) as date,
time(frac(timestamp)) as time
to reduce the number of distinct fieldvalues - for the background, see: The Importance Of Being Distinct and it will reduce the size of the qvd quite significantly and all following loadings should be a lot faster. Also it makes it easier to create such exists() temp-table like suggest from mkelemen and I'm not sure that the date-approach there will be really work with your timestamps.
- Marcus