Halesha_Bandri
Contributor II

Is there a faster way to create a timestamp field for a given timestamp range?

Hello Everyone,

I have a QVD file that I need to load with an optimized QVD load, filtering on the timestamp field (sent_date_time) for a specific date and time range (for this example, assume the range is 01/Mar/2023 00:00:00 to 31/Mar/2023 23:59:59).

To keep the QVD load optimized, I first build a timestamp field (sent_date_time) with the following script:

//===============================================================
//Setting variables
//===============================================================

SET vDateFormat ='DD/MM/YYYY';

SET vTimeFormat ='hh:mm:ss';

SET vTimeStampFormat ='DD/MM/YYYY hh:mm:ss';

//===============================================================
//Step1: Generating Dates for Specific Period
//===============================================================

Let vStartDate = Num(Date#('01/03/2023','DD/MM/YYYY'));
Let vEndDate = Num(Date#('31/03/2023','DD/MM/YYYY'));
Trace vStartDate : $(vStartDate);
Trace vEndDate : $(vEndDate);

TempMasterDate:

Load
Num(Date($(vStartDate) + IterNo()-1)) as Numdate
AutoGenerate 1
while $(vStartDate) + IterNo()-1 <= $(vEndDate);

//===============================================================
//Step2: Generating Time for a Day
//===============================================================
NoConcatenate

TempMasterTime:

Load
Num(Time((1/86400 * (RecNo() - 1)))) as NumTime
Autogenerate 86400;

//===============================================================
//Step3: Cross Join "TempMasterDate" and "TempMasterTime" tables
//===============================================================

NoConcatenate

TempDateTime:

LOAD * Resident TempMasterDate;

JOIN LOAD * Resident TempMasterTime;

DROP Tables TempMasterDate,TempMasterTime;

//===============================================================
//Step4: Deriving Required fields
//===============================================================
NoConcatenate

FinalDateTime:

LOAD
Date(Numdate,'$(vDateFormat)') AS DatePart,
Time(NumTime,'$(vTimeFormat)') AS TimePart,
Timestamp(Numdate+NumTime,'$(vTimeStampFormat)') AS DateTimePart,
Timestamp(Numdate+NumTime,'$(vTimeStampFormat)') AS sent_date_time

Resident TempDateTime;

DROP Table TempDateTime;
//===============================================================
//Step5 : Loading QVD file by filtering
//===============================================================
SentEmail:

LOAD * FROM [$(vQVDExtractPath)/SentEmail.qvd](qvd)
WHERE EXISTS(sent_date_time);

DROP TABLE FinalDateTime;
//===============================================================

The above code works perfectly for me.

However, I am open to suggestions and alternative methods for creating the timestamp field more efficiently. If anyone has insights or recommendations on how to achieve this faster, please share them.


Accepted Solutions
marcus_sommer

You may simplify the timestamp-creation with something like this:

load Date + (1/24/60/60 * (iterno() - 1)) as TimeStamp
    while iterno() - 1 < 86400;
load makedate(2023, 3) + recno() - 1 as Date
   autogenerate makedate(2023, 3, 31) - makedate(2023, 3) + 1;

But I suggest considering filtering against a date rather than a timestamp, in which case generating the dates alone would be enough. Further, splitting the timestamp in the source into a date field and a time field reduces the number of distinct field values significantly: from about 2.6 M values down to 31 days plus 86,400 seconds. It will save a lot of storage space and RAM consumption.
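A minimal sketch of the date-only variant (this assumes the QVD already contains a pure date field sent_date, e.g. created in the extract layer as suggested above; field and path names follow the original post):

```
// Generate only the 31 dates of March 2023 as the filter values
DateFilter:
LOAD
    Date(MakeDate(2023, 3) + RecNo() - 1) AS sent_date
AutoGenerate MakeDate(2023, 3, 31) - MakeDate(2023, 3) + 1;

// Exists() on a single field keeps the QVD load optimized
SentEmail:
LOAD * FROM [$(vQVDExtractPath)/SentEmail.qvd] (qvd)
WHERE Exists(sent_date);

DROP Table DateFilter;
```

With 31 filter values instead of ~2.6 M timestamps, the filter table is negligible in size and the load stays optimized.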


rwunderlich
Partner Ambassador/MVP

I think you can simplify the loop even more to

Let vStartDate = Num(Date#('01/03/2023','DD/MM/YYYY'));
Let vEndDate = Num(Date#('31/03/2023','DD/MM/YYYY'));

Times:
LOAD
Timestamp($(vStartDate) + (RecNo() -1) / 86400) as Timestamp
AutoGenerate ($(vEndDate) - $(vStartDate) + 1) * 86400
;

-Rob


Replies
vincent_ardiet_
Specialist

Depending on your QVD size, one alternative could be:

tmpTimeStamp:
Load Distinct sent_date_time FROM [$(vQVDExtractPath)/SentEmail.qvd](qvd);

tmpTimeStamp2Keep:
NoConcatenate
Load sent_date_time
Resident tmpTimeStamp
Where sent_date_time >= $(vStartDate)
  And sent_date_time < $(vEndDate) + 1;  // +1 so the whole end day (up to 23:59:59) is included

Drop Table tmpTimeStamp;

SentEmail:
LOAD * FROM [$(vQVDExtractPath)/SentEmail.qvd](qvd)
WHERE EXISTS(sent_date_time);

Drop Table tmpTimeStamp2Keep;






Halesha_Bandri
Contributor II
Author

Thanks for your solution @marcus_sommer .

Yes, I accept your suggestion about filtering the data on a date field instead of the timestamp field. The timestamp field adds 86,400 distinct values for each day, which indirectly results in higher space consumption.

So, I will try to extract the date from the timestamp field in the extract layer itself, so that I can easily filter the data on the date field in the transform layer.

marcus_sommer

Besides splitting the timestamp with Floor() and Frac() into dates and times, it might also be useful to add a separate YYYYMM period field, created e.g. as Year(Date) * 100 + Month(Date), within the source data. This could simplify the filter creation even more, and the few distinct values in this extra field have no significant effect on file size or RAM consumption.
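The split plus period field might look like this in the extract layer (a sketch: the source path variable vSourcePath, the raw file name, and the field name YearMonth are assumptions, not from the original post):

```
// Split the timestamp once at extract time; distinct values drop from
// ~2.6 M per month to at most 31 dates + 86,400 times + a few periods
SentEmail_Extract:
LOAD
    Date(Floor(sent_date_time))  AS sent_date,                       // pure date part
    Time(Frac(sent_date_time))   AS sent_time,                       // pure time part
    Year(sent_date_time) * 100 + Month(sent_date_time) AS YearMonth  // e.g. 202303
FROM [$(vSourcePath)/SentEmail_raw.qvd] (qvd);

STORE SentEmail_Extract INTO [$(vQVDExtractPath)/SentEmail.qvd] (qvd);
```

Downstream layers can then filter on sent_date or YearMonth with a single-field Exists() and still get an optimized load.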

More optimization is possible by slicing the data into YYYYMM chunks and including the YYYYMM information in the file name. This makes it possible to run through the folder with FileList(), extract the period from each file name, and pick the wanted periods directly. Looping through the folder and fetching the relevant information is very fast, and even when loading multiple periods, the load plus the extra overhead time should be less than reading from one big QVD.
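The monthly-slice loop could be sketched like this (the file-naming pattern SentEmail_YYYYMM.qvd and the variable names are assumptions; adjust the path separator in SubField() if your paths use backslashes):

```
// Pick only the wanted YYYYMM slices by parsing the period from the file name
SET vWantedFrom = 202301;
SET vWantedTo   = 202303;

FOR EACH vFile IN FileList('$(vQVDExtractPath)/SentEmail_*.qvd')
    // e.g. '.../SentEmail_202303.qvd' -> 202303
    LET vPeriod = Num(TextBetween(SubField('$(vFile)', '/', -1), '_', '.'));
    IF vPeriod >= $(vWantedFrom) AND vPeriod <= $(vWantedTo) THEN
        SentEmail:                        // same table name concatenates the slices
        LOAD * FROM [$(vFile)] (qvd);     // optimized load, no Where clause needed
    END IF
NEXT vFile
```

Since each slice is loaded whole, no Exists() filter is needed at all and every load stays optimized.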

Especially when there is more than a single filter parameter, e.g. periods plus channels/categories/..., the approach of including content information in the file name has benefits over filtering the file directly, because Exists() allows only a single parameter if the load is to stay optimized.
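A common workaround for that single-parameter limit is a composite key built at extract time (a sketch: the key field %FilterKey and the 'Newsletter' value are illustrative, and the key must have been stored in the QVD with exactly the same text format):

```
// Filter table: combine date and channel into the same composite key
// format that was stored in the QVD at extract time
Filter:
LOAD
    Date(MakeDate(2023, 3) + RecNo() - 1) & '|' & 'Newsletter' AS %FilterKey
AutoGenerate 31;

// Still a single field, so the load remains optimized
SentEmail:
LOAD * FROM [$(vQVDExtractPath)/SentEmail_202303.qvd] (qvd)
WHERE Exists(%FilterKey);

DROP Table Filter;
```

The trade-off is that the composite key adds one extra field to the QVD, so it is worth it only when multi-parameter filtering is a recurring need.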