<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Data loading crashes with 20GB qvd in Connectivity &amp; Data Prep</title>
    <link>https://community.qlik.com/t5/Connectivity-Data-Prep/Data-loading-crashes-with-20GB-qvd/m-p/2514690#M14799</link>
    <description>Qlik Community thread: an incremental load script for 10-20 GB QVDs runs very slowly and crashes from lack of server resources (64 GB RAM); replies discuss optimized QVD loads and incremental-load patterns.</description>
    <pubDate>Thu, 17 Apr 2025 08:42:33 GMT</pubDate>
    <dc:creator>ChrisAess</dc:creator>
    <dc:date>2025-04-17T08:42:33Z</dc:date>
    <item>
      <title>Data loading crashes with 20GB qvd</title>
      <link>https://community.qlik.com/t5/Connectivity-Data-Prep/Data-loading-crashes-with-20GB-qvd/m-p/2514690#M14799</link>
      <description>&lt;P&gt;Hello,&lt;BR /&gt;I need to optimize a loading script (that I did not create) that handles QVDs ranging from 10 to 20 GB.&lt;BR /&gt;The problem is that data retrieval takes a very long time and then fails for the largest QVD because the server runs out of resources (it has 64 GB of RAM).&lt;/P&gt;&lt;P&gt;The script retrieves data for the current day, concatenates it with the existing data in the QVD, and finally replaces the QVD with the new data. A "where not exists" clause avoids duplicates.&lt;/P&gt;&lt;P&gt;Changes already implemented:&lt;BR /&gt;- CreateSearchIndexOnReload is set to 0&lt;BR /&gt;- We apply "where not exists" only to data dated within the loading period, and load everything before that period unfiltered.&lt;/P&gt;&lt;P&gt;This solved the daily incremental load, but after repeated crashes we now have several days to catch up, and that is too much for the server with the largest QVDs.&lt;/P&gt;&lt;P&gt;I am currently trying to purge the data to keep only one year, but the server does not have enough resources for something like "load * from qvd where date &amp;gt; 04/17/2024" when the QVD exceeds 15 GB.&lt;/P&gt;&lt;P&gt;Also, we currently have 1.5 years of data and, as I said, one of the QVDs is 20 GB. That seems huge to me.&lt;/P&gt;&lt;P&gt;Do you have any ideas on how the script can be optimized, and does the size of the QVD seem normal to you?&lt;/P&gt;&lt;P&gt;Thanks in advance.&lt;/P&gt;&lt;P&gt;The script looks like this:&lt;/P&gt;&lt;PRE&gt;LET LastExecTime = if(LastExecTime, LastExecTime, MakeDate(2025,4,1));
LET ThisExecTime = now();

LET v_begin = date(LastExecTime, 'YYYYMMDD hh:mm:ss.fff');
LET v_end = date(ThisExecTime, 'YYYYMMDD hh:mm:ss.fff');

TEST:
LOAD
  DateTime,
  TagName,
  Num(Value, '##,##') AS Value,
  Date(DateTime) &amp;amp; '/' &amp;amp; Time(DateTime) &amp;amp; '/' &amp;amp; TagName &amp;amp; 'XXX' AS Key;
SQL SELECT
  [DateTime],
  [TagName],
  [Value]
FROM XXX
WHERE [DateTime] &amp;gt;= '$(v_begin)'
  AND [DateTime] &amp;lt;= '$(v_end)';

TMP:
NoConcatenate
LOAD *
FROM [lib://XXX/TEST.qvd] (qvd)
Where DateTime &amp;gt;= '$(v_begin)';

Concatenate(TEST)
LOAD *
Resident TMP
WHERE not exists(Key);

DROP TABLE TMP;

Concatenate(TEST)
LOAD *
FROM [lib://XXX/TEST.qvd] (qvd)
Where DateTime &amp;lt; '$(v_begin)';

STORE * FROM TEST INTO [lib://XXX/TEST.qvd] (qvd);

DROP TABLE TEST;&lt;/PRE&gt;</description>
      <pubDate>Thu, 17 Apr 2025 08:42:33 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Connectivity-Data-Prep/Data-loading-crashes-with-20GB-qvd/m-p/2514690#M14799</guid>
      <dc:creator>ChrisAess</dc:creator>
      <dc:date>2025-04-17T08:42:33Z</dc:date>
    </item>
    <item>
      <title>Re: Data loading crashes with 20GB qvd</title>
      <link>https://community.qlik.com/t5/Connectivity-Data-Prep/Data-loading-crashes-with-20GB-qvd/m-p/2514699#M14801</link>
      <description>&lt;P&gt;&lt;a href="https://community.qlik.com/t5/user/viewprofilepage/user-id/335543"&gt;@ChrisAess&lt;/a&gt; I think what you need is an append-and-update incremental load, along these lines:&lt;/P&gt;&lt;PRE&gt;// 1. Load the new data from the SQL source
NEW:
LOAD
  DateTime,
  TagName,
  Num(Value, '##,##') AS Value,
  Date(DateTime) &amp;amp; '/' &amp;amp; Time(DateTime) &amp;amp; '/' &amp;amp; TagName &amp;amp; 'XXX' AS Key;
SQL SELECT
  [DateTime],
  [TagName],
  [Value]
FROM XXX
WHERE [DateTime] &amp;gt;= '$(v_begin)' AND [DateTime] &amp;lt;= '$(v_end)';

// 2. Load the existing data from the QVD (without the Key field)
OLD:
LOAD
  DateTime,
  TagName,
  Value
FROM [lib://XXX/TEST.qvd] (qvd);

// 3. Keep only the old records whose key is NOT in the new data.
// The key is built inside Exists() instead of being created as a
// field first; otherwise every old row would match its own key
// and be filtered out.
CLEANED_OLD:
NoConcatenate
LOAD
  DateTime,
  TagName,
  Num(Value, '##,##') AS Value,
  Date(DateTime) &amp;amp; '/' &amp;amp; Time(DateTime) &amp;amp; '/' &amp;amp; TagName &amp;amp; 'XXX' AS Key
Resident OLD
WHERE NOT Exists(Key, Date(DateTime) &amp;amp; '/' &amp;amp; Time(DateTime) &amp;amp; '/' &amp;amp; TagName &amp;amp; 'XXX');

DROP TABLE OLD;

// 4. Merge cleaned old + new
Concatenate(CLEANED_OLD)
LOAD *
Resident NEW;

DROP TABLE NEW;

// 5. Store the final merged table back to the QVD
STORE CLEANED_OLD INTO [lib://XXX/TEST.qvd] (qvd);

DROP TABLE CLEANED_OLD;&lt;/PRE&gt;</description>
      <pubDate>Thu, 17 Apr 2025 10:58:56 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Connectivity-Data-Prep/Data-loading-crashes-with-20GB-qvd/m-p/2514699#M14801</guid>
      <dc:creator>Bhushan_Mahajan</dc:creator>
      <dc:date>2025-04-17T10:58:56Z</dc:date>
    </item>
    <item>
      <title>Re: Data loading crashes with 20GB qvd</title>
      <link>https://community.qlik.com/t5/Connectivity-Data-Prep/Data-loading-crashes-with-20GB-qvd/m-p/2514721#M14804</link>
      <description>&lt;P&gt;This approach is bound to be slow, because none of the QVD loads is optimized. For a QVD load to stay optimized, no transformations may be applied - the only filter allowed is a where exists() with a single parameter. So instead of:&lt;/P&gt;&lt;P&gt;where date &amp;gt;= MyDate;&lt;/P&gt;&lt;P&gt;it should look like:&lt;/P&gt;&lt;P&gt;where exists(MyDate);&lt;/P&gt;&lt;P&gt;The date values can be created beforehand with something like:&lt;/P&gt;&lt;P&gt;t0: load date(FirstDate + recno() - 1) as MyDate&lt;BR /&gt;autogenerate LastExecutionDate - FirstDate;&lt;/P&gt;&lt;P&gt;Then comes the load of the historic data, followed by the current data:&lt;/P&gt;&lt;P&gt;t1: load * from QVD where exists(MyDate);&lt;/P&gt;&lt;P&gt;concatenate(t1)&lt;/P&gt;&lt;P&gt;load *;&lt;BR /&gt;sql select * from DB where MyDate &amp;gt; LastExecutionDate;&lt;/P&gt;&lt;P&gt;store t1 into ...; drop tables t0;&lt;BR /&gt;let LastExecutionDate = today();&lt;/P&gt;&lt;P&gt;No further resident loads or combined keys are needed.&lt;/P&gt;&lt;P&gt;If your filter field is really a timestamp and not a date, this suggestion will remove some records or load them several times at the period boundary - but that can usually be neglected.&lt;/P&gt;&lt;P&gt;Also be aware that you should only load a time part if it is mandatory for some view; in most scenarios it is better to split the timestamp into separate date and time fields (maybe even separating seconds from milliseconds). Reducing the number of distinct field values minimizes RAM consumption, often significantly.&lt;/P&gt;</description>
      <pubDate>Thu, 17 Apr 2025 13:44:29 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Connectivity-Data-Prep/Data-loading-crashes-with-20GB-qvd/m-p/2514721#M14804</guid>
      <dc:creator>marcus_sommer</dc:creator>
      <dc:date>2025-04-17T13:44:29Z</dc:date>
    </item>
    <item>
      <title>Re: Data loading crashes with 20GB qvd</title>
      <link>https://community.qlik.com/t5/Connectivity-Data-Prep/Data-loading-crashes-with-20GB-qvd/m-p/2516255#M14855</link>
      <description>&lt;P&gt;Thank you very much,&lt;BR /&gt;The loading is running normally after several optimizations.&lt;/P&gt;</description>
      <pubDate>Wed, 30 Apr 2025 08:31:43 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Connectivity-Data-Prep/Data-loading-crashes-with-20GB-qvd/m-p/2516255#M14855</guid>
      <dc:creator>ChrisAess</dc:creator>
      <dc:date>2025-04-30T08:31:43Z</dc:date>
    </item>
  </channel>
</rss>

