Do not input private or sensitive data. View Qlik Privacy & Cookie Policy.
Skip to main content

Announcements
Qlik Open Lakehouse is Now Generally Available! Discover the key highlights and partner resources here.
cancel
Showing results for 
Search instead for 
Did you mean: 
qv_testing
Specialist II
Specialist II

Reload new data only

Hi Community,

I have huge data, I am using 3 tier QVD Architecture..

QVD Generator  (Level 1  QVD's)

Intermediate Level (Level 2 QVD's)

Final Application (Final Application)

Earlier i have developed Incremental load.  (if i am using it's taking more time)

My requirement i don't need to full reload every time, with out using incremental load.


Even my three application i have to reload only new data.

is there any good approach.

Thanks in Advance...

9 Replies
mohamed_ahid
Partner - Specialist
Partner - Specialist

hi

how do you compare new data with the old ones ?

johnw
Champion III
Champion III

An incremental load MEANS only loading new data. Only loading new data MEANS you're doing an incremental load.

If your incremental load is taking more time than a full load, it seems likely to me that your incremental load could be written better, or the underlying databases themselves need modifications, such as adding a field or an index to support the incremental load. It is difficult to diagnose without knowing the details of your databases, your indexes, and your incremental load script.

But as some advice for figuring out what's wrong with your incremental load, where is the time being spent? Is it during the read of the database itself? Maybe there's an issue with your SQL (if it's SQL), or the database needs a field like change date to better support incremental load, or needs an index on the fields you're using. Is it during the concatenation of the existing QVD? Are you getting an optimized load? Anyway, figure out what part of the process is going slowly, and figure out how to speed it up, or ask for specific advice on the specific problem.

qv_testing
Specialist II
Specialist II
Author

Hi John,

I made sample data.... Actually i have SQL source.

is there anything wrong ??

If i have any new data also.. it's loading full.


PFA,

Anonymous
Not applicable

Hard to tell from your sample qvw as you have edited it from your actual qvw that has the issue.  Apart from me having to frig force it to do one off Full Load to create an initial qvd as otherwise it just skips through and does nothing - maybe due to having more IF's than END IF's and some other issues, the incremental load itself does seem to work ok

Would you be able to share the log file from the reload of your real qvw ?

johnw
Champion III
Champion III

As Bill says, your script does nothing due to various errors, but those errors may only be in your test script.

What you have is likely the most efficient way of getting the max date as it should only have to read the symbol table for the date field, which is relatively small no matter how big the overall QVD is, so that should be fine even though it's unoptimized. I could create an optimized version, but it would likely be slower in practice, or at least no faster.

You're loading in Excel data for step 1. Is that true of the real application? If it's actually from a QVD, for instance, your where predicate will not give you an optimized QVD load, and it would be easy to modify the script to do so - generate a list of dates from the previous max forward, and do where exists on that.

I'm unsure if your concatenate will be an optimized load or not in practice. Depending on how the IDs vs. dates work, another option would be to generate a list of dates prior to the max, and do a where exists on that.

johnw
Champion III
Champion III

Reading up a bit, it looks like the fastest way to get the max date is likely to be this:

Dates:   LOAD date FROM $(vQvdFile) (QVD) WHERE not exists(date);
MaxDate: LOAD date(max(date),'YYYY-MM-DD') as maxdate RESIDENT Dates;

qv_testing
Specialist II
Specialist II
Author

Hi Bill,

we don't have access to share any file..

I am working in Remote Machine.

qv_testing
Specialist II
Specialist II
Author

HI Bill and John,

I am confusing....

Please can you tell error here....(how to optimize in this script)

SET vQvdFile='D:\Incremental Load\step1.qvd';

SET vTableName='Step1';

LET vQvdExists = if(FileSize('$(vQvdFile)') > 0, -1, 0);

IF $(vQvdExists) THEN

maxdateTab:

LOAD Date(max("date"),'YYYY-MM-DD') as maxdate

  FROM $(vQvdFile) (qvd);

LET vIncrementalExpression = 'Where ("date") >' & Chr(39) & peek('maxdate') & Chr(39);

DROP table maxdateTab;

Step1:

LOAD id,

     Date(date,'YYYY-MM-DD') as date,

     cost

FROM

(ooxml, embedded labels, table is Sheet3)

$(vIncrementalExpression);

IF $(vQvdExists) THEN

CONCATENATE ($(vTableName))

LOAD * FROM $(vQvdFile) (qvd)

WHERE NOT exists("date");

End IF

Thanks..

johnw
Champion III
Champion III

Your sample script sets three variables and exits. The optimized version of your script is therefore three lines to set your variables.

SET vQvdFile='step1.qvd';
SET vTableName='Step1';
LET vQvdExists = 0;

But yes, I realize that's not what you want. I can't really tell what you want, and I know this isn't your real script or loading from your real data source. So...

Attached is a relatively random example of something that kind of looks like maybe what you were trying to attempt to do. Since Date is what seems to distinguish old data from new data, we can load the old data first. We can then get the max date from that in what I believe is the fastest way. Then I'm loading random data intended to simulate your SQL database, and from that taking only what has a date greater than what we already loaded. It just auto-concatenates onto the existing data. Finally, store it.

Here's the script.

LET vMaxDate = num(makedate(2010));

IF  filesize('Step1.qvd') THEN
    Step1:  LOAD * FROM Step1.qvd (QVD);
    MaxDate: LOAD num(max(fieldvalue('Date',recno()))) as MaxDate AUTOGENERATE fieldvaluecount('Date');
LET vMaxDate = peek('MaxDate');
    DROP TABLE MaxDate;
END IF

// This simulates the SQL to load from your actual database.
// You should have an index on the Date field.
Step1:
LOAD
ID
,Date
,Cost
WHERE Date > $(vMaxDate)
;
// This simulates the current state of your actual database.
// Most data will be at or below your max previous and won't be loaded.
// The data gets on average a little more recent each time to simulate running this every few days.
LOAD
num(ceil(rand()*999),'000') as ID
,date($(vMaxDate)+ceil(rand()*400)-390) as Date
,ceil(rand()*1000) as Cost
AUTOGENERATE 100
;
STORE Step1 INTO Step1.qvd (QVD);