t-gilchrist
Contributor III

How to only load "new" data to Table? (Large table association causing slow loading)

Hello all,

I have two large tables that are only going to get bigger over time. I need one column out of Table B to be associated with Table A, but no matter how I try to associate them, the data manager fails to load at all (it gets stuck on the "Loading Tables..." screen). I've tried multiple joins, creating the association in the data manager after the load, and ApplyMap. The only way I was able to get it to load and verify the association was working was to limit the data pulled from Table A with WHERE "timestamp" > '2018-08-29';, and even then it takes a few minutes on the "Loading Tables..." portion of the data manager and about a minute and a half to load. Ideally I would like to pull data as far back as the first of the year.

What would be perfect is a way to load only "new" data, i.e. data that doesn't already exist from the previous load. I ran across a post or two suggesting Where Not Exists, but I wasn't able to make it work in my script without errors.
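
For context: in Qlik script this is the Where Not Exists() clause on a LOAD statement (typically the load that pulls forward an existing QVD), not SQL's WHERE NOT EXISTS subquery. A minimal sketch of the pattern, assuming a key field UnqID and a hypothetical prior QVD named Old.qvd:

// Sketch only: New is the table just loaded from the source and
// already contains UnqID; pull forward only the old rows whose
// UnqID was not just loaded.
Concatenate (New)
LOAD * FROM ['lib://MyQVD/Old.qvd'] (qvd)
Where Not Exists(UnqID);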

What can I do to the script below to load only new data, or otherwise decrease load time, while still associating "Kind_V" with UnqID?

LIB CONNECT TO 'My-DBO';

[TableA]:
Mapping Load
    Order & Code as UnqID,
    "Kind_V";
SELECT
    Order,
    Code,
    "Kind_V"
FROM MY.dbo."V_Mat"
WHERE "timestamp" > '2018-08-29';

[TableB]:
Load
    Order & Code as UnqID
From ['lib://MyFolder\*.xlsx']
(ooxml, embedded labels, table is [MatInfo]);

[TableC]:
Load
    *,
    ApplyMap('TableA', UnqID, Null()) as Mat_Kind
Resident TableB;

Drop Table TableB;

3 Replies
vamsee
Specialist

Qlik is an in-memory analysis tool, meaning all analysis is done in system RAM.

A plain Load * or SELECT * always fetches data from the source fastest, so with that in mind I would suggest not doing transformations in the preceding load of TableA. Instead, create a two-tier architecture by doing the extracts separately:

Store TableA into a QVD.

Store TableB into a QVD.

Use a third file to read both QVDs and do the transformations.

/** File (QVW) 1 **/
LIB CONNECT TO 'My-DBO';
[TableA]:
Load *;
SELECT
    Order,
    Code,
    "Kind_V"
FROM MY.dbo."V_Mat";

A:
LOAD
    Order & Code as UnqID,
    "Kind_V"
Resident TableA;
DROP Table TableA; // drop the raw extract; keep the keyed table
STORE A into TableA.qvd (qvd);

/** File (QVW) 2 **/
LIB CONNECT TO 'My-DBO';
[TableB]:
Load
    Order,
    Code
From ['lib://MyFolder\*.xlsx']
(ooxml, embedded labels, table is [MatInfo]);

STORE TableB into TableB.qvd (qvd);
DROP Table TableB;


/*** File (QVW) 3 **/

TableA_Map:
/** Optimized Load **/
Mapping LOAD
    UnqID,
    "Kind_V"
FROM TableA.qvd (qvd);

TableB:
NoConcatenate
Load
    Order,
    Code
FROM TableB.qvd (qvd);

/** If TableB is small you can add the ApplyMap into the QVD load itself, or else use the below **/
Final:
NoConcatenate
LOAD
    Order,
    Code,
    Order & Code as UnqID,
    ApplyMap('TableA_Map', Order & Code, Null()) as Mat_Kind
Resident TableB;
DROP Table TableB;

As for the tables getting larger over time: that is handled by an incremental load, and it is easier to do against a QVD than against the source. Refer to:

Overview of Qlik Incremental Loading.

https://www.analyticsvidhya.com/blog/2014/09/qlikview-incremental-load/
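
In outline, the pattern from that link looks like the sketch below. This is a sketch under assumed names (WorkingFile.qvd, vMaxDate), not exact code for your tables:

// 1. Read the high-water mark from the QVD built so far.
MaxDateTab:
LOAD Max("timestamp") as MaxTS
FROM ['lib://MyQVD/WorkingFile.qvd'] (qvd);
Let vMaxDate = Peek('MaxTS', 0, 'MaxDateTab');
DROP Table MaxDateTab;

// 2. Fetch only rows newer than the high-water mark.
Data:
SQL SELECT Order, Code, "Kind_V", "timestamp"
FROM MY.dbo."V_Mat"
WHERE "timestamp" > '$(vMaxDate)';

// 3. Append the history already on disk, then write everything back.
Concatenate (Data)
LOAD * FROM ['lib://MyQVD/WorkingFile.qvd'] (qvd);
STORE Data into 'lib://MyQVD/WorkingFile.qvd' (qvd);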

t-gilchrist
Contributor III
Author

Thanks for the detailed explanation.

I did implement your suggested code and there was an improvement in load time, but as you mentioned, to handle the data increasing over time I will need some kind of incremental load. In my earlier example, one source was a set of xlsx files from a report that ran automatically once a day; I'm no longer using that connection source. I had thought those files could potentially speed up the load time versus pulling from the server, but with an incremental load that is no longer the case.


My latest plan is to have a "historical load" QVW that loads all my desired fields from "My-DBO" WHERE "timestamp" > x AND "timestamp" < y, then STOREs to "MyQVD".

I will change the x and y values month by month to build up the history in the file until I have everything from 2018-01-01 to present (a hypothetical loop that could automate this appears after the historical script below).

THEN

I will have another "incremental load" QVW that does the actual incremental load and serves as the "working" file. But I'm having an issue with this script getting stuck on the "Loading Tables..." screen. This time the data manager actually loads; it's after I click "Synchronize scripted tables" that it gets stuck. I've stripped the script down to what should be a minimal load, so I believe it's something I'm doing wrong rather than loading too much data. Below are my current scripts for the historical and incremental loads.


//** HISTORICAL LOAD **//

LIB CONNECT TO 'My-DBO';

[Historical_Load]:
SQL SELECT
    Order,
    "timestamp"
FROM My-DBO.dbo."V_Mat"
WHERE "timestamp" > '2018-10-02' AND "timestamp" < '2018-10-03';

STORE Historical_Load into 'lib://MyQVD/WorkingFile.qvd' (qvd);

//** NOTE: the narrow "timestamp" range above was used to limit the data pulled while verifying that the increment was working **//
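
Rather than hand-editing x and y each month, a loop could automate the history build-up. This is a hypothetical sketch only; vFrom, vTo, and the per-month QVD names are placeholders, not from the thread:

//** Hypothetical month-by-month history builder (sketch) **//
LIB CONNECT TO 'My-DBO';

For i = 0 to 11 // one pass per month of 2018
    Let vFrom = Date(AddMonths(MakeDate(2018, 1, 1), $(i)), 'YYYY-MM-DD');
    Let vTo   = Date(AddMonths(MakeDate(2018, 1, 1), $(i) + 1), 'YYYY-MM-DD');

    [Historical_Load]:
    SQL SELECT
        Order,
        "timestamp"
    FROM My-DBO.dbo."V_Mat"
    WHERE "timestamp" >= '$(vFrom)' AND "timestamp" < '$(vTo)';

    STORE Historical_Load into 'lib://MyQVD/History_$(vFrom).qvd' (qvd);
    DROP Table Historical_Load;
Next i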


//** INCREMENTAL LOAD **//

[CurrentMaxDate]:
Load Max(Date("timestamp", 'YYYY-MM-DD')) as MaxDate
FROM 'lib://MyQVD/WorkingFile.qvd' (qvd);

Let MaxDate = Peek('MaxDate', 0, 'CurrentMaxDate');

LIB CONNECT TO 'My-DBO';

[Incremental_Load]:
SQL SELECT
    Order,
    "timestamp"
FROM My-DBO.dbo."V_Mat"
WHERE "timestamp" > '$(MaxDate)';

/*
Concatenate
LOAD *
FROM ['lib://MyQVD/WorkingFile.qvd'] (qvd);

STORE Incremental_Load into 'lib://MyQVD/WorkingFile.qvd' (qvd);
*/

//** NOTE: the section above is commented out to test the script output **//

vamsee
Specialist

Hi Trey,

Maybe you can further optimize your code as follows.

As the historical data is only pulled once, I'll let you decide the best way to do that.

Store your daily files in intervals of days, weeks, or months, so that you don't scan the large historical file for the max date every day. I have used days in the example below.



// Sorry if I got your file names wrong. To be clear, I am naming your
// historical file Historical.qvd, the daily files WorkingFile_Date.qvd,
// and the combined output Final_File.qvd.

/*** File to Extract ***/

[CurrentMaxDate]:
Load
    MaxDate // on the very first run, point this at the historical QVD and use Max(Date("timestamp", 'YYYY-MM-DD')) as MaxDate instead
FROM 'lib://MyQVD/WorkingFileMaxDate.qvd' (qvd);

Let MaxDate = Peek('MaxDate', 0, 'CurrentMaxDate');

DROP Table CurrentMaxDate;

LIB CONNECT TO 'My-DBO';

[Incremental_Load]:
SQL SELECT
    Order,
    "timestamp"
FROM My-DBO.dbo."V_Mat"
WHERE "timestamp" > '$(MaxDate)';

STORE Incremental_Load into 'lib://MyQVD/WorkingFile_$(MaxDate).qvd' (qvd);

CurrentMaxDate:
Load
    Max(Date("timestamp", 'YYYY-MM-DD')) as MaxDate
Resident Incremental_Load;
DROP Table Incremental_Load;

STORE CurrentMaxDate into 'lib://MyQVD/WorkingFileMaxDate.qvd' (qvd);

/*** File to Concatenate ***/

Daily_Files:
LOAD *
FROM ['lib://MyQVD/WorkingFile_*.qvd'] (qvd); // loads all daily files in the folder

Concatenate (Daily_Files)
LOAD *
FROM ['lib://MyQVD/Historical.qvd'] (qvd)
Where not Exists("timestamp"); // can be omitted if you are sure your load process is perfect

STORE Daily_Files into 'lib://MyQVD/Final_File.qvd' (qvd);



NOTES:

1. Double-check that your date formats are the same in all places, i.e. MaxDate, Max(MaxDate), "timestamp", and $(MaxDate).

2. As a suggestion, store your daily files in WorkingFile_20181005.qvd (YYYYMMDD) format.
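
As one illustration of both notes together, here is a sketch; it assumes it runs before CurrentMaxDate is dropped, and FileTag is a made-up name:

// Sketch: keep one canonical YYYY-MM-DD text for comparisons,
// and a compact YYYYMMDD stamp for the daily file name.
Let MaxDate = Date(Peek('MaxDate', 0, 'CurrentMaxDate'), 'YYYY-MM-DD');
Let FileTag = Date(Date#('$(MaxDate)', 'YYYY-MM-DD'), 'YYYYMMDD');

STORE Incremental_Load into 'lib://MyQVD/WorkingFile_$(FileTag).qvd' (qvd);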