ioannagr
Creator III

hard delete incremental load on very large table

Hi all.

So I've figured out a way of reading my large table for the first time: I read it in parts, one per year, so I've ended up with Qvd_2011, Qvd_2012, ..., Qvd_2021, and the final table is a concatenation of them.
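Roughly like this (a sketch; same join and field names as in the incremental script below):

For vYear = 2011 To 2021

    TempSize:
    NoConcatenate
    LOAD tab1_pk, size;
    SQL SELECT tab1.`pk` "tab1_pk", tab2.`size` "size"
    FROM public.tab1 tab1
    INNER JOIN public.tab2 tab2 ON tab2.`fk` = tab1.`pk`
    WHERE year(tab1.`datetime`) = '$(vYear)';

    Size_$(vYear):
    LOAD tab1_pk, sum(size) as size
    Resident TempSize
    Group By tab1_pk;

    DROP TABLE TempSize;

    STORE Size_$(vYear) INTO [$(vPathQVD)Qvd_$(vYear).qvd] (qvd);
    DROP TABLE Size_$(vYear);

Next vYear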

My question is: how do I do the hard-delete incremental load without it again taking so long?

 

My script for the incremental load is:

TempYear:
NoConcatenate
LOAD Distinct year(date(floor(datetime),'DD/MM/YYYY')) as Year;
SQL SELECT datetime
FROM public.tab1;

LET NumRows = NoOfRows('TempYear');
TRACE $(NumRows);

For n = 1 To $(NumRows)

    LET vYear = Peek('Year', $(n)-1, 'TempYear');
    TRACE $(vYear);

    TempSize:
    NoConcatenate
    LOAD
        tab1_pk,
        size,
        year(date(floor(datetime),'DD/MM/YYYY')) as YearOfInterest;
    SQL SELECT
        tab1.`pk` "tab1_pk", tab1.`datetime`,
        tab2.`fk` "tab2_fk", tab2.`size` "size"
    FROM public.tab1 tab1
    INNER JOIN public.tab2 tab2
        ON tab2.`fk` = tab1.`pk`
    WHERE year(tab1.`datetime`) = '$(vYear)'
      AND tab1.`pk` > '$(MaxInitialID)';    // MaxInitialID comes from another table

    // Aggregate per key; identical field names make each iteration auto-concatenate into Size
    Size:
    LOAD
        tab1_pk,
        sum(size) as size
    Resident TempSize
    Group By tab1_pk;

    DROP TABLE TempSize;

    // LET vFileName = 'Size-$(vYear)';
    // STORE Size INTO [$(vPathQVD)$(vFileName).qvd] (qvd);

Next n

DROP TABLE TempYear;

// Add the old rows from the initial qvd that were not updated
Concatenate (Size)
LOAD * FROM [$(vPathQVD)Initial.qvd] (qvd)
WHERE NOT EXISTS(tab1_pk);

// Hard delete: keep only keys that still exist in the database
Inner Join (Size)
LOAD tab1_pk as tab1_pk;
SQL SELECT tab1.`pk` "tab1_pk"
FROM public.tab1 tab1
INNER JOIN public.tab2 tab2
    ON tab1.`pk` = tab2.`fk`;

How do I handle that last part (the Concatenate of the initial qvd and the Inner Join against the database), so that it doesn't take AGES to load? In fact it ran for at least 7 days and was still going... And how do I apply the year split there as well?

 

Many many thanks in advance!

3 Replies
Gysbert_Wassenaar
Partner - Champion III

If I understand the problem correctly, you first load the primary keys of the deleted items, then load the qvd's with a where not exists clause so the deleted items are not loaded from them, and finally store the qvd's again to 'delete' the items from them.

Since you have qvd's per year, you'll probably have to use a loop to iterate over the qvd's, and once all the qvd's are processed, load them again into one table:

DeletedItems:
SQL SELECT MyPrimaryKey FROM MyTable WHERE Status = 'deleted';

For vYear = 2010 To 2021

    TmpQVD:
    LOAD * FROM [lib://QVDPath/qvd_$(vYear).qvd] (qvd) WHERE Not Exists(MyPrimaryKey);
    STORE TmpQVD INTO [lib://QVDPath/qvd_$(vYear).qvd] (qvd);
    DROP TABLE TmpQVD;
Next

DROP TABLE DeletedItems;    // drop the key list before reloading the cleaned qvd's

DataWithoutDeletedItems:
LOAD * FROM [lib://QVDPath/qvd_*.qvd] (qvd);


talk is cheap, supply exceeds demand
ioannagr
Creator III
Author

@Gysbert_Wassenaar I don't have such a field (status = 'deleted'), though.

 

There is a routine for implementing hard-delete incremental loading ( https://help.qlik.com/en-US/sense/February2021/Subsystems/Hub/Content/Sense_Hub/LoadData/use-QVD-fil... ), which I followed. It needs a final inner join (as you said, to "delete" from the table), and that inner join reads all the keys from the database again... that is my problem and I don't know what to do... 😞
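That final step boils down to something like this (using the names from my script above):

// Hard delete: keep only the keys that still exist in the database
Inner Join (Size)
LOAD tab1_pk;
SQL SELECT tab1.`pk` "tab1_pk"
FROM public.tab1 tab1;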

Gysbert_Wassenaar
Partner - Champion III

It doesn't matter that you don't have a field called status. What matters is that there is a way to identify the records that should be deleted. I don't know your data, so I can't tell you how to do that. The point is that you should load a set of unique primary keys that identify the items to be deleted. Once you have that, you can load the data from the qvd's you created earlier with a where not exists clause, so the deleted items don't get loaded, and then store that data into the qvd's again so that only the items that are not deleted remain.
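For example, if nothing in the source flags deletions, you can derive that set by comparing the keys in your qvd's with the keys still in the database. A sketch, assuming the field names and yearly qvd's from your script:

// Keys that still exist in the database
CurrentKeys:
LOAD tab1_pk;
SQL SELECT tab1.`pk` "tab1_pk" FROM public.tab1 tab1;

// Keys present in the qvd's but no longer in the database = hard deleted
DeletedItems:
LOAD Distinct tab1_pk as DeletedKey
FROM [$(vPathQVD)Qvd_*.qvd] (qvd)
WHERE Not Exists(tab1_pk);

DROP TABLE CurrentKeys;

// Rewrite each yearly qvd without the deleted keys
For vYear = 2011 To 2021
    TmpQVD:
    LOAD * FROM [$(vPathQVD)Qvd_$(vYear).qvd] (qvd)
    WHERE Not Exists(DeletedKey, tab1_pk);
    STORE TmpQVD INTO [$(vPathQVD)Qvd_$(vYear).qvd] (qvd);
    DROP TABLE TmpQVD;
Next

DROP TABLE DeletedItems;

Renaming the key list to DeletedKey keeps it apart from tab1_pk, which is why the loop needs the two-argument Exists(); that makes the qvd loads unoptimized, but only the surviving rows are stored back.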


talk is cheap, supply exceeds demand