Skip to main content
Announcements
Have questions about Qlik Connect? Join us live on April 10th, at 11 AM ET: SIGN UP NOW
cancel
Showing results for 
Search instead for 
Did you mean: 
jjordaan
Partner - Specialist
Partner - Specialist

Incremental load without a unique sequential id and modification date

Hi everyone,

I have a data loading situation were I'm stuck with, so I hope some off you can help me.

For almost all our customers we use a incremental load approach based on a unique sequential id or modification date.

Unfortunately with this customer there is no unique sequential id (by combining 3 fields we can create a unique key but it is not sequential) and there is no modification date. There is a date field but we cannot use it.

I cannot use Where Exists with the SQL statement.

The base QVD has about 4,5 million entries.

Unfortunately we can also not create a additional field in the database with a trigger or something.

Can anyone suggest an approach?

Thanks in advance

1 Solution

Accepted Solutions
jjordaan
Partner - Specialist
Partner - Specialist
Author

Hi everyone,

Thank you all for your time to help me with this topic.

To resolve this issue I did the following.

After gathering information about the database it seems that there is a field which is unique and sequential but only per category.

So I created a loop per category which fetching the highest number for every category.

Again thank you for all your help.

View solution in original post

8 Replies
Gysbert_Wassenaar

The only way I know of is to create a hash key from all the fields (or at least all non-key fields). You can use the three fields that combine to a unique key to keep track of new or removed items. You can check for each key value if the calculated hash has changed. If the hash value is different at least one field has a new value. Barry Harmsen wrote a bit about it here: www.qlikfix.com/2014/03/11/hash-functions-collisions/


talk is cheap, supply exceeds demand
Not applicable

Hi, Jeroen.

4,5 million entries may be loaded without any conditions.

In most cases, this approach is faster on the tables of this type.

The only thing that can be used to implement incremental load is the row number.  But this approach will not work with UPDATE statment.

Sorry, but my english so so.

jjordaan
Partner - Specialist
Partner - Specialist
Author

Hi Gysbert,

I have also thought about that but without testing it I think it would be performance intensive because it needs to verify record per record.

But if it is the only solution then I have no choice

jjordaan
Partner - Specialist
Partner - Specialist
Author

Hi Alexander,

The thing also is that this customer has multiple stores around the world so NOT loading incremental is not an option.

Can you be a bit more specific about your idea with rownumber?

Thanks in advance

Not applicable

Jeroen, I mean the following implementation:

tempRowsNo:

LOAD RowNo() as ID

From MyTable.qvd(qvd);

tempRowsNum:

first 1 LOAD  ID as RowsNum

Resident tempRowsNo

Order By ID desc;

DROP Table tempRowsNo;

LET vRowsNum = peek('RowsNum',0,'tempRowsNum');

DROP Table tempRowsNum;

Diff:

LOAD *;

SQL SELECT * FROM MyTable

EXCEPT

SELECT TOP $(vRowsNum ) * FROM MyTable;

Not applicable

This example works on SQL Server, but achievable in other DBMS.

SELECT * is just example . Read it like SELECT {MyFieldsList}

Peter_Cammaert
Partner - Champion III
Partner - Champion III

I think you're right. Calculating a hash for primary key fields doesn't tell you what has changed and including all fields that may have been modified comes down to reading all records anyway.

Isn't there a technique in RDBMS to define triggers that add your virtual primary keys to a new (delta) table whenever a record changes in the original table? Could serve as a delta index. 

jjordaan
Partner - Specialist
Partner - Specialist
Author

Hi everyone,

Thank you all for your time to help me with this topic.

To resolve this issue I did the following.

After gathering information about the database it seems that there is a field which is unique and sequential but only per category.

So I created a loop per category which fetching the highest number for every category.

Again thank you for all your help.