
INESBK
Creator

[CDC] Insert only data that has changed since last run (TOS_DI)

I would like to insert only new data into my database. I used incremental loading, comparing my source (a set of files) with my target (a SQL Server table) via an inner join, but since the number of rows inserted into the database is huge, this solution is not feasible.
So I thought of doing CDC by date comparison (last run date vs. the current date).
Unfortunately, I don't know how to do it.
Can someone help me, please?

 

1 Solution

Accepted Solutions
vapukov
Master II

It could be because the tMSSQLInput does not run before the tMap.

In your case, that process is independent: it runs somewhere in a parallel world.

 

You can do it like this:

[screenshot: 0683p000009LujW.png, example job layout]

 

It's just an example, but be careful about the order: first read the value, then use it.
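In generated-Java terms, the "read first, then use" rule means the subjob that fetches the last run date must finish (e.g. an OnSubjobOk link) before the subjob that filters the source. A minimal sketch of that idea, where the `globalMap` stand-in, table, and column names are illustrative assumptions rather than anything from the original job:

```java
import java.time.LocalDateTime;
import java.util.HashMap;
import java.util.Map;

public class CdcOrderSketch {
    // Stand-in for Talend's globalMap, which component expressions read from.
    static final Map<String, Object> globalMap = new HashMap<>();

    // Step 1 (first subjob): store the last processed timestamp read from the DB.
    static void readWatermark(LocalDateTime maxDateFromDb) {
        globalMap.put("lastRunDate", maxDateFromDb);
    }

    // Step 2 (second subjob, linked with OnSubjobOk): build the filtered source
    // query using the value stored in step 1. Calling this before step 1 would
    // fail, which is exactly the ordering problem described above.
    static String buildSourceQuery() {
        LocalDateTime last = (LocalDateTime) globalMap.get("lastRunDate");
        return "SELECT * FROM source_table WHERE updated_at > '" + last + "'";
    }
}
```

The point of the sketch is only the ordering: the watermark must exist in `globalMap` before any expression tries to use it.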


27 Replies
TRF
Champion II

Hi,

I think (to be confirmed) CDC is available only in the Enterprise version, not the Community one.

However, what do you mean by "huge": billions of rows?

These links may help:

https://www.talend.com/blog/2017/01/05/talend-job-design-patterns-best-practices-part-4/

http://bekwam.blogspot.fr/2011/07/complex-joins-out-of-tmap-in-talend.html

 

Also this topic https://community.talend.com/t5/Design-and-Development/tMap-compare-integer-before-UPDATE-in-express...

INESBK
Creator
Author

Thanks for your reply.

I have 80,000,000 rows in my table, so when I do the inner join in tMsSqlInput to capture the new rows, I run out of memory (it consumes almost 90% of physical memory) and my job crashes!

TRF
Champion II

The solution described here in the section "tMap Lookups" may help.

It gives you the key to avoid loading all 80,000,000 records at once.

INESBK
Creator
Author

I applied this solution, but it takes a lot of time because it looks up and compares every file line against the result of my query in the tMsSQLInput component!

For this reason I want to change this solution and use a comparison by timestamp!

TRF
Champion II

Can you share your job?

INESBK
Creator
Author

Hi,

Here is an example of my job.
Usually tMySqlInput lets me read the rows of the database table to do the lookup, but as there are so many rows (80,000,000), the job crashes. So I need another solution for incremental loading!

 

[screenshot: 0683p000009LvAX.png, example of the job]

TRF
Champion II

You should refer to this discussion.

The idea is to split the large table into smaller pieces, then use a master job to run the child job as many times as necessary, depending on the number of files generated when the table content is split.

For example, if each file contains 10,000,000 records, you need to process 8 times.
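The run count in that example is just a ceiling division over the row count. As a quick sketch (method and variable names are illustrative):

```java
public class ChunkCount {
    // Number of child-job executions needed when the table is split into
    // files of chunkSize rows each. Ceiling division in integer arithmetic:
    // adding (chunkSize - 1) before dividing rounds any partial file up.
    static long runsNeeded(long totalRows, long chunkSize) {
        return (totalRows + chunkSize - 1) / chunkSize;
    }
}
```

So 80,000,000 rows at 10,000,000 per file gives exactly 8 runs, and one extra row would push it to 9.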

INESBK
Creator
Author

The solution you recommended makes it possible to run the job in threads, but I need a different approach than the inner join and lookup, one that lets me insert only the new rows into my database!

 

vapukov
Master II


@INESBK wrote:


So I thought of doing the CDC by date comparison (last date of run and my current date)
Unfortunately I don't know how to do it.
Someone can help me please !

 


Why not, in this case, read the MAX date from the database and then filter for new data with a later date?

It could also be an ID or another incremental property.

If you don't store a timestamp in the database but use one for the lookup, you can store it in an additional table (e.g. last_insert), then read it back and use it.

 

If the lookup is more complicated, try using insert ignore or insert/update.

It would be good if you explained your job logic in a little more detail.
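The MAX-date filter described above can be sketched in plain Java, independent of any Talend component. The `Row` class and the idea of a one-row `last_insert` watermark table are assumptions for illustration:

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

public class WatermarkFilter {
    // One source record with its last-modified date.
    static class Row {
        final String key;
        final LocalDate modified;
        Row(String key, LocalDate modified) {
            this.key = key;
            this.modified = modified;
        }
    }

    // Keep only rows changed strictly after lastRun, i.e. the date read back
    // from a hypothetical one-row watermark table such as last_insert.
    // Everything at or before the watermark was already loaded and is skipped.
    static List<Row> newRowsSince(List<Row> source, LocalDate lastRun) {
        List<Row> fresh = new ArrayList<>();
        for (Row r : source) {
            if (r.modified.isAfter(lastRun)) {
                fresh.add(r);
            }
        }
        return fresh;
    }
}
```

After the insert succeeds, the job would update last_insert with the new MAX date so the next run starts from there; that keeps the per-run volume proportional to the changed rows instead of the full 80,000,000.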