Solved: [CDC]Insert only data that has changed since last ... - Page 3 - Qlik Community

INESBK · ‎2017-05-24

I would like to insert only the new data into my database. I tried incremental loading by comparing my source (a set of files) with my target (a SQL Server table) using an inner join, but since the number of rows inserted in the database is huge, this solution is not feasible.
So I thought of doing CDC by date comparison (the date of the last run versus the current date).
Unfortunately, I don't know how to do it.
Can someone help me, please?

INESBK · ‎2017-05-28

My structure is like this:

I have a folder and subfolder structure in which each subfolder contains a .dbf file (the file name identifies the sensor).

Each file contains thousands of rows, each holding a value of this sensor at some point in time.

Here is an example of a sensor file: "c:\folder\subfolder\128.dbf"

I need to change the structure of this name to "folder.subfolder.MES".

So the form of the input file name (in tFileList) is different from the form of the name stored in the database.

For this reason I use tFileList to browse all the files, pass each file name to tSystem to read the file and change the name structure with a Python script, then apply other transformations with tMap, and finally insert into the table.

So, if I understood correctly, we must follow alternative 4?
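The renaming described in this post can be sketched in a few lines of Python. This is only an illustration, not the poster's actual script: the function name `sensor_name` is hypothetical, and because the thread does not say how "128" relates to "MES", the suffix is treated here as a fixed value passed in as a parameter.

```python
from pathlib import PureWindowsPath


def sensor_name(dbf_path: str, suffix: str = "MES") -> str:
    """Build a dotted sensor name from the folders that contain a .dbf file.

    The drive and the file name itself are dropped; `suffix` is appended last.
    Assumption: the suffix is constant, since the thread does not define it.
    """
    path = PureWindowsPath(dbf_path)
    # parent.parts is e.g. ('c:\\', 'folder', 'subfolder'); skip the drive anchor.
    folders = [part for part in path.parent.parts if part != path.anchor]
    return ".".join(folders + [suffix])


print(sensor_name(r"c:\folder\subfolder\128.dbf"))  # folder.subfolder.MES
```

`PureWindowsPath` is used so that the backslash-separated path is parsed the same way on any operating system.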

vapukov · ‎2017-05-28

@INESBK wrote:

So, if I understood correctly, we must follow alternative 4?

In my opinion, no: you do not need to go exactly this way :-), and you do not need Python to change the name structure.

But of course you can go whichever way you like.

INESBK · ‎2017-05-28

No, I have to use Python to read the structure of the .dbf file and add a column that contains the file name in this form.

[In my opinion, no: you do not need to go exactly this way :-)]

And if I do not use alternative 4, then which solution can solve my problem?

vapukov · ‎2017-05-28

If you want a full solution, you must provide full information.

Above there are a lot of working ideas (taken from real, working processes),

but I cannot guess how they will work in your case if you do not provide:

- the file structure, with a full description: what is in the files, what we have after the Python script, how the data is organised (sorted, unsorted, etc.)

- the database structure: columns, indexes, etc.

- which exceptions are possible

Based on the information already presented, you just need to:

1) request from SQL Server the last time stored for the selected sensor (the sensor name being taken by Python from the file name)

2) read all files for this sensor and filter the input data with tFilterRow or tMap, keeping only rows whose time column is greater than the value taken from the database

That is all.
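The two steps above can be sketched outside Talend as follows. This is a minimal illustration under stated assumptions, not the job from this thread: an in-memory `sqlite3` database stands in for SQL Server, and the table `measurements`, its columns `sensor`, `measured_at` and `value`, and all function names are hypothetical.

```python
import sqlite3
from datetime import datetime


def last_loaded_time(conn, sensor):
    """Step 1: return the newest timestamp stored for a sensor, or None."""
    row = conn.execute(
        "SELECT MAX(measured_at) FROM measurements WHERE sensor = ?", (sensor,)
    ).fetchone()
    return datetime.fromisoformat(row[0]) if row[0] else None


def new_rows(rows, last_time):
    """Step 2: keep only rows strictly newer than the stored timestamp."""
    if last_time is None:  # nothing loaded yet, so every row is new
        return list(rows)
    return [r for r in rows if r[0] > last_time]


def load_increment(conn, sensor, rows):
    """Read the last stored time first, then filter and insert."""
    fresh = new_rows(rows, last_loaded_time(conn, sensor))
    conn.executemany(
        "INSERT INTO measurements (sensor, measured_at, value) VALUES (?, ?, ?)",
        [(sensor, ts.isoformat(), value) for ts, value in fresh],
    )
    conn.commit()
    return len(fresh)


# Hypothetical data: sqlite3 stands in for the SQL Server target table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (sensor TEXT, measured_at TEXT, value REAL)")
file_rows = [
    (datetime(2017, 5, 27, 10, 0), 1.5),
    (datetime(2017, 5, 27, 11, 0), 1.7),
]
print(load_increment(conn, "folder.subfolder.MES", file_rows))  # 2 (first run)
file_rows.append((datetime(2017, 5, 28, 9, 0), 2.1))
print(load_increment(conn, "folder.subfolder.MES", file_rows))  # 1 (only the new row)
```

The comparison is strict (`>`), so rows that share the last stored timestamp are treated as already loaded. If several readings can carry the same timestamp, a different key would be needed to tell them apart.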

It is like Google: the more precise the question, the more relevant the search results.

You can search for "Green" or for "Green Pub London"; the results will be a little different 🙂

INESBK · ‎2017-05-28

Thank you very much for your advice and your time.

I tried this:

I got this error:

vapukov · ‎2017-05-28

It could be because tMSSQLInput does not run before tMap.

In your job this process is independent ... somewhere in a parallel world.

You can do it like this:

It is just an example, but be careful about the order: first read the value, then use it.

INESBK · ‎2017-05-28

Ah, finally! Thank you once again, and sorry if I did not explain well; I am a beginner with Talend.

Thanks

vapukov · ‎2017-05-28

Welcome to community! 🙂

[CDC]Insert only data that has changed since last run (Tos_DI)

Talend Data Integration

v6.x