<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: [CDC]Insert only data that has changed since last run (Tos_DI) in Talend Studio</title>
    <link>https://community.qlik.com/t5/Talend-Studio/CDC-Insert-only-data-that-has-changed-since-last-run-Tos-DI/m-p/2221331#M15382</link>
    <description>&lt;P&gt;You should refer yourself &lt;A href="https://community.qlik.com/s/feed/0D53p00007vCnn8CAC#M4202" target="_self"&gt;to this discussion&lt;/A&gt;.&lt;/P&gt; 
&lt;P&gt;The idea is to split the large table to smallest pieces then, using a master job, to run the chld job as many times as necessary depending on the number of files generated when the table&amp;nbsp;content is splited.&lt;/P&gt; 
&lt;P&gt;For example, if each file contains 10,000,000 records, you need to process 8 times.&lt;/P&gt;</description>
    <pubDate>Sat, 27 May 2017 10:38:43 GMT</pubDate>
    <dc:creator>TRF</dc:creator>
    <dc:date>2017-05-27T10:38:43Z</dc:date>
    <item>
      <title>[CDC]Insert only data that has changed since last run (Tos_DI)</title>
      <link>https://community.qlik.com/t5/Talend-Studio/CDC-Insert-only-data-that-has-changed-since-last-run-Tos-DI/m-p/2221324#M15375</link>
      <description>&lt;P&gt;I would like to insert in my database only the new data. So I used incremental loading by comparing my source (set of files) and my target (sql server table) with inner join but since the number of rows inserted in the database is huge this solution is not feasible.&lt;BR /&gt;So I thought of doing the CDC by date comparison (last date of run and my current date)&lt;BR /&gt;Unfortunately I don't know how to do it.&lt;BR /&gt;Someone can help me please !&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 24 May 2017 15:27:26 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/CDC-Insert-only-data-that-has-changed-since-last-run-Tos-DI/m-p/2221324#M15375</guid>
      <dc:creator>INESBK</dc:creator>
      <dc:date>2017-05-24T15:27:26Z</dc:date>
    </item>
    <item>
      <title>Re: [CDC]Insert only data that has changed since last run (Tos_DI)</title>
      <link>https://community.qlik.com/t5/Talend-Studio/CDC-Insert-only-data-that-has-changed-since-last-run-Tos-DI/m-p/2221325#M15376</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt; 
&lt;P&gt;Think (to be confirmed) CDC is available only for Enterprise version, not Community.&lt;/P&gt; 
&lt;P&gt;However, what do you mean by "huge", billions rows?&lt;/P&gt; 
&lt;P&gt;These links may help:&lt;/P&gt; 
&lt;P&gt;-&amp;nbsp;&lt;A href="https://www.talend.com/blog/2017/01/05/talend-job-design-patterns-best-practices-part-4/" target="_self" rel="nofollow noopener noreferrer"&gt;https://www.talend.com/blog/2017/01/05/talend-job-design-patterns-best-practices-part-4/&lt;/A&gt;&lt;/P&gt; 
&lt;P&gt;-&amp;nbsp;&lt;A href="http://bekwam.blogspot.fr/2011/07/complex-joins-out-of-tmap-in-talend.html" target="_self" rel="nofollow noopener noreferrer"&gt;http://bekwam.blogspot.fr/2011/07/complex-joins-out-of-tmap-in-talend.html&lt;/A&gt;&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;Also this topic&amp;nbsp;&lt;A href="https://community.qlik.com/s/feed/0D53p00007vCwYdCAK" target="_self"&gt;https://community.talend.com/t5/Design-and-Development/tMap-compare-integer-before-UPDATE-in-expression-filter/m-p/25112&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 24 May 2017 15:56:47 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/CDC-Insert-only-data-that-has-changed-since-last-run-Tos-DI/m-p/2221325#M15376</guid>
      <dc:creator>TRF</dc:creator>
      <dc:date>2017-05-24T15:56:47Z</dc:date>
    </item>
    <item>
      <title>Re: [CDC]Insert only data that has changed since last run (Tos_DI)</title>
      <link>https://community.qlik.com/t5/Talend-Studio/CDC-Insert-only-data-that-has-changed-since-last-run-Tos-DI/m-p/2221326#M15377</link>
      <description>&lt;P&gt;Thanks for your replay.&lt;/P&gt; 
&lt;P&gt;I have 80,000,000 million line in my table so when I made inner join&amp;nbsp;on tMsSqlInput to capture the new line I no longer have memory (It consumes almost 90% of physical memory) and my job is crashing !&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 24 May 2017 16:19:56 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/CDC-Insert-only-data-that-has-changed-since-last-run-Tos-DI/m-p/2221326#M15377</guid>
      <dc:creator>INESBK</dc:creator>
      <dc:date>2017-05-24T16:19:56Z</dc:date>
    </item>
    <item>
      <title>Re: [CDC]Insert only data that has changed since last run (Tos_DI)</title>
      <link>https://community.qlik.com/t5/Talend-Studio/CDC-Insert-only-data-that-has-changed-since-last-run-Tos-DI/m-p/2221327#M15378</link>
      <description>&lt;P&gt;The solution &lt;A href="https://www.talend.com/blog/2017/01/05/talend-job-design-patterns-best-practices-part-4/" target="_self" rel="nofollow noopener noreferrer"&gt;described here&lt;/A&gt;&amp;nbsp;on the secttion "tMap Lookups" may helps.&lt;/P&gt;&lt;P&gt;It gives you the key to not have to load all 80,000,000 records at a time.&lt;/P&gt;</description>
      <pubDate>Wed, 24 May 2017 16:38:52 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/CDC-Insert-only-data-that-has-changed-since-last-run-Tos-DI/m-p/2221327#M15378</guid>
      <dc:creator>TRF</dc:creator>
      <dc:date>2017-05-24T16:38:52Z</dc:date>
    </item>
    <item>
      <title>Re: [CDC]Insert only data that has changed since last run (Tos_DI)</title>
      <link>https://community.qlik.com/t5/Talend-Studio/CDC-Insert-only-data-that-has-changed-since-last-run-Tos-DI/m-p/2221328#M15379</link>
      <description>&lt;P&gt;I applied this solution but it takes a lot of time because it looks and compares each file line by the result of my query in the composant tMsSQLInput !&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For this reason I want to change this solution and use the comparison by timestamp!&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 24 May 2017 16:45:19 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/CDC-Insert-only-data-that-has-changed-since-last-run-Tos-DI/m-p/2221328#M15379</guid>
      <dc:creator>INESBK</dc:creator>
      <dc:date>2017-05-24T16:45:19Z</dc:date>
    </item>
    <item>
      <title>Re: [CDC]Insert only data that has changed since last run (Tos_DI)</title>
      <link>https://community.qlik.com/t5/Talend-Studio/CDC-Insert-only-data-that-has-changed-since-last-run-Tos-DI/m-p/2221329#M15380</link>
      <description>&lt;P&gt;Can you share your job?&lt;/P&gt;</description>
      <pubDate>Wed, 24 May 2017 16:59:00 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/CDC-Insert-only-data-that-has-changed-since-last-run-Tos-DI/m-p/2221329#M15380</guid>
      <dc:creator>TRF</dc:creator>
      <dc:date>2017-05-24T16:59:00Z</dc:date>
    </item>
    <item>
      <title>Re: [CDC]Insert only data that has changed since last run (Tos_DI)</title>
      <link>https://community.qlik.com/t5/Talend-Studio/CDC-Insert-only-data-that-has-changed-since-last-run-Tos-DI/m-p/2221330#M15381</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt; 
&lt;P&gt;Here is an example of my job.&lt;BR /&gt;Usually tMySqlInput allows me to read the rows of the database to do lookup but as they are many lines (80000000 lines) the job crashes. So I need another soltuion for incremental loading !&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="image.PNG" style="width: 999px;"&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="0683p000009LvAX.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/151793iB6DE51446C025F3C/image-size/large?v=v2&amp;amp;px=999" role="button" title="0683p000009LvAX.png" alt="0683p000009LvAX.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 27 May 2017 09:54:08 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/CDC-Insert-only-data-that-has-changed-since-last-run-Tos-DI/m-p/2221330#M15381</guid>
      <dc:creator>INESBK</dc:creator>
      <dc:date>2017-05-27T09:54:08Z</dc:date>
    </item>
    <item>
      <title>Re: [CDC]Insert only data that has changed since last run (Tos_DI)</title>
      <link>https://community.qlik.com/t5/Talend-Studio/CDC-Insert-only-data-that-has-changed-since-last-run-Tos-DI/m-p/2221331#M15382</link>
      <description>&lt;P&gt;You should refer yourself &lt;A href="https://community.qlik.com/s/feed/0D53p00007vCnn8CAC#M4202" target="_self"&gt;to this discussion&lt;/A&gt;.&lt;/P&gt; 
&lt;P&gt;The idea is to split the large table to smallest pieces then, using a master job, to run the chld job as many times as necessary depending on the number of files generated when the table&amp;nbsp;content is splited.&lt;/P&gt; 
&lt;P&gt;For example, if each file contains 10,000,000 records, you need to process 8 times.&lt;/P&gt;</description>
      <pubDate>Sat, 27 May 2017 10:38:43 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/CDC-Insert-only-data-that-has-changed-since-last-run-Tos-DI/m-p/2221331#M15382</guid>
      <dc:creator>TRF</dc:creator>
      <dc:date>2017-05-27T10:38:43Z</dc:date>
    </item>
    <item>
      <title>Re: [CDC]Insert only data that has changed since last run (Tos_DI)</title>
      <link>https://community.qlik.com/t5/Talend-Studio/CDC-Insert-only-data-that-has-changed-since-last-run-Tos-DI/m-p/2221332#M15383</link>
      <description>&lt;P&gt;The solution that you &amp;nbsp;recommended to me makes it possible to run the job in Thread but for me i need&amp;nbsp;a different solution than the internal join and lookup, which allows me to insert only the new lines in my database !&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 27 May 2017 14:56:15 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/CDC-Insert-only-data-that-has-changed-since-last-run-Tos-DI/m-p/2221332#M15383</guid>
      <dc:creator>INESBK</dc:creator>
      <dc:date>2017-05-27T14:56:15Z</dc:date>
    </item>
    <item>
      <title>Re: [CDC]Insert only data that has changed since last run (Tos_DI)</title>
      <link>https://community.qlik.com/t5/Talend-Studio/CDC-Insert-only-data-that-has-changed-since-last-run-Tos-DI/m-p/2221333#M15384</link>
      <description>&lt;BLOCKQUOTE&gt; 
 &lt;HR /&gt; 
 &lt;A href="https://community.qlik.com/s/profile/0053p000007LLuLAAW"&gt;@INESBK&lt;/A&gt; wrote: 
 &lt;BR /&gt; 
 &lt;P&gt;&lt;BR /&gt;So I thought of doing the CDC by date comparison (last date of run and my current date)&lt;BR /&gt;Unfortunately I don't know how to do it.&lt;BR /&gt;Someone can help me please !&lt;/P&gt; 
 &lt;P&gt;&amp;nbsp;&lt;/P&gt; 
 &lt;HR /&gt; 
&lt;/BLOCKQUOTE&gt; 
&lt;P&gt;why in this case not read MAX date from database and than filter new data with date bigger than?&lt;/P&gt; 
&lt;P&gt;it could be also ID, or other incremental properties.&lt;/P&gt; 
&lt;P&gt;if You not store timestamp in database, but use it for lookup - You can store in additional table - last_insert, than read and use&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;in case when Lookup more complicate - try use insert ignore or insert/update&lt;/P&gt; 
&lt;P&gt;would be good if You little more explain Your Job logic&lt;/P&gt;</description>
      <pubDate>Sat, 27 May 2017 15:08:04 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/CDC-Insert-only-data-that-has-changed-since-last-run-Tos-DI/m-p/2221333#M15384</guid>
      <dc:creator>vapukov</dc:creator>
      <dc:date>2017-05-27T15:08:04Z</dc:date>
    </item>
    <item>
      <title>Re: [CDC]Insert only data that has changed since last run (Tos_DI)</title>
      <link>https://community.qlik.com/t5/Talend-Studio/CDC-Insert-only-data-that-has-changed-since-last-run-Tos-DI/m-p/2221334#M15385</link>
      <description>&lt;P&gt;In fact this is what I search, in my base I have a dimestamp so I want to compare max date with the new incoming date in the files...&lt;BR /&gt;This comparison is going to be in the query of tmssqlinput or in the tmap filter ?&lt;/P&gt;
&lt;P&gt;And should I delete the inner join?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 27 May 2017 15:45:24 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/CDC-Insert-only-data-that-has-changed-since-last-run-Tos-DI/m-p/2221334#M15385</guid>
      <dc:creator>INESBK</dc:creator>
      <dc:date>2017-05-27T15:45:24Z</dc:date>
    </item>
    <item>
      <title>Re: [CDC]Insert only data that has changed since last run (Tos_DI)</title>
      <link>https://community.qlik.com/t5/Talend-Studio/CDC-Insert-only-data-that-has-changed-since-last-run-Tos-DI/m-p/2221335#M15386</link>
      <description>&lt;P&gt;I do this in 2 steps:&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;step 1:&lt;/P&gt; 
&lt;P&gt;Input from database, something like this (AWS Redshift):&lt;/P&gt; 
&lt;PRE&gt;"SELECT dateadd(day, 1, date(Max(\"whendt\"))) as last_date from "+context.AWS_RS_schema+"."+"tbldatatransactions"&lt;/PRE&gt; 
&lt;PRE&gt;or

"SELECT NVL(max(TransactionID), 0)  as maxid  FROM "+context.AWS_RS_schema+".tbltransactions"&lt;/PRE&gt; 
&lt;P&gt;than tFlowToIterate for save this value to variable&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;than on next steps - use variable for:&lt;/P&gt; 
&lt;P&gt;- filter in SQL queries in Input components&lt;/P&gt; 
&lt;PRE&gt;"SELECT 
     something
FROM `tblDataTransactions`
WHERE `When` &amp;gt;= " + "'" + TalendDate.formatDate("yyyy-MM-dd 00:00:00", (Date)globalMap.get("v_last_date")) + "'"  + " AND `When` &amp;lt; "  
+ "'" + TalendDate.formatDate("yyyy-MM-dd 00:00:00", TalendDate.getCurrentDate()) + "'" &lt;/PRE&gt; 
&lt;P&gt;- tFilter or tMap if source files csv or other format&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Screen Shot 2017-05-28 at 2.52.08 AM.png" style="width: 999px;"&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="0683p000009Lv7K.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/148252i22DDC34C0F9C55AB/image-size/large?v=v2&amp;amp;px=999" role="button" title="0683p000009Lv7K.png" alt="0683p000009Lv7K.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 27 May 2017 15:51:47 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/CDC-Insert-only-data-that-has-changed-since-last-run-Tos-DI/m-p/2221335#M15386</guid>
      <dc:creator>vapukov</dc:creator>
      <dc:date>2017-05-27T15:51:47Z</dc:date>
    </item>
    <item>
      <title>Re: [CDC]Insert only data that has changed since last run (Tos_DI)</title>
      <link>https://community.qlik.com/t5/Talend-Studio/CDC-Insert-only-data-that-has-changed-since-last-run-Tos-DI/m-p/2221336#M15387</link>
      <description>&lt;P&gt;I tried to apply your method.&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;Here is a screenshot&amp;nbsp;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="a.PNG" style="width: 956px;"&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="0683p000009LuaN.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/132371i41271F071DBB1B29/image-size/large?v=v2&amp;amp;px=999" role="button" title="0683p000009LuaN.png" alt="0683p000009LuaN.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt; 
&lt;P&gt;So in tMSSqlInput_1 i made this query&amp;nbsp;"SELECT datediff(day, Max(Trends2.TS), getdate()) as last_load FROM Trends2"&lt;/P&gt; 
&lt;P&gt;&lt;SPAN&gt;than tFlowToIterate for save this value to variable&amp;nbsp;"v_last_date"&lt;/SPAN&gt;&lt;/P&gt; 
&lt;P&gt;&lt;SPAN&gt;After that in tMSSqlInput_2 i made this query :&lt;/SPAN&gt;&lt;/P&gt; 
&lt;P&gt;&lt;SPAN&gt;"SELECT&lt;BR /&gt;Trends2.chrono,Trends2.name,Trends2.value,Trends2.quality&lt;BR /&gt;FROM Trends2&lt;BR /&gt;WHERE S.TS &amp;gt;= " + "'" + TalendDate.formatDate("yyyy-MM-dd 00:00:00", (Date)globalMap.get("v_last_date")) + "'" + " AND S.TS &amp;lt; "&lt;BR /&gt;+ "'" + TalendDate.formatDate("yyyy-MM-dd 00:00:00", TalendDate.getCurrentDate()) + "'"&lt;/SPAN&gt;&lt;/P&gt; 
&lt;P&gt;&lt;SPAN&gt;S.TS it's a timestamp retrive from tmap_2.&lt;/SPAN&gt;&lt;/P&gt; 
&lt;P&gt;&lt;SPAN&gt;But when i clik Guess schema i have this error :&lt;/SPAN&gt;&lt;/P&gt; 
&lt;P&gt;&lt;SPAN&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="b.PNG" style="width: 759px;"&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="0683p000009LujC.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/141406iD5B79BE78471A35F/image-size/large?v=v2&amp;amp;px=999" role="button" title="0683p000009LujC.png" alt="0683p000009LujC.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 27 May 2017 16:48:49 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/CDC-Insert-only-data-that-has-changed-since-last-run-Tos-DI/m-p/2221336#M15387</guid>
      <dc:creator>INESBK</dc:creator>
      <dc:date>2017-05-27T16:48:49Z</dc:date>
    </item>
    <item>
      <title>Re: [CDC]Insert only data that has changed since last run (Tos_DI)</title>
      <link>https://community.qlik.com/t5/Talend-Studio/CDC-Insert-only-data-that-has-changed-since-last-run-Tos-DI/m-p/2221337#M15388</link>
      <description>&lt;P&gt;of course, schema guess - will not work with null value, and on this moment it null&lt;/P&gt;
&lt;P&gt;if what Guess schema - change query for this period&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;but (!!!) - you do something wrong completely&amp;nbsp;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;First:&lt;/STRONG&gt;&lt;/P&gt;
&lt;PRE&gt;"SELECT datediff(day, Max(Trends2.TS), getdate()) as last_load FROM Trends2"&lt;/PRE&gt;
&lt;P&gt;return INT&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;but than You put INT into DateFormat&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Second:&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;In my example this variable used for filter Main Input - in Your case Lecture_DBF&lt;/P&gt;
&lt;P&gt;in my example - I take MAX from AWS Redshift and use it for filter data from MySQL, You try to use it on same table&lt;/P&gt;
&lt;P&gt;if You want use it for filter lookup data, You must change logic - no reason request MAX value from lookup file and than try to use it for filter same lookup table - it always will return NULL result&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;"SELECT datediff(day, Max(Trends2.TS), getdate()) as last_load &lt;STRONG&gt;FROM Trends2&lt;/STRONG&gt;"

"SELECT
Trends2.chrono,Trends2.name,Trends2.value,Trends2.quality
&lt;STRONG&gt;FROM Trends2&lt;/STRONG&gt;
WHERE S.TS &amp;gt;= " + "'" + TalendDate.formatDate("yyyy-MM-dd 00:00:00", (Date)globalMap.get("v_last_date")) + "'" + " AND S.TS &amp;lt; "
+ "'" + TalendDate.formatDate("yyyy-MM-dd 00:00:00", TalendDate.getCurrentDate()) + "'"&lt;/PRE&gt;
&lt;P&gt;&lt;BR /&gt;Double check - what You request and what want to filter&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 27 May 2017 16:56:55 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/CDC-Insert-only-data-that-has-changed-since-last-run-Tos-DI/m-p/2221337#M15388</guid>
      <dc:creator>vapukov</dc:creator>
      <dc:date>2017-05-27T16:56:55Z</dc:date>
    </item>
    <item>
      <title>Re: [CDC]Insert only data that has changed since last run (Tos_DI)</title>
      <link>https://community.qlik.com/t5/Talend-Studio/CDC-Insert-only-data-that-has-changed-since-last-run-Tos-DI/m-p/2221338#M15389</link>
      <description>&lt;P&gt;For fisrt :&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;when i write this query "SELECT datediff(day, date(Max(Trends2.TS)), date(getdate())) as last_load FROM Trends2"&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;i get ths error :&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;&lt;SPAN class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="c.PNG" style="width: 582px;"&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="0683p000009LuP0.png"&gt;&lt;img src="https://community.qlik.com/t5/image/serverpage/image-id/139304iD3404ECC24067717/image-size/large?v=v2&amp;amp;px=999" role="button" title="0683p000009LuP0.png" alt="0683p000009LuP0.png" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;For this reason i change my query to :"SELECT datediff(day, Max(Trends2.TS), getdate()) as last_load FROM Trends2"&lt;/P&gt; 
&lt;P&gt;Second:&lt;/P&gt; 
&lt;P&gt;I can't do the filter in&amp;nbsp;&lt;SPAN&gt;Lecture_DBF because this component use to execute script python in order te read file.dbf.&lt;/SPAN&gt;&lt;/P&gt; 
&lt;P&gt;&lt;SPAN&gt;For this reason i uses S.TS from tmap to retrive date came from file.&lt;/SPAN&gt;&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;&lt;SPAN&gt;My job allows me to read files.dbf to insert them into a sql server table.&lt;BR /&gt;Every time I have new values to insert in the database (since this is data from sensors) so I need to insert only the new lines.&lt;BR /&gt;I use the inner join in the beginning&amp;nbsp;but it doesn't work because problem of memory so I have to use the comparison with dates.&lt;/SPAN&gt;&lt;/P&gt; 
&lt;P&gt;&lt;SPAN&gt;The idea is to compare date_max in the table(t_MSSqlInput) with the date_in_file&lt;BR /&gt;If date_in_file&amp;gt; date_in_base I insert the data, else I ignore it.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;&lt;SPAN&gt;Excuse me if I misunderstood you...&lt;/SPAN&gt;&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 27 May 2017 17:37:28 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/CDC-Insert-only-data-that-has-changed-since-last-run-Tos-DI/m-p/2221338#M15389</guid>
      <dc:creator>INESBK</dc:creator>
      <dc:date>2017-05-27T17:37:28Z</dc:date>
    </item>
    <item>
      <title>Re: [CDC]Insert only data that has changed since last run (Tos_DI)</title>
      <link>https://community.qlik.com/t5/Talend-Studio/CDC-Insert-only-data-that-has-changed-since-last-run-Tos-DI/m-p/2221339#M15390</link>
      <description>&lt;P&gt;let go step by step&lt;BR /&gt;&lt;BR /&gt;1) to close SQL problem, not depending from form&lt;/P&gt; 
&lt;P&gt;&lt;SPAN&gt;"SELECT datediff(day, date(Max(Trends2.TS)), date(getdate())) as last_load FROM Trends2" or&amp;nbsp;"SELECT datediff(day, Max(Trends2.TS), getdate()) as last_load FROM Trends2"&lt;/SPAN&gt;&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;&lt;STRONG&gt;by this SQL query You are request difference in days - INTEGER value, like 5 days&lt;/STRONG&gt;&lt;/P&gt; 
&lt;P&gt;&lt;STRONG&gt;so this is can not help You in&amp;nbsp;feature filters!!!&lt;/STRONG&gt;&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;2) even if You request just &lt;SPAN&gt;SELECT date(Max(Trends2.TS)) - it return You MAX date already in database, so - what reason use this value for SELECT from same table - it will empty result always&lt;/SPAN&gt;&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;&lt;SPAN&gt;3) for proper resolve the problem need understand what You try to do:&lt;/SPAN&gt;&lt;/P&gt; 
&lt;P&gt;&lt;SPAN&gt;- is it constant flow from&amp;nbsp;&lt;/SPAN&gt;sensors? in this case why You worry about date? new data always newer&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;- is it batch process? When You parse logs and want write only new? in this case - why not sort this files by date and than just filter all newer than requested MAX(date)?&lt;/P&gt; 
&lt;P&gt;- if it sensors - it normally mean, not only date is unique parameter, but as well sensor ID as well?&lt;/P&gt; 
&lt;P&gt;- we know size of You table - what about new data? what size (number of new rows) per iteration?&lt;/P&gt;</description>
      <pubDate>Sun, 28 May 2017 03:14:04 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/CDC-Insert-only-data-that-has-changed-since-last-run-Tos-DI/m-p/2221339#M15390</guid>
      <dc:creator>vapukov</dc:creator>
      <dc:date>2017-05-28T03:14:04Z</dc:date>
    </item>
    <item>
      <title>Re: [CDC]Insert only data that has changed since last run (Tos_DI)</title>
      <link>https://community.qlik.com/t5/Talend-Studio/CDC-Insert-only-data-that-has-changed-since-last-run-Tos-DI/m-p/2221340#M15391</link>
      <description>&lt;P&gt;Thanks for your replay.&lt;/P&gt; 
&lt;P&gt;1)The flux is not constant from the sensors. The number of files that represents the sensors is constant (about 3000 file) but each file contains a different number of rows.&lt;/P&gt; 
&lt;P&gt;I agree with you that the new data always newer but in case of problems or an interruption of the execution (since I will launch the job every night) I need incremental loading to insert only the new lines.&lt;/P&gt; 
&lt;P&gt;2)Sorry but i didn't understand your question about batch process !&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;3)Id of the sensor is composed of 4 fields: value of the sensor, quality, name, chrono (number of milliseconds since 1970)&lt;/P&gt; 
&lt;P&gt;4)The number of lines differs from one file to another (it can be 2000 as 30 0000)&lt;/P&gt; 
&lt;P&gt;And since I will run the job every night to insert new values of sensor, the table size will increase !&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;I hope I was a little clearer.&lt;/P&gt; 
&lt;P&gt;Thanks !&lt;/P&gt;</description>
      <pubDate>Sun, 28 May 2017 08:42:15 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/CDC-Insert-only-data-that-has-changed-since-last-run-Tos-DI/m-p/2221340#M15391</guid>
      <dc:creator>INESBK</dc:creator>
      <dc:date>2017-05-28T08:42:15Z</dc:date>
    </item>
    <item>
      <title>Re: [CDC]Insert only data that has changed since last run (Tos_DI)</title>
      <link>https://community.qlik.com/t5/Talend-Studio/CDC-Insert-only-data-that-has-changed-since-last-run-Tos-DI/m-p/2221341#M15392</link>
      <description>&lt;P&gt;Look, this problem not related to Talend, not to databases, just logic and process organisation&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Example:&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;You have 3000 files from different sensors&lt;/P&gt;
&lt;P&gt;You can have many file for 1 sensors, or You can have many sensors inside single file. This is variant for logic&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Variant 1&lt;/STRONG&gt; - many file for same server, but inside single file - only 1 sensor present&lt;/P&gt;
&lt;P&gt;in this case You just need:&lt;/P&gt;
&lt;P&gt;1) request MAX loading time from database for this sensor&lt;/P&gt;
&lt;P&gt;2) process files related to this sensor 1 by 1 starting from oldest and filter all data where date bigger than saved from database&lt;/P&gt;
&lt;P&gt;if time not distributed by files - You must first merge all files for this sensor, and do the same after&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Variant 2&lt;/STRONG&gt; - many sensors in same file&lt;/P&gt;
&lt;P&gt;1) select file, again oldest - first&lt;/P&gt;
&lt;P&gt;2) check - what sensors inside - tUnique by sensor name&lt;/P&gt;
&lt;P&gt;3) request from database MAX date for each sensor inside this file&lt;/P&gt;
&lt;P&gt;4) filter data by sensor name and date&lt;/P&gt;
&lt;P&gt;in case of variant 2 You can:&lt;/P&gt;
&lt;P&gt;- select all sensors from file&lt;/P&gt;
&lt;P&gt;- run query like SELECT MAX(date) as last_date, sensor_name FROM table. It will return 1 row per sensor. You can save it into tHashInput&lt;/P&gt;
&lt;P&gt;- than tMap with main flow from file and lookup from tHashOutput with filter on sensor name and date bigger than saved&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;as You can see in both variant - &lt;STRONG&gt;You are filter input flow&lt;/STRONG&gt;, but not database&lt;/P&gt;
&lt;P&gt;in both variant - &lt;STRONG&gt;You must organise processing of files sorted by date, or first merge files and sort them by date, than filter&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;I use both of this variants in real life, so could confirm - all work as expected, but always must be adjusted for real structure&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sun, 28 May 2017 09:22:24 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/CDC-Insert-only-data-that-has-changed-since-last-run-Tos-DI/m-p/2221341#M15392</guid>
      <dc:creator>vapukov</dc:creator>
      <dc:date>2017-05-28T09:22:24Z</dc:date>
    </item>
    <item>
      <title>Re: [CDC]Insert only data that has changed since last run (Tos_DI)</title>
      <link>https://community.qlik.com/t5/Talend-Studio/CDC-Insert-only-data-that-has-changed-since-last-run-Tos-DI/m-p/2221342#M15393</link>
      <description>&lt;P&gt;For me is variant 1:&amp;nbsp;Each file represents a sensor !&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you can explain to me a little more the logic of the first proposition:&lt;BR /&gt;Does this filter have to be in tmap and connected by a tmssqlinput to get max (date) from the table?&lt;/P&gt;</description>
      <pubDate>Sun, 28 May 2017 09:51:15 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/CDC-Insert-only-data-that-has-changed-since-last-run-Tos-DI/m-p/2221342#M15393</guid>
      <dc:creator>INESBK</dc:creator>
      <dc:date>2017-05-28T09:51:15Z</dc:date>
    </item>
    <item>
      <title>Re: [CDC]Insert only data that has changed since last run (Tos_DI)</title>
      <link>https://community.qlik.com/t5/Talend-Studio/CDC-Insert-only-data-that-has-changed-since-last-run-Tos-DI/m-p/2221343#M15394</link>
      <description>&lt;P&gt;keep in mind - I do not have Your data, Your structure and etc, so I more or less try adopt my experience ... but I do this blind&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;if You want have full answer - always public all information&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;&lt;STRONG&gt;Example:&lt;/STRONG&gt;&lt;/P&gt; 
&lt;P&gt;sensor1_20170528_1300-1400.csv&lt;/P&gt; 
&lt;P&gt;&lt;SPAN&gt;sensor1_20170528_1400-1500.csv&lt;/SPAN&gt;&lt;/P&gt; 
&lt;P&gt;&lt;SPAN&gt;sensor1_20170528_1500-1600.csv&lt;/SPAN&gt;&lt;/P&gt; 
&lt;P&gt;&lt;SPAN&gt;etc&lt;/SPAN&gt;&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;&lt;STRONG&gt;so we know:&lt;/STRONG&gt;&lt;/P&gt; 
&lt;P&gt;&lt;SPAN&gt;- sensor1 - it part of file name&lt;/SPAN&gt;&lt;/P&gt; 
&lt;P&gt;&lt;SPAN&gt;- we have 24 file per day&lt;/SPAN&gt;&lt;/P&gt; 
&lt;P&gt;&lt;SPAN&gt;- file name include&amp;nbsp;date time pattern&lt;/SPAN&gt;&lt;/P&gt; 
&lt;P&gt;&lt;SPAN&gt;- sensors could be new, not all&amp;nbsp;already in database&amp;nbsp;(if not - we change logic)&lt;/SPAN&gt;&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;&lt;STRONG&gt;We do:&lt;/STRONG&gt;&lt;/P&gt; 
&lt;P&gt;&lt;SPAN&gt;1) tFileList - prepare list of files, extract name of sensors from filename or from first row inside each file, make list unique&lt;/SPAN&gt;&lt;/P&gt; 
&lt;P&gt;&lt;SPAN&gt;&lt;STRONG&gt;Iterate over Sensor names array:&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt; 
&lt;P&gt;2) tMSSQLRow - SELECT COALESCE(MAX(date), &lt;STRONG&gt;yesterday&lt;/STRONG&gt;) as last_date from table WHERE sensor_name = variable_sensor_name, save it to variable&lt;/P&gt; 
&lt;P&gt;&lt;SPAN&gt;3) tFileList with pattern:&lt;/SPAN&gt;&lt;/P&gt; 
&lt;P&gt;&lt;SPAN&gt;&lt;STRONG&gt;variable_sensor_name&lt;/STRONG&gt;+pattern_for_yesterday+.csv &lt;/SPAN&gt;&lt;/P&gt; 
&lt;P&gt;&lt;SPAN&gt;- pattern could be different - even for past Hour or etc. &lt;/SPAN&gt;&lt;/P&gt; 
&lt;P&gt;&lt;SPAN&gt;We are sort files by time - we can guess - earliest filename pattern have&amp;nbsp;easiest create time (if not - we change algorithm)&lt;/SPAN&gt;&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;4) tFileInputDelimited -&amp;gt; tFilter (or tMap) where we use last_date as filter condition -&amp;gt; tMSSQLOutput&lt;/P&gt; 
&lt;P&gt;&lt;STRONG&gt;alternative for 4)&amp;nbsp;&lt;/STRONG&gt;tMSSQLInput with SELECT sensor_name, date FROM table WHERE sensor_name= variable_sensor_name AND date &amp;gt; yesterday&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;and use it as Lookup as Your original Job for INNER JOIN rejects&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;This is only 1 from many variants:&lt;/P&gt; 
&lt;P&gt;- You can already have all sensor names in database&lt;/P&gt; 
&lt;P&gt;- You can have not time patterns in filename&lt;/P&gt; 
&lt;P&gt;- You can have more than 1 message for same server with same time (for this case good to have &lt;STRONG&gt;(sensor_name, date)&lt;/STRONG&gt; as primary or unique key)&lt;/P&gt; 
&lt;P&gt;- etc&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;For any of combination - we draw process diagram and look, how we can achieve result,&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt; 
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sun, 28 May 2017 10:38:52 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Talend-Studio/CDC-Insert-only-data-that-has-changed-since-last-run-Tos-DI/m-p/2221343#M15394</guid>
      <dc:creator>vapukov</dc:creator>
      <dc:date>2017-05-28T10:38:52Z</dc:date>
    </item>
  </channel>
</rss>

