<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: store changes and duplicate records in Qlik Replicate</title>
    <link>https://community.qlik.com/t5/Qlik-Replicate/store-changes-and-duplicate-records/m-p/2499848#M13478</link>
    <description>&lt;P&gt;Hello Hannah,&amp;nbsp;&lt;a href="https://community.qlik.com/t5/user/viewprofilepage/user-id/312731"&gt;@hcmjensen&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;No, unfortunately not. In Databricks, primary keys (PK) are not enforced; they are informational only. As a result, primary keys cannot prevent duplicate entries from being written into Databricks Delta tables. This behavior is inherent to Databricks Delta.&lt;/P&gt;
&lt;P&gt;Regards,&lt;/P&gt;
&lt;P&gt;John.&lt;/P&gt;</description>
    <pubDate>Fri, 03 Jan 2025 03:09:20 GMT</pubDate>
    <dc:creator>john_wang</dc:creator>
    <dc:date>2025-01-03T03:09:20Z</dc:date>
    <item>
      <title>store changes and duplicate records</title>
      <link>https://community.qlik.com/t5/Qlik-Replicate/store-changes-and-duplicate-records/m-p/2498406#M13457</link>
      <description>&lt;P&gt;We have an Oracle source table coming from EBS that has no primary key and contains many records whose column values are all exactly the same.&amp;nbsp; Unfortunately, we need to keep all of these duplicate records and are having a lot of trouble figuring out how to handle them.&amp;nbsp; We are currently storing changes for this table and see DML of all types coming in.&amp;nbsp; The metadata columns in the change table show 'I', 'D', 'U' and 'B' operation types for this table.&amp;nbsp; How would we use these operations to apply the stored changes to the base table without having a key?&amp;nbsp; Is there some sort of internal key intrinsic to each record that can be used to identify the record in the target?&amp;nbsp;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For example, if the store changes table showed a deletion operation, how would I determine which record in the base table should be deleted?&amp;nbsp; I couldn't delete all records for which all column values matched the column values in the changes table because that might delete some duplicate records that we need to keep.&amp;nbsp;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Unfortunately, this is a large table, so doing a full load is not an option; we need to figure out how to apply changes.&amp;nbsp; How can this be done?&lt;/P&gt;
&lt;P&gt;Thanks in advance for any advice, I'm fairly new to Qlik Replicate.&lt;/P&gt;</description>
      <pubDate>Tue, 17 Dec 2024 20:32:00 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Qlik-Replicate/store-changes-and-duplicate-records/m-p/2498406#M13457</guid>
      <dc:creator>hcmjensen</dc:creator>
      <dc:date>2024-12-17T20:32:00Z</dc:date>
    </item>
    <item>
      <title>Re: store changes and duplicate records</title>
      <link>https://community.qlik.com/t5/Qlik-Replicate/store-changes-and-duplicate-records/m-p/2498441#M13458</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.qlik.com/t5/user/viewprofilepage/user-id/312731"&gt;@hcmjensen&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P&gt;Welcome to Qlik Community forum and thanks for reaching out here!&lt;/P&gt;
&lt;P&gt;In the Store Changes Table, the operation types are represented as follows:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;CODE&gt;'I'&lt;/CODE&gt; for &lt;STRONG&gt;INSERT&lt;/STRONG&gt;,&lt;/LI&gt;
&lt;LI&gt;&lt;CODE&gt;'D'&lt;/CODE&gt; for &lt;STRONG&gt;DELETE&lt;/STRONG&gt;,&lt;/LI&gt;
&lt;LI&gt;&lt;CODE&gt;'U'&lt;/CODE&gt; for &lt;STRONG&gt;UPDATE (afterImage)&lt;/STRONG&gt;, and&lt;/LI&gt;
&lt;LI&gt;&lt;CODE&gt;'B'&lt;/CODE&gt; for &lt;STRONG&gt;UPDATE (beforeImage)&lt;/STRONG&gt;.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Your own program is responsible for processing these change records from the Store Changes Table and applying the corresponding operations to your target systems, such as databases, streaming platforms, file systems, etc.&lt;/P&gt;
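&lt;P&gt;As a hedged sketch (the &lt;CODE&gt;(op, row)&lt;/CODE&gt; layout below is an assumption for illustration, not Replicate's actual change-table schema), a consumer that applies these operations while touching at most one row per change might look like this in Python:&lt;/P&gt;

```python
# Illustrative sketch: apply Store Changes records to an in-memory "target".
# The (op, row) layout is an assumption for this example, not Replicate's
# actual change-table column names.

def apply_changes(target, changes):
    """changes: iterable of (op, row) pairs in commit order.
    op is 'I' (insert), 'D' (delete), 'U' (update afterImage),
    or 'B' (update beforeImage)."""
    before = None
    for op, row in changes:
        if op == "I":
            target.append(dict(row))
        elif op == "D":
            # Delete exactly one matching row; other duplicates survive.
            for i, r in enumerate(target):
                if r == row:
                    del target[i]
                    break
        elif op == "B":
            before = dict(row)  # remember the pre-update image
        elif op == "U":
            match = before if before is not None else row
            # Update exactly one matching row.
            for i, r in enumerate(target):
                if r == match:
                    target[i] = dict(row)
                    break
            before = None
    return target
```

&lt;P&gt;The &lt;CODE&gt;break&lt;/CODE&gt; after the first match is what ensures a delete or update affects only one of several identical rows.&lt;/P&gt;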
&lt;P&gt;Regarding deleting only one row out of a set of duplicates: this approach is reasonable. In fact, Qlik Replicate already implements this behavior for some target databases. Could you share which target database you are using? That information will allow us to provide a more precise, tailored solution for your requirements.&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Regards,&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;John.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 18 Dec 2024 01:34:48 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Qlik-Replicate/store-changes-and-duplicate-records/m-p/2498441#M13458</guid>
      <dc:creator>john_wang</dc:creator>
      <dc:date>2024-12-18T01:34:48Z</dc:date>
    </item>
    <item>
      <title>Re: store changes and duplicate records</title>
      <link>https://community.qlik.com/t5/Qlik-Replicate/store-changes-and-duplicate-records/m-p/2498598#M13460</link>
      <description>&lt;P&gt;Hi, John.&amp;nbsp; Thanks for your response.&amp;nbsp; The target system is ADLS parquet files that we are loading into Azure Databricks delta tables.&amp;nbsp;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Regards,&lt;/P&gt;
&lt;P&gt;Hannah&lt;/P&gt;</description>
      <pubDate>Wed, 18 Dec 2024 13:44:56 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Qlik-Replicate/store-changes-and-duplicate-records/m-p/2498598#M13460</guid>
      <dc:creator>hcmjensen</dc:creator>
      <dc:date>2024-12-18T13:44:56Z</dc:date>
    </item>
    <item>
      <title>Re: store changes and duplicate records</title>
      <link>https://community.qlik.com/t5/Qlik-Replicate/store-changes-and-duplicate-records/m-p/2498696#M13461</link>
      <description>&lt;P&gt;Hello Hannah,&amp;nbsp;&lt;a href="https://community.qlik.com/t5/user/viewprofilepage/user-id/312731"&gt;@hcmjensen&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I’m not sure whether the ADLS Parquet data is being loaded into Databricks Delta by your own programs or by SQL scripts.&lt;/P&gt;
&lt;P&gt;My two cents: to address this scenario in Databricks Delta, where you perform a &lt;A href="https://learn.microsoft.com/en-us/azure/databricks/sql/language-manual/delta-merge-into" target="_blank" rel="noopener"&gt;MERGE INTO&lt;/A&gt; operation yourself but need duplicate rows handled correctly (specifically, deleting only one instance of a duplicate group rather than all duplicates), you can combine Delta Lake's MATCHED clause with a &lt;A href="https://docs.databricks.com/en/sql/language-manual/functions/row_number.html" target="_blank" rel="noopener"&gt;ROW_NUMBER&lt;/A&gt;() or RANK() window function.&amp;nbsp;The ROW_NUMBER() function ensures that only one row per duplicate group is deleted.&lt;/P&gt;
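&lt;P&gt;As an illustration of the window-function idea (using SQLite from Python only because it is easy to run; the table and column names are made up, and a real Databricks Delta statement would use MERGE INTO ... WHEN MATCHED THEN DELETE instead):&lt;/P&gt;

```python
import sqlite3

# Illustrative only: SQLite (3.25 or later, for window functions) stands in
# for Databricks Delta, and the table/column names are made up.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE base (c1 INTEGER, c2 TEXT);
    INSERT INTO base VALUES (1,'x'), (1,'x'), (1,'x'), (2,'y');
""")

# Number the rows inside each duplicate group with ROW_NUMBER(), then
# delete only the row numbered 1 -- the other duplicates survive.
con.execute("""
    DELETE FROM base
    WHERE rowid IN (
        SELECT rowid FROM (
            SELECT rowid,
                   ROW_NUMBER() OVER (PARTITION BY c1, c2) AS rn
            FROM base
        )
        WHERE rn = 1 AND c1 = 1 AND c2 = 'x'
    )
""")

# Two of the three (1,'x') duplicates remain.
print(con.execute("SELECT COUNT(*) FROM base WHERE c1 = 1").fetchone()[0])
```

&lt;P&gt;Note that SQLite exposes an internal &lt;CODE&gt;rowid&lt;/CODE&gt;, which Delta tables do not have; in Databricks the window function itself has to do the work of singling out one row per group.&lt;/P&gt;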
&lt;P&gt;However, I'm not a Databricks expert, so please verify the details with Databricks support.&lt;/P&gt;
&lt;P&gt;Regards,&lt;/P&gt;
&lt;P&gt;John.&lt;/P&gt;</description>
      <pubDate>Thu, 19 Dec 2024 02:11:34 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Qlik-Replicate/store-changes-and-duplicate-records/m-p/2498696#M13461</guid>
      <dc:creator>john_wang</dc:creator>
      <dc:date>2024-12-19T02:11:34Z</dc:date>
    </item>
    <item>
      <title>Re: store changes and duplicate records</title>
      <link>https://community.qlik.com/t5/Qlik-Replicate/store-changes-and-duplicate-records/m-p/2498859#M13464</link>
      <description>&lt;P&gt;Hi, John.&amp;nbsp; Thank you very much for your two cents.&amp;nbsp; So in the strategy you suggested, there is no internal key of any sort; I would just need to make sure that, in the case of an update or delete, only one record is affected.&amp;nbsp; I will give this a shot!&amp;nbsp; Thanks again.&lt;/P&gt;
&lt;P&gt;Hannah&lt;/P&gt;</description>
      <pubDate>Thu, 19 Dec 2024 21:40:15 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Qlik-Replicate/store-changes-and-duplicate-records/m-p/2498859#M13464</guid>
      <dc:creator>hcmjensen</dc:creator>
      <dc:date>2024-12-19T21:40:15Z</dc:date>
    </item>
    <item>
      <title>Re: store changes and duplicate records</title>
      <link>https://community.qlik.com/t5/Qlik-Replicate/store-changes-and-duplicate-records/m-p/2499034#M13465</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;SPAN&gt;Hannah,&amp;nbsp;&lt;a href="https://community.qlik.com/t5/user/viewprofilepage/user-id/312731"&gt;@hcmjensen&lt;/a&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;It's similar to the virtual column &lt;STRONG&gt;ROWNUM&lt;/STRONG&gt; in an Oracle result set:&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&lt;EM&gt;DELETE FROM...&amp;nbsp; AND &lt;STRONG&gt;ROWNUM = 1&lt;/STRONG&gt; ;&lt;/EM&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;In Databricks, the &lt;STRONG&gt;ROW_NUMBER()&lt;/STRONG&gt; window function serves the same purpose: identifying a single row within a duplicate group.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Regards,&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;John.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 21 Dec 2024 14:33:44 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Qlik-Replicate/store-changes-and-duplicate-records/m-p/2499034#M13465</guid>
      <dc:creator>john_wang</dc:creator>
      <dc:date>2024-12-21T14:33:44Z</dc:date>
    </item>
    <item>
      <title>Re: store changes and duplicate records</title>
      <link>https://community.qlik.com/t5/Qlik-Replicate/store-changes-and-duplicate-records/m-p/2499820#M13477</link>
      <description>&lt;P&gt;Hi, John.&amp;nbsp; What if instead of using a change table with ADLS as the target, we set up a load with apply changes and a target of Databricks Delta tables?&amp;nbsp; With this configuration would Qlik handle the duplicates correctly?&lt;/P&gt;
&lt;P&gt;Thanks,&lt;/P&gt;
&lt;P&gt;Hannah&lt;/P&gt;</description>
      <pubDate>Thu, 02 Jan 2025 18:34:42 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Qlik-Replicate/store-changes-and-duplicate-records/m-p/2499820#M13477</guid>
      <dc:creator>hcmjensen</dc:creator>
      <dc:date>2025-01-02T18:34:42Z</dc:date>
    </item>
    <item>
      <title>Re: store changes and duplicate records</title>
      <link>https://community.qlik.com/t5/Qlik-Replicate/store-changes-and-duplicate-records/m-p/2499848#M13478</link>
      <description>&lt;P&gt;Hello Hannah,&amp;nbsp;&lt;a href="https://community.qlik.com/t5/user/viewprofilepage/user-id/312731"&gt;@hcmjensen&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;No, unfortunately not. In Databricks, primary keys (PK) are not enforced; they are informational only. As a result, primary keys cannot prevent duplicate entries from being written into Databricks Delta tables. This behavior is inherent to Databricks Delta.&lt;/P&gt;
&lt;P&gt;Regards,&lt;/P&gt;
&lt;P&gt;John.&lt;/P&gt;</description>
      <pubDate>Fri, 03 Jan 2025 03:09:20 GMT</pubDate>
      <guid>https://community.qlik.com/t5/Qlik-Replicate/store-changes-and-duplicate-records/m-p/2499848#M13478</guid>
      <dc:creator>john_wang</dc:creator>
      <dc:date>2025-01-03T03:09:20Z</dc:date>
    </item>
  </channel>
</rss>

