<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Remove duplicate values based on a selected column value in data load in App Development</title>
    <link>https://community.qlik.com/t5/App-Development/Remove-duplicate-values-based-on-a-selected-column-value-in-data/m-p/1674614#M51196</link>
    <description>&lt;P&gt;Another way you could do this is to flag the duplicates. You can then keep them in the data in case you want to be aware of them later (e.g., if this is a data quality issue it can be useful to make them visible), or you can load only the ones not flagged as duplicates:&lt;/P&gt;&lt;P&gt;DeDup:&lt;BR /&gt;NoConcatenate&lt;BR /&gt;Load&lt;BR /&gt;[Customer_ID],&lt;BR /&gt;[Transaction_Date],&lt;BR /&gt;[Funding_Arrangement],&lt;BR /&gt;[Class],&lt;BR /&gt;[Hours],&lt;BR /&gt;[Employee],&lt;BR /&gt;DeDupKey,&lt;BR /&gt;if(Peek(DeDupKey) = DeDupKey, 'Y', 'N') as Duplicate&lt;BR /&gt;Resident DeDupTemp&lt;BR /&gt;order by DeDupKey;&lt;/P&gt;&lt;P&gt;That is probably a neater solution that gives you more flexibility, and it doesn't seem to have the issue of records unintentionally disappearing.&lt;/P&gt;</description>
    <pubDate>Wed, 12 Feb 2020 03:44:59 GMT</pubDate>
    <dc:creator>Rodj</dc:creator>
    <dc:date>2020-02-12T03:44:59Z</dc:date>
    <item>
      <title>Remove duplicate values based on a selected column value in data load</title>
      <link>https://community.qlik.com/t5/App-Development/Remove-duplicate-values-based-on-a-selected-column-value-in-data/m-p/1674605#M51191</link>
      <description>&lt;P&gt;Hi All,&lt;/P&gt;&lt;P&gt;I am currently working on developing an application and need to remove duplicate records only for selected data rows in the data load event.&lt;/P&gt;&lt;P&gt;Please refer to the sample table below:&lt;/P&gt;&lt;TABLE width="538"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD width="87"&gt;Customer_ID&lt;/TD&gt;&lt;TD width="115"&gt;Transaction_Date&lt;/TD&gt;&lt;TD width="148"&gt;Funding_Arrangement&lt;/TD&gt;&lt;TD width="55"&gt;Class&lt;/TD&gt;&lt;TD width="64"&gt;Hours&lt;/TD&gt;&lt;TD width="69"&gt;Employee&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;100&lt;/TD&gt;&lt;TD&gt;2/01/2020&lt;/TD&gt;&lt;TD&gt;CWH&lt;/TD&gt;&lt;TD&gt;B&lt;/TD&gt;&lt;TD&gt;3&lt;/TD&gt;&lt;TD&gt;David&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;101&lt;/TD&gt;&lt;TD&gt;5/01/2020&lt;/TD&gt;&lt;TD&gt;SCO&lt;/TD&gt;&lt;TD&gt;A&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;David&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;101&lt;/TD&gt;&lt;TD&gt;5/01/2020&lt;/TD&gt;&lt;TD&gt;SCO&lt;/TD&gt;&lt;TD&gt;A&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;David&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;102&lt;/TD&gt;&lt;TD&gt;3/01/2020&lt;/TD&gt;&lt;TD&gt;HACC&lt;/TD&gt;&lt;TD&gt;C&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;David&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;103&lt;/TD&gt;&lt;TD&gt;5/01/2020&lt;/TD&gt;&lt;TD&gt;CWH&lt;/TD&gt;&lt;TD&gt;B&lt;/TD&gt;&lt;TD&gt;4&lt;/TD&gt;&lt;TD&gt;Andy&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;104&lt;/TD&gt;&lt;TD&gt;7/01/2020&lt;/TD&gt;&lt;TD&gt;SCO&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;TD&gt;3&lt;/TD&gt;&lt;TD&gt;Andy&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;104&lt;/TD&gt;&lt;TD&gt;8/01/2020&lt;/TD&gt;&lt;TD&gt;SCO&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;TD&gt;3&lt;/TD&gt;&lt;TD&gt;David&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;105&lt;/TD&gt;&lt;TD&gt;8/01/2020&lt;/TD&gt;&lt;TD&gt;HACC&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;David&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;106&lt;/TD&gt;&lt;TD&gt;9/01/2020&lt;/TD&gt;&lt;TD&gt;CWH&lt;/TD&gt;&lt;TD&gt;B&lt;/TD&gt;&lt;TD&gt;1.5&lt;/TD&gt;&lt;TD&gt;David&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;107&lt;/TD&gt;&lt;TD&gt;10/01/2020&lt;/TD&gt;&lt;TD&gt;HACC&lt;/TD&gt;&lt;TD&gt;C&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;Andy&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;107&lt;/TD&gt;&lt;TD&gt;10/01/2020&lt;/TD&gt;&lt;TD&gt;HACC&lt;/TD&gt;&lt;TD&gt;C&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;Andy&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;108&lt;/TD&gt;&lt;TD&gt;12/01/2020&lt;/TD&gt;&lt;TD&gt;CWH&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;TD&gt;3&lt;/TD&gt;&lt;TD&gt;Andy&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;109&lt;/TD&gt;&lt;TD&gt;13/01/2020&lt;/TD&gt;&lt;TD&gt;HACC&lt;/TD&gt;&lt;TD&gt;A&lt;/TD&gt;&lt;TD&gt;2.5&lt;/TD&gt;&lt;TD&gt;Andy&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;110&lt;/TD&gt;&lt;TD&gt;10/01/2020&lt;/TD&gt;&lt;TD&gt;SCO&lt;/TD&gt;&lt;TD&gt;B&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;Andy&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;In the above data set, the duplicate values are identified based on three columns: "Customer_ID", "Transaction_Date" and "Class".&lt;/P&gt;&lt;P&gt;Further, duplicates should be removed only for the records where the "Funding_Arrangement" is 'SCO', and not for any other "Funding_Arrangement" (for example, CWH, HACC, etc.).&lt;/P&gt;&lt;P&gt;In order to achieve this, I thought of creating a "Key" field as follows:&lt;/P&gt;&lt;P&gt;Funding_Arrangement &amp;amp; Customer_ID &amp;amp; Transaction_Date &amp;amp; "Class" AS KEY in the data load script, and then removing the duplicate records whose key starts with the letters "SCO".&lt;/P&gt;&lt;P&gt;Additionally, I thought of using the Peek() function in the WHERE clause of the load script, something like:&lt;/P&gt;&lt;P&gt;WHERE Peek(KEY)&amp;lt;&amp;gt;KEY;&lt;/P&gt;&lt;P&gt;However, I am not sure how to combine the Peek() function with a starts-with condition so that duplicates are removed only where the "Funding_Arrangement" equals 'SCO'.&lt;/P&gt;&lt;P&gt;Please advise me on this requirement; I would appreciate it a lot if you could suggest a better way to approach this issue.&lt;/P&gt;&lt;P&gt;Thank you in advance.&lt;/P&gt;&lt;P&gt;Kind regards,&lt;/P&gt;&lt;P&gt;Andy&lt;/P&gt;</description>
      <pubDate>Sat, 16 Nov 2024 19:08:24 GMT</pubDate>
      <guid>https://community.qlik.com/t5/App-Development/Remove-duplicate-values-based-on-a-selected-column-value-in-data/m-p/1674605#M51191</guid>
      <dc:creator>andymanu</dc:creator>
      <dc:date>2024-11-16T19:08:24Z</dc:date>
    </item>
    <item>
      <title>Re: Remove duplicate values based on a selected column value in data load</title>
      <link>https://community.qlik.com/t5/App-Development/Remove-duplicate-values-based-on-a-selected-column-value-in-data/m-p/1674611#M51193</link>
      <description>&lt;P&gt;Here is how I'd go about it; I think you're on the right track with creating a key. There's probably another way to do this that might be more efficient; I'll have a think about it if I have time. Note that I created a quick Excel file as my source, but if you replace that reference with yours it should work for you.&lt;/P&gt;&lt;P&gt;// load the table into Qlik memory for faster work&lt;BR /&gt;DeDupTemp:&lt;BR /&gt;LOAD&lt;BR /&gt;[Customer_ID],&lt;BR /&gt;[Transaction_Date],&lt;BR /&gt;[Funding_Arrangement],&lt;BR /&gt;[Class],&lt;BR /&gt;[Hours],&lt;BR /&gt;[Employee],&lt;BR /&gt;[Customer_ID]&amp;amp;'|'&amp;amp;[Transaction_Date]&amp;amp;'|'&amp;amp;[Funding_Arrangement] as DeDupKey&lt;BR /&gt;FROM [lib://AttachedFiles/test.xlsx]&lt;BR /&gt;(ooxml, embedded labels, table is Sheet1);&lt;/P&gt;&lt;P&gt;// Start to build our final table&lt;BR /&gt;DeDup:&lt;BR /&gt;NoConcatenate&lt;BR /&gt;Load Distinct // This will give us a list of unique keys to join back on to&lt;BR /&gt;DeDupKey&lt;BR /&gt;resident DeDupTemp&lt;BR /&gt;where [Funding_Arrangement] = 'SCO';&lt;BR /&gt;left Join // so that we only get the one record from the original table&lt;BR /&gt;LOAD&lt;BR /&gt;[Customer_ID],&lt;BR /&gt;[Transaction_Date],&lt;BR /&gt;[Funding_Arrangement],&lt;BR /&gt;[Class],&lt;BR /&gt;[Hours],&lt;BR /&gt;[Employee],&lt;BR /&gt;DeDupKey&lt;BR /&gt;resident DeDupTemp;&lt;/P&gt;&lt;P&gt;DeDup:&lt;BR /&gt;LOAD&lt;BR /&gt;[Customer_ID],&lt;BR /&gt;[Transaction_Date],&lt;BR /&gt;[Funding_Arrangement],&lt;BR /&gt;[Class],&lt;BR /&gt;[Hours],&lt;BR /&gt;[Employee],&lt;BR /&gt;DeDupKey&lt;BR /&gt;resident DeDupTemp&lt;BR /&gt;where [Funding_Arrangement] &amp;lt;&amp;gt; 'SCO';&lt;/P&gt;&lt;P&gt;drop table DeDupTemp;&lt;/P&gt;&lt;P&gt;Cheers,&lt;/P&gt;&lt;P&gt;Rod&lt;/P&gt;</description>
      <pubDate>Wed, 12 Feb 2020 02:33:50 GMT</pubDate>
      <guid>https://community.qlik.com/t5/App-Development/Remove-duplicate-values-based-on-a-selected-column-value-in-data/m-p/1674611#M51193</guid>
      <dc:creator>Rodj</dc:creator>
      <dc:date>2020-02-12T02:33:50Z</dc:date>
    </item>
    <item>
      <title>Re: Remove duplicate values based on a selected column value in data load</title>
      <link>https://community.qlik.com/t5/App-Development/Remove-duplicate-values-based-on-a-selected-column-value-in-data/m-p/1674612#M51194</link>
      <description>&lt;P&gt;Hi Rod,&lt;/P&gt;&lt;P&gt;Thank you very much for your response.&lt;/P&gt;&lt;P&gt;I ran your code and it still gives me the same results as my previous attempt.&lt;/P&gt;&lt;P&gt;Yes, it removes the duplicate record related to "Customer_ID" 101 where the "Funding_Arrangement" is 'SCO', but it also removes the duplicate record related to "Customer_ID" 107 where the respective "Funding_Arrangement" is 'HACC'.&lt;/P&gt;&lt;P&gt;My code was something like below:&lt;/P&gt;&lt;P&gt;Funded_Data:&lt;BR /&gt;LOAD&lt;BR /&gt;RecNo(),&lt;BR /&gt;Customer_ID,&lt;BR /&gt;Attendance,&lt;BR /&gt;Transaction_Date,&lt;BR /&gt;Funding_Arrangement,&lt;BR /&gt;Program,&lt;BR /&gt;"Class",&lt;BR /&gt;Customer_ID &amp;amp; Transaction_Date &amp;amp; "Class" AS KEY,&lt;BR /&gt;"Hours",&lt;BR /&gt;Employee&lt;BR /&gt;FROM [lib://ANDREW/Test_Apps/Fill Blank Cells/Funded Data.xlsx]&lt;BR /&gt;(ooxml, embedded labels, table is [Funded Data]);&lt;/P&gt;&lt;P&gt;NoConcatenate&lt;BR /&gt;LOAD *&lt;BR /&gt;Resident Funded_Data&lt;BR /&gt;WHERE Peek(KEY)&amp;lt;&amp;gt;KEY;&lt;BR /&gt;DROP TABLE Funded_Data;&lt;/P&gt;&lt;P&gt;I have attached my sample data file herewith.&lt;/P&gt;&lt;P&gt;I appreciate your valuable feedback.&lt;/P&gt;&lt;P&gt;Thank you.&lt;/P&gt;&lt;P&gt;Kind regards,&lt;/P&gt;&lt;P&gt;Andy&lt;/P&gt;</description>
      <pubDate>Wed, 12 Feb 2020 03:00:04 GMT</pubDate>
      <guid>https://community.qlik.com/t5/App-Development/Remove-duplicate-values-based-on-a-selected-column-value-in-data/m-p/1674612#M51194</guid>
      <dc:creator>andymanu</dc:creator>
      <dc:date>2020-02-12T03:00:04Z</dc:date>
    </item>
    <item>
      <title>Re: Remove duplicate values based on a selected column value in data load</title>
      <link>https://community.qlik.com/t5/App-Development/Remove-duplicate-values-based-on-a-selected-column-value-in-data/m-p/1674613#M51195</link>
      <description>&lt;P&gt;That is really weird! If I load just the records &amp;lt;&amp;gt; 'SCO' it leaves the duplicate HACC records in; however, if I also load (and deduplicate) the 'SCO' records it removes the HACC duplicate as well, even though logically there's no way it should be doing so. I tried this another way that assumes you aren't going to add any other columns, and just did a simple distinct:&lt;/P&gt;&lt;P&gt;(with the same temp table load as before)&lt;/P&gt;&lt;P&gt;DeDup:&lt;BR /&gt;NoConcatenate&lt;BR /&gt;LOAD&lt;BR /&gt;[Customer_ID],&lt;BR /&gt;[Transaction_Date],&lt;BR /&gt;[Funding_Arrangement],&lt;BR /&gt;[Class],&lt;BR /&gt;[Hours],&lt;BR /&gt;[Employee],&lt;BR /&gt;RowNo() as Counter&lt;BR /&gt;resident DeDupTemp&lt;BR /&gt;where [Funding_Arrangement] &amp;lt;&amp;gt; 'SCO';&lt;/P&gt;&lt;P&gt;Concatenate&lt;BR /&gt;Load Distinct&lt;BR /&gt;[Customer_ID],&lt;BR /&gt;[Transaction_Date],&lt;BR /&gt;[Funding_Arrangement],&lt;BR /&gt;[Class],&lt;BR /&gt;[Hours],&lt;BR /&gt;[Employee]&lt;BR /&gt;Resident DeDupTemp&lt;BR /&gt;Where [Funding_Arrangement] = 'SCO';&lt;/P&gt;&lt;P&gt;drop table DeDupTemp;&lt;BR /&gt;drop Field Counter;&lt;/P&gt;&lt;P&gt;Note that I added a RowNo() into the load of the non-SCO records (and later drop the resulting Counter field); this appears to keep the duplicated HACC record and achieves what you want. I still can't fathom why the HACC record goes missing without it. There's either a gap in my understanding somewhere or a bug in the Qlik engine.&lt;/P&gt;&lt;P&gt;Your method of using Peek() will work provided you:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Do a separate load of the 'SCO' records and concatenate it to a load of the non-'SCO' records&lt;/LI&gt;&lt;LI&gt;Sort the records so that the duplicate records are always adjacent (it is safer to assume that they won't appear in order as they do in your sample data)&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;I reckon the method I have included here will be quicker on large volumes, but you might not have a large enough volume to worry about.&lt;/P&gt;</description>
      <pubDate>Wed, 12 Feb 2020 03:37:07 GMT</pubDate>
      <guid>https://community.qlik.com/t5/App-Development/Remove-duplicate-values-based-on-a-selected-column-value-in-data/m-p/1674613#M51195</guid>
      <dc:creator>Rodj</dc:creator>
      <dc:date>2020-02-12T03:37:07Z</dc:date>
    </item>
    <item>
      <title>Re: Remove duplicate values based on a selected column value in data load</title>
      <link>https://community.qlik.com/t5/App-Development/Remove-duplicate-values-based-on-a-selected-column-value-in-data/m-p/1674614#M51196</link>
      <description>&lt;P&gt;Another way you could do this is to flag the duplicates. You can then keep them in the data in case you want to be aware of them later (e.g., if this is a data quality issue it can be useful to make them visible), or you can load only the ones not flagged as duplicates:&lt;/P&gt;&lt;P&gt;DeDup:&lt;BR /&gt;NoConcatenate&lt;BR /&gt;Load&lt;BR /&gt;[Customer_ID],&lt;BR /&gt;[Transaction_Date],&lt;BR /&gt;[Funding_Arrangement],&lt;BR /&gt;[Class],&lt;BR /&gt;[Hours],&lt;BR /&gt;[Employee],&lt;BR /&gt;DeDupKey,&lt;BR /&gt;if(Peek(DeDupKey) = DeDupKey, 'Y', 'N') as Duplicate&lt;BR /&gt;Resident DeDupTemp&lt;BR /&gt;order by DeDupKey;&lt;/P&gt;&lt;P&gt;That is probably a neater solution that gives you more flexibility, and it doesn't seem to have the issue of records unintentionally disappearing.&lt;/P&gt;</description>
      <pubDate>Wed, 12 Feb 2020 03:44:59 GMT</pubDate>
      <guid>https://community.qlik.com/t5/App-Development/Remove-duplicate-values-based-on-a-selected-column-value-in-data/m-p/1674614#M51196</guid>
      <dc:creator>Rodj</dc:creator>
      <dc:date>2020-02-12T03:44:59Z</dc:date>
    </item>
    <item>
      <title>Re: Remove duplicate values based on a selected column value in data load</title>
      <link>https://community.qlik.com/t5/App-Development/Remove-duplicate-values-based-on-a-selected-column-value-in-data/m-p/1674615#M51197</link>
      <description>&lt;P&gt;Hi Rod,&lt;/P&gt;&lt;P&gt;Thank you very much. It's working as intended now.&lt;/P&gt;&lt;P&gt;I was able to learn something new.&lt;/P&gt;&lt;P&gt;Kind regards,&lt;/P&gt;&lt;P&gt;Andy&lt;/P&gt;</description>
      <pubDate>Wed, 12 Feb 2020 03:50:01 GMT</pubDate>
      <guid>https://community.qlik.com/t5/App-Development/Remove-duplicate-values-based-on-a-selected-column-value-in-data/m-p/1674615#M51197</guid>
      <dc:creator>andymanu</dc:creator>
      <dc:date>2020-02-12T03:50:01Z</dc:date>
    </item>
    <item>
      <title>Re: Remove duplicate values based on a selected column value in data load</title>
      <link>https://community.qlik.com/t5/App-Development/Remove-duplicate-values-based-on-a-selected-column-value-in-data/m-p/1674616#M51198</link>
      <description>&lt;P&gt;No problem. Don't forget to mark a post as the solution in order to help others.&lt;/P&gt;&lt;P&gt;Cheers,&lt;/P&gt;&lt;P&gt;Rod&lt;/P&gt;</description>
      <pubDate>Wed, 12 Feb 2020 03:53:18 GMT</pubDate>
      <guid>https://community.qlik.com/t5/App-Development/Remove-duplicate-values-based-on-a-selected-column-value-in-data/m-p/1674616#M51198</guid>
      <dc:creator>Rodj</dc:creator>
      <dc:date>2020-02-12T03:53:18Z</dc:date>
    </item>
  </channel>
</rss>

