<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Sample Sets of Data in QlikView</title>
    <link>https://community.qlik.com/t5/QlikView/Sample-Sets-of-Data/m-p/1291535#M621720</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Ok, thank you. So does my code randomly select 5% of unique identifiers or 5% of all data? In my mind, those two things are different. I would want to capture the former. Perhaps this is a misunderstanding of what this function returns?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Thu, 16 Feb 2017 15:29:16 GMT</pubDate>
    <dc:creator />
    <dc:date>2017-02-16T15:29:16Z</dc:date>
    <item>
      <title>Sample Sets of Data</title>
      <link>https://community.qlik.com/t5/QlikView/Sample-Sets-of-Data/m-p/1291532#M621717</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I'm looking to create a flag that randomly selects 5% of my unique identifiers and flags them as a "Trial Group". See my code below:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;if((rand()&amp;lt;=0.05+now()*0) = 0, 'Standard Group', 'Trial Group') as [Trial Flag]&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I'm uncertain if this is doing what I want it to do. At a quick look it does, but I'm a little confused about how this is randomly selecting 5% of unique identifiers without referencing the unique identifier in this line of code?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 16 Feb 2017 15:03:00 GMT</pubDate>
      <guid>https://community.qlik.com/t5/QlikView/Sample-Sets-of-Data/m-p/1291532#M621717</guid>
      <dc:creator />
      <dc:date>2017-02-16T15:03:00Z</dc:date>
    </item>
    <item>
      <title>Re: Sample Sets of Data</title>
      <link>https://community.qlik.com/t5/QlikView/Sample-Sets-of-Data/m-p/1291533#M621718</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;hmmm ignore what I said, I think your method seems to be the accepted solution!&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 16 Feb 2017 15:06:27 GMT</pubDate>
      <guid>https://community.qlik.com/t5/QlikView/Sample-Sets-of-Data/m-p/1291533#M621718</guid>
      <dc:creator>adamdavi3s</dc:creator>
      <dc:date>2017-02-16T15:06:27Z</dc:date>
    </item>
    <item>
      <title>Re: Sample Sets of Data</title>
      <link>https://community.qlik.com/t5/QlikView/Sample-Sets-of-Data/m-p/1291534#M621719</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Based on my testing, this doesn't produce an exact 5% figure: on 20,000 rows I got figures between 964 and 1037 over 5 reloads.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;How precious are you about it being exactly 5%?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 16 Feb 2017 15:24:11 GMT</pubDate>
      <guid>https://community.qlik.com/t5/QlikView/Sample-Sets-of-Data/m-p/1291534#M621719</guid>
      <dc:creator>adamdavi3s</dc:creator>
      <dc:date>2017-02-16T15:24:11Z</dc:date>
    </item>
    <item>
      <title>Re: Sample Sets of Data</title>
      <link>https://community.qlik.com/t5/QlikView/Sample-Sets-of-Data/m-p/1291535#M621720</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Ok, thank you. So does my code randomly select 5% of unique identifiers or 5% of all data? In my mind, those two things are different. I would want to capture the former. Perhaps this is a misunderstanding of what this function returns?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 16 Feb 2017 15:29:16 GMT</pubDate>
      <guid>https://community.qlik.com/t5/QlikView/Sample-Sets-of-Data/m-p/1291535#M621720</guid>
      <dc:creator />
      <dc:date>2017-02-16T15:29:16Z</dc:date>
    </item>
    <item>
      <title>Re: Sample Sets of Data</title>
      <link>https://community.qlik.com/t5/QlikView/Sample-Sets-of-Data/m-p/1291536#M621721</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;This will flag *roughly* 5% of the rows in your table.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Can you share any data at all, or at least your column names and expected results? This problem has nagged me for ages, so I will find a solution!&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 16 Feb 2017 15:32:29 GMT</pubDate>
      <guid>https://community.qlik.com/t5/QlikView/Sample-Sets-of-Data/m-p/1291536#M621721</guid>
      <dc:creator>adamdavi3s</dc:creator>
      <dc:date>2017-02-16T15:32:29Z</dc:date>
    </item>
    <item>
      <title>Re: Sample Sets of Data</title>
      <link>https://community.qlik.com/t5/QlikView/Sample-Sets-of-Data/m-p/1291537#M621722</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Here is my working; sorry, I'm just heading out of the office but will pick this up in the morning.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;This is selecting an EXACT 5% sample, not just a rough one, i.e. in my data 1093 rows every time.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Sorry, it's a work in progress; I need to tidy it all up.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 16 Feb 2017 15:38:47 GMT</pubDate>
      <guid>https://community.qlik.com/t5/QlikView/Sample-Sets-of-Data/m-p/1291537#M621722</guid>
      <dc:creator>adamdavi3s</dc:creator>
      <dc:date>2017-02-16T15:38:47Z</dc:date>
    </item>
    <item>
      <title>Re: Sample Sets of Data</title>
      <link>https://community.qlik.com/t5/QlikView/Sample-Sets-of-Data/m-p/1291538#M621723</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I cannot share any data. However, the column names are not an issue.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Sequence is my unique identifier (customer).&lt;/P&gt;&lt;P&gt;All other fields are related to this unique identifier (customer attributes).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I want to randomly select 5% of customers (5% of Sequences) and flag them as "Trial Group" and leave the others as "Standard Group".&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 16 Feb 2017 15:40:03 GMT</pubDate>
      <guid>https://community.qlik.com/t5/QlikView/Sample-Sets-of-Data/m-p/1291538#M621723</guid>
      <dc:creator />
      <dc:date>2017-02-16T15:40:03Z</dc:date>
    </item>
    <item>
      <title>Re: Sample Sets of Data</title>
      <link>https://community.qlik.com/t5/QlikView/Sample-Sets-of-Data/m-p/1291539#M621724</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi Seth,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I am working on this, but decided to make a blog post of it as it's vexed me for ages, so I am doing some proper testing.&lt;/P&gt;&lt;P&gt;So far I am happy with the results, though.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;IMG __jive_id="153469" alt="Capture.PNG" class="jive-image image-1" src="https://community.qlik.com/legacyfs/online/153469_Capture.PNG" style="height: auto;" /&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 17 Feb 2017 09:05:23 GMT</pubDate>
      <guid>https://community.qlik.com/t5/QlikView/Sample-Sets-of-Data/m-p/1291539#M621724</guid>
      <dc:creator>adamdavi3s</dc:creator>
      <dc:date>2017-02-17T09:05:23Z</dc:date>
    </item>
    <item>
      <title>Re: Sample Sets of Data</title>
      <link>https://community.qlik.com/t5/QlikView/Sample-Sets-of-Data/m-p/1291540#M621725</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;OK, so the simple answer is:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;1- &lt;SPAN style="font-size: 13.3333px;"&gt; *I assume*&lt;/SPAN&gt; your current setup will be flagging 5% of all of your transactions, not customers&lt;/P&gt;&lt;P&gt;2- the current setup is not completely accurate, so if you want your sample tolerance to be within +/- 0.05% then check out my blog (&lt;A href="https://community.qlik.com/docs/DOC-18124"&gt;Accurately selecting a random percentage sample&lt;/A&gt;)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If you're happy with a tolerance of what appears to be +/- 1% then you can do something like the below.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If your current code is, say:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;fact:&lt;/P&gt;&lt;P&gt;Load&lt;/P&gt;&lt;P&gt;transactionid,&lt;/P&gt;&lt;P&gt;sequence,&lt;/P&gt;&lt;P&gt;somotherdata,&lt;/P&gt;&lt;P&gt;&lt;SPAN style="color: #3d3d3d; font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; font-size: 13px;"&gt;if((rand()&amp;lt;=0.05+now()*0) = 0, 'Standard Group', 'Trial Group') as [Trial Flag]&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;from blah;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Then this will be a 5% (ish) sample of all data.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;To make this a 5% sample of sequence, something like this needs to be done (you could also left join if you wanted). Apologies if my syntax is out anywhere; I have written this on the fly:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;tmp:&lt;/P&gt;&lt;P&gt;load distinct&lt;/P&gt;&lt;P&gt;sequence&lt;/P&gt;&lt;P&gt;from blah;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;noconcatenate&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;tmp2:&lt;/P&gt;&lt;P&gt;mapping load&lt;/P&gt;&lt;P&gt;sequence,&lt;/P&gt;&lt;P&gt;&lt;SPAN style="color: #3d3d3d; font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; font-size: 13px;"&gt;if((rand()&amp;lt;=0.05+now()*0) = 0, 'Standard Group', 'Trial Group') as [Trial Flag]&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="color: #3d3d3d; font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; font-size: 13px;"&gt;resident tmp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="color: #3d3d3d; font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; font-size: 13px;"&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="color: #3d3d3d; font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; font-size: 13px;"&gt;drop table tmp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="color: #3d3d3d; font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; font-size: 13px;"&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P style="font-size: 13.3333px;"&gt;fact:&lt;/P&gt;&lt;P style="font-size: 13.3333px;"&gt;Load&lt;/P&gt;&lt;P style="font-size: 13.3333px;"&gt;transactionid,&lt;/P&gt;&lt;P style="font-size: 13.3333px;"&gt;sequence,&lt;/P&gt;&lt;P style="font-size: 13.3333px;"&gt;somotherdata,&lt;/P&gt;&lt;P style="font-size: 13.3333px;"&gt;&lt;SPAN style="color: #3d3d3d; font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; font-size: 13px;"&gt;applymap('tmp2',sequence,null()) as [Trial Flag]&lt;/SPAN&gt;&lt;/P&gt;&lt;P style="font-size: 13.3333px;"&gt;from blah;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 17 Feb 2017 10:40:12 GMT</pubDate>
      <guid>https://community.qlik.com/t5/QlikView/Sample-Sets-of-Data/m-p/1291540#M621725</guid>
      <dc:creator>adamdavi3s</dc:creator>
      <dc:date>2017-02-17T10:40:12Z</dc:date>
    </item>
    <item>
      <title>Re: Sample Sets of Data</title>
      <link>https://community.qlik.com/t5/QlikView/Sample-Sets-of-Data/m-p/1291541#M621726</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;If you want to randomly select 5% of unique Customer names, you can use this in your script:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;PRE __default_attr="xml" __jive_macro_name="code" class="jive_macro_code jive_text_macro _jivemacro_uid_14873421490083077" jivemacro_uid="_14873421490083077"&gt;
&lt;P&gt;CustomerTypes:&lt;/P&gt;
&lt;P&gt;LOAD FieldValue('CustomerName', RowNo()) AS CustomerName,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if((rand()&amp;lt;=0.05+now()*0) = 0, 'Standard Group', 'Trial Group') as [Trial Flag] &lt;/P&gt;
&lt;P&gt;AUTOGENERATE FieldValueCount('CustomerName');&lt;/P&gt;
&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;You can use whatever field in whatever facts table to extract only distinct values. Of course, the same would be true if you use a LOAD DISTINCT, but this code loads from the symbol table, which may be faster.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Peter&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 17 Feb 2017 14:37:36 GMT</pubDate>
      <guid>https://community.qlik.com/t5/QlikView/Sample-Sets-of-Data/m-p/1291541#M621726</guid>
      <dc:creator>Peter_Cammaert</dc:creator>
      <dc:date>2017-02-17T14:37:36Z</dc:date>
    </item>
  </channel>
</rss>

